fitcox

Create Cox proportional hazards model

Since R2021a

Syntax

coxMdl = fitcox(X,T)

coxMdl = fitcox(X,T,Name,Value)

Description

The fitcox function creates a Cox proportional hazards model for lifetime data. The basic Cox model includes a baseline hazard function h₀(t) and model coefficients b such that, for observation i with predictor values $x_{i1}, \ldots, x_{ip}$, the hazard rate at time t is

$h(X_{i}, t) = h_{0}(t) \exp\left[\sum_{j=1}^{p} x_{ij} b_{j}\right],$

where the b coefficients do not depend on time. fitcox estimates both the model coefficients b and the baseline hazard h₀(t), and stores them as properties in the resulting CoxModel object.
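The following sketch evaluates this formula for two observations to show the proportional hazards property: the ratio of the two hazards is the same at every time t. The baseline hazard h0 and the coefficients b are hypothetical values chosen for illustration, not fitcox output. With these values, X*b is [0.2; -0.3], so every entry of ratio should equal exp(0.5), about 1.6487.

```matlab
% Sketch: evaluate h(X_i,t) = h0(t)*exp(X_i*b) for two observations.
% h0 and b are hypothetical illustration values, not estimated by fitcox.
h0 = @(t) 0.5*t;          % hypothetical baseline hazard h0(t)
b  = [0.8; -0.3];         % hypothetical coefficients (p = 2 predictors)
X  = [1 2;                % observation 1: x_11 = 1, x_12 = 2
      0 1];               % observation 2: x_21 = 0, x_22 = 1
t  = [1 2 5];             % evaluation times

% Each row of H holds the hazard of one observation at the times in t
H = exp(X*b) .* h0(t);    % (2x1) .* (1x3) gives 2x3 by implicit expansion

% The ratio of the two hazards does not depend on t
ratio = H(1,:) ./ H(2,:)
```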

The full Cox model includes extensions to the basic model, such as hazards with respect to different baselines or the inclusion of stratification variables. See Extension of Cox Proportional Hazards Model.

coxMdl = fitcox(X,T) returns a Cox proportional hazards model object coxMdl using the predictor values X and event times T.

coxMdl = fitcox(X,T,Name,Value) modifies the fit using one or more Name,Value arguments. For example, when the data includes censored observations (observations whose event time is not fully observed), use the Censoring argument to identify them.


Examples


Estimate Cox Proportional Hazard Regression


Weibull random variables with the same shape parameter have proportional hazard rates; see Weibull Distribution. The hazard rate with scale parameter $a$ and shape parameter $b$ at time $t$ is

$\frac{b}{a^{b}} t^{b-1}.$
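Because the factor $t^{b-1}$ is common to both hazards, the ratio of the hazard with scale $a$ to the hazard with scale 1 does not depend on $t$, and its logarithm is the quantity that a Cox coefficient estimates:

$\frac{h_{a}(t)}{h_{1}(t)} = \frac{(b/a^{b})\, t^{b-1}}{b\, t^{b-1}} = a^{-b}, \qquad \log \frac{h_{a}(t)}{h_{1}(t)} = -b \log a.$

With $b = 2$, scale 5 gives a hazard ratio of $1/25$ and scale 1/3 gives a hazard ratio of 9.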

Generate 100 pseudorandom samples from each of three Weibull distributions with scale parameters 1, 5, and 1/3 and the same shape parameter B = 2.

rng default % For reproducibility
B = 2;
A = ones(100,1);
data1 = wblrnd(A,B);
A2 = 5*A;
data2 = wblrnd(A2,B);
A3 = A/3;
data3 = wblrnd(A3,B);

Create a table of data. The predictor is a categorical variable with levels 1, 2, and 3, identifying which of the three distributions generated each sample.

predictors = categorical([A;2*A;3*A]);
data = table(predictors,[data1;data2;data3],'VariableNames',["Predictors" "Times"]);

Fit a Cox regression to the data.

mdl = fitcox(data,"Times")

mdl = 
Cox Proportional Hazards regression model

                     Beta        SE        zStat       pValue  
                    _______    _______    _______    __________

    Predictors_2    -3.5834    0.33187    -10.798    3.5299e-27
    Predictors_3     2.1668    0.20802     10.416    2.0899e-25


Log-likelihood: -1197.917

rates = exp(mdl.Coefficients.Beta)

rates = 2×1

    0.0278
    8.7301
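The theoretical hazard ratios for this simulation follow from the Weibull hazard: category 1 has scale 1, so categories 2 and 3 (scales 5 and 1/3) have hazard ratios A^(-B) relative to it. The sketch below computes these values for comparison with the fitted rates. Each estimated coefficient in the output above lies within about 1.1 standard errors of its theoretical value.

```matlab
% Theoretical hazard ratios and coefficients for the simulated Weibull data.
B = 2;                        % common shape parameter used in the example
scales = [5; 1/3];            % scales of categories 2 and 3 (category 1 has scale 1)
trueRates = scales.^(-B)      % expected hazard ratios: 0.04 and 9
trueBeta  = log(trueRates)    % expected coefficients: about -3.2189 and 2.1972
```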

Fit Cox Proportional Hazards Model to Lifetime Data


Perform a Cox proportional hazards regression on the lightbulb data set, which contains simulated lifetimes of light bulbs. The first column of the light bulb data contains the lifetime (in hours) of two different types of bulbs. The second column contains a binary variable indicating whether the bulb is fluorescent or incandescent; 0 indicates the bulb is fluorescent, and 1 indicates it is incandescent. The third column contains the censoring information, where 0 indicates the bulb was observed until failure, and 1 indicates the observation was censored.

Load the lightbulb data set.

load lightbulb

Fit a Cox proportional hazards model for the lifetime of the light bulbs, accounting for censoring. The predictor variable is the type of bulb.

coxMdl = fitcox(lightbulb(:,2),lightbulb(:,1), ...
    'Censoring',lightbulb(:,3))

coxMdl = 
Cox Proportional Hazards regression model

           Beta       SE      zStat       pValue  
          ______    ______    ______    __________

    X1    4.7262    1.0372    4.5568    5.1936e-06


Log-likelihood: -212.638

Find the hazard ratio of incandescent bulbs relative to fluorescent bulbs by evaluating $\exp(\text{Beta})$.

hr = exp(coxMdl.Coefficients.Beta)

hr = 112.8646

The estimated hazard ratio is $e^{\text{Beta}}$ = 112.8646, which means that the estimated hazard for incandescent bulbs is about 112.86 times the hazard for fluorescent bulbs. The small value of coxMdl.Coefficients.pValue is strong evidence against the null hypothesis that the two types of light bulbs have identical hazard rates, which would mean Beta = 0.
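As a rough check, the Wald statistic, the two-sided p-value, and an approximate 95% confidence interval for the hazard ratio can be recomputed from the printed Beta and SE. This sketch assumes a normal (Wald) test, which agrees with zStat = Beta/SE in the output. The interval exp(Beta ± 1.96·SE) is the usual Wald interval computed by hand here, not a value that fitcox prints.

```matlab
% Wald statistics for the lightbulb fit, recomputed from the printed values.
% In a live session, use coxMdl.Coefficients.Beta and coxMdl.Coefficients.SE.
beta = 4.7262;                      % estimated coefficient (Beta column)
se   = 1.0372;                      % standard error (SE column)

z  = beta/se                        % about 4.557, matching zStat
p  = erfc(abs(z)/sqrt(2))           % two-sided normal p-value, about 5.2e-06
hr = exp(beta)                      % hazard ratio, about 112.9

% Approximate 95% Wald confidence interval for the hazard ratio
ci = exp(beta + [-1 1]*1.96*se)     % roughly [14.8 862]
```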

Input Arguments


`X` — Predictor values
matrix | table

Predictor values, specified as a matrix or table.

A matrix contains one column for each predictor and one row for each observation.
A table contains one row for each observation. A table can also contain the time data as well as the predictors.

By default, if the predictor data is in a table, fitcox assumes that a variable is categorical if it is a logical vector, categorical vector, character array, string array, or cell array of character vectors. If the predictor data is a matrix, fitcox assumes that all predictors are continuous. To identify any other predictors as categorical predictors, specify them by using the CategoricalPredictors name-value argument.

If X, T, the value of 'Frequency', or the value of 'Stratification' contains NaN values, then fitcox removes rows with NaN values from all data when fitting a Cox model.
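The sketch below shows which rows this rule drops. fitcox performs the removal itself, so the code only illustrates the rule; all values are hypothetical.

```matlab
% Sketch of the NaN rule: a row is dropped if X, T, the frequencies,
% or the stratification variable contains a NaN in that row.
X     = [1 0; 2 NaN; 3 1; 4 0];    % predictors (row 2 has a NaN)
T     = [5; 3; NaN; 8];            % event times (row 3 has a NaN)
freq  = [1; 1; 1; 2];              % frequencies
strat = [1; 2; 1; 2];              % numeric stratification variable

keep  = ~any(isnan([X T freq strat]), 2)   % rows 1 and 4 remain
Xused = X(keep,:);
Tused = T(keep);
```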

Data Types: double | table | categorical

`T` — Event times
real column vector | real matrix with two columns | name of column in table `X` | formula in Wilkinson notation for table `X`

Event times, specified as one of the following:

Real column vector.
Real matrix with two columns representing the start and stop times.
Name of a column in the table X.
Formula in Wilkinson notation for the table X. For example, to specify that the table column 'T' contains the event times and the columns 'x' and 'y' are the predictors, use
'T ~ x + y'
See Wilkinson Notation.
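A sketch of the formula form, applied to the lightbulb data after converting it to a table. The variable names Lifetime, Incandescent, and Censored are chosen here for illustration and are not part of the data set. Assuming the formula and a table-column Censoring name combine as described on this page, the coefficient should match the matrix-based fit in Fit Cox Proportional Hazards Model to Lifetime Data (about 4.7262).

```matlab
% Fit the lightbulb model from a table by using a formula.
load lightbulb
tbl = array2table(lightbulb, ...
    'VariableNames',{'Lifetime','Incandescent','Censored'});

% Time variable on the left of ~, predictors on the right.
% Censoring names a column of the same table.
coxMdl = fitcox(tbl,'Lifetime ~ Incandescent','Censoring','Censored')
```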

For vector or matrix entries, the number of rows of T must be the same as the number of rows of X.

Use the two-column form of T, in which each row covers the interval from its start time to its stop time, to fit a model with time-dependent covariates. See Cox Proportional Hazards Model with Time-Dependent Covariates.
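A sketch of the two-column (start, stop) layout for a covariate whose value changes during follow-up. The data is hypothetical: subject 1 is followed from time 0 to 8 and fails at time 8, and its covariate switches from 0 to 1 at time 5, so it contributes two rows. Only the interval that ends in the failure is marked as uncensored. The fitcox call is left as a comment because two subjects are not enough data for a fit.

```matlab
% Counting-process layout for a time-dependent covariate (hypothetical data).
T    = [0 5;    % subject 1, first interval (no event at time 5)
        5 8;    % subject 1, second interval (failure at time 8)
        0 6];   % subject 2 (censored at time 6)
X    = [0; 1; 0];     % covariate value during each interval
cens = [1; 0; 1];     % 1 = interval ends without an observed failure

% With data for enough subjects, the fit would be:
% coxMdl = fitcox(X,T,'Censoring',cens);
```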

Data Types: single | double | char | string

Name-Value Arguments

Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter.

Before R2021a, use commas to separate each name and value, and enclose Name in quotes.

Example: To fit data with censored values cens, specify 'Censoring',cens.

`Baseline` — `X` values at which to compute baseline hazard
`mean(X)`, the default for continuous predictors | `0`, the default for categorical predictors | real scalar | real row vector

X values at which to compute the baseline hazard, specified as a real scalar or row vector. If Baseline is a row vector, its length is the number of predictors, so there is one baseline for each predictor.

The default baseline for continuous predictors is mean(X), so the default hazard rate at X for these predictors is h₀(t)*exp((X - mean(X))*b). The default baseline for categorical predictors is 0. Enter 0 to compute the baseline for all predictors relative to 0, so the hazard rate at X is h₀(t)*exp(X*b). Changing the baseline changes the estimated baseline hazard h₀(t), and therefore the hazard ratio relative to that baseline, but does not affect the coefficient estimates.
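The following sketch checks the arithmetic behind the two baseline choices for one continuous predictor, assuming the hazard forms given in this paragraph. Because h0zero*exp(X*b) and h0mean*exp((X - mean(X))*b) must describe the same hazard, the two baseline hazards differ only by the constant factor exp(mean(X)*b), and b itself does not change. The values of b, X, and h0zero are hypothetical.

```matlab
% Sketch: the same hazard written relative to two different baselines.
% b, X, and h0zero are hypothetical values, not fitcox output.
b      = 0.7;                 % coefficient (unchanged by the baseline choice)
X      = [1; 2; 6];           % continuous predictor values
m      = mean(X);             % default baseline, mean(X) = 3
h0zero = 0.2;                 % baseline hazard at a fixed t when Baseline = 0

% Baseline hazard at the same t when Baseline = mean(X)
h0mean = h0zero*exp(m*b);

% Both parameterizations give the same hazard for every observation
hZero = h0zero*exp(X*b);
hMean = h0mean*exp((X - m)*b);
maxDiff = max(abs(hZero - hMean))   % zero up to rounding error
```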

For the identified categorical predictors, fitcox creates dummy variables, one fewer than the number of categories. For details, see Automatic Creation of Dummy Variables.
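A minimal sketch of reference (dummy) coding for a predictor with three categories. It assumes the first category is the reference level, which is consistent with the Predictors_2 and Predictors_3 coefficients in the Weibull example; the exact coding that fitcox uses is described in Automatic Creation of Dummy Variables.

```matlab
% Sketch of reference coding for a predictor with 3 categories.
levels = [1; 2; 3; 2; 1];          % category of each observation
D = double(levels == [2 3])        % one column each for categories 2 and 3
```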

Example: 'Baseline',0

Data Types: double

`Beta` — Coefficient initial values
`0.01/std(X)` (default) | numeric vector

Coefficient initial values, specified as a numeric vector of coefficient values. These values initiate the likelihood maximization iterations performed by fitcox.

Data Types: double

`CategoricalPredictors` — Categorical predictors list
vector of positive integers | logical vector | character matrix | string array | cell array of character vectors | `'all'`

Categorical predictors list, specified as one of the values in this table.

Value	Description
Vector of positive integers	Each entry in the vector is an index value corresponding to the column of the predictor data (`X`) that contains a categorical variable.
Logical vector	A `true` entry means that the corresponding column of predictor data (`X`) is a categorical variable.
Character matrix	Each row of the matrix is the name of a predictor variable in the table `X`. The names must match the entries in `PredictorNames`. Pad the names with extra blanks so each row of the character matrix has the same length.
String array or cell array of character vectors	Each element in the array is the name of a predictor variable in the table `X`. The names must match the entries in `PredictorNames`.
`'all'`	All predictors are categorical.

Example: 'CategoricalPredictors','all'
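A sketch that treats the bulb-type column of the lightbulb data as categorical by index. Because that column takes only the values 0 and 1, the single dummy variable equals the original column, so the coefficient estimate should match the continuous fit in Fit Cox Proportional Hazards Model to Lifetime Data (about 4.7262).

```matlab
% Treat column 1 of the predictor matrix (bulb type) as categorical.
load lightbulb
coxCat = fitcox(lightbulb(:,2),lightbulb(:,1), ...
    'Censoring',lightbulb(:,3),'CategoricalPredictors',1);

% Coefficient of the dummy variable for category 1 (incandescent) versus 0
coxCat.Coefficients.Beta
```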

`Censoring` — Indicator for censoring
array of 0s (default) | array of 0s and 1s | name of a column in table `X`

Indicator for censoring, specified as a Boolean vector with the same number of rows as X or the name of a column in the table X. Use 1 for observations that are right censored and 0 for observations that are fully observed. By default, all observations are fully observed. For an example, see Cox Proportional Hazards Model for Censored Data.
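A sketch of how a right-censoring indicator arises, using hypothetical follow-up data: a unit that is still running when observation stops is right censored, so its recorded time is the end of observation and its Censoring entry is 1. If a data set instead records a failure flag (1 = failed), the censoring indicator is its negation.

```matlab
% Sketch: build T and the Censoring indicator from hypothetical data.
failTime   = [120; 450; 300; 800];  % when each unit would fail
endOfStudy = 500;                   % when observation stops

T    = min(failTime, endOfStudy)    % observed times: 120 450 300 500
cens = failTime > endOfStudy        % true (1) = right censored
```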

Example: 'Censoring',cens

Data Types: logical

`Frequency` — Frequency or weights of observations
array of 1s (default) | vector of nonnegative scalar values

Frequency or weights of observations, specified as a vector of nonnegative values with one element per row of T. The vector can contain integer values corresponding to frequencies of observations or nonnegative values corresponding to observation weights.

The default is 1 per row of X and T.
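A sketch of what integer frequencies mean, assuming they act as replication counts: a row with frequency k stands for k identical observations. The data is hypothetical, and the fitcox calls are left as comments because this tiny data set is too small to fit.

```matlab
% Sketch: integer frequencies as replication counts (hypothetical data).
X    = [0; 1; 1];
T    = [5; 3; 7];
freq = [2; 1; 3];

% Expanded data set with one row per individual observation
Xrep = repelem(X, freq);   % [0 0 1 1 1 1]'
Trep = repelem(T, freq);   % [5 5 3 7 7 7]'

% Under the replication-count assumption, with enough data these two
% calls should give the same coefficient estimates:
% fitcox(X,T,'Frequency',freq)
% fitcox(Xrep,Trep)
```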

If X, T, the value of 'Frequency', or the value of 'Stratification' contains NaN values, then fitcox removes rows with NaN values from all data when fitting a Cox model.

Example: 'Frequency',w

Data Types: double

`OptimizationOptions` — Algorithm control parameters
structure

Algorithm control parameters for the iterative algorithm fitcox uses to estimate the solution, specified as a structure. Create this structure using statset. For parameter names and default values, see the following table or enter statset('fitcox').

In the table, "termination tolerance" means that the iterations stop when an iteration changes the stated value by less than the tolerance.

Field in Structure	Description	Values
`Display`	Amount of information returned to the command line	`'off'` — None (default) `'final'` — Final output `'iter'` — Output at each iteration
`MaxFunEvals`	Maximum number of function evaluations	Positive integer; default is `200`
`MaxIter`	Maximum number of iterations	Positive integer; default is `100`
`TolFun`	Termination tolerance on change in likelihood; see Cox Proportional Hazards Model	Positive scalar; default is `1e-8`
`TolX`	Termination tolerance for parameter (predictor estimate) change	Positive scalar; default is `1e-8`

Example: 'OptimizationOptions',statset('TolX',1e-6,'MaxIter',200)
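A sketch that starts from the fitcox defaults and changes a few fields with statset before fitting the lightbulb data. The particular tolerance and iteration values are arbitrary choices for illustration; the coefficient should still be about 4.7262.

```matlab
% Start from the fitcox defaults, then change selected fields.
opts = statset('fitcox');
opts = statset(opts,'TolX',1e-10,'MaxIter',200,'Display','final');

load lightbulb
coxMdl = fitcox(lightbulb(:,2),lightbulb(:,1), ...
    'Censoring',lightbulb(:,3),'OptimizationOptions',opts);
coxMdl.Coefficients.Beta
```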

`PredictorNames` — Predictor variable names
string array of unique names | cell array of unique character vectors

Predictor variable names, specified as a string array of unique names or cell array of unique character vectors. The functionality of 'PredictorNames' depends on how you supply the training data.

If you supply X as a numeric array, then you can use 'PredictorNames' to assign names to the predictor variables in X.
- The order of the names in PredictorNames must correspond to the column order of X. That is, PredictorNames{1} is the name of X(:,1), PredictorNames{2} is the name of X(:,2), and so on. Also, size(X,2) and numel(PredictorNames) must be equal.
- By default, PredictorNames is {'X1','X2',...}.
If you supply X as a table, then you can use 'PredictorNames' to choose which predictor variables to use in training. That is, fitcox uses only the predictor variables in PredictorNames and the time variable during training.
- PredictorNames must be a subset of X.Properties.VariableNames and cannot include the name of the time variable T.
- By default, PredictorNames contains the names of all predictor variables.
- Specify the predictors for training using either 'PredictorNames' or a formula in Wilkinson notation, but not both.

Example: 'PredictorNames',{'Sex','Age','Weight','Smoker'}
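A sketch of naming the single predictor column of the lightbulb data when X is a matrix. The name Incandescent is chosen for illustration; the coefficient row should then carry that name instead of the default X1, while the estimate itself is unchanged.

```matlab
% Name the bulb-type predictor instead of using the default name X1.
load lightbulb
coxMdl = fitcox(lightbulb(:,2),lightbulb(:,1), ...
    'Censoring',lightbulb(:,3),'PredictorNames',{'Incandescent'})
```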

Data Types: string | cell

`Stratification` — Stratification variables
`[]` (default) | matrix of real values | name of column in table `X` | array of categorical variables

Stratification variables, specified as a matrix of real values, the name of a column in table X, or an array of categorical variables. The matrix must have the same number of rows as T, with each row corresponding to an observation.

The default, [], means no stratification.
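In a stratified model, each stratum has its own baseline hazard, while all strata share the same coefficients b. In the notation of the hazard formula in the Description, with s(i) denoting the stratum of observation i, the model is

$h(X_{i}, t) = h_{0,s(i)}(t) \exp\left[\sum_{j=1}^{p} x_{ij} b_{j}\right].$

The partial likelihood is the product of the separate partial likelihoods of the strata, so the coefficients are estimated from comparisons between observations in the same stratum, and no proportionality is assumed between the baseline hazards of different strata.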

If X, T, the value of 'Frequency', or the value of 'Stratification' contains NaN values, then fitcox removes rows with NaN values from all data when fitting a Cox model.

Example: 'Stratification',Gender

Data Types: single | double | char | string | categorical

`TieBreakMethod` — Method to handle tied failure times
`'breslow'` (default) | `'efron'`

Method to handle tied failure times, specified as 'breslow' (Breslow's method) or 'efron' (Efron's method). See Partial Likelihood Function for Tied Events.
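A sketch of how the two methods treat one tied event time, using hypothetical risk scores exp(x*b). Breslow's method uses the full risk set in the denominator for every tied event. Efron's method removes the tied events from the denominator a fraction at a time, which gives a log partial-likelihood contribution at least as large as Breslow's.

```matlab
% Log partial-likelihood contribution of one tied event time.
riskScores  = [1 2 3 4];   % exp(x*b) for every unit at risk at time t
eventScores = [1 2];       % exp(x*b) for the d units failing at t (units 1 and 2)
d = numel(eventScores);

sumRisk  = sum(riskScores);   % 10
sumEvent = sum(eventScores);  % 3

% Breslow: full risk set in each of the d denominators
llBreslow = sum(log(eventScores)) - d*log(sumRisk)

% Efron: subtract the fraction k/d of the tied scores in the k-th denominator
k = 0:d-1;
llEfron = sum(log(eventScores)) - sum(log(sumRisk - (k/d)*sumEvent))
```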

Example: 'TieBreakMethod','efron'

Data Types: char | string

Version History

Introduced in R2021a
