fitcox

Create Cox proportional hazards model

Syntax

``coxMdl = fitcox(X,T)``
``coxMdl = fitcox(X,T,Name,Value)``

Description

The `fitcox` function creates a Cox proportional hazards model for lifetime data. The basic Cox model includes a baseline hazard function h0(t) and model coefficients b such that, for the predictor values `X` of observation i, the hazard rate at time t is

$h\left({X}_{i},t\right)={h}_{0}\left(t\right)\mathrm{exp}\left[\sum _{j=1}^{p}{x}_{ij}{b}_{j}\right],$

where the b coefficients do not depend on time. `fitcox` infers both the model coefficients b and the baseline hazard rate h0(t), and stores them as properties in the resulting `CoxModel` object.
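
As a short worked step, consider two observations i and k with predictor rows X_i and X_k (the index k is introduced here only for illustration). Dividing their hazards cancels the baseline hazard h0(t):

$\frac{h\left({X}_{i},t\right)}{h\left({X}_{k},t\right)}=\mathrm{exp}\left[\sum _{j=1}^{p}\left({x}_{ij}-{x}_{kj}\right){b}_{j}\right]$

The ratio does not depend on t, which is the proportional hazards property that gives the model its name.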

The full Cox model includes extensions to the basic model, such as hazards with respect to different baselines or the inclusion of stratification variables. See Extension of Cox Proportional Hazards Model.

`coxMdl = fitcox(X,T)` returns a Cox proportional hazards model object `coxMdl` using the predictor values `X` and event times `T`.


`coxMdl = fitcox(X,T,Name,Value)` modifies the fit using one or more `Name,Value` arguments. For example, when the data includes censoring (values that are not observed), the `Censoring` argument specifies the censored data.

Examples


Weibull random variables with the same shape parameter have proportional hazard rates; see Weibull Distribution. The hazard rate with scale parameter $a$ and shape parameter $b$ at time $t$ is

$\frac{b}{{a}^{b}}{t}^{b-1}$.

Generate pseudorandom samples from the Weibull distribution with scale parameters 1, 5, and 1/3, and with the same shape parameter `B`.

```
rng default % For reproducibility
B = 2;
A = ones(100,1);
data1 = wblrnd(A,B);
A2 = 5*A;
data2 = wblrnd(A2,B);
A3 = A/3;
data3 = wblrnd(A3,B);
```

Create a table of data. The predictor is a categorical variable with three categories, 1, 2, and 3, one for each scale parameter.

```
predictors = categorical([A;2*A;3*A]);
data = table(predictors,[data1;data2;data3],'VariableNames',["Predictors" "Times"]);
```

Fit a Cox regression to the data.

`mdl = fitcox(data,"Times")`

```
mdl = 
Cox Proportional Hazards regression model:

                     Beta        SE        zStat       pValue  
                    _______    _______    _______    __________

    Predictors_2    -3.5834    0.33187    -10.798    3.5299e-27
    Predictors_3     2.1668    0.20802     10.416    2.0899e-25
```

`rates = exp(mdl.Coefficients.Beta)`

```
rates = 2×1

    0.0278
    8.7301
```
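
For comparison, the theoretical hazard ratios follow directly from the Weibull hazard formula above, because the factor $t^{b-1}$ cancels when the shape parameter is shared. The following sketch computes them relative to the first group (scale parameter 1); the variable names `scales`, `hazardCoeff`, and `theoreticalRates` are introduced here only for illustration.

```matlab
B = 2;                                   % shared shape parameter
scales = [1 5 1/3];                      % scale parameters of the three groups
hazardCoeff = B ./ scales.^B;            % multiplier of t^(B-1) in each hazard
% Hazard of groups 2 and 3 relative to group 1
theoreticalRates = hazardCoeff(2:3)' ./ hazardCoeff(1)
```

The theoretical ratios are 0.04 and 9, which the estimated `rates` of 0.0278 and 8.7301 approximate.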

Perform a Cox proportional hazards regression on the `lightbulb` data set, which contains simulated lifetimes of light bulbs. The first column of the light bulb data contains the lifetime (in hours) of two different types of bulbs. The second column contains a binary variable indicating whether the bulb is fluorescent or incandescent; 0 indicates the bulb is fluorescent, and 1 indicates it is incandescent. The third column contains the censoring information, where 0 indicates the bulb was observed until failure, and 1 indicates the observation was censored.

Load the `lightbulb` data set.

`load lightbulb`

Fit a Cox proportional hazards model for the lifetime of the light bulbs, accounting for censoring. The predictor variable is the type of bulb.

```
coxMdl = fitcox(lightbulb(:,2),lightbulb(:,1), ...
    'Censoring',lightbulb(:,3))
```

```
coxMdl = 
Cox Proportional Hazards regression model:

           Beta       SE      zStat       pValue  
          ______    ______    ______    __________

    X1    4.7262    1.0372    4.5568    5.1936e-06
```

Find the hazard ratio of incandescent bulbs relative to fluorescent bulbs by evaluating $\mathrm{exp}\left(Beta\right)$.

`hr = exp(coxMdl.Coefficients.Beta)`
```
hr = 112.8646
```

The estimate of the hazard ratio is ${e}^{Beta}$ = 112.8646, which means that the estimated hazard for the incandescent bulbs is 112.86 times the hazard for the fluorescent bulbs. The small value of `coxMdl.Coefficients.pValue` indicates there is a negligible chance that the two types of light bulbs have identical hazard rates, which would mean `Beta` = 0.
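
The columns of the coefficient table relate to each other in a simple way. The sketch below is not taken from the `fitcox` implementation. It assumes that `zStat` is `Beta/SE` and that `pValue` is a two-sided normal tail probability, and it adds an approximate 95% Wald interval for the hazard ratio. The numbers are copied from the displayed table.

```matlab
beta = 4.7262;                         % Beta for X1, from the table above
se   = 1.0372;                         % SE for X1, from the table above

z = beta / se;                         % assumed definition of zStat
p = erfc(abs(z) / sqrt(2));            % two-sided standard normal tail probability

% Approximate 95% Wald interval for the hazard ratio (an assumption, not fitcox output)
hrInterval = exp(beta + [-1 1] * 1.96 * se);
```

Under these assumptions, `z` and `p` agree with the displayed `zStat` and `pValue` up to rounding. The wide interval reflects the large standard error relative to the coefficient.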

Input Arguments


`X`: Predictor values, specified as a matrix or table.

• A matrix contains one column for each predictor and one row for each observation.

• A table contains one row for each observation. A table can also contain the time data as well as the predictors.

By default, if the predictor data is in a table, `fitcox` assumes that a variable is categorical if it is a logical vector, categorical vector, character array, string array, or cell array of character vectors. If the predictor data is a matrix, `fitcox` assumes that all predictors are continuous. To identify any other predictors as categorical predictors, specify them by using the `CategoricalPredictors` name-value argument.

If `X`, `T`, the value of `'Frequency'`, or the value of `'Stratification'` contains `NaN` values, then `fitcox` removes rows with `NaN` values from all data when fitting a Cox model.

Data Types: `double` | `table` | `categorical`

`T`: Event times, specified as one of the following:

• Real column vector.

• Real matrix with two columns representing the start and stop times.

• Name of a column in the table `X`.

• Formula in Wilkinson notation for the table `X`. For example, to specify that the table columns `'x'` and `'y'` are in the model, use

`'T ~ x + y'`

For vector or matrix entries, the number of rows of `T` must be the same as the number of rows of `X`.

Use the two-column form of `T` to fit a model with time-dependent covariates. See Cox Proportional Hazards Model with Time-Dependent Covariates.

Data Types: `single` | `double` | `char` | `string`

Name-Value Arguments

Specify optional comma-separated pairs of `Name,Value` arguments. `Name` is the argument name and `Value` is the corresponding value. `Name` must appear inside quotes. You can specify several name and value pair arguments in any order as `Name1,Value1,...,NameN,ValueN`.

Example: To fit data with censored values `cens`, specify `'Censoring',cens`.

`Baseline`: `X` values at which to compute the baseline hazard, specified as a real scalar or row vector. If `Baseline` is a row vector, its length is the number of predictors, so there is one baseline for each predictor.

The default baseline for continuous predictors is `mean(X)`, so the default hazard rate at `X` for these predictors is `h(t)*exp((X - mean(X))*b)`. The default baseline for categorical predictors is `0`. Enter `0` to compute the baseline for all predictors relative to 0, so the hazard rate at `X` is `h(t)*exp(X*b)`. Changing the baseline changes the hazard ratio, but does not affect the coefficient estimates.

For the identified categorical predictors, `fitcox` creates dummy variables. `fitcox` creates one less dummy variable than the number of categories. For details, see Automatic Creation of Dummy Variables.

Example: `'Baseline',0`

Data Types: `double`
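
The following sketch illustrates the statement that the baseline does not affect the coefficient estimates. The simulated data, the coefficient vector `bTrue`, and the variable names are hypothetical and chosen only for this illustration.

```matlab
rng default                               % for reproducibility
n = 200;
X = randn(n,2);                           % two continuous predictors
bTrue = [0.5; -1];                        % hypothetical true coefficients
T = -log(rand(n,1)) ./ exp(X*bTrue);      % exponential lifetimes with hazard exp(X*bTrue)

mdlDefault = fitcox(X,T);                 % default baseline, mean(X) for continuous predictors
mdlZero    = fitcox(X,T,'Baseline',0);    % baseline computed relative to 0

% Largest difference between the two sets of coefficient estimates
betaDiff = max(abs(mdlDefault.Coefficients.Beta - mdlZero.Coefficients.Beta))
```

The value of `betaDiff` should be zero up to the optimization tolerance.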

`B0`: Coefficient initial values, specified as a numeric vector of coefficient values. These values initiate the likelihood maximization iterations performed by `fitcox`.

Data Types: `double`

`CategoricalPredictors`: Categorical predictors list, specified as one of the values in this table.

| Value | Description |
| --- | --- |
| Vector of positive integers | Each entry in the vector is an index value corresponding to the column of the predictor data (`X`) that contains a categorical variable. |
| Logical vector | A `true` entry means that the corresponding column of predictor data (`X`) is a categorical variable. |
| Character matrix | Each row of the matrix is the name of a predictor variable in the table `X`. The names must match the entries in `PredictorNames`. Pad the names with extra blanks so each row of the character matrix has the same length. |
| String array or cell array of character vectors | Each element in the array is the name of a predictor variable in the table `X`. The names must match the entries in `PredictorNames`. |
| `'all'` | All predictors are categorical. |

By default, if the predictor data is in a table, `fitcox` assumes that a variable is categorical if it is a logical vector, categorical vector, character array, string array, or cell array of character vectors. If the predictor data is a matrix, `fitcox` assumes that all predictors are continuous. To identify any other predictors as categorical predictors, specify them by using the `'CategoricalPredictors'` name-value argument.

For the identified categorical predictors, `fitcox` creates dummy variables. `fitcox` creates one less dummy variable than the number of categories. For details, see Automatic Creation of Dummy Variables.

Example: `'CategoricalPredictors','all'`

Data Types: `single` | `double` | `logical` | `char` | `string` | `cell`
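
As a minimal sketch of the index form, the following code marks the second column of a numeric matrix as categorical. The simulated data and effect sizes are hypothetical. Because the categorical column has three categories, the fit should contain one coefficient for the continuous predictor and two for the dummy variables.

```matlab
rng default                                 % for reproducibility
n = 300;
dose  = randn(n,1);                         % continuous predictor (hypothetical)
group = randi(3,n,1);                       % categorical codes 1, 2, 3 (hypothetical)
X = [dose group];

% Simulated lifetimes with hypothetical effects for dose and for groups 2 and 3
T = -log(rand(n,1)) ./ exp(0.5*dose + 0.8*(group == 2) - 0.6*(group == 3));

mdl = fitcox(X,T,'CategoricalPredictors',2);  % column 2 of X is categorical
numCoefficients = height(mdl.Coefficients)
```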

`Censoring`: Indicator for censoring, specified as a Boolean vector with the same number of rows as `X` or the name of a column in the table `X`. Use 1 for observations that are right censored and 0 for observations that are fully observed. By default, all observations are fully observed. For an example, see Cox Proportional Hazards Model for Censored Data.

Example: `'Censoring',cens`

Data Types: `logical`

`Frequency`: Frequency or weights of observations, specified as an array the same size as `T` containing nonnegative scalar values. The array can contain integer values corresponding to frequencies of observations or nonnegative values corresponding to observation weights.

The default is 1 per row of `X` and `T`.

If `X`, `T`, the value of `'Frequency'`, or the value of `'Stratification'` contains `NaN` values, then `fitcox` removes rows with `NaN` values from all data when fitting a Cox model.

Example: `'Frequency',w`

Data Types: `double`

`OptimizationOptions`: Algorithm control parameters for the iterative algorithm `fitcox` uses to estimate the solution, specified as a structure. Create this structure using `statset`. For parameter names and default values, see the following table or enter `statset('fitcox')`.

In the table, "termination tolerance" means that if the internal iterations cause a change in the stated value less than the tolerance, the iterations stop.

| Field in Structure | Description | Values |
| --- | --- | --- |
| `Display` | Amount of information returned to the command line | `'off'`: none (default); `'final'`: final output; `'iter'`: output at each iteration |
| `MaxFunEvals` | Maximum number of function evaluations | Positive integer; default is `200` |
| `MaxIter` | Maximum number of iterations | Positive integer; default is `100` |
| `TolFun` | Termination tolerance on change in likelihood; see Cox Proportional Hazards Model | Positive scalar; default is `1e-8` |
| `TolX` | Termination tolerance for parameter (predictor estimate) change | Positive scalar; default is `1e-8` |

Example: `'OptimizationOptions',statset('TolX',1e-6,'MaxIter',200)`
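
An alternative to passing name-value pairs to `statset` is to start from the `fitcox` defaults and change individual fields. The sketch below assumes simulated data with hypothetical coefficients; only the `statset('fitcox')` call and the field names come from the table above.

```matlab
rng default                               % for reproducibility
n = 200;
X = randn(n,2);
T = -log(rand(n,1)) ./ exp(X*[0.5; -1]);  % hypothetical exponential lifetimes

opts = statset('fitcox');                 % default options for fitcox
opts.TolX = 1e-6;                         % looser tolerance on coefficient change
opts.MaxIter = 200;                       % allow more iterations

mdl = fitcox(X,T,'OptimizationOptions',opts);
```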

`PredictorNames`: Predictor variable names, specified as a string array of unique names or cell array of unique character vectors. The functionality of `'PredictorNames'` depends on how you supply the training data.

• If you supply `X` as a numeric array, then you can use `'PredictorNames'` to assign names to the predictor variables in `X`.

    • The order of the names in `PredictorNames` must correspond to the column order of `X`. That is, `PredictorNames{1}` is the name of `X(:,1)`, `PredictorNames{2}` is the name of `X(:,2)`, and so on. Also, `size(X,2)` and `numel(PredictorNames)` must be equal.

    • By default, `PredictorNames` is `{'X1','X2',...}`.

• If you supply `X` as a table, then you can use `'PredictorNames'` to choose which predictor variables to use in training. That is, `fitcox` uses only the predictor variables in `PredictorNames` and the time variable during training.

    • `PredictorNames` must be a subset of `X.Properties.VariableNames` and cannot include the name of the time variable `T`.

    • By default, `PredictorNames` contains the names of all predictor variables.

    • Specify the predictors for training using either `'PredictorNames'` or a formula in Wilkinson notation, but not both.

Example: `'PredictorNames',{'Sex','Age','Weight','Smoker'}`

Data Types: `string` | `cell`

`Stratification`: Stratification variables, specified as a matrix of real values, the name of a column in table `X`, or an array of categorical variables. The matrix must have the same number of rows as `T`, with each row corresponding to an observation.

The default `[]` is no stratification variable.

If `X`, `T`, the value of `'Frequency'`, or the value of `'Stratification'` contains `NaN` values, then `fitcox` removes rows with `NaN` values from all data when fitting a Cox model.

Example: `'Stratification',Gender`

Data Types: `single` | `double` | `char` | `string` | `categorical`
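
A minimal sketch of a stratified fit follows. It assumes simulated data in which two hypothetical sites have different baseline rates but share one coefficient for the predictor `x`. The stratification variable is passed as a numeric column and is not itself given a coefficient.

```matlab
rng default                               % for reproducibility
n = 300;
x = randn(n,1);                           % single continuous predictor (hypothetical)
site = randi(2,n,1);                      % stratum label 1 or 2 (hypothetical)
baseRate = [1; 4];                        % different baseline rate per stratum

% Exponential lifetimes whose baseline rate depends on the stratum
T = -log(rand(n,1)) ./ (baseRate(site) .* exp(0.7*x));

mdl = fitcox(x,T,'Stratification',site);
numCoefficients = height(mdl.Coefficients)
```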

`TieBreakMethod`: Method to handle tied failure times, specified as `'breslow'` (Breslow's method) or `'efron'` (Efron's method). See Partial Likelihood Function for Tied Events.

Example: `'TieBreakMethod','efron'`

Data Types: `char` | `string`
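
To see the effect of the tie-handling method, the following sketch rounds simulated lifetimes up to whole numbers so that many failure times coincide, and then fits the model with each method. The data and the coefficient 0.7 are hypothetical.

```matlab
rng default                                       % for reproducibility
n = 200;
x = randn(n,1);
T = ceil(-5*log(rand(n,1)) ./ exp(0.7*x));        % rounding up creates tied failure times

mdlBreslow = fitcox(x,T,'TieBreakMethod','breslow');
mdlEfron   = fitcox(x,T,'TieBreakMethod','efron');

% Coefficient estimates under each method, side by side
betas = [mdlBreslow.Coefficients.Beta mdlEfron.Coefficients.Beta]
```

With heavily tied data the two estimates generally differ; without ties the two partial likelihoods coincide.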

Introduced in R2021a