Stepwise regression


b = stepwisefit(X,y)
[b,se,pval,inmodel,stats,nextstep,history] = stepwisefit(...)
[...] = stepwisefit(X,y,param1,val1,param2,val2,...)


b = stepwisefit(X,y) uses a stepwise method to perform a multilinear regression of the response values in the n-by-1 vector y on the p predictive terms in the n-by-p matrix X. Distinct predictive terms should appear in different columns of X.

b is a p-by-1 vector of estimated coefficients for all of the terms in X. The stepwisefit function calculates the coefficient estimate values in b as follows:

  • If a term is not in the final model, then the corresponding coefficient estimate in b results from adding only that term to the predictors in the final model.

  • If a term is in the final model, then the coefficient estimate in b for that term comes from the final model alone; that is, stepwisefit does not consider the excluded terms when computing these values.


stepwisefit automatically includes a constant term in all models. Do not enter a column of 1s directly into X.

stepwisefit treats NaN values in either X or y as missing values, and ignores them.
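
As a minimal sketch of the missing-value behavior, the following marks one response in the hald data set (used in the examples on this page) as NaN and then inspects the wasnan field of the stats output. The variable heatMissing is introduced only for this illustration, and 'display','off' is used to suppress the step-by-step output.

```matlab
load hald                      % provides ingredients (13x4) and heat (13x1)

heatMissing = heat;            % illustrative copy of the response
heatMissing(3) = NaN;          % pretend observation 3 was not recorded

[~,~,~,~,stats] = stepwisefit(ingredients,heatMissing,'display','off');

% Rows flagged here were ignored when fitting.
droppedRows = find(stats.wasnan)
```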

[b,se,pval,inmodel,stats,nextstep,history] = stepwisefit(...) returns the following additional information:

  • se — A vector of standard errors for b

  • pval — A vector of p-values for testing whether elements of b are 0

  • inmodel — A logical vector, with length equal to the number of columns in X, specifying which terms are in the final model

  • stats — A structure of additional statistics with the following fields. All statistics pertain to the final model except where noted.

    • source — The character vector 'stepwisefit'

    • dfe — Degrees of freedom for error

    • df0 — Degrees of freedom for the regression

    • SStotal — Total sum of squares of the response

    • SSresid — Sum of squares of the residuals

    • fstat — F-statistic for testing the final model vs. no model (mean only)

    • pval — p-value of the F-statistic

    • rmse — Root mean square error

    • xr — Residuals for predictors not in the final model, after removing the part of them explained by predictors in the model

    • yr — Residuals for the response using predictors in the final model

    • B — Coefficients for terms in final model, with values for a term not in the model set to the value that would be obtained by adding that term to the model

    • SE — Standard errors for coefficient estimates

    • TSTAT — t statistics for coefficient estimates

    • PVAL — p-values for coefficient estimates

    • intercept — Estimated intercept

    • wasnan — Indicates which rows in the data contained NaN values

  • nextstep — The recommended next step—either the index of the next term to move in or out of the model, or 0 if no further steps are recommended

  • history — Structure containing information on steps taken, with the following fields:

    • B — Matrix of regression coefficients, where each column is one step, and each row is one coefficient.

    • rmse — Root mean square errors for the model at each step.

    • df0 — Degrees of freedom for the regression at each step.

    • in — Logical array indicating which predictors are in the model at each step, where each row is one step, and each column is one predictor.
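
The following sketch shows one way the outputs might be used together, assuming the hald data set from the examples on this page. It builds fitted values from only the terms flagged in inmodel plus stats.intercept, then recomputes the root mean square error from the residuals for comparison with stats.rmse. The names yhat, resid, and rmseCheck are illustrative, and the fitted-value formula assumes the default 'scale' setting of 'off'.

```matlab
load hald                      % provides ingredients (13x4) and heat (13x1)

[b,se,pval,inmodel,stats,nextstep,history] = ...
    stepwisefit(ingredients,heat,'display','off');

% Fitted values: intercept plus the contribution of the included terms only.
% Entries of b for excluded terms are not part of the final model.
yhat = stats.intercept + ingredients(:,inmodel)*b(inmodel);

% Recompute the RMSE from the residuals and the error degrees of freedom.
resid = heat - yhat;
rmseCheck = sqrt(sum(resid.^2)/stats.dfe);
fprintf('RMSE from residuals: %.4f, stats.rmse: %.4f\n',rmseCheck,stats.rmse);

% One row per step, one column per predictor.
disp(history.in)
```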

[...] = stepwisefit(X,y,param1,val1,param2,val2,...) specifies one or more of the name/value pairs described in the following list.

  • 'inmodel' — A logical vector specifying terms to include in the initial fit. The default is to specify no terms.

  • 'penter' — The maximum p-value for a term to be added. The default is 0.05.

  • 'premove' — The minimum p-value for a term to be removed. The default is the maximum of the value of 'penter' and 0.10.

  • 'display' — 'on' displays information about each step in the command window. This is the default. 'off' omits the display.

  • 'maxiter' — The maximum number of steps in the regression. The default is Inf.

  • 'keep' — A logical vector specifying terms to keep in their initial state. The default is to specify no terms.

  • 'scale' — 'on' centers and scales each column of X (computes z-scores) before fitting. 'off' does not scale the terms. This is the default.
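
As an illustration of combining several name/value pairs, the sketch below starts with term 2 in the model and uses 'keep' to hold it there, limits the number of steps with 'maxiter', and turns off the display. It assumes the hald data set used in the examples on this page; the variable forced and the step limit of 10 are arbitrary choices for the illustration.

```matlab
load hald                              % provides ingredients and heat

forced = [false true false false];     % illustrative: term 2 only

[b,~,~,inmodel] = stepwisefit(ingredients,heat,...
    'inmodel',forced,...               % start with term 2 in the model
    'keep',forced,...                  % term 2 stays in its initial state
    'penter',0.05,'premove',0.10,...   % entrance/exit tolerances
    'maxiter',10,...                   % stop after at most 10 steps
    'display','off');                  % no per-step output

includedTerms = find(inmodel)
```

Because term 2 is both in the initial model and listed in 'keep', it remains in the final model regardless of its p-value.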


Load the data in hald.mat, which contains observations of the heat of reaction of various cement mixtures:

load hald
whos
  Name          Size    Bytes   Class   Attributes

  Description   22x58   2552    char
  hald          13x5     520    double
  heat          13x1     104    double
  ingredients   13x4     416    double

The response (heat) depends on the quantities of the four predictors (the columns of ingredients).

Use stepwisefit to carry out the stepwise regression algorithm, beginning with no terms in the model and using entrance/exit tolerances of 0.05/0.10 on the p-values:

stepwisefit(ingredients,heat,...
            'penter',0.05,'premove',0.10);
Initial columns included:  none
Step 1, added column 4, p=0.000576232
Step 2, added column 1, p=1.10528e-006
Final columns included:  1 4 
    'Coeff'      'Std.Err.'    'Status'    'P'          
    [ 1.4400]    [  0.1384]    'In'        [1.1053e-006]
    [ 0.4161]    [  0.1856]    'Out'       [     0.0517]
    [-0.4100]    [  0.1992]    'Out'       [     0.0697]
    [-0.6140]    [  0.0486]    'In'        [1.8149e-007]

stepwisefit automatically includes an intercept term in the model, so you do not add it explicitly to ingredients as you would for regress. For terms not in the model, coefficient estimates and their standard errors are those that result by adding the corresponding term to the final model.

Use the 'inmodel' parameter to specify terms in an initial model:

initialModel = ...
           [false true false false]; % Force in 2nd term
stepwisefit(ingredients,heat,...
            'inmodel',initialModel,...
            'penter',0.05,'premove',0.10);
Initial columns included:  2 
Step 1, added column 1, p=2.69221e-007
Final columns included:  1 2 
    'Coeff'      'Std.Err.'    'Status'    'P'          
    [ 1.4683]    [  0.1213]    'In'        [2.6922e-007]
    [ 0.6623]    [  0.0459]    'In'        [5.0290e-008]
    [ 0.2500]    [  0.1847]    'Out'       [     0.2089]
    [-0.2365]    [  0.1733]    'Out'       [     0.2054]

The preceding two models, built from different initial models, use different subsets of the predictive terms. Terms 2 and 4, swapped in the two models, are highly correlated:

term2 = ingredients(:,2);
term4 = ingredients(:,4);
R = corrcoef(term2,term4)
R =
    1.0000   -0.9730
   -0.9730    1.0000

To compare the models, use the stats output of stepwisefit:

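[betahat1,se1,pval1,inmodel1,stats1] = ...
          stepwisefit(ingredients,heat,...
          'penter',0.05,'premove',0.10,...
          'display','off');
[betahat2,se2,pval2,inmodel2,stats2] = ...
          stepwisefit(ingredients,heat,...
          'inmodel',initialModel,...
          'penter',0.05,'premove',0.10,...
          'display','off');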
RMSE1 = stats1.rmse
RMSE2 = stats2.rmse

The second model has a lower Root Mean Square Error (RMSE).


Stepwise regression is a systematic method for adding and removing terms from a multilinear model based on their statistical significance in a regression. The method begins with an initial model and then compares the explanatory power of incrementally larger and smaller models. At each step, the p-value of an F-statistic is computed to test models with and without a potential term. If a term is not currently in the model, the null hypothesis is that the term would have a zero coefficient if added to the model. If there is sufficient evidence to reject the null hypothesis, the term is added to the model. Conversely, if a term is currently in the model, the null hypothesis is that the term has a zero coefficient. If there is insufficient evidence to reject the null hypothesis, the term is removed from the model. The method proceeds as follows:

  1. Fit the initial model.

  2. If any terms not in the model have p-values less than an entrance tolerance (that is, if it is unlikely that they would have zero coefficient if added to the model), add the one with the smallest p-value and repeat this step; otherwise, go to step 3.

  3. If any terms in the model have p-values greater than an exit tolerance (that is, if it is unlikely that the hypothesis of a zero coefficient can be rejected), remove the one with the largest p-value and go to step 2; otherwise, end.

Depending on the terms included in the initial model and the order in which terms are moved in and out, the method may build different models from the same set of potential terms. The method terminates when no single step improves the model. There is no guarantee, however, that a different initial model or a different sequence of steps will not lead to a better fit. In this sense, stepwise models are locally optimal, but may not be globally optimal.
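
The three steps above can be sketched in a few lines of MATLAB. This is an illustration of the add/remove loop only, not the stepwisefit source: the function name stepwiseSketch and its helpers partialFPValue and rss are hypothetical, the sketch always starts from the constant-only model, and it does not handle NaN values, the 'keep' option, or scaling. It assumes penter is no larger than premove and uses fcdf from Statistics and Machine Learning Toolbox for the F-distribution.

```matlab
function in = stepwiseSketch(X,y,penter,premove)
% Hypothetical helper: returns a logical row vector of selected columns of X.
p = size(X,2);
in = false(1,p);                      % step 1: constant-only initial model
maxSteps = 100;                       % illustrative guard against cycling
for step = 1:maxSteps
    % Step 2: p-value for adding each excluded term to the current model.
    pAdd = ones(1,p);
    for j = find(~in)
        trial = in;
        trial(j) = true;
        pAdd(j) = partialFPValue(X,y,in,trial);
    end
    [pMin,jAdd] = min(pAdd);
    if pMin < penter
        in(jAdd) = true;              % add the most significant term
        continue                      % and repeat step 2
    end
    % Step 3: p-value for each included term, given the others.
    pRem = zeros(1,p);
    for j = find(in)
        reduced = in;
        reduced(j) = false;
        pRem(j) = partialFPValue(X,y,reduced,in);
    end
    [pMax,jRem] = max(pRem);
    if pMax > premove
        in(jRem) = false;             % remove the least significant term
        continue                      % and go back to step 2
    end
    break                             % no single step changes the model
end
end

function pval = partialFPValue(X,y,small,big)
% F-test of nested models that differ by one term (big = small + one term).
n = size(X,1);
rssSmall = rss([ones(n,1) X(:,small)],y);
rssBig = rss([ones(n,1) X(:,big)],y);
dfe = n - 1 - nnz(big);               % error degrees of freedom, larger model
F = (rssSmall - rssBig)/(rssBig/dfe);
pval = 1 - fcdf(F,1,dfe);
end

function s = rss(A,y)
% Residual sum of squares of the least-squares fit of y on the columns of A.
r = y - A*(A\y);
s = r'*r;
end
```

For the hald data with tolerances 0.05 and 0.10, calling stepwiseSketch(ingredients,heat,0.05,0.10) should select columns 1 and 4, in line with the first example on this page. Starting the loop from a different initial model can lead to a different final subset, which is the local-optimality behavior described above.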


[1] Draper, N. R., and H. Smith. Applied Regression Analysis. Hoboken, NJ: Wiley-Interscience, 1998. pp. 307–312.

Introduced before R2006a