# RegressionLinear class

Linear regression model for high-dimensional data

## Description

`RegressionLinear`

is a trained linear model object for regression; the linear model is a support vector machine regression (SVM) or linear regression model. `fitrlinear`

fits a `RegressionLinear`

model by minimizing the objective function using techniques that reduce computation time for high-dimensional data sets (e.g., stochastic gradient descent). The regression loss plus the regularization term compose the objective function.

Unlike other regression models, and for economical memory usage, `RegressionLinear`

model objects do not store the training data. However, they do store, for example, the estimated linear model coefficients, estimated coefficients, and the regularization strength.

You can use trained `RegressionLinear`

models to predict responses for new data. For details, see `predict`

.

## Construction

Create a `RegressionLinear`

object by using `fitrlinear`

.

## Properties

**Linear Regression Properties**

`Epsilon`

— Half of width of epsilon-insensitive band

nonnegative scalar

Half of the width of the epsilon insensitive band, specified as a nonnegative scalar.

If `Learner`

is not `'svm'`

, then `Epsilon`

is an empty array (`[]`

).

**Data Types: **`single`

| `double`

`Lambda`

— Regularization term strength

nonnegative scalar | vector of nonnegative values

Regularization term strength, specified as a nonnegative scalar or vector of nonnegative values.

**Data Types: **`double`

| `single`

`Learner`

— Linear regression model type

`'leastsquares'`

| `'svm'`

Linear regression model type, specified as `'leastsquares'`

or `'svm'`

.

In this table, $$f\left(x\right)=x\beta +b.$$

*β*is a vector of*p*coefficients.*x*is an observation from*p*predictor variables.*b*is the scalar bias.

Value | Algorithm | Loss Function | `FittedLoss` Value |
---|---|---|---|

`'svm'` | Support vector machine regression | Epsilon insensitive: $$\ell \left[y,f\left(x\right)\right]=\mathrm{max}\left[0,\left|y-f\left(x\right)\right|-\epsilon \right]$$ | `'epsiloninsensitive'` |

`'leastsquares'` | Linear regression through ordinary least squares | Mean squared error (MSE): $$\ell \left[y,f\left(x\right)\right]=\frac{1}{2}{\left[y-f\left(x\right)\right]}^{2}$$ | `'mse'` |

`Beta`

— Linear coefficient estimates

numeric vector

Linear coefficient estimates, specified as a numeric vector with length equal to the number of predictors.

**Data Types: **`double`

`Bias`

— Estimated bias term

numeric scalar

Estimated bias term or model intercept, specified as a numeric scalar.

**Data Types: **`double`

`FittedLoss`

— Loss function used to fit the linear model

`'epsiloninsensitive'`

| `'mse'`

Loss function used to fit the model, specified as `'epsiloninsensitive'`

or `'mse'`

.

Value | Algorithm | Loss Function | `Learner` Value |
---|---|---|---|

`'epsiloninsensitive'` | Support vector machine regression | Epsilon insensitive: $$\ell \left[y,f\left(x\right)\right]=\mathrm{max}\left[0,\left|y-f\left(x\right)\right|-\epsilon \right]$$ | `'svm'` |

`'mse'` | Linear regression through ordinary least squares | Mean squared error (MSE): $$\ell \left[y,f\left(x\right)\right]=\frac{1}{2}{\left[y-f\left(x\right)\right]}^{2}$$ | `'leastsquares'` |

`Regularization`

— Complexity penalty type

`'lasso (L1)'`

| `'ridge (L2)'`

Complexity penalty type, specified as `'lasso (L1)'`

or ```
'ridge
(L2)'
```

.

The software composes the objective function for minimization from the sum of the average loss
function (see `FittedLoss`

) and a regularization value from this
table.

Value | Description |
---|---|

`'lasso (L1)'` | Lasso (L_{1}) penalty: $$\lambda {\displaystyle \sum _{j=1}^{p}\left|{\beta}_{j}\right|}$$ |

`'ridge (L2)'` | Ridge (L_{2}) penalty: $$\frac{\lambda}{2}{\displaystyle \sum _{j=1}^{p}{\beta}_{j}^{2}}$$ |

*λ* specifies the regularization term
strength (see `Lambda`

).

The software excludes the bias term (*β*_{0})
from the regularization penalty.

**Other Regression Properties**

`CategoricalPredictors`

— Categorical predictor indices

vector of positive integers | `[]`

Categorical predictor
indices, specified as a vector of positive integers. `CategoricalPredictors`

contains index values indicating that the corresponding predictors are categorical. The index
values are between 1 and `p`

, where `p`

is the number of
predictors used to train the model. If none of the predictors are categorical, then this
property is empty (`[]`

).

**Data Types: **`single`

| `double`

`ModelParameters`

— Parameters used for training model

structure

Parameters used for training the `RegressionLinear`

model, specified as a structure.

Access fields of `ModelParameters`

using dot notation. For example, access
the relative tolerance on the linear coefficients and the bias term by using
`Mdl.ModelParameters.BetaTolerance`

.

**Data Types: **`struct`

`PredictorNames`

— Predictor names

cell array of character vectors

Predictor names in order of their appearance in the predictor data, specified as a
cell array of character vectors. The length of `PredictorNames`

is
equal to the number of variables in the training data `X`

or
`Tbl`

used as predictor variables.

**Data Types: **`cell`

`ExpandedPredictorNames`

— Expanded predictor names

cell array of character vectors

Expanded predictor names, specified as a cell array of character vectors.

If the model uses encoding for categorical variables, then
`ExpandedPredictorNames`

includes the names that describe the
expanded variables. Otherwise, `ExpandedPredictorNames`

is the same as
`PredictorNames`

.

**Data Types: **`cell`

`ResponseName`

— Response variable name

character vector

Response variable name, specified as a character vector.

**Data Types: **`char`

`ResponseTransform`

— Response transformation function

`'none'`

| function handle

Response transformation function, specified as `'none'`

or a function handle.
`ResponseTransform`

describes how the software transforms raw
response values.

For a MATLAB^{®} function or a function that you define, enter its function handle. For
example, you can enter ```
Mdl.ResponseTransform =
@
```

, where
*function*

accepts a numeric vector of the
original responses and returns a numeric vector of the same size containing the
transformed responses.*function*

**Data Types: **`char`

| `function_handle`

## Object Functions

`incrementalLearner` | Convert linear regression model to incremental learner |

`lime` | Local interpretable model-agnostic explanations (LIME) |

`loss` | Regression loss for linear regression models |

`partialDependence` | Compute partial dependence |

`plotPartialDependence` | Create partial dependence plot (PDP) and individual conditional expectation (ICE) plots |

`predict` | Predict response of linear regression model |

`selectModels` | Select fitted regularized linear regression models |

`shapley` | Shapley values |

`update` | Update model parameters for code generation |

## Copy Semantics

Value. To learn how value classes affect copy operations, see Copying Objects.

## Examples

### Train Linear Regression Model

Train a linear regression model using SVM, dual SGD, and ridge regularization.

Simulate 10000 observations from this model

$$y={x}_{100}+2{x}_{200}+e.$$

$$X={x}_{1},...,{x}_{1000}$$ is a 10000-by-1000 sparse matrix with 10% nonzero standard normal elements.

*e*is random normal error with mean 0 and standard deviation 0.3.

```
rng(1) % For reproducibility
n = 1e4;
d = 1e3;
nz = 0.1;
X = sprandn(n,d,nz);
Y = X(:,100) + 2*X(:,200) + 0.3*randn(n,1);
```

Train a linear regression model. By default, `fitrlinear`

uses support vector machines with a ridge penalty, and optimizes using dual SGD for SVM. Determine how well the optimization algorithm fit the model to the data by extracting a fit summary.

[Mdl,FitInfo] = fitrlinear(X,Y)

Mdl = RegressionLinear ResponseName: 'Y' ResponseTransform: 'none' Beta: [1000x1 double] Bias: -0.0056 Lambda: 1.0000e-04 Learner: 'svm' Properties, Methods

`FitInfo = `*struct with fields:*
Lambda: 1.0000e-04
Objective: 0.2725
PassLimit: 10
NumPasses: 10
BatchLimit: []
NumIterations: 100000
GradientNorm: NaN
GradientTolerance: 0
RelativeChangeInBeta: 0.4907
BetaTolerance: 1.0000e-04
DeltaGradient: 1.5816
DeltaGradientTolerance: 0.1000
TerminationCode: 0
TerminationStatus: {'Iteration limit exceeded.'}
Alpha: [10000x1 double]
History: []
FitTime: 0.0810
Solver: {'dual'}

`Mdl`

is a `RegressionLinear`

model. You can pass `Mdl`

and the training or new data to `loss`

to inspect the in-sample mean-squared error. Or, you can pass `Mdl`

and new predictor data to `predict`

to predict responses for new observations.

`FitInfo`

is a structure array containing, among other things, the termination status (`TerminationStatus`

) and how long the solver took to fit the model to the data (`FitTime`

). It is good practice to use `FitInfo`

to determine whether optimization-termination measurements are satisfactory. In this case, `fitrlinear`

reached the maximum number of iterations. Because training time is fast, you can retrain the model, but increase the number of passes through the data. Or, try another solver, such as LBFGS.

### Predict Responses Using Linear Regression Model

Simulate 10000 observations from this model

$$y={x}_{100}+2{x}_{200}+e.$$

$$X=\{{x}_{1},...,{x}_{1000}\}$$ is a 10000-by-1000 sparse matrix with 10% nonzero standard normal elements.

*e*is random normal error with mean 0 and standard deviation 0.3.

```
rng(1) % For reproducibility
n = 1e4;
d = 1e3;
nz = 0.1;
X = sprandn(n,d,nz);
Y = X(:,100) + 2*X(:,200) + 0.3*randn(n,1);
```

Hold out 5% of the data.

rng(1); % For reproducibility cvp = cvpartition(n,'Holdout',0.05)

cvp = Hold-out cross validation partition NumObservations: 10000 NumTestSets: 1 TrainSize: 9500 TestSize: 500

`cvp`

is a `CVPartition`

object that defines the random partition of *n* data into training and test sets.

Train a linear regression model using the training set. For faster training time, orient the predictor data matrix so that the observations are in columns.

idxTrain = training(cvp); % Extract training set indices X = X'; Mdl = fitrlinear(X(:,idxTrain),Y(idxTrain),'ObservationsIn','columns');

Predict observations and the mean squared error (MSE) for the hold out sample.

idxTest = test(cvp); % Extract test set indices yHat = predict(Mdl,X(:,idxTest),'ObservationsIn','columns'); L = loss(Mdl,X(:,idxTest),Y(idxTest),'ObservationsIn','columns')

L = 0.1851

The hold-out sample MSE is 0.1852.

## Extended Capabilities

### C/C++ Code Generation

Generate C and C++ code using MATLAB® Coder™.

Usage notes and limitations:

When you train a linear regression model by using

`fitrlinear`

, the following restrictions apply.If the predictor data input argument value is a matrix, it must be a full, numeric matrix. Code generation does not support sparse data.

You can specify only one regularization strength, either

`'auto'`

or a nonnegative scalar for the`'Lambda'`

name-value pair argument.The value of the

`'ResponseTransform'`

name-value pair argument cannot be an anonymous function.Code generation with a coder configurer does not support categorical predictors (

`logical`

,`categorical`

,`char`

,`string`

, or`cell`

). You cannot use the`'CategoricalPredictors'`

name-value argument. To include categorical predictors in a model, preprocess them by using`dummyvar`

before fitting the model.

For more information, see Introduction to Code Generation.

## Version History

**Introduced in R2016a**

## See Also

## Open Example

You have a modified version of this example. Do you want to open this example with your edits?

## MATLAB Command

You clicked a link that corresponds to this MATLAB command:

Run the command by entering it in the MATLAB Command Window. Web browsers do not support MATLAB commands.

# Select a Web Site

Choose a web site to get translated content where available and see local events and offers. Based on your location, we recommend that you select: .

You can also select a web site from the following list:

## How to Get Best Site Performance

Select the China site (in Chinese or English) for best site performance. Other MathWorks country sites are not optimized for visits from your location.

### Americas

- América Latina (Español)
- Canada (English)
- United States (English)

### Europe

- Belgium (English)
- Denmark (English)
- Deutschland (Deutsch)
- España (Español)
- Finland (English)
- France (Français)
- Ireland (English)
- Italia (Italiano)
- Luxembourg (English)

- Netherlands (English)
- Norway (English)
- Österreich (Deutsch)
- Portugal (English)
- Sweden (English)
- Switzerland
- United Kingdom (English)