## What Is a Linear Regression Model?

A linear regression model describes the relationship between a *dependent variable*, *y*, and one or more *independent variables*, *X*. The dependent variable is also called the *response variable*. Independent variables are also called *explanatory* or *predictor variables*. Continuous predictor variables are also called *covariates*, and categorical predictor variables are also called *factors*. The matrix *X* of observations on predictor variables is usually called the *design matrix*.

A multiple linear regression model is

$${y}_{i}={\beta}_{0}+{\beta}_{1}{X}_{i1}+{\beta}_{2}{X}_{i2}+\cdots +{\beta}_{p}{X}_{ip}+{\epsilon}_{i},\text{\hspace{1em}}i=1,\cdots ,n,$$

where

- *y*_{i} is the *i*th response.
- *β*_{k} is the *k*th coefficient, where *β*_{0} is the constant term in the model. Sometimes, design matrices might include information about the constant term. However, `fitlm` or `stepwiselm` by default includes a constant term in the model, so you must not enter a column of 1s into your design matrix *X*.
- *X*_{ij} is the *i*th observation on the *j*th predictor variable, *j* = 1, ..., *p*.
- *ε*_{i} is the *i*th noise term, that is, random error.

If a model includes only one predictor variable (*p* = 1), then the model is called a simple linear regression model.
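As a minimal sketch of the point about the constant term, the following fits a multiple linear regression with `fitlm` on simulated data. The data, the seed, and the coefficient values used to generate the response are assumptions made only for illustration; the design matrix deliberately contains no column of 1s.

```matlab
% Simulated data (assumed for illustration): n = 100 observations, p = 2 predictors.
rng(0);                                 % fix the seed so the run is repeatable
n = 100;
X = randn(n, 2);                        % design matrix WITHOUT a column of 1s
y = 1.8 - 2.35*X(:,1) + X(:,2) + 0.1*randn(n, 1);

% fitlm adds the constant term by default.
mdl = fitlm(X, y);

disp(mdl.CoefficientNames)              % the first name is '(Intercept)'
b = mdl.Coefficients.Estimate;          % estimates ordered as [b0; b1; b2]
```

The fitted model has three coefficients even though `X` has two columns, because the intercept is added by the function rather than by the caller.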

In general, a linear regression model can be a model of the form

$${y}_{i}={\beta}_{0}+{\displaystyle \sum _{k=1}^{K}{\beta}_{k}{f}_{k}\left({X}_{i1},{X}_{i2},\cdots ,{X}_{ip}\right)}+{\epsilon}_{i},\text{\hspace{1em}}i=1,\cdots ,n,$$

where *f*(.) is a scalar-valued function of the independent variables, *X*_{ij}. The functions, *f*(*X*), might be in any form, including nonlinear functions or polynomials. The linearity in linear regression models refers to the linearity of the coefficients *β*_{k}. That is, the response variable, *y*, is a linear function of the coefficients, *β*_{k}.

Some examples of linear models are:

$$\begin{array}{l}{y}_{i}={\beta}_{0}+{\beta}_{1}{X}_{1i}+{\beta}_{2}{X}_{2i}+{\beta}_{3}{X}_{3i}+{\epsilon}_{i}\\ {y}_{i}={\beta}_{0}+{\beta}_{1}{X}_{1i}+{\beta}_{2}{X}_{2i}+{\beta}_{3}{X}_{1i}^{3}+{\beta}_{4}{X}_{2i}^{2}+{\epsilon}_{i}\\ {y}_{i}={\beta}_{0}+{\beta}_{1}{X}_{1i}+{\beta}_{2}{X}_{2i}+{\beta}_{3}{X}_{1i}{X}_{2i}+{\beta}_{4}\mathrm{log}{X}_{3i}+{\epsilon}_{i}\end{array}$$
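To illustrate why a model with nonlinear terms is still linear in the coefficients, the sketch below builds the terms of the second example as columns of a matrix and solves for the coefficients with the backslash operator. The coefficient values in `beta` and the variable names are hypothetical, and the response is generated without noise so that the recovery can be checked directly. Because this sketch uses backslash rather than `fitlm`, the column of 1s is written out explicitly.

```matlab
rng(1);
n = 50;
x1 = rand(n, 1);
x2 = rand(n, 1);

% Hypothetical "true" coefficients, chosen only for this illustration.
beta = [1; 2; -1; 0.5; 3];

% Each column is one term f_k(X); some columns are nonlinear in x1 or x2,
% but the response is a linear combination of the columns.
F = [ones(n, 1), x1, x2, x1.^3, x2.^2];

% Noise-free response, so least squares should recover beta up to rounding.
y = F * beta;

b = F \ y;                              % least-squares solution
maxErr = max(abs(b - beta));            % expected to be close to zero
```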

The following, however, are not linear models since they are not linear in the unknown coefficients, *β*_{k}.

$$\begin{array}{l}\mathrm{log}{y}_{i}={\beta}_{0}+{\beta}_{1}{X}_{1i}+{\beta}_{2}{X}_{2i}+{\epsilon}_{i}\\ {y}_{i}={\beta}_{0}+{\beta}_{1}{X}_{1i}+\frac{1}{{\beta}_{2}{X}_{2i}}+{e}^{{\beta}_{3}{X}_{1i}{X}_{2i}}+{\epsilon}_{i}\end{array}$$

The usual assumptions for linear regression models are:

- The noise terms, *ε*_{i}, are uncorrelated.
- The noise terms, *ε*_{i}, have independent and identical normal distributions with mean zero and constant variance, *σ*^{2}. Thus,

  $$\begin{aligned}E\left({y}_{i}\right)&=E\left({\displaystyle \sum _{k=0}^{K}{\beta}_{k}{f}_{k}\left({X}_{i1},{X}_{i2},\cdots ,{X}_{ip}\right)}+{\epsilon}_{i}\right)\\ &={\displaystyle \sum _{k=0}^{K}{\beta}_{k}{f}_{k}\left({X}_{i1},{X}_{i2},\cdots ,{X}_{ip}\right)}+E\left({\epsilon}_{i}\right)\\ &={\displaystyle \sum _{k=0}^{K}{\beta}_{k}{f}_{k}\left({X}_{i1},{X}_{i2},\cdots ,{X}_{ip}\right)}\end{aligned}$$

  and

  $$V\left({y}_{i}\right)=V\left({\displaystyle \sum _{k=0}^{K}{\beta}_{k}{f}_{k}\left({X}_{i1},{X}_{i2},\cdots ,{X}_{ip}\right)}+{\epsilon}_{i}\right)=V\left({\epsilon}_{i}\right)={\sigma}^{2}$$

  So the variance of *y*_{i} is the same for all levels of *X*_{ij}.
- The responses *y*_{i} are uncorrelated.
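The constant-variance statement can be checked informally by simulation. The sketch below is an illustration under assumed values: the noise standard deviation `sigma`, the predictor levels in `xLevels`, and the mean function are all hypothetical. It draws many replicate responses at each level and compares the sample mean and variance with the values the assumptions imply.

```matlab
rng(3);
sigma = 0.5;                        % assumed noise standard deviation
xLevels = [0; 1; 5];                % three fixed predictor levels (hypothetical)
nRep = 1e5;                         % replicates per level

% Mean response 1.8 - 2.35*x at each level, plus N(0, sigma^2) noise.
mu = 1.8 - 2.35*xLevels;
Y = mu + sigma*randn(numel(xLevels), nRep);   % implicit expansion (R2016b or later)

sampleMean = mean(Y, 2);            % should be close to mu at every level
sampleVar  = var(Y, 0, 2);          % should be close to sigma^2 at every level
```

The sample variances agree across levels even though the means differ, which is what the equal-variance assumption describes.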

The fitted linear function is

$${\widehat{y}}_{i}={\displaystyle \sum _{k=0}^{K}{b}_{k}{f}_{k}\left({X}_{i1},{X}_{i2},\cdots ,{X}_{ip}\right)},\text{\hspace{1em}}i=1,\cdots ,n,$$

where $${\widehat{y}}_{i}$$ is the estimated response and the *b*_{k} are the fitted coefficients. The coefficients are estimated so as to minimize the mean squared difference between the prediction vector $$\widehat{y}$$ and the true response vector $$y$$, that is, the mean of the squared elements of $$\widehat{y}-y$$. This method is called the *method of least squares*. Under the assumptions on the noise terms, these coefficients also maximize the likelihood of the prediction vector.
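A minimal sketch of the least-squares criterion, using only base MATLAB: the coefficients are computed with the backslash operator, and two properties of the minimizer are checked numerically. The simulated data and the model terms are assumptions chosen for illustration, not part of the original text.

```matlab
rng(2);
n = 200;
X = randn(n, 2);
y = 1.1 + 1.5*X(:,1).^2 + X(:,2) + 0.2*randn(n, 1);  % simulated, assumed model

% Terms of the model: constant, X1^2, X2.
F = [ones(n, 1), X(:,1).^2, X(:,2)];

b = F \ y;                  % minimizes sum((F*b - y).^2)
yhat = F * b;               % fitted responses
r = y - yhat;               % residuals

% At the minimum, the residuals are orthogonal to every column of F.
orthCheck = max(abs(F' * r));

% Perturbing a coefficient away from b increases the sum of squares.
sse = sum(r.^2);
ssePerturbed = sum((y - F*(b + [0.01; 0; 0])).^2);
```

Minimizing the sum of squared differences and minimizing their mean give the same coefficients, because the two criteria differ only by the constant factor 1/*n*.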

In a linear regression model of the form *y* = *β*_{1}*X*_{1} + *β*_{2}*X*_{2} + ... + *β*_{p}*X*_{p}, the coefficient *β*_{k} expresses the impact of a one-unit change in the predictor variable *X*_{k} on the mean of the response, E(*y*), provided that all other variables are held constant. The sign of the coefficient gives the direction of the effect. For example, if the linear model is E(*y*) = 1.8 – 2.35*X*_{1} + *X*_{2}, then –2.35 indicates a 2.35 unit decrease in the mean response with a one-unit increase in *X*_{1}, given *X*_{2} is held constant. If the model is E(*y*) = 1.1 + 1.5*X*_{1}^{2} + *X*_{2}, the coefficient of *X*_{1}^{2} indicates a 1.5 unit increase in the mean of *y* with a one-unit increase in *X*_{1}^{2}, given all else held constant. However, in the case of E(*y*) = 1.1 + 2.1*X*_{1} + 1.5*X*_{1}^{2}, it is difficult to interpret the coefficients similarly, since it is not possible to hold *X*_{1} constant when *X*_{1}^{2} changes or vice versa.
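The interpretation above can be checked with a few lines of arithmetic. The sketch below evaluates two of the mean functions from the examples at hand-picked points; the evaluation points and the function handle names `m1` and `m2` are arbitrary choices made for illustration.

```matlab
% Mean functions taken from the examples in the text.
m1 = @(x1, x2) 1.8 - 2.35*x1 + x2;
m2 = @(x1) 1.1 + 2.1*x1 + 1.5*x1.^2;

% Linear in X1: a one-unit increase changes the mean by the same amount
% regardless of the starting point, as long as X2 is held constant.
d1a = m1(4, 2) - m1(3, 2);      % -2.35, up to floating-point rounding
d1b = m1(11, 7) - m1(10, 7);    % -2.35, up to floating-point rounding

% Quadratic in X1: the change depends on the starting value of X1, so no
% single coefficient describes the effect of a one-unit increase.
d2a = m2(1) - m2(0);            % 3.6, up to floating-point rounding
d2b = m2(2) - m2(1);            % 6.6, up to floating-point rounding
```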

## See Also

`LinearModel` | `fitlm` | `stepwiselm`