## Lasso Regularization of Generalized Linear Models

### What is Generalized Linear Model Lasso Regularization?

Lasso is a regularization technique. Use `lassoglm` to:

- Reduce the number of predictors in a generalized linear model.
- Identify important predictors.
- Select among redundant predictors.
- Produce shrinkage estimates with potentially lower predictive errors than ordinary (unpenalized) maximum likelihood estimates.

Elastic net is a related technique. Use it when you have several
highly correlated variables. `lassoglm` provides elastic net
regularization when you set the `Alpha` name-value pair to a number
strictly between `0` and `1`.

For details about lasso and elastic net computations and algorithms, see Generalized Linear Model Lasso and Elastic Net. For a discussion of generalized linear models, see What Are Generalized Linear Models?.

### Generalized Linear Model Lasso and Elastic Net

#### Overview of Lasso and Elastic Net

*Lasso* is a regularization technique for estimating generalized linear
models. Lasso includes a penalty term that constrains the size of the estimated
coefficients. In this respect, it resembles ridge regression. Lasso is a *shrinkage
estimator*: it generates coefficient estimates that are biased toward
zero. Nevertheless, a lasso estimator can have smaller error than an
ordinary maximum likelihood estimator when you apply it to new data.

Unlike ridge regression, as the penalty term increases, the lasso technique sets more coefficients to zero. This means that the lasso estimator is a smaller model, with fewer predictors. As such, lasso is an alternative to stepwise regression and other model selection and dimensionality reduction techniques.

*Elastic net* is a related technique. Elastic
net is akin to a hybrid of ridge regression and lasso regularization.
Like lasso, elastic net can generate reduced models by generating
zero-valued coefficients. Empirical studies suggest that the elastic
net technique can outperform lasso on data with highly correlated
predictors.

#### Definition of Lasso for Generalized Linear Models

For a nonnegative value of *λ*, `lassoglm` solves the problem

$$\underset{{\beta}_{0},\beta}{\mathrm{min}}\left(\frac{1}{N}\text{Deviance}\left({\beta}_{0},\beta \right)+\lambda {\displaystyle \sum _{j=1}^{p}\left|{\beta}_{j}\right|}\right).$$

- The function Deviance in this equation is the deviance of the model fit to the responses using the intercept $\beta_0$ and the predictor coefficients *β*. The formula for Deviance depends on the `distr` parameter you supply to `lassoglm`. Minimizing the *λ*-penalized deviance is equivalent to maximizing the *λ*-penalized loglikelihood.
- *N* is the number of observations.
- *λ* is a nonnegative regularization parameter corresponding to one value of `Lambda`.
- The parameters $\beta_0$ and *β* are a scalar and a vector of length *p*, respectively.
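To make the objective concrete, the following is a minimal Python sketch (not `lassoglm` itself) that evaluates the penalized deviance for one particular case: a binary response with a logit link, where the saturated loglikelihood is zero and the deviance is therefore −2 times the loglikelihood. The function names `bernoulli_deviance` and `lasso_objective` and the small data set are hypothetical and chosen only for illustration.

```python
import math

def bernoulli_deviance(beta0, beta, X, y):
    """Deviance of a logistic model with one binary trial per row (assumed case)."""
    dev = 0.0
    for row, yi in zip(X, y):
        eta = beta0 + sum(b * x for b, x in zip(beta, row))
        # Numerically stable log(1 + exp(eta)).
        log1pexp = max(eta, 0.0) + math.log1p(math.exp(-abs(eta)))
        # Loglikelihood of one observation is y*eta - log(1 + exp(eta)).
        dev += -2.0 * (yi * eta - log1pexp)
    return dev

def lasso_objective(beta0, beta, X, y, lam):
    """(1/N) * Deviance + lambda * sum_j |beta_j|; the intercept is not penalized."""
    n = len(y)
    penalty = lam * sum(abs(b) for b in beta)
    return bernoulli_deviance(beta0, beta, X, y) / n + penalty

# Hypothetical toy data: four observations, two predictors.
X = [[0.5, -1.0], [1.5, 0.3], [-0.7, 2.0], [0.2, 0.1]]
y = [0, 1, 1, 0]

print(lasso_objective(0.0, [0.0, 0.0], X, y, lam=0.1))
print(lasso_objective(-0.2, [0.8, 0.5], X, y, lam=0.1))
```

With all coefficients at zero the penalty vanishes and the objective reduces to the mean deviance of a model that predicts probability 0.5 for every observation. For a fixed nonzero *β*, raising `lam` increases the objective by `lam` times the sum of the absolute coefficients.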

As *λ* increases, the number of nonzero
components of *β* decreases.
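One way to see why larger penalties zero out more coefficients is the soft-thresholding operator, which gives the lasso solution in the simplified special case of a least-squares (normal) model with orthonormal predictors. The Python sketch below assumes that special case; it is not the algorithm `lassoglm` uses for general models. The threshold `t` stands in for a quantity that grows with *λ*, and the names `soft_threshold`, `count_nonzero`, and the example estimates are hypothetical.

```python
def soft_threshold(z, t):
    """Shrink z toward zero by t, and set it exactly to zero when |z| <= t."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0

def count_nonzero(estimates, t):
    """Number of coefficients that survive soft-thresholding at level t."""
    return sum(1 for b in estimates if soft_threshold(b, t) != 0.0)

# Hypothetical unpenalized coefficient estimates.
unpenalized = [2.5, -1.2, 0.4, -0.1, 0.05]

# Larger thresholds (larger penalties) leave fewer nonzero coefficients.
for t in [0.0, 0.1, 0.5, 1.5, 3.0]:
    shrunk = [soft_threshold(b, t) for b in unpenalized]
    print(t, count_nonzero(unpenalized, t), shrunk)
```

The surviving coefficients are also pulled toward zero by `t`, which is the shrinkage bias described in the overview.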

The lasso penalty involves only the $L^1$ norm of *β*, in contrast to
the elastic net penalty, which also includes the squared $L^2$ norm
of *β*.

#### Definition of Elastic Net for Generalized Linear Models

For *α* strictly between 0 and 1, and nonnegative *λ*,
elastic net solves the problem

$$\underset{{\beta}_{0},\beta}{\mathrm{min}}\left(\frac{1}{N}\text{Deviance}\left({\beta}_{0},\beta \right)+\lambda {P}_{\alpha}\left(\beta \right)\right),$$

where

$${P}_{\alpha}\left(\beta \right)=\frac{(1-\alpha )}{2}{\Vert \beta \Vert}_{2}^{2}+\alpha {\Vert \beta \Vert}_{1}={\displaystyle \sum _{j=1}^{p}\left(\frac{(1-\alpha )}{2}{\beta}_{j}^{2}+\alpha \left|{\beta}_{j}\right|\right)}.$$
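The penalty $P_\alpha(\beta)$ defined above can be evaluated directly. The following Python sketch is for illustration only: the function name `elastic_net_penalty` and the coefficient vector are hypothetical, and `alpha = 0` is included purely to show the ridge limit of the formula, not as a value to pass to `lassoglm`.

```python
def elastic_net_penalty(beta, alpha):
    """P_alpha(beta) = (1 - alpha)/2 * ||beta||_2^2 + alpha * ||beta||_1."""
    l1_norm = sum(abs(b) for b in beta)
    l2_norm_squared = sum(b * b for b in beta)
    return (1.0 - alpha) / 2.0 * l2_norm_squared + alpha * l1_norm

# Hypothetical coefficient vector.
beta = [1.0, -2.0, 0.0, 0.5]

for alpha in [1.0, 0.5, 0.0]:
    print(alpha, elastic_net_penalty(beta, alpha))
```

At `alpha = 1.0` the result equals the sum of the absolute coefficients (the lasso penalty), and at `alpha = 0.0` it equals half the sum of squared coefficients (a ridge-type penalty). Intermediate values weight the two terms linearly.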

Elastic net is the same as lasso when *α* = 1. For other values of *α*,
the penalty term $P_\alpha(\beta)$ interpolates between the $L^1$ norm
of *β* and the squared $L^2$ norm of *β*. As *α* shrinks toward 0,
elastic net approaches ridge regression.

### References

[1] Tibshirani, R. *Regression Shrinkage
and Selection via the Lasso.* Journal of the Royal Statistical
Society, Series B, Vol. 58, No. 1, pp. 267–288, 1996.

[2] Zou, H. and T. Hastie. *Regularization
and Variable Selection via the Elastic Net.* Journal of
the Royal Statistical Society, Series B, Vol. 67, No. 2, pp. 301–320,
2005.

[3] Friedman, J., R. Tibshirani, and T. Hastie.
*Regularization Paths for Generalized Linear Models via Coordinate
Descent.* Journal of Statistical Software, Vol. 33, No. 1, 2010.
`https://www.jstatsoft.org/v33/i01`

[4] Hastie, T., R. Tibshirani, and J. Friedman. *The
Elements of Statistical Learning,* 2nd edition. Springer,
New York, 2008.

[5] McCullagh, P., and J. A. Nelder. *Generalized
Linear Models,* 2nd edition. Chapman & Hall/CRC Press,
1989.