Linear Mixed-Effects Models

Linear mixed-effects models are extensions of linear regression models for data that are collected and summarized in groups. These models describe the relationship between a response variable and independent variables, with coefficients that can vary with respect to one or more grouping variables. A mixed-effects model consists of two parts: fixed effects and random effects. Fixed-effects terms are usually the conventional linear regression part, and the random effects are associated with individual experimental units drawn at random from a population. The random effects have prior distributions, whereas the fixed effects do not. Mixed-effects models can represent the covariance structure related to the grouping of data by associating common random effects with observations that have the same level of a grouping variable. The standard form of a linear mixed-effects model is

$y = \underbrace{X\beta}_{\text{fixed}} + \underbrace{Zb}_{\text{random}} + \underbrace{\varepsilon}_{\text{error}},$

where

y is the n-by-1 response vector, and n is the number of observations.
X is an n-by-p fixed-effects design matrix.
β is a p-by-1 fixed-effects vector.
Z is an n-by-q random-effects design matrix.
b is a q-by-1 random-effects vector.
ε is the n-by-1 observation error vector.
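To make these dimensions concrete, here is a minimal NumPy sketch that simulates one draw from the standard form. The sizes n, p, q and all parameter values are illustrative assumptions, and Z is filled with arbitrary numbers only to check shapes; in practice its columns encode the grouping structure, as the examples later in this section show.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes (assumed): n observations, p fixed effects, q random effects.
n, p, q = 12, 2, 3

X = np.column_stack([np.ones(n), rng.normal(size=n)])  # n-by-p: intercept and one predictor
Z = rng.normal(size=(n, q))          # n-by-q; arbitrary values, used only to check shapes
beta = np.array([1.0, 0.5])          # p-by-1 fixed-effects vector (assumed values)
b = rng.normal(size=q)               # q-by-1 random-effects vector
eps = rng.normal(scale=0.1, size=n)  # n-by-1 observation errors

# Standard form: fixed part + random part + error.
y = X @ beta + Z @ b + eps

print(X.shape, Z.shape, y.shape)  # (12, 2) (12, 3) (12,)
```

NumPy stores the vectors as 1-D arrays of length n, p, or q rather than as explicit column matrices; the matrix products behave the same way.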

The assumptions for the linear mixed-effects model are:

The random-effects vector b and the error vector ε have the following prior distributions:

$\begin{aligned} b &\sim N\left(0, \sigma^{2} D(\theta)\right), \\ \varepsilon &\sim N\left(0, \sigma^{2} I\right), \end{aligned}$
where D is a q-by-q symmetric, positive semidefinite matrix parameterized by a variance component vector θ, I is the n-by-n identity matrix, and σ² is the error variance.
The random-effects vector b and the error vector ε are independent of each other.
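A useful consequence of these two assumptions (a standard result, not stated in the text above) is that integrating out b gives the marginal distribution y ~ N(Xβ, σ²(Z D(θ) Zᵀ + I)), because Var(Zb + ε) = Z Var(b) Zᵀ + Var(ε) when b and ε are independent. The sketch below computes this covariance for a small random-intercept design; the group sizes, σ², and the choice D = d·I are assumptions made for illustration.

```python
import numpy as np

# Hypothetical example: 2 groups with 3 observations each, random intercept only.
groups = np.array([0, 0, 0, 1, 1, 1])
n, q = groups.size, 2

# Z is the n-by-q indicator matrix: Z[i, m] = 1 if observation i is in group m.
Z = (groups[:, None] == np.arange(q)).astype(float)

sigma2 = 0.5        # error variance (assumed value)
d = 2.0             # scaled random-intercept variance (assumed value)
D = d * np.eye(q)   # D(theta): symmetric positive semidefinite

# Because b and eps are independent,
# Var(y) = Var(Zb) + Var(eps) = sigma^2 * (Z D Z^T + I).
V = sigma2 * (Z @ D @ Z.T + np.eye(n))

print(V[0, 0])  # sigma2 * (d + 1) = 1.5, variance of one observation
print(V[0, 1])  # sigma2 * d = 1.0, covariance within the same group
print(V[0, 3])  # 0.0, covariance across different groups
```

Observations that share a level of the grouping variable are correlated, while observations in different groups are not; this is the covariance structure that the shared random effects induce.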

Mixed-effects models are also called multilevel models or hierarchical models, depending on the context. "Mixed-effects model" is the more general term: a mixed-effects model can include factors that are not multilevel or hierarchical, such as crossed factors, which is why this terminology is preferred here. Mixed-effects models are sometimes expressed as multilevel regression models (a first-level model and grouping-level models) that are fit simultaneously. For example, a varying-intercept (random-intercept) model with one continuous predictor variable x and one grouping variable with M levels can be expressed as

$\begin{aligned} y_{im} &= \beta_{0m} + \beta_{1} x_{im} + \varepsilon_{im}, \quad i = 1, 2, \ldots, n, \quad m = 1, 2, \ldots, M, \quad \varepsilon_{im} \sim N(0, \sigma^{2}), \\ \beta_{0m} &= \beta_{00} + b_{0m}, \quad b_{0m} \sim N(0, \sigma_{0}^{2}), \end{aligned}$

where y_im corresponds to data for observation i and group m, n is the total number of observations, and b_0m and ε_im are independent of each other. After substituting the group-level parameters into the first-level model, the model for the response vector becomes

$y_{im} = \underbrace{\beta_{00} + \beta_{1} x_{im}}_{\text{fixed effects}} + \underbrace{b_{0m}}_{\text{random effects}} + \varepsilon_{im}.$
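As a minimal sketch of how this random-intercept model fits the standard form, the code below builds X = [1, x] and an indicator matrix Z with one column per group, then checks that Xβ + Zb + ε reproduces the scalar expression above. The data, the number of groups, and all parameter values are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical data: M = 3 groups, 4 observations per group.
M = 3
group = np.repeat(np.arange(M), 4)     # level m of each observation
n = group.size
x = rng.normal(size=n)                 # continuous predictor x_im

beta00, beta1 = 2.0, 0.7               # fixed effects (assumed values)
sigma, sigma0 = 0.3, 1.0               # error and intercept SDs (assumed values)
b0 = rng.normal(scale=sigma0, size=M)  # b_0m ~ N(0, sigma0^2)
eps = rng.normal(scale=sigma, size=n)  # eps_im ~ N(0, sigma^2)

# Scalar form: y_im = beta00 + beta1 * x_im + b_0m + eps_im
y_scalar = beta00 + beta1 * x + b0[group] + eps

# Matrix form: X = [1, x] (n-by-2), Z = group indicators (n-by-M)
X = np.column_stack([np.ones(n), x])
Z = (group[:, None] == np.arange(M)).astype(float)
y_matrix = X @ np.array([beta00, beta1]) + Z @ b0 + eps

print(np.allclose(y_scalar, y_matrix))  # True
```

In the notation of the standard form, σ²D(θ) = σ₀²I here, so D is the M-by-M identity scaled by σ₀²/σ².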

A random-intercept and random-slope model with one continuous predictor variable x, where both the intercept and slope vary independently by a grouping variable with M levels, is

$\begin{aligned} y_{im} &= \beta_{0m} + \beta_{1m} x_{im} + \varepsilon_{im}, \quad i = 1, 2, \ldots, n, \quad m = 1, 2, \ldots, M, \quad \varepsilon_{im} \sim N(0, \sigma^{2}), \\ \beta_{0m} &= \beta_{00} + b_{0m}, \quad b_{0m} \sim N(0, \sigma_{0}^{2}), \\ \beta_{1m} &= \beta_{10} + b_{1m}, \quad b_{1m} \sim N(0, \sigma_{1}^{2}), \end{aligned}$

$b_{m} = \begin{pmatrix} b_{0m} \\ b_{1m} \end{pmatrix} \sim N\left(0, \begin{pmatrix} \sigma_{0}^{2} & 0 \\ 0 & \sigma_{1}^{2} \end{pmatrix}\right).$

You might also have correlated random effects. In general, for a model with a random intercept and slope, the distribution of the random effects is

$b_{m} = \begin{pmatrix} b_{0m} \\ b_{1m} \end{pmatrix} \sim N\left(0, \sigma^{2} D(\theta)\right),$

where D is a 2-by-2 symmetric and positive semidefinite matrix, parameterized by a variance component vector θ.
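The text leaves the parameterization of D(θ) open. One common choice, assumed here and not necessarily the one any particular software uses, writes D = LLᵀ with L lower triangular and θ holding the entries of L; any real θ then yields a symmetric positive semidefinite D. Setting the off-diagonal entry of L to zero recovers the independent case with a diagonal covariance. The helper `d_from_theta` is hypothetical and exists only for this sketch.

```python
import numpy as np

def d_from_theta(theta):
    """Map theta = (l11, l21, l22) to a 2-by-2 D(theta) = L L^T.

    Hypothetical helper: this Cholesky-style parameterization is one
    common choice, assumed here for illustration.
    """
    l11, l21, l22 = theta
    L = np.array([[l11, 0.0],
                  [l21, l22]])
    return L @ L.T

# Correlated random intercept and slope (illustrative theta).
D_corr = d_from_theta([1.0, 0.5, 0.8])

# Independent case: a zero off-diagonal entry in L gives a diagonal D.
D_indep = d_from_theta([1.2, 0.0, 0.4])

for D in (D_corr, D_indep):
    assert np.allclose(D, D.T)                       # symmetric
    assert np.all(np.linalg.eigvalsh(D) >= -1e-12)   # positive semidefinite

print(D_corr)   # off-diagonal entries equal l11 * l21 = 0.5
print(D_indep)  # diagonal approximately (1.44, 0.16), off-diagonal entries 0
```

Because positive semidefiniteness holds by construction, an optimizer can search over unconstrained θ without having to enforce that constraint separately.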

After substituting the group-level parameters into the first-level model, the model for the response vector is

$y_{im} = \underbrace{\beta_{00} + \beta_{10} x_{im}}_{\text{fixed effects}} + \underbrace{b_{0m} + b_{1m} x_{im}}_{\text{random effects}} + \varepsilon_{im}, \quad i = 1, 2, \ldots, n, \quad m = 1, 2, \ldots, M.$

If you denote the predictor x_im in the random-effects term by z_im, this model is

$y_{im} = \underbrace{\beta_{00} + \beta_{10} x_{im}}_{\text{fixed effects}} + \underbrace{b_{0m} + b_{1m} z_{im}}_{\text{random effects}} + \varepsilon_{im}, \quad i = 1, 2, \ldots, n, \quad m = 1, 2, \ldots, M.$

In this case, the same terms appear in both the fixed-effects design matrix and the random-effects design matrix. Both z_im and x_im refer to observation i in level m of the grouping variable.
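To see that the same values appear in both design matrices, the minimal sketch below builds X and Z for the random intercept and slope model with z_im = x_im. It assumes one particular column layout: each group m contributes two columns to Z, an intercept indicator and x restricted to that group, so Z is n-by-2M and b stacks (b_0m, b_1m) for m = 1, …, M. All numeric values are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical data: M = 3 groups, 5 observations each.
M = 3
group = np.repeat(np.arange(M), 5)
n = group.size
x = rng.normal(size=n)

# Fixed-effects design: intercept and x, shared by all groups.
X = np.column_stack([np.ones(n), x])

# Random-effects design (assumed layout): for group m, columns 2m and 2m+1
# hold [1, z_im] for observations in group m and zeros elsewhere (z_im = x_im).
Z = np.zeros((n, 2 * M))
for m in range(M):
    rows = group == m
    Z[rows, 2 * m] = 1.0
    Z[rows, 2 * m + 1] = x[rows]

beta = np.array([1.0, 0.5])             # (beta00, beta10), assumed values
b = rng.normal(scale=0.3, size=(M, 2))  # row m is (b_0m, b_1m)
eps = rng.normal(scale=0.1, size=n)

# Matrix form with b stacked as (b_01, b_11, b_02, b_12, ...).
y_matrix = X @ beta + Z @ b.ravel() + eps

# Scalar form: beta00 + beta10*x + b_0m + b_1m*x + eps
y_scalar = beta[0] + beta[1] * x + b[group, 0] + b[group, 1] * x + eps

print(Z.shape, np.allclose(y_matrix, y_scalar))  # (15, 6) True
```

The x column of X and the slope columns of Z carry the same values; Z simply splits them by group so that each group gets its own random slope.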

It is also possible to explain more of the group-level variation by adding group-level predictor variables. A random-intercept and random-slope model with one continuous predictor variable x, where both the intercept and slope vary independently by a grouping variable with M levels, and with one group-level predictor variable v_m (written v_im below, where v_im = v_m for every observation i in group m), is

$\begin{aligned} y_{im} &= \beta_{0im} + \beta_{1im} x_{im} + \varepsilon_{im}, \quad i = 1, 2, \ldots, n, \quad m = 1, 2, \ldots, M, \quad \varepsilon_{im} \sim N(0, \sigma^{2}), \\ \beta_{0im} &= \beta_{00} + \beta_{01} v_{im} + b_{0m}, \quad b_{0m} \sim N(0, \sigma_{0}^{2}), \\ \beta_{1im} &= \beta_{10} + \beta_{11} v_{im} + b_{1m}, \quad b_{1m} \sim N(0, \sigma_{1}^{2}). \end{aligned}$

Substituting the group-level equations into the first-level model yields a main effect of the group-level predictor and an interaction term between the first-level and group-level predictor variables in the model for the response variable:

$\begin{aligned} y_{im} &= \beta_{00} + \beta_{01} v_{im} + b_{0m} + \left(\beta_{10} + \beta_{11} v_{im} + b_{1m}\right) x_{im} + \varepsilon_{im}, \quad i = 1, 2, \ldots, n, \quad m = 1, 2, \ldots, M \\ &= \underbrace{\beta_{00} + \beta_{10} x_{im} + \beta_{01} v_{im} + \beta_{11} v_{im} x_{im}}_{\text{fixed effects}} + \underbrace{b_{0m} + b_{1m} x_{im}}_{\text{random effects}} + \varepsilon_{im}. \end{aligned}$

The term β₁₁ v_im x_im is often called a cross-level interaction in textbooks on multilevel models. In matrix notation, the model for the response variable y can be expressed as

$y_{im} = \begin{bmatrix} 1 & x_{im} & v_{im} & v_{im} x_{im} \end{bmatrix} \begin{bmatrix} \beta_{00} \\ \beta_{10} \\ \beta_{01} \\ \beta_{11} \end{bmatrix} + \begin{bmatrix} 1 & x_{im} \end{bmatrix} \begin{bmatrix} b_{0m} \\ b_{1m} \end{bmatrix} + \varepsilon_{im}, \quad i = 1, 2, \ldots, n, \quad m = 1, 2, \ldots, M,$

which corresponds to the standard form given earlier,

$y = X β + Z b + ε .$
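A minimal sketch of this mapping: each row of X is [1, x_im, v_im, v_im x_im], and each group's block of Z holds [1, x_im]. The check confirms that the matrix form matches the expanded expression for y_im term by term. The group-level values v_m, the Z column layout, and all coefficients are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical data: M = 4 groups, 3 observations each.
M = 4
group = np.repeat(np.arange(M), 3)
n = group.size
x = rng.normal(size=n)          # first-level predictor x_im
v_group = rng.normal(size=M)    # group-level predictor v_m
v = v_group[group]              # v_im = v_m for every i in group m

# Fixed effects: columns [1, x, v, v*x] pair with (beta00, beta10, beta01, beta11).
X = np.column_stack([np.ones(n), x, v, v * x])
beta = np.array([1.0, 0.5, -0.3, 0.2])  # assumed values

# Random effects: one intercept column and one slope column per group.
Z = np.zeros((n, 2 * M))
for m in range(M):
    rows = group == m
    Z[rows, 2 * m] = 1.0
    Z[rows, 2 * m + 1] = x[rows]
b = rng.normal(scale=0.3, size=(M, 2))  # row m is (b_0m, b_1m)
eps = rng.normal(scale=0.1, size=n)

y_matrix = X @ beta + Z @ b.ravel() + eps

# Expanded form: beta00 + beta01*v + b_0m + (beta10 + beta11*v + b_1m)*x + eps
beta00, beta10, beta01, beta11 = beta
y_expanded = (beta00 + beta01 * v + b[group, 0]
              + (beta10 + beta11 * v + b[group, 1]) * x + eps)

print(np.allclose(y_matrix, y_expanded))  # True
```

The cross-level interaction enters only through the fourth column of X; the random-effects design is unchanged by adding the group-level predictor.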

In general, if there are R grouping variables and m(r,i) denotes the level of grouping variable r for observation i, then the model for the response variable for observation i is

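$y_{i} = x_{i}^{T} \beta + \sum_{r=1}^{R} z_{ir}^{T} b_{m(r,i)}^{(r)} + \varepsilon_{i}, \quad i = 1, 2, \ldots, n,$

where x_i is the p-by-1 vector of fixed-effects predictors for observation i, β is the p-by-1 fixed-effects vector, z_ir is the q(r)-by-1 vector of random-effects predictors for observation i and grouping variable r, b^(r)_m(r,i) is the q(r)-by-1 random-effects vector for the rth grouping variable and level m(r,i), and ε_i is the scalar (1-by-1) error term for observation i.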
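For R > 1 grouping variables, Z is the horizontal concatenation of one block per grouping variable, and b stacks the corresponding random-effects vectors. The minimal sketch below assumes two crossed grouping variables (for example, hypothetical "subject" and "item" factors), each with a random intercept only, so q(1) = q(2) = 1 and z_ir = 1. It checks the per-observation sum against the matrix form; all sizes and values are illustrative.

```python
import numpy as np

rng = np.random.default_rng(4)

n = 12
# Hypothetical crossed factors: levels m(1, i) and m(2, i) for each observation.
m1 = np.tile(np.arange(3), 4)    # grouping variable r = 1 with 3 levels
m2 = np.repeat(np.arange(4), 3)  # grouping variable r = 2 with 4 levels

x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])  # rows are x_i^T, p = 2
beta = np.array([0.5, 1.5])           # assumed values

b1 = rng.normal(size=3)  # b^(1)_m: one scalar intercept per level, q(1) = 1
b2 = rng.normal(size=4)  # b^(2)_m: one scalar intercept per level, q(2) = 1
eps = rng.normal(scale=0.1, size=n)

# Per-observation form: y_i = x_i^T beta + sum_r z_ir^T b^(r)_{m(r,i)} + eps_i,
# with z_ir = 1 because each grouping variable has only a random intercept.
y_sum = X @ beta + b1[m1] + b2[m2] + eps

# Matrix form: Z = [Z^(1), Z^(2)] and b = [b^(1); b^(2)].
Z1 = (m1[:, None] == np.arange(3)).astype(float)
Z2 = (m2[:, None] == np.arange(4)).astype(float)
Z = np.hstack([Z1, Z2])
b = np.concatenate([b1, b2])
y_matrix = X @ beta + Z @ b + eps

print(Z.shape, np.allclose(y_sum, y_matrix))  # (12, 7) True
```

Every combination of the two factors occurs once here, so the grouping is crossed rather than nested, which is the kind of structure the term "mixed-effects" covers but "hierarchical" does not.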

