# Multivariate General Linear Model

This example shows how to set up a multivariate general linear model for estimation using `mvregress`.

### Load sample data.

The `imports-85` data set contains measurements on a sample of 205 automobile imports from 1985.

Here, model the bivariate response of city and highway MPG (columns 14 and 15).

For predictors, use wheel base (column 3), curb weight (column 7), and fuel type (column 18). The first two predictors are continuous, and for this example are centered and scaled. Fuel type is a categorical variable with two categories (`11` and `20`), so a dummy indicator variable is needed for the regression.

```
load('imports-85')
Y = X(:,14:15);
[n,d] = size(Y);

X1 = zscore(X(:,3));
X2 = zscore(X(:,7));
X3 = X(:,18)==20;

Xmat = [ones(n,1) X1 X2 X3];
```

The variable `X3` is a dummy indicator coded as `1` for fuel type 20 and `0` otherwise.

For convenience, the three predictors (wheel base, curb weight, and the fuel type indicator) are combined into one design matrix `Xmat`, with an added intercept term.

### Set up design matrices.

Given these predictors, the multivariate general linear model for the bivariate MPG response is

`$\left[\begin{array}{cc}{y}_{11}& {y}_{12}\\ {y}_{21}& {y}_{22}\\ ⋮& ⋮\\ {y}_{n1}& {y}_{n2}\end{array}\right]=\left[\begin{array}{cccc}1& {x}_{11}& {x}_{12}& {x}_{13}\\ 1& {x}_{21}& {x}_{22}& {x}_{23}\\ ⋮& ⋮& ⋮& ⋮\\ 1& {x}_{n1}& {x}_{n2}& {x}_{n3}\end{array}\right]\left[\begin{array}{cc}{\beta }_{01}& {\beta }_{02}\\ {\beta }_{11}& {\beta }_{12}\\ {\beta }_{21}& {\beta }_{22}\\ {\beta }_{31}& {\beta }_{32}\end{array}\right]+\left[\begin{array}{cc}{ϵ}_{11}& {ϵ}_{12}\\ {ϵ}_{21}& {ϵ}_{22}\\ ⋮& ⋮\\ {ϵ}_{n1}& {ϵ}_{n2}\end{array}\right],$`

where ${ϵ}_{i}={\left({ϵ}_{i1},{ϵ}_{i2}\right)}^{\prime }\sim MVN\left(0,\Sigma \right)$. There are $K=8$ regression coefficients in total.

Create a length $n=205$ cell array of 2-by-8 ($d$-by-$K$) matrices for use with `mvregress`. The `i`th matrix in the cell array is

`$X\left(i\right)=\left[\begin{array}{cccccccc}1& 0& {x}_{i1}& 0& {x}_{i2}& 0& {x}_{i3}& 0\\ 0& 1& 0& {x}_{i1}& 0& {x}_{i2}& 0& {x}_{i3}\end{array}\right].$`

```
Xcell = cell(1,n);
for i = 1:n
    Xcell{i} = kron(Xmat(i,:),eye(d));
end
```

Given this specification of the design matrices, the corresponding parameter vector is

`$\beta =\left[\begin{array}{c}{\beta }_{01}\\ {\beta }_{02}\\ {\beta }_{11}\\ {\beta }_{12}\\ {\beta }_{21}\\ {\beta }_{22}\\ {\beta }_{31}\\ {\beta }_{32}\end{array}\right].$`
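As a quick sanity check on this layout (a hypothetical snippet, not part of the original example), multiplying any observation's stacked design matrix by an 8-by-1 vector arranged this way should match the row-wise product with the corresponding 4-by-2 coefficient matrix:

```matlab
% Hypothetical check: for an arbitrary 8-by-1 vector b, the Kronecker
% design satisfies Xcell{i}*b == (Xmat(i,:)*B)', where B = reshape(b,2,4)'.
b = (1:8)';                            % arbitrary test coefficients
B = reshape(b,2,4)';                   % 4-by-2 layout matching the model
i = 1;                                 % check the first observation
max(abs(Xcell{i}*b - (Xmat(i,:)*B)'))  % should be (near) zero
```

This confirms that `kron(Xmat(i,:),eye(d))` interleaves the coefficients for the two responses in the order shown above.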

### Estimate regression coefficients.

Fit the model using maximum likelihood estimation.

```
[beta,sigma,E,V] = mvregress(Xcell,Y);
beta
```

```
beta = 8×1

   33.5476
   38.5720
    0.9723
    0.3950
   -6.3064
   -6.3584
   -9.2284
   -8.6663
```

These coefficient estimates show:

• The expected city and highway MPG for cars of average wheel base, average curb weight, and fuel type 11 are `33.5` and `38.6`, respectively. For fuel type 20, the expected city and highway MPG are `33.5476 - 9.2284 = 24.3192` and `38.5720 - 8.6663 = 29.9057`, respectively.

• An increase of one standard deviation in curb weight has almost the same effect on city and highway MPG: all else being equal, the expected MPG decreases by about `6.3` for each one standard deviation increase in curb weight.

• All else being equal, for each one standard deviation increase in wheel base, the expected city MPG increases by `0.972`, while the expected highway MPG increases by only `0.395`.
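The expected MPG values in the first bullet can be read off `beta` directly; for example (assuming the fit above has been run):

```matlab
% Expected MPG at average wheel base and curb weight (the centered and
% scaled predictors are zero at the mean): the intercepts give fuel
% type 11, and adding the fuel type coefficients gives fuel type 20.
mpg_fuel11 = beta(1:2)'               % city/highway MPG, fuel type 11
mpg_fuel20 = beta(1:2)' + beta(7:8)'  % city/highway MPG, fuel type 20
```

With the estimates shown earlier, `mpg_fuel20` is `[24.3192 29.9057]`.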

### Compute standard errors.

The standard errors of the regression coefficients are the square roots of the diagonal elements of the variance-covariance matrix, `V`.

```
se = sqrt(diag(V))
```

```
se = 8×1

    0.7365
    0.7599
    0.3589
    0.3702
    0.3497
    0.3608
    0.7790
    0.8037
```

### Reshape coefficient matrix.

You can easily reshape the regression coefficients into the original 4-by-2 matrix.

```
B = reshape(beta,2,4)'
```

```
B = 4×2

   33.5476   38.5720
    0.9723    0.3950
   -6.3064   -6.3584
   -9.2284   -8.6663
```

### Check model assumptions.

Under the model assumptions, the rows of $z=E{\Sigma }^{-1/2}$ should be independent draws from a bivariate standard normal distribution. In this 2-D case, you can assess the validity of this assumption using a scatter plot.

```
z = E/chol(sigma);

figure()
plot(z(:,1),z(:,2),'.')
title('Standardized Residuals')
hold on

% Overlay standard normal contours
z1 = linspace(-5,5);
z2 = linspace(-5,5);
[zx,zy] = meshgrid(z1,z2);
zgrid = [reshape(zx,100^2,1),reshape(zy,100^2,1)];
zn = reshape(mvnpdf(zgrid),100,100);
[c,h] = contour(zx,zy,zn);
clabel(c,h)
```

Several residuals are larger than expected, but overall, there is little evidence against the multivariate normality assumption.
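Beyond the visual check, a quick numerical check (a hypothetical addition, not in the original example) is that the sample covariance of the standardized residuals should be close to the 2-by-2 identity matrix:

```matlab
% If the model assumptions hold, z has approximately identity covariance.
% mvregress returns the MLE of Sigma (denominator n), while cov uses
% n-1, so expect entries within roughly a factor of n/(n-1) of identity.
cov(z)
```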