Fixed Effects Design Matrix Must be of full column rank with multiple categorical predictors
I am probably doing something very dumb, however I cannot figure out my mistake.
I am trying to regress out some predictors from a data set -- I have two categorical predictors, A1 and A2 in a table, something like this:
It seems obvious to me that A1 and A2 are linearly independent. They are also linearly independent from the intercept, which I believe should be a categorical variable that looks like ones(1,11) ? But regardless, I want the global mean to not be removed from everything, so I don't include an intercept in the model.
Then, if I run something like this:
lme = fitlme(tbl, 'values ~ A1 + A2 - 1', 'DummyVarCoding', 'full') % tbl: the table holding values, A1 and A2
I always get the same error :
Error using classreg.regr.lmeutils.StandardLinearLikeMixedModel/validateInputs (line 229)
Fixed Effects design matrix X must be of full column rank.
I don't understand why this is happening -- and probably this shows that I have a pretty big misunderstanding of what the dummy variables actually are.
However, if I run two fitlme's -- one on the subset A1==1 and one on A1==0, they both work, which just super confuses me.
Answers (1)
Ive J
on 29 Jan 2022
The error is self-explanatory: the cause is the full dummy-variable coding scheme you're using (why?). See https://mathworks.com/help/stats/dummy-indicator-variables.html
Note that the error has nothing to do with mixed-model design. Consider this example:
n = 100; % sample size
tab = table(randn(n,1), categorical(randi([0 1], n, 1)), ...
categorical(randi([0, 1], n, 1)),...
'VariableNames', {'value', 'A1', 'A2'});
mdl1 = fitlm(tab, 'value ~ A1 + A2 - 1', 'DummyVarCoding', 'full') % design matrix is rank deficient
So, what happened? Let's construct the design matrix:
X = [dummyvar(tab.A1), dummyvar(tab.A2)]; % DummyVarCoding -> full
disp(rank(X)) % 3 < size(X, 2) --> 3 < 4 --> rank deficient
% what about when considering them alone?
disp(rank(X(:, 1:2))) % full rank
disp(rank(X(:, 3:4))) % full rank
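To see where the lost rank goes, here is a minimal sketch on hypothetical toy data (six observations, indicator columns built by hand rather than with dummyvar, so the numbers are not the asker's). Under full coding, the indicator columns of each factor add up to a column of ones, so the two blocks are tied together by one linear relation:

```matlab
% Hypothetical toy data: two binary factors over 6 observations.
A1 = [0 0 1 1 0 1]';
A2 = [0 1 0 1 1 0]';

% Full dummy coding by hand: one indicator column per level.
X = double([A1 == 0, A1 == 1, A2 == 0, A2 == 1]);

% Every observation belongs to exactly one level of each factor,
% so each block of indicator columns sums to a column of ones.
sumA1 = sum(X(:, 1:2), 2);
sumA2 = sum(X(:, 3:4), 2);
disp(isequal(sumA1, ones(6, 1)))
disp(isequal(sumA2, ones(6, 1)))

% Hence col1 + col2 - col3 - col4 = 0: a nonzero vector in the null space.
c = [1; 1; -1; -1];
disp(norm(X * c))   % 0, the columns are linearly dependent
disp(rank(X))       % 3, one less than the number of columns
```

So A1 and A2 can be linearly independent as variables while their full dummy codings are not: both codings contain the intercept implicitly.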
We can approximately find the problematic variable:
[~, R] = qr(X, 0);
find(abs(diag(R)) < 1e-6)
Therefore, don't set 'DummyVarCoding' to 'full' in such cases. The default, 'reference', creates one fewer dummy variable than the number of categories for each predictor.
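As a rough illustration of why dropping one level per factor helps, the sketch below compares a hand-built full coding with a reference-style coding (an intercept plus one indicator per factor). The data and the names Xfull, Xref and y are hypothetical, and the matrices are assembled manually rather than taken from fitlm or fitlme:

```matlab
% Hypothetical toy data, not the asker's table.
A1 = [0 0 1 1 0 1]';
A2 = [0 1 0 1 1 0]';
y  = [1.0 2.0 3.0 4.0 2.5 3.5]';

% Full coding for both factors: 4 columns.
Xfull = double([A1 == 0, A1 == 1, A2 == 0, A2 == 1]);

% Reference-style coding: intercept plus one indicator per factor
% (the first level of each factor is the omitted reference level).
Xref = double([ones(6, 1), A1 == 1, A2 == 1]);

fprintf('full: %d columns, rank %d\n', size(Xfull, 2), rank(Xfull));
fprintf('ref:  %d columns, rank %d\n', size(Xref, 2), rank(Xref));

% Stacking both matrices does not raise the rank, so they span the
% same column space and would give the same fitted values.
disp(rank([Xfull, Xref]))

% With full column rank, the least-squares coefficients are unique.
beta = Xref \ y;
disp(beta')
```

In this sketch the reference-style matrix has three columns and rank three, while spanning the same space as the four-column full coding, so nothing is lost by dropping the redundant column.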