stepwiselm: too many output arguments

Running stepwiselm, I get the following results (independent variables are contained in a 5-column matrix called "ingredients" and the dependent variable is Y):
ds = dataset(ingredients(:,1),ingredients(:,2),ingredients(:,3),ingredients(:,4),ingredients(:,5),Y,'Varnames',{'a','b','c','d','e','Growth rate'});
mdl=stepwiselm(ds,'interactions');
[b,se,pval,inmodel,stats,nextstep,history]=stepwiselm(ds,'interactions');
Warning: Variable names were modified to make them valid MATLAB identifiers.
In @dataset\private\setvarnames at 43
In dataset.dataset>dataset.dataset at 384
1. Removing a:e, FStat = 0.36439, pValue = 0.54775
2. Removing a:b, FStat = 0.33757, pValue = 0.56281
3. Removing a:d, FStat = 0.29478, pValue = 0.58861
4. Removing d:e, FStat = 1.3403, pValue = 0.25022
5. Removing b:c, FStat = 2.3391, pValue = 0.12983
6. Removing c:e, FStat = 0.93838, pValue = 0.33538
7. Removing a:c, FStat = 2.4256, pValue = 0.12296
Error using stepwiselm
Too many output arguments.
MY QUESTIONS ARE AS FOLLOWS--First of all, why does it complain about having too many output arguments? And then it returns something for mdl, which actually has more coefficient estimates than described by the growth equation:
mdl =
Linear regression model:
GrowthRate ~ 1 + a + b*d + b*e + c*d
Estimated Coefficients:
                   Estimate      SE            tStat      pValue
                   __________    __________    _______    _________
    (Intercept)    -0.079748     0.042445      -1.8789    0.063534
    a              -0.22811      0.12251       -1.8619    0.065912
    b              0.01196       0.0063717     1.8771     0.063783
    c              0.11473       0.064456      1.7799     0.078499
    d              0.026636      0.0098081     2.7157     0.007944
    e              0.023101      0.008404      2.7489     0.0072412
    b:d            0.0012314     0.00060937    2.0207     0.046312
    b:e            -0.0040564    0.0018181     -2.2311    0.028186
    c:d            -0.036097     0.012224      -2.953     0.0040246
Number of observations: 98, Error degrees of freedom: 89
Root Mean Squared Error: 0.0236
R-squared: 0.79, Adjusted R-Squared 0.771
F-statistic vs. constant model: 41.9, p-value = 5.73e-27
Any thoughts??? MANY, MANY, MANY thanks in advance!

 Accepted Answer

Brendan Hamm on 14 Sep 2015
Edited: Brendan Hamm on 15 Sep 2015
It complains about having too many output arguments because stepwiselm only passes back one output argument (a LinearModel). The seven-output form [b,se,pval,inmodel,stats,nextstep,history] belongs to stepwisefit, which takes a predictor matrix and a response vector rather than a dataset. The documentation mentions that the 'interactions' model includes an intercept, all linear terms, and all products of pairs of distinct predictors. If you want to consider only the terms you mention, then pass them in as the modelspec:
mdl = stepwiselm(ds,'GrowthRate ~ 1 + a + b:d + b:e + c:d');
Here the colon notation means to include only that specific interaction term, whereas b*d would also include the linear terms for both b and d. Refer to the Wilkinson Notation section at the bottom of the documentation.
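To make the single-output point concrete, here is a minimal sketch of reading the estimates, standard errors, p-values, and step history from the returned LinearModel. The data are synthetic stand-ins (X, Y, and tbl are my own names, not the poster's variables), and it assumes a release where table and the Steps property of LinearModel are available.

```matlab
% Hypothetical synthetic data standing in for the poster's ingredients and Y.
rng(0);                                   % reproducible random numbers
X = rand(98,5);
Y = 0.5*X(:,1) - 0.3*X(:,2).*X(:,4) + 0.05*randn(98,1);
tbl = array2table([X Y], 'VariableNames', {'a','b','c','d','e','GrowthRate'});

% stepwiselm returns exactly one output: a LinearModel object.
% The last table variable (GrowthRate) is used as the response by default.
mdl = stepwiselm(tbl, 'interactions', 'Verbose', 0);

% The quantities requested by the seven-output call are properties of mdl.
b       = mdl.Coefficients.Estimate;      % coefficient estimates
se      = mdl.Coefficients.SE;            % standard errors
pval    = mdl.Coefficients.pValue;        % p-values of the t-statistics
names   = mdl.CoefficientNames;           % term name for each coefficient
history = mdl.Steps.History;              % record of the stepwise search

disp(mdl.Formula)                         % final model in Wilkinson notation
```

If the matrix-in, many-outputs workflow is what you really want, stepwisefit(X,Y) is the function with that calling syntax, but it does not search over interaction terms for you.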

6 Comments

Thank you so much for replying! It has been so difficult getting a straight answer on this issue... I want to create the best-fit linear model out of all possible factors and all possible pairwise combinations of the factors. I'm pretty sure the final model should not have all those coefficients, but it lists 8 significant coefficient estimates, which is clearly a lot. Since I did use 'interactions' [as in: mdl=stepwiselm(ds,'interactions'), where ds contains all independent and dependent factors], is it safe to assume that the best-fit linear model is in fact GrowthRate ~ 1 + a + b + c + d + e + b:d + b:e + c:d?
Yes, this is the best fit according to the default Criterion, 'sse', which adds or removes terms based on the p-value of an F-test on the change in the sum of squared error. You can always change the criterion to AIC or BIC (Akaike and Bayesian Information Criterion, respectively), which I feel are more commonly used in practice. To change this, it is just a Name-Value pair:
mdl = stepwiselm(ds,'interactions','Criterion','AIC');
Some of the p-values that you have are greater than 0.05, indicating that those coefficients are not statistically different from 0 at the 5% significance level, and you may want to consider removing them.
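As a rough illustration of how the choice of Criterion can change the selected model, the sketch below runs the same search under 'sse', 'aic', and 'bic' and prints each final formula with its AIC and BIC. The data are synthetic and the variable names are placeholders of mine, so the selected terms will not match the ones in this thread.

```matlab
% Hypothetical synthetic data; not the poster's measurements.
rng(0);
X = rand(98,5);
Y = 0.5*X(:,1) - 0.3*X(:,2).*X(:,4) + 0.05*randn(98,1);
tbl = array2table([X Y], 'VariableNames', {'a','b','c','d','e','GrowthRate'});

% Same starting model, three different selection criteria.
criteria = {'sse', 'aic', 'bic'};
models = cell(size(criteria));
for k = 1:numel(criteria)
    models{k} = stepwiselm(tbl, 'interactions', ...
        'Criterion', criteria{k}, 'Verbose', 0);
    fprintf('Criterion = %s: AIC = %.2f, BIC = %.2f\n', criteria{k}, ...
        models{k}.ModelCriterion.AIC, models{k}.ModelCriterion.BIC);
    disp(models{k}.Formula)               % final model for this criterion
end
```

Lower AIC or BIC is better, so comparing those values across the three fits is a fairer check than counting significant p-values.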
Note: I edited the above answer as I forgot the single-quotes ('') for the modelspec.
Ok you convinced me :) I used the line mdl=stepwiselm(ds,'interactions','Criterion','AIC'); and the results were as such:
mdl =
Linear regression model:
growth ~ 1 + a*c + b*c + b*d + b*e + c*d + c*e
Estimated Coefficients:
                   Estimate     SE            tStat       pValue
                   _________    __________    ________    ________
    (Intercept)    0.074239     0.085914      0.86411     0.38993
    a              -1.4469      0.7451        -1.9419     0.055421
    b              0.02536      0.011881      2.1346      0.035639
    c              -0.14029     0.1372        -1.0226     0.30939
    d              0.029878     0.01224       2.4409      0.0167
    e              -0.00903     0.021162      -0.42671    0.67066
    a:c            2.023        1.1941        1.6942      0.093851
    b:c            -0.022915    0.014983      -1.5294     0.12983
    b:d            0.0010697    0.00061798    1.731       0.087035
    b:e            -0.003821    0.0018472     -2.0685     0.041593
    c:d            -0.040019    0.016212      -2.4685     0.015547
    c:e            0.051924     0.031023      1.6737      0.097818
Number of observations: 98, Error degrees of freedom: 86
Root Mean Squared Error: 0.0232
R-squared: 0.803, Adjusted R-Squared 0.778
F-statistic vs. constant model: 31.9, p-value = 9.47e-26
BUT the question remains: why does the linear regression model not have b, d, etc. when the p-values indicate significance (and c:e which is not significant is listed in the final model)?
These terms are included simply because adding them improves (that is, lowers) the information criterion, which does not necessarily mean that the pValue will indicate significance. If you wish to remove these you can do this:
reducedMdl = removeTerms(mdl,'a + e + a:c + b:d + c:e');
I should note that you may want to consider multiple models and compare them and not necessarily just remove all insignificant terms. You may also consider that a 90% Level is appropriate and not remove any terms.
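A small sketch of the remove-and-compare idea, using fitlm on made-up data so that the term being removed is known to be in the model. The variable names and the generating equation are assumptions for illustration only.

```matlab
% Hypothetical synthetic data with a real b:c interaction and no a:c effect.
rng(2);
a = rand(60,1);  b = rand(60,1);  c = rand(60,1);
growth = 0.5 + a + 2*b.*c + 0.1*randn(60,1);
tbl = table(a, b, c, growth);

% Full model: intercept, a, b, c, a:c, b:c (six coefficients).
fullMdl = fitlm(tbl, 'growth ~ a*c + b*c');

% Drop only the a:c interaction; the main effects stay in the model.
reducedMdl = removeTerms(fullMdl, 'a:c');

% Compare the two candidates instead of trusting p-values alone.
fprintf('Full:    AIC = %.2f, adjusted R^2 = %.4f\n', ...
    fullMdl.ModelCriterion.AIC, fullMdl.Rsquared.Adjusted);
fprintf('Reduced: AIC = %.2f, adjusted R^2 = %.4f\n', ...
    reducedMdl.ModelCriterion.AIC, reducedMdl.Rsquared.Adjusted);
```

Whichever model has the lower AIC (or the higher adjusted R-squared) is the better candidate by that measure; it is worth looking at more than one measure before settling on a model.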
Additionally, when you call stepwiselm there are options to change the tolerance for adding and removing terms from a model ('PEnter' and 'PRemove', respectively). You would want 'PEnter' in this case.
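For reference, a hedged sketch of tightening the thresholds under the default 'sse' criterion, where both values are p-value cutoffs and PEnter must be smaller than PRemove. The data and the specific cutoffs (0.01 and 0.05) are arbitrary choices of mine for illustration.

```matlab
% Hypothetical synthetic data; not the poster's measurements.
rng(0);
X = rand(98,5);
Y = 0.5*X(:,1) - 0.3*X(:,2).*X(:,4) + 0.05*randn(98,1);
tbl = array2table([X Y], 'VariableNames', {'a','b','c','d','e','GrowthRate'});

% Default thresholds for the 'sse' criterion.
defaultMdl = stepwiselm(tbl, 'interactions', 'Verbose', 0);

% Stricter thresholds: a term needs p < 0.01 to enter and is
% removed once its p-value rises above 0.05.
strictMdl = stepwiselm(tbl, 'interactions', ...
    'PEnter', 0.01, 'PRemove', 0.05, 'Verbose', 0);

fprintf('Default thresholds: %d coefficients\n', defaultMdl.NumCoefficients);
fprintf('Strict thresholds:  %d coefficients\n', strictMdl.NumCoefficients);
```

With 'Criterion','AIC' or 'BIC' the same two options are thresholds on the change in the criterion rather than on a p-value, so the numbers you pass need to be chosen differently.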
Oooo, I see. One other question (just in case it comes up in some future interrogation session :): why does the final model say: linear regression model growth ~ 1 + a*c + b*c + b*d + b*e + c*d + c*e when some terms like b, d, etc. show significance but are not included in the equation?
In the Wilkinson notation:
'response ~ a*c'
is equivalent to:
'response ~ 1 + a + c + a:c'
That is, 'a*c' means the interaction term between a and c (denoted 'a:c') as well as all lower-order terms (a and c).
Please refer to the documentation on Wilkinson Notation to ensure you understand all terms.
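One way to convince yourself of the equivalence is to fit both spellings and compare the coefficient names. This is a minimal sketch on made-up data; the variables a, c, and response are placeholders.

```matlab
% Hypothetical synthetic data for two predictors and one response.
rng(1);
a = rand(50,1);  c = rand(50,1);
response = 1 + 2*a - c + 3*a.*c + 0.1*randn(50,1);
tbl = table(a, c, response);

% Star notation versus the fully expanded formula.
mdlStar = fitlm(tbl, 'response ~ a*c');
mdlLong = fitlm(tbl, 'response ~ 1 + a + c + a:c');

% Both fits should contain the same set of terms.
disp(mdlStar.CoefficientNames)
sameTerms = isequal(sort(mdlStar.CoefficientNames), ...
                    sort(mdlLong.CoefficientNames));

% Colon alone gives only the interaction (plus the default intercept).
mdlOnly = fitlm(tbl, 'response ~ a:c');
disp(mdlOnly.CoefficientNames)
```

So when the printed formula shows b*d, the main effects b and d are in the model even though they are not written out separately.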

Asked: on 14 Sep 2015
Commented: on 22 Sep 2015