Multiple Linear Regression creates same results as the Target values

Asked by mr mo on 12 Feb 2018
Hi. I have to use multiple linear regression in MATLAB. I have an X matrix of size 10×1000 and a Y vector of size 10×1. I use this code
b = regress(Y,X);
and then I use this
s=find(b~=0)
to find the indices of the elements of vector b that are not equal to zero. Assume that this is the result of the code above:
s= 2 7 95 172 290 333 471 560 680 890
then I use this
for i=1:size(X,1)
    % "..." continues the statement on the next line
    Z(i,1)=X(i,2)*b(2,1)+X(i,7)*b(7,1)+X(i,95)*b(95,1)+X(i,172)*b(172,1)+ ...
        X(i,290)*b(290,1)+X(i,333)*b(333,1)+X(i,471)*b(471,1)+X(i,560)*b(560,1)+ ...
        X(i,680)*b(680,1)+X(i,890)*b(890,1);
end
to create the output values.
But the output values are the same as the Y values.
Did I make a mistake somewhere, or is the code correct?
Thanks a lot.

Jelle on 13 Feb 2018

If your regression has at least as many predictors as you have values to predict, you will get a model that reproduces those values perfectly.
In other words: if you have 10 measurements, and 10 variables that could be related to these measurements, you can fit a model that perfectly explains the variation in the measured values. This is called overfitting. In practice, often only a few variables really explain what is going on. The rest are just 'filling the gaps'.
I would recommend looking at stepwise regression if you do not know which of the 1000 predictors in X are the ones you need. Then check the p-value of each predictor to judge whether its apparent effect could just be due to chance.
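
The overfitting effect can be reproduced with purely random data. The following is a minimal sketch, assuming the Statistics and Machine Learning Toolbox is available for regress; the seed and the random X and Y are made up for illustration and are not the asker's data. Because X has only 10 rows, regress should warn that X is rank deficient and leave at most 10 coefficients nonzero, and those few columns are enough to reproduce Y up to rounding error even though X and Y are unrelated.

```matlab
rng(0);                      % fixed seed so the run is repeatable
n = 10; p = 1000;            % same shape as in the question
X = randn(n, p);             % random predictors, unrelated to Y
Y = randn(n, 1);             % random targets

b = regress(Y, X);           % expect a rank-deficiency warning here
s = find(b ~= 0);            % indices of the nonzero coefficients
Z = X(:, s) * b(s);          % fitted values; vectorized form of the loop

fprintf('nonzero coefficients: %d\n', numel(s));
fprintf('max abs residual:     %g\n', max(abs(Z - Y)));
```

A residual near machine precision here says nothing about predictive value: the same "perfect" fit appears for any Y.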

mr mo on 13 Feb 2018
Thanks for your help. How can I use stepwise regression in this case?
Jelle on 13 Feb 2018
I cannot help you with the actual implementation: I have not used the stepwise model in MATLAB. It is documented here, and seems straightforward though: https://au.mathworks.com/help/stats/stepwisefit.html
Instead, I have built a stepwise fitter myself, as my goal was not the best-fitting regression, but an estimate of the reliability of the fit (https://au.mathworks.com/matlabcentral/answers/379468-vectorizing-a-repeated-regression).
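
For reference, a call to stepwisefit might look like the sketch below. It is untested against the asker's data and uses made-up synthetic data instead: 100 observations, 20 predictors, with Y depending only on columns 3 and 7 (all of these are assumptions for illustration). With only 10 observations, as in the question, very few predictors can be supported and the selection is likely to be unreliable.

```matlab
rng(1);                                   % fixed seed for repeatability
n = 100; p = 20;                          % hypothetical sizes, not the asker's
X = randn(n, p);
Y = 2*X(:,3) - 3*X(:,7) + 0.5*randn(n,1); % only columns 3 and 7 matter

% Terms enter if p < PEnter and are removed again if p > PRemove.
[b, se, pval, inmodel, stats] = stepwisefit(X, Y, ...
    'PEnter', 0.05, 'PRemove', 0.10, 'Display', 'off');

selected = find(inmodel);                 % predictors kept in the final model
disp(selected);
disp(pval(inmodel)');                     % their p-values

% Predict using only the selected terms; stepwisefit fits an intercept,
% which is returned separately in stats.intercept.
Yhat = stats.intercept + X(:, inmodel) * b(inmodel);
```

Note that b also holds values for predictors that were not selected (the coefficient each would get if it were added), so predictions should index with inmodel rather than use all of b.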