Why does the RMSE obtained by fitlm in MATLAB not match the RMSE calculated in Excel?
Hi everyone. I've used the mdl = fitlm(x,y) function to fit a linear regression model to my dataset. I also calculated the RMSE in Excel using the known formula. The fitlm function in MATLAB returns exactly the same R-squared that I calculated in Excel and exactly the same coefficients as the Excel trendline, but the RMSE values from MATLAB and Excel do not match. I have searched widely but I'm still stuck. Any ideas? Thanks for your help. Best regards.
3 Comments
the cyclist
on 27 May 2016
I would specifically suggest posting both a *.m file and a *.xls file that replicate the simplest example you can provide that exhibits the problem.
Answers (2)
John D'Errico
on 28 May 2016
Edited: John D'Errico
on 28 May 2016
Your known formula is not always the formula that one might use. In fact, there is a subtly different alternative.
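Written out, the two candidates look like this. This is only a sketch in my own notation, which is an assumption and not taken from the question: n is the number of observations, p is the number of estimated coefficients (2 for a straight line with an intercept), and \hat{y}_i are the fitted values.

```latex
\mathrm{RMSE}_{n} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i-\hat{y}_i\right)^2},
\qquad
\mathrm{RMSE}_{\mathrm{dof}} = \sqrt{\frac{1}{n-p}\sum_{i=1}^{n}\left(y_i-\hat{y}_i\right)^2},
\qquad
\mathrm{RMSE}_{n} = \mathrm{RMSE}_{\mathrm{dof}}\,\sqrt{\frac{n-p}{n}}
```

The two differ only by the factor \sqrt{(n-p)/n}, so the gap shrinks as n grows relative to p.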
You divided by the number of data points there. In fact, a more defensible formula for RMSE divides by the number of data points less the number of parameters estimated, that is, by the error degrees of freedom. A simple test is often the easiest way to verify this claim.
x = randn(100,1);
y = randn(100,1);
lm = fitlm(x,y,'linear')
lm =
Linear regression model:
    y ~ 1 + x1

Estimated Coefficients:
                   Estimate              SE                    tStat                 pValue
                   __________________    __________________    __________________    _________________
    (Intercept)    -0.060640930787764    0.0951675587117722    -0.637201706218221    0.52547928403413
    x1             0.0370287221087163    0.0935517335328456    0.395810111799967     0.693105501934868

Number of observations: 100, Error degrees of freedom: 98
Root Mean Squared Error: 0.946
R-squared: 0.0016, Adjusted R-Squared -0.00859
F-statistic vs. constant model: 0.157, p-value = 0.693
lm.RMSE
ans =
0.946014427051301
sqrt(sum((y - lm.predict(x)).^2/100))
ans =
0.936506503055594
sqrt(sum((y - lm.predict(x)).^2/98))
ans =
0.946014427051301
As you can see, dividing by the degrees of freedom is what fitlm must be doing.
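The same comparison can probably be made without hard-coding 100 and 98, by reading the counts from the fitted model. This is a minimal sketch, assuming the Statistics and Machine Learning Toolbox is available; the variable names n, dfe, sse, rmseDof, and rmseN are my own, and the seed is there only to make the run repeatable.

```matlab
% Sketch: compute both RMSE conventions from one fitted LinearModel.
rng(0);                      % fixed seed (my choice) so the run is repeatable
x = randn(100,1);
y = randn(100,1);
lm = fitlm(x,y,'linear');

n   = lm.NumObservations;    % number of observations used in the fit
dfe = lm.DFE;                % error degrees of freedom: n minus number of coefficients
sse = lm.SSE;                % sum of squared residuals

rmseDof = sqrt(sse/dfe);     % divide by degrees of freedom; should agree with lm.RMSE
rmseN   = sqrt(sse/n);       % divide by the number of data points

fprintf('divide by dfe: %.15g (lm.RMSE = %.15g)\n', rmseDof, lm.RMSE);
fprintf('divide by n:   %.15g\n', rmseN);
fprintf('rescaled RMSE: %.15g\n', lm.RMSE*sqrt(dfe/n));   % same value as rmseN
```

If the spreadsheet value equals the divide-by-n number, then multiplying lm.RMSE by sqrt(dfe/n) should reproduce it.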
3 Comments
Greg Heath
on 2 Jun 2016
You are asking which one is correct.
Well, they all are correct. They are just different measures of the same model. You are free to choose any one you want. HOWEVER, if the differences are significant then you should be able to explain why.
Since I are un injuneer and not a statistician, I will refer you to Google and Wikipedia re the search words
tutorial degrees-of-freedom
Hope this helps.
Greg
Anurag Banerjee
on 4 Jul 2018
Engineers doing statistics. One day statisticians will design cars.
0 Comments