Why RMSE obtained by fitlm in matlab does not match with RMSE calculated in EXCEL ؟

21 views (last 30 days)
Hi every one . I've used the mdl = fitlm(x,y) function to fit a linear regression model to my Dataset. I also calculate the RMSE in Excel by Known Formula . the fitlm function in matlab return the exact value of R-squared calculated in excel and the exact Coefficients of Trendline. but the Value of RMSE in matlab and excel does not match. i was made a wide search but I'm still in trouble with that . any idea ? thanks for help . with best regards .
  3 Comments
the cyclist
the cyclist on 27 May 2016
I would specifically suggest posting both a *.m file and a *.xls file that replicate the simplest example you can provide that exhibits the problem.
mr mo
mr mo on 28 May 2016
Edited: Walter Roberson on 28 May 2016
Hi
Thanks for help
this is the RMSE code i was written in matlab
and i calculate the RMSE value in Excel in the same way of this code shown below
function R = RMSE (data , estimate)
R = sqrt(sum((data(:) - estimate (:)).^2) / numel(data));
end
and this is the fitlm code
for kk=1:size(x,2)
lm=fitlm(x,y,'linear');
Rsq_TR=lm.Rsquared.Ordinary;
RMSE_TR=lm.RMSE;
end
and this is some of my sample data set
-0.29039678 -0.212428117
0.149798734 0.268486256
0.851351352 0.563358782
-0.694364578 -0.652148947
-0.326624496 -0.468058466
-0.019838988 0.209298441
-0.765669926 -0.828415796
-0.133410006 0.003000269
-0.122484186 -0.073538329
-0.471535366 -0.376903371
0.181426106 0.365548334
0.186889016 0.37947765
-0.260782058 -0.024155069
-0.215353652 -0.371639751
-0.062967222 0.092607096
-0.461759632 -0.56834877
0.074468086 0.154155435
0.624496838 0.529777368
-0.197814836 -0.096543508
-0.247268546 -0.136442585
0.106382978 0.28668492
-0.791259344 -0.718599416
-0.532489936 -0.572352826
-0.006325474 -0.010056861
0.48706153 0.485766679
-0.631397354 -0.619362709
-0.815698678 -0.823629139
0.272857964 0.354840024
0.080506038 0.220713202
the left column is the data values(x) and the right column is the estimate (y) values

Sign in to comment.

Answers (2)

John D'Errico
John D'Errico on 28 May 2016
Edited: John D'Errico on 28 May 2016
Your known formula is not always the formula that one might use. In fact, there is a subtly different alternative.
You divided by the number of data points there. In fact, a rational formula for RMSE has one divide by the number of data, less the number of parameters estimated. So by the number of degrees of freedom. A simple test of this fact is often the easiest thing to do, then one can verify my thesis.
x = randn(100,1);
y = randn(100,1);
lm = fitlm(x,y,'linear')
lm =
Linear regression model:
y ~ 1 + x1
Estimated Coefficients:
Estimate SE tStat pValue
__________________ __________________ __________________ _________________
(Intercept) -0.060640930787764 0.0951675587117722 -0.637201706218221 0.52547928403413
x1 0.0370287221087163 0.0935517335328456 0.395810111799967 0.693105501934868
Number of observations: 100, Error degrees of freedom: 98
Root Mean Squared Error: 0.946
R-squared: 0.0016, Adjusted R-Squared -0.00859
F-statistic vs. constant model: 0.157, p-value = 0.693
lm.RMSE
ans =
0.946014427051301
sqrt(sum((y - lm.predict(x)).^2/100))
ans =
0.936506503055594
sqrt(sum((y - lm.predict(x)).^2/98))
ans =
0.946014427051301
As you can see, dividing by the degrees of freedom is what fitlm must be doing.
  3 Comments
mr mo
mr mo on 30 May 2016
Edited: mr mo on 30 May 2016
Hello again
Thank you for your helps.
I must say this before but I thought that it is not important.
This Data set shown below is My Data set for function approximation with Neural Network in MATLAB. The left column is the Target vector and the right column is the model output vector.
-0.29039678 -0.212428117
0.149798734 0.268486256
0.851351352 0.563358782
-0.694364578 -0.652148947
-0.326624496 -0.468058466
-0.019838988 0.209298441
-0.765669926 -0.828415796
-0.133410006 0.003000269
-0.122484186 -0.073538329
-0.471535366 -0.376903371
0.181426106 0.365548334
0.186889016 0.37947765
-0.260782058 -0.024155069
-0.215353652 -0.371639751
-0.062967222 0.092607096
-0.461759632 -0.56834877
0.074468086 0.154155435
0.624496838 0.529777368
-0.197814836 -0.096543508
-0.247268546 -0.136442585
0.106382978 0.28668492
-0.791259344 -0.718599416
-0.532489936 -0.572352826
-0.006325474 -0.010056861
0.48706153 0.485766679
-0.631397354 -0.619362709
-0.815698678 -0.823629139
0.272857964 0.354840024
0.080506038 0.220713202
and I want to show the capability of my model by calculating the RMSE value between the Target & Output vectors in MATLAB .
my question is that what kind of RMSE should I use ?
The RMSE obtained by fitlm function or RMSE obtained by the formula shown below?
function R = RMSE (data , estimate)
R = sqrt(sum((data(:) - estimate (:)).^2) / numel(data));
end
and if I use the RMSE obtained by fitlm function, Is the RMSE value true ??
because the vector created by
lm.predict(x)
and my Output vector are different.
with best regards
Greg Heath
Greg Heath on 2 Jun 2016
You are asking which one is correct.
Well, they all are correct. They are just different measures of the same model. You are free to choose any one you want. HOWEVER, if the differences are significant then you should be able to explain why.
Since I are un injuneer and not a statistician, I will refer you to Google and Wikipedia re the search words
tutorial degrees-of-freedom
Hope this helps.
Greg

Sign in to comment.


Anurag Banerjee
Anurag Banerjee on 4 Jul 2018
Engineers doing statistics. One day Statisticians will design cars

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!