Why do original vectors and their interp1 interpolations give different correlation coefficients?
I am curious why I get a different correlation coefficient (R) for the original random vectors than for their interpolated versions. When I inspect the interpolated vectors, they contain different values than the originals, yet they plot the same. The R value computed from the two original vectors is not similar to the R value computed from the two interpolated vectors. Any ideas why this is?
2 Comments
Jan Orwat
on 13 Jan 2016
Could you describe what kind of data you are calculating this correlation between? Is it vectors x and y, and then x_interpolated and y_interpolated? You wrote about random vectors that are then interpolated. It seems interpolation may be increasing the strength of the relationship between the data and thus changing the correlation, even if the interpolated data looks similar. In other words, interpolation decreases randomness.
Answers (1)
John D'Errico
on 14 Jan 2016
Edited: John D'Errico
on 14 Jan 2016
There is NO expectation that a new set of interpolated points will have the same correlation coefficient as that for the base set. For example, suppose we have a set of points that follow a nonlinear relationship.
x = (0:1:10)';
y = exp(x);
We can compute the correlation coefficient.
corrcoef([x,y])
ans =
1 0.691404156500157
0.691404156500157 1
If I then interpolate the points, to get a NEW set of values, i.e.,
xi = linspace(0,10,50)';
yi = interp1(x,y,xi,'spline');
corrcoef([xi,yi])
ans =
1 0.687541374878435
0.687541374878435 1
As you see, the correlation coefficient is not the same for the new set as for the old one. That is as expected.
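For reference, the off-diagonal entry returned by corrcoef is the Pearson correlation coefficient of the two columns. Written out (the symbols n, x_i, y_i and the bars for sample means are notation introduced here, not part of the original post):

```latex
r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}
         {\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}},
\qquad
\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i,
\quad
\bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i
```

Every sum runs over the points in the sample. Replacing the 11 original points with 50 interpolated ones changes n, both means, and every term in the sums, so nothing in the formula forces r to stay the same, even when both sets of points lie on the same curve.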
It would not matter had I used spline, cubic, or linear interpolation in interp1. The correlation coefficient will generally be close to the original set, but it need not be the same at all.
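A minimal sketch of that comparison, reusing the x, y, and xi from above. The variable names interpMethods, r0, and r are illustrative, and 'pchip' is used here as a stand-in for "cubic", since the meaning of the 'cubic' option in interp1 has varied between MATLAB releases:

```matlab
% Compare the correlation coefficient across several interp1 methods.
x  = (0:1:10)';
y  = exp(x);
xi = linspace(0,10,50)';

% Correlation of the original points (off-diagonal entry of the 2-by-2 matrix).
R0 = corrcoef(x,y);
r0 = R0(1,2);

% 'pchip' is an assumption standing in for "cubic" (see the note above).
interpMethods = {'linear','pchip','spline'};
r = zeros(numel(interpMethods),1);

for k = 1:numel(interpMethods)
    yi   = interp1(x,y,xi,interpMethods{k});
    Ri   = corrcoef(xi,yi);
    r(k) = Ri(1,2);
    fprintf('%-8s r = %.6f\n', interpMethods{k}, r(k));
end
fprintf('%-8s r = %.6f\n', 'original', r0);
```

Each method produces its own value of r, and none of them is required to match r0.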
This is a nonlinear relationship. There is NO expectation that the correlation be the same for interpolated points as for the original. The correlation coefficient is NOT something that is maintained by interpolation. In fact, if I get creative in how I choose my data and the new set of points, I can trivially come up with an example where the correlation coefficient changes sign.
x = (0:1:10)';
y = sin(x);
corrcoef([x,y])
ans =
1 -0.116741765087288
-0.116741765087288 1
So a moderately small negative correlation.
xi = (0:.1:1)';
yi = interp1(x,y,xi,'spline');
corrcoef([xi,yi])
ans =
1 0.994300310123026
0.994300310123026 1
So a correlation coefficient that is near 1, on an interpolated set from the original data, when the original set had a negative correlation.
In all cases I have shown, the interpolated values will plot neatly on top of the curve from the original set.
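One way to check this visually for the sin(x) example is sketched below. The dense grid xf and the plot styling are choices made here for illustration, not something from the original post:

```matlab
% Overlay the original points, the interpolated points, and the spline curve.
x  = (0:1:10)';
y  = sin(x);
xi = (0:.1:1)';
yi = interp1(x,y,xi,'spline');

% Dense grid (an illustrative choice) to draw the spline through the originals.
xf = linspace(0,10,500)';
yf = interp1(x,y,xf,'spline');

plot(xf,yf,'-', x,y,'o', xi,yi,'.');
legend('spline through original points','original points','interpolated points');
xlabel('x');
ylabel('y');

% The two point sets lie on the same curve but have very different correlations.
R0 = corrcoef(x,y);
r0 = R0(1,2);
Ri = corrcoef(xi,yi);
ri = Ri(1,2);
fprintf('original r = %.4f, interpolated r = %.4f\n', r0, ri);
```

The interpolated points cover only the interval from 0 to 1, where the curve is rising, which is why their correlation is strongly positive even though the full set of original points has a slightly negative one.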