Getting NaN when computing partialcorr (no NaNs in data)
Hi, I am using partialcorr on series of data and it sometimes returns NaNs. Why is that? I am sure I have no NaNs in my data and no missing or empty entries. Sometimes using partialcorr([x y], 'rows','complete') helps, but it does not always fix the problem. Thanks for any help.
4 Comments
Kate
on 20 Jun 2016
Could you provide some sample data/code? Make sure that your columns are variables and rows are observations. It could be an effect of how many variables you are partially correlating, or filtering for statistical significance (which you should definitely do). Hope that helps.
Sarah-Sophie Weil
on 19 Jul 2018
Well, it's been a year since this question was asked, but there has never been an answer. I have the same problem: I'm using partialcorr to calculate the correlation between two variables (flowering date and cumulated temperature) while controlling for two other variables at the same time.
I calculated the cumulated temperature over different periods of the year (31 different periods altogether) and want to know which period of the year explains the greatest variance in the flowering date while I already have two other variables in the model.
For 30 of the 31 periods I used, partialcorr runs without problems. For one period, however, partialcorr returns NaN.
I provided the data (64 years/observations in total) and this is the command I used:
partialcorr([flower_date,cum_temp],[Var1,Var2])
I'd be grateful for any help!
Cheers, Sarah
ARIEL YEHUDA GOLDSTEIN
on 4 May 2021
Encountering the same issue. I hope someone can help.
tF=readtable(websave('Test_data.txt','https://www.mathworks.com/matlabcentral/answers/uploaded_files/125764/Test_data.txt'));
partialcorr([tF.flower_date,tF.cum_temp],[tF.Var1,tF.Var2])
fitlm(tF,'predictorVars',{'cum_temp','Var1','Var2'},'ResponseVar','flower_date','intercept',true)
So partialcorr isn't lying to us; let's see what's going on between the independent variables themselves...
corrcoef([tF.cum_temp,tF.Var1,tF.Var2])
OK, none of those are identically 1. Although cum_temp is very highly correlated with Var1, and Var1 and Var2 are fairly highly correlated with each other, no pair is perfectly correlated. So the conclusion has to be that cum_temp is a linear combination of the other two. Let's check that next:
fitlm(tF,'predictorVars',{'Var1','Var2'},'ResponseVar','cum_temp','intercept',true)
That last fit shows that cum_temp is predicted exactly by a linear combination of Var1 and Var2, which explains the results above.
This probably means that Var1 and Var2 are derived rather than observed variables. That may cast doubt on the earlier analyses as well, depending on how those variables were defined and on what prevented the same result in the other cases.
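The same kind of exact dependence can also be checked without fitting a model. The following is only a minimal sketch on synthetic data, not the poster's file: the names z1, z2, x, Z and A are hypothetical stand-ins for Var1, Var2, cum_temp and the matrices built from them, and the coefficients are made up for illustration.

```matlab
% Synthetic stand-ins (hypothetical): z1, z2 play the role of Var1, Var2,
% and x plays the role of cum_temp.
rng(0);
n  = 64;                      % same number of observations as in the thread
z1 = randn(n,1);
z2 = randn(n,1);
x  = 2*z1 - 3*z2 + 5;         % assumed: an exact linear combination plus a constant

Z = [ones(n,1) z1 z2];        % control variables with an intercept column
A = [Z x];

% If the rank is smaller than the number of columns, the columns are
% linearly dependent, i.e. x is fully explained by the controls.
isDependent = rank(A) < size(A,2);

% Equivalent check: the least-squares residuals of x on Z are at roundoff level.
resX = x - Z*(Z\x);
maxAbsRes = max(abs(resX));
```

With real data, replacing z1, z2 and x by the table columns should flag the one period for which the controlled variable carries no information beyond the controls.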
Answers (1)
Adam Danz
on 4 May 2021
See similar question: getting a NaN in correlation coefficient
The same basic problem is happening with the partial correlation.
When correlating variable X with variable Y while controlling for variable Z, X may be perfectly predicted by Z, so the residuals of X after regressing on Z are 0 or very close to 0. To avoid returning a spurious correlation caused by floating-point roundoff error, the partialcorr function detects residuals close to 0 and sets them to 0. If you look at the equation in the wiki article, it will be clear why NaN values are returned in those cases, since 0/0 = NaN.
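The mechanism described above can be illustrated with a small residual-based calculation. This is a hedged sketch, not the code in partialcorr.m: the synthetic data, the variable names and in particular the threshold tol are assumptions chosen for illustration.

```matlab
% Synthetic data (hypothetical): x is an exact linear function of the controls.
rng(1);
n  = 64;
z1 = randn(n,1);
z2 = randn(n,1);
x  = 2*z1 - 3*z2 + 5;
y  = 0.5*z1 + randn(n,1);

Z = [ones(n,1) z1 z2];        % controls plus intercept

% Residuals of x and y after removing the linear effect of the controls.
resX = x - Z*(Z\x);           % roundoff-level values, not exactly zero
resY = y - Z*(Z\y);

% Assumed tolerance (not the one used by partialcorr): treat residuals
% that are tiny relative to the data as exactly zero.
tol = sqrt(eps) * norm(x);
if norm(resX) < tol
    resX(:) = 0;
end

% Correlation of the residuals: numerator and denominator are both 0 here.
rManual = (resX'*resY) / sqrt((resX'*resX) * (resY'*resY));
```

Without the zeroing step, the division would be between two roundoff-sized numbers and would produce an arbitrary value between -1 and 1, which is the spurious correlation the answer refers to.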
The partialcorr.m file contains valuable comments by its authors explaining this, just above the lines of code that compute the correlation coefficients (R2021a).