PCA output gives NaN after normalizing input matrix

Hello,
I have an array of 117 features and 125941 observations. I performed principal component analysis via svd(x) (code below) and plotted the first three components in a scatter plot against the clinical truth (true/false clinical labels, to see the separation). Separation performance was poor, as I had forgotten to first normalise the data, which is required for PCA.
After applying normalize(obs) I reran the code and found that, after normalisation,
[U,S,V] = svd(obs,'econ');
U and V return 125941x117 and 117x117 arrays of NaN, while S returns (identity matrix .* NaN). I do not understand how normalising the data can change the output from valid numbers to NaN, given that the 'new' input data is just the old data scaled.
I will include a copy of the data which works; if you apply normalize(x) to it, the code below returns NaN values instead.
I do not understand why normalizing the data (obs) would cause this NaN output?
[U,S,V] = svd(obs,'econ'); % Perform the SVD
figure
subplot(1,2,1)
semilogy(diag(S),'k-o','LineWidth',2.5) % Singular values on a log scale; a quick drop-off is better
set(gca,'FontSize',15), axis tight, grid on
subplot(1,2,2)
plot(cumsum(diag(S))./sum(diag(S)),'k-o','LineWidth',2.5) % Cumulative fraction of the singular-value sum per component
set(gca,'FontSize',15), axis tight, grid on
set(gcf,'Position',[1440 100 3*600 3*250])
figure, hold on
for i = 1:size(obs,1)
    x = V(:,1)'*obs(i,:)'; % Project observation i onto the first three components
    y = V(:,2)'*obs(i,:)';
    z = V(:,3)'*obs(i,:)';
    if (grp(i) == 1)
        plot3(x,y,z,'rx','LineWidth',1); % If clinically positive, plot as a red x
    else
        plot3(x,y,z,'bo','LineWidth',1); % If clinically negative, plot as a blue o
    end
end
xlabel('PC1'), ylabel('PC2'), zlabel('PC3')
view(85,25), grid on, set(gca,'FontSize',15)
set(gcf,'Position',[1400 100 1200 900])
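The projection loop above computes, for each observation, its coordinates along the first three right singular vectors. A minimal vectorized sketch of the same projection is shown here; the random obs and grp are hypothetical stand-ins for the real data, which is not reproduced in this thread.

```matlab
% Hypothetical stand-in data (NOT the data from the post)
rng(0);
obs = randn(50, 5);                  % 50 observations, 5 features
grp = [ones(25,1); zeros(25,1)];     % 1 = clinically positive, 0 = negative

[~, ~, V] = svd(obs, 'econ');

% Row i of scores holds [PC1 PC2 PC3] for observation i,
% i.e. scores(i,k) equals V(:,k)'*obs(i,:)' from the loop.
scores = obs * V(:, 1:3);

pos = (grp == 1);
figure, hold on
plot3(scores(pos,1),  scores(pos,2),  scores(pos,3),  'rx', 'LineWidth', 1);
plot3(scores(~pos,1), scores(~pos,2), scores(~pos,3), 'bo', 'LineWidth', 1);
xlabel('PC1'), ylabel('PC2'), zlabel('PC3')
view(85,25), grid on
```

With 125941 observations, two plot3 calls rather than one per point should draw considerably faster, though the resulting picture is intended to be the same.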

Accepted Answer

David Goodmanson on 17 Dec 2021
Hi Christopher,
I am not making any comment on the svd procedure, but
f = find(all(obs==0))
f =
79 80 81 82 83 84 85 86 87 88 89 90 91
says that the indicated columns consist of all zeros. For those columns, 'normalize' is trying to scale a standard deviation of 0 to a standard deviation of 1, so it gives up and puts NaNs.
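As a rough sketch of one way to act on this, the constant columns can be detected from their standard deviation and left out before normalizing. The small matrix below is a hypothetical stand-in for obs, and dropping the columns (rather than, say, filling them with another value) is an assumption about what is appropriate for the data.

```matlab
% Hypothetical stand-in for obs: 6 observations, 4 features,
% with column 3 all zeros to mimic the dead channels.
obs = [1 2 0 4; 2 1 0 3; 3 5 0 1; 4 3 0 6; 5 8 0 2; 6 4 0 5];

% z-scoring a constant column is 0/0, so that column becomes all NaN
zAll    = normalize(obs);
nanCols = find(all(isnan(zAll), 1));   % columns that turned into NaN

% Keep only columns with non-zero spread, then normalize those
colStd = std(obs, 0, 1);               % per-column standard deviation
keep   = colStd > 0;                   % logical mask of non-constant columns
obsN   = normalize(obs(:, keep));

[U, S, V] = svd(obsN, 'econ');         % input is now free of NaN
```

A constant column carries no variance, so removing it should not change which directions the remaining features vary along.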
  1 Comment
Christopher McCausland on 17 Dec 2021
Hi David,
Thank you for taking a look!! You were absolutely correct: I have 13 channels, and one of my features was returning all 0's on these channels (I'll need to look into why tomorrow). When normalised, the all-zero columns turn into NaNs. I replaced these 0's with 1's and get a result. (So at least I know this is what is causing the issue, a massive thank you!!)
The strange part, though: svd(x) on 2020b (which is what I was using at the start of the week) will throw an error when given an array with any NaN values. 2021b (which I've been using the last few days) doesn't appear to. I think this is a legitimate bug, as the error-handling behaviour should match in both releases.
Kind regards,
Christopher
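Since the two releases reportedly treat NaN input differently, one option is to check the input explicitly before calling svd, so the failure looks the same regardless of release. This is only a sketch: the matrix X and the error identifier pcaCheck:nonFinite are made up for illustration.

```matlab
X = [1 2; NaN 4; 5 6];   % hypothetical input containing a NaN

caught = false;
try
    % Fail early and loudly if any entry is NaN or Inf
    if ~all(isfinite(X(:)))
        error('pcaCheck:nonFinite', ...
              'Input contains %d NaN or Inf values.', nnz(~isfinite(X)));
    end
    [U, S, V] = svd(X, 'econ');   % only reached for finite input
catch err
    caught = strcmp(err.identifier, 'pcaCheck:nonFinite');
    disp(err.message)
end
```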

Release

R2021b
