Clustering 3D data based on Euclidean distance turns out to be insufficient
Hi,
I have 3 variables that seem to cluster very well (the color code is the amplitude of an oscillation):
vars = [acc, fb.fullscore(:,1), fc.fullscore(:,1)];
So I tried to run a hierarchical clustering:
figure(1);
clf;
Z = linkage(vars, 'ward', 'euclidean');
cutvector = Z(~isnan(Z(:,3)),3);
cutoff = median(cutvector(end-10:end,1)); %define the cutoff
dendrogram(Z, 1000, 'ColorThreshold',cutoff); %hierarchical clustering
T1 = cluster(Z, 'cutoff', cutoff, 'Criterion', 'distance'); %define clusters
set(gca,'xticklabel',[])
title('Both Hemispheres');
So then, when I plot the scatter plot again, now using the cluster ID as the color code, I get something like this:
As you can see, the colors are not grouped into the expected clusters; they look more like bands along the beta dimension. The clustering looks good in one projection of the plot, but not in all of them:
Do you have any idea how I can improve my clustering? I have tried linkage with the centroid and median methods, with similar results.
Thank you very much!
Sebastian
Accepted Answer
Aditya
on 26 Feb 2024
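Hierarchical clustering builds a hierarchy of clusters from a chosen distance metric and linkage criterion, so the result depends heavily on both. With Euclidean distance, a variable whose numeric range is much larger than the others dominates the distance, which can produce clusters that look like bands along one axis instead of the groups you see by eye; standardizing the variables first is a common remedy. Also keep in mind that a 2D projection of 3D data may not show the clusters clearly, so a partition can look right from one viewing angle and wrong from another.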
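To illustrate how variable scale affects Euclidean distance, here is a minimal sketch on made-up numbers (the matrix X is hypothetical and has nothing to do with your data). It compares the distance between the same pairs of rows before and after standardizing each column:

```matlab
% Hypothetical toy data: column 1 is in the thousands, column 2 in single digits
X = [1000 1.0;
     1010 5.0;
     2000 1.2;
     2010 5.1];

% Raw Euclidean distances from row 1 to rows 2 and 3
dRaw12 = norm(X(1,:) - X(2,:));   % rows 1 and 2 differ mostly in column 2
dRaw13 = norm(X(1,:) - X(3,:));   % rows 1 and 3 differ mostly in column 1

% Standardize each column to zero mean and unit standard deviation
% (this is what zscore(X) computes)
Xz = (X - mean(X)) ./ std(X);

% The same two distances on the standardized data
dStd12 = norm(Xz(1,:) - Xz(2,:));
dStd13 = norm(Xz(1,:) - Xz(3,:));

fprintf('raw:          d12 = %.2f, d13 = %.2f\n', dRaw12, dRaw13);
fprintf('standardized: d12 = %.2f, d13 = %.2f\n', dStd12, dStd13);
```

On the raw data the difference in column 2 barely registers next to column 1; after standardization both columns contribute on a comparable footing. Running std(vars) on your own matrix is a quick way to check whether this applies in your case.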
Here's an example of how you might implement some preprocessing and cluster validation in MATLAB:
% Standardize variables
vars_standardized = zscore(vars);
% Perform hierarchical clustering
Z = linkage(vars_standardized, 'ward', 'euclidean');
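% Determine the cutoff from the inconsistency coefficient (column 4 of the output)
inconsistency = inconsistent(Z);
cutoff = prctile(inconsistency(:,4), 75); % 75th percentile as an example
% Create dendrogram
figure(1);
clf;
% ColorThreshold expects a linkage distance, not an inconsistency value,
% so use the default coloring (70% of the maximum linkage distance)
dendrogram(Z, 1000, 'ColorThreshold', 'default');
title('Both Hemispheres');
% Cluster assignment; the criterion must match how the cutoff was derived
T1 = cluster(Z, 'cutoff', cutoff, 'Criterion', 'inconsistent');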
% Silhouette analysis
figure(2);
silhouette(vars_standardized, T1, 'Euclidean');
title('Silhouette Analysis');
% Multi-Dimensional Scaling for visualization
distMatrix = pdist(vars_standardized);
[Y, stress] = mdscale(distMatrix, 2);
figure(3);
gscatter(Y(:,1), Y(:,2), T1);
title('MDS Plot of Clusters');
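If you would rather choose the number of clusters directly than tune a cutoff, one option is to cut the same tree at several cluster counts and compare the mean silhouette value. The following is only a sketch on synthetic data: the three blobs, their centers, and the candidate range kList are assumptions made for illustration, so substitute your own standardized matrix.

```matlab
% Hypothetical toy data: three well-separated blobs on a small 3-D lattice
[dx, dy, dz] = ndgrid(-1:1);
offsets = 0.5 * [dx(:), dy(:), dz(:)];          % 27 points per blob
centers = [0 0 0; 8 8 0; -8 4 9];               % assumed blob centers
X = [offsets + centers(1,:);
     offsets + centers(2,:);
     offsets + centers(3,:)];

Xz = zscore(X);                                  % standardize each column
Z  = linkage(Xz, 'ward', 'euclidean');           % build the tree once

kList   = 2:5;                                   % candidate cluster counts (assumed range)
meanSil = zeros(size(kList));
for i = 1:numel(kList)
    T = cluster(Z, 'maxclust', kList(i));        % cut the tree into at most k clusters
    s = silhouette(Xz, T, 'Euclidean');          % with an output argument, no plot is drawn
    meanSil(i) = mean(s);
end

% Pick the cluster count with the highest mean silhouette
[~, best] = max(meanSil);
bestK = kList(best);
fprintf('Mean silhouette per k: %s\n', mat2str(meanSil, 3));
fprintf('Best k by mean silhouette: %d\n', bestK);
```

The mean silhouette is only one validity index. On real data the maximum is often less clear-cut than on this toy example, so it is worth looking at the full silhouette plot for the chosen k as well.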
Remember that clustering is exploratory in nature, and there is no one-size-fits-all approach. It's often a good idea to combine domain knowledge with various clustering techniques to find the most meaningful groupings for your data.
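When comparing linkage methods, the cophenetic correlation coefficient gives a rough indication of how faithfully each tree preserves the original pairwise distances. The sketch below uses synthetic data (the blobs and the list of methods are assumptions for illustration, not a recommendation for your data):

```matlab
% Hypothetical toy data: three blobs on a small 3-D lattice
[dx, dy, dz] = ndgrid(-1:1);
offsets = 0.5 * [dx(:), dy(:), dz(:)];
centers = [0 0 0; 8 8 0; -8 4 9];               % assumed blob centers
X  = [offsets + centers(1,:);
      offsets + centers(2,:);
      offsets + centers(3,:)];
Xz = zscore(X);                                  % standardize each column

D = pdist(Xz);                                   % pairwise Euclidean distances
linkMethods = {'ward', 'average', 'complete', 'single'};
c = zeros(size(linkMethods));
for i = 1:numel(linkMethods)
    Z = linkage(Xz, linkMethods{i}, 'euclidean');
    c(i) = cophenet(Z, D);                       % cophenetic correlation coefficient
    fprintf('%-8s cophenetic correlation = %.3f\n', linkMethods{i}, c(i));
end
```

A value closer to 1 means the tree distorts the distances less, but it does not by itself guarantee that the resulting clusters are the ones you expect, so treat it as one diagnostic alongside the silhouette values and your domain knowledge.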