GMM clustering of a square matrix

Alexander Dreier on 22 Feb 2024
Answered: Aman on 15 Mar 2024
Dear All,
my code is not working the way I want. Clustering with a GMM is not as easy as I thought. Perhaps you can help.
I have a 100x100 matrix and want to cluster the data points it represents. I want to use the soft GMM algorithm for this, but I don't know in advance how many clusters there will be. What I can say is that there is a certain relationship between the two factors (the x and y dimensions of the 2D matrix). The entries of the matrix are not binary, and the input matrix is square. The result should be the number of clusters and which elements belong to each cluster, and I would like to see this result graphically. Furthermore, I would like the code to try every candidate value of k (i.e. the number of clusters) once and then give me the best result, that is, the one where the number of clusters is optimal.
Ideally, MATLAB marks all data points that belong together with a coloured circle in the final plot.
My code looks like this:
% Create a 100x100 matrix of random numbers
matrix_size = 100;
data_matrix = rand(matrix_size);
% Try different numbers of clusters
min_clusters = 1;
max_clusters = 20;
AIC = zeros(1, max_clusters);
BIC = zeros(1, max_clusters);
gmds = cell(1, max_clusters);
options = statset('MaxIter', 100); % Maximum number of iterations for the clustering
% New
for k = min_clusters:max_clusters
    gmm = fitgmdist(data_matrix(:), k, 'Options', options);
    gmds{k - min_clusters + 1} = gmm; % Changed index for gmds
    AIC(k) = gmm.AIC;
    BIC(k) = gmm.BIC;
end
% Choose the number of clusters based on AIC or BIC
[~, num_clusters_AIC] = min(AIC);
[~, num_clusters_BIC] = min(BIC);
disp(['Number of clusters based on AIC: ', num2str(num_clusters_AIC)]);
disp(['Number of clusters based on BIC: ', num2str(num_clusters_BIC)]);
% Choose the number of clusters based on one of the criteria
num_clusters = num_clusters_AIC; % or num_clusters_BIC
% Use the GMM fitted with the selected number of clusters
gmm = gmds{num_clusters};
% Get the cluster assignments
cluster_idx = cluster(gmm, data_matrix(:));
disp('mean points are at:');
disp(gmm.mu)
disp('covariances are:');
disp(gmm.Sigma)
disp('Components Proportions are:');
disp(gmm.ComponentProportion)
%% plot the results
x1 = linspace(min(X(:,1))-2, max(X(:,1))+2, 500);
x2 = linspace(min(X(:,2))-2, max(X(:,2))+2, 500);
[x1grid,x2grid] = meshgrid(x1,x2);
X0 = [x1grid(:) x2grid(:)];
mahalDist = mahal(gmfit,X0);
figure;
h1=gscatter(X(:,1),X(:,2),clusterind);
hold on
plot(gmfit.mu(:,1),gmfit.mu(:,2),'kx','LineWidth',2,'MarkerSize',10)
threshold = sqrt(chi2inv(0.99,2));
for m = 1:k
idx = mahalDist(:,m)<=threshold;
Color = h1(m).Color;
plot(X0(idx,1),X0(idx,2),'.','Color',Color,'MarkerSize',1);
end
legend off;
title('GMM fitted')
  3 Comments
Alexander Dreier on 23 Feb 2024
Not sure, ChatGPT wrote the code for me.
Unfortunately, I am not familiar with coding at all. Basically, I want to cluster a 100x100 matrix with a GMM and get a clear picture as the outcome, in which I can see the clustered data points from the matrix.
Alexander Dreier on 23 Feb 2024
Can you help me by providing the code I need? That would be great!


Answers (1)

Aman on 15 Mar 2024
Hi Alexander,
As I understand it, you want to find the ideal number of clusters and then cluster your data using a GMM (Gaussian mixture model).
The code that you have shared uses the AIC and BIC criteria to find the ideal number of clusters, and then the Mahalanobis distance to measure how far each point is from each cluster center. The plotting part of the code is incorrect: it refers to variables (X, gmfit, clusterind) that are never defined, and it assumes that the data has only two features, whereas the data matrix has a hundred columns.
Since you want to derive the number of clusters graphically, it would be better to plot the AIC and BIC values against the number of clusters and look for the elbow in each curve to find the optimal number of clusters. You can refer to the code below, which does this.
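To make the shape mismatch concrete, here is a minimal sketch of what the fitting call and the plotting code each expect. The (row, column, value) layout at the end is only a hypothetical alternative for illustration, not something taken from the original code, and the variable names flat, rowIdx, colIdx and X are assumptions.

```matlab
% Sketch only: compare the data shapes that appear in the question.
rng(0);                               % fixed seed so the sketch is repeatable
data_matrix = rand(100);              % same kind of 100x100 random matrix

% data_matrix(:) stacks all columns into one long column vector, so
% fitgmdist sees 10000 observations of a single feature.
flat = data_matrix(:);
disp(size(flat));                     % 10000-by-1

% The plotting code indexes X(:,1) and X(:,2), i.e. it needs two features.
% Hypothetical alternative (an assumption): describe every entry by its
% row index, column index and value.
[rowIdx, colIdx] = ndgrid(1:size(data_matrix, 1), 1:size(data_matrix, 2));
X = [rowIdx(:), colIdx(:), data_matrix(:)];
disp(size(X));                        % 10000-by-3
```

Whether such a layout is appropriate depends on what the rows and columns of the real matrix mean, so treat it as one possible starting point.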
% Create a 100x100 matrix of random numbers
matrix_size = 100;
data_matrix = rand(matrix_size);
% Try different numbers of clusters
min_clusters = 1;
max_clusters = 20;
AIC = zeros(1, max_clusters);
BIC = zeros(1, max_clusters);
gmds = cell(1, max_clusters);
options = statset('MaxIter', 100); % Maximum number of iterations for the clustering
for k = min_clusters:max_clusters
    gmm = fitgmdist(data_matrix(:), k, 'Options', options); % data_matrix(:) passes all entries as one 10000x1 column
    gmds{k - min_clusters + 1} = gmm; % Changed index for gmds
    AIC(k) = gmm.AIC;
    BIC(k) = gmm.BIC;
end
Warning: Failed to converge in 100 iterations for gmdistribution with 2 components
... (the same warning is repeated for 3 through 19 components)
Warning: Failed to converge in 100 iterations for gmdistribution with 20 components
% Plotting the elbow curve for AIC
figure;
plot(min_clusters:max_clusters, AIC, '-o');
xlabel('Number of clusters (k)');
ylabel('AIC');
title('Elbow Curve using AIC');
% Optionally, also plot the elbow curve for BIC in a new figure
figure;
plot(min_clusters:max_clusters, BIC, '-o');
xlabel('Number of clusters (k)');
ylabel('BIC');
title('Elbow Curve using BIC');
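If you would rather have MATLAB pick the number of clusters and show which matrix entries fall into which cluster, the following is a rough sketch under a few assumptions. It selects the model with the lowest BIC instead of reading the elbow by eye, and the values for 'MaxIter', 'Replicates' and 'RegularizationValue', as well as the names best_k, best_gmm, hard_idx, post and label_map, are illustrative choices rather than recommendations. Raising the iteration limit and using several restarts is one common way to reduce the convergence warnings.

```matlab
% Sketch only: choose k by minimum BIC, then map the cluster assignments
% back onto the 100x100 grid. The random matrix stands in for real data.
rng(1);                                   % repeatable random numbers
data_matrix = rand(100);
x = data_matrix(:);                       % 10000 observations of one feature

max_clusters = 5;                         % kept small to keep the run short
BIC  = inf(1, max_clusters);
gmds = cell(1, max_clusters);

% Assumed settings: more iterations, a few restarts and a small
% regularization term to make the fits more robust.
options = statset('MaxIter', 1000);
for k = 1:max_clusters
    gmds{k} = fitgmdist(x, k, 'Options', options, ...
        'Replicates', 3, 'RegularizationValue', 1e-6);
    BIC(k) = gmds{k}.BIC;
end

[~, best_k] = min(BIC);                   % model with the lowest BIC
best_gmm = gmds{best_k};

hard_idx = cluster(best_gmm, x);          % most probable component per entry
post     = posterior(best_gmm, x);        % soft assignment, 10000-by-best_k

% Put the labels back into matrix shape so entry (i,j) keeps its position.
label_map = reshape(hard_idx, size(data_matrix));

figure;
imagesc(label_map);                       % one colour per cluster label
axis image;
colorbar;
title(sprintf('GMM cluster label per matrix entry (k = %d)', best_k));
```

Each row of post holds the membership probabilities of one matrix entry, which is the "soft" part of the GMM. With purely random input no meaningful grouping should be expected, so this only demonstrates the mechanics.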
I hope this helps!

Release

R2023b
