kmeans() in MATLAB

Bahareh on 23 May 2011
I have 10 classes and would like to use the kmeans command in MATLAB to find the centroids and indices for the 10 classes. When I use [ind c]=kmeans(Data,10), where 'Data' holds all the data from the 10 classes, I'd like 'c' to return the centroid of each class in order, i.e. the first row of 'c' should be the centroid of my first class, and so on.
Can you please help me?
My code:
clear; clc;
% Training data, one file per class
load alif.dat; load ba.dat; load ayn.dat; load gayn.dat; load ha.dat;
load jim.dat; load kaf.dat; load lam.dat; load mim.dat; load sin.dat;
% Test data, one file per class
load alif_t.dat; load ba_t.dat; load ayn_t.dat; load gayn_t.dat; load ha_t.dat;
load jim_t.dat; load kaf_t.dat; load lam_t.dat; load mim_t.dat; load sin_t.dat;

alif1=alif; ba1=ba; ayn1=ayn; gayn1=gayn; ha1=ha;
kaf1=kaf; lam1=lam; mim1=mim; sin1=sin; jim1=jim;
Data=[alif1; ba1; ayn1; gayn1; ha1; jim1; kaf1; lam1; mim1; sin1];
[ind c]=kmeans(Data,10);
% [ind,idx] = sort(ind);
% Data = Data(idx,:);
% [val,reorder] = max(crosstab(ind,Data));
% % Reorder
% c = c(reorder,:);

Data_t{1}=alif_t; Data_t{2}=ba_t; Data_t{3}=ayn_t; Data_t{4}=gayn_t; Data_t{5}=ha_t;
Data_t{6}=jim_t; Data_t{7}=kaf_t; Data_t{8}=lam_t; Data_t{9}=mim_t; Data_t{10}=sin_t;

% Assign every test point to its nearest centroid
for k=1:10
    [rd,cd]=size(Data_t{k});
    for n=1:rd
        for m=1:size(c,1)   % loop over the centroids (rows of c)
            d(m)=norm(Data_t{k}(n,:)-c(m,:));
        end
        out_t{k}(n)=find(d==min(d),1);
    end
end

% Confusion matrix: CF(k,n) counts the points of class k assigned to centroid n
for k=1:10
    for n=1:10
        CF(k,n)=length(find(out_t{k}==n));
    end
end

% Error rate in percent
u=sum(CF); u=sum(u');
er=(u-trace(CF))/u;
er=er*100
But I don't know how to attach the loaded data. The dimension of the data is 475x12.
Error:
??? Error using ==> grp2idx at 20
Grouping variable must be a vector or a character array.
Error in ==> crosstab at 29
[g1,g2] = grp2idx(varargin{j});
Error in ==> Untitled2 at 11
[val,reorder] = max(crosstab(ind,Data));
Bahareh on 23 May 2011
But I want the ith row of 'c' to correspond to the ith cluster. In other words, I want 'ind' to look like: 1 1 1 1 2 2 2.....10 10
Oleg Komarov on 23 May 2011
Can you please format the code?

Answers (2)

Oleg Komarov on 23 May 2011
[ind c sumd] = kmeans(rand(100,10),3);
c = an n-by-m (here 3-by-10) matrix of centroids. The 3 (n) clusters are positioned in a 10 (m) dimensional Euclidean space, so each centroid has 10 coordinates.
sumd = a 3-by-1 vector of "the within-cluster sums of point-to-centroid distances"; element j belongs to cluster j. The cluster numbering itself is arbitrary.
ind = a 100-by-1 index vector indicating which cluster each point belongs to. If the fifth point (out of 100 in the example above) has the value 2, then it belongs to cluster number 2, which has centroid c(2,:).
The documentation is pretty clear. I recommend re-reading it if anything is still unclear and trying the examples it provides.
If you want to sort data according to ind, then:
[ind,idx] = sort(ind);
data = data(idx,:);
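To make the shapes above concrete, here is a minimal sketch. The random data and the names indSorted, order and dataSorted are only for illustration, and the last check assumes kmeans converged with its default squared Euclidean distance.

```matlab
% Illustrative data only: 100 points in 10 dimensions, 3 clusters.
data = rand(100,10);
[ind, c, sumd] = kmeans(data, 3);

size(c)      % 3-by-10: one centroid per row
size(sumd)   % 3-by-1: one within-cluster sum per cluster
size(ind)    % 100-by-1: cluster number of each point

% Sort the points so that members of the same cluster are contiguous.
[indSorted, order] = sort(ind);
dataSorted = data(order,:);

% With the default distance, the centroid of cluster j is the mean of
% the points assigned to it, so this difference should be close to zero.
j = 2;
max(abs(mean(data(ind == j,:), 1) - c(j,:)))
```

Sorting this way only groups the rows by cluster number; it does not change which number kmeans gave to which cluster.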
Bahareh on 23 May 2011
Thanks a lot. But is there any way to sort ind according to data?

Matt Tearle on 23 May 2011
As I understand your question, you have a predetermined group numbering and you'd like kmeans to adhere to that. kmeans uses some randomness, so there's no guarantee which cluster will be assigned which number. You have two options, then, that I can think of:
  1. provide initial centroid location estimates
  2. reorder the centroids at the end (i.e. map your groupings 1:10 to whatever order kmeans gives you)
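The first option could be sketched roughly as follows, assuming you already know which group each point belongs to. The data, the offsets and the name C0 are made up for this example; 'start' is the kmeans parameter that accepts a k-by-p matrix of initial centroids.

```matlab
% Illustrative data: three well-separated groups of 100 points each.
% The offsets are made up for this example.
X = [randn(100,2) + 4; ...
     randn(100,2) - 4; ...
     [randn(100,1) + 4, randn(100,1) - 4]];
g = [ones(100,1); 2*ones(100,1); 3*ones(100,1)];   % known group labels

% Initial centroid estimates: row j is the mean of the points labeled j.
C0 = zeros(3, size(X,2));
for j = 1:3
    C0(j,:) = mean(X(g == j,:), 1);
end

% Pass the estimates through 'start'; k must equal the number of rows of C0.
[idx, ctrs] = kmeans(X, 3, 'start', C0);
```

When the groups are well separated, cluster j usually stays near its starting centroid, so the rows of ctrs tend to come back in the order of C0. That is likely rather than guaranteed, which is why the second option is still worth checking.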
For the second option, you could do something like this:
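% Make some clustered data: three blocks of 100 points in 2-D.
% Only the first block survives in this post; the offsets of the other
% two blocks are assumed here for illustration.
X = [randn(100,2)+2*ones(100,2);...
    randn(100,2)-2*ones(100,2);...
    randn(100,2)+[2*ones(100,1),-2*ones(100,1)]];
% Make a grouping variable
g = ones(100,1);
g = [g;2*g;3*g];
% Use kmeans to find centroids
[idx,ctrs] = kmeans(X,3);
% Determine the ordering, compared to my grouping
[~,reorder] = max(crosstab(idx,g));
% Reorder
ctrs = ctrs(reorder,:)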
Bahareh on 23 May 2011
i did.
Matt Tearle on 23 May 2011
No, g is the grouping variable -- the group numbers you want assigned to the data -- not the data itself. My code makes 300 data points (in 2 dimensions). The first hundred are group 1, the second hundred are group 2, and the rest are group 3. The vector g holds those group labels (1, 2, 3). Doing kmeans puts the data (X) into three groups, but there's no guarantee that the first hundred points will be called "group 1" (and so on). So the line [~,reorder] = max(crosstab(idx,g)); determines the ordering that maps the groups that kmeans gives (which might be 2 1 3, stored in idx) to the groups that I want (1 2 3, stored in g). It does so by assuming that the points that I call "group 1" will mostly be assigned to one group (e.g. 2) by kmeans.
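For data stacked class by class, as in the question, the grouping vector could be built along these lines. This is only a sketch: the cell array classes stands in for the per-class matrices (alif, ba, ...), its sizes and values are made up, and newLabel is a helper name introduced here. It assumes each class ends up mostly in its own cluster; 'replicates' is used to make a poor local solution less likely.

```matlab
% Stand-ins for the per-class matrices; sizes and values are made up.
classes = {randn(40,2) + 6, randn(55,2) - 6, [randn(30,1) + 6, randn(30,1) - 6]};
k = numel(classes);

% Stack the data and build g: one class label per row of Data.
Data = vertcat(classes{:});
g = zeros(size(Data,1), 1);
last = 0;
for j = 1:k
    n = size(classes{j}, 1);
    g(last+1:last+n) = j;
    last = last + n;
end

[ind, c] = kmeans(Data, k, 'replicates', 5);

% Rows of the table are kmeans clusters, columns are the known classes.
[~, reorder] = max(crosstab(ind, g));
c = c(reorder,:);          % row j is now the centroid matched to class j

% Relabel ind the same way so it reads 1 1 ... 2 2 ... k k.
newLabel = zeros(k, 1);
newLabel(reorder) = 1:k;
ind = newLabel(ind);
```

If two classes are matched to the same kmeans cluster, reorder is no longer a permutation and newLabel will contain zeros, so it is worth checking that sort(reorder) equals 1:k before trusting the result.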
