How to use kmeans function on data stored by datastore function?

I'm trying to cluster big data using kmeans. I found some code that does something similar:
Mu = bsxfun(@times,ones(20,30),(1:20)'); % Gaussian mixture mean
rn30 = randn(30,30);
Sigma = rn30'*rn30; % Symmetric and positive-definite covariance
Mdl = gmdistribution(Mu,Sigma);
rng(1); % For reproducibility
X = random(Mdl,10000);
pool = parpool; % Invokes workers
stream = RandStream('mlfg6331_64'); % Random number stream
options = statset('UseParallel',1,'UseSubstreams',1,...
'Streams',stream);
tic; % Start stopwatch timer
[idx,C,sumd,D] = kmeans(X,20,'Options',options,'MaxIter',10000,...
'Display','final','Replicates',10);
toc % Terminate stopwatch timer
But as you can see, X there is an in-memory double matrix.
My problem is that I have a file named HIS.csv, and I used the datastore function to read it as follows:
ds = datastore('HIS_all.csv', 'DatastoreType', 'tabulartext','TreatAsMissing', 'NA');
When I tried
[idx,C,sumd,D] = kmeans(ds,20,'Options',options,'MaxIter',10000, 'Display','final','Replicates',10);
I got the following error:
Undefined function 'isnan' for input arguments of type 'matlab.io.datastore.TabularTextDatastore'.
Error in kmeans (line 158)
wasnan = any(isnan(X),2);
Any suggestions?

Answers (1)

Josh Meyer on 15 Jul 2017
Edited: Josh Meyer on 17 Jul 2017
A datastore is just a framework for reading the data in small chunks, so you can't call general-purpose functions such as kmeans directly on it. Instead, try converting the datastore into a tall array first:
T = tall(ds);
The kmeans function supports tall arrays, so once the data is in this form you can pass it to kmeans. Note that kmeans has some limitations when used on a tall array, so some of the name-value pairs you specified might not work. The limitations are outlined in the kmeans documentation.
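Putting the pieces together, here is a minimal sketch of the whole workflow rather than a definitive recipe. It assumes that every column of HIS_all.csv is numeric, because tall(ds) on a tabular text datastore gives a tall table and kmeans needs a numeric matrix. It also leaves out the parallel 'Options' structure from the question, on the assumption that the tall version of kmeans honors only part of it; check the documented limitations before adding it back. The MaxIter and Replicates values are placeholders, not recommendations.

```matlab
% Sketch only: assumes all columns in HIS_all.csv are numeric.
ds = datastore('HIS_all.csv', 'DatastoreType', 'tabulartext', ...
    'TreatAsMissing', 'NA');

T = tall(ds);   % tall table backed by the datastore; evaluation is deferred
X = T{:,:};     % extract the table contents as a tall numeric matrix

rng(1);         % For reproducibility

% Only name-value pairs assumed to be supported for tall input are used here.
[idx, C] = kmeans(X, 20, 'MaxIter', 100, 'Display', 'final', ...
    'Replicates', 1);

% Bring the results into memory; gather leaves in-memory inputs unchanged.
[idx, C] = gather(idx, C);
```

If only some columns are numeric, select them by name before the conversion, for example X = T{:, {'Var1','Var2'}}, where Var1 and Var2 are hypothetical column names.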
