How to use kmeans function on data stored by datastore function?
I'm trying to cluster big data using kmeans. I found some code that does something similar:
    Mu = bsxfun(@times,ones(20,30),(1:20)'); % Gaussian mixture mean
    rn30 = randn(30,30);
    Sigma = rn30'*rn30; % Symmetric and positive-definite covariance
    Mdl = gmdistribution(Mu,Sigma);
    rng(1); % For reproducibility
    X = random(Mdl,10000);
    pool = parpool;                      % Invokes workers
    stream = RandStream('mlfg6331_64');  % Random number stream
    options = statset('UseParallel',1,'UseSubstreams',1,...
        'Streams',stream);
    tic; % Start stopwatch timer
    [idx,C,sumd,D] = kmeans(X,20,'Options',options,'MaxIter',10000,...
        'Display','final','Replicates',10);
    toc % Terminate stopwatch timer
But as you can see, X is an in-memory double matrix.
My problem is that I have a file named HIS.csv, and I used the datastore function to store it as follows:
    ds = datastore('HIS_all.csv', 'DatastoreType', 'tabulartext','TreatAsMissing', 'NA');
When I tried
     [idx,C,sumd,D] = kmeans(ds,20,'Options',options,'MaxIter',10000, 'Display','final','Replicates',10);
I get the following error:
     Undefined function 'isnan' for input arguments of type 'matlab.io.datastore.TabularTextDatastore'.
     Error in kmeans (line 158)
     wasnan = any(isnan(X),2);
Any suggestions?
Answers (1)
  Josh Meyer on 15 Jul 2017
  Edited: Josh Meyer on 17 Jul 2017

A datastore is just a framework for reading small chunks of the data at a time, so you can't call general-purpose functions such as kmeans directly on the datastore object. Instead, try converting the datastore into a tall array first:
   T = tall(ds);
The kmeans function supports tall arrays, so once the data is in this format you can call it. Note that kmeans has some limitations when used on a tall array, so some of the name-value pairs you specified might not work. The limitations are outlined in the Extended Capabilities (Tall Arrays) section of the kmeans documentation.
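To make the workflow concrete, here is a minimal sketch of the whole pipeline. Several things in it are assumptions, not part of the original question: the demo CSV (a hypothetical stand-in for HIS_all.csv, written to tempdir), the variable names, the cluster count, and the reduced option set. Since tall(ds) on a tabular text datastore gives a tall table, the sketch converts it to a numeric tall matrix before calling kmeans. It also drops the fourth output D and the 'UseParallel', 'UseSubstreams', and 'Streams' options because, as far as I know, the tall version of kmeans does not support them; please check the documented limitations for your release.

```matlab
% Hypothetical demo data standing in for HIS_all.csv (all columns numeric).
rng(1);                                    % For reproducibility
csvFile = fullfile(tempdir,'kmeans_tall_demo.csv');   % hypothetical file name
A = [randn(200,3); randn(200,3) + 10];     % two well-separated groups
writetable(array2table(A,'VariableNames',{'x1','x2','x3'}),csvFile);

mapreducer(0);   % Optional: evaluate tall arrays in the local MATLAB session

% Same datastore call as in the question, pointed at the demo file.
ds = datastore(csvFile,'DatastoreType','tabulartext','TreatAsMissing','NA');

T = tall(ds);            % Tall table; nothing is read yet
X = table2array(T);      % kmeans needs a numeric matrix, not a table
X = rmmissing(X);        % Assumption: drop rows with missing values up front

% Reduced option set (assumed to be within the tall-array limitations).
[idx,C] = kmeans(X,2,'MaxIter',100,'Replicates',3,'Display','final');

idx = gather(idx);       % Bring the cluster indices into memory
C   = gather(C);         % Returns C unchanged if it is already in memory
```

After this, idx holds one cluster index per row that survived rmmissing, and C holds one centroid per row. For the real file you would replace csvFile with 'HIS_all.csv', select only the numeric variables before table2array if the table has mixed types, and set the number of clusters back to 20.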