Find the K most orthogonal vectors in a set of vectors

Question

Peter Cook on 19 May 2016

0
Link

Direct link to this question

https://ch.mathworks.com/matlabcentral/answers/285024-find-the-k-most-orthogonal-vectors-in-a-set-of-vectors

Commented: Peter Cook on 8 Jun 2016

Hello All,

The context of this particular search is a step in tuning a spectral clustering routine a la Ng et al 2002. The purpose of this search is to give a initialization point for k-means clustering in higher dimensional space. In particular, well separated data ought to sit in K tight clusters on the surface of a hypersphere, so the purpose of this search is to find the locations of these cluster centroids with which to initialize k-means clustering of the same data.

I have a data matrix "Y" with O(10^4) rows and O(10^2-10^3) columns (the columns of this matrix are the [transformed and normalized a couple times] K largest eigenvectors of the affinity matrix).

Ng et al suggest "Briefly, we let the first cluster centroid be a randomly chosen row of Y, and then repeatedly choose as the next centroid the row of Y that is closest to being 90 degrees from all the centroids already picked." I translated this mathspeak to mean I need to take a bunch of dot products and look for values close to zero. This was quoted as computationally cheap (perhaps it is for clustering fewer points into say O(10^1) clusters or perhaps they meant it in the sense that it requires fewer iterations of k-means clustering once initialized), but my CPU is dragging ass at it.

So far I've tried 2 approaches: Approach #1 - Compute everything then search

% "cheap" initialization of k-means
dotProductY = zeros(length(Y)); %preallocate to make the parser turn green
% compute dot product of every row with every other row first
for k = 1:length(Y)
  dotProductY(:,k) = sum(bsxfun(@times,Y(k,:),Y),2);
end
dotProductY(logical(eye(length(Y)))) = nan; %exclude dot product of row with self
centroidIdx = randi(length(Y)); %initialize on a random row of Y
dotProductY(centroidIdx,:) = nan; %dont pick the same row twice
dotProductY = abs(dotProductY); %use the absolute value because looking for closer to zero
for k = 2:K
  [~,im] = min(sum(dotProductY(:,centroidIdx),2)); %find next best centroid
  centroidIdx(k) = im; %reassign
  dotProductY(centroidIdx,:) = nan; %dont pick the same row twice
end

Approach #2 - Simultaneous computation and search

    %try a cheaper one?
    centroidIdx = randi(length(Y)); %initialize on a random row of Y
    for k = 1:K-1
        dotProductY(:,k) = sum(bsxfun(@times,Y(centroidIdx(k),:),Y),2); %compute inner product 
        dotProductY(centroidIdx,:) = nan; %dont pick the same row twice
        [~,im] = min(sum(abs(dotProductY),2)); %find next best centroid
        centroidIdx(k+1) = im; %reassign
    end

Neither of these approaches seems cheap to me. Anyone else take a stab at this before? Any suggestions?

0 Comments
Show -2 older commentsHide -2 older comments

Sign in to comment.

Sign in to answer this question.

Answer 1

Matt J on 19 May 2016

0
Link

Direct link to this answer

https://ch.mathworks.com/matlabcentral/answers/285024-find-the-k-most-orthogonal-vectors-in-a-set-of-vectors#answer_222786

Edited: Matt J on 19 May 2016

Open in MATLAB Online

Your computation of dotProductY could be more efficient. The most vectorized may of computing it, I believe is

dotProductY=abs(Y*Y.');

I expect that would have been the main bottleneck.

2 Comments
Show NoneHide None

Matt J on 19 May 2016

Edited: Matt J on 22 May 2016

Open in MATLAB Online

This part

for k = 2:K
    [~,im] = min(sum(dotProductY(:,centroidIdx),2)); %find next best centroid
    centroidIdx(k) = im; %reassign
    dotProductY(centroidIdx,:) = nan; %dont pick the same row twice
  end

also looks like it could be incrementalized as follows

im=randi(size(Y,1));
centroidIdx(1)=im;
temp=dotProductY(:,im);
   for k = 2:K
      [~,im] = min(temp); %find next best centroid
      centroidIdx(k) = im; %reassign
      temp=temp+dotProductY(:,im); %update temp
        temp(im) = inf; %dont pick the same row twice
    end

Peter Cook on 8 Jun 2016

Thanks for the help, I can't believe I had that boneheaded dotProductY computation in there. The algorithm runtime is still quite slow, but that, I am accepting, is to be expected for most clustering algorithms with this amount of data.

Sign in to comment.

Find the K most orthogonal vectors in a set of vectors

0 Comments
Show -2 older commentsHide -2 older comments

Accepted Answer

2 Comments
Show NoneHide None

More Answers (0)

See Also

Categories

Tags

Community Treasure Hunt

Find the K most orthogonal vectors in a set of vectors

0 Comments Show -2 older commentsHide -2 older comments

Accepted Answer

2 Comments Show NoneHide None

More Answers (0)

See Also

Categories

Tags

Community Treasure Hunt

0 Comments
Show -2 older commentsHide -2 older comments

2 Comments
Show NoneHide None