how to calculate cosine similarity on a codistributed array?

Question

Frank on 2 Jul 2012


https://ch.mathworks.com/matlabcentral/answers/42512-how-to-calculate-cosine-similarity-on-a-codistributed-array

I have to calculate the cosine similarity between the rows of an array. In serial execution this works with pdist, but it does not work with codistributed arrays on MDCS. In the parallel setup, 4 compute nodes are used and the (large) array is distributed row-wise over the 4 nodes. I wrote a naive function to calculate the cosine similarity, but it takes ages; even with a small array it takes too long.

This is the test I currently use. I generate a random array:

r = floor(rand(100, codistributor('1d', 1)))
q = cosineSimilarityNaive(r)

The code of the function:

function [res] = cosineSimilarityNaive(data)
% get the dimensions
[n_row n_col] = size(data);
% calculate the norm for each row
%
norm_r = sqrt(sum(abs(data).^2,2));
%
for i = 1:n_row
    % 
    for j = i:n_row
        %
        res(i,j) = dot(data(i,:), data(j,:)) / (norm_r(i) * norm_r(j));
        res(j,i) = res(i,j);
    end
end
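
For reference, the quantity the double loop above fills in can be written compactly. The symbols below (x_i for row i of data, S for the result, X-hat for the row-normalized array) are notation introduced here for illustration and do not appear in the original post; the unit diagonal assumes every row is nonzero.

```latex
S_{ij} = \frac{x_i \cdot x_j}{\lVert x_i \rVert_2 \, \lVert x_j \rVert_2},
\qquad S_{ij} = S_{ji}, \qquad S_{ii} = 1 .

% With \hat{x}_i = x_i / \lVert x_i \rVert_2 stacked as the rows of \hat{X}:
S = \hat{X} \hat{X}^{\mathsf{T}} .
```

The symmetry is why the inner loop only needs to run over j = i:n_row. The matrix form shows that all of the pairwise dot products amount to a single matrix product.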

Currently I have no idea how to make it run faster. Codistributed arrays on different nodes are necessary because the array is so large that it does not fit on 1 compute node. I did some testing with svd on a distributed array over 4 nodes, and that works fine. I think I am missing something in my code, but currently I have no clue. Any tips?


Answer 1

Jill Reese on 2 Jul 2012


https://ch.mathworks.com/matlabcentral/answers/42512-how-to-calculate-cosine-similarity-on-a-codistributed-array#answer_52333


It would be much more efficient to lump all of the multiplications together. Also, when you use for loops with codistributed arrays, you need to use drange (a for-drange loop) so that each worker operates only on the data it owns. I think rewriting your code a bit will speed things up:

spmd
   % Create the data.  Don't use floor because that will return all zeros.
   r = rand(100,codistributor1d(1));
end
% Find the norm of each row
norm_r = sqrt(sum(abs(r).^2,2));
% Get the dimensions of r (there is no variable named data in this script)
[n_row, n_col] = size(r);
% Scale each row by its norm first.  
% Use drange so that each worker operates only on the data it owns.
spmd
   for i=drange(1:n_row)
      r(i,:) = r(i,:)/norm_r(i);
   end
end
% Transpose the data so we can use matrix multiplication to 
% perform the dot products all at once.  A transpose is cheap and 
% incurs no communication.  Of course this is only useful if you have 
% enough memory to store another copy of the local part on each worker.
tr = transpose(r);
res = r*tr;
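
To check that the matrix-multiplication form gives the same numbers as the element-by-element loop from the question, a small serial comparison can help before running on the cluster. This is only a sketch under stated assumptions: it uses an ordinary (non-distributed) array, the 6-by-4 test size and the names x, x_hat, res_fast, res_loop and max_diff are made up for illustration, and bsxfun replaces the for-drange loop here purely because the data is local.

```matlab
% Serial sanity check on hypothetical test data; no parallel pool required.
rng(0);                               % fix the seed so the run is repeatable
x = rand(6, 4);                       % 6 rows, 4 columns of nonzero data

% Row norms, computed the same way as in the code above.
norm_x = sqrt(sum(abs(x).^2, 2));

% Scale every row by its norm in one call (assumption: local array).
x_hat = bsxfun(@rdivide, x, norm_x);

% All pairwise dot products at once.
res_fast = x_hat * x_hat.';

% Reference: the double loop from the question, with preallocation.
n_row = size(x, 1);
res_loop = zeros(n_row);
for i = 1:n_row
    for j = i:n_row
        res_loop(i, j) = dot(x(i, :), x(j, :)) / (norm_x(i) * norm_x(j));
        res_loop(j, i) = res_loop(i, j);
    end
end

% The two results should agree up to floating-point rounding.
max_diff = max(abs(res_fast(:) - res_loop(:)));
fprintf('largest difference: %g\n', max_diff);
```

If the largest difference is on the order of machine precision, the two formulations agree. Keep in mind that pdist with the 'cosine' option returns one minus the cosine similarity, so its output has to be converted before comparing it with either result.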
