Fast method for findings which rows of a matrix are contained in another

2 views (last 30 days)
I have two matrices, A and B. The rows of B are a subset of the rows of A, and I want to find the row indices of the rows in A that correpsond to rows in B. I can do this using the intersect or ismember functions, but these are both much too slow for my purposes. I need to perform this calculation hundreds of thousands of times.
A is typically varies from a matrix, and can go up to having several hundred rows (still with 8 columns). B also has 8 columns, and varies from 1 row up to the number of rows in A. Are there any other methods I can use to find the rows of A that are also in B?
Alternatively, I perform a set of operations to the matrix A to come up with the matrix B. Is there a way I can use this to my advantage to find the rows of the final matrix (which would be B) relative to the original matrix A.

Accepted Answer

Stephen23
Stephen23 on 27 Aug 2020
Edited: Stephen23 on 27 Aug 2020
>> A = [1,2,3,4;5,6,7,8;9,10,11,12]
A =
1 2 3 4
5 6 7 8
9 10 11 12
>> B = [5,6,7,8;9,10,11,12]
B =
5 6 7 8
9 10 11 12
Using permute and test for equality (for versions >=R2016b bsxfun is not required):
>> X = any(all(bsxfun(@eq,A,permute(B,[3,2,1])),2),3)
X =
0
1
1
>> find(X) % optional
ans =
2
3

More Answers (1)

Bruno Luong
Bruno Luong on 27 Aug 2020
Edited: Bruno Luong on 27 Aug 2020
Not know how much my code is faster than ISMEMBER (which IMO pretty fast) (EDIT: it's indeed > 3 time faster)
% Generate random A, B such that rows of B are subset of that of A
A = randi(10,1000,8);
B = randi(10,1000,8);
A = [A;B];
A = A(randperm(end),:);
tic
% iA contains rows in A that matches with B
[~,is] = sortrows([A;B]);
isB = is>size(A,1);
ismatch = ~isB(1:end-1)&isB(2:end);
iA = is(ismatch); % row index of A that match B
%iB = is([false;ismatch])-size(A,1); % matching index
%isequal(A(iA,:),B(iB,:)) % meaning isequal(A(iA,:),B(iB,:)) is TRUE
toc % Elapsed time is 0.000656 seconds.
Do you have something unchanged during the loop? Is there any special properties of A and B (sorted already)? If yes it might exist faster methods using those charracteristics.
  2 Comments
Alexander Holmes
Alexander Holmes on 27 Aug 2020
Edited: Alexander Holmes on 27 Aug 2020
Ismember probably is quite fast, it just wasn't fast enough for me the way I had written my code.
That said, I was being silly and could apply the same operations I used to get B to an array of indices without then having to use ismember at all. Thank you for your answer though!
Bruno Luong
Bruno Luong on 27 Aug 2020
Edited: Bruno Luong on 27 Aug 2020
My code is about 3.8 times faster than ismember for 2000/1000 rows
% Generate random A, B such that rows of B are subset of that of A
A = randi(10,1000,8);
B = randi(10,1000,8);
A = [A;B];
A = A(randperm(end),:);
ntries = 1000;
tic
for k=1:ntries
% iA contains rows in A that matches with B
[~,is] = sortrows([A;B]);
isB = is>size(A,1);
ismatch = ~isB(1:end-1)&isB(2:end);
iA = is(ismatch); % row index of A that match B
%iB = is([false;ismatch])-size(A,1); % matching index
%isequal(A(iA,:),B(iB,:)) % meaning isequal(A(iA,:),B(iB,:)) is TRUE
end
tsort=toc; % Elapsed time is 0.000656 seconds.
tic
for k=1:ntries
tf = ismember(A, B, 'rows');
iA = find(tf);
end
tismember=toc;
tic
for k=1:ntries
X = any(all(bsxfun(@eq,A,permute(B,[3,2,1])),2),3);
end
teq=toc;
fprintf('tsort = %f [s]\n', tsort);
fprintf('tismember = %f [s]\n', tismember);
fprintf('teq = %f [s]\n', teq);
% fprintf('tsort/tismember = %f\n', tsort/tismember);
% fprintf('tismember/tsort = %f\n', tismember/tsort);
Result for 2000/1000 rows
tsort = 0.252900 [s]
tismember = 0.934710 [s]
teq = 12.506212 [s]
Result of 12/6 rows
%A = randi(6,10,8);
%B = randi(6,10,8);
%A = [A;B];
%A = A(randperm(end),:);
tsort = 0.009770 [s]
tismember = 0.126905 [s]
teq = 0.006746 [s]

Sign in to comment.

Tags

Products

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!