Matlab find unique column-combinations in matrix and respective index
113 views (last 30 days)
Show older comments
I have a large matrix with with multiple rows and a limited (but larger than 1) number of columns containing values between 0 and 9 and would like to find an efficient way to identify unique row-wise combinations and their indices to then build sums (somehwat like a pivot logic). Here is an example of what I am trying to achieve:
a =
1 2 3
2 2 3
3 2 1
1 2 3
3 2 1
uniqueCombs =
1 2 3
2 2 3
3 2 1
numOccurrences =
2
1
2
indizies:
[1;4]
[2]
[3;5]
From matrix a, I want to first identify the unique combinations (row-wise), then count the number occurrences / identify the row-index of the respective combination.
I have achieved this through generating strings with num2str and strcat, but this method appears to be very slow. Along these thoughts I have tried to find a way to form a new unique number through concatenating the values horizontally, but Matlab does not seem to support this (e.g. from [1;2;3] build 123). Sums won't work because they would remove the possibility to identify unique combinations. Any suggestions on how to best achieve this? Thanks!
0 Comments
Accepted Answer
Guillaume
on 22 Mar 2017
More or less the same as Jan's, using accumarray instead of splitapply (I'm still old school!):
A = [ 1 2 3
2 2 3
3 2 1
1 2 3
3 2 1];
[B, ~, ib] = unique(A, 'rows');
numoccurences = accumarray(ib, 1);
indices = accumarray(ib, find(ib), [], @(rows){rows}); %the find(ib) simply generates (1:size(a,1))'
4 Comments
Guillaume
on 23 Mar 2017
Edited: Guillaume
on 23 Mar 2017
I suspect that accumarray will be faster as it is built-in compiled code whereas splitapply is m code, but I haven't conducted any test.
Note: for the indices,
indices = accumarray(ib, (1:numel(ib))', [], @(rows){rows});
is probably slightly faster, just not as concise.
Jan
on 23 Mar 2017
Edited: Jan
on 23 Mar 2017
@Guillaume: I compare this with cellfun: In older versions Matlab contained the C-sources for this Mex function. Here calling a function handle is very expensive, because the Matlab tier has to be called. Therefore the implicitely defined methods provided by strings are much faster: 'length', 'isclass' etc.
Then using a compiled Mex function is not a real benefit, because mexCallMATLAB has some overhead. This might concern accumarray also. I guess that your accumarray approach is faster than the loop, but I know that it looks very cryptic ;-)
But now I can leave the speculations and run a test: With
A = randi([1, 100], 1e5, 3); % Test data
my loop takes 14.75 seconds, your accumarray approach takes 0.44 seconds. The results differ in the order of the indices. So perhaps this is wanted:
[B, iB, iA] = unique(A, 'rows');
indices = accumarray(iA, (1:numel(iA)).', [], @(r){sort(r)});
The result is clear: @Benvaulter, please unaccept my answer and select Guillaume's, and of course use it also to save time and energy.
More Answers (1)
Jan
on 22 Mar 2017
Edited: Jan
on 23 Mar 2017
A = [ 1 2 3; ...
2 2 3; ...
3 2 1; ...
1 2 3; ...
3 2 1];
[B, iB, iA] = unique(A, 'rows');
G = unique(iA);
numOccurrences = splitapply(@sum, iA, G);
I cannot test a method to obtain the indices list as wanted. I assume this works with splitapply also. A simple loop approach at least:
n = length(G);
indices = cell(1, n);
for k = 1:n
indices{k} = find(iA == G(k));
end
See Also
Categories
Find more on Matrix Indexing in Help Center and File Exchange
Community Treasure Hunt
Find the treasures in MATLAB Central and discover how the community can help you!
Start Hunting!