Index to elements not listed in numeric index?

Some functions return lists of indices, such as unique and ismember. Let's say I want to index to every element that isn't listed:
A = [1 1 2 2 3 3];
[uA, idxuA] = unique(A); % uA = [1 2 3], idxuA = [1 3 5]
idxDuplicates = true(length(A),1);
idxDuplicates(idxuA) = false;
duplicatesInA = A(idxDuplicates);
But that doesn't seem very efficient, and it would be nice to do something like:
duplicatesInA = A(~idxuA);
I really have two questions for the MATLAB/coding experts:
(1) Is there an efficient and direct way to use '~' with a list of indices?
(2) Is it worth optimizing this, or should I just deal with the extra few lines of code?
  2 Comments
Rik on 25 Nov 2018
I don't really consider myself to be an expert, but I'll still add my thoughts on this:
  1. Not that I know of. If it were a logical vector this would indeed be the way to do it, but since linear indices are returned, this might be the only way.
  2. Longer code can actually be more optimal, and more readable. That being said, as long as you are aware where the bottlenecks of your code are, you are miles ahead of many users. Unless your function is doing this millions of times in a loop, I don't think it is worth the extra effort to optimize this particular issue.
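To illustrate the first point, here is a minimal sketch, reusing the question's variables, of why `A(~idxuA)` does not return the complement. Applied to numeric indices, `~` yields a logical array that is false for every nonzero entry, so nothing is selected. The names `notIdx` and `wrongResult` are made up for this example.

```matlab
A = [1 1 2 2 3 3];
[~, idxuA] = unique(A);      % linear indices of one occurrence of each value
notIdx = ~idxuA;             % logical array, false wherever idxuA is nonzero
wrongResult = A(notIdx);     % selects nothing, so the result is empty
```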


Accepted Answer

Andrew Landau on 25 Nov 2018
Edited: Andrew Landau on 25 Nov 2018
Thanks everyone. I was looking for the function Matt J suggested: setdiff. However, I did a little profiling to check speeds. Making a true array and setting the indexed elements to false is faster than setdiff by an order of magnitude. So, right you are, Rik. Longer code is more optimal in this case.
Here's the code I used if you want to test it:
% Set up some random data for testing
% ** the result was robust to changing N and K
N = 10000;
K = 500;
data = randn(N,1);
idx = randperm(N,K);
% if anyone has a better way to preallocate cell arrays please tell me!
P = 1000;
timing = cell(1,2);
timing = cellfun(@(c) zeros(P,1), timing, 'uni', 0);
for p = 1:P
    % Fastest by order of magnitude
    tic
    i1 = true(1,N); % define boolean array
    i1(idx) = false; % set all elements from index to false
    d11 = data(i1); % keep everything that wasn't in the index
    timing{1}(p) = toc;
    % Ten times slower
    tic
    i2 = setdiff(1:N,idx); % Get index of everything from 1:N not in idx
    d12 = data(i2); % setdiff(1:N,idx) as argument to data() had comparable timing
    timing{2}(p) = toc;
end
avgtime = cellfun(@mean, timing, 'uni', 1);
fprintf('Boolean array: %.2fµs -- Setdiff: %.2fµs -- Ratio: %.2f\n', avgtime(1)*1000000, avgtime(2)*1000000, avgtime(2)/avgtime(1));
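As an alternative to the manual tic/toc loop, the built-in `timeit` could be used; it calls a function handle repeatedly and reports a typical run time in seconds. The sketch below is only one way to set that up: the function names `compareComplementTimings` and `viaMask` are hypothetical, and no particular timings are claimed.

```matlab
function t = compareComplementTimings(N, K)
% Hypothetical helper: returns timeit estimates in seconds for
% [logical mask, setdiff] on N random values with K excluded indices.
data = randn(N, 1);
idx = randperm(N, K);
t = zeros(1, 2);
t(1) = timeit(@() viaMask(data, idx));          % true array, then set to false
t(2) = timeit(@() data(setdiff(1:N, idx)));     % setdiff on the index list
end

function out = viaMask(data, idx)
% Keep every element of data whose index is not in idx
keep = true(size(data));
keep(idx) = false;
out = data(keep);
end
```

Calling `t = compareComplementTimings(10000, 500)` would mirror the N and K used in the test above, and `t(2)/t(1)` gives the ratio.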

More Answers (2)

Matt J on 25 Nov 2018
Edited: Matt J on 25 Nov 2018
Your way is probably the most efficient, but an alternative with shorter syntax is,
duplicatesInA = A( setdiff(1:numel(A), idxuA) );
  1 Comment
Andrew Landau on 25 Nov 2018
Yeah, the boolean array is 10x faster. Thanks for your input though!
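If the aim is a one-liner that still indexes with a logical mask, `ismember` may be an option. This is a sketch rather than something suggested in the thread, and its speed relative to the other two approaches was not measured here. The variable `keep` is introduced only for illustration.

```matlab
A = [1 1 2 2 3 3];
[~, idxuA] = unique(A);                  % positions of one occurrence per value
keep = ~ismember(1:numel(A), idxuA);     % true at positions not listed in idxuA
duplicatesInA = A(keep);                 % the remaining occurrences
```

Here `~` is applied to the logical output of `ismember`, not to the index list itself, which is why the negation behaves as intended.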



Matt J on 25 Nov 2018
Edited: Matt J on 25 Nov 2018
Is it worth it to optimize this or should I just deal with the extra few lines of code?
There's never a reason to deal with extra lines of code if it's an operation that you do often. That's what M-file functions are for:
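function Ac = complement(A,idx)
% Return the elements of A whose indices are not listed in idx
Ic = true(numel(A),1);  % start by keeping every element
Ic(idx) = false;        % mark the listed indices for exclusion
Ac = A(Ic);             % logical indexing returns the remainder
end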
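Another possibility, not raised in the thread, is to copy the array and delete the listed elements by assigning `[]`. This avoids building a mask at all. The sketch below assumes a vector input and reuses the question's data; how it compares in speed was not tested here.

```matlab
A = [1 1 2 2 3 3];
idx = [1 3 5];        % indices to exclude, as returned by unique in the question
Ac = A;               % work on a copy so A itself is unchanged
Ac(idx) = [];         % delete the listed elements, keeping the rest in order
```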

Release

R2017a
