Can i vectorize my loop

1 view (last 30 days)
Nigel
Nigel on 27 Nov 2013
Commented: Nigel on 29 Nov 2013
I have an array of numbers that i wish to reduce based on whether the previous kept value is within a defined tolerance. I don't believe that the diff() function does what i need and therefore I think i will need something more bespoke. Currently I perform this task using the following for loop but this is very slow when processing large amounts of data.
MinInterval=0.1
KeptData(1,1)=OriginalData(1,1);
n=1;
for p=1:length(OriginalData)-1
difference=OriginalData(p+1,1)-KeptData(n,1);
if difference<=MinInterval
%do nothing
else
n=n+1;
KeptData(n,1)=OriginalData(p+1,1);
end
end
For example,
OriginalData=[1; 2; 3; 3.1; 3.2; 3.3; 3.4; 3.5; 3.6; 4; 5];
assuming a tolerance of 0.1 would become
KeptData=[1; 2; 3; 3.2; 3.4; 3.6; 4; 5]
  7 Comments
Nigel
Nigel on 29 Nov 2013
Sorry for the delay responding, I've been out of the office. The behavior I require is exactly that shown by the example. 3.1 is close enough to 3 to be ignored but 3.2 is has a difference of 0.2 to 3 and therefore i want to keep it. 3.3 is within 0.1 of 3.2 and is ignored but 3.4 has a difference of 0.2 and this rule needs to be applied throughout the dataset
Nigel
Nigel on 29 Nov 2013
I have followed your suggestion Alfonso. Great result. Processing reduced from 850 secs to 1.5. Thank you all

Sign in to comment.

Answers (1)

Jeremy Wurbs
Jeremy Wurbs on 27 Nov 2013
Very interesting question.
First off, running your solution as is I get:
[ 1.0000 2.0000 3.0000 3.1000 3.2000 3.4000 3.5000 3.6000 4.0000 5.0000]
It seems there are precision issues with MinInterval being 0.1 and your data being spaced "0.1" apart (which is impossible to represent in binary; you can see this by trying the statement (3*0.1==0.3) in the command window).
Anyway, adding a small wiggle factor to your MinInterval, I get the code:
% Method 1
tic
MinInterval=0.1+eps; % We will have issues with precision without eps
KeptData(1,1)=OriginalData(1,1);
n=1;
for p=1:length(OriginalData)-1
difference=OriginalData(p+1,1)-KeptData(n,1);
if difference<=MinInterval
%do nothing
else
n=n+1;
KeptData(n,1)=OriginalData(p+1,1);
end
end
t1 = toc;
A simple attempt to vectorize the above only using diff could look like:
% Method 2
tic
KeptData2 = OriginalData([true; diff(OriginalData)>MinInterval]);
t2 = toc;
But the above gives a different behavior around "stair-case" sections in your data (the [... 3.0; 3.1; 3.2; 3.3; ...] part). We can get more similar behavior to the original code with the following:
% Method 3
MinInterval = 0.1; % Shouldn't have any precision issues
tic
NewMinInterval = 2*MinInterval;
edges = min(OriginalData):NewMinInterval:max(OriginalData);
hc = histc(OriginalData,edges);
KeptData3 = edges(logical(hc));
t3 = toc;
Using the following:
KeptData'
KeptData2'
KeptData3
disp(['Using for loop: ' num2str(t1)])
disp(['Using diff alone: ' num2str(t2)])
disp(['Using histc + diff: ' num2str(t3)])
I get the results:
ans =
1.0000 2.0000 3.0000 3.2000 3.4000 3.6000 4.0000 5.0000
ans =
1 2 3 4 5
KeptData3 =
1.0000 2.0000 3.0000 3.2000 3.4000 3.6000 4.0000 5.0000
Using for loop: 0.017651
Using diff alone: 0.0047814
Using histc + diff: 0.0071429
Hopefully that helps. Let me know if that doesn't give the behavior you're looking for.
Cheers
  1 Comment
Nigel
Nigel on 29 Nov 2013
Thank you for your reply. Your code works very well for the example, but when I apply it to my dataset i get a significantly different set of data returned (my code returns 469966 datapoints and yours 34441). If i get chance i will have a look at the returned data to understand what is going on, however, i have preallocated the KeptData array as suggested by Alfonso and this has made an huge difference to the processing time. My loop originally ran for 850 secs and with preallocation it takes 1.5 secs so i will continue with this solution. Your code incidently was much faster at 0.03. Thank you for your help

Sign in to comment.

Products

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!