Is it possible to use arrayfun across rows?
Hi,
I currently have a for loop that works its way through a table with almost 20 million records. As expected, it is pretty slow, so I want to look into alternatives. Is there a way to use arrayfun, or another MATLAB function, across rows with high performance? The example below captures the issue of working across rows:
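A = table([1;1;1;2;2;2],[1;2;3;4;5;6]);
A.Var3 = zeros(height(A),1);   % preallocate the result column
A.Var3(1) = A.Var2(1);         % the first row always starts a new group
for i = 2:height(A)
    if A.Var1(i) == A.Var1(i-1)
        % same group as the previous row: multiply consecutive Var2 values
        A.Var3(i) = A.Var2(i) .* A.Var2(i-1);
    else
        % first row of a new group
        A.Var3(i) = A.Var2(i);
    end
end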
Any suggestions will be appreciated.
Kind regards,
William
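A commonly suggested first step, independent of vectorization, is to copy the table variables into plain numeric vectors before looping, because indexing into a table element by element inside a loop is generally slower than indexing into an ordinary array. The sketch below applies this to the example above; the names v1, v2 and v3 are illustrative, row 1 is set from Var2 (consistent with the else branch), and no particular speed-up is claimed.

```matlab
% Example data from the question
A = table([1;1;1;2;2;2],[1;2;3;4;5;6]);

% Copy the table variables into plain numeric vectors once
v1 = A.Var1;
v2 = A.Var2;
v3 = zeros(size(v2));   % preallocate the result
v3(1) = v2(1);          % the first row always starts a new group

for k = 2:numel(v1)
    if v1(k) == v1(k-1)
        v3(k) = v2(k) * v2(k-1);   % same group as the previous row
    else
        v3(k) = v2(k);             % first row of a new group
    end
end

A.Var3 = v3;   % write the result back to the table in a single assignment
```

The loop body is unchanged; only the data it indexes into is different.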
11 Comments
Rik
on 6 Oct 2020
arrayfun (and cellfun and structfun) will simply hide the loop. They will not speed up your code; they will typically make it slower because of the extra function-call overhead. If you want to speed this up, you need to either run the iterations in parallel with parfor or find vectorized operations. In your example you can use logical indexing to perform the multiplication all at once.
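To illustrate that arrayfun only hides the loop, here is a sketch of the question's computation written with arrayfun on plain vectors. The names sameAsPrev and rowValue are hypothetical, and multiplying by 1 instead of branching is only needed because an anonymous function cannot contain an if/else statement.

```matlab
% Example data from the question, as plain vectors
v1 = [1;1;1;2;2;2];
v2 = [1;2;3;4;5;6];
n  = numel(v1);

% true when row k continues the group of row k-1 (&& short-circuits at k = 1)
sameAsPrev = @(k) k > 1 && v1(k) == v1(k-1);

% Multiply by the previous Var2 when in the same group, otherwise by 1
rowValue = @(k) v2(k) * (sameAsPrev(k) * v2(max(k-1, 1)) + ~sameAsPrev(k));

% arrayfun still calls rowValue once per row: the loop is only hidden
v3 = arrayfun(rowValue, (1:n)');
```

The result matches the loop, but each row still costs one anonymous-function call, which is where the extra overhead comes from.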
William Ambrose
on 6 Oct 2020
Walter Roberson
on 6 Oct 2020
Michael Croucher
on 6 Oct 2020
Could you share your real example, please?
William Ambrose
on 6 Oct 2020
Rik
on 6 Oct 2020
For this example it isn't too difficult:
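A = table([1;1;1;2;2;2],[1;2;3;4;5;6]);
A.Var3 = zeros(height(A),1);
A.Var3(1) = A.Var2(1);   % the first row always starts a new group
B = A;                   % make a copy to compare

% Original loop
for n = 2:height(A)
    if A.Var1(n) == A.Var1(n-1)
        A.Var3(n) = A.Var2(n) .* A.Var2(n-1);
    else
        A.Var3(n) = A.Var2(n);
    end
end

% Vectorized: L marks rows in the same group as the row above
L = [false; B.Var1(2:end) == B.Var1(1:(end-1))];
ind = find(L);
B.Var3(ind) = B.Var2(ind) .* B.Var2(ind-1);   % rows continuing a group
B.Var3(~L) = B.Var2(~L);                      % first row of each group
clc, isequal(A,B)   % returns true (logical 1)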
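A variant of the same logical-indexing idea, sketched on plain vectors, replaces find with a vector of previous-row values shifted down by one. The names prevV2 and sameAsPrev are illustrative; the NaN padding is never read, because the first row is never marked as continuing a group.

```matlab
v1 = [1;1;1;2;2;2];
v2 = [1;2;3;4;5;6];

% Previous row's Var2; the first row has no predecessor, so pad with NaN
prevV2 = [NaN; v2(1:end-1)];

% Rows that continue the group of the row above (the first row never does)
sameAsPrev = [false; v1(2:end) == v1(1:end-1)];

v3 = v2;                                                % default: first row of a group
v3(sameAsPrev) = v2(sameAsPrev) .* prevV2(sameAsPrev);  % same group: multiply
```

Both versions do the same elementwise work; this one just trades the index vector for a shifted copy of Var2.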
William Ambrose
on 6 Oct 2020
Edited: William Ambrose
on 6 Oct 2020
Please use the editing tools to format your code as code.
I don't see how you could calculate the branches separately here. You might gain some performance by computing the runs of true and false in the comparison of each A.Var1 value with the previous one (A.Var1(2:end) == A.Var1(1:end-1)), but the extra overhead might not be worth it.
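Computing those runs can itself be vectorized. A minimal sketch on the question's data that finds where each run of true or false starts, how long it is, and which value it has (all names are illustrative):

```matlab
v1 = [1;1;1;2;2;2];

% true where a row has the same Var1 as the row above
sameAsPrev = [false; v1(2:end) == v1(1:end-1)];        % [0;1;1;0;1;1]

% A run starts at row 1 and wherever the logical value changes
isStart   = [true; sameAsPrev(2:end) ~= sameAsPrev(1:end-1)];
runStart  = find(isStart);                             % first row of each run
runLength = diff([runStart; numel(sameAsPrev) + 1]);   % number of rows in each run
runValue  = sameAsPrev(runStart);                      % true/false value of each run
```

For this data the runs start at rows 1, 2, 4 and 5 with lengths 1, 2, 1 and 2.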
William Ambrose
on 6 Oct 2020
Rik
on 6 Oct 2020
The longer the runs are, the more worthwhile computing them becomes. So if you have long stretches of true and/or long stretches of false, it might be worth looking into. I think the first branch can also be vectorized (e.g. with cumprod), although I haven't tried it yet.
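The thread does not show the updated formula, so this sketch assumes the recursive form Var3(i) = Var3(i-1) .* Var2(i) within a run of equal Var1 values, with Var3(i) = Var2(i) on the first row of each run. Under that assumption the result is a cumulative product per run, which splitapply can compute. The names runId and pieces are illustrative, and the data is the 9-row example used later in the thread.

```matlab
v1 = [1;1;1;1;1;2;2;2;3];
v2 = [1;2;3;4;5;6;7;8;500];

% Label each contiguous run of equal Var1 values 1, 2, 3, ...
runId = cumsum([true; v1(2:end) ~= v1(1:end-1)]);

% Cumulative product inside each run; splitapply returns one cell per run
pieces = splitapply(@(x) {cumprod(x)}, v2, runId);

% Runs are contiguous and numbered in order, so the rows line up again
v3 = cell2mat(pieces);
```

Wrapping cumprod(x) in a cell is what lets splitapply return a non-scalar result for each group.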
William Ambrose
on 6 Oct 2020
Answers (1)
Mohammad Sami
on 6 Oct 2020
Something like this will work.
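% assumes A.Var3 has been preallocated, as in the question
% i marks rows that belong to the same group as the row above
i = [false; A.Var1(1:end-1) == A.Var1(2:end)];
j = find(i);
A.Var3(i) = A.Var2(j) .* A.Var2(j-1);   % multiply by the previous row's Var2
A.Var3(~i) = A.Var2(~i);                % first row of each group keeps Var2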
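Before running this on the full 20-million-row table, one way to gain confidence is to compare the vectorized version with the original loop on random test data. A sketch on plain vectors; the data size, the probability used to generate group labels, and all names are illustrative.

```matlab
rng(0);                          % reproducible test data
n  = 1e4;
v1 = cumsum(rand(n,1) < 0.3);    % group labels with runs of varying length
v2 = randi(10, n, 1);

% Reference result from the original loop
ref = zeros(n,1);
ref(1) = v2(1);
for k = 2:n
    if v1(k) == v1(k-1)
        ref(k) = v2(k) * v2(k-1);
    else
        ref(k) = v2(k);
    end
end

% Vectorized version from the answer above, on plain vectors
same = [false; v1(1:end-1) == v1(2:end)];
idx  = find(same);
out  = zeros(n,1);
out(same)  = v2(idx) .* v2(idx-1);
out(~same) = v2(~same);

isequal(ref, out)
```

Because the test values are small integers, the products are exact and isequal can be used instead of a tolerance.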
5 Comments
William Ambrose
on 6 Oct 2020
Rik
on 6 Oct 2020
Mohammad Sami
on 6 Oct 2020
Edited: Mohammad Sami
on 6 Oct 2020
In that case you can use this
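A = table([1;1;1;1;1;2;2;2;3],[1;2;3;4;5;6;7;8;500]);
i = [true; A.Var1(1:end-1) ~= A.Var1(2:end)];   % true at the first row of each run
id = cumsum(i);                                 % run label for every row
A.Var3 = grouptransform(A.Var2,id,@cumprod);    % cumulative product within each run
The above works on contiguous runs of Var1, so it does not require Var1 to be a clean sequence: [1 1 1 2 2 2 4 4 4] works, and a value that reappears later (e.g. [1 1 2 2 1 1]) starts a new group, just as in the loop.
If equal values of Var1 are always adjacent (e.g. the table is sorted by Var1), you can shorten it as follows.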
A = table([1;1;1;1;1;2;2;2;3],[1;2;3;4;5;6;7;8;500]);
A = grouptransform(A,'Var1',@cumprod,"ReplaceValues",false);
% or explicitly specify which variable to transform if you have other variables
% A = grouptransform(A,'Var1',@cumprod,"Var2","ReplaceValues",false);
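The difference between labelling contiguous runs and grouping on the value of Var1 only matters when a value reappears after a different one. A small worked example with illustrative data:

```matlab
v1 = [1;1;2;2;1;1];

% Run labels: a new id wherever Var1 differs from the row above
isStart = [true; v1(1:end-1) ~= v1(2:end)];
runId   = cumsum(isStart);       % [1;1;2;2;3;3]

% Value groups: every row with the same Var1 value shares a group
valueGroup = findgroups(v1);     % [1;1;2;2;1;1]
```

Grouping on 'Var1' would therefore treat rows 1, 2, 5 and 6 as one group, whereas the loop in the question restarts at row 5.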
William Ambrose
on 8 Oct 2020
Mohammad Sami
on 8 Oct 2020
Hi William,
For the updated problem as stated, grouptransform with cumprod will work just as well.
My testing shows the result is identical to the expected result.
A =
  9×3 table
    Var1    Var2    fun_Var2
    ____    ____    ________
     1         1          1
     1         2          2
     1         3          6
     1         4         24
     1         5        120
     2         6          6
     2         7         42
     2         8        336
     3       500        500
Of course, if the formula changes, a for loop may generalize more easily.