Is it possible to use Arrayfun across rows

Hi,
I currently have a FOR LOOP which works its way through a table with almost 20 million records. It is as expected pretty slow, I want to look into alternatives and I wondered if there is a way to use for arrayfun - or another MATLAB function - across rows which will work with high performance. The example below captures the issue of working across rows:
A = table([1;1;1;2;2;2;],[1;2;3;4;5;6]);
A.Var3 = zeros(height(A),1)
A.Var3(1) = A.Var1(1)
for i = 2:height(A)
if A.Var1(i) == A.Var1(i-1)
A.Var3(i) = A.Var2(i) .* A.Var2(i-1);
else A.Var3(i) = A.Var2(i);
end
end
Any suggestions will be appreciated.
Kind regards,
William

11 Comments

arrayfun (and cellfun and structfun) will simply hide the loop. They will not speed up your code, but they will actually cause a slowdown due to the extra overhead. If you want to speed this up, you need to go multi-threaded with parfor or find vectorized operations. In your example you can use logical indexing to perform the multiplication all at once.
Hi Rik,
thank you for the input. I am not familiar with the parfor function, or but after a quick check I see one needs to the Parallel Computing Toolbox and if I have understood it correctly some setup.
Maybe the vectorized approach makes more sense to pursue. Given that Var1 in effect is the index, but in my real example I have about 10,000 of them I wondered if you could provide some more guidance to how to proceed since I implemented the FOR LOOP as I didn't know of alternative approaches, and still cannot see it.
thanks and kind regards
william
Is it possible to share your real example somehow please?
Thank you Walter I will take look :-)
To Michael, this is the code:
  • vcode is a table
  • vcode in the vcode table is the index
  • cumExcReturn is the result of multiple operations which is called the Frongello adjustment
for i = 2:height(vcode)
if vcode.vcode(i) == vcode.vcode(i-1)
vcode.cumExcReturn(i) = vcode.cumExcReturn(i-1) .* (2 + vcode.dtrusd_pre(i) + vcode.dtrusd_post(i)) /2 ...
+ vcode.excReturn(i) .* (0 + vcode.indx_pre(i-1) + vcode.indx_post(i-1))./2 ;
else vcode.cumExcReturn(i) = vcode.excReturn(i) ;
end
end
Hope it helps.
For this example it isn't too difficult:
A = table([1;1;1;2;2;2;],[1;2;3;4;5;6]);
A.Var3 = zeros(height(A),1);
A.Var3(1) = A.Var1(1);
B=A;%make a copy to compare
for n = 2:height(A)
if A.Var1(n) == A.Var1(n-1)
A.Var3(n) = A.Var2(n) .* A.Var2(n-1);
else
A.Var3(n) = A.Var2(n);
end
end
L = [false;B.Var1(2:end)==B.Var1(1:(end-1))];
ind = find(L);
B.Var3(ind) = B.Var2(ind) .* B.Var2(ind-1);
B.Var3(~L) = B.Var2(~L);
clc,isequal(A,B)
I set the original example up incorrectly. Please see the corrected version.
A = table([1;1;1;1;1;2;2;2;3],[1;2;3;4;5;6;7;8;500]);
A.Var3 = zeros(height(A),1);
A.Var3(1) = A.Var1(1);
for n = 2:height(A)
if A.Var1(n) == A.Var1(n-1)
A.Var3(n) = A.Var3(n-1) .* A.Var2(n);
else
A.Var3(n) = A.Var2(n);
end
end
Rik
Rik on 6 Oct 2020
Edited: Rik on 6 Oct 2020
Please use the editing tools to format your code as code.
I don't see a way here how you could calculate the branches separately. You might have a performance increase by calculating the runs of true and false in A.Var1 == A.Var1, but the extra overhead might not be worth it.
thanks Rik, I just worked out how to paste the code and format. Pitty on the code part, looping through the data takes such a long time. The challenge is the dependency on the value in the previous row.
The longer the runs are, the more efficient calculating the runs will be. So if you have long stretches of true and/or long stretches of false it might be worth looking into. I think the first branch can also be vectorized (e.g. with cumprod), although I haven't tried yet.
You are right with respect to the true/false, I will definitly use
i = [false; A.Var1(1:end-1) == A.Var1(2:end)]
to get the logical array, I suspect it will make a big impact.

Sign in to comment.

Answers (1)

Something like this will work.
i = [false; A.Var1(1:end-1) == A.Var1(2:end)];
j = find(i);
A.Var3(i) = A.Var2(j) .* A.Var2(j-1);
A.Var3(~i) = A.Var2(~i);

5 Comments

Thank you very much Mohammed, this is very neat. I see though that you solved the original question when I had misstated the problem. I do however really appreciate this, and it was enligtning.
The challenge is that Var3(n) = Var3(n-1) * Var2(n) .
This will only work for the originally posted problem, not the corrected problem.
In that case you can use this
A = table([1;1;1;1;1;2;2;2;3],[1;2;3;4;5;6;7;8;500]);
i = [true; A.Var1(1:end-1) ~= A.Var1(2:end)];
id = cumsum(i);
A.Var3 = grouptransform(A.Var2,id,@cumprod);
The above is assuming that Var1 maynot be in sequence e.g. [1 1 1 2 2 2 4 4 4] e.t.c
If it is always in sequence you can shorten it as follows.
A = table([1;1;1;1;1;2;2;2;3],[1;2;3;4;5;6;7;8;500]);
A = grouptransform(A,'Var1',@cumprod,"ReplaceValues",false);
% or explicitly specify which variable to transform if you have other variables
% A = grouptransform(A,'Var1',@cumprod,"Var2","ReplaceValues",false);
Hi Mohammed,
I like the use of grouptransform but unfortunately the use of @cumprod as the function isn't correct. What I need is an iterative cascade through the rows where the result in the previous row - in another column - is input into the calculation. Almost like a FOR LOOP solution for the Fibonnaci sequence.
To be sure I will include the corrected version of the code, with some of your suggestions embedded in, and expected outcome:
B = table([1;1;1;1;1;2;2;2;3],[1;2;3;4;5;6;7;8;500]);
B.Var3 = zeros(height(B),1);
i = [false; B.Var1(1:end-1) == B.Var1(2:end)];
j = find(~i);
B.Var3(j) = B.Var2(j);
tic;
for n = 1:height(B)
if i(n) == 1
B.Var3(n) = B.Var3(n-1) .* B.Var2(n);
end
end ; toc
and the result
B =
9×3 table
Var1 Var2 Var3
____ ____ ____
1 1 1
1 2 2
1 3 6
1 4 24
1 5 120
2 6 6
2 7 42
2 8 336
3 500 500
I have appreciate all your suggestions so far so if you have input which could solve the above but without the FOR LOOP I would be very greatful for any input.
Hi William,
For the updated problem as stated, grouptransform with cumprod will work just as well.
My testing shows the result is identical to the expected result.
A =
9×3 table
Var1 Var2 fun_Var2
____ ____ ________
1 1 1
1 2 2
1 3 6
1 4 24
1 5 120
2 6 6
2 7 42
2 8 336
3 500 500
Ofcourse if the formula changes, for loop may be more generalizable.

Sign in to comment.

Categories

Products

Release

R2019b

Asked:

on 6 Oct 2020

Commented:

on 8 Oct 2020

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!