How to store and reuse coefficients in a for loop

5 views (last 30 days)
My Matlab code has a subroutine that repeatedly executes a double for loop. While testing large simulations, this subroutine is called over 10^6 times. I am curious if I can change/reuse part of the subroutine so that I can improve performance.
My original Matlab code (with test data) is shown below.
% Test Data
clc; clear all;
M = 30;
N = 30;
Nx = 32;
f_n_m = rand(Nx,N+1,M+1);
epsilon = 0.3;
delta = 0.4;
SumType = randi(3,1);
f = zeros(Nx,1);
coeff = zeros(N+1,M+1);
% Original Implementation
for j=1:Nx
% Can this double for loop be performed once instead of Nx times?
for r=0:N
for s=0:M
coeff(r+1,s+1) = f_n_m(j,r+1,s+1);
end
end
if(SumType==1)
% My code calls seperate subroutines which need the value of coeff.
% For testing purposes I have put f(j) = test data. These three if
% statements call external functions in my code.
f(j) = coeff(j)*epsilon*delta*N*M;
elseif(SumType==2)
f(j) = 2*coeff(j)*epsilon*delta*N*M;
elseif(SumType==3)
f(j) = 3*coeff(j)*epsilon*delta*N*M;
end
end
My first (bad) idea was to change the above code to the code below.
f2 = zeros(Nx,1);
coeff2 = zeros(N+1,M+1);
% Move the double for loop here
for j=1:Nx
for r=0:N
for s=0:M
coeff2(r+1,s+1) = f_n_m(j,r+1,s+1);
end
end
end
% Then run the original for loop
for j=1:Nx
if(SumType==1)
f2(j) = coeff2(j)*epsilon*delta*N*M;
elseif(SumType==2)
f2(j) = 2*coeff2(j)*epsilon*delta*N*M;
elseif(SumType==3)
f2(j) = 3*coeff2(j)*epsilon*delta*N*M;
end
end
% However, this gives different answers since f(j) runs against a single
% value of j while f2(j) computes all the values of j.
diff = norm(f-f2,inf) % large
I'm curious if there is a more efficient way of writing
for j=1:Nx
% start inner double for loop
for r=0:N
for s=0:M
coeff(r+1,s+1) = f_n_m(j,r+1,s+1);
end
end
% end inner double for loop
if(SumType==1)
f(j) = % An external function involving coeff, epsilon, delta, N, M
elseif(SumType==2)
f2(j) = % A different external function involving coeff, epsilon, delta, N, M
elseif(SumType==3)
f2(j) = % A third external function involving coeff, epsilon, delta, N, M
end
end
so that the inner double for loop is only calculated once instead of Nx times. Is this a limitation of the way the function is written?

Accepted Answer

per isakson
per isakson on 23 Jul 2021
Caveat: I don't fully understand your code and what I say might not be relevant to your real project.
"% Can this double for loop be performed once instead of Nx times?" The short answer is no, because coeff is 2D and f_n_m is 3D. Maybe, you can make coeff 3D. Depends on how you will use coeff.
"f(jj) = coeff(jj)*epsilon*delta*N*M;" What is coeff(jj) supposed to return? This is linear indexing, which returns a scalar.
Proposal: Replace the double for-loop by alt_coeff = f_n_m( :, :, jj );. Notice that I have modified the indexing of f_n_m so that jj is the third index. Run the function cssm1() with profile(). alt_coeff improves speed significantly.
cssm1
ans = "Happy end"
function out = cssm1
% Test Data
% clc; clear all;
M = 30;
N = 30;
Nx = 32;
f_n_m = rand(N+1,M+1,Nx);
epsilon = 0.3;
delta = 0.4;
SumType = randi(3,1);
f = zeros(Nx,1);
coeff = zeros(N+1,M+1);
% Original Implementation
for jj=1:Nx
% Can this double for loop be performed once instead of Nx times?
for r=0:N
for s=0:M
coeff(r+1,s+1) = f_n_m(r+1,s+1,jj);
end
end
alt_coeff = f_n_m( :, :, jj );
if(SumType==1)
% My code calls seperate subroutines which need the value of coeff.
% For testing purposes I have put f(j) = test data. These three if
% statements call external functions in my code.
f(jj) = coeff(jj)*epsilon*delta*N*M;
elseif(SumType==2)
f(jj) = 2*coeff(jj)*epsilon*delta*N*M;
elseif(SumType==3)
f(jj) = 3*coeff(jj)*epsilon*delta*N*M;
end
end
out = "Happy end";
end
  3 Comments
per isakson
per isakson on 24 Jul 2021
Edited: per isakson on 25 Jul 2021
"I don't see why it is necessary to change the indexing." May be, "better" is a more appropriate word than "necessary". I'll try to explain. I start with a little bit of background.
Matlab uses column-major order to store arrays. Since The Mathworks don't want to bother the ordinary user with "low level stuff", the documentation doesn't discuss the importance of order to performance (I fail to find it anyhow). For example, it is not mentioned in Techniques to Improve Performance. (It's crucial in communication with other languages (e.g. C) as described in MATLAB Data.)
Wikipedia provides a good description at Row- and column-major order. I pick an important statement: "[...] modern CPUs process sequential data more efficiently than nonsequential data". (The larger the array the more "sequential" matters.)
My conclusion regarding column-major and performance is: Choose the order of the dimensions of large arrays so that data will be processed sequentially (or even better in big chunks of contiguous data). Below is a really simple example of picking the elements of an array in the order they are stored in memory:
cssm3
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24
%
function cssm3
vec = 1:24;
array = reshape( vec, 2,3,4 );
for kk = 1:4
for jj = 1:3
for ii = 1:2
fprintf( '%3d', array( ii, jj, kk ) );
end
end
end
fprintf('\n')
end
.
Why alt_coef_orig = f_n_m_orig(jj,:,:); doesn't work. alt_coef_orig is a 3D array with a leading singleton dimension. It works if you index it as a 3D. Or you can remove the singleton dimension with the function squezze(); i.e squezze(alt_coef_orig)==alt_coeff. Problem is, squezze takes time. alt_coeff is 2D because Matlab automagically removes trailing singleton dimensions in no time. (@Simon Chan proposes that you changes the order of the dimensions.)
Some comments on your code
  • f_n_m = rand(Nx,N+1,M+1); The order of the dimensions looks arbitrary to me. Is there a good reason for this order?
  • rand() is not a good choice to make test data in an early stage of code development. That's because it makes it difficult to judge whether the code produces the expected result.
  • why create coeff at all? In your example it's only used to supply a scalar value to the calculation of f(jj). That scalar value should be possible to extract with proper indexing of f_n_m.
Matthew Kehoe
Matthew Kehoe on 24 Jul 2021
Edited: Matthew Kehoe on 25 Jul 2021
Thanks for the helpful information (and comments). To address the comments -
First Comment: All of my Matlab code creates dimensions of order (Nx,N+1,M+1). This is arbitary and has no meaning. I consistently create for loops where Nx (the largest value) is before N and M which are smaller. A local test
M = 20;
N = 20;
Nx = 32;
f_n_m = rand(N+1,M+1,Nx);
f_n_m_2 = rand(Nx,N+1,M+1);
ntests = 10000;
% Method 1
tic
for ii = 1:ntests
for j=1:Nx
j_n_m = f_n_m(:,:,j);
end
end
toc
% Method 2
tic
for ii = 1:ntests
for j=1:Nx
j_n_m_2 = f_n_m_2(j,:,:);
end
end
toc
% Method 1 is faster
shows that Method 1 is faster. I should review all of my code and update the double/triple for loops so that the loop goes through the column last (I think this is how column-major works. I will look further into this tomorrow as it is 1:00am here and I need to sleep).
Second Comment: I agree that rand()/randi() is inefficient. It isn't part of my production code and was only created for putting code in the forum. My real code is thousands of lines long (with multiple functions). I don't create data through rand().
Third Comment: There is no need to create coefficient. Writing
for j=1:Nx
if(SumType==1)
% This is what my production code calls. I didn't show what
% taylor_sum2, pade_sum2, pasde_sum2_safe do as that would be too much
% detail and would make the question "not self-contained."
f(j) = taylorsum_2_coeff(f_n_m(:,:,j),Eps,delta,N,M);
elseif(SumType==2)
f(j) = padesum2(f_n_m(:,:,j),Eps,delta,N,M);
else
f(j) = padesum2_safe(f_n_m(:,:,j),Eps,delta,N,M);
end
end
is faster. I will review (and test locally) changing (Nx,N+1,M+1) to (N+1,M+1,Nx) as my production code has a large amount of for loops that loop through something similar to
for j=1:Nx
for n=1:N
for m=1:M
% Do a bunch of stuff and evenually get to
f_n_m = (j,n+1,m+1);
end
end
end
or
for j=1:Nx
f_n_m = (j,:,:);
end
which may be inefficient as the index j should go in the third column. It is interesting to observe that in this setup
M = 20;
N = 20;
Nx = 32;
Gnm = rand(N+1,M+1,Nx);
Gnm_2 = rand(Nx,N+1,M+1);
ntests = 10000;
% Method 1
tic
for ii = 1:ntests
for n=0:N
for m=0:M
f_n_m = Gnm(n+1,m+1,:);
end
end
end
toc
% Method 2
tic
for ii = 1:ntests
for n=0:N
for m=0:M
f_n_m_2 = Gnm_2(:,n+1,m+1);
end
end
end
toc
% Method 2 is faster even though it is of size (Nx,N+1,M+1)

Sign in to comment.

More Answers (1)

Simon Chan
Simon Chan on 23 Jul 2021
Edited: Simon Chan on 23 Jul 2021
The loop of finding the coefficient can be entirely replaced by:
new_coeff = permute(f_n_m,[2 3 1]);
Noticed that size of new_coeff is 31x31x32 and new_coeff(:,:,1) is equivalent to your coeff when j = 1.

Categories

Find more on Parallel Computing Fundamentals in Help Center and File Exchange

Products


Release

R2020a

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!