Speeding up large array operations - vectorization?

11 views (last 30 days)
I am looking to significantly speed up the below code that involves operations on large arrays (specifically - the array called "proj"). The calculation of "proj" itself is fast enough compared to the rest of the code - see the "variables" section. However, operations with "proj" seem to be taking a lot of time.
There are the three versions of the code below, all in their own sections titled "original code, nested for loops", "making proj 1d before for loops", and "vectorization attempt, making proj 4d". All three methods seem to offer similar speeds (very slow). Although there are parts of the code that are not optimal, a time profile shows that the bottlenecks for all three versions of the code have to do with operations on "proj", which take up about >90% of the computation time (calculation of temp and result, lines 38-39, 61-62, 80-81).
I have also attempted to use parfor on some of the outer loops, with minimal improvement. In the future, I will be increasing the size of the variables D, A, B, C, and xyz, so time savings are critical here.
%% variables (this section is fast enough)
D = 5;
A = 360/D;
B = 0.2:0.5:5.2;
C = 100;
xyz = rand(500,3);
qx = cos(deg2rad(D).*(1:A));
qy = sin(deg2rad(D).*(1:A));
Rij = zeros(size(xyz,1),size(xyz,1),3);
for i1 = 1:size(Rij,3)
for i2 = 1:size(Rij,2)
for i3 = 1:size(Rij,1)
Rij(i3,i2,i1) = xyz(i2,i1)-xyz(i3,i1);
end
end
end
proj = zeros(size(xyz,1),size(xyz,1),A);
Ccount = zeros(A,C);
for i1 = 1:A
proj(:,:,i1) = Rij(:,:,1)*qx(i1) + Rij(:,:,2)*qy(i1);
[Ccount(i1,1:C),~,~] = histcounts(proj(:,:,i1),C);
end
%% original code, nested for loops (version 1)
tic;
result = zeros(A,length(B));
for i1 = 1:A
for i2 = 1:length(B)
temp = zeros(size(xyz,1),size(xyz,1));
for i3 = 1:C
temp = temp + (1./size(xyz,1)).*Ccount(i1,i3).*...
cos(B(i2).*proj(:,:,i1));
end
temp = sum(temp,[1 2],'omitnan');
result(i1,i2) = temp;
end
end
timer1 = toc;
disp(['time: ' num2str(round(timer1,1)) ' s']);
%% making proj 1D before for loops (version 2)
tic;
proj1d = proj(:);
result = zeros(A,length(B));
for i1 = 1:A
for i2 = 1:length(B)
temp = zeros(size(xyz,1)*size(xyz,1),1);
for i3 = 1:C
i4 = i1*size(xyz,1).^2;
i5 = i4 - size(xyz,1).^2 + 1;
temp = temp + (1./size(xyz,1)).*Ccount(i1,i3).*...
cos(B(i2).*proj1d(i5:i4));
end
temp = sum(temp,[1 2],'omitnan');
result(i1,i2) = temp;
end
end
timer1 = toc;
disp(['time: ' num2str(round(timer1,1)) ' s']);
%% vectorization attempt, making proj 4d (version 3)
tic;
proj4d = repmat(proj,1,1,1,length(B));
B4d = repmat((permute(B,[1 3 4 2])),size(xyz,1),size(xyz,1),A);
temp = zeros(A,length(B));
for i3 = 1:C
temp = temp + 1./size(xyz,1).*Ccount(:,i3).*...
squeeze(sum(cos(proj4d.*B4d),[1 2],'omitnan'));
end
result = temp;
timer1 = toc;
disp(['time: ' num2str(round(timer1,1)) ' s']);

Answers (1)

Adrián László Szemenyei
Edited: Adrián László Szemenyei on 27 Feb 2024
I have only checked version 1, but when you do loops, only calculate stuff inside the loop if it changes in the loop.
cos(B(i2).*proj(:,:,i1)) does not change in the i3 loop, this does an indexing and calling the cos function which is not cheap enough to dont care thus changing your code to
result = zeros(A,length(B));
onepersize=(1./size(xyz,1));
for i1 = 1:A
for i2 = 1:length(B)
temp = zeros(size(xyz,1),size(xyz,1));
tmp2=cos(B(i2).*proj(:,:,i1));
for i3 = 1:C
temp = temp + onepersize.*Ccount(i1,i3).*tmp2;
end
temp = sum(temp,[1 2],'omitnan');
result(i1,i2) = temp;
end
end
made the run time from 160 seconds to 6 seconds (in my computer).
the i3 loop can be easily vectorized and in the variable tmp2 indexing the variable proj can be moved to the loop i1
result = zeros(A,length(B));
onepersize=(1./size(xyz,1));
for i1 = 1:A
proj_tmp=proj(:,:,i1);
sum_tmp=sum(Ccount(i1,:),2);
for i2 = 1:length(B)
temp=onepersize*sum_tmp*cos(B(i2)*proj_tmp);
temp = sum(temp,[1 2],'omitnan');
result(i1,i2) = temp;
end
end
this changed the runtime from 6 seconds to 1.5 seconds.
now you can try to do the vectorization maybe for i2
  1 Comment
Adrián László Szemenyei
Checked the vectorization attempt, not calculating the squeeze(sum(cos(proj4d.*B4d),[1 2],'omitnan')) part inside the loop does the trick. You can also do something similar vectorization for the rest as I did in my second code.

Sign in to comment.

Categories

Find more on Loops and Conditional Statements in Help Center and File Exchange

Products


Release

R2023b

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!