Advice on Speeding up Loops

Paul Safier on 23 Sep 2021
Commented: Paul Safier on 23 Sep 2021
I need to run the code below with the parameter amnt equal to about 10^8. The loops are very expensive, so I have written the code to time execution as a function of amnt. The figure shows the execution time as a function of the matrix size (amnt), run with 12 workers (cores). Extrapolating with a regression suggests that amnt = 10^8 would take over a month to run.
In a nutshell, the purpose of this code is to bin 4 matrices and then find the most common bin for each.
I am asking:
  1. Can anyone recommend a way to vectorize the loop(s) or another way to speed it up?
  2. MATLAB flags the use of LocB inside the parfor loop (in the ismember call) as inefficient. I don't understand how to fix this. Any suggestions?
Thanks!
% Test problem for troubleshooting
rang = 100;
sz = 10^8;
% Some random matrices for this test case.
mat1t = randi(rang,[sz 1]) + 50*rand([sz 1]);
mat2t = randi(rang,[sz 1]) + 40*rand([sz 1]);
mat3t = randi(rang,[sz 1]) + 30*rand([sz 1]);
mat4t = randi(rang,[sz 1]) + 20*rand([sz 1]);
times = [0,0,0,0,0,0,0,0];
for tm1 = 1:length(times) % Loop just used for timing purposes.
    amnt = 5^tm1; % Ultimately I want this to be 10^8, i.e. the entire matrices.
    %amnt = numel(mat1(:)); % ALL
    mat1 = mat1t(1:amnt); mat2 = mat2t(1:amnt); mat3 = mat3t(1:amnt); mat4 = mat4t(1:amnt);
    NP = 50;
    dl = max([range(mat1),range(mat2),range(mat3),range(mat4)])/NP;
    tol = dl/2;
    low = min([min(mat1),min(mat2),min(mat3),min(mat4)]);
    high = max([max(mat1),max(mat2),max(mat3),max(mat4)]);
    gridp = [low:dl:high]';
    datap = [mat1 mat2 mat3 mat4];
    whatBin = zeros(size(datap));
    tic
    for k = 1:4 % Loop over the 4 matrices
        gridp1 = [gridp zeros(size(gridp))];
        datap1 = [datap(:,k) zeros(size(datap(:,k)))];
        tolp1 = [tol Inf];
        [LIA,LocB] = ismembertol(gridp1,datap1,1,'ByRows',true,'OutputAllIndices',true,'DataScale', tolp1);
        parfor kj = 1:length(mat1(:)) % Loop over all matrix elements
            for kkj = 1:numel(LIA) % Loop over all bins
                if ismember(kj,LocB{kkj}) % The use of LocB here is flagged by MATLAB as being inefficient. How to fix?
                    whatBin(kj,k) = kkj;
                    break
                end
            end
        end
    end % End of loop over all 4 matrices
    % Find the unique rows of the whatBin matrix
    [u,I,J] = unique(whatBin, 'rows', 'first');
    npix = size(u,1);
    paretoPix = zeros([npix 1]); % Number of pixel occurrences for a vector of metrics.
    parfor j = 1:npix
        paretoPix(j) = nnz(all(whatBin == u(j,:),2));
    end
    Results = [u paretoPix]; Results = sortrows(Results,-1*(size(whatBin,2)+1));
    times(tm1) = toc;
end % end of times loop
semilogx(5.^(1:length(times)),times,'-o')
xlabel('Matrix Elements')
ylabel('Time (s)')
In this figure, the matrix size is the parameter amnt. Fitting these values with a 2nd-order polynomial in the curve fitting app indicates that amnt = 10^8 would take roughly a month!
  4 Comments
Rik on 23 Sep 2021
I'm on mobile, so your image is not much help.
Can't you use the third output of histcounts? That would tell you which bin each element falls into.
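To illustrate the histcounts suggestion, here is a minimal sketch, not the original poster's final code. It assumes NP equal-width bins defined by an edges vector spanning the common data range (the question's code instead centers bins on grid points with half-width tol, so the bin boundaries differ slightly), and it uses a much smaller sz so it runs quickly. The name edges is introduced here only for illustration.

```matlab
% Sketch: bin each column with histcounts instead of ismembertol + nested loops.
rng(0);          % reproducible demo data
sz   = 1e5;      % assumption: small size for the demo, not the full 10^8
rang = 100;
datap = [randi(rang,[sz 1]) + 50*rand([sz 1]), ...
         randi(rang,[sz 1]) + 40*rand([sz 1]), ...
         randi(rang,[sz 1]) + 30*rand([sz 1]), ...
         randi(rang,[sz 1]) + 20*rand([sz 1])];

NP    = 50;
low   = min(datap(:));
high  = max(datap(:));
edges = linspace(low, high, NP+1);   % assumption: NP equal-width bins over the common range

whatBin = zeros(size(datap));
for k = 1:size(datap,2)
    % The third output of histcounts is the bin index of every element.
    [~, ~, whatBin(:,k)] = histcounts(datap(:,k), edges);
end
```

Since histcounts returns each element's bin index directly, there is no need to test every element against every bin's index list, which is where the nested loops in the question spend their time.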
Paul Safier on 23 Sep 2021
@Rik and @Matt J That really made a difference. Thanks for the suggestion!
The full matrix will get done in a reasonable amount of time (hours) now. Thanks.
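The counting step in the question (the parfor loop over unique rows) can likely be written without a loop as well. A minimal sketch, assuming a small hypothetical whatBin matrix purely for demonstration: the third output of unique maps every row to its unique row, and accumarray tallies those indices.

```matlab
% Hypothetical bin-index matrix (6 elements, 4 columns), for illustration only.
whatBin = [1 2 3 4;
           1 2 3 4;
           5 5 5 5;
           1 2 3 4;
           5 5 5 5;
           2 2 2 2];

% J(i) is the row of u that whatBin(i,:) matches.
[u, ~, J] = unique(whatBin, 'rows');

% Count how many rows of whatBin map to each unique row.
paretoPix = accumarray(J, 1);

% Most frequent bin combination first (sort descending on the count column).
Results = sortrows([u paretoPix], -(size(whatBin,2)+1));
```

This does a single pass over the index vector J rather than comparing the whole whatBin matrix against each unique row in turn.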

Release

R2021a
