Running MATLAB in parallel on a local machine, or another suggestion

I have a MATLAB code that reads two sets of text files, AAA_1.txt ... AAA_2000.txt and BBB_1.txt ... BBB_2000.txt. The files are very small (about 100 lines each), but there are thousands of them. The loop in the MATLAB code reads AAA_1.txt and BBB_1.txt and does some operations on them. Each step takes a couple of seconds, but due to the large number of files the total runtime is really large. I have a 9th-generation i7 CPU, 64 GB of memory, and a 6 GB graphics card. Is there any way I can use more of my memory and GPU to run the program faster? What would be the best way to speed it up?
  4 Comments
Voss on 28 Jan 2022
Agree with @Benjamin Thompson: Please share the code and files.
It's likely that your code can be rewritten to be more efficient. It probably shouldn't take 2 seconds to read a 100-line file; it should take more like 2 milliseconds.
Avik Mahata on 30 Jan 2022
Edited: Walter Roberson on 2 Mar 2022
Thanks for the reply. I really appreciate you taking the time to respond.
It doesn't really take 2 seconds; it's nearly instantaneous. I just meant to say it's really fast. I have no doubt it would be a lot faster in C/C++/Python, but a lot of other things related to this project are written in MATLAB and I use them daily, and efficiency is not a problem. I have time; taking a little longer is not an issue. Below is the code. When the number of files for S and T is 5 or 10 it is really fast, but when I increase the number of files into the hundreds it gets really slow. I have attached 10 text files.
clc
clear all
S = dir('F.*');
for k = 1:numel(S)
    S(k).name;
    T = dir('Nareplicate.*');
    for k = 1:numel(T)
        T(k).name;
        Nadump = dlmread(T(k).name, ' ', 9, 0);
        Fdump = dlmread(S(k).name, ' ', 9, 0);
        L1 = length(Nadump);
        L2 = length(Fdump);
        for i = 1:L1
            for j = 1:L2
                X(i) = sqrt((Fdump(j,3)-Nadump(i,3))^2 + (Fdump(j,4)-Nadump(i,4))^2 + (Fdump(j,5)-Nadump(i,5))^2);
                X(i) = X(i)/10;
                Y(j,i) = X(i);
            end
        end
        %S = zeros(L2, L1);
        %for j = 1:L2
        %S(j,:) = sort(Y(j,:));
        %end
        %S1 = S(:,1);
        Y1 = Y';
        Y2 = sort(Y1);
        S1 = sort((Y2(1,:))');
        % Find indices to elements in first column of A that satisfy the equality
        ind1 = S1(:,1) < .28;
        ind2 = S1(:,1) < .55;
        ind3 = S1(:,1) < .78;
        %ind4 = S(:,1) > .79;
        % Use the logical indices to index into A to return required sub-matrices
        A1 = S1(ind1,:);
        A2 = S1(ind2,:);
        A3 = S1(ind3,:);
        %A4 = S(ind4,:);
        Q1(k,:) = [length(A1), (length(A2)-length(A1)), length(A3)-length(A2), 125-length(A3)];
    end
end
W = sum(Q1)/(length(T));
W1 = W/125;
bar(diag(W1),'stacked', 'BarWidth', 1)
dlmwrite('Ion-Pair-Stat.txt',Q1,'delimiter','\t','precision',3)


Answers (2)

Voss on 30 Jan 2022
You've got two nested for loops there: the outer one loops over all the F.* files and the inner one loops over all the Nareplicate.* files, and both loops reuse the same loop variable k, so the inner loop shadows the outer index. With 5 files of each type, the inner loop body runs 25 times, performing the same set of operations on each pair of files 5 times over. With 1000 files your inner loop body would execute 1 million times (doing the same thing on each pair of files 1000 times), so fixing that should significantly speed things up, I would expect.
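To make the structure visible, here is a stripped-down sketch of just the loops (illustration only, with the file reading and math omitted):
for k = 1:5       % outer loop over the 5 F.* files
    for k = 1:5   % inner loop reuses k, shadowing the outer index
        % this body runs 5*5 = 25 times, processing the same
        % 5 file pairs (k,k) once per outer iteration
    end
end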
The current two-loop implementation with those 5 pairs of files takes ~1s:
clc
clear all
tic
S = dir('F.*');
for k = 1:numel(S)
    S(k).name;
    T = dir('Nareplicate.*');
    for k = 1:numel(T)
        T(k).name;
        Nadump = dlmread(T(k).name, ' ', 9, 0);
        Fdump = dlmread(S(k).name, ' ', 9, 0);
        L1 = length(Nadump);
        L2 = length(Fdump);
        for i = 1:L1
            for j = 1:L2
                X(i) = sqrt((Fdump(j,3)-Nadump(i,3))^2 + (Fdump(j,4)-Nadump(i,4))^2 + (Fdump(j,5)-Nadump(i,5))^2);
                X(i) = X(i)/10;
                Y(j,i) = X(i);
            end
        end
        %S = zeros(L2, L1);
        %for j = 1:L2
        %S(j,:) = sort(Y(j,:));
        %end
        %S1 = S(:,1);
        Y1 = Y';
        Y2 = sort(Y1);
        S1 = sort((Y2(1,:))');
        % Find indices to elements in first column of A that satisfy the equality
        ind1 = S1(:,1) < .28;
        ind2 = S1(:,1) < .55;
        ind3 = S1(:,1) < .78;
        %ind4 = S(:,1) > .79;
        % Use the logical indices to index into A to return required sub-matrices
        A1 = S1(ind1,:);
        A2 = S1(ind2,:);
        A3 = S1(ind3,:);
        %A4 = S(ind4,:);
        Q1(k,:) = [length(A1), (length(A2)-length(A1)), length(A3)-length(A2), 125-length(A3)];
    end
end
W = sum(Q1)/(length(T));
W1 = W/125;
toc
Elapsed time is 1.040420 seconds.
bar(diag(W1),'stacked', 'BarWidth', 1)
dlmwrite('Ion-Pair-Stat.txt',Q1,'delimiter','\t','precision',3)
Using one loop takes ~0.3s:
clc
clear all
tic
S = dir('F.*');
T = dir('Nareplicate.*');
for k = 1:numel(S)
    S(k).name;
    T(k).name;
    Nadump = dlmread(T(k).name, ' ', 9, 0);
    Fdump = dlmread(S(k).name, ' ', 9, 0);
    L1 = length(Nadump);
    L2 = length(Fdump);
    for i = 1:L1
        for j = 1:L2
            X(i) = sqrt((Fdump(j,3)-Nadump(i,3))^2 + (Fdump(j,4)-Nadump(i,4))^2 + (Fdump(j,5)-Nadump(i,5))^2);
            X(i) = X(i)/10;
            Y(j,i) = X(i);
        end
    end
    %S = zeros(L2, L1);
    %for j = 1:L2
    %S(j,:) = sort(Y(j,:));
    %end
    %S1 = S(:,1);
    Y1 = Y';
    Y2 = sort(Y1);
    S1 = sort((Y2(1,:))');
    % Find indices to elements in first column of A that satisfy the equality
    ind1 = S1(:,1) < .28;
    ind2 = S1(:,1) < .55;
    ind3 = S1(:,1) < .78;
    %ind4 = S(:,1) > .79;
    % Use the logical indices to index into A to return required sub-matrices
    A1 = S1(ind1,:);
    A2 = S1(ind2,:);
    A3 = S1(ind3,:);
    %A4 = S(ind4,:);
    Q1(k,:) = [length(A1), (length(A2)-length(A1)), length(A3)-length(A2), 125-length(A3)];
end
W = sum(Q1)/(length(T));
W1 = W/125;
toc
Elapsed time is 0.323742 seconds.
bar(diag(W1),'stacked', 'BarWidth', 1)
dlmwrite('Ion-Pair-Stat.txt',Q1,'delimiter','\t','precision',3)
And simplifying the computation cuts the time down again by almost half. (If anything is unclear about what I did here, you can put a breakpoint and inspect the variables to convince yourself that it's doing the same thing it used to do, and/or post a comment here and I'll explain it.)
clc
clear all
tic
S = dir('F.*');
T = dir('Nareplicate.*');
N = numel(S);
Q1 = zeros(N,4);                                % preallocate the results
for k = 1:N
    Nadump = dlmread(T(k).name, ' ', 9, 0).';   % transposed: columns are Na atoms
    Fdump = dlmread(S(k).name, ' ', 9, 0);
    L1 = size(Nadump,2);
    L2 = size(Fdump,1);
    Y = zeros(L2,L1,3);
    for m = [3 4 5]                             % the x, y, z coordinate columns
        Y(:,:,m-2) = Fdump(:,m)-Nadump(m,:);    % implicit expansion: all pairwise differences
    end
    S1 = min(sqrt(sum(Y.^2,3))/10,[],2);        % minimum Na distance for each F atom
    N1 = nnz(S1 < 0.28);                        % counting replaces the sort/index/length steps
    N2 = nnz(S1 < 0.55);
    N3 = nnz(S1 < 0.78);
    Q1(k,:) = [N1 N2-N1 N3-N2 L2-N3];
end
W1 = sum(Q1,1)/N/125;
toc
Elapsed time is 0.181081 seconds.
bar(diag(W1),'stacked', 'BarWidth', 1)
dlmwrite('Ion-Pair-Stat.txt',Q1,'delimiter','\t','precision',3)
If you take any or all of those changes, I bet you will see a significant improvement in the speed of your code when you run it on the real case (1000s of files).
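Since the question title asks about running in parallel: the per-file iterations are independent of each other, so if you have the Parallel Computing Toolbox you could also try a parfor over the files. This is only a sketch of the simplified version above (untested on your full data set, and with files this small the parallel overhead may outweigh the gain, so measure with tic/toc):
S = dir('F.*');
T = dir('Nareplicate.*');
N = numel(S);
Q1 = zeros(N,4);
parfor k = 1:N
    % each iteration reads only its own pair of files and writes only its
    % own row of Q1, so the parfor slicing rules are satisfied
    Nadump = dlmread(T(k).name, ' ', 9, 0).';
    Fdump = dlmread(S(k).name, ' ', 9, 0);
    L2 = size(Fdump,1);
    % all pairwise F-Na distances via implicit expansion, then the minimum per F atom
    D = sqrt((Fdump(:,3)-Nadump(3,:)).^2 + ...
             (Fdump(:,4)-Nadump(4,:)).^2 + ...
             (Fdump(:,5)-Nadump(5,:)).^2)/10;
    S1 = min(D,[],2);
    N1 = nnz(S1 < 0.28);
    N2 = nnz(S1 < 0.55);
    N3 = nnz(S1 < 0.78);
    Q1(k,:) = [N1 N2-N1 N3-N2 L2-N3];
end
W1 = sum(Q1,1)/N/125;
A gpuArray version is unlikely to help here, by the way: each file's matrices are tiny, so the run time is dominated by reading files from disk rather than by arithmetic.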
  1 Comment
Avik Mahata on 30 Jan 2022
@Benjamin Thanks a lot for taking the time to reply on a Sunday morning. It's really fast now. This really helps.



Walter Roberson on 28 Jan 2022
Are the files already stored on an SSD?
If not, are they split between two (or more) hard drives? Preferably on different controllers?
Generally speaking, peak performance for hard drives is typically two reading processes per drive, and one (sometimes two) drives per controller.
I have been testing some Samsung BAR+ USB 3.1 flash drives (make sure you get the 128 GB or later version; the smaller ones are slower). On a very new M1 MacBook, I am reading about 304 megabits/second from them; on my 2013 iMac through an external USB3 hub, I am reading about 386 megabits/second. Write speed is only on the order of 60 megabits/second, but the read speed is very nice.
A little over a year ago, I connected an external Thunderbolt drive bay to my 2013 iMac; with it and WD Red or HGST drives, I am able to write about 225 megabits/second and read about 285 megabits/second. Read speed is not as good as those new flash drives; on the other hand, I am running it through a Thunderbolt 2 <-> Thunderbolt 3 converter, and would likely get a significant performance improvement if I switched it to my newer iMac.
My Samsung EVO (SSD) drive in the same external enclosure gives me write speeds of about 490 megabits/second and read speeds of about 530 megabits/second.
... The point being that paying attention to what kind of drives you have and how they are connected can yield a significant performance improvement.
If you are using a USB 2 drive and you have a USB 3 controller, pick up a quality SSD or a quality flash drive. The Samsung BAR+ 128 GB drives cost me only about C$30 each.
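Before buying anything, you can check how much of the run time is file I/O with a quick measurement along these lines (a sketch using the same F.* naming as in the question):
S = dir('F.*');
tic
for k = 1:numel(S)
    Fdump = dlmread(S(k).name, ' ', 9, 0);  % read only, no computation
end
tRead = toc;
fprintf('Reading %d files took %.3f s (%.1f ms per file)\n', ...
    numel(S), tRead, 1000*tRead/numel(S));
If the per-file read time comes out in the low milliseconds, the disk is not your bottleneck and the loop restructuring in the other answer is where the time goes.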
