Using Parfor when calling thousands of .mat files

Hi,
I am creating a script that looks into a folder containing a series of folders and subfolders, each describing a test run. I need to grab the specific set of subfolders whose names contain an 'identifier', in this case '_15degrees'. I have achieved this. However, the script takes absurdly long to run through a for loop that directs MATLAB to the necessary files, loads the .mat files, and saves the necessary information so the results can be plotted against each other.
I am trying to set up a parfor loop so this can (presumably) run a lot faster; however, MATLAB returns an error saying the way I am using the variables Ly, Ld, and t is incompatible.
If direction can be provided to make the parfor work, or advice on optimising the way I am loading the .mat files, it would be greatly appreciated. Furthermore, advice on how to make sure the legend only reports once per folder would be fantastic. I chose to use a cell array for the legend as I believed that would be the correct way to go about it.
Happy to answer any questions.
close all, clc, clear all
flocation = "D:\Git Repo\FYP\Results\400x400 Chanaging Topography\";
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%% FILE LOCATION %%%%%
topLevelFolder = flocation;
dinfo = dir(flocation);
dirFlags = [dinfo.isdir];
folders = find(dirFlags > 0);
foldernames = fullfile({dinfo.folder}, {dinfo.name});
identifier = '_15degrees';
counter =0;
for folderidx = 3:length(folders)
    location = sprintf("%s%s",flocation,dinfo(folderidx).name);
    fileinfo = dir(location);
    filenames = fullfile({fileinfo.folder}, {fileinfo.name});
    Ly = cell(1,length(folders)-2); %dispersion
    Ld = cell(1,length(folders)-2); %length
    t = cell(1,length(folders)-2); %time
    lgd = cell(1,length(folders)-2); %legend
    for j = 3:length(filenames)
        if contains(filenames(j), identifier)
            counter = counter +1;
        end
    end
    for j = 1:(counter)
        Ly{1,j} = cell(1,counter);
        Ld{1,j} = cell(1,counter);
        t{1,j} = cell(1,counter);
        lgd{1,j} = cell(1,counter);
    end
    counter =0;
    for i = 1:length(filenames)
        if contains(filenames(i), identifier)
            counter = counter +1;
            folderdata = dir(char(filenames(i)));
            lgd{1,folderidx-2}{1,counter} = folderdata(i).name;
            Ly{1,folderidx-2}{1,counter} = cell(1,length(folderdata)-2);
            Ld{1,folderidx-2}{1,counter} = cell(1,length(folderdata)-2);
            t{1,folderidx-2}{1,counter} = cell(1,length(folderdata)-2);
            %% here is where when i make the following for loop a Parfor loop i get the error messages
            for fileidx = 3:length(folderdata) % LOADS ALL THE .MAT FILES IN THE FOLDER
                filedata = load(sprintf("%s\\%s",(folderdata(fileidx).folder),(folderdata(fileidx).name)));
                Ly{1,folderidx-2}{1,counter}{1,fileidx-2} = filedata.Ly;
                Ld{1,folderidx-2}{1,counter}{1,fileidx-2} = filedata.Ld;
                t{1,folderidx-2}{1,counter}{1,fileidx-2} = filedata.t;
            end
        end
    end
end
%% plotting
figure
plot(cell2mat(t),cell2mat(Ly));
title(sprintf('Listers Tranverse Dispersion on the %s topology',filedata.topography))
subtitle(sprintf('For an incline angle of %d degrees & an initial source fluid thickness of %1.0fm', filedata.angle,filedata.flow_thickness))
legend('Listers Transverse Dispersion', 'Lister Flow Length')
ylabel('Transverse Flow Dispersion (m)')
xlabel('Time (s)')
  2 Comments
Stephen23 on 25 Apr 2022
Edited: Stephen23 on 25 Apr 2022
Note that this line:
for folderidx = 3:length(folders)
is a buggy/fragile attempt to remove the dot-directory names. The robust approach is to use ISMEMBER or SETDIFF.
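A minimal sketch of that filtering step, assuming a listing returned by DIR. The folder names and the variables `root`, `keep`, and `names` are illustrative only. The dot entries are not guaranteed to be the first two elements, because names starting with characters that sort before '.' (such as '+' or '!') can precede them, so they are removed by name rather than by position:

```matlab
% Create a scratch folder with two subfolders so the sketch runs on its own.
root = tempname;
mkdir(root);
mkdir(fullfile(root, '+early'));         % hypothetical name that may sort before '.'
mkdir(fullfile(root, 'run_15degrees'));  % hypothetical test-run folder

dinfo = dir(root);
dinfo = dinfo([dinfo.isdir]);                  % keep directories only
keep  = ~ismember({dinfo.name}, {'.', '..'});  % drop the dot-directory entries by name
dinfo = dinfo(keep);

names = {dinfo.name};
disp(names)

rmdir(root, 's');   % clean up the scratch folder
```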
Your code would be simpler and more efficient if you
  • included the "identifier" in the DIR search string (and got rid of all of those CONTAINS+IFs).
  • used REPMAT instead of the loops for preallocation.
  • used FULLFILE consistently instead of SPRINTF/concatenation.
  • simplified the deeply nested cell arrays.
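As a rough illustration of the REPMAT and FULLFILE suggestions, here is a small sketch. The sizes `nFolders` and `nRuns` and the path parts are hypothetical placeholders, not values from the original script:

```matlab
nFolders = 3;   % hypothetical number of top-level folders
nRuns    = 4;   % hypothetical number of matching subfolders per folder

% One 1-by-nFolders cell array whose cells each hold an empty
% 1-by-nRuns cell array, built without any preallocation loop.
Ly = repmat({cell(1, nRuns)}, 1, nFolders);

% FULLFILE inserts the platform's file separator, so no manual "\\"
% or string concatenation is needed.
F = fullfile('Results', 'run_15degrees', 'step001.mat');
disp(F)
```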
Johnny Dessoulavy on 25 Apr 2022
OK! I will update my local code to reflect that. I didn't know about those functions. Thank you.

Accepted Answer

Stephen23 on 25 Apr 2022
Edited: Stephen23 on 25 Apr 2022
Your code is overly complex: you need to let DIR do more of your work for you.
P = "D:\Git Repo\FYP\Results\400x400 Chanaging Topography";
S = dir(fullfile(P,'*_15degrees*','*.mat'));
for k = 1:numel(S)
    F = fullfile(S(k).folder,S(k).name);
    D = load(F);
    S(k).Ly = D.Ly;
    S(k).Ld = D.Ld;
    S(k).t = D.t;
end
All of the imported data are in S. You can access them easily using indexing, e.g. the 2nd file:
S(2).Ly
S(2).name
S(2).folder
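For the PARFOR part of the question, one possible arrangement is sketched below. It is an illustration under stated assumptions, not a tested fix for the original script. PARFOR expects an output ("sliced") variable to be indexed by the loop variable at the first level of indexing. In the question the loop variable appears only at the third level, `Ly{1,folderidx-2}{1,counter}{1,fileidx-2}`, which is a likely cause of the reported error. The scratch files, the folder name `caseA_15degrees`, and the cell arrays `LyC`, `LdC`, and `tC` are assumptions made so the example is self-contained. Without Parallel Computing Toolbox the PARFOR loop simply runs serially.

```matlab
% Build a few scratch .mat files so the sketch is self-contained.
root   = tempname;
runDir = fullfile(root, 'caseA_15degrees');   % hypothetical folder name
mkdir(runDir);
for k = 1:3
    Ly = k;  Ld = 10*k;  t = 0.5*k;           % placeholder scalar data
    save(fullfile(runDir, sprintf('step%03d.mat', k)), 'Ly', 'Ld', 't');
end

% Let DIR find the files, as in the answer above.
S = dir(fullfile(root, '*_15degrees*', '*.mat'));
n = numel(S);

% Copy the path parts into plain cell arrays so they can be sliced.
folder = {S.folder};
name   = {S.name};

% Flat output containers, each indexed ONLY by the loop variable.
LyC = cell(1, n);
LdC = cell(1, n);
tC  = cell(1, n);

parfor k = 1:n
    % LOAD with an output argument, requesting only the variables needed.
    D = load(fullfile(folder{k}, name{k}), 'Ly', 'Ld', 't');
    LyC{k} = D.Ly;
    LdC{k} = D.Ld;
    tC{k}  = D.t;
end

% Attach the results to the DIR structure after the loop.
[S.Ly] = LyC{:};
[S.Ld] = LdC{:};
[S.t]  = tC{:};

rmdir(root, 's');   % clean up the scratch files
```

Whether this is faster than the plain FOR loop depends on the storage device and the file sizes, because reading many files is often limited by disk access rather than by computation.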
  6 Comments
Johnny Dessoulavy on 2 May 2022
I figured it out by sorting the data to its respective "time" field by using:
parfor i = 1:length(sims)
    [~,index] = sortrows([sims{1, i}.time].');
    sims{1, i} = sims{1, i}(index);
end
where sims is simply C from your above explanation.
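For readers without the earlier comments, the same reordering on a single struct array can be sketched as follows. The struct `runs` and its values are made up for illustration:

```matlab
% Hypothetical struct array with one element per file, out of time order.
runs = struct('name', {'c.mat', 'a.mat', 'b.mat'}, ...
              'time', {3.0, 1.0, 2.0});

[~, index] = sort([runs.time]);   % order of the numeric "time" field
runs = runs(index);               % reorder the whole struct array

disp({runs.name})
```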
Now that I have the import working, I tried using the full data set (importing additional fields), where several fields each hold a 400x400 matrix. I am now running out of memory. I am getting
"Error using parallel.internal.pool.deserialize
Out of Memory during deserialization"
Do you know a way that I can get around this? I will need a structure of:
name | time | Ly | Ld | h | umx | umy | X | Y
where the final 5 fields are large matrices (400x400).
Stephen23 on 3 May 2022
"I figured it out by sorting the data to its respective "time" field..."
That is not robust because when files are copied or modified there is absolutely no guarantee that it will be in the same order as some numeric values represented in the filenames. Best avoided.
" I am now running out of memory."
How many files are there?
If you do not have sufficient memory then you have two basic approaches:

Release

R2022a