
Question on saving and reading .mat files

Kai Wu on 5 Nov 2016
Commented: Walter Roberson on 8 Nov 2016
I'm currently working on a dynamic programming algorithm and have run into a very interesting and weird problem. My algorithm has three steps. The first step creates N .mat files; the second step reads those N .mat files one by one and creates another N files in the same directory. If N is less than 10000, the running time of the second step is acceptable: it takes about 10 seconds to save a new file. However, if N is 60000, the second step slows down dramatically, and it may take several minutes to save a single new file into the existing folder.
My observation is that adding a new file to a folder of 10k files is fine, but adding a new file to a folder of 60k files is impractically slow. I checked my computer: only about 50% of the memory is in use, and the disk drive has enough free space. Each .mat file from the first step is about 3 MB, and each .mat file from the second step is about 300 KB. Can anybody help me explain this problem?
Thanks,

Answers (1)

Walter Roberson on 5 Nov 2016
Have you experimented with creating folders for each group of (say) 10000, like
per_folder = 10000;
for K = 1 : N
...
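this_folder = sprintf('subset_%05d', floor((K-1)/per_folder));  % K-1 so files 1..10000 share subset_00000
if ~exist(this_folder, 'dir'); mkdir(this_folder); end          % create the folder on first use
this_filename = sprintf('output_%08d.mat', K);                  % zero-padded so names sort numerically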
mat_filename = fullfile(this_folder, this_filename);
save(mat_filename, 'Variables_to_save');
end
That is, if the problem has to do with the number of files in the folder, then write fewer files into any one folder. You can merge the folders afterwards.
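For the step that reads the files back one by one, the subset folders can be walked in order instead of being merged first. The following is only a minimal sketch: the temporary root folder, the tiny demo sizes (5 files, 2 per folder), and the saved variable name value are assumptions made for illustration, not part of the original code.

```matlab
% Demo setup (hypothetical): write 5 small files, 2 per folder, under a scratch directory.
root_folder = tempname;          % unique scratch folder; replace with your own path
mkdir(root_folder);
per_folder = 2;
N = 5;
for K = 1 : N
    this_folder = fullfile(root_folder, sprintf('subset_%05d', floor((K-1)/per_folder)));
    if ~exist(this_folder, 'dir'); mkdir(this_folder); end
    value = K^2;                 % stand-in for the real results
    save(fullfile(this_folder, sprintf('output_%08d.mat', K)), 'value');
end

% Collect every output file from every subset folder.
subsets = dir(fullfile(root_folder, 'subset_*'));
subsets = subsets([subsets.isdir]);              % keep folders only
all_files = {};
for S = 1 : numel(subsets)
    this_folder = fullfile(root_folder, subsets(S).name);
    listing = dir(fullfile(this_folder, 'output_*.mat'));
    for F = 1 : numel(listing)
        all_files{end+1} = fullfile(this_folder, listing(F).name); %#ok<AGROW>
    end
end
all_files = sort(all_files);     % zero-padded names sort in numeric order

% Read the files back one by one.
values = zeros(1, numel(all_files));
for K = 1 : numel(all_files)
    data = load(all_files{K});   % struct with one field per saved variable
    values(K) = data.value;
end
```

Because both the folder names and the file names are zero-padded, a plain sort of the full paths gives the same order as the original index K.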
  2 Comments
Kai Wu on 8 Nov 2016
Dear Walter,
Thanks so much for your answer. I tested this approach, and it costs me more time in the first step. I am really curious about why that happens. Do you think it may be because of virtual memory, or is there some other cause that I could set up a test for?
Thanks so much for your time!
Kai
Walter Roberson on 8 Nov 2016
Sorry, I am confused about which version turned out to be faster, and how much of a difference there was.
If my test code turned out to be slightly slower, then I would suggest reducing to 5000 per folder.
It probably is not virtual memory, but it might be file system limitations. There is also a possibility that you have a memory leak; check whether the used memory keeps going up as the program runs.
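One way to run that check is to time every save and record MATLAB's memory use alongside it, so that both can be plotted against the file index. This is a rough sketch under stated assumptions: the scratch folder, the demo size N = 20, and the variable name payload are hypothetical, and the memory function is only called when ispc is true because it has traditionally been available on Windows only.

```matlab
out_folder = tempname;           % hypothetical scratch folder; replace with your own path
mkdir(out_folder);
N = 20;                          % small demo size; use the real N to reproduce the slowdown
save_seconds = zeros(1, N);      % wall-clock time of each save
mem_used_MB  = nan(1, N);        % stays NaN where memory use cannot be queried
for K = 1 : N
    payload = rand(100);         % stand-in for the real variables
    mat_filename = fullfile(out_folder, sprintf('output_%08d.mat', K));
    t = tic;
    save(mat_filename, 'payload');
    save_seconds(K) = toc(t);
    if ispc
        m = memory;              % assumption: Windows-only query
        mem_used_MB(K) = m.MemUsedMATLAB / 2^20;
    end
end
fprintf('first save: %.4f s, last save: %.4f s\n', save_seconds(1), save_seconds(end));
```

If save_seconds climbs with K while mem_used_MB stays flat, that points toward the file system rather than a leak; if mem_used_MB keeps growing, a leak becomes the more likely suspect.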
