MATLAB Answers

Using matfile to partially extract data still loads entire file into memory

Justin Brooks on 8 Apr 2021
Commented: Justin Brooks on 9 Apr 2021 at 11:27
I have a .mat file saved in the -7.3 format. The content of the file is a large cell array. I am using (,) indexing to retrieve a single row:
obj = matfile('File.mat');
Data = obj.CellArray(RowNum,:);
I've done some investigating on the memory usage. When I run that command, it works: I get the row out of the cell array.
However, it takes the same amount of time as loading the .mat file into the workspace and it uses the same amount of memory. From the MATLAB help files I thought this syntax was designed to only partially load files into memory. Am I doing something incorrect or does the feature not work the way I hoped it would?
Thank you for your help.
dpb on 8 Apr 2021
I guess with cell arrays I'm not terribly surprised that it might have to dereference them to get stuff out. So while what it returns is only what was asked for, it took the same or more effort to produce than a straight load followed by clearing what you don't want.
In straight numeric arrays, the location of the requested elements can be computed directly and memcpy() invoked on a buffer, so things can be streamlined. I've no idea what the actual memory structure of cell arrays is, having never poked around in the innards, but there's a whole lot of overhead associated with them, and tables add yet another layer on top.
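One way to check that intuition is to time a partial read of a plain numeric matrix against the same data stored in a cell array, both saved in one v7.3 file. This is only an illustrative sketch: the file location, the sizes, and the variable names M and C are assumptions, and the timings depend on the machine, so no expected output is given.

```matlab
% Compare matfile partial reads of a numeric matrix and a cell array.
% File location, sizes, and the names M and C are assumptions.
fname = fullfile(tempdir, 'PartialLoadCompare.mat');

N = 2000;                        % number of rows
M = rand(N, 500);                % numeric matrix
C = num2cell(M, 2);              % N-by-1 cell array, one 1-by-500 row per cell
save(fname, 'M', 'C', '-v7.3');  % matfile partial loading needs v7.3

m = matfile(fname);

tic; rowM = m.M(42, :); tM = toc;  % read one row of the numeric matrix
tic; rowC = m.C(42, 1); tC = toc;  % read one cell (parentheses, not braces)

fprintf('Numeric row: %.4f s, cell element: %.4f s\n', tM, tC);
```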

Answers (2)

Matt J on 9 Apr 2021
Edited: Matt J on 9 Apr 2021
We can run a test right here. The one below suggests there is some benefit, though perhaps not as much benefit as I would have expected given the size of the data being loaded. You're sure the format of your File.mat is v7.3?
save -v7.3 File CellArray
Elapsed time is 2.161086 seconds.
Elapsed time is 0.587641 seconds.
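The code behind those timings is not shown on this page. A comparable test, written from scratch rather than taken from the answer above, could look like the following; the cell-array size and contents and RowNum = 46 are assumptions chosen only to make the sketch runnable.

```matlab
% Full load vs. matfile partial load of one row of a cell array (v7.3).
% Sizes, contents, and RowNum are assumptions for illustration.
fname = fullfile(tempdir, 'File.mat');
CellArray = cell(300, 3);
for k = 1:numel(CellArray)
    CellArray{k} = rand(1000, 1);   % each cell holds a 1000-by-1 vector
end
save(fname, 'CellArray', '-v7.3');
clear CellArray

tic; S = load(fname); toc           % full load into a struct

tic
obj = matfile(fname);
RowNum = 46;                        % hypothetical row index
Data = obj.CellArray(RowNum, :);    % 1-by-3 cell, read through matfile
toc
```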
Justin Brooks on 9 Apr 2021 at 11:27
I ran the following as my test:
fid = fopen('TestPartialLoadCell.mat');
txt = fread(fid,[1,116],'*char');   % the MAT-file header text field is 116 bytes
fclose(fid);
txt = [txt,0];
txt = txt(1:find(txt==0,1,'first')-1);
That came back with:
'MATLAB 7.3 MAT-file, Platform: PCWIN64, Created on: Thu Apr 8 08:50:48 2021 HDF5 schema 1.00'
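The same header check can be used to tell a v7.3 file from an older-format one. The sketch below assumes the descriptive header text occupies the first 116 bytes of the file and that v7.3 headers begin with "MATLAB 7.3 MAT-file", as in the output above; the demo file names in tempdir are hypothetical, so substitute your own file.

```matlab
% Tell a v7.3 MAT-file from an older one by its header text.
% Demo file names in tempdir are assumptions.
x = magic(4);
files = {fullfile(tempdir, 'hdr_v73.mat'), fullfile(tempdir, 'hdr_v7.mat')};
save(files{1}, 'x', '-v7.3');
save(files{2}, 'x', '-v7');

isV73 = false(size(files));
for k = 1:numel(files)
    fid = fopen(files{k}, 'r');
    assert(fid >= 0, 'Cannot open %s', files{k});
    raw = fread(fid, [1, 116], 'uint8=>char');  % read exactly 116 bytes
    fclose(fid);
    raw = [raw, char(0)];                       % sentinel so find() always succeeds
    hdr = strtrim(raw(1:find(raw == 0, 1) - 1));
    fprintf('%s\n  -> %s\n', files{k}, hdr);
    isV73(k) = startsWith(hdr, 'MATLAB 7.3 MAT-file');
end
```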
I then ran the following code on the partial cell array (a reduced number of rows from the true data set, but the same data):
clear all;
obj = matfile('TestPartialLoadCell.mat')
Test = obj.EventCell(46,:);
The output to console was:
1.871945 seconds % Loading whole file into the workspace
80,3 % Size of the cell array
1.824823 seconds % Creating matfile object
3.131165 seconds % Pulling single row
I also did the same test with the entire data file just to see. Output to console was:
280.921421 seconds % Loading whole file into the workspace
5421,3 % Size of the cell array
279.005530 seconds % Creating matfile object
624.437779 seconds % Pulling single row
I was also watching the memory usage. It fluctuated a lot, but each operation peaked at the same amount of memory. After creating the object and pulling a single row, the memory usage went back down, but the whole point of the exercise was to reduce memory usage and time for an end user.
So I just don't think it's going to work for my application, which is fine. There are other ways to accomplish the same task; they just aren't as "fancy".

Stephen Cobeldick on 9 Apr 2021
Edited: Stephen Cobeldick on 9 Apr 2021
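Transpose the cell array when it is created, so that each row you want to read becomes a column. MATLAB stores arrays in column-major order, so the elements of one column are contiguous, whereas the elements of one row are spread across the whole array: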
Data = obj.CellArray(:,ColNum);
% ^^^^^^^^ first index is colon!
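Putting that suggestion together end to end: build (or convert) the array once in transposed form, save it as v7.3, and then read a column through matfile. This is a sketch only; the file name, the sizes, and the variable name CellArrayT are hypothetical.

```matlab
% Store the cell array transposed so each former row is a column.
% File name, sizes, and CellArrayT are hypothetical.
fname = fullfile(tempdir, 'Transposed.mat');
CellArray = cell(500, 3);
for k = 1:numel(CellArray)
    CellArray{k} = rand(100, 1);
end
CellArrayT = CellArray.';            % 3-by-500: old row r is now column r
save(fname, 'CellArrayT', '-v7.3');

obj = matfile(fname);
ColNum = 46;                         % plays the role of RowNum in the question
Data = obj.CellArrayT(:, ColNum);    % 3-by-1 cell, one contiguous column
Data = Data.';                       % back to the original 1-by-3 orientation
```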



