matfile and half inefficient storage
Dear MATLAB users,
I have encountered the following inefficient storage problem:
delete('myfile.mat')
handle = matfile('myfile.mat')
handle.X = half(X); % X is big
handle.Y = half(Y); % Y is big
handle.a = a;
handle.b = b;
%%% the size of myfile.mat is 2.4Gb %%%
data = load('myfile');
save('mynewfile1.mat', '-v7.3', '-struct', 'data')
%%% the size of mynewfile1.mat is 1.2Gb %%%
data = load('myfile');
save('mynewfile2.mat', '-struct', 'data')
%%% the size of mynewfile2.mat is 1.2Gb %%%
What could be causing this doubling of storage, and how can I avoid it without loading and resaving the file?
Update: the problem does not seem to be caused by the -v7.3 flag. I updated the code above to show this.
Thank you for your help.
30 Comments
Image Analyst
on 24 Jul 2021
Why are you saving it in 7.3 (old) format?
dpb
on 24 Jul 2021
-v7.3 is the latest version; -v7 is the default (https://www.mathworks.com/help/matlab/ref/save.html), set by TMW in the preferences, apparently for compatibility.
There's a note in the doc under the 'version' named parameter that says--
"Version 7.3 MAT-files use an HDF5 based format that requires some overhead storage to describe the contents of the file. For cell arrays, structure arrays, or other containers that can store heterogeneous data types, Version 7.3 MAT-files are sometimes larger than Version 7 MAT-files."
The blowup is something I've noted in some other Q? over last few months -- there was another conversation just the other day it seems where a file was saved also at something like 2X the size w/ -v7.3 flag but the save command w/o the flag was half the size. Turned out it's in the preferences that the -v7 flag is set by default on initial install.
Seems as though this needs some attention from TMW -- the huge blow-up in size indicates something's not kosher/as intended in the implementation.
Mika
on 24 Jul 2021
Edited: Mika
on 24 Jul 2021
Thanks for your comments. I just tried to resave my file without the -v7.3 flag and got the same result. So it seems that, at least in this case, the problem is not caused by the -v7.3 flag, but possibly by matfile? I will update the question accordingly.
dpb
on 24 Jul 2021
Edited: dpb
on 24 Jul 2021
Quite possible; there's got to be overhead with the matfile object in order to be able to access pieces-parts.
Alternatively, what does half actually do? Does it create some object or what? I don't have any of the TBs that have it so not sure.
Just for checking, what are the settings in Preferences--General--MAT-files? Just so we know for sure what version is used with no explicit flag on the command line.
dpb
on 24 Jul 2021
OK, that the default is -v7 and that both
save('mynewfile1.mat', '-v7.3', '-struct', 'data')
save('mynewfile2.mat', '-struct', 'data')
returned the same size file shows the different file size is not related to the version for whatever data actually is.
Now, what we (at least me, since I can't test) don't know yet is what half actually returns -- the doc above was unclear.
What does
x=half(X);
whos x X
return?
Walter Roberson
on 25 Jul 2021
But I need matfile to save in a parfor loop.
I have not seen any guarantee that two different processes writing to the same matfile() will not interfere with each other.
The file structure designed for simultaneous access is memmapfile().
Walter Roberson
on 25 Jul 2021
Please explain more about why using parfor requires you to use matfile? As opposed to just saving (possibly using 7.3 if you have big objects)?
Mika
on 25 Jul 2021
Edited: Mika
on 25 Jul 2021
save cannot be called in a parfor loop, https://www.mathworks.com/help/parallel-computing/transparency.html
Yes, I could write a separate function, but matfile would be a more elegant solution if it worked as expected.
So I guess this is my main concern: the unexpected behavior of matfile (maybe in combination with half).
dpb
on 25 Jul 2021
Edited: dpb
on 26 Jul 2021
We've eliminated everything on the size conundrum except matfile, with the exception that we haven't seen the explicit result of a save statement for the half object (that I can't test). We got so far as to show it didn't show extra memory used via whos, but that doesn't prove save didn't need some extra info to go with it. One presumes not, but it hasn't been proven.
If performance is a Q? as I would presume it would be using parfor anyways, the matfile solution may seem "elegant" in minimizing source code, but I think it would still be a sizable time hit even without the file size issue as compared to the suggested workaround.
dpb
on 26 Jul 2021
Edited: dpb
on 26 Jul 2021
I had presumed that would be the result, but since I couldn't/can't test, just for the record... :)
I agree, I think it's well worth bringing to their explicit attention (altho I would presume they're already aware of it) as it appears they may need to re-examine just what is causing such a huge blowup and rethink what they're doing going forward.
While they probably won't classify it as a bug since it seems to still work to provide the documented functionality, certainly from a performance and quality of implementation POV it deserves to be flagged.
Walter Roberson
on 26 Jul 2021
Using a small auxiliary function to do the save() is what is recommended.
dpb
on 26 Jul 2021
That avoids it, but doesn't resolve that storage requirements blow up remarkably with matfile which seems to me at least to be a problem even if one can get around it in some instances by not using it. If never going to use it, isn't much point in having it in the language... :)
James Tursa
on 27 Jul 2021
For the record, half data types are stored as opaque classdef objects. They are fundamentally different from the other native numeric types such as double and single. Whether this has anything to do with the behavior I don't know.
Walter Roberson
on 27 Jul 2021
Good point, James. The representation of classdef objects can end up being quite different in HDF5.
Eike Blechschmidt
on 28 Jul 2021
You could do the following and see if there is a difference in how the files are stored as hdf5 files:
h5disp('myfile.mat');
h5disp('mynewfile1.mat');
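If the full h5disp dump is too long to compare by eye, the same check can be scripted. Below is a minimal sketch on small throwaway files, not a diagnosis of the original 2.4 GB file: it writes one variable through matfile and once through save, then prints the on-disk sizes plus the chunk size and filter count that h5info reports for each top-level dataset. The file names come from tempname, and single stands in for half (an assumption, so that no extra toolbox is needed); results with half may differ.

```matlab
% Small-scale sketch; single stands in for half (assumption).
X = rand(500, 'single');
fMat  = [tempname '.mat'];   % written through matfile
fSave = [tempname '.mat'];   % written in one call to save

m = matfile(fMat, 'Writable', true);
m.X = X;
clear m                      % release the matfile object

save(fSave, 'X', '-v7.3');

% Compare on-disk sizes.
dMat  = dir(fMat);
dSave = dir(fSave);
fprintf('matfile: %d bytes, save: %d bytes\n', dMat.bytes, dSave.bytes);

% List chunk size and number of filters for each top-level dataset.
files = {fMat, fSave};
for i = 1:numel(files)
    info = h5info(files{i});
    for j = 1:numel(info.Datasets)
        ds = info.Datasets(j);
        fprintf('%s /%s: chunk %s, %d filter(s)\n', files{i}, ds.Name, ...
            mat2str(ds.ChunkSize), numel(ds.Filters));
    end
end
```

A difference in the reported filters or chunking between the two files would point at how the data was written rather than at the data itself.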
Mika
on 29 Jul 2021
Thank you, here is a note from mathworks support:
The MAT-file v7.3 is based on HDF5, and HDF5 does not manage free space as effectively as it should. If a dataset in a HDF5 file is frequently added and written, the files can grow unnecessarily large.
A possible work around, as you have demonstrated, is to use the “save” function. This allows MATLAB to compress the data more efficiently, since there is no repetitive write to the MAT file. Please see this documentation link, specifically the tips section for information about efficiently storing to a MAT file in this way:
Mika
on 16 Aug 2021
Edited: Mika
on 16 Aug 2021
Just to follow up, this small function should do the job:
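function mysave(filename, varargin)
% Save name/value pairs to a MAT-file; callable from inside parfor.
% Note: wrap cell-array values in an extra {} so that struct() does not
% expand them into a struct array, which save('-struct') would reject.
data = struct(varargin{:});
save(filename, '-v7.3', '-struct', 'data');
end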
Here is an example usage:
% run and check result
mysave('myfile.mat', 'a', 1, 'b', 2, 'c', 3);
whos -file myfile.mat
  Name      Size            Bytes  Class     Attributes

  a         1x1                 8  double
  b         1x1                 8  double
  c         1x1                 8  double
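For the parfor case that started this, here is a rough sketch of how the helper might be used, with each iteration writing its own file so that no two workers touch the same MAT-file. The wrapper name parsave_demo, the part_%03d.mat naming scheme, and the random test data are all made up for illustration; mysave is repeated as a local function only so the file is self-contained. It assumes the workers can see the output folder (true for a local pool).

```matlab
function outDir = parsave_demo(N)
% Hypothetical demo: write one MAT-file per parfor iteration.
outDir = tempname;                      % throwaway output folder
mkdir(outDir);
parfor k = 1:N
    X = rand(100, 'single');            % stand-in for the real result
    fname = fullfile(outDir, sprintf('part_%03d.mat', k));
    mysave(fname, 'X', X, 'k', k);      % save happens inside the helper
end
end

function mysave(filename, varargin)
% Same helper as above, repeated here as a local function.
data = struct(varargin{:});
save(filename, '-v7.3', '-struct', 'data');
end
```

Calling save from inside a function keeps the parfor body transparent, and because every file is written in a single save call, there are no repeated writes to one file of the kind the support note describes.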
Q490
on 16 Aug 2021
As a side note, and not sure if this is directly related to an answer to your question, a function I've found very useful that can be a good substitute for using matfile is "savefast", written by Tim Holy (https://www.mathworks.com/matlabcentral/profile/authors/1337381) and which can be downloaded at:
For the file sizes you are talking about it saves it extremely quickly and in the smallest possible file size. I highly recommend it.
Pavithra Jayachandran
on 21 Aug 2021
Thank you I will try
cui,xingxing
on 24 Aug 2021
Edited: cui,xingxing
on 24 Aug 2021
Similar question here; TMW should provide an effective solution.
S Priyadharshini
on 30 Aug 2021
Myfile and mynewfile2.mat
Answers (0)