Save 'v7.3' apparently uses no compression - how to turn it on?

Hi,
How do I switch on compression when using 'save -v7.3'? This morning I ran into a problem using 'save'. For the first time I tried saving large variables which then forced me to use 'v7.3'. I thought this might be the end of it but apparently not.
Two problems with v7.3:
  1. It's VERY slow
  2. Files are HUGE compared to normal saving, which might explain why reading/writing is so slow
Externally zipping the produced *.mat file results in a file about the size of the one produced by just using 'save'. For example:
ones(15000);
save('normal.mat');
save('new.mat','-v7.3');
The first file is 778kB in size, the second one a whopping 11.4MB. Zipping the 11.4MB file results in a 228kB file, so there's much room for improvement. While 11MB could be handled, the same of course happens with larger variables. I just edited one of my physics simulations with many multidimensional arrays. Saving via the normal method gives a 500MB file, doing the same using '-v7.3' gives me a 6.3GB file. Zipping this one gives me a 480MB file. This is unacceptable, it can't be how this was intended to be used.
So apparently using 'save -v7.3' just doesn't compress the file. This makes no sense to me. Especially if this was specifically implemented to be used to save large variables, why is the compression not on?
How do I switch this on? Going through the documentation, I haven't found an option.

9 Comments

Strangely enough, other posts here and elsewhere seem to be complaining about the opposite: V7.3 being slow BECAUSE of compression.
I'm puzzled. Everything I save using v7.3 results in drastically larger files, which is contradicting what others say.
I've obviously looked at that. Where does it state how to turn on compression when using v7.3? If simply zipping the file manually decreases the file size so drastically you can't tell me that it was compressed in the first place.
Compression is as I read it, automagically "on" w/ V7.3; there is no user-settable parameter.
I do note that the above page has the following note of its own...
Version 7.3 MAT-files use an HDF5 based format that requires some overhead storage to describe the contents of the file. For complex nested cell or structure arrays, Version 7.3 MAT-files are sometimes larger than Version 7 MAT-files.
Perhaps this is your problem/issue?
Looks like time to submit a request to official TMW support for clarification of the situation you're running into at www.mathworks.com
"obviously" - not to me. Matlab documentation seldom states that features are missing.
You can use h5info and http://www.hdfgroup.org/documentation/ to find out more about the -v7.3 format. The data is compressed on a "chunk-level" using gzip.
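To check what a given -v7.3 file actually contains, the chunk layout and filter list of a stored variable can be read back with h5info. This is only a minimal sketch: the variable name x and the file name v73_probe.mat are made up for illustration, and what gets reported may differ between releases and data types.

```matlab
% Hypothetical probe: save a small, highly compressible array as -v7.3
x = ones(2000);
fname = fullfile(tempdir, 'v73_probe.mat');
save(fname, 'x', '-v7.3');

% A -v7.3 MAT-file is an HDF5 file; a top-level variable x is the dataset /x
info = h5info(fname, '/x');

% ChunkSize is empty when the dataset is stored contiguously (not chunked)
fprintf('Chunk size: %s\n', mat2str(info.ChunkSize));

% Filters lists the per-chunk filters (e.g. a deflate/gzip filter), if any
if isempty(info.Filters)
    disp('No filters reported for this dataset.');
else
    disp({info.Filters.Name});
end
```

Running the same h5info call on one of your own oversized files should show whether the large datasets are chunked and filtered at all.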
The MathWorks seems to think it is "acceptable".
save does not let the user switch compression on and off.
No matter what, the HDF implementation here sucks: it's slow as hell yet not compressed at all (or VERY badly). This machine has a 1.6TB Intel PCIe SSD... it can't be the disk throughput!
Adding the '-v7.3' switch to my save command turned a 2GB struct into an 18GB file on the hard drive! I feel something is very wrong with the -v7.3 switch...
I have this exact same problem. When I use the '-v7.3' switch my files get enormously larger. Something is broken... I have yet to find a workaround. Very frustrating.
Yep, I have the same problem in R2016a and R2018a: files that are otherwise smaller than 0.5GB turn into a 9.8GB file, and it's supposedly compressed...
Had to switch to R2016a because it's what we run on our Linux servers and I was out of memory on my laptop.

Answers (2)

Testing compression with ones(15e3) gives unrealistic results.
Instead test with random numbers
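m = rand(15e3);                          % about 1.8 GB of essentially incompressible doubles
tic, save('normal.mat', 'm'); toc        % default MAT-file format
tic, save('new.mat', '-v7.3', 'm'); toc  % HDF5-based format
sad = dir('n*.mat');                     % matches new.mat and normal.mat
[sad.bytes]/1e9                          % file sizes in GB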
or with a matrix that is closer to your real data.
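A rough way to run that comparison on several kinds of data at once is sketched below. The three test matrices and their labels are placeholders, not anyone's real data; swap in samples of your own variables. The ratio is in-memory bytes (from whos) divided by bytes on disk (from dir), so larger means better compression.

```matlab
% Hypothetical test cases; replace with samples of your own data
cases = {ones(1000), rand(1000), double(rand(1000) > 0.99)};
names = {'ones', 'rand', 'mostly zeros'};
ratio = zeros(numel(cases), 2);   % columns: -v7, -v7.3

for k = 1:numel(cases)
    x = cases{k};
    w = whos('x');                % in-memory size of x
    f7  = [tempname '.mat'];
    f73 = [tempname '.mat'];
    save(f7,  'x', '-v7');
    save(f73, 'x', '-v7.3');
    d7  = dir(f7);
    d73 = dir(f73);
    ratio(k, :) = w.bytes ./ [d7.bytes, d73.bytes];
    fprintf('%-12s  -v7: %6.1fx   -v7.3: %6.1fx\n', ...
            names{k}, ratio(k, 1), ratio(k, 2));
    delete(f7);
    delete(f73);
end
```

The arrays are kept small so the loop runs quickly; the fixed overhead of the HDF5 container matters more at this size than it would for multi-gigabyte variables.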

2 Comments

why? Just kidding :)
I do get what you mean, but tons of people use or generate data which is not very random, even if the initial data might be. E.g. in some of my analyses I discard most of the data and set it to NaN, or I use masking arrays which are logicals, yet huge with vast connected chunks. This type of data is ideal for compression, yet the v7.3 does nothing with it. I am using sparse matrices too, but without knowing in advance what the result will look like, sparse matrices are not always ideal and can't replace everything.
In this case storing using this option is not a very good idea.
"yet the v7.3 does nothing with it" - -v7.3 does indeed compress ones(15000).
>> 15e3^2*8/11.4e6
ans =
157.8947
"arrays which are logical" - HDF5 doesn't have logical. I guess Matlab may use uint8 to store logical in -v7.3.
"In this case storing using this option is not a very good idea." - With 2GB+ items (cannot find a better word), using HDF5 directly might be better. IMO: the Matlab support of HDF5 works well enough.
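For what "using HDF5 directly" could look like, here is a minimal sketch with h5create and h5write, where chunk size and gzip (deflate) level are chosen explicitly. The mask, the file name mask_direct.h5, the dataset name /mask, the 500x500 chunks and the deflate level 9 are all assumptions picked for illustration. The logical mask is written as uint8 and converted back on read.

```matlab
% Hypothetical logical mask with one large connected block
mask = false(4000);
mask(1000:3000, 1000:3000) = true;

fname = fullfile(tempdir, 'mask_direct.h5');
if exist(fname, 'file') == 2
    delete(fname);                % h5create errors if the dataset exists
end

% Chunked uint8 dataset with gzip compression (Deflate accepts 0 to 9)
h5create(fname, '/mask', size(mask), ...
         'Datatype', 'uint8', 'ChunkSize', [500 500], 'Deflate', 9);
h5write(fname, '/mask', uint8(mask));

% Read back and restore the logical type
back = logical(h5read(fname, '/mask'));

d = dir(fname);
fprintf('Raw: %d bytes, on disk: %d bytes\n', numel(mask), d.bytes);
```

The trade-off is that such a file is plain HDF5, not a MAT-file: load will not read it, and type conversions like logical to uint8 become your responsibility.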

Asked: 6 Sep 2014

Commented: 27 Mar 2020