File size increases 100-fold when going to MAT-File 7.3

I have some software that comes to me as pcode from another team. The save command has not changed between deliveries. It is simply
save(fullfile(pathname,filename),'udata');
After running this command I noticed that the file sizes have grown from 7MB to 722MB. Here are the 2 files in question. Step2_All2.dvset is from last year and dvsetOldudata.dvset is from today.
type 'dvsetOldudata.dvset'
MATLAB 7.3 MAT-file, Platform: GLNXA64, Created on: Thu Oct 3 07:51:37 2019 HDF5 schema 1.00 .
type 'Step2_All2.dvset'
MATLAB 5.0 MAT-file, Platform: GLNXA64, Created on: Mon Oct 8 10:29:43 2018
If I compare the files using MATLAB R2018a (after renaming them to .mat), I get this:
File Comparison - dvsetOldudata.mat vs. Step2_All2.mat
Left file C:\Users\ml692c\Documents\SLS\Step3\Correlation\dvsetOldudata.mat
Right file C:\Users\ml692c\Documents\SLS\Step3\Correlation\Step2_All2.mat
The variables in these files are equivalent, but the files are not identical. Possible causes of the differences include: file formats, file timestamps, NaN patterns, field ordering, or the order in which the variables are stored. For details see Comparing MAT-Files.
Variables in dvsetOldudata.mat    Variables in Step2_All2.mat
Name   Size  Class                Name   Size  Class             Difference Summary
udata  1x1   struct               udata  1x1   struct            Equivalent
Then again if I load the files into the workspace with load('Step2_All2.dvset', '-mat') and load('dvsetOldudata.dvset','-mat') I get this (after renaming one udata to udataOld to avoid clobbering):
whos
Name Size Bytes Class Attributes
udata 1x1 233821424 struct
udataOld 1x1 233821424 struct
udata is a structure with many nested structures in it. Still, this doesn't explain to me why the file size would balloon from 7MB to 722MB. Changing the version switch in the save statement makes a huge difference: '-v4' doesn't work because of the struct, '-v6' produces a 125MB file, '-v7' produces the 7MB file, and '-v7.3' produces the massive 722MB file. This is true of R2017a and R2018a.
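For anyone who wants to reproduce the comparison without the original data, here is a minimal sketch. The nested struct is a made-up stand-in for udata (the field names and sizes are assumptions, not the real contents), so the absolute numbers will differ from the ones above.

```matlab
% Build a small nested struct as a stand-in for udata (contents are made up).
udata = struct();
for k = 1:200
    name = sprintf('node%03d', k);
    udata.(name).id = k;
    udata.(name).label = sprintf('item %d', k);
    udata.(name).values = zeros(1, 10);
end

% Save the same variable in each MAT-file version and record the file size.
formats = {'-v6', '-v7', '-v7.3'};
sizesBytes = zeros(size(formats));
for i = 1:numel(formats)
    fname = [tempname '.mat'];            % temporary file for this format
    save(fname, 'udata', formats{i});
    info = dir(fname);
    sizesBytes(i) = info.bytes;
    delete(fname);                        % clean up the temporary file
    fprintf('%-6s %10d bytes\n', formats{i}, sizesBytes(i));
end
```

'-v4' is left out because it cannot store structs, as noted above.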
Is this behavior expected? Is there a way to default to using the smaller of the save options based on the variable being saved?

Accepted Answer

Walter Roberson on 3 Oct 2019
Is this behavior expected?
Unfortunately, yes. HDF5 format is not always efficient, particularly when representing compound types. Each member of a struct can theoretically be a different data type, so HDF5 can end up creating a separate HDF5 variable for each structure member of each struct array element, or at the very least a separate HDF5 variable for each struct array element.
Sometimes HDF5 format gets notably worse after the built-in compression. R2019a (I think it was) added a -nocompression flag that affects all of the variables. In earlier versions, whether to compress was decided only for the very first variable to be saved.
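To check whether compression helps or hurts for a particular variable, the two settings can be compared directly. This is only a sketch: it assumes a release in which save accepts '-nocompression' together with '-v7.3', and the variable s is made-up test data.

```matlab
s.values = rand(200, 200);                % made-up stand-in content

fDefault = [tempname '.mat'];             % written with the default compression
fRaw = [tempname '.mat'];                 % written with compression disabled
save(fDefault, 's', '-v7.3');
save(fRaw, 's', '-v7.3', '-nocompression');

dDefault = dir(fDefault);
dRaw = dir(fRaw);
fprintf('compressed:   %d bytes\n', dDefault.bytes);
fprintf('uncompressed: %d bytes\n', dRaw.bytes);

delete(fDefault);
delete(fRaw);
```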
Is there a way to default to using the smaller of the save options based on the variable being saved?
No :(
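save has no such option, but a wrapper can approximate it by writing the variable in each candidate version and keeping the smallest file. The function below is a hypothetical helper (saveSmallest is not a MATLAB function), offered as a sketch only. It writes the data once per format, so it costs extra time and temporary disk space.

```matlab
function chosen = saveSmallest(filename, varName, value)
% saveSmallest  Hypothetical helper: store VALUE under the name VARNAME in
% each candidate MAT-file version and keep the smallest complete file.
    formats = {'-v7', '-v7.3'};           % candidate versions to try
    s.(varName) = value;                  % '-struct' saves each field as a variable
    bestBytes = Inf;
    bestFile = '';
    chosen = '';
    for i = 1:numel(formats)
        tmp = [tempname '.mat'];
        save(tmp, '-struct', 's', formats{i});
        if exist(tmp, 'file') ~= 2
            continue;                     % nothing was written for this format
        end
        % '-v7' skips variables of 2 GB or more with a warning, so confirm
        % that the variable actually reached the file.
        w = whos('-file', tmp);
        info = dir(tmp);
        if any(strcmp({w.name}, varName)) && info.bytes < bestBytes
            if ~isempty(bestFile)
                delete(bestFile);         % discard the previous, larger candidate
            end
            bestBytes = info.bytes;
            bestFile = tmp;
            chosen = formats{i};
        else
            delete(tmp);
        end
    end
    if isempty(bestFile)
        error('saveSmallest:failed', 'No format could store %s.', varName);
    end
    movefile(bestFile, filename, 'f');    % move the winner to the requested name
end
```

With the save call from the question, usage would look like chosen = saveSmallest(fullfile(pathname,filename), 'udata', udata); and the result loads as before with load(..., '-mat').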

Release

R2018a