netCDF writing large files when created with data written in finite blocks

The following code creates two netCDF files containing the same data (rand(30,N)). On Windows 10 running MATLAB R2021b, the first file, written in a single call, is 103 MB, while the second file, which contains the same information but is written in 100 blocks, is 1200 MB. When I load both files and compare them, the contents are identical.
If I run the same code on Linux, I get the same 103 MB for the file written in one block, but the file written in 100 blocks is 474 MB.
Any ideas on this would be most welcome.
fclose all;close all;clear;clc
ncFormat='netcdf4'; % alternatives: 'classic', '64bit', 'netcdf4_classic'

% First file: write the whole array in a single ncwrite call.
fn='test1.nc';
N=1000000;
if isfile(fn), delete(fn); end
nccreate(fn,'test','Dimensions',{'x' 30 'y' N},'Datatype','single','DeflateLevel',9,'Format',ncFormat);
test=rand(30,N);
ncwrite(fn,'test',single(test))
A=dir(fn);fprintf('size = %g MB\n',A(1).bytes/1024^2)

% Second file: write the same array in 100 blocks of N/100 columns each.
fn='test2.nc';
if isfile(fn), delete(fn); end
nccreate(fn,'test','Dimensions',{'x' 30 'y' N},'Datatype','single','DeflateLevel',9,'Format',ncFormat);
n=N/100; % block width (10,000 columns)
for i=1:n:N
    ncwrite(fn,'test',single(test(:,i:i+n-1)),[1 i])
end
A=dir(fn);fprintf('size = %g MB\n',A(1).bytes/1024^2)

% Zero means the two files contain identical data.
fprintf('files are same if result is zero: %g\n',sum(sum(abs(ncread('test1.nc','test')-ncread('test2.nc','test')))))
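One way to investigate (this diagnostic is my addition, not part of the original post) is to inspect the on-disk layout that nccreate chose, since netCDF-4 deflate compression is applied per chunk rather than per ncwrite call:

```matlab
% Assumes test2.nc already exists from the script above.
info = ncinfo('test2.nc','test');
disp(info.ChunkSize)     % chunk extents along the [x y] dimensions
disp(info.DeflateLevel)  % compression level actually stored in the file
```

If the chunk extents do not line up with the 10,000-column write blocks, each ncwrite call touches partially filled chunks, which must be recompressed and rewritten.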

Accepted Answer

Gyan Vaibhav on 1 Feb 2024
Hello Dave,
I came across your question, and what you have stated is correct. I tried it on a Linux machine too, and it shows sizes similar to Windows, i.e. around 110 MB and 1200 MB for the respective methods.
  1. After a little prodding, I changed "DeflateLevel" to 0, which turns off compression, and the file sizes become identical for the two methods.
  2. Given that insight, the behaviour is likely caused by writing in blocks. For the second file (test2.nc), you write the data in 100 blocks. This can hurt compression efficiency: when the data arrive in smaller pieces, the compression algorithm has less context to work with than when the whole array is written at once, so it cannot exploit patterns that span larger stretches of the data, and the resulting file is larger.
PS - Even so, the block-written file should not have ended up larger than the uncompressed file, i.e. the size obtained with DeflateLevel 0.
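A variation worth trying (a sketch under my own assumptions, not something tested in this thread) is to align the HDF5 chunks with the write blocks via the documented 'ChunkSize' option of nccreate, so that each ncwrite call fills whole chunks and each chunk is compressed exactly once:

```matlab
% Hypothetical third test file: chunks sized to match the write blocks.
fn = 'test3.nc';
if isfile(fn), delete(fn); end
N = 1000000;
n = N/100;                          % same 100 blocks as test2.nc
nccreate(fn,'test','Dimensions',{'x' 30 'y' N}, ...
    'Datatype','single','DeflateLevel',9,'Format','netcdf4', ...
    'ChunkSize',[30 n]);            % one chunk per write block
test = rand(30,N);
for i = 1:n:N
    ncwrite(fn,'test',single(test(:,i:i+n-1)),[1 i])
end
A = dir(fn); fprintf('size = %g MB\n',A(1).bytes/1024^2)
```

If partial-chunk rewrites are indeed the cause, this file should come out close to the single-write size rather than the inflated block-write size.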
Hope it gives some insight.
Thanks
