read and divide HDF5 data into chunks
I have 1000+ HDF5 files, each containing an 1800-by-3600 matrix. I want to divide each 1800-by-3600 matrix into 4 chunks and store each chunk with an ID in an array, then repeat this for all 1000+ files. Can someone explain how to use H5P.set_chunk or H5S.select_hyperslab for this? I used H5S.select_hyperslab to get only one slab; how should I repeat the process?
Accepted Answer
Dinesh Iyer
on 12 Oct 2018
Edited: Dinesh Iyer
on 12 Oct 2018
H5P.set_chunk is used to specify the chunk dimensions of a dataset, i.e. what the size of each chunk should be when it is stored in the file. H5S.select_hyperslab is used to specify the portion of the dataset that you want to read. If you are reading a portion of the data from a dataset, H5S.select_hyperslab is probably what you need.
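To make the distinction concrete, here is a minimal sketch using the high-level functions, where the 'ChunkSize' option of h5create plays the role of H5P.set_chunk and the start/count arguments of h5read play the role of H5S.select_hyperslab. The file name, the dataset name '/mydataset' and the test data are assumptions made up for illustration.

```matlab
% Hypothetical file and dataset names, used only for illustration.
fileName = [tempname '.h5'];
dsetName = '/mydataset';

% Storage layout: store the 1800x3600 dataset in 1800x900 chunks.
% (h5create's 'ChunkSize' option corresponds to H5P.set_chunk.)
h5create(fileName, dsetName, [1800 3600], 'ChunkSize', [1800 900]);
h5write(fileName, dsetName, reshape(1:1800*3600, 1800, 3600));

% Partial read: fetch only columns 901 to 1800.
% (h5read's start and count arguments correspond to H5S.select_hyperslab.)
block = h5read(fileName, dsetName, [1 901], [1800 900]);

info = h5info(fileName, dsetName);
disp(info.ChunkSize);   % chunk dimensions recorded in the file
disp(size(block));      % size of the portion that was read

delete(fileName);       % remove the temporary test file
```

The portion you read does not have to match the chunk layout of the file; the two settings are independent.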
When you say that you want to store each chunk with an ID into an array, do you mean you want to read it into MATLAB or do you want to store it again into another HDF5 file?
For starters, you can use the high-level h5read function to read a portion of the dataset. I am not sure how you want to divide the data into 4 chunks, but I am going to assume that each chunk is 1800x900. A different split only changes the chunk size and the start locations; the approach stays the same.
The code below provides an idea on how you can do this.
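fileNames = dir('*.h5');
fileNames = {fileNames.name}';
numChunks = 4;
chunkSize = [1800 900];   % rows and columns read per chunk
for fileIdx = 1:numel(fileNames)
    fileToRead = fileNames{fileIdx};
    s = struct();         % holds the chunks of the current file, keyed by ID
    for chunkIdx = 1:numChunks
        % Build a valid struct field name from the file name and chunk number.
        ID = sprintf('%s_Chunk_%02d', matlab.lang.makeValidName(fileToRead), chunkIdx);
        % One-based start index; each chunk begins 900 columns after the previous one.
        startLoc = [1 chunkSize(2)*(chunkIdx-1)+1];
        s.(ID) = h5read(fileToRead, '/mydataset', startLoc, chunkSize);
    end
    % s now holds all chunks of this file; use or save it here,
    % because the next iteration resets it.
end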
I have not run the above code and so apologies for any errors but it does give an idea of how you can do this.
If you want to use the low-level functions such as H5D.read, you have to loop and, on each iteration, update the h5_start input argument of H5S.select_hyperslab to point to the portion of the dataset that you want to read.
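A rough sketch of that low-level loop follows. Like the code above it is meant as a starting point, not a definitive implementation. The file name, the dataset name '/mydataset' and the test data are assumptions for illustration. Note that the low-level functions take zero-based offsets and list dimensions in the reverse order of what MATLAB shows, hence the fliplr calls.

```matlab
% Hypothetical file and dataset names; a small test file is created first.
fileName = [tempname '.h5'];
dsetName = '/mydataset';
h5create(fileName, dsetName, [1800 3600]);
h5write(fileName, dsetName, reshape(1:1800*3600, 1800, 3600));

numChunks = 4;
chunkSize = [1800 900];
chunks = cell(1, numChunks);

fid       = H5F.open(fileName, 'H5F_ACC_RDONLY', 'H5P_DEFAULT');
dset      = H5D.open(fid, dsetName);
fileSpace = H5D.get_space(dset);

% Low-level calls expect dimensions in reversed order relative to MATLAB.
h5_count = fliplr(chunkSize);
memSpace = H5S.create_simple(2, h5_count, []);

for k = 1:numChunks
    startLoc = [1 chunkSize(2)*(k-1)+1];   % one-based MATLAB start location
    h5_start = fliplr(startLoc - 1);       % zero-based, reversed order
    H5S.select_hyperslab(fileSpace, 'H5S_SELECT_SET', h5_start, [], h5_count, []);
    chunks{k} = H5D.read(dset, 'H5ML_DEFAULT', memSpace, fileSpace, 'H5P_DEFAULT');
end

H5S.close(memSpace);
H5S.close(fileSpace);
H5D.close(dset);
H5F.close(fid);
delete(fileName);   % remove the temporary test file
```

Only h5_start changes between iterations; the dataset, the dataspaces and h5_count are opened or created once and reused.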
3 Comments
Dinesh Iyer
on 12 Oct 2018
The code that I have provided should help you get started. It results in 4 chunks because I have taken a chunk size of [1800 900]. You can modify this.
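As an illustration of modifying the split, here is a small sketch that computes the start locations for a 2-by-2 grid of 900x1800 blocks instead. The 2-by-2 layout and the variable names are assumptions; adapt them to whatever division you need.

```matlab
dataSize     = [1800 3600];
blocksPerDim = [2 2];                      % assumed split: 2 blocks down, 2 across
chunkSize    = dataSize ./ blocksPerDim;   % size of each block
numChunks    = prod(blocksPerDim);

startLocs = zeros(numChunks, 2);
for k = 1:numChunks
    [r, c] = ind2sub(blocksPerDim, k);     % block row and column of chunk k
    startLocs(k, :) = ([r c] - 1) .* chunkSize + 1;   % one-based start location
end
disp(startLocs);
```

Each row of startLocs can then be passed to h5read as the start argument, with chunkSize as the count argument.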
If you want to speed up the operation, you can use a parfor loop (Parallel Computing Toolbox) over the files so that several files are processed in parallel.
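A possible shape for the parallel version is sketched below. It assumes each file can be processed independently and that you only keep a small result per chunk. The mean used here is just a placeholder for your own processing, and the temporary test files and their names are made up for illustration.

```matlab
% Create a temporary folder with a few small test files (hypothetical names).
workDir = tempname;
mkdir(workDir);
for n = 1:3
    f = fullfile(workDir, sprintf('file%d.h5', n));
    h5create(f, '/mydataset', [1800 3600]);
    h5write(f, '/mydataset', n * ones(1800, 3600));
end

listing   = dir(fullfile(workDir, '*.h5'));
fileNames = cellfun(@(nm) fullfile(workDir, nm), {listing.name}, 'UniformOutput', false);
numChunks = 4;
chunkSize = [1800 900];
chunkMeans = zeros(numel(fileNames), numChunks);

% Each iteration handles one file; iterations do not depend on each other.
parfor fileIdx = 1:numel(fileNames)
    rowMeans = zeros(1, numChunks);        % temporary result for this file
    for chunkIdx = 1:numChunks
        startLoc = [1 chunkSize(2)*(chunkIdx-1)+1];
        data = h5read(fileNames{fileIdx}, '/mydataset', startLoc, chunkSize);
        rowMeans(chunkIdx) = mean(data(:));   % placeholder for real processing
    end
    chunkMeans(fileIdx, :) = rowMeans;     % one row of output per file
end

disp(chunkMeans);
rmdir(workDir, 's');   % remove the temporary test files
```

Whether this is faster depends on how much of the time is spent on disk access; if the disk is the bottleneck, extra workers may not help much.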