Is matfile read speed affected by how the file is constructed?
I have a dataset that is 259000x94000x6 of int16 data. Obviously, this is way too big to fit into memory (about 276 GB) or load at once. The main issue is that the data can only be downloaded in 94000 separate chunks that are 259000x6 each, but I need to analyze the data in 259000 separate chunks of 94000x6 arrays.
For the past two weeks I have been trying various big-data techniques in MATLAB to find the fastest way to read all of this data. The fastest approach seems to be to combine everything into one large file, which MUST be built by appending 94000 files of 259000x6 arrays (and not the other way around, due to the native structure of the data). However, one very peculiar thing I have found is that no matter how I lay out my giant .mat file (e.g. 259000x94000x6 or 94000x259000x6), reading with matfile is ALWAYS an order of magnitude faster in 259000x6 chunks than in 94000x6 chunks. I've tried saving with '-v7.3' with and without compression, I've tried splitting the data into smaller files of 3 GB each and looping over them, and I've tried a fileDatastore, but nothing lets me read the data in 94000x6 chunks as fast as I can in 259000x6 chunks. Has anyone else experienced this, and does anyone know why it happens or know of a workaround?
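To make the comparison concrete, the two read patterns can be sketched at reduced size roughly as follows. This is only a minimal sketch: the array sizes, the file name, and the random stand-in data are assumptions chosen for illustration, not the real dataset or the exact code used.

```matlab
% Reduced-size stand-in for the 259000x94000x6 int16 array (sizes are assumptions).
nRows = 2590;   % stands in for 259000
nCols = 940;    % stands in for 94000
nCh   = 6;

fname = fullfile(tempdir, 'bigdata_demo.mat');   % hypothetical file name
if exist(fname, 'file')
    delete(fname);
end

% A file newly created through matfile is written in the v7.3 format.
m = matfile(fname, 'Writable', true);
m.data = zeros(nRows, nCols, nCh, 'int16');      % preallocate the variable on disk

% Build the file the way the data arrives: one nRows-by-nCh slab per download.
for j = 1:nCols
    slab = randi([-1000 1000], nRows, 1, nCh, 'int16');  % stand-in for one downloaded chunk
    m.data(:, j, :) = slab;
end

% Read pattern A: download-shaped chunk (all rows, one column index).
tic;
a = squeeze(m.data(:, 1, :));   % nRows-by-nCh
tA = toc;

% Read pattern B: analysis-shaped chunk (one row index, all columns).
tic;
b = squeeze(m.data(1, :, :));   % nCols-by-nCh
tB = toc;

fprintf('259000x6-style read: %.4f s, 94000x6-style read: %.4f s\n', tA, tB);
```

Swapping the first two dimensions when the file is built (so that the variable is nCols-by-nRows-by-nCh) and repeating both reads gives the second layout mentioned above.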
Thanks!
1 Comment
Rik
on 2 Nov 2018
Is it possible to either share some of the data or to write some code that generates representative data?
Answers (1)