Loading data to/from network drives
I have a lot of data in multiple files produced by a simulation that I now want to process in Matlab. I've looked before at the fastest ways to load the data, but I had not considered where I'm loading it from. Is it possible that the bottleneck in processing is that the data is stored on a network drive rather than locally? Similarly, will saving the processed data as a .mat file be slower if I'm saving it to a network drive?
Walter Roberson on 6 Oct 2020
Is it possible that the bottleneck in processing is that the data is stored on a network drive rather than locally?
Unless you are using a USB thumb drive (or a USB 2.0 drive attached to a hub), your hard drive speed is probably 100+ megaBYTES per second.
Network speeds can vary, but network drives are often located on public networks, so effective rates of 100 megaBITS per second are not uncommon. That is about 12.5 megabytes per second, roughly an eighth of the local rate.
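To put those two rates side by side, here is a rough back-of-the-envelope estimate. The data set size is a made-up figure, and the rates are just the ballpark numbers mentioned above, so treat the result as an illustration rather than a measurement.

```matlab
% Rough transfer-time estimate. All figures are illustrative assumptions.
dataSizeMB  = 2000;             % hypothetical data set size, in megabytes
localMBps   = 100;              % ballpark local disk rate, megabytes per second
networkMbps = 100;              % ballpark network rate, megabits per second
networkMBps = networkMbps / 8;  % 8 bits per byte

localSeconds   = dataSizeMB / localMBps;
networkSeconds = dataSizeMB / networkMBps;

fprintf('Local:   %.0f s\n', localSeconds);
fprintf('Network: %.0f s (%.0fx slower)\n', ...
    networkSeconds, networkSeconds / localSeconds);
```

This ignores latency and protocol overhead, which usually make the network case worse still.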
Also, network filesystems commonly need several round trips per file, so the network latency is paid several times for each file. There can be a big difference between loading a number of files separately and downloading a single .zip of the same files.
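One way to check this on your own setup is to time both approaches. The sketch below is only an assumption about how such a test might look: the folder names, file names, and dummy data are hypothetical, and a temporary local folder stands in for the network share so that the script runs as written. Point srcDir at your real network folder to get meaningful numbers.

```matlab
% Stand-in source folder with dummy files (replace srcDir with your network path).
srcDir = tempname;
mkdir(srcDir);
for k = 1:20
    x = rand(100);
    save(fullfile(srcDir, sprintf('run_%02d.mat', k)), 'x');
end

% Option A: load every file directly from the source folder.
files = dir(fullfile(srcDir, '*.mat'));
tic
for k = 1:numel(files)
    s = load(fullfile(files(k).folder, files(k).name));
end
tDirect = toc;

% Option B: transfer one archive, unpack it locally, then load.
% The archive is built outside the timed region; the comparison is only fair
% if it is created on the machine that hosts the files.
zipFile = [tempname '.zip'];
zip(zipFile, '*.mat', srcDir);
localDir = tempname;
tic
unzip(zipFile, localDir);
localFiles = dir(fullfile(localDir, '*.mat'));
for k = 1:numel(localFiles)
    s = load(fullfile(localFiles(k).folder, localFiles(k).name));
end
tBundled = toc;

fprintf('Direct: %.3f s, bundled: %.3f s\n', tDirect, tBundled);
```

With the local stand-in folder the two timings will be similar. Any difference should only show up when srcDir and zipFile really live on the network drive.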
Similarly, will saving the processed data as a .mat file be slower if I'm saving it to a network drive?
Yes, for sure. As far as I can tell, MATLAB does not prepare an exact copy of the final .mat file in memory and then write it all out in one go (a process that would involve a few network file setup calls and then just a bunch of data streaming). Instead, MATLAB appears to write one variable at a time, a process that can involve going back and updating parts of the file that are already written.
You will almost always get significantly better performance if you can write the file locally first (especially onto an SSD) and then copy the completed file to the remote drive.
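A minimal sketch of that save-locally-then-copy pattern is shown here. The variable name, file name, and destination folder are hypothetical placeholders. In particular, networkDir points at a folder under tempdir so the example runs anywhere, and you would replace it with the path of your network share.

```matlab
% Placeholder data to save.
results = struct('values', rand(500), 'label', 'demo');

% HYPOTHETICAL destination: replace with the path of your network drive.
networkDir = fullfile(tempdir, 'pretend_network_share');
if ~exist(networkDir, 'dir')
    mkdir(networkDir);
end

localFile  = [tempname '.mat'];                   % temporary file on local disk
remoteFile = fullfile(networkDir, 'results.mat');

% All of the incremental writes happen on the local disk.
save(localFile, 'results');

% One sequential transfer of the finished file.
[ok, msg] = copyfile(localFile, remoteFile);
if ok
    delete(localFile);                            % clean up the local copy
else
    warning('Copy failed, local file kept at %s: %s', localFile, msg);
end
```

Keeping the local file when the copy fails means a network hiccup does not cost you the processed results.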