Is mat.h thread-safe?

leptogenesis on 3 Feb 2013
So here's the problem--I have a directory containing about 10,000 matlab files, and I need to read them all into matlab. The thing is, this directory is mounted over a network. These are small files, and the overall network bandwidth is pretty good. However, the latency is terrible--on the order of 300ms per file. I've tested it, and asynchronous file reading gets me a speed-up of 10x or more in reading files like these. The thing is, matlab doesn't support asynchronous file reads out of the box, for reasons we can only imagine.
So, I'm going to try writing it myself in mex. The matlab docs say that you can get threading to work in a mex file, but that you can't interact with matlab, since the mex interface isn't thread-safe.
However, the interface for file reading in C doesn't actually use mex.h, it uses mat.h.
Furthermore, the examples they provide don't actually show any interaction with matlab--instead they load a .mat file inside an ordinary C main() function. This suggests that I can spawn multiple threads and perform multiple asynchronous loads via the matGetVariable interface, and then collect them back into the main mex workspace at the end. However, I'm not sure whether these calls to matGetVariable are going to step on each other. (I'm also not terribly familiar with pthreads; I know they CAN do stuff like this, but suggestions on how would be appreciated.)
To save everyone time, I've already thought about copying the files to the local filesystem (e.g. via the unix() command) as well as using the matlab parallel toolbox. Neither of these is a viable option, for various reasons.
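For the pthreads part of the question, here is a minimal sketch in C of overlapping many high-latency reads. It is an illustration under assumptions, not a tested mex solution: the names file_buf_t, worker_arg_t, slurp and read_files_concurrently are hypothetical, the workers use only C stdio and the system malloc, and no MATLAB API function is called from any thread. Turning the raw bytes into MATLAB variables is not shown and would still have to happen on the main thread.

```c
#include <assert.h>
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

/* Raw bytes of one file; data stays NULL if the read failed. */
typedef struct {
    unsigned char *data;
    size_t size;
} file_buf_t;

/* Per-worker description of which files to read (hypothetical layout). */
typedef struct {
    const char **paths;
    file_buf_t *out;
    size_t n_files;
    size_t first;   /* index of this worker's first file */
    size_t stride;  /* total number of workers */
} worker_arg_t;

/* Read a whole file into a buffer from the system malloc (not mxMalloc). */
static int slurp(const char *path, file_buf_t *buf)
{
    FILE *f = fopen(path, "rb");
    if (!f) return -1;
    if (fseek(f, 0, SEEK_END) != 0) { fclose(f); return -1; }
    long len = ftell(f);
    if (len < 0) { fclose(f); return -1; }
    rewind(f);
    unsigned char *p = malloc(len > 0 ? (size_t)len : 1);
    if (!p) { fclose(f); return -1; }
    size_t got = fread(p, 1, (size_t)len, f);
    fclose(f);
    if (got != (size_t)len) { free(p); return -1; }
    buf->data = p;
    buf->size = got;
    return 0;
}

/* Each worker reads files first, first+stride, first+2*stride, ... */
static void *worker(void *vp)
{
    worker_arg_t *a = vp;
    for (size_t i = a->first; i < a->n_files; i += a->stride)
        slurp(a->paths[i], &a->out[i]);
    return NULL;
}

/* Returns the number of files read successfully, or -1 on setup failure.
   The caller owns out[i].data and must free() each non-NULL buffer. */
static long read_files_concurrently(const char **paths, size_t n_files,
                                    file_buf_t *out, size_t n_threads)
{
    assert(n_threads > 0);
    pthread_t *tids = malloc(n_threads * sizeof *tids);
    worker_arg_t *args = malloc(n_threads * sizeof *args);
    int *started = calloc(n_threads, sizeof *started);
    if (!tids || !args || !started) {
        free(tids); free(args); free(started);
        return -1;
    }
    for (size_t i = 0; i < n_files; i++) { out[i].data = NULL; out[i].size = 0; }

    for (size_t t = 0; t < n_threads; t++) {
        args[t] = (worker_arg_t){ paths, out, n_files, t, n_threads };
        started[t] = pthread_create(&tids[t], NULL, worker, &args[t]) == 0;
        if (!started[t]) worker(&args[t]);  /* fall back to the calling thread */
    }
    for (size_t t = 0; t < n_threads; t++)
        if (started[t]) pthread_join(tids[t], NULL);

    long ok = 0;
    for (size_t i = 0; i < n_files; i++)
        if (out[i].data) ok++;
    free(tids); free(args); free(started);
    return ok;
}
```

With a few dozen workers, most of the 300ms round trips would overlap instead of running back to back. The thread count that actually helps depends on the network mount and would need to be measured.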
  4 Comments
Jan on 3 Feb 2013
I do not dare to post this as an answer, but it is at least an obvious and successful strategy: Do not store small data in 10'000 files when you access them over a network with a high latency. Better use one or a few large MAT files instead.
leptogenesis on 5 Feb 2013
I wish I could do that, but these files are being written in parallel by hundreds of machines. My setup is actually a lot like hadoop/mapreduce: I have thousands of objects of type X, thousands of objects of type Y, all of which are distributed among hundreds of machines. Each object of type X needs to send a small message to every object of type Y. In theory this could all be done over the network directly, but it turns out that's even more of a nightmare in matlab :-/

Accepted Answer

James Tursa on 3 Feb 2013
Edited: James Tursa on 3 Feb 2013
The restriction about not "interacting with MATLAB", as you put it, has to do mainly with the MATLAB memory manager, which is not thread-safe. Basically, anything you do that would cause the MATLAB memory manager to allocate/deallocate memory, or to alter memory that another thread depends on, will likely cause MATLAB to not work properly or crash. The function matGetVariable causes the MATLAB memory manager to allocate memory for a new variable, so it would likely not be thread-safe. Stuff you can get away with inside a thread in a mex routine would be functions like mxGetPr, mxGetPi, mxIsDouble, etc. I.e., functions that simply interrogate MATLAB memory and return a value without allocating/deallocating memory will work. Bottom line is I would be very surprised if your scheme works.
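To illustrate the split described above, here is a minimal sketch, not a verified recipe. All MATLAB API calls (mxGetPr, mxGetNumberOfElements, mxCreateDoubleScalar) stay on the main thread, and the worker threads only read the raw double data through a plain pointer. The names slice_t, sum_slice and parallel_sum are hypothetical, and the thread count of 4 is an arbitrary assumption. The mexFunction wrapper is compiled only when MATLAB_MEX_FILE is defined, which the mex command does.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define MAX_THREADS 16

/* One worker's share of the input; the data is never written to. */
typedef struct {
    const double *x;
    size_t begin, end;
    double partial;
} slice_t;

/* Worker: pure reads and arithmetic, no MATLAB API calls, no allocation. */
static void *sum_slice(void *vp)
{
    slice_t *s = vp;
    double acc = 0.0;
    for (size_t i = s->begin; i < s->end; i++) acc += s->x[i];
    s->partial = acc;
    return NULL;
}

/* Sum n doubles using up to MAX_THREADS worker threads. */
static double parallel_sum(const double *x, size_t n, size_t n_threads)
{
    assert(x != NULL || n == 0);
    pthread_t tids[MAX_THREADS];
    slice_t slices[MAX_THREADS];
    int started[MAX_THREADS];
    if (n_threads < 1) n_threads = 1;
    if (n_threads > MAX_THREADS) n_threads = MAX_THREADS;

    for (size_t t = 0; t < n_threads; t++) {
        slices[t].x = x;
        slices[t].begin = n * t / n_threads;
        slices[t].end = n * (t + 1) / n_threads;
        slices[t].partial = 0.0;
        started[t] = pthread_create(&tids[t], NULL, sum_slice, &slices[t]) == 0;
        if (!started[t]) sum_slice(&slices[t]);  /* fall back to this thread */
    }
    double total = 0.0;
    for (size_t t = 0; t < n_threads; t++) {
        if (started[t]) pthread_join(tids[t], NULL);
        total += slices[t].partial;
    }
    return total;
}

#ifdef MATLAB_MEX_FILE
#include "mex.h"

/* Main thread only: validate input, fetch the pointer, create the output. */
void mexFunction(int nlhs, mxArray *plhs[], int nrhs, const mxArray *prhs[])
{
    (void)nlhs;
    if (nrhs != 1 || !mxIsDouble(prhs[0]) || mxIsComplex(prhs[0]) ||
        mxIsSparse(prhs[0]))
        mexErrMsgIdAndTxt("sketch:input", "Expected one full real double array.");
    const double *x = mxGetPr(prhs[0]);
    size_t n = mxGetNumberOfElements(prhs[0]);
    double total = parallel_sum(x, n, 4);   /* workers are joined before return */
    plhs[0] = mxCreateDoubleScalar(total);  /* allocation after all threads exit */
}
#endif
```

The point of the layout is that every worker has been joined before anything asks the MATLAB memory manager for memory, so no allocation can overlap with a running thread.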
  1 Comment
leptogenesis on 5 Feb 2013
Ok, fine. Actually trying to compile their example convinces me that yes, using matGetVariable actually requires an entire matlab running in the background. Thanks for the help, James.
Most likely I will just use the parallel computing toolbox--that'll get me at least 12 parallel loads. But if anyone else has a better idea, I'd still like to hear it.
