Parfor reports error which does not exist when running as a for-loop

Hi,
To speed up some calculations I am using a parfor-loop. I have to run calculations on many files and I made a simple parfor-loop which runs a function on all these files. When analysis of one file is finished, the results are saved on disk. So, in principle, there is no communication between the different workers.
I have 12 workers (local) and for each worker the first run goes without problems. Then however I always get an error message like this (where this happens exactly can vary, but the type of message is always the same):
Error using parallel_function (line 598)
In an assignment A(:) = B, the number of elements in A and B
must be the same.
Error stack:
myfunc.m at 162
func>(parfor body) at 45
Error in func (line 14)
parfor ii=151:303
When I run the code in a for-loop, there is no error-message.
I have tried several things, but did not find a solution. The problem is that I can't debug this error, because it does not happen when I don't use parfor.
The only thing that works is to reduce the number of workers. When I choose 6 workers, the error doesn't show up.
My temporary solution was to start 2 Matlab sessions, give them each a pool of 6 workers and divide the work manually between the 2 Matlab sessions.
This solution however does not work. In the 2nd Matlab session, the old error appears again after a short while. I really don't understand what the problem is...
  10 Comments
Matt J on 25 Aug 2013
Edited: Matt J on 25 Aug 2013
"Therefore I strongly believe that the error has something to do with how matlab deals with running parallel computations... It can't have anything to do with this C{ii}."
It's still conceivable that both of the above are true simultaneously, i.e., a difference between parallel and serial modes of computation is causing the C{ii} to be read in corrupted in some cases.
We have to start by examining the C{ii} because we have nowhere else to start, and because ample evidence you provided points to it. The error message you posted says there is a dimension mismatch error. Furthermore, you insisted that this error is occurring in the line
C{ii}(C{ii}>0)=C{ii}(C{ii}>0)+prevmax;
That has to mean that prevmax is for some reason either empty or non-scalar some of the time. We must seek ways to trap that condition.
Laurent on 28 Aug 2013
Thanks for all your suggestions. In the end I decided to put everything in a try/catch/continue sequence, while storing the IDs of the failed files.
In this way an error does not stop the parfor loop and when it's finished I can just restart it on the failed files. Normally it runs perfectly fine the second time on these failed files.
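The workaround described above could look roughly like the following sketch. The names fileIDs, processFile and failedIDs are hypothetical, and processFile is a stand-in that fails on purpose for some IDs; the real per-file function from this thread is not shown. The handle is called through feval so that parfor does not mistake it for an indexed variable.

```matlab
% Hypothetical stand-ins: eight file IDs and a worker function that
% deliberately throws for every ID divisible by 3.
fileIDs = 1:8;
processFile = @(id) assert(mod(id, 3) ~= 0, ...
    'demo:simulatedFailure', 'Simulated failure for file %d', id);

n = numel(fileIDs);
failed = false(1, n);            % sliced output: one flag per iteration

parfor k = 1:n
    id = fileIDs(k);
    ok = true;
    try
        feval(processFile, id);  % feval avoids ambiguity with indexing
    catch err
        ok = false;              % remember the failure, keep looping
        fprintf('File %d failed: %s\n', id, err.message);
    end
    failed(k) = ~ok;
end

failedIDs = fileIDs(failed);     % candidates for a second pass
disp(failedIDs)
```

A second pass would then run the same loop over failedIDs only. Note that this hides the underlying error rather than explaining it.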


Answers (2)

Walter Roberson on 25 Aug 2013
You would get that problem if C{ii-1} was empty, leading to prevmax being empty.
Remember, when you have a parfor loop, the iteration for any particular value (e.g., #9) might be done at any time relative to the iteration for the previous value (#8 in this example), so the assignment to C{8}(C{8}>0) might not have been performed before iteration #9, which calls upon C{8}. Indeed, parfor usually starts from the end. This differs from a regular for loop.
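If the loop really does carry a running maximum from one cell to the next, one way to remove that order dependence is to split the work into two independent passes. This is only a sketch under assumptions: the contents of C and the meaning of prevmax (taken here as the sum of the maxima of all earlier cells) are guesses, and localMax and offsets are hypothetical names.

```matlab
% Hypothetical label arrays; 0 means "no label".
C = {[0 1 2], [0 1 1 3], [2 0 1]};
n = numel(C);

% Pass 1: each iteration only looks at its own cell.
localMax = zeros(1, n);
parfor ii = 1:n
    Ci = C{ii};
    localMax(ii) = max([0; Ci(:)]);   % the leading 0 guards against empty cells
end

% Serial step: the offset for cell ii is the sum of the earlier maxima.
offsets = [0, cumsum(localMax(1:end-1))];

% Pass 2: apply the precomputed offsets, again independently per cell.
parfor ii = 1:n
    Ci = C{ii};
    Ci(Ci > 0) = Ci(Ci > 0) + offsets(ii);
    C{ii} = Ci;
end
```

Because no iteration reads a value written by another iteration, the result does not depend on the order in which the workers run.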
  3 Comments
Walter Roberson on 25 Aug 2013
Put in a try/catch that reports the size of prevmax when the problem is triggered.
Matt J on 25 Aug 2013
In parallel mode, you'll probably need to do
disp(prevmax)
to report prevmax.
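Combining the two suggestions above, a trap might be sketched as follows. The data in C and the way prevmax is computed are assumptions made for illustration (the original myfunc.m is not shown), and report is a hypothetical variable that keeps the messages after the loop ends.

```matlab
% Hypothetical data: the second cell is empty on purpose, so prevmax
% is empty when ii = 3.
C = {[0 1 2], [], [0 4 1]};
n = numel(C);
report = cell(1, n);             % sliced output: one message per iteration

parfor ii = 2:n
    prev = C{ii-1};
    cur  = C{ii};
    prevmax = max(prev(:));      % empty whenever prev is empty
    msg = '';
    try
        cur(cur > 0) = cur(cur > 0) + prevmax;
    catch err
        % Record the loop index and the size of prevmax with the error text.
        msg = sprintf('ii=%d: size(prevmax)=[%s]: %s', ...
            ii, num2str(size(prevmax)), err.message);
        disp(msg);               % shown in the client command window
    end
    report{ii} = msg;
end
```

Iterations that succeed leave an empty entry in report, so the non-empty entries identify which indices hit the size problem.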



Matt J on 25 Aug 2013
You might also consider using PMODE to troubleshoot. This will allow you to step through different commands and see their results in the parallel command window.
