Parfor reports error which does not exist when running as a for-loop
4 views (last 30 days)
Show older comments
Hi,
To speed up some calculations I am using a parfor-loop. I have to run calculations on many files and I made a simple parfor-loop which runs a function on all these files. When analysis of one file is finished, the results are saved on disk. So, in principle, there is no communication between the different workers.
I have 12 workers (local) and for each worker the first run goes without problems. Then however I always get an error message like this (where this happens exactly can vary, but the type of message is always the same):
Error using parallel_function (line 598)
In an assignment A(:) = B, the number of elements in A and B
must be the same.
Error stack:
myfunc.m at 162
func>(parfor body) at 45
Error in func (line 14)
parfor ii=151:303
When I run the code in a for-loop, there is no error-message.
I have tried several things, but did not find a solution. The problem is that I can't debug this error, because it does not happen when I don't use parfor.
The only thing that works is to reduce the amount of workers. When I choose 6 workers, the error doesn't show up.
My temporary solution was to start 2 Matlab sessions, give them each a pool of 6 workers and divide the work manually between the 2 Matlab sessions.
This solution however does not work. In the 2nd Matlab session, the old error appears again after a short while. I really don't understand what the problem is...
10 Comments
Matt J
on 25 Aug 2013
Edited: Matt J
on 25 Aug 2013
Therefore I strongly believe that the error has something to do with how matlab deals with running parallel computations... It can't have anything to do with this C{ii}.
It's still conceivable that both of the above are true simultaneously, i.e., a difference between parallel and serial modes of computation is causing the C{ii} to be read in corrupted in some cases.
We have to start by examining the C{ii} because we have nowhere else to start, and because ample evidence you provided points to it. The error message you posted says there is a dimension mismatch error. Furthermore, you insisted that this error is occurring in the line
C{ii}(C{ii}>0)=C{ii}(C{ii}>0)+prevmax;
That has to mean that prevmax is for some reason either empty or non-scalar some of the time. We must seek ways to trap that condition.
Answers (2)
Walter Roberson
on 25 Aug 2013
You would get that problem if C{ii-1} was empty, leading to prevmax being empty.
Remember, when you have a parfor loop, the iteration for the any particular value (e.g., #9) might be done at any time relative the iteration for the previous value (#8 in this example), so the assignment to C{8}(C{8}>0) might not have been performed before iteration #9 that calls upon C{8}. Indeed, parfor usually starts from the end. This differs from regular for.
3 Comments
Walter Roberson
on 25 Aug 2013
Put in a try/catch that reports the size of prevmax when the problem is triggered
Matt J
on 25 Aug 2013
In parallel mode, you'll probably need to do
disp(prevmax)
to report prevmax.
Matt J
on 25 Aug 2013
You might also consider using PMODE to troubleshoot. This will allow you to step through different commands and see their results in the parallel command window.
2 Comments
See Also
Categories
Find more on Parallel Computing Fundamentals in Help Center and File Exchange
Products
Community Treasure Hunt
Find the treasures in MATLAB Central and discover how the community can help you!
Start Hunting!