Parfor overhead: local cores vs. cluster cores
I have a parfor loop that takes as input data from a very large cell array, where every element of the cell array is eventually used over the course of the loop. This process takes about 150 seconds when computed on 20 local cores, but about 500 seconds when computed on 20 cluster cores (I have 100 cores on the cluster, which I would like to use for scaling).
Two questions:
1) Is it safe to assume that this time difference is due to network communication latency?
2) If the answer to (1) is yes, is there any way to send the data in the cell array more efficiently? As a highly simplified example of what I currently have:
for model_it = 1:100
    % Some operations to create cell1, which is of length k.
    output_store = cell(1, k);   % preallocate the sliced output variable
    parfor ih = 1:k
        temp = cell1{ih};        % cell1 is indexed by the loop variable, so it is sliced
        out = f(temp);           % some operations done to temp
        output_store{ih} = out;
    end
    % Some operations that use output_store to create the inputs for cell1 on the next model_it.
end
I do not believe parallel.pool.Constant is an option here because the data in cell1 changes on every model iteration. Do I have other options for setting up this problem?
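For question (1), one possible check is to measure how much data a single parfor pass moves to and from the workers, using ticBytes and tocBytes from Parallel Computing Toolbox. The following is a minimal sketch and not part of the original code: the stand-in data, the stand-in function f, and the sizes are assumptions chosen only for illustration, and it assumes a parallel pool is available (gcp starts one under default preferences).

```matlab
% Sketch: measure the data transferred by one parfor pass.
% The data, f, and k below are hypothetical stand-ins for the real problem.
pool = gcp;                          % current pool (started if none exists, by default)

k = 200;                             % hypothetical number of cell elements
cell1 = arrayfun(@(i) rand(500), 1:k, 'UniformOutput', false);  % stand-in data
f = @(x) sum(x(:));                  % stand-in for the real per-element work
output_store = cell(1, k);           % preallocate the sliced output

ticBytes(pool);                      % start counting bytes on every worker
tStart = tic;
parfor ih = 1:k
    output_store{ih} = f(cell1{ih});
end
elapsed = toc(tStart);
bytes = tocBytes(pool);              % one row per worker: [sent to, received from]

totalMB = sum(bytes(:)) / 2^20;
fprintf('Loop time: %.1f s, data moved: %.1f MB\n', elapsed, totalMB);
```

Running the same measurement on the local pool and on the cluster pool, with the real cell1 and f in place of the stand-ins, would show whether the volume of data moved is large enough to plausibly explain the difference in run time.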