Parfor overhead: local cores vs. cluster cores

Brandon on 19 May 2021
Commented: Edric Ellis on 20 May 2021
I have a parfor loop that takes its input from a very large cell array; every element of the cell array is eventually used over the course of the loop. The loop takes about 150 seconds on 20 local cores, but about 500 seconds on 20 cluster cores. (The cluster has 100 cores, which I would like to use for scaling.)
Two questions:
1) Is it safe to assume that this time difference is due to network communication latency?
2) If the answer to (1) is yes, is there a more efficient way to send the data in the cell array? As a highly simplified example of what I currently have:
for model_it = 1:100
    % some operations to create cell1, which is of length k
    parfor ih = 1:k
        temp = cell1{ih};        % cell1 is sliced: each worker receives only its own elements
        out = f(temp);           % some operations done to temp
        output_store{ih} = out;
    end
    % some operations that use output_store to create the inputs for cell1 on the next model_it
end
I do not believe parallel.pool.Constant is an option here because the data in cell1 changes on every model iteration. Do I have other options for setting up this problem?
  1 Comment
Edric Ellis on 20 May 2021
Try using ticBytes and tocBytes to see just how much data is being sent. Is there any way you can invert things to run parfor as the outer loop?
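
A minimal sketch of the measurement suggested above, assuming Parallel Computing Toolbox is available. The contents of cell1, the value of k, and the function f are hypothetical stand-ins for the real data and computation, which the question does not show:

```matlab
% Hypothetical stand-ins for the poster's data and function (assumptions).
k = 8;
cell1 = arrayfun(@(i) rand(200), 1:k, 'UniformOutput', false);
f = @(x) sum(x, 1);

pool = gcp;                  % reuse the current pool, or start one with default settings
output_store = cell(1, k);

ticBytes(pool);              % start counting bytes moved to and from the workers
tic
parfor ih = 1:k
    output_store{ih} = f(cell1{ih});
end
elapsed = toc;
bytes = tocBytes(pool);      % one row per worker: [bytes sent to worker, bytes received from worker]

fprintf('Loop time: %.2f s, total transferred: %.1f MB\n', ...
    elapsed, sum(bytes(:)) / 1e6);
```

Running the same measurement on the local pool and on the cluster pool shows whether the volume of data is the same in both cases. If it is, the extra time on the cluster is more likely to come from the cost of moving that data over the network than from the computation itself.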


