- For very small operations, like A = B + C (B = 2x2, C = 2x2), the compiler would need to make arrangements for parallel code, running the code on separate cores, and then joining the threads. This can have a little more overhead than the serial execution.
- For very large number of threads, the OS would be busy in switching the threads, saving the state of each thread, and then loading the thread of each state again. This would result in heavy time loss.
Pcg and Parallel Computing Toolbox
3 views (last 30 days)
Show older comments
Hi MATLAB community, I know that function pcg is supported in the Parallel Computing Toolbox for use in data parallel computations with distributed arrays, i am using a HPC architecture that it's made of 8 nodes, each blade consists of 2 quadcore processors sharing memory for a total of 8 cores and of 64 cores, in total. I run pcg on 1 core and pcg with distributed arrays on 32 cores.
tic
[y]=pcg(A,b,[],100); %first case
toc
A=distributed(A);
b=distributed(b);
tic
[x,flagCG_1,iter] = pcg(@(x)gather(A*x),b,[],100); %second case on 32 cores
toc
i obtained that Elapsed time is 0.001279 seconds. %first case Elapsed time is 0.316632 seconds. %second case on 32 cores
why the time in second case is greater than the time in first case? what am I doing wrong? I tried with larger size matrices but the time in second case is always greater than the time in first case, i probably don't use pcg correctly for distributed arrays. Thanks for your help
0 Comments
Answers (2)
Gaurav Garg
on 16 Sep 2020
Hey Rosalba,
Using Parallel Computing toolbox for very small problems or for large number of threads can prove to be of no use. That's because -
In your problem, the former argument seems to be the case.
As a workaround, you can either run the code without use of distributed arrays or try using parfor loop (code snippet given below) -
parfor i = 1:iter
[x] = f(k,l);
end
Note that the function f should not have any data dependency among the iterations.
0 Comments
Oli Tissot
on 11 Sep 2020
You can simply do:
dA = distributed(A);
db = distributed(b);
[dx, flag, iter] = pcg(dA, db, [], 100); % dx is a distributed array
x = gather(dx);
But you should be aware that distributed arrays are not designed to be faster than in-memory arrays, they are designed to process arrays that would not fit in your local memory. In practice, operations on distributed arrays are usually slower because of the extra cost for communication, but if the matrix is large enough these operations could not be performed at all otherwise! If you are interested in performance, you may try to use gpuArray -- pcg is supported for gpuArray.
4 Comments
Steven Lord
on 14 Sep 2020
How large a problem are you trying to solve? If you're trying to solve small problems, using parallel code may not save you any time as you've seen because the overhead of setting up the problem in parallel may outweigh any savings you get from running in parallel.
Picture you and three young children go to the grocery store (before COVID-19 of course.) If you split the four items on your shopping list among your group are you really saving time? Maybe, but if you do save time you probably don't save a lot of time. And depending on how mature the children are, you may very well lose time. How about if you split the 100 items on your list into four segments? Then you may save more time than the four item list case.
See Also
Categories
Find more on Parallel Computing Fundamentals in Help Center and File Exchange
Community Treasure Hunt
Find the treasures in MATLAB Central and discover how the community can help you!
Start Hunting!