Parfor with GPUs crashes

5 views (last 30 days)
Anton Baranikov
Anton Baranikov on 27 Feb 2023
Commented: Raymond Norris on 14 Mar 2023
Hello, everobody!
I have a code, that uses GPUs. I would like to use this code in parallel for different settings, i.e. code(setting=1),code(setting=2),code(setting=3) etc. For that I am implementing a parfor loop on a Linux-based high performance cluster (HPC).:
parfor i=1:N
code(setting=i)
end
However, it often crashes, especially when number of workers N is larger (more than 4-5). Typically, the crash is followed by shutting down Matlab with "Bus error" or "Fatal error" in the terminal.
What I do in general is the following. Firstly, I request the necessary resources: N workers with sufficient memory and a gpu per worker. Then I check that I do have a GPU per worker by :
spmd
gpuDeviceCount
end
After that, I initialzie the parpool with:
c=parcluster;
c.NumWorkers=N;
parpool(N)
And then I run my code. Note that an individual job with one GPU (without parfor loop) works perfectly. Also, it almost always work for 2-3 workers in parallel.
  3 Comments
Anton Baranikov
Anton Baranikov on 27 Feb 2023
@Raymond Norris, this is the command I use e.g. for 5 workers:
qsub -I -X -lselect=5:ncpus=4:ngpus=1:mem=20gb,software=matlab
yes I do a local pool. I tried to make PBSProProfile but the outcome was the same.
Raymond Norris
Raymond Norris on 14 Mar 2023
This is requesting 5 chunks, with 4 cores and 1 GPU per chunk. But this doesn't ensure that the 5 chunks are on the same node. I also wonder why you're requesting 5 chunks? If you're running a local pool, you only need 1 chunk. Try the following:
qsub -I -X -l select=1:ncpus=4:ngpus=1:mem=20gb,software=matlab
Then in MATLAB run
pctconfig('preservejobs',true);
setenv('MDCE_DEBUG','true')
local = parcluster("local");
pool = local.parpool(4);
% Run your parallel code
If/when the pool crashes,
local.getDebugLog(local.Jobs(end))

Sign in to comment.

Answers (0)

Categories

Find more on Parallel Computing Fundamentals in Help Center and File Exchange

Tags

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!