Matlab parfor uses fewer cores than the allocated number of cores

I'm running a parallel Matlab (R2020a) job on a single node of a remote cluster. Each node of the cluster has 2 processors with 24 cores each, for a total of 48 cores per node. The job contains some sequential code followed by a single parfor loop. I run it using a slurm bash script.
The bash script test.sh is:
#!/bin/bash
#
########## Begin Slurm header ##########
#
# Give job a reasonable name
#SBATCH -J test_1
#
# Request number of nodes and CPU cores per node for job
#SBATCH --nodes=1
# Request number of tasks (processes) per node
# (determines number of workers in process-based parpool)
#SBATCH --tasks-per-node=48
# Estimated wallclock time for job
#SBATCH -t 1-00
#
# Send mail when job begins, aborts and ends
#SBATCH --mail-type=ALL
#
########### End Slurm header ##########
echo "Submit Directory: $SLURM_SUBMIT_DIR"
echo "Working Directory: $PWD"
echo "Running on host $HOSTNAME"
echo "Job id: $SLURM_JOB_ID"
echo "Job name: $SLURM_JOB_NAME"
echo "Number of nodes allocated to job: $SLURM_JOB_NUM_NODES"
echo "Number of cores allocated to job: $SLURM_NPROCS"
echo "Number of requested tasks per node: $SLURM_NTASKS_PER_NODE"
# Load module
module load math/matlab/R2020a
# Create a local working directory on scratch
mkdir -p "$SCRATCH/$SLURM_JOB_ID"
# Start a Matlab program
matlab -nodisplay -batch test_1 > test_1.out 2>&1
# Cleanup local working directory
rm -rf "$SCRATCH/$SLURM_JOB_ID"
exit
The Matlab script is:
% Create parallel pool
pc = parcluster('local');
pc.JobStorageLocation = strcat(getenv('SCRATCH'),'/',getenv('SLURM_JOB_ID'));
num_workers = str2double(getenv('SLURM_NPROCS'));
parpool(pc,num_workers);
% Body of the script
% Choose deterministic parameters
free_points = 845000;
pulse_points = 1300000;
dt = 2e-11;
num_freqs = 200;
freqs = linspace(-1,1,num_freqs);
rhoi = rand(72);
rhoi = rhoi + rhoi';
rhoi = rhoi/trace(rhoi);
% Iterate over random parameters
num_pars = 5;
res = zeros(num_pars,num_freqs);
for n=1:num_pars
    disp('=====');
    disp(['N = ',num2str(n)]);
    disp('=====');
    timer = tic;
    % Random parameters
    H = rand(size(rhoi));
    H = (H + H')/2;
    L1 = rand(size(rhoi));
    L2 = rand(size(rhoi));
    L3 = rand(size(rhoi));
    L4 = rand(size(rhoi));
    L5 = rand(size(rhoi));
    % Equation to solve
    ME = @(rhot, t, w) -1i*w*(H*rhot - rhot*H) + (L1*rhot*L1' - (1/2)*rhot*L1'*L1 - (1/2)*L1'*L1*rhot) ...
        + (L2*rhot*L2' - (1/2)*rhot*L2'*L2 - (1/2)*L2'*L2*rhot) ...
        + (L3*rhot*L3' - (1/2)*rhot*L3'*L3 - (1/2)*L3'*L3*rhot) ...
        + (L4*rhot*L4' - (1/2)*rhot*L4'*L4 - (1/2)*L4'*L4*rhot) ...
        + (L5*rhot*L5' - (1/2)*rhot*L5'*L5 - (1/2)*L5'*L5*rhot);
    % Solve equation
    % IF I CHANGE TO 'for j = 1:1', ALL WORKERS ARE USED!!! MEMORY?
    for j = 1:free_points
        rhoi = RK4(@(rho, t) ME(rho, t, 0), rhoi, j, dt);
    end
    t = toc(timer);
    disp(['Mid duration ',num2str(t),'s']);
    parfor k=1:num_freqs
        w = freqs(k);
        rhop = rhoi;
        for j=1:pulse_points
            rhop = RK4(@(rho, t) ME(rho, t, w), rhop, j, dt);
        end
        for j=1:free_points
            rhop = RK4(@(rho, t) ME(rho, t, 0), rhop, j, dt);
        end
        occ(k) = rhop(1,1);
    end
    % Store result
    res(n,:) = occ;
end
save('res','res');
% Delete the parallel pool
delete(gcp('nocreate'));
% Local functions
function [rho] = RK4(F, rho, k, h)
    k1 = F(rho, k*h);
    k2 = F(rho+h*k1/2, (k+1/2)*h);
    k3 = F(rho+h*k2/2, (k+1/2)*h);
    k4 = F(rho+h*k3, (k+1)*h);
    rho = rho+(1/6)*h*(k1+2*k2+2*k3+k4);
end
The slurm output is:
#
# SOME PERSONAL INFO HERE...
#
Number of nodes allocated to job: 1
Number of cores allocated to job: 48
Number of requested tasks per node: 48
IMPORTANT: The MATLAB Academic site license is available to employees and
enrolled students of the the universities of (CENSORED).
The license is available for teaching or research only.
Commercial applications are not permitted.
and the Matlab output is:
Starting parallel pool (parpool) using the 'local' profile ...
Connected to the parallel pool (number of workers: 48).
=====
N = 1
=====
Mid duration 3608.9535s
Starting parallel pool (parpool) using the 'local' profile ...
Connected to the parallel pool (number of workers: 12).
#
# REST OF OUTPUT HERE...
#
As you can see, a pool of 48 workers is created when the Matlab script starts. But when the parfor loop finally starts, the pool is restarted with only 12 workers.
I noticed that this only happens if the loops run for long enough, and that includes the non-parfor loops. For instance, if I reduce the first for loop to a single iteration, the pool does not restart. So I think it may have to do with memory usage somehow...?
Any idea what is happening and how I can get Matlab to use all 48 cores that were allocated?
EDIT: Another thing I've tried is to remove the `parpool` command and pass the cluster to the `parfor` loop as `parfor (k=1:num_freqs,pc)`. When I do this, Matlab uses one fourth of the workers no matter the size of my loop. I'll just try to contact the admins directly...

Accepted Answer

Edric Ellis on 28 Jun 2021
I bet your parallel pool is timing out while it sits idle during the long serial section that runs before each parfor loop. It then gets auto-created with size 12, as that is the default value of the "preferred number of workers in a parallel pool" preference (doc). Personally, I don't much care for that preference: I always set the value to 99999 and let other things control the size of the pool. In your case, though, you might not be able to do that if your SLURM workers don't share a MATLAB preferences directory (prefdir) with your client.
I suggest you create your pool of size 48 with an IdleTimeout of Inf, like this:
num_workers = str2double(getenv('SLURM_NPROCS'));
parpool(pc,num_workers,'IdleTimeout',Inf);
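This fits the log in the question: the serial section took about 3609 s, which is longer than the pool's idle timeout (30 minutes by default, as far as I know), so the 48-worker pool was already gone when parfor started. As an extra safeguard, you could check the pool right before each parfor loop and recreate it if it is missing or has the wrong size. The following is only a minimal sketch: ensurePool is a hypothetical helper name, not a MATLAB function, and it assumes pc and num_workers are defined as in the script above.

```matlab
function pool = ensurePool(pc, numWorkers)
% ensurePool  Hypothetical helper (not part of MATLAB): return a pool with
% exactly numWorkers workers, recreating it if it is missing or mis-sized.
    pool = gcp('nocreate');            % current pool, or empty if none exists
    if isempty(pool) || pool.NumWorkers ~= numWorkers
        delete(pool);                  % does nothing when pool is empty
        % Inf disables the idle shutdown for the new pool
        pool = parpool(pc, numWorkers, 'IdleTimeout', Inf);
    end
end
```

Calling ensurePool(pc, num_workers); immediately before the parfor line would then guarantee the loop never falls back to an auto-created pool of the default size. With IdleTimeout already set to Inf the check should never trigger, but it makes a silent downgrade impossible.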
