Cluster uses only one node, even though 5 nodes Running (parfor)

Question

Science Machine on 4 Oct 2022

Direct link to this question

https://ch.mathworks.com/matlabcentral/answers/1817550-cluster-uses-only-one-node-even-though-5-nodes-running-parfor

Commented: Raymond Norris on 17 Oct 2022

I am running on our cluster an .m file which contains parfor loops.

Using a cluster config file (below), I request 5 nodes, each with 48 processors, and open a pool with parpool(5*48) workers.

I am able to logon to each node individually and monitor its use with htop.

I can see that MATLAB is indeed running on each node. But judging by RAM usage, only 1 node seems to be doing any work. If I increase the size of the parfor loop, the process crashes.

Other posts mention setting a parameter with LASTN = maxNumCompThreads('automatic'). This is equal to 30 before I set anything. I have tried setting it to the total number of workers, e.g. numProcs*numNodes = 48*5.

Here is the node that's being used: 111-150 GB of RAM is in use. When the parfor loop is too big, that node's RAM maxes out at 187 GB and MATLAB crashes. Perhaps noteworthy: this is not the head node.

Here is an example of an unused node. Note that 34 GB is the total RAM usage when MATLAB is idle.

This is the config file I run before I run my job. To confirm the settings before the run, I can query e.g. c.AdditionalProperties.NumNodes and ... .NumWorkers and get the expected values (NumNodes = 5, NumWorkers = 5*48).

clear;clc
allNames = parallel.clusterProfiles()
rehash toolbox
configCluster
%/usr/local/MATLAB/R2019
c=parcluster;
c.AdditionalProperties.WallTime = '24:20:0';
c.AdditionalProperties.QueueName = 'CAC48M192_L';
c.AdditionalProperties.AccountName = 'redacted';
nn = 5;
pp = 48;
c.AdditionalProperties.NumNodes = nn;
c.AdditionalProperties.ProcsPerNode = pp; 
c.AdditionalProperties.NumWorkers = pp*nn;
c.saveProfile
%end

1 Comment

Science Machine on 4 Oct 2022

I'm starting to think this has to do with what parfor does, which is simply calling OpenMP (omp). Isn't something more than omp needed for multiple nodes, like actual MPI calls? In that case, it seems MATLAB is incapable of running on more than 1 node, even with Parallel Server. Or does Parallel Server somehow extend parfor (and thus omp)?

Answer 1

Science Machine on 17 Oct 2022

Direct link to this answer

https://ch.mathworks.com/matlabcentral/answers/1817550-cluster-uses-only-one-node-even-though-5-nodes-running-parfor#answer_1077443

I realized I needed to explicitly pass the number of procs to parfor, like this:

totalProcCount=2*48 % 2 nodes, each node has 48 processors
parfor (i=1:100, totalProcCount) % explicitly pass the total proc count 
    % do something
end 

This is the only way I'm able to get it to act on more than 1 node. I monitored that with

bb = get(getCurrentTask(),'ID');
display(bb)
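
A slightly fuller version of that check, as a sketch only: it assumes a pool is already open on the cluster (gcp would otherwise start one with the default profile) and that the hostname command exists on the workers. The variable names are illustrative. It records which worker and which host handled each iteration, so the number of distinct nodes that took part can be counted.

```matlab
pool = gcp;                        % current pool (assumption: already started on the cluster)
n = 4 * pool.NumWorkers;           % a few iterations per worker
ids = zeros(n, 1);                 % task ID of the worker that ran iteration i
hosts = cell(n, 1);                % hostname of the node that ran iteration i
parfor i = 1:n
    t = getCurrentTask();          % task object of the worker running this iteration
    ids(i) = t.ID;
    [~, h] = system('hostname');   % ask the OS which node this worker is on
    hosts{i} = strtrim(h);
    pause(0.5);                    % short delay so the work spreads over the pool
end
fprintf('%d workers on %d hosts took part\n', ...
    numel(unique(ids)), numel(unique(hosts)));
disp(unique(hosts))
```

If only one hostname is listed even though the pool spans several nodes, the loop really is confined to a single node.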

2 Comments

Raymond Norris on 17 Oct 2022

In this case, totalProcCount tells parfor the maximum number of workers to allocate to the loop. You might do this when 100 workers are running but you only want 50 to be used. If you don't specify it, all 100 workers may be used.

Take the following example of running a local pool, which clearly can't run across >1 node.

local = parcluster("local");
pool = local.parpool(4);
tic
parfor (idx = 1:16,2)
    pause(2)
end
toc

Run this and you'll see it takes ~16.5s (instead of the 8+s). You can verify that all the workers are on the same node (which of course they have to be):

spmd, [~,hn] = system('hostname'), end

This is just to show that the worker count passed to parfor (totalProcCount in your code) doesn't control which nodes the workers run on.
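
The timings quoted for the local-pool example follow from simple arithmetic. As a rough sketch (it ignores scheduling overhead and assumes the iterations are spread evenly over the workers), the 16 iterations of 2 s each take about 16 s on 2 workers and about 8 s on 4:

```matlab
iterations = 16;     % loop length from the example above
pauseTime  = 2;      % seconds per iteration
for workers = [2 4]  % with the parfor limit of 2, and with the full pool of 4
    estimate = ceil(iterations / workers) * pauseTime;
    fprintf('%d workers: about %d s plus overhead\n', workers, estimate);
end
```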

Science Machine on 17 Oct 2022

Edited: Science Machine on 17 Oct 2022

I see. 100 workers is less efficient because in general we want the number of workers to equal the number of procs, right? But if I am using, let's say, 96 processors (allocated on two nodes, each node having 48 procs), then I want 96 workers, right? Without specifying the worker-count parameter in the parfor, I could see that 96 MATLAB processes were spawned; however, only 48 processors were used, based on htop's CPU and RAM output. With the parameter (totalProcCount) in the parfor, I could see not only that e.g. 98 processes were spawned, but also that RAM and CPU on each node were being used by MATLAB.

I am using:

rehash toolbox
configCluster
%/usr/local/MATLAB/R2019
c=parcluster;
nn = 6;
pp = 48;
c.AdditionalProperties.NumNodes = nn;
c.AdditionalProperties.ProcsPerNode = pp; 
c.AdditionalProperties.NumWorkers = pp*nn;
c.AdditionalProperties.MemUsage= '2Gb';

Answer 2

Raymond Norris on 6 Oct 2022

Direct link to this answer

https://ch.mathworks.com/matlabcentral/answers/1817550-cluster-uses-only-one-node-even-though-5-nodes-running-parfor#answer_1068400

Starting a parpool will create very little activity for the workers. Only once you run a parallel construct (e.g., parfor) will any work be given to the workers.

Increasing the parfor loop shouldn't crash the workers. What work are you doing? Did you request enough memory (c.AdditionalProperties.MemUsage)? Or do you mean increasing the parallel pool?

I can't tell where you're running MATLAB (your machine, head node, compute node?), but I suspect if the maxNumCompThreads is 30, and there are 48 cores per node, you're not running MATLAB on the compute node. You don't want thread count to be greater than the number of cores on a node (since they won't spawn across nodes). To set the thread count, call

c.NumThreads = 48;

This will then force each worker to run on its own node.

Workers are capable of running on more than one node. Post a bit more context and we can figure out what's going on.

Are you running MATLAB on your machine, head node, or compute node?
Are you submitting jobs with parpool or batch?
What size pool are you starting (i.e., parpool(X) or batch(.., 'Pool', X))?
Can you provide the code you're trying to run?

5 Comments

Raymond Norris on 12 Oct 2022

Hi @Science Machine. If you have access to 5 nodes with 48 cores/node then, yes, you could run a parallel pool of 240 workers. Keep in mind that the threads MATLAB will spawn and the processes that the workers start all share the 240 cores you request. Additionally, the threads don't run across nodes, only the workers. Let me give a couple of specific examples.

c = parcluster;
c.NumThreads = 2;
pool = c.parpool(120);

In this case, we start 120 workers (let's say over 5 nodes), with each worker having access to 2 threads (these could be used for fft, eig, etc.). It's important that NumThreads is not greater than 48, the number of cores on a node. For instance, this should fail to submit:

c = parcluster;
c.NumThreads = 120;
pool = c.parpool(2);

In all likelihood, MemUsage is per core.
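
The two examples above can be checked with a little arithmetic before submitting. This is a sketch only, using the 5 nodes and 48 cores per node mentioned in this thread; the variable names are illustrative, and the memory line assumes MemUsage = '2Gb' (as in the config posted in this thread) is counted per core, which is only a guess here.

```matlab
nodes = 5;  coresPerNode = 48;          % allocation discussed in this thread
configs = [120 2;                       % 120 workers x 2 threads each
             2 120];                    % 2 workers x 120 threads each
for k = 1:size(configs, 1)
    workers = configs(k, 1);
    threads = configs(k, 2);
    coresNeeded = workers * threads;    % workers and their threads share the cores
    fitsNode  = threads <= coresPerNode;             % threads cannot span nodes
    fitsAlloc = coresNeeded <= nodes * coresPerNode; % total core budget
    fprintf('%3d workers x %3d threads: needs %d cores, fits node: %d, fits allocation: %d\n', ...
        workers, threads, coresNeeded, fitsNode, fitsAlloc);
end

memPerCoreGB = 2;                       % assumption: MemUsage = '2Gb', counted per core
fprintf('Memory requested per full node: %d GB\n', memPerCoreGB * coresPerNode);
```

Both configurations ask for 240 cores in total, but only the first keeps each worker's threads within a single 48-core node.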

I have a couple of comments regarding your code

I'm assuming this is all one single function, pl. For readability, please auto-indent your code and repost. I can then take a closer look.
You have two parfor-loops. One for reading the circles and the other for the cross-spectral matrix. I don't know how much work (time/memory) is required for either of these, so it's hard to say what the RAM would be like.
It looks like you want to parallelize the nested for-loops c and mm. You can only parallelize one of them.

Before running this on the cluster, I'm assuming you've run this with a local pool and the mm parfor runs successfully?

Science Machine on 16 Oct 2022

If I submit a job with pool=c.parpool(N), where N is anything but 1, I get the error,

Error using parallel.Cluster/parpool (line 86)
Parallel pool failed to start with the following error. For more detailed information, validate the profile
'RedactedClusterProfileName' in the Cluster Profile Manager.
Caused by:
    Error using parallel.internal.pool.InteractiveClient>iThrowWithCause (line 678)
    Failed to start pool.
        Error using parallel.Job/submit (line 351)
        Job submission failed because the integration function 'communicatingSubmitFcn.m' errored.
            Error using communicatingSubmitFcn (line 122)
            Submit failed with the following message:
            sbatch: error: CPU count per node can not be satisfied
            sbatch: error: Batch job submission failed: Requested node configuration is not available

Raymond Norris on 17 Oct 2022

Slurm is telling you that your combination of cores, threads, and nodes is not available. By the looks of it, I'd say someone on our team helped implement what you're doing (i.e., using configCluster). If you're interested, contact Technical Support (support@mathworks.com) with your contact info. They in turn can get hold of me. We can work this out offline.
