Parpool Fail 2015a HPC

Mohammad Sayyafzadeh on 7 Nov 2015
Commented: Darwin on 17 Oct 2016
Hi,
We have recently purchased an HPC machine for simulation research. It has 64 cores (4 AMD Opteron CPUs) and 256 GB of RAM. The OS is CentOS 6 Linux.
We have installed MATLAB R2015a 64-bit on the machine.
I am trying to open 64 parallel workers with the parpool(64) command, but it fails with an error, and the cluster profile cannot even be validated. When I reduce the number to 25 it works, but any number above 25 gives the error. The error output is attached. I would really appreciate any help with this. It is worth mentioning that on the same machine under Windows 10 I can use all 32 cores and 32 hyperthreads (Windows detects only 2 physical CPUs), which means that in Windows I can open up to 64 workers.
Stage: SPMD job test (createCommunicatingJob)
Status: Failed
Description:The job errored or did not reach state finished.
Command Line Output:(none)
Error Report:(none)
Debug Log:
LOG FILE OUTPUT:
[14] < M A T L A B (R) >
[14] Copyright 1984-2015 The MathWorks, Inc.
[14] R2015a (8.5.0.197613) 64-bit (glnxa64)
[14] February 12, 2015
[21] < M A T L A B (R) >
[21] Copyright 1984-2015 The MathWorks, Inc.
[21] R2015a (8.5.0.197613) 64-bit (glnxa64)
[21] February 12, 2015
[6] < M A T L A B (R) >
[6] Copyright 1984-2015 The MathWorks, Inc.
[6] R2015a (8.5.0.197613) 64-bit (glnxa64)
[6] February 12, 2015
[16] < M A T L A B (R) >
[16] Copyright 1984-2015 The MathWorks, Inc.
[16] R2015a (8.5.0.197613) 64-bit (glnxa64)
[16] February 12, 2015
[30] < M A T L A B (R) >
[30] Copyright 1984-2015 The MathWorks, Inc.
[30] R2015a (8.5.0.197613) 64-bit (glnxa64)
[30] February 12, 2015
[7] < M A T L A B (R) >
[7] Copyright 1984-2015 The MathWorks, Inc.
[7] R2015a (8.5.0.197613) 64-bit (glnxa64)
[7] February 12, 2015
[24] < M A T L A B (R) >
[24] Copyright 1984-2015 The MathWorks, Inc.
[24] R2015a (8.5.0.197613) 64-bit (glnxa64)
[24] February 12, 2015
[2] < M A T L A B (R) >
[2] Copyright 1984-2015 The MathWorks, Inc.
[2] R2015a (8.5.0.197613) 64-bit (glnxa64)
[2] February 12, 2015
[12] < M A T L A B (R) >
[12] Copyright 1984-2015 The MathWorks, Inc.
[12] R2015a (8.5.0.197613) 64-bit (glnxa64)
[12] February 12, 2015
[0] < M A T L A B (R) >
[0] Copyright 1984-2015 The MathWorks, Inc.
[0] R2015a (8.5.0.197613) 64-bit (glnxa64)
[0] February 12, 2015
[20] < M A T L A B (R) >
[20] Copyright 1984-2015 The MathWorks, Inc.
[20] R2015a (8.5.0.197613) 64-bit (glnxa64)
[20] February 12, 2015
[31] < M A T L A B (R) >
[31] Copyright 1984-2015 The MathWorks, Inc.
[31] R2015a (8.5.0.197613) 64-bit (glnxa64)
[31] February 12, 2015
[9] < M A T L A B (R) >
[9] Copyright 1984-2015 The MathWorks, Inc.
[9] R2015a (8.5.0.197613) 64-bit (glnxa64)
[9] February 12, 2015
[10] < M A T L A B (R) >
[10] Copyright 1984-2015 The MathWorks, Inc.
[10] R2015a (8.5.0.197613) 64-bit (glnxa64)
[10] February 12, 2015
[28] < M A T L A B (R) >
[28] Copyright 1984-2015 The MathWorks, Inc.
[28] R2015a (8.5.0.197613) 64-bit (glnxa64)
[28] February 12, 2015
[18] < M A T L A B (R) >
[18] Copyright 1984-2015 The MathWorks, Inc.
[18] R2015a (8.5.0.197613) 64-bit (glnxa64)
[18] February 12, 2015
[13] < M A T L A B (R) >
[13] Copyright 1984-2015 The MathWorks, Inc.
[13] R2015a (8.5.0.197613) 64-bit (glnxa64)
[13] February 12, 2015
[23] < M A T L A B (R) >
[23] Copyright 1984-2015 The MathWorks, Inc.
[23] R2015a (8.5.0.197613) 64-bit (glnxa64)
[23] February 12, 2015
[25] < M A T L A B (R) >
[25] Copyright 1984-2015 The MathWorks, Inc.
[25] R2015a (8.5.0.197613) 64-bit (glnxa64)
[25] February 12, 2015
[15] < M A T L A B (R) >
[15] Copyright 1984-2015 The MathWorks, Inc.
[15] R2015a (8.5.0.197613) 64-bit (glnxa64)
[15] February 12, 2015
[26] < M A T L A B (R) >
[26] Copyright 1984-2015 The MathWorks, Inc.
[26] R2015a (8.5.0.197613) 64-bit (glnxa64)
[26] February 12, 2015
[5] < M A T L A B (R) >
[5] Copyright 1984-2015 The MathWorks, Inc.
[5] R2015a (8.5.0.197613) 64-bit (glnxa64)
[5] February 12, 2015
[11] < M A T L A B (R) >
[11] Copyright 1984-2015 The MathWorks, Inc.
[11] R2015a (8.5.0.197613) 64-bit (glnxa64)
[11] February 12, 2015
[17] < M A T L A B (R) >
[17] Copyright 1984-2015 The MathWorks, Inc.
[17] R2015a (8.5.0.197613) 64-bit (glnxa64)
[17] February 12, 2015
[19] < M A T L A B (R) >
[19] Copyright 1984-2015 The MathWorks, Inc.
[19] R2015a (8.5.0.197613) 64-bit (glnxa64)
[19] February 12, 2015
[27] < M A T L A B (R) >
[27] Copyright 1984-2015 The MathWorks, Inc.
[27] R2015a (8.5.0.197613) 64-bit (glnxa64)
[27] February 12, 2015
[8] < M A T L A B (R) >
[8] Copyright 1984-2015 The MathWorks, Inc.
[8] R2015a (8.5.0.197613) 64-bit (glnxa64)
[8] February 12, 2015
[22] < M A T L A B (R) >
[22] Copyright 1984-2015 The MathWorks, Inc.
[22] R2015a (8.5.0.197613) 64-bit (glnxa64)
[22] February 12, 2015
[29] < M A T L A B (R) >
[29] Copyright 1984-2015 The MathWorks, Inc.
[29] R2015a (8.5.0.197613) 64-bit (glnxa64)
[29] February 12, 2015
[3] < M A T L A B (R) >
[3] Copyright 1984-2015 The MathWorks, Inc.
[3] R2015a (8.5.0.197613) 64-bit (glnxa64)
[3] February 12, 2015
[4] < M A T L A B (R) >
[4] Copyright 1984-2015 The MathWorks, Inc.
[4] R2015a (8.5.0.197613) 64-bit (glnxa64)
[4] February 12, 2015
[1] < M A T L A B (R) >
[1] Copyright 1984-2015 The MathWorks, Inc.
[1] R2015a (8.5.0.197613) 64-bit (glnxa64)
[1] February 12, 2015
[21]
[14]
[6]
[21]To get started, type one of these: helpwin, helpdesk, or demo.
[21]For product information, visit www.mathworks.com.
[21]
[14]To get started, type one of these: helpwin, helpdesk, or demo.
[14]For product information, visit www.mathworks.com.
[14]
[16]
[6]To get started, type one of these: helpwin, helpdesk, or demo.
[6]For product information, visit www.mathworks.com.
[6]
[30]
[2]
[16]To get started, type one of these: helpwin, helpdesk, or demo.
[16]For product information, visit www.mathworks.com.
[16]
[30]To get started, type one of these: helpwin, helpdesk, or demo.
[30]For product information, visit www.mathworks.com.
[30]
[0]
[20]
[9]
[28]
[2]To get started, type one of these: helpwin, helpdesk, or demo.
[2]For product information, visit www.mathworks.com.
[2]
[7]
[31]
[10]
[0]To get started, type one of these: helpwin, helpdesk, or demo.
[0]For product information, visit www.mathworks.com.
[0]
[24]
[25]
[12]
[23]
[9]To get started, type one of these: helpwin, helpdesk, or demo.
[9]For product information, visit www.mathworks.com.
[9]
[26]
[20]To get started, type one of these: helpwin, helpdesk, or demo.
[20]For product information, visit www.mathworks.com.
[20]
[28]To get started, type one of these: helpwin, helpdesk, or demo.
[28]For product information, visit www.mathworks.com.
[28]
[17]
[31]To get started, type one of these: helpwin, helpdesk, or demo.
[31]For product information, visit www.mathworks.com.
[31]
[5]
[13]
[7]To get started, type one of these: helpwin, helpdesk, or demo.
[7]For product information, visit www.mathworks.com.
[7]
[11]
[18]
[10]To get started, type one of these: helpwin, helpdesk, or demo.
[10]For product information, visit www.mathworks.com.
[10]
[19]
[8]
[15]
[25]To get started, type one of these: helpwin, helpdesk, or demo.
[24]To get started, type one of these: helpwin, helpdesk, or demo.
[25]For product information, visit www.mathworks.com.
[25]
[24]For product information, visit www.mathworks.com.
[24]
[12]To get started, type one of these: helpwin, helpdesk, or demo.
[12]For product information, visit www.mathworks.com.
[12]
[29]
[23]To get started, type one of these: helpwin, helpdesk, or demo.
[26]To get started, type one of these: helpwin, helpdesk, or demo.
[23]For product information, visit www.mathworks.com.
[23]
[26]For product information, visit www.mathworks.com.
[26]
[17]To get started, type one of these: helpwin, helpdesk, or demo.
[17]For product information, visit www.mathworks.com.
[17]
[27]
[13]To get started, type one of these: helpwin, helpdesk, or demo.
[13]For product information, visit www.mathworks.com.
[13]
[19]To get started, type one of these: helpwin, helpdesk, or demo.
[19]For product information, visit www.mathworks.com.
[19]
[11]To get started, type one of these: helpwin, helpdesk, or demo.
[11]For product information, visit www.mathworks.com.
[11]
[5]To get started, type one of these: helpwin, helpdesk, or demo.
[5]For product information, visit www.mathworks.com.
[5]
[18]To get started, type one of these: helpwin, helpdesk, or demo.
[3]
[18]For product information, visit www.mathworks.com.
[18]
[22]
[8]To get started, type one of these: helpwin, helpdesk, or demo.
[8]For product information, visit www.mathworks.com.
[8]
[15]To get started, type one of these: helpwin, helpdesk, or demo.
[15]For product information, visit www.mathworks.com.
[15]
[29]To get started, type one of these: helpwin, helpdesk, or demo.
[29]For product information, visit www.mathworks.com.
[29]
[1]
[4]
[27]To get started, type one of these: helpwin, helpdesk, or demo.
[27]For product information, visit www.mathworks.com.
[27]
[3]To get started, type one of these: helpwin, helpdesk, or demo.
[3]For product information, visit www.mathworks.com.
[3]
[22]To get started, type one of these: helpwin, helpdesk, or demo.
[22]For product information, visit www.mathworks.com.
[22]
[1]To get started, type one of these: helpwin, helpdesk, or demo.
[1]For product information, visit www.mathworks.com.
[1]
[4]To get started, type one of these: helpwin, helpdesk, or demo.
[4]For product information, visit www.mathworks.com.
[4]
[14] Academic License
[21] Academic License
[6] Academic License
[14]2015-11-07 17:01:29 | About to evaluate task with DistcompEvaluateFileTask
[21]2015-11-07 17:01:29 | About to evaluate task with DistcompEvaluateFileTask
[0] Academic License
[16] Academic License
[14]2015-11-07 17:01:29 | Enter distcomp_evaluate_filetask_core
[14]2015-11-07 17:01:29 | Enter distcomp_evaluate_filetask_core/iSetup
[14]2015-11-07 17:01:29 | This process will exit on any fault.
[21]2015-11-07 17:01:29 | Enter distcomp_evaluate_filetask_core
[21]2015-11-07 17:01:29 | Enter distcomp_evaluate_filetask_core/iSetup
[30] Academic License
[14]2015-11-07 17:01:29 | This process will exit when its parent process dies.
[21]2015-11-07 17:01:29 | This process will exit on any fault.
[14]2015-11-07 17:01:29 | About to initialize MPI.
[21]2015-11-07 17:01:29 | This process will exit when its parent process dies.
[21]2015-11-07 17:01:29 | About to initialize MPI.
[6]2015-11-07 17:01:29 | About to evaluate task with DistcompEvaluateFileTask
[9] Academic License
[2] Academic License
[13] Academic License
[0]2015-11-07 17:01:29 | About to evaluate task with DistcompEvaluateFileTask
[28] Academic License
[12] Academic License
[31] Academic License
[10] Academic License
[6]2015-11-07 17:01:29 | Enter distcomp_evaluate_filetask_core
[6]2015-11-07 17:01:29 | Enter distcomp_evaluate_filetask_core/iSetup
[6]2015-11-07 17:01:29 | This process will exit on any fault.
[29] Academic License
[6]2015-11-07 17:01:29 | This process will exit when its parent process dies.
[16]2015-11-07 17:01:29 | About to evaluate task with DistcompEvaluateFileTask
[26] Academic License
[6]2015-11-07 17:01:30 | About to initialize MPI.
[19] Academic License
[30]2015-11-07 17:01:29 | About to evaluate task with DistcompEvaluateFileTask
[5] Academic License
[0]2015-11-07 17:01:30 | Enter distcomp_evaluate_filetask_core
[0]2015-11-07 17:01:30 | Enter distcomp_evaluate_filetask_core/iSetup
[9]2015-11-07 17:01:29 | About to evaluate task with DistcompEvaluateFileTask
[0]2015-11-07 17:01:30 | This process will exit on any fault.
[0]2015-11-07 17:01:30 | This process will exit when its parent process dies.
[2]2015-11-07 17:01:29 | About to evaluate task with DistcompEvaluateFileTask
[25] Academic License
[0]2015-11-07 17:01:30 | About to initialize MPI.
[7] Academic License
[13]2015-11-07 17:01:29 | About to evaluate task with DistcompEvaluateFileTask
[20] Academic License
[16]2015-11-07 17:01:30 | Enter distcomp_evaluate_filetask_core
[16]2015-11-07 17:01:30 | Enter distcomp_evaluate_filetask_core/iSetup
[8] Academic License
[18] Academic License
[16]2015-11-07 17:01:30 | This process will exit on any fault.
[30]2015-11-07 17:01:30 | Enter distcomp_evaluate_filetask_core
[28]2015-11-07 17:01:30 | About to evaluate task with DistcompEvaluateFileTask
[31]2015-11-07 17:01:30 | About to evaluate task with DistcompEvaluateFileTask
[30]2015-11-07 17:01:30 | Enter distcomp_evaluate_filetask_core/iSetup
[24] Academic License
[12]2015-11-07 17:01:30 | About to evaluate task with DistcompEvaluateFileTask
[16]2015-11-07 17:01:30 | This process will exit when its parent process dies.
[30]2015-11-07 17:01:30 | This process will exit on any fault.
[16]2015-11-07 17:01:30 | About to initialize MPI.
[23] Academic License
[10]2015-11-07 17:01:30 | About to evaluate task with DistcompEvaluateFileTask
[9]2015-11-07 17:01:30 | Enter distcomp_evaluate_filetask_core
[15] Academic License
[30]2015-11-07 17:01:30 | This process will exit when its parent process dies.
[9]2015-11-07 17:01:30 | Enter distcomp_evaluate_filetask_core/iSetup
[17] Academic License
[29]2015-11-07 17:01:30 | About to evaluate task with DistcompEvaluateFileTask
[9]2015-11-07 17:01:30 | This process will exit on any fault.
[30]2015-11-07 17:01:30 | About to initialize MPI.
[9]2015-11-07 17:01:30 | This process will exit when its parent process dies.
[2]2015-11-07 17:01:30 | Enter distcomp_evaluate_filetask_core
[2]2015-11-07 17:01:30 | Enter distcomp_evaluate_filetask_core/iSetup
[9]2015-11-07 17:01:30 | About to initialize MPI.
[2]2015-11-07 17:01:30 | This process will exit on any fault.
[26]2015-11-07 17:01:30 | About to evaluate task with DistcompEvaluateFileTask
[19]2015-11-07 17:01:30 | About to evaluate task with DistcompEvaluateFileTask
[2]2015-11-07 17:01:30 | This process will exit when its parent process dies.
[11] Academic License
[13]2015-11-07 17:01:30 | Enter distcomp_evaluate_filetask_core
[13]2015-11-07 17:01:30 | Enter distcomp_evaluate_filetask_core/iSetup
[2]2015-11-07 17:01:30 | About to initialize MPI.
[5]2015-11-07 17:01:30 | About to evaluate task with DistcompEvaluateFileTask
[13]2015-11-07 17:01:30 | This process will exit on any fault.
[28]2015-11-07 17:01:30 | Enter distcomp_evaluate_filetask_core
[28]2015-11-07 17:01:30 | Enter distcomp_evaluate_filetask_core/iSetup
[31]2015-11-07 17:01:30 | Enter distcomp_evaluate_filetask_core
[31]2015-11-07 17:01:30 | Enter distcomp_evaluate_filetask_core/iSetup
[28]2015-11-07 17:01:30 | This process will exit on any fault.
[12]2015-11-07 17:01:30 | Enter distcomp_evaluate_filetask_core
[13]2015-11-07 17:01:30 | Unexpected error setting up process monitor. Error returned:
[13]Unexpected Standard exception from MEX file.
[13]What() is:boost::thread_resource_error
[13]..
[13]Error in distcomp_evaluate_filetask_core>iSetupProcessMonitoringThreads (line 622)
[13] dct_psfcns('pidwatch', pidToWatch)
[13]Error in distcomp_evaluate_filetask_core>iMaybeSetupProcessMonitoringThreads (line 256)
[13] iSetupProcessMonitoringThreads;
[13]Error in distcomp_evaluate_filetask_core>iSetup (line 506)
[13]iMaybeSetupProcessMonitoringThreads();
[13]Error in distcomp_evaluate_filetask_core (line 25)
[13] runprop = iSetup(handlers, mdceDebugEnabled, outputWriterStack, isSyncTaskEvaluation, varargin);
[12]2015-11-07 17:01:30 | Enter distcomp_evaluate_filetask_core/iSetup
[13]2015-11-07 17:01:30 | About to exit with code: 1
[31]2015-11-07 17:01:30 | This process will exit on any fault.
[28]2015-11-07 17:01:30 | This process will exit when its parent process dies.
[12]2015-11-07 17:01:30 | This process will exit on any fault.
[10]2015-11-07 17:01:30 | Enter distcomp_evaluate_filetask_core
[10]2015-11-07 17:01:30 | Enter distcomp_evaluate_filetask_core/iSetup
[28]2015-11-07 17:01:30 | About to initialize MPI.
[31]2015-11-07 17:01:30 | This process will exit when its parent process dies.
job aborted:
rank: node: exit code[: error message]
0: 127.0.0.1: -2
1: 127.0.0.1: -2
2: 127.0.0.1: -2
3: 127.0.0.1: -2
4: 127.0.0.1: -2
5: 127.0.0.1: -2
6: 127.0.0.1: -2
7: 127.0.0.1: -2
8: 127.0.0.1: -2
9: 127.0.0.1: -2
10: 127.0.0.1: -2
11: 127.0.0.1: -2
12: 127.0.0.1: -2
13: 127.0.0.1: -2: process 13 exited without calling init while other processes have called init
14: 127.0.0.1: -2
15: 127.0.0.1: -2
16: 127.0.0.1: -2
17: 127.0.0.1: -2
18: 127.0.0.1: -2
19: 127.0.0.1: -2
20: 127.0.0.1: -2
21: 127.0.0.1: -2
22: 127.0.0.1: -2
23: 127.0.0.1: -2
24: 127.0.0.1: -2
25: 127.0.0.1: -2
26: 127.0.0.1: -2
27: 127.0.0.1: -2
28: 127.0.0.1: -2
29: 127.0.0.1: -2
30: 127.0.0.1: -2
31: 127.0.0.1: -2
Stage: Pool job test (createCommunicatingJob)
Status: Skipped
Description:Validation skipped due to previous failure.
Command Line Output:(none)
Error Report:(none)
Debug Log:(none)
Stage: Parallel pool test (parpool)
Status: Skipped
Description:Validation skipped due to previous failure.
Command Line Output:(none)
Error Report:(none)
Debug Log:(none)

Answers (1)

Edric Ellis on 9 Nov 2015
This looks like your machine ran out of resources while trying to start up the workers. Do you have any ulimit in effect?
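To expand on the ulimit question: the boost::thread_resource_error in the log is the kind of failure you would expect when a process cannot create another thread, and on Linux the "max user processes" limit counts threads as well as processes. As a minimal sketch (not part of the original answer, and assuming Python 3 is available on the machine), the per-user limits can be read with the standard resource module. The helper names below are made up for illustration.

```python
import resource

# Limits that commonly matter when one user starts many multi-threaded processes.
# The labels mirror the names printed by `ulimit -a`.
LIMITS = {
    "max user processes (-u)": resource.RLIMIT_NPROC,
    "open files (-n)": resource.RLIMIT_NOFILE,
}


def describe(value):
    """Render a limit value, using 'unlimited' for RLIM_INFINITY."""
    return "unlimited" if value == resource.RLIM_INFINITY else str(value)


def current_limits():
    """Return {label: (soft, hard)} for the limits listed above."""
    return {label: resource.getrlimit(res) for label, res in LIMITS.items()}


def raise_soft_to_hard(res):
    """Raise the soft limit to the hard limit for this process and its children.

    An unprivileged process may raise its soft limit up to the hard limit.
    """
    _, hard = resource.getrlimit(res)
    resource.setrlimit(res, (hard, hard))
    return resource.getrlimit(res)


if __name__ == "__main__":
    for label, (soft, hard) in current_limits().items():
        print(f"{label}: soft={describe(soft)} hard={describe(hard)}")
```

Limits are inherited by child processes, so a limit changed in a shell (for example with ulimit -u) only applies to a MATLAB session started from that same shell. Whether raising it resolves this particular failure is an assumption to test, not something the log proves.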
  4 Comments
Mohammad Sayyafzadeh on 9 Nov 2015
This is the output of the command ulimit -a:
core file size (blocks, -c) 0
data seg size (kbytes, -d) unlimited
scheduling priority (-e) 0
file size (blocks, -f) unlimited
pending signals (-i) 1032918
max locked memory (kbytes, -l) 64
max memory size (kbytes, -m) unlimited
open files (-n) 1024
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority (-r) 0
stack size (kbytes, -s) 10240
cpu time (seconds, -t) unlimited
max user processes (-u) 1024
virtual memory (kbytes, -v) unlimited
file locks (-x) unlimited
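For reference, the output above can be checked mechanically. The following is only an illustrative sketch in Python (parse_ulimit is a hypothetical helper, not a MATLAB or system tool): it parses the pasted text and picks out max user processes (-u) and open files (-n), which are both 1024. Since Linux counts threads against the -u limit and each MATLAB worker is a multi-threaded process, that limit is a plausible candidate for the thread creation failure in the log, although nothing in this thread confirms it.

```python
import re

# The `ulimit -a` output exactly as posted in the comment above.
ULIMIT_OUTPUT = """\
core file size (blocks, -c) 0
data seg size (kbytes, -d) unlimited
scheduling priority (-e) 0
file size (blocks, -f) unlimited
pending signals (-i) 1032918
max locked memory (kbytes, -l) 64
max memory size (kbytes, -m) unlimited
open files (-n) 1024
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority (-r) 0
stack size (kbytes, -s) 10240
cpu time (seconds, -t) unlimited
max user processes (-u) 1024
virtual memory (kbytes, -v) unlimited
file locks (-x) unlimited
"""

# name, optional unit, flag in parentheses, then the value.
LINE = re.compile(
    r"^(?P<name>.+?)\s+\((?:(?P<unit>[^,()]+),\s*)?(?P<flag>-\w)\)\s+(?P<value>\S+)$"
)


def parse_ulimit(text):
    """Map each flag (e.g. '-u') to (name, value); value is None for 'unlimited'."""
    limits = {}
    for line in text.splitlines():
        match = LINE.match(line.strip())
        if match:
            raw = match["value"]
            value = None if raw == "unlimited" else int(raw)
            limits[match["flag"]] = (match["name"], value)
    return limits


if __name__ == "__main__":
    parsed = parse_ulimit(ULIMIT_OUTPUT)
    for flag in ("-u", "-n"):
        name, value = parsed[flag]
        print(f"{name} ({flag}): {value}")
```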
Darwin on 17 Oct 2016
I manage MATLAB on Linux HPC machines, and with parpool I can use as many workers as there are cores on one node. Hyperthreading does not work properly under CentOS.
