race condition at asynchronous task assignment during runtime
Dear Matlab community,
I am trying to figure out a problem that I expected to be easy but that is proving me wrong. Basically, I am trying to solve a race condition that arises when using the Parallel Computing Toolbox.
Imagine I have N tasks and G GPUs. Each task takes a different amount of time to complete, and this time can only be determined at runtime. Thus, I don't want to preassign N/G tasks to each GPU, as this would lead to an unbalanced workload. Instead, I wish to launch an spmd block with G labs (each one controlling a GPU) so that, whenever a lab finishes its assigned task, a new one is assigned to it at runtime, until all N tasks have been finished.
The problem with this concept is that each lab needs to know which tasks have already been assigned to other labs.
In one of the approaches I've tested, I create a file that stores the last assigned task number. Each lab reads this file, increases the task number by one, and updates the file. Here is the code (for N=3 tasks and G=2 GPUs):
numberTasks = 3;
lastAssignedTask = 0;
file = 'lastAssignedTask.txt';
dlmwrite(file,lastAssignedTask);
spmd
    while lastAssignedTask<numberTasks
        lastAssignedTask = dlmread(file);
        fprintf('Lab:%d read that the last assigned task was %d \n',labindex,lastAssignedTask);
        taskForThisLab = lastAssignedTask+1;
        lastAssignedTask = taskForThisLab;
        dlmwrite(file,lastAssignedTask);
        fprintf('task: %d Lab:%d \n',lastAssignedTask,labindex);
    end
end
However, the output is
Lab 1:
Lab:1 read that the last assigned task was 0
task: 1 Lab:1
Lab:1 read that the last assigned task was 1
task: 2 Lab:1
Lab:1 read that the last assigned task was 2
task: 3 Lab:1
Lab 2:
Lab:2 read that the last assigned task was 0
task: 1 Lab:2
Lab:2 read that the last assigned task was 1
task: 2 Lab:2
Lab:2 read that the last assigned task was 2
task: 3 Lab:2
It looks like each worker reads the file at the same time. I have been trying several workarounds (see the tags), but none seems to solve this problem. Is there something very obvious that I am overlooking?
4 Comments
Walter Roberson
on 29 Dec 2020
Under what circumstances does a lab task need to be updated when a new task is started?
Allocate one extra lab that just does coordination. When a lab needs to know the current status, it calls labSend to the coordinator and then labReceive to get the response.
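A minimal sketch of that coordinator idea, assuming a pool of G+1 workers with lab 1 acting as coordinator. The variable assignedTo and the use of -1 as a stop signal are illustrative choices rather than anything prescribed above, and the fprintf stands in for the real GPU work. In newer releases the same calls are, as far as I know, also available as spmdSend, spmdReceive, spmdIndex and spmdSize.

```matlab
numberTasks = 3;                               % N, as in the question
spmd
    if labindex == 1
        % Coordinator: the only lab that touches the task counter,
        % so there is no concurrent read-modify-write.
        nextTask = 1;
        assignedTo = zeros(1, numberTasks);    % illustrative log: lab that got each task
        activeWorkers = numlabs - 1;
        while activeWorkers > 0
            [~, src] = labReceive();           % wait for a request from any worker
            if nextTask <= numberTasks
                labSend(nextTask, src);        % hand out the next task number
                assignedTo(nextTask) = src;
                nextTask = nextTask + 1;
            else
                labSend(-1, src);              % no work left: tell this worker to stop
                activeWorkers = activeWorkers - 1;
            end
        end
    else
        % Worker: ask for a task, run it, repeat until told to stop.
        while true
            labSend(labindex, 1);              % request (the payload itself is not used)
            task = labReceive(1);
            if task < 0
                break
            end
            fprintf('task: %d Lab:%d \n', task, labindex);   % placeholder for the real work
        end
    end
end
```

Because every task number is issued by a single lab, each task is handed out exactly once, whichever worker asks first. With only one worker in the pool there is nobody to do the work, so the sketch needs at least two.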
Arabarra
on 29 Dec 2020
Walter Roberson
on 29 Dec 2020
Edited: Walter Roberson
on 29 Dec 2020
You can find the number of labs (numlabs). Create a vector of task numbers T, one slot per lab including one for the controller, initialized to 0s. T(1) counts the tasks issued so far; T(K) records the task lab K is working on, with 0 meaning idle.
While T(1) < maxTasks:
Find the first T entry after the first that is 0. If you find one at index K, set T(1) = T(1)+1 and T(K) = T(1), labSend() the value T(K) to lab K, and go on to the next K in the zero check.
If you reach the end of T without finding a zero, call labReceive asking for two outputs. The second output is the labindex of the sender. Pull that lab's value out of T and it will tell you which task was being done (or just have the lab send the value), and do whatever you need with that fact. Now zero the T entry and return to the loop that checks for zeros.
When you reach max tasks, instead of sending a task number to a slot marked 0, send a negative shutdown signal to the lab and mark the slot with -1. When all slots after the first are -1, shut down lab 1 too.
Labs 2 onwards:
Loop: labReceive. If the value is negative, stop the lab. Otherwise it is a task number: do the task, then labSend something to lab 1 so it knows you have finished the task. End loop (back to the labReceive).
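The steps above could be sketched roughly as follows. This is one possible reading, not a tested implementation: the dispatch phase and the shutdown phase are merged into a single loop, the worker sends its task number back as the "finished" message, and doneBy is an illustrative log that is not part of the description.

```matlab
maxTasks = 3;
spmd
    if labindex == 1
        % T(1): number of tasks issued so far.
        % T(K), K >= 2: task running on lab K (0 = idle, -1 = shut down).
        T = zeros(1, numlabs);
        doneBy = zeros(1, maxTasks);           % illustrative log: lab that finished each task
        while any(T(2:end) ~= -1)
            K = find(T(2:end) == 0, 1) + 1;    % first idle worker, empty if none
            if ~isempty(K)
                if T(1) < maxTasks
                    T(1) = T(1) + 1;           % issue the next task number
                    T(K) = T(1);
                    labSend(T(K), K);
                else
                    labSend(-1, K);            % negative value = shutdown signal
                    T(K) = -1;
                end
            else
                % Everyone still running is busy: wait for any worker to report back.
                [~, src] = labReceive();       % second output is the sender's labindex
                doneBy(T(src)) = src;
                T(src) = 0;                    % mark that worker idle again
            end
        end
    else
        while true
            task = labReceive(1);
            if task < 0
                break                          % shutdown signal from lab 1
            end
            fprintf('task: %d Lab:%d \n', task, labindex);   % placeholder for the real work
            labSend(task, 1);                  % tell lab 1 this task is finished
        end
    end
end
```

Only lab 1 ever reads or writes T, so the shared counter that caused the race in the file-based version no longer exists. The pool needs G+1 workers if all G GPUs are to stay busy, since lab 1 does no task work itself.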