
Can I run a MATLAB function on multiple GPUs?

I work on image processing and handle large arrays of images. I need to run a MATLAB function, for example a median filter, on multiple GPUs. I can run it on one GPU, but I have four, so I want each MATLAB function to use all four. Is it possible to run any MATLAB function on all four GPUs instead of just one?
Joss Knight on 6 Feb 2020
Well, if it's a colour image you could process each channel independently.
Walter Roberson on 6 Feb 2020
True, and large images are sometimes multichannel or multispectral, so there might plausibly be more than 3 channels.
Whether that is efficient is questionable: each large plane would have to be transferred to another process, which would then have to transfer it to its GPU, and the filtered result would have to go through the same two transfer steps to get back. But the experiment would be worth doing.
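The per-channel idea discussed above could be tried along these lines. This is only a sketch, assuming the Parallel Computing Toolbox, the Image Processing Toolbox, and at least one supported GPU; the synthetic image `img` and the variable names are hypothetical stand-ins for your own data.

```matlab
% Hypothetical multichannel image; replace with your own data.
img = uint8(randi([0 255], 512, 512, 4));
nChannels = size(img, 3);

% One worker per GPU, but no more workers than channels.
% Assumes gpuDeviceCount is at least 1.
nWorkers = min(gpuDeviceCount, nChannels);
if isempty(gcp('nocreate'))
    parpool(nWorkers);
end

filtered = zeros(size(img), 'like', img);
parfor c = 1:nChannels
    plane = gpuArray(img(:, :, c));      % host -> GPU transfer
    out = medfilt2(plane, [3 3]);        % runs on this worker's GPU
    filtered(:, :, c) = gather(out);     % GPU -> host transfer
end
```

Each plane is copied to a worker and then to that worker's GPU, and the result makes the same trip back, so timing this against a single-GPU run is the only way to know whether it helps.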


Accepted Answer

Jason Ross on 4 Feb 2020
Yes, it's explained in the documentation.
Muhammad Abir on 4 Feb 2020
Thanks for your comment. However, I don't see any example. All I found is how to use multiple GPUs in a for loop. For a specific function such as medfilt2, I have no for loop. In that case, how can I run it on multiple GPUs? I'd really appreciate it if you could give me an example. Thank you.
Jason Ross on 4 Feb 2020
You would need to put the function in a parfor loop and iterate over your image array. This example reads all jpg files in /tmp, opens a parallel pool equal to the number of GPUs in your host, then processes the images on the pool, and closes it after it's done.
As Walter indicates, this isn't sharing resources between four GPUs; under the hood it launches four MATLAB workers and schedules the work on each GPU as it becomes available. But it may be sufficient for your needs.
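filePattern = fullfile('/tmp', '*.jpg');
jpegFiles = dir(filePattern);
images = cell(1, numel(jpegFiles));
for k = 1:numel(jpegFiles)
    baseFileName = jpegFiles(k).name;
    fullFileName = fullfile('/tmp', baseFileName);
    fprintf(1, 'Now reading %s\n', fullFileName);
    images{k} = imread(fullFileName);
end
% Open one worker per GPU; each worker is assigned its own device.
pool = parpool(gpuDeviceCount);
results = cell(size(images));
parfor ii = 1:numel(images)
    G = gpuArray(images{ii});
    gs = rgb2gray(G);
    results{ii} = gather(gs);   % copy the result back to host memory
end
delete(pool);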


More Answers (1)

Walter Roberson on 4 Feb 2020
No, there is no way to run a user function on multiple GPUs at this time.
At this time, each process can use gpuDevice to select a single GPU (this is done automatically for parpool members when the pool is no larger than the number of GPUs). Selecting a second GPU device would reset the first device.
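To see the one-GPU-per-worker assignment described above, something like the following could be run. It is a sketch that assumes the Parallel Computing Toolbox and at least one supported GPU; calling `gpuDevice` with no arguments only queries the currently selected device and does not select a new one.

```matlab
% Assumes at least one supported GPU is installed.
pool = gcp('nocreate');
if isempty(pool)
    pool = parpool(gpuDeviceCount);   % one worker per GPU
end

spmd
    d = gpuDevice;                    % query this worker's selected device
    t = getCurrentTask();             % identifies the worker running this block
    fprintf('Worker %d is using GPU %d (%s)\n', t.ID, d.Index, d.Name);
end
```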
MathWorks is working on sharing load between two hardware-connected NVIDIA GPUs, and has implemented it for one particular kind of deep learning and for one other task that does not come to mind at the moment. Unfortunately, I am having difficulty finding the relevant postings.
shital shinde on 13 Feb 2020
Actually, I am trying to use parallelization for image processing, and I have taken on the task of parallelizing image denoising. I know a simple method for denoising, but I am confused about what to do after that. Could anyone please help me with the code?
k = rgb2gray(k);
n=imnoise(k,'salt & pepper',0.01);
v=medfilt2(n,[3 3]);
I have done this much, but I don't know how to parallelize it. Please help me with that.
Thank you.
Walter Roberson on 13 Feb 2020
You can replace
n=imnoise(k,'salt & pepper',0.01);
with
per_side = 4;
fun = @(block_struct) imnoise(block_struct.data, 'salt & pepper', 0.01);
blocksize = floor([size(k, 1)/per_side, size(k,2)/per_side]);
n = blockproc(k, blocksize, fun, 'UseParallel', true);
This would break the image into per_side by per_side pieces (possibly plus fractional pieces at the edges) and run imnoise on each piece using the Parallel Computing Toolbox, and you would have achieved your goal of using parallelization as part of the process.
You should, by the way, expect that this would be notably slower than just using imnoise for the entire matrix. imnoise() is completely vectorized; the overhead of splitting up the file and calling multiple functions and dispatching to the parpool processes, and so on, is going to overwhelm the gains from using multiple CPUs.
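The expected slowdown can be checked directly with a rough timing comparison such as the one below. The synthetic image `k` and its size are hypothetical, and the numbers will depend heavily on the machine and pool size, so treat this as a way to measure rather than a prediction.

```matlab
k = uint8(randi([0 255], 2048, 2048));     % hypothetical test image
per_side = 4;
blocksize = floor([size(k, 1)/per_side, size(k, 2)/per_side]);
fun = @(block_struct) imnoise(block_struct.data, 'salt & pepper', 0.01);

if isempty(gcp('nocreate'))
    parpool;                               % start the pool before timing
end

% Time the single vectorized call against the parallel block version.
tWhole  = timeit(@() imnoise(k, 'salt & pepper', 0.01));
tBlocks = timeit(@() blockproc(k, blocksize, fun, 'UseParallel', true));
fprintf('whole image: %.4f s, blockproc: %.4f s\n', tWhole, tBlocks);
```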
v=medfilt2(n,[3 3]);
medfilt2, on the other hand, cannot simply break the values into blocks and process the blocks independently, because two adjacent values that happen to fall into different fixed-size blocks influence each other. You would have to use blockproc with overlap turned on, and be careful about how you deal with the edges and with the partial blocks at the right and bottom edges. You would also have the overhead of splitting up the file, calling multiple functions, dispatching to the parpool processes, and so on. It would not be at all surprising if the result were slower than just working with a single CPU.
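A minimal sketch of the overlapping-block approach, assuming the Image Processing Toolbox and Parallel Computing Toolbox. The synthetic image is hypothetical and its size was chosen to divide evenly into blocks, so it sidesteps the partial-block case mentioned above. For a 3-by-3 median, a 1-pixel border around each block should be enough, and blockproc's default zero padding at the image edges matches medfilt2's default, but it is worth checking the output against the single-call result, as the last lines do.

```matlab
k = uint8(randi([0 255], 256, 256));        % hypothetical grayscale image
n = imnoise(k, 'salt & pepper', 0.01);

per_side = 4;
blocksize = floor([size(n, 1)/per_side, size(n, 2)/per_side]);

% A 3x3 median needs one neighbouring pixel on every side of a block.
fun = @(block_struct) medfilt2(block_struct.data, [3 3]);
v = blockproc(n, blocksize, fun, ...
    'BorderSize', [1 1], ...    % overlap each block by 1 pixel per side
    'TrimBorder', true, ...     % discard the border from each block's output
    'UseParallel', true);       % requires Parallel Computing Toolbox

% Compare against the single-call result.
vRef = medfilt2(n, [3 3]);
fprintf('results identical: %d\n', isequal(v, vRef));
```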

