- For-loop time = 0.010799
- Vectorized CPU time = 0.012376
- Vectorized GPU time = 0.069613
Make double loop run on GPU
3 views (last 30 days)
Show older comments
Rozh Al-Mashhdi
on 6 Sep 2023
Commented: Rozh Al-Mashhdi
on 6 Sep 2023
Hi
I have the following code that I want to run somehow on the GPU.
In short, I have a 2D array (an MRI image). Starting from a user-defined centre-point, the the image is divided into radial concentric "rings" of specified width (f.eks. 0.1mm) and total number of pixels and total signal intensity (pixel value) in each ring are recorded in separate output arrays (called Profile and SumBinSignal). The code below does this using 2 for loops.
for X= Xmin:1:Xmax
for Y= Ymin:1:Ymax
if ~isnan(Image(Y,X)) %NaN pixels are just ignored
Dist = Resolution * ( (X-CentX)^2 + (Y-CentY)^2 )^0.5; %distance of point to centre measured in mm
TargetBin= ceil(Dist * BinDensity); %convert distance from mm to "bins/rings"
if (TargetBin>=1) && (TargetBin<=MaxTargetBin)
Profile(TargetBin)= Profile(TargetBin)+1; %record number of pixels in bin/ring
SumBinSignal(TargetBin)= SumBinSignal(TargetBin) + Image(Y, X); % record total signal in ring/bin
end
end
end
end
Now I know that in order to run on the GPU, I either must vectorize the calculations or use "arrayfun". But I really cant manage to do either. I maganged to vectorize the calculations only partly with no great speed advantage. The arrayfun approach I dont know even how to start with.
Help will be highly appreciated
best regards, Rozh
0 Comments
Accepted Answer
Bruno Luong
on 6 Sep 2023
Edited: Bruno Luong
on 6 Sep 2023
GPU is the slowest on my PC. I knew for-loop is now away fast, but I'm still amazed by how fast it is. There is really a remarkable progress made by TMW on Excution Engine last few releases.
Code:
N = 1000;
Image = peaks(N);
Xmin = 1; Xmax = N;
Ymin = 1; Ymax = N;
CentX = 0;
CentY = 0;
Resolution = 1/N;
BinDensity = 100;
MaxTargetBin = 100;
tic
Profile = zeros(MaxTargetBin,1);
SumBinSignal = zeros(MaxTargetBin,1);
for X=Xmin:1:Xmax
for Y=Ymin:1:Ymax
if ~isnan(Image(Y,X)) %NaN pixels are just ignored
Dist = Resolution * ( (X-CentX)^2 + (Y-CentY)^2 )^0.5; %distance of point to centre measured in mm
TargetBin= ceil(Dist * BinDensity); %convert distance from mm to "bins/rings"
if (TargetBin>=1) && (TargetBin<=MaxTargetBin)
Profile(TargetBin)= Profile(TargetBin)+1; %record number of pixels in bin/ring
SumBinSignal(TargetBin)= SumBinSignal(TargetBin) + Image(Y, X); % record total signal in ring/bin
end
end
end
end
fprintf('For-loop time = %1.6f\n', toc)
% Vectorized code, CPU
tic
X = Xmin:Xmax;
Y = (Ymin:Ymax)';
Dist = Resolution * sqrt( (X-CentX).^2 + (Y-CentY).^2 );
TargetBin= ceil(Dist * BinDensity);
Im = Image;
keep = (TargetBin>=1) & (TargetBin<=MaxTargetBin) & ~isnan(Im);
TargetBin = TargetBin(keep);
Profile2 = accumarray(TargetBin, 1, [MaxTargetBin,1]);
if ~isequal([length(Y) length(X)], size(Im))
Im = Image(Y,X); % indexing is costly
end
SumBinSignal2 = accumarray(TargetBin, Im(keep), [MaxTargetBin,1]);
fprintf('Vectorized CPU time = %1.6f\n', toc)
if gpuDeviceCount("available")
% Vectorized code, GPU
tic
X = gpuArray(Xmin:Xmax);
Y = gpuArray((Ymin:Ymax))';
Dist = Resolution * sqrt( (X-CentX).^2 + (Y-CentY).^2 );
TargetBin= ceil(Dist * BinDensity);
Im = gpuArray(Image);
keep = (TargetBin>=1) & (TargetBin<=MaxTargetBin) & ~isnan(Im);
TargetBin = TargetBin(keep);
Profile3 = gather(accumarray(TargetBin, 1, [MaxTargetBin,1]));
if ~isequal([length(Y) length(X)], size(Im))
Im = Im(Y,X); % indexing is costly
end
SumBinSignal3 = gather(accumarray(TargetBin, Im(keep), [MaxTargetBin,1]));
fprintf('Vectorized GPU time = %1.6f\n', toc)
end
% Compare results
%norm(Profile-Profile2)
norm(SumBinSignal-SumBinSignal2)
if gpuDeviceCount("available")
%norm(Profile-Profile3)
norm(SumBinSignal-SumBinSignal3)
end
2 Comments
Sam Marshalik
on 6 Sep 2023
To add to Bruno's thorough response: In order to maximize your GPU, you will want to:
- Run only computationally intensive things on the GPU
- Run computations that do not require communication with other components on the motherboard (RAM, storage, etc.)
It seems that in this case the problem you are working on is not computationally intensive enough to really make use of the GPU. I imagine if you were to bump up the problem size you would eventually see the benefit of using a GPU.
More Answers (0)
See Also
Categories
Find more on Get Started with GPU Coder in Help Center and File Exchange
Community Treasure Hunt
Find the treasures in MATLAB Central and discover how the community can help you!
Start Hunting!