# 3D gpuArray vs cells of 2D gpuArrays major speed difference!

6 views (last 30 days)
Dan Ryan on 24 May 2013
Can anybody explain why these codes have drastically different runtimes?
I have a shared setup routine
clear all
y = gpuArray.rand(1000, 1000, 'single');
W = cell(1, 5);
WFull = gpuArray.zeros(1000, 1000, 5);
for j = 1:5
W{j} = gpuArray.rand(1000, 1000, 'single');
WFull(:,:,j) = W{j};
end
Version 1 (finishes in 1.4 seconds on my machine)
z = gpuArray.zeros(1000, 1000, 5);
tic
for i = 1:1000
for j = 1:size(W)
z(:,:,j) = W{j}*y;
end
end
toc
vs. Version 2 (finishes in 39 seconds on my machine... 27x times slower)
z = gpuArray.zeros(1000, 1000, 5);
tic
for i = 1:1000
for j = 1:size(WFull, 3)
z(:,:,j) = WFull(:,:,j)*y;
end
end
toc
Do you think that slicing large 3D gpuArrays is just really slow compared to looking up cell array values?

Matt J on 24 May 2013
Edited: Matt J on 24 May 2013
Do you think that slicing large 3D gpuArrays is just really slow compared to looking up cell array values?
Yes, it is faster to look-up a cell than to pull a slice out of a 3D array, and that's true for normal arrays as well, as long as there is a small number of slices/cells. Of course, you should really be including the time needed to allocate memory to each W{j} in your comparison.
Another reason is that you have a syntax error in your for-loop over W{j}. It's only doing 1 loop iteration instead of 5,
>> for j=1:size(W), j, end
j =
1
This is biasing the comparison to some degree.
##### 2 CommentsShowHide 1 older comment
Dan Ryan on 30 May 2013
James Lebak from mathworks helped me out with a really good tip:
use a
wait(gpuDevice)
command before the
toc
command when timing the GPU speeds.
Now the timings increase linearly with number of loop iterations and the two implementations give very similar results. Good to know!