Summing array elements seems to be slow on GPU

Question

Damian Suski on 26 Apr 2023

https://ch.mathworks.com/matlabcentral/answers/1953314-summing-array-elements-seems-to-be-slow-on-gpu

Commented: Damian Suski on 18 May 2023

I am testing the times of execution for the following function on CPU and GPU

function funTestGPU(P,U,K,UN)
    for k = 1:P
        H = exp(1i*K);
        HU = U.*H;
        UN(k,:) = sum(HU,[1,3]);
    end
end

where U and UN are complex arrays of size P-by-P and K is a complex array of size P-by-P-by-P. So in each iteration I perform an element-wise exp(), an element-wise multiplication of two arrays, and a sum of the elements of a 3D array along two dimensions.
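To make the shapes concrete, here is a minimal sketch of a single iteration on the CPU with a small P. It assumes the sizes used in the test script (U is P-by-P, K is P-by-P-by-P) and relies on implicit expansion in U.*H; the names rowA, rowB and maxDiff are illustrative only. The vector-dimension form sum(X,[1,3]) needs R2018b or later.

```matlab
P = 4;                                     % small size, just to inspect shapes
U = complex(rand(P), rand(P));             % P-by-P
K = complex(rand(P,P,P), rand(P,P,P));     % P-by-P-by-P

H  = exp(1i*K);                            % element-wise exponential
HU = U .* H;                               % U is expanded along the 3rd dimension

rowA = sum(HU, [1, 3]);                    % sum over dimensions 1 and 3 at once
rowB = sum(sum(HU, 1), 3);                 % same reduction done in two steps

maxDiff = max(abs(rowA - rowB));           % should be at rounding level
disp(size(rowA))                           % one row of UN: 1-by-P
```

The reduction collapses P*P values into each of the P output elements, which is why it is the only step in the loop that is not element-wise.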

I test the execution time on CPU and on GPU with the help of the following script

P = 200;
URe = 1/(sqrt(2))*rand(P);
UIm = 1/(sqrt(2))*rand(P);
KRe = 1/(sqrt(2))*rand(P,P,P);
KIm = 1/(sqrt(2))*rand(P,P,P);
% CPU
U = complex(URe, UIm);
K = complex(KRe, KIm);
UN = complex(zeros(P), zeros(P));
fcpu = @() funTestGPU(P,U,K,UN);
tcpu = timeit(fcpu);
disp(['CPU time: ',num2str(tcpu)])
% GPU
U = gpuArray(complex(URe, UIm));
K = gpuArray(complex(KRe, KIm));
UN = gpuArray(complex(zeros(P), zeros(P)));
fgpu = @() funTestGPU(P,U,K,UN);
tgpu = gputimeit(fgpu);
disp(['GPU time: ',num2str(tgpu)])

and I obtain the results

CPU time: 9.0315
GPU time: 3.3894

My concern is that if I remove the last operation (summing the array elements) from funTestGPU, I obtain the results

CPU time: 8.0185
GPU time: 0.0045631

So it looks like the summation is the most time-consuming operation on GPU. Is that an expected result?

I wrote analogous code in CuPy and in PyTorch, and there the summation does not seem to be the most time-consuming operation.

I use MATLAB R2019b. My graphics card is an NVIDIA GeForce GTX 1050 Ti (768 CUDA cores), and my processor is an AMD Ryzen 7 3700X (8 physical cores).

2 Comments

Matt J on 27 Apr 2023

Moved: Matt J on 27 Apr 2023

"So it looks like the summation is the most time-consuming operation on GPU. Is that an expected result?"

That's what I would expect. It's the only operation in the chain that is not element-wise.

Damian Suski on 27 Apr 2023

@Matt J Thank you for your comment. Before I ran the tests, I expected the exponential to be the most time-consuming operation, but it turns out that the element-wise operations are not the bottleneck of the calculations. I just wanted to make sure that I am not missing something obvious.


Answer 1

Joss Knight on 27 Apr 2023


https://ch.mathworks.com/matlabcentral/answers/1953314-summing-array-elements-seems-to-be-slow-on-gpu#answer_1224269


These are the results I got on my (somewhat old) GeForce GTX 1080 Ti:

CPU time: 16.1288
GPU time: 0.96266

If I change the datatype to single I get:

CPU time: 14.9785
GPU time: 0.35102

That's maybe 2x faster?

So on the one hand your GPU is pretty slow and your CPU is pretty fast, and on the other maybe you could try using single precision instead, if you don't mind the loss of accuracy.

1 Comment

Damian Suski on 27 Apr 2023

Well, I would also say that my CPU is quite fast and my GPU is rather weak (only about 800 CUDA cores, 4GB RAM). Several years ago I bought the cheapest graphics card, without parallel computations in mind.

The results for your card (over 3.5k CUDA cores, 11GB RAM) are pretty impressive. I have tried a GeForce RTX 3060 (over 3.5k CUDA cores, 12GB RAM) on another computer and it gave 1.5 s for double precision. For the analogous code in PyTorch, I tried a Tesla T4 card (freely available on Google Colab), which also gave 1.5 s. So the proper choice of GPU makes the difference.

I will definitely try single precision, but at the moment it is hard for me to say whether the precision loss will be acceptable for my purpose.
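One way to gauge the precision loss before committing to single is to run the same computation in both precisions on a small problem and compare. The following is only a sketch under assumptions: it runs on the CPU, uses random data like the test script, and the names Ud, Kd, Us, Ks, rowD, rowS and relErr are made up for illustration. The error for the real data and problem size may differ.

```matlab
P = 50;                                    % small size chosen for a quick CPU check
Ud = complex(rand(P), rand(P))/sqrt(2);
Kd = complex(rand(P,P,P), rand(P,P,P))/sqrt(2);

% Reference result in double precision
rowD = sum(Ud .* exp(1i*Kd), [1, 3]);

% Same computation with single-precision inputs
Us = single(Ud);
Ks = single(Kd);
rowS = sum(Us .* exp(1i*Ks), [1, 3]);

% Relative error of the single-precision result against the double reference
relErr = norm(double(rowS) - rowD) / norm(rowD);
disp(['Relative error (single vs double): ', num2str(relErr)])
```

The same comparison should work with gpuArray inputs, since single(), exp() and sum() are all supported for gpuArray.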


Answer 2

Joss Knight on 27 Apr 2023


https://ch.mathworks.com/matlabcentral/answers/1953314-summing-array-elements-seems-to-be-slow-on-gpu#answer_1224169

Moved: Matt J on 27 Apr 2023

Why are you recomputing H and HU inside the loop? They do not change. If you remove the sum, the results from the first (P-1) iterations are never used, so only the last computation of those values actually takes place.
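As a rough illustration of the first point, the loop-invariant work in the dummy example can be hoisted out of the loop. This is a sketch, not a drop-in replacement: it assumes K and U really are constant across iterations (in a real application they presumably are not), it runs on the CPU with a small P, and the names UN_loop, UN_hoisted, rowSum and maxDiff are illustrative.

```matlab
P = 20;                                    % small size so the check runs quickly
U = complex(rand(P), rand(P));
K = complex(rand(P,P,P), rand(P,P,P));

% Original pattern: the same H and HU are rebuilt in every iteration.
UN_loop = complex(zeros(P), zeros(P));
for k = 1:P
    H  = exp(1i*K);
    HU = U.*H;
    UN_loop(k,:) = sum(HU, [1,3]);
end

% Hoisted pattern: do the loop-invariant work once, then fill every row.
H  = exp(1i*K);
HU = U.*H;
rowSum = sum(HU, [1,3]);                   % 1-by-P
UN_hoisted = repmat(rowSum, P, 1);         % P identical rows

maxDiff = max(abs(UN_loop(:) - UN_hoisted(:)));
```

Both variants produce the same UN here, which shows that the benchmark loop repeats identical work P times.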

6 Comments (4 older comments not shown)

Damian Suski on 28 Apr 2023

I have tried the batching approach on my GPU, but have not noticed any speedup. I will try it on a better GPU and describe the detailed results.

Damian Suski on 18 May 2023

I ran the experiments and did not notice any speedup from batching. The computation time increases proportionally to the batch size.

I have implemented the proper procedure and was able to reproduce the speedup discussed for the dummy example. The computation time was reduced from 186 s on the CPU to 42 s on the GPU. On a better graphics card the computation time is even shorter: 21 s. Summing up, I'm satisfied with the results.

What still concerns me is that in MATLAB the element-wise exp() is much faster than summing elements along two dimensions. For the analogous calculations in CuPy or PyTorch, the situation seems to be the opposite. Can I post the detailed results of my findings here, or should I start a new topic?
