Noticeable difference between single-precision computation on GPU and on CPU

I'm using an iterative algorithm based on repeated matrix multiplications; I notice that my algorithm, with the same inputs:
  • converges on CPU using double precision variables;
  • converges on CPU using single precision variables;
  • converges on GPU using double precision variables;
  • diverges on GPU using single precision variables.
Single precision on the GPU is required to obtain the greatest speed-up. I think the problem is connected with the following behavior:
format longg %just for visualization
M = randn(10,10,'single') % single precision matrix on cpu
Mg = gpuArray(M) % same matrix on gpu
for i=1:20
E(i) = norm(M^i-Mg^i);
end
E = 0
1.184116e-06
5.051582e-06
2.594609e-05
0.0001531584
0.0006053803
0.002351832
0.01166957
0.04795818
0.204383
0.8535618
2.759973
13.55706
48.12731
167.6242
666.1393
3173.625
10306.33
43441.54
154367.1
So the difference between the results becomes very large after just a few matrix multiplications. In double precision, by contrast, the difference between the two matrices stays very small:
format longg %just for visualization
M = randn(10,10,'double') % double precision matrix on cpu
Mg = gpuArray(M) % same matrix on gpu
for i=1:20
E(i) = norm(M^i-Mg^i);
end
1.794529e-15
7.594314e-15
2.508041e-14
1.038021e-13
2.723854e-13
1.127249e-12
2.581471e-12
7.382443e-12
2.095835e-11
4.506465e-11
9.805409e-11
2.562852e-10
6.968754e-10
2.101229e-09
3.449182e-09
1.083673e-08
2.235593e-08
5.35213e-08
1.378775e-07
I think this is what causes the very different results when using the GPU in single precision, and it is a problem because double precision on the GPU is much slower.
How can I resolve this?

Answers (2)

Joss Knight on 24 Nov 2018
Jan's answer is correct, of course; but perhaps the succinct point is to ask: which answer is right? You've been assuming that when you compare the GPU result to the CPU result, you are measuring the error in the GPU result.
format long
M = randn(10,10,'single'); % single precision matrix on cpu
Md = double(M); % double precision version
Mg = gpuArray(M);
for i=1:20
E(i,1:2) = [norm(Md^i-M^i) norm(Md^i-Mg^i)];
end
E
E =
20×2 single gpuArray matrix
1.0e+04 *
0 0
0.0000000 0.0000000
0.0000000 0.0000000
0.0000000 0.0000000
0.0000000 0.0000000
0.0000001 0.0000001
0.0000002 0.0000001
0.0000006 0.0000007
0.0000023 0.0000020
0.0000055 0.0000051
0.0000266 0.0000237
0.0000645 0.0000487
0.0002251 0.0002210
0.0007011 0.0004216
0.0031013 0.0021814
0.0074494 0.0071806
0.0314264 0.0256066
0.1649611 0.0789942
0.5249441 0.2566499
1.3869889 0.9445597
It turns out that, in this case, it was the GPU version that was more accurate. Probably this will always be true, because the GPU computes the sum part of the dot product in parallel in a binary tree fashion, which tends to restrict error growth in floating point by doing most of the sums with smaller values.
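As a rough illustration of that effect (a sketch only, not the actual GPU reduction code; pairwiseSum is a helper written just for this demonstration and needs to live in a script or function file):
x = single(rand(1, 2^16)); % many positive single-precision values
ref = sum(double(x)); % double-precision reference sum
seqSum = single(0);
for k = 1:numel(x)
seqSum = seqSum + x(k); % naive left-to-right accumulation in single
end
treeSum = pairwiseSum(x); % recursive pairwise (binary-tree) accumulation
fprintf('sequential error: %g\n', abs(double(seqSum) - ref));
fprintf('pairwise error:   %g\n', abs(double(treeSum) - ref));
function s = pairwiseSum(v)
% Split the vector in half recursively, mimicking a binary reduction tree.
if numel(v) <= 2
s = sum(v);
else
m = floor(numel(v)/2);
s = pairwiseSum(v(1:m)) + pairwiseSum(v(m+1:end));
end
end
The pairwise error is typically much smaller, because most of the additions combine partial sums of similar, smaller magnitude.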

Jan on 24 Nov 2018
Edited: Jan on 24 Nov 2018
This is the expected behavior. The matrix multiplication is affected by the rounding errors of the dot products of the rows and columns. The dot product is the sum of the element-wise multiplications, and the sum is a numerically unstable operation: a small variation of the input can cause a large difference in the output. Performing a matrix multiplication in single precision must therefore produce larger errors than in double precision, so the observed faster growth of the deviation is exactly what theory predicts. See:
E_single = [0
1.184116e-06
5.051582e-06
2.594609e-05
0.0001531584
0.0006053803
0.002351832
0.01166957
0.04795818
0.204383
0.8535618
2.759973
13.55706
48.12731
167.6242
666.1393
3173.625
10306.33
43441.54
154367.1];
E_double = [ 1.794529e-15
7.594314e-15
2.508041e-14
1.038021e-13
2.723854e-13
1.127249e-12
2.581471e-12
7.382443e-12
2.095835e-11
4.506465e-11
9.805409e-11
2.562852e-10
6.968754e-10
2.101229e-09
3.449182e-09
1.083673e-08
2.235593e-08
5.35213e-08
1.378775e-07];
plot(log(E_single), 'b')
hold on
plot(log(E_double), 'r')
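The instability of the sum itself is easy to see directly (a minimal sketch, nothing here is specific to the GPU): adding the same single-precision numbers in a different order changes the result, while the corresponding double-precision sums agree to far more digits.
x = randn(1, 1e6, 'single');
sFwd = single(0);
sRev = single(0);
for k = 1:numel(x)
sFwd = sFwd + x(k); % forward accumulation in single
end
for k = numel(x):-1:1
sRev = sRev + x(k); % reverse accumulation in single
end
xd = double(x);
dFwd = 0;
dRev = 0;
for k = 1:numel(xd)
dFwd = dFwd + xd(k); % forward accumulation in double
end
for k = numel(xd):-1:1
dRev = dRev + xd(k); % reverse accumulation in double
end
fprintf('single: |forward - reverse| = %g\n', abs(double(sFwd) - double(sRev)));
fprintf('double: |forward - reverse| = %g\n', abs(dFwd - dRev));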
  2 Comments
Andrea Apiella on 24 Nov 2018
Edited: Andrea Apiella on 24 Nov 2018
No, the point here is the difference for the same operation on the SAME data type on the CPU compared with the GPU, i.e. GPU single is very different from CPU single; it is not single vs. double, which is your point. I expected the results obtained on the GPU to be similar to the results obtained on the CPU when using the same data type.
Jan on 24 Nov 2018
Edited: Jan on 24 Nov 2018
@Andrea: You did not get my point. It is not a comparison between single and double; rather, both types accumulate rounding errors, and for singles the error is expected to be larger and to grow faster. See this:
sM = randn(10, 10, 'single');
sM1 = sM;
sM2 = sM + eps(sM); % Tiny variation of the input (single)
dM = randn(10, 10, 'double');
dM1 = dM;
dM2 = dM + eps(dM); % Tiny variation of the input (double)
for i=1:20
sE(i) = norm(sM1^i - sM2^i); % Both on CPU
dE(i) = norm(dM1^i - dM2^i); % Both on CPU
end
figure;
axes('NextPlot', 'add');
plot(log(sE), 'b');
plot(log(dE), 'r');
legend({'single', 'double'});
You see the same behavior if you add a tiny variation to the inputs - here both are computed on the CPU. This is not a proof, but a strong hint that the difference between the computations on the CPU and GPU is caused by different rounding behavior in the dot products. You can expect the same effects when you calculate X^i on different CPUs or with different BLAS libraries. The growing difference is an effect of the instability of the matrix multiplication, not a problem of CPU versus GPU computation.
The problem is that you expect the calculations on the CPU and GPU to give the same results, but the accumulated rounding errors dominate the result, so it is in the nature of the problem that the results differ on different hardware.
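A rough way to see the same thing without any GPU at all (just a sketch): permute the rows and columns of M before taking powers. Mathematically Mp^i is an exactly permuted copy of M^i, but the dot-product terms are summed in a different order, so the rounding differs and the difference grows with i just like the CPU-versus-GPU one.
M = randn(10, 10, 'single');
p = randperm(10);
Mp = M(p, p); % same matrix with rows and columns permuted
for i = 1:20
A = M^i;
B = Mp^i; % equals A(p,p) in exact arithmetic
E(i) = norm(A(p, p) - B); % nonzero only because of rounding (exactly 0 for i = 1)
end
semilogy(E) % grows with i, like the CPU-vs-GPU difference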


Release: R2018b
