Half precision using GPU

Hello, I was trying to see if I can run some code using half precision rather than single. Before converting my code, I tried a very simple example:
A=gpuArray(magic(3));
A=half(A);
This gives me the error: No constructor 'half' with matching signature found.
Using half on the CPU works flawlessly.
Any idea if this is supported at all? Looking here, https://www.mathworks.com/help/gpucoder/ug/what-is-half-precision.html, it seems some GPUs should support it?
I am using a 16 GB RTX 3080 Mobile with R2022b.

2 Comments

Perhaps
A=gpuArray(half(magic(3)))
??
I do not have a GPU available to test with.
Unfortunately, this won't work either; it gives: GPU arrays support only fundamental numeric or logical data types.


 Accepted Answer

As pointed out, gpuArray does not support half. The main reason is that half is an emulated type only meaningful for deployment to specialized hardware; it is not native to most processors. Feel free to investigate the use of half for code generation.
Do you just want to store data in half to save space on the GPU? You can use the following code to get something like the behaviour you're after:
function u = toHalf(x)
% Pack single values into a uint16 using a half-like bit layout.
realmaxHalf = single(65504);              % largest finite fp16 value
x = min(max(x,-realmaxHalf),realmaxHalf); % clamp to the representable range
[f,e] = log2(abs(x));                     % abs(x) = f.*2.^e with f in [0.5,1)
sgn = uint16(x>=0);                       % note: sign bit is 1 for non-negative
sgnbit = bitshift(sgn,15);
expbits = bitshift(uint16(e+15),10);      % biased exponent in bits 10-14
fbits = uint16(f.*2.^10 - 1);             % 10-bit fraction
u = bitor(bitor(sgnbit, expbits), fbits);
u(x==0) = uint16(0);                      % special-case zero
end
function x = fromHalf(u)
% Unpack uint16 values produced by toHalf back to single.
u = uint16(u);
sgn = single(bitshift(u,-15));
fbits = bitand(u,uint16(1023));
f = single(fbits+1)./(2.^10);
expbits = bitand(u,uint16(31744));
e = single(bitshift(expbits,-10))-15;
x = (sgn.*2-1).*f.*2.^e;
x(u==0) = single(0);                      % special-case zero
end
Note: this is a very crude implementation of fp16 that takes no account of NaNs, Infs, correct overflow behaviour or denormals. The half version is just a uint16 with the data in it; you can't actually use it to compute anything in fp16.
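For anyone who wants to sanity-check the bit layout outside MATLAB, here is a rough NumPy translation of the same pack/unpack scheme (an illustrative sketch, not the answer's code; np.frexp plays the role of MATLAB's two-output log2):

```python
import numpy as np

def to_half(x):
    """Pack float32 values into uint16 using the same crude half-like
    layout as the MATLAB toHalf above (sign bit 1 = non-negative,
    biased exponent, 10-bit fraction). Not IEEE-754 binary16."""
    x = np.clip(np.asarray(x, dtype=np.float32), -65504.0, 65504.0)
    f, e = np.frexp(np.abs(x))                 # |x| = f * 2**e, f in [0.5, 1)
    sgnbit = (x >= 0).astype(np.uint16) << np.uint16(15)
    expbits = (e + 15).astype(np.uint16) << np.uint16(10)
    fbits = np.clip(np.round(f * 2.0**10 - 1), 0, 1023).astype(np.uint16)
    u = sgnbit | expbits | fbits
    return np.where(x == 0, np.uint16(0), u)   # special-case zero

def from_half(u):
    """Unpack uint16 values produced by to_half back to float32."""
    u = np.asarray(u, dtype=np.uint16)
    sgn = (u >> np.uint16(15)).astype(np.float32)
    f = ((u & np.uint16(1023)).astype(np.float32) + 1) / 2.0**10
    e = ((u & np.uint16(31744)) >> np.uint16(10)).astype(np.float32) - 15
    x = (sgn * 2 - 1) * f * 2.0**e
    return np.where(u == 0, np.float32(0), x).astype(np.float32)
```

Round-tripping through to_half/from_half keeps about 11 bits of mantissa, so values survive to roughly 0.05% relative error, just like the MATLAB version.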

4 Comments

Fernando
Fernando on 11 Apr 2023
Edited: Fernando on 11 Apr 2023
Thanks for the response. I am going to take a look into this; I need to understand exactly what it is doing to see how to incorporate it into my code.
Edit: I assume frexp needs to be replaced with the two-output log2, i.e. [f,e] = log2(abs(x)), for this to work?
Edit2: I was looking to increase the size of my models while also solving faster. I am simulating a superconducting machine using some electromagnetic laws. Increasing the number of elements per conductor and so on improves the precision but also makes the simulation slower, and I can only grow my model so much on my GPU. I was also hoping to simulate more complex machines without losing precision, so I thought reducing digits or using half would be a good approach. I am also comparing the results with FEM, so ideally I want the model to run faster than it. Right now I can only use MATLAB to simulate half a machine using symmetry, but I do get the exact same answer as in FEM.
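On the frexp question above: the two-output form of log2 splits a number into a mantissa in [0.5, 1) and an integer exponent, which is exactly what C's frexp does. A quick NumPy check of that identity (illustrative only):

```python
import numpy as np

# np.frexp splits x into f * 2**e with 0.5 <= f < 1, matching MATLAB's
# two-output log2 syntax: [f,e] = log2(x).
f, e = np.frexp(np.float32(6.0))
assert f == 0.75 and e == 3   # 6.0 == 0.75 * 2**3
assert f * 2.0**e == 6.0
```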
Fernando
Fernando on 11 Apr 2023
Edited: Fernando on 11 Apr 2023
I tried this a bit; it does work to convert the matrices and return them correctly.
If I want to do any operation such as cross products, sums, etc., I need to convert them back to singles first (using fromHalf); otherwise it fails to give the right answer. This is not entirely bad, because it saves memory between operations, but it makes the operations take longer.
I am trying to figure out if there is a way to do the cross product or anything else while still in uint16, but from your answer I guess it is not possible.
'fraid not. No chance of that! Your only hope is to actually convert to int16 (by rescaling to some range), but you will find many blockers in the way, such as integer overflow and unsupported mathematical operations. The code I gave you merely stores the number you have as a float in 16 bits; you can't actually do any computation with it.
I see. The issue is that I gain more from having larger matrices, as opposed to smaller ones with higher precision or more digits in them.
I guess I could try to work with your solution while I figure out another way, or buy a better GPU.
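The convert-before-computing workflow described in these comments looks like this in NumPy (using native float16 as the compact storage type, purely to illustrate the pattern):

```python
import numpy as np

# Keep vectors packed in 16 bits while idle, widen to float32 to compute,
# then pack the result back down to save memory again.
a16 = np.array([1.0, 2.0, 3.0], dtype=np.float16)   # compact storage
b16 = np.array([4.0, 5.0, 6.0], dtype=np.float16)
a = a16.astype(np.float32)                          # widen before math
b = b16.astype(np.float32)
c = np.cross(a, b)                                  # compute in single
c16 = c.astype(np.float16)                          # shrink the result
```

The widening step is the price of the memory savings: the arithmetic itself still happens in single precision.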


More Answers (1)

Matt J
Matt J on 11 Apr 2023
Edited: Matt J on 11 Apr 2023
GPU Code Generation does support it, but not the Parallel Computing Toolbox, which is where gpuArray is defined.

3 Comments

Right, so I am calculating vectors using gpuArray. I imagine there is no workaround to reduce the precision and shrink the array size?
I am doing some computations using the whole 16 GB and it takes about 40 minutes per calculation... it's not bad, but I was hoping to run larger models.
Matt J
Matt J on 11 Apr 2023
Edited: Matt J on 11 Apr 2023
You should probably break the data sets into smaller chunks and process them sequentially. The GeForce RTX 3080 can only process about 70,000 threads at a time anyway.
Ok, I will try to look into this.
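The chunking suggestion can be sketched like this (a generic NumPy illustration, with fn standing in for whatever per-chunk computation you run):

```python
import numpy as np

def process_in_chunks(x, chunk_size, fn):
    """Apply fn to consecutive slices of x and stitch the results together,
    so only one chunk's worth of intermediates is live at a time."""
    parts = [fn(x[i:i + chunk_size]) for i in range(0, len(x), chunk_size)]
    return np.concatenate(parts)

y = process_in_chunks(np.arange(10.0), 4, np.sqrt)
```

This only helps when fn works independently on each slice; operations that couple all elements (e.g. a global solve) need a blocked formulation instead.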


Asked: on 10 Apr 2023
Commented: on 11 Apr 2023
