ifft2 on GPU array
6 views (last 30 days)
Show older comments
I am trying to compute the ifft2 of a multiple matrices. The simplete code snippet is:
gAs = gpuArray.rand(999, 519, 20);
gBs = gpuArray.rand(999, 519);
ifft2(gAs .* gBs, "symmetric");
Error using gpuArray/ifft2
An invalid array was used on the GPU.
I thought that I was using all the GPU memory. I tried using single GPU arrays but it However, I then tried the following code (bigger matrix) and worked just fine.
gAs = gpuArray.rand(1000, 519, 2);
gBs = gpuArray.rand(1000, 519);
ifft2(gAs .* gBs, "symmetric");
I know that I can also do a for-loop through gAs slices and it works but I want to get some speedup by doing it in one call to ifft2.
I wanted to understand why this is happening and if there is a way in which I can pad the matrices so that I can still get the ifft2 of the original matrices.
For reference:
>> gpuDevice()
ans =
CUDADevice with properties:
Name: 'Tesla V100-SXM2-32GB'
Index: 1
ComputeCapability: '7.0'
SupportsDouble: 1
DriverVersion: 11.2000
ToolkitVersion: 11
MaxThreadsPerBlock: 1024
MaxShmemPerBlock: 49152
MaxThreadBlockSize: [1024 1024 64]
MaxGridSize: [2.1475e+09 65535 65535]
SIMDWidth: 32
TotalMemory: 3.4090e+10
AvailableMemory: 3.3167e+10
MultiprocessorCount: 80
ClockRateKHz: 1530000
ComputeMode: 'Default'
GPUOverlapsTransfers: 1
KernelExecutionTimeout: 0
CanMapHostMemory: 1
DeviceSupported: 1
DeviceAvailable: 1
DeviceSelected: 1
3 Comments
Walter Roberson
on 3 Jan 2022
Sorry, I would have to boot into a different operating system to test (GPU is not supported on my MacOS.)
Accepted Answer
Matt J
on 4 Jan 2022
Edited: Matt J
on 4 Jan 2022
I think you should probably just omit the 'symmetric' flag. On the GPU (mine at least), it doesn't seem to make a big difference in performance:
A = gpuArray.rand(512,512,512);
gputimeit(@() ifft2(A,'symmetric') ) % 0.0706 seconds
gputimeit(@() ifft2(A) ) % 0.0753 seconds
Whether this is an indication of sub-optimal software design on Mathworks part, I'm not sure. On the CPU, the 'symmetric' flag means the software does fewer flops, but on a parallel system like the GPU, it's not the number of flops that matters.
0 Comments
More Answers (1)
Matt J
on 3 Jan 2022
Edited: Matt J
on 3 Jan 2022
I think it's a bug, but one solution might be,
fn=@(z,d) ifft(z,[],d,'symmetric');
out = fn( fn(gAs .* gBs,1) ,2);
2 Comments
Matt J
on 4 Jan 2022
Edited: Matt J
on 4 Jan 2022
It seems I had a conceptual error. ifft(ifft(X,1,'sym'),2,'sym') is not a valid replacement for ifft2(X,'sym') unless X is symmetric about both the x and y axes.
However, it does seem like a bug that only certain array sizes work for gpuArray.ifft2(). The CPU version of ifft2() doesn't have that problem.
See Also
Categories
Find more on GPU Computing in Help Center and File Exchange
Community Treasure Hunt
Find the treasures in MATLAB Central and discover how the community can help you!
Start Hunting!