1. The operation is out-of-place, meaning a copy of the matrix is made to store the result.
2. The FFT plan required to execute the operation can vary in size depending on the properties of the input data. For more information on cuFFT plans, see: https://docs.nvidia.com/cuda/cufft/index.html#cufft-setup
3. During execution of the FFT, a temporary workspace is required, whose size depends on the algorithm chosen in the FFT plan. For data whose dimensions are powers of two, cuFFT needs a smaller workspace. The requirement grows when the dimensions contain large prime factors, where cuFFT falls back to algorithms that can use more workspace memory than the input data itself.
4. MATLAB also loads CUDA libraries, which may consume their own memory on initialization.
How much additional memory is needed to perform a 3D FFT, beyond the matrix to be transformed? (GPU application)
Hello. I'm trying to understand how much memory is needed to perform an FFT, and whether it differs when performing it on a GPU.
For instance, it appears I can only utilize up to 67% of my GPU memory before an error is thrown; I can't seem to go above this value:
Nx = 256;
Ny = 256;
Nz = 512;
A = rand(Nx,Ny,Nz)+1i*rand(Nx,Ny,Nz);
A = gpuArray(A);
A = fftn(A);
B = rand(Nx,Ny,Nz)+1i*rand(Nx,Ny,Nz);
B = gpuArray(B);
B = fftn(B);
C = rand(Nx,Ny,Nz)+1i*rand(Nx,Ny,Nz);
C = gpuArray(C);
C = fftn(C);
D = rand(Nx,Ny,Nz)+1i*rand(Nx,Ny,Nz);
D = gpuArray(D);
D = fftn(D);
E = rand(Nx,Ny,Nz)+1i*rand(Nx,Ny,Nz);
E = gpuArray(E);
E = fftn(E);
F = rand(Nx,Ny,Nz)+1i*rand(Nx,Ny,Nz);
F = gpuArray(F);
F = fftn(F);
G = rand(Nx,Ny,Nz)+1i*rand(Nx,Ny,Nz);
G = gpuArray(G);
G = fftn(G);
H = rand(Nx,Ny,Nz)+1i*rand(Nx,Ny,Nz);
H = gpuArray(H);
H = fftn(H);
I = rand(Nx,Ny,Nz)+1i*rand(Nx,Ny,Nz);
I = gpuArray(I);
I = fftn(I);
J = rand(Nx,Ny,Nz)+1i*rand(Nx,Ny,Nz);
J = gpuArray(J);
J = fftn(J);
bytes = 16; % Bytes used per complex number (double precision)
Tbytes = Nx*Ny*Nz*bytes; % Total number of bytes per array
NoTran = 10; % Number of FFT transforms in memory
GPUmem = 8e9; % 8 GBytes of GPU memory
% Theoretical percentage of GPU memory used with all transforms
percent = (Tbytes/GPUmem)*NoTran*100
percent = 67.1089
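For anyone wanting to double-check the arithmetic outside MATLAB, here is the same calculation as a small Python sketch (the 16 bytes per element assumes double-precision complex data):

```python
# Reproduce the theoretical memory estimate from the MATLAB snippet above.
Nx, Ny, Nz = 256, 256, 512
elem_bytes = 16                       # complex double: 8 bytes real + 8 bytes imaginary
Tbytes = Nx * Ny * Nz * elem_bytes    # bytes per array: 536,870,912 (~0.54 GB)
NoTran = 10                           # ten transformed arrays (A..J) kept in memory
GPUmem = 8e9                          # assumed 8 GB of GPU memory
percent = Tbytes / GPUmem * NoTran * 100
print(round(percent, 4))              # 67.1089
```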
If I add another matrix, say 'K', constructed in the same way as the other matrices, an error is thrown.
If I query the GPU with gpuDevice, I appear to obtain a different answer than my calculation:
CUDADevice with properties:
Name: 'GeForce RTX 2070 with Max-Q Design'
MaxThreadBlockSize: [1024 1024 64]
MaxGridSize: [2.1475e+09 65535 65535]
TotalMemory = 8.5899e+09
AvailableMemory = 1.5127e+09
% Percentage of GPU memory used
percent = (1 - AvailableMemory/TotalMemory)*100
percent = 82.390
This answer is somewhat confusing, as I made sure my display only uses the computer's integrated graphics rather than the GPU. Changing this setting in the NVIDIA control panel does not appear to change 'AvailableMemory' when I rerun all the matrices and check the available memory again.
So my calculation for 'Tbytes' must be underestimating, as more memory is being used than expected. Additionally, it appears there are actually 8.6 GBytes of total memory on the GPU - I'm not going to complain about that.
So, how much additional memory is needed to perform a 3D FFT in MATLAB beyond the starting matrix, and does performing one on a GPU make a difference?
That is, for some matrix A consisting of complex numbers and of size Nx*Ny*Nz, it should theoretically require Nx*Ny*Nz*16 bytes of memory. However, in order to do a 3D FFT on that matrix, I believe it should require at least double that amount of memory, to account for the transformed matrix (including its zero entries). But it seems even more memory than that is required.
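A rough sketch of why the eleventh array fails well before the theoretical 100% (plain Python; it assumes one extra transform-sized buffer is alive during each out-of-place fftn, and ignores plan, workspace, and library overhead):

```python
# Estimate peak memory at the moment the hypothetical 11th array 'K' is transformed.
Nx, Ny, Nz = 256, 256, 512
array_bytes = Nx * Ny * Nz * 16            # one complex-double array: ~0.54 GB
gpu_bytes = 8589934592                     # 8192 MiB, as reported by gpuDevice

retained = 11 * array_bytes                # A..J plus the new input K
transient = array_bytes                    # out-of-place FFT writes into a fresh copy
peak = retained + transient
print(round(peak / gpu_bytes * 100, 1))    # 75.0 - before any plan/workspace overhead
```

Even under these generous assumptions, the eleventh transform already touches 75% of the card, so any cuFFT workspace or library overhead on top of that can exhaust the remaining memory.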
Hamza Butt on 13 May 2020
Edited: Hamza Butt on 13 May 2020
There are some additional memory requirements to consider while performing FFT operations, listed at the top of this post. While you have accounted for (1), the memory requirements for (2) and (3) can be difficult to estimate, as they depend on the internals of cuFFT. (4) will be roughly constant for every MATLAB instance that uses gpuArrays.
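Using the numbers from the question, the combined overhead of (2), (3) and (4) can be estimated by subtracting the ten retained arrays from the memory actually in use. A rough attribution in plain Python, not an exact accounting:

```python
# Attribute the "missing" memory to plan/workspace/library overhead.
total = 8.5899e9                           # TotalMemory reported by gpuDevice
available = 1.5127e9                       # AvailableMemory after the ten transforms
used = total - available                   # ~7.08 GB currently in use
retained = 10 * 256 * 256 * 512 * 16       # the ten result arrays: ~5.37 GB
overhead = used - retained                 # memory attributable to (2), (3) and (4)
print(round(overhead / 1e9, 2))            # ~1.71 GB
```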
In response to your question about the total memory being 8.5899e+09 bytes: if you have CUDA installed, you can run "nvidia-smi" or "nvidia-smi --query-gpu=memory.total --format=csv", which reports the total memory in MiB. Note that "MiB" and "MB" are not the same, and in the case of the RTX 2070 Max-Q (and my RTX 2080 Max-Q), "8192MiB" translates to exactly the value you are seeing in bytes.
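The MiB-to-bytes conversion can be checked directly (1 MiB = 1024*1024 bytes, whereas 1 MB = 1e6 bytes):

```python
mib = 8192                # total memory as reported by nvidia-smi
print(mib * 1024 * 1024)  # 8589934592 bytes, i.e. the 8.5899e+09 shown by gpuDevice
print(int(mib * 1e6))     # 8192000000 - the smaller figure you'd get if MiB were read as MB
```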
I hope this answers your question. Please let me know if you would like any further clarifications.