GPU memory overhead dependent on fft dimension.

Question

0 votes

Hello all, I have a question regarding memory management during Matlab's gpuArray/fft operation. I have a large NxM matrix [N = 10E3, M = 20E3, approximately] where I wish to take an FFT along the M dimension. For CPU operations I would normally permute the matrix so that the FFT acts along the 1st (column) dimension, for speed.

On the GPU, if I run the FFT along the 1st dimension, I slam into the memory ceiling of my GPU. If I apply it along the row dimension, I do not. I assume this has to do with whether Matlab is doing N asynchronous FFTs in the row direction vs. a single massive matrix operation in the column dimension.

So, 4 questions:

1. Is my assumption true?
2. Are GPU operations still faster in the column direction? (I sort of answered this myself: I got a 3x speed advantage with the snippet below.)
3. Is there a way to know what the GPU memory need will be for the FFT? If so, I can try chunking up the FFT based on the GPU memory available.
4. Is there another implementation that will have the speed of the column operation without the memory issues? I am going to try doing this with arrayfun just to see.

Code snippet:

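x = gpuArray.rand(10000,10000);   % random test matrix on the GPU
xp = x.';                         % transposed copy: rows of xp are columns of x
gputimeit(@() fft(x,[],1))        % FFT down the columns
gputimeit(@() fft(xp,[],2))       % same signals, FFT along the rows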

Thanks all.

1 Comment

D. Plotnick on 2 Jul 2018

As I suspected, arrayfun (at least my way of using it) is way slower.

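g = gpuDevice;                  % handle to the current GPU, needed by wait() below
f = @(i) fft(x(:,i),[],1);      % FFT of one column
tic
y = arrayfun(f,1:size(x,2),'UniformOutput',false);   % one fft call per column
y = cat(2,y{:});
wait(g);                        % let queued GPU work finish before stopping the timer
toc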

Answer 1

Joss Knight on 2 Jul 2018

0 votes

MATLAB uses cuFFT, so the behaviour is whatever cuFFT's behaviour is. The implication of the batching API as described by the doc - https://docs.nvidia.com/cuda/cufft/index.html - is that batches that are not contiguous in memory (as with an FFT along the rows) result in multiple kernel launches. This will be slower, but more efficient with memory.
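
To make the contiguity point concrete: MATLAB stores arrays in column-major order, so each column of a matrix occupies one contiguous block of memory, while the elements of a row are strided by the number of rows. The small CPU-only illustration below (nothing in it is cuFFT-specific) shows that layout and checks that the transpose trick from the question gives the same result as transforming along rows.

```matlab
% MATLAB arrays are column-major: elements of a column are adjacent in memory.
A = reshape(1:12, 3, 4);   % 3x4 matrix holding its own linear indices
col2 = A(:, 2).';          % [4 5 6]     -> contiguous (stride 1)
row2 = A(2, :);            % [2 5 8 11]  -> strided by size(A,1) = 3

% FFT along rows (strided signals) vs. transposing so the signals are columns.
x  = rand(8, 16);
y1 = fft(x, [], 2);             % one signal per row
y2 = fft(x.', [], 1).';         % same signals as contiguous columns; .' avoids conjugation
maxErr = max(abs(y1(:) - y2(:)));   % differs only by round-off
```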

Because the amount of memory an FFT needs is so variable and dependent on signal length, it isn't that valuable to know what the size will be for any particular example. If you're curious, you can watch the FreeMemory property of the object returned by gpuDevice:

gpu = gpuDevice
gpu.FreeMemory

After an FFT, the FFT plan is retained, so you should see how much memory it took up (as long as it's the first FFT you do in the MATLAB session). For working memory, you can assume there will be a copy of the input, possibly two, because MATLAB itself will often take a copy of the input to ensure your data is not corrupted in the event of an error.
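
Following up on the chunking idea from the question, here is a minimal sketch that transforms a block of rows at a time, transposing each block so its signals are contiguous columns. The memory budget and the overhead factor are assumptions, not values taken from cuFFT or MATLAB; on a GPU you could read the budget from gpuDevice (e.g. gpu.FreeMemory) and pass gpuArray data instead of the CPU stand-in used here. The names budgetBytes, overhead, and rowsPerChunk are hypothetical.

```matlab
% Chunked FFT along dimension 2, a block of rows at a time.
x = rand(1000, 2000);            % stand-in data; on a GPU, e.g. gpuArray.rand(N, M)
[N, M] = size(x);

% Hypothetical sizing rule: budget / (overhead * bytes per output row).
budgetBytes = 64e6;              % assumed budget; on a GPU, e.g. gpu.FreeMemory
overhead = 4;                    % assumed: input block, transposed copy, output, plan workspace
bytesPerRow = 16 * M;            % one row of complex double output
rowsPerChunk = max(1, floor(budgetBytes / (overhead * bytesPerRow)));

y = complex(zeros(N, M, 'like', x));   % preallocate the complex result (gpuArray if x is)
for r0 = 1:rowsPerChunk:N
    rows = r0:min(r0 + rowsPerChunk - 1, N);
    % Transpose the block so each signal is a contiguous column,
    % FFT down the columns, then transpose back (.' avoids conjugation).
    y(rows, :) = fft(x(rows, :).', [], 1).';
end
```

Larger blocks should approach the speed of the single batched call; smaller blocks trade speed for a lower peak memory footprint.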

If you can get your signals to be a power of 2 in length (say, 8192), you'll find them much more efficient with memory.
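
One way to act on the power-of-2 advice is to zero-pad each signal using the length argument of fft. This is a minimal sketch, assuming zero-padding is acceptable for the application: it changes the frequency-bin spacing (Fs/nfft instead of Fs/M) and makes the output larger, so whether it saves memory overall for a given size is worth checking with FreeMemory rather than assuming.

```matlab
M = 20000;                 % approximate signal length from the question
nfft = 2^nextpow2(M);      % next power of 2 at or above M: 32768
x = rand(4, M);            % small stand-in batch of row signals
y = fft(x, nfft, 2);       % zero-pads each row to nfft samples before transforming
```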

0 Comments