Many small Eigenvalue Decompositions in parallel on GPU?
I have some code that involves a couple billion 3x3 and 4x4 eigenvalue decompositions. I have run this code with parfors on the CPU and the runtime is just barely bearable, but I'd really like to speed this up.
I have a GTX 780 available. I realize that a GPU is generally better suited to large matrix operations than to a large number of small matrix operations. I looked at pagefun, which looks like the best way MATLAB has to run many small matrix operations in parallel. However, the functions available for pagefun are all element-by-element operations, with a few exceptions such as mtimes, mrdivide, and mldivide. Unfortunately, eig is not one of those functions.
Is there any other way to run this code on the GPU?
2 Comments
Are you sure you mean "several thousand"? My old machine from 2008 can do 10000 such decompositions without breaking a sweat:
>> tic; for i=1:10000, eig(rand(4)); end; toc
Elapsed time is 0.196188 seconds.
ervinshiznit
on 16 Aug 2015
Answers (3)
Brian Neiswander
on 18 Aug 2015
The "pagefun" function does not currently support the function "eig". However, note that the "eig" function will accept GPU arrays generated with the "gpuArray" function:
X = rand(1e3,1e3);
G = gpuArray(X);
Y = eig(G);
Depending on your data, this can be faster than the non-GPU approach but it is not parallelized across the pages.
It is possible to implement your own CUDA kernel using the CUDAKernel object or MEX functions. This lets you run custom functions with a distribution scheme of your choice. See the links below for more information:
2 Comments
ervinshiznit
on 19 Aug 2015
Birk Andreas
on 16 Jul 2019
So, it's already 2019 and some MAGMA eigenvalue functions have been implemented. However, there is still no eig for pagefun...
What is preventing progress?
Could you give an estimate of when it will be implemented?
It would really be very welcome!
Joss Knight
on 21 Aug 2015
Edited: Joss Knight
on 21 Aug 2015
Have you tried just concatenating your matrices in block-diagonal form and calling eig? You may then be limited by memory, but the eigenvalues and vectors of a block-diagonal system are just the union of the eigenvalues and vectors of the blocks:
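N = 1000;
A = rand(3,3,N);                                  % N pages of 3x3 matrices
% Logical mask marking the 3x3 diagonal blocks of a 3N-by-3N matrix
maskCell = mat2cell(ones(3,3,N),3,3,ones(N,1));
mask = logical(blkdiag(maskCell{:}));
Ablk = zeros(3*[N,N]);                            % block-diagonal matrix, built on the CPU
Ablk(mask) = A(:);                                % column-major order matches the pages of A
[Vblk,Dblk] = eig(gpuArray(Ablk));                % one large decomposition on the GPU
V = reshape(Vblk(mask), [3 3 N]);
D = reshape(Dblk(mask), [3 3 N]);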
You should then find that A(:,:,i)*V(:,:,i) equals V(:,:,i)*D(:,:,i) to within rounding error, as required. Because of the way eigendecomposition works, I would expect the extra zeros not to affect performance much; the system should converge straightforwardly and parallelize well.
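One thing worth checking on a small case is whether the columns returned by eig actually line up with the blocks. eig does not promise any particular eigenvalue order, and for symmetric input it typically returns them sorted, which would scatter the eigenvectors across blocks. The following is a minimal CPU-only sketch, not part of the original answer, that regroups the eigenvectors by the block containing their largest entry and then verifies the residual per page. It assumes the eigenvalues of different blocks do not coincide; all variable names are illustrative.

```matlab
rng(0);                              % reproducible random test data
N = 50;                              % small N so this runs quickly on the CPU
A = rand(3,3,N);

% Build the 3N-by-3N block-diagonal matrix from the pages of A.
C = num2cell(A,[1 2]);
Ablk = blkdiag(C{:});

[Vblk,Dblk] = eig(Ablk);
lamAll = diag(Dblk);

% Assign each eigenvector to the block holding its largest-magnitude entry.
[~,row] = max(abs(Vblk),[],1);
blk = ceil(row/3);
assert(all(accumarray(blk(:),1,[N 1]) == 3), 'expected 3 eigenvectors per block');

% Group columns by block (sort is stable, so order within a block is kept).
[~,order] = sort(blk);
Vs = Vblk(:,order);
lam = lamAll(order);

% Extract the per-page eigenvectors and eigenvalues and check the residuals.
V = zeros(3,3,N,'like',Vs);
D = zeros(3,3,N,'like',Vs);
res = zeros(1,N);
for i = 1:N
    idx = 3*(i-1) + (1:3);
    V(:,:,i) = Vs(idx,idx);
    D(:,:,i) = diag(lam(idx));
    res(i) = norm(A(:,:,i)*V(:,:,i) - V(:,:,i)*D(:,:,i));
end
fprintf('largest residual: %.3g\n', max(res));
```

If the residuals are small without the regrouping step as well, the simpler mask-based extraction is safe for your data.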
5 Comments
ervinshiznit
on 21 Aug 2015
ervinshiznit
on 21 Aug 2015
Joss Knight
on 24 Aug 2015
What is the bottleneck? Is eig itself slower on the GPU than the CPU? Run
Gblk = gpuArray(Ablk);
timeit(@()eig(Ablk),2)
gputimeit(@()eig(Gblk),2)
Joss Knight
on 24 Aug 2015
Also, I see that the GTX 780 has poor double-precision performance: 166 GFLOPS versus 3977 GFLOPS for single precision. Try running your code in single precision.
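Before switching, it may be worth estimating how much accuracy single precision costs on representative data. Here is a rough CPU-only sketch. It uses symmetric random pages, an assumption made here only so that eig returns real, sorted eigenvalues that can be compared entry by entry; substitute your own matrices for a meaningful number.

```matlab
rng(0);                              % reproducible random test data
N = 1000;
A = rand(3,3,N);
A = (A + permute(A,[2 1 3]))/2;      % symmetrize each page (assumption, see above)

errMax = 0;
for i = 1:N
    ed = eig(A(:,:,i));              % double-precision eigenvalues, ascending
    es = eig(single(A(:,:,i)));      % same page in single precision
    errMax = max(errMax, max(abs(ed - double(es))));
end
fprintf('largest single-vs-double eigenvalue difference: %.3g\n', errMax);
```

If the difference is acceptable for your application, converting with single() before the transfer to the GPU also halves the memory footprint.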
kunx
on 22 Jan 2022
Thank you. Your direction is very helpful.
James Tursa
on 20 Aug 2015
1 vote
If you just need the eigenvalues, you might look at this FEX submission by Bruno Luong:
Maybe you can expand it for 4x4 as well.
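To illustrate the general idea of a closed-form, fully vectorized approach (this is not the FEX code, just a sketch of the standard trigonometric formula), the eigenvalues of many 3x3 matrices can be computed with element-wise operations only. The sketch below assumes the matrices are real and symmetric, which the question does not state; it does not apply to general 3x3 matrices. All variable names are illustrative.

```matlab
rng(0);                              % reproducible random test data
N = 1000;
A = rand(3,3,N);
A = (A + permute(A,[2 1 3]))/2;      % ASSUMPTION: real symmetric pages
% A = gpuArray(A);                   % optional: everything below is element-wise

% Pull the six independent entries out as 1-by-N row vectors.
a11 = reshape(A(1,1,:),1,[]);  a22 = reshape(A(2,2,:),1,[]);
a33 = reshape(A(3,3,:),1,[]);  a12 = reshape(A(1,2,:),1,[]);
a13 = reshape(A(1,3,:),1,[]);  a23 = reshape(A(2,3,:),1,[]);

q  = (a11 + a22 + a33)/3;                        % mean of the eigenvalues
p1 = a12.^2 + a13.^2 + a23.^2;
p  = sqrt(((a11-q).^2 + (a22-q).^2 + (a33-q).^2 + 2*p1)/6);
ps = p;  ps(p == 0) = 1;                         % avoid 0/0 for multiples of I

% Entries of B = (A - q*I)/p, then r = det(B)/2.
b11 = (a11-q)./ps;  b22 = (a22-q)./ps;  b33 = (a33-q)./ps;
b12 = a12./ps;      b13 = a13./ps;      b23 = a23./ps;
r = (b11.*(b22.*b33 - b23.^2) ...
   - b12.*(b12.*b33 - b23.*b13) ...
   + b13.*(b12.*b23 - b22.*b13))/2;
r = min(max(r,-1),1);                            % guard acos against rounding

phi  = acos(r)/3;
lam1 = q + 2*p.*cos(phi);                        % largest eigenvalue
lam3 = q + 2*p.*cos(phi + 2*pi/3);               % smallest eigenvalue
lam2 = 3*q - lam1 - lam3;                        % trace gives the middle one
lam  = [lam1; lam2; lam3];                       % 3-by-N, descending per column
```

Because every step is element-wise, the same code should run unchanged on a gpuArray. Closed-form formulas like this can lose accuracy when eigenvalues are nearly equal, so compare against eig on a sample of your own data first.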
4 Comments
ervinshiznit
on 20 Aug 2015
Joss Knight
on 21 Aug 2015
You don't need to use an explicit formula - pagefun supports mldivide (the backslash \ operator) for solving small systems.
ervinshiznit
on 21 Aug 2015
Joss Knight
on 24 Aug 2015
Edited: Joss Knight
on 24 Aug 2015
Why do you need to transfer 3x3 and 4x4 matrices to the GPU independently? Just transfer it all as one 3D array. You have to anyway to use pagefun.