Preconditioning for iterative solvers on GPU - Performance issues

Dear all,
I'm experimenting with preconditioners for iterative solvers on the GPU for a linear system [A]{x}={B}. The solve is set up with this simple command:
sol = pcg(A_gpu, B_gpu, tol, maxit, P)
where A_gpu and B_gpu are gpuArrays and P is the preconditioner.
Some simple tests show that the solution is faster than any iterative CPU solver whenever P = [], with speedups of up to 12x.
However, what I still can't figure out is why the performance drops whenever any type of preconditioner is supplied. For instance, using an incomplete Cholesky factorization:
L = ichol(A)
sol = pcg(A_gpu, B_gpu, tol, maxit, L*L')
This wrecks performance compared to using no preconditioner at all on the GPU. The solution is even slower than the CPU version, where this same preconditioner improves CPU performance by about 1.5x. That's really strange.
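One variant worth noting: pcg also accepts the two factors as separate arguments (M1 and M2, with M = M1*M2), so each iteration applies two triangular solves instead of a solve with the assembled product L*L'. A minimal sketch of that form, with the factors moved to the GPU (assuming sparse gpuArray triangular solves are supported in this release; variable names follow the snippets above):
L = ichol(A);             % incomplete Cholesky on the CPU (ichol has no gpuArray method)
L_gpu  = gpuArray(L);     % lower-triangular factor on the GPU
Lt_gpu = gpuArray(L');    % its transpose on the GPU
% M1 = L, M2 = L': pcg applies the preconditioner as two triangular solves per iteration
sol = pcg(A_gpu, B_gpu, tol, maxit, L_gpu, Lt_gpu);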
I've also tried passing A_gpu as preconditioner, but the solution takes forever:
sol = pcg(A_gpu, B_gpu, tol, maxit, A_gpu)
This issue also affects other iterative solvers, such as bicg and symmlq.
Am I doing something wrong? It seems that any preconditioner on the GPU hurts performance, even when the same preconditioner is effective in the CPU version.
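For anyone trying to reproduce the comparison, here is a minimal timing sketch (hypothetical setup; wait(gpuDevice) before toc guards against asynchronous GPU execution skewing the measurement, and L_gpu/Lt_gpu are the factors from the snippet above):
dev = gpuDevice;
tic;
x_noprec = pcg(A_gpu, B_gpu, tol, maxit);               % no preconditioner
wait(dev); t_noprec = toc;
tic;
x_prec = pcg(A_gpu, B_gpu, tol, maxit, L_gpu, Lt_gpu);  % with the ichol factors
wait(dev); t_prec = toc;
fprintf('no preconditioner: %.3f s, with ichol factors: %.3f s\n', t_noprec, t_prec);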
Please share your thoughts and experiences. Thanks!
  7 Comments
Paulo Ribeiro on 21 Nov 2019
Edited: Paulo Ribeiro on 22 Nov 2019
Thanks, Joss. These are really impressive results on a Titan V. It's even faster than a direct backslash solve A\B on the CPU with an Intel i7-8700:
tic; A\B; toc
Elapsed time is 1.712258 seconds.
For this specific case it appears that the best option is to avoid preconditioning on the GPU.
Regards.
Joss Knight on 25 Nov 2019
I investigated further and found that applying the preconditioner, not just decomposing it, does appear to be taking an unusually long time. This warrants further investigation, since those two triangular solves should be fast and your system matrix is band-diagonal. It does have quite a large bandwidth of 543, however, so that could be the issue.
For large sparse matrices, iterative solvers are generally faster than direct solves (assuming they have reasonable convergence properties). Direct solves are hugely memory-intensive because there is a lot of fill-in during factorization.
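A rough way to isolate the cost described above is to time the preconditioner application (the two triangular solves) against a single sparse matrix-vector product using gputimeit (a sketch; L_gpu and Lt_gpu as in the earlier snippet):
r_gpu = rand(size(A_gpu, 1), 1, 'gpuArray');          % a representative residual vector
t_apply  = gputimeit(@() Lt_gpu \ (L_gpu \ r_gpu));   % cost of applying M = L*L' as two triangular solves
t_matvec = gputimeit(@() A_gpu * r_gpu);              % cost of one A*x, for comparison
fprintf('apply preconditioner: %.4g s, A*r: %.4g s\n', t_apply, t_matvec);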
