CPU vs. GPU: Can I estimate the speedup from the provided code?

I am using an i5 CPU, which is estimated at under 100 gigaflops. According to Nvidia, the Tesla K80 delivers about 2 teraflops.
MATLAB supports Nvidia CUDA, so if I have a good GPU I can use it with very simple code, e.g., A=gpuArray(A);
Below is my current problem: the Crank-Nicolson method for the 1D Schroedinger equation.
I show only the significant part of the code. It won't run on its own, but I can send the complete code to anyone who wants to benchmark it with CUDA.
a=1i*hbar/2/m;
A=zeros(length(x)); % length(x)=6141
for k=1:round(1.5*tau/dt) % round(1.5*tau/dt)=1e5~1e6
    % interpolate the potential between stored samples
    nn=round(k/steps)+1;
    Utemporary(1,:)=Utemp(nn,:)+(Utemp(nn+1,:)-Utemp(nn,:))*mod(k,steps)/steps;
    c(1,:)=-1i*Utemporary/hbar;
    % right-hand side of the Crank-Nicolson step
    kapsi1=2*dt*a*circshift(psi,[0,1]);
    kapsi2=2*dt*a*circshift(psi,[0,-1]);
    D(1,:)=kapsi1+kapsi2+(4*dx^2-4*dt*a).*psi+(2*dx^2*dt*c).*psi;
    % tridiagonal left-hand-side matrix
    for q=1:length(x)-1
        A(q,q)=(4*dx^2+4*dt*a-2*dx^2*dt*c(q));
        A(q+1,q)=-2*dt*a;
        A(q,q+1)=-2*dt*a;
    end
    A(end,end)=(4*dx^2+4*dt*a-2*dx^2*dt*c(end));
    psi=A\D'; % note: ' is the complex-conjugate transpose (.' is the plain transpose)
    psi=psi';
    if mod(k,DT)==0
        floor(k/DT) % no semicolon, so this prints progress
        psiev(floor(k/DT),:)=psi; % store a snapshot every DT steps
    end
end
The main bottleneck is psi=A\D'; which already uses the optimized Gaussian elimination from the MATLAB library.
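For reference, the inner loop over q fills only the main diagonal and the two neighbouring diagonals, so A is tridiagonal. A minimal sketch of the same assembly without the loop, using spdiags, is below. All numeric values (n, dt, dx, a, c, D) are small placeholders assumed for illustration, not the values from the post. Storing A as sparse also lets the \ operator exploit the band structure instead of treating the matrix as dense.

```matlab
% Placeholder values for illustration only (assumptions, not the poster's data).
n  = 6;                       % stands in for length(x)
dt = 1e-3;  dx = 1e-1;
a  = 1i*0.5;                  % stands in for 1i*hbar/2/m
c  = -1i*linspace(0,1,n);     % stands in for -1i*Utemporary/hbar

% Loop-based dense assembly, as in the posted code.
A = zeros(n);
for q = 1:n-1
    A(q,q)   = 4*dx^2 + 4*dt*a - 2*dx^2*dt*c(q);
    A(q+1,q) = -2*dt*a;
    A(q,q+1) = -2*dt*a;
end
A(end,end) = 4*dx^2 + 4*dt*a - 2*dx^2*dt*c(end);

% Vectorized sparse assembly of the same tridiagonal matrix.
mainDiag = (4*dx^2 + 4*dt*a - 2*dx^2*dt*c).';   % n-by-1 column, plain transpose
offDiag  = -2*dt*a*ones(n,1);
As = spdiags([offDiag mainDiag offDiag], [-1 0 1], n, n);

% Solve with both forms; the results should agree up to round-off.
D = (1:n) + 1i*(n:-1:1);      % arbitrary right-hand side (assumption)
psiDense  = A  \ D.';
psiSparse = As \ D.';
maxDiff = max(abs(psiDense - psiSparse));
```

In the time loop only mainDiag depends on k (through c), so offDiag could be built once outside the loop.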
So, here is my question.
If I transfer every element and matrix to the GPU with gpuArray and use a Tesla K80, can I achieve a 20 times faster result?
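To make the question concrete, here is a rough sketch of what "transfer with gpuArray, solve, and bring the result back" could look like for a single solve. It assumes Parallel Computing Toolbox and a supported CUDA device, and falls back to the CPU otherwise. The matrix and right-hand side are small stand-ins (assumptions), not the 6141x6141 system from the code above.

```matlab
% Small stand-in system (assumption); the post uses length(x) = 6141.
n = 200;
A = diag(4*ones(n,1)) + diag(-ones(n-1,1),1) + diag(-ones(n-1,1),-1);
A = complex(A, 0.1*eye(n));      % complex entries, like the posted A
b = complex(ones(n,1), 0);

% Use the GPU only if the toolbox and a device are actually available.
try
    useGPU = gpuDeviceCount > 0;
catch
    useGPU = false;              % toolbox missing or not licensed
end

xCPU = A \ b;                    % reference solve on the CPU
if useGPU
    Ag = gpuArray(A);            % host -> device transfer
    bg = gpuArray(b);
    xGPU = gather(Ag \ bg);      % solve on the device, then device -> host
else
    xGPU = xCPU;                 % fallback so the script still runs
end

% The two results should agree up to round-off.
relDiff = norm(xGPU - xCPU) / norm(xCPU);
```

In a time-stepping loop, the transfers would normally be done once before the loop and gather called only when a snapshot is stored, since each host-device copy has its own cost.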

 Accepted Answer

MATLAB will see the K80 as two separate CUDA devices. Each MATLAB process can access only one device, so to take full advantage of the K80 you'll need to run a parallel pool. Within a single MATLAB session, you should still see a reasonable speedup. I just tried this on a 32-core machine (16 physical cores) with a K80 using R2015b. For a 20000x20000 matrix, the \ operator took ~7 seconds on the GPU and ~27 seconds on the CPU, i.e. roughly a 4x speedup rather than 20x.
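A comparison like this can be reproduced on your own hardware with timeit and gputimeit. The sketch below is only an illustration: the matrix is random and much smaller than 20000x20000 (both are assumptions), so the measured ratio will differ from the numbers quoted above. The GPU branch runs only if a device is found.

```matlab
% Test problem (assumption): random, diagonally dominant dense system.
n = 500;                         % kept small; the timing above used 20000
A = rand(n) + n*eye(n);
b = rand(n,1);

% Time the CPU solve.
tCPU = timeit(@() A \ b);

% Check for a usable GPU (requires Parallel Computing Toolbox).
try
    hasGPU = gpuDeviceCount > 0;
catch
    hasGPU = false;
end

if hasGPU
    Ag = gpuArray(A);            % transfer once, outside the timed region
    bg = gpuArray(b);
    tGPU = gputimeit(@() Ag \ bg);
    fprintf('CPU %.3g s, GPU %.3g s, speedup %.2fx\n', tCPU, tGPU, tCPU/tGPU);
else
    fprintf('CPU %.3g s (no GPU found)\n', tCPU);
end
```

Increase n toward the real problem size before drawing conclusions, since GPU speedups for \ tend to grow with matrix size.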
