# Extreme memory usage when using gather

6 views (last 30 days)

Show older comments

Lorenco Santos Vasconcelos
on 28 Mar 2024

Edited: Joss Knight
on 14 Apr 2024

Hi! I have a code that solves a tridiagonal system and, because of the large sizes of the matrices, I'm trying to solve the A\V on GPU.

At each iteration I modify the diagonal elements and need to solve on the GPU like the below code:

After the GPU have solved the system, I need to get the present result and store in a matrix. For this I'm using abs(gather(head(ffield,savez-TERRAIN+1))); and when this commands are called, the memory usage goes beyond 130 GB.

What is happening here to the memory goes crazy like that? Also, there are any tips to improve my code? In summary, I have to solve a large tridiagonal system at each iteration of a large loop (like hundreds of thousands iterations). It'd be better if I could create the matrices on the GPU once and simply modify its diagonals values directly on the GPU, but I couldn't do this yet, so I'm modifying a normal matrix and sending it to the GPU again at each iteration.

A = spdiags([[subdiagA;0] zeros(N-1,1) [0;superdiagA]],-1:1,N-1,N-1);

B = spdiags([[subdiagB;0] zeros(N-1,1) [0;superdiagB]],-1:1,N-1,N-1);

ffield = gpuArray(complex(field));

indicesDiag = (1:N:(N-1)^2)';

for m = 2:M

a_mj = k0^2*delz^2*(m_ref_curr(2:end).^2 - 1);

% modify matrix A diagonal

diagA = c*(-2 + b + a_mj);

% modify matrix B diagonal

diagB = conj(c)*(-2 + conj(b) + a_mj);

% some boundary conditions

km = delz*(alpha_BC(m) + 1i*k0*0.5*slopes(m)^3);

if PEC && POL == 'V'

km = delz*1i*k0*0.5*slopes(m)^3;

end

% modify first and last elements of matrices A and B diagonals

diagA(1) = diagA(1) + c/(1 - km + 0.5*km^2);

diagB(1) = diagB(1) + conj(c)/(1 - km + 0.5*km^2);

if POL == 'H' % DIRICHLET

diagA(end) = 1;

diagB(end) = 1;

elseif POL == 'V' % NEUMANN

diagA(end) = diagA(end)/2;

diagB(end) = diagB(end)/2;

end

% update the diagonal terms of matrix A and send to GPU

A(indicesDiag) = diagA;

AA = gpuArray(complex(A)); % AA is the version of A on the GPU

% update the diagonal terms of matrix A and send to GPU

B(indicesDiag) = diagB;

BB = gpuArray(complex(B)); % BB is the version of matrix B on the GPU

% update another GPU array

ffield = ffield.*exp(-1i*k0*slopes(m)*z);

VV = BB*ffield(2:end); % operation on GPU

% solve the tridiagonal system on GPU

ffield(2:end) = AA\VV;

ffield(1) = ffield(2)/(1 - km + 0.5*km^2);

% some operations to prepare for the next operation

ffield = ffield.*exp(1i*k0*(slopes(m)*z + 0.5*slopes(m)^2*delx + delx));

ffield(zwin:end) = ffield(zwin:end).*win;

% here i need to get the result that is in the GPU array ffield and

% store part of it in a normal matrix u()

TERRAIN = floor(terreno(m)/delz)+1;

u(TERRAIN:end,m) = abs(gather(head(ffield,savez-TERRAIN+1))); %% when this line is executed, memory usage goes above 130 GB and the thing crashes

end

##### 8 Comments

Joss Knight
on 1 Apr 2024

Edited: Joss Knight
on 1 Apr 2024

I would have to do a deeper dive to give really good advice, but I can surmise this much. If your system is tridiagonal then store it as an n-by-3 matrix on the GPU and update it as such; only create a sparse matrix out of it to solve the system. The slowness of spdiags on the GPU is I think only because diagA is not a gpuArray. Make it one and it should speed things up.

So I would probably make diagA and diagB gpuArray objects and store A and B (and AA and BB) as 3-column dense matrices, for which you are updating the middle column. When you do the system solve, convert to sparse only temporarily:

ffield(2:end) = spdiags(AA,-1:1)\VV;

I'm assuming here that you are using R2021a or later, which was when sparse tridiagonal solves were optimized.

### Accepted Answer

Joss Knight
on 1 Apr 2024

Edited: Joss Knight
on 14 Apr 2024

As others have worked out, it looks like the issue is the indexed assignment into u:

u(TERRAIN:end,m) = abs(gather(head(ffield,savez-TERRAIN+1)));

Because u is in the base workspace, this needs to make a second copy of u during the assignment, to protect your workspace.

The normal way to fix this is to make your code into a function which returns u, so that it doesn't exist in the base workspace while the function is running. If this is impossible for you, there are ways to 'move' the variable u into a function for update, but they are a little bit advanced and I'm not even sure how safe it is to give this advice...

...

u = updateU(u, head, ffield, savez, TERRAIN, m);

...

function u = updateU(u, head, ffield, savez, TERRAIN, m)

% Delete u in parent workspace

assignin('base', 'u', []);

% Now the only copy of u is local and the assignment can be inplace

% BUT!! IF THIS ERRORS, u WILL BE DESTROYED PERMANENTLY!!

u(TERRAIN:end,m) = abs(gather(head(ffield,savez-TERRAIN+1)));

end

Of course, you could also try to rework your code so that you're not storing hundreds of gigs of data in a single variable. Even breaking it up into a cell array (e.g. each column is a different cell element) would help because only the element being overwritten would need to be copied.

##### 0 Comments

### More Answers (1)

Taylor
on 28 Mar 2024

##### 3 Comments

Taylor
on 28 Mar 2024

### See Also

### Categories

### Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!