MATLAB Answers


Sparse gpuArray accumulation in for-loop

Asked by CHEN ZIXIANG on 9 Oct 2019
Latest activity Commented on by CHEN ZIXIANG on 11 Oct 2019
I am running into an 'Out of memory' error when accumulating a sparse gpuArray in a for-loop.
The following code is inside a function. I need to accumulate the result 'KernelCurrent' of every iteration into the global sparse gpuArray 'Kernel'. In this function, 'KernelCurrent' is also a sparse gpuArray and has the same size as 'Kernel' (size: 262144×262144).
I have tested all the other lines of code in this function, and the tests show that the 'Out of memory' error is caused by the addition (accumulation). The storage requested for both 'Kernel' and 'KernelCurrent' is definitely less than the 'AvailableMemory' of the gpuDevice.
Kernel = gpuArray(sparse(num_row, num_col))
for
.
.
.
KernelCurrent = Result_oneLoop; % 'KernelCurrent' has the same size as 'Kernel'
Kernel = Kernel + KernelCurrent; % Causes the 'Out of memory' error
end
The gpuDevice that I can access:
Is there an alternative way to code this that avoids the problem? Thanks in advance!
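A rough CPU-side sketch of why the addition itself can be the step that fails, even when 'Kernel' and 'KernelCurrent' each fit in memory: the sum is a third sparse array that typically has to be assembled while both operands are still held, and it can contain up to nnz(Kernel) + nnz(KernelCurrent) nonzeros. The size n and the density 0.001 below are made-up values for illustration, A and B are stand-ins for the real matrices, and the byte counts reported by whos describe CPU sparse storage, which may not match how a sparse gpuArray is stored.

```matlab
% Illustration only: n and the 0.001 density are hypothetical values.
rng(0);                       % fixed seed so the run is repeatable
n = 2000;
A = sprand(n, n, 0.001);      % stand-in for Kernel
B = sprand(n, n, 0.001);      % stand-in for KernelCurrent
C = A + B;                    % the result is a third sparse array

% Compare nonzero counts and CPU storage of the operands and the sum.
infoA = whos('A');
infoB = whos('B');
infoC = whos('C');
fprintf('nnz:   A = %d, B = %d, A+B = %d\n', nnz(A), nnz(B), nnz(C));
fprintf('bytes: A = %d, B = %d, A+B = %d\n', infoA.bytes, infoB.bytes, infoC.bytes);
```

If the nonzero patterns of the two operands barely overlap, nnz of the sum is close to the sum of the two nnz values, so the result alone can need about as much storage as both operands together.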

  2 Comments

Hi Chen,
How many elements do your sparse matrices have?
Hi Andrea,
The size of the sparse matrices is 262144×262144 (for both Kernel and KernelCurrent).


1 Answer

Answer by Matt J on 10 Oct 2019
Edited by Matt J on 10 Oct 2019
Accepted Answer

I would guess that your Kernel matrix becomes less and less sparse as you accumulate, until its memory consumption grows beyond the GPU's capacity. Add the marked line below and re-run to check.
Kernel = gpuArray(sparse(num_row, num_col))
for
.
.
.
KernelCurrent = Result_oneLoop;
Kernel = Kernel + KernelCurrent;
percent_density=nnz(Kernel)/numel(Kernel)*100, %<---- Add this
end
How large does the percent_density become before the "Out of memory" occurs?
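To see the effect in isolation, here is a small CPU-only sketch of the same density check. It is not the original function: n, numIter and the 0.002 per-iteration density are hypothetical values, and sprand stands in for 'Result_oneLoop'.

```matlab
% CPU-only sketch; n, numIter and the 0.002 density are hypothetical.
rng(1);                                   % fixed seed so the run is repeatable
n = 1000;
numIter = 20;
Kernel = sparse(n, n);                    % all-zero sparse accumulator
percent_density = zeros(numIter, 1);      % density after each iteration

for k = 1:numIter
    KernelCurrent = sprand(n, n, 0.002);  % stand-in for Result_oneLoop
    Kernel = Kernel + KernelCurrent;
    percent_density(k) = nnz(Kernel) / numel(Kernel) * 100;
end

disp(percent_density.');                  % density (in percent) per iteration
```

Because each iteration places its nonzeros at different positions, the density of the accumulated matrix keeps climbing even though every single 'KernelCurrent' is very sparse.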

  1 Comment

Thank you for your answer!
Yes, the sparsity decreases very quickly as the accumulation goes on.
In the end I kept Kernel sparse by using a sparsity-control vector (size 262144×1) whose entries are 1 or 0 (only 6 elements of the vector are 1). The code now becomes:
Kernel = sparse([]);
parfor
.
.
.
KernelCurrent = Result_oneLoop; % 'KernelCurrent' has the size of (262144×1)
KernelCurrent = KernelCurrent.*Sparsity_Control_Vector;
Kernel = [Kernel, KernelCurrent];
end
As you can see, I no longer use 'gpuArray'. The parallel pool still works, though, and my problem is now solved.
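A minimal sketch of the masking idea described in this comment, written so that each column is stored by its loop index and the columns are concatenated once after the loop. This is a variation on the code above, not the poster's actual function: n, numCols and the six row positions in keepRows are hypothetical, and rand stands in for 'Result_oneLoop'.

```matlab
% Sketch only: n, numCols and keepRows are hypothetical values.
n = 1000;
numCols = 8;
keepRows = [3 57 200 411 640 999];                        % the 6 rows to keep
Sparsity_Control_Vector = sparse(keepRows, 1, 1, n, 1);   % n-by-1 mask of 0s and 1s

cols = cell(1, numCols);                 % one masked column per iteration
parfor k = 1:numCols
    KernelCurrent = rand(n, 1);          % stand-in for Result_oneLoop
    % Element-wise product zeroes every row outside keepRows.
    cols{k} = sparse(KernelCurrent .* Sparsity_Control_Vector);
end

Kernel = [cols{:}];                      % concatenate all columns once
fprintf('size = %d-by-%d, nnz = %d\n', size(Kernel, 1), size(Kernel, 2), nnz(Kernel));
```

Storing each column in cols{k} makes its position in the final matrix explicit, and each column holds at most 6 nonzeros, so the storage of 'Kernel' grows with the number of columns rather than with the number of rows.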
