gpuArray sparse memory usage

Zheng Gu on 7 Aug 2015
Commented: Zheng Gu on 12 Aug 2015
I have a gpu with about 2GB of available memory:
CUDADevice with properties:
Name: 'Quadro K1100M'
Index: 1
ComputeCapability: '3.0'
SupportsDouble: 1
DriverVersion: 6.5000
ToolkitVersion: 6.5000
MaxThreadsPerBlock: 1024
MaxShmemPerBlock: 49152
MaxThreadBlockSize: [1024 1024 64]
MaxGridSize: [2.1475e+09 65535 65535]
SIMDWidth: 32
TotalMemory: 2.1475e+09
AvailableMemory: 2.0154e+09
MultiprocessorCount: 2
ClockRateKHz: 705500
ComputeMode: 'Default'
GPUOverlapsTransfers: 1
KernelExecutionTimeout: 1
CanMapHostMemory: 1
DeviceSupported: 1
DeviceSelected: 1
However, I'd like to load a sparse array onto it (R2015a, which supports sparse gpuArray):
whos('pxe')
Name         Size                    Bytes  Class     Attributes
pxe    5282400x5282400          1182580904  double    sparse, complex
I get an error when trying to copy it to the GPU, though:
gpxe = gpuArray(pxe);
Error using gpuArray
An unexpected error occurred on the device. The error code was: UNKNOWN_ERROR.
I'm not sure what the problem is here. It works with smaller sparse arrays, but I'm still well within the memory limits. Is there some kind of hidden maximum size, or are we not allowed to actually use most of the GPU memory? This matrix would theoretically take up less than 60% of GPU memory.
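For reference, the 60% figure is just the CPU byte count from whos compared against the available memory reported by gpuDevice; a quick check using the same variable as above:
% Compare the matrix's CPU storage against what the device reports as free
gpu = gpuDevice;
s = whos('pxe');
fprintf('Matrix is %.0f%% of available GPU memory\n', 100*s.bytes/gpu.AvailableMemory)
which gives about 59% for the numbers shown here.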
Edit: trying smaller arrays and loading multiple ones into GPU memory:
Trial>> gpu = gpuDevice;
Trial>> mem1 = gpu.FreeMemory;
Trial>> gpxe = gpuArray(pxet.');
Trial>> mem2 = gpu.FreeMemory;
Trial>> gpye = gpuArray(pyet.');
Trial>> mem3 = gpu.FreeMemory;
Trial>> gpxi = gpuArray(pxit.');
Trial>> mem4 = gpu.FreeMemory;
Trial>> gpyi = gpuArray(pyit.');
Trial>> mem5 = gpu.FreeMemory;
The sizes of these arrays on the CPU are:
whos('pxet','pyet','pxit','pyit')
Name        Size                 Bytes  Class     Attributes
pxet   211600x211600          47266024  double    sparse, complex
pxit   211600x211600          47266024  double    sparse, complex
pyet   211600x211600          47266024  double    sparse, complex
pyit   211600x211600          47266024  double    sparse, complex
Sequential memory footprint on the GPU:
Trial>> mem1-mem2
ans =
147456000
Trial>> mem2-mem3
ans =
39059456
Trial>> mem3-mem4
ans =
39059456
Trial>> mem4-mem5
ans =
39059456
So the very first transfer allocates a huge chunk of memory, while subsequent ones take up less space than their CPU byte counts suggest. It seems I need enough GPU memory to fit an initial allocation that's about three times as big as it needs to be.
  2 Comments
Matt J on 7 Aug 2015
Have you tried rebooting?
Zheng Gu on 7 Aug 2015
Just tried rebooting, but the result is the same.


Accepted Answer

Edric Ellis on 12 Aug 2015
The first time you use any of the GPU support in MATLAB, a series of libraries is loaded, and these consume memory on the GPU. Sparse gpuArray uses a different representation from the CPU (CSR layout, with 4-byte integers for indices), which explains why the number of bytes consumed by a given sparse matrix differs between the GPU and the CPU. Converting between these formats requires additional storage on the GPU, which almost certainly explains why you cannot create the large sparse matrix on the GPU.
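As a rough sketch of that difference (assuming 16 bytes per stored complex value on both sides, 8-byte row indices and column pointers for the CPU's CSC layout, and 4-byte column indices and row pointers for the GPU's CSR layout), the two footprints can be estimated from nnz:
% Rough storage estimate for one of the smaller matrices in the question
% (assumptions as stated above; the actual internals may differ slightly)
[m, n] = size(pxet);
nz = nnz(pxet);
cpuBytes = nz*(16 + 8) + (n+1)*8    % values + row indices + column pointers (CSC)
gpuBytes = nz*(16 + 4) + (m+1)*4    % values + column indices + row pointers (CSR)
For the 211600-by-211600 matrices above this reproduces the ~47 MB reported by whos and comes out close to the ~39 MB per array measured on the GPU; the larger first jump is the one-off library loading.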
  3 Comments
Edric Ellis on 12 Aug 2015
You're quite right, sorry for not spelling that out. On the CPU, MATLAB uses Compressed Sparse Column format; on the GPU, gpuArray uses Compressed Sparse Row since it generally has better parallel performance, and better library support. Unfortunately, this means we need to perform the (relatively expensive) format conversion when sending/gathering sparse data.
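Because that conversion happens inside gpuArray and gather themselves, one way to get a feel for its cost is simply to time the round trip (a rough sketch; timings will vary with the matrix and the hardware):
% Time host-to-device and device-to-host transfers, which include the
% CSC <-> CSR conversion; wait ensures the device has finished its work
tic; g = gpuArray(pxet); wait(gpuDevice); tSend = toc
tic; h = gather(g); tGather = toc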
Zheng Gu on 12 Aug 2015
Thanks for the answer. The libraries being loaded are understandable and, in the grand scheme of things, fairly negligible (it looks like 100 MB or so). The format conversion is problematic, though: since I get an error loading a 1.2 GB matrix into 2 GB of VRAM, even after accounting for the libraries it looks like the format conversion takes up about 700 MB, more than half the size of the matrix itself. Is there a way to do the conversion in system RAM and then send the result to the GPU?


More Answers (0)
