Why am I getting an "out of memory on device" error when trying to run the Speaker Recognition Using X Vectors Example?

I am trying to run the live script for the Speaker Recognition Using X Vectors example available here on the MATLAB Help Center: https://www.mathworks.com/help/audio/ug/speaker-recognition-using-x-vectors.html. I have all the necessary toolboxes installed for my version of MATLAB. I've tried running the example without making any changes, and I get an error in the code block starting at line 108. The error occurs at line 143 and reads:
"Out of memory on device. To view more detail about available memory on the GPU, use 'gpuDevice()'. If the problem persists, reset the GPU by calling 'gpuDevice(1)'.
X = nnet.internal.cnngpu.batchNormalizationForwardPredict(...
xdata = internal_batchnorm(xdata, offset, scale, args.Epsilon, channelDim, "inference", ...
Error in xvecModel (line 121)
Y = batchnorm(Y, ...
"
It's clear to me that something is going wrong with the use of my computer's GPU in this example, but I don't understand GPUs and parallel processing well enough to understand what. If I change the execution environment from gpu to cpu on line 100, I still get the same error. If I enter "gpuDevice()" at the command line, I get the following information about my computer's GPU:
"ans =
CUDADevice with properties:
Name: 'GeForce RTX 2060 with Max-Q Design'
Index: 1
ComputeCapability: '7.5'
SupportsDouble: 1
DriverVersion: 11
ToolkitVersion: 11
MaxThreadsPerBlock: 1024
MaxShmemPerBlock: 49152
MaxThreadBlockSize: [1024 1024 64]
MaxGridSize: [2.1475e+09 65535 65535]
SIMDWidth: 32
TotalMemory: 6.4425e+09
AvailableMemory: 4.4473e+09
MultiprocessorCount: 30
ClockRateKHz: 1185000
ComputeMode: 'Default'
GPUOverlapsTransfers: 1
KernelExecutionTimeout: 1
CanMapHostMemory: 1
DeviceSupported: 1
DeviceAvailable: 1
DeviceSelected: 1
"
I'm honestly not sure what the above GPU information means, but maybe it helps explain why I'm getting this error?
If I try to reset the device as suggested by putting "gpuDevice(1)" right above line 143, I get the following errors:
"Error using gpuArray/reshape
The data no longer exists on the device.
Error in deep.internal.dlarray.extractConvolutionFilter>iConvertWeightsToInternalAPIformat (line 159)
weights = reshape(weights, szW);
Error in deep.internal.dlarray.extractConvolutionFilter>iSizesOfNumericWeights (line 149)
weights = iConvertWeightsToInternalAPIformat(weights,nbFiltersPerGroup,nbChannelsPerGroup,nSpatialDimsInX,nbGroups);
Error in deep.internal.dlarray.extractConvolutionFilter (line 40)
[weights, nbChannelsPerGroup, nbFiltersPerGroup, nbGroups] = iSizesOfNumericWeights(weightsData, nSpatialDimsInX);
Error in dlarray/dlconv (line 214)
[weights, filterSize, nbChannelsPerGroup, nbFiltersPerGroup, nbGroups, isWeightsLabeled] = deep.internal.dlarray.extractConvolutionFilter(weights, ...
Error in xvecModel (line 15)
Y = dlconv(X,parameters.conv1.Weights,parameters.conv1.Bias,'DilationFactor',1);
"
I've shown this to my advisor and we're going to try running it on a different computer to see if that helps at all. If you can help me understand what's going wrong on my computer and how to fix it, that would be very helpful and very much appreciated.

Answers (1)

Brian Hemmat on 7 Jun 2021
Hi Joseph,
Are you sharing that GPU with other programs? For example, is it also being used for your graphics? That can cause out-of-memory issues.
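As a rough check on how much of the GPU is already in use, the TotalMemory and AvailableMemory values from the gpuDevice output posted in the question can be compared. The sketch below hard-codes those two posted numbers (the variable names are only illustrative); on a live system the same fields could be read from the object returned by gpuDevice instead.

```matlab
% Values copied from the gpuDevice() output posted in the question (bytes).
totalBytes     = 6.4425e9;   % TotalMemory
availableBytes = 4.4473e9;   % AvailableMemory

% Memory that was already occupied when gpuDevice() was called.
usedBytes    = totalBytes - availableBytes;
usedFraction = usedBytes / totalBytes;

fprintf('In use: %.2f GB of %.2f GB (%.0f%%)\n', ...
    usedBytes/1e9, totalBytes/1e9, 100*usedFraction);
```

With the posted numbers, roughly 2 GB of the 6.4 GB was already in use before training started. This snapshot alone does not show which process was holding that memory.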
You can try reducing miniBatchSize (see line 79). The default is 128; try 64 or 32. Larger mini-batches generally train faster, so this comes at the cost of a longer training time.
Setting ExecutionEnvironment to cpu should make the issue go away. I suspect you changed the value of the dropdown but didn't run the cell again. Try:
  1. Close MATLAB and open a new session.
  2. Open the example.
  3. Before doing anything else in the example, change the ExecutionEnvironment value to cpu on line 100.
  4. Then run the script.
  5. If you are still getting a GPU-related error after that, please post an update with that info.
Regarding gpuDevice(1): you can try that outside of any training loop, not within it. gpuDevice(1) resets the GPU and clears its data, hence the error message saying that the data no longer exists on the device. In this example, instead of putting it on line 143, you could have tried it before executing any lines of the example.
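A minimal sketch of that placement, assuming it runs at the very top of the script before any data or network parameters have been moved to the GPU. The canUseGPU guard is an addition here so the snippet also runs on machines without a supported GPU.

```matlab
% Reset the GPU once, before any gpuArray data exists.
% Resetting later would invalidate arrays already on the device.
hasGPU = canUseGPU;
if hasGPU
    g = gpuDevice(1);   % selects device 1, resets it, and clears its memory
    fprintf('Free GPU memory after reset: %.2f GB\n', g.AvailableMemory/1e9);
else
    disp('No supported GPU available; nothing to reset.');
end
```

A reset only frees memory held by the current MATLAB session; memory used by other programs sharing the GPU is unaffected.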
Best,
Brian
  2 Comments
Joseph McKinley on 13 Jun 2021
OK, I've tried changing the batch size to 32 and 64 as you suggested, but I'm still getting the same "out of memory on device" errors in both cases. I also tried calling gpuDevice(1) before any of the code in the example, and again right before the training loop, and I still get the same "out of memory" errors.
I also followed your suggestion about setting the ExecutionEnvironment value to cpu on line 100 before running anything else, and then running the code, but I still get the exact same "out of memory" errors. It's possible that the code block starting on line 57 is automatically activating the GPU even though I have ExecutionEnvironment set to cpu, and that's why I'm still getting the GPU errors. It's also definitely possible that my GPU is being used for other things such as graphics; I'm honestly not sure how to check or control this. I will look into that.
I will try running this example on another computer and see if that helps. Thank you for your help.
Brian Hemmat on 14 Jun 2021
I'm sorry, there is in fact an issue with the example. It appears minibatchqueue is casting to gpuArray if it detects a GPU and the required licenses. To turn off the use of the GPU, add the 'OutputEnvironment' option to the construction of the minibatchqueue:
mbq = minibatchqueue(dsTrain,numOutputs, ...
    'MiniBatchSize',miniBatchSize, ...
    'MiniBatchFormat',{'SCB','CB'}, ...
    'MiniBatchFcn',@preprocessMiniBatch, ...
    'OutputEnvironment','cpu'); % ADD THIS LINE (line 85 of the example)
I'm very sorry for the time this cost you. Thank you for bringing this bug to our attention. Please let me know if you encounter further issues.
Also, depending on your system, it may still be faster to run the example with a GPU and batch sizes smaller than 32 than to run on the CPU. So you may want to experiment with even smaller batch sizes if running on your CPU is prohibitively time consuming.
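One way to confirm that the 'OutputEnvironment' option takes effect is to pull a mini-batch from a queue and check where its data lives. The sketch below is a toy stand-in, not the example's code: the random data, the arrayDatastore, and the anonymous MiniBatchFcn are assumptions made only so the snippet is self-contained. It needs Deep Learning Toolbox and a release that includes minibatchqueue.

```matlab
% Toy data: 4 features x 10 observations (illustrative only).
X  = rand(4,10,'single');
ds = arrayDatastore(X,'IterationDimension',2);

% Same option as in the fix above, applied to the toy datastore.
mbq = minibatchqueue(ds, ...
    'MiniBatchSize',5, ...
    'MiniBatchFcn',@(x) cat(2,x{:}), ...   % stack observations as columns
    'MiniBatchFormat','CB', ...
    'OutputEnvironment','cpu');            % keep mini-batches in host memory

batch   = next(mbq);                             % dlarray, 4 channels x 5 observations
isOnGPU = isa(extractdata(batch),'gpuArray');    % false when the option is honored
disp(isOnGPU)
```

The same isa check can be applied to a mini-batch returned by the example's own mbq after adding the option.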
