GPU Device is not recognised in Matlab Deep-Learning offical docker image

Hello,
I am trying to use the MATLAB Deep Learning Docker image (mathworks/matlab-deep-learning, tag r2023b, from Docker Hub) on our Slurm-based HPC server.
I use the srun utility to run the Docker image:
srun \
--time=0-02:00:00 --gpus-per-node=1 --container-image=mathworks/matlab-deep-learning:r2023b \
--container-name=matlabDeepLearningGPU --pty bash
When the container starts, nvidia-smi runs but reports the CUDA Version as N/A.
When I then start MATLAB and run gpuDevice(), it fails with an error.
Is this an issue with the Docker image provided by MathWorks, with the drivers installed on the host, or something else?
I get the same error on NVIDIA GeForce RTX 3090, NVIDIA H100 NVL, and RTX 2080 Ti GPUs.
Thank you!

Answers (1)

Michael on 9 Sep 2024
Hi @Mahdi,
Thanks for reaching out about this.
This looks like the error that appears when the container has not been started correctly with the NVIDIA container runtime.
To do this with Docker, you need to install the nvidia-container-toolkit on the host, make sure the NVIDIA driver is installed, and pass the GPUs into the container runtime. For Docker this is done by passing --gpus all when running the container, and for Singularity by passing --nv.
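A sketch of the two launch styles described above, using the image tag from the question. The helper run_gpu_check and the DRY_RUN switch are hypothetical conveniences for illustration; --gpus all, --entrypoint and --nv are standard Docker and Singularity options. Overriding the entrypoint so that only nvidia-smi runs is just a quick check, not the way the image is normally started.

```shell
#!/usr/bin/env sh
# Sketch: launch a quick GPU check in the MATLAB image with Docker or Singularity.
# run_gpu_check and DRY_RUN are hypothetical helpers, not MathWorks tooling.
IMAGE="mathworks/matlab-deep-learning:r2023b"

run_gpu_check() {
  runtime="$1"
  case "$runtime" in
    docker)
      # --gpus all: hand the GPUs to the NVIDIA runtime (nvidia-container-toolkit must be on the host).
      # --entrypoint nvidia-smi: skip the image's default launcher and only query the GPUs.
      set -- docker run --rm --gpus all --entrypoint nvidia-smi "$IMAGE"
      ;;
    singularity)
      # --nv (two dashes): bind the host's NVIDIA driver libraries and device nodes.
      set -- singularity exec --nv "docker://$IMAGE" nvidia-smi
      ;;
    *)
      echo "unknown runtime: $runtime (expected docker or singularity)" >&2
      return 2
      ;;
  esac
  # With DRY_RUN=1, print the command instead of executing it.
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Usage: DRY_RUN=1 run_gpu_check docker   (prints the command)
#        run_gpu_check singularity        (runs it)
```

The dry-run mode makes it easy to paste the exact command into a job script or to show it to the cluster administrators before trying it.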
You should be able to test this if you have interactive access to any machine with these GPUs. Otherwise, you may need to ask the system administrators of your HPC cluster whether the correct flags are being passed when containers are run.
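For a quick check from an interactive session (for example after the srun ... --pty bash command in the question), a small read-only script like the one below can show whether GPU device nodes and the CUDA driver library actually made it into the container. It is a sketch; the function name check_gpu_env is hypothetical. Reading "libcuda.so.1: not found" while nvidia-smi still runs as consistent with the "CUDA Version N/A" symptom is an assumption, not something confirmed in this thread.

```shell
#!/usr/bin/env sh
# Sketch: report whether GPU devices and the CUDA driver library are visible
# inside the current container. Read-only; it changes nothing.
# check_gpu_env is a hypothetical helper name.

check_gpu_env() {
  # Variables that NVIDIA's container hooks read when deciding what to inject.
  echo "NVIDIA_VISIBLE_DEVICES=${NVIDIA_VISIBLE_DEVICES:-<unset>}"
  echo "NVIDIA_DRIVER_CAPABILITIES=${NVIDIA_DRIVER_CAPABILITIES:-<unset>}"

  # Device nodes such as /dev/nvidiactl and /dev/nvidia0 exist only if GPUs were passed in.
  if ls /dev/nvidia* >/dev/null 2>&1; then
    echo "device nodes: present"
  else
    echo "device nodes: missing"
  fi

  # libcuda.so.1 is the driver-side CUDA library that CUDA applications load at run time.
  found=no
  # ldconfig may live in /sbin, which is not always on PATH.
  if { ldconfig -p 2>/dev/null || /sbin/ldconfig -p 2>/dev/null; } | grep -q 'libcuda\.so\.1'; then
    found=yes
  fi
  # Singularity/Apptainer --nv exposes driver libraries through LD_LIBRARY_PATH
  # rather than the ldconfig cache, so search those directories as well.
  old_ifs=$IFS
  IFS=:
  for dir in ${LD_LIBRARY_PATH:-}; do
    if [ -e "$dir/libcuda.so.1" ]; then
      found=yes
    fi
  done
  IFS=$old_ifs
  if [ "$found" = yes ]; then
    echo "libcuda.so.1: found"
  else
    echo "libcuda.so.1: not found"
  fi
}

check_gpu_env
```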
Hope that is helpful,
Michael
Mahdi on 13 Sep 2024
Edited: Mahdi on 13 Sep 2024
Hello,
Thank you for the reply. I couldn't add the -nv option (nor --nv) to my srun command, and I also couldn't use it by calling srun .... exec docker run .... This may be a restriction set by the administrators or a result of how Slurm is configured.
I ended up building my own Docker image containing MATLAB on top of nvidia/cuda:12.2.2-cudnn8-devel-ubuntu22.04, using the MATLAB installer Dockerfile provided by MathWorks. With this image, nvcc and nvidia-smi are available inside the container and the GPUs are passed to MATLAB successfully. This worked without adding any other options to the srun command above (apart from changing the container name and image options to point to the new build). Since this image works, Slurm seems to be passing the GPUs through and the drivers are installed on the HPC machine. Could this be a compatibility issue between our Slurm setup and the mathworks/matlab-deep-learning image?
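One cheap thing to compare between the working and non-working images is their NVIDIA_* environment. The nvidia/cuda base images set NVIDIA_VISIBLE_DEVICES and NVIDIA_DRIVER_CAPABILITIES, which NVIDIA's container hooks read to decide which devices and driver libraries to inject. The --container-image/--container-name flags suggest the cluster uses the Pyxis/enroot Slurm plugin, which also honours these variables; that, and the idea that this difference explains the behaviour, are assumptions not confirmed in this thread. The helpers below (nvidia_vars, compare_nvidia_env) are hypothetical and work on plain VAR=value listings.

```shell
#!/usr/bin/env sh
# Sketch: compare the NVIDIA_* settings of two images or containers.
# Each input file holds one VAR=value per line, produced for example by
#   env > first.txt        (inside each container, via srun ... --pty bash), or
#   docker image inspect --format '{{range .Config.Env}}{{println .}}{{end}}' IMAGE > first.txt
# nvidia_vars and compare_nvidia_env are hypothetical helpers.

nvidia_vars() {
  # Keep only NVIDIA_* entries, sorted so the two listings line up for comm.
  grep '^NVIDIA_' "$1" | sort
}

compare_nvidia_env() {
  a=$(mktemp) && b=$(mktemp) || return 1
  nvidia_vars "$1" > "$a"
  nvidia_vars "$2" > "$b"
  # comm -23: lines only in the first listing; comm -13: lines only in the second.
  comm -23 "$a" "$b" | sed 's/^/only in first:  /'
  comm -13 "$a" "$b" | sed 's/^/only in second: /'
  rm -f "$a" "$b"
}
```

Usage: `compare_nvidia_env cuda-based.txt matlab-deep-learning.txt`. No output means both listings agree on every NVIDIA_* variable; a variable set to different values shows up once in each direction.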
