This example shows how to generate CUDA® code for a deep learning network that classifies video and deploy the generated code to an NVIDIA® Jetson Xavier board by using the GPU Coder™ Support Package for NVIDIA GPUs. The network has both convolutional and bidirectional long short-term memory (BiLSTM) layers. The generated application reads a specified video file as a sequence of video frames and outputs a label that classifies the activity in the video. This example generates code for the network trained in the Classify Videos Using Deep Learning example from Deep Learning Toolbox™. For more information, see Classify Videos Using Deep Learning (Deep Learning Toolbox).
Target Board Requirements
NVIDIA Jetson board.
Ethernet crossover cable to connect the target board and host PC (if the target board cannot be connected to a local network).
Supported JetPack SDK that includes the CUDA and cuDNN libraries.
Environment variables on the target for the compilers and libraries. For information on the supported versions of the compilers and libraries and their setup, see Install and Setup Prerequisites for NVIDIA Boards (GPU Coder Support Package for NVIDIA GPUs).
To generate and deploy code to an NVIDIA® Jetson Xavier board, you need the GPU Coder Support Package for NVIDIA GPUs. Use the checkHardwareSupportPackageInstall function to verify that the host system can run this example. If the function does not throw an error, the support package is correctly installed.
The GPU Coder Support Package for NVIDIA GPUs uses an SSH connection over TCP/IP to execute commands while building and running the generated CUDA code on the Jetson platform. You must therefore connect the target platform to the same network as the host computer or use an Ethernet crossover cable to connect the board directly to the host computer. Refer to the NVIDIA documentation on how to set up and configure your board.
To communicate with the NVIDIA hardware, you must create a live hardware connection object by using the jetson (GPU Coder Support Package for NVIDIA GPUs) function. You must know the host name or IP address, username, and password of the target board to create a live hardware connection object. For example, when connecting to the target board for the first time, create a live object for Jetson hardware by using the command:
hwobj = jetson('jetson-name','ubuntu','ubuntu');
The jetson object reuses these settings from the most recent successful connection to the Jetson hardware. This example establishes an SSH connection to the Jetson hardware using the settings stored in memory.
hwobj = jetson;
Checking for CUDA availability on the Target...
Checking for 'nvcc' in the target system path...
Checking for cuDNN library availability on the Target...
Checking for TensorRT library availability on the Target...
Checking for prerequisite libraries is complete.
Gathering hardware details...
Checking for third-party library availability on the Target...
Gathering hardware details is complete.
 Board name         : NVIDIA Jetson AGX Xavier
 CUDA Version       : 10.0
 cuDNN Version      : 7.5
 TensorRT Version   : 5.1
 GStreamer Version  : 1.14.5
 V4L2 Version       : 1.14.2-1
 SDL Version        : 1.2
 Available Webcams  :
 Available GPUs     : Xavier
If the connection fails, a diagnostic error message is reported at the MATLAB command line. The most likely cause is an incorrect IP address or host name.
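A script that needs to handle a failed connection can wrap the call in try/catch. The following is a minimal sketch and not part of the shipped example. The host name and credentials are the placeholder values shown earlier, and the connected flag is a hypothetical name introduced only for illustration.

```matlab
% Sketch: attempt the connection and print the diagnostic on failure.
% 'jetson-name', 'ubuntu', 'ubuntu' are placeholders; substitute your own.
connected = false;
try
    hwobj = jetson('jetson-name','ubuntu','ubuntu');
    connected = true;
catch err
    % err.message holds the diagnostic text raised by the failed call.
    fprintf('Connection to the Jetson board failed: %s\n', err.message);
end
```

Later steps that depend on hwobj can then check connected before proceeding.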
Use the coder.checkGpuInstall function to verify that the compilers and libraries necessary for running this example are set up correctly.
envCfg = coder.gpuEnvConfig('jetson');
envCfg.DeepLibTarget = 'cudnn';
envCfg.DeepCodegen = 1;
envCfg.Quiet = 1;
envCfg.HardwareObject = hwobj;
coder.checkGpuInstall(envCfg);
The net_classify entry-point function hardcodes the name of a video file. You must adjust this hardcoded path to the location of the video file on your target hardware. The entry-point function then reads the data from the file by using a VideoReader object. The data is read into MATLAB as a sequence of images (video frames), center-cropped, and then passed as input to a trained network for prediction. Specifically, the function uses the network trained in the Classify Videos Using Deep Learning example. The function loads the network object from the net.mat file into a persistent variable and reuses the persistent object for subsequent prediction calls.
function out = net_classify() %#codegen

if coder.target('MATLAB')
    videoFilename = 'situp.mp4';
else
    videoFilename = '/home/ubuntu/VideoClassify/situp.mp4';
end

frameSize = [1920 1080];

% read video
video = readVideo(videoFilename, frameSize);

% specify network input size
inputSize = [224 224 3];

% crop video
croppedVideo = centerCrop(video,inputSize);

% A persistent object mynet is used to load the series network object.
% At the first call to this function, the persistent object is constructed and
% setup. When the function is called subsequent times, the same object is reused
% to call predict on inputs, thus avoiding reconstructing and reloading the
% network object.
persistent mynet;

if isempty(mynet)
    mynet = coder.loadDeepLearningNetwork('net.mat');
end

% pass in cropped input to network
out = classify(mynet, croppedVideo);
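The readVideo and centerCrop helpers are supplied with the example and are not listed here. To illustrate the cropping step only, the following is a minimal sketch of a center crop on an H-by-W-by-C-by-N frame array. It is an assumption about what such a helper might do, not the shipped implementation. The dummy clip dimensions and the variable names side, r0, c0, and square are hypothetical, and imresize requires Image Processing Toolbox.

```matlab
% Sketch: center-crop a frame array to a square, then resize it to the
% network input size. The clip below is dummy data, not a real video.
video = zeros(240,320,3,5,'single');    % H-by-W-by-C-by-N
inputSize = [224 224 3];

h = size(video,1);
w = size(video,2);
side = min(h,w);                        % edge length of the square crop

% First row and column of the centered square region.
r0 = floor((h - side)/2) + 1;
c0 = floor((w - side)/2) + 1;
square = video(r0:r0+side-1, c0:c0+side-1, :, :);

% imresize resizes only the first two dimensions of an N-D array.
croppedVideo = imresize(square, inputSize(1:2));
```

For the 240-by-320 dummy clip, the crop keeps columns 41 through 280, and croppedVideo is 224-by-224-by-3-by-5.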
The network used to classify video input has a few notable features:
1. The network has a sequence input layer to accept image sequences as input.
2. The network uses a sequence folding layer followed by convolutional layers to apply the convolutional operations to each video frame independently, thereby extracting features from each frame.
3. The network uses a sequence unfolding layer and a flatten layer to restore the sequence structure and reshape the output to vector sequences, in anticipation of the BiLSTM layer.
4. Finally, the network uses the BiLSTM layer followed by output layers to classify the vector sequences.
To display an interactive visualization of the network architecture and information about the network layers, use the analyzeNetwork (Deep Learning Toolbox) function.
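The four features listed above can be sketched as a small layer graph. This is an illustration under stated assumptions and not the trained network from net.mat: a single small convolutional block stands in for the convolutional feature extractor, and numClasses and numHiddenUnits are placeholder values. Availability of these layer functions depends on your Deep Learning Toolbox release.

```matlab
% Sketch only: a toy network with the same overall structure.
inputSize = [224 224 3];
numClasses = 10;        % placeholder, not the value used in training
numHiddenUnits = 100;   % placeholder

layers = [
    sequenceInputLayer(inputSize,'Name','input')        % image sequences in
    sequenceFoldingLayer('Name','fold')                  % treat frames independently
    convolution2dLayer(3,8,'Padding','same','Name','conv')
    reluLayer('Name','relu')
    globalAveragePooling2dLayer('Name','gap')
    sequenceUnfoldingLayer('Name','unfold')              % restore sequence structure
    flattenLayer('Name','flatten')                       % vector sequences for BiLSTM
    bilstmLayer(numHiddenUnits,'OutputMode','last','Name','bilstm')
    fullyConnectedLayer(numClasses,'Name','fc')
    softmaxLayer('Name','softmax')
    classificationLayer('Name','classification')];

lgraph = layerGraph(layers);
% The unfolding layer needs the mini-batch size recorded by the folding layer.
lgraph = connectLayers(lgraph,'fold/miniBatchSize','unfold/miniBatchSize');

% Open the interactive visualization for the sketch network.
analyzeNetwork(lgraph)
```

The same analyzeNetwork call applies to the network object loaded from net.mat.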
Download the video classification network.
Loop over the individual frames of situp.mp4 to view the test video in MATLAB.
videoFileName = 'situp.mp4';
video = readVideo(videoFileName);
numFrames = size(video,4);
figure
for i = 1:numFrames
    frame = video(:,:,:,i);
    imshow(frame/255);
    drawnow
end
Run net_classify and note the output label. If a host GPU is available, it is used automatically when running net_classify.
ans =
  categorical
     situp
To generate a CUDA executable that can be deployed to an NVIDIA target, create a new GPU code configuration object for generating an executable. Set the target deep learning library to 'cudnn'.
clear cfg
cfg = coder.gpuConfig('exe');
cfg.DeepLearningConfig = coder.DeepLearningConfig('cudnn');
Use the coder.hardware function to create a configuration object for the Jetson platform and assign it to the Hardware property of the GPU code configuration object.
cfg.Hardware = coder.hardware('NVIDIA Jetson');
Set the build directory on the target hardware. Change the example path below to the location on your target hardware where you would like the generated code to be placed.
cfg.Hardware.BuildDir = '/home/ubuntu/VideoClassify';
The custom main file main.cu is a wrapper that calls the net_classify function in the generated library.
cfg.CustomInclude = '.';
cfg.CustomSource = fullfile('main.cu');
Run the codegen command. The code is generated on the host and copied to the target board, where the executable is then built.
codegen -config cfg net_classify
Copy the test video file situp.mp4 from the host computer to the target device by using the putFile command. Ensure that this video file is placed in the location hardcoded in the entry-point function net_classify. In this example, this location happens to be the target hardware build directory.
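As a sketch of this step, assuming the hwobj connection object and the cfg configuration object created earlier are still in the workspace, the copy can be written as follows. The destination is the build directory set on cfg.Hardware, which matches the path hardcoded in net_classify.

```matlab
% Copy the test video from the current folder on the host to the build
% directory on the target (assumes hwobj and cfg already exist).
putFile(hwobj,'situp.mp4',cfg.Hardware.BuildDir);
```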
Use runApplication to launch the application on the target hardware. The label is displayed in the output terminal on the target.
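A minimal sketch of the launch call, assuming the hwobj connection object from earlier and that the executable takes its name from the net_classify entry-point function passed to codegen:

```matlab
% Launch the generated executable on the target (assumes hwobj exists and
% that the build produced an executable named net_classify).
runApplication(hwobj,'net_classify');
```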
### Launching the executable on the target...
Executable launched successfully with process ID 22968.
Displaying the simple runtime log for the executable...

Note: For the complete log, run the following command in the MATLAB command window:
system(hwobj,'cat /home/ubuntu/VideoClassify/MATLAB_ws/R2021a/home/nmalimba/Documents/MATLAB/Examples/deeplearning_shared-ex98434544/net_classify.log')