Video length: 5:14.

Deep Network Quantization and Deployment Using Deep Learning Toolbox Model Quantization Library

Description

See how to quantize, calibrate, and validate deep neural networks in MATLAB® using a white-box approach to make tradeoffs between performance and accuracy, then deploy the quantized DNN to an embedded GPU and an FPGA hardware board.

Using the Deep Learning Toolbox™ Model Quantization Library, you can quantize deep neural networks such as SqueezeNet. During calibration, the tool collects the required ranges for weights, biases, and activations, then provides a visualization of the histogram distributions of the calibrated dynamic ranges on a power-of-two scale. You can then deploy the quantized network using GPU Coder™ to an NVIDIA® Jetson® AGX Xavier, achieving a 2x speedup and a 4x reduction in memory usage with only about 3% top-1 accuracy loss compared with the single-precision implementation.

See how to use the tool to quantize and deploy networks to a Xilinx® ZCU102 board connected to a high-speed camera. The original deep neural network had a throughput of 45 frames per second. Using the Deep Learning Toolbox Model Quantization Library, you can quantize the network to INT8, boosting the throughput to 139 frames per second while maintaining correct prediction results.

Full Transcript

In this demonstration, we’ll show the workflow for quantizing deep learning networks and deploying them to GPUs and FPGAs from MATLAB.

Deploying deep learning networks to edge devices is challenging because deep learning networks can be quite compute intensive. For example, even a relatively simple network like AlexNet is over 200 MB, while larger ones like VGG-16 are north of 500 MB.

Quantization helps reduce the size of a network by converting the floating-point values used in the network to smaller bit widths while keeping the precision loss to a minimum.
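To make the idea concrete, here is a toy MATLAB sketch (an illustration, not the toolbox implementation) that quantizes a weight vector to INT8 using a power-of-two scale, the same scaling scheme the tool visualizes during calibration:

    % Toy power-of-two INT8 quantization (illustration only; the toolbox
    % chooses per-layer scales from calibration statistics).
    w = 0.1 * randn(1000, 1);                  % example single-precision weights
    scale = 2^ceil(log2(max(abs(w)) / 127));   % smallest power-of-two scale covering the range
    q = int8(round(w / scale));                % 8-bit representation
    wHat = double(q) * scale;                  % dequantized values
    fprintf('scale = 2^%d, max abs error = %.3g\n', log2(scale), max(abs(w - wHat)));

Storing q instead of w cuts memory by 4x (8 bits versus 32 bits per value), and a power-of-two scale means dequantization is just a bit shift in hardware.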

In R2020a, we released the ability to quantize deep learning algorithms using a white-box, easy-to-use iterative workflow. This approach helps you make tradeoffs between performance and accuracy.

To see this workflow in action, let’s take an example of detecting defects in nuts and bolts that you might find in manufacturing.

Let’s say this is part of inspecting a production line, so we need to use a high-speed camera processing at 120 frames per second.

Requirements from system engineering involve metrics like accuracy, network latency, and overall hardware cost, and they often drive tradeoff choices during the design and implementation of the network.

This application includes:

1) Preprocessing logic that resizes the input and selects a region of interest,

2) A pretrained network that detects whether the part is defective or not, and

3) Postprocessing that annotates the result on the screen.

Let’s get started with the quantization workflow by looking at deployment to embedded GPUs.

Quantizing and deploying to the GPU on an NVIDIA Jetson AGX Xavier achieves a 2x speedup and a 4x memory reduction, with only around 3% top-1 accuracy loss compared with the single-precision implementation.

This example uses SqueezeNet, which consumes 5 MB of disk memory.

To start, we first download the Deep Learning Toolbox Model Quantization Library support package from the Add-On Explorer and then launch the app.

Once we load the network to quantize for a GPU target, we calibrate it with a datastore that has already been set up. Calibration runs a set of images through the network to collect the required ranges for weights, biases, and activations.
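For reference, the same calibration step has a command-line equivalent. A minimal sketch, assuming a pretrained SqueezeNet and a calibration image folder whose name is made up here:

    net = squeezenet;                            % requires the SqueezeNet support package
    quantObj = dlquantizer(net, 'ExecutionEnvironment', 'GPU');
    calData = imageDatastore('calibration_images', ...
        'IncludeSubfolders', true, 'LabelSource', 'foldernames');
    calResults = calibrate(quantObj, calData);   % gathers ranges for weights, biases, activations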

The visualization shows histogram distributions of the calibrated dynamic ranges on a power-of-two scale. Gray regions of the histograms show data that cannot be represented by the quantized type, while blue regions show what can; darker colors indicate higher-frequency bins.

If this is acceptable, we quantize the network and load a datastore to validate the accuracy of the quantized network.
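At the command line, the validation step would look something like this sketch, again with an assumed folder name:

    valData = imageDatastore('validation_images', ...
        'IncludeSubfolders', true, 'LabelSource', 'foldernames');
    valResults = validate(quantObj, valData);    % quantizes the network, then reports accuracy and memory metrics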

Here is the result. Memory has been reduced by 74 percent with no loss in top-1 accuracy compared with the original floating-point network when measured on a desktop GPU.

Once we have validated the results and exported the dlquantizer object, we can use GPU Coder to deploy the quantized network onto the NVIDIA Jetson board.
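A sketch of that deployment step, assuming a hypothetical entry-point function defect_predict.m that runs the network on a 227x227x3 input, with the calibrated dlquantizer object saved to a MAT-file:

    save('quantObj.mat', 'quantObj');                     % calibration results for code generation
    cfg = coder.gpuConfig('exe');
    cfg.Hardware = coder.hardware('NVIDIA Jetson');       % GPU Coder Support Package for NVIDIA GPUs
    cfg.GenerateExampleMain = 'GenerateCodeAndCompile';   % emit a test main for the executable
    dlcfg = coder.DeepLearningConfig('cudnn');
    dlcfg.DataType = 'int8';                              % generate INT8 inference code
    dlcfg.CalibrationResultFile = 'quantObj.mat';
    cfg.DeepLearningConfig = dlcfg;
    codegen -config cfg defect_predict -args {ones(227, 227, 3, 'single')}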

We run inference on defective.png; we expect this image to be classified as a defective bolt.

Now let’s turn our attention to quantizing and deploying networks to a Xilinx ZCU102 board. Here, the network uses 34 MB of memory for learnable parameters and 200 MB of runtime memory.

With these five lines of MATLAB code, we can deploy the network to the single-precision bitstream running on the ZCU102 board. We see that it uses 84 MB of memory with a throughput of 45 frames per second. This is not fast enough for our high-speed camera.
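Those five lines follow the Deep Learning HDL Toolbox™ workflow pattern. A sketch, assuming the network is in a variable snet, the board is reachable over Ethernet, and inputImg holds a preprocessed frame:

    hTarget = dlhdl.Target('Xilinx', 'Interface', 'Ethernet');
    hW = dlhdl.Workflow('Network', snet, 'Bitstream', 'zcu102_single', 'Target', hTarget);
    hW.compile;                                   % map the network onto the deep learning processor
    hW.deploy;                                    % program the board and load the weights
    [prediction, speed] = hW.predict(inputImg, 'Profile', 'on');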

Let’s choose to quantize for FPGA.

Once the quantization workflow is completed, we’ll export the quantized network to the MATLAB workspace.

The quantized network needs to run on a deep learning processor configured for INT8, so we’ll use the INT8 version of our downloaded ZCU102 bitstream.
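Switching processors is essentially a one-argument change in the workflow object. A sketch, where qNet is the quantized network exported from the app:

    hW = dlhdl.Workflow('Network', qNet, 'Bitstream', 'zcu102_int8', 'Target', hTarget);
    hW.compile;
    hW.deploy;
    [prediction, speed] = hW.predict(inputImg, 'Profile', 'on');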

After compiling, the parameters have been reduced to 68 MB, and we can run the network at 139 frames per second. We are getting correct prediction results as well.

So as you can see, the Deep Learning Quantization app helps you reduce the size of a deep learning network for GPUs and FPGAs while minimizing the loss in accuracy. If you’re interested in learning more, take a look at the Deep Learning Toolbox Model Quantization Library in R2020a or the latest release, R2020b.

Related Products

  • Deep Learning Toolbox
  • Deep Learning HDL Toolbox
  • GPU Coder
