Detect Objects Using YOLO v3 Network Deployed to FPGA

This example uses:

This example shows how to deploy a trained you only look once (YOLO) v3 object detector to a target FPGA board. You then use MATLAB® to retrieve the object classification from the FPGA board.

Compared to YOLO v2 networks, YOLO v3 networks have additional detection heads that help detect smaller objects.

Create YOLO v3 Detector Object

In this example, you use a pretrained YOLO v3 object detector. To construct and train a custom YOLO v3 detector, see Object Detection Using YOLO v3 Deep Learning (Computer Vision Toolbox).

Use the downloadPretrainedYOLOv3Detector function to generate a dlnetwork object. To get the code for this function, see the downloadPretrainedYOLOv3Detector Function section.

preTrainedDetector = downloadPretrainedYOLOv3Detector;

Downloaded pretrained detector

The generated network uses training data to estimate the anchor boxes, which help the detector learn to predict the boxes. For more information about anchor boxes, see Anchor Boxes for Object Detection (Computer Vision Toolbox). The downloadPretrainedYOLOv3Detector function creates this YOLO v3 network:

Load the Pretrained network

Extract the network from the pretrained YOLO v3 detector object.

yolov3Detector = preTrainedDetector;
net = yolov3Detector.Network;

Extract the attributes of the network as variables.

anchorBoxes = yolov3Detector.AnchorBoxes;
outputNames = yolov3Detector.Network.OutputNames;
inputSize = yolov3Detector.InputSize;
classNames = yolov3Detector.ClassNames;

Use the analyzeNetwork function to obtain information about the network layers. the function returns a graphical representation of the network that contains detailed parameter information for every layer in the network.

analyzeNetwork(net);

Define FPGA Board Interface

Define the target FPGA board programming interface by using the dlhdl.Target object. Create a programming interface with custom name for your target device and an Ethernet interface to connect the target device to the host computer.

hTarget = dlhdl.Target('Xilinx','Interface','Ethernet');

Prepare Network for Deployment

Prepare the network for deployment by creating a dlhdl.Workflow object. Specify the network and bitstream name. Ensure that the bitstream name matches the data type and the FPGA board that you are targeting. In this example, the target FPGA board is the Xilinx® Zynq® UltraScale+™ MPSoC ZCU102 board and the bitstream uses the single data type.

hW = dlhdl.Workflow('Network',net,'Bitstream','zcu102_single','Target',hTarget);

Compile Network

Run the compile method of the dlhdl.Workflow object to compile the network and generate the instructions, weights, and biases for deployment.

dn = compile(hW);

### Compiling network for Deep Learning FPGA prototyping ...
### Targeting FPGA bitstream zcu102_single.
### An output layer called 'Output1_customOutputConv1' of type 'nnet.cnn.layer.RegressionOutputLayer' has been added to the provided network. This layer performs no operation during prediction and thus does not affect the output of the network.
### An output layer called 'Output2_customOutputConv2' of type 'nnet.cnn.layer.RegressionOutputLayer' has been added to the provided network. This layer performs no operation during prediction and thus does not affect the output of the network.
### Optimizing network: Fused 'nnet.cnn.layer.BatchNormalizationLayer' into 'nnet.cnn.layer.Convolution2DLayer'
### The network includes the following layers:
     1   'data'                        Image Input                    227×227×3 images                                                     (SW Layer)
     2   'conv1'                       2-D Convolution                64 3×3×3 convolutions with stride [2  2] and padding [0  0  0  0]    (HW Layer)
     3   'relu_conv1'                  ReLU                           ReLU                                                                 (HW Layer)
     4   'pool1'                       2-D Max Pooling                3×3 max pooling with stride [2  2] and padding [0  0  0  0]          (HW Layer)
     5   'fire2-squeeze1x1'            2-D Convolution                16 1×1×64 convolutions with stride [1  1] and padding [0  0  0  0]   (HW Layer)
     6   'fire2-relu_squeeze1x1'       ReLU                           ReLU                                                                 (HW Layer)
     7   'fire2-expand1x1'             2-D Convolution                64 1×1×16 convolutions with stride [1  1] and padding [0  0  0  0]   (HW Layer)
     8   'fire2-relu_expand1x1'        ReLU                           ReLU                                                                 (HW Layer)
     9   'fire2-expand3x3'             2-D Convolution                64 3×3×16 convolutions with stride [1  1] and padding [1  1  1  1]   (HW Layer)
    10   'fire2-relu_expand3x3'        ReLU                           ReLU                                                                 (HW Layer)
    11   'fire2-concat'                Depth concatenation            Depth concatenation of 2 inputs                                      (HW Layer)
    12   'fire3-squeeze1x1'            2-D Convolution                16 1×1×128 convolutions with stride [1  1] and padding [0  0  0  0]  (HW Layer)
    13   'fire3-relu_squeeze1x1'       ReLU                           ReLU                                                                 (HW Layer)
    14   'fire3-expand1x1'             2-D Convolution                64 1×1×16 convolutions with stride [1  1] and padding [0  0  0  0]   (HW Layer)
    15   'fire3-relu_expand1x1'        ReLU                           ReLU                                                                 (HW Layer)
    16   'fire3-expand3x3'             2-D Convolution                64 3×3×16 convolutions with stride [1  1] and padding [1  1  1  1]   (HW Layer)
    17   'fire3-relu_expand3x3'        ReLU                           ReLU                                                                 (HW Layer)
    18   'fire3-concat'                Depth concatenation            Depth concatenation of 2 inputs                                      (HW Layer)
    19   'pool3'                       2-D Max Pooling                3×3 max pooling with stride [2  2] and padding [0  1  0  1]          (HW Layer)
    20   'fire4-squeeze1x1'            2-D Convolution                32 1×1×128 convolutions with stride [1  1] and padding [0  0  0  0]  (HW Layer)
    21   'fire4-relu_squeeze1x1'       ReLU                           ReLU                                                                 (HW Layer)
    22   'fire4-expand1x1'             2-D Convolution                128 1×1×32 convolutions with stride [1  1] and padding [0  0  0  0]  (HW Layer)
    23   'fire4-relu_expand1x1'        ReLU                           ReLU                                                                 (HW Layer)
    24   'fire4-expand3x3'             2-D Convolution                128 3×3×32 convolutions with stride [1  1] and padding [1  1  1  1]  (HW Layer)
    25   'fire4-relu_expand3x3'        ReLU                           ReLU                                                                 (HW Layer)
    26   'fire4-concat'                Depth concatenation            Depth concatenation of 2 inputs                                      (HW Layer)
    27   'fire5-squeeze1x1'            2-D Convolution                32 1×1×256 convolutions with stride [1  1] and padding [0  0  0  0]  (HW Layer)
    28   'fire5-relu_squeeze1x1'       ReLU                           ReLU                                                                 (HW Layer)
    29   'fire5-expand1x1'             2-D Convolution                128 1×1×32 convolutions with stride [1  1] and padding [0  0  0  0]  (HW Layer)
    30   'fire5-relu_expand1x1'        ReLU                           ReLU                                                                 (HW Layer)
    31   'fire5-expand3x3'             2-D Convolution                128 3×3×32 convolutions with stride [1  1] and padding [1  1  1  1]  (HW Layer)
    32   'fire5-relu_expand3x3'        ReLU                           ReLU                                                                 (HW Layer)
    33   'fire5-concat'                Depth concatenation            Depth concatenation of 2 inputs                                      (HW Layer)
    34   'pool5'                       2-D Max Pooling                3×3 max pooling with stride [2  2] and padding [0  1  0  1]          (HW Layer)
    35   'fire6-squeeze1x1'            2-D Convolution                48 1×1×256 convolutions with stride [1  1] and padding [0  0  0  0]  (HW Layer)
    36   'fire6-relu_squeeze1x1'       ReLU                           ReLU                                                                 (HW Layer)
    37   'fire6-expand1x1'             2-D Convolution                192 1×1×48 convolutions with stride [1  1] and padding [0  0  0  0]  (HW Layer)
    38   'fire6-relu_expand1x1'        ReLU                           ReLU                                                                 (HW Layer)
    39   'fire6-expand3x3'             2-D Convolution                192 3×3×48 convolutions with stride [1  1] and padding [1  1  1  1]  (HW Layer)
    40   'fire6-relu_expand3x3'        ReLU                           ReLU                                                                 (HW Layer)
    41   'fire6-concat'                Depth concatenation            Depth concatenation of 2 inputs                                      (HW Layer)
    42   'fire7-squeeze1x1'            2-D Convolution                48 1×1×384 convolutions with stride [1  1] and padding [0  0  0  0]  (HW Layer)
    43   'fire7-relu_squeeze1x1'       ReLU                           ReLU                                                                 (HW Layer)
    44   'fire7-expand1x1'             2-D Convolution                192 1×1×48 convolutions with stride [1  1] and padding [0  0  0  0]  (HW Layer)
    45   'fire7-relu_expand1x1'        ReLU                           ReLU                                                                 (HW Layer)
    46   'fire7-expand3x3'             2-D Convolution                192 3×3×48 convolutions with stride [1  1] and padding [1  1  1  1]  (HW Layer)
    47   'fire7-relu_expand3x3'        ReLU                           ReLU                                                                 (HW Layer)
    48   'fire7-concat'                Depth concatenation            Depth concatenation of 2 inputs                                      (HW Layer)
    49   'fire8-squeeze1x1'            2-D Convolution                64 1×1×384 convolutions with stride [1  1] and padding [0  0  0  0]  (HW Layer)
    50   'fire8-relu_squeeze1x1'       ReLU                           ReLU                                                                 (HW Layer)
    51   'fire8-expand1x1'             2-D Convolution                256 1×1×64 convolutions with stride [1  1] and padding [0  0  0  0]  (HW Layer)
    52   'fire8-relu_expand1x1'        ReLU                           ReLU                                                                 (HW Layer)
    53   'fire8-expand3x3'             2-D Convolution                256 3×3×64 convolutions with stride [1  1] and padding [1  1  1  1]  (HW Layer)
    54   'fire8-relu_expand3x3'        ReLU                           ReLU                                                                 (HW Layer)
    55   'fire8-concat'                Depth concatenation            Depth concatenation of 2 inputs                                      (HW Layer)
    56   'fire9-squeeze1x1'            2-D Convolution                64 1×1×512 convolutions with stride [1  1] and padding [0  0  0  0]  (HW Layer)
    57   'fire9-relu_squeeze1x1'       ReLU                           ReLU                                                                 (HW Layer)
    58   'fire9-expand1x1'             2-D Convolution                256 1×1×64 convolutions with stride [1  1] and padding [0  0  0  0]  (HW Layer)
    59   'fire9-relu_expand1x1'        ReLU                           ReLU                                                                 (HW Layer)
    60   'fire9-expand3x3'             2-D Convolution                256 3×3×64 convolutions with stride [1  1] and padding [1  1  1  1]  (HW Layer)
    61   'fire9-relu_expand3x3'        ReLU                           ReLU                                                                 (HW Layer)
    62   'fire9-concat'                Depth concatenation            Depth concatenation of 2 inputs                                      (HW Layer)
    63   'customConv1'                 2-D Convolution                1024 3×3×512 convolutions with stride [1  1] and padding 'same'      (HW Layer)
    64   'customRelu1'                 ReLU                           ReLU                                                                 (HW Layer)
    65   'customOutputConv1'           2-D Convolution                18 1×1×1024 convolutions with stride [1  1] and padding 'same'       (HW Layer)
    66   'featureConv2'                2-D Convolution                128 1×1×512 convolutions with stride [1  1] and padding 'same'       (HW Layer)
    67   'featureRelu2'                ReLU                           ReLU                                                                 (HW Layer)
    68   'Output1_customOutputConv1'   Regression Output              mean-squared-error                                                   (SW Layer)
    69   'featureResize2'              dnnfpga.custom.Resize2DLayer   dnnfpga.custom.Resize2DLayer                                         (HW Layer)
    70   'depthConcat2'                Depth concatenation            Depth concatenation of 2 inputs                                      (HW Layer)
    71   'customConv2'                 2-D Convolution                256 3×3×384 convolutions with stride [1  1] and padding 'same'       (HW Layer)
    72   'customRelu2'                 ReLU                           ReLU                                                                 (HW Layer)
    73   'customOutputConv2'           2-D Convolution                18 1×1×256 convolutions with stride [1  1] and padding 'same'        (HW Layer)
    74   'Output2_customOutputConv2'   Regression Output              mean-squared-error                                                   (SW Layer)
                                                                                                                                         
### Notice: The layer 'data' with type 'nnet.cnn.layer.ImageInputLayer' is implemented in software.
### Notice: The layer 'Output1_customOutputConv1' with type 'nnet.cnn.layer.RegressionOutputLayer' is implemented in software.
### Notice: The layer 'Output2_customOutputConv2' with type 'nnet.cnn.layer.RegressionOutputLayer' is implemented in software.
### Compiling layer group: conv1>>fire2-relu_squeeze1x1 ...
### Compiling layer group: conv1>>fire2-relu_squeeze1x1 ... complete.
### Compiling layer group: fire2-expand1x1>>fire2-relu_expand1x1 ...
### Compiling layer group: fire2-expand1x1>>fire2-relu_expand1x1 ... complete.
### Compiling layer group: fire2-expand3x3>>fire2-relu_expand3x3 ...
### Compiling layer group: fire2-expand3x3>>fire2-relu_expand3x3 ... complete.
### Compiling layer group: fire3-squeeze1x1>>fire3-relu_squeeze1x1 ...
### Compiling layer group: fire3-squeeze1x1>>fire3-relu_squeeze1x1 ... complete.
### Compiling layer group: fire3-expand1x1>>fire3-relu_expand1x1 ...
### Compiling layer group: fire3-expand1x1>>fire3-relu_expand1x1 ... complete.
### Compiling layer group: fire3-expand3x3>>fire3-relu_expand3x3 ...
### Compiling layer group: fire3-expand3x3>>fire3-relu_expand3x3 ... complete.
### Compiling layer group: pool3>>fire4-relu_squeeze1x1 ...
### Compiling layer group: pool3>>fire4-relu_squeeze1x1 ... complete.
### Compiling layer group: fire4-expand1x1>>fire4-relu_expand1x1 ...
### Compiling layer group: fire4-expand1x1>>fire4-relu_expand1x1 ... complete.
### Compiling layer group: fire4-expand3x3>>fire4-relu_expand3x3 ...
### Compiling layer group: fire4-expand3x3>>fire4-relu_expand3x3 ... complete.
### Compiling layer group: fire5-squeeze1x1>>fire5-relu_squeeze1x1 ...
### Compiling layer group: fire5-squeeze1x1>>fire5-relu_squeeze1x1 ... complete.
### Compiling layer group: fire5-expand1x1>>fire5-relu_expand1x1 ...
### Compiling layer group: fire5-expand1x1>>fire5-relu_expand1x1 ... complete.
### Compiling layer group: fire5-expand3x3>>fire5-relu_expand3x3 ...
### Compiling layer group: fire5-expand3x3>>fire5-relu_expand3x3 ... complete.
### Compiling layer group: pool5>>fire6-relu_squeeze1x1 ...
### Compiling layer group: pool5>>fire6-relu_squeeze1x1 ... complete.
### Compiling layer group: fire6-expand1x1>>fire6-relu_expand1x1 ...
### Compiling layer group: fire6-expand1x1>>fire6-relu_expand1x1 ... complete.
### Compiling layer group: fire6-expand3x3>>fire6-relu_expand3x3 ...
### Compiling layer group: fire6-expand3x3>>fire6-relu_expand3x3 ... complete.
### Compiling layer group: fire7-squeeze1x1>>fire7-relu_squeeze1x1 ...
### Compiling layer group: fire7-squeeze1x1>>fire7-relu_squeeze1x1 ... complete.
### Compiling layer group: fire7-expand1x1>>fire7-relu_expand1x1 ...
### Compiling layer group: fire7-expand1x1>>fire7-relu_expand1x1 ... complete.
### Compiling layer group: fire7-expand3x3>>fire7-relu_expand3x3 ...
### Compiling layer group: fire7-expand3x3>>fire7-relu_expand3x3 ... complete.
### Compiling layer group: fire8-squeeze1x1>>fire8-relu_squeeze1x1 ...
### Compiling layer group: fire8-squeeze1x1>>fire8-relu_squeeze1x1 ... complete.
### Compiling layer group: fire8-expand1x1>>fire8-relu_expand1x1 ...
### Compiling layer group: fire8-expand1x1>>fire8-relu_expand1x1 ... complete.
### Compiling layer group: fire8-expand3x3>>fire8-relu_expand3x3 ...
### Compiling layer group: fire8-expand3x3>>fire8-relu_expand3x3 ... complete.
### Compiling layer group: fire9-squeeze1x1>>fire9-relu_squeeze1x1 ...
### Compiling layer group: fire9-squeeze1x1>>fire9-relu_squeeze1x1 ... complete.
### Compiling layer group: fire9-expand1x1>>fire9-relu_expand1x1 ...
### Compiling layer group: fire9-expand1x1>>fire9-relu_expand1x1 ... complete.
### Compiling layer group: fire9-expand3x3>>fire9-relu_expand3x3 ...
### Compiling layer group: fire9-expand3x3>>fire9-relu_expand3x3 ... complete.
### Compiling layer group: customConv1>>customOutputConv1 ...
### Compiling layer group: customConv1>>customOutputConv1 ... complete.
### Compiling layer group: featureConv2>>featureRelu2 ...
### Compiling layer group: featureConv2>>featureRelu2 ... complete.
### Compiling layer group: customConv2>>customOutputConv2 ...
### Compiling layer group: customConv2>>customOutputConv2 ... complete.

### Allocating external memory buffers:

          offset_name          offset_address     allocated_space 
    _______________________    ______________    _________________

    "InputDataOffset"           "0x00000000"     "24.0 MB"        
    "OutputResultOffset"        "0x01800000"     "4.0 MB"         
    "SchedulerDataOffset"       "0x01c00000"     "4.0 MB"         
    "SystemBufferOffset"        "0x02000000"     "28.0 MB"        
    "InstructionDataOffset"     "0x03c00000"     "8.0 MB"         
    "ConvWeightDataOffset"      "0x04400000"     "104.0 MB"       
    "EndOffset"                 "0x0ac00000"     "Total: 172.0 MB"

### Network compilation complete.

Program Bitstream onto FPGA and Download Network Weights

To deploy the network on the Xilinx® Zynq® UltraScale+ MPSoC ZCU102 hardware, run the deploy method of the dlhdl.Workflow object. This method programs the FPGA board using the output of the compile method and the programming file, downloads the network weights and biases, displays progress messages, and the time it takes to deploy the network.

deploy(hW);

### Programming FPGA Bitstream using Ethernet...
### Attempting to connect to the hardware board at 192.168.1.101...
### Connection successful
### Programming FPGA device on Xilinx SoC hardware board at 192.168.1.101...
### Copying FPGA programming files to SD card...
### Setting FPGA bitstream and devicetree for boot...
# Copying Bitstream zcu102_single.bit to /mnt/hdlcoder_rd
# Set Bitstream to hdlcoder_rd/zcu102_single.bit
# Copying Devicetree devicetree_dlhdl.dtb to /mnt/hdlcoder_rd
# Set Devicetree to hdlcoder_rd/devicetree_dlhdl.dtb
# Set up boot for Reference Design: 'AXI-Stream DDR Memory Access : 3-AXIM'
### Rebooting Xilinx SoC at 192.168.1.101...
### Reboot may take several seconds...
### Attempting to connect to the hardware board at 192.168.1.101...
### Connection successful
### Programming the FPGA bitstream has been completed successfully.
### Loading weights to Conv Processor.
### Conv Weights loaded. Current time is 27-Oct-2022 13:44:50

Test Network

Load the example image and convert the image into a dlarray. Then classify the image on the FPGA by using the predict method of the dlhdl.Workflow object and display the results.

img = imread('vehicle_image.jpg'); 
I = single(rescale(img)); 
I = imresize(I, yolov3Detector.InputSize(1:2)); 
dlX = dlarray(I,'SSC');

Store the output of each detection head of the network in the features variable. Pass features to the post-processing function processYOLOv3Output to combine the multiple outputs and compute the final results. To get the code for this function, see the processYOLOv3Output Function section.

features = cell(size(net.OutputNames'));
[features{:}] = hW.predict(dlX, 'Profiler', 'on');

### Finished writing input activations.
### Running single input activation.


              Deep Learning Processor Profiler Performance Results

                   LastFrameLatency(cycles)   LastFrameLatency(seconds)       FramesNum      Total Latency     Frames/s
                         -------------             -------------              ---------        ---------       ---------
Network                   34469645                  0.15668                       1           34473970              6.4
    conv1                   673148                  0.00306 
    pool1                   509022                  0.00231 
    fire2-squeeze1x1        308280                  0.00140 
    fire2-expand1x1         305546                  0.00139 
    fire2-expand3x3         305227                  0.00139 
    fire3-squeeze1x1        628018                  0.00285 
    fire3-expand1x1         305219                  0.00139 
    fire3-expand3x3         305220                  0.00139 
    pool3                   286781                  0.00130 
    fire4-squeeze1x1        264346                  0.00120 
    fire4-expand1x1         264777                  0.00120 
    fire4-expand3x3         264750                  0.00120 
    fire5-squeeze1x1        749166                  0.00341 
    fire5-expand1x1         264800                  0.00120 
    fire5-expand3x3         264880                  0.00120 
    pool5                   219686                  0.00100 
    fire6-squeeze1x1        195193                  0.00089 
    fire6-expand1x1         145091                  0.00066 
    fire6-expand3x3         145075                  0.00066 
    fire7-squeeze1x1        290001                  0.00132 
    fire7-expand1x1         144830                  0.00066 
    fire7-expand3x3         145390                  0.00066 
    fire8-squeeze1x1        369605                  0.00168 
    fire8-expand1x1         245085                  0.00111 
    fire8-expand3x3         245208                  0.00111 
    fire9-squeeze1x1        490784                  0.00223 
    fire9-expand1x1         244864                  0.00111 
    fire9-expand3x3         245458                  0.00112 
    customConv1           17592876                  0.07997 
    customOutputConv1       952889                  0.00433 
    featureConv2            913457                  0.00415 
    featureResize2           57819                  0.00026 
    customConv2            5600648                  0.02546 
    customOutputConv2       526143                  0.00239 
 * The clock frequency of the DL processor is: 220MHz

[bboxes, scores, labels] = processYOLOv3Output(anchorBoxes, inputSize, classNames, features, I);
resultImage = insertObjectAnnotation(I,'rectangle',bboxes,scores);
imshow(resultImage)

The FPGA returns a score prediction of 0.89605 with a bounding box drawn around the object in the image. The FPGA also returns a prediction of vehicle to the labels variable.

`downloadPretrainedYOLOv3Detector` Function

The downloadPretrainedYOLOv3Detector function to download the pretrained YOLO v3 detector network:

function detector = downloadPretrainedYOLOv3Detector 
if ~exist('yolov3SqueezeNetVehicleExample_21aSPKG.mat', 'file')
    if ~exist('yolov3SqueezeNetVehicleExample_21aSPKG.zip', 'file')
        zipFile = matlab.internal.examples.downloadSupportFile('vision/data', 'yolov3SqueezeNetVehicleExample_21aSPKG.zip');
        copyfile(zipFile);
    end
    unzip('yolov3SqueezeNetVehicleExample_21aSPKG.zip');
end
pretrained = load("yolov3SqueezeNetVehicleExample_21aSPKG.mat");
detector = pretrained.detector;
disp('Downloaded pretrained detector');
end

`processYOLOv3Output` Function

The processYOLOv3Output function is attached as a helper file in this example's directory. This function converts the feature maps from multiple detection heads to bounding boxes, scores and labels. A code snippet of the function is shown below.

function [bboxes, scores, labels] = processYOLOv3Output(anchorBoxes, inputSize, classNames, features, img)
% This function converts the feature maps from multiple detection heads to bounding boxes, scores and labels
% processYOLOv3Output is C code generatable

% Breaks down the raw output from predict function into Confidence score, X, Y, Width,
% Height and Class probabilities for each output from detection head
predictions = iYolov3Transform(features, anchorBoxes);

% Initialize parameters for post-processing
inputSize2d = inputSize(1:2);
info.PreprocessedImageSize = inputSize2d(1:2);
info.ScaleX = size(img,1)/inputSize2d(1);
info.ScaleY = size(img,2)/inputSize2d(1);
params.MinSize = [1 1];
params.MaxSize = size(img(:,:,1));
params.Threshold = 0.5;
params.FractionDownsampling = 1;
params.DetectionInputWasBatchOfImages = false;
params.NetworkInputSize = inputSize;
params.DetectionPreprocessing = "none";
params.SelectStrongest = 1;
bboxes = [];                                                                                                                               
scores = [];                                                                                                                                
labels = [];                                                                                                                             

% Post-process the predictions to get bounding boxes, scores and labels
[bboxes, scores, labels] = iPostprocessMultipleDetection(anchorBoxes, inputSize, classNames, predictions, info, params);
end

function [bboxes, scores, labels] = iPostprocessMultipleDetection (anchorBoxes, inputSize, classNames, YPredData, info, params)
% Post-process the predictions to get bounding boxes, scores and labels

% YpredData is a (x,8) cell array, where x = number of detection heads
% Information in each column is:
% column 1 -> confidence scores
% column 2 to column 5 -> X offset, Y offset, Width, Height of anchor boxes
% column 6 -> class probabilities
% column 7-8 -> copy of width and height of anchor boxes

% Initialize parameters for post-processing
classes = classNames;
predictions = YPredData;
extractPredictions = cell(size(predictions));
% Extract dlarray data
for i = 1:size(extractPredictions,1)
    for j = 1:size(extractPredictions,2)
        extractPredictions{i,j} = extractdata(predictions{i,j});
    end
end

% Storing the values of columns 2 to 5 of extractPredictions
% Columns 2 to 5 represent information about X-coordinate, Y-coordinate, Width and Height of predicted anchor boxes
extractedCoordinates = cell(size(predictions,1),4);
for i = 1:size(predictions,1)
    for j = 2:5 
        extractedCoordinates{i,j-1} = extractPredictions{i,j};
    end
end

% Convert predictions from grid cell coordinates to box coordinates.
boxCoordinates = anchorBoxGenerator(anchorBoxes, inputSize, classNames, extractedCoordinates, params.NetworkInputSize);
% Replace grid cell coordinates in extractPredictions with box coordinates
for i = 1:size(YPredData,1)
    for j = 2:5 
        extractPredictions{i,j} = single(boxCoordinates{i,j-1});
    end
end

% 1. Convert bboxes from spatial to pixel dimension
% 2. Combine the prediction from different heads.
% 3. Filter detections based on threshold.

% Reshaping the matrices corresponding to confidence scores and  bounding boxes
detections = cell(size(YPredData,1),6);
for i = 1:size(detections,1)
    for j = 1:5
        detections{i,j} = reshapePredictions(extractPredictions{i,j});
    end
end
% Reshaping the matrices corresponding to class probablities
numClasses = repmat({numel(classes)},[size(detections,1),1]);
for i = 1:size(detections,1)
    detections{i,6} = reshapeClasses(extractPredictions{i,6},numClasses{i,1}); 
end

% cell2mat converts the cell of matrices into one matrix, this combines the
% predictions of all detection heads
detections = cell2mat(detections);

% Getting the most probable class and corresponding index
[classProbs, classIdx] = max(detections(:,6:end),[],2);
detections(:,1) = detections(:,1).*classProbs;
detections(:,6) = classIdx;

% Keep detections whose confidence score is greater than threshold.
detections = detections(detections(:,1) >= params.Threshold,:);

[bboxes, scores, labels] = iPostProcessDetections(detections, classes, info, params);
end

function [bboxes, scores, labels] = iPostProcessDetections(detections, classes, info, params)
% Resizes the anchor boxes, filters anchor boxes based on size and apply
% NMS to eliminate overlapping anchor boxes
if ~isempty(detections)

    % Obtain bounding boxes and class data for pre-processed image
    scorePred = detections(:,1);
    bboxesTmp = detections(:,2:5);
    classPred = detections(:,6);
    inputImageSize = ones(1,2);
    inputImageSize(2) = info.ScaleX.*info.PreprocessedImageSize(2);
    inputImageSize(1) = info.ScaleY.*info.PreprocessedImageSize(1);
    % Resize boxes to actual image size.
    scale = [inputImageSize(2) inputImageSize(1) inputImageSize(2) inputImageSize(1)];
    bboxPred = bboxesTmp.*scale;
    % Convert x and y position of detections from centre to top-left.
    bboxPred = iConvertCenterToTopLeft(bboxPred);

    % Filter boxes based on MinSize, MaxSize.
    [bboxPred, scorePred, classPred] = filterBBoxes(params.MinSize, params.MaxSize, bboxPred, scorePred, classPred);

    % Apply NMS to eliminate boxes having significant overlap
    if params.SelectStrongest
        [bboxes, scores, classNames] = selectStrongestBboxMulticlass(bboxPred, scorePred, classPred ,...
            'RatioType', 'Union', 'OverlapThreshold', 0.4);
    else
        bboxes = bboxPred;
        scores = scorePred;
        classNames = classPred;
    end

    % Limit width detections
    detectionsWd = min((bboxes(:,1) + bboxes(:,3)),inputImageSize(1,2));
    bboxes(:,3) = detectionsWd(:,1) - bboxes(:,1);

    % Limit height detections
    detectionsHt = min((bboxes(:,2) + bboxes(:,4)),inputImageSize(1,1));
    bboxes(:,4) = detectionsHt(:,1) - bboxes(:,2);
    bboxes(bboxes<1) = 1;

    % Convert classId to classNames.
    labels = categorical(classes,cellstr(classes));
    labels = labels(classNames);

else
    % If detections are empty then bounding boxes, scores and labels should
    % be empty
    bboxes = zeros(0,4,'single');
    scores = zeros(0,1,'single');
    labels = categorical(classes);
end
end

function x = reshapePredictions(pred)
% Reshapes the matrices corresponding to scores, X, Y, Width and Height to
% make them compatible for combining the outputs of different detection
% heads
[h,w,c,n] = size(pred);
x = reshape(pred,h*w*c,1,n);
end

function x = reshapeClasses(pred,numClasses)
% Reshapes the matrices corresponding to the class probabilities, to make it
% compatible for combining the outputs of different detection heads
[h,w,c,n] = size(pred);
numAnchors = c/numClasses;
x = reshape(pred,h*w,numClasses,numAnchors,n);
x = permute(x,[1,3,2,4]);
[h,w,c,n] = size(x);
x = reshape(x,h*w,c,n);
end

function bboxes = iConvertCenterToTopLeft(bboxes)
% Convert x and y position of detections from centre to top-left.
bboxes(:,1) = bboxes(:,1) - bboxes(:,3)/2 + 0.5;
bboxes(:,2) = bboxes(:,2) - bboxes(:,4)/2 + 0.5;
bboxes = floor(bboxes);
bboxes(bboxes<1) = 1;
end

function tiledAnchors = anchorBoxGenerator(anchorBoxes, inputSize, classNames,YPredCell,inputImageSize)
% Convert grid cell coordinates to box coordinates.
% Generate tiled anchor offset.
tiledAnchors = cell(size(YPredCell));
for i = 1:size(YPredCell,1)
    anchors = anchorBoxes{i,:};
    [h,w,~,n] = size(YPredCell{i,1});
    [tiledAnchors{i,2},tiledAnchors{i,1}] = ndgrid(0:h-1,0:w-1,1:size(anchors,1),1:n);
    [~,~,tiledAnchors{i,3}] = ndgrid(0:h-1,0:w-1,anchors(:,2),1:n);
    [~,~,tiledAnchors{i,4}] = ndgrid(0:h-1,0:w-1,anchors(:,1),1:n);
end

for i = 1:size(YPredCell,1)
    [h,w,~,~] = size(YPredCell{i,1});
    tiledAnchors{i,1} = double((tiledAnchors{i,1} + YPredCell{i,1})./w);
    tiledAnchors{i,2} = double((tiledAnchors{i,2} + YPredCell{i,2})./h);
    tiledAnchors{i,3} = double((tiledAnchors{i,3}.*YPredCell{i,3})./inputImageSize(2));
    tiledAnchors{i,4} = double((tiledAnchors{i,4}.*YPredCell{i,4})./inputImageSize(1));
end
end

function predictions = iYolov3Transform(YPredictions, anchorBoxes)
% This function breaks down the raw output from predict function into Confidence score, X, Y, Width,
% Height and Class probabilities for each output from detection head

predictions = cell(size(YPredictions,1),size(YPredictions,2) + 2);

for idx = 1:size(YPredictions,1)
    % Get the required info on feature size.
    numChannelsPred = size(YPredictions{idx},3);  %number of channels in a feature map
    numAnchors = size(anchorBoxes{idx},1);    %number of anchor boxes per grid
    numPredElemsPerAnchors = numChannelsPred/numAnchors;
    channelsPredIdx = 1:numChannelsPred;
    predictionIdx = ones([1,numAnchors.*5]);

    % X positions.
    startIdx = 1;
    endIdx = numChannelsPred;
    stride = numPredElemsPerAnchors;
    predictions{idx,2} = YPredictions{idx}(:,:,startIdx:stride:endIdx,:);
    predictionIdx = [predictionIdx startIdx:stride:endIdx];

    % Y positions.
    startIdx = 2;
    endIdx = numChannelsPred;
    stride = numPredElemsPerAnchors;
    predictions{idx,3} = YPredictions{idx}(:,:,startIdx:stride:endIdx,:);
    predictionIdx = [predictionIdx startIdx:stride:endIdx];

    % Width.
    startIdx = 3;
    endIdx = numChannelsPred;
    stride = numPredElemsPerAnchors;
    predictions{idx,4} = YPredictions{idx}(:,:,startIdx:stride:endIdx,:);
    predictionIdx = [predictionIdx startIdx:stride:endIdx];

    % Height.
    startIdx = 4;
    endIdx = numChannelsPred;
    stride = numPredElemsPerAnchors;
    predictions{idx,5} = YPredictions{idx}(:,:,startIdx:stride:endIdx,:);
    predictionIdx = [predictionIdx startIdx:stride:endIdx];

    % Confidence scores.
    startIdx = 5;
    endIdx = numChannelsPred;
    stride = numPredElemsPerAnchors;
    predictions{idx,1} = YPredictions{idx}(:,:,startIdx:stride:endIdx,:);
    predictionIdx = [predictionIdx startIdx:stride:endIdx];

    % Class probabilities.
    classIdx = setdiff(channelsPredIdx,predictionIdx);
    predictions{idx,6} = YPredictions{idx}(:,:,classIdx,:);
end

for i = 1:size(predictions,1)
    predictions{i,7} = predictions{i,4};
    predictions{i,8} = predictions{i,5};
end

% Apply activation to the predicted cell array
% Apply sigmoid activation to columns 1-3 (Confidence score, X, Y)
for i = 1:size(predictions,1)
    for j = 1:3
        predictions{i,j} = sigmoid(predictions{i,j});
    end
end
% Apply exponentiation to columns 4-5 (Width, Height)
for i = 1:size(predictions,1)
    for j = 4:5
        predictions{i,j} = exp(predictions{i,j});
    end
end
% Apply sigmoid activation to column 6 (Class probabilities)
for i = 1:size(predictions,1)
    for j = 6
        predictions{i,j} = sigmoid(predictions{i,j});
    end
end
end

function [bboxPred, scorePred, classPred] = filterBBoxes(minSize, maxSize, bboxPred, scorePred, classPred)
% Filter boxes based on MinSize, MaxSize
[bboxPred, scorePred, classPred] = filterSmallBBoxes(minSize, bboxPred, scorePred, classPred);
[bboxPred, scorePred, classPred] = filterLargeBBoxes(maxSize, bboxPred, scorePred, classPred);
end

function varargout = filterSmallBBoxes(minSize, varargin)
% Filter boxes based on MinSize
bboxes = varargin{1};
tooSmall = any((bboxes(:,[4 3]) < minSize),2);
for ii = 1:numel(varargin)
    varargout{ii} = varargin{ii}(~tooSmall,:);
end
end

function varargout = filterLargeBBoxes(maxSize, varargin)
% Filter boxes based on MaxSize
bboxes = varargin{1};
tooBig = any((bboxes(:,[4 3]) > maxSize),2);
for ii = 1:numel(varargin)
    varargout{ii} = varargin{ii}(~tooBig,:);
end
end

function m = cell2mat(c)
% Converts the cell of matrices into one matrix by concatenating
% the output corresponding to each feature map

elements = numel(c);
% If number of elements is 0 return an empty array
if elements == 0
    m = [];
    return
end
% If number of elements is 1, return same element as matrix
if elements == 1
    if isnumeric(c{1}) || ischar(c{1}) || islogical(c{1}) || isstruct(c{1})
        m = c{1};
        return
    end
end
% Error out for unsupported cell content
ciscell = iscell(c{1});
cisobj = isobject(c{1});
if cisobj || ciscell
    disp('CELL2MAT does not support cell arrays containing cell arrays or objects.');
end
% If input is struct, extract field names of structure into a cell
if isstruct(c{1})
    cfields = cell(elements,1);
    for n = 1:elements
        cfields{n} = fieldnames(c{n});
    end
    if ~isequal(cfields{:})
        disp('The field names of each cell array element must be consistent and in consistent order.');
    end
end
% If number of dimensions is 2 
if ndims(c) == 2
    rows = size(c,1);
    cols = size(c,2);
    if (rows < cols)
        % If rows is less than columns first concatenate each column into 1
        % row then concatenate all the rows
        m = cell(rows,1);
        for n = 1:rows
            m{n} = cat(2,c{n,:});
        end
        m = cat(1,m{:});
    else
        % If columns is less than rows, first concatenate each corresponding
        % row into columns, then combine all columns into 1
        m = cell(1,cols);
        for n = 1:cols
            m{n} = cat(1,c{:,n});
        end
        m = cat(2,m{:});
    end
    return
end
end

References

[1] Redmon, Joseph, and Ali Farhadi. “YOLOv3: An Incremental Improvement.” Preprint, submitted April 8, 2018. https://arxiv.org/abs/1804.02767.