yolov2OutputLayer

(To be removed) Create output layer for YOLO v2 object detection network

yolov2OutputLayer will be removed in a future release. Create a YOLO v2 object detection network using the yolov2ObjectDetector object instead. For more information, see Version History.

Description

The yolov2OutputLayer function creates a YOLOv2OutputLayer object, which represents the output layer for a you only look once version 2 (YOLO v2) object detection network. The output layer provides the refined bounding box locations of the target objects.

Creation

Syntax

layer = yolov2OutputLayer(anchorBoxes)

layer = yolov2OutputLayer(anchorBoxes,Name,Value)

Description

layer = yolov2OutputLayer(anchorBoxes) creates a YOLOv2OutputLayer object, layer, which represents the output layer for a YOLO v2 object detection network. The layer outputs the refined bounding box locations, which are predicted using the predefined set of anchor boxes specified by anchorBoxes.

layer = yolov2OutputLayer(anchorBoxes,Name,Value) sets additional properties using name-value pairs, in addition to the input from the preceding syntax. Enclose each property name in single quotes. For example, yolov2OutputLayer(anchorBoxes,'Name','yolo_Out') creates an output layer with the name 'yolo_Out'.

Input Arguments

`anchorBoxes` — Set of anchor boxes
M-by-2 matrix

Set of anchor boxes, specified as an M-by-2 matrix, where each row is of the form [height width]. The matrix defines the height and the width of M anchor boxes. This input sets the AnchorBoxes property of the output layer. You can use a clustering approach to estimate anchor boxes from the training data. For more information, see Estimate Anchor Boxes From Training Data.
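The clustering approach mentioned above can be sketched with the estimateAnchorBoxes function. This is a minimal sketch under stated assumptions: the ground-truth boxes, the class name vehicle, and the choice of two anchor boxes are hypothetical values made up for illustration, and the code assumes Computer Vision Toolbox is installed.

```matlab
% Hypothetical ground-truth boxes for three images, one cell per image.
% Each row is a box in [x y width height] format.
vehicle = { [10 20 30 40; 60 80 32 44]; ...
            [15 25 100 60]; ...
            [40 40 96 64; 5 5 28 42] };

% Wrap the boxes in a one-column table; the column name acts as the class name.
trainingTable = table(vehicle);
blds = boxLabelDatastore(trainingTable);

% Estimate anchor boxes by clustering the box sizes in the datastore.
% Each returned row has the form [height width].
numAnchors = 2;   % illustrative choice, not a recommendation
[anchorBoxes,meanIoU] = estimateAnchorBoxes(blds,numAnchors);

% Pass the estimated anchor boxes to the output layer.
layer = yolov2OutputLayer(anchorBoxes);
```

The second output, meanIoU, indicates how well the estimated anchor boxes overlap the boxes in the data, which can help when comparing different values of numAnchors.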

Properties

`Name` — Layer name
`""` (default) | character vector | string scalar

Layer name, specified as a character vector or string scalar. For Layer array input, the trainnet (Deep Learning Toolbox) and dlnetwork (Deep Learning Toolbox) functions automatically assign names to layers with the name "".

The YOLOv2OutputLayer object stores this property as a character vector.

Data Types: char | string

`LossFunction` — Loss function
`'mean-squared-error'` (default)

This property is read-only.

Loss function, set as 'mean-squared-error'. For more information about the loss function, see Loss Function for Bounding Box Refinement.

`AnchorBoxes` — Set of anchor boxes
M-by-2 matrix

This property is read-only.

Set of anchor boxes used for training, specified as an M-by-2 matrix defining the height and the width of M anchor boxes. Each row is of the form [height width]. This property is set by the input anchorBoxes.

`LossFactors` — Weights in the loss function
`[5 1 1 1]` (default) | 1-by-4 vector

This property is read-only.

Weights in the loss function, specified as a 1-by-4 vector of the form [K₁ K₂ K₃ K₄]. Weights increase the stability of the network model by penalizing incorrect bounding box predictions and false classifications. For more information about the weights in the loss function, see Loss Function for Bounding Box Refinement.
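Because the property is read-only after creation, one way to use non-default weights is to set them when you create the layer. The following is a minimal sketch that assumes 'LossFactors' is accepted as a name-value argument by the Name,Value syntax. The weight values are hypothetical and chosen only to illustrate raising K₂ and lowering K₃.

```matlab
% Anchor boxes in [height width] format.
anchorBoxes = [16 16; 32 32];

% Hypothetical weights [K1 K2 K3 K4]: K2 raised and K3 lowered
% relative to the default [5 1 1 1].
lossFactors = [5 2 0.5 1];

% Assumption: 'LossFactors' can be set through a name-value pair at creation.
layer = yolov2OutputLayer(anchorBoxes,'Name','yolo_Out', ...
    'LossFactors',lossFactors);

% Read the stored weights back by using dot notation.
disp(layer.LossFactors)
```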

`Classes` — Classes of the output layer
'auto' (default) | categorical vector | string array | cell array of character vectors

Classes of the output layer, specified as a categorical vector, string array, cell array of character vectors, or 'auto'. Use this name-value pair to specify the names of the object classes in the input training data.

If the value is set to 'auto', then the software automatically sets the classes at training time. If you specify the string array or cell array of character vectors str, then the software sets the classes of the output layer to categorical(str). The default value is 'auto'.

Data Types: char | string | cell | categorical
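As a small illustration of the conversion described above, the following sketch calls categorical directly on a cell array of character vectors and on a string array. It uses only core MATLAB functions; the class names are the same illustrative names used in the example on this page.

```matlab
% Class names given as a cell array of character vectors and as a string array.
namesCell = {'Vehicle','Person'};
namesString = ["Vehicle" "Person"];

% Convert each form to a categorical array, as the layer does for str.
classesFromCell = categorical(namesCell);
classesFromString = categorical(namesString);

% Compare the two results and list the elements of one of them.
disp(isequal(classesFromCell,classesFromString))
disp(cellstr(classesFromCell))
```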

`NumInputs` — Number of inputs
`1` (default)

This property is read-only.

Number of inputs to the layer, returned as 1. This layer accepts a single input only.

Data Types: double

`InputNames` — Input names
`{'in'}` (default)

This property is read-only.

Input names, returned as {'in'}. This layer accepts a single input only.

Data Types: cell

Examples

Create YOLO v2 Output Layer

Create a YOLO v2 output layer with two anchor boxes.

Define the height and the width of the anchor boxes.

anchorBoxes = [16 16;32 32];

Specify the names of the object classes in the training data.

classNames = {'Vehicle','Person'};

Generate a YOLO v2 output layer with the name "yolo_Out".

layer = yolov2OutputLayer(anchorBoxes,'Name','yolo_Out','Classes',classNames);

Inspect the properties of the YOLO v2 output layer.

layer

layer = 
  YOLOv2OutputLayer with properties:

            Name: 'yolo_Out'

   Hyperparameters
         Classes: [2x1 categorical]
    LossFunction: 'mean-squared-error'
     AnchorBoxes: [2x2 double]
     LossFactors: [5 1 1 1]

You can read the values of the Classes property by using dot notation, layer.Classes. The function stores the class names as a categorical array.

layer.Classes

ans = 2x1 categorical
     Vehicle 
     Person

More About

Loss Function for Bounding Box Refinement

During training, the output layer of a YOLO v2 network predicts refined bounding box locations by optimizing the mean squared error loss between the predicted bounding boxes and the ground truth. The loss function is defined as

$\begin{array}{l} K_{1} \sum_{i=0}^{S^{2}} \sum_{j=0}^{B} 1_{ij}^{\mathrm{obj}} \left[ (x_{i} - \hat{x}_{i})^{2} + (y_{i} - \hat{y}_{i})^{2} \right] \\ + K_{1} \sum_{i=0}^{S^{2}} \sum_{j=0}^{B} 1_{ij}^{\mathrm{obj}} \left[ (\sqrt{w_{i}} - \sqrt{\hat{w}_{i}})^{2} + (\sqrt{h_{i}} - \sqrt{\hat{h}_{i}})^{2} \right] \\ + K_{2} \sum_{i=0}^{S^{2}} \sum_{j=0}^{B} 1_{ij}^{\mathrm{obj}} (C_{i} - \hat{C}_{i})^{2} \\ + K_{3} \sum_{i=0}^{S^{2}} \sum_{j=0}^{B} 1_{ij}^{\mathrm{noobj}} (C_{i} - \hat{C}_{i})^{2} \\ + K_{4} \sum_{i=0}^{S^{2}} 1_{i}^{\mathrm{obj}} \sum_{c \in \mathrm{classes}} (p_{i}(c) - \hat{p}_{i}(c))^{2} \end{array}$

where:

- S is the number of grid cells.
- B is the number of bounding boxes in each grid cell.
- $1_{ij}^{\mathrm{obj}}$ is 1 if the jth bounding box in grid cell i is responsible for detecting the object. Otherwise, it is set to 0. A grid cell i is responsible for detecting the object if the overlap between the ground truth and a bounding box in that grid cell is greater than or equal to 0.6.
- $1_{ij}^{\mathrm{noobj}}$ is 1 if the jth bounding box in grid cell i does not contain any object. Otherwise, it is set to 0.
- $1_{i}^{\mathrm{obj}}$ is 1 if an object is detected in grid cell i. Otherwise, it is set to 0.
- K₁, K₂, K₃, and K₄ are the weights. To adjust the weights, modify the LossFactors property.

The loss function can be split into three parts:

Localization loss
The first and second terms in the loss function comprise the localization loss. It measures the error between the predicted bounding box and the ground truth. The parameters for computing the localization loss are the position and size of the predicted bounding box and of the ground truth. The parameters are defined as follows.
- $(x_{i}, y_{i})$ is the center of the jth bounding box relative to grid cell i.
- $(\hat{x}_{i}, \hat{y}_{i})$ is the center of the ground truth relative to grid cell i.
- $w_{i}$ and $h_{i}$ are the width and the height of the jth bounding box in grid cell i, respectively. The size of the predicted bounding box is specified relative to the input image size.
- $\hat{w}_{i}$ and $\hat{h}_{i}$ are the width and the height of the ground truth in grid cell i, respectively.
- K₁ is the weight for localization loss. Increase this value to give more weight to bounding box prediction errors.
Confidence loss
The third and fourth terms in the loss function comprise the confidence loss. The third term measures the objectness (confidence score) error when an object is detected in the jth bounding box of grid cell i. The fourth term measures the objectness error when no object is detected in the jth bounding box of grid cell i. The parameters for computing the confidence loss are defined as follows.
- $C_{i}$ is the confidence score of the jth bounding box in grid cell i.
- $\hat{C}_{i}$ is the confidence score of the ground truth in grid cell i.
- K₂ is the weight for objectness error, when an object is detected in the predicted bounding box. You can adjust the value of K₂ to weigh confidence scores from grid cells that contain objects.
- K₃ is the weight for objectness error, when an object is not detected in the predicted bounding box. You can adjust the value of K₃ to weigh confidence scores from grid cells that do not contain objects.
The confidence loss can cause the training to diverge when the number of grid cells that do not contain objects is more than the number of grid cells that contain objects. To remedy this, increase the value for K₂ and decrease the value for K₃.
Classification loss
The fifth term in the loss function comprises the classification loss. For example, suppose that an object is detected in the predicted bounding box contained in grid cell i. Then, the classification loss measures the squared error between the class conditional probabilities for each class in grid cell i. The parameters for computing the classification loss are defined as follows.
- $p_{i}(c)$ is the estimated conditional class probability for object class c in grid cell i.
- $\hat{p}_{i}(c)$ is the actual conditional class probability for object class c in grid cell i.
- K₄ is the weight for classification error when an object is detected in the grid cell. Increase this value to give more weight to the classification loss.
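To make the five terms concrete, the following sketch evaluates the loss expression for a hand-built toy problem. It is an illustration of the formula only, not the toolbox implementation: all array values are hypothetical, the responsibility mask is set by hand rather than computed from the 0.6 overlap threshold, and every box other than the responsible one is assumed to contain no object. The code uses only core MATLAB functions.

```matlab
% Toy problem: 4 grid cells, 2 boxes per cell, 2 classes (hypothetical values).
numCells = 4; numBoxes = 2;

% Indicator masks. Box 1 of grid cell 2 is responsible for an object.
objMask = zeros(numCells,numBoxes);
objMask(2,1) = 1;
noobjMask = 1 - objMask;            % toy assumption: all other boxes hold no object
cellObj = double(any(objMask,2));   % 1 if an object appears in the grid cell

% Predictions, one value per box: centers, sizes, and confidence scores.
x = 0.5*ones(numCells,numBoxes);  y = x;
w = 0.25*ones(numCells,numBoxes); h = w;
C = 0.5*ones(numCells,numBoxes);

% Ground truth. Only the entries for the responsible box differ.
xHat = x; xHat(2,1) = 0.6;  yHat = y;
wHat = w; wHat(2,1) = 0.16; hHat = h;
CHat = objMask;

% Class probabilities per grid cell (rows) and class (columns).
p    = repmat([0.5 0.5],numCells,1); p(2,:) = [0.8 0.2];
pHat = p;                            pHat(2,:) = [1 0];

% Unweighted terms, in the same order as in the loss expression.
total = @(A) sum(A(:));
terms = [ total(objMask .* ((x - xHat).^2 + (y - yHat).^2)), ...
          total(objMask .* ((sqrt(w) - sqrt(wHat)).^2 + (sqrt(h) - sqrt(hHat)).^2)), ...
          total(objMask .* (C - CHat).^2), ...
          total(noobjMask .* (C - CHat).^2), ...
          total(cellObj .* sum((p - pHat).^2,2)) ];

% Weighted loss for a given vector K = [K1 K2 K3 K4].
weightedLoss = @(K) K(1)*(terms(1) + terms(2)) + K(2)*terms(3) ...
    + K(3)*terms(4) + K(4)*terms(5);

lossDefault    = weightedLoss([5 1 1 1]);     % default LossFactors
lossRebalanced = weightedLoss([5 5 0.5 1]);   % K2 raised, K3 lowered
fprintf('default: %.3f, rebalanced: %.3f\n',lossDefault,lossRebalanced);
```

In this toy case the no-object confidence term dominates the default loss because seven of the eight boxes contain no object. Raising K₂ and lowering K₃ reduces the share of that term, which mirrors the remedy described for the confidence loss.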

Tips

To improve prediction accuracy, you can:

- Train the network with more images. You can expand the training dataset through data augmentation. For information on how to apply data augmentation to a training dataset, see Preprocess Images for Deep Learning (Deep Learning Toolbox).
- Perform multiscale training by using the trainYOLOv2ObjectDetector function. To do so, specify the TrainingImageSize argument of the trainYOLOv2ObjectDetector function when you train the network.
- Choose anchor boxes appropriate to the dataset for training the network. You can use the estimateAnchorBoxes function to compute anchor boxes directly from the training data.
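For the multiscale training tip, the set of image sizes can be built as a matrix with one size per row. The following sketch assumes that TrainingImageSize accepts an M-by-2 matrix whose rows are of the form [height width]. The specific sizes (square images in steps of 32 pixels) are hypothetical, and the training call appears only as a comment because trainingData, net, and options are placeholders that are not defined here.

```matlab
% Hypothetical square training sizes from 224 to 352 pixels in steps of 32.
sides = (224:32:352)';

% One row per training size, assumed to be in [height width] format.
trainingImageSizes = [sides sides];
disp(trainingImageSizes)

% Hypothetical call, shown as a comment because the inputs are placeholders:
% detector = trainYOLOv2ObjectDetector(trainingData,net,options, ...
%     'TrainingImageSize',trainingImageSizes);
```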

References

[1] Redmon, J., S. Divvala, R. Girshick, and A. Farhadi. "You Only Look Once: Unified, Real-Time Object Detection." In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 779–788. Las Vegas, NV: CVPR, 2016.

[2] Redmon, J., and A. Farhadi. "YOLO9000: Better, Faster, Stronger." In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 6517–6525. Honolulu, HI: CVPR, 2017.

Extended Capabilities

C/C++ Code Generation
Generate C and C++ code using MATLAB® Coder™.

GPU Code Generation
Generate CUDA® code for NVIDIA® GPUs using GPU Coder™.

To generate CUDA® or C++ code by using GPU Coder™, you must first construct and train a deep neural network. Once the network is trained and evaluated, you can configure the code generator to generate code and deploy the convolutional neural network on platforms that use NVIDIA® or ARM® GPU processors. For more information, see Deep Learning with GPU Coder (GPU Coder).

For this layer, you can generate code that takes advantage of the NVIDIA CUDA deep neural network library (cuDNN), NVIDIA TensorRT™ high performance inference library, or the ARM Compute Library for Mali GPU.

Version History

Introduced in R2019a

R2024b: To be removed

The yolov2OutputLayer object will be removed in a future release. Create a YOLO v2 object detection network using the yolov2ObjectDetector object instead, using these steps:

Define your network as a dlnetwork (Deep Learning Toolbox) object. You can use functions such as addLayers (Deep Learning Toolbox) and connectLayers (Deep Learning Toolbox) to build the network. Do not include output layers in the network.
Create a yolov2ObjectDetector using the dlnetwork as the custom network. You can specify the anchor boxes, class names, and loss factors using the AnchorBoxes, ClassNames, and LossFactors name-value arguments, respectively.
