For most deep learning tasks, you can use a pretrained network and adapt it to your own data. For an example showing how to use transfer learning to retrain a convolutional neural network to classify a new set of images, see Train Deep Learning Network to Classify New Images. Alternatively, you can create and train networks from scratch using `layerGraph` objects with the `trainNetwork` and `trainingOptions` functions.

If the `trainingOptions` function does not provide the training options that you need for your task, then you can create a custom training loop using automatic differentiation. To learn more, see Define Deep Learning Network for Custom Training Loops.

If Deep Learning Toolbox™ does not provide the layers you need for your task (including output layers that specify loss functions), then you can create a custom layer. To learn more, see Define Custom Deep Learning Layers. For loss functions that cannot be specified using an output layer, you can specify the loss in a custom training loop. To learn more, see Specify Loss Functions. For networks that cannot be created using layer graphs, you can define custom networks as a function. To learn more, see Define Network as Model Function.

For more information about which training method to use for which task, see Training Deep Learning Models in MATLAB.

`dlnetwork` Object

For most tasks, you can control the training algorithm details using the `trainingOptions` and `trainNetwork` functions. If the `trainingOptions` function does not provide the options you need for your task (for example, a custom learn rate schedule), then you can define your own custom training loop using a `dlnetwork` object. A `dlnetwork` object allows you to train a network specified as a layer graph using automatic differentiation.

For networks specified as a layer graph, you can create a `dlnetwork` object from the layer graph by using the `dlnetwork` function directly. For a list of layers supported by `dlnetwork` objects, see the Supported Layers section of the `dlnetwork` page.

dlnet = dlnetwork(lgraph);
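As a minimal sketch of this step, the following builds a small layer graph and converts it to a `dlnetwork` object. The layer sizes, layer names, and random input are illustrative assumptions, not values from this page. The input layer sets `'Normalization','none'` as a conservative choice, since support for input normalization in `dlnetwork` objects may depend on the release.

```matlab
% Define a small image classification architecture without an output layer
% (layer sizes and names are illustrative assumptions).
layers = [
    imageInputLayer([28 28 1],'Normalization','none','Name','input')
    convolution2dLayer(3,8,'Padding','same','Name','conv')
    reluLayer('Name','relu')
    fullyConnectedLayer(10,'Name','fc')
    softmaxLayer('Name','softmax')];
lgraph = layerGraph(layers);

% Convert the layer graph to a dlnetwork object.
dlnet = dlnetwork(lgraph);

% Run the network on random data labeled as
% spatial, spatial, channel, batch ('SSCB').
dlX = dlarray(rand(28,28,1,4,'single'),'SSCB');
dlY = predict(dlnet,dlX);
```

The layer array ends with a softmax layer rather than a classification output layer, because the loss is computed separately in the custom training loop.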

For an example showing how to train a network with a custom learn rate schedule, see Train Network Using Custom Training Loop.
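One common custom schedule is time-based decay, where the learn rate shrinks as the iteration count grows. The sketch below only computes the schedule; the variable names and values are assumptions for illustration, and in a real loop the computed `learnRate` would be passed to the parameter update function in each iteration.

```matlab
% Time-based decay schedule. All values here are illustrative assumptions.
initialLearnRate = 0.01;
decay = 0.01;
numIterations = 100;

learnRates = zeros(1,numIterations);
for iteration = 1:numIterations
    % Learn rate to pass to the update function in this iteration.
    learnRate = initialLearnRate/(1 + decay*iteration);
    learnRates(iteration) = learnRate;
end
```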

For architectures that cannot be created using layer graphs (for example, a Siamese network that requires shared weights), you can define the model as a function of the form `[dlY1,...,dlYM] = model(parameters,dlX1,...,dlXN)`, where `parameters` contains the network parameters, `dlX1,...,dlXN` correspond to the input data for the `N` model inputs, and `dlY1,...,dlYM` correspond to the `M` model outputs. To train a deep learning model defined as a function, use a custom training loop. For an example, see Train Network Using Model Function.

When defining deep learning models as a function, you must manually initialize the layer weights. For more information, see Initialize Learnable Parameters for Model Functions.
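The following sketch shows one way a model function and its manually initialized parameters might look. The two-layer architecture, the field names of `parameters`, and the small Gaussian initialization are all assumptions chosen for brevity; they are not the initializers recommended on the linked page.

```matlab
% Manually initialize the learnable parameters as a struct of dlarray
% objects. Small Gaussian weights and zero biases are an illustrative choice.
numFeatures = 4;
numHidden = 8;
numClasses = 3;
parameters.fc1.Weights = dlarray(0.01*randn(numHidden,numFeatures,'single'));
parameters.fc1.Bias = dlarray(zeros(numHidden,1,'single'));
parameters.fc2.Weights = dlarray(0.01*randn(numClasses,numHidden,'single'));
parameters.fc2.Bias = dlarray(zeros(numClasses,1,'single'));

% Evaluate the model on a mini-batch of 5 observations
% ('CB' = channel, batch).
dlX = dlarray(rand(numFeatures,5,'single'),'CB');
dlY = model(parameters,dlX);

function dlY = model(parameters,dlX)
    % Hypothetical two-layer model:
    % fully connect, ReLU, fully connect, softmax.
    dlY = fullyconnect(dlX,parameters.fc1.Weights,parameters.fc1.Bias);
    dlY = relu(dlY);
    dlY = fullyconnect(dlY,parameters.fc2.Weights,parameters.fc2.Bias);
    dlY = softmax(dlY);
end
```

Storing the parameters in a nested struct keeps each operation's weights and bias together and lets the whole struct be passed to the gradient and update functions in one call.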

If you define a custom network as a function, then the model function must support automatic differentiation. You can use the following deep learning operations. The functions listed here are only a subset. For a complete list of functions that support `dlarray` input, see List of Functions with dlarray Support.

Function | Description |
---|---|
`avgpool` | The average pooling operation performs downsampling by dividing the input into pooling regions and computing the average value of each region. |
`batchnorm` | The batch normalization operation normalizes each input channel across a mini-batch. To speed up training of convolutional neural networks and reduce the sensitivity to network initialization, use batch normalization between convolution and nonlinear operations such as `relu`. |
`crossentropy` | The cross-entropy operation computes the cross-entropy loss between network predictions and target values for single-label and multi-label classification tasks. |
`crosschannelnorm` | The cross-channel normalization operation uses local responses in different channels to normalize each activation. Cross-channel normalization typically follows a `relu` operation. Cross-channel normalization is also known as local response normalization. |
`dlconv` | The convolution operation applies sliding filters to the input data. Use 1-D and 2-D filters with ungrouped or grouped convolutions and 3-D filters with ungrouped convolutions. |
`dltranspconv` | The transposed convolution operation upsamples feature maps. |
`embed` | The embed operation converts numeric indices to numeric vectors, where the indices correspond to discrete data. Use embeddings to map discrete data such as categorical values or words to numeric vectors. |
`fullyconnect` | The fully connect operation multiplies the input by a weight matrix and then adds a bias vector. |
`groupnorm` | The group normalization operation divides the channels of the input data into groups and normalizes the activations across each group. To speed up training of convolutional neural networks and reduce the sensitivity to network initialization, use group normalization between convolution and nonlinear operations such as `relu`. You can perform instance normalization and layer normalization by setting the appropriate number of groups. |
`gru` | The gated recurrent unit (GRU) operation allows a network to learn dependencies between time steps in time series and sequence data. |
`leakyrelu` | The leaky rectified linear unit (ReLU) activation operation performs a nonlinear threshold operation, where any input value less than zero is multiplied by a fixed scale factor. |
`lstm` | The long short-term memory (LSTM) operation allows a network to learn long-term dependencies between time steps in time series and sequence data. |
`maxpool` | The maximum pooling operation performs downsampling by dividing the input into pooling regions and computing the maximum value of each region. |
`maxunpool` | The maximum unpooling operation unpools the output of a maximum pooling operation by upsampling and padding with zeros. |
`mse` | The half mean squared error operation computes the half mean squared error loss between network predictions and target values for regression tasks. |
`relu` | The rectified linear unit (ReLU) activation operation performs a nonlinear threshold operation, where any input value less than zero is set to zero. |
`onehotdecode` | The one-hot decode operation decodes probability vectors, such as the output of a classification network, into classification labels. |
`sigmoid` | The sigmoid activation operation applies the sigmoid function to the input data. |
`softmax` | The softmax activation operation applies the softmax function to the channel dimension of the input data. |
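To illustrate how a few of these operations chain together on `dlarray` data, here is a short sketch that applies `dlconv`, `relu`, and `maxpool` to a random mini-batch. The filter count, filter size, and input size are assumptions for illustration only.

```matlab
% Random mini-batch of two 8-by-8 images with 3 channels ('SSCB').
dlX = dlarray(rand(8,8,3,2,'single'),'SSCB');

% Four 3-by-3 filters over 3 input channels; sizes are illustrative.
weights = dlarray(0.1*randn(3,3,3,4,'single'));
bias = dlarray(zeros(4,1,'single'));

% Convolution with padding 1 keeps the 8-by-8 spatial size.
dlY = dlconv(dlX,weights,bias,'Padding',1);
dlY = relu(dlY);

% 2-by-2 max pooling with stride 2 halves the spatial size.
dlY = maxpool(dlY,2,'Stride',2);
```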

When using custom training loops, you must calculate the loss in the model gradients function. Use the loss value when computing gradients for updating the network weights. To compute the loss, you can use the following functions:

Function | Description |
---|---|
`softmax` | The softmax activation operation applies the softmax function to the channel dimension of the input data. |
`sigmoid` | The sigmoid activation operation applies the sigmoid function to the input data. |
`crossentropy` | The cross-entropy operation computes the cross-entropy loss between network predictions and target values for single-label and multi-label classification tasks. |
`mse` | The half mean squared error operation computes the half mean squared error loss between network predictions and target values for regression tasks. |
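As a small sketch of how these functions fit together, the following converts raw scores to probabilities with `softmax` and then computes both a cross-entropy loss and a half mean squared error loss against one-hot targets. The scores and targets are made-up values for illustration.

```matlab
% Raw scores for 3 classes and 2 observations ('CB' = channel, batch).
scores = dlarray(single([2 0; 1 1; 0 3]),'CB');
dlY = softmax(scores);

% One-hot targets: observation 1 is class 1, observation 2 is class 3.
T = single([1 0; 0 0; 0 1]);

% Cross-entropy loss for classification.
lossCE = crossentropy(dlY,T);

% Half mean squared error, shown on the same data for comparison.
lossMSE = mse(dlY,T);
```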

Alternatively, you can use a custom loss function by creating a function of the form `loss = myLoss(Y,T)`, where `Y` contains the network predictions, `T` contains the targets, and `loss` is the returned loss.
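For instance, a custom loss could be the mean absolute error between predictions and targets. The sketch below is a hypothetical example of the `myLoss(Y,T)` form; it uses an anonymous function to stay short, and a function file with the same signature would work the same way.

```matlab
% Hypothetical custom loss: mean absolute error over all elements.
myLoss = @(Y,T) mean(abs(Y - T),'all');

Y = dlarray([1 2; 3 5],'CB');   % example predictions
T = [1 1; 1 1];                 % example targets

loss = myLoss(Y,T);
```

Because the loss is built only from operations that support `dlarray` input, it can be differentiated automatically inside a model gradients function.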

For an example showing how to train a generative adversarial network (GAN) that generates images using a custom loss function, see Train Generative Adversarial Network (GAN).

When training a deep learning model with a custom training loop, the software minimizes the loss with respect to the learnable parameters. To minimize the loss, the software uses the gradients of the loss with respect to the learnable parameters. To calculate these gradients using automatic differentiation, you must define a model gradients function.

For models specified as a `dlnetwork` object, create a function of the form `gradients = modelGradients(dlnet,dlX,T)`, where `dlnet` is the network, `dlX` contains the input predictors, `T` contains the targets, and `gradients` contains the returned gradients. Optionally, you can pass extra arguments to the gradients function (for example, if the loss function requires extra information), or return extra arguments (for example, metrics for plotting the training progress).
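A minimal sketch of such a function for a `dlnetwork` object follows. The tiny network, the random data, and the choice of cross-entropy loss are assumptions for illustration; the function also returns the loss as an extra output, as the paragraph above allows.

```matlab
% Small illustrative network (sizes and names are assumptions).
layers = [
    imageInputLayer([4 4 1],'Normalization','none','Name','input')
    fullyConnectedLayer(3,'Name','fc')
    softmaxLayer('Name','softmax')];
dlnet = dlnetwork(layerGraph(layers));

% Random predictors and one-hot targets for 2 observations.
dlX = dlarray(rand(4,4,1,2,'single'),'SSCB');
T = single([1 0; 0 1; 0 0]);

% Evaluate the gradients with automatic differentiation enabled.
[gradients,loss] = dlfeval(@modelGradients,dlnet,dlX,T);

function [gradients,loss] = modelGradients(dlnet,dlX,T)
    % Forward pass, loss, then gradients of the loss with respect to
    % the learnable parameters of the network.
    dlYPred = forward(dlnet,dlX);
    loss = crossentropy(dlYPred,T);
    gradients = dlgradient(loss,dlnet.Learnables);
end
```

Here `gradients` is a table with the same layout as `dlnet.Learnables`, so it can be passed directly to an update function together with the network.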

For models specified as a function, create a function of the form `gradients = modelGradients(parameters,dlX,T)`, where `parameters` contains the learnable parameters, `dlX` contains the input predictors, `T` contains the targets, and `gradients` contains the returned gradients. Optionally, you can pass extra arguments to the gradients function (for example, if the loss function requires extra information), or return extra arguments (for example, metrics for plotting the training progress).
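For a model defined as a function, the gradients function might look like the following sketch. The single-layer `model` function, the field names in `parameters`, and the data are hypothetical and chosen only to keep the example short.

```matlab
% Learnable parameters for a hypothetical single-layer model.
parameters.fc.Weights = dlarray(0.01*randn(3,4,'single'));
parameters.fc.Bias = dlarray(zeros(3,1,'single'));

% Random predictors ('CB' = channel, batch) and one-hot targets.
dlX = dlarray(rand(4,2,'single'),'CB');
T = single([1 0; 0 1; 0 0]);

[gradients,loss] = dlfeval(@modelGradients,parameters,dlX,T);

function [gradients,loss] = modelGradients(parameters,dlX,T)
    dlYPred = model(parameters,dlX);
    loss = crossentropy(dlYPred,T);
    % dlgradient returns a struct with the same fields as parameters.
    gradients = dlgradient(loss,parameters);
end

function dlY = model(parameters,dlX)
    % Hypothetical model: one fully connected operation and a softmax.
    dlY = fullyconnect(dlX,parameters.fc.Weights,parameters.fc.Bias);
    dlY = softmax(dlY);
end
```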

To learn more about defining model gradients functions for custom training loops, see Define Model Gradients Function for Custom Training Loop.

To evaluate the model gradients using automatic differentiation, use the `dlfeval` function, which evaluates a function with automatic differentiation enabled. For the first input of `dlfeval`, pass the model gradients function specified as a function handle. For the following inputs, pass the required variables for the model gradients function. For the outputs of the `dlfeval` function, specify the same outputs as the model gradients function.
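The calling pattern can be seen on a deliberately simple function. In this sketch, the hypothetical function `sumOfSquares` stands in for a model gradients function: it returns a value and its gradient, and `dlfeval` is called with the function handle first, then the function's inputs, and the same two outputs.

```matlab
x = dlarray([1 2 3]);

% First input: function handle. Remaining inputs: the function's arguments.
% Outputs: the same outputs that the function itself returns.
[y,grad] = dlfeval(@sumOfSquares,x);

function [y,grad] = sumOfSquares(x)
    % y = x1^2 + x2^2 + x3^2, so the gradient with respect to x is 2*x.
    y = sum(x.^2);
    grad = dlgradient(y,x);
end
```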

To update the learnable parameters using the gradients, you can use the following functions:

Function | Description |
---|---|
`adamupdate` | Update parameters using adaptive moment estimation (Adam) |
`rmspropupdate` | Update parameters using root mean squared propagation (RMSProp) |
`sgdmupdate` | Update parameters using stochastic gradient descent with momentum (SGDM) |
`dlupdate` | Update parameters using custom function |
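To show how an update function sits inside a loop, the sketch below minimizes a toy quadratic with `sgdmupdate` and then repeats the exercise with a plain gradient descent step passed to `dlupdate`. The toy objective, the learn rate, and the iteration counts are assumptions for illustration; a real training loop would compute the gradients from a model gradients function instead.

```matlab
% Toy problem (an assumption for illustration): find w that minimizes
% sum((w - target).^2), whose gradient is 2*(w - target).
target = [1 -2 3];
gradFcn = @(w) dlgradient(sum((w - target).^2),w);

learnRate = 0.1;
momentum = 0.9;

% SGDM: the velocity state is carried from one iteration to the next.
w = dlarray(zeros(1,3));
vel = [];
for iteration = 1:200
    grad = dlfeval(gradFcn,w);
    [w,vel] = sgdmupdate(w,grad,vel,learnRate,momentum);
end

% Custom update with dlupdate: a plain gradient descent step.
sgdStep = @(param,grad) param - learnRate*grad;
w2 = dlarray(zeros(1,3));
for iteration = 1:100
    grad = dlfeval(gradFcn,w2);
    w2 = dlupdate(sgdStep,w2,grad);
end
```

The `adamupdate` and `rmspropupdate` functions follow the same pattern, but each carries its own state arguments between iterations.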

`dlarray` | `dlfeval` | `dlgradient` | `dlnetwork`

- Train Generative Adversarial Network (GAN)
- Train Network Using Custom Training Loop
- Specify Training Options in Custom Training Loop
- Define Model Gradients Function for Custom Training Loop
- Update Batch Normalization Statistics in Custom Training Loop
- Update Batch Normalization Statistics Using Model Function
- Make Predictions Using dlnetwork Object
- Make Predictions Using Model Function
- Train Network Using Model Function
- Initialize Learnable Parameters for Model Functions
- Training Deep Learning Models in MATLAB
- Define Custom Deep Learning Layers
- List of Functions with dlarray Support
- Automatic Differentiation Background
- Use Automatic Differentiation In Deep Learning Toolbox