Use Bayesian Optimization in Custom Training Experiments

This example shows how to use Bayesian optimization to find optimal hyperparameter values for custom training experiments in Experiment Manager. Bayesian optimization provides an alternative strategy to sweeping hyperparameters in an experiment. You specify a range of values for each hyperparameter and select a metric to optimize, and Experiment Manager searches for a combination of hyperparameters that optimizes your selected metric. Bayesian optimization requires Statistics and Machine Learning Toolbox™.

In this example, you train a network to classify images of handwritten digits using a custom learning rate schedule. The experiment uses Bayesian optimization to find the type of schedule and the combination of hyperparameters that maximizes the validation accuracy. For more information on using a custom learning rate schedule, see Train Network Using Custom Training Loop and Piecewise Learn Rate Schedule.

Alternatively, you can find optimal hyperparameter values programmatically by calling the bayesopt function. For more information, see Deep Learning Using Bayesian Optimization.
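
For reference, here is a minimal sketch of that programmatic route, mirroring this experiment's hyperparameters. The objective function makeObjFcn is hypothetical: bayesopt minimizes its objective, so makeObjFcn would train the network for one set of hyperparameter values and return, for example, 1 minus the validation accuracy.

% Sketch only: makeObjFcn is a hypothetical user-defined objective function.
vars = [
    optimizableVariable("Schedule",{'decay','piecewise'},Type="categorical")
    optimizableVariable("InitialLearnRate",[0.001 0.1],Transform="log")
    optimizableVariable("DecayRate",[0.001 0.1],Transform="log")
    optimizableVariable("DropFactor",[0.1 0.9])];
results = bayesopt(@makeObjFcn,vars,MaxObjectiveEvaluations=30);
bestParams = bestPoint(results);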

Open Experiment

First, open the example. Experiment Manager loads a project with a preconfigured experiment that you can inspect and run. To open the experiment, in the Experiment Browser pane, double-click the name of the experiment (BayesOptExperiment).

Custom training experiments consist of a description, a table of hyperparameters, and a training function. Experiments that use Bayesian optimization include additional options to limit the duration of the experiment. For more information, see Configure Custom Training Experiment.

The Description field contains a textual description of the experiment. For this example, the description is:

Classification of digits, using two custom learning rate schedules:
* decay - Use the learning rate p(t) = p(0)/(1+kt), where t is the iteration number and k is DecayRate.
* piecewise - Multiply the learning rate by DropFactor every 100 iterations.

The Hyperparameters section specifies the strategy (Bayesian Optimization) and hyperparameter options to use for the experiment. For each hyperparameter, specify these options:

  • Range — Enter a two-element vector that gives the lower bound and upper bound of a real- or integer-valued hyperparameter, or a string array or cell array that lists the possible values of a categorical hyperparameter.

  • Type — Select real (real-valued hyperparameter), integer (integer-valued hyperparameter), or categorical (categorical hyperparameter).

  • Transform — Select none (no transform) or log (logarithmic transform). For log, the hyperparameter must be real or integer and positive. With this option, the hyperparameter is searched and modeled on a logarithmic scale.

When you run the experiment, Experiment Manager searches for the best combination of hyperparameters. Each trial in the experiment uses a new combination of hyperparameter values based on the results of the previous trials. This example uses the hyperparameters Schedule, InitialLearnRate, DecayRate, and DropFactor to specify the custom learning rate schedule used for training. The options for Schedule are:

  • decay — For each iteration, use the time-based learning rate $\rho_t = \frac{\rho_0}{1+kt}$, where $t$ is the iteration number, $\rho_0$ is the initial learning rate specified by InitialLearnRate, and $k$ is the decay rate specified by DecayRate. This option ignores the value of the hyperparameter DropFactor.

  • piecewise — Start with the initial learning rate specified by InitialLearnRate and periodically drop the learning rate by multiplying by the drop factor specified by DropFactor. In this example, the learning rate drops every 100 iterations. This option ignores the value of the hyperparameter DecayRate.

The experiment models InitialLearnRate and DecayRate on a logarithmic scale because the range of values for these hyperparameters (0.001 to 0.1) spans several orders of magnitude. In contrast, the values for DropFactor range from 0.1 to 0.9, so the experiment models DropFactor on a linear scale.
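
To see how the two schedules behave, you can plot them side by side. This sketch uses illustrative hyperparameter values; they are assumptions, not values chosen by the experiment.

% Compare the decay and piecewise schedules for sample values.
rho0 = 0.01;        % InitialLearnRate (illustrative)
k = 0.01;           % DecayRate (illustrative)
dropFactor = 0.5;   % DropFactor (illustrative)
dropPeriod = 100;   % iterations between drops
t = 1:500;
decaySchedule = rho0 ./ (1 + k*t);
piecewiseSchedule = rho0 * dropFactor.^floor(t/dropPeriod);
plot(t,decaySchedule,t,piecewiseSchedule)
legend("decay","piecewise")
xlabel("Iteration")
ylabel("Learning rate")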

Under Bayesian Optimization Options, you can specify the duration of the experiment by entering the maximum time (in seconds) and the maximum number of trials to run. To best use the power of Bayesian optimization, perform at least 30 objective function evaluations.

The Training Function specifies the training data, network architecture, training options, and training procedure used by the experiment. The input to the training function is a structure with fields from the hyperparameter table and an experiments.Monitor object that you can use to track the progress of the training, record values of the metrics used by the training, and produce training plots. The training function returns a structure that contains the trained network, the training loss, the validation accuracy, and the execution environment used for training. Experiment Manager saves this output, so you can export it to the MATLAB workspace when the training is complete. The training function has five sections.

  • Initialize Output sets the initial value of the network, loss, and accuracy to empty arrays to indicate that the training has not started. The experiment sets the execution environment to "auto", so it trains the network on a GPU if one is available. Using a GPU requires Parallel Computing Toolbox™ and a supported GPU device. For more information, see GPU Support by Release (Parallel Computing Toolbox).

output.trainedNet = [];
output.trainingInfo.loss = [];
output.trainingInfo.accuracy = [];
output.executionEnvironment = "auto";
  • Load Training Data defines the training and validation data for the experiment as augmented image datastores using the Digits data set. For each image in the training set, the experiment applies a random translation of up to 5 pixels on the horizontal and vertical axes. For more information on this data set, see Image Data Sets.

dataFolder = fullfile(toolboxdir("nnet"), ...
    "nndemos","nndatasets","DigitDataset");
imds = imageDatastore(dataFolder, ...
    IncludeSubfolders=true, ...
    LabelSource="foldernames");
[imdsTrain,imdsValidation] = splitEachLabel(imds,0.9,"randomize");
inputSize = [28 28 1];
pixelRange = [-5 5];
imageAugmenter = imageDataAugmenter( ...
    RandXTranslation=pixelRange, ...
    RandYTranslation=pixelRange);
augimdsTrain = augmentedImageDatastore(inputSize(1:2),imdsTrain, ...
    DataAugmentation=imageAugmenter);
augimdsValidation = augmentedImageDatastore(inputSize(1:2),imdsValidation);
classes = categories(imdsTrain.Labels);
numClasses = numel(classes);
  • Define Network Architecture defines the architecture for the image classification network. To train the network with a custom training loop and enable automatic differentiation, the training function converts the layer graph to a dlnetwork object.

layers = [
    imageInputLayer(inputSize,Normalization="none",Name="input")
    convolution2dLayer(5,20,Name="conv1")
    batchNormalizationLayer(Name="bn1")
    reluLayer(Name="relu1")
    convolution2dLayer(3,20,Padding="same",Name="conv2")
    batchNormalizationLayer(Name="bn2")
    reluLayer(Name="relu2")
    convolution2dLayer(3,20,Padding="same",Name="conv3")
    batchNormalizationLayer(Name="bn3")
    reluLayer(Name="relu3")
    fullyConnectedLayer(numClasses,Name="fc")
    softmaxLayer(Name="softmax")];
lgraph = layerGraph(layers);
dlnet = dlnetwork(lgraph);
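
If you want to check the architecture before training, you can optionally open it in the network analyzer. This call is for interactive inspection only and is not part of the experiment's training function.

analyzeNetwork(lgraph)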
  • Specify Training Options defines the training options used by the experiment. In this example, Experiment Manager trains the networks with a mini-batch size of 128 for 10 epochs using the custom learning rate schedule defined by the hyperparameters.

numEpochs = 10;
miniBatchSize = 128;
momentum = 0.9;
learnRateSchedule = params.Schedule;
initialLearnRate = params.InitialLearnRate;
learnRateDecay = params.DecayRate;
learnRateDropFactor = params.DropFactor;
learnRateDropPeriod = 100;
learnRate = initialLearnRate;
  • Train Model defines the custom training loop used by the experiment. The custom training loop uses a minibatchqueue object to process and manage the mini-batches of images. For each mini-batch, the minibatchqueue object converts the labels to one-hot encoded variables and formats the image data with the dimension labels 'SSCB' (spatial, spatial, channel, batch). By default, the minibatchqueue object converts the data to dlarray objects with underlying type single. If you train on a GPU, the data is converted to gpuArray (Parallel Computing Toolbox) objects. For each epoch, the custom training loop shuffles the datastore, loops over mini-batches of data, and evaluates the model gradients, state, and loss. Then, the training function determines the learning rate for the selected schedule and updates the network parameters. At the end of each epoch, the training function computes the validation accuracy, saves the trained network, and updates the training progress.

monitor.Metrics = ["LearnRate" "TrainingLoss" "ValidationAccuracy"];
monitor.XLabel = "Iteration";
mbq = minibatchqueue(augimdsTrain,...
    MiniBatchSize=miniBatchSize,...
    MiniBatchFcn=@preprocessMiniBatch,...
    MiniBatchFormat={'SSCB',''},...
    OutputEnvironment=output.executionEnvironment);
iteration = 0;
velocity = [];
for epoch = 1:numEpochs
    shuffle(mbq);
    while hasdata(mbq)
        iteration = iteration + 1;
        [dlX, dlY] = next(mbq);
        [gradients,state,loss] = dlfeval(@modelGradients,dlnet,dlX,dlY);
        dlnet.State = state;
        switch learnRateSchedule
            case "decay"
                learnRate = initialLearnRate/(1 + learnRateDecay*iteration);
            case "piecewise"
                if mod(iteration,learnRateDropPeriod) == 0
                    learnRate = learnRate*learnRateDropFactor;
                end
        end
        recordMetrics(monitor,iteration, ...
            LearnRate=learnRate, ...
            TrainingLoss=loss);
        output.trainingInfo.loss = [output.trainingInfo.loss; iteration loss];
        [dlnet,velocity] = sgdmupdate(dlnet,gradients,velocity,learnRate,momentum);
        if monitor.Stop
            return;
        end
    end
    numOutputs = 1;
    mbqTest = minibatchqueue(augimdsValidation,numOutputs, ...
        MiniBatchSize=miniBatchSize, ...
        MiniBatchFcn=@preprocessMiniBatchPredictors, ...
        MiniBatchFormat="SSCB");
    predictions = modelPredictions(dlnet,mbqTest,classes);
    YTest = imdsValidation.Labels;
    accuracy = mean(predictions == YTest)*100.0;
    output.trainedNet = dlnet;
    monitor.Progress = (epoch*100.0)/numEpochs;
    recordMetrics(monitor,iteration, ...
        ValidationAccuracy=accuracy);
    output.trainingInfo.accuracy = [output.trainingInfo.accuracy; iteration accuracy];
end

To inspect the training function, under Training Function, click Edit. The training function opens in the MATLAB® Editor. In addition, the code for the training function appears in Appendix 1 at the end of this example.

In the Metrics section, the Optimize and Direction fields indicate the metric that the Bayesian optimization algorithm uses as an objective function. For this experiment, Experiment Manager seeks to maximize the value of the validation accuracy.

Run Experiment

When you run the experiment, Experiment Manager trains the network defined by the training function multiple times. Each trial uses a different combination of hyperparameter values. By default, Experiment Manager runs one trial at a time. If you have Parallel Computing Toolbox, you can run multiple trials at the same time. For best results, before you run your experiment, start a parallel pool with as many workers as GPUs. For more information, see Use Experiment Manager to Train Networks in Parallel.

  • To run one trial of the experiment at a time, on the Experiment Manager toolstrip, click Run.

  • To run multiple trials at the same time, click Use Parallel and then Run. If there is no current parallel pool, Experiment Manager starts one using the default cluster profile. Experiment Manager then executes multiple simultaneous trials, depending on the number of parallel workers available.
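
As a minimal sketch of the setup suggested above, you can start a pool with one worker per available GPU before running the experiment. The gpuDeviceCount function requires Parallel Computing Toolbox, and this snippet is an assumption about a typical local setup.

% Start one worker per available GPU (skip if no GPU is available).
numGPUs = gpuDeviceCount("available");
if numGPUs > 0
    parpool(numGPUs);
end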

A table of results displays the training loss and validation accuracy for each trial. Experiment Manager indicates the trial with the optimal value for the selected metric. For example, in this experiment, the third trial produces the greatest validation accuracy.

While the experiment is running, click Training Plot to display the training plot and track the progress of each trial. The training plot shows the learning rate, training loss, and validation accuracy for each trial. For example, this training plot is for a trial that uses a piecewise learning rate schedule.

In contrast, this training plot is for a trial that uses a time-based decay learning rate schedule.

Evaluate Results

To test the best trial in your experiment, plot a confusion matrix.

  1. In the results table, select the trial with the highest validation accuracy.

  2. On the Experiment Manager toolstrip, click Export.

  3. In the dialog window, enter the name of a workspace variable for the exported training output. The default name is trainingOutput.

  4. Create a confusion matrix by calling the drawConfusionMatrix function, which is listed in Appendix 3 at the end of this example. As the input to the function, use the exported training output and the fraction of the Digits data set to use as a test set. For instance, in the MATLAB Command Window, enter:

drawConfusionMatrix(trainingOutput,0.5)

The function creates a confusion matrix using half of the images in the data set rotated by a small, random angle.
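
The exported structure also records the training loss and validation accuracy at each logged iteration, so you can plot the training history from the command line. A minimal sketch, assuming the default variable name trainingOutput:

% Each row of the recorded arrays is [iteration value].
info = trainingOutput.trainingInfo;
figure
yyaxis left
plot(info.loss(:,1),info.loss(:,2))
ylabel("Training loss")
yyaxis right
plot(info.accuracy(:,1),info.accuracy(:,2))
ylabel("Validation accuracy (%)")
xlabel("Iteration")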

To record observations about the results of your experiment, add an annotation.

  1. In the results table, right-click the ValidationAccuracy cell for the best trial.

  2. Select Add Annotation.

  3. In the Annotations pane, enter your observations in the text box.

For more information, see Sort, Filter, and Annotate Experiment Results.

Close Experiment

In the Experiment Browser pane, right-click the name of the project and select Close Project. Experiment Manager closes all of the experiments and results contained in the project.

Appendix 1: Training Function

This function specifies the training data, network architecture, training options, and training procedure used by the experiment.

Input

  • params is a structure with fields from the Experiment Manager hyperparameter table.

  • monitor is an experiments.Monitor object that you can use to track the progress of the training, update information fields in the results table, record values of the metrics used by the training, and produce training plots.

Output

  • output is a structure that contains the trained network, the values of the training loss and validation accuracy, and the execution environment used for training. Experiment Manager saves this output, so you can export it to the MATLAB workspace when the training is complete.

function output = BayesOptExperiment_training(params,monitor)

output.trainedNet = [];
output.trainingInfo.loss = [];
output.trainingInfo.accuracy = [];
output.executionEnvironment = "auto";

dataFolder = fullfile(toolboxdir("nnet"), ...
    "nndemos","nndatasets","DigitDataset");
imds = imageDatastore(dataFolder, ...
    IncludeSubfolders=true, ...
    LabelSource="foldernames");
[imdsTrain,imdsValidation] = splitEachLabel(imds,0.9,"randomize");
inputSize = [28 28 1];
pixelRange = [-5 5];
imageAugmenter = imageDataAugmenter( ...
    RandXTranslation=pixelRange, ...
    RandYTranslation=pixelRange);
augimdsTrain = augmentedImageDatastore(inputSize(1:2),imdsTrain, ...
    DataAugmentation=imageAugmenter);
augimdsValidation = augmentedImageDatastore(inputSize(1:2),imdsValidation);
classes = categories(imdsTrain.Labels);
numClasses = numel(classes);

layers = [
    imageInputLayer(inputSize,Normalization="none",Name="input")
    convolution2dLayer(5,20,Name="conv1")
    batchNormalizationLayer(Name="bn1")
    reluLayer(Name="relu1")
    convolution2dLayer(3,20,Padding="same",Name="conv2")
    batchNormalizationLayer(Name="bn2")
    reluLayer(Name="relu2")
    convolution2dLayer(3,20,Padding="same",Name="conv3")
    batchNormalizationLayer(Name="bn3")
    reluLayer(Name="relu3")
    fullyConnectedLayer(numClasses,Name="fc")
    softmaxLayer(Name="softmax")];
lgraph = layerGraph(layers);
dlnet = dlnetwork(lgraph);

numEpochs = 10;
miniBatchSize = 128;
momentum = 0.9;

learnRateSchedule = params.Schedule;
initialLearnRate = params.InitialLearnRate;
learnRateDecay = params.DecayRate;
learnRateDropFactor = params.DropFactor;
learnRateDropPeriod = 100;
learnRate = initialLearnRate;

monitor.Metrics = ["LearnRate" "TrainingLoss" "ValidationAccuracy"];
monitor.XLabel = "Iteration";

mbq = minibatchqueue(augimdsTrain,...
    MiniBatchSize=miniBatchSize,...
    MiniBatchFcn=@preprocessMiniBatch,...
    MiniBatchFormat={'SSCB',''},...
    OutputEnvironment=output.executionEnvironment);

iteration = 0;
velocity = [];
for epoch = 1:numEpochs
    shuffle(mbq);

    while hasdata(mbq)
        iteration = iteration + 1;

        [dlX, dlY] = next(mbq);
        
        [gradients,state,loss] = dlfeval(@modelGradients,dlnet,dlX,dlY);
        dlnet.State = state;
        
        switch learnRateSchedule
            case "decay"
                learnRate = initialLearnRate/(1 + learnRateDecay*iteration);
            case "piecewise"
                if mod(iteration,learnRateDropPeriod) == 0
                    learnRate = learnRate*learnRateDropFactor;
                end
        end
        
        recordMetrics(monitor,iteration, ...
            LearnRate=learnRate, ...
            TrainingLoss=loss);
        output.trainingInfo.loss = [output.trainingInfo.loss; iteration loss];
        
        [dlnet,velocity] = sgdmupdate(dlnet,gradients,velocity,learnRate,momentum);
        
        if monitor.Stop
            return;
        end
    end

    numOutputs = 1;
    mbqTest = minibatchqueue(augimdsValidation,numOutputs, ...
        MiniBatchSize=miniBatchSize, ...
        MiniBatchFcn=@preprocessMiniBatchPredictors, ...
        MiniBatchFormat="SSCB");
    predictions = modelPredictions(dlnet,mbqTest,classes);
    YTest = imdsValidation.Labels;
    accuracy = mean(predictions == YTest)*100.0;
    
    output.trainedNet = dlnet;
    monitor.Progress = (epoch*100.0)/numEpochs;
    recordMetrics(monitor,iteration, ...
        ValidationAccuracy=accuracy);
    output.trainingInfo.accuracy = [output.trainingInfo.accuracy; iteration accuracy];
end
end

Appendix 2: Custom Training Helper Functions

The modelGradients function takes as input a dlnetwork object dlnet and a mini-batch of input data dlX with corresponding labels Y, and returns the gradients of the loss with respect to the learnable parameters in dlnet, the network state, and the loss. To compute the gradients automatically, use the dlgradient function.

function [gradients,state,loss] = modelGradients(dlnet,dlX,Y)
[dlYPred,state] = forward(dlnet,dlX);
loss = crossentropy(dlYPred,Y);
gradients = dlgradient(loss,dlnet.Learnables);
loss = double(gather(extractdata(loss)));
end

The modelPredictions function takes as input a dlnetwork object dlnet, a minibatchqueue of input data mbq, and the network classes, and computes the model predictions by iterating over all data in the minibatchqueue object. The function uses the onehotdecode function to find the predicted class with the highest score.

function predictions = modelPredictions(dlnet,mbq,classes)
predictions = [];
while hasdata(mbq)
    dlXTest = next(mbq);
    dlYPred = predict(dlnet,dlXTest);
    YPred = onehotdecode(dlYPred,classes,1)';
    predictions = [predictions; YPred];
end
end

The preprocessMiniBatch function preprocesses a mini-batch of predictors and labels using these steps:

  1. Preprocess the images using the preprocessMiniBatchPredictors function.

  2. Extract the label data from the incoming cell array and concatenate the data into a categorical array along the second dimension.

  3. One-hot encode the categorical labels into numeric arrays. Encoding into the first dimension produces an encoded array that matches the shape of the network output.

function [X,Y] = preprocessMiniBatch(XCell,YCell)
X = preprocessMiniBatchPredictors(XCell);
Y = cat(2,YCell{1:end});
Y = onehotencode(Y,1);
end

The preprocessMiniBatchPredictors function preprocesses a mini-batch of predictors by extracting the image data from the input cell array and concatenating the data into a numeric array. For grayscale input, concatenating over the fourth dimension adds a third dimension to each image, for use as a singleton channel dimension.

function X = preprocessMiniBatchPredictors(XCell)
X = cat(4,XCell{1:end});
end

Appendix 3: Create Confusion Matrix

This function takes as input the training output exported from the experiment and the fraction of the Digits data set to use as a test set, and creates a confusion matrix chart. This function uses the helper functions modelPredictions and preprocessMiniBatchPredictors, which are listed in Appendix 2.

function drawConfusionMatrix(trainingOutput,testSize)

dataFolder = fullfile(toolboxdir("nnet"), ...
    "nndemos","nndatasets","DigitDataset");
imds = imageDatastore(dataFolder, ...
    IncludeSubfolders=true, ...
    LabelSource="foldernames");
imdsTest = splitEachLabel(imds,testSize,"randomize");
inputSize = [28 28 1];
imageAugmenter = imageDataAugmenter(RandRotation=[-15 15]);
augimdsTest = augmentedImageDatastore(inputSize(1:2),imdsTest, ...
    DataAugmentation=imageAugmenter);
classes = categories(imdsTest.Labels);

trainedNet = trainingOutput.trainedNet;
numOutputs = 1;
miniBatchSize = 128;
mbqTest = minibatchqueue(augimdsTest,numOutputs, ...
    MiniBatchSize=miniBatchSize, ...
    MiniBatchFcn=@preprocessMiniBatchPredictors, ...
    MiniBatchFormat="SSCB");
predictedLabels = modelPredictions(trainedNet,mbqTest,classes);
trueLabels = imdsTest.Labels;

figure
confusionchart(trueLabels,predictedLabels, ...
    ColumnSummary="column-normalized", ...
    RowSummary="row-normalized", ...
    Title="Confusion Matrix for Digits Data Set");
cm = gcf;
cm.Position(3) = cm.Position(3)*1.5;

end
