Neural Network Training Concepts

This topic is part of the design workflow described in Workflow for Neural Network Design.

This topic describes two different styles of training. In incremental training, the weights and biases of the network are updated each time an input is presented to the network. In batch training, the weights and biases are updated only after all the inputs have been presented. Batch training methods are generally more efficient in the MATLAB® environment, and they are emphasized in the Deep Learning Toolbox™ software. Incremental training is still useful in some applications, so that paradigm is implemented as well.

Incremental Training with adapt

Incremental training can be applied to both static and dynamic networks, although it is more commonly used with dynamic networks, such as adaptive filters. This section illustrates how incremental training is performed on both static and dynamic networks.

Incremental Training of Static Networks

Consider again the static network used for the first example. You want to train it incrementally, so that the weights and biases are updated after each input is presented. In this case you use the function adapt, and the inputs and targets are presented as sequences.

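Suppose you want to train the network to create the linear function

$t = 2 p_{1} + p_{2}$

where $p_{1}$ and $p_{2}$ are the first and second elements of the input vector. Then for the four input vectors used previously (here the subscripts number the vectors rather than their elements),

$p_{1} = \begin{bmatrix} 1 \\ 2 \end{bmatrix}, \quad p_{2} = \begin{bmatrix} 2 \\ 1 \end{bmatrix}, \quad p_{3} = \begin{bmatrix} 2 \\ 3 \end{bmatrix}, \quad p_{4} = \begin{bmatrix} 3 \\ 1 \end{bmatrix}$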

the targets would be

$t_{1} = [4], t_{2} = [5], t_{3} = [7], t_{4} = [7]$
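Each target is simply the linear function evaluated at the corresponding input vector. As a quick check of the arithmetic, here is a minimal sketch in Python, used only as a dependency-free illustration. The names `inputs` and `target` are hypothetical and are not toolbox functions.

```python
# The four input vectors from the example, written as (first element, second element).
inputs = [(1, 2), (2, 1), (2, 3), (3, 1)]


def target(p):
    """Evaluate the linear function t = 2*p_1 + p_2 for one input vector p."""
    return 2 * p[0] + p[1]


targets = [target(p) for p in inputs]
print(targets)  # [4, 5, 7, 7]
```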

For incremental training, you present the inputs and targets as sequences:

P = {[1;2] [2;1] [2;3] [3;1]};
T = {4 5 7 7};

First, set up the network with zero initial weights and biases. Also set the learning rate to zero for now. This gives a baseline against which you can see the effect of incremental training once the learning rate is increased.

net = linearlayer(0,0);
net = configure(net,P,T);
net.IW{1,1} = [0 0];
net.b{1} = 0;

Recall from Simulation with Concurrent Inputs in a Static Network that, for a static network, simulation produces the same outputs whether the inputs are presented as a matrix of concurrent vectors or as a cell array of sequential vectors. This is not true when training the network. When you use the adapt function and the inputs are presented as a cell array of sequential vectors, the weights are updated as each input is presented (incremental mode). As shown in Batch Training with Static Networks, if the inputs are presented as a matrix of concurrent vectors, the weights are updated only after all inputs have been presented (batch mode).

You are now ready to train the network incrementally.

[net,a,e,pf] = adapt(net,P,T);

The network outputs remain zero because the learning rate is zero, so the weights are never updated. The errors are therefore equal to the targets:

a = [0]    [0]    [0]    [0]
e = [4]    [5]    [7]    [7]

If you now set the learning rate to 0.1, you can see how the network is adjusted as each input is presented:

net.inputWeights{1,1}.learnParam.lr = 0.1;
net.biases{1,1}.learnParam.lr = 0.1;
[net,a,e,pf] = adapt(net,P,T);
a = [0]    [2]    [6]    [5.8]
e = [4]    [3]    [1]    [1.2]
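To see where these numbers come from, here is a minimal Python sketch of one incremental pass of the Widrow-Hoff (LMS) rule for a single linear neuron. It assumes the update dW = lr*e*p' and db = lr*e is applied immediately after every input. It illustrates the arithmetic only and is not the toolbox implementation. `adapt_incremental` is a hypothetical name.

```python
def adapt_incremental(w, b, inputs, targets, lr):
    """One incremental Widrow-Hoff pass for a single linear neuron.

    The weights and bias are updated after every input, so each output is
    computed with the values left by the previous update.
    """
    w = list(w)
    outputs, errors = [], []
    for p, t in zip(inputs, targets):
        a = sum(wi * pi for wi, pi in zip(w, p)) + b  # linear output
        e = t - a                                     # error for this input
        outputs.append(a)
        errors.append(e)
        # Assumed Widrow-Hoff update: dW = lr*e*p', db = lr*e
        w = [wi + lr * e * pi for wi, pi in zip(w, p)]
        b = b + lr * e
    return w, b, outputs, errors


P = [(1, 2), (2, 1), (2, 3), (3, 1)]
T = [4, 5, 7, 7]

# Learning rate 0: the weights never change, so outputs stay 0 and errors equal targets.
_, _, a0, e0 = adapt_incremental([0.0, 0.0], 0.0, P, T, lr=0.0)

# Learning rate 0.1: the network is adjusted after each input.
w, b, a, e = adapt_incremental([0.0, 0.0], 0.0, P, T, lr=0.1)
print([round(x, 4) for x in a])  # [0.0, 2.0, 6.0, 5.8]
print([round(x, 4) for x in e])  # [4.0, 3.0, 1.0, 1.2]
```

Only the first output matches the zero-learning-rate run. Each later output is computed with the weights left by the previous update.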

The first output is the same as it was with zero learning rate, because no update is made until the first input has been presented and its error computed. The second output is different, because the weights have been updated. The weights continue to be modified as each error is computed. If the network is capable of representing the target function and the learning rate is set appropriately, the error is eventually driven to zero.
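The claim that the error is eventually driven to zero can be illustrated by repeating incremental passes many times. The following Python sketch assumes the Widrow-Hoff update dW = lr*e*p', db = lr*e after every input. `lms_pass` is a hypothetical helper. Two conditions make convergence possible here. First, the targets were generated by an exactly linear function, so the network is capable of fitting them. Second, for these inputs, 0.1 times the squared length of each input vector (with a 1 appended for the bias) stays below 2, which keeps each incremental step stable.

```python
def lms_pass(w, b, inputs, targets, lr):
    """One incremental Widrow-Hoff pass; returns the new weights, bias and errors."""
    errors = []
    for p, t in zip(inputs, targets):
        e = t - (sum(wi * pi for wi, pi in zip(w, p)) + b)
        errors.append(e)
        w = [wi + lr * e * pi for wi, pi in zip(w, p)]  # update after every input
        b = b + lr * e
    return w, b, errors


P = [(1, 2), (2, 1), (2, 3), (3, 1)]
T = [4, 5, 7, 7]  # generated by t = 2*p_1 + p_2, so an exact solution exists

w, b = [0.0, 0.0], 0.0
max_abs_error = []  # largest |error| seen in each pass
for _ in range(10000):
    w, b, errors = lms_pass(w, b, P, T, lr=0.1)
    max_abs_error.append(max(abs(x) for x in errors))

print(max_abs_error[0], max_abs_error[-1])
print(w, b)  # close to [2, 1] and 0
```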

Incremental Training with Dynamic Networks

You can also train dynamic networks incrementally. In fact, this is the most common situation.

To train the network incrementally, present the inputs and targets as elements of cell arrays. Here are the initial input Pi and the inputs P and targets T as elements of cell arrays.

Pi = {1};
P = {2 3 4};
T = {3 5 7};

Take the linear network with one delay at the input, as used in a previous example. Initialize the weights to zero and set the learning rate to 0.1.

net = linearlayer([0 1],0.1);
net = configure(net,P,T);
net.IW{1,1} = [0 0];
net.biasConnect = 0;

You want to train the network to create the current output by summing the current and the previous inputs. This is the same input sequence you used in the previous example with the exception that you assign the first term in the sequence as the initial condition for the delay. You can now sequentially train the network using adapt.

[net,a,e,pf] = adapt(net,P,T,Pi);
a = [0] [2.4] [7.98]
e = [3] [2.6] [-0.98]

The first output is zero, because the weights have not yet been updated. The weights change at each subsequent time step.
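The following Python sketch reproduces these numbers under the assumptions stated in the text. It uses input delays 0 and 1 and no bias (matching net.biasConnect = 0), takes Pi = 1 as the initial content of the delay, and applies the Widrow-Hoff update at every time step. `adapt_delay_incremental` is a hypothetical name, not a toolbox function.

```python
def adapt_delay_incremental(w, inputs, targets, p_init, lr):
    """Incremental Widrow-Hoff for a linear neuron with input delays 0 and 1 and no bias."""
    w = list(w)
    prev = p_init  # contents of the delay line (the initial condition Pi)
    outputs, errors = [], []
    for p, t in zip(inputs, targets):
        x = (p, prev)  # [current input; previous input]
        a = w[0] * x[0] + w[1] * x[1]
        e = t - a
        outputs.append(a)
        errors.append(e)
        # Update the weights at every time step.
        w = [w[0] + lr * e * x[0], w[1] + lr * e * x[1]]
        prev = p  # shift the delay line
    return w, outputs, errors


w, a, e = adapt_delay_incremental([0.0, 0.0], [2, 3, 4], [3, 5, 7], p_init=1, lr=0.1)
print([round(v, 4) for v in a])  # [0.0, 2.4, 7.98]
print([round(v, 4) for v in e])  # [3.0, 2.6, -0.98]
```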

Batch Training

Batch training, in which weights and biases are only updated after all the inputs and targets are presented, can be applied to both static and dynamic networks. Both types of networks are discussed in this section.

Batch Training with Static Networks

Batch training can be done using either adapt or train, although train is generally the best option, because it typically has access to more efficient training algorithms. Incremental training is usually done with adapt; batch training is usually done with train.

For batch training of a static network with adapt, the input vectors must be placed in one matrix of concurrent vectors.

P = [1 2 2 3; 2 1 3 1];
T = [4 5 7 7];

Begin with the static network used in previous examples. The learning rate is set to 0.01.

net = linearlayer(0,0.01);
net = configure(net,P,T);
net.IW{1,1} = [0 0];
net.b{1} = 0;

When you call adapt, it invokes trains (the default adaption function for the linear network) and learnwh (the default learning function for the weights and biases). trains uses Widrow-Hoff learning.
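In equation form, a batch Widrow-Hoff step sums the per-input changes before applying them. The following is a sketch consistent with the numbers in this example, not a statement of the learnwh source. Here $\alpha$ is the learning rate and $Q = 4$ is the number of concurrent input vectors:

$\Delta W = \alpha \sum_{q=1}^{Q} e_{q}\, p_{q}^{T}, \qquad \Delta b = \alpha \sum_{q=1}^{Q} e_{q}$

With zero initial weights and bias, every error equals its target ($e_{q} = t_{q}$), so with $\alpha = 0.01$:

$\Delta W = 0.01\,\left(4\,[1\;\;2] + 5\,[2\;\;1] + 7\,[2\;\;3] + 7\,[3\;\;1]\right) = 0.01\,[49\;\;41] = [0.49\;\;0.41]$

$\Delta b = 0.01\,(4 + 5 + 7 + 7) = 0.23$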

[net,a,e,pf] = adapt(net,P,T);
a = 0 0 0 0
e = 4 5 7 7

Note that the outputs of the network are all zero, because the weights are not updated until all the training set has been presented. If you display the weights, you find

net.IW{1,1}
ans =
    0.4900    0.4100
net.b{1}
ans =
    0.2300

This is different from the result after one pass of adapt with incremental updating.
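The difference between the two schedules can be seen in this Python sketch. It applies one batch pass and one incremental pass of the Widrow-Hoff rule at the same learning rate, 0.01, assuming the update dW = lr*e*p', db = lr*e. The function names are hypothetical.

```python
def widrow_hoff_batch(w, b, inputs, targets, lr):
    """One batch pass: compute every output with fixed weights, then apply the summed update."""
    outputs = [sum(wi * pi for wi, pi in zip(w, p)) + b for p in inputs]
    errors = [t - a for t, a in zip(targets, outputs)]
    dw = [lr * sum(e * p[i] for e, p in zip(errors, inputs)) for i in range(len(w))]
    db = lr * sum(errors)
    return [wi + d for wi, d in zip(w, dw)], b + db, outputs, errors


def widrow_hoff_incremental(w, b, inputs, targets, lr):
    """One incremental pass: update the weights and bias after every input."""
    for p, t in zip(inputs, targets):
        e = t - (sum(wi * pi for wi, pi in zip(w, p)) + b)
        w = [wi + lr * e * pi for wi, pi in zip(w, p)]
        b = b + lr * e
    return w, b


P = [(1, 2), (2, 1), (2, 3), (3, 1)]
T = [4, 5, 7, 7]

wb, bb, a, e = widrow_hoff_batch([0.0, 0.0], 0.0, P, T, lr=0.01)
w_inc, b_inc = widrow_hoff_incremental([0.0, 0.0], 0.0, P, T, lr=0.01)
print(a, e)  # outputs all 0; errors equal the targets
print([round(x, 4) for x in wb], round(bb, 4))  # [0.49, 0.41] 0.23
print([round(x, 4) for x in w_inc], round(b_inc, 4))  # a different result
```

The batch pass reproduces the weights 0.49, 0.41 and the bias 0.23 displayed above. The incremental pass updates four times and ends at different values.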

Now perform the same batch training using train. Because the Widrow-Hoff rule can be used in either incremental or batch mode, it can be invoked by either adapt or train. Some algorithms, such as Levenberg-Marquardt, can be used only in batch mode, so they can be invoked only by train.

For this case, the input vectors can be in a matrix of concurrent vectors or in a cell array of sequential vectors. Because the network is static and because train always operates in batch mode, train converts any cell array of sequential vectors to a matrix of concurrent vectors. Concurrent mode operation is used whenever possible because it has a more efficient implementation in MATLAB code:

P = [1 2 2 3; 2 1 3 1];
T = [4 5 7 7];

The network is set up in the same way.

net = linearlayer(0,0.01);
net = configure(net,P,T);
net.IW{1,1} = [0 0];
net.b{1} = 0;

Now you are ready to train the network. Train it for only one epoch, because you used only one pass of adapt. The default training function for the linear network is trainb, and the default learning function for the weights and biases is learnwh, so you should get the same results obtained using adapt in the previous example, where the default adaption function was trains.

net.trainParam.epochs = 1;
net = train(net,P,T);

If you display the weights after one epoch of training, you find

net.IW{1,1}
ans =
    0.4900    0.4100
net.b{1}
ans =
    0.2300

This is the same result as the batch mode training in adapt. With static networks, the adapt function can implement incremental or batch training, depending on the format of the input data. If the data is presented as a matrix of concurrent vectors, batch training occurs. If the data is presented as a sequence, incremental training occurs. This is not true for train, which always performs batch training, regardless of the format of the input.

Batch Training with Dynamic Networks

Training static networks is relatively straightforward. If you use train, the network is trained in batch mode and the inputs are converted to concurrent vectors (columns of a matrix), even if they are originally passed as a sequence (elements of a cell array). If you use adapt, the format of the input determines the method of training. If the inputs are passed as a sequence, the network is trained in incremental mode. If the inputs are passed as concurrent vectors, batch mode training is used.

With dynamic networks, batch mode training is typically done with train only, especially if only one training sequence exists. To illustrate this, consider again the linear network with a delay. Use a learning rate of 0.02 for the training. (When using a gradient descent algorithm, you typically use a smaller learning rate for batch mode training than incremental training, because all the individual gradients are summed before determining the step change to the weights.)
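The reasoning about learning rates can be made concrete with the static example from Batch Training with Static Networks. This Python sketch assumes the Widrow-Hoff batch update and uses a hypothetical helper, `sse_after_batch_step`. It takes one batch step from zero weights and reports the sum-squared error afterward. Because the four individual updates are summed, a rate that would be reasonable per input can overshoot in batch mode.

```python
def sse_after_batch_step(lr, inputs, targets):
    """Sum-squared error after one batch Widrow-Hoff step from zero weights and bias."""
    w, b = [0.0, 0.0], 0.0
    errors = [t - (sum(wi * pi for wi, pi in zip(w, p)) + b) for p, t in zip(inputs, targets)]
    # Batch mode: the individual updates are summed before the weights change.
    w = [wi + lr * sum(e * p[i] for e, p in zip(errors, inputs)) for i, wi in enumerate(w)]
    b = b + lr * sum(errors)
    return sum((t - (sum(wi * pi for wi, pi in zip(w, p)) + b)) ** 2
               for p, t in zip(inputs, targets))


P = [(1, 2), (2, 1), (2, 3), (3, 1)]
T = [4, 5, 7, 7]
initial_sse = sum(t ** 2 for t in T)  # all outputs are 0 before training
print(initial_sse)
for lr in (0.01, 0.1):
    print(lr, sse_after_batch_step(lr, P, T))
```

Starting from a sum-squared error of 139, one batch step with a learning rate of 0.01 reduces it, while a rate of 0.1 overshoots and increases it. The exact values depend on the data. The sketch shows only that the summed step grows with the number of inputs.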

net = linearlayer([0 1],0.02);
net.inputs{1}.size = 1;
net.layers{1}.dimensions = 1;
net.IW{1,1} = [0 0];
net.biasConnect = 0;
net.trainParam.epochs = 1;
Pi = {1};
P = {2 3 4};
T = {3 5 6};

You want to train the network with the same input sequence used for the incremental training earlier (the targets differ in the last element, which is 6 here rather than 7). This time, however, you want to update the weights only after all the inputs have been applied (batch mode). The network is simulated in sequential mode, because the input is a sequence, but the weights are updated in batch mode.

net = train(net,P,T,Pi);

The weights after one epoch of training are

net.IW{1,1}
ans =
    0.9000    0.6200

These are different weights than you would obtain using incremental training, where the weights would be updated three times during one pass through the training set. For batch training the weights are only updated once in each epoch.
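Here is a Python sketch of this comparison. It assumes the same tapped-delay structure (delays 0 and 1, no bias, Pi = 1) and the Widrow-Hoff update. The batch epoch computes all three errors with the initial weights and applies their summed update once. The incremental pass updates three times. The function names are hypothetical.

```python
def delayed_inputs(inputs, p_init):
    """Pair each input with the previous one: [p(t); p(t-1)] for a 0/1 tapped delay line."""
    return list(zip(inputs, [p_init] + list(inputs[:-1])))


def batch_epoch(w, inputs, targets, p_init, lr):
    """Simulate the whole sequence with fixed weights, then apply one summed update."""
    xs = delayed_inputs(inputs, p_init)
    errors = [t - (w[0] * x0 + w[1] * x1) for (x0, x1), t in zip(xs, targets)]
    return [w[k] + lr * sum(e * x[k] for e, x in zip(errors, xs)) for k in range(2)]


def incremental_pass(w, inputs, targets, p_init, lr):
    """Update the weights at each time step instead."""
    for (x0, x1), t in zip(delayed_inputs(inputs, p_init), targets):
        e = t - (w[0] * x0 + w[1] * x1)
        w = [w[0] + lr * e * x0, w[1] + lr * e * x1]
    return w


P, T, Pi = [2, 3, 4], [3, 5, 6], 1
w_batch = batch_epoch([0.0, 0.0], P, T, Pi, lr=0.02)
w_inc = incremental_pass([0.0, 0.0], P, T, Pi, lr=0.02)
print([round(v, 4) for v in w_batch])  # [0.9, 0.62]
print([round(v, 4) for v in w_inc])    # three updates, so a different result
```

The batch result, [0.9 0.62], matches the weights displayed above.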

Training Feedback

The showWindow parameter allows you to specify whether a training window is visible when you train. The training window appears by default. Two other parameters, showCommandLine and show, determine whether command-line output is generated and the number of epochs between command-line feedback during training. For instance, this code turns off the training window and gives you training status information every 35 epochs when the network is later trained with train:

net.trainParam.showWindow = false;
net.trainParam.showCommandLine = true;
net.trainParam.show = 35;
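To make the meaning of show concrete, here is a simplified, hypothetical Python model that lists the epochs at which a training loop would print status when it reports every show epochs. The parameter names mirror the trainParam fields, but this is not the toolbox API. The toolbox's exact reporting schedule (for example, whether it also reports the first and final epochs) may differ.

```python
def report_epochs(epochs, show, show_command_line):
    """Return the epochs at which this simplified loop would print a status line."""
    if not show_command_line:
        return []  # command-line feedback disabled
    return [epoch for epoch in range(1, epochs + 1) if epoch % show == 0]


print(report_epochs(100, show=35, show_command_line=True))   # [35, 70]
print(report_epochs(100, show=35, show_command_line=False))  # []
```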

Sometimes it is convenient to disable all training displays. To do that, turn off both the training window and command-line feedback:

net.trainParam.showWindow = false;
net.trainParam.showCommandLine = false;

Unless showWindow is set to false, the training window appears automatically when you train.