High training error at the beginning of training a convolutional neural network

I'm training a convolutional neural network (CNN). At the very beginning of training I get an extremely high training error, which then decreases slowly. After approximately 500 epochs the training error comes close to zero (e.g., 0.006604). When I then measured the accuracy of the final model on the test data, I got about 89.50%. Is that normal? In particular, is it normal to get such a high training error at the very beginning of training? I'd also like to mention that every time I decrease the number of hidden nodes, the results at the end of training get better.
My CNN structure is:
config.forward_pass_scheme = {'conv_v', 'pool', 'conv_v', 'pool', 'conv_v', 'pool', 'conv_v','full', 'full', 'full', 'out'};
Here are some of my hyperparameters:
config.learning_rate = 0.01;
config.weight_range = 2;
config.decay = 0.0005;
config.normalize_init_weights = 1;
config.dropout_full_layer = 1;
config.optimization = 'adagrad';
Any help or suggestions would be highly appreciated. Thank you in advance.
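For readers unfamiliar with the 'adagrad' option above, here is a small Python/NumPy sketch of the standard Adagrad update, not the toolbox's own code. Treating `decay` as an L2 weight-decay coefficient is an ASSUMPTION about what config.decay means in the poster's toolbox; the function name `adagrad_step` and the toy problem are hypothetical.

```python
import numpy as np

def adagrad_step(w, grad, accum, learning_rate=0.01, decay=0.0005, eps=1e-8):
    """One Adagrad update. `decay` is treated as an L2 weight-decay coefficient,
    which is an ASSUMPTION about what config.decay means in the poster's toolbox."""
    g = grad + decay * w                                # gradient plus L2 weight-decay term
    accum = accum + g ** 2                              # per-weight running sum of squared gradients
    w = w - learning_rate * g / (np.sqrt(accum) + eps)  # per-weight adaptive step
    return w, accum

# Toy problem: minimize 0.5*||w - target||^2 starting from w = 0.
target = np.ones(3)
w, accum = np.zeros(3), np.zeros(3)
losses = []
for epoch in range(500):
    grad = w - target
    losses.append(0.5 * np.sum(grad ** 2))
    w, accum = adagrad_step(w, grad, accum, decay=0.0)
print(losses[0], losses[-1])
```

Because the accumulated sum of squared gradients only grows, the effective step size shrinks over time, so the loss in this toy run falls steadily but ever more slowly.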

Answers (1)

Greg Heath on 28 Apr 2017
>during the training process especially at the beginning of my training I get extremely high training error after that this error starts go down slowly. After approximately 500 Epochs the training error comes near to zero (e.g. 0.006604). Then, I took the final obtained model to measure its accuracy against the testing data, I've got about 89.50%. Does that normal?
That is not unusual.
>I mean getting a high training error rate at the very beginning of my training process.
Yes, it's not unusual.
>Another thing, I'd like to mention is that I've noticed that every time i decrease the number of the hidden nodes the results become better at the end of my training.
This also is not unusual. It often occurs when an overfit net (i.e., one with H > Hub, and especially H >> Hub, as defined below) is overtrained.
Assume
[ I N ] = size(input)   % "I"nput matrix
[ O N ] = size(target)  % "O"utput target matrix
[ O N ] = size(output)  % "O"utput matrix
Ntrn = 0.7*N            % Default number of training examples
Ntrneq = Ntrn*O         % Number of training equations
H = numberofhiddennodes
[ H I ] = size(IW)      % IW = input weight matrix
[ H 1 ] = size(b1)      % b1 = input bias vector
[ O H ] = size(LW)      % LW = layer weight matrix
[ O 1 ] = size(b2)      % b2 = output bias vector
Then, the number of unknown weights is
Nw = (I+1)*H + (H+1)*O
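As a quick sanity check of this count, the Python/NumPy sketch below builds zero matrices with the standard shapes of a single-hidden-layer net (H x I, H x 1, O x H, O x 1) and adds up their elements. The sizes are hypothetical, and the formula applies to this shallow net, not to the poster's multi-layer CNN.

```python
import numpy as np

def count_weights(I, H, O):
    """Count the unknowns of a net with one hidden layer."""
    IW = np.zeros((H, I))  # input weight matrix
    b1 = np.zeros((H, 1))  # input (hidden-layer) bias vector
    LW = np.zeros((O, H))  # layer weight matrix
    b2 = np.zeros((O, 1))  # output bias vector
    return IW.size + b1.size + LW.size + b2.size

# Hypothetical sizes: 4 inputs, 10 hidden nodes, 3 outputs.
I, H, O = 4, 10, 3
Nw = count_weights(I, H, O)
assert Nw == (I + 1) * H + (H + 1) * O   # 83 for these sizes
```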
The number of unknowns exceeds the number of equations when
Nw > Ntrneq
or
H > Hub
where the upper bound is
Hub = (Ntrneq-O)/(I+O+1)
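The bound follows from rearranging (I+1)*H + (H+1)*O > Ntrneq into H*(I+O+1) > Ntrneq - O. A minimal Python sketch that computes Hub and checks that the two conditions agree is shown here. Rounding Ntrn to a whole number of examples is our own choice, and the data-set sizes (N = 150, I = 4, O = 3) are hypothetical.

```python
def num_weights(I, H, O):
    """Nw = (I+1)*H + (H+1)*O for a single-hidden-layer net."""
    return (I + 1) * H + (H + 1) * O

def training_equations(N, O, trn_ratio=0.7):
    """Ntrneq = Ntrn*O; Ntrn is rounded to a whole number of examples (our choice)."""
    return round(trn_ratio * N) * O

def hidden_upper_bound(N, I, O, trn_ratio=0.7):
    """Hub = (Ntrneq - O)/(I + O + 1): above this H, unknowns outnumber equations."""
    return (training_equations(N, O, trn_ratio) - O) / (I + O + 1)

# Hypothetical data set: N = 150 samples, I = 4 inputs, O = 3 outputs.
N, I, O = 150, 4, 3
Hub = hidden_upper_bound(N, I, O)        # (315 - 3) / 8 = 39.0
for H in (10, 39, 40, 60):
    overfit = num_weights(I, H, O) > training_equations(N, O)
    print(H, overfit, H > Hub)           # the two tests agree
```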
When H > Hub, there are two common ways to mitigate overtraining.
1. STOPPED TRAINING: Stop training when the error on a validation subset
increases for a specified number of epochs. The default in the NN Toolbox is
a. Automatic trn/val/tst subset data division in the ratios 0.7/0.15/0.15
b. A limit of 6 consecutive epochs in which the validation-subset error increases.
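Both defaults can be sketched in a few lines of Python/NumPy. This is not the toolbox's code: `divide_rand` and `early_stop` are hypothetical names, and counting "epochs without improving on the best validation error so far" is one common reading of the 6-epoch rule. In a real run the validation errors would come from evaluating the net after each epoch; here they are a hard-coded list.

```python
import numpy as np

def divide_rand(N, trn=0.7, val=0.15, seed=0):
    """Randomly split indices 0..N-1 into train/val/test; test takes the remainder."""
    idx = np.random.default_rng(seed).permutation(N)
    n_trn, n_val = round(trn * N), round(val * N)
    return idx[:n_trn], idx[n_trn:n_trn + n_val], idx[n_trn + n_val:]

def early_stop(val_errors, max_fail=6):
    """Scan per-epoch validation errors; stop after max_fail consecutive epochs
    without improving on the best so far. Returns (best_epoch, stop_epoch)."""
    best, best_epoch, fails = float("inf"), 0, 0
    for epoch, err in enumerate(val_errors):
        if err < best:
            best, best_epoch, fails = err, epoch, 0
        else:
            fails += 1
            if fails >= max_fail:
                return best_epoch, epoch  # stop here; keep the weights from best_epoch
    return best_epoch, len(val_errors) - 1

trn, val, tst = divide_rand(100)            # 70 / 15 / 15 indices
print(early_stop([5, 4, 3, 2, 2.5, 2.6, 2.7, 2.8, 2.9, 3.0, 1.0]))  # (3, 9)
```

Note that training stops at epoch 9, so the later dip to 1.0 is never seen; the rule trades that possibility for protection against overtraining.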
2. REGULARIZATION: Using a trn/val/tst subset data division in the ratio
0.85/0.0/0.15 with the training function TRAINBR that uses a "regularized"
error function consisting of a weighted sum of the sum-squared error and
the sum of squared weights.
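The regularized error function can be written as beta*SSE + alpha*SSW. The Python/NumPy sketch below is illustrative only: TRAINBR itself re-estimates alpha and beta during training (Bayesian regularization), whereas here they are held fixed, and `ridge_fit` uses a hypothetical linear model so the minimizer has a closed form. It shows the effect that matters for overfitting: a larger weight penalty alpha gives smaller weights.

```python
import numpy as np

def regularized_error(errors, weights, alpha, beta):
    """beta * (sum of squared errors) + alpha * (sum of squared weights)."""
    return beta * np.sum(np.square(errors)) + alpha * np.sum(np.square(weights))

def ridge_fit(X, t, alpha, beta=1.0):
    """Closed-form minimizer of beta*||X @ w - t||^2 + alpha*||w||^2 for a LINEAR model."""
    A = beta * X.T @ X + alpha * np.eye(X.shape[1])
    return np.linalg.solve(A, beta * X.T @ t)

rng = np.random.default_rng(0)
X, t = rng.normal(size=(20, 5)), rng.normal(size=20)
for alpha in (0.01, 1.0, 100.0):
    w = ridge_fit(X, t, alpha)
    print(alpha, np.linalg.norm(w), regularized_error(X @ w - t, w, alpha, 1.0))
```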
3. Other useful search terms in the NEWSGROUP, ANSWERS and the internet are
a. OVERFITTING
b. OVERTRAINING
Hope this helps.
Thank you for formally accepting my answer
Greg
