Gradient in Levenberg Marquardt algorithm
I have trained a 5-neuron network on my dataset and extracted the weights and biases of this network. Then I make a second network with 10 neurons and set the previous weights and biases as follows:
net_d3.IW{1,1} = [IW;IW];
net_d3.b{1} = [b1;b1];
net_d3.LW{2,1} =(1/2)* [LW LW];
net_d3.b{2} = b2;
I set the entire data set as the training set and fix the Levenberg-Marquardt damping parameter mu to 1e-1. Then I train net2 for one more epoch and net_d3 for one epoch.
In principle net_d3 should behave exactly like net2 (the first network), but it does not: from the very first epoch the gradient and performance of net_d3 differ from those of net2. Is there any explanation for this behaviour?
Also, if we set N1 > 100, net_d3.IW{1,1}, which should consist of two identical blocks of weights, starts to drift apart. For instance, element (1,1) should remain equal to element (6,1) of net_d3.IW{1,1}. Why?
P.S.: X_tr and Y_tr are 2-D inputs and 1-D outputs (it is a regression problem).
rng(14)

% Train the reference 5-neuron network
net2 = feedforwardnet(5);
net2 = configure(net2, X_tr, Y_tr);
net2.trainParam.epochs = 61;
net2.divideFcn = 'dividetrain';   % use the entire data set for training
net2.trainParam.mu = 1e-1;        % rng(1,9,14) produce good results
net2 = train(net2, X_tr, Y_tr);
net2.trainParam.epochs = 1;

N1 = 1;  % number of epochs both networks are trained before checking the weights

% Extract the trained weights and biases
IW = net2.IW{1,1};
b1 = net2.b{1};
LW = net2.LW{2,1};
b2 = net2.b{2};

% Build a 10-neuron network that duplicates each hidden neuron of net2
net_d3 = feedforwardnet(10);
net_d3 = configure(net_d3, X_tr, Y_tr);
net_d3.trainParam.epochs = 1;
net_d3.divideFcn = 'dividetrain';
net_d3.trainParam.mu = 1e-1;
net_d3.IW{1,1} = [IW; IW];          % stack the input weights twice
net_d3.b{1}    = [b1; b1];          % duplicate the hidden biases
net_d3.LW{2,1} = (1/2)*[LW LW];     % halve the output weights so the sums match
net_d3.b{2}    = b2;
net_d3.trainParam.showWindow = false;

for i = 1:N1
    net2 = train(net2, X_tr, Y_tr);
    err_org(i) = mean((net2(X_tr) - Y_tr).^2);
    net_d3 = train(net_d3, X_tr, Y_tr);
    err_ens(i) = mean((net_d3(X_tr) - Y_tr).^2);
end
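As a sanity check, one can verify that the duplicated construction really does reproduce net2's output before any further training. This is a hedged sketch reusing the variable names from the script above:

```matlab
% Sketch: confirm that net_d3, as constructed above, computes the same
% function as net2 before either network is trained further.
% Assumes net2, net_d3, and X_tr already exist as in the script above.
y2 = net2(X_tr);      % output of the 5-neuron network
y3 = net_d3(X_tr);    % output of the duplicated 10-neuron network

% Each hidden activation appears twice in net_d3 and the output weights
% are halved, so the weighted sums -- and hence the outputs -- should
% agree up to floating-point round-off.
fprintf('max |y2 - y3| = %g\n', max(abs(y2 - y3)));
```

If this difference is at machine-precision level, the divergence seen after training comes from the optimizer steps, not from the initialization.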
Answers (1)
Jayanti
on 8 Jul 2025
Hi Parham,
Even though you initialize the 10-neuron network "net_d3" so that it initially computes the same function as the 5-neuron network "net2", Levenberg-Marquardt does not update each weight with a fixed learning rate. Each step is computed from the Jacobian of the whole network, and both the size and the structure of that Jacobian change with the number of neurons, so the step taken in the 10-neuron parameter space is not equivalent to the step taken in the 5-neuron space. That is why the gradient and performance diverge from the very first epoch.
There is a second effect for the duplicated weights: identical hidden neurons produce identical columns in the Jacobian, which makes the Gauss-Newton matrix J'*J rank-deficient. The damping term mu keeps the linear system solvable, but floating-point round-off in the ill-conditioned solve is enough to break the exact symmetry between the duplicated blocks, and once broken, the asymmetry grows over further epochs.
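This can be illustrated with a small, self-contained sketch of one LM step, dw = (J'*J + mu*I) \ (J'*e). The variables J, e, and mu here are illustrative stand-ins, not taken from your networks:

```matlab
% Illustrative sketch of one Levenberg-Marquardt step.
% A Jacobian with duplicated columns (as produced by duplicated neurons)
% makes J'*J rank-deficient; only the damping term mu*I keeps it invertible.
rng(0);
Jhalf = randn(50, 3);     % Jacobian of a hypothetical 3-parameter model
J     = [Jhalf, Jhalf];   % duplicate the columns: 6 parameters, rank 3
e     = randn(50, 1);     % residual vector
mu    = 1e-1;             % damping parameter (trainParam.mu)

fprintf('rank(J''*J) = %d of %d\n', rank(J'*J), size(J, 2));

dw = (J'*J + mu*eye(6)) \ (J'*e);   % LM update

% In exact arithmetic dw(1:3) == dw(4:6); in floating point the solve
% leaves a small asymmetry, which subsequent training steps can amplify.
fprintf('max |dw(1:3) - dw(4:6)| = %g\n', max(abs(dw(1:3) - dw(4:6))));
```

In the real networks the Jacobian also changes at every step as the weights move, so any tiny asymmetry feeds back into later updates.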