Gradient in Levenberg Marquardt algorithm
I have trained a 5-neuron network on my dataset and extracted the weights and biases of this network. Then I make a second network with 10 neurons and set the previous weights and biases as follows:
net_d3.IW{1,1} = [IW;IW];
net_d3.b{1} = [b1;b1];
net_d3.LW{2,1} =(1/2)* [LW LW];
net_d3.b{2} = b2;
I set the entire data set as the training set and fix the Levenberg-Marquardt damping parameter mu to 1e-1. Then I train net2 for one more epoch and net_d3 for one epoch.
In principle net_d3 should behave exactly like net2 (the first network), but it does not: from the very first epoch the gradient and performance of net_d3 differ from those of net2. Is there any explanation for this behaviour?
Also, if we set N1 > 100, net_d3.IW{1,1}, which should consist of two identical blocks of weights, starts to drift apart. For instance, element (1,1) should remain equal to element (6,1) of net_d3.IW{1,1}. Why?
P.S.: X_tr and Y_tr are 2-D inputs and 1-D outputs (it is a regression problem).
rng(14)

% Train the reference 5-neuron network
net2 = feedforwardnet(5);
net2 = configure(net2, X_tr, Y_tr);
net2.trainParam.epochs = 61;
net2.divideFcn = 'dividetrain';   % use the entire data set for training
net2.trainParam.mu = 1e-1;        % rng(1,9,14) produce good results
net2 = train(net2, X_tr, Y_tr);
net2.trainParam.epochs = 1;

N1 = 1;  % number of epochs both networks are trained before checking the weights

% Extract the trained weights and biases
IW = net2.IW{1,1};
b1 = net2.b{1};
LW = net2.LW{2,1};
b2 = net2.b{2};

% Build a 10-neuron network that duplicates each hidden neuron of net2
net_d3 = feedforwardnet(10);
net_d3 = configure(net_d3, X_tr, Y_tr);
net_d3.trainParam.epochs = 1;
net_d3.divideFcn = 'dividetrain';
net_d3.trainParam.mu = 1e-1;
net_d3.IW{1,1} = [IW; IW];          % stack the input weights twice
net_d3.b{1}    = [b1; b1];          % duplicate the hidden biases
net_d3.LW{2,1} = (1/2)*[LW LW];     % halve the output weights so the sums match
net_d3.b{2}    = b2;
net_d3.trainParam.showWindow = false;

for i = 1:N1
    net2 = train(net2, X_tr, Y_tr);
    err_org(i) = mean((net2(X_tr) - Y_tr).^2);
    net_d3 = train(net_d3, X_tr, Y_tr);
    err_ens(i) = mean((net_d3(X_tr) - Y_tr).^2);
end
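As a sanity check, one can verify that the duplicated construction really does reproduce net2's output before any further training. This is a hedged sketch reusing the variable names from the script above:

```matlab
% Sketch: confirm that net_d3, as constructed above, computes the same
% function as net2 before either network is trained further.
% Assumes net2, net_d3, and X_tr already exist as in the script above.
y2 = net2(X_tr);      % output of the 5-neuron network
y3 = net_d3(X_tr);    % output of the duplicated 10-neuron network

% Each hidden activation appears twice in net_d3 and the output weights
% are halved, so the weighted sums -- and hence the outputs -- should
% agree up to floating-point round-off.
fprintf('max |y2 - y3| = %g\n', max(abs(y2 - y3)));
```

If this difference is at machine-precision level, the divergence seen after training comes from the optimizer steps, not from the initialization.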
Answers (1)
Jayanti
on 8 Jul 2025
Hi Parham,
Even though you initialize the 10-neuron network "net_d3" so that it initially computes the same function as the 5-neuron network "net2", Levenberg-Marquardt does not update each weight with a fixed learning rate. Each step is computed from the Jacobian of the whole network, and both the size and the structure of that Jacobian change with the number of neurons, so the step taken in the 10-neuron parameter space is not equivalent to the step taken in the 5-neuron space. That is why the gradient and performance diverge from the very first epoch.
There is a second effect for the duplicated weights: identical hidden neurons produce identical columns in the Jacobian, which makes the Gauss-Newton matrix J'*J rank-deficient. The damping term mu keeps the linear system solvable, but floating-point round-off in the ill-conditioned solve is enough to break the exact symmetry between the duplicated blocks, and once broken, the asymmetry grows over further epochs.
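This can be illustrated with a small, self-contained sketch of one LM step, dw = (J'*J + mu*I) \ (J'*e). The variables J, e, and mu here are illustrative stand-ins, not taken from your networks:

```matlab
% Illustrative sketch of one Levenberg-Marquardt step.
% A Jacobian with duplicated columns (as produced by duplicated neurons)
% makes J'*J rank-deficient; only the damping term mu*I keeps it invertible.
rng(0);
Jhalf = randn(50, 3);     % Jacobian of a hypothetical 3-parameter model
J     = [Jhalf, Jhalf];   % duplicate the columns: 6 parameters, rank 3
e     = randn(50, 1);     % residual vector
mu    = 1e-1;             % damping parameter (trainParam.mu)

fprintf('rank(J''*J) = %d of %d\n', rank(J'*J), size(J, 2));

dw = (J'*J + mu*eye(6)) \ (J'*e);   % LM update

% In exact arithmetic dw(1:3) == dw(4:6); in floating point the solve
% leaves a small asymmetry, which subsequent training steps can amplify.
fprintf('max |dw(1:3) - dw(4:6)| = %g\n', max(abs(dw(1:3) - dw(4:6))));
```

In the real networks the Jacobian also changes at every step as the weights move, so any tiny asymmetry feeds back into later updates.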