
freezing layers of actor and critic of RL agent

Sourabh on 30 Jan 2024
After training, I froze every layer of the actor and critic networks of my RL agent (using setLearnRateFactor(neuralnet,'layers','parameters',0)), and then I retrained the agent in the same environment. I am getting rewards as shown in the attached image.
My question is: is it normal to get rewards like this? (I mean, shouldn't there be no variation, or only very little variation, in the rewards?)
My reward function is 10 - e^2, where e is the error.
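For reference, a minimal sketch of one way to freeze every learnable parameter as described above, assuming the agent exposes its actor and critic through getActor/getCritic, that the underlying models are dlnetwork objects, and that 'agent' and the helper freezeAllLearnables are placeholder names (not from the original post):

% Sketch: set the learn-rate factor of every learnable parameter to zero.
% 'agent' is assumed to be a Reinforcement Learning Toolbox agent whose
% actor and critic wrap dlnetwork objects.
actor  = getActor(agent);
critic = getCritic(agent);

actor  = setModel(actor,  freezeAllLearnables(getModel(actor)));
critic = setModel(critic, freezeAllLearnables(getModel(critic)));

agent = setActor(agent, actor);
agent = setCritic(agent, critic);

function net = freezeAllLearnables(net)
    % Walk the Learnables table of the dlnetwork and zero each factor.
    for k = 1:height(net.Learnables)
        net = setLearnRateFactor(net, ...
            net.Learnables.Layer(k), net.Learnables.Parameter(k), 0);
    end
end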

Answers (1)

Karanjot on 30 Jan 2024
Edited: Karanjot on 30 Jan 2024
Observing fluctuations in rewards is common when retraining a reinforcement learning (RL) agent, even when the parameters of both the actor and critic networks are frozen. The agent still interacts with and explores the given environment, and the reward function you specified largely determines the rewards you observe.
The variation in rewards can be influenced by several factors, such as the exploration-exploitation trade-off, the complexity of the environment, and the learning rate of the agent. The agent may still encounter different states or actions that result in varying rewards.
The environment's inherent stochasticity can lead to different state transitions and rewards for similar actions. Additionally, if there are other unfrozen parameters or noise processes involved in action selection, they can contribute to the observed variation.
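To rule out leftover trainable parameters, a small check like the following sketch can list any learn-rate factors that are still non-zero. It assumes actorNet is the dlnetwork obtained via getModel(getActor(agent)); the same loop applies to the critic network:

% Sketch: report learnable parameters whose learn-rate factor is not zero.
% 'actorNet' is assumed to be the dlnetwork behind the agent's actor.
for k = 1:height(actorNet.Learnables)
    layerName = actorNet.Learnables.Layer(k);
    paramName = actorNet.Learnables.Parameter(k);
    factor    = getLearnRateFactor(actorNet, layerName, paramName);
    if factor ~= 0
        fprintf('Still trainable: %s / %s (factor %g)\n', layerName, paramName, factor);
    end
end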
You may consider the following steps:
  1. Plot the rewards over time during the retraining process to observe the trend (see the plotting sketch at the end of this answer). This can help you understand whether the rewards are converging or not.
  2. Experiment with different learning rates for the agent. A higher learning rate may lead to faster convergence but could also result in more variation initially.
  3. You can also try modifying the reward function to see if it reduces the variation in rewards.
Keep in mind that RL training is inherently iterative, and achieving an optimal policy often requires multiple iterations. Some degree of reward variation is to be expected, but it may also indicate a need for further investigation or adjustments to your training setup.
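As an illustration of step 1, a minimal plotting sketch, assuming trainStats is the statistics output returned by train, and that agent, env, and trainOpts (an rlTrainingOptions object) are placeholders for your own setup:

% Sketch: plot per-episode reward and the running average reward
% from the statistics returned by train.
trainStats = train(agent, env, trainOpts);

figure
plot(trainStats.EpisodeIndex, trainStats.EpisodeReward, '.')
hold on
plot(trainStats.EpisodeIndex, trainStats.AverageReward, '-', 'LineWidth', 1.5)
hold off
xlabel('Episode')
ylabel('Reward')
legend('Episode reward', 'Average reward')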
