% Observation: status of the 11 transmission lines, each bounded in [0, 1].
ObservationInfo = rlNumericSpec([1 11]);
ObservationInfo.Name = 'Line State';
ObservationInfo.Description = 'line1, line2, line3, line4, line5, line6, line7, line8, line9, line10, line11';
ObservationInfo.LowerLimit=0;
ObservationInfo.UpperLimit=1;
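
% Action: index (1-11) of the transmission line the attacker chooses to attack.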
ActionInfo = rlFiniteSetSpec([1 2 3 4 5 6 7 8 9 10 11]);
ActionInfo.Name = 'Attacker Action';
ActionInfo.Description = ['attack-line1, attack-line2, attack-line3, attack-line4, ' ...
'attack-line5, attack-line6, attack-line7, attack-line8, attack-line9, attack-line10, attack-line11'];
env = rlFunctionEnv(ObservationInfo, ActionInfo,'WW6_StepFunction_genloss','WW6_ResetFunction');
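
% rlFunctionEnv expects the custom step and reset functions (defined in
% separate files, not shown in this listing) to follow these interfaces:
%   [InitialObservation,LoggedSignals] = WW6_ResetFunction()
%   [NextObs,Reward,IsDone,LoggedSignals] = WW6_StepFunction_genloss(Action,LoggedSignals)
% The step function's reward drives the training below; judging by the
% '_genloss' suffix it is based on generation loss, but its implementation is
% outside this listing.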
% Critic network: maps the 11-dimensional line state to one Q-value per
% discrete attack action.
dnn = [
    featureInputLayer(ObservationInfo.Dimension(2),'Normalization','none','Name','state')
    fullyConnectedLayer(120,'Name','CriticStateFC1')
    reluLayer('Name','CriticRelu1')
    fullyConnectedLayer(120,'Name','CriticStateFC2')
    reluLayer('Name','CriticCommonRelu')
    fullyConnectedLayer(length(ActionInfo.Elements),'Name','output')];
criticOpts = rlRepresentationOptions('LearnRate',0.001,'GradientThreshold',1);
critic = rlQValueRepresentation(dnn,ObservationInfo,ActionInfo,'Observation',{'state'},criticOpts);
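
% DQN agent hyperparameters: the target critic is softly updated every 4 steps
% (smoothing factor 0.1) and the replay buffer holds 1e5 transitions.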
agentOpts = rlDQNAgentOptions(...
    'NumStepsToLookAhead',1, ...
    'TargetSmoothFactor',1e-1, ...
    'TargetUpdateFrequency',4, ...
    'ExperienceBufferLength',100000, ...
    'DiscountFactor',0.7);
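
% Epsilon-greedy exploration: epsilon starts at 1 and decays each step
% (decay rate 0.005) towards a floor of 0.1.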
agentOpts.EpsilonGreedyExploration.Epsilon=1;
agentOpts.EpsilonGreedyExploration.EpsilonDecay=0.005;
agentOpts.EpsilonGreedyExploration.EpsilonMin=0.1;
agent = rlDQNAgent(critic,agentOpts);
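
% Optional sanity check (not part of the original workflow): query the
% untrained agent for an action given an example observation; the all-ones
% line state below is an assumption used only for illustration.
sampleObs = {ones(1,11)};
sampleAction = getAction(agent,sampleObs);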

% Training: episodes of at most 5 attack steps; stop once the average reward
% reaches 900.
trainOpts = rlTrainingOptions(...
'MaxStepsPerEpisode',5, ...
'Plots','training-progress',...
'StopTrainingCriteria','AverageReward',...
'StopTrainingValue',900);
trainOpts.ScoreAveragingWindowLength=20;
trainingStats = train(agent,env,trainOpts);
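
% One way to validate the trained policy (a sketch, not part of the original
% listing): simulate a short attack sequence with the trained agent and
% inspect the returned experience.
simOpts = rlSimulationOptions('MaxSteps',5);
experience = sim(env,agent,simOpts);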