Reinforcement learning action getting saturated at one range of values

Hi all,
I have a reinforcement learning environment with 4 observations and 6 actions. Each action has a lower limit of 0.05 and an upper limit of 1. During training, the actions get saturated in one band of values.
For example, the action limits are specified as 0.05 to 1, but the action output during training varies only in the range 0 to 0.16 and never leaves that band.
I have attached a capture of the action output during training.
The code is below:
clc;
clear;
close;
%Load the parameters for the simulink
SPWM_RL_Data;
%Open Simulink Model
mdl = "RL_Debug";
open_system(mdl);
%Create Environment Interface
open_system('RL_Debug/Firing Unit');
%Create Observation specifications
numObservations = 4;
observationInfo = rlNumericSpec([numObservations 1]);
observationInfo.Name = 'observations';
observationInfo.Description = 'Error signals';
%Create Action Specifications
numActions = 6;
actionInfo = rlNumericSpec([numActions 1],'LowerLimit',[0.05;0.05;0.05;0.05;0.05;0.05],'UpperLimit',[1;1;1;1;1;1]);
actionInfo.Name = 'switchingPulses';
%Create Simulink environment for observation and action specifications
agentblk = 'RL_Debug/Firing Unit/RL Agent';
env = rlSimulinkEnv(mdl,agentblk,observationInfo,actionInfo);
%Get observation and action info from the environment
% obtain observation and action specifications
obsInfo = getObservationInfo(env);
actInfo = getActionInfo(env);
rng(0) % fix the random seed
statePath = [featureInputLayer(numObservations,'Normalization','none','Name','State')
    fullyConnectedLayer(64,'Name','fc1')];
actionPath = [featureInputLayer(numActions,'Normalization','none','Name','Action')
    fullyConnectedLayer(64,'Name','fc2')];
commonPath = [additionLayer(2,'Name','add')
    reluLayer('Name','relu2')
    fullyConnectedLayer(32,'Name','fc3')
    reluLayer('Name','relu3')
    fullyConnectedLayer(16,'Name','fc4')
    fullyConnectedLayer(1,'Name','CriticOutput')];
criticNetwork = layerGraph();
criticNetwork = addLayers(criticNetwork,statePath);
criticNetwork = addLayers(criticNetwork,actionPath);
criticNetwork = addLayers(criticNetwork,commonPath);
criticNetwork = connectLayers(criticNetwork,'fc1','add/in1');
criticNetwork = connectLayers(criticNetwork,'fc2','add/in2');
%Create the two critic representations from the critic network
criticOptions = rlRepresentationOptions('LearnRate',1e-4,'GradientThreshold',1);
critic1 = rlQValueRepresentation(criticNetwork,observationInfo,actionInfo,...
'Observation',{'State'},'Action',{'Action'},criticOptions);
critic2 = rlQValueRepresentation(criticNetwork,observationInfo,actionInfo,...
'Observation',{'State'},'Action',{'Action'},criticOptions);
actorNetwork = [featureInputLayer(numObservations,'Normalization','none','Name','State')
    fullyConnectedLayer(64,'Name','actorFC1')
    reluLayer('Name','relu1')
    fullyConnectedLayer(32,'Name','actorFC2')
    reluLayer('Name','relu2')
    fullyConnectedLayer(numActions,'Name','Action')
    tanhLayer('Name','tanh1')
    scalingLayer('Name','scale','Scale',actionInfo.UpperLimit)];
actorOptions = rlRepresentationOptions('LearnRate',1e-3,'GradientThreshold',1,'L2RegularizationFactor',0.001);
actor = rlDeterministicActorRepresentation(actorNetwork,observationInfo,actionInfo,...
'Observation',{'State'},'Action',{'scale'},actorOptions);
%Ts_agent = Ts;
agentOptions = rlTD3AgentOptions("SampleTime",Ts_agent, ...
"DiscountFactor", 0.995, ...
"ExperienceBufferLength",2e6, ...
"MiniBatchSize",512, ...
"NumStepsToLookAhead",5, ...
"TargetSmoothFactor",0.005, ...
"TargetUpdateFrequency",2);
agentOptions.ExplorationModel.Variance = 0.05;
agentOptions.ExplorationModel.VarianceDecayRate = 2e-4;
agentOptions.ExplorationModel.VarianceMin = 0.001;
agentOptions.TargetPolicySmoothModel.Variance = 0.1;
agentOptions.TargetPolicySmoothModel.VarianceDecayRate = 1e-4;
agent = rlTD3Agent(actor,[critic1,critic2],agentOptions);
%T = 1.0;
maxepisodes = 10000;
maxsteps = ceil(Tf/Ts_agent);
trainingOpts = rlTrainingOptions(...
'MaxEpisodes',maxepisodes, ...
'MaxStepsPerEpisode',maxsteps, ...
'StopTrainingCriteria','AverageReward',...
'StopTrainingValue',8000,...
'ScoreAveragingWindowLength',100);
if doTraining
    trainStats = train(agent,env,trainingOpts);
    save("Agent.mat","agent")
else
    load("Agent.mat")
end
%Simulate the Agent
rng(0);
simOptions = rlSimulationOptions('MaxSteps', maxsteps, 'NumSimulations', 1);
sim(env,agent,simOptions);

Accepted Answer

Emmanouil Tzorakoleftherakis
Edited: Emmanouil Tzorakoleftherakis on 20 Jun 2023
Your scaling layer is not set up correctly. You want to scale the tanh output by (upper limit - lower limit)/2 and then shift it by (upper limit + lower limit)/2 so that it covers the full action range.
scalingLayer('Scale',(actionInfo.UpperLimit-actionInfo.LowerLimit)/2,'Bias',(actionInfo.UpperLimit+actionInfo.LowerLimit)/2)
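For reference, a minimal sketch of how the corrected actor output layers could look with the limits from the question (this illustration reuses the variable names from the question's script; actionScale and actionBias are helper names added here, not part of the original answer). With a lower limit of 0.05 and an upper limit of 1, Scale = 0.475 and Bias = 0.525, so the tanh output range [-1, 1] maps onto [0.05, 1]:
actionScale = (actionInfo.UpperLimit - actionInfo.LowerLimit)/2; % 0.475 for each of the 6 actions
actionBias  = (actionInfo.UpperLimit + actionInfo.LowerLimit)/2; % 0.525 for each of the 6 actions
actorNetwork = [featureInputLayer(numObservations,'Normalization','none','Name','State')
    fullyConnectedLayer(64,'Name','actorFC1')
    reluLayer('Name','relu1')
    fullyConnectedLayer(32,'Name','actorFC2')
    reluLayer('Name','relu2')
    fullyConnectedLayer(numActions,'Name','Action')
    tanhLayer('Name','tanh1')
    scalingLayer('Name','scale','Scale',actionScale,'Bias',actionBias)]; % output now spans [0.05, 1]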


Release

R2021a
