Reinforcement learning action getting saturated at one range of values

Hi all,
I have a reinforcement learning environment with 4 observations and 6 actions. Each action has a lower limit of 0.05 and an upper limit of 1. During training, the actions get saturated in one band of values.
For example, the action limits are specified as 0.05 to 1, but the action output during training varies only in the range 0 to 0.16 and never leaves that band.
I have attached a capture of the action output during training.
The code is below:
clc;
clear;
close;
%Load the parameters for the simulink
SPWM_RL_Data;
%Open Simulink Model
mdl = "RL_Debug";
open_system(mdl);
%Create Environment Interface
open_system('RL_Debug/Firing Unit');
%Create Observation specifications
numObservations = 4;
observationInfo = rlNumericSpec([numObservations 1]);
observationInfo.Name = 'observations';
observationInfo.Description = 'Error signals';
%Create Action Specifications
numActions = 6;
actionInfo = rlNumericSpec([numActions 1],'LowerLimit',[0.05;0.05;0.05;0.05;0.05;0.05],'UpperLimit',[1;1;1;1;1;1]);
actionInfo.Name = 'switchingPulses';
%Create Simulink environment for observation and action specifications
agentblk = 'RL_Debug/Firing Unit/RL Agent';
env = rlSimulinkEnv(mdl,agentblk,observationInfo,actionInfo);
%Get observation and action info from the environment
% obtain observation and action specifications
obsInfo = getObservationInfo(env);
actInfo = getActionInfo(env);
rng(0) % fix the random seed
statePath = [featureInputLayer(numObservations,'Normalization','none','Name','State')
    fullyConnectedLayer(64,'Name','fc1')];
actionPath = [featureInputLayer(numActions,'Normalization','none','Name','Action')
    fullyConnectedLayer(64,'Name','fc2')];
commonPath = [additionLayer(2,'Name','add')
    reluLayer('Name','relu2')
    fullyConnectedLayer(32,'Name','fc3')
    reluLayer('Name','relu3')
    fullyConnectedLayer(16,'Name','fc4')
    fullyConnectedLayer(1,'Name','CriticOutput')];
criticNetwork = layerGraph();
criticNetwork = addLayers(criticNetwork,statePath);
criticNetwork = addLayers(criticNetwork,actionPath);
criticNetwork = addLayers(criticNetwork,commonPath);
criticNetwork = connectLayers(criticNetwork,'fc1','add/in1');
criticNetwork = connectLayers(criticNetwork,'fc2','add/in2');
%Create the two critic representations from the critic network
criticOptions = rlRepresentationOptions('LearnRate',1e-4,'GradientThreshold',1);
critic1 = rlQValueRepresentation(criticNetwork,observationInfo,actionInfo,...
'Observation',{'State'},'Action',{'Action'},criticOptions);
critic2 = rlQValueRepresentation(criticNetwork,observationInfo,actionInfo,...
'Observation',{'State'},'Action',{'Action'},criticOptions);
actorNetwork = [featureInputLayer(numObservations,'Normalization','none','Name','State')
    fullyConnectedLayer(64,'Name','actorFC1')
    reluLayer('Name','relu1')
    fullyConnectedLayer(32,'Name','actorFC2')
    reluLayer('Name','relu2')
    fullyConnectedLayer(numActions,'Name','Action')
    tanhLayer('Name','tanh1')
    scalingLayer('Name','scale','Scale',actionInfo.UpperLimit)];
actorOptions = rlRepresentationOptions('LearnRate',1e-3,'GradientThreshold',1,'L2RegularizationFactor',0.001);
actor = rlDeterministicActorRepresentation(actorNetwork,observationInfo,actionInfo,...
'Observation',{'State'},'Action',{'scale'},actorOptions);
%Ts_agent = Ts;
agentOptions = rlTD3AgentOptions("SampleTime",Ts_agent, ...
"DiscountFactor", 0.995, ...
"ExperienceBufferLength",2e6, ...
"MiniBatchSize",512, ...
"NumStepsToLookAhead",5, ...
"TargetSmoothFactor",0.005, ...
"TargetUpdateFrequency",2);
agentOptions.ExplorationModel.Variance = 0.05;
agentOptions.ExplorationModel.VarianceDecayRate = 2e-4;
agentOptions.ExplorationModel.VarianceMin = 0.001;
agentOptions.TargetPolicySmoothModel.Variance = 0.1;
agentOptions.TargetPolicySmoothModel.VarianceDecayRate = 1e-4;
agent = rlTD3Agent(actor,[critic1,critic2],agentOptions);
%T = 1.0;
maxepisodes = 10000;
maxsteps = ceil(Tf/Ts_agent);
trainingOpts = rlTrainingOptions(...
'MaxEpisodes',maxepisodes, ...
'MaxStepsPerEpisode',maxsteps, ...
'StopTrainingCriteria','AverageReward',...
'StopTrainingValue',8000,...
'ScoreAveragingWindowLength',100);
if doTraining
    trainStats = train(agent,env,trainingOpts);
    save("Agent.mat","agent")
else
    load("Agent.mat")
end
%Simulate the Agent
rng(0);
simOptions = rlSimulationOptions('MaxSteps', maxsteps, 'NumSimulations', 1);
sim(env,agent,simOptions);

Accepted Answer

Emmanouil Tzorakoleftherakis
Edited: Emmanouil Tzorakoleftherakis on 20 Jun 2023
Your scaling layer is not set up correctly. You want to scale the tanh output by (upper limit - lower limit)/2 and then shift it by (upper limit + lower limit)/2 so that it covers the full action range.
scalingLayer('Scale',(actionInfo.UpperLimit-actionInfo.LowerLimit)/2,'Bias',(actionInfo.UpperLimit+actionInfo.LowerLimit)/2)
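For reference, a minimal sketch of how the corrected actor output layers could look with the limits from the question (this illustration reuses the variable names from the question's script; actionScale and actionBias are helper names added here, not part of the original answer). With a lower limit of 0.05 and an upper limit of 1, Scale = 0.475 and Bias = 0.525, so the tanh output range [-1, 1] maps onto [0.05, 1]:
actionScale = (actionInfo.UpperLimit - actionInfo.LowerLimit)/2; % 0.475 for each of the 6 actions
actionBias  = (actionInfo.UpperLimit + actionInfo.LowerLimit)/2; % 0.525 for each of the 6 actions
actorNetwork = [featureInputLayer(numObservations,'Normalization','none','Name','State')
    fullyConnectedLayer(64,'Name','actorFC1')
    reluLayer('Name','relu1')
    fullyConnectedLayer(32,'Name','actorFC2')
    reluLayer('Name','relu2')
    fullyConnectedLayer(numActions,'Name','Action')
    tanhLayer('Name','tanh1')
    scalingLayer('Name','scale','Scale',actionScale,'Bias',actionBias)]; % output now spans [0.05, 1]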


Release

R2021a
