Error:
Dot indexing is not supported for variables of this type.

Error in rl.util.expstruct2timeserstruct (line 7)
observation = {experiences.Observation};

Error in rl.env.AbstractEnv/sim (line 130)
s = rl.util.expstruct2timeserstruct(exp,time,oinfo,ainfo);
Code:
Define the observation specification obsInfo and action specification actInfo.
obsInfo = rlNumericSpec([3 1],...
LowerLimit=[-inf -inf 0 ]',...
UpperLimit=[ inf inf inf]');
obsInfo.Name = "observations";
obsInfo.Description = "integrated error, error, and measured height";
actInfo = rlNumericSpec([1 1]);
env = rlSimulinkEnv("draft","draft/RL Agent",...
    obsInfo,actInfo);
env.ResetFcn = @(in)localResetFcn(in);
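The sample time Ts and simulation duration Tf are used later (for example, in ceil(Tf/Ts)) but are not defined in the posted code. A minimal sketch, assuming values consistent with the 200-steps-per-episode statement in the training section:
% Assumed values, chosen so that ceil(Tf/Ts) = 200 steps per episode.
Ts = 1.0;   % agent sample time (s)
Tf = 200;   % simulation stop time (s)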
Create the Critic
DDPG agents use a parametrized Q-value function approximator to estimate the value of the policy. A Q-value function critic takes the current observation and an action as inputs and returns a single scalar as output (the estimated discounted cumulative long-term reward for taking the action from the state corresponding to the current observation, and following the policy thereafter).
To model the parametrized Q-value function within the critic, use a neural network with two input layers (one for the observation channel, as specified by obsInfo, and the other for the action channel, as specified by actInfo) and one output layer (which returns the scalar value).
Define each network path as an array of layer objects. Assign names to the input and output layers of each path. These names allow you to connect the paths and then later explicitly associate the network input and output layers with the appropriate environment channel. Obtain the dimension of the observation and action spaces from the obsInfo and actInfo specifications.
obsPath = [
    featureInputLayer(obsInfo.Dimension(1),Name="obsInLyr")
    fullyConnectedLayer(25,Name="obsPathOutLyr")
    ];
actPath = [
    featureInputLayer(actInfo.Dimension(1),Name="actInLyr")
    fullyConnectedLayer(25,Name="actPathOutLyr")
    ];
commonPath = [
    additionLayer(2,Name="add")
    fullyConnectedLayer(1,Name="QValue")
    ];
criticNetwork = layerGraph();
criticNetwork = addLayers(criticNetwork,obsPath);
criticNetwork = addLayers(criticNetwork,actPath);
criticNetwork = addLayers(criticNetwork,commonPath);
criticNetwork = connectLayers(criticNetwork, ...
"obsPathOutLyr","add/in1");
criticNetwork = connectLayers(criticNetwork, ...
"actPathOutLyr","add/in2");
criticNetwork = dlnetwork(criticNetwork);
summary(criticNetwork)
Initialized: true
Number of learnables: 1.5k
Inputs:
1 'obsInLyr' 3 features
2 'actInLyr' 1 features
critic = rlQValueFunction(criticNetwork, ...
    obsInfo,actInfo, ...
    ObservationInputNames="obsInLyr", ...
    ActionInputNames="actInLyr");
Check the critic with a random observation and a random action input.
getValue(critic, ...
    {rand(obsInfo.Dimension)}, ...
    {rand(actInfo.Dimension)})
ans = single
-0.1631
Create the Actor
DDPG agents use a parametrized deterministic policy over continuous action spaces, which is learned by a continuous deterministic actor.
A continuous deterministic actor implements a parametrized deterministic policy for a continuous action space. This actor takes the current observation as input and returns as output an action that is a deterministic function of the observation.
To model the parametrized policy within the actor, use a neural network with one input layer (which receives the content of the environment observation channel, as specified by obsInfo) and one output layer (which returns the action to the environment action channel, as specified by actInfo).
actorNetwork = [
    featureInputLayer(obsInfo.Dimension(1))
    fullyConnectedLayer(actInfo.Dimension(1))
    ];
actorNetwork = dlnetwork(actorNetwork);
summary(actorNetwork)
Initialized: true
Number of learnables: 16
Inputs:
1 'input' 3 features
actor = rlContinuousDeterministicActor(actorNetwork,obsInfo,actInfo);
getAction(actor,{rand(obsInfo.Dimension)})
ans = 1×1 cell array
{[-0.3408]}
Create the DDPG Agent
agent = rlDDPGAgent(actor,critic);
agent.AgentOptions.TargetSmoothFactor = 1e-3;
agent.AgentOptions.DiscountFactor = 1.0;
agent.AgentOptions.MiniBatchSize = 64;
agent.AgentOptions.ExperienceBufferLength = 1e6;
agent.AgentOptions.NoiseOptions.Variance = 0.3;
agent.AgentOptions.NoiseOptions.VarianceDecayRate = 1e-5;
agent.AgentOptions.CriticOptimizerOptions.LearnRate = 1e-03;
agent.AgentOptions.CriticOptimizerOptions.GradientThreshold = 1;
agent.AgentOptions.ActorOptimizerOptions.LearnRate = 1e-04;
agent.AgentOptions.ActorOptimizerOptions.GradientThreshold = 1;
getAction(agent,{rand(obsInfo.Dimension)})
ans = 1×1 cell array
{[-0.7926]}
Train Agent
To train the agent, first specify the training options. For this example, use the following options:
- Run each training for at most 5000 episodes. Specify that each episode lasts for at most ceil(Tf/Ts) (that is 200) time steps.
- Display the training progress in the Episode Manager dialog box (set the Plots option) and disable the command line display (set the Verbose option to false).
- Stop training when the agent receives an average cumulative reward greater than 800 over 20 consecutive episodes. At this point, the agent can control the level of water in the tank.
trainOpts = rlTrainingOptions(...
    MaxEpisodes=5000, ...
    MaxStepsPerEpisode=ceil(Tf/Ts), ...
    ScoreAveragingWindowLength=20, ...
    Verbose=false, ...
    Plots="training-progress",...
    StopTrainingCriteria="AverageReward",...
    StopTrainingValue=800);
Train the agent using the train function. Training is a computationally intensive process that takes several minutes to complete. To save time while running this example, load a pretrained agent by setting doTraining to false. To train the agent yourself, set doTraining to true (see the sketch below).
trainingStats = train(agent,env,trainOpts);
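A minimal sketch of the doTraining switch described above; the pretrained-agent file name is a placeholder, not part of the original post:
doTraining = false;   % set to true to train the agent yourself
if doTraining
    % Training is computationally intensive and can take several minutes.
    trainingStats = train(agent,env,trainOpts);
else
    % Load a previously saved agent (placeholder file name).
    load("pretrainedAgent.mat","agent")
end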
Validate Trained Agent
Validate the learned agent against the model by simulation. Since the reset function randomizes the reference values, fix the random generator seed to ensure simulation reproducibility.
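A minimal way to fix the seed before simulating (the seed value is arbitrary):
% Fix the random generator seed for reproducibility.
rng(1)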
Simulate the agent within the environment, and return the experiences as output.
simOpts = rlSimulationOptions(MaxSteps=ceil(Tf/Ts), StopOnError='on');
experiences = sim(env,agent,simOpts);
Dot indexing is not supported for variables of this type.
Error in rl.util.expstruct2timeserstruct (line 7)
observation = {experiences.Observation};
Error in rl.env.AbstractEnv/sim (line 130)
s = rl.util.expstruct2timeserstruct(exp,time,oinfo,ainfo);
Local Reset Function
function in = localResetFcn(in)
% Randomize the desired height reference (the range here is an assumed example).
h = 3*randn + 10;
blk = sprintf("draft/Desired \nDrone Height");
in = setBlockParameter(in, blk, Value=num2str(h));
end
Copyright 2019 - 2023 The MathWorks, Inc.