Design and Train Agent Using Reinforcement Learning Designer
This example shows how to design and train a DQN agent for an environment with a discrete action space using Reinforcement Learning Designer.
Open the Reinforcement Learning Designer App
Open the Reinforcement Learning Designer app.
reinforcementLearningDesigner
Initially, no agents or environments are loaded in the app.
Import Cart-Pole Environment
When using the Reinforcement Learning Designer app, you can import an environment from the MATLAB® workspace or create a predefined environment. For more information, see Create MATLAB Environments for Reinforcement Learning Designer and Create Simulink Environments for Reinforcement Learning Designer.
For this example, use the predefined discrete cart-pole MATLAB environment. To import this environment, on the Reinforcement Learning tab, in the Environments section, select New > Discrete Cart-Pole.
In the Environments pane, the app adds the imported Discrete CartPole environment. To rename the environment, click the environment text. You can also import multiple environments in the session.
To view the dimensions of the observation and action space, click the environment text. The app shows the dimensions in the Preview pane.
This environment has a continuous four-dimensional observation space (the positions and velocities of both the cart and pole) and a discrete one-dimensional action space consisting of two possible forces, –10 N or 10 N. This environment is used in the Train DQN Agent to Balance Cart-Pole System example. For more information on predefined control system environments, see Load Predefined Control System Environments.
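If you prefer to verify these specifications at the command line, you can create the same predefined environment in the MATLAB workspace and query it. This is a minimal sketch; the env variable name is an arbitrary choice.
% Create the predefined discrete cart-pole environment.
env = rlPredefinedEnv("CartPole-Discrete");
% The observation specification reports a 4-by-1 continuous space, and
% the action specification lists the two admissible forces.
obsInfo = getObservationInfo(env)
actInfo = getActionInfo(env)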
Create DQN Agent for Imported Environment
To create an agent, on the Reinforcement Learning tab, in the Agent section, click New. In the Create agent dialog box, specify the agent name, the environment, and the training algorithm. The default agent configuration uses the imported environment and the DQN algorithm. For this example, change the number of hidden units from 256 to 24. For more information on creating agents, see Create Agents Using Reinforcement Learning Designer.
Click OK.
The app adds the new agent to the Agents pane and opens a corresponding agent1 document.
For a brief summary of DQN agent features and to view the observation and action specifications for the agent, click Overview.
When you create a DQN agent in Reinforcement Learning Designer, the agent uses a default deep neural network structure for its critic. To view the critic network, on the DQN Agent tab, click View Critic Model.
The Deep Learning Network Analyzer opens and displays the critic structure.
Close the Deep Learning Network Analyzer.
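You can build a comparable agent at the command line as well. The following is a sketch, assuming an env variable created as in the earlier command-line snippet; NumHiddenUnit mirrors the change from 256 to 24 hidden units made in the dialog box.
% Create a DQN agent whose default networks use 24 hidden units.
initOpts = rlAgentInitializationOptions(NumHiddenUnit=24);
agent = rlDQNAgent(getObservationInfo(env),getActionInfo(env),initOpts);
% Inspect the default critic network outside the app. In recent
% releases, getModel returns a network that analyzeNetwork accepts.
analyzeNetwork(getModel(getCritic(agent)))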
Train Agent
To train your agent, on the Train tab, first specify options for training the agent. For information on specifying training options, see Specify Training Options in Reinforcement Learning Designer.
For this example, specify the maximum number of training episodes by setting Max Episodes to 1000. For the other training options, use their default values. The default stopping criterion is that the average number of steps per episode (over the last 5 episodes) is greater than 500.
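For reference, a command-line configuration equivalent to these app settings looks like the following sketch.
% Stop training when the average number of steps over the last
% 5 episodes exceeds 500.
trainOpts = rlTrainingOptions( ...
    MaxEpisodes=1000, ...
    MaxStepsPerEpisode=500, ...
    ScoreAveragingWindowLength=5, ...
    StopTrainingCriteria="AverageSteps", ...
    StopTrainingValue=500);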
To start training, click Train.
During training, the app opens the Training Session tab and displays the training progress in the Training Results document.
Here, the training stops when the average number of steps per episode is 500. Clear the Show Episode Q0 option to better visualize the episode and average rewards.
To accept the training results, on the Training Session tab, click Accept. In the Agents pane, the app adds the trained agent, agent1_Trained.
Simulate Agent and Inspect Simulation Results
To simulate the trained agent, on the Simulate tab, first select agent1_Trained in the Agent drop-down list, then configure the simulation options. For this example, use the default number of episodes (10) and maximum episode length (500). For more information on specifying simulation options, see Specify Simulation Options in Reinforcement Learning Designer.
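The equivalent command-line simulation options are a one-line sketch:
% Run 10 episodes of at most 500 steps each, matching the app defaults.
simOpts = rlSimulationOptions(NumSimulations=10,MaxSteps=500);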
To simulate the agent, click Simulate.
The app opens the Simulation Session tab. After the simulation is completed, the Simulation Results document shows the reward for each episode as well as the reward mean and standard deviation.
To analyze the simulation results, click Inspect Simulation Data.
In the Simulation Data Inspector you can view the saved signals for each simulation episode. For more information, see Simulation Data Inspector (Simulink).
The following image shows the first and third states of the cart-pole system (cart position and pole angle) for the sixth simulation episode. The agent successfully balances the pole for 500 steps, even though the cart position undergoes moderate swings. You can modify some DQN agent options, such as MiniBatchSize and TargetUpdateFrequency, to promote faster and more robust learning. For more information, see Train DQN Agent to Balance Cart-Pole System.
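At the command line, these options live in the agent's AgentOptions property. The values below are illustrative, not tuned.
% Adjust DQN hyperparameters on an existing agent. Larger mini-batches
% and less frequent target updates can stabilize learning.
agent.AgentOptions.MiniBatchSize = 256;
agent.AgentOptions.TargetUpdateFrequency = 4;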
Close the Simulation Data Inspector.
To accept the simulation results, on the Simulation Session tab, click Accept.
In the Results pane, the app adds the simulation results structure, experience1.
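If you export experience1 to the MATLAB workspace, you can compute the per-episode rewards yourself. This sketch assumes the reward for each episode is stored as a timeseries in the corresponding element of the structure array.
% Total reward collected in each simulated episode.
episodeReward = arrayfun(@(e) sum(e.Reward.Data), experience1)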
Export Agent and Save Session
To export the trained agent to the MATLAB workspace for additional simulation, on the Reinforcement Learning tab, under Export, select the trained agent.
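Once the agent is in the workspace, you can, for example, save it to a MAT-file for later sessions; the file name here is an arbitrary choice.
% Persist the trained agent for reuse outside the app.
save("trainedCartPoleAgent.mat","agent1_Trained")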
To save the app session, on the Reinforcement Learning tab, click Save Session. In the future, to resume your work where you left off, you can open the session in Reinforcement Learning Designer.
Simulate Agent at the Command Line
To simulate the agent at the MATLAB command line, first load the cart-pole environment.
env = rlPredefinedEnv("CartPole-Discrete");
The cart-pole environment has an environment visualizer that allows you to see how the system behaves during simulation and training.
Plot the environment and perform a simulation using the trained agent that you previously exported from the app.
plot(env)
xpr2 = sim(env,agent1_Trained);
During the simulation, the visualizer shows the movement of the cart and pole. The trained agent is able to stabilize the system.
Finally, display the cumulative reward for the simulation.
sum(xpr2.Reward)
ans = 500
As expected, the reward is 500.
See Also
Reinforcement Learning Designer | analyzeNetwork