setup
Set up reinforcement learning environment or initialize data logger object
Since R2022a
Description
When you define a custom training loop for reinforcement learning, you can
simulate an agent or policy against an environment using the runEpisode
function. Use the setup function to configure the environment for running
simulations using multiple calls to runEpisode.
Also use setup to initialize a FileLogger or
MonitorLogger object before logging data within a custom training
loop.
Environment Objects
setup(env) sets up the specified reinforcement learning environment for running multiple simulations using runEpisode.
setup(env,Name=Value) specifies nondefault configuration options using one or more name-value arguments.
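For example, a minimal usage sketch (assuming an environment env and a policy object policy already exist; see the examples below for a complete workflow):
setup(env)                                   % configure the environment once
output = runEpisode(env,policy,MaxSteps=500); % call as many times as needed
cleanup(env)                                 % release resources when done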
Examples
Create a reinforcement learning environment and extract its observation and action specifications.
env = rlPredefinedEnv("CartPole-Discrete");
obsInfo = getObservationInfo(env);
actInfo = getActionInfo(env);
To approximate the policy within the actor, use a neural network. Create the network as an array of layer objects.
net = [...
featureInputLayer(obsInfo.Dimension(1))
fullyConnectedLayer(24)
reluLayer
fullyConnectedLayer(24)
reluLayer
fullyConnectedLayer(2)
softmaxLayer];
Convert the network to a dlnetwork object and display the number of learnable parameters (weights).
net = dlnetwork(net);
summary(net)
Initialized: true
Number of learnables: 770
Inputs:
1 'input' 4 features
Create a discrete categorical actor using the network.
actor = rlDiscreteCategoricalActor(net,obsInfo,actInfo);
Check your actor with a random observation.
act = getAction(actor,{rand(obsInfo.Dimension)})
act = 1×1 cell array
    {[-10]}
Create a policy object from the actor.
policy = rlStochasticActorPolicy(actor);
Create an experience buffer.
buffer = rlReplayMemory(obsInfo,actInfo);
Set up the environment for running multiple simulations. For this example, configure the training to log any errors rather than send them to the command window.
setup(env,StopOnError="off")
Simulate multiple episodes using the environment and policy. After each episode, append the experiences to the buffer. For this example, run 100 episodes.
for i = 1:100
    % Run one episode, then append the resulting
    % experiences to the replay memory buffer.
    output = runEpisode(env,policy,MaxSteps=300);
    append(buffer,output.AgentData.Experiences)
end
Clean up the environment.
cleanup(env)
Sample a mini-batch of experiences from the buffer. For this example, sample 10 experiences.
batch = sample(buffer,10);
You can then learn from the sampled experiences and update the policy and actor.
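As a hedged sketch, you might start by collecting quantities from the sampled batch. The field names used here (such as Reward) are assumptions based on the experience structure appended to the buffer above, and the actual update depends on your learning algorithm.
rewards = [batch.Reward];  % rewards of the sampled experiences (assumed field name)
avgReward = mean(rewards); % e.g., a diagnostic you might track per update
% A typical custom loop would next compute a loss from the batch, obtain
% gradients of the actor network (for example, with dlfeval), and update
% the parameters before running more episodes.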
This example shows how to log data to disk when training an agent using a custom training loop.
Create a FileLogger object using rlDataLogger.
flgr = rlDataLogger();
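Optionally, before setting up the logger, you can change where the files are written through the LoggingOptions property described below. The folder name here is only illustrative.
flgr.LoggingOptions.LoggingDirectory = "myDataLog"; % illustrative folder name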
Set up the logger object. This operation initializes the object, performing setup tasks such as creating the directory in which to save the data files.
setup(flgr);
Within a custom training loop, you can now store data to the logger object memory and write data to file.
For this example, store random numbers to the file logger object, grouping them in the variables Context1 and Context2. When you issue a write command, a MAT file corresponding to an iteration and containing both variables is saved with the name specified in flgr.LoggingOptions.FileNameRule, in the folder specified by flgr.LoggingOptions.LoggingDirectory.
for iter = 1:10

    % Store three random numbers in memory
    % as elements of the variable "Context1"
    for ct = 1:3
        store(flgr,"Context1",rand,iter);
    end

    % Store a random number in memory
    % as the variable "Context2"
    store(flgr,"Context2",rand,iter);

    % Write data to file every 4 iterations
    if mod(iter,4)==0
        write(flgr);
    end

end
Clean up the logger object. This operation performs cleanup tasks such as writing to file any data still in memory.
cleanup(flgr);
Input Arguments
Environment, specified as follows:
MATLAB® environment, represented by one of the following objects.
- Predefined environment created using rlPredefinedEnv.
- rlMDPEnv — Markov decision process environment.
- rlFunctionEnv — Environment defined using custom functions.
- rlMultiAgentFunctionEnv — Multiagent environment in which all agents execute in the same step.
- rlTurnBasedFunctionEnv — Turn-based multiagent environment in which agents execute in turns.
- Custom environment created from a template, using rlCreateEnvTemplate.
- rlNeuralNetworkEnvironment — Environment with neural network transition models.
Among the MATLAB environments, only rlMultiAgentFunctionEnv and rlTurnBasedFunctionEnv support training multiple agents at the same time.
Simulink® environment, represented by a SimulinkEnvWithAgent object, and created using:
- rlSimulinkEnv — This environment is created from a model that already contains one or more agent blocks, and supports training multiple agents at the same time.
- createIntegratedEnv — This environment is created from a model that does not already contain an agent block, and does not support training multiple agents at the same time.
A Simulink-based environment object acts as an interface so that the reinforcement learning simulation or training function calls the (compiled) Simulink model to generate experiences for the agents. Such an environment does not support using the reset and step functions.
Note
env is a handle object, so a function that does not return it as an output argument, such as train, can still update its internal state. For more information about handle objects, see
Handle Object Behavior.
For more information on reinforcement learning environments, see Reinforcement Learning Environments and Create Custom Simulink Environments.
Example: env = rlPredefinedEnv("DoubleIntegrator-Continuous")
creates a predefined environment that implements a continuous-action double-integrator
system and assigns it to the variable env.
Data logger object, specified either as a FileLogger or
as a MonitorLogger
object.
For more information on data logger objects, see rlDataLogger.
Example: flgr = rlDataLogger() creates the
FileLogger object flgr.
Name-Value Arguments
Specify optional pairs of arguments as
Name1=Value1,...,NameN=ValueN, where Name is
the argument name and Value is the corresponding value.
Name-value arguments must appear after other arguments, but the order of the
pairs does not matter.
Example: StopOnError="on"
Option to stop an episode when an error occurs, specified as one of the following:
- "on" — Stop the episode when an error occurs and generate an error message in the MATLAB command window.
- "off" — Log errors in the SimulationInfo output of runEpisode.
Example: StopOnError="off"
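For instance, a sketch of this configuration (assuming env and policy exist; the episode output is assumed to carry the logged error in its SimulationInfo field, as described above):
setup(env,StopOnError="off")
out = runEpisode(env,policy,MaxSteps=300);
out.SimulationInfo   % inspect any error logged during the episode
cleanup(env)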
Option for using parallel simulations, specified as a logical
value. Using parallel computing lets you use multiple cores, processors,
computer clusters, or cloud resources to speed up simulation.
When you set UseParallel to true, the
output of a subsequent call to runEpisode is an
rl.env.Future object, which supports deferred evaluation of the
simulation.
Example: UseParallel=true
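A sketch of the deferred-evaluation workflow follows; retrieving results with fetchOutputs is an assumption here, and env, policy, and buffer are taken from the earlier example.
setup(env,UseParallel=true)
for i = 1:4
    futures(i) = runEpisode(env,policy,MaxSteps=300); % rl.env.Future objects
end
for i = 1:4
    out = fetchOutputs(futures(i)); % assumed retrieval function for Future objects
    append(buffer,out.AgentData.Experiences)
end
cleanup(env)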
Function to run on each worker before running an episode, specified as a handle to a function with no input arguments. Use this function to perform any preprocessing required before running an episode.
Example: SetupFcn=@mySetupFcn
Function to run on each worker when cleaning up the environment, specified as a
handle to a function with no input arguments. Use this function to clean up the
workspace or perform other processing after calling
runEpisode.
Example: CleanupFcn=@myCleanupFcn
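For instance, a sketch combining both hooks with illustrative anonymous functions (each must take no input arguments):
setup(env, ...
    UseParallel=true, ...
    SetupFcn=@() rng(0), ...            % e.g., seed the generator on each worker
    CleanupFcn=@() disp("worker done")) % e.g., report completion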
Option to send model and workspace variables to parallel workers, specified as
"on" or "off". When the option is
"on", the client sends variables used in models and defined in
the base MATLAB workspace to the workers.
Example: TransferBaseWorkspaceVariables="off"
Additional files to attach to the parallel pool before running an episode, specified as a string or string array.
Example: AttachedFiles={"myFile1","myFile2"}
Worker random seeds, specified as one of the following:
- -1 — Set the random seed of each worker to the worker ID.
- Vector with length equal to the number of workers — Specify the random seed for each worker.
Example: WorkerRandomSeeds=[1:4]
Version History
Introduced in R2022a