Create Custom Simulink Environments

To create a custom Simulink® environment, first create a Simulink environment model that represents the world as seen from the agent. Such a system is often referred to as the plant or open-loop system, while the whole (integrated) system that includes both the agent and the environment is often referred to as the closed-loop system.

Your environment model must have an input signal, the action, which influences (through some discrete, continuous or mixed dynamics) its next internal state and its outputs, which are the observation, the reward and the is-done signals. The is-done signal is a scalar that indicates the termination of an episode, causing the simulation to stop when its value is true.

Note

A reinforcement learning environment is normally assumed to be strictly causal from the current action to the current observation. That is, it is assumed that the current observation does not depend on the current action (while the next state generally does). In other words, there must be no direct feedthrough between the current action and the current observation.

Note

The reward signal at time t must be the one corresponding to the transition between the observation output at time t-1 and the observation output at time t.

If your observation contains multiple channels, group the signals carried by the channels into a single observation bus. Similarly, for a hybrid environment, your action must be a two-element bus containing both the discrete (first) and the continuous (second) action channels. For more information about bus signals, see Simulink Bus Capabilities (Simulink).
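
As a rough illustration of the bus grouping described above, the following sketch builds an observation bus object with three channels and converts it to specification objects with bus2RLSpec. The element names (position, velocity, angle) and the variable name obsBus are hypothetical, and the sketch assumes it runs at the MATLAB command line or in a script, so that the bus object ends up in the base workspace where bus2RLSpec can find it by name.

```matlab
% Hypothetical observation bus with three scalar channels.
% The element names are illustrative and not taken from a specific model.
obsElems(1) = Simulink.BusElement;
obsElems(1).Name = 'position';
obsElems(2) = Simulink.BusElement;
obsElems(2).Name = 'velocity';
obsElems(3) = Simulink.BusElement;
obsElems(3).Name = 'angle';

% Group the elements into one bus object.
obsBus = Simulink.Bus;
obsBus.Elements = obsElems;

% bus2RLSpec takes the NAME of the bus object, not the object itself,
% and returns one specification object per bus element.
obsInfo = bus2RLSpec("obsBus");
```

The bus in the Simulink model must use the same bus object for the specifications to match the signals that reach the agent block.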

For critical considerations on defining reward and observation signals in custom environments, see Define Reward and Observation Signals in Custom Environments.

Once you have created the Simulink model that represents the environment, you must add the RL Agent block to it. You can do so automatically or manually.

  • To automatically create a new closed-loop Simulink model that contains an RL Agent block and references your environment model from its Environment block, use createIntegratedEnv, specifying the name of your existing environment model and the name of the new closed-loop model to create.

    You can specify as input arguments the names of the action, observation, is-done, and reward ports in your environment model. If your action or observation space is finite, you can also specify its possible values (otherwise the signals are assumed to be continuous).

    This function returns an environment object as well as the block path of the agent and the environment observation and action specifications. For more information on model referencing, see Model Reference Basics (Simulink).

  • To manually add the agent to your model, drag and drop the RL Agent block from the Reinforcement Learning Simulink library. Connect the action, observation, reward and is-done signals to the appropriate output and input ports of the block.

    Unless you already have an agent object for this environment in the MATLAB® workspace, you must create specification objects for the action and observation signals using rlNumericSpec (for continuous signals) or rlFiniteSetSpec (for discrete signals). For bus signals, create specifications using bus2RLSpec.

    Once you connect the blocks, create an environment object using rlSimulinkEnv, specifying the model filename, the block path to the RL Agent within the model, and the specification objects for the observation and the action channels, respectively. If your agent block already references an agent object in the MATLAB workspace, you do not need to supply the specification objects as input arguments.

    For an example, see Water Tank Reinforcement Learning Environment Model.
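
The manual route above can be sketched as follows. This is a minimal example that assumes the rlwatertank model from the water tank example is on the MATLAB path and already contains a connected block named RL Agent. The signal dimensions, limits, and channel names follow that example as best recalled and should be checked against your own model. The name=value argument syntax assumes R2021a or later.

```matlab
% Continuous observation channel: a 3-element column vector.
obsInfo = rlNumericSpec([3 1], ...
    LowerLimit=[-inf -inf 0]', ...
    UpperLimit=[ inf  inf inf]');
obsInfo.Name = "observations";

% Continuous action channel: a scalar.
actInfo = rlNumericSpec([1 1]);
actInfo.Name = "flow";

% For a discrete action you would instead list the allowed values, e.g.:
% actInfo = rlFiniteSetSpec([-1 0 1]);

% Model name and block path of the RL Agent block inside the model.
mdl = "rlwatertank";
agentBlk = mdl + "/RL Agent";

% Observation specification first, action specification second.
env = rlSimulinkEnv(mdl, agentBlk, obsInfo, actInfo);
```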

Both rlSimulinkEnv and createIntegratedEnv return a custom Simulink environment as a SimulinkEnvWithAgent object. This environment object acts as an interface so that when you call sim or train, these functions in turn call the (compiled) Simulink model associated with the object to generate experiences for the agents. You can use this object to train and simulate agents in the same way as with any other environment.
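
The automatic route with createIntegratedEnv, followed by a short simulation with the returned environment object, might look like the sketch below. The model names myPlant and myPlantClosedLoop and the port names are hypothetical placeholders. The default DDPG agent is only one possible choice and assumes a continuous action space.

```matlab
% Hypothetical names: "myPlant" is an existing environment model whose ports
% are called "action", "observation", "reward", and "stop".
% "myPlantClosedLoop" is the closed-loop model to be created.
[env, agentBlk, obsInfo, actInfo] = createIntegratedEnv( ...
    "myPlant", "myPlantClosedLoop", ...
    ActionPortName="action", ...
    ObservationPortName="observation", ...
    RewardPortName="reward", ...
    IsDonePortName="stop");

% Create a default agent from the returned specifications.
agent = rlDDPGAgent(obsInfo, actInfo);

% Simulate one short episode; sim runs the Simulink model behind env.
simOpts = rlSimulationOptions(MaxSteps=100);
experience = sim(env, agent, simOpts);
```

The same env object can be passed to train together with an rlTrainingOptions object.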

Note

Before training or simulating an agent within a Simulink environment, to make sure that the RL Agent block runs at the intended sample time, set the SampleTime property of your agent object appropriately.
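
A minimal sketch of setting the agent sample time is shown below. The value of Ts and the specification dimensions are hypothetical, and the default DDPG agent is used only as an example. Depending on your release, the sample time may instead have to be set through the agent options object, so treat the property access below as an assumption to verify.

```matlab
Ts = 0.1;  % hypothetical agent sample time, in seconds

% Hypothetical specifications for a 3-element observation and scalar action.
obsInfo = rlNumericSpec([3 1]);
actInfo = rlNumericSpec([1 1]);

% Default agent built from the specifications.
agent = rlDDPGAgent(obsInfo, actInfo);

% Make the RL Agent block execute every Ts seconds of simulation time.
agent.SampleTime = Ts;
```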

You can also create a multiagent Simulink environment. To do so, create a Simulink model that has one action input and one set of outputs (observation, reward and is-done) for every agent. Then manually add an agent block for each agent. Once you connect the blocks, create an environment object using rlSimulinkEnv. Unless each agent block already references an agent object in the MATLAB workspace, you must supply to rlSimulinkEnv two cell arrays containing the observation and action specification objects, respectively, as input arguments. For an example, see Train Multiple Agents to Perform Collaborative Task.
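
For two agents, the call described above could be sketched as follows. The model name, block names, and signal dimensions are hypothetical. The order of the cell array entries is assumed to match the order of the agent block paths.

```matlab
% Hypothetical model containing two agent blocks.
mdl = "myMultiAgentModel";
agentBlks = mdl + ["/Agent A", "/Agent B"];   % 1-by-2 string array of block paths

% One observation and one action specification per agent (hypothetical sizes).
obsInfoA = rlNumericSpec([4 1]);
obsInfoB = rlNumericSpec([4 1]);
actInfoA = rlNumericSpec([2 1]);
actInfoB = rlFiniteSetSpec([-1 0 1]);

% Observation specifications first, action specifications second,
% each as a cell array with one entry per agent block.
env = rlSimulinkEnv(mdl, agentBlks, ...
    {obsInfoA, obsInfoB}, {actInfoA, actInfoB});
```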

Your environment can also include third-party functionality. For more information, see Integrate Components from External Tools (Simulink).

Algebraic Loops Between Environment and Agent

To avoid (potentially unsolvable) algebraic loops, you must avoid any direct feedthrough (that is, any direct dependency in the same time step) from the action to the observation output signal. This is because, in the Simulink implementation of the agent block, the action at a given time step depends on the observation at the same time step. In other words, the agent block has a direct feedthrough from its observation input to its action output (similarly to an output feedback controller).

Avoiding a direct feedthrough from the action to the observation output signal is also in line with the fact that the standard formulation of a reinforcement learning environment as a Markov Decision Process is strictly causal from the current action to the current observation, since the current state does not depend on the current action (while the next state generally does).

However, note that for models created using createIntegratedEnv, the environment block references your environment model. A referenced model is normally treated as a direct feedthrough block (including the path from action to observation) unless the Minimize algebraic loop occurrences parameter of the referenced model is enabled. When the referenced model has no direct feedthrough from an input port that participates in an artificial algebraic loop to any of its output ports, enabling this parameter can remove artificial algebraic loops involving the model.
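
The parameter mentioned above can also be enabled programmatically. The sketch below assumes that the corresponding model configuration parameter is named ModelReferenceMinAlgLoopOccurrences and that myPlant is the (hypothetical) name of the referenced environment model. Verify the parameter name in your Simulink release before relying on it.

```matlab
refMdl = "myPlant";   % hypothetical name of the referenced environment model

% Load the model without opening its window.
load_system(refMdl);

% Ask Simulink to try to eliminate artificial algebraic loops that
% involve this model when it is referenced from another model.
set_param(refMdl, "ModelReferenceMinAlgLoopOccurrences", "on");

% Persist the setting so the closed-loop model picks it up.
save_system(refMdl);
```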

In general, adding a Delay (Simulink) or Memory (Simulink) block to the action signal between the agent block and environment block removes the algebraic loop. When you add an action delay, make sure that your reset function, which is called at the beginning of each training or simulation episode, initializes the delay to a feasible value.

Alternatively, you can add delay blocks to all the environment output signals after the environment block. If you do so, make sure that your reset function initializes each delay to a feasible value that is also consistent with the initial state of the environment.
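
One possible way to initialize a delay from the reset function is sketched below. It assumes an existing environment object env, a Delay block on the action signal whose initial condition is set to the workspace variable a0, and a plant whose initial state is read from the variable x0. Both variable names and the chosen values are hypothetical.

```matlab
% Attach the reset function to an existing environment object.
env.ResetFcn = @localResetFcn;

function in = localResetFcn(in)
    % "in" is the Simulink.SimulationInput object for the next episode.

    % Randomize the initial plant state (hypothetical variable x0).
    x0 = 0.1*randn;
    in = setVariable(in, "x0", x0);

    % Initialize the action delay to a feasible action value
    % (hypothetical variable a0, here the zero action).
    in = setVariable(in, "a0", 0);
end
```

If the delays are placed on the environment outputs instead, the same pattern applies, with the delay initial values computed from x0 so that the first observation matches the initial state.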

Note

In general, adding delays to solve algebraic loops should be done with extreme care, as it involves a modification of the loop dynamics.

If you have separate state and output functions (instead of a single step function), you can call them using separate MATLAB Function (Simulink) blocks, using a delay to represent the environment state. If you do so, your reset function only needs to initialize the state.
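
The separation into state and output functions can be illustrated in plain MATLAB, outside Simulink. The scalar plant and the stand-in for the agent below are hypothetical and only show the order of evaluation: the observation is computed from the stored state alone, and the action affects only the next state, so there is no direct feedthrough from action to observation.

```matlab
% Hypothetical scalar plant, used only for illustration.
stateFcn  = @(x, a) 0.9*x + 0.1*a;   % next state depends on the current action
outputFcn = @(x) x;                  % observation depends on the state only

x = 1;                % initial state, the only value the reset function sets
obs = zeros(1, 3);    % log of observations

for k = 1:3
    obs(k) = outputFcn(x);   % computed before the action is known
    a = -obs(k);             % stand-in for the agent: action from observation
    x = stateFcn(x, a);      % stored (as by the delay) until the next step
end

disp(obs)
```

In the Simulink model, the variable x plays the role of the delay block that holds the environment state between time steps.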

For more information on algebraic loops and how to remove some of them, see Algebraic Loop Concepts (Simulink) and Remove Algebraic Loops (Simulink). For a related example about using delays in a reinforcement learning loop implemented in Simulink, see Create and Simulate Same Environment in Both MATLAB and Simulink.
