Create Custom Grid World Environments

A grid world is a two-dimensional, cell-based environment in which the agent starts from one cell and moves toward a terminal cell while collecting as much reward as possible. Grid world environments are useful for applying reinforcement learning algorithms to find optimal paths and policies that take the agent to the terminal goal in the fewest moves.

Figure: Basic five-by-five grid world with the agent (red circle) in the top left corner, the terminal location (light blue square) in the bottom right corner, and four obstacle cells (black) in the middle.

Reinforcement Learning Toolbox™ lets you create custom MATLAB® grid world environments for your own applications. To create a custom grid world environment:

  1. Create the grid world model.

  2. Configure the grid world model.

  3. Use the grid world model to create your own grid world environment.
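As a quick preview, the following minimal sketch, assuming a 5-by-5 grid with terminal state [5,5], shows the three steps end to end:

GW = createGridWorld(5,5);     % 1. Create the grid world model
GW.TerminalStates = "[5,5]";   % 2. Configure the model
env = rlMDPEnv(GW);            % 3. Create the grid world environment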

Grid World Models

You can create your own grid world model using the createGridWorld function. Specify the grid size when creating the GridWorld object.
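For example, the following command creates a 5-by-5 grid world model with the default (standard) move set:

GW = createGridWorld(5,5);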

The GridWorld object has the following properties.

GridSize (read-only)

Dimensions of the grid world, displayed as an m-by-n array. Here, m represents the number of grid rows and n is the number of grid columns.

CurrentState (read-write)

Name of the current state of the agent, specified as a string. You can use this property to set the initial state of the agent. By default, the agent starts from cell [1,1].

The agent starts from CurrentState when you call the reset function on the rlMDPEnv environment object.
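For example, the following sketch sets the initial state of the agent to cell [3,1]:

GW = createGridWorld(5,5);
GW.CurrentState = "[3,1]";   % the agent starts from [3,1] after reset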

States (read-only)

A string vector containing the state names of the grid world. For instance, a 2-by-2 grid world model GW has the following state names:

GW.States = ["[1,1]";
             "[2,1]";
             "[1,2]";
             "[2,2]"];
Actions (read-only)

A string vector containing the list of possible actions that the agent can take. You set the actions when you create the grid world model by using the moves argument:

GW = createGridWorld(m,n,moves)

Specify moves as either 'Standard' or 'Kings'.

moves         GW.Actions
'Standard'    ['N';'S';'E';'W']
'Kings'       ['N';'S';'E';'W';'NE';'NW';'SE';'SW']
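For example, the following sketch creates a grid world model with the king's move set and displays the resulting actions:

GW = createGridWorld(5,5,'Kings');
GW.Actions   % eight possible moves, including diagonals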
T (read-write)

State transition matrix, specified as a 3-D array. T is a probability matrix that indicates the likelihood of the agent moving from the current state s to any possible next state s' by performing action a.

T can be denoted as

T(s,s',a) = probability(s'|s,a).

For instance, consider a 5-by-5 deterministic grid world object GW with the agent in cell [3,1]. View the state transition matrix for the north direction.

northStateTransition = GW.T(:,:,1)   % page 1 corresponds to action 'N'

Figure: Basic five-by-five grid world showing the agent moving north from cell [3,1] to cell [2,1].

From the above figure, the value of northStateTransition(3,2) is 1 since the agent moves from cell [3,1] to cell [2,1] with action 'N'. A probability of 1 indicates that from a given state, if the agent goes north, it has a 100% chance of moving one cell north on the grid. For an example showing how to set up the state transition matrix, see Train Reinforcement Learning Agent in Basic Grid World.
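Because T is writable, you can also define stochastic transitions. The following minimal sketch, using hypothetical probabilities, makes the north move from cell [3,1] succeed only 80% of the time:

GW = createGridWorld(5,5);
s  = find(GW.States == "[3,1]");   % index of the current state
s1 = find(GW.States == "[2,1]");   % index of the intended next state
a  = find(GW.Actions == "N");      % index of the north action
GW.T(s,:,a)  = 0;                  % clear the deterministic transition row
GW.T(s,s1,a) = 0.8;                % move north with probability 0.8
GW.T(s,s,a)  = 0.2;                % stay in place with probability 0.2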

R (read-write)

Reward transition matrix, specified as a 3-D array. R determines how much reward the agent receives after performing an action in the environment. R has the same shape and size as the state transition matrix T.

The reward transition matrix R can be denoted as

r = R(s,s',a).

Set up R such that the agent receives a reward after every action. For instance, you can set up a positive reward if the agent jumps over obstacle states and when it reaches the terminal state. You can also set up a default reward of -1 for all actions the agent takes, independent of the current state and next state. For an example that shows how to set up the reward transition matrix, see Train Reinforcement Learning Agent in Basic Grid World.
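The following minimal sketch, assuming a 5-by-5 grid with terminal state [5,5], sets a default reward of -1 per move and a reward of 10 for any transition into the terminal state:

GW = createGridWorld(5,5);
GW.TerminalStates = "[5,5]";
nS = numel(GW.States);
nA = numel(GW.Actions);
GW.R = -1*ones(nS,nS,nA);                    % default reward for every action
sT = find(GW.States == GW.TerminalStates);   % index of the terminal state
GW.R(:,sT,:) = 10;                           % reward for reaching the terminal state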

ObstacleStates (read-write)

ObstacleStates are states that cannot be reached in the grid world, specified as a string vector. Consider the following 5-by-5 grid world model GW.

Figure: Basic five-by-five grid world with four obstacle cells (black) in the middle and the terminal state (light blue square) in the bottom right corner.

The black cells are obstacle states, and you can specify them using the following syntax:

GW.ObstacleStates = ["[3,3]";"[3,4]";"[3,5]";"[4,3]"];

For a workflow example, see Train Reinforcement Learning Agent in Basic Grid World.

TerminalStates (read-write)

TerminalStates are the final states in the grid world, specified as a string vector. Consider the previous 5-by-5 grid world model GW. The blue cell is the terminal state, and you can specify it as follows:

GW.TerminalStates = "[5,5]";

For a workflow example, see Train Reinforcement Learning Agent in Basic Grid World.

Grid World Environments

To complete the workflow, create a Markov decision process (MDP) environment using rlMDPEnv from the grid world model of the previous step. An MDP is a discrete-time stochastic control process that provides a mathematical framework for modeling decision making in situations where outcomes are partly random and partly under the control of the decision maker. The agent uses the grid world environment object rlMDPEnv to interact with the grid world model object GridWorld.
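For example, the following sketch wraps a configured grid world model in an MDP environment and resets it:

GW = createGridWorld(5,5);
GW.TerminalStates = "[5,5]";
env = rlMDPEnv(GW);
obs = reset(env);   % the initial observation corresponds to GW.CurrentState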

For more information, see rlMDPEnv and Train Reinforcement Learning Agent in Basic Grid World.

See Also

Functions

createGridWorld

Objects

GridWorld | rlMDPEnv

Related Topics

Train Reinforcement Learning Agent in Basic Grid World