

rlSARSAAgent

SARSA reinforcement learning agent


The SARSA algorithm is a model-free, online, on-policy reinforcement learning method. A SARSA agent is a value-based reinforcement learning agent that trains a critic to estimate the return, or expected future rewards.
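As a sketch of the underlying idea (standard tabular SARSA, with illustrative variable names rather than toolbox API), one update of the critic's action-value estimate looks like this:

```matlab
% Sketch of one tabular SARSA update (standard algorithm; all names are
% illustrative and not part of the rlSARSAAgent API).
Q = zeros(4,2);             % action-value table: 4 states, 2 actions
alpha = 0.1; gamma = 0.99;  % learning rate and discount factor
s = 1; a = 2;               % current state and action
r = 1;                      % reward received
sNext = 3; aNext = 1;       % next state and the action actually taken (on-policy)
tdError = r + gamma*Q(sNext,aNext) - Q(s,a);  % on-policy TD error
Q(s,a) = Q(s,a) + alpha*tdError;              % move estimate toward TD target
```

Because the TD target uses the action A' the agent actually selects next (rather than the maximizing action, as in Q-learning), SARSA is on-policy.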

For more information on SARSA agents, see SARSA Agents.

For more information on the different types of reinforcement learning agents, see Reinforcement Learning Agents.




Description

agent = rlSARSAAgent(critic,agentOptions) creates a SARSA agent with the specified critic network and sets the AgentOptions property.

Input Arguments


Critic, specified as an rlQValueFunction object. For more information on creating critics, see Create Policies and Value Functions.


Properties

Agent options, specified as an rlSARSAAgentOptions object.

Option to use the exploration policy when selecting actions, specified as one of the following logical values.

  • false — Use the agent's greedy policy when selecting actions.

  • true — Use the agent's exploration policy when selecting actions.
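For instance (a sketch assuming an existing agent and a valid observation `obs`), the property can be toggled after the agent is created:

```matlab
% Sketch: toggle exploration on an existing rlSARSAAgent (assumes `agent`
% and a valid observation cell array `obs` already exist).
agent.UseExplorationPolicy = true;   % epsilon-greedy action selection
act = getAction(agent,obs);          % may return a non-greedy action
agent.UseExplorationPolicy = false;  % greedy action selection
act = getAction(agent,obs);          % returns the action maximizing the Q-value
```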

This property is read-only.

Observation specifications, specified as a reinforcement learning specification object defining properties such as dimensions, data type, and name of the observation signal.

The value of ObservationInfo matches the corresponding value specified in critic.

This property is read-only.

Action specification, specified as an rlFiniteSetSpec object.

The value of ActionInfo matches the corresponding value specified in critic.

Sample time of agent, specified as a positive scalar or as -1. Setting this parameter to -1 allows for event-based simulations. The initial value of SampleTime matches the value specified in AgentOptions.

Within a Simulink® environment, the RL Agent block in which the agent is specified executes every SampleTime seconds of simulation time. If SampleTime is -1, the block inherits the sample time from its parent subsystem.

Within a MATLAB® environment, the agent is executed every time the environment advances. In this case, SampleTime is the time interval between consecutive elements in the output experience returned by sim or train. If SampleTime is -1, the time interval between consecutive elements in the returned output experience reflects the timing of the event that triggers the agent execution.
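SampleTime is set through the agent options; a minimal sketch (assuming a critic has already been created) might look like this:

```matlab
% Sketch: configure the agent sample time via rlSARSAAgentOptions
% (assumes `critic` is an existing rlQValueFunction object).
opt = rlSARSAAgentOptions;
opt.SampleTime = 0.1;             % execute the agent every 0.1 s of simulation time
% opt.SampleTime = -1;            % alternatively, event-based / inherited timing
agent = rlSARSAAgent(critic,opt);
```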

Object Functions

train — Train reinforcement learning agents within a specified environment
sim — Simulate trained reinforcement learning agents within a specified environment
getAction — Obtain action from agent, actor, or policy object given environment observations
getActor — Get actor from reinforcement learning agent
setActor — Set actor of reinforcement learning agent
getCritic — Get critic from reinforcement learning agent
setCritic — Set critic of reinforcement learning agent
generatePolicyFunction — Generate function that evaluates policy of an agent or policy object
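As an example of the accessor functions (a sketch assuming an existing agent), getCritic and setCritic can be used to inspect or replace the critic; note that setCritic returns an updated agent:

```matlab
% Sketch: retrieve and reassign the critic of an existing SARSA agent.
critic = getCritic(agent);        % extract the current critic
% ... modify the critic here if needed ...
agent = setCritic(agent,critic);  % setCritic returns the updated agent
```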


Examples

Create or load an environment interface. For this example, load the Basic Grid World environment interface also used in the example Train Reinforcement Learning Agent in Basic Grid World.

env = rlPredefinedEnv("BasicGridWorld");

Get observation and action specifications.

obsInfo = getObservationInfo(env);
actInfo = getActionInfo(env);

Create a table approximation model derived from the environment observation and action specifications.

qTable = rlTable(obsInfo,actInfo);

Create the critic using qTable. SARSA agents use an rlQValueFunction object to implement the critic.

critic = rlQValueFunction(qTable,obsInfo,actInfo);

Create a SARSA agent using the specified critic and an epsilon value of 0.05.

opt = rlSARSAAgentOptions;
opt.EpsilonGreedyExploration.Epsilon = 0.05;

agent = rlSARSAAgent(critic,opt)
agent = 
  rlSARSAAgent with properties:

            AgentOptions: [1x1 rl.option.rlSARSAAgentOptions]
    UseExplorationPolicy: 0
         ObservationInfo: [1x1 rl.util.rlFiniteSetSpec]
              ActionInfo: [1x1 rl.util.rlFiniteSetSpec]
              SampleTime: 1

To check your agent, use getAction to return the action from a random observation.

getAction(agent,{randi(numel(obsInfo.Elements))})
ans = 1

You can now test and train the agent against the environment.
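For example (a hedged sketch; the training criteria and option values below are illustrative, not recommended settings):

```matlab
% Sketch: train and then simulate the agent (illustrative option values).
trainOpts = rlTrainingOptions( ...
    MaxEpisodes=200, ...
    MaxStepsPerEpisode=50, ...
    StopTrainingCriteria="AverageReward", ...
    StopTrainingValue=10);
trainStats = train(agent,env,trainOpts);

simOpts = rlSimulationOptions(MaxSteps=50);
experience = sim(env,agent,simOpts);
```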

Version History

Introduced in R2019a