rlDeterministicActorPolicy

Policy object to generate continuous deterministic actions for custom training loops and application deployment

    Description

    This object implements a deterministic policy, which returns continuous deterministic actions given an input observation. You can create an rlDeterministicActorPolicy object from an rlContinuousDeterministicActor or extract it from an rlDDPGAgent or rlTD3Agent. You can then train the policy object using a custom training loop or deploy it for your application using generatePolicyBlock or generatePolicyFunction. This policy is always deterministic and does not perform any exploration. For more information on policies and value functions, see Create Policies and Value Functions.
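    As a rough sketch of the agent route mentioned above, the following builds a default DDPG agent and wraps its actor in a policy object. The specification dimensions are assumed for illustration, and going through getActor is one possible way to obtain the actor, not necessarily the recommended one.

```matlab
% Assumed specifications, for illustration only (not taken from a real environment).
obsInfo = rlNumericSpec([4 1]);
actInfo = rlNumericSpec([2 1]);

% Default DDPG agent; the toolbox builds default actor and critic networks.
agent = rlDDPGAgent(obsInfo,actInfo);

% Extract the agent's actor and wrap it in a deterministic policy object.
actor  = getActor(agent);
policy = rlDeterministicActorPolicy(actor);

% Evaluate the policy for one random observation.
act = getAction(policy,{rand(obsInfo.Dimension)});
```

    The same pattern should apply to an rlTD3Agent, since its actor is also a continuous deterministic actor.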

    Creation

    Description

    policy = rlDeterministicActorPolicy(actor) creates the deterministic actor policy object policy from the continuous deterministic actor actor. It also sets the Actor property of policy to the input argument actor.

    Properties

    Actor: Continuous deterministic actor, specified as an rlContinuousDeterministicActor object.

    ObservationInfo: Observation specifications, specified as an rlFiniteSetSpec or rlNumericSpec object or an array of such objects. These objects define properties such as the dimensions, data types, and names of the observation signals.

    ActionInfo: Action specifications, specified as an rlNumericSpec object. This object defines the properties of the environment action channel, such as its dimensions, data type, and name. Note that the name of the action channel specified in actionInfo (if any) is not used.

    Note

    Only one action channel is allowed.

    SampleTime: Sample time of the policy, specified as a positive scalar or as -1 (default). Setting this parameter to -1 allows for event-based simulations.

    Within a Simulink® environment, the RL Agent block in which the policy is specified executes every SampleTime seconds of simulation time. If SampleTime is -1, the block inherits the sample time from its parent subsystem.

    Within a MATLAB® environment, the policy is executed every time the environment advances. In this case, SampleTime is the time interval between consecutive elements in the output experience. If SampleTime is -1, the sample time is treated as being equal to 1.

    Example: 0.2
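    A minimal sketch of setting the sample time with dot notation follows. The tiny single-layer network and the specification dimensions are placeholders chosen only to keep the example short.

```matlab
% Placeholder specifications and network, assumed for illustration.
obsInfo = rlNumericSpec([4 1]);
actInfo = rlNumericSpec([2 1]);
net = dlnetwork([featureInputLayer(4); fullyConnectedLayer(2)]);

actor  = rlContinuousDeterministicActor(net,obsInfo,actInfo);
policy = rlDeterministicActorPolicy(actor);

% Replace the default sample time (-1) with 0.2 seconds.
policy.SampleTime = 0.2;
```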

    Object Functions

    generatePolicyBlock       Generate Simulink block that evaluates policy of an agent or policy object
    generatePolicyFunction    Generate function that evaluates policy of an agent or policy object
    getAction                 Obtain action from agent, actor, or policy object given environment observations
    getLearnableParameters    Obtain learnable parameter values from agent, function approximator, or policy object
    reset                     Reset environment, agent, experience buffer, or policy object
    setLearnableParameters    Set learnable parameter values of agent, function approximator, or policy object
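    The parameter functions listed above can be combined roughly as follows. This sketch reads the learnable parameters of a policy, halves every value, and writes them back; the scaling is arbitrary and stands in for whatever update a custom training loop would compute. The network and specifications are assumed placeholders.

```matlab
% Placeholder specifications and network, assumed for illustration.
obsInfo = rlNumericSpec([4 1]);
actInfo = rlNumericSpec([2 1]);
net = dlnetwork([featureInputLayer(4); fullyConnectedLayer(2)]);

actor  = rlContinuousDeterministicActor(net,obsInfo,actInfo);
policy = rlDeterministicActorPolicy(actor);

% Read the current learnable parameters (returned as a cell array).
params = getLearnableParameters(policy);

% Arbitrary stand-in for a training update: scale every parameter by 0.5.
scaled = cellfun(@(p) 0.5*p, params, 'UniformOutput', false);

% Write the modified parameters back into the policy.
policy = setLearnableParameters(policy,scaled);
```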

    Examples

    Create observation and action specification objects. For this example, define the observation and action spaces as continuous four- and two-dimensional spaces, respectively.

    obsInfo = rlNumericSpec([4 1]);
    actInfo = rlNumericSpec([2 1]);

    Alternatively, use getObservationInfo and getActionInfo to extract the specification objects from an environment.
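    For instance, the specifications could be read from a predefined environment, as sketched below. The choice of the "DoubleIntegrator-Continuous" environment is an assumption made for illustration; its dimensions differ from the ones used in the rest of this example, so the results are stored under separate, hypothetical variable names.

```matlab
% Predefined continuous-action environment, chosen only for illustration.
env = rlPredefinedEnv("DoubleIntegrator-Continuous");

% Extract the specification objects from the environment.
envObsInfo = getObservationInfo(env);
envActInfo = getActionInfo(env);
```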

    Create a continuous deterministic actor. This actor must accept an observation as input and return an action as output.

    To approximate the policy function within the actor, use a deep neural network model. Define the network as an array of layer objects, and get the dimension of the observation and action spaces from the environment specification objects.

    layers = [ 
        featureInputLayer(obsInfo.Dimension(1))
        fullyConnectedLayer(16)
        reluLayer
        fullyConnectedLayer(actInfo.Dimension(1)) 
        ];

    Convert the network to a dlnetwork object and display a summary, which includes the number of learnable parameters.

    model = dlnetwork(layers);
    summary(model)
       Initialized: true
    
       Number of learnables: 114
    
       Inputs:
          1   'input'   4 features
    

    Create the actor using model, and the observation and action specifications.

    actor = rlContinuousDeterministicActor(model,obsInfo,actInfo)
    actor = 
      rlContinuousDeterministicActor with properties:
    
        ObservationInfo: [1x1 rl.util.rlNumericSpec]
             ActionInfo: [1x1 rl.util.rlNumericSpec]
              UseDevice: "cpu"
    
    

    Check the actor with a random observation input.

    act = getAction(actor,{rand(obsInfo.Dimension)});
    act{1}
    ans = 2x1 single column vector
    
        0.4013
        0.0578
    
    

    Create a policy object from actor.

    policy = rlDeterministicActorPolicy(actor)
    policy = 
      rlDeterministicActorPolicy with properties:
    
                  Actor: [1x1 rl.function.rlContinuousDeterministicActor]
        ObservationInfo: [1x1 rl.util.rlNumericSpec]
             ActionInfo: [1x1 rl.util.rlNumericSpec]
             SampleTime: -1
    
    

    Check the policy with a random observation input.

    act = getAction(policy,{rand(obsInfo.Dimension)});
    act{1}
    ans = 2×1
    
        0.4313
       -0.3002
    
    

    You can now train the policy with a custom training loop and then deploy it to your application.
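    As one possible deployment path, the sketch below calls generatePolicyFunction with its default settings. It assumes the defaults produce a function named evaluatePolicy and a data file in the current folder, so run it in a scratch directory. The policy is rebuilt here with an untrained network so that the snippet is self-contained.

```matlab
% Rebuild an (untrained) policy with the same shapes as in this example.
obsInfo = rlNumericSpec([4 1]);
actInfo = rlNumericSpec([2 1]);
net = dlnetwork([
    featureInputLayer(obsInfo.Dimension(1))
    fullyConnectedLayer(16)
    reluLayer
    fullyConnectedLayer(actInfo.Dimension(1))
    ]);
actor  = rlContinuousDeterministicActor(net,obsInfo,actInfo);
policy = rlDeterministicActorPolicy(actor);

% Generate the policy evaluation function and its data file
% in the current folder (default names assumed).
generatePolicyFunction(policy);

% Call the generated function with a random observation.
act = evaluatePolicy(rand(obsInfo.Dimension));
```

    The generated function takes the observation directly as a numeric array rather than as a cell array, which keeps the call simple in application code.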

    Version History

    Introduced in R2022a