
cleanup

Clean up reinforcement learning environment or data logger object

Since R2022a

    Description

    When you define a custom training loop for reinforcement learning, you can simulate an agent or policy against an environment using the runEpisode function. Use the cleanup function to clean up the environment after running simulations using multiple calls to runEpisode. To clean up the environment after each simulation, you can configure runEpisode to automatically call the cleanup function at the end of each episode.
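    For example, a minimal sketch of the second approach, assuming the CleanupPostSim name-value argument of runEpisode (see the runEpisode reference page for the available options):

    % Run one episode and let runEpisode clean up the environment
    % automatically afterward, instead of calling cleanup manually.
    output = runEpisode(env,policy,MaxSteps=500,CleanupPostSim=true);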

    Also use cleanup to perform cleanup tasks for a FileLogger or MonitorLogger object after logging data within a custom training loop.

    Environment Objects

    cleanup(env) cleans up the specified reinforcement learning environment after running multiple simulations using runEpisode.


    Data Logger Objects

    cleanup(lgr) cleans up the specified data logger object after logging data within a custom training loop. This task might involve, for example, transferring any remaining data from the internal memory of lgr to the logging target (either a MAT file or a trainingProgressMonitor object). For more information, see Log Data to Disk in a Custom Training Loop.

    Examples


    This example shows how to run multiple episode simulations and then clean up the environment.

    Create a reinforcement learning environment and extract its observation and action specifications.

    env = rlPredefinedEnv("CartPole-Discrete");
    obsInfo = getObservationInfo(env);
    actInfo = getActionInfo(env);

    To approximate the policy within the actor, use a neural network. Create the network as an array of layer objects.

    net = [
        featureInputLayer(obsInfo.Dimension(1))   % observation input
        fullyConnectedLayer(24)
        reluLayer
        fullyConnectedLayer(24)
        reluLayer
        fullyConnectedLayer(2)                     % one output per discrete action
        softmaxLayer];                             % output action probabilities

    Convert the network to a dlnetwork object and display the number of learnable parameters (weights).

    net = dlnetwork(net);
    summary(net)
       Initialized: true
    
       Number of learnables: 770
    
       Inputs:
          1   'input'   4 features
    

    Create a discrete categorical actor using the network.

    actor = rlDiscreteCategoricalActor(net,obsInfo,actInfo);

    Check your actor with a random observation.

    act = getAction(actor,{rand(obsInfo.Dimension)})
    act = 1×1 cell array
        {[-10]}
    
    

    Create a policy object from the actor.

    policy = rlStochasticActorPolicy(actor);

    Create an experience buffer.

    buffer = rlReplayMemory(obsInfo,actInfo);

    Set up the environment for running multiple simulations. For this example, configure the simulation to log any errors rather than sending them to the command window.

    setup(env,StopOnError="off")

    Simulate multiple episodes using the environment and policy. After each episode, append the experiences to the buffer. For this example, run 100 episodes.

    for i = 1:100
        output = runEpisode(env,policy,MaxSteps=300);
        append(buffer,output.AgentData.Experiences)
    end

    Clean up the environment.

    cleanup(env)

    Sample a mini-batch of experiences from the buffer. For this example, sample 10 experiences.

    batch = sample(buffer,10);

    You can then learn from the sampled experiences and update the policy and actor.
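    For illustration, here is a hedged sketch of inspecting the sampled mini-batch. The exact layout of the output of sample is described on the rlReplayMemory reference page; a Reward field is assumed here.

    % Concatenate the rewards from the sampled experiences
    % and compute their average (Reward field assumed).
    rewards = [batch.Reward];
    avgReward = mean(rewards)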

    This example shows how to log data to disk when training an agent using a custom training loop.

    Create a FileLogger object using rlDataLogger.

    flgr = rlDataLogger();

    Set up the logger object. This operation initializes the object, performing setup tasks such as creating the directory in which to save the data files.

    setup(flgr);

    Within a custom training loop, you can now store data in the logger object memory and write the data to file.

    For this example, store random numbers to the file logger object, grouping them in the variables Context1 and Context2. When you issue a write command, a MAT file corresponding to an iteration and containing both variables is saved with the name specified in flgr.LoggingOptions.FileNameRule, in the folder specified by flgr.LoggingOptions.LoggingDirectory.
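    As an illustrative sketch, you could customize these options before storing any data. The property names come from the description above; the folder name and file-name rule shown here are arbitrary example values, not defaults.

    % Optional: customize the logging target.
    % "myRuns" and "iteration<id>" are example values.
    flgr.LoggingOptions.LoggingDirectory = "myRuns";
    flgr.LoggingOptions.FileNameRule = "iteration<id>";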

    for iter = 1:10
    
        % Store three random numbers in memory 
        % as elements of the variable "Context1"
        for ct = 1:3
            store(flgr, "Context1", rand, iter);
        end
    
        % Store a random number in memory 
        % as the variable "Context2"
        store(flgr, "Context2", rand, iter);
    
        % Write data to file every 4 iterations
        if mod(iter,4)==0
            write(flgr);
        end
    
    end

    Clean up the logger object. This operation performs cleanup tasks such as writing to file any data still in memory.

    cleanup(flgr);
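    To verify the result, you can inspect the files that the logger produced. A minimal sketch, assuming the data was saved as MAT files in the folder specified by flgr.LoggingOptions.LoggingDirectory:

    % List the MAT files produced by the logger and load the first one.
    files = dir(fullfile(flgr.LoggingOptions.LoggingDirectory,"*.mat"));
    loggedData = load(fullfile(files(1).folder,files(1).name));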

    Input Arguments


    env — Environment

    Environment, specified as follows:

    • MATLAB® environment, represented by a MATLAB environment object, such as one created using rlFunctionEnv or rlPredefinedEnv.

      Among the MATLAB environments, only rlMultiAgentFunctionEnv and rlTurnBasedFunctionEnv support training multiple agents at the same time.

    • Simulink® environment, represented by a SimulinkEnvWithAgent object, and created using:

      • rlSimulinkEnv — This environment is created from a model that already contains one or more agent blocks, and it supports training multiple agents at the same time (see the sketch after this list).

      • createIntegratedEnv — This environment is created from a model that does not already contain an agent block, and it does not support training multiple agents at the same time.

      A Simulink-based environment object acts as an interface so that the reinforcement learning simulation or training function calls the (compiled) Simulink model to generate experiences for the agents. Such an environment does not support using the reset and step functions.
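    For example, a minimal sketch of creating a Simulink environment, assuming a hypothetical model named "myModel" that contains an RL Agent block at the path shown:

    % "myModel" and the block path are hypothetical placeholders.
    env = rlSimulinkEnv("myModel","myModel/RL Agent",obsInfo,actInfo);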

    Note

    env is a handle object, so a function that does not return it as an output argument, such as train, can still update its internal state. For more information about handle objects, see Handle Object Behavior.

    For more information on reinforcement learning environments, see Reinforcement Learning Environments and Create Custom Simulink Environments.

    Example: env = rlPredefinedEnv("DoubleIntegrator-Continuous") creates a predefined environment that implements a continuous-action double-integrator system and assigns it to the variable env.

    lgr — Data logger object

    Data logger object, specified as either a FileLogger or a MonitorLogger object.

    For more information on reinforcement learning data logger objects, see rlDataLogger.

    Example: flgr = rlDataLogger() creates the FileLogger object flgr.

    Version History

    Introduced in R2022a