DDPG Agent (used to set a temperature) 41% faster training time per Episode with Warm-up than without. Why?
Hi,
So I noticed something while training my DDPG Agent.
I use a DDPG Agent to set a temperature for a heating system depending on the weather forecast and other temperatures such as the outside temperature.
First I trained an agent without any warm-up, and then I trained a new agent with a warm-up of 700 episodes. The warm-up did what I had hoped: the agent converged faster and found a much better strategy than without it. I also noticed that training was much faster overall; I calculated that training one episode takes 41% less time than training one episode without a warm-up.
Don't get me wrong, I really appreciate this, but I am trying to understand why.
I have not changed any of the agent options, just the warm-up.
If the agent were supposed to win a game as quickly as possible, I would understand it: thanks to the experience gathered during the warm-up, the agent would find a better strategy sooner, win the game faster, and therefore need less time per episode. But in my case the agent just has to set a temperature, and there is no faster way to set a temperature.
Am I missing an important point?
I mean, in every training step and every episode the process is more or less the same: take an action, get a reward, update the networks, update the policy, and so on. Where in those steps could the 41% time improvement come from?
Just to be clear, I understand why it converges faster; I just don't understand why the training time per episode is so much lower. Without a warm-up, the average training time per episode was 28.1 seconds; with a warm-up it was 16.5 seconds.
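For what it's worth, the 41% comes straight from those two averages, and I could try to narrow down where the time goes by profiling the training call. A minimal sketch, where 'env' and 'trainOpts' stand in for my environment and training options:
% Sanity check of the reported speed-up
tNoWarmup = 28.1;                              % average seconds per episode without warm-up
tWarmup   = 16.5;                              % average seconds per episode with warm-up
speedup   = (tNoWarmup - tWarmup)/tNoWarmup    % about 0.41, i.e. 41% less time per episode
% Profile one training run to see which steps dominate
% ('env' and 'trainOpts' are placeholders for my actual setup)
profile on
trainStats = train(agent, env, trainOpts);
profile viewer                                 % compare time spent in gradient updates vs. environment steps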
These are my agent options, which I used for both agents:
agent.AgentOptions.TargetSmoothFactor = 1e-3;
agent.AgentOptions.DiscountFactor = 1.0;
agent.AgentOptions.MiniBatchSize = 128;
agent.AgentOptions.ExperienceBufferLength = 1e6;
agent.AgentOptions.NoiseOptions.Variance = 0.5;
agent.AgentOptions.NoiseOptions.VarianceDecayRate = 1e-6;
agentOptions.ResetExperienceBufferBeforeTraining = false;
agent.AgentOptions.CriticOptimizerOptions.LearnRate = 1e-03;
agent.AgentOptions.CriticOptimizerOptions.GradientThreshold = 1;
agent.AgentOptions.ActorOptimizerOptions.LearnRate = 1e-04;
agent.AgentOptions.ActorOptimizerOptions.GradientThreshold = 1;
I also use the Reinforcement Learning Toolbox and normalised all my variables in both cases.
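By "normalised" I just mean scaling each signal into a comparable range before it reaches the agent; a minimal sketch with made-up temperature limits:
% Min-max normalisation of one observation signal (the limits are made up)
Tmin = -20;                                      % lowest expected outside temperature in degC
Tmax =  40;                                      % highest expected outside temperature in degC
normalise = @(T) 2*(T - Tmin)/(Tmax - Tmin) - 1; % maps [Tmin, Tmax] onto [-1, 1]
obs = normalise(12.5);                           % e.g. 12.5 degC becomes about 0.08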
In general, everything works fine, but it drives me crazy that I can't understand why it's so much faster.
Maybe someone has an idea.
Accepted Answer
Venu on 13 Jan 2024
Based on the information you have provided, I can infer the following points:
- With warm-up experiences, the agent may explore the state and action space more efficiently.
- The learning rates for your critic and actor networks are set for small updates. With a good initial experience buffer, the updates may be more stable and require fewer adjustments, leading to faster convergence and less time spent on each gradient update step.
- You mentioned that 'agentOptions.ResetExperienceBufferBeforeTraining' is set to 'false'. If the buffer is not reset, the agent with warm-up starts with a full buffer of experiences, which can lead to more efficient sampling and less time waiting for the buffer to fill up (see the sketch below).
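To make the third point concrete, here is a minimal sketch of the two-stage workflow I mean, assuming an existing environment 'env' and DDPG agent 'agent'; the episode counts and step limits are placeholders, not your values:
% Stage 1: warm-up run whose main job is to fill the experience buffer.
% 'env' and 'agent' are assumed to exist already.
agent.AgentOptions.ResetExperienceBufferBeforeTraining = false;  % keep the buffer between train calls
warmupOpts = rlTrainingOptions('MaxEpisodes', 700, 'MaxStepsPerEpisode', 500);  % placeholder step limit
train(agent, env, warmupOpts);
% Stage 2: main training run. Because the buffer was not reset, it starts
% already filled, so mini-batch sampling can draw diverse experiences from
% episode 1 instead of first waiting for the buffer to fill.
mainOpts = rlTrainingOptions('MaxEpisodes', 3000, 'MaxStepsPerEpisode', 500);   % placeholder values
trainStats = train(agent, env, mainOpts);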