How to continue training a DQN agent in the reinforcement learning toolbox?

I have created a neural network and a DQN agent with the MATLAB Reinforcement Learning Toolbox, using the following code:
createEnvironment
createDQNetwork % Produces critic, criticOptions & GPU
createDQNOptions % Produces agentOptions
createDQNTrainingOptions % Produces trainOptions & parallel processing
agent = rlDQNAgent(critic,agentOptions); % Create the agent
validateEnvironment(env)
After this, I begin training the agent using the following code.
trainingResults = train(agent,env,trainOptions);
curDir = pwd;
saveDir = 'savedAgents';
cd(saveDir)
save(['trainedAgent' datestr(now,'mm_dd_yyyy_HHMM')],'agent','-v7.3');
% save(['trainedAgent' datestr(now,'mm_dd_yyyy_HHMM')],'agent','trainingResults','-v7.3');
cd(curDir)
The agent begins training successfully and I can observe it learning to control the system. Due to system memory constraints, I need to run the training process multiple times. When the first training run has finished, I simply run the following command again:
trainingResults = train(agent,env,trainOptions);
as I don't need to recreate the agent, network, environment, etc. from scratch. However, the agent's behaviour at the start of the second training run has clearly reverted to what it was when the agent was first created. How can I restart training while keeping the progress from the previous training session?
Edit: My system has 64 GB of RAM, and getting more isn't really an option.

Accepted Answer

Emmanouil Tzorakoleftherakis
Hi James,
It looks like the experience buffer is the culprit here. Have a look at this question for a suggestion. In short, you need to make sure you also save the experience buffer when you stop training, and that it is not cleared when training resumes. I would also recommend reducing the size of the experience buffer just enough to cut memory use and make it feasible to train in one go.
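As a minimal sketch of what that could look like (assuming a release around R2019b/R2020a where rlDQNAgentOptions exposes the SaveExperienceBufferWithAgent and ResetExperienceBufferBeforeTraining properties, and reusing the critic, env and trainOptions from the question):
% Sketch: keep the experience buffer across training sessions.
% Option names assume an R2019b/R2020a-era rlDQNAgentOptions; check your release.
agentOptions = rlDQNAgentOptions(...
    'ExperienceBufferLength',1e5,...              % shrink as far as memory allows
    'SaveExperienceBufferWithAgent',true,...      % store the buffer inside the saved agent
    'ResetExperienceBufferBeforeTraining',false); % reuse the buffer when train() runs again
agent = rlDQNAgent(critic,agentOptions);
% First session: train, then save the agent (the buffer is saved with it)
trainingResults = train(agent,env,trainOptions);
save('savedAgents/trainedAgent.mat','agent','-v7.3');
% Later session: reload and continue training with the buffer intact
load('savedAgents/trainedAgent.mat','agent');
trainingResults = train(agent,env,trainOptions);
With the buffer preserved and not reset, the second call to train picks up from the replay data collected in the first run rather than refilling an empty buffer.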
  2 Comments
James Norris on 31 Jan 2020
Hi Emmanouil,
Thanks for the response, this has helped a lot. In addition, for anyone else with this problem: the exploration factor randomises actions during the initial episodes of training, which can cause the agent to regress into bad habits. The exploration decay should therefore be scheduled across the whole training run, not reset each time training is restarted. A sketch of this is shown below.
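For example, something like the following before the second call to train (the epsilon values are placeholders, and modifying AgentOptions on an existing agent depends on your release; otherwise set these values in createDQNOptions before recreating the options object):
% Sketch: resume with low exploration so the second run does not restart
% with mostly random actions. The values below are illustrative only.
agent.AgentOptions.EpsilonGreedyExploration.Epsilon      = 0.05; % roughly where the first run left off
agent.AgentOptions.EpsilonGreedyExploration.EpsilonMin   = 0.01;
agent.AgentOptions.EpsilonGreedyExploration.EpsilonDecay = 1e-4; % spread the remaining decay over the next run
trainingResults = train(agent,env,trainOptions);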
Maedeh on 6 Feb 2020
Hi,
I have created a DDPG agent with the MATLAB Reinforcement Learning Toolbox inverted pendulum example.
How can I save the experience buffer so I can analyse it?


