
DDPG agent does not learn

6 views (last 30 days)
dani ansari on 22 Jul 2023
Answered: awcii on 24 Jul 2023
Hi, I'm using a DDPG algorithm to tune the gains of a PD-like (transpose Jacobian) controller. The gains need to lie between 0.00001 and 0.01, and based on this range I set my exploration noise so that variance*sqrt(sample time) equals 10% of the range.
However, my agent does not learn: the episode reward only shows occasional peaks and then falls back to the minimum again, and I don't know why this is happening.
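For reference, a minimal sketch of that noise calculation (assuming the gain range [0.00001, 0.01] and the 0.001 s sample time used below):
% Worked example of the rule: StandardDeviation*sqrt(SampleTime) = 10% of action range
Ts = 0.001;                        % agent sample time
lowerGain = 0.00001;               % lower gain limit
upperGain = 0.01;                  % upper gain limit
gainRange = upperGain - lowerGain; % ~0.00999
sigma = 0.1*gainRange/sqrt(Ts)     % ~0.0316, close to the 0.03 used in NoiseOptions below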
My actor and critic networks are constructed as follows:
%% critic
% Two-input critic: a state path and an action path merged by an addition layer
statepath = [featureInputLayer(numObs, Name = 'stateinp')
    fullyConnectedLayer(96, Name = 'stateFC1')
    reluLayer
    fullyConnectedLayer(74, Name = 'stateFC2')
    reluLayer
    fullyConnectedLayer(36, Name = 'stateFC3')];
actionpath = [featureInputLayer(numAct, Name = 'actinp')
    fullyConnectedLayer(72, Name = 'actFC1')
    reluLayer
    fullyConnectedLayer(36, Name = 'actFC2')];
commonpath = [additionLayer(2, Name = 'add')
    fullyConnectedLayer(96, Name = 'FC1')
    reluLayer
    fullyConnectedLayer(72, Name = 'FC2')
    reluLayer
    fullyConnectedLayer(24, Name = 'FC3')
    reluLayer
    fullyConnectedLayer(1, Name = 'output')];
critic_network = layerGraph();
critic_network = addLayers(critic_network, actionpath);
critic_network = addLayers(critic_network, statepath);
critic_network = addLayers(critic_network, commonpath);
critic_network = connectLayers(critic_network, 'actFC2', 'add/in1');
critic_network = connectLayers(critic_network, 'stateFC3', 'add/in2');
plot(critic_network)
critic = dlnetwork(critic_network);
criticOptions = rlOptimizerOptions('LearnRate', 1e-03, 'GradientThreshold', 1);
critic = rlQValueFunction(critic, obsInfo, actInfo, ...
    'ObservationInputNames', 'stateinp', 'ActionInputNames', 'actinp');
%% actor
actorNetwork = [featureInputLayer(numObs, Name = 'observation')
    fullyConnectedLayer(72, Name = 'actorFC1')
    reluLayer
    fullyConnectedLayer(48, Name = 'actorFc2')
    reluLayer
    fullyConnectedLayer(36, Name = 'actorFc3')
    reluLayer
    fullyConnectedLayer(numAct, Name = 'output')
    tanhLayer
    scalingLayer(Name = 'actorscaling', Scale = max(actInfo.UpperLimit))];
actorNetwork = dlnetwork(actorNetwork);
actorOptions = rlOptimizerOptions('LearnRate', 5e-04, 'GradientThreshold', 1);
actor = rlContinuousDeterministicActor(actorNetwork, obsInfo, actInfo);
%% agent
agentOptions = rlDDPGAgentOptions(...
    'SampleTime', 0.001, ...
    'ActorOptimizerOptions', actorOptions, ...
    'CriticOptimizerOptions', criticOptions, ...
    'ExperienceBufferLength', 1e6, ...
    'MiniBatchSize', 128);
agentOptions.NoiseOptions.StandardDeviation = 0.03;
agentOptions.NoiseOptions.StandardDeviationDecayRate = 1e-5;
agent_MTJ_rl_mobilemanipualtor9 = rlDDPGAgent(actor, critic, agentOptions);
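As a note on the final actor layer: tanh outputs values in [-1, 1], so scalingLayer with Scale = max(actInfo.UpperLimit) maps the action to [-0.01, 0.01] rather than to [0.00001, 0.01]. A minimal sketch of a scaling layer that maps the tanh output onto the asymmetric gain range instead (assuming scalar limits stored in actInfo):
% Hypothetical alternative for the last actor layer: map [-1, 1] onto [LowerLimit, UpperLimit]
UL = max(actInfo.UpperLimit);   % e.g. 0.01
LL = min(actInfo.LowerLimit);   % e.g. 0.00001
asymScaling = scalingLayer(Name = 'actorscaling', ...
    Scale = (UL - LL)/2, ...    % half-width of the gain range
    Bias  = (UL + LL)/2);       % midpoint of the gain range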

Answers (2)

Mrutyunjaya Hiremath on 23 Jul 2023
Check this:
% Define the observation and action space
numObs = 4; % Replace with the actual number of observation features
numAct = 2; % Replace with the actual number of action dimensions
% Create the actor network
actorNetwork = [
featureInputLayer(numObs, 'Name', 'observation')
fullyConnectedLayer(72, 'Name', 'actorFC1')
reluLayer
fullyConnectedLayer(48, 'Name', 'actorFC2')
reluLayer
fullyConnectedLayer(36, 'Name', 'actorFC3')
reluLayer
fullyConnectedLayer(numAct, 'Name', 'output')
tanhLayer
scalingLayer('Name', 'actorscaling', 'Scale', max(actInfo.UpperLimit))
];
% Create the critic network
statePath = [
featureInputLayer(numObs, 'Name', 'stateinp')
fullyConnectedLayer(96, 'Name', 'stateFC1')
reluLayer
fullyConnectedLayer(74, 'Name', 'stateFC2')
reluLayer
fullyConnectedLayer(36, 'Name', 'stateFC3')
];
actionPath = [
featureInputLayer(numAct, 'Name', 'actinp')
fullyConnectedLayer(72, 'Name', 'actFC1')
reluLayer
fullyConnectedLayer(36, 'Name', 'actFC2')
];
commonPath = [
additionLayer(2, 'Name', 'add')
fullyConnectedLayer(96, 'Name', 'FC1')
reluLayer
fullyConnectedLayer(72, 'Name', 'FC2')
reluLayer
fullyConnectedLayer(24, 'Name', 'FC3')
reluLayer
fullyConnectedLayer(1, 'Name', 'output')
];
criticNetwork = layerGraph();
criticNetwork = addLayers(criticNetwork, statePath);
criticNetwork = addLayers(criticNetwork, actionPath);
criticNetwork = addLayers(criticNetwork, commonPath);
criticNetwork = connectLayers(criticNetwork, 'actFC2', 'add/in1');
criticNetwork = connectLayers(criticNetwork, 'stateFC3', 'add/in2');
% Create the actor and critic options
actorOptions = rlRepresentationOptions('LearnRate', 5e-4, 'GradientThreshold', 1);
criticOptions = rlRepresentationOptions('LearnRate', 1e-3, 'GradientThreshold', 1);
% Create the actor and critic representations
actor = rlDeterministicActorRepresentation(actorNetwork, obsInfo, actInfo, ...
    'Observation', {'observation'}, 'Action', {'actorscaling'}, actorOptions);
critic = rlQValueRepresentation(criticNetwork, obsInfo, actInfo, ...
    'Observation', {'stateinp'}, 'Action', {'actinp'}, criticOptions);
% Create the DDPG agent
agentOptions = rlDDPGAgentOptions(...
    'SampleTime', 0.001, ...
    'ExperienceBufferLength', 1e6, ...
    'MiniBatchSize', 128);
agentOptions.NoiseOptions.StandardDeviation = 0.03;
agentOptions.NoiseOptions.StandardDeviationDecayRate = 1e-5;
agent = rlDDPGAgent(actor, critic, agentOptions);
% Train the agent
trainOpts = rlTrainingOptions(...
    'MaxEpisodes', 1000, ...
    'MaxStepsPerEpisode', 1000, ...
    'ScoreAveragingWindowLength', 5, ...
    'Plots', 'training-progress');
trainingStats = train(agent, env, trainOpts);
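The script above assumes obsInfo, actInfo, and env already exist in the workspace. A minimal sketch of how they might be defined for a Simulink model (the model and block names here are placeholders, not from the original post):
% Hypothetical environment setup; adjust names and limits to your own model
obsInfo = rlNumericSpec([numObs 1]);
actInfo = rlNumericSpec([numAct 1], ...
    'LowerLimit', 0.00001, 'UpperLimit', 0.01);   % gain range from the question
mdl = 'myRobotModel';                             % placeholder Simulink model name
agentBlk = [mdl '/RL Agent'];                     % placeholder agent block path
env = rlSimulinkEnv(mdl, agentBlk, obsInfo, actInfo);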

awcii on 24 Jul 2023
.
