Implementing a PPO Agent
I am currently trying to control a custom Simulink environment with a PPO agent.
However, the error below keeps occurring and I have not been able to get past it.
How can I fix this?
Error in rl.representation.rlStochasticActorRepresentation (line 32)
Number of outputs for a continuous stochastic actor representation must be two times the number of actions.
Error in rlStochasticActorRepresentation (line 139)
Rep = rl.representation.rlStochasticActorRepresentation(...
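For reference, the message says that a continuous Gaussian actor network must output 2*numAct values: a mean and a standard deviation for each action. Below is a minimal sketch of a network with that output shape, with hypothetical layer names and sizes, assuming a release whose rlStochasticActorRepresentation supports continuous action specs:
% Sketch of a continuous Gaussian actor: a shared path splits into a mean
% path and a nonnegative standard-deviation path, concatenated into a
% single output of size 2*numAct. Names and sizes are hypothetical.
numObs = 15; numAct = 5;
inPath = [imageInputLayer([numObs 1 1],'Normalization','none','Name','observations')
    fullyConnectedLayer(400,'Name','ActorFC1')
    reluLayer('Name','ActorRelu1')
    fullyConnectedLayer(numAct,'Name','ActorFCOut')];
meanPath = [tanhLayer('Name','tanhMean')
    scalingLayer('Name','scaleMean','Scale',10)];  % map tanh output to [-10,10]
stdPath = softplusLayer('Name','softplusStd');     % keeps standard deviations positive
concatPath = concatenationLayer(3,2,'Name','meanAndStd');  % output size = 2*numAct
actorNetwork = layerGraph(inPath);
actorNetwork = addLayers(actorNetwork,meanPath);
actorNetwork = addLayers(actorNetwork,stdPath);
actorNetwork = addLayers(actorNetwork,concatPath);
actorNetwork = connectLayers(actorNetwork,'ActorFCOut','tanhMean');
actorNetwork = connectLayers(actorNetwork,'ActorFCOut','softplusStd');
actorNetwork = connectLayers(actorNetwork,'scaleMean','meanAndStd/in1');
actorNetwork = connectLayers(actorNetwork,'softplusStd','meanAndStd/in2');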
My code:
clear all
motion_time_constant = 0.01;
mdl = 'fivelinkrl';
open_system(mdl)
Ts = 0.05;
Tf = 20;
agentblk = [mdl '/RL Agent'];
numObs = 15;
obsInfo = rlNumericSpec([numObs 1]);
obsInfo.Name = 'observations';
numAct = 5;
actInfo = rlNumericSpec([numAct 1],'LowerLimit',-10,'UpperLimit',10);
actInfo.Name = 'Action';
% define environment
env = rlSimulinkEnv(mdl,agentblk,obsInfo,actInfo);
% Create PPO agent
criticLayerSizes = [400 300];
actorLayerSizes = [400 300];
createNetworkWeights;
criticNetwork = [imageInputLayer([numObs 1 1],'Normalization','none','Name','observations')
    fullyConnectedLayer(criticLayerSizes(1),'Name','CriticFC1', ...
        'Weights',weights.criticFC1, ...
        'Bias',bias.criticFC1)
    reluLayer('Name','CriticRelu1')
    fullyConnectedLayer(criticLayerSizes(2),'Name','CriticFC2', ...
        'Weights',weights.criticFC2, ...
        'Bias',bias.criticFC2)
    reluLayer('Name','CriticRelu2')
    fullyConnectedLayer(1,'Name','CriticOutput', ...
        'Weights',weights.criticOut, ...
        'Bias',bias.criticOut)];
criticOpts = rlRepresentationOptions('LearnRate',1e-3);
critic = rlValueRepresentation(criticNetwork,env.getObservationInfo, ...
    'Observation',{'observations'},criticOpts);
actorNetwork = [imageInputLayer([numObs 1 1],'Normalization','none','Name','observations')
    fullyConnectedLayer(actorLayerSizes(1),'Name','ActorFC1', ...
        'Weights',weights.actorFC1, ...
        'Bias',bias.actorFC1)
    reluLayer('Name','ActorRelu1')
    fullyConnectedLayer(actorLayerSizes(2),'Name','ActorFC2', ...
        'Weights',weights.actorFC2, ...
        'Bias',bias.actorFC2)
    reluLayer('Name','ActorRelu2')
    fullyConnectedLayer(numAct,'Name','Action', ...
        'Weights',weights.actorOut, ...
        'Bias',bias.actorOut)
    softmaxLayer('Name','actionProbability')];
actorOptions = rlRepresentationOptions('LearnRate',1e-3);
%%%% ↓ error %%%%%%%%%%%%%%%%%
actor = rlStochasticActorRepresentation(actorNetwork,obsInfo,actInfo, ...
    'Observation',{'observations'},actorOptions);
%%%% ↑ error %%%%%%%%%%%%%%%%%
opt = rlPPOAgentOptions('ExperienceHorizon',512, ...
    'ClipFactor',0.2, ...
    'EntropyLossWeight',0.02, ...
    'MiniBatchSize',64, ...
    'NumEpoch',3, ...
    'AdvantageEstimateMethod','gae', ...
    'GAEFactor',0.95, ...
    'SampleTime',0.05, ...
    'DiscountFactor',0.9995);
agent = rlPPOAgent(actor,critic,opt);
% Train agent
maxEpisodes = 4000;
maxSteps = floor(Tf/Ts);
trainOpts = rlTrainingOptions( ...
    'MaxEpisodes',maxEpisodes, ...
    'MaxStepsPerEpisode',maxSteps, ...
    'ScoreAveragingWindowLength',250, ...
    'Verbose',false, ...
    'Plots','training-progress', ...
    'StopTrainingCriteria','EpisodeCount', ...
    'StopTrainingValue',maxEpisodes, ...
    'SaveAgentCriteria','EpisodeCount', ...
    'SaveAgentValue',maxEpisodes);
trainingStats = train(agent,env,trainOpts);
save('agent.mat', 'agent')
Code to simulate the result:
simOptions = rlSimulationOptions('MaxSteps',maxSteps);
experience = sim(env,agent,simOptions);
Answers (1)
Toshinobu Shintai
on 11 Sep 2020
The actions of a PPO agent must be discrete, so define actInfo as, for example:
actInfo = rlFiniteSetSpec({[-1, -1, -1], [1, 1, 1]});
In the case above, numAct becomes 2: numAct is the number of action patterns.
The output dimension of the controller is given by the dimension of the vectors above, such as [-1, -1, -1]. The PPO agent then selects and outputs either [-1, -1, -1] or [1, 1, 1] according to the current observation.
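As a minimal sketch (hypothetical layer sizes; this is not the attached file), a discrete stochastic actor against such a finite action set could be wired like this:
% Sketch of a discrete stochastic actor for the rlFiniteSetSpec above.
% The last fully connected layer has one output per action pattern,
% followed by a softmax over those patterns. Layer sizes are hypothetical.
actInfo = rlFiniteSetSpec({[-1, -1, -1], [1, 1, 1]});
numAct = numel(actInfo.Elements);  % 2 action patterns
actorNetwork = [imageInputLayer([numObs 1 1],'Normalization','none','Name','observations')
    fullyConnectedLayer(64,'Name','ActorFC1')
    reluLayer('Name','ActorRelu1')
    fullyConnectedLayer(numAct,'Name','Action')
    softmaxLayer('Name','actionProbability')];
actor = rlStochasticActorRepresentation(actorNetwork,obsInfo,actInfo, ...
    'Observation',{'observations'},rlRepresentationOptions('LearnRate',1e-3));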
I have attached the corrected code, so please check it.