Constant output of reinforcement learning on an optimal control problem

I am using reinforcement learning for voltage control when renewable generation varies. However, after training, the agent keeps giving me a constant control action regardless of the input. Where could the problem be? I suspect something is wrong in the environment, in the reward function or the reset function.
My understanding is that training has converged to a single action that achieves the highest reward on average across all observations, so the trained agent gives a constant output. My objective, however, is for the trained agent to provide the optimal action for each observation. How should I fix this?
My code is quite simple; the key points are:
Observation = [power injections; uncertainties; voltages]; Action = [control injections] on selected buses.
Reset function: add random uncertainties within a range to the power injections, then run a power flow to get the voltages.
Step function: take this.State and the action, apply the action to change the power injections, then run a power flow to get the voltages and update the system state. (A simplified sketch of reset/step appears after the reward snippet below.)
Reward function: voltage improvement minus the control effort needed. It can be simplified as follows:
P_inj  = this.State(:,1);                % current power injections (column 1 of the state)
Ctr    = sum(P_inj(this.Ctr_bus) - this.PowerInjections(this.Ctr_bus));  % total control effort on the controlled buses
Vol    = this.State(:,3) - this.GoalVoltage;   % signed voltage deviation from the goal (column 3)
Reward = sum(Vol) - Ctr;                 % summed deviation minus control effort
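For completeness, here is a simplified sketch of how my reset and step are structured (I subclass rl.env.MATLABEnvironment; runPowerFlow, this.UncertaintyRange, and this.NumBus are placeholders for my actual power-flow solver and data):
function InitialObservation = reset(this)
    % Sample random uncertainty on the power injections (placeholder sampling)
    dP = this.UncertaintyRange .* (2*rand(this.NumBus,1) - 1);
    P  = this.PowerInjections + dP;      % perturbed injections
    V  = runPowerFlow(P);                % placeholder for my power-flow solver
    this.State = [P, dP, V];             % columns: injections, uncertainties, voltages
    InitialObservation = this.State(:);
end
function [Observation, Reward, IsDone, LoggedSignals] = step(this, Action)
    % Apply the control action on the selected buses, then re-solve the power flow
    P = this.State(:,1);
    P(this.Ctr_bus) = P(this.Ctr_bus) + Action;
    V = runPowerFlow(P);                 % placeholder for my power-flow solver
    this.State(:,1) = P;
    this.State(:,3) = V;
    % Reward as in the snippet above: summed voltage deviation minus control effort
    Ctr    = sum(P(this.Ctr_bus) - this.PowerInjections(this.Ctr_bus));
    Vol    = V - this.GoalVoltage;
    Reward = sum(Vol) - Ctr;
    Observation   = this.State(:);
    IsDone        = false;               % episode termination handled elsewhere
    LoggedSignals = [];
end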
Thank you for your time and help. I can provide more details if needed.

Answers (1)

Rishi on 27 Oct 2023
Hi Yu,
I understand from your query that you want to know why the agent keeps giving a constant control action, irrespective of the input.
Based on the information provided, the issue most likely lies in the reward function. If the agent consistently receives a high reward for a certain action, it will learn to stick to that action regardless of the input. Your current reward sums the signed voltage deviations and then subtracts the control effort, so a single constant action that pushes all voltages above the goal can score well on average even when it is far from optimal for any individual observation.
Here are a few things you can try to overcome this:
  • Make the reward depend on each individual observation, so that no single action can be optimal for every uncertainty sample.
  • Instead of summing the signed voltage differences, calculate the absolute difference for each bus and penalize the agent based on those per-bus differences (a sketch follows below).
  • Experiment with other reward-shaping techniques, for example weighting the voltage and control-effort terms differently.
Try iterating and experimenting with different penalties and rewards to obtain the best result.
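As an illustration (not a prescription), here is one way the reward from your snippet could be restated so that a constant action can no longer score well on average; the weights w_v and w_c are tuning assumptions, not prescribed values:
% Penalize the per-bus absolute voltage deviation instead of rewarding the signed sum,
% so the agent cannot gain reward by pushing all voltages above the goal.
Vol_dev = abs(this.State(:,3) - this.GoalVoltage);   % per-bus deviation from the goal voltage
Ctr_eff = this.State(this.Ctr_bus,1) - this.PowerInjections(this.Ctr_bus);   % control effort per bus
w_v = 10; w_c = 1;                                   % tuning weights (assumed values)
Reward  = -w_v*sum(Vol_dev) - w_c*sum(Ctr_eff.^2);   % largest when every bus sits near its goal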
You can also try the built-in penalty functions from Reinforcement Learning Toolbox, such as ‘exteriorPenalty’, ‘hyperbolicPenalty’, and ‘barrierPenalty’.
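For instance, ‘exteriorPenalty’ applies a penalty only when a signal leaves a specified range, which maps naturally onto a voltage-band requirement; the 0.95–1.05 p.u. band below is only an example:
% Quadratic penalty on voltages outside an example [0.95, 1.05] p.u. band
V = this.State(:,3);
voltagePenalty = sum(exteriorPenalty(V, 0.95, 1.05, 'quadratic'));
Reward = -voltagePenalty - Ctr;   % combined with the existing control-effort term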
You can also use the ‘generateRewardFunction’ function to generate reward functions automatically, for example from an MPC controller or from model verification blocks; see its documentation in Reinforcement Learning Toolbox for more details.
Hope this helps.
