Constant output of reinforcement learning on an optimal control problem

I am using reinforcement learning for voltage control when renewable generation varies. However, after training, the agent keeps giving me a constant control action regardless of the input. Where could the problem be? I suspect something is wrong in the environment, in the reward function or the reset function.
My understanding is that training has converged to a single action that achieves the highest reward on average across all observations, so the trained agent gives a constant output. My objective, however, is for the trained agent to provide the optimal action for each observation. How should I fix this?
My code is quite simple; the key points are:
Observation = [power injections; uncertainties; voltages]; Action = [control injections] on selected buses.
Reset function: add random uncertainties within a range to the power injections, then run a power flow to get the voltages.
Step function: take this.State and the action, apply the action to change the power injections, then run a power flow to get the voltages and update the system state. (A simplified sketch of reset/step appears after the reward snippet below.)
Reward function: voltage improvement minus the control effort needed. It can be simplified as follows:
P_inj  = this.State(:,1);                % current power injections (column 1 of the state)
Ctr    = sum(P_inj(this.Ctr_bus) - this.PowerInjections(this.Ctr_bus));  % total control effort on the controlled buses
Vol    = this.State(:,3) - this.GoalVoltage;   % signed voltage deviation from the goal (column 3)
Reward = sum(Vol) - Ctr;                 % summed deviation minus control effort
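For completeness, here is a simplified sketch of how my reset and step are structured (I subclass rl.env.MATLABEnvironment; runPowerFlow, this.UncertaintyRange, and this.NumBus are placeholders for my actual power-flow solver and data):
function InitialObservation = reset(this)
    % Sample random uncertainty on the power injections (placeholder sampling)
    dP = this.UncertaintyRange .* (2*rand(this.NumBus,1) - 1);
    P  = this.PowerInjections + dP;      % perturbed injections
    V  = runPowerFlow(P);                % placeholder for my power-flow solver
    this.State = [P, dP, V];             % columns: injections, uncertainties, voltages
    InitialObservation = this.State(:);
end
function [Observation, Reward, IsDone, LoggedSignals] = step(this, Action)
    % Apply the control action on the selected buses, then re-solve the power flow
    P = this.State(:,1);
    P(this.Ctr_bus) = P(this.Ctr_bus) + Action;
    V = runPowerFlow(P);                 % placeholder for my power-flow solver
    this.State(:,1) = P;
    this.State(:,3) = V;
    % Reward as in the snippet above: summed voltage deviation minus control effort
    Ctr    = sum(P(this.Ctr_bus) - this.PowerInjections(this.Ctr_bus));
    Vol    = V - this.GoalVoltage;
    Reward = sum(Vol) - Ctr;
    Observation   = this.State(:);
    IsDone        = false;               % episode termination handled elsewhere
    LoggedSignals = [];
end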
Thank you for your time and help. I can provide more details if needed.

Answers (1)

Rishi on 27 Oct 2023
Hi Yu,
I understand from your query that you want to know why the agent keeps giving a constant control action, irrespective of the input.
Based on the information provided, the issue most likely lies in the reward function. If the agent consistently receives a high reward for a certain action, it will learn to stick to that action regardless of the input. Your current reward sums the signed voltage deviations and then subtracts the control effort, so a single constant action that pushes all voltages above the goal can score well on average even when it is far from optimal for any individual observation.
Here are a few things you can try to overcome this:
  • Make the reward depend on each individual observation, so that no single action can be optimal for every uncertainty sample.
  • Instead of summing the signed voltage differences, calculate the absolute difference for each bus and penalize the agent based on those per-bus differences (a sketch follows below).
  • Experiment with other reward-shaping techniques, for example weighting the voltage and control-effort terms differently.
Try iterating and experimenting with different penalties and rewards to obtain the best result.
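As an illustration (not a prescription), here is one way the reward from your snippet could be restated so that a constant action can no longer score well on average; the weights w_v and w_c are tuning assumptions, not prescribed values:
% Penalize the per-bus absolute voltage deviation instead of rewarding the signed sum,
% so the agent cannot gain reward by pushing all voltages above the goal.
Vol_dev = abs(this.State(:,3) - this.GoalVoltage);   % per-bus deviation from the goal voltage
Ctr_eff = this.State(this.Ctr_bus,1) - this.PowerInjections(this.Ctr_bus);   % control effort per bus
w_v = 10; w_c = 1;                                   % tuning weights (assumed values)
Reward  = -w_v*sum(Vol_dev) - w_c*sum(Ctr_eff.^2);   % largest when every bus sits near its goal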
You can also try the built-in penalty functions from Reinforcement Learning Toolbox, such as ‘exteriorPenalty’, ‘hyperbolicPenalty’, and ‘barrierPenalty’.
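For instance, ‘exteriorPenalty’ applies a penalty only when a signal leaves a specified range, which maps naturally onto a voltage-band requirement; the 0.95–1.05 p.u. band below is only an example:
% Quadratic penalty on voltages outside an example [0.95, 1.05] p.u. band
V = this.State(:,3);
voltagePenalty = sum(exteriorPenalty(V, 0.95, 1.05, 'quadratic'));
Reward = -voltagePenalty - Ctr;   % combined with the existing control-effort term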
You can also use the ‘generateRewardFunction’ function to generate reward functions automatically, for example from an MPC controller or from model verification blocks; see its documentation in Reinforcement Learning Toolbox for more details.
Hope this helps.
