RL Agent Training for multiple training samples

I have designed a reinforcement learning (RL) environment using custom Reset and Step functions; the code is provided below. In the Reset function I read two files, "G1 line data.xlsx" and "G1 load data.xlsx", and currently take only the first sample (row) from each for the loads and line power flows, using loads = load_data(1, :)'; and powerFlows = line_data(1, :)';. As a result, the RL agent trains on that single sample and then stops.
I would like to modify my approach so that the RL agent can train on all the samples in "G1 line data.xlsx" and "G1 load data.xlsx".
Reset Function
function [initialObs, initialState] = myResetFunction()
    mpc = loadcase('case118');
    %mpc.branch(:, 6) = 2 * 100 * ones(1, 186);
    %initial_results = rundcopf(mpc)
    %loads = initial_results.bus(:, 3);
    %powerFlows = initial_results.branch(:, 14);
    line_data = xlsread("G1 line data.xlsx");
    load_data = xlsread("G1 load data.xlsx");
    loads = load_data(1, :)';
    powerFlows = line_data(1, :)';
    % Ensure the initial observation is a column vector
    initialObs = [loads; powerFlows];
    initialObs = reshape(initialObs, [], 1); % Ensure column vector with correct shape
    initialState = initialObs; % Initialize or reset logged signals if needed
end
Step Function
function [nextObs, reward, isDone, nextstate] = myStepFunctionnew(action, nextstate)
    % Load the case
    mpc = loadcase('case118');
    mpc.branch(:, 6) = 2 * 100 * ones(1, 186); % Setting line limits

    % Initialize penalties
    genPenalty = 0;
    linePenalty = 0;

    % Initial generation values (example initialization)
    initialGen = [
        37.72083507, 41.21935468, 38.62409283, 16.8478368, 200, ...
        88.01527158, 15.75220711, 12.34566609, 11.40791109, 8.44E-09, ...
        196.3249472, 270.6145549, 9.26E-09, 6.794891389, 1.14E-08, ...
        21.74105571, 21.99954329, 13.90368324, 4.394980313, 18.79989818, ...
        201.4793428, 47.31339019, 3.51E-08, 3.52E-08, 152.5197471, ...
        157.3295995, 3.07E-08, 383.9069549, 385.811582, 505.3672237, ...
        1.65E-08, 1.10E-08, 1.52E-08, 1.89E-08, 2.11E-08, 2.33E-08, ...
        466.9307997, 2.37E-08, 3.914399662, 594.0434087, 2.37E-08, ...
        2.37E-08, 2.37E-08, 2.39E-08, 246.6411074, 39.14938215, ...
        2.38E-08, 2.38E-08, 2.38E-08, 2.38E-08, 35.23444391, ...
        2.38E-08, 5.851888342, 2.60E-08
    ]';

    % Check if the action vector length matches the number of generators
    if length(action) ~= length(initialGen)
        error('Action vector length must match the number of generators.');
    end

    % Define the bounds for new generation values
    lowerBound = initialGen * 0.8; % 20% reduction
    upperBound = initialGen * 1.2; % 20% increase

    % Calculate new generation values based on action
    newGen = initialGen .* (1 + action); % Adjust PG value

    % Clamp newGen to ensure it's within the bounds
    newGen = max(newGen, lowerBound); % Ensure not below lower bound
    newGen = min(newGen, upperBound); % Ensure not above upper bound

    % Normalize newGen to ensure its sum is 4242
    currentSum = sum(newGen);
    desiredSum = 4242;
    if currentSum ~= 0 % Avoid division by zero
        scalingFactor = desiredSum / currentSum;
        newGen = newGen * scalingFactor; % Scale newGen
    end
    mpc.gen(:, 2) = newGen; % Update the generation value

    % Debugging output: Check new generation values
    %disp('New Generation Values:');
    %disp(newGen);

    % Check if the generator is within the ±20% range of initial value
    for i = 1:length(action)
        if newGen(i) < 0.8 * initialGen(i) || newGen(i) > 1.2 * initialGen(i)
            % High penalty for violating the generator constraint
            genPenalty = genPenalty + abs(newGen(i) - initialGen(i)); % Amount by which generator constraint is violated
        end
    end
    fprintf('genPenalty = %d\n', genPenalty)

    % Run the DC power flow calculation
    [results, success] = rundcpf(mpc);

    % Extract observations
    loads = results.bus(:, 3);
    powerFlows = results.branch(:, 14);
    nextObs = [loads; powerFlows];
    nextObs = reshape(nextObs, [], 1); % Ensure column vector

    % Power Flow Constraint Check
    maxLineLimits = mpc.branch(:, 6);
    for j = 1:length(powerFlows)
        if abs(powerFlows(j)) > maxLineLimits(j)
            % Calculate the line limit violation using absolute power flow
            linePenalty = linePenalty + (abs(powerFlows(j)) - maxLineLimits(j)); % Amount by which line limit is violated
        end
    end
    fprintf('genPenalty = %d\n', genPenalty)
    fprintf('linePenalty = %d\n', linePenalty)

    % Current cost calculation
    currentCost = sum(results.gencost(:, 5) .* results.gen(:, 2).^2 + results.gencost(:, 6) .* results.gen(:, 2));
    fprintf('currentCost = %d\n', currentCost)

    % Compute the reward
    if success == 1
        if genPenalty > 0 || linePenalty > 0
            % If there are generator or line constraint violations
            reward = -(100)*(genPenalty + linePenalty);
            fprintf('reward is due to penalty %d\n', reward); % Reward for violations
        else
            % If no constraints are violated
            reward = 1*10^4 - 0.01 * currentCost;
            fprintf('reward is actual reward %d\n', reward);
            fprintf('cost = %d\n', currentCost); % Reward is based on current cost
        end
    else
        reward = -127460.046762613 * 10000; % High penalty for solution divergence
        fprintf('reward is due to divergence %d\n', reward);
    end

    % Set isDone to false as the termination condition is removed
    isDone = false; % Modify this as needed based on your logic

    % Update the next state
    nextstate = nextObs;

    % Store the reward in the episode history
    persistent rewardHistory;
    if isempty(rewardHistory)
        rewardHistory = [];
    end
    rewardHistory(end + 1) = reward; % Append the current reward to history
end

Answers (1)

Aravind on 30 Oct 2024
I understand that you want your reinforcement learning agent to start training from different initial conditions each time, based on the conditions listed in your Excel files. From your code, it looks like you are currently reading the Excel files but only selecting the first entry. To vary the initial conditions, you can simply use a different index instead of always selecting the first one.
Here are a couple of strategies you can use to select the index for sampling:
  1. Sample Index Tracking: Implement a system to keep track of which sample the agent is currently using. You can achieve this with a variable that persists across episodes, ensuring that each new episode uses the next sample from the dataset. If the index surpasses the number of available samples, you can reset it to start from the beginning.
  2. Random Index: At the start of each episode, randomly select the sample index. This method removes the need for global or persistent variables and introduces variability into the training process. By randomly picking a starting sample, the agent is exposed to a wider range of initial conditions over time, which can enhance its ability to generalize and adapt to new situations.
By applying these methods, your RL agent will be able to iterate through all samples in your dataset, allowing for comprehensive training. Random initialization can enhance the robustness and adaptability of your RL agent, particularly when dealing with large and diverse datasets.
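As an illustration of the second strategy, here is a minimal sketch of a randomized reset function. It assumes the data files have the same layout as in your current reset (one sample per row, loads in "G1 load data.xlsx" and line flows in "G1 line data.xlsx"); the function name myResetFunctionRandom and the persistent caching of the spreadsheets are only placeholders you can adapt.
function [initialObs, initialState] = myResetFunctionRandom()
    % Cache the spreadsheets in persistent variables (optional) so they are
    % not re-read at the start of every episode.
    persistent line_data load_data;
    if isempty(line_data) || isempty(load_data)
        line_data = xlsread("G1 line data.xlsx");
        load_data = xlsread("G1 load data.xlsx");
    end

    % Randomly select one sample (row) for this episode
    sampleIndex = randi(size(load_data, 1));
    loads = load_data(sampleIndex, :)';
    powerFlows = line_data(sampleIndex, :)';

    % Build the initial observation as a column vector, as in the original reset
    initialObs = [loads; powerFlows];
    initialObs = reshape(initialObs, [], 1);
    initialState = initialObs;
end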
Here are some resources you might find helpful for implementing these strategies for selecting the initial condition in the reset function:
  1. Persistent variables in MATLAB: https://in.mathworks.com/help/matlab/ref/persistent.html
  2. Global variables in MATLAB: https://www.mathworks.com/help/matlab/ref/global.html
  3. "randi" function for generating random integers: https://www.mathworks.com/help/matlab/ref/randi.html
  4. Example of using the "randi" function in MATLAB: https://www.mathworks.com/help/matlab/math/random-integers.html
I hope this helps!
  1 Comment
Praveen Verma on 30 Oct 2024
@Aravind Thanks for the response! If I change the reset function as shown below, will it work? I'm also a bit confused about how the agent will know when to move from the current sample to the next one (currentSampleIndex + 1).
function [initialObs, initialState] = myResetFunction()
    persistent currentSampleIndex; % Persistent variable to track the current sample index
    if isempty(currentSampleIndex)
        currentSampleIndex = 1; % Initialize on the first call
    end
    mpc = loadcase('case118');
    % Load data from Excel files
    line_data = xlsread("G1 line data.xlsx");
    load_data = xlsread("G1 load data.xlsx");
    % Use currentSampleIndex to select loads and power flows
    loads = load_data(currentSampleIndex, :)';
    powerFlows = line_data(currentSampleIndex, :)';
    % Ensure the initial observation is a column vector
    initialObs = [loads; powerFlows];
    initialObs = reshape(initialObs, [], 1); % Ensure column vector with correct shape
    initialState = initialObs; % Initialize or reset logged signals if needed
    % Display the sample index being used
    fprintf('Using sample index: %d\n', currentSampleIndex);
    % Update the sample index for the next call
    currentSampleIndex = currentSampleIndex + 1; % Increment to the next index
    if currentSampleIndex > size(load_data, 1)
        currentSampleIndex = 1; % Reset to the first sample if at the end
    end
end
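On the question of when the index moves forward: if the environment is created with rlFunctionEnv, the Reinforcement Learning Toolbox calls the reset function handle automatically at the start of every training episode (and whenever reset(env) is called), so the persistent counter advances to the next sample once per episode with no extra logic. A minimal wiring sketch is shown below; the observation and action sizes are only examples inferred from the code above (118 bus loads + 186 line flows, 54 generators), and agent and trainOpts are assumed to be defined elsewhere.
% Observation and action specifications -- example sizes inferred from case118:
% 118 bus loads + 186 line flows = 304 observations, 54 generator actions.
obsInfo = rlNumericSpec([304 1]);
actInfo = rlNumericSpec([54 1]);

% The reset handle runs once at the start of each episode (this is when
% currentSampleIndex advances); the step handle runs on every time step.
env = rlFunctionEnv(obsInfo, actInfo, @myStepFunctionnew, @myResetFunction);

% Calling reset(env) or training the agent triggers the reset function:
% trainingStats = train(agent, env, trainOpts); % agent and trainOpts defined elsewhere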

