I have designed a reinforcement learning (RL) environment using custom Reset and Step functions; the code is provided below. The Reset function reads two files, "G1 line data.xlsx" and "G1 load data.xlsx". Currently, I take only the first sample (row) from each file for the loads and power flows, using the commands loads = load_data(1, :)'; and powerFlows = line_data(1, :)';, respectively. As a result, the RL agent trains on that single sample and then stops.
I would like to modify my approach so that the RL agent can train on all the samples in the files "G1 line data.xlsx" and "G1 load data.xlsx."
Reset Function
function [initialObs, initialState] = myResetFunction()
mpc = loadcase('case118');
%loads = initial_results.bus(:, 3);
%powerFlows = initial_results.branch(:, 14);
line_data = xlsread("G1 line data.xlsx");
load_data = xlsread("G1 load data.xlsx");
% Use the first sample (row) of each file
loads = load_data(1, :)';
powerFlows = line_data(1, :)';
% Ensure the initial observation is a column vector
initialObs = [loads; powerFlows];
initialObs = reshape(initialObs, [], 1); % Ensure column vector with correct shape
initialState = initialObs; % Initialize or reset logged signals if needed
end
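One possible way to move past the first row, offered only as a sketch: keep a counter in a persistent variable and advance it on every reset, so that successive episodes start from successive samples. The function name myResetFunctionAllSamples and its two input arguments are hypothetical (the data is passed in rather than read inside the function, so the spreadsheets are loaded only once), and the sketch assumes both files hold one sample per row, in the same order.

```matlab
function [initialObs, initialState] = myResetFunctionAllSamples(load_data, line_data)
% Hypothetical variant of myResetFunction: each call returns the next row
% of the data and wraps around to the first row after the last one.
persistent sampleIdx
if isempty(sampleIdx)
    sampleIdx = 0; % No sample used yet
end
% Assumption: both matrices have one sample per row; use the common row count
numSamples = min(size(load_data, 1), size(line_data, 1));
sampleIdx = mod(sampleIdx, numSamples) + 1; % 1, 2, ..., numSamples, 1, ...
loads = load_data(sampleIdx, :)';
powerFlows = line_data(sampleIdx, :)';
initialObs = [loads; powerFlows]; % Column vector, same layout as before
initialState = initialObs;
end
```

If the environment is built with rlFunctionEnv (an assumption, since the environment construction is not shown here), the reset function must take no inputs, so the data could be bound with an anonymous function such as @() myResetFunctionAllSamples(load_data, line_data) after reading both files once with xlsread. Replacing the counter update with sampleIdx = randi(numSamples) would draw a random sample per episode instead of cycling through them.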
Step Function
function [nextObs, reward, isDone, nextstate] = myStepFunctionnew(action, nextstate)
% Load the case
mpc = loadcase('case118');
mpc.branch(:, 6) = 2 * 100 * ones(1, 186); % Setting line limits
% Initialize penalties
genPenalty = 0;
linePenalty = 0;
% Initial generation values (example initialization)
initialGen = [
37.72083507, 41.21935468, 38.62409283, 16.8478368, 200, ...
88.01527158, 15.75220711, 12.34566609, 11.40791109, 8.44E-09, ...
196.3249472, 270.6145549, 9.26E-09, 6.794891389, 1.14E-08, ...
21.74105571, 21.99954329, 13.90368324, 4.394980313, 18.79989818, ...
201.4793428, 47.31339019, 3.51E-08, 3.52E-08, 152.5197471, ...
157.3295995, 3.07E-08, 383.9069549, 385.811582, 505.3672237, ...
1.65E-08, 1.10E-08, 1.52E-08, 1.89E-08, 2.11E-08, 2.33E-08, ...
466.9307997, 2.37E-08, 3.914399662, 594.0434087, 2.37E-08, ...
2.37E-08, 2.37E-08, 2.39E-08, 246.6411074, 39.14938215, ...
2.38E-08, 2.38E-08, 2.38E-08, 2.38E-08, 35.23444391, ...
2.38E-08, 5.851888342, 2.60E-08
];
% Check if the action vector length matches the number of generators
if length(action) ~= length(initialGen)
    error('Action vector length must match the number of generators.');
end
% Define the bounds for new generation values
lowerBound = initialGen * 0.8; % 20% reduction
upperBound = initialGen * 1.2; % 20% increase
% Calculate new generation values based on action
newGen = initialGen .* (1 + action); % Adjust PG value
% Clamp newGen to ensure it's within the bounds
newGen = max(newGen, lowerBound); % Ensure not below lower bound
newGen = min(newGen, upperBound); % Ensure not above upper bound
% Normalize newGen to ensure its sum is 4242
currentSum = sum(newGen);
desiredSum = 4242;
if currentSum ~= 0 % Avoid division by zero
    scalingFactor = desiredSum / currentSum;
    newGen = newGen * scalingFactor; % Scale newGen
end
mpc.gen(:, 2) = newGen; % Update the generation value
% Debugging output: Check new generation values
%disp('New Generation Values:');
% Check if the generator is within the ±20% range of initial value
for i = 1:length(action)
    if newGen(i) < 0.8 * initialGen(i) || newGen(i) > 1.2 * initialGen(i)
        % High penalty for violating the generator constraint
        genPenalty = genPenalty + abs(newGen(i) - initialGen(i)); % Deviation of the violating generator from its initial value
    end
end
fprintf('genPenalty = %g\n', genPenalty);
% Run the DC power flow calculation
[results, success] = rundcpf(mpc);
% Extract observations
loads = results.bus(:, 3);
powerFlows = results.branch(:, 14);
nextObs = [loads; powerFlows];
nextObs = reshape(nextObs, [], 1); % Ensure column vector
% Power Flow Constraint Check
maxLineLimits = mpc.branch(:, 6);
for j = 1:length(powerFlows)
    if abs(powerFlows(j)) > maxLineLimits(j)
        % Calculate the line limit violation using absolute power flow
        linePenalty = linePenalty + (abs(powerFlows(j)) - maxLineLimits(j)); % Amount by which line limit is violated
    end
end
fprintf('genPenalty = %g\n', genPenalty);
fprintf('linePenalty = %g\n', linePenalty);
% Current cost calculation
currentCost = sum(results.gencost(:, 5) .* results.gen(:, 2).^2 + results.gencost(:, 6) .* results.gen(:, 2));
fprintf('currentCost = %g\n', currentCost);
% Compute the reward
if success == 1
    if genPenalty > 0 || linePenalty > 0
        % If there are generator or line constraint violations
        reward = -(100)*(genPenalty + linePenalty); % Reward for violations
        fprintf('reward is due to penalty %g\n', reward);
    else
        % If no constraints are violated, calculate the reward based on current cost
        reward = 1*10^4 - 0.01 * currentCost;
        fprintf('reward is actual reward %g\n', reward);
        fprintf('cost = %g\n', currentCost);
    end
else
    reward = -127460.046762613 * 10000; % High penalty for solution divergence
    fprintf('reward is due to divergence %g\n', reward);
end
% Set isDone to false as termination condition is removed
isDone = false; % Modify this as needed based on your logic
% Update the next state
nextstate = nextObs;
% Store the reward in the episode history
persistent rewardHistory;
if isempty(rewardHistory)
    rewardHistory = [];
end
rewardHistory(end + 1) = reward; % Append the current reward to history
end
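A related point that may matter once the samples vary: the Step function above reloads case118 on every call and never reads its nextstate input, so the loads used by rundcpf are the defaults of the case file rather than the sampled ones. A minimal sketch of one way to carry the sample into the power flow is to split the state back into its two parts and write the loads into column 3 of mpc.bus (the same column the Step function reads the loads from). The helper name splitState is hypothetical, and the sketch assumes the load file has one column per bus, in the same order as mpc.bus.

```matlab
function [loads, powerFlows] = splitState(state, numBuses)
% Hypothetical helper: undo the [loads; powerFlows] stacking used for the
% observation vector. numBuses is the number of load entries at the top.
state = state(:); % Accept row or column input
loads = state(1:numBuses);
powerFlows = state(numBuses + 1:end);
end

% Possible use inside the Step function, before calling rundcpf
% (assumption: one load value per row of mpc.bus):
%   [loads, ~] = splitState(nextstate, size(mpc.bus, 1));
%   mpc.bus(:, 3) = loads;
```

With this change the loads reported in results.bus(:, 3) equal the sampled ones, so the state passed on to the next step keeps the same sample for the rest of the episode.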