With the help of Google, I was able to put together the code below for predicting variables. The code runs fine:
file=csvread('data.csv'); % the first column is a date column, so it can be deleted
data=cell(size(file,2),1); % one cell per column (variable) of the file
for i=1:size(data,1)
data{i}=file(:,i).'; % each cell holds one variable's time series as a row vector
end
end
%Partition the data into training and test sets. Use 90% of the observations for training and the remainder for testing.
numObservations = numel(data);
idxTrain = 1:floor(0.9*numObservations);
idxTest = floor(0.9*numObservations)+1:numObservations;
dataTrain = data(idxTrain);
dataTest = data(idxTest);
%%Prepare Data for Training
for n = 1:numel(dataTrain)
X = dataTrain{n};
XTrain{n} = X(:,1:end-1);
TTrain{n} = X(:,2:end);
end
%% Normalize the data
muX = mean(cat(2,XTrain{:}),2); % computes the mean; why does the code calculate one mean over the whole population?
sigmaX = std(cat(2,XTrain{:}),0,2);
muT = mean(cat(2,TTrain{:}),2);
sigmaT = std(cat(2,TTrain{:}),0,2);
for n = 1:numel(XTrain)
XTrain{n} = (XTrain{n} - muX) ./ sigmaX;
TTrain{n} = (TTrain{n} - muT) ./ sigmaT;
end
%Define LSTM Network Architecture
numChannels = size(data{1},1)
layers = [
sequenceInputLayer(numChannels)
lstmLayer(128)
fullyConnectedLayer(numChannels)
regressionLayer];
%%Specify Training Options
maxEpochs = 100; % not sure what this value means; I added it after searching Google
miniBatchSize = 27; % not sure what the mini-batch size is; what value would be suitable for my dataset above?
%setting option
options = trainingOptions('adam', ...
'ExecutionEnvironment','cpu', ...
'MaxEpochs',maxEpochs, ...
'MiniBatchSize',miniBatchSize, ...
'GradientThreshold',1, ...
'Verbose',false, ...
'Plots','training-progress');
%%Train Neural Network
net = trainNetwork(XTrain,TTrain,layers,options); % is this using the mean of all the training data? I do not think I need that
%Test Network
for n = 1:size(dataTest,1)
X = dataTest{n};
XTest{n} = (X(:,1:end-1) - muX) ./ sigmaX;
TTest{n} = (X(:,2:end) - muT) ./ sigmaT;
end
%%Make predictions using the test data.
YTest = predict(net,XTest)%,SequencePaddingDirection="left");
%%To evaluate the accuracy, for each test sequence, calculate the root mean squared error (RMSE) between the predictions and the target.
for i = 1:size(YTest,1)
rmse(i) = sqrt(mean((YTest{i} - TTest{i}).^2,"all"));
end
figure
histogram(rmse)
xlabel("RMSE")
ylabel("Frequency")
mean(rmse)
%%Forecast Future Time Steps
idx = 2;
X = XTest{idx};
T = TTest{idx};
figure
%stackedplot(X',DisplayLabels="Channel " + (1:numChannels))
stackedplot(X')
xlabel("Time Step")
title("Test Observation " + idx)
%%Open Loop Forecasting
net = resetState(net);
offset = 1;
[net,~] = predictAndUpdateState(net,X(:,1:offset));
%%To forecast further predictions, loop over time steps and update the network state using the predictAndUpdateState function
numTimeSteps = size(X,2); % why is 2 used here for numTimeSteps? My dataset is monthly
numPredictionTimeSteps = numTimeSteps - offset;
Y = zeros(numChannels,numPredictionTimeSteps);
for t = 1:numPredictionTimeSteps
Xt = X(:,offset+t);
[net,Y(:,t)] = predictAndUpdateState(net,Xt);
end
%%Compare the predictions with the target values.
figure
%t = tiledlayout(numChannels,1); % tiledlayout version; with 2019b I use subplot instead
for i = 1:numChannels
subplot(numChannels,1,i) % one axes per channel, so channels are not drawn on top of each other
plot(T(i,:))
hold on
plot(offset:numTimeSteps,[T(i,offset) Y(i,:)],'--')
ylabel("Channel " + i)
if i == 1
title("Open Loop Forecasting")
end
end
xlabel("Time Step")
legend(["Input" "Forecasted"])
Problems with the above code:
(1) I want to predict my first three columns, which depend on the remaining 41 columns. I want to predict them one at a time and plot actual vs. predicted values for each. I am not sure whether the above code does this.
(2) Just after the training data is prepared in the above code, why are the mean and standard deviation calculated over all of the training data and then used for the testing data as well? The training and testing datasets consist of my 41 independent variables. Each variable has a different meaning, so a single mean or standard deviation across all of them does not make sense to me. Would a separate mean and standard deviation for each variable be more suitable for an ANN (according to my understanding)?
(3) This is my intended workflow, step by step: split the data into 90% training and 10% testing, train the ANN, validate and select the best neural network, and then use that network to predict the last 2 values of the first 3 columns in the data.csv file.
(4) Finally, I would like the experts on this platform to verify whether my code performs the above tasks correctly.
(5) After training on the dataset of 45 variables, can I predict variable 1 of the 3, and then variables 2 and 3? Will the results from the trained model be reliable for predicting an individual variable? This is very confusing to me. Please clarify.
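To make questions (1) and (2) concrete, here is a minimal sketch of one possible data layout in which the 3 target columns and the 41 predictor columns are kept separate and every variable gets its own mean and standard deviation. It is only an illustration under stated assumptions: the random matrix stands in for data.csv, the sizes (120 rows, 44 columns) are made up, and `YTestN` is a placeholder for a network's output, not a real prediction.

```matlab
% Hypothetical stand-in for data.csv (sizes are assumptions, not the real file):
% 120 monthly rows, columns 1-3 are the targets, columns 4-44 the predictors.
rng(0);
file = randn(120,44);

% One multichannel sequence: rows are variables (channels), columns are time steps.
targets    = file(:,1:3).';      % 3-by-120
predictors = file(:,4:end).';    % 41-by-120

% Split along time, so the last 10% of the months are held out for testing.
numTimeSteps = size(file,1);
numTrain = floor(0.9*numTimeSteps);
XTrain = predictors(:,1:numTrain);
TTrain = targets(:,1:numTrain);
XTest  = predictors(:,numTrain+1:end);
TTest  = targets(:,numTrain+1:end);

% Statistics along dimension 2 (time) give one mean and one standard
% deviation per variable, computed from the training months only.
muX = mean(XTrain,2);   sigmaX = std(XTrain,0,2);   % 41-by-1 each
muT = mean(TTrain,2);   sigmaT = std(TTrain,0,2);   % 3-by-1 each

% Implicit expansion (R2016b or later) applies each row's statistics to that row.
XTrainN = (XTrain - muX) ./ sigmaX;
TTrainN = (TTrain - muT) ./ sigmaT;
XTestN  = (XTest  - muX) ./ sigmaX;
TTestN  = (TTest  - muT) ./ sigmaT;

% Map a normalized prediction back to original units before comparing it with
% the actual values. YTestN is a placeholder for a network's output.
YTestN = TTestN;
YTest  = YTestN .* sigmaT + muT;

% RMSE per target variable, so each of the 3 targets is judged separately.
rmsePerTarget = sqrt(mean((YTest - TTest).^2,2));   % 3-by-1
```

With this layout, a network for the task would presumably use `sequenceInputLayer(41)` and `fullyConnectedLayer(3)`, so that the 41 predictors go in and the 3 targets come out. By contrast, the code in this post stores each column as its own one-channel sequence, which is why `numChannels` is 1 there and `muX` is a single number pooled over all variables.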
I would be very thankful to the experts for a timely response.