Preparing input data for classification using LSTM

Question

Ernest Modise - Kgamane on 31 May 2024


https://ch.mathworks.com/matlabcentral/answers/2124346-preparing-input-data-for-classification-using-lstm

Commented: Cris LaPierre on 18 Aug 2024

I am interested in classifying graph (sequence) data into category labels. I saw that I could use an LSTM; however, I would like to know how the primary sequence data is stored for input into the LSTM. I also want to know how to attach known labels to each graph for the purpose of training.

https://www.mathworks.com/help/deeplearning/ug/classify-sequence-data-using-lstm-networks.html

In this example there is a variable/structure called waveform. How was it constructed?

Please assist

2 Comments

Cris LaPierre on 31 May 2024

The format is described at the top of the linked example.

You can also find it described at the top of this example: Sequence-to-One Regression Using Deep Learning

Ernest Modise - Kgamane on 1 Jun 2024

Edited: Ernest Modise - Kgamane on 1 Jun 2024

LSTMdataIn.xlsx

Hi Cris

I am looking at your response and trying to understand it. Please see my code and input file and explain where I went wrong.

label = strings(997,1);
label(1:200) = 'graphtype1';
label(201:399) = 'graphtype2';
label(400:598) = 'graphtype3';
label(599:798) = 'graphtype4';
label(799:997) = 'graphtype5';
className = categorical(label);
className2 = categories(className);
Datain = xlsread('C:\Users\ernes\OneDrive\Documents\MATLAB\LSTMdataIn.xlsx');
% Datain above has 997 graphs, each with 100 samples
% E.g. graphs Datain(1:200,:) - graphtype 1
%      graphs Datain(201:399,:) - graphtype 2
% So my objective is to train my LSTM to map the graphs to labels
numObservations = 997;
[idxTrain,idxTest] = trainingPartitions(numObservations,[0.9 0.1]);
XTrain = Datain(idxTrain,:); % in XTrain there are 897 graphs, each with 100 values,
% so XTrain is 897 x 100
TTrain = className(idxTrain,:);
numHiddenUnits = 120;
numClasses = 5;
layers = [
    sequenceInputLayer(100) % I am not sure about this input, because my data comes in 1-by-100 arrays of a sequence,
    % with 1 - 100 ms timestamps
    bilstmLayer(numHiddenUnits,OutputMode="last")
    fullyConnectedLayer(numClasses)
    softmaxLayer]
options = trainingOptions("adam", ...
    MaxEpochs=200, ...
    InitialLearnRate=0.002, ...
    GradientThreshold=1, ...
    Shuffle="never", ...
    Plots="training-progress", ...
    Metrics="accuracy", ...
    Verbose=false);
net = trainnet(XTrain,TTrain,layers,"crossentropy",options);


Answer 1

Cris LaPierre on 31 May 2024


https://ch.mathworks.com/matlabcentral/answers/2124346-preparing-input-data-for-classification-using-lstm#answer_1466081

It is a MAT-file, which is a way of saving MATLAB variables to a file (see save). Loading it adds 3 variables to the Workspace:

data - a 1000x1 cell array. Each cell contains an nx3 array of signal data
freq - 1000x1 array. This is the frequency of the corresponding observation
labels - a 1000x1 categorical array containing the waveform label for the corresponding observation
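For reference, here is a minimal sketch of how variables shaped like these could be written to and read back from a MAT-file. The variable contents, class names, and the file name myWaveforms.mat are hypothetical stand-ins, not the data shipped with the example.

```matlab
% Hypothetical stand-in variables (not the example's actual data)
data = {rand(100,3); rand(120,3)};          % numObservations-by-1 cell array
labels = categorical(["Sine"; "Square"]);   % one label per observation

% Write both variables to a MAT-file in the temp folder, then load them back
matFile = fullfile(tempdir,"myWaveforms.mat");   % hypothetical file name
save(matFile,"data","labels")
S = load(matFile);   % struct with one field per saved variable
```

Calling load with an output argument returns a struct instead of placing the variables directly in the Workspace, which makes it easier to see what the file contains.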

You don't need to create a MAT-file. You just need to organize your data into a numObservations-by-1 cell array of sequences as the input data.

Each sequence (cell of data) is a numTimeSteps-by-numChannels numeric array, where numTimeSteps is the number of time steps of the sequence and numChannels is the number of channels of the sequence.

The label data is a numObservations-by-1 categorical vector.

You do not need to use freq for the example you are using.
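As a rough illustration of that layout, the sketch below converts a matrix with one graph per row into a cell array of sequences plus a categorical label vector. It assumes each graph is treated as a single-channel sequence; the random matrix X and the label names are hypothetical stand-ins for real data.

```matlab
% Hypothetical stand-in for a spreadsheet: 6 graphs (rows), 100 samples each
X = rand(6,100);
names = ["graphtype1"; "graphtype1"; "graphtype2"; ...
         "graphtype2"; "graphtype3"; "graphtype3"];

numObservations = size(X,1);
data = cell(numObservations,1);
for k = 1:numObservations
    % transpose the row so each cell is numTimeSteps-by-numChannels (100-by-1)
    data{k} = X(k,:)';
end

% numObservations-by-1 categorical vector, one label per sequence
labels = categorical(names);
```

With this layout, the number of channels is size(data{1},2), which is 1 here, and the number of time steps is size(data{1},1).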

5 Comments

Cris LaPierre on 12 Jun 2024

Edited: Cris LaPierre on 14 Jun 2024


LSTMdataIn.xlsx

Thank you for adding your data. That makes it easier to help.

If you separate your data into a cell array where each cell contains 3 signals of the same type (a 100x3 matrix), you can then use the example code. The challenging part is that the number of signals of each type is not an exact multiple of 3. That makes the actual code a little more complicated. Still, something like this should work.

Datain = readmatrix('LSTMdataIn.xlsx');
% orient the data to be time x sample
Datain = Datain';
% split the data into a numObservations x 1 cell array
% Each cell contains a 100x3 matrix. All 3 signals are of the same type
% Also create the matching labels
data = {};
labels = {};
% columns idx(s)+1:idx(s+1) hold the signals of type L{s}
idx = [0 200 399 598 798 997];
L = {'graphtype1' 'graphtype2' 'graphtype3' 'graphtype4' 'graphtype5'};
for s = 1:numel(L)
    % step through this type's columns 3 at a time; leftover signals that
    % cannot fill a complete group of 3 are dropped, so no cell mixes types
    for first = idx(s)+1:3:idx(s+1)-2
        data(end+1,:) = {Datain(:,first:first+2)};
        labels(end+1,:) = L(s);
    end
end

You can then pick up using the code from the example

numChannels = size(data{1},2);
idx = [3 4 5 12];
figure
tiledlayout(2,2)
for i = 1:4
    nexttile
    stackedplot(data{idx(i)},DisplayLabels="Channel "+string(1:numChannels))
    xlabel("Time Step")
    title("Class: " + string(labels(idx(i))))
end

labels = categorical(labels);
classNames = categories(labels)

classNames = 5x1 cell array
    {'graphtype1'}
    {'graphtype2'}
    {'graphtype3'}
    {'graphtype4'}
    {'graphtype5'}

numObservations = numel(data);
[idxTrain,idxTest] = trainingPartitions(numObservations,[0.9 0.1]);
XTrain = data(idxTrain);
TTrain = labels(idxTrain);
XTest = data(idxTest);
TTest = labels(idxTest);

The only change you must make is numClasses = 5;
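To show where numClasses enters, here is a sketch of the network definition. It assumes the layer stack and numHiddenUnits value from the code posted in the question, and 3-channel sequences (100x3 matrices) as described above; it requires Deep Learning Toolbox. Note that the size passed to sequenceInputLayer is the number of channels, not the number of time steps.

```matlab
numChannels = 3;        % channels per sequence (columns of each cell), assumed
numHiddenUnits = 120;   % value taken from the question's code
numClasses = 5;         % graphtype1 ... graphtype5

layers = [
    sequenceInputLayer(numChannels)                 % input size = channels, not time steps
    bilstmLayer(numHiddenUnits,OutputMode="last")   % one output per sequence
    fullyConnectedLayer(numClasses)                 % one score per class
    softmaxLayer];
```

In practice numChannels would normally be read from the data, for example with size(data{1},2), rather than hard-coded.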

You must open the LSTM example locally and set its folder as your current folder so that the helper function trainingPartitions is available.
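If opening the example is not convenient, a random 90/10 split can be sketched with base MATLAB. This is not the implementation of trainingPartitions, only an assumed stand-in that produces two disjoint index vectors; the observation count is a hypothetical value.

```matlab
numObservations = 20;   % hypothetical count; use numel(data) with real data
rng(0)                  % fix the seed so the split is repeatable

idxShuffled = randperm(numObservations);     % random ordering of 1..numObservations
numTrain = floor(0.9*numObservations);       % 90% of observations for training
idxTrain = idxShuffled(1:numTrain);
idxTest = idxShuffled(numTrain+1:end);       % remaining 10% for testing
```

The resulting vectors can be used in the same way as above, for example XTrain = data(idxTrain) and TTrain = labels(idxTrain).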

Fan on 14 Aug 2024

Hi Cris,

Sorry for jumping into this answered question, but I have a similar question that's been bugging me for a while. I think the Waveform example uses a timestep of 1, so that each row in an observation (or in a cell) is 1 timepoint.

However, if I want to use a timestep of 5, sliding from time0 to timeN with a sliding window of 1, so basically creating another dimension within each observation, how should I organize my input data and label vector? Do I simply make each cell a 3D array, like timestep by channel by number of sliding windows?

Also, does it matter if I transpose the input from time by channel to channel by time?

Cris LaPierre on 18 Aug 2024

There is no time data in the linked example. Instead, the sample index is used (1:numel). You might be able to back out the actual time step size using the freq data if necessary, but it will require some extra work, as the number of periods captured varies across observations.

If you are following the LSTM example, then yes, order matters. You can read more about the input syntax for trainnet here:

netTrained = trainnet(sequences,targets,net,lossFcn,options)

As for your question about data format, that can be specified in the InputDataFormats training option. From the linked doc page:

"The size and shape of the numeric arrays or dlarray objects that represent sequences depend on the type of sequence data and must be consistent with the InputDataFormats training option."
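If the sequences happen to be stored channel-by-time, one simple option that does not depend on any training option is to transpose every cell into the time-by-channel layout used in this thread. A minimal sketch, with random arrays as hypothetical stand-ins for real sequences:

```matlab
% Hypothetical sequences stored as numChannels-by-numTimeSteps (3-by-100)
dataCT = {rand(3,100); rand(3,100); rand(3,100)};

% Transpose each cell so it becomes numTimeSteps-by-numChannels (100-by-3)
dataTC = cellfun(@(x) x', dataCT, UniformOutput=false);
```

The cell array itself keeps its numObservations-by-1 shape; only the arrays inside each cell are reoriented.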
