LSTM cell operation with different number of hidden units

Question

Neon Argentus on 10 Jul 2020

https://ch.mathworks.com/matlabcentral/answers/563108-lstm-cell-operation-with-different-number-of-hidden-units

Commented: Asvin Kumar on 2 Dec 2020

The workings of LSTM in MATLAB are explained to some degree in this link; however, I need some clarification.

Suppose I have 400 time steps, each with a feature vector of length 100.

a) Assume I set the number of LSTM hidden units to 1. My intuition is that, from time step 0 to 399, this unit receives the feature vectors in order and processes them sequentially until the end, with the 400th vector arriving at step 399.

b) Now assume the number of hidden units is 50. What I understand from the documentation is that the 50 stacked units receive the first feature vector at time step 0 and, of course, receive inputs from each other depending on the hidden unit topology. At time step 1, the second feature vector propagates through the topology along with the other inputs, and so on.

Did I interpret this correctly, or am I missing something?

Another question: when I set the number of LSTM hidden units to 50, how do I know the topology?


Answer 1

Asvin Kumar on 3 Aug 2020

https://ch.mathworks.com/matlabcentral/answers/563108-lstm-cell-operation-with-different-number-of-hidden-units#answer_474517

For your question, I am going to refer you to an earlier answer of mine:

https://www.mathworks.com/matlabcentral/answers/525184-lstm-more-input-steps-than-hidden-layers-how-does-matlab-handle-this#answer_433337?s_tid=prof_contriblnk

Have a closer look at this link from that answer to the definition of the ‘numHiddenUnits’ property:

https://www.mathworks.com/help/deeplearning/ref/nnet.cnn.layer.lstmlayer.html#mw_9f7c5f93-4bf2-4ddb-b922-b1c122668b9a_sep_mw_7732d29e-17f2-4182-af4b-402fdb332b67

I’m also going to refer you back to the LSTM Layer Architecture section in the link you mentioned.

https://www.mathworks.com/help/deeplearning/ug/long-short-term-memory-networks.html#mw_0862a318-9d89-4952-8803-7b529c62214e

As you can see, it is always a sequential topology.

Think of it this way: the LSTM network unrolls to the length of your sequence. The hidden state (with height numHiddenUnits) and the cell state from the first LSTM cell get passed on to the second LSTM cell. The second LSTM cell receives them both and, in addition, receives the second input. The second cell’s hidden state and cell state then get passed on to the third cell. This way, the LSTM unrolls to the length of your sequence.

Note: You are only setting the number of hidden units (a.k.a. the length of the hidden state). You are not setting the number of cells of the LSTM; that follows from the length of your sequence.
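
To make the unrolling concrete, here is a minimal sketch in plain Python (not MATLAB, and not MATLAB's implementation). It assumes a standard LSTM cell with the four gates stacked in the order input, forget, cell candidate, output, random weights, and a zero initial state. The names `lstm_step`, `run_lstm` and `random_weights` are made up for this illustration. It uses the sizes from the question: 400 time steps, 100 features, 50 hidden units.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def matvec(matrix, vector):
    # Plain matrix-vector product on nested lists.
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def lstm_step(x, h, c, W, R, b):
    """One time step of a standard LSTM cell (illustrative, hypothetical helper).

    W is 4H-by-D, R is 4H-by-H, b has length 4H, where H = len(h), D = len(x).
    Assumed gate order in the stacked rows: input, forget, cell candidate, output.
    """
    H = len(h)
    z = [wx + rh + bk for wx, rh, bk in zip(matvec(W, x), matvec(R, h), b)]
    i = [sigmoid(v) for v in z[0:H]]            # input gate
    f = [sigmoid(v) for v in z[H:2 * H]]        # forget gate
    g = [math.tanh(v) for v in z[2 * H:3 * H]]  # cell candidate
    o = [sigmoid(v) for v in z[3 * H:4 * H]]    # output gate
    c_new = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i, g)]
    h_new = [ok * math.tanh(ck) for ok, ck in zip(o, c_new)]
    return h_new, c_new


def run_lstm(sequence, W, R, b, num_hidden):
    """Unroll the same cell over the whole sequence, one step per feature vector."""
    h = [0.0] * num_hidden  # assumed zero initial hidden state
    c = [0.0] * num_hidden  # assumed zero initial cell state
    steps = 0
    for x in sequence:
        h, c = lstm_step(x, h, c, W, R, b)
        steps += 1
    return h, c, steps


def random_weights(num_hidden, input_size, seed=0):
    # Small random weights, for illustration only.
    rng = random.Random(seed)
    W = [[rng.uniform(-0.1, 0.1) for _ in range(input_size)]
         for _ in range(4 * num_hidden)]
    R = [[rng.uniform(-0.1, 0.1) for _ in range(num_hidden)]
         for _ in range(4 * num_hidden)]
    b = [0.0] * (4 * num_hidden)
    return W, R, b


# Sizes from the question: 400 time steps, 100 features, 50 hidden units.
input_size, seq_len, num_hidden = 100, 400, 50
W, R, b = random_weights(num_hidden, input_size)
data_rng = random.Random(1)
sequence = [[data_rng.gauss(0.0, 1.0) for _ in range(input_size)]
            for _ in range(seq_len)]

h, c, steps = run_lstm(sequence, W, R, b, num_hidden)
print(len(h), len(c), steps)  # 50 50 400
```

The loop runs 400 times because the sequence has 400 steps, while `h` and `c` stay at length 50 throughout. Changing the number of hidden units changes the length of these vectors, not the number of steps.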


Yildirim Kocoglu on 30 Nov 2020

Hello Asvin,

This answer was very helpful in understanding LSTM workflow and architecture.

I had an additional related question to understand it better if you don't mind.

My question comes from the MATLAB example on sequence-to-sequence regression (https://www.mathworks.com/help/deeplearning/ug/sequence-to-sequence-regression-using-deep-learning.html).

In this example, as far as I understand, the related paper uses several engines of the same type to predict when an engine will fail, depending on its operating conditions and initial conditions. Each engine is run until its failure point, so the sequences have different lengths: the engines fail at different times because of different initial conditions (existing wear and tear, i.e. damage) and different operating conditions in the air.

When you said, "Think of it this way, the LSTM Network unrolls to the length of your sequence," I understand it for one engine with a specific sequence length, but what exactly does the LSTM learn if, say, you have 50 engines with sequences of different lengths? Do the weight dimensions (input + recurrent) for each gate (input, forget, cell candidate, output) change as the number of examples increases? Is the initial hidden state/cell state different for each example in this case, or is it the same for each example?

I know that in feedforward neural networks (for static data), the number of examples does not affect the weight dimensions (rather, the architecture of the network and the number of features per example determine them), but this is a little harder for me to comprehend for LSTM networks for the time being.

Thank you.

Asvin Kumar on 2 Dec 2020

Do the weight dimensions (input + recurrent) for each gate (input, forget, cell candidate, output) change due to differences in sequence length?

No. Have a look at this paragraph from just below the first image in the Layer Architecture section:

"The first LSTM block uses the initial state of the network and the first time step of the sequence to compute the first output and the updated cell state. At time step t, the block uses the current state of the network (c_{t-1}, h_{t-1}) and the next time step of the sequence (x_t) to compute the output and the updated cell state c_t."

This should clarify that the inputs to the weight matrices have the same shape at every time step. They depend only on the sequence input at the given time step (x_t), the hidden state from the previous time step (h_{t-1}), and the cell state from the previous time step (c_{t-1}). There is no dependency on the sequence length at all. If h is a vector of length 10 (numHiddenUnits), then h_1, h_2, ..., h_t all have that same length. The same holds for x_t and c_t. The vector lengths at a particular time step do not depend on the sequence length, so the weight dimensions do not change when the sequence length increases or decreases.
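
As a rough check of this point, the sketch below (plain Python, with the made-up helper names `lstm_weight_shapes` and `count_parameters`) computes the parameter shapes of a standard LSTM layer whose four gates (input, forget, cell candidate, output) are stacked into single matrices. This stacked layout is an assumption of the sketch. The sizes are the ones from the original question: 100 features and 50 hidden units.

```python
def lstm_weight_shapes(input_size, num_hidden_units):
    """Shapes of the stacked gate parameters of a standard LSTM layer.

    Illustrative helper: note that the sequence length is not an argument.
    """
    gates = 4  # input, forget, cell candidate, output
    return {
        "input_weights": (gates * num_hidden_units, input_size),
        "recurrent_weights": (gates * num_hidden_units, num_hidden_units),
        "bias": (gates * num_hidden_units, 1),
    }


def count_parameters(shapes):
    # Total number of learnable values across all parameter arrays.
    return sum(rows * cols for rows, cols in shapes.values())


shapes = lstm_weight_shapes(input_size=100, num_hidden_units=50)
total = count_parameters(shapes)
for name, shape in shapes.items():
    print(name, shape)
print("total learnable parameters:", total)  # 30200
```

The function takes only the feature count and the number of hidden units, so a sequence of 400 steps and a sequence of 4 steps use exactly the same parameter arrays.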

Do the weight dimensions (input + recurrent) for each gate (input, forget, cell candidate, output) change as the number of examples increases?

No. As with feedforward networks, the weight dimensions do not change when the number of examples increases. An LSTM is like a long feedforward network that takes an input of a fixed size and gives an output of a fixed size (numHiddenUnits) at each step, and repeats this process N times, where N is the sequence length.

In the same way that a feedforward network's architecture determines its weight dimensions, the LSTM's architecture determines its weight dimensions too. This does not depend on how many examples you use for training or prediction.

Also, the networks that we create always work with just 1 test point. You can always predict with exactly 1 test point, because the weights are independent of the dataset size, as in feedforward networks.

Is the initial hidden state/cell state different for each example in this case, or is it the same for each example?

The initial values are the same for each example/datapoint.
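
The two answers above can be illustrated together with a deliberately tiny sketch: a scalar LSTM (1 feature, 1 hidden unit) in plain Python. The function `scalar_lstm_final_state`, the parameter values, and the three "engine" sequences are all made up for this illustration, and the zero initial state is an assumption of the sketch rather than a statement about any particular MATLAB setting.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def scalar_lstm_final_state(sequence, p, h0=0.0, c0=0.0):
    """Run a 1-feature, 1-hidden-unit LSTM over a sequence of numbers.

    Every call starts from the same initial state (h0, c0); nothing is
    carried over from one example to the next.
    """
    h, c = h0, c0
    for x in sequence:
        i = sigmoid(p["wi"] * x + p["ri"] * h + p["bi"])    # input gate
        f = sigmoid(p["wf"] * x + p["rf"] * h + p["bf"])    # forget gate
        g = math.tanh(p["wg"] * x + p["rg"] * h + p["bg"])  # cell candidate
        o = sigmoid(p["wo"] * x + p["ro"] * h + p["bo"])    # output gate
        c = f * c + i * g
        h = o * math.tanh(c)
    return h, c


# Twelve fixed parameters: 4 input weights, 4 recurrent weights, 4 biases.
# The values are arbitrary and chosen only for illustration.
params = {
    "wi": 0.5, "ri": 0.1, "bi": 0.0,
    "wf": -0.3, "rf": 0.2, "bf": 1.0,
    "wg": 0.8, "rg": -0.4, "bg": 0.0,
    "wo": 0.6, "ro": 0.3, "bo": 0.0,
}

# Three hypothetical "engines" with sequences of different lengths.
engines = [[0.1 * t for t in range(n)] for n in (5, 12, 30)]

results = [scalar_lstm_final_state(seq, params) for seq in engines]
for seq, (h, c) in zip(engines, results):
    print(len(seq), round(h, 4), round(c, 4))

print(len(params))  # 12, regardless of how many sequences or how long they are
```

The same 12 parameters process all three sequences, and running any one sequence again gives the same result because each run starts from the same initial state.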
