A long short-term memory (LSTM) network is a type of recurrent neural network (RNN). LSTMs excel at learning from, processing, and classifying sequential data. Common application areas include sentiment analysis, language modeling, speech recognition, and video analysis.

The most popular way to train an RNN is backpropagation through time. However, the vanishing gradient problem often causes the parameters to capture only short-term dependencies, while information from earlier time steps decays. The reverse issue, exploding gradients, may also occur, causing the error to grow drastically with each time step.
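To make the decay and growth described above concrete, here is a minimal sketch in Python with NumPy. It assumes a toy scalar linear RNN, h_t = w * h_{t-1} + x_t, which is not taken from this article; the function name `gradient_magnitudes` and the example weights are purely illustrative. In that toy model, each step back in time multiplies the gradient by w, so its magnitude after k steps is |w|^k.

```python
import numpy as np

def gradient_magnitudes(recurrent_weight, num_steps):
    """Magnitude of d h_T / d h_{T-k} for k = 1..num_steps.

    Assumed toy model (illustrative only): h_t = w * h_{t-1} + x_t,
    so backpropagating one step multiplies the gradient by w.
    """
    steps = np.arange(1, num_steps + 1)
    return np.abs(recurrent_weight) ** steps

# |w| < 1: the gradient shrinks geometrically (vanishing gradients).
vanishing = gradient_magnitudes(0.5, 20)

# |w| > 1: the gradient grows geometrically (exploding gradients).
exploding = gradient_magnitudes(1.5, 20)

print("after 20 steps, w = 0.5:", vanishing[-1])
print("after 20 steps, w = 1.5:", exploding[-1])
```

Real RNNs use weight matrices and nonlinear activations, so the behavior is governed by the repeated product of Jacobians rather than a single scalar, but the geometric shrinkage or growth is the same basic effect.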

Recurrent neural network.

LSTM networks aim to overcome the vanishing gradient problem by using gates to selectively retain relevant information and forget irrelevant information. Lower sensitivity to the time gap between related inputs makes LSTM networks better suited than simple RNNs to analyzing sequential data.

The architecture of an LSTM block is shown below. An LSTM block typically has a memory cell, an input gate, an output gate, and a forget gate, in addition to the hidden state found in traditional RNNs.

Long Short-Term Memory block.

The weights and biases of the input gate control the extent to which a new value flows into the cell. Similarly, the weights and biases of the forget gate control the extent to which a value remains in the cell, and those of the output gate control the extent to which the value in the cell is used to compute the output activation of the LSTM block.
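The roles of the three gates can be sketched as one forward step of an LSTM block. This is a minimal NumPy sketch of the commonly used textbook formulation, not the Deep Learning Toolbox implementation; the function name `lstm_step`, the stacked parameter layout (input, forget, candidate, output), and the sizes used in the example are all assumptions made for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h_prev, c_prev, W, U, b):
    """One forward step of an LSTM block (illustrative layout).

    x: input vector, shape (D,)
    h_prev, c_prev: previous hidden state and cell state, shape (H,)
    W: input weights, shape (4*H, D)
    U: recurrent weights, shape (4*H, H)
    b: biases, shape (4*H,)
    Rows are stacked in the assumed order: input, forget, candidate, output.
    """
    H = h_prev.shape[0]
    z = W @ x + U @ h_prev + b

    i = sigmoid(z[0:H])          # input gate: how much new value enters the cell
    f = sigmoid(z[H:2 * H])      # forget gate: how much of the old cell value remains
    g = np.tanh(z[2 * H:3 * H])  # candidate value to write into the cell
    o = sigmoid(z[3 * H:4 * H])  # output gate: how much of the cell reaches the output

    c = f * c_prev + i * g       # updated memory cell
    h = o * np.tanh(c)           # output activation (hidden state)
    return h, c

# Run the block over a short random sequence (sizes chosen arbitrarily).
rng = np.random.default_rng(0)
D, H = 3, 4
W = rng.normal(scale=0.1, size=(4 * H, D))
U = rng.normal(scale=0.1, size=(4 * H, H))
b = np.zeros(4 * H)

h, c = np.zeros(H), np.zeros(H)
for x in rng.normal(size=(5, D)):
    h, c = lstm_step(x, h, c, W, U, b)

print("hidden state:", h)
print("cell state:  ", c)
```

Because the cell update is additive (`f * c_prev + i * g`), a forget gate close to 1 and an input gate close to 0 carry the cell value forward almost unchanged, which is the mechanism that lets the block retain information across long time gaps.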

For more details on the LSTM network, see Deep Learning Toolbox™.

See also: deep learning, machine learning, data science, MATLAB GPU computing, artificial intelligence
