LSTM: Long Short-Term Memory Cells
Dive into the mathematics and intuition of Long Short-Term Memory (LSTM) cells.
How does LSTM work?
LSTM (Long Short-Term Memory) cells are the most commonly used RNN cells nowadays. Don't be scared of the math! We will slowly clarify every term by inspecting each equation separately.
LSTM is the most popular RNN variant because it has a special memory component, denoted by $c_t$ in the equations, which represents the long-term memory.
Here is a sketch overview of the LSTM cell:

[Figure: overview sketch of the LSTM cell]
Let’s start with some notation.
Notation
Before we begin, note that in all the equations the weight matrices ($W$) carry two indices: the first denotes the vector they process, while the second refers to the gate they belong to (e.g., input gate, forget gate).
To avoid confusion and maximize understanding, we will use the common notation: matrices are depicted with capital letters, while vectors are represented by lowercase letters. For element-wise (Hadamard) multiplication, the circled-dot symbol $\odot$ is used. A short code sketch after the notation list below makes these conventions concrete.
- $c_t$ refers to the long-term memory (cell state) at timestep $t$, which is the main improvement of LSTM cells compared to plain RNNs.
- $x_t \in \mathbb{R}^N$ is the input vector with $N$ elements at timestep $t$.
- $h_t \in \mathbb{R}^H$ denotes the hidden RNN vector at timestep $t$. In the beginning, we initialize $h_0$ with zeros for the first element of the sequence. $H$ is the hidden state dimension.
- $\sigma$ is the sigmoid non-linear activation function.
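To make the notation concrete, here is a minimal NumPy sketch (the dimension values `N` and `H` are arbitrary, chosen purely for illustration). Note in particular how the element-wise (Hadamard) product $\odot$ differs from the ordinary matrix-vector product:

```python
import numpy as np

N, H = 4, 3                # N: input feature length, H: hidden state dimension

x_t = np.random.randn(N)   # input vector at timestep t
h_prev = np.zeros(H)       # hidden state, initialized with zeros at t = 0
c_prev = np.zeros(H)       # long-term memory (cell state), also starts at zero

W = np.random.randn(H, N)  # a weight matrix mapping the input to the hidden size
print((W @ x_t).shape)     # matrix-vector product: (H,)

a, b = np.random.randn(H), np.random.randn(H)
print((a * b).shape)       # element-wise (Hadamard) product, i.e. the ⊙ symbol: (H,)
```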
Equations of the LSTM cell
For $x_t \in \mathbb{R}^N$, where $N$ is the feature length of each timestep, and $h_t \in \mathbb{R}^H$, where $H$ is the hidden state dimension, the LSTM equations are the following:
$$i_t = \sigma(W_{xi} x_t + W_{hi} h_{t-1} + b_i)$$
$$f_t = \sigma(W_{xf} x_t + W_{hf} h_{t-1} + b_f)$$
$$o_t = \sigma(W_{xo} x_t + W_{ho} h_{t-1} + b_o)$$
$$\tilde{c}_t = \tanh(W_{xc} x_t + W_{hc} h_{t-1} + b_c)$$
$$c_t = f_t \odot c_{t-1} + i_t \odot \tilde{c}_t$$
$$h_t = o_t \odot \tanh(c_t)$$

Here $i_t$, $f_t$, and $o_t$ are the input, forget, and output gates, while $\tilde{c}_t$ is the candidate memory computed from the current input and the previous hidden state.
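As a sanity check, here is a minimal NumPy sketch of a single LSTM timestep implementing the equations above. The `lstm_step` name, the parameter dictionary layout, and the weight shapes are illustrative assumptions, not a reference implementation:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, p):
    """One LSTM timestep: returns the new hidden state h_t and cell state c_t."""
    i_t = sigmoid(p["W_xi"] @ x_t + p["W_hi"] @ h_prev + p["b_i"])      # input gate
    f_t = sigmoid(p["W_xf"] @ x_t + p["W_hf"] @ h_prev + p["b_f"])      # forget gate
    o_t = sigmoid(p["W_xo"] @ x_t + p["W_ho"] @ h_prev + p["b_o"])      # output gate
    c_tilde = np.tanh(p["W_xc"] @ x_t + p["W_hc"] @ h_prev + p["b_c"])  # candidate memory
    c_t = f_t * c_prev + i_t * c_tilde  # forget part of the old memory, write new memory
    h_t = o_t * np.tanh(c_t)            # expose a gated view of the memory as the hidden state
    return h_t, c_t

# Illustrative usage with arbitrary dimensions
N, H = 4, 3
rng = np.random.default_rng(0)
params = {}
for gate in "ifoc":  # input, forget, output gates and candidate memory
    params[f"W_x{gate}"] = rng.normal(size=(H, N))
    params[f"W_h{gate}"] = rng.normal(size=(H, H))
    params[f"b_{gate}"] = np.zeros(H)

h, c = np.zeros(H), np.zeros(H)        # h_0 and c_0 start as zeros
for x in rng.normal(size=(5, N)):      # a toy sequence of 5 timesteps
    h, c = lstm_step(x, h, c, params)
print(h.shape, c.shape)                # (3,) (3,)
```

Note how the gates only ever touch the cell state through element-wise products, which is what lets $c_t$ carry information across many timesteps.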