LSTM: Long Short-Term Memory Cells
Dive into the mathematics and intuition of Long Short-Term Memory (LSTM) cells.
How does LSTM work?
LSTM (Long Short-Term Memory) cells are the most commonly used RNN cells nowadays. Don't be scared of the math! We will slowly clarify every term by inspecting each equation separately.
LSTM is the most popular RNN variant because it has a special memory component, denoted by $c_t$ in the equations, which represents the long-term memory.
Here is a sketch overview of the LSTM cell:

[Figure: overview sketch of the LSTM cell]
Let’s start with some notation.
Notation
Before we begin, note that in all the equations the weight matrices ($W$) carry two indices: the first denotes the vector they process, while the second refers to the gate they belong to (e.g., input gate, forget gate).
To avoid confusion and maximize understanding, we will use the common notation: matrices are depicted with capital letters, while vectors are represented by lowercase letters. For element-wise (Hadamard) multiplication, the circled-dot symbol $\odot$ is used. A short code sketch after the notation list below makes these conventions concrete.
- $c_t$ refers to the long-term memory (cell state) at timestep $t$, which is the main improvement of LSTM cells compared to plain RNNs.
- $x_t \in \mathbb{R}^N$ is the input vector with $N$ elements at timestep $t$.
- $h_t \in \mathbb{R}^H$ denotes the hidden RNN vector at timestep $t$. In the beginning, we initialize $h_0$ with zeros for the first element of the sequence. $H$ is the hidden state dimension.
- $\sigma$ is the sigmoid non-linear activation function.
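To make the notation concrete, here is a minimal NumPy sketch (the dimension values `N` and `H` are arbitrary, chosen purely for illustration). Note in particular how the element-wise (Hadamard) product $\odot$ differs from the ordinary matrix-vector product:

```python
import numpy as np

N, H = 4, 3                # N: input feature length, H: hidden state dimension

x_t = np.random.randn(N)   # input vector at timestep t
h_prev = np.zeros(H)       # hidden state, initialized with zeros at t = 0
c_prev = np.zeros(H)       # long-term memory (cell state), also starts at zero

W = np.random.randn(H, N)  # a weight matrix mapping the input to the hidden size
print((W @ x_t).shape)     # matrix-vector product: (H,)

a, b = np.random.randn(H), np.random.randn(H)
print((a * b).shape)       # element-wise (Hadamard) product, i.e. the ⊙ symbol: (H,)
```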
Equations of the LSTM cell
For $x_t \in \mathbb{R}^N$, where $N$ is the feature length of each timestep, and $h_t \in \mathbb{R}^H$, where $H$ is the hidden state dimension, the LSTM equations are the following:
$$i_t = \sigma(W_{xi} x_t + W_{hi} h_{t-1} + b_i)$$
$$f_t = \sigma(W_{xf} x_t + W_{hf} h_{t-1} + b_f)$$
$$o_t = \sigma(W_{xo} x_t + W_{ho} h_{t-1} + b_o)$$
$$\tilde{c}_t = \tanh(W_{xc} x_t + W_{hc} h_{t-1} + b_c)$$
$$c_t = f_t \odot c_{t-1} + i_t \odot \tilde{c}_t$$
$$h_t = o_t \odot \tanh(c_t)$$

Here $i_t$, $f_t$, and $o_t$ are the input, forget, and output gates, while $\tilde{c}_t$ is the candidate memory computed from the current input and the previous hidden state.
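As a sanity check, here is a minimal NumPy sketch of a single LSTM timestep implementing the equations above. The `lstm_step` name, the parameter dictionary layout, and the weight shapes are illustrative assumptions, not a reference implementation:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, p):
    """One LSTM timestep: returns the new hidden state h_t and cell state c_t."""
    i_t = sigmoid(p["W_xi"] @ x_t + p["W_hi"] @ h_prev + p["b_i"])      # input gate
    f_t = sigmoid(p["W_xf"] @ x_t + p["W_hf"] @ h_prev + p["b_f"])      # forget gate
    o_t = sigmoid(p["W_xo"] @ x_t + p["W_ho"] @ h_prev + p["b_o"])      # output gate
    c_tilde = np.tanh(p["W_xc"] @ x_t + p["W_hc"] @ h_prev + p["b_c"])  # candidate memory
    c_t = f_t * c_prev + i_t * c_tilde  # forget part of the old memory, write new memory
    h_t = o_t * np.tanh(c_t)            # expose a gated view of the memory as the hidden state
    return h_t, c_t

# Illustrative usage with arbitrary dimensions
N, H = 4, 3
rng = np.random.default_rng(0)
params = {}
for gate in "ifoc":  # input, forget, output gates and candidate memory
    params[f"W_x{gate}"] = rng.normal(size=(H, N))
    params[f"W_h{gate}"] = rng.normal(size=(H, H))
    params[f"b_{gate}"] = np.zeros(H)

h, c = np.zeros(H), np.zeros(H)        # h_0 and c_0 start as zeros
for x in rng.normal(size=(5, N)):      # a toy sequence of 5 timesteps
    h, c = lstm_step(x, h, c, params)
print(h.shape, c.shape)                # (3,) (3,)
```

Note how the gates only ever touch the cell state through element-wise products, which is what lets $c_t$ carry information across many timesteps.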