...


Intricacies of Long Short-Term Memory Cells

Discover the core principles of LSTM cells and their unique state mechanisms that enable the retention and processing of long-term dependencies.

Long short-term memory (LSTM) networks are among the most abstruse topics in elementary deep learning. Comprehending the fundamentals of an LSTM from its original paper(s) can be intimidating. In this lesson, the LSTM is deconstructed into its elements for easier understanding, and every element is explained. This begins with a typical neural network illustration shown below.

A high-level representation of an LSTM network

The input to an LSTM layer is a time window of observations. In the illustration above, it’s denoted as $x_{(T-\tau):T}$. This represents $p$-dimensional observations in a window of size $\tau$.

This window of observations serves as an input sample. The window allows the network to learn spatial and temporal relationships.
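
For instance, if the raw data is a $p$-dimensional series, the window samples can be cut with a simple sliding window. The snippet below is a minimal sketch in NumPy; the series, sizes, and variable names are hypothetical placeholders, not part of the lesson’s dataset.

```python
import numpy as np

# Hypothetical multivariate series: n time steps, p features per step.
n, p, tau = 1000, 4, 20                 # assumed sizes for illustration
series = np.random.randn(n, p)          # stand-in for real observations

# Slide a window of size tau over the series; each window is one
# input sample x_{(T-tau):T} for the LSTM layer.
windows = np.stack([series[t - tau:t] for t in range(tau, n + 1)])
print(windows.shape)                    # (n - tau + 1, tau, p) = (981, 20, 4)
```

Each row of `windows` is one sample whose $\tau$ consecutive time steps let the layer learn the temporal relationships within the window.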

LSTM cell

The hidden layers in the above illustration are LSTM layers. The nodes in a layer form an LSTM cell (highlighted in orange). It’s called a cell because, like a biological cell, it performs a complex multi-step procedure. It’s important to know the distinguishing property that the cell mechanism brings to an LSTM.
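
As a rough sketch of how such stacked LSTM layers consume the window samples, the snippet below uses PyTorch’s nn.LSTM; the window size, feature count, and number of units are assumed values for illustration only.

```python
import torch
import torch.nn as nn

tau, p, units = 20, 4, 32                        # assumed window size, features, and cell units
lstm = nn.LSTM(input_size=p, hidden_size=units,
               num_layers=2, batch_first=True)   # two stacked hidden LSTM layers

x = torch.randn(8, tau, p)                       # a batch of 8 window samples
outputs, (h_n, c_n) = lstm(x)
print(outputs.shape)                             # (8, tau, units): hidden output at every time step
print(h_n.shape, c_n.shape)                      # (2, 8, units): final hidden and cell states per layer
```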

The cell mechanism in LSTM has an element called the cell state. A cell state can be imagined as a Pensieve in “Harry Potter.” If you’re not a Harry Potter fan, it’s a magical device that can be used to store and review memories. Like a Pensieve, sans magic, a cell state preserves memories from the current to the distant past. Because it holds both current and distant memories, the cell state makes it easier to spot patterns and links. And this makes the difference for LSTMs.
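
In the standard LSTM formulation, this memory-keeping behavior comes from the cell-state update, where a forget gate $f_t$ decides how much of the old memory $c_{t-1}$ to keep and an input gate $i_t$ decides how much of the new candidate information $\tilde{c}_t$ to add:

$$
c_t = f_t \odot c_{t-1} + i_t \odot \tilde{c}_t
$$

When $f_t$ stays close to one, a piece of memory can travel untouched across many time steps, which is how the cell state links the current input to the distant past.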

Cell state mechanism

The cell state mechanism is explained with the help of an intuitive illustration below.

Illustration of an unwrapped LSTM cell mechanism showing the time-step iterations
A condensed form of the LSTM cell mechanism

The larger blue-shaded box in the left illustration above denotes an LSTM cell. The cell operations are deconstructed inside the box and explained below.

  1. The input sample to a cell is a time window of observations $x_{(T-\tau):T}$. For simplicity, $T-\tau$ is replaced with $0$ in the illustration. The observations are, therefore, shown as $x_0, x_1, \ldots, x_T$.

  2. The cell sequentially processes the time-indexed observations.

  3. The iterations are shown as green boxes sequentially laid inside the deconstructed cell.

  4. A green box takes in one time-step $x_t$. It performs some operations to compute the cell state, $c_t$, and the output, $h_t$.

  5. Like other RNNs (recurrent neural networks), the hidden output $h_t$ is transmitted to the next iteration and also returned as a cell output. This is shown with branched arrows, with the horizontal and vertical branches carrying $h_t$. The horizontal branch goes to the next green box (iteration), and the vertical branch exits the cell as an output.

  6. Unlike other RNNs, an LSTM cell also transmits the cell state $c_t$ (both outputs are made explicit in the step-by-step sketch after this list).

  7. Imagine the iterations along the time-steps, $t = 0, \ldots, T$, in a cell as a drive down a lane. Let’s call it a “memory lane.” A green box is a station on this lane. And there’s a “truck of information” carrying the cell state, that is, the memory.

  8. The truck starts from the left at the first station. At this station, the input observation $x_0$ is assessed to see whether the information therein is relevant or not. If yes, it’s loaded onto the truck. Otherwise, it’s ignored.

  9. The load on the truck is the cell state. In the illustration, $x_0$ is shown as important and is loaded onto the truck as part of the cell state.

  10. The cell state $c_t$, that is, the truckload, is denoted as $(x_\cdot)$ to express that the state is some function of the $x$’s and not the original $x$.

  11. The truck then moves to the next station. Here, it’s unloaded, and the state/memory learned so far is taken out. The station assesses the unloaded state alongside the $x$ available at it.

  12. Suppose this station is $t$. Two assessments are made here:

  • First, is the information in $x_t$ at the station relevant? If yes, add it to the state $c_t$.

  • Second, in the presence of $x_t$, is the memory from the prior $x$’s still relevant? If irrelevant, forget the memory.

    For example, the station next to $x_0$ is $x_1$. Here, $x_1$ is found to be relevant and added to the state. At the same time, it is found that $x_0$ is irrelevant in the presence of $x_1$. Therefore, the memory of $x_0$ is taken out of the state. Or, in LSTM terminology, $x_0$ is “forgotten.”

  13. After the processing, the state is loaded back onto the truck.

  14. The process of loading and unloading the truck of information is repeated till the last $x_T$ ...
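
A minimal way to watch these per-step outputs is to run a single LSTM cell over one window, time step by time step, for example with PyTorch’s nn.LSTMCell as sketched below; the sizes and random tensors are hypothetical placeholders.

```python
import torch
import torch.nn as nn

tau, p, units = 20, 4, 32                        # assumed sizes
cell = nn.LSTMCell(input_size=p, hidden_size=units)

window = torch.randn(tau, p)                     # one window sample x_0, ..., x_T
h = torch.zeros(1, units)                        # hidden output h_t (what exits each station)
c = torch.zeros(1, units)                        # cell state c_t (the truckload)

for t in range(tau):                             # drive down the memory lane, station by station
    h, c = cell(window[t].unsqueeze(0), (h, c))  # unload the state, assess x_t, reload the truck

print(h.shape, c.shape)                          # (1, units) each: final output and memory
```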