...


Intricacies of Long Short-Term Memory Cells

Discover the core principles of LSTM cells and their unique state mechanisms that enable the retention and processing of long-term dependencies.

Long short-term memory (LSTM) networks are among the most abstruse topics in elementary deep learning. Comprehending the fundamentals of an LSTM from its original paper(s) can be intimidating. In this lesson, the LSTM is deconstructed into its elements for easier understanding, and every element is explained. This begins with a typical neural network illustration shown below.

A high-level representation of an LSTM network

The input to an LSTM layer is a time window of observations. In the illustration above, it’s denoted as $x_{(T-\tau):T}$. This represents $p$-dimensional observations in a window of size $\tau$.

This window of observations serves as an input sample. The window allows the network to learn spatial and temporal relationships.
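
For instance, if the raw data is a $p$-dimensional series, the window samples can be cut with a simple sliding window. The snippet below is a minimal sketch in NumPy; the series, sizes, and variable names are hypothetical placeholders, not part of the lesson’s dataset.

```python
import numpy as np

# Hypothetical multivariate series: n time steps, p features per step.
n, p, tau = 1000, 4, 20                 # assumed sizes for illustration
series = np.random.randn(n, p)          # stand-in for real observations

# Slide a window of size tau over the series; each window is one
# input sample x_{(T-tau):T} for the LSTM layer.
windows = np.stack([series[t - tau:t] for t in range(tau, n + 1)])
print(windows.shape)                    # (n - tau + 1, tau, p) = (981, 20, 4)
```

Each row of `windows` is one sample whose $\tau$ consecutive time steps let the layer learn the temporal relationships within the window.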

LSTM cell

The hidden layers in the above illustration are LSTM layers. The nodes in a layer form an LSTM cell (highlighted in orange). It’s called a cell because, like a biological cell, it performs a complex multi-step procedure. It’s important to know the distinguishing property that the cell mechanism brings to an LSTM.
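
As a rough sketch of how such stacked LSTM layers consume the window samples, the snippet below uses PyTorch’s nn.LSTM; the window size, feature count, and number of units are assumed values for illustration only.

```python
import torch
import torch.nn as nn

tau, p, units = 20, 4, 32                        # assumed window size, features, and cell units
lstm = nn.LSTM(input_size=p, hidden_size=units,
               num_layers=2, batch_first=True)   # two stacked hidden LSTM layers

x = torch.randn(8, tau, p)                       # a batch of 8 window samples
outputs, (h_n, c_n) = lstm(x)
print(outputs.shape)                             # (8, tau, units): hidden output at every time step
print(h_n.shape, c_n.shape)                      # (2, 8, units): final hidden and cell states per layer
```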

The cell mechanism in LSTM has an element called the cell state. A cell state can be imagined as a Pensieve in “Harry Potter.” If you’re not a Harry Potter fan, it’s a magical device that can be used to store and review memories. Like a Pensieve, sans magic, a cell state preserves memories from the current to the distant past. Because it holds both current and distant memories, the cell state makes it easier to spot patterns and links. And this makes the difference for LSTMs.
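
In the standard LSTM formulation, this memory-keeping behavior comes from the cell-state update, where a forget gate $f_t$ decides how much of the old memory $c_{t-1}$ to keep and an input gate $i_t$ decides how much of the new candidate information $\tilde{c}_t$ to add:

$$
c_t = f_t \odot c_{t-1} + i_t \odot \tilde{c}_t
$$

When $f_t$ stays close to one, a piece of memory can travel untouched across many time steps, which is how the cell state links the current input to the distant past.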

Cell state mechanism

The cell state mechanism is explained with the help of an intuitive illustration below.

Illustration of an unwrapped LSTM cell mechanism showing the time-step iterations
A condensed form of the LSTM cell mechanism

The larger blue-shaded box in the left illustration above denotes an LSTM cell. The cell operations are deconstructed inside the box and explained below.

  1. The input sample to a cell is a time window of observations $x_{(T-\tau):T}$. For simplicity, $T-\tau$ is replaced with $0$ in the illustration. The observations are, therefore, shown as $x_0, x_1, \ldots, x_T$.

  2. The cell sequentially processes the time-indexed observations.

  3. The iterations are shown as green boxes sequentially laid inside the deconstructed cell.

  4. A green box takes in one time-step $x_t$. It performs some operations to compute the cell state, $c_t$, and the output, $h_t$.

  5. Like other RNNs (recurrent neural networks), the hidden output $h_t$ is transmitted to the next iteration and also returned as a cell output. This is shown with branched arrows, with the horizontal and vertical branches carrying $h_t$. The horizontal branch goes to the next green box (iteration), and the vertical branch exits the cell as an output.

  6. Unlike other RNNs, an LSTM cell also transmits the cell state $c_t$ (both outputs are made explicit in the step-by-step sketch after this list).

  7. Imagine the iterations along the time-steps, $t = 0, \ldots, T$, in a cell as a drive down a lane. Let’s call it a “memory lane.” A green box is a station on this lane. And there’s a “truck of information” carrying the cell state, that is, the memory.

  8. The truck starts from the left at the first station. At this station, the input observation $x_0$ is assessed to see whether the information therein is relevant or not. If yes, it’s loaded onto the truck. Otherwise, it’s ignored.

  9. The load on the truck is the cell state. In the illustration, $x_0$ is shown as important and is loaded onto the truck as part of the cell state.

  10. The cell state $c_t$, that is, the truckload, is denoted as $(x_\cdot)$ to express that the state is some function of the $x$’s and not the original $x$.

  11. The truck then moves to the next station. Here, it’s unloaded, and the state/memory learned so far is taken out. The station assesses the unloaded state alongside the $x$ available at it.

  12. Suppose this station is $t$. Two assessments are made here:

  • First, is the information in $x_t$ at the station relevant? If yes, add it to the state $c_t$.

  • Second, in the presence of $x_t$, is the memory from the prior $x$’s still relevant? If irrelevant, forget the memory.

    For example, the station next to $x_0$ is $x_1$. Here, $x_1$ is found to be relevant and added to the state. At the same time, it is found that $x_0$ is irrelevant in the presence of $x_1$. Therefore, the memory of $x_0$ is taken out of the state. Or, in LSTM terminology, $x_0$ is “forgotten.”

  13. After the processing, the state is loaded back onto the truck.

  14. The process of loading and unloading the truck of information is repeated till the last $x_T$ ...
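
A minimal way to watch these per-step outputs is to run a single LSTM cell over one window, time step by time step, for example with PyTorch’s nn.LSTMCell as sketched below; the sizes and random tensors are hypothetical placeholders.

```python
import torch
import torch.nn as nn

tau, p, units = 20, 4, 32                        # assumed sizes
cell = nn.LSTMCell(input_size=p, hidden_size=units)

window = torch.randn(tau, p)                     # one window sample x_0, ..., x_T
h = torch.zeros(1, units)                        # hidden output h_t (what exits each station)
c = torch.zeros(1, units)                        # cell state c_t (the truckload)

for t in range(tau):                             # drive down the memory lane, station by station
    h, c = cell(window[t].unsqueeze(0), (h, c))  # unload the state, assess x_t, reload the truck

print(h.shape, c.shape)                          # (1, units) each: final output and memory
```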