Intricacies of Long Short-Term Memory Cells
Discover the core principles of LSTM cells and their unique state mechanisms that enable the retention and processing of long-term dependencies.
Long short-term memory (LSTM) networks are one of the most abstruse topics in elementary deep learning. Comprehending the fundamentals of an LSTM from its original paper(s) can be intimidating. In this lesson, the LSTM is deconstructed into its elements for easier understanding, and every element is explained. This begins with a typical neural network illustration shown below.
The input to an LSTM layer is a time window of observations. In the illustration above, it’s denoted as $x_{t-\tau+1}, \ldots, x_t$. This represents $p$-dimensional observations in a window of size $\tau$.
This window of observations serves as an input sample. The window allows the network to learn spatial and temporal relationships.
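To make the shape of this input concrete, below is a minimal sketch (not part of the original lesson) of how a long multivariate series can be sliced into such windows. The function name `make_windows` and the sizes used (a window of size 8 over 3-dimensional observations) are illustrative assumptions.

```python
import numpy as np

def make_windows(series: np.ndarray, window_size: int) -> np.ndarray:
    """Slice a (time, features) series into overlapping windows.

    Returns an array of shape (n_samples, window_size, features),
    i.e., one (tau x p) input sample per window.
    """
    n_steps, _ = series.shape
    windows = [
        series[start:start + window_size]
        for start in range(n_steps - window_size + 1)
    ]
    return np.stack(windows)

# Toy usage: 100 time steps of p = 3 features, window size tau = 8.
series = np.random.randn(100, 3)
samples = make_windows(series, window_size=8)
print(samples.shape)  # (93, 8, 3): each sample is an 8-step window of 3-dim observations
```

Each sample in `samples` is one time window that an LSTM layer would consume as a single input.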
LSTM cell
The hidden layers in the above illustration are LSTM layers. The nodes in a layer form an LSTM cell (highlighted in orange). It’s called a cell because it performs a complex, multi-step procedure, like a biological cell. It’s important to know the distinguishing property that the cell mechanism brings to LSTMs.
The cell mechanism in LSTM has an element called cell state. A cell state can be imagined as a Pensieve in “Harry Potter.” If you’re not a Harry Potter fan, it’s a magical device that can be used to store and review memories. Like a Pensieve, sans the magic, a cell state preserves memories from the current to the distant past. Because the cell state holds both current and distant memories, it becomes easier to spot patterns and links between them. And this is what makes the difference for LSTMs.
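The next section deconstructs this mechanism step by step. As a companion to that walkthrough, here is a minimal NumPy sketch of the standard LSTM recurrence, in which the cell state carries memory across time steps while gates decide what to add and what to forget. This is the generic textbook formulation, not code from this lesson, and the parameter names and sizes are illustrative assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_cell_forward(x_window, W, U, b, n_hidden):
    """Run one LSTM cell over a (tau, p) window of observations.

    W : (4 * n_hidden, p) input weights, U : (4 * n_hidden, n_hidden) recurrent
    weights, b : (4 * n_hidden,) biases, stacked for the forget, input, and
    output gates and the candidate memory (illustrative layout).
    """
    h = np.zeros(n_hidden)            # hidden output passed between iterations
    c = np.zeros(n_hidden)            # cell state: the long-term memory
    outputs = []
    for x_t in x_window:              # one iteration (green box) per time step
        z = W @ x_t + U @ h + b
        f = sigmoid(z[0 * n_hidden:1 * n_hidden])  # forget gate: drop stale memory
        i = sigmoid(z[1 * n_hidden:2 * n_hidden])  # input gate: admit relevant info
        o = sigmoid(z[2 * n_hidden:3 * n_hidden])  # output gate
        g = np.tanh(z[3 * n_hidden:4 * n_hidden])  # candidate memory from x_t
        c = f * c + i * g             # update the memory: forget, then add
        h = o * np.tanh(c)            # hidden output for this time step
        outputs.append(h)
    return np.stack(outputs), c

# Toy usage with made-up sizes: tau = 8 time steps, p = 3 features, 4 hidden units.
rng = np.random.default_rng(0)
tau, p, n_hidden = 8, 3, 4
W = rng.normal(size=(4 * n_hidden, p))
U = rng.normal(size=(4 * n_hidden, n_hidden))
b = np.zeros(4 * n_hidden)
hs, c_final = lstm_cell_forward(rng.normal(size=(tau, p)), W, U, b, n_hidden)
print(hs.shape, c_final.shape)  # (8, 4) (4,)
```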
Cell state mechanism
The cell state mechanism is explained with the help of an intuitive illustration below.
In the left illustration above, the blue-shaded larger box denotes an LSTM cell. The cell operations are deconstructed inside the box and explained below.
- The input sample to a cell is a time window of observations $x_{t-\tau+1}, \ldots, x_t$. For simplicity, the time index $t-\tau+1, \ldots, t$ is replaced with $1, \ldots, \tau$ in the illustration. The observations are, therefore, shown as $x_1, \ldots, x_\tau$.
- The cell sequentially processes the time-indexed observations.
- The iterations are shown as green boxes sequentially laid inside the deconstructed cell.
- A green box takes in one time-step $x_t$. It performs some operations to compute the cell state, $c_t$, and the output, $h_t$.
- Like the other RNNs (recurrent neural networks), the hidden output $h_t$ is transmitted to the next iteration and also returned as a cell output. This is shown with branched arrows, with the horizontal and vertical branches carrying $h_t$. The horizontal branch goes to the next green box (iteration), and the vertical branch exits the cell as an output.
- Differently from the other RNNs, an LSTM cell also transmits the cell state $c_t$.
- Imagine the iterations along the time-steps in a cell as a drive down a lane. Let’s call it a “memory lane.” A green box is a station on this lane. And there’s a “truck of information” carrying the cell state, that is, the memory.
- The truck starts from the left at the first station. At this station, the inputted observation is assessed to see whether the information therein is relevant or not. If yes, it’s loaded onto the truck. Otherwise, it’s ignored.
- The loading on the truck is the cell state. In the illustration, $x_1$ is shown as important and loaded onto the truck as part of the cell state.
- The cell state, that is, the truckload, is denoted as $c$ to express that the state is some function of the $x$’s and not the original $x$’s.
- The truck then moves to the next station. Here, it’s unloaded, and the state/memory learned so far is taken out. The station assesses the unloaded state alongside the observation available at it.
- Suppose this station is $x_t$. Two assessments are made here:
  - First, is the information in the $x_t$ at the station relevant? If yes, add it to the state.
  - Second, in the presence of $x_t$, is the memory from the prior $x$’s still relevant? If irrelevant, forget the memory.

  For example, the station next to $x_1$ is $x_2$. Here, $x_2$ is found to be relevant and added to the state. At the same time, it is found that $x_1$ is irrelevant in the presence of $x_2$. And, therefore, the memory of $x_1$ is taken out of the state. Or, in LSTM terminology, $x_1$ is “forgotten.”
- After the processing, the state is loaded back on the truck.
- The process of loading and unloading the truck of information is repeated till the last ...