Understanding Long Short-Term Memory Networks

Learn about the essentials of LSTM networks, exploring their unique ability to capture long-term dependencies in sequential data.

Note: Humans don’t start their thinking from scratch every second. As you read this essay, you understand each word based on your understanding of previous words. You don’t throw everything away and start thinking from scratch again. Your thoughts have persistence (Olah 2015).

Recurrent neural networks (RNNs)

Sequences and time series are like essays: just as the order of words matters in an essay, the order of observations matters in sequential and time series data. Because of this ordering, the data contain temporal patterns, meaning that previous observations (the memory) influence future ones. Persisting memory is one approach to learning such temporal patterns, and recurrent neural networks (RNNs), including long short-term memory (LSTM) networks, were conceived for exactly this purpose.

RNNs constitute a powerful class of computational models capable of learning arbitrary dynamics. They work by maintaining a memory of the patterns seen earlier in a sequence and combining that memory with the current input to make a prediction. RNN development can be traced back to Rumelhart, Hinton, and Williams 1985, and several variants have been developed since then, for example, the Elman network (Elman 1990), the Jordan network (Jordan 1990), and time-delay neural networks (Lang, Waibel, and Hinton 1990). However, most early RNNs became obsolete because of their inability to learn long-term dependencies. This was caused by the vanishing gradient issue, which we'll discuss later in this course.
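To make the recurrence idea concrete, here is a minimal sketch of a vanilla RNN processing a sequence, written in Python with NumPy. The names used here (rnn_forward, W_xh, W_hh, hidden_size) are illustrative choices for this sketch, not part of any particular library. At every time step, the new hidden state mixes the current observation with the memory carried over from the previous step.

```python
import numpy as np

def rnn_forward(inputs, hidden_size, seed=0):
    """Run a toy vanilla RNN over `inputs`, a list of 1-D feature vectors.

    At each step the hidden state combines the current observation with
    the memory from the previous step:
        h_t = tanh(W_xh @ x_t + W_hh @ h_{t-1} + b)
    """
    rng = np.random.default_rng(seed)
    input_size = len(inputs[0])

    # Randomly initialized parameters, shared across all time steps.
    W_xh = rng.normal(scale=0.1, size=(hidden_size, input_size))
    W_hh = rng.normal(scale=0.1, size=(hidden_size, hidden_size))
    b = np.zeros(hidden_size)

    h = np.zeros(hidden_size)      # the "memory", empty at the start
    states = []
    for x_t in inputs:             # iterate over the sequence in order
        h = np.tanh(W_xh @ x_t + W_hh @ h + b)
        states.append(h)
    return states                  # one hidden state per time step

# Example: a sequence of five 3-dimensional observations.
sequence = [np.array([0.1, 0.2, 0.3]) * t for t in range(5)]
hidden_states = rnn_forward(sequence, hidden_size=4)
print(hidden_states[-1])           # final state summarizes the whole sequence
```

Because the same weights are reused at every step and each hidden state depends on the previous one, gradients flowing back through many such steps can shrink toward zero, which is the vanishing gradient issue mentioned above.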

Long short-term memory
