
Introduction: Understanding Long Short-Term Memory Networks

Understand how long short-term memory networks (LSTMs) handle long-term dependencies in sequential data by overcoming vanishing gradients. Learn their architecture, compare them with vanilla RNNs, and explore improvements and variants such as bidirectional LSTMs, peephole connections, and gated recurrent units.

How long short-term memory networks work

In this chapter, we’ll discuss the fundamentals behind a more advanced RNN variant known as the long short-term memory network (LSTM). Here, we’ll focus on understanding the theory behind LSTMs so that we can discuss their implementation later. LSTMs are widely used in many sequential tasks (including stock market prediction, language modeling, and machine translation) and have proven to perform better than older sequential models (for example, standard RNNs), especially when large amounts of data are available. LSTMs are specifically designed to avoid the vanishing gradient problem. ...
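As a preview of the implementation we'll cover later, here is a minimal sketch assuming a TensorFlow/Keras environment; the layer sizes and the random toy data are illustrative choices, not values from this chapter. It shows that, at the API level, an LSTM layer is a drop-in replacement for a vanilla RNN layer: both consume a tensor of shape (batch, timesteps, features) and return a final hidden state, differing only in their gated internals.

```python
import numpy as np
import tensorflow as tf

# Toy input: 4 sequences, 50 timesteps each, 8 features per timestep.
# (All sizes here are arbitrary, chosen only for illustration.)
batch_size, timesteps, features = 4, 50, 8
x = np.random.randn(batch_size, timesteps, features).astype("float32")

# A vanilla RNN layer and an LSTM layer with the same hidden size.
simple_rnn = tf.keras.layers.SimpleRNN(units=16)
lstm = tf.keras.layers.LSTM(units=16)

# Both return the last hidden state per sequence: shape (4, 16).
print(simple_rnn(x).shape)
print(lstm(x).shape)
```

The interchangeable interface is deliberate: the LSTM's gating machinery, which lets it carry information across many more timesteps than a standard RNN, lives entirely inside the layer, so swapping one for the other in a model requires no other code changes.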