Attention
Learn about the attention mechanism and why it's important.
Chapter Goals:
- Learn about attention and understand why it's useful
- Incorporate attention into the decoder LSTM
A. Using the encoder
In the encoder-decoder model architecture, the only thing the decoder receives from the encoder is the final state of each layer. These final states encapsulate the information the encoder has extracted from the input sequence, and they are what gets passed into the decoder.
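As a minimal sketch of this setup (assuming a TensorFlow/Keras-style model; the layer sizes and variable names here are illustrative, not the chapter's exact code), the encoder's final hidden and cell states are the only pieces handed to the decoder:

```python
import tensorflow as tf

units = 64
# Hypothetical input feature size of 128 for both sequences.
enc_inputs = tf.keras.Input(shape=(None, 128))
dec_inputs = tf.keras.Input(shape=(None, 128))

# Encoder LSTM: return_state=True exposes the final hidden and cell states.
enc_outputs, enc_h, enc_c = tf.keras.layers.LSTM(
    units, return_sequences=True, return_state=True)(enc_inputs)

# Decoder LSTM: initialized with the encoder's final states. Note that the
# per-time-step encoder outputs (enc_outputs) are not used at all here.
dec_outputs = tf.keras.layers.LSTM(
    units, return_sequences=True)(dec_inputs, initial_state=[enc_h, enc_c])
```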
However, encapsulating all the useful information from an input sequence in a single set of final states is difficult, especially when the input sequence is long and contains long-term dependencies. This problem shows up in practice: decoders tend to perform poorly on input sequences with long-term dependencies.
The obvious solution to this issue is to give the decoder access to each of the encoder's intermediate time-step outputs. In the previous chapter's diagram, the encoder's outputs were not used. However, if we use the encoder's outputs as additional input for the decoder, the decoder gains much more useful information about the input sequence. The way we do this is by using an attention mechanism.
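A rough sketch of the core idea (dot-product attention is used here only as a simple example; the function and argument names such as `attention_context`, `dec_state`, and `enc_outputs` are assumptions, not the chapter's API): the decoder's current state scores every encoder time-step output, and the resulting weighted sum, the context vector, gives the decoder direct access to the whole input sequence.

```python
import tensorflow as tf

def attention_context(dec_state, enc_outputs):
    # dec_state:   [batch, units]        current decoder hidden state
    # enc_outputs: [batch, time, units]  all encoder time-step outputs
    scores = tf.einsum('bu,btu->bt', dec_state, enc_outputs)   # alignment scores
    weights = tf.nn.softmax(scores, axis=-1)                   # attention weights
    context = tf.einsum('bt,btu->bu', weights, enc_outputs)    # weighted sum
    return context, weights
```

The context vector is then combined with the decoder's own input at each step, so the decoder no longer has to rely solely on the encoder's final states.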