Calculating Loss
Calculate the model's loss based on logits and sparse outputs.
Chapter Goals:
- Calculate the training loss based on the model's logits and final token sequences
A. Final token sequence
So far, we've used the input sequences and ground truth sequences to train the encoder-decoder model. The final token sequences come into play when we calculate the loss.
If we view the decoder as a language model, the ground truth sequences act as the language model's input while the final token sequences act as the "correct" output for the language model.
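To make the input/output relationship concrete, here is a minimal sketch in plain Python. It assumes the ground truth sequence is the target sentence with a start-of-sequence token prepended, and the final token sequence is the same sentence with an end-of-sequence token appended, so each decoder input token lines up with the next token as its "correct" output. The token IDs `SOS_ID` and `EOS_ID`, the helper names, and the toy logits are all hypothetical and chosen only for illustration. The loss shown is an ordinary sparse softmax cross-entropy computed from logits and integer labels, which may differ in detail from the function the lesson uses.

```python
import math

# Hypothetical special-token IDs (assumptions for this sketch).
SOS_ID = 1
EOS_ID = 2


def make_decoder_pairs(target_ids):
    """Build (ground_truth_seq, final_seq) for one target sentence."""
    ground_truth_seq = [SOS_ID] + target_ids  # decoder (language model) input
    final_seq = target_ids + [EOS_ID]         # "correct" decoder output
    return ground_truth_seq, final_seq


def sparse_cross_entropy(logits, label):
    """Cross-entropy for one time step, given raw logits and an integer label."""
    max_logit = max(logits)  # subtract the max for numerical stability
    log_sum_exp = max_logit + math.log(sum(math.exp(x - max_logit) for x in logits))
    return log_sum_exp - logits[label]


target_ids = [5, 7, 3]
ground_truth_seq, final_seq = make_decoder_pairs(target_ids)
# ground_truth_seq == [1, 5, 7, 3]
# final_seq        == [5, 7, 3, 2]

# Toy logits: one row per time step, one column per vocabulary entry (size 8).
# All zeros means the model is equally unsure about every token.
vocab_size = 8
toy_logits = [[0.0] * vocab_size for _ in final_seq]

# One loss value per time step, using the final token sequence as labels.
step_losses = [
    sparse_cross_entropy(step_logits, label)
    for step_logits, label in zip(toy_logits, final_seq)
]
mean_loss = sum(step_losses) / len(step_losses)
```

With uniform logits, every time step has a loss of `log(vocab_size)`, which is the loss of a model that guesses uniformly at random. As the logit for the correct token grows relative to the others, the loss for that step approaches zero.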
In a language model, we calculate ...