Calculating Loss
Calculate the loss for your LSTM model.
Chapter Goals:
- Convert your LSTM model's outputs into logits
- Use a padding mask to calculate the overall loss (see the sketch at the end of this lesson)
A. Logits & loss
As mentioned in earlier chapters, a language model's task is essentially regular multiclass classification: at each time step, the model predicts the next token out of the vocabulary. Therefore, the loss function is still the standard softmax cross entropy loss. We use a final fully-connected layer to convert the LSTM's outputs into logits, one per possible class (i.e., vocabulary word).
import tensorflow as tf

# Placeholders require graph mode, so disable eager execution under TF 2.x
tf.compat.v1.disable_eager_execution()

# Output from an LSTM
# Shape: (batch_size, time_steps, cell_size)
lstm_outputs = tf.compat.v1.placeholder(tf.float32, shape=(None, 10, 7))
vocab_size = 100

# Fully-connected layer converts each LSTM output into per-vocabulary logits
# Shape: (batch_size, time_steps, vocab_size)
logits = tf.keras.layers.Dense(units=vocab_size)(lstm_outputs)

# Target tokenized sequences
# Shape: (batch_size, time_steps)
target_sequences = tf.compat.v1.placeholder(tf.int64, shape=(None, 10))

# Per-token cross entropy loss
# Shape: (batch_size, time_steps)
loss = tf.nn.sparse_softmax_cross_entropy_with_logits(
    labels=target_sequences,
    logits=logits)
The function used to calculate the softmax cross entropy loss for feed-forward neural networks is ...
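The chapter's second goal builds directly on this: since sparse_softmax_cross_entropy_with_logits returns one loss value per token, a padding mask can zero out the positions that correspond to padding before averaging. The following is a minimal sketch, assuming a pad token ID of 0 and made-up tensor values rather than the chapter's actual model:

import tensorflow as tf

# Hypothetical per-token losses and padded target sequences (pad token ID: 0)
# Shapes: (batch_size, time_steps)
target_sequences = tf.constant([[4, 9, 2, 0, 0],
                                [7, 3, 5, 8, 0]], dtype=tf.int64)
per_token_loss = tf.random.uniform((2, 5))  # stand-in for the cross entropy output

# Padding mask: 1.0 at real tokens, 0.0 at padded positions
pad_mask = tf.cast(tf.math.not_equal(target_sequences, 0), tf.float32)

# Zero out the loss at padded positions, then average over real tokens only
overall_loss = tf.reduce_sum(per_token_loss * pad_mask) / tf.reduce_sum(pad_mask)

Dividing by the mask's sum, rather than by the total number of positions, keeps heavily padded sequences from diluting the average loss.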