Calculating Loss

Calculate the loss for your LSTM model.

Chapter Goals:

  • Convert your LSTM model's outputs into logits
  • Use a padding mask to calculate the overall loss

A. Logits & loss

As mentioned in earlier chapters, the task for a language model is no different from regular multiclass classification, so the loss function is still the regular softmax cross entropy loss. We use a final fully-connected layer to convert the model's outputs into logits, one for each of the possible classes (i.e. the vocabulary words).

import tensorflow as tf

# Placeholders require graph mode when running under TF 2.x
tf.compat.v1.disable_eager_execution()

# Output from an LSTM
# Shape: (batch_size, time_steps, cell_size)
lstm_outputs = tf.compat.v1.placeholder(tf.float32, shape=(None, 10, 7))
vocab_size = 100

# Fully-connected layer converts each LSTM output vector
# into per-word logits
# Shape: (batch_size, time_steps, vocab_size)
logits = tf.keras.layers.Dense(units=vocab_size)(lstm_outputs)

# Target tokenized sequences
# Shape: (batch_size, time_steps)
target_sequences = tf.compat.v1.placeholder(tf.int64, shape=(None, 10))

# Per-token cross entropy loss
# Shape: (batch_size, time_steps)
loss = tf.nn.sparse_softmax_cross_entropy_with_logits(
    labels=target_sequences,
    logits=logits)
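
To make the shapes concrete, here is a minimal usage sketch that continues from the graph above and feeds it random data. The batch size of 4 and the random inputs are illustrative assumptions, not part of the course's code.

import numpy as np

# Hypothetical usage: evaluate the per-token losses on random data
with tf.compat.v1.Session() as sess:
    # Initialize the Dense layer's weights
    sess.run(tf.compat.v1.global_variables_initializer())
    feed = {
        lstm_outputs: np.random.randn(4, 10, 7).astype(np.float32),
        target_sequences: np.random.randint(0, vocab_size, size=(4, 10)),
    }
    per_token_loss = sess.run(loss, feed_dict=feed)
    print(per_token_loss.shape)  # (4, 10): one loss value per token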

The function used to calculate the softmax cross entropy loss for feed-forward neural networks is ...
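
The second chapter goal mentions using a padding mask so that padded positions do not contribute to the overall loss. Below is a minimal sketch of one way to do this; the seq_lens placeholder, the mask built with tf.sequence_mask, and the average over real tokens are illustrative assumptions rather than the course's exact code.

import tensorflow as tf

tf.compat.v1.disable_eager_execution()

time_steps = 10
# Per-token losses, e.g. the output of
# tf.nn.sparse_softmax_cross_entropy_with_logits
# Shape: (batch_size, time_steps)
token_losses = tf.compat.v1.placeholder(tf.float32, shape=(None, time_steps))
# Lengths of the unpadded sequences (hypothetical placeholder)
# Shape: (batch_size,)
seq_lens = tf.compat.v1.placeholder(tf.int64, shape=(None,))

# Binary mask: 1.0 at real token positions, 0.0 at padding
pad_mask = tf.cast(tf.sequence_mask(seq_lens, maxlen=time_steps), tf.float32)

# Zero out the losses at padded positions, then average
# over the real tokens only
overall_loss = tf.reduce_sum(token_losses * pad_mask) / tf.reduce_sum(pad_mask)

Dividing by the sum of the mask, rather than the total number of positions, keeps the overall loss from being artificially deflated by heavily padded batches.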