...


Evaluating the Model and Generating Captions from It

Learn to evaluate the model and generate captions on test images.

Evaluating the model

With the model trained, let's test it on the unseen test dataset. The testing logic is almost identical to the validation logic we discussed during model training, so we won't repeat that discussion here.

bleu_metric = BLEUMetric(tokenizer=tokenizer)

# Build the tf.data pipeline for the test set
test_dataset, _ = generate_tf_dataset(
    test_captions_df, tokenizer=tokenizer, n_vocab=n_vocab,
    batch_size=batch_size, training=False
)

test_loss, test_accuracy, test_bleu = [], [], []
for ti, t_batch in enumerate(test_dataset):
    print(f"{ti+1} batches processed", end='\r')
    # Evaluate loss/accuracy and get predictions for the current batch
    loss, accuracy = full_model.test_on_batch(t_batch[0], t_batch[1])
    batch_predicted = full_model.predict_on_batch(t_batch[0])
    # Compute the BLEU score between the true and predicted captions
    bleu_score = bleu_metric.calculate_bleu_from_predictions(
        t_batch[1], batch_predicted
    )
    test_loss.append(loss)
    test_accuracy.append(accuracy)
    test_bleu.append(bleu_score)

print(
    f"\ntest_loss: {np.mean(test_loss)} - "
    f"test_accuracy: {np.mean(test_accuracy)} - "
    f"test_bleu: {np.mean(test_bleu)}"
)

This will output:

261 batches processed
test_loss: 1.057080413646625 - test_accuracy: 0.7914185857407434 - test_bleu: 0.10505496256163914
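The test_bleu figure is produced by the BLEUMetric helper used in the loop above. As a reminder of what that score measures, here is a minimal sketch of such a metric, assuming a Keras Tokenizer (with an index_word mapping) and NLTK's corpus_bleu; the class and argument names are illustrative and may differ from the exact implementation used earlier.

import numpy as np
from nltk.translate.bleu_score import corpus_bleu, SmoothingFunction

class SimpleBLEUMetric:
    """Illustrative BLEU metric: maps predicted token probabilities back to
    words and scores them against the reference captions."""

    def __init__(self, tokenizer, pad_token=0):
        self.tokenizer = tokenizer
        self.pad_token = pad_token
        self.smoothing = SmoothingFunction().method1

    def calculate_bleu_from_predictions(self, y_true, y_pred_probs):
        # y_true: (batch, seq_len) integer token IDs of the reference captions
        # y_pred_probs: (batch, seq_len, n_vocab) predicted token probabilities
        pred_ids = np.argmax(y_pred_probs, axis=-1)

        references, hypotheses = [], []
        for true_seq, pred_seq in zip(np.array(y_true), pred_ids):
            # Drop padding tokens and map the remaining IDs back to words
            true_words = [
                self.tokenizer.index_word[t] for t in true_seq
                if t != self.pad_token and t in self.tokenizer.index_word
            ]
            pred_words = [
                self.tokenizer.index_word[t] for t in pred_seq
                if t != self.pad_token and t in self.tokenizer.index_word
            ]
            # corpus_bleu expects a list of reference lists per hypothesis
            references.append([true_words])
            hypotheses.append(pred_words)

        return corpus_bleu(
            references, hypotheses, smoothing_function=self.smoothing
        )

In the evaluation loop, calculate_bleu_from_predictions is called with the batch's target captions and the model's predicted probabilities, just as in the snippet at the start of this section.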

Great, these test results are in line with what we saw on the validation data. This means the model has not overfitted to the training data and should perform reasonably well in the real world. Let's now generate captions for a few sample images.

Captions generated for test images

With the help of ...