Summary: Image Captioning with Transformers

Explore the process of building an image captioning model combining vision transformers and text decoders. Understand dataset analysis, tokenization, model components, and evaluation metrics. Gain practical insight into training and generating captions for new images.

Image captioning model

In this chapter, we focused on generating captions for images. Our image-captioning model was one of the most complex in this course and included the following components:

  • A vision transformer model that produces an image representation

  • A text-based transformer ...
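
As a rough illustration of the first component, a vision transformer typically begins by cutting the image into fixed-size patches and flattening each one into a vector, so that the image can be processed as a sequence. The sketch below shows only that patching step in NumPy. The function name `image_to_patches`, the 16-pixel patch size, and the 224×224 input are assumptions chosen for illustration; they are not taken from the chapter's model.

```python
import numpy as np


def image_to_patches(image: np.ndarray, patch_size: int = 16) -> np.ndarray:
    """Split an (H, W, C) image into flattened, non-overlapping patches.

    Hypothetical helper for illustration; not the chapter's implementation.
    """
    height, width, channels = image.shape
    if height % patch_size or width % patch_size:
        raise ValueError("image height and width must be divisible by patch_size")

    rows, cols = height // patch_size, width // patch_size
    # Separate each spatial axis into (patch index, offset within patch).
    patches = image.reshape(rows, patch_size, cols, patch_size, channels)
    # Bring the two patch indices to the front: (rows, cols, patch, patch, C).
    patches = patches.transpose(0, 2, 1, 3, 4)
    # One row per patch, each flattened to patch_size * patch_size * C values.
    return patches.reshape(rows * cols, patch_size * patch_size * channels)


# Assumed input size: a 224x224 RGB image.
image = np.zeros((224, 224, 3), dtype=np.float32)
patches = image_to_patches(image)
print(patches.shape)  # (196, 768)
```

In a full model, each flattened patch would then be linearly projected and combined with position information before entering the transformer encoder, whose outputs form the image representation.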