Summary: Image Captioning with Transformers
Explore the process of building an image captioning model combining vision transformers and text decoders. Understand dataset analysis, tokenization, model components, and evaluation metrics. Gain practical insight into training and generating captions for new images.
We'll cover the following...
We'll cover the following...
Image captioning model
In this chapter, we focused on a very interesting task that involves generating captions for given images. Our image-captioning model was one of the most complex models in this course, which included the following:
A vision transformer model that produces an image representation
A text-based transformer ...