
VideoBERT Model

Learn about pre-training the VideoBERT model using the cloze and linguistic-visual alignment tasks.

We'll cover the following...

Pre-training a VideoBERT model
Cloze task
- Example: Cooking video
- Video for training the model
Linguistic-visual alignment
The final pre-training objective

Now we'll learn about yet another interesting variant of BERT, called VideoBERT. As the name suggests, VideoBERT learns representations of video as well as of language. It is the first model to learn representations of video and language jointly.

Just as we fine-tuned a pre-trained BERT model for downstream tasks, we can fine-tune a pre-trained VideoBERT model for many interesting downstream tasks. These include image caption generation, video captioning, and predicting the next frames of a video.
