Summary: Working with VideoBERT, BART, and More

Let’s summarize what we have learned so far.

Key highlights

Summarized below are the main highlights of what we have learned in this chapter.

  • We started off by learning how VideoBERT works. We learned how VideoBERT is pre-trained by predicting masked linguistic and visual tokens. We also learned that VideoBERT's final pre-training objective is a weighted combination of the text-only, video-only, and text-video training objectives; a small sketch of this weighted combination follows the list below.

  • Later, we explored different applications of VideoBERT.

  • Then, we learned that BART is essentially a transformer model with an encoder and a decoder. We feed corrupted text to the encoder, which learns a representation of the given text and sends it to the decoder. The decoder takes the representation produced by the encoder and reconstructs the original, uncorrupted text. We also saw that BART uses a bidirectional encoder and a unidirectional (autoregressive) decoder; a minimal usage sketch appears after this list.
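
A small sketch of the weighted objective mentioned above is shown here. Assuming the three per-regime losses have already been computed, the final objective is simply their weighted sum; the function name videobert_objective and the weight values are illustrative placeholders, not taken from the VideoBERT paper.

```python
# A minimal, illustrative sketch of combining VideoBERT's three
# pre-training losses into one objective. All names and weight values
# here are hypothetical, not from the paper or any library.

def videobert_objective(loss_text, loss_video, loss_text_video,
                        w_text=1.0, w_video=1.0, w_text_video=1.0):
    """Return the weighted sum of the text-only, video-only,
    and text-video pre-training losses."""
    return (w_text * loss_text
            + w_video * loss_video
            + w_text_video * loss_text_video)

# Example with made-up batch loss values:
print(videobert_objective(loss_text=2.31, loss_video=1.87, loss_text_video=2.05))
```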

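To make the encoder-decoder flow concrete, here is a minimal sketch using the Hugging Face transformers library (this assumes the library is installed and will download pre-trained weights on first use). The checkpoint name facebook/bart-base is one publicly available BART model, and the corrupted sentence is just an example.

```python
from transformers import BartTokenizer, BartForConditionalGeneration

# Load a pre-trained BART model and its tokenizer
# (facebook/bart-base is one publicly available checkpoint).
tokenizer = BartTokenizer.from_pretrained("facebook/bart-base")
model = BartForConditionalGeneration.from_pretrained("facebook/bart-base")

# A corrupted sentence: part of the text is replaced with BART's <mask> token.
corrupted_text = "The weather is <mask> today."

# The bidirectional encoder reads the corrupted text; the autoregressive
# decoder reconstructs the text token by token, left to right.
inputs = tokenizer(corrupted_text, return_tensors="pt")
output_ids = model.generate(**inputs, max_length=20)

print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```

The same generate-based interface is used when BART is fine-tuned for downstream sequence-to-sequence tasks such as summarization.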