Summary: Understanding the BERT Model
Let’s summarize what we have learned so far about the BERT model.
Key highlights
Summarized below are the main highlights of what we have learned in this chapter.
We began this chapter by understanding the basic idea of BERT. We learned that BERT can understand the contextual meaning of words and generate embeddings according to context, unlike context-free models such as word2vec, which generate embeddings irrespective of the context.
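A minimal sketch of this idea is shown below, assuming the Hugging Face `transformers` library and the `bert-base-uncased` checkpoint (neither is prescribed by this chapter): the word "bank" receives a different embedding in each sentence, whereas a context-free model like word2vec would return the same vector in both cases.

```python
import torch
from transformers import BertTokenizer, BertModel

# Load a pretrained BERT model and its tokenizer (illustrative checkpoint).
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased")
model.eval()

sentences = [
    "He deposited money in the bank.",    # financial sense of "bank"
    "She sat on the bank of the river.",  # river sense of "bank"
]

embeddings = []
with torch.no_grad():
    for sentence in sentences:
        inputs = tokenizer(sentence, return_tensors="pt")
        outputs = model(**inputs)
        # Find the position of the token "bank" and take its contextual embedding.
        tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0].tolist())
        bank_index = tokens.index("bank")
        embeddings.append(outputs.last_hidden_state[0, bank_index])

similarity = torch.cosine_similarity(embeddings[0], embeddings[1], dim=0)
print(f"Cosine similarity between the two 'bank' embeddings: {similarity:.4f}")
# A context-free model would produce identical vectors (similarity of 1.0);
# BERT's contextual embeddings differ because the surrounding words differ.
```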
We looked into the workings of BERT. We understood that Bidirectional Encoder Representations from Transformers (BERT), as the name suggests, is essentially the transformer model; more precisely, it uses a stack of the transformer's encoder layers.
We looked into the different configurations of BERT. We learned that the BERT-base consists of 12 encoder layers, 12 attention heads, and 768 hidden units, while BERT-large consists of 24 encoder layers, 16 attention heads, and 1,024 hidden units.
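The sketch below restates these two configurations using the Hugging Face `BertConfig` class (an assumed, illustrative choice; other hyperparameters are left at their defaults for brevity).

```python
from transformers import BertConfig

# BERT-base: 12 encoder layers, 12 attention heads, 768 hidden units.
bert_base = BertConfig(
    num_hidden_layers=12,
    num_attention_heads=12,
    hidden_size=768,
)

# BERT-large: 24 encoder layers, 16 attention heads, 1,024 hidden units.
bert_large = BertConfig(
    num_hidden_layers=24,
    num_attention_heads=16,
    hidden_size=1024,
)

for name, config in [("BERT-base", bert_base), ("BERT-large", bert_large)]:
    print(
        f"{name}: {config.num_hidden_layers} encoder layers, "
        f"{config.num_attention_heads} attention heads, "
        f"{config.hidden_size} hidden units"
    )
```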