Configurations of BERT
Learn about some BERT configurations.
Standard configurations of BERT
The authors of BERT present the model in two standard configurations:
BERT-base
BERT-large
Let's take a look at each of these in detail.
BERT-base
BERT-base consists of 12 encoder layers stacked one on top of another. Each encoder uses 12 attention heads. The feedforward network in the ...
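As a rough illustration, the two figures stated for BERT-base (12 encoder layers, 12 attention heads per encoder) can be recorded in a small configuration object. This is only a sketch: the class name `EncoderStackConfig`, its field names, and the helper `total_attention_heads` are hypothetical and do not come from any BERT library.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EncoderStackConfig:
    """Hypothetical container for the layer and head counts of an encoder stack."""

    num_encoder_layers: int   # encoders stacked one on top of another
    num_attention_heads: int  # attention heads used by each encoder

    def total_attention_heads(self) -> int:
        # Each encoder layer has its own set of heads, so the total
        # across the stack is layers multiplied by heads per layer.
        return self.num_encoder_layers * self.num_attention_heads


# Values taken from the description of BERT-base above.
BERT_BASE = EncoderStackConfig(num_encoder_layers=12, num_attention_heads=12)

print(BERT_BASE.total_attention_heads())  # 144
```

With 12 heads in each of 12 encoders, the stack contains 144 attention heads in total.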