Chinese BERT

Learn about the Chinese BERT model, its training dataset, and pre-training, as well as explore the language technology platform along with its uses.

Along with M-BERT, Google Research has also open-sourced the Chinese BERT model. The configuration of the Chinese BERT model is the same as the vanilla BERT-base model: it consists of 12 encoder layers, 12 attention heads, and 768 hidden units, with about 110 million parameters in total. The pre-trained Chinese BERT model can be downloaded from GitHub.
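
If you want to verify these configuration values yourself, a minimal sketch like the following (assuming the transformers library is installed) loads the model's configuration from the Hugging Face hub and prints them:

from transformers import AutoConfig

# Load only the configuration of the pre-trained Chinese BERT model
config = AutoConfig.from_pretrained("bert-base-chinese")

print(config.num_hidden_layers)    # 12 encoder layers
print(config.num_attention_heads)  # 12 attention heads
print(config.hidden_size)          # 768 hidden units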

We can use the pre-trained Chinese BERT model with the transformers library, as shown here:

from transformers import AutoTokenizer, AutoModel

# Download and load the pre-trained Chinese BERT tokenizer and model
tokenizer = AutoTokenizer.from_pretrained("bert-base-chinese")
model = AutoModel.from_pretrained("bert-base-chinese")
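
As a quick sanity check, we can pass a Chinese sentence through the model and inspect the output. The sentence below is just an illustrative example, and the sketch assumes PyTorch is installed:

import torch

# An example sentence ("I love Beijing"); any Chinese text works here
sentence = "我爱北京"

# Tokenize the sentence and convert it to PyTorch tensors
inputs = tokenizer(sentence, return_tensors="pt")

# Run a forward pass without tracking gradients
with torch.no_grad():
    outputs = model(**inputs)

# The last hidden state holds one 768-dimensional vector per token,
# including the [CLS] and [SEP] tokens
print(outputs.last_hidden_state.shape)  # e.g., torch.Size([1, 6, 768])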

Now, let's look into another Chinese BERT model, described in the paper "Pre-Training with Whole Word Masking for Chinese BERT" by Cui et al.