
RuBERT for Russian

Learn about the RuBERT model for the Russian language and how it is trained by transferring knowledge from M-BERT.

RuBERT is a pre-trained BERT model for the Russian language. Unlike the other monolingual BERT variants we have seen, RuBERT is trained in a different way.

Pre-training the RuBERT model

RuBERT is trained by transferring knowledge from M-BERT. Recall that M-BERT is trained on the Wikipedia text of 104 languages and thus has good knowledge of each of them. So, instead of training the monolingual RuBERT from scratch, we train it by transferring knowledge from M-BERT. Before training, we initialize all the parameters of RuBERT with the parameters of the M-BERT model, except for the word embeddings. ...
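The initialization step described above can be sketched as follows. This is a minimal illustration, not the actual RuBERT training code: the state dicts, parameter names, and the `init_rubert_from_mbert` helper are all hypothetical, and parameters are modelled as plain nested lists rather than real tensors.

```python
# Hypothetical sketch: initialize RuBERT's parameters from M-BERT,
# keeping RuBERT's own word embeddings (the vocabularies differ, so
# the embedding matrix cannot simply be copied over).

def init_rubert_from_mbert(mbert_state, rubert_state,
                           embedding_key="word_embeddings.weight"):
    """Return a RuBERT state dict initialized from M-BERT parameters.

    Every parameter is copied from M-BERT except the word-embedding
    matrix, which stays as RuBERT's own (freshly initialized) weights.
    """
    new_state = {}
    for name, weights in rubert_state.items():
        if name == embedding_key:
            # Word embeddings are NOT transferred.
            new_state[name] = weights
        else:
            # Encoder layers and all other parameters come from M-BERT.
            new_state[name] = mbert_state[name]
    return new_state
```

With real models, the same idea amounts to copying M-BERT's encoder weights into RuBERT's state dict while leaving the embedding (and any vocabulary-sized) matrices untouched.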