RuBERT for Russian

Learn about the RuBERT model for the Russian language and how it is trained by transferring knowledge from M-BERT.

RuBERT is a pre-trained BERT model for the Russian language. Unlike most other BERT variants, RuBERT is not trained from scratch.

Pre-training the RuBERT model

RuBERT is trained by transferring knowledge from M-BERT. We know that M-BERT is trained on the Wikipedia text of 104 languages and so has good knowledge of each of them. So, instead of training the monolingual RuBERT from scratch, we bootstrap it from M-BERT: before training, we initialize all the parameters of RuBERT with the parameters of the M-BERT model, except the word embeddings.
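The initialization step above can be sketched as follows. This is a minimal illustration using plain Python dicts in place of real checkpoint tensors; the parameter names ("word_embeddings", "encoder.layer.0.attention", and so on) and values are hypothetical, not the actual checkpoint keys of either model.

```python
import random

# Pretend M-BERT checkpoint: parameter name -> weights (toy values).
mbert_params = {
    "word_embeddings": [[0.1, 0.2], [0.3, 0.4]],   # vocabulary-dependent
    "encoder.layer.0.attention": [0.5, 0.6],       # vocabulary-independent
    "encoder.layer.0.ffn": [0.7, 0.8],
}

def init_rubert_from_mbert(mbert_params, rubert_vocab_size, dim=2):
    """Copy every M-BERT parameter except the word embeddings,
    which depend on the (different) RuBERT subword vocabulary."""
    rubert_params = {}
    for name, value in mbert_params.items():
        if name == "word_embeddings":
            # Re-initialize: RuBERT has its own Russian subword vocabulary.
            rubert_params[name] = [
                [random.uniform(-0.02, 0.02) for _ in range(dim)]
                for _ in range(rubert_vocab_size)
            ]
        else:
            rubert_params[name] = value  # transferred from M-BERT as-is
    return rubert_params

params = init_rubert_from_mbert(mbert_params, rubert_vocab_size=3)
```

Only the word-embedding table is re-initialized because its rows are tied to specific vocabulary entries; the encoder layers are vocabulary-independent and carry over directly.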

RuBERT is trained on Russian Wikipedia text and news articles. Subword Neural Machine Translation (Subword NMT) is used to segment the text into subword units; that is, we build the subword vocabulary with Subword NMT. Compared to the vocabulary of the M-BERT model, RuBERT's subword vocabulary contains longer subwords and more whole Russian words.
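Subword NMT segments text with byte-pair encoding (BPE): it learns a list of merge operations from the corpus and then applies them to split each word. The toy sketch below shows only the apply step on a hand-made merge list; the merges for "привет" ("hello") are invented for illustration and are not what Subword NMT would actually learn from a Russian corpus.

```python
def apply_bpe(word, merges):
    """Apply BPE merge operations in order: start from characters and
    repeatedly fuse adjacent symbol pairs (toy version of the apply step)."""
    symbols = list(word)
    for a, b in merges:
        i = 0
        while i < len(symbols) - 1:
            if symbols[i] == a and symbols[i + 1] == b:
                symbols[i:i + 2] = [a + b]  # merge the pair in place
            else:
                i += 1
    return symbols

# Hypothetical merge list "learned" from a corpus (illustrative only).
merges = [("п", "р"), ("пр", "и"), ("в", "е"), ("ве", "т")]
print(apply_bpe("привет", merges))  # → ['при', 'вет']
```

With a larger merge list learned from more Russian text, frequent words survive as single units, which is why RuBERT's vocabulary ends up with longer, more word-like Russian entries.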

Common words from M-BERT and RuBERT

There will be some words that occur in both the M-BERT vocabulary and the monolingual RuBERT vocabulary. For these common words, we can copy their embeddings from M-BERT directly.
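This copy step can be sketched as below. The tokens and embedding values are toy examples, and random initialization for RuBERT-only tokens is just one simple choice used here for illustration.

```python
import random

# Toy M-BERT embedding table: token -> vector (illustrative values).
mbert_embeddings = {
    "мир": [0.11, 0.22],
    "##ский": [0.33, 0.44],
}
rubert_vocab = ["мир", "привет", "##ский"]

def build_rubert_embeddings(rubert_vocab, mbert_embeddings, dim=2):
    """Reuse M-BERT embeddings for tokens present in both vocabularies;
    initialize the remaining RuBERT-only tokens randomly (one simple choice)."""
    table = {}
    for token in rubert_vocab:
        if token in mbert_embeddings:
            table[token] = mbert_embeddings[token]  # common word: copy directly
        else:
            table[token] = [random.uniform(-0.02, 0.02) for _ in range(dim)]
    return table

table = build_rubert_embeddings(rubert_vocab, mbert_embeddings)
```

Copying the shared rows means those tokens start training with meaningful multilingual representations instead of noise, which is the point of transferring from M-BERT.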
