...

How Multilingual is Multilingual BERT?

Learn whether the multilingual knowledge transfer of M-BERT depends on vocabulary overlap.

M-BERT is trained on the Wikipedia text of 104 different languages and is evaluated by fine-tuning it on the XNLI dataset. But how multilingual is M-BERT, really? How is a single model able to transfer knowledge across multiple languages? To answer these questions, let's investigate the multilingual ability of M-BERT in more detail.

Effect of vocabulary overlap

M-BERT is trained on the Wikipedia text of 104 languages using a single shared WordPiece vocabulary of 110k tokens. In this lesson, let's investigate whether M-BERT's multilingual knowledge transfer depends on this vocabulary overlap.

M-BERT is good at zero-shot transfer. That is, we can fine-tune M-BERT on a task in one language and then use the fine-tuned model for the same task in other languages. Say we are performing a named entity recognition (NER) task. Suppose we fine-tune M-BERT for NER on English text. We can then take this fine-tuned M-BERT and apply it directly to other languages, such as German. But how is this possible? How is the ...