The Cross-Lingual Language Model (XLM)

Learn about the XLM model, including its training dataset, different pre-training strategies, and the process for both pre-training and evaluation.

The M-BERT model is pre-trained just like the regular BERT model, without any explicit cross-lingual objective. In this lesson, let's learn how to pre-train BERT with a cross-lingual objective. We refer to BERT trained with a cross-lingual objective as a cross-lingual language model (XLM). The XLM model performs better than M-BERT and learns cross-lingual representations.

Training dataset

The XLM model is pre-trained using monolingual and parallel datasets. The parallel dataset consists of text in a language pair; that is, it contains the same text in two different languages. For instance, for every English sentence, the parallel dataset also includes the corresponding sentence in another language, say French. We can also call this parallel dataset a cross-lingual dataset.
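To make the idea concrete, here is a minimal sketch of how a parallel dataset can be represented in code: each entry pairs a sentence with its translation. The sentence pairs below are illustrative examples only, not samples from XLM's actual training corpora.

```python
# A parallel (cross-lingual) dataset: each entry pairs an English
# sentence with its French translation. Illustrative pairs only.
parallel_pairs = [
    ("I love Paris", "J'aime Paris"),
    ("The weather is nice today", "Il fait beau aujourd'hui"),
]

for en_sentence, fr_sentence in parallel_pairs:
    print(f"EN: {en_sentence}  |  FR: {fr_sentence}")
```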

The monolingual dataset is obtained from Wikipedia, and the parallel dataset is obtained from several sources, including MultiUN (a multilingual corpus of United Nations documents), OPUS (the open parallel corpus), and the IIT Bombay corpus. XLM uses byte pair encoding (BPE) and creates a shared vocabulary across all the languages.
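As a rough illustration of the shared BPE vocabulary, the sketch below trains a single BPE tokenizer over monolingual text files from several languages using the Hugging Face tokenizers library. The file names, vocabulary size, and special tokens are assumptions made for this example; they are not XLM's actual settings.

```python
from tokenizers import Tokenizer
from tokenizers.models import BPE
from tokenizers.trainers import BpeTrainer
from tokenizers.pre_tokenizers import Whitespace

# One tokenizer trained on text from all languages at once,
# so the resulting BPE vocabulary is shared across languages.
tokenizer = Tokenizer(BPE(unk_token="[UNK]"))
tokenizer.pre_tokenizer = Whitespace()

trainer = BpeTrainer(
    vocab_size=30000,  # illustrative size, not XLM's actual vocabulary size
    special_tokens=["[UNK]", "[CLS]", "[SEP]", "[PAD]", "[MASK]"],
)

# Hypothetical monolingual text dumps, one file per language.
tokenizer.train(files=["wiki_en.txt", "wiki_fr.txt", "wiki_de.txt"], trainer=trainer)

print(tokenizer.encode("I love Paris").tokens)
print(tokenizer.encode("J'aime Paris").tokens)
```

Because a single vocabulary is learned over all languages together, subwords that look alike across languages (names, numbers, shared roots) map to the same tokens, which helps align the representations of different languages.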

Pre-training strategies

The XLM model is pre-trained using the following tasks:

  • Causal language modeling

  • Masked language modeling

  • Translation language modeling

Let's take a look at how each of the preceding tasks works.

Causal language modeling

Causal language modeling (CLM) is the simplest pre-training method. In CLM, the goal of our model is to predict the probability of a word given the previous set of words. It is represented as $P(w_t \mid w_1, w_2, \ldots, w_{t-1})$.
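The snippet below is a minimal sketch of the CLM objective in PyTorch, using a toy recurrent model rather than XLM's actual Transformer: at every position, the model is trained to predict the next token from the tokens that precede it. All layer sizes, names, and the random data are illustrative.

```python
import torch
import torch.nn as nn

# Illustrative sizes only.
vocab_size, embed_dim, hidden_dim = 1000, 64, 128

class TinyCausalLM(nn.Module):
    """Toy causal LM: embedding -> LSTM -> vocabulary logits."""
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, tokens):
        hidden, _ = self.lstm(self.embed(tokens))
        return self.out(hidden)  # next-token logits at each position

model = TinyCausalLM()
tokens = torch.randint(0, vocab_size, (2, 10))  # batch of 2 sequences, length 10

logits = model(tokens[:, :-1])   # condition on w_1, ..., w_{t-1}
targets = tokens[:, 1:]          # predict w_t at each position
loss = nn.CrossEntropyLoss()(logits.reshape(-1, vocab_size), targets.reshape(-1))
print(loss.item())
```

Minimizing this cross-entropy loss is equivalent to maximizing $P(w_t \mid w_1, w_2, \ldots, w_{t-1})$ at every position in the sequence.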