Language-Specific BERT
Learn about monolingual BERT models for various languages and gain a detailed understanding of FlauBERT, the pre-trained BERT model for the French language.
The M-BERT model is used for many different languages. However, instead of relying on a single M-BERT model for many languages, can we train a monolingual BERT model for a specific target language? We can, and that is precisely what we will learn in this lesson.
Monolingual BERT for various languages
Below are several interesting and popular monolingual BERT models for various languages (a short sketch showing how such models can be loaded follows the list):
FlauBERT for French
BETO for Spanish
BERTje for Dutch
German BERT
Chinese BERT
Japanese BERT
FinBERT for Finnish
UmBERTo for Italian
BERTimbau for Portuguese
RuBERT for Russian
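Most of these models are published on the Hugging Face Hub and can be loaded with the transformers library in the same way as the original BERT model. The following is a minimal sketch, assuming the transformers library is installed and that the model identifier bert-base-german-cased (an assumed Hub ID for German BERT) is still available; it loads a monolingual model and obtains contextual embeddings for a sentence:

```python
# Minimal sketch: loading a monolingual BERT model from the Hugging Face Hub.
# The model ID "bert-base-german-cased" is an assumed checkpoint name.
import torch
from transformers import AutoModel, AutoTokenizer

model_name = "bert-base-german-cased"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name)

sentence = "Berlin ist die Hauptstadt von Deutschland."
inputs = tokenizer(sentence, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)

# One contextual embedding vector per token in the input sentence
print(outputs.last_hidden_state.shape)  # (1, number_of_tokens, 768)
```

The same pattern works for the other models in the list; only the model identifier changes.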
FlauBERT for French
FlauBERT (French Language Understanding via BERT) is a pre-trained BERT model for the French language. It performs better than the multilingual and cross-lingual models on many downstream French NLP tasks.
FlauBERT is trained on a huge heterogeneous French corpus. The French corpus consists of 24 subcorpora containing data from various sources, including Wikipedia, books, internal crawling, WMT19 data, French text from OPUS (an open-source parallel corpus), and Wikimedia.
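As a quick illustration, the following sketch loads the pre-trained FlauBERT model and computes contextual embeddings for a French sentence. It assumes the flaubert/flaubert_base_cased checkpoint is available on the Hugging Face Hub:

```python
# Minimal sketch: loading pre-trained FlauBERT and extracting embeddings.
# The model ID "flaubert/flaubert_base_cased" is an assumed checkpoint name.
import torch
from transformers import FlaubertModel, FlaubertTokenizer

model_name = "flaubert/flaubert_base_cased"

tokenizer = FlaubertTokenizer.from_pretrained(model_name)
model = FlaubertModel.from_pretrained(model_name)

sentence = "Paris est la capitale de la France."
inputs = tokenizer(sentence, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)

# The vector at the first token position is often used as a
# representation of the whole sentence.
sentence_embedding = outputs.last_hidden_state[:, 0, :]
print(sentence_embedding.shape)  # (1, 768)
```

The resulting embeddings can then be fed to a task-specific classifier, or the whole model can be fine-tuned on a downstream French NLP task.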