BERTje for Dutch

Learn about BERTje and how to use it for the next sentence prediction task.

BERTje is a pre-trained monolingual BERT model for the Dutch language, developed at the University of Groningen. BERTje is pre-trained with the masked language modeling (MLM) and sentence order prediction (SOP) objectives, using whole word masking (WWM).

The BERTje model is trained on several Dutch corpora, including TwNC (a Dutch news corpus), SoNaR-500 (a multi-genre reference corpus), Dutch Wikipedia text, web news, and books. The model has been pre-trained for about 1 ...