Japanese BERT
Explore how to use Japanese BERT models: how they tokenize text with MeCab and WordPiece, how subword splits differ from character splits, and how to apply pre-trained models to obtain Japanese sentence representations through practical coding examples.
The Japanese BERT model is pre-trained on Japanese Wikipedia text with whole word masking (WWM). We tokenize the Japanese text using MeCab, a morphological analyzer for Japanese. After tokenizing with MeCab, we apply the WordPiece tokenizer to obtain subwords. Instead of using the WordPiece tokenizer and splitting the text into subwords, we can also split the text into individual characters.
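To make this concrete, here is a minimal sketch of how such a pre-trained Japanese BERT model might be loaded and used to obtain a sentence representation with the Hugging Face transformers library. It assumes transformers, PyTorch, and the fugashi/ipadic MeCab bindings are installed, and it uses the publicly available cl-tohoku/bert-base-japanese-whole-word-masking checkpoint purely as an illustration; the lesson's own exercises may use a different checkpoint or API.

```python
# Minimal sketch: loading a MeCab + WordPiece Japanese BERT model
# (assumes transformers, torch, fugashi, and ipadic are installed;
# the checkpoint name below is an example, not the lesson's required one).
import torch
from transformers import AutoTokenizer, AutoModel

model_name = "cl-tohoku/bert-base-japanese-whole-word-masking"

# The tokenizer first runs MeCab for morphological analysis,
# then applies WordPiece to split each morpheme into subwords.
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name)

sentence = "私は機械学習を勉強しています。"  # "I am studying machine learning."
print(tokenizer.tokenize(sentence))        # MeCab + WordPiece subword tokens

# Obtain a sentence representation from the [CLS] token embedding.
inputs = tokenizer(sentence, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)
cls_embedding = outputs.last_hidden_state[:, 0, :]
print(cls_embedding.shape)  # torch.Size([1, 768])
```

For the character-level variant, the same code can point at a character-based checkpoint such as cl-tohoku/bert-base-japanese-char, whose tokenizer breaks each MeCab morpheme into single characters rather than WordPiece subwords.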