Domain-Specific BERT
Learn about domain-specific BERT models and get detailed insight into ClinicalBERT by fine-tuning it for the patient readmission prediction task.
We learned how BERT is pre-trained on the general Wikipedia corpus and how we can fine-tune and use it for downstream tasks. Instead of using BERT pre-trained on the general Wikipedia corpus, we can also train BERT from scratch on a domain-specific corpus. This helps the model learn embeddings specific to the domain, and it also helps it learn domain-specific vocabulary that may not be present in the general Wikipedia corpus.
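For instance, before pre-training from scratch, we can build a domain-specific WordPiece vocabulary from the raw domain corpus. The following is a minimal sketch using the Hugging Face tokenizers library; the corpus file path is only a placeholder:

```python
# A minimal sketch of building a domain-specific WordPiece vocabulary before
# pre-training BERT from scratch. "clinical_notes.txt" is a placeholder path
# to a raw-text file containing the domain corpus.
from tokenizers import BertWordPieceTokenizer

tokenizer = BertWordPieceTokenizer(lowercase=True)
tokenizer.train(
    files=["clinical_notes.txt"],  # placeholder: one document per line
    vocab_size=30522,              # same vocabulary size as BERT-base
    min_frequency=2,
)
tokenizer.save_model(".")          # writes vocab.txt, used later for pre-training
```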
We will look into two interesting domain-specific BERT models:
ClinicalBERT
BioBERT
We will learn how these models are pre-trained and how we can fine-tune them for downstream tasks.
ClinicalBERT
ClinicalBERT is a clinical domain-specific BERT pre-trained on a large clinical corpus. Clinical notes, or progress notes, contain very useful information about a patient. They include a record of patient visits, symptoms, diagnoses, daily activities, observations, treatment plans, radiology results, and more. Understanding the contextual representation of clinical notes is challenging since they follow their own grammatical structure and use many abbreviations and jargon. Therefore, we pre-train ClinicalBERT on a large number of clinical documents so that it learns the contextual representation of clinical text. So, how is ClinicalBERT useful?
Uses of ClinicalBERT
The representations learned by ClinicalBERT help us extract many clinical insights, summarize clinical notes, understand the relationship between diseases and treatment measures, and much more. Once pre-trained, ClinicalBERT can be used for a variety of downstream tasks, such as readmission prediction, length-of-stay prediction, mortality risk estimation, diagnosis prediction, and more.
Pre-training ClinicalBERT
ClinicalBERT is pre-trained using the MIMIC-III clinical notes. MIMIC-III is a large database of health-related data from over 40,000 patients who stayed in the ICU at the Beth Israel Deaconess Medical Center. ClinicalBERT is pre-trained using the masked language modeling and next sentence prediction tasks, just as we pre-trained the BERT model, as shown in this figure:
As shown in the preceding figure, we feed two sentences with a masked word to our model and train the model to predict the masked word, as well as whether the second sentence follows the first. After pre-training, we can use the pre-trained model for any downstream task. Let's look at how to fine-tune the pre-trained ClinicalBERT.
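To make this concrete, the following is a minimal sketch of the pre-training setup using the Hugging Face transformers library. The vocabulary file and the toy sentence pair are placeholders, and the actual training loop over the MIMIC-III sentence pairs (with masked-token and next-sentence labels) is omitted:

```python
# A minimal sketch of the MLM + NSP pre-training setup; the vocabulary file and
# the toy sentence pair below are placeholders, not real MIMIC-III data.
from transformers import BertConfig, BertForPreTraining, BertTokenizerFast

tokenizer = BertTokenizerFast(vocab_file="vocab.txt")   # e.g., the vocabulary built earlier
config = BertConfig(vocab_size=tokenizer.vocab_size)
model = BertForPreTraining(config)                      # randomly initialized, trained from scratch

# One toy sentence pair; [MASK] marks the token the model must predict.
encoding = tokenizer(
    "the patient was admitted with chest [MASK]",
    "an ecg was performed on arrival",
    return_tensors="pt",
)

outputs = model(**encoding)
print(outputs.prediction_logits.shape)        # MLM logits: (1, sequence_length, vocab_size)
print(outputs.seq_relationship_logits.shape)  # NSP logits: (1, 2)
```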
Fine-tuning ClinicalBERT
After pre-training, we can fine-tune ClinicalBERT for a variety of downstream tasks, such as readmission prediction, length-of-stay prediction, mortality risk estimation, diagnosis prediction, and many more.
Suppose we fine-tune the pre-trained ClinicalBERT for the readmission prediction task. In the readmission prediction task, the goal of our model is to predict the probability of a patient being readmitted to the hospital within the next 30 days. As shown in the following figure, we feed the clinical notes to the pre-trained ClinicalBERT, and it returns the representation of the clinical notes. Then, we take the representation of the [CLS] token and feed it to a classifier (feedforward + sigmoid activation function), and the classifier returns the probability of the patient being readmitted within the next 30 days:
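The following is a minimal sketch of this classifier using the Hugging Face transformers library. As an assumption, it loads the publicly available Bio_ClinicalBERT checkpoint in place of the pre-trained ClinicalBERT weights described above; the clinical note is a made-up example, and the classification head is untrained:

```python
# A minimal sketch of the readmission classifier: the [CLS] representation from
# a pre-trained clinical BERT model is passed through a feedforward layer and a
# sigmoid to produce a readmission probability. The checkpoint name is an
# assumption (one publicly available clinical BERT variant).
import torch
import torch.nn as nn
from transformers import AutoModel, AutoTokenizer

MODEL_NAME = "emilyalsentzer/Bio_ClinicalBERT"

class ReadmissionClassifier(nn.Module):
    def __init__(self, model_name=MODEL_NAME):
        super().__init__()
        self.bert = AutoModel.from_pretrained(model_name)
        self.classifier = nn.Linear(self.bert.config.hidden_size, 1)

    def forward(self, input_ids, attention_mask):
        outputs = self.bert(input_ids=input_ids, attention_mask=attention_mask)
        cls_repr = outputs.last_hidden_state[:, 0]   # representation of the [CLS] token
        logits = self.classifier(cls_repr)           # feedforward layer
        return torch.sigmoid(logits).squeeze(-1)     # probability of readmission

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = ReadmissionClassifier()
model.eval()

note = "patient admitted with shortness of breath, started on diuretics"  # made-up note
inputs = tokenizer(note, truncation=True, max_length=512, return_tensors="pt")
with torch.no_grad():
    prob = model(inputs["input_ids"], inputs["attention_mask"])
print(prob)  # the head is untrained here, so this is not yet a meaningful probability
```

During fine-tuning, the classifier is trained with a binary cross-entropy loss against the observed 30-day readmission labels.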
Wait! We know that in the BERT model, the maximum token length is 512. How can we make predictions when the clinical notes of a patient consist of more than 512 tokens? In that case, we can split the clinical notes (a long sequence) into several subsequences. We then feed each subsequence to our model, obtain a readmission probability for each subsequence separately, and combine these per-subsequence probabilities to get the final readmission probability for the patient.
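Here is a minimal sketch of this splitting-and-aggregation step, reusing the tokenizer and classifier from the previous sketch. The aggregation rule shown (mean and max of the per-subsequence probabilities) is only one simple choice, not the only possibility:

```python
# A minimal sketch of scoring a clinical note longer than 512 tokens: split the
# token ids into subsequences, score each subsequence with the classifier from
# the previous sketch, and aggregate the per-subsequence probabilities.
import torch

def predict_long_note(note, tokenizer, model, max_len=512):
    ids = tokenizer(note, add_special_tokens=False)["input_ids"]
    chunk_size = max_len - 2                       # leave room for [CLS] and [SEP]
    probs = []
    for start in range(0, len(ids), chunk_size):
        chunk = ids[start:start + chunk_size]
        input_ids = torch.tensor(
            [[tokenizer.cls_token_id] + chunk + [tokenizer.sep_token_id]]
        )
        attention_mask = torch.ones_like(input_ids)
        with torch.no_grad():
            probs.append(model(input_ids, attention_mask).item())
    probs = torch.tensor(probs)
    # One simple way to combine the subsequence scores; other rules are possible.
    return {"mean": probs.mean().item(), "max": probs.max().item()}
```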