Domain-Specific BERT
Learn about domain-specific BERT models and get detailed insight into ClinicalBERT by fine-tuning it for the patient readmission prediction task.
We learned how BERT is pre-trained on the general Wikipedia corpus and how we can fine-tune and use it for downstream tasks. Instead of using BERT pre-trained on the general Wikipedia corpus, we can also train BERT from scratch on a domain-specific corpus. This helps the model learn embeddings specific to the domain, and it also helps it learn domain-specific vocabulary that may not be present in the general Wikipedia corpus.
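For instance, before pre-training from scratch, we can build a domain-specific WordPiece vocabulary from the raw domain corpus. The following is a minimal sketch using the Hugging Face tokenizers library; the corpus file path is only a placeholder:

```python
# A minimal sketch of building a domain-specific WordPiece vocabulary before
# pre-training BERT from scratch. "clinical_notes.txt" is a placeholder path
# to a raw-text file containing the domain corpus.
from tokenizers import BertWordPieceTokenizer

tokenizer = BertWordPieceTokenizer(lowercase=True)
tokenizer.train(
    files=["clinical_notes.txt"],  # placeholder: one document per line
    vocab_size=30522,              # same vocabulary size as BERT-base
    min_frequency=2,
)
tokenizer.save_model(".")          # writes vocab.txt, used later for pre-training
```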
We will look into two interesting domain-specific BERT models:
ClinicalBERT
BioBERT
We will learn how these models are pre-trained and how we can fine-tune them for downstream tasks.
ClinicalBERT
ClinicalBERT is a clinical domain-specific BERT pre-trained on a large clinical corpus. Clinical notes, or progress notes, contain very useful information about a patient. They include a record of patient visits, symptoms, diagnoses, daily activities, observations, treatment plans, radiology results, and more. Understanding the contextual representation of clinical notes is challenging since they follow their own grammatical structure and use many abbreviations and jargon. Therefore, we pre-train ClinicalBERT on a large number of clinical documents so that it learns the contextual representation of clinical text. So, how is ClinicalBERT useful?
Uses of ClinicalBERT
The representations learned by ClinicalBERT help us extract many clinical insights, summarize clinical notes, understand the relationship between diseases and treatment measures, and much more. Once pre-trained, ClinicalBERT can be used for a variety of downstream tasks, such as readmission prediction, length-of-stay prediction, mortality risk estimation, diagnosis prediction, and more.
Pre-training ClinicalBERT
ClinicalBERT is pre-trained using the MIMIC-III clinical notes. MIMIC-III is a large database of health-related data from over 40,000 patients who stayed in the ICU at the Beth Israel Deaconess Medical Center. ClinicalBERT is pre-trained using the masked language modeling and next sentence prediction tasks, just as we pre-trained the BERT model, as shown in this figure:
As shown in the preceding figure, we feed two sentences with a masked word to our model and train the model to predict the masked word, as well as whether the second sentence follows the first. After pre-training, we can use the pre-trained model for any downstream task. Let's look at how to fine-tune the pre-trained ClinicalBERT.
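To make this concrete, the following is a minimal sketch of the pre-training setup using the Hugging Face transformers library. The vocabulary file and the toy sentence pair are placeholders, and the actual training loop over the MIMIC-III sentence pairs (with masked-token and next-sentence labels) is omitted:

```python
# A minimal sketch of the MLM + NSP pre-training setup; the vocabulary file and
# the toy sentence pair below are placeholders, not real MIMIC-III data.
from transformers import BertConfig, BertForPreTraining, BertTokenizerFast

tokenizer = BertTokenizerFast(vocab_file="vocab.txt")   # e.g., the vocabulary built earlier
config = BertConfig(vocab_size=tokenizer.vocab_size)
model = BertForPreTraining(config)                      # randomly initialized, trained from scratch

# One toy sentence pair; [MASK] marks the token the model must predict.
encoding = tokenizer(
    "the patient was admitted with chest [MASK]",
    "an ecg was performed on arrival",
    return_tensors="pt",
)

outputs = model(**encoding)
print(outputs.prediction_logits.shape)        # MLM logits: (1, sequence_length, vocab_size)
print(outputs.seq_relationship_logits.shape)  # NSP logits: (1, 2)
```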
Fine-tuning ClinicalBERT
After pre-training, we can fine-tune ClinicalBERT for a variety of downstream tasks, such as readmission prediction, length-of-stay prediction, mortality risk estimation, diagnosis prediction, and many more.
Suppose we fine-tune the pre-trained ClinicalBERT for the readmission prediction task. In the readmission prediction task, the goal of our model is to predict the probability of a patient being readmitted to the hospital within the next 30 days. As shown in the following figure, we feed the clinical notes to the pre-trained ClinicalBERT, and it returns the representation of the clinical notes. Then, we take the representation of the [CLS] token and feed it to a classifier (feedforward + sigmoid activation function), and the classifier returns the probability of the patient being readmitted within the next 30 days:
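The following is a minimal sketch of this classifier using the Hugging Face transformers library. As an assumption, it loads the publicly available Bio_ClinicalBERT checkpoint in place of the pre-trained ClinicalBERT weights described above; the clinical note is a made-up example, and the classification head is untrained:

```python
# A minimal sketch of the readmission classifier: the [CLS] representation from
# a pre-trained clinical BERT model is passed through a feedforward layer and a
# sigmoid to produce a readmission probability. The checkpoint name is an
# assumption (one publicly available clinical BERT variant).
import torch
import torch.nn as nn
from transformers import AutoModel, AutoTokenizer

MODEL_NAME = "emilyalsentzer/Bio_ClinicalBERT"

class ReadmissionClassifier(nn.Module):
    def __init__(self, model_name=MODEL_NAME):
        super().__init__()
        self.bert = AutoModel.from_pretrained(model_name)
        self.classifier = nn.Linear(self.bert.config.hidden_size, 1)

    def forward(self, input_ids, attention_mask):
        outputs = self.bert(input_ids=input_ids, attention_mask=attention_mask)
        cls_repr = outputs.last_hidden_state[:, 0]   # representation of the [CLS] token
        logits = self.classifier(cls_repr)           # feedforward layer
        return torch.sigmoid(logits).squeeze(-1)     # probability of readmission

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = ReadmissionClassifier()
model.eval()

note = "patient admitted with shortness of breath, started on diuretics"  # made-up note
inputs = tokenizer(note, truncation=True, max_length=512, return_tensors="pt")
with torch.no_grad():
    prob = model(inputs["input_ids"], inputs["attention_mask"])
print(prob)  # the head is untrained here, so this is not yet a meaningful probability
```

During fine-tuning, the classifier is trained with a binary cross-entropy loss against the observed 30-day readmission labels.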
Wait! We know that in the BERT model, the maximum token length is 512. How can we make predictions when the clinical notes of a patient consist of more than 512 tokens? In that case, we can split the clinical notes (a long sequence) into several subsequences. We then feed each subsequence to our model, obtain a readmission probability for each subsequence separately, and combine these per-subsequence probabilities to get the final readmission probability for the patient.
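Here is a minimal sketch of this splitting-and-aggregation step, reusing the tokenizer and classifier from the previous sketch. The aggregation rule shown (mean and max of the per-subsequence probabilities) is only one simple choice, not the only possibility:

```python
# A minimal sketch of scoring a clinical note longer than 512 tokens: split the
# token ids into subsequences, score each subsequence with the classifier from
# the previous sketch, and aggregate the per-subsequence probabilities.
import torch

def predict_long_note(note, tokenizer, model, max_len=512):
    ids = tokenizer(note, add_special_tokens=False)["input_ids"]
    chunk_size = max_len - 2                       # leave room for [CLS] and [SEP]
    probs = []
    for start in range(0, len(ids), chunk_size):
        chunk = ids[start:start + chunk_size]
        input_ids = torch.tensor(
            [[tokenizer.cls_token_id] + chunk + [tokenizer.sep_token_id]]
        )
        attention_mask = torch.ones_like(input_ids)
        with torch.no_grad():
            probs.append(model(input_ids, attention_mask).item())
    probs = torch.tensor(probs)
    # One simple way to combine the subsequence scores; other rules are possible.
    return {"mean": probs.mean().item(), "max": probs.max().item()}
```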