BioBERT
Learn about the BioBERT domain-specific BERT model and how to pre-train and fine-tune it for NER and question-answering tasks.
As the name suggests, BioBERT is a biomedical domain-specific BERT model pre-trained on a large biomedical corpus. Because BioBERT learns biomedical domain-specific representations, once pre-trained it performs better than the vanilla BERT model on biomedical texts. The architecture of BioBERT is the same as that of the vanilla BERT model. After pre-training, we can fine-tune BioBERT for many biomedical domain-specific downstream tasks, such as biomedical question answering, biomedical named entity recognition, and more.
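As a concrete taste of what fine-tuning BioBERT for biomedical NER involves, the training data must first be converted to token-level BIO labels. The sketch below is illustrative only: the function name, the entity types (`CHEMICAL`, `DISEASE`), and the example sentence are assumptions, not part of any fixed BioBERT API.

```python
def char_spans_to_bio(tokens, spans):
    """Convert character-level entity spans to token-level BIO labels.

    tokens: list of (text, start_char) pairs for one sentence
    spans:  list of (start_char, end_char, entity_type) annotations

    This is a simplified sketch for illustration; real NER pipelines
    must also handle subword tokenization and overlapping spans.
    """
    labels = []
    for text, start in tokens:
        end = start + len(text)
        label = "O"  # default: token is outside any entity
        for s, e, etype in spans:
            if start >= s and end <= e:
                # First token of the entity gets B-, the rest get I-
                label = ("B-" if start == s else "I-") + etype
                break
        labels.append(label)
    return labels


# Hypothetical example: "Aspirin" is a chemical, "breast cancer" a disease.
tokens = [("Aspirin", 0), ("treats", 8), ("breast", 15), ("cancer", 22)]
spans = [(0, 7, "CHEMICAL"), (15, 28, "DISEASE")]
print(char_spans_to_bio(tokens, spans))
# ['B-CHEMICAL', 'O', 'B-DISEASE', 'I-DISEASE']
```

During fine-tuning, these BIO labels become the per-token targets for a classification head placed on top of BioBERT's output representations.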
Pre-training the BioBERT model
BioBERT is pre-trained using biomedical domain-specific texts. We use the biomedical datasets from the following two sources:
PubMed: This is a citation database. It includes more than 30 million citations for biomedical literature from life science journals, online books, and MEDLINE (the National Library of Medicine's index of biomedical journal literature).
PubMed Central (PMC): This is a free online repository that includes articles that have been published in biomedical and life sciences journals. ...
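To turn raw text from these sources into pre-training input, each document is typically split into one sentence per line, the format commonly used for BERT-style pre-training. The sketch below uses a naive regex-based sentence splitter for illustration; the function name and the splitting rule are assumptions, and a real pipeline would use a proper biomedical sentence segmenter.

```python
import re

def to_pretraining_lines(document):
    """Split a raw abstract or article into one sentence per line.

    Naive splitter (an assumption for illustration): break after
    '.', '!', or '?' when followed by whitespace.
    """
    sentences = re.split(r"(?<=[.!?])\s+", document.strip())
    return [s for s in sentences if s]


abstract = ("BioBERT is pre-trained on PubMed abstracts. "
            "It is also pre-trained on PMC full-text articles.")
for line in to_pretraining_lines(abstract):
    print(line)
```

Lines produced this way from the PubMed and PMC corpora would then be concatenated into the combined corpus used for BioBERT's pre-training.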