Transformers
Learn about Hugging Face's transformers library and how to generate and extract sentence and token embeddings using transformers.
Hugging Face's transformers
Hugging Face is an organization on a mission to democratize AI through natural language. Their open-source transformers library is very popular in the natural language processing (NLP) community, and it is useful and powerful for many NLP and natural language understanding (NLU) tasks. It includes thousands of pre-trained models in more than 100 languages. One of the many advantages of the transformers library is that it is compatible with both PyTorch and TensorFlow.
We can install transformers directly using pip, as shown here:
pip install transformers==4.30.0
As we can see, we install transformers version 4.30.0. Now that we have installed transformers, let's get started.
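After installing, it can be helpful to confirm which version is actually available in the environment. A minimal check (this snippet is an addition for illustration, not part of the original walkthrough):

```python
import transformers

# Print the installed transformers version; it should match the
# version pinned above (4.30.0) for the examples to behave identically
version = transformers.__version__
print(version)
```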
Generating BERT embeddings
Consider the sentence 'I love Paris'. Let's see how to obtain the contextualized embeddings of all the words in the sentence using the pre-trained BERT model with Hugging Face's transformers library.
Import the modules
Let's import the necessary modules:
from transformers import BertModel, BertTokenizer
import torch
Download and load the model
We download and load the pre-trained bert-base-uncased model:
model = BertModel.from_pretrained('bert-base-uncased')
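As a quick sanity check after loading, we can inspect the model's configuration. This sketch (an addition for illustration) confirms the well-known dimensions of bert-base-uncased:

```python
from transformers import BertModel

# Load the pre-trained bert-base-uncased model
model = BertModel.from_pretrained('bert-base-uncased')

# bert-base-uncased consists of 12 encoder layers, each producing
# representations of hidden size 768
print(model.config.num_hidden_layers)  # 12
print(model.config.hidden_size)        # 768
```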
Download and load the tokenizer
We download and load the tokenizer that was used to pre-train the bert-base-uncased model:
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
Now, let's see how to preprocess the input before feeding it to BERT.