Sentence-BERT
Learn about Sentence-BERT, its fine-tuning architectures, and the different ways to compute sentence representations.
Sentence-BERT was introduced by the Ubiquitous Knowledge Processing Lab (UKP-TUDA). As the name suggests, Sentence-BERT is used for obtaining fixed-length sentence representations. It extends the pre-trained BERT model (or its variants) to obtain these sentence representations.
Why do we need Sentence-BERT when we can use vanilla BERT or its variants for obtaining sentence representations?
Sentence-BERT is popularly used in tasks such as sentence pair classification and computing the similarity between two sentences, as the short sketch below illustrates.
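To make that use case concrete, here is a minimal sketch of computing the similarity between two sentences with Sentence-BERT. It assumes the sentence-transformers library and the 'bert-base-nli-mean-tokens' checkpoint, neither of which is mandated by the text; any Sentence-BERT model would work the same way.

```python
from sentence_transformers import SentenceTransformer, util

# 'bert-base-nli-mean-tokens' is one of the original Sentence-BERT checkpoints;
# any other Sentence-BERT model name could be substituted here.
model = SentenceTransformer('bert-base-nli-mean-tokens')

sentence1 = 'Paris is a beautiful city'
sentence2 = 'I love visiting Paris'

# encode() returns a fixed-length vector for each sentence
embedding1 = model.encode(sentence1, convert_to_tensor=True)
embedding2 = model.encode(sentence2, convert_to_tensor=True)

# Cosine similarity between the two sentence embeddings
similarity = util.cos_sim(embedding1, embedding2)
print(similarity.item())
```

Before understanding how Sentence-BERT works in detail, let's first take a look at computing a sentence representation using the pre-trained BERT model.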
Computing sentence representation
Consider the sentence 'Paris is a beautiful city'. Suppose we need to compute the representation of this sentence. First, we tokenize the sentence and add a [CLS] token at the beginning and a [SEP] token at the end, so our tokens become the following:

tokens = [ [CLS], Paris, is, a, beautiful, city, [SEP] ]
Now, we feed these tokens to the pre-trained BERT model, and it returns a representation for each of the tokens.
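As a concrete illustration of these steps, here is a minimal sketch. It assumes the Hugging Face transformers library and the bert-base-uncased checkpoint, neither of which is specified by the text; any pre-trained BERT variant would behave the same way.

```python
import torch
from transformers import BertTokenizer, BertModel

tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
model = BertModel.from_pretrained('bert-base-uncased')

sentence = 'Paris is a beautiful city'

# The tokenizer adds the [CLS] and [SEP] tokens for us
inputs = tokenizer(sentence, return_tensors='pt')
print(tokenizer.convert_ids_to_tokens(inputs['input_ids'][0].tolist()))
# ['[CLS]', 'paris', 'is', 'a', 'beautiful', 'city', '[SEP]']

# Feed the tokens to the pre-trained BERT model
with torch.no_grad():
    outputs = model(**inputs)

# One contextual representation per token: shape [1, number_of_tokens, 768] for bert-base
token_representations = outputs.last_hidden_state
print(token_representations.shape)
```

Note that this gives one vector per token rather than a single fixed-length vector for the whole sentence.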