Sentence-BERT with a Triplet Network
Learn how Sentence-BERT uses the triplet network architecture for fine-tuning the pre-trained BERT model.
We learned that Sentence-BERT uses the Siamese network architecture for fine-tuning the pre-trained BERT model with sentence-pair inputs. Now, let's see how Sentence-BERT uses the triplet network architecture.
Computing similarity between three sentences
Suppose we have three sentences: an anchor sentence, a positive sentence (one that entails the anchor), and a negative sentence (one that contradicts the anchor).
Our task is to compute representations such that the similarity between the anchor and positive sentences is high while the similarity between the anchor and negative sentences is low. Let's see how to fine-tune the pre-trained BERT model for this task. Since we now have three sentences, Sentence-BERT uses the triplet network architecture.
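Before walking through the architecture step by step, here is a minimal end-to-end sketch of this triplet fine-tuning, assuming the sentence-transformers library and its v2-style fit API; the model name and the example triplet are illustrative placeholders, not the book's own examples:

```python
from torch.utils.data import DataLoader
from sentence_transformers import SentenceTransformer, InputExample, losses

# Plain pre-trained BERT; a mean-pooling layer is added on top automatically
model = SentenceTransformer("bert-base-uncased")

# Each InputExample holds one (anchor, positive, negative) triplet
train_examples = [
    InputExample(texts=["He is playing football",
                        "A man plays a game of football",
                        "The man is reading a book"]),
]

train_dataloader = DataLoader(train_examples, shuffle=True, batch_size=1)
train_loss = losses.TripletLoss(model=model)  # triplet objective over the pooled vectors

# Fine-tune BERT so anchors move closer to positives and away from negatives
model.fit(train_objectives=[(train_dataloader, train_loss)], epochs=1, warmup_steps=10)
```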
First, we tokenize the anchor, positive, and negative sentences and feed them to three pre-trained BERT models (which share the same weights), and then obtain the representation of each sentence through pooling, as shown in the following figure:
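The following is a minimal sketch of this tokenize-feed-pool step, assuming the Hugging Face transformers library and mean pooling; the three example sentences are illustrative:

```python
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
bert = AutoModel.from_pretrained("bert-base-uncased")  # one set of weights, reused for all three sentences

def encode(sentence):
    # Tokenize and feed the sentence to the pre-trained BERT model
    inputs = tokenizer(sentence, return_tensors="pt")
    token_embeddings = bert(**inputs).last_hidden_state            # [1, seq_len, 768]
    # Mean pooling over the token embeddings (masking out padding)
    mask = inputs["attention_mask"].unsqueeze(-1)                  # [1, seq_len, 1]
    return (token_embeddings * mask).sum(dim=1) / mask.sum(dim=1)  # [1, 768]

s_a = encode("He is playing football")          # anchor
s_p = encode("A man plays a game of football")  # positive
s_n = encode("The man is reading a book")       # negative
```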
As we can observe in the preceding figure, pooling gives us the representations s_a, s_p, and s_n of the anchor, positive, and negative sentences, respectively. We then fine-tune the network by minimizing the following triplet objective function:

loss = max(||s_a − s_p|| − ||s_a − s_n|| + ε, 0)

Here, ||·|| is a distance metric (Sentence-BERT uses the Euclidean distance), and ε is a margin that ensures the positive representation s_p is at least ε closer to the anchor s_a than the negative representation s_n; the Sentence-BERT paper sets ε = 1.
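Continuing the sketch above, the triplet objective can be computed from the pooled representations s_a, s_p, and s_n as follows; the margin value mirrors the paper's default and is otherwise illustrative:

```python
import torch
import torch.nn.functional as F

def triplet_loss(s_a, s_p, s_n, margin=1.0):
    d_pos = F.pairwise_distance(s_a, s_p)  # anchor-positive distance: should shrink
    d_neg = F.pairwise_distance(s_a, s_n)  # anchor-negative distance: should grow
    # Hinge: the positive must be at least `margin` closer to the anchor than the negative
    return torch.relu(d_pos - d_neg + margin).mean()

loss = triplet_loss(s_a, s_p, s_n)
loss.backward()  # gradients flow back into the shared BERT weights
```

PyTorch's built-in torch.nn.TripletMarginLoss implements the same objective.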