...

Sentence-BERT with a Triplet Network

Learn how the pre-trained Sentence-BERT models use a triplet network architecture to fine-tune pre-trained BERT.

Sentence-BERT uses the Siamese network architecture for fine-tuning the pre-trained BERT with sentence-pair inputs. Now, let's see how Sentence-BERT uses the triplet network architecture.

Computing similarity between three sentences

Suppose we have three sentences: an anchor sentence, a positive sentence (one that entails the anchor), and a negative sentence (one that contradicts the anchor).

Our task is to compute representations such that the similarity between the anchor and positive sentences is high, while the similarity between the anchor and negative sentences is low. Let's see how to fine-tune the pre-trained BERT model for this task. Since we have three sentences in this case, Sentence-BERT uses the triplet network architecture.

First, we tokenize the anchor, positive, and negative sentences and feed them to three pre-trained BERT models (which share the same weights), and then obtain the representation of each sentence through pooling, as shown in the following figure:

Figure: Computing the sentence representations for the anchor, positive, and negative sentences

As we can observe in the preceding figure, $s_a$, $s_p$, and $s_n$ denote the representations of the anchor, positive, and negative sentences, respectively.
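To make this step concrete, here is a minimal sketch of how the three representations can be computed with the Hugging Face transformers library. It assumes the bert-base-uncased checkpoint and mean pooling over the token embeddings; the three example sentences are hypothetical, chosen only for illustration. Note that a single BERT model encodes all three inputs, which is exactly the weight sharing of a triplet network:

```python
import torch
from transformers import BertModel, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased")

def mean_pooling(token_embeddings, attention_mask):
    # Average the token embeddings, ignoring the padding tokens.
    mask = attention_mask.unsqueeze(-1).float()
    return (token_embeddings * mask).sum(dim=1) / mask.sum(dim=1)

def encode(sentence):
    # Tokenize the sentence, feed it through BERT, then pool.
    inputs = tokenizer(sentence, return_tensors="pt")
    outputs = model(**inputs)  # during fine-tuning, gradients flow here
    return mean_pooling(outputs.last_hidden_state, inputs["attention_mask"])

# Hypothetical example triplet (for illustration only):
s_a = encode("The weather is lovely today")       # anchor
s_p = encode("It is a beautiful, sunny day")      # positive (entailment)
s_n = encode("It is cold and raining heavily")    # negative (contradiction)
```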
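The fine-tuning objective then pulls $s_a$ toward $s_p$ and pushes it away from $s_n$. Here is a minimal sketch of the triplet loss, assuming the Euclidean distance and a margin (epsilon) of 1, the defaults reported in the Sentence-BERT paper:

```python
import torch
import torch.nn.functional as F

def triplet_loss(s_a, s_p, s_n, margin=1.0):
    # max(||s_a - s_p|| - ||s_a - s_n|| + margin, 0): the anchor-positive
    # distance must be smaller than the anchor-negative distance by at
    # least the margin.
    d_pos = F.pairwise_distance(s_a, s_p)
    d_neg = F.pairwise_distance(s_a, s_n)
    return torch.relu(d_pos - d_neg + margin).mean()

loss = triplet_loss(s_a, s_p, s_n)
```

Minimizing this loss realizes the requirement stated above: the anchor ends up closer to the positive sentence than to the negative one by at least the margin. PyTorch also provides torch.nn.TripletMarginLoss, which implements the same computation.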