BERTSUM for Extractive Summary


Learn different ways to use BERTSUM to find the probability of a sentence being important enough to include in an extractive summary.

In extractive summarization, we create a summary by selecting only the important sentences from the given text. To perform extractive summarization, we obtain the representation of every sentence in the given text using a pre-trained BERT model.
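To make the input format concrete, here is a minimal sketch (using a hypothetical whitespace tokenizer rather than BERT's real WordPiece tokenizer) of how BERTSUM arranges a document so that each sentence gets its own representation: a [CLS] token is inserted before every sentence and a [SEP] token after it, and each [CLS] position later yields that sentence's representation:

```python
def build_bertsum_input(sentences):
    """Insert [CLS] before and [SEP] after each sentence.

    Each [CLS] position later yields that sentence's
    representation R_i from the pre-trained BERT model.
    """
    tokens, cls_positions = [], []
    for sent in sentences:
        cls_positions.append(len(tokens))  # index of this sentence's [CLS]
        tokens.append("[CLS]")
        tokens.extend(sent.split())        # stand-in for WordPiece tokenization
        tokens.append("[SEP]")
    return tokens, cls_positions

tokens, cls_positions = build_bertsum_input(
    ["Paris is a city", "It is in France"]
)
print(cls_positions)  # [0, 6]
```

Feeding these tokens to the pre-trained BERT model and reading off the hidden states at the [CLS] positions gives one vector per sentence.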

Now let's see how to use BERTSUM in the following three ways:

  • BERTSUM with a simple classifier

  • BERTSUM with an inter-sentence transformer

  • BERTSUM with LSTM

BERTSUM with a classifier

We feed the representation of a sentence to a simple binary classifier, and the classifier tells us whether the sentence is important or not. That is, the classifier returns the probability of the sentence being included in the summary. The classification layer is often called the summarization layer. This is shown in the following figure:

BERTSUM with a classifier

From the preceding figure, we can observe that we feed all the sentences from a given text to the pre-trained BERT model. The pre-trained BERT model will return the representation of each sentence, $R_1, R_2, \dots, R_i, \dots, R_n$. Then we feed these representations to a classifier (summarization layer). The classifier then returns the probability of each sentence being included in the summary.

For each sentence $i$ in the document, we get the sentence representation $R_i$ and feed it to the summarization layer, which returns the probability $\hat{Y}_i$ of including the sentence in the summary:

$$\hat{Y}_i = \sigma(W R_i + b)$$

From the preceding equation, we can observe that we are using a simple sigmoid classifier to obtain the probability $\hat{Y}_i$ ...
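The summarization layer can be sketched with NumPy as a single linear unit followed by a sigmoid. In this sketch, random values stand in for the trained weights and for the sentence representations $R_i$, which would really come from the pre-trained BERT model:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Hypothetical setup: n sentences, each with a 768-dim representation R_i.
# Random values stand in for the real BERT embeddings.
n_sentences, hidden = 4, 768
R = rng.standard_normal((n_sentences, hidden))

# Summarization layer: Y_hat_i = sigmoid(W R_i + b) for each sentence.
W = rng.standard_normal(hidden) * 0.01  # stand-in for trained weights
b = 0.0
Y_hat = sigmoid(R @ W + b)              # shape: (n_sentences,)

# One way to build the summary: keep sentences whose probability
# exceeds a threshold (0.5 here, chosen for illustration).
selected = np.where(Y_hat > 0.5)[0]
print(Y_hat.shape, selected)
```

In practice the weights are learned by fine-tuning BERT together with the summarization layer on labeled summaries, and the highest-probability sentences form the extractive summary.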