To evaluate a text summarization task, we use a popular set of metrics called Recall-Oriented Understudy for Gisting Evaluation (ROUGE). First, we will look at how the ROUGE metric works, and then we will check the ROUGE score for text summarization with the BERTSUM model.

The ROUGE metric was first introduced in the ROUGE paper (Lin, Chin-Yew. 2004. “ROUGE: A Package for Automatic Evaluation of Summaries,” July, 74–81). The five ROUGE evaluation metrics are the following:

  • ROUGE-N

  • ROUGE-L

  • ROUGE-W

  • ROUGE-S

  • ROUGE-SU

We will focus only on ROUGE-N and ROUGE-L. First, let's understand how ROUGE-N is computed, and then we will look at ROUGE-L.
