...

ROUGE Evaluation Metrics

Learn about the ROUGE evaluation metrics used to evaluate the text summarization task.

To evaluate a text summarization task, we use a popular set of metrics called Recall-Oriented Understudy for Gisting Evaluation (ROUGE). First, we will understand how the ROUGE metrics work, and then we will check the ROUGE score for text summarization with the BERTSUM model.

The ROUGE metric was first introduced in the paper "ROUGE: A Package for Automatic Evaluation of Summaries" (Lin, Chin-Yew, 2004). The five ROUGE evaluation metrics are the following:

  • ROUGE-N

  • ROUGE-L

  • ROUGE-W

  • ROUGE-S

  • ROUGE-SU

We will focus only on ROUGE-N and ROUGE-L. First, let's understand how ROUGE-N is computed, and then we will look at ROUGE-L.

[Figure: ROUGE evaluation metrics]

ROUGE-N metric

ROUGE-N is an n-gram recall between a candidate summary (predicted summary) and a reference summary (actual summary).

The recall is defined as the ratio of the total number of overlapping n-grams between the candidate and reference summaries to the total number of n-grams in the reference summary:

$$\text{Recall} = \frac{\text{Total number of overlapping } n\text{-grams}}{\text{Total number of } n\text{-grams in the reference summary}}$$
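To make the computation concrete, here is a minimal Python sketch of ROUGE-N recall. This is an illustrative implementation assuming simple whitespace tokenization, not the official ROUGE package; the function names (`ngrams`, `rouge_n_recall`) and the example summaries are hypothetical, chosen only for demonstration:

```python
from collections import Counter

def ngrams(tokens, n):
    """Return all n-grams in a list of tokens."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def rouge_n_recall(candidate, reference, n):
    """ROUGE-N recall: overlapping n-grams between the candidate and
    reference summaries, divided by the total n-grams in the reference."""
    cand_counts = Counter(ngrams(candidate.lower().split(), n))
    ref_counts = Counter(ngrams(reference.lower().split(), n))
    # Clip each overlap so an n-gram is counted at most as many
    # times as it appears in the candidate summary.
    overlap = sum(min(count, cand_counts[gram])
                  for gram, count in ref_counts.items())
    total = sum(ref_counts.values())
    return overlap / total if total else 0.0

# Hypothetical candidate and reference summaries, for illustration only.
candidate = "machine learning is interesting"
reference = "machine learning is fun and interesting"

print(rouge_n_recall(candidate, reference, n=1))  # ROUGE-1 recall: 4/6 ≈ 0.67
print(rouge_n_recall(candidate, reference, n=2))  # ROUGE-2 recall: 2/5 = 0.40
```

Running the sketch with n=1 and n=2 yields the ROUGE-1 and ROUGE-2 recall scores discussed next.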

Let's understand how ROUGE-N works with ROUGE-1 and ROUGE-2.

ROUGE-1

ROUGE-1 is a unigram recall between a candidate summary (predicted summary) and a reference summary (actual summary). Consider the following candidate and reference summary:

Candidate ...