ROUGE Evaluation Metrics
Learn how the ROUGE evaluation metrics are used to evaluate the text summarization task.
To evaluate a text summarization task, we use a popular set of metrics called Recall-Oriented Understudy for Gisting Evaluation (ROUGE). First, we will understand how the ROUGE metric works, and then we will check the ROUGE score for text summarization with the BERTSUM model.
The ROUGE metric was first introduced in the paper ROUGE: A Package for Automatic Evaluation of Summaries by Chin-Yew Lin. The five ROUGE evaluation metrics include the following:
ROUGE-N
ROUGE-L
ROUGE-W
ROUGE-S
ROUGE-SU
We will focus only on ROUGE-N and ROUGE-L. First, let's understand how ROUGE-N is computed, and then we will look at ROUGE-L.
ROUGE-N metric
ROUGE-N is an n-gram recall between a candidate summary (predicted summary) and a reference summary (actual summary).
The recall is defined as the ratio of the total number of overlapping n-grams between the candidate and reference summaries to the total number of n-grams in the reference summary:

Recall = (number of overlapping n-grams) / (total number of n-grams in the reference summary)
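This recall computation can be sketched in a few lines of Python. Note that this is a minimal illustration of the definition above, not the official ROUGE implementation (libraries such as rouge-score handle stemming, sentence splitting, and other details); the function names here are our own:

```python
from collections import Counter

def ngrams(tokens, n):
    """Return a multiset (Counter) of the n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def rouge_n_recall(candidate, reference, n=1):
    """Overlapping n-grams divided by the total n-grams in the reference."""
    cand_ngrams = ngrams(candidate.lower().split(), n)
    ref_ngrams = ngrams(reference.lower().split(), n)
    # Count each reference n-gram as overlapping at most as many
    # times as it appears in the candidate (clipped counting)
    overlap = sum(min(count, cand_ngrams[gram])
                  for gram, count in ref_ngrams.items())
    total = sum(ref_ngrams.values())
    return overlap / total if total else 0.0

print(rouge_n_recall("the cat sat", "the cat sat on the mat", n=1))  # 0.5
print(rouge_n_recall("the cat sat", "the cat sat on the mat", n=2))  # 0.4
```

Setting n=1 gives ROUGE-1 (unigram recall) and n=2 gives ROUGE-2 (bigram recall), which we walk through next.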
Let's understand how ROUGE-N works with ROUGE-1 and ROUGE-2.
ROUGE-1
ROUGE-1 is a unigram recall between a candidate summary (predicted summary) and a reference summary (actual summary). Consider the following candidate and reference summary:
Candidate ...