...

ROUGE Evaluation Metrics

Learn about the ROUGE evaluation metrics used to evaluate the text summarization task.

To evaluate a text summarization task, we use a popular set of metrics called Recall-Oriented Understudy for Gisting Evaluation (ROUGE). First, we will understand how the ROUGE metrics work, and then we will check the ROUGE score for text summarization with the BERTSUM model.

The ROUGE metric was first introduced in the paper "ROUGE: A Package for Automatic Evaluation of Summaries" (Lin, Chin-Yew, 2004). The five ROUGE evaluation metrics are the following:

  • ROUGE-N

  • ROUGE-L

  • ROUGE-W

  • ROUGE-S

  • ROUGE-SU

We will focus only on ROUGE-N and ROUGE-L. First, let's understand how ROUGE-N is computed, and then we will look at ROUGE-L.

[Figure: ROUGE evaluation metrics]

ROUGE-N metric

ROUGE-N is an n-gram recall between a candidate summary (predicted summary) and a reference summary (actual summary).

The recall is defined as the ratio of the total number of overlapping n-grams between the candidate and reference summaries to the total number of n-grams in the reference summary:

$$\text{Recall} = \frac{\text{Total number of overlapping } n\text{-grams}}{\text{Total number of } n\text{-grams in the reference summary}}$$
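To make the computation concrete, here is a minimal Python sketch of ROUGE-N recall. This is an illustrative implementation assuming simple whitespace tokenization, not the official ROUGE package; the function names (`ngrams`, `rouge_n_recall`) and the example summaries are hypothetical, chosen only for demonstration:

```python
from collections import Counter

def ngrams(tokens, n):
    """Return all n-grams in a list of tokens."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def rouge_n_recall(candidate, reference, n):
    """ROUGE-N recall: overlapping n-grams between the candidate and
    reference summaries, divided by the total n-grams in the reference."""
    cand_counts = Counter(ngrams(candidate.lower().split(), n))
    ref_counts = Counter(ngrams(reference.lower().split(), n))
    # Clip each overlap so an n-gram is counted at most as many
    # times as it appears in the candidate summary.
    overlap = sum(min(count, cand_counts[gram])
                  for gram, count in ref_counts.items())
    total = sum(ref_counts.values())
    return overlap / total if total else 0.0

# Hypothetical candidate and reference summaries, for illustration only.
candidate = "machine learning is interesting"
reference = "machine learning is fun and interesting"

print(rouge_n_recall(candidate, reference, n=1))  # ROUGE-1 recall: 4/6 ≈ 0.67
print(rouge_n_recall(candidate, reference, n=2))  # ROUGE-2 recall: 2/5 = 0.40
```

Running the sketch with n=1 and n=2 yields the ROUGE-1 and ROUGE-2 recall scores discussed next.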

Let's understand how ROUGE-N works with ROUGE-1 and ROUGE-2.

ROUGE-1

ROUGE-1 is a unigram recall between a candidate summary (predicted summary) and a reference summary (actual summary). Consider the following candidate and reference summary:

Candidate ...