Evaluating Machine Translation with BLEU

Learn to evaluate the performance of machine translation models using the BLEU method.

BLEU provides a method to evaluate candidate translations produced by machine translation models.

Papineni et al. (2002) introduced an efficient way to evaluate machine translation output in the paper "BLEU: a Method for Automatic Evaluation of Machine Translation." The paper can be accessed at: https://aclanthology.org/P02-1040.pdf. In this paper, they named their method the Bilingual Evaluation Understudy (BLEU).

Bleu is the French word for 'blue.'

A human evaluation baseline was difficult to define. However, the authors realized that reliable results could be obtained by comparing a machine translation with one or more human reference translations, word for word.

In this lesson, we will use the Natural Language Toolkit (NLTK) to implement BLEU.

We will begin with geometric evaluations.

Geometric evaluations

The BLEU method compares the n-grams (contiguous word sequences) of a candidate sentence to those of a reference sentence or several reference sentences.
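For reference, the standard formulation from Papineni et al. (2002) combines the modified n-gram precisions with a geometric mean and a brevity penalty (the notation below follows the paper):

BLEU = BP · exp( Σₙ₌₁ᴺ wₙ · log pₙ ),  where BP = 1 if c > r, otherwise e^(1 − r/c)

Here pₙ is the modified n-gram precision, wₙ are the weights (typically 1/N with N = 4), c is the candidate length, and r is the reference length.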

The program imports the nltk module:
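The lesson's runnable widget is not reproduced here; the following is a minimal sketch of how NLTK's sentence_bleu can compare a candidate translation against reference translations. The example sentences and variable names are illustrative, not taken from the original lesson.

```python
# Minimal sketch: sentence-level BLEU with NLTK (example data is illustrative).
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

# One or more human reference translations, tokenized into words.
references = [
    ["the", "cat", "is", "on", "the", "mat"],
    ["there", "is", "a", "cat", "on", "the", "mat"],
]

# Candidate translation produced by a machine translation model.
candidate = ["the", "cat", "sits", "on", "the", "mat"]

# sentence_bleu computes the geometric mean of 1- to 4-gram precisions
# with a brevity penalty; smoothing avoids zero scores on short sentences.
smoothing = SmoothingFunction().method1
score = sentence_bleu(references, candidate, smoothing_function=smoothing)
print(f"BLEU score: {score:.4f}")
```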
