The BLEU Score: Evaluating Machine Translation Systems
Learn how the BLEU score is used to evaluate machine translation systems.
BLEU stands for “bilingual evaluation understudy” and is a way of automatically evaluating machine translation systems. This metric was first introduced in the paper
Let’s work through an example to see how the BLEU score is calculated. Say we have two candidate sentences (that is, sentences predicted by our MT system) and a reference sentence (that is, the corresponding actual translation) for some given source sentence:
Reference 1: The cat sat on the mat.
Candidate 1: The cat is on the mat.
To see how good the translation is, we can use the precision measure. Here, precision is the fraction of words in the candidate that are also present in the reference. In general, if we consider a classification problem with two classes (denoted by negative and positive), precision is given by the following formula:
Precision = TP / (TP + FP)
Here, TP is the number of true positives and FP is the number of false positives.
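The word-level precision described above can be sketched in a few lines of Python. This is a minimal sketch, not the lesson's own implementation: the helper names `tokenize` and `unigram_precision` are hypothetical, and the tokenization (lowercasing, dropping the trailing period, splitting on whitespace) is an assumption made for this example. It counts every candidate word that appears anywhere in the reference.

```python
def tokenize(sentence):
    # Assumption: lowercase, drop the trailing period, split on whitespace.
    return sentence.lower().rstrip(".").split()


def unigram_precision(candidate, reference):
    """Fraction of candidate words that also occur in the reference."""
    cand_tokens = tokenize(candidate)
    ref_vocab = set(tokenize(reference))
    if not cand_tokens:
        return 0.0  # An empty candidate has no words to match.
    matches = sum(1 for tok in cand_tokens if tok in ref_vocab)
    return matches / len(cand_tokens)


reference = "The cat sat on the mat."
candidate = "The cat is on the mat."
precision = unigram_precision(candidate, reference)
print(f"{precision:.3f}")  # prints 0.833 (5 of the 6 candidate words match)
```

In classification terms, the candidate words found in the reference play the role of true positives, and the candidate words missing from the reference ("is" in this case) play the role of false positives.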