The BLEU score – evaluating machine translation systems
BLEU stands for Bilingual Evaluation Understudy and is a way of automatically evaluating machine translation systems. The metric was first introduced in BLEU: A Method for Automatic Evaluation of Machine Translation, Papineni and others, Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics (ACL), Philadelphia, July 2002: 311-318. We will implement the BLEU score calculation algorithm, and it is available as an exercise in bleu_score_example.ipynb. Let's understand how this score is calculated.
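Before working through the calculation by hand, it can help to see the end result. The following is a minimal sketch (not the notebook's from-scratch implementation) that computes a sentence-level BLEU score with NLTK's sentence_bleu function; the tokenization and the choice of unigram/bigram weights are illustrative assumptions:

```python
from nltk.translate.bleu_score import sentence_bleu

# Tokenized reference and candidate translations from the example below
reference = ['the', 'cat', 'sat', 'on', 'the', 'mat']
candidate = ['the', 'cat', 'is', 'on', 'the', 'mat']

# sentence_bleu takes a list of reference translations for one candidate.
# The default weights go up to 4-grams; this short pair shares no 4-grams,
# so we restrict the score to unigrams and bigrams here.
score = sentence_bleu([reference], candidate, weights=(0.5, 0.5))
print(score)  # ~0.71, the geometric mean of 5/6 (unigrams) and 3/5 (bigrams)
```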
Let's consider an example to learn how the BLEU score is calculated. Say we have two candidate sentences (that is, sentences predicted by our MT system) and a reference sentence (that is, the corresponding actual translation) for a given source sentence:
Reference 1: The cat sat on the mat
Candidate 1: The cat is on the mat
To see how good the translation is, we can use one measure: precision. Precision...
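As a concrete preview, the sketch below computes standard unigram precision for the example above, that is, the fraction of candidate words that also appear in the reference (as defined in the BLEU paper). Here, five of the candidate's six words occur in the reference, with "is" being the only exception:

```python
# Standard unigram precision: fraction of candidate words found in the reference
reference = 'The cat sat on the mat'.lower().split()
candidate = 'The cat is on the mat'.lower().split()

matches = sum(1 for word in candidate if word in reference)
precision = matches / len(candidate)
print(precision)  # 5/6 ≈ 0.83 ("is" does not appear in the reference)
```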