The BLEU score – evaluating machine translation systems
BLEU stands for Bilingual Evaluation Understudy and is a metric for automatically evaluating machine translation systems. It was first introduced in the paper BLEU: A Method for Automatic Evaluation of Machine Translation, Papineni et al., Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics (ACL), Philadelphia, July 2002: 311-318. We will be using the implementation of the BLEU score found at https://github.com/tensorflow/nmt/blob/master/nmt/scripts/bleu.py. Let’s understand how BLEU is calculated in the context of machine translation.
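Before working through the calculation by hand, it can help to see the metric computed in practice. The following minimal sketch uses NLTK's `sentence_bleu` function rather than the tensorflow/nmt script referenced above; the candidate sentence here is a hypothetical example, not one from this chapter:

```python
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

# Reference translation(s), tokenized into words
references = [["the", "cat", "sat", "on", "the", "mat"]]

# A hypothetical candidate translation produced by an MT system
candidate = ["the", "cat", "sat", "on", "mat"]

# By default, BLEU uses uniform weights over 1- to 4-grams; smoothing
# prevents a zero score when some higher-order n-gram has no match
smoothing = SmoothingFunction().method1
score = sentence_bleu(references, candidate, smoothing_function=smoothing)
print(f"BLEU: {score:.4f}")
```

The score lies between 0 and 1, with higher values indicating a closer match between the candidate and the reference.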
Let’s consider an example to understand how the BLEU score is calculated. Say we have two candidate sentences (that is, sentences predicted by our MT system) and a reference sentence (that is, the corresponding actual translation) for some given source sentence:
- Reference 1: The cat sat on the mat
- Candidate...