Measuring translation performance
The most straightforward way to evaluate an MT system is to ask humans (preferably, professional translators) to assign a score to each output. However, this approach introduces problems of its own, including the subjectivity of the evaluators, the limited number of sentences that can be assessed, the associated cost, and so forth. As in every machine learning task, we can incorporate automatic metrics to assess the quality of the output. Accuracy, precision, recall, and F-score were encountered in Chapter 2, Detecting Spam Emails, so let’s see how they can be applied to evaluate an MT system.
Consider the English source phrase “and in the rain your letters flow in the rivers”, whose reference translation in French is “et sous la pluie tes lettres coulent dans les rivières”. Let’s assume that the system outputs the prediction “sous la pluie les lettres coulent dans la rivière”, as illustrated in Figure 6.26:
Figure 6.26: The reference translation and the system’s prediction for the source phrase
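Before introducing MT-specific metrics, it is instructive to see how these familiar scores could be computed for the example above. The following is a minimal sketch, not the chapter’s exact procedure: it treats each sentence as a bag of tokens and counts a candidate token as correct only up to its frequency in the reference (so a repeated word cannot be rewarded twice).

```python
# A minimal sketch (illustrative, not the book's exact method):
# token-level precision, recall, and F-score for a candidate
# translation against a single reference.
from collections import Counter

reference = "et sous la pluie tes lettres coulent dans les rivières"
candidate = "sous la pluie les lettres coulent dans la rivière"

ref_counts = Counter(reference.split())
cand_counts = Counter(candidate.split())

# A candidate token counts as correct only up to the number of
# times it appears in the reference (clipped matching).
overlap = sum(min(count, ref_counts[token])
              for token, count in cand_counts.items())

precision = overlap / sum(cand_counts.values())  # correct / predicted tokens
recall = overlap / sum(ref_counts.values())      # correct / reference tokens
f_score = 2 * precision * recall / (precision + recall)

print(f"Precision: {precision:.2f}")
print(f"Recall:    {recall:.2f}")
print(f"F-score:   {f_score:.2f}")
```

For this pair, the candidate has nine tokens, seven of which occur in the reference (the second “la” and “rivière” find no match), giving a precision of roughly 0.78, a recall of 0.70, and an F-score of about 0.74.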