Introducing statistical machine translation
EBMT paved the way for data-driven approaches, where the primary source of knowledge is the observed data. As a result, less emphasis is placed on representation logic, such as hand-crafted rules. Instead, analyzing the data directly, especially when there’s a large amount of it, can reveal information we couldn’t easily identify otherwise. RBMT techniques follow a top-down approach, in which domain experts are required to create models that can replicate the data. Conversely, data-driven approaches are bottom-up, and the data drives the model. This section focuses on statistical machine translation (SMT), which relies on models whose parameters are learned from bilingual text corpora. Strictly speaking, SMT systems do not follow the Vauquois triangle, as neither a source nor a target representation is incorporated. Intuitively, they work on the assumption that every sentence in one language can be translated...