Putting Text-Level Linguistics into Statistical Machine Translation

Andrei Popescu-Belis
Idiap Research Institute; Swiss Federal Institute of Technology in Lausanne

Statistical machine translation systems are quite successful at translating individual sentences, particularly when sufficient training data is available for a given language pair. However, these systems do not yet take advantage of the relationships between the sentences of a text, and hence do not yet ensure a coherent translation of entire texts. In this talk, I will show how to make text-level linguistic knowledge available to a phrase-based statistical machine translation system. This approach, pioneered by a Swiss-based consortium of linguists collaborating with language engineers, will be illustrated with three types of phenomena: discourse connectives, verb tenses, and noun phrases. I will explain how theoretical and data-driven linguistic modeling has guided the design of automatic labeling modules, which enabled machine translation systems to generate more coherent translations of entire texts.