Reinforcing language model for speech translation with auxiliary data
Abstract
Language model domain adaptation usually relies on large quantities of auxiliary data drawn from different genres and domains. Data selection has mostly relied on scoring functions and is typically independent of the intended application, such as machine translation. In this paper, we present a novel domain adaptation approach that is directly motivated by the needs of the translation engine. We first identify interesting phrases by examining phrase translation tables, and then use those phrases as anchors to select useful and relevant sentences from general-domain data, with the goal of improving domain coverage or providing additional contextual information. Experimental results on Farsi-to-English translation in the military force protection domain and Chinese-to-English translation in the travel domain show statistically significant gains for the reinforced language models over the baseline. © 2009 IEEE.
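The anchor-based selection step described above can be illustrated with a minimal sketch. The abstract does not say how "interesting" phrases are chosen. Purely as an assumption for illustration, this sketch treats a target-side phrase-table entry as interesting when it is missing from the in-domain language model data (a coverage gap). It then keeps any general-domain sentence that contains at least one such anchor. The function names `find_anchor_phrases` and `select_sentences` and the `max_count` threshold are hypothetical and do not come from the paper.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return all contiguous n-grams of a token list as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def find_anchor_phrases(phrase_table_targets, in_domain_sentences, max_count=0):
    """Hypothetical criterion: keep target-side phrases that occur at most
    max_count times in the in-domain LM training data (i.e. poorly covered)."""
    counts = Counter()
    lengths = {len(p) for p in phrase_table_targets}
    for sent in in_domain_sentences:
        toks = sent.split()
        for n in lengths:
            counts.update(ngrams(toks, n))
    return {p for p in phrase_table_targets if counts[p] <= max_count}


def select_sentences(general_sentences, anchors):
    """Keep general-domain sentences that contain at least one anchor phrase."""
    lengths = {len(a) for a in anchors}
    selected = []
    for sent in general_sentences:
        toks = sent.split()
        if any(g in anchors for n in lengths for g in ngrams(toks, n)):
            selected.append(sent)
    return selected


# Toy usage with made-up data (not from the paper's experiments).
phrase_targets = {("checkpoint",), ("body", "armor"), ("the", "hotel")}
in_domain = ["the hotel is near the station", "where is the checkpoint"]
general = ["soldiers wear body armor", "the weather is nice", "a new hotel opened"]

anchors = find_anchor_phrases(phrase_targets, in_domain)
print(select_sentences(general, anchors))  # ['soldiers wear body armor']
```

Only "body armor" is uncovered by the toy in-domain data, so only the general-domain sentence containing it is selected. A real system would apply the paper's own phrase-selection criteria and operate on a full phrase table and corpus.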