Context-dependent word modeling for statistical machine translation using part-of-speech tags
Abstract
Word-based translation models in particular, and phrase-based translation models in general, assume that a word in any context is equivalent to the same word in any other context. This is not always true. The words in a sentence are not generated independently: the usage of each word is strongly affected by its immediate neighboring words. State-of-the-art machine translation (MT) methods use words and phrases as their basic modeling units. This paper introduces Context Dependent Words (CDWs) as new basic translation units, with the context classes defined using Part-of-Speech (POS) tags. Experimental results with CDW-based language models show encouraging improvements in translation quality for dialectal Arabic to English translation. Analysis of the results reveals that the improvements are mainly in fluency.
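To make the idea of a POS-defined context class concrete, here is a minimal sketch. The abstract does not specify the exact definition, so this sketch assumes that a word's context is the POS tags of its immediate left and right neighbors. The token format `leftPOS/word/rightPOS` and the helpers `to_cdw` and `bigram_model` are hypothetical illustrations, not the authors' method. The language model is an unsmoothed maximum-likelihood bigram model over CDW tokens. A real system would use a smoothed n-gram model.

```python
from collections import Counter
from typing import Callable, List, Tuple

BOS, EOS = "<s>", "</s>"  # sentence-boundary markers (also used as boundary "tags")


def to_cdw(tagged: List[Tuple[str, str]]) -> List[str]:
    """Map (word, POS) pairs to hypothetical CDW tokens 'leftPOS/word/rightPOS'.

    Assumption: the context class of a word is the POS tags of its
    immediate neighbors; sentence edges use the BOS/EOS markers.
    """
    tags = [BOS] + [tag for _, tag in tagged] + [EOS]
    # tags[i] is the left neighbor's tag and tags[i + 2] the right neighbor's tag
    return [f"{tags[i]}/{word}/{tags[i + 2]}" for i, (word, _) in enumerate(tagged)]


def bigram_model(corpus: List[List[Tuple[str, str]]]) -> Callable[[str, str], float]:
    """Estimate unsmoothed MLE bigram probabilities over CDW token sequences."""
    history_counts: Counter = Counter()
    bigram_counts: Counter = Counter()
    for sentence in corpus:
        tokens = [BOS] + to_cdw(sentence) + [EOS]
        history_counts.update(tokens[:-1])  # every token that has a successor
        bigram_counts.update(zip(tokens[:-1], tokens[1:]))

    def prob(prev: str, cur: str) -> float:
        # P(cur | prev) = count(prev, cur) / count(prev); 0.0 for unseen histories
        if history_counts[prev] == 0:
            return 0.0
        return bigram_counts[(prev, cur)] / history_counts[prev]

    return prob


if __name__ == "__main__":
    sentence = [("the", "DT"), ("cat", "NN"), ("sat", "VBD")]
    print(to_cdw(sentence))  # ['<s>/the/NN', 'DT/cat/VBD', 'NN/sat/</s>']
```

In this representation, the same surface word becomes a different modeling unit when its neighbors' POS tags differ. The tags are coarser than the neighboring words themselves, which keeps the CDW vocabulary far smaller than conditioning on the full word context would.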