Use of direct modeling in natural language generation for Chinese and English translation
Abstract
This paper proposes a new direct-modeling-based approach to improve the maximum-entropy-based natural language generation (NLG) component of the IBM MASTOR system, an interlingua-based speech translation system. Because of the intrinsic disparity between Chinese and English sentences, the previous method used only linguistic constituents from target-language sentences to train the NLG model. The new algorithm uses a direct-modeling scheme, combined with a concept padding scheme, to admit linguistic constituent information from both the source and target languages into the training process. When concept sequences from the top level of the semantic parse trees are considered, the concept error rate (CER) falls to 14.3%, compared with 23.9% for the baseline NLG. Similarly, when concept sequences from all levels of the semantic parse trees are tested, the direct-modeling scheme yields a CER of 10.8%, compared with 17.8% for the baseline. Overall translation quality also improves noticeably: the direct-modeling scheme raises the BLEU score from 0.252 to 0.294. ©2004 IEEE.
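The reported CER and BLEU figures are absolute values, so it can help to put them on a common footing as relative changes against the baseline. The following is a minimal sketch of that arithmetic only; the helper name `relative_change` and the dictionary labels are illustrative assumptions and do not come from the paper.

```python
def relative_change(baseline: float, new: float) -> float:
    """Return the signed relative change from baseline to new, as a fraction."""
    return (new - baseline) / baseline


# (baseline, direct modeling) pairs as reported in the abstract.
# The labels are hypothetical names chosen for this sketch.
reported = {
    "CER, top-level concepts (%)": (23.9, 14.3),
    "CER, all-level concepts (%)": (17.8, 10.8),
    "BLEU": (0.252, 0.294),
}

for label, (baseline, new) in reported.items():
    change = relative_change(baseline, new) * 100
    print(f"{label}: {baseline} -> {new} ({change:+.1f}% relative)")
```

Under this calculation, both CER settings correspond to a relative error reduction of roughly 40%, and the BLEU gain is roughly 17% relative to the baseline.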