A STATISTICAL APPROACH TO LANGUAGE MODELLING FOR THE ATIS TASK
Abstract
The goal of this research is to develop an effective natural language component for IBM's spoken language understanding system for the ATIS domain. We use training data to assign a probability distribution to the reference interpretations, expressed as NLParse translations, choosing the distribution that minimizes the perplexity observed on the test data. We restrict our scope to those ATIS2 sentences that can be understood unambiguously out of context (the so-called "Class A" queries). The decoder component of the finished system will use the natural language probabilities to select the most probable NLParse translation for a given English input. That translation can then be converted deterministically to SQL to query the ATIS database for the correct answer. We apply a number of deleted interpolation and maximum entropy techniques to improve on the standard trigram model, and we reduce the test-set log perplexity (cross-entropy) from 15.9 to 14.1 bits per item.
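The abstract does not give the model details, so the following is only a minimal sketch of the baseline idea: a trigram model over NLParse tokens whose trigram, bigram, unigram and uniform estimates are linearly interpolated, with the interpolation weights re-estimated by EM on held-out data, and the result scored in bits per item. It is a simplification of deleted interpolation (a single held-out split, one weight per order rather than weights tied to history counts) and omits the maximum entropy component entirely. All names here (`InterpolatedTrigram`, `fit_lambdas`, `bits_per_item`, the `<s>`/`</s>` markers) are hypothetical, and treating each NLParse token as one "item" is an assumption.

```python
import math
from collections import Counter


def ngram_events(seq):
    """Yield (u, v, w) trigram events for one token sequence, padded with boundary markers."""
    toks = ["<s>", "<s>"] + list(seq) + ["</s>"]
    for i in range(2, len(toks)):
        yield toks[i - 2], toks[i - 1], toks[i]


class InterpolatedTrigram:
    """Hypothetical sketch: linear interpolation of trigram, bigram, unigram and uniform estimates."""

    def __init__(self, train):
        self.tri, self.ctx2 = Counter(), Counter()
        self.bi, self.ctx1 = Counter(), Counter()
        self.uni = Counter()
        for seq in train:
            for u, v, w in ngram_events(seq):
                self.tri[(u, v, w)] += 1
                self.ctx2[(u, v)] += 1  # how often history (u, v) predicted something
                self.bi[(v, w)] += 1
                self.ctx1[v] += 1  # how often history v predicted something
                self.uni[w] += 1
        self.total = sum(self.uni.values())
        self.vocab = len(self.uni) + 1  # +1 reserves uniform mass for an unseen item
        # Weights for (trigram, bigram, unigram, uniform); start flat, refit with EM.
        self.lambdas = [0.25, 0.25, 0.25, 0.25]

    def components(self, u, v, w):
        """Return the four component probabilities of w given history (u, v)."""
        p3 = self.tri[(u, v, w)] / self.ctx2[(u, v)] if self.ctx2[(u, v)] else 0.0
        p2 = self.bi[(v, w)] / self.ctx1[v] if self.ctx1[v] else 0.0
        p1 = self.uni[w] / self.total
        return [p3, p2, p1, 1.0 / self.vocab]

    def prob(self, u, v, w):
        """Interpolated probability; always > 0 while the uniform weight is > 0."""
        return sum(l * p for l, p in zip(self.lambdas, self.components(u, v, w)))

    def fit_lambdas(self, heldout, iters=20):
        """EM re-estimation of the interpolation weights on held-out data."""
        events = [e for seq in heldout for e in ngram_events(seq)]
        for _ in range(iters):
            expected = [0.0] * 4
            for u, v, w in events:
                weighted = [l * p for l, p in zip(self.lambdas, self.components(u, v, w))]
                z = sum(weighted)
                for k in range(4):
                    expected[k] += weighted[k] / z  # posterior share of component k
            self.lambdas = [e / len(events) for e in expected]

    def bits_per_item(self, test):
        """Cross-entropy of the test data in bits per predicted item."""
        events = [e for seq in test for e in ngram_events(seq)]
        return -sum(math.log2(self.prob(*e)) for e in events) / len(events)


if __name__ == "__main__":
    # Toy token sequences standing in for NLParse translations (invented for illustration).
    model = InterpolatedTrigram([["show", "flights", "to", "boston"],
                                 ["show", "fares", "to", "boston"]])
    model.fit_lambdas([["show", "flights", "to", "denver"]])
    h = model.bits_per_item([["show", "fares", "to", "denver"]])
    print(f"{h:.2f} bits per item, perplexity {2 ** h:.1f}")
```

A figure in bits per item is a cross-entropy H, and the corresponding perplexity is 2^H; each bit saved roughly halves the effective number of equally likely choices the decoder faces at each item.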