Impact of bucketing on performance of linearly interpolated language models
Abstract
N-gram models are used to model language in various applications. For large vocabularies, even a very large corpus is insufficient to estimate a raw ratio-of-counts trigram model. One common way to overcome this problem is linear interpolation of the trigram model with lower-order models. The interpolation weights can be varied as a function of the current history, to reflect the confidence we have in the estimates of the various orders. Since the number of histories is large, we cannot hope to estimate a separate set of weights for each history. Instead, sets of histories are tied together and the same weights are used for all histories within a set. In this paper we study the effect of the algorithm used to tie the histories together, and we report word error rate (WER) results on a large-vocabulary speech recognition task.
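For concreteness, a standard formulation of history-dependent linear interpolation is sketched below; the bucketing function $b(\cdot)$ and the weight notation are illustrative and not necessarily the exact notation used in the body of the paper:
\[
P(w_i \mid w_{i-2}, w_{i-1}) \;=\; \lambda_3\bigl(b(h)\bigr)\,\hat{P}(w_i \mid w_{i-2}, w_{i-1}) \;+\; \lambda_2\bigl(b(h)\bigr)\,\hat{P}(w_i \mid w_{i-1}) \;+\; \lambda_1\bigl(b(h)\bigr)\,\hat{P}(w_i),
\]
where $h = (w_{i-2}, w_{i-1})$ is the current history, $\hat{P}$ denotes the raw ratio-of-counts (maximum-likelihood) estimate of each order, $b(\cdot)$ maps a history to its bucket, and the weights of each bucket are nonnegative and sum to one. All histories assigned to the same bucket share a single set of weights, which is what makes the weights estimable from held-out data.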