FAST COMPUTATION OF MAXIMUM ENTROPY/MINIMUM DIVERGENCE FEATURE GAIN
Abstract
Maximum entropy/minimum divergence (MEMD) modeling is a powerful technique for constructing probability models that has been applied to a wide variety of problems in natural language processing. An MEMD model is built from a base model and a set of feature functions, often called simply features, whose empirical expectations on some training corpus are known. A fundamental difficulty with this technique is that while there are typically millions of features that could be incorporated into a given model, it is in general neither computationally feasible nor desirable to use them all. Thus some means must be devised for determining each feature's predictive power, also known as its gain. Once the gains are known, the features can be ranked according to their utility, and only the most gainful ones retained. This paper presents a new algorithm for computing feature gain that is fast, accurate, and memory-efficient.
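For orientation, the following is a sketch of the standard single-feature MEMD form and the usual definition of gain, written in common maximum-entropy notation; the symbols here (base model q, candidate feature f, trial weight alpha, normalizer Z_alpha, empirical distribution p-tilde) are illustrative assumptions, not quotations of this paper's own definitions. Adjoining one feature f to the base model q yields

\[
q_{\alpha,f}(y \mid x) \;=\; \frac{q(y \mid x)\, e^{\alpha f(x,y)}}{Z_\alpha(x)},
\qquad
Z_\alpha(x) \;=\; \sum_{y'} q(y' \mid x)\, e^{\alpha f(x,y')},
\]

and the gain of f is the largest improvement in training log-likelihood (equivalently, the reduction in divergence from the empirical distribution) attainable by any choice of the weight alpha:

\[
G(f) \;=\; \sup_{\alpha} \bigl[\, L(q_{\alpha,f}) - L(q) \,\bigr],
\qquad
L(p) \;=\; \sum_{(x,y)} \tilde{p}(x,y) \log p(y \mid x).
\]

Ranking features by G(f) and retaining the most gainful ones is the selection procedure the abstract describes; evaluating G exactly requires a one-dimensional optimization over alpha for every candidate feature, which is why a fast, memory-efficient computation matters when millions of candidates are in play.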