Active learning with minimum expected error for spoken language understanding
Abstract
Active learning is a strategy to minimize the annotation effort required to train statistical models, such as a statistical classifier used for natural language call routing or user intent classification. Most variants of active learning are "certainty-based": they typically select, for human labeling, the samples most likely to be misclassified by automatic procedures. While this approach selects informative samples, it completely disregards any interaction between "similar" samples, something that has recently been factored into active learning procedures to further reduce the labeling effort. In this paper we present a procedure, motivated by a recently proposed minimum expected error criterion for active learning, that also exploits the similarity between samples in an effort to maximize the gains obtained from labeling a given number of samples. We evaluated the proposed algorithm on two natural language call routing tasks. On both tasks, a significant gain (up to 5% absolute for systems with over 80% accuracy) over baseline active learning was observed at small sample sizes. The gain, however, diminished with increasing sample sizes, and no significant label saving was observed in achieving the maximum accuracy levels.
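To make the "certainty-based" baseline concrete, the following is a minimal sketch (not the paper's proposed procedure) of how such a selection step is commonly implemented: given class posterior probabilities from the current classifier, the unlabeled samples with the lowest top-class confidence are sent for human labeling. The function name and the toy posteriors are illustrative assumptions.

```python
# Certainty-based sample selection: a hedged sketch, assuming the current
# classifier exposes posterior probabilities over classes for each
# unlabeled sample.

def select_least_certain(posteriors, k):
    """Return indices of the k samples whose top-class posterior is
    lowest, i.e. the samples most likely to be misclassified."""
    confidences = [max(p) for p in posteriors]
    ranked = sorted(range(len(posteriors)), key=lambda i: confidences[i])
    return ranked[:k]

# Illustrative posteriors over three call-routing intents.
posteriors = [
    [0.90, 0.05, 0.05],  # confident: low labeling value
    [0.40, 0.35, 0.25],  # uncertain: send for labeling
    [0.55, 0.30, 0.15],  # borderline
    [0.98, 0.01, 0.01],  # confident: low labeling value
]
print(select_least_certain(posteriors, 2))  # -> [1, 2]
```

Note that this baseline scores each sample independently; the paper's contribution is to additionally account for similarity between samples when choosing which ones to label.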