Boosting and combination of classifiers for natural language call routing systems
Abstract
In this paper, we present several techniques for improving natural language call routing. We first describe methods to improve a single classifier: boosting, discriminative training (DT), and automatic relevance feedback (ARF). An interesting feature of some of these algorithms is their ability to re-weight the training data in order to focus the classifier on documents judged difficult to classify. We then explore ways of deriving and combining uncorrelated classifiers in order to improve accuracy; specifically, we discuss linear interpolation and constrained minimization techniques. All these approaches are probabilistic and are inspired by the information retrieval domain. They are evaluated using two similarity metrics: the common cosine measure from the vector space model, and a beta measure that has given good results in the similar task of e-mail steering. Compared to the baseline classifiers, we show improvements in classification accuracy on call routing for a banking task: up to 20% for the ARF method, up to 30% for the boosting technique, and more than 45% for the DT approach. A further relative improvement of 11% is obtained when we combine the classifiers with the constrained minimization approach using a confusion measure and DT. More importantly, we demonstrate synergistic effects of DT on the boosting algorithm: more iterations were possible because DT reduced the classification error rate of individual classifiers trained on re-weighted data by an average of 72%. © 2003 Elsevier B.V. All rights reserved.