Publication
Intelligent Data Analysis
Paper

Estimating performance gains for voted decision trees

View publication

Abstract

Decision tree induction is a prominent learning method, typically yielding quick results with competitive predictive performance. However, it is not unusual to find other automated learning methods that exceed the predictive performance of a decision tree on the same application. To achieve near-optimal classification results, resampling techniques can be employed to generate multiple decision-tree solutions. These decision trees are individually applied and their answers voted. The potential for exceptionally strong performance is counterbalanced by the substantial increase in computing time to induce many decision trees. We describe estimators of predictive performance for voted decision trees induced from bootstrap (bagged) or adaptive (boosted) resampling. The estimates are found by examining the performance of a single tree and its pruned subtrees over a single, training set and a large test set. Using publicly available collections of data, we show that these estimates are usually quite accurate, with occasional weaker estimates. The great advantage of these estimates is that they reveal the predictive potential of voted decision trees prior to applying expensive computational procedures. © 1998 Elsevier Science B.Y.

Date

Publication

Intelligent Data Analysis

Authors

Topics

Share