Discovering unrevealed properties of probability estimation trees: On algorithm selection and performance explanation
Abstract
There has been increasing interest in designing better probability estimation trees, or PETs, for ranking and probability estimation. Capable of generating class-membership probabilities, PETs have been shown to be highly accurate and flexible for many difficult problems, such as cost-sensitive learning and learning from skewed class distributions. Many PET algorithms are available, roughly ten of them well known. This variety is an advantage, but it also creates confusion in practice. One naturally asks: "Given a new dataset, which algorithm should be chosen, and what performance can and cannot be expected? What explains good or bad performance in different situations?" In this paper we systematically answer these important questions, for the first time, by conducting a large-scale empirical comparison of five popular PETs, examining their AUC, MSE, and error-rate "learning curves" rather than relying on training-test-split-based cross-validation. Using the maximum AUC achieved by any of the evaluated algorithms, we demonstrate that a PET's relative standing on different evaluation metrics can be accurately characterized by the "signal-noise separability" of the dataset, together with other observable dataset statistics explained further in the paper. Moreover, to understand their relative performance, many important and previously unrevealed properties of each PET's mechanisms and heuristics are analyzed and evaluated. Importantly, a practical guide for choosing the most appropriate PET algorithm for a new data mining problem is provided.
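To make the evaluation protocol concrete, the sketch below illustrates the learning-curve style of comparison described above: a tree-based probability estimator is trained on increasing fractions of the training data while AUC, MSE, and error rate are recorded on a fixed held-out set. This is a minimal illustration, not the authors' implementation; the synthetic dataset, the `DecisionTreeClassifier` stand-in for a PET, and the training fractions are all assumptions made for the example.

```python
# Minimal sketch of a learning-curve evaluation over AUC, MSE, and error rate.
# Assumptions (not from the paper): synthetic data, a plain decision tree as
# the PET stand-in, and the chosen training fractions.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import roc_auc_score, brier_score_loss, accuracy_score

X, y = make_classification(n_samples=5000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

for frac in (0.1, 0.25, 0.5, 1.0):
    n = int(frac * len(X_train))
    # Any PET variant (e.g., a Laplace-smoothed tree) would be substituted here.
    model = DecisionTreeClassifier(random_state=0).fit(X_train[:n], y_train[:n])
    p = model.predict_proba(X_test)[:, 1]            # class-membership probabilities
    auc = roc_auc_score(y_test, p)                   # ranking quality
    mse = brier_score_loss(y_test, p)                # probability calibration (MSE)
    err = 1.0 - accuracy_score(y_test, (p >= 0.5).astype(int))  # error rate
    print(f"train frac={frac:.2f}  AUC={auc:.3f}  MSE={mse:.3f}  err={err:.3f}")
```

Plotting each metric against the training fraction yields the "learning curves" the paper uses in place of a single training-test split, which makes sample-size effects on each metric directly visible.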