Active mining of data streams

Wei Fan; Yi-an Huang; Haixun Wang; Philip S. Yu

doi:10.1137/1.9781611972740.46

Publication

SDM 2004

Conference paper

Abstract

Most previously proposed methods for mining data streams make the unrealistic assumption that a "labelled" data stream is readily available and can be mined at any time. However, in most real-world problems, labelled data streams are rarely available immediately. For this reason, models are refreshed periodically, usually on a schedule synchronized with data availability. This "passive periodic refresh" has several undesirable consequences. In this paper, we propose a new concept of demand-driven active data mining. It estimates the error of the model on the new data stream without knowing the true class labels. When a significantly higher error is suspected, it investigates the true class labels of a selected number of examples in the most recent data stream to verify the suspected higher error.
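The two-step idea in the abstract (suspect a higher error without labels, then verify it on a small labelled sample) can be illustrated with a minimal sketch. This is not the paper's estimator, which the abstract does not specify. As a stand-in assumption, suspicion is raised here when the distribution of predicted classes drifts from a reference window, and verification uses a simple normal-approximation lower bound on the sampled error. All names (`distribution_shift`, `sampled_error`, `needs_refresh`, `label_oracle`) and the threshold values are hypothetical.

```python
import math
import random
from collections import Counter


def distribution_shift(reference_preds, new_preds):
    """Total variation distance between two predicted-class distributions.

    Assumption: a label-free proxy for "suspected higher error";
    the paper's own statistic may differ.
    """
    ref, new = Counter(reference_preds), Counter(new_preds)
    n_ref, n_new = len(reference_preds), len(new_preds)
    classes = set(ref) | set(new)
    return 0.5 * sum(abs(ref[c] / n_ref - new[c] / n_new) for c in classes)


def sampled_error(new_preds, label_oracle, sample_size, rng):
    """Estimate the error rate from a small random sample of true labels.

    label_oracle(i) is a hypothetical callback that returns the true label
    of example i (for instance, by asking a human investigator).
    Returns (error estimate, standard error of that estimate).
    """
    indices = rng.sample(range(len(new_preds)), sample_size)
    wrong = sum(1 for i in indices if new_preds[i] != label_oracle(i))
    p = wrong / sample_size
    return p, math.sqrt(p * (1 - p) / sample_size)


def needs_refresh(reference_preds, new_preds, label_oracle, baseline_error,
                  shift_threshold=0.1, sample_size=50, seed=0):
    """Request labels only when the label-free check raises suspicion."""
    if distribution_shift(reference_preds, new_preds) <= shift_threshold:
        return False  # no suspicion, so no labels are requested
    p, se = sampled_error(new_preds, label_oracle, sample_size,
                          random.Random(seed))
    # Confirm only if the approximate 95% lower bound exceeds the baseline.
    return p - 1.96 * se > baseline_error
```

In this sketch, labels are requested on demand rather than on a fixed schedule: `label_oracle` is never called unless the label-free check exceeds `shift_threshold`, which mirrors the demand-driven behaviour the abstract describes.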

Date

22 Apr 2004

Authors

IBM-affiliated at time of publication
