A fast online learning algorithm for distributed mining of bigdata
Abstract
BigData analytics require that distributed mining of numerous data streams is performed in real-time. Unique challenges associated with designing such distributed mining systems are: online adaptation to incoming data characteristics, online processing of large amounts of heterogeneous data, limited data access and communication capabilities between distributed learners, etc. We propose a general frameworkfor distributed data mining and develop an efficientonline learning algorithm based on this. Our frameworkconsists of an ensemble learner and multiple local learners, which can only access different parts of the incoming data. By exploiting the correlations of the learning models among local learners, our proposed learning algorithms can optimize the prediction accuracy while requiring significantly less information exchange and computational complexity than existing state-of-the-art learning solutions.