Mining in Anticipation for Concept Change: Proactive-Reactive Prediction in Data Streams ⋆
Ying Yang1, Xindong Wu2, and Xingquan Zhu2
1 School of Computer Science and Software Engineering,
Monash University, Melbourne, VIC 3800, Australia, yyang@csse.monash.edu.au
2 Department of Computer Science,
University of Vermont, Burlington, VT 05405, USA {xwu,xqzhu}@cs.uvm.edu
Abstract. Prediction in streaming data is an important activity in
modern society. Two major challenges posed by data streams are (1) the data may grow without limit, so that it is difficult to retain a long history of raw data; and (2) the underlying concept of the data may change
over time. The contributions of this paper are fourfold. First, it uses
a measure of conceptual equivalence to organize the data history into a history of concepts. This contrasts with the common practice of keeping only recent raw data. The concept history is compact while still retaining essential information for learning. Second, it learns concept-transition patterns from the concept history and anticipates what the concept will be in the case of a concept change. It then proactively prepares a prediction model for the future change. This contrasts with the conventional methodology that passively waits until the change happens. Third, it incorporates proactive and reactive predictions. If the anticipation turns
out to be correct, a proper prediction model can be launched instantly
upon the concept change. If not, it promptly resorts to a reactive mode: adapting a prediction model to the new data. Finally, an efficient and effective system, RePro, is proposed to implement these new ideas. It carries out prediction at two levels: a general level of predicting each
incoming concept and a specific level of predicting each instance’s class.
Experiments are conducted to compare RePro with representative existing prediction methods on various benchmark data sets that represent diversified scenarios of concept change. Empirical evidence offers inspiring insights and demonstrates that the proposed methodology is an advisable solution to prediction in data streams.
Key Words: Data Stream; Concept Change; Classification; Proactive Learning; Reactive Learning; Conceptual Equivalence.
⋆ A preliminary and shorter version of this paper has been published in the Proceedings
of the 11th ACM SIGKDD International Conference on Knowledge Discovery and