Concept Drift Detection – the State-of-the-Art, by Shujian Yu

Concept Drift Detection – the State-of-the-Art
Shujian Yu, Ph.D. Candidate
Department of Electrical and Computer Engineering
yusjlcy9011@ufl.edu

Acknowledgements
• Joint work with my supervisors/mentors:
  • Dr. Jose C. Principe, Distinguished Professor at Department of ECE
  • Dr. Zubin Abraham, Senior Data Mining Research Scientist at Robert Bosch Research Center, CA
  • Dr. Xiaoyang Wang, Machine Learning Research Scientist at Nokia Bell Labs, NJ
• Some contents were/will be presented in:
  • Bay Area Machine Learning Symposium (2016.10)
  • SIAM International Conference on Data Mining (2017.4)
  • Nokia Bell Labs (2017.9)
  • International Joint Conference on Artificial Intelligence (2018.7)
  • …

Acknowledgements
• Related publications
  • Yu, Shujian, and Zubin Abraham. “Concept drift detection with hierarchical hypothesis testing.” In Proceedings of the 2017 SIAM International Conference on Data Mining, pp. 768-776. Society for Industrial and Applied Mathematics, 2017.
  • Yu, Shujian, Xiaoyang Wang, and José C. Príncipe. “Request-and-Reverify: Hierarchical Hypothesis Testing for Concept Drift Detection with Expensive Labels.” In Proceedings of the 2018 International Joint Conference on Artificial Intelligence, pp. 3033-3039.
  • Yu, Shujian, et al. “Concept drift detection and adaptation with hierarchical hypothesis testing.” To appear in Journal of The Franklin Institute (under minor revision).
  • …

Background
• Examples of sources: call center records, sensor data, network traffic

Background
• What are the applications?
  • Network monitoring and traffic engineering
  • Business: credit card transaction flows
  • Telecommunication call records
• Challenges?
  • Infinite length
  • Concept drift
[Figure: the mapping from features to label changes over time: y_t = f_1(X_t), several years later y_t = f_2(X_t), several years later y_t = f_3(X_t), where X_t = (Color, Price, Size) and y_t = 1 (like) or 0 (dislike).]

Previous works and general framework
• Drift Detection Method (DDM)
  • error monitor + hypothesis testing
• General framework: new data in the stream to be classified → make a prediction using the current classifier → make a decision on the occurrence of drift → relearn a classifier if drift is found
• Only a single statistic is evaluated and tracked (DDM, EDDM, STEPD, DDM-OCI, …).
Reference: Gama, Joao, Pedro Medas, Gladys Castillo, and Pedro Rodrigues. "Learning with drift detection." In Brazilian Symposium on Artificial Intelligence, pp. 286-295. Springer Berlin Heidelberg, 2004.
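
The "error monitor + hypothesis testing" idea behind DDM can be sketched as follows. This is a minimal illustration, not the reference implementation: the class name `SimpleDDM`, the helper `detect_drifts`, the 30-sample warm-up, and the synthetic error stream are all assumptions made for the example. The 2-sigma warning and 3-sigma drift levels follow the usual description of DDM.

```python
import math


class SimpleDDM:
    """DDM-style detector fed with 0/1 prediction errors (illustrative sketch)."""

    def __init__(self, min_samples=30, warn_level=2.0, drift_level=3.0):
        self.min_samples = min_samples    # assumed warm-up length
        self.warn_level = warn_level      # warning at p_min + 2 * s_min
        self.drift_level = drift_level    # drift at p_min + 3 * s_min
        self.reset()

    def reset(self):
        self.n = 0
        self.p = 0.0                      # running error rate
        self.p_min = math.inf
        self.s_min = math.inf

    def update(self, error):
        """error is 1 for a misclassification, 0 otherwise."""
        self.n += 1
        self.p += (error - self.p) / self.n
        s = math.sqrt(max(0.0, self.p * (1.0 - self.p)) / self.n)
        if self.n < self.min_samples:
            return "stable"
        # Remember the best (lowest) error level seen so far.
        if self.p + s < self.p_min + self.s_min:
            self.p_min, self.s_min = self.p, s
        level = self.p + s
        if level > self.p_min + self.drift_level * self.s_min:
            self.reset()                  # a new classifier would be learned here
            return "drift"
        if level > self.p_min + self.warn_level * self.s_min:
            return "warning"
        return "stable"


def detect_drifts(errors):
    """Return the stream indices at which the detector signals drift."""
    detector = SimpleDDM()
    return [i for i, e in enumerate(errors) if detector.update(e) == "drift"]


# Synthetic stream: 10% errors for 200 samples, then 50% errors.
stream = [1 if i % 10 == 0 else 0 for i in range(1, 201)]
stream += [1 if i % 2 == 0 else 0 for i in range(1, 201)]
print(detect_drifts(stream))
```

Only one statistic (the error rate) is tracked here, which is the limitation the slide points out.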

Hierarchical Hypothesis Testing (HHT) Framework
• Hierarchical Hypothesis Testing (HHT) framework
  • HHT features two layers of hypothesis tests: Layer-I outputs potential drift points, Layer-II reduces false alarms
• Hierarchical Linear Four Rates (HLFR) is developed under the HHT framework
[Figure: Hierarchical Hypothesis Testing architecture, with Layer-I and Layer-II hypothesis testing blocks connected by the labels "Potential detection / Information of drift", "Confirm detection / Restart the testing", and "Detection results / Classifier update".]

Hierarchical Linear Four Rates (HLFR) Algorithm
• Layer-I test: Linear Four Rates (LFR) test

              True 0   True 1
  Predict 0     TN       FN      NPV = TN/(TN+FN)
  Predict 1     FP       TP      PPV = TP/(FP+TP)

  TNR = TN/(TN+FP)      TPR = TP/(FN+TP)

• Each rate is estimated as a geometrically weighted sum of Bernoulli random variables.
• Monitor the four rates (i.e., positive predictive value, negative predictive value, true positive rate and true negative rate) associated with the confusion matrix, and raise an alarm if any of them changes significantly.
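
As a rough illustration of the quantities monitored by Layer-I, the sketch below computes the four rates from confusion-matrix counts and keeps a geometrically weighted running estimate of each. The names `four_rates` and `RateTracker`, the decay `eta = 0.9`, and the initial value 0.5 are assumptions for the example. The significance test that turns a rate change into an alarm is not shown.

```python
def four_rates(tn, fp, fn, tp):
    """TPR, TNR, PPV and NPV from confusion-matrix counts."""
    def ratio(num, den):
        return num / den if den else float("nan")
    return {
        "TPR": ratio(tp, tp + fn),
        "TNR": ratio(tn, tn + fp),
        "PPV": ratio(tp, tp + fp),
        "NPV": ratio(tn, tn + fn),
    }


class RateTracker:
    """Geometrically weighted running estimates of the four rates."""

    def __init__(self, eta=0.9, init=0.5):
        self.eta = eta  # assumed decay factor
        self.rates = {"TPR": init, "TNR": init, "PPV": init, "NPV": init}

    def update(self, y_true, y_pred):
        correct = 1.0 if y_true == y_pred else 0.0
        # Each sample touches only the rates whose denominator it belongs to:
        # its true class selects TPR or TNR, its predicted class PPV or NPV.
        touched = ("TPR" if y_true == 1 else "TNR",
                   "PPV" if y_pred == 1 else "NPV")
        for name in touched:
            self.rates[name] = self.eta * self.rates[name] + (1 - self.eta) * correct
        return dict(self.rates)
```

Tracking all four rates rather than the overall error lets the monitor react to changes that leave accuracy roughly constant, for example on imbalanced streams.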

Hierarchical Linear Four Rates (HLFR) Algorithm
• Layer-II test: permutation test
  • Ordered split around the candidate drift point t: {X_{t-N}, y_{t-N}}, ..., {X_{t-1}, y_{t-1}} | {X_{t+1}, y_{t+1}}, ..., {X_{t+N}, y_{t+N}} → classifier f → zero-one loss e
  • Merge samples, then resample P random splits → classifiers f_1, f_2, f_3, ..., f_P → zero-one losses e_1, e_2, e_3, ..., e_P
  • H_0: false decision; H_A: true decision
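
A permutation test of this kind can be sketched as below. The sketch assumes that the classifier is trained on the samples before the candidate point and scored on the samples after it; the nearest-centroid classifier, the function names, `n_perm = 200`, and the p-value formula are likewise choices made for the example rather than details taken from the slides.

```python
import random


def fit_centroids(samples):
    """Nearest-centroid 'classifier': per-class mean of the feature vectors."""
    sums, counts = {}, {}
    for x, y in samples:
        acc = sums.setdefault(y, [0.0] * len(x))
        for j, v in enumerate(x):
            acc[j] += v
        counts[y] = counts.get(y, 0) + 1
    return {y: [v / counts[y] for v in acc] for y, acc in sums.items()}


def predict(centroids, x):
    return min(centroids,
               key=lambda y: sum((a - b) ** 2 for a, b in zip(x, centroids[y])))


def zero_one_loss(centroids, samples):
    wrong = sum(1 for x, y in samples if predict(centroids, x) != y)
    return wrong / len(samples)


def permutation_test(before, after, n_perm=200, seed=0):
    """Return (ordered loss, p-value) for H_0: the alarm is a false decision."""
    rng = random.Random(seed)
    # Ordered split: train before the candidate point, test after it.
    e_ordered = zero_one_loss(fit_centroids(before), after)
    merged = list(before) + list(after)
    n_train = len(before)
    at_least_as_bad = 0
    for _ in range(n_perm):
        rng.shuffle(merged)  # random split of the merged samples
        e_perm = zero_one_loss(fit_centroids(merged[:n_train]), merged[n_train:])
        if e_perm >= e_ordered:
            at_least_as_bad += 1
    p_value = (at_least_as_bad + 1) / (n_perm + 1)
    return e_ordered, p_value
```

A small p-value means the ordered split is unusually hard compared with random splits, which supports confirming the Layer-I detection; a large one suggests a false alarm.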

Conclusions
• A novel Hierarchical Hypothesis Testing (HHT) framework is developed for concept drift detection.
• Hierarchical Linear Four Rates (HLFR) is designed under the HHT framework.
• HLFR significantly outperforms benchmark approaches in terms of accuracy, G-mean, recall, and delay of detection.
• Perfect? No!
• Let us continue …

Concept drift detection in the context of expensive labels: methods and applications

Recall the general framework
• General framework
  • “indicator” monitoring + hypothesis test
• State of the art
  • Supervised + re-training strategy: HLFR, STEPD, etc.
  • Unsupervised + active training strategy: MD3, CDBD, etc.
  • Supervised indicator: classification error, confusion matrix, etc.
  • Unsupervised indicator: margin density, classification score divergence, etc.
• Limitations and motivations
  • Expensive labels --> accurate detection with minimum labels
  • Multi-class streaming data --> explicitly handle the multi-class scenario
[Figure: new data X_t in the stream to be classified → make a prediction y_t using the current classifier f → make a decision on the occurrence of drift → relearn a classifier f_new if drift is found. A single indicator is evaluated and tracked.]

Our methods
• A novel Hierarchical Hypothesis Testing (HHT) framework
  • HHT features two layers of hypothesis tests: Layer-I outputs potential drift points, Layer-II reduces false alarms
[Figure: Hierarchical Hypothesis Testing architecture. Layer-I hypothesis testing works in an unsupervised manner on {X_t}; Layer-II hypothesis testing requests labels {y_t}. The blocks are connected by the labels "Potential detection / Information of drift", "Confirm detection / Restart the testing", and "Detection results / Classifier update".]

Our methods

Our methods
[Figure: Layer-II test with classifiers f_A and f_B, Set A, Set B, and the merged samples Set A ∪ Set B. H_0: false decision; H_A: true decision.]

Our methods
[Figure: illustration of the one-dimensional Kolmogorov-Smirnov (KS) statistic. Red and blue lines each correspond to an empirical distribution function, and the black arrow is the two-sample KS statistic.]
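
The two-sample KS statistic in the figure is the largest vertical gap between the two empirical distribution functions. A minimal sketch, with `ks_statistic` as an illustrative name and no claim about how the test is wired into HHT:

```python
from bisect import bisect_right


def ks_statistic(a, b):
    """Largest gap between the empirical distribution functions of a and b."""
    a_sorted, b_sorted = sorted(a), sorted(b)
    d = 0.0
    # Both functions are step functions that jump only at sample values,
    # so it is enough to compare them at every observed value.
    for v in a_sorted + b_sorted:
        f_a = bisect_right(a_sorted, v) / len(a_sorted)
        f_b = bisect_right(b_sorted, v) / len(b_sorted)
        d = max(d, abs(f_a - f_b))
    return d
```

The statistic is 0 for identical samples and 1 when the two samples do not overlap at all; deciding whether a given value is significant additionally requires the KS distribution or a resampling scheme.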

Our methods
[1] Peacock, J. A. "Two-dimensional goodness-of-fit testing in astronomy." Monthly Notices of the Royal Astronomical Society, vol. 202, no. 3, pp. 615-627, 1983.

Results
• Publicly available data
  • UG-2C-2D: Two Bi-dimensional unimodal Gaussian Classes
[Figure: histograms of detected drift points, one row per method: HLFR, LFR, DDM (supervised); HHT-UM, HHT-AG; MD3, CDBD (unsupervised). Precision-Range and Recall-Range curves (x-axis: detection range; y-axis: precision or recall) for HLFR, LFR, DDM, HHT with uncertainty, HHT with KS test, MD3 and CDBD.]
The red columns denote the ground truth of drift points; the blue columns represent the histogram of detected drift points generated from 100 Monte-Carlo simulations. Our HHT methods (4th and 5th rows) consistently outperform state-of-the-art unsupervised methods. Interestingly, HHT-UM is even better than the benchmark supervised method.

Real applications

Real applications

Real applications
• Analysis of encrypted wireless video stream
  • In collaboration with New York University, Columbia University and Nokia Bell Labs.
  • As the initial step, NYU identified three buffer statuses to classify: Filling the Buffer (F) vs. Steady (S) vs. Draining the Buffer (D).
  • However, when network conditions are compromised, the buffer status can become “ugly”, which brings down the performance of classifiers.

Real applications
• Analysis of encrypted wireless video stream
  • Concept drift: detect the “good” to “congested” drift of the network condition, and apply a different classifier for each network condition.

Future work
• Open toolbox to support various state-of-the-art concept drift detection methods
  • 13 methods in total
  • Matlab and R
  • 2019 Spring
• Improve Hoeffding’s inequality
• Relax i.i.d. assumption

Thank you!
