Concept Drift
Albert Bifet
March 2012

COMP423A/COMP523A Data Stream Mining

Outline
1. Introduction
2. Stream Algorithmics
3. Concept drift
4. Evaluation
5. Classification
6. Ensemble Methods
7. Regression
8. Clustering
9. Frequent Pattern Mining
10. Distributed Streaming

Data Streams Big Data & Real Time

Data Mining Algorithms with Concept Drift
[Figure: two block diagrams of a DM algorithm mapping input to output. Labels in one: Static Model, Change Detect. Labels in the other: Estimator 1 to Estimator 5.]

Introduction

Problem
Given an input sequence $x_1, x_2, \dots, x_t$, we want to output at instant $t$ an alarm signal if there is a distribution change, and also a prediction $\hat{x}_{t+1}$ minimizing the prediction error $|\hat{x}_{t+1} - x_{t+1}|$.

Outputs
◮ an estimation of some important parameters of the input distribution, and
◮ an alarm signal indicating that a distribution change has recently occurred.

Change Detectors and Predictors
[Figure: block diagram built up over three slides. The input $x_t$ feeds an Estimator, which outputs the Estimation. A Change Detector outputs the Alarm. A Memory module is connected to both.]

Concept Drift Evaluation
◮ Mean Time between False Alarms (MTFA)
◮ Mean Time to Detection (MTD)
◮ Missed Detection Rate (MDR)
◮ Average Run Length (ARL($\theta$))
The design of a change detector is a compromise between detecting true changes and avoiding false alarms.
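The measures above can be estimated empirically by running a detector on streams whose true change point is known. The following is a minimal sketch, not a definitive protocol: the function names, the single-change setting and the `horizon` cut-off (after which a change counts as missed) are assumptions made for illustration.

```python
def evaluate_run(alarms, change_point, horizon):
    """Score one run of a detector on a stream with a single true change.

    alarms:       sorted alarm times produced by the detector
    change_point: time of the true change
    horizon:      hypothetical cut-off; a change not flagged within
                  `horizon` steps is counted as missed
    """
    false_alarms = [a for a in alarms if a < change_point]
    hits = [a for a in alarms if change_point <= a < change_point + horizon]
    return {
        "false_alarms": len(false_alarms),
        "delay": hits[0] - change_point if hits else None,
        "missed": not hits,
    }


def summarize(runs):
    """Average per-run scores into MTD and MDR estimates."""
    delays = [r["delay"] for r in runs if not r["missed"]]
    return {
        "MTD": sum(delays) / len(delays) if delays else None,
        "MDR": sum(r["missed"] for r in runs) / len(runs),
    }


if __name__ == "__main__":
    runs = [evaluate_run([10, 56], 50, 20), evaluate_run([10], 50, 20)]
    print(summarize(runs))
```

Averaging over many independent runs is what turns the per-run delay and miss flag into MTD and MDR. MTFA would be estimated the same way from the gaps between alarms on a stream with no change.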

Data Stream Algorithmics
◮ High accuracy in the prediction
◮ Low mean time to detection (MTD), false alarm rate (FAR) and missed detection rate (MDR)
◮ Low computational cost: minimum space and time needed
◮ Theoretical guarantees
◮ No parameters needed
Main properties of an optimal change detector and predictor system.

The CUSUM Test
◮ The cumulative sum (CUSUM) algorithm gives an alarm when the mean of the input data is significantly different from zero.
◮ The CUSUM test is memoryless, and its accuracy depends on the choice of parameters $\upsilon$ and $h$.

$g_0 = 0, \quad g_t = \max(0,\ g_{t-1} + \epsilon_t - \upsilon)$
if $g_t > h$ then alarm and $g_t = 0$

Cumulative sum algorithm (CUSUM).
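The CUSUM recurrence translates almost line by line into code. This is a minimal sketch: the function name, the test stream and the values of `upsilon` and `h` are illustrative assumptions, not values taken from the slides.

```python
def cusum(residuals, upsilon, h):
    """Return the time steps at which the one-sided CUSUM test alarms.

    residuals: the inputs eps_t (expected to have mean zero before a change)
    upsilon:   allowed drift per step
    h:         alarm threshold
    """
    alarms = []
    g = 0.0
    for t, eps in enumerate(residuals):
        g = max(0.0, g + eps - upsilon)
        if g > h:
            alarms.append(t)
            g = 0.0  # reset after the alarm, as in the recurrence
    return alarms


if __name__ == "__main__":
    # Mean jumps from 0 to 1 at t = 50.
    stream = [0.0] * 50 + [1.0] * 50
    print(cusum(stream, upsilon=0.5, h=3.0))
```

A larger `h` lowers the false alarm rate but lengthens the detection delay, which is the compromise named on the evaluation slide.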

Page Hinckley Test
◮ The CUSUM test
$g_0 = 0, \quad g_t = \max(0,\ g_{t-1} + \epsilon_t - \upsilon)$
if $g_t > h$ then alarm and $g_t = 0$
◮ The Page Hinckley Test
$g_0 = 0, \quad g_t = g_{t-1} + (\epsilon_t - \upsilon)$
$G_t = \min(g_t)$
if $g_t - G_t > h$ then alarm and $g_t = 0$
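A minimal sketch of the Page Hinckley recurrence follows. The slide resets only $g_t$ on an alarm. Resetting the running minimum $G_t$ as well is an assumption made here so that the detector can restart cleanly. The function name and parameter values are illustrative.

```python
def page_hinckley(residuals, upsilon, h):
    """Return the time steps at which the Page Hinckley test alarms."""
    alarms = []
    g = 0.0      # cumulative sum g_t
    g_min = 0.0  # running minimum G_t of the cumulative sum
    for t, eps in enumerate(residuals):
        g += eps - upsilon
        g_min = min(g_min, g)
        if g - g_min > h:
            alarms.append(t)
            # Assumption: reset both statistics after an alarm.
            g = 0.0
            g_min = 0.0
    return alarms


if __name__ == "__main__":
    stream = [0.0] * 50 + [1.0] * 50
    print(page_hinckley(stream, upsilon=0.5, h=3.0))
```

Unlike CUSUM, the sum is not clipped at zero. The test instead measures how far the sum has climbed above its lowest value so far.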

Geometric Moving Average Test
◮ The CUSUM test
$g_0 = 0, \quad g_t = \max(0,\ g_{t-1} + \epsilon_t - \upsilon)$
if $g_t > h$ then alarm and $g_t = 0$
◮ The Geometric Moving Average Test
$g_0 = 0, \quad g_t = \lambda g_{t-1} + (1 - \lambda)\epsilon_t$
if $g_t > h$ then alarm and $g_t = 0$
The forgetting factor $\lambda$ is used to give more or less weight to the most recently arrived data.
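The geometric moving average test can be sketched the same way. The function name and the values of `lam` and `h` below are assumptions chosen for illustration.

```python
def geometric_moving_average(residuals, lam, h):
    """Return the time steps at which the geometric moving average test alarms.

    lam: forgetting factor in [0, 1); values near 1 give long memory
    h:   alarm threshold on the smoothed value
    """
    alarms = []
    g = 0.0
    for t, eps in enumerate(residuals):
        g = lam * g + (1.0 - lam) * eps
        if g > h:
            alarms.append(t)
            g = 0.0  # reset after the alarm
    return alarms


if __name__ == "__main__":
    stream = [0.0] * 50 + [1.0] * 50
    print(geometric_moving_average(stream, lam=0.9, h=0.5))
```

With `lam` close to 1 the statistic reacts slowly but is less sensitive to noise. With `lam` close to 0 it tracks the latest input almost directly.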

Statistical test

$\hat{\mu}_0 - \hat{\mu}_1 \in N(0,\ \sigma_0^2 + \sigma_1^2)$, under $H_0$

Example: Probability of false alarm of 5%

$\Pr\left( \frac{|\hat{\mu}_0 - \hat{\mu}_1|}{\sqrt{\sigma_0^2 + \sigma_1^2}} > h \right) = 0.05$

As $P(X < 1.96) = 0.975$ the test becomes

$\frac{(\hat{\mu}_0 - \hat{\mu}_1)^2}{\sigma_0^2 + \sigma_1^2} > 1.96^2$
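The final inequality is straightforward to apply to two subwindows. In this sketch $\sigma_0^2$ and $\sigma_1^2$ are read as the variances of the two mean estimators, estimated as sample variance divided by sample size. That reading, along with the function names, is an assumption and is not stated on the slide.

```python
import statistics


def estimate(sample):
    """Return (mean, estimated variance of that mean) for one subwindow.

    Assumption: the variance of the mean is approximated by the
    sample variance divided by the number of items.
    """
    n = len(sample)
    return statistics.fmean(sample), statistics.variance(sample) / n


def change_detected(mu0, var0, mu1, var1, z=1.96):
    """Two-sided test at the level implied by z (1.96 gives 5%)."""
    return (mu0 - mu1) ** 2 / (var0 + var1) > z ** 2


if __name__ == "__main__":
    old = [0.1, 0.0, 0.2, 0.1, 0.0, 0.1, 0.2, 0.1]
    new = [0.6, 0.5, 0.7, 0.6, 0.5, 0.6, 0.7, 0.6]
    print(change_detected(*estimate(old), *estimate(new)))
```

The sketch needs at least two items per subwindow and a non-zero combined variance. A production version would guard against both cases.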

Concept Drift 6 sigma

Concept Drift
[Figure: error rate (0 to 0.8) against number of examples processed (time, 0 to 5000), with labels: concept drift, $p_{min} + s_{min}$, warning level, drift level, new window.]
Statistical Drift Detection Method (Joao Gama et al. 2004)
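The figure's warning and drift levels can be sketched in code. The slide shows only the plot, so the details below are taken from the Gama et al. (2004) paper rather than from the slide: the error rate $p$ with standard deviation $s = \sqrt{p(1-p)/n}$, a warning at $p + s > p_{min} + 2 s_{min}$, a drift at $p + s > p_{min} + 3 s_{min}$, and a 30-example warm-up. The class and method names are hypothetical.

```python
import math


class DriftDetectionMethod:
    """Minimal sketch of the drift detection method (DDM)."""

    def __init__(self, warning_level=2.0, drift_level=3.0, min_samples=30):
        self.warning_level = warning_level
        self.drift_level = drift_level
        self.min_samples = min_samples
        self.reset()

    def reset(self):
        """Start a new window: forget all counts and minima."""
        self.n = 0
        self.errors = 0
        self.p_min = math.inf
        self.s_min = math.inf

    def update(self, error):
        """Feed 1 for a misclassified example, 0 otherwise.

        Returns "stable", "warning" or "drift".
        """
        self.n += 1
        self.errors += error
        p = self.errors / self.n
        s = math.sqrt(p * (1.0 - p) / self.n)
        if self.n < self.min_samples:
            return "stable"
        if p + s < self.p_min + self.s_min:
            self.p_min, self.s_min = p, s
        if p + s > self.p_min + self.drift_level * self.s_min:
            self.reset()
            return "drift"
        if p + s > self.p_min + self.warning_level * self.s_min:
            return "warning"
        return "stable"


if __name__ == "__main__":
    # 10% error rate for 500 examples, then 50% error rate.
    stream = [1 if i % 10 == 9 else 0 for i in range(500)]
    stream += [i % 2 for i in range(200)]
    ddm = DriftDetectionMethod()
    states = [ddm.update(e) for e in stream]
    print("first drift at example", states.index("drift"))
```

The reset on drift corresponds to the "new window" label in the figure: the learner is expected to rebuild its model from the examples seen since the warning level was reached.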

ADWIN: Adaptive Data Stream Sliding Window
Let W = 101010110111111
◮ Equal & fixed size subwindows: 1010 1011011 1111
◮ Equal size adjacent subwindows: 1010101 1011 1111
◮ Total window against subwindow: 10101011011 1111
◮ ADWIN: All adjacent subwindows:
  1 01010110111111
  1010 10110111111
  1010101 10111111
  1010101101 11111
  10101011011111 1
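The "all adjacent subwindows" strategy can be illustrated by enumerating every cut point of W and comparing the means of the two parts. This is only a sketch of the enumeration. The helper names are hypothetical, and deciding whether a gap is significant is a separate step not shown here.

```python
def adjacent_splits(window):
    """Return every split of `window` into two non-empty adjacent parts."""
    return [(window[:i], window[i:]) for i in range(1, len(window))]


def mean(bits):
    """Fraction of 1s in a string of bits."""
    return sum(int(b) for b in bits) / len(bits)


if __name__ == "__main__":
    W = "101010110111111"
    for w0, w1 in adjacent_splits(W):
        gap = abs(mean(w0) - mean(w1))
        print(f"{w0:>14} | {w1:<14} gap = {gap:.3f}")
```

A window of length n has n - 1 such splits, which is why checking all of them naively is expensive and why ADWIN restricts the cut points to bucket boundaries.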

Data Stream Sliding Window
101100011110101 0111010
Sliding Window
We can maintain simple statistics over sliding windows, using $O(\frac{1}{\epsilon} \log^2 N)$ space, where
◮ $N$ is the length of the sliding window
◮ $\epsilon$ is the accuracy parameter
M. Datar, A. Gionis, P. Indyk, and R. Motwani. Maintaining stream statistics over sliding windows. 2002

Exponential Histograms
M = 2

Buckets:   1010101  101  11  1  1  1
Content:   4        2    2   1  1  1
Capacity:  7        3    2   1  1  1

Buckets:   1010101  101  11  11  1
Content:   4        2    2   2   1
Capacity:  7        3    2   2   1

Buckets:   1010101  10111  11  1
Content:   4        4      2   1
Capacity:  7        5      2   1
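The three states above show buckets being merged whenever more than M of them hold the same content. The following sketch reproduces that merging step. It assumes buckets are stored oldest first as `(content, capacity)` pairs and that the two oldest buckets of a given content are the ones merged. The function name is hypothetical.

```python
def compress(buckets, M):
    """Merge buckets until no content value occurs more than M times.

    buckets: list of (content, capacity) pairs, oldest first.
    Buckets with equal content are assumed to be contiguous, as they
    are in an exponential histogram.
    """
    buckets = list(buckets)
    merged_something = True
    while merged_something:
        merged_something = False
        positions = {}
        for i, (content, _) in enumerate(buckets):
            positions.setdefault(content, []).append(i)
        for content in sorted(positions):  # smallest buckets first
            idx = positions[content]
            if len(idx) > M:
                i, j = idx[0], idx[1]  # the two oldest of this content
                merged = (buckets[i][0] + buckets[j][0],
                          buckets[i][1] + buckets[j][1])
                buckets[i:j + 1] = [merged]
                merged_something = True
                break  # positions are stale, so rescan
    return buckets


if __name__ == "__main__":
    start = [(4, 7), (2, 3), (2, 2), (1, 1), (1, 1), (1, 1)]
    print(compress(start, M=2))
```

Starting from the first state on the slide, the merge of two content-1 buckets creates a third content-2 bucket, which triggers a second merge and yields the last state shown.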

Exponential Histograms

Buckets:   1010101  101  11  1  1
Content:   4        2    2   1  1
Capacity:  7        3    2   1  1

Error < content of the last bucket $W/M$
$\epsilon = 1/(2M)$ and $M = 1/(2\epsilon)$
$M \cdot \log(W/M)$ buckets to maintain the data stream sliding window
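A quick calculation shows how the accuracy parameter drives the bucket count. The slide does not state the base of the logarithm, so base 2 is assumed here. The function name is hypothetical.

```python
import math


def histogram_size(epsilon, window_length):
    """Return (M, number of buckets) for accuracy epsilon and window length W.

    Uses M = 1 / (2 * epsilon), rounded up, and M * log2(W / M) buckets.
    Assumption: the logarithm is taken in base 2.
    """
    M = math.ceil(1.0 / (2.0 * epsilon))
    return M, M * math.log2(window_length / M)


if __name__ == "__main__":
    print(histogram_size(epsilon=0.25, window_length=64))
```

With `epsilon = 0.25` this gives M = 2, the setting used in the bucket examples. Halving `epsilon` doubles M and so roughly doubles the number of buckets.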

Exponential Histograms

Buckets:   1010101  101  11  1  1
Content:   4        2    2   1  1
Capacity:  7        3    2   1  1

To give answers in $O(1)$ time, it maintains three counters: LAST, TOTAL and VARIANCE.
$M \cdot \log(W/M)$ buckets to maintain the data stream sliding window

Algorithm ADaptive Sliding WINdow

ADWIN: ADAPTIVE WINDOWING ALGORITHM
1 Initialize W as an empty list of buckets
2 Initialize WIDTH, VARIANCE and TOTAL
3 for each t > 0
4   do SETINPUT($x_t$, W)
5      output $\hat{\mu}_W$ as TOTAL/WIDTH and ChangeAlarm

SETINPUT(item e, List W)
1 INSERTELEMENT(e, W)
2 repeat DELETEELEMENT(W)
3   until $|\hat{\mu}_{W_0} - \hat{\mu}_{W_1}| < \epsilon_{cut}$ holds
4     for every split of W into $W = W_0 \cdot W_1$

Algorithm ADaptive Sliding WINdow

INSERTELEMENT(item e, List W)
1 create a new bucket b with content e and capacity 1
2 W ← W ∪ {b} (i.e., add e to the head of W)
3 update WIDTH, VARIANCE and TOTAL
4 COMPRESSBUCKETS(W)

DELETEELEMENT(List W)
1 remove a bucket from tail of List W
2 update WIDTH, VARIANCE and TOTAL
3 ChangeAlarm ← true

Algorithm ADaptive Sliding WINdow

COMPRESSBUCKETS(List W)
1 Traverse the list of buckets in increasing order
2   do If there are more than M buckets of the same capacity
3     do merge buckets
4       COMPRESSBUCKETS(sublist of W not traversed)
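The control flow of SETINPUT can be tried out with a simplified sketch that stores the window as a plain list instead of compressed buckets, so it checks every split and uses memory linear in the window. The slides do not define $\epsilon_{cut}$. The Hoeffding-based threshold below, $\epsilon_{cut} = \sqrt{\frac{1}{2m} \ln \frac{4n}{\delta}}$ with $m$ the harmonic mean of the two subwindow lengths, is taken from the ADWIN paper by Bifet and Gavaldà and assumes inputs in [0, 1]. The class and method names are hypothetical.

```python
import math


def epsilon_cut(n0, n1, delta):
    """Hoeffding-based threshold for subwindows of lengths n0 and n1."""
    n = n0 + n1
    m = 1.0 / (1.0 / n0 + 1.0 / n1)  # harmonic mean of n0 and n1
    return math.sqrt(math.log(4.0 * n / delta) / (2.0 * m))


class SimpleAdwin:
    """List-based sketch of ADWIN without bucket compression."""

    def __init__(self, delta=0.002):
        self.delta = delta
        self.window = []  # oldest item first, newest item last

    def _some_split_differs(self):
        """True if any split W0 . W1 has means at least epsilon_cut apart."""
        n = len(self.window)
        total = sum(self.window)
        head = 0.0
        for n0 in range(1, n):
            head += self.window[n0 - 1]
            n1 = n - n0
            gap = abs(head / n0 - (total - head) / n1)
            if gap >= epsilon_cut(n0, n1, self.delta):
                return True
        return False

    def set_input(self, x):
        """Insert x, then drop the oldest items while some split differs.

        Returns True if the window was shrunk (the ChangeAlarm).
        """
        self.window.append(x)
        change = False
        while len(self.window) > 1 and self._some_split_differs():
            self.window.pop(0)  # delete from the tail (oldest item)
            change = True
        return change

    @property
    def estimate(self):
        """Current mean of the window, TOTAL / WIDTH."""
        return sum(self.window) / len(self.window)


if __name__ == "__main__":
    adwin = SimpleAdwin()
    stream = [0.0] * 300 + [1.0] * 300
    alarms = [t for t, x in enumerate(stream) if adwin.set_input(x)]
    print("first alarm:", alarms[0], "estimate:", round(adwin.estimate, 3))
```

The bucket structure in the pseudocode exists to avoid exactly the costs of this sketch: it limits the cut points to bucket boundaries and keeps the memory logarithmic in the window length.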

Algorithm ADaptive Sliding WINdow

Theorem
At every time step we have:
1. (False positive rate bound). If $\mu_t$ remains constant within $W$, the probability that ADWIN shrinks the window at this step is at most $\delta$.
2. (False negative rate bound). Suppose that for some partition of $W$ in two parts $W_0 W_1$ (where $W_1$ contains the most recent items) we have $|\mu_{W_0} - \mu_{W_1}| > 2\epsilon_{cut}$. Then with probability $1 - \delta$ ADWIN shrinks $W$ to $W_1$, or shorter.

ADWIN tunes itself to the data stream at hand, with no need for the user to hardwire or precompute parameters.

Algorithm ADaptive Sliding WINdow

ADWIN, using a Data Stream Sliding Window Model,
◮ can provide the exact counts of 1's in $O(1)$ time per point.
◮ tries $O(\log W)$ cutpoints
◮ uses $O(\frac{1}{\epsilon} \log W)$ memory words
◮ the processing time per example is $O(\log W)$ (amortized and worst-case).

Sliding Window Model

Buckets:   1010101  101  11  1  1
Content:   4        2    2   1  1
Capacity:  7        3    2   1  1
