SLIDE 1

Introduction ADWIN-DT Decision Tree Experiments Conclusions

Learning Decision Trees Adaptively from Data Streams with Time Drift

Albert Bifet and Ricard Gavaldà

LARCA: Laboratori d’Algorísmica Relacional, Complexitat i Aprenentatge Departament de Llenguatges i Sistemes Informàtics Universitat Politècnica de Catalunya

September 2007

SLIDE 2

Introduction: Data Streams

Data Streams
- Sequence is potentially infinite
- High amount of data: sublinear space
- High speed of arrival: sublinear time per example
- Once an element from a data stream has been processed, it is discarded or archived

Example Puzzle: Finding Missing Numbers
- Let π be a permutation of {1, . . . , n}.
- Let π−1 be π with one element missing.
- π−1[i] arrives in increasing order.
- Task: determine the missing number.

SLIDE 3

Introduction: Data Streams (cont.)

Naive solution: use an n-bit vector to memorize all the numbers (O(n) space).

SLIDE 4

Introduction: Data Streams (cont.)

Data-stream solution: O(log n) space.

SLIDE 5

Introduction: Data Streams (cont.)

Data-stream solution, O(log n) space: after seeing the first i elements, store

n(n + 1)/2 − Σ_{j≤i} π−1[j]

Once the whole stream has arrived, this running value is exactly the missing number.
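The running-sum trick above can be sketched in a few lines (Python here, purely illustrative):

```python
# Streaming solution to the missing-number puzzle.
# The stream is a permutation of {1, ..., n} with one element missing;
# we keep only n and a running sum, i.e. O(log n) bits of state,
# instead of an n-bit presence vector.

def find_missing(stream, n):
    total = n * (n + 1) // 2   # sum of 1..n
    for x in stream:           # single pass; each item is then discarded
        total -= x
    return total               # what remains is the missing number

print(find_missing([1, 2, 4, 5], 5))  # missing number is 3
```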

SLIDE 6

Data Streams

Data Streams
- At any time t in the data stream, we would like the per-item processing time and storage to be simultaneously polylogarithmic, i.e. O(log^k(N, t)).

Approximation algorithms
- Small error rate with high probability
- An algorithm (ε, δ)-approximates F if it outputs an estimate F̃ for which Pr[|F̃ − F| > εF] < δ.

SLIDE 7

Data Streams: Approximation Algorithms

Frequency moments

Frequency moments of a stream A = {a1, . . . , aN}:

F_k = Σ_{i=1}^{v} f_i^k

where f_i is the frequency of i in the sequence, and k ≥ 0.

- F0: number of distinct elements in the sequence
- F1: length of the sequence
- F2: self-join size, the repeat rate, or Gini's index of homogeneity

Sketches can approximate F0, F1, F2 in O(log v + log N) space.

Noga Alon, Yossi Matias, and Mario Szegedy. The space complexity of approximating the frequency moments. 1996

SLIDE 8

Classification

Data set that describes e-mail features for deciding if it is spam.

Contains "Money" | Domain type | Has attach. | Time received | spam
yes              | com         | yes         | night         | yes
yes              | edu         | no          | night         | yes
no               | com         | yes         | night         | yes
no               | edu         | no          | day           | no
no               | com         | no          | day           | no
yes              | cat         | no          | day           | yes

Assume we have to classify the following new instance:

Contains "Money" | Domain type | Has attach. | Time received | spam
yes              | edu         | yes         | day           | ?

SLIDE 9

Classification (cont.)

Assume we have to classify the following new instance:

Contains "Money" | Domain type | Has attach. | Time received | spam
yes              | edu         | yes         | day           | ?

SLIDE 10

Decision Trees

Basic induction strategy:
1. A ← the "best" decision attribute for the next node
2. Assign A as the decision attribute for the node
3. For each value of A, create a new descendant of the node
4. Sort training examples to the leaf nodes
5. If the training examples are perfectly classified, then STOP; else iterate over the new leaf nodes
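A minimal sketch of step 1, choosing the "best" attribute on the spam toy data. Information gain is used here as an illustrative criterion; the attribute names are shorthand for the table's columns, and the slide does not commit to a specific impurity measure:

```python
import math
from collections import Counter

def entropy(labels):
    # Shannon entropy of a list of class labels.
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def info_gain(rows, labels, attr):
    # Entropy reduction obtained by splitting on attribute `attr`.
    n, base = len(rows), entropy(labels)
    split = {}
    for row, y in zip(rows, labels):
        split.setdefault(row[attr], []).append(y)
    return base - sum(len(ys) / n * entropy(ys) for ys in split.values())

attrs = ["money", "domain", "attach", "time"]
rows = [dict(zip(attrs, r)) for r in [
    ("yes", "com", "yes", "night"), ("yes", "edu", "no", "night"),
    ("no", "com", "yes", "night"), ("no", "edu", "no", "day"),
    ("no", "com", "no", "day"), ("yes", "cat", "no", "day")]]
labels = ["yes", "yes", "yes", "no", "no", "yes"]

best = max(attrs, key=lambda a: info_gain(rows, labels, a))
```

On this toy table, "money" and "time" tie for the highest gain, so either is a reasonable root split.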

SLIDE 11

VFDT / CVFDT

Very Fast Decision Tree: VFDT

Pedro Domingos and Geoff Hulten. Mining high-speed data streams. 2000

- With high probability, constructs a model identical to the one a traditional (greedy) batch learner would produce
- With theoretical guarantees on the error rate
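VFDT's guarantee rests on the Hoeffding bound. A sketch of how such a bound can gate split decisions follows; the function names and the exact split rule are illustrative, not VFDT's actual code:

```python
import math

def hoeffding_bound(value_range, delta, n):
    # With probability 1 - delta, the true mean of a random variable with
    # range `value_range` differs from the mean of n observations by at
    # most this epsilon.
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))

def should_split(gain_best, gain_second, value_range, delta, n):
    # VFDT-style decision (sketch): split once the observed advantage of
    # the best attribute over the runner-up exceeds the Hoeffding epsilon,
    # so the choice would match the batch learner's with high probability.
    return gain_best - gain_second > hoeffding_bound(value_range, delta, n)
```

The bound shrinks as n grows, so a leaf that keeps accumulating examples eventually either splits or shows the candidate attributes are genuinely tied.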

SLIDE 12

VFDT / CVFDT

Concept-adapting Very Fast Decision Tree: CVFDT

G. Hulten, L. Spencer, and P. Domingos. Mining time-changing data streams. 2001

- Keeps its model consistent with a sliding window of examples
- Constructs "alternative branches" as preparation for changes
- If an alternative branch becomes more accurate, the tree branches are switched

SLIDE 13

Decision Trees: CVFDT

No theoretical guarantees on the error rate of CVFDT.

CVFDT parameters:
1. W: the example window size.
2. T0: number of examples used to check at each node whether the splitting attribute is still the best.
3. T1: number of examples used to build the alternate tree.
4. T2: number of examples used to test the accuracy of the alternate tree.

SLIDE 14

Decision Trees: ADWIN-DT

ADWIN-DT's improvements:
- replace frequency-statistics counters by estimators
- no window of stored examples is needed, since the required statistics are maintained by the estimators
- change the way alternate subtrees are checked for substitution, using a change detector with theoretical guarantees

Summary:
1. Theoretical guarantees
2. No parameters

SLIDE 15

Time Change Detectors and Predictors: A General Framework

[Diagram: the input x_t feeds an Estimator, which outputs an Estimation.]

SLIDE 16

Time Change Detectors and Predictors: A General Framework (cont.)

[Diagram: x_t feeds the Estimator; a Change Detector watches the Estimation and outputs an Alarm.]

SLIDE 17

Time Change Detectors and Predictors: A General Framework (cont.)

[Diagram: the framework adds a Memory module connected to both the Estimator and the Change Detector.]
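The diagrams can be read as a small object design. A minimal sketch follows; the EWMA estimator and the fixed-threshold detector are illustrative stand-ins, not the components the paper uses:

```python
# General framework sketch: input x_t feeds an Estimator; a Change Detector
# watches the estimation and raises an alarm; a Memory module (omitted here)
# could hold recent data for the estimator. EWMA and the threshold rule are
# placeholder choices for illustration only.

class EWMAEstimator:
    def __init__(self, alpha=0.1):
        self.alpha, self.estimation = alpha, 0.0

    def update(self, x):
        # Exponentially weighted moving average of the input.
        self.estimation += self.alpha * (x - self.estimation)
        return self.estimation

class DriftDetector:
    def __init__(self, threshold=0.3):
        self.threshold, self.baseline = threshold, None

    def detect(self, estimation):
        # Alarm when the estimation drifts far from the frozen baseline,
        # then re-anchor the baseline at the new level.
        if self.baseline is None:
            self.baseline = estimation
        if abs(estimation - self.baseline) > self.threshold:
            self.baseline = estimation
            return True   # raise an alarm
        return False
```

Feeding a stream of 0s followed by 1s through `EWMAEstimator` and `DriftDetector` raises an alarm shortly after the switch.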

SLIDE 18

Window Management Models

W = 101010110111111

- Equal & fixed-size subwindows: 1010 1011011 1111 [Kifer+ 04]
- Equal-size adjacent subwindows: 1010101 1011 1111 [Dasu+ 06]
- Total window against subwindow: 10101011011 1111 [Gama+ 04]
- ADWIN: all adjacent subwindows, e.g. 1 | 01010110111111

(In the talk's animation, the ADWIN split point steps through every adjacent pair of subwindows: 1|01010110111111, 10|1010110111111, ..., 10101011011111|1.)

SLIDE 32

Algorithm ADWIN

Example: W = 101010110111111, W0 = 1

ADWIN: ADAPTIVE WINDOWING ALGORITHM

1 Initialize window W
2 for each t > 0
3   do W ← W ∪ {xt} (i.e., add xt to the head of W)
4      repeat drop elements from the tail of W
5      until |μ̂_W0 − μ̂_W1| < εc holds
6      for every split of W into W = W0 · W1
7      output μ̂_W
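A direct, deliberately inefficient transcription of this pseudocode. The linear scan over all splits and the Hoeffding-style cut threshold with a union bound over splits are a sketch of the idea; the actual ADWIN2 implementation replaces the scan with exponential histograms:

```python
import math

class SimpleADWIN:
    # Keep a window W; after each arrival, drop elements from the tail while
    # some split W = W0 . W1 shows |mean(W0) - mean(W1)| >= eps_cut.
    def __init__(self, delta=0.01):
        self.delta = delta
        self.window = []          # index 0 = tail (oldest), end = head (newest)

    def _eps_cut(self, n0, n1):
        # Hoeffding-style threshold; delta is shared over all splits.
        m = 1.0 / (1.0 / n0 + 1.0 / n1)
        dp = self.delta / len(self.window)
        return math.sqrt((1.0 / (2 * m)) * math.log(4.0 / dp))

    def add(self, x):
        self.window.append(x)     # add x_t to the head of W
        changed = False
        while self._must_shrink():
            self.window.pop(0)    # drop an element from the tail of W
            changed = True
        return changed

    def _must_shrink(self):
        w, total, head = self.window, sum(self.window), 0.0
        for i in range(1, len(w)):            # every split W = W0 . W1
            head += w[i - 1]
            n0, n1 = i, len(w) - i
            mu0, mu1 = head / n0, (total - head) / n1
            if abs(mu0 - mu1) >= self._eps_cut(n0, n1):
                return True
        return False

    def estimate(self):
        return sum(self.window) / len(self.window) if self.window else 0.0
```

On a stream of 0s followed by 1s, the window shrinks shortly after the change and the estimate tracks the new mean.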

SLIDES 33-43 (animation)

The example advances the split point: W0 = 1, 10, 101, ..., 10101011. When W0 = 101010110 and W1 = 111111, |μ̂_W0 − μ̂_W1| ≥ εc holds: CHANGE DETECTED. Elements are then dropped from the tail of W, leaving W = 01010110111111.

SLIDE 44

Algorithm ADWIN [BG07]

ADWIN has rigorous guarantees (theorems):
- on the ratio of false positives
- on the ratio of false negatives
- on the relation between the size of the current window and the change rate

Other methods in the literature ([Gama+ 04], [Widmer+ 96], [Last 02]) do not provide rigorous guarantees.

SLIDE 45

Data Streams: Algorithm ADWIN2 [BG07]

ADWIN2, using a data-stream sliding-window model:
- can provide the exact counts of 1's in O(1) time per point
- tries O(log W) cutpoints
- uses O((1/ε) log W) memory words
- processing time per example is O(log W) (amortized and worst-case)

Essentially the same guarantees as ADWIN (up to a multiplicative O(. . .) factor depending on ε).

SLIDE 46

Decision Trees: ADWIN-DT

ADWIN-DT's improvements:
- replace frequency-statistics counters by ADWIN
- no window of stored examples is needed, since the required statistics are maintained by ADWINs
- change the way alternate subtrees are checked for substitution, using ADWIN as the change detector

Summary:
1. Theoretical guarantees
2. No parameters needed
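The first improvement, replacing counters with adaptive estimators, can be sketched as follows. The EWMA below is a simple stand-in for ADWIN, used only to show how per-leaf class counters become drift-adaptive; the class names are illustrative:

```python
# Sketch: instead of raw frequency counters at a leaf, keep one adaptive
# estimator per class. Recent data then dominates the class distribution,
# and no window of examples needs to be stored. EWMA stands in for ADWIN.

class EWMA:
    def __init__(self, alpha=0.01):
        self.alpha, self.value = alpha, 0.0

    def update(self, hit):
        self.value += self.alpha * ((1.0 if hit else 0.0) - self.value)

class AdaptiveLeaf:
    """Estimated class frequencies at a leaf, maintained by estimators."""
    def __init__(self, classes, alpha=0.01):
        self.freq = {c: EWMA(alpha) for c in classes}

    def update(self, label):
        # Each arriving example nudges its class estimate up, others down.
        for c, est in self.freq.items():
            est.update(c == label)

    def distribution(self):
        total = sum(e.value for e in self.freq.values()) or 1.0
        return {c: e.value / total for c, e in self.freq.items()}
```

If the label stream shifts from one class to another, the leaf's distribution follows the shift without any window parameter.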

SLIDE 47

Experiments

[Figure: error rate (%) vs. number of examples, comparing ADWIN-DT and CVFDT.]

Figure: Learning curve of SEA Concepts using continuous attributes

SLIDE 48

Experiments

[Figure: memory (MB) used by ADWIN-DT Det, CVFDT (w = 1,000 / 10,000 / 100,000), ADWIN-DT Est, and ADWIN-DT Det+Est.]

Figure: Memory used on SEA Concepts experiments

SLIDE 49

Experiments

[Figure: running time (sec) for ADWIN-DT Det, CVFDT (w = 1,000 / 10,000 / 100,000), ADWIN-DT Est, and ADWIN-DT Det+Est.]

Figure: Time on SEA Concepts experiments

SLIDE 50

Experiments

[Figure: on-line error (10%-22%) vs. CVFDT window width (1,000-25,000), comparing CVFDT and ADWIN-DT.]

Figure: On-line error on UCI Adult dataset, ordered by the education attribute.

SLIDE 51

Conclusions

ADWIN-DT's improvements:
- replace frequency-statistics counters by ADWIN
- no window of stored examples is needed, since the required statistics are maintained by ADWINs
- change the way alternate subtrees are checked for substitution, using ADWIN as the change detector

Summary:
1. Theoretical guarantees
2. No parameters needed
3. Higher accuracy
4. Less space needed