Learning Decision Trees Adaptively from Data Streams with Time Drift
Albert Bifet and Ricard Gavaldà
LARCA: Laboratori d'Algorísmica Relacional, Complexitat i Aprenentatge

Introduction ADWIN-DT Decision Tree Experiments Conclusions
Introduction: Data Streams

Data Streams
Sequence is potentially infinite
High amount of data: sublinear space
High speed of arrival: sublinear time per example
Once an element from a data stream has been processed, it is discarded or archived

Example Puzzle: Finding Missing Numbers
Let π be a permutation of {1, . . . , n}, and let π−1 be π with one element missing. The elements π−1[i] arrive in increasing order. Task: determine the missing number.
Naive solution: use an n-bit vector to memorize all the numbers (O(n) space).
Data stream solution: O(log n) space. Maintain n(n + 1)/2 − Σ_{j≤i} π−1[j]; after the last arrival, this value is the missing number.
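The running-sum trick can be sketched in a few lines of Python (a minimal illustration; the function name is ours):

```python
def find_missing(stream, n):
    """Return the missing number from a stream that contains every
    element of {1, ..., n} except one, using O(log n) space:
    only a running total is kept, never the stream itself."""
    total = n * (n + 1) // 2   # sum of the full permutation
    for x in stream:
        total -= x             # subtract each arriving element
    return total               # what remains is the missing number

print(find_missing([1, 2, 3, 5, 6], 6))  # -> 4
```

Note that only the sum is used, so the same trick works even if the elements arrive in arbitrary order.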
Data Streams

At any time t in the data stream, we would like the per-item processing time and storage to be simultaneously O(log^k(N, t)).

Approximation algorithms
Small error rate with high probability
An algorithm (ε, δ)-approximates F if it outputs F̃ for which Pr[|F̃ − F| > εF] < δ.
Data Streams: Approximation Algorithms

Frequency moments of a stream A = {a1, . . . , aN}:

F_k = Σ_{i=1}^{v} f_i^k

where f_i is the frequency of i in the sequence, and k ≥ 0:
F0: number of distinct elements in the sequence
F1: length of the sequence
F2: self-join size, the "repeat rate", or Gini's index of homogeneity
Sketches can approximate F0, F1, F2 in O(log v + log N) space.

Noga Alon, Yossi Matias, and Mario Szegedy. The space complexity of approximating the frequency moments. 1996
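As an illustration of such sketches, here is a toy AMS-style estimator for F2. This is a sketch under simplifying assumptions: real AMS sketches draw the ±1 signs from a 4-wise independent hash family, while here each (counter, item) sign comes from a seeded PRNG and is cached, so this version is not actually sublinear in space; the class name and constants are ours.

```python
import random

class AMSSketch:
    """Toy AMS-style sketch estimating F2 = sum_i f_i^2."""

    def __init__(self, num_counters=500):
        self.counters = [0.0] * num_counters
        self._signs = {}

    def _sign(self, idx, item):
        # Deterministic pseudo-random sign in {-1, +1} per (counter, item).
        key = (idx, item)
        if key not in self._signs:
            rng = random.Random(idx * 1_000_003 + hash(item))
            self._signs[key] = rng.choice((-1, 1))
        return self._signs[key]

    def add(self, item):
        # Each counter accumulates a randomly signed count of the item.
        for i in range(len(self.counters)):
            self.counters[i] += self._sign(i, item)

    def estimate_f2(self):
        # Each squared counter is an unbiased estimate of F2;
        # averaging over counters reduces the variance.
        return sum(c * c for c in self.counters) / len(self.counters)
```

For a stream with frequencies f = (3, 2, 1), the true F2 is 9 + 4 + 1 = 14, and the estimate concentrates around that value as the number of counters grows.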
Classification

Data set that describes e-mail features for deciding if it is spam:

Contains "Money" | Domain type | Has attach. | Time received | spam
yes | com | yes | night | yes
yes | edu | no  | night | yes
no  | com | yes | night | yes
no  | edu | no  | day   | no
no  | com | no  | day   | no
yes | cat | no  | day   | yes

Assume we have to classify the following new instance:

Contains "Money" | Domain type | Has attach. | Time received | spam
yes | edu | yes | day | ?
Decision Trees

Basic induction strategy:
A ← the "best" decision attribute for the next node
Assign A as decision attribute for node
For each value of A, create a new descendant of node
Sort training examples to leaf nodes
If training examples are perfectly classified, then STOP; else iterate over the new leaf nodes
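The "best" attribute is typically chosen by information gain. A minimal sketch applied to the spam table above (the attribute and function names are ours):

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def information_gain(rows, labels, attr):
    """Entropy reduction obtained by splitting on attribute attr."""
    base = entropy(labels)
    split = {}
    for row, y in zip(rows, labels):
        split.setdefault(row[attr], []).append(y)
    n = len(labels)
    return base - sum(len(ys) / n * entropy(ys) for ys in split.values())

# The spam table from the slide.
rows = [
    {"money": "yes", "domain": "com", "attach": "yes", "time": "night"},
    {"money": "yes", "domain": "edu", "attach": "no",  "time": "night"},
    {"money": "no",  "domain": "com", "attach": "yes", "time": "night"},
    {"money": "no",  "domain": "edu", "attach": "no",  "time": "day"},
    {"money": "no",  "domain": "com", "attach": "no",  "time": "day"},
    {"money": "yes", "domain": "cat", "attach": "no",  "time": "day"},
]
labels = ["yes", "yes", "yes", "no", "no", "yes"]

# "money" and "time" tie for the highest gain on this table.
best = max(rows[0], key=lambda a: information_gain(rows, labels, a))
```

On this table both "money" and "time" split the examples into a pure subset of three spam messages and a mixed subset, so they tie for the highest gain.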
VFDT / CVFDT

Very Fast Decision Tree: VFDT
Pedro Domingos and Geoff Hulten. Mining high-speed data streams. 2000
With high probability, constructs a model identical to the one a traditional (greedy) method would learn
With theoretical guarantees on the error rate
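VFDT's guarantee rests on the Hoeffding bound: after n observations of a variable with range R, the observed mean is within ε = sqrt(R² ln(1/δ) / (2n)) of the true mean with probability at least 1 − δ, and a leaf is split once the best attribute's observed gain beats the runner-up's by more than ε. A sketch with illustrative numbers:

```python
from math import log, sqrt

def hoeffding_bound(value_range, delta, n):
    """Epsilon such that the true mean is within epsilon of the
    observed mean of n samples with probability at least 1 - delta."""
    return sqrt(value_range ** 2 * log(1 / delta) / (2 * n))

def should_split(gain_best, gain_second, value_range, delta, n):
    """VFDT split rule: split when the advantage of the best
    attribute over the runner-up exceeds the Hoeffding bound."""
    return gain_best - gain_second > hoeffding_bound(value_range, delta, n)

# With few examples the bound is loose, so the leaf waits for more data.
print(should_split(0.45, 0.25, 1.0, 1e-6, 100))   # -> False
print(should_split(0.45, 0.25, 1.0, 1e-6, 2000))  # -> True
```

The same observed advantage of 0.20 justifies a split only once enough examples have shrunk the bound below it.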
VFDT / CVFDT

Concept-adapting Very Fast Decision Trees: CVFDT
G. Hulten, L. Spencer, and P. Domingos. Mining time-changing data streams. 2001
Keeps its model consistent with a sliding window of examples
Constructs "alternative branches" as preparation for changes
If an alternative branch becomes more accurate, the tree switches branches
Decision Trees: CVFDT

No theoretical guarantees on the error rate of CVFDT. CVFDT parameters:
1. W: the example window size
2. T0: number of examples used to check at each node whether the splitting attribute is still the best
3. T1: number of examples used to build the alternate tree
4. T2: number of examples used to test the accuracy of the alternate tree
Decision Trees: ADWIN-DT

ADWIN-DT improvements:
Replace frequency statistics counters by estimators; no window of stored examples is needed, since the estimators maintain the required statistics
Change the way the substitution of alternate subtrees is checked, using a change detector with theoretical guarantees

Summary:
1. Theoretical guarantees
2. No parameters
Time Change Detectors and Predictors: A General Framework

[Diagram: the input x_t feeds an Estimator, which outputs an Estimation; a Change Detector monitors the estimator and raises an Alarm; a Memory module interacts with both.]
Window Management Models

W = 101010110111111
Equal & fixed-size subwindows: 1010 1011011 1111 [Kifer+ 04]
Equal-size adjacent subwindows: 1010101 1011 1111 [Dasu+ 06]
Total window against subwindow: 10101011011 1111 [Gama+ 04]
ADWIN: all adjacent subwindows are compared, e.g. 1 | 01010110111111, 10 | 1010110111111, . . . , 1010101101111 | 11
Algorithm ADWIN

ADWIN: ADAPTIVE WINDOWING ALGORITHM
1 Initialize window W
2 for each t > 0
3   do W ← W ∪ {x_t} (i.e., add x_t to the head of W)
4     repeat Drop elements from the tail of W
5     until |µ̂_W0 − µ̂_W1| < ε_c holds
6       for every split of W into W = W0 · W1
7 Output µ̂_W

Example: W = 101010110111111. Every split W = W0 · W1 is checked: W0 = 1, W0 = 10, . . . At W0 = 101010110, W1 = 111111 we get |µ̂_W0 − µ̂_W1| ≥ ε_c: CHANGE DETECTED! Elements are dropped from the tail of W, leaving W = 01010110111111.
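The pseudocode can be transcribed directly into Python. This is a naive sketch that stores the window explicitly and rescans all splits on each arrival (roughly O(|W|²) time per item; ADWIN2 below avoids this), using a Hoeffding-style cut threshold ε_c in the spirit of [BG07]; the class and method names are ours:

```python
from math import log, sqrt

class NaiveADWIN:
    """Naive ADWIN: window stored explicitly, every split checked."""

    def __init__(self, delta=0.01):
        self.delta = delta
        self.window = []       # window[0] is the oldest element (tail)

    def _eps_cut(self, n0, n1):
        # Hoeffding-style threshold using the harmonic mean m of the
        # two subwindow sizes, as in the ADWIN paper.
        m = 1.0 / (1.0 / n0 + 1.0 / n1)
        delta_prime = self.delta / len(self.window)
        return sqrt(1.0 / (2 * m) * log(4.0 / delta_prime))

    def add(self, x):
        """Insert x at the head; shrink the tail while some split
        W = W0 . W1 has |mean(W0) - mean(W1)| >= eps_cut.
        Returns True if the window shrank (change detected)."""
        self.window.append(x)
        changed = False
        shrinking = True
        while shrinking and len(self.window) >= 2:
            shrinking = False
            n, total, head = len(self.window), sum(self.window), 0.0
            for i in range(1, n):          # W0 = window[:i], the oldest part
                head += self.window[i - 1]
                mu0, mu1 = head / i, (total - head) / (n - i)
                if abs(mu0 - mu1) >= self._eps_cut(i, n - i):
                    self.window.pop(0)     # drop from the tail
                    changed = shrinking = True
                    break
        return changed

    def estimate(self):
        return sum(self.window) / len(self.window)
```

Feeding it a stream whose mean jumps from 0 to 1 triggers a detection shortly after the jump, and the window then tracks mostly post-change data.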
Algorithm ADWIN [BG07]

ADWIN has rigorous guarantees (theorems):
On the ratio of false positives
On the ratio of false negatives
On the relation between the size of the current window and the change rates
Other methods in the literature ([Gama+ 04], [Widmer+ 96], [Last 02]) do not provide rigorous guarantees.
Data Streams: Algorithm ADWIN2 [BG07]

ADWIN2, using a Data Stream Sliding Window Model:
can provide the exact counts of 1's in O(1) time per point
tries O(log W) cutpoints
uses O((1/ε) log W) memory words
processing time per example is O(log W) (amortized and worst-case)
Essentially the same guarantees as ADWIN (up to a multiplicative O(. . .) factor depending on ε).
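ADWIN2 achieves this by compressing the window into an exponential histogram: the bits are grouped into buckets whose sizes are powers of two, keeping at most M buckets of each size, so a window of length W needs only O(M log W) buckets. A toy sketch of the bucket bookkeeping (insertion and merging only, change detection omitted; the names are ours):

```python
from collections import deque

M = 2  # max buckets per size; larger M gives a finer approximation

def add_bit(buckets, bit):
    """buckets: deque of (size, ones) pairs, newest first.
    Insert one bit, then merge the two oldest buckets of a size
    whenever more than M buckets share it, so the total number of
    buckets stays O(M log W)."""
    buckets.appendleft((1, bit))
    size = 1
    while sum(1 for s, _ in buckets if s == size) > M:
        idxs = [i for i, (s, _) in enumerate(buckets) if s == size]
        i, j = idxs[-2], idxs[-1]          # the two oldest of this size
        merged = (2 * size, buckets[i][1] + buckets[j][1])
        del buckets[j]
        del buckets[i]
        buckets.insert(i, merged)          # becomes the newest 2*size bucket
        size *= 2
    return buckets

# Feed 100 ones: counts are preserved while the bucket list stays small.
buckets = deque()
for _ in range(100):
    add_bit(buckets, 1)
```

Merging only ever combines the two oldest buckets of a size, so the count of 1's is exact per bucket and only the boundary of the oldest bucket is approximated, which is where the O(. . .) factor in the guarantee comes from.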
Decision Trees: ADWIN-DT

ADWIN-DT improvements:
Replace frequency statistics counters by ADWIN instances; no window of stored examples is needed, since the ADWINs maintain the required statistics
Change the way the substitution of alternate subtrees is checked, using ADWIN as change detector

Summary:
1. Theoretical guarantees
2. No parameters needed
Experiments

Figure: Learning curve (error rate, %, vs. number of examples) of SEA Concepts using continuous attributes, comparing ADWIN-DT and CVFDT.
Experiments

Figure: Memory (MB) used on SEA Concepts experiments, for ADWIN-DT Det, CVFDT with w = 1,000 / 10,000 / 100,000, ADWIN-DT Est, and ADWIN-DT Det+Est.
Experiments

Figure: Time (sec) on SEA Concepts experiments, for ADWIN-DT Det, CVFDT with w = 1,000 / 10,000 / 100,000, ADWIN-DT Est, and ADWIN-DT Det+Est.
Experiments

Figure: On-line error vs. CVFDT window width (1,000 to 25,000) on the UCI Adult dataset, ordered by the education attribute, comparing CVFDT and ADWIN-DT.
Conclusions

ADWIN-DT improvements:
Replace frequency statistics counters by ADWIN instances; no window of stored examples is needed, since the ADWINs maintain the required statistics
Change the way the substitution of alternate subtrees is checked, using ADWIN as change detector

Summary:
1. Theoretical guarantees
2. No parameters needed
3. Higher accuracy