

slide-1
SLIDE 1

Adaptive Learning and Mining for Data Streams and Frequent Patterns

Albert Bifet

Laboratory for Relational Algorithmics, Complexity and Learning LARCA Departament de Llenguatges i Sistemes Informàtics Universitat Politècnica de Catalunya

Ph.D. dissertation, 24 April 2009 Advisors: Ricard Gavaldà and José L. Balcázar LARCA

slide-2
SLIDE 2

Future Data Mining


Structured data
Find interesting patterns
Predictions
On-line processing

2 / 59

slide-3
SLIDE 3

Mining Evolving Massive Structured Data

The Disintegration of the Persistence of Memory, 1952-54

Salvador Dalí

The basic problem

Finding interesting structure in data

Mining massive data
Mining time-varying data
Mining in real time
Mining structured data

3 / 59

slide-4
SLIDE 4

Data Streams


The sequence is potentially infinite
High amount of data: sublinear space
High speed of arrival: sublinear time per example
Once an element from a data stream has been processed, it is discarded or archived

Approximation algorithms

Small error rate with high probability. An algorithm (ε,δ)-approximates F if it outputs an estimate F̃ for which Pr[|F̃ − F| > εF] < δ.
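To make the guarantee concrete, here is a small Python sketch (mine, not part of the thesis) of the additive version: by Hoeffding's inequality, m = ⌈ln(2/δ)/(2ε²)⌉ independent samples suffice so that Pr[|F̃ − F| > ε] < δ. The slide's multiplicative form replaces ε by εF.

```python
import math
import random

def hoeffding_sample_size(eps, delta):
    # Smallest m with 2 * exp(-2 * eps^2 * m) <= delta (additive error eps)
    return math.ceil(math.log(2 / delta) / (2 * eps * eps))

def approximate_mean(stream, eps, delta, rng=random):
    # Estimate the mean of a stored stream from a uniform sample of size m
    m = min(hoeffding_sample_size(eps, delta), len(stream))
    return sum(rng.sample(stream, m)) / m
```

For example, ε = 0.05 and δ = 0.01 require only 1060 samples, independently of how long the stream is.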

4 / 59

slide-5
SLIDE 5

Tree Pattern Mining

"Trees are sanctuaries. Whoever knows how to listen to them, can learn the truth." (Hermann Hesse)

Given a dataset of trees, find the complete set of frequent subtrees.

Frequent Tree Pattern (FT):

Include all the trees whose support is no less than min_sup

Closed Frequent Tree Pattern (CT):

Include no tree which has a super-tree with the same support

CT ⊆ FT

5 / 59

slide-6
SLIDE 6

Outline

Mining Evolving Data Streams
1 Framework
2 ADWIN
3 Classifiers
4 MOA
5 ASHT

Tree Mining
6 Closure Operator on Trees
7 Unlabeled Tree Mining Methods
8 Deterministic Association Rules
9 Implicit Rules

Mining Evolving Tree Data Streams
10 Incremental Method
11 Sliding Window Method
12 Adaptive Method
13 Logarithmic Relaxed Support
14 XML Classification

6 / 59


slide-9
SLIDE 9

Outline

1 Introduction
2 Mining Evolving Data Streams
3 Tree Mining
4 Mining Evolving Tree Data Streams
5 Conclusions

7 / 59

slide-10
SLIDE 10

Data Mining Algorithms with Concept Drift

No Concept Drift

input → DM Algorithm → output
Counter1 Counter2 Counter3 Counter4 Counter5

Concept Drift

input → DM Algorithm (Static Model) → output
Change Detector monitoring the output

8 / 59

slide-11
SLIDE 11

Data Mining Algorithms with Concept Drift

No Concept Drift

input → DM Algorithm → output
Counter1 Counter2 Counter3 Counter4 Counter5

Concept Drift

input → DM Algorithm → output
Estimator1 Estimator2 Estimator3 Estimator4 Estimator5

8 / 59

slide-12
SLIDE 12

Time Change Detectors and Predictors

(1) General Framework

Problem

Given an input sequence x1, x2, ..., xt, ..., we want to output at instant t:
a prediction x̂t+1 minimizing the prediction error |x̂t+1 − xt+1|
an alert if change is detected, considering distribution changes over time.
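As a minimal, purely illustrative instance of this framework (not ADWIN), the sketch below uses an exponentially weighted moving average as the estimator and raises an alert when the prediction error exceeds a fixed threshold; `alpha` and `threshold` are hypothetical parameters.

```python
class EWMAEstimator:
    """Toy estimator + change detector: predicts the next value with an
    exponentially weighted moving average and raises an alarm when the
    prediction error |x_t - estimate| exceeds a fixed threshold."""

    def __init__(self, alpha=0.1, threshold=0.5):
        self.alpha = alpha
        self.threshold = threshold
        self.estimate = None

    def update(self, x):
        if self.estimate is None:          # first observation
            self.estimate = x
            return self.estimate, False
        error = abs(x - self.estimate)     # prediction error
        alarm = error > self.threshold     # crude change alert
        self.estimate += self.alpha * (x - self.estimate)
        return self.estimate, alarm
```

Feeding it a stable value produces no alerts; a sudden jump past the threshold does.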

9 / 59

slide-13
SLIDE 13

Time Change Detectors and Predictors

(1) General Framework

x_t → Estimator → Estimation

10 / 59

slide-14
SLIDE 14

Time Change Detectors and Predictors

(1) General Framework

x_t → Estimator → Estimation
Change Detector raising an Alarm

10 / 59

slide-15
SLIDE 15

Time Change Detectors and Predictors

(1) General Framework

x_t → Estimator → Estimation
Change Detector raising an Alarm
Memory module feeding the estimator

10 / 59

slide-16
SLIDE 16

Optimal Change Detector and Predictor

(1) General Framework

High accuracy
Fast detection of change
Low false positive and false negative ratios
Low computational cost: minimum space and time needed
Theoretical guarantees
No parameters needed
→ Estimator with Memory and Change Detector

11 / 59

slide-17
SLIDE 17

Algorithm ADaptive Sliding WINdow

(2) ADWIN

Example

W = 101010110111111
W0 = 1

ADWIN: Adaptive Windowing Algorithm
1 Initialize window W
2 for each t > 0
3   do W ← W ∪ {x_t} (i.e., add x_t to the head of W)
4     repeat drop elements from the tail of W
5     until |µ̂_W0 − µ̂_W1| < ε_c holds
6     for every split of W into W = W0 · W1
7   output µ̂_W
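A direct, quadratic-time transcription of this pseudocode in Python. This is a sketch: the real ADWIN uses an exponential histogram to reach O(log W) time and memory, and the cut threshold `_eps_cut` below is one common simplification of the bound in the thesis, not its exact form.

```python
import math

class AdwinSketch:
    """Naive O(|W|^2) sketch of ADWIN over an explicit list window."""

    def __init__(self, delta=0.01):
        self.delta = delta
        self.window = []

    def _eps_cut(self, n0, n1):
        # Hoeffding-style threshold with a harmonic-mean sample size
        n = n0 + n1
        m = 1.0 / (1.0 / n0 + 1.0 / n1)
        return math.sqrt(math.log(4 * n / self.delta) / (2 * m))

    def add(self, x):
        """Append x; shrink the window while some split looks like change.
        Returns True if change was detected on this step."""
        self.window.append(x)
        detected, changed = False, True
        while changed and len(self.window) > 1:
            changed = False
            w, total, head = self.window, sum(self.window), 0.0
            for i in range(1, len(w)):
                head += w[i - 1]
                n0, n1 = i, len(w) - i
                mu0, mu1 = head / n0, (total - head) / n1
                if abs(mu0 - mu1) >= self._eps_cut(n0, n1):
                    self.window = w[i:]   # drop the older part W0
                    detected = changed = True
                    break
        return detected

    def estimate(self):
        return sum(self.window) / len(self.window)
```

On a stream of zeros followed by ones, the detector fires shortly after the jump and the window keeps only the recent regime.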

12 / 59

slide-18
SLIDE 18

Algorithm ADaptive Sliding WINdow

(2) ADWIN

Example

W = 101010110111111   W0 = 1   W1 = 01010110111111

12 / 59

slide-19
SLIDE 19

Algorithm ADaptive Sliding WINdow

(2) ADWIN

Example

W = 101010110111111   W0 = 10   W1 = 1010110111111

12 / 59

slide-20
SLIDE 20

Algorithm ADaptive Sliding WINdow

(2) ADWIN

Example

W = 101010110111111   W0 = 101   W1 = 010110111111

12 / 59

slide-21
SLIDE 21

Algorithm ADaptive Sliding WINdow

(2) ADWIN

Example

W = 101010110111111   W0 = 1010   W1 = 10110111111

12 / 59

slide-22
SLIDE 22

Algorithm ADaptive Sliding WINdow

(2) ADWIN

Example

W = 101010110111111   W0 = 10101   W1 = 0110111111

12 / 59

slide-23
SLIDE 23

Algorithm ADaptive Sliding WINdow

(2) ADWIN

Example

W = 101010110111111   W0 = 101010   W1 = 110111111

12 / 59

slide-24
SLIDE 24

Algorithm ADaptive Sliding WINdow

(2) ADWIN

Example

W = 101010110111111   W0 = 1010101   W1 = 10111111

12 / 59

slide-25
SLIDE 25

Algorithm ADaptive Sliding WINdow

(2) ADWIN

Example

W = 101010110111111   W0 = 10101011   W1 = 0111111

12 / 59

slide-26
SLIDE 26

Algorithm ADaptive Sliding WINdow

(2) ADWIN

Example

W = 101010110111111   |µ̂_W0 − µ̂_W1| ≥ ε_c : CHANGE DETECTED!   W0 = 101010110   W1 = 111111

12 / 59

slide-27
SLIDE 27

Algorithm ADaptive Sliding WINdow

(2) ADWIN

Example

W = 101010110111111   Drop elements from the tail of W   W0 = 101010110   W1 = 111111

12 / 59

slide-28
SLIDE 28

Algorithm ADaptive Sliding WINdow

(2) ADWIN

Example

W = 01010110111111   Drop elements from the tail of W   W0 = 101010110   W1 = 111111

12 / 59

slide-29
SLIDE 29

Algorithm ADaptive Sliding WINdow

(2) ADWIN

Theorem

At every time step we have:

1 (False positive rate bound) If µ_t remains constant within W, the probability that ADWIN shrinks the window at this step is at most δ.

2 (False negative rate bound) Suppose that for some partition of W into two parts W0 · W1 (where W1 contains the most recent items) we have |µ_W0 − µ_W1| > 2ε_c. Then with probability 1 − δ, ADWIN shrinks W to W1 or shorter.

ADWIN tunes itself to the data stream at hand, with no need for the user to hardwire or precompute parameters.

13 / 59

slide-30
SLIDE 30

Algorithm ADaptive Sliding WINdow

(2) ADWIN

ADWIN, using a data-stream sliding window model, can provide the exact counts of 1's in O(1) time per point. It:

tries O(log W) cutpoints
uses O((1/ε) log W) memory words
processes each example in O(log W) time (amortized and worst-case)

Sliding window model (buckets): 1010101 101 11 1 1
Content (count of 1's per bucket): 4 2 2 1 1
Capacity: 7 3 2 1 1

14 / 59

slide-31
SLIDE 31

K-ADWIN = ADWIN + Kalman Filtering

(2) ADWIN

x_t → Kalman filter → Estimation
ADWIN acts as the change detector (Alarm) and as Memory for the filter

R = W²/50 and Q = 200/W (theoretically justifiable), where W is the length of the window maintained by ADWIN.
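In one dimension the slide's noise settings plug directly into the standard Kalman update. A sketch: the window length W would come from the companion ADWIN instance; here it is simply a caller-supplied number.

```python
class ScalarKalman:
    """1-D Kalman filter sketch using the K-ADWIN noise settings
    R = W^2 / 50 (measurement noise) and Q = 200 / W (process noise),
    for a state assumed constant between steps."""

    def __init__(self, x0=0.0, p0=1.0):
        self.x = x0   # state estimate
        self.p = p0   # estimate covariance

    def update(self, z, window_length):
        r = window_length ** 2 / 50.0
        q = 200.0 / window_length
        self.p += q                   # predict: covariance grows by Q
        k = self.p / (self.p + r)     # Kalman gain
        self.x += k * (z - self.x)    # correct with measurement z
        self.p *= (1 - k)
        return self.x
```

With a long window (large W, hence large R), the gain is small and the estimate changes slowly, which is the intended smoothing behavior.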

15 / 59

slide-32
SLIDE 32

Classification

(3) Mining Algorithms

Definition

Given n_C different classes, a classifier algorithm builds a model that predicts, with high accuracy, for every unlabeled instance I the class C to which it belongs.

Classification Mining Algorithms

Naïve Bayes Decision Trees Ensemble Methods

16 / 59

slide-33
SLIDE 33

Hoeffding Tree / CVFDT

(3) Mining Algorithms

Hoeffding Tree : VFDT

Pedro Domingos and Geoff Hulten. Mining high-speed data streams. 2000.
With high probability, it constructs a model nearly identical to the one a traditional (greedy) batch method would learn
With theoretical guarantees on the error rate
(Figure: example decision tree testing Contains "Money" and Time, with YES/NO leaves.)
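The split rule at the heart of the Hoeffding tree fits in a few lines. A sketch: `value_range` is the range R of the gain measure (e.g. log2 of the number of classes for information gain), and the gain values are assumed to have been accumulated elsewhere.

```python
import math

def hoeffding_bound(value_range, delta, n):
    # eps = sqrt(R^2 * ln(1/delta) / (2n)): with probability 1 - delta,
    # the true mean of a range-R variable is within eps of the average of n samples.
    return math.sqrt(value_range ** 2 * math.log(1 / delta) / (2 * n))

def should_split(best_gain, second_gain, value_range, delta, n):
    """Split when the observed gain gap exceeds the Hoeffding bound,
    so the best attribute is, with high probability, truly the best."""
    return best_gain - second_gain > hoeffding_bound(value_range, delta, n)
```

For n = 1000 examples, δ = 10⁻⁷ and R = 1, the bound is about 0.09, so a gain gap of 0.2 justifies splitting while a gap of 0.05 does not.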

17 / 59

slide-34
SLIDE 34

VFDT / CVFDT

(3) Mining Algorithms

Concept-adapting Very Fast Decision Trees: CVFDT

G. Hulten, L. Spencer, and P. Domingos. Mining time-changing data streams. 2001.
It keeps its model consistent with a sliding window of examples
It constructs "alternative branches" in preparation for changes
If an alternative branch becomes more accurate, the tree switches to it

18 / 59

slide-35
SLIDE 35

Decision Trees: CVFDT

(3) Mining Algorithms

(Figure: example decision tree testing Contains "Money" and Time, with YES/NO leaves.)
No theoretical guarantees on the error rate of CVFDT.

CVFDT parameters:

1 W: the example window size.
2 T0: number of examples used to check at each node whether the splitting attribute is still the best.
3 T1: number of examples used to build the alternate tree.
4 T2: number of examples used to test the accuracy of the alternate tree.

19 / 59

slide-36
SLIDE 36

Decision Trees: Hoeffding Adaptive Tree

(3) Mining Algorithms

Hoeffding Adaptive Tree:

replaces the frequency-statistics counters by estimators
needs no window to store examples, since the estimators maintain the required statistics
changes the way alternate subtrees are checked for substitution, using a change detector with theoretical guarantees

Summary:
1 Theoretical guarantees
2 No parameters

20 / 59

slide-37
SLIDE 37

What is MOA?

{M}assive {O}nline {A}nalysis is a framework for online learning from data streams. It is closely related to WEKA. It includes a collection of offline and online algorithms as well as tools for evaluation:

boosting and bagging Hoeffding Trees

with and without Naïve Bayes classifiers at the leaves.

21 / 59

slide-38
SLIDE 38

Ensemble Methods

(4) MOA for Evolving Data Streams

http://www.cs.waikato.ac.nz/~abifet/MOA/

New ensemble methods:

ADWIN bagging: when a change is detected, the worst classifier is removed and a new classifier is added.
Adaptive-Size Hoeffding Tree bagging
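Two small building blocks of ADWIN bagging, sketched in Python (the base classifiers and the ADWIN detector themselves are omitted, and `fresh_error` is a hypothetical placeholder): online bagging weights each example, for each ensemble member, with a Poisson(1) draw; and when the detector flags a change, the member with the highest error estimate is reset.

```python
import math
import random

def poisson1(rng):
    """Knuth's method for a Poisson(lambda = 1) draw: the per-example
    weight that online bagging assigns to each ensemble member."""
    threshold, k, p = math.exp(-1.0), 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1

def replace_worst(member_errors, fresh_error=0.0):
    """On change detection, reset the ensemble member with the highest
    estimated error; returns its index and the updated error list."""
    worst = max(range(len(member_errors)), key=member_errors.__getitem__)
    updated = list(member_errors)
    updated[worst] = fresh_error
    return worst, updated
```

The Poisson(1) weights average to 1, so each member sees roughly the whole stream but with different emphasis, which is what bagging needs online.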

22 / 59

slide-39
SLIDE 39

Adaptive-Size Hoeffding Tree

(5) ASHT

T1 T2 T3 T4

Ensemble of trees of different size

Smaller trees adapt more quickly to changes; larger trees do better during periods with little change; mixing sizes adds diversity.

23 / 59

slide-40
SLIDE 40

Adaptive-Size Hoeffding Tree

(5) ASHT


Figure: Kappa-Error diagrams for ASHT bagging (left) and bagging (right) on dataset RandomRBF with drift, plotting 90 pairs of classifiers.

24 / 59

slide-41
SLIDE 41

Adaptive-Size Hoeffding Tree

(5) ASHT

Figure: Accuracy and size on dataset LED with three concept drifts.

25 / 59

slide-42
SLIDE 42

Main contributions (i)

Mining Evolving Data Streams

1 General Framework for Time Change Detectors and Predictors
2 ADWIN
3 Mining methods: Naive Bayes, Decision Trees, Ensemble Methods
4 MOA for Evolving Data Streams
5 Adaptive-Size Hoeffding Tree

26 / 59

slide-43
SLIDE 43

Outline

1 Introduction
2 Mining Evolving Data Streams
3 Tree Mining
4 Mining Evolving Tree Data Streams
5 Conclusions

27 / 59

slide-44
SLIDE 44

Mining Closed Frequent Trees

Our trees are: labeled and unlabeled, ordered and unordered.
Our subtrees are: induced, top-down.
(Figure: two different ordered trees that are the same unordered tree.)

28 / 59

slide-45
SLIDE 45

A tale of two trees

Consider D = {A,B}, where

A: B:

and let min_sup = 2.

Frequent subtrees

B A

29 / 59

slide-46
SLIDE 46

A tale of two trees

Consider D = {A,B}, where

A: B:

and let min_sup = 2.

Closed subtrees

B A

29 / 59

slide-47
SLIDE 47

Mining Closed Unordered Subtrees

(6) Unlabeled Closed Frequent Tree Method

CLOSED_SUBTREES(t, D, min_sup, T)
 3 for every t′ that can be extended from t in one step
 4   do if Support(t′) ≥ min_sup
 5     then T ← CLOSED_SUBTREES(t′, D, min_sup, T)
10 return T

30 / 59

slide-48
SLIDE 48

Mining Closed Unordered Subtrees

(6) Unlabeled Closed Frequent Tree Method

CLOSED_SUBTREES(t, D, min_sup, T)
 1 if not CANONICAL_REPRESENTATIVE(t)
 2   then return T
 3 for every t′ that can be extended from t in one step
 4   do if Support(t′) ≥ min_sup
 5     then T ← CLOSED_SUBTREES(t′, D, min_sup, T)
10 return T

30 / 59

slide-49
SLIDE 49

Mining Closed Unordered Subtrees

(6) Unlabeled Closed Frequent Tree Method

CLOSED_SUBTREES(t, D, min_sup, T)
 1 if not CANONICAL_REPRESENTATIVE(t)
 2   then return T
 3 for every t′ that can be extended from t in one step
 4   do if Support(t′) ≥ min_sup
 5     then T ← CLOSED_SUBTREES(t′, D, min_sup, T)
 6     if Support(t′) = Support(t)
 7       then t is not closed
 8 if t is closed
 9   then insert t into T
10 return T

30 / 59

slide-50
SLIDE 50

Example

D = {A,B} min_sup = 2.

A = (0,1,2,3,2,1)
B = (0,1,2,3,1,2,2)
Subtrees explored (as depth sequences): (0), (0,1), (0,1,1), (0,1,2), (0,1,2,1), (0,1,2,2), (0,1,2,3), (0,1,2,2,1), (0,1,2,3,1)
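The depth-sequence codes above determine a tree only up to the ordering of siblings. A small Python sketch (mine, not the thesis implementation) of how a canonical representative of an unordered tree can be computed: decode the depth sequence into nested lists, then recursively encode each node with its children's encodings sorted.

```python
def parse_depth_sequence(seq):
    """Turn a depth sequence like (0,1,2,3,2,1) into nested children
    lists; assumes seq starts at depth 0 and each depth increases by
    at most one over the previous entry."""
    root = []
    stack = [root]                 # stack[d] = node currently open at depth d
    for depth in seq[1:]:
        node = []
        stack[depth - 1].append(node)
        del stack[depth:]          # close deeper nodes
        stack.append(node)
    return root

def canonical(node):
    """Canonical encoding of an unordered tree: encode the children
    recursively, sort the encodings, and concatenate."""
    return '(' + ''.join(sorted(canonical(c) for c in node)) + ')'
```

Two ordered trees that are the same unordered tree, such as (0,1,2,1) and (0,1,1,2), get the same canonical encoding.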

31 / 59


slide-52
SLIDE 52

Experimental results

(6) Unlabeled Closed Frequent Tree Method

TreeNat

Unlabeled Trees Top-Down Subtrees No Occurrences

CMTreeMiner

Labeled Trees Induced Subtrees Occurrences

32 / 59

slide-53
SLIDE 53

Closure Operator on Trees

(7) Closure Operator

D: the finite input dataset of trees T : the (infinite) set of all trees

Definition

We define the following Galois connection pair, where ⪯ denotes the subtree relation:

For finite A ⊆ D, σ(A) is the set of trees of T that are subtrees of all the trees of A:
σ(A) = {t ∈ T | ∀t′ ∈ A : t ⪯ t′}

For finite B ⊂ T, τD(B) is the set of trees of D that are supertrees of all the trees of B:
τD(B) = {t′ ∈ D | ∀t ∈ B : t ⪯ t′}

Closure Operator

The composition ΓD = σ ◦τD is a closure operator.
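The same Galois connection is easy to run on itemsets, where the subpattern relation ⪯ is just ⊆. A Python sketch for intuition only: sets stand in for trees, and set intersection stands in for the set of maximal common subtrees.

```python
def tau(dataset, pattern):
    """tau_D: all transactions of the dataset containing the pattern."""
    return [t for t in dataset if pattern <= t]

def sigma(transactions):
    """sigma: the common subpatterns of the transactions; for itemsets
    these are captured by the intersection."""
    result = set(transactions[0])
    for t in transactions[1:]:
        result &= t
    return result

def gamma(dataset, pattern):
    """Gamma_D = sigma . tau_D, a closure operator on patterns."""
    return sigma(tau(dataset, pattern))
```

The defining property of a closure operator is visible directly: applying gamma twice gives the same result as applying it once.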

33 / 59

slide-54
SLIDE 54

Galois Lattice of closed set of trees

(7) Closure Operator

1 2 3 12 13 23 123

34 / 59

slide-55
SLIDE 55

Galois Lattice of closed set of trees

D B = { } 1 2 3 12 13 23 123

35 / 59

slide-56
SLIDE 56

Galois Lattice of closed set of trees

B = { } τD(B) = { , } 1 2 3 12 13 23 123

35 / 59

slide-57
SLIDE 57

Galois Lattice of closed set of trees

B = { } τD(B) = { , } ΓD(B) = σ ◦τD(B) = {

and its subtrees }

1 2 3 12 13 23 123

35 / 59

slide-58
SLIDE 58

Mining Implications from Lattices of Closed Trees

(8) Association Rules

Problem

Given a dataset D of rooted, unlabeled and unordered trees, find a "basis": a set of rules sufficient to infer all the rules that hold in the dataset D.


36 / 59

slide-59
SLIDE 59

Mining Implications from Lattices of Closed Trees

Set of rules: A → ΓD(A).
Antecedents are obtained through a computation akin to a hypergraph transversal.
Consequents follow from an application of the closure operator.
1 2 3 12 13 23 123

37 / 59

slide-60
SLIDE 60

Mining Implications from Lattices of Closed Trees

Set of Rules: A → ΓD(A).


1 2 3 12 13 23 123

37 / 59

slide-61
SLIDE 61

Association Rule Computation Example

(8) Association Rules

1 2 3 12 13 23 123 23

38 / 59


slide-65
SLIDE 65

Model transformation

(8) Association Rules

Intuition

One propositional variable v_t is assigned to each possible subtree t. A set of trees A corresponds in a natural way to a model m_A. Let m_A be a model: we impose on m_A the constraint that if m_A(v_t) = 1 for a variable v_t, then m_A(v_t′) = 1 for all variables v_t′ representing a subtree t′ of the tree represented by v_t:

R0 = {v_t′ → v_t | t ⪯ t′, with t ∈ U, t′ ∈ U}

39 / 59

slide-66
SLIDE 66

Implicit Rules Definition

(9) Implicit Rules

D Implicit Rule


Given three trees t1, t2, t3, we say that t1 ∧ t2 → t3 is an implicit Horn rule (for short, an implicit rule) if for every tree t it holds that t1 ⪯ t ∧ t2 ⪯ t ↔ t3 ⪯ t. Trees t1 and t2 have implicit rules if t1 ∧ t2 → t is an implicit rule for some t.

40 / 59

slide-67
SLIDE 67

Implicit Rules Definition

(9) Implicit Rules

D NOT Implicit Rule


Given three trees t1, t2, t3, we say that t1 ∧ t2 → t3 is an implicit Horn rule (for short, an implicit rule) if for every tree t it holds that t1 ⪯ t ∧ t2 ⪯ t ↔ t3 ⪯ t. Trees t1 and t2 have implicit rules if t1 ∧ t2 → t is an implicit rule for some t.

40 / 59

slide-68
SLIDE 68

Implicit Rules Definition

(9) Implicit Rules

This supertree of the antecedents is NOT a supertree of the consequents. NOT Implicit Rule


40 / 59

slide-69
SLIDE 69

Implicit Rules Characterization

(9) Implicit Rules

Theorem

All trees a, b such that a ⪯ b have implicit rules.

Theorem

Suppose that b has only one component. Then a and b have implicit rules if and only if a has a maximum component that is a subtree of the component of b.
(Figure: the rule over components a1, ..., an and b1, with ai ⪯ an for all i < n.)

41 / 59

slide-70
SLIDE 70

Main contributions (ii)

Tree Mining

6 Closure Operator on Trees
7 Unlabeled Closed Frequent Tree Mining
8 A way of extracting high-confidence association rules from datasets consisting of unlabeled trees:
antecedents are obtained through a computation akin to a hypergraph transversal; consequents follow from an application of the closure operator
9 Detection of some cases of implicit rules: rules that always hold, independently of the dataset

42 / 59

slide-71
SLIDE 71

Outline

1 Introduction
2 Mining Evolving Data Streams
3 Tree Mining
4 Mining Evolving Tree Data Streams
5 Conclusions

43 / 59

slide-72
SLIDE 72

Mining Evolving Tree Data Streams

(10,11,12) Incremental, Sliding Window, and Adaptive Tree Mining Methods

Problem

Given a data stream D of rooted, unlabeled and unordered trees, find the frequent closed trees.

We provide three algorithms of increasing power:

Incremental
Sliding Window
Adaptive

44 / 59

slide-73
SLIDE 73

Relaxed Support

(13) Logarithmic Relaxed Support

Guojie Song, Dongqing Yang, Bin Cui, Baihua Zheng, Yunfeng Liu, and Kunqing Xie. CLAIM: An efficient method for relaxed frequent closed itemsets mining over stream data.
Linear Relaxed Interval: the support space of all subpatterns can be divided into n = ⌈1/εr⌉ intervals, where εr is a user-specified relaxation factor; each interval is Ii = [li, ui), with li = (n − i) · εr ≥ 0, ui = (n − i + 1) · εr ≤ 1, and i ≤ n.
Linear Relaxed closed subpattern t: there exists no proper superpattern t′ of t such that their supports belong to the same interval Ii.

45 / 59

slide-74
SLIDE 74

Relaxed Support

(13) Logarithmic Relaxed Support

As the number of closed frequent patterns is not linear with respect to the support, we introduce a new relaxed support.
Logarithmic Relaxed Interval: the support space of all subpatterns can be divided into n = ⌈1/εr⌉ intervals, each denoted Ii = [li, ui), with li = ⌈c^i⌉, ui = ⌈c^(i+1) − 1⌉, and i ≤ n.
Logarithmic Relaxed closed subpattern t: there exists no proper superpattern t′ of t such that their supports belong to the same interval Ii.
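A sketch of the two interval schemes in Python. Interval indices are counted from the bottom here, which reverses the slide's (n − i) indexing but induces the same partition; the base `c` of the logarithmic scheme is a parameter.

```python
def linear_interval(support, eps_r):
    """Index of the linear interval [i*eps_r, (i+1)*eps_r) containing
    a relative support in [0, 1)."""
    return int(support / eps_r)

def logarithmic_interval(support_count, c=2):
    """Index i of the logarithmic interval [c^i, c^(i+1)) containing an
    absolute support count >= 1: intervals widen as support grows, so
    low supports keep finer granularity. Integer arithmetic avoids
    floating-point log rounding at interval boundaries."""
    i = 0
    while c ** (i + 1) <= support_count:
        i += 1
    return i
```

For example, with c = 2 the supports 4, 5, 6, 7 all land in the same interval, while 1 sits alone in the first one.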

45 / 59

slide-75
SLIDE 75

Algorithms

(10,11,12) Incremental, Sliding Window, and Adaptive Tree Mining Methods

Algorithms

Incremental: INCTREENAT
Sliding Window: WINTREENAT
Adaptive: ADATREENAT, which uses ADWIN to monitor change

ADWIN

An adaptive sliding window whose size is recomputed online according to the rate of change observed.

ADWIN has rigorous guarantees (theorems):
on the ratio of false positives and false negatives
on the relation between the size of the current window and the change rate

46 / 59

slide-76
SLIDE 76

Experimental Validation: TN1

(10,11,12) Incremental, Sliding Window, and Adaptive Tree Mining Methods

(Plot: dataset size in millions of trees vs. time in seconds, comparing INCTREENAT and CMTreeMiner.)

Figure: Experiments on ordered trees with TN1 dataset

47 / 59

slide-77
SLIDE 77

Adaptive XML Tree Classification on evolving data streams

(14) XML Tree Classification


Figure: A dataset example

48 / 59

slide-78
SLIDE 78

Adaptive XML Tree Classification on evolving data streams

(14) XML Tree Classification

(Table: closed trees and frequent-but-not-closed trees per tree transaction 1-4, for classes c1 and c2.)

49 / 59

slide-79
SLIDE 79

Adaptive XML Tree Classification on evolving data streams

(14) XML Tree Classification

(Tables: frequent, closed, and maximal trees per class c1-c4, with binary indicator vectors per document and the resulting class labels CLASS1/CLASS2.)

50 / 59

slide-80
SLIDE 80

Adaptive XML Tree Framework on evolving data streams

(14) XML Tree Classification

XML Tree Classification Framework components:

an XML closed frequent tree miner
a data stream classifier algorithm, which we feed with tuples to be classified online

51 / 59

slide-81
SLIDE 81

Adaptive XML Tree Framework on evolving data streams

(14) XML Tree Classification

Dataset    # Trees | Maximal: Att.  Acc.   Mem. | Closed: Att.  Acc.   Mem.
CSLOG12    15483   |          84    79.64  1.2  |         228   78.12  2.54
CSLOG23    15037   |          88    79.81  1.21 |         243   78.77  2.75
CSLOG31    15702   |          86    79.94  1.25 |         243   77.60  2.73
CSLOG123   23111   |          84    80.02  1.7  |         228   78.91  4.18

Table: BAGGING on unordered trees.

52 / 59

slide-82
SLIDE 82

Main contributions (iii)

Mining Evolving Tree Data Streams

10 Incremental Method 11 Sliding Window Method 12 Adaptive Method 13 Logarithmic Relaxed Support 14 XML Classification

53 / 59

slide-83
SLIDE 83

Outline

1 Introduction
2 Mining Evolving Data Streams
3 Tree Mining
4 Mining Evolving Tree Data Streams
5 Conclusions

54 / 59

slide-84
SLIDE 84

Main contributions

Mining Evolving Data Streams
1 Framework
2 ADWIN
3 Classifiers
4 MOA
5 ASHT

Tree Mining
6 Closure Operator on Trees
7 Unlabeled Tree Mining Methods
8 Deterministic Association Rules
9 Implicit Rules

Mining Evolving Tree Data Streams
10 Incremental Method
11 Sliding Window Method
12 Adaptive Method
13 Logarithmic Relaxed Support
14 XML Classification

55 / 59

slide-85
SLIDE 85

Future Lines (i)

Adaptive Kalman Filter

An adaptive Kalman filter that computes Q and R without using the size of the ADWIN window.

Extend MOA framework

Support vector machines Clustering Itemset mining Association rules

56 / 59

slide-86
SLIDE 86

Future Lines (ii)

Adaptive Deterministic Association Rules

Deterministic Association Rules computed on evolving data streams

General Implicit Rules Characterization

Find a characterization of implicit rules with any number of components

Not Deterministic Association Rules

Find basis of association rules for trees with confidence lower than 100%

57 / 59

slide-87
SLIDE 87

Future Lines (iii)

Closed Frequent Graph Mining

Mining methods to obtain closed frequent graphs.

Not incremental Incremental Sliding Window Adaptive

Graph Classification

Classifiers of graphs using maximal and closed frequent subgraphs.

58 / 59

slide-88
SLIDE 88

Relevant publications

Albert Bifet and Ricard Gavaldà. Kalman filters and adaptive windows for learning in data streams. DS'06.
Albert Bifet and Ricard Gavaldà. Learning from time-changing data with adaptive windowing. SDM'07.
Albert Bifet and Ricard Gavaldà. Adaptive parameter-free learning from evolving data streams. Tech. Rep. R09-9.
A. Bifet, G. Holmes, B. Pfahringer, R. Kirkby, and R. Gavaldà. New ensemble methods for evolving data streams. KDD'09.
José L. Balcázar, Albert Bifet, and Antoni Lozano. Mining frequent closed unordered trees through natural representations. ICCS'07.
José L. Balcázar, Albert Bifet, and Antoni Lozano. Subtree testing and closed tree mining through natural representations. DEXA'07.
José L. Balcázar, Albert Bifet, and Antoni Lozano. Mining implications from lattices of closed trees. EGC'2008.
José L. Balcázar, Albert Bifet, and Antoni Lozano. Mining frequent closed rooted trees. MLJ'09.
Albert Bifet and Ricard Gavaldà. Mining adaptively frequent closed unlabeled rooted trees in data streams. KDD'08.
Albert Bifet and Ricard Gavaldà. Adaptive XML tree classification on evolving data streams.

59 / 59