Data Stream Mining Albert Bifet @abifet Dagstuhl, 31 October 2017 - PowerPoint PPT Presentation

Nov 26, 2022

Data Stream Mining Albert Bifet @abifet Dagstuhl, 31 October 2017
Projects
• MOA (University of Waikato): 10,000 downloads/year, 50 contributors
• Apache SAMOA (Yahoo Labs)
Standard Approach
• A classifier algorithm builds a model from a data set
• Finite training sets
• Static models
Data Stream Approach
• Each arriving data item (D) updates the model (M)
• Infinite training sets
• Dynamic models
Data Stream Mining
• Maintain models online
• Incorporate data on the fly
• Unbounded training sets
• Resource efficient
• Detect changes and adapt
• Dynamic models
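The properties listed above can be illustrated with a minimal sketch. It is not MOA code: the class `OnlineLogisticRegression`, the generator `synthetic_stream`, and the helper `run_stream` are hypothetical names chosen for this example, and the learner (logistic regression trained by one gradient step per example) is an assumption, not a method named in the slides. The point is only that the model is updated one example at a time and keeps a fixed amount of state, however long the stream runs.

```python
import math
import random


class OnlineLogisticRegression:
    """Hypothetical illustrative learner: one gradient step per example, O(d) memory."""

    def __init__(self, n_features, learning_rate=0.1):
        self.weights = [0.0] * n_features
        self.bias = 0.0
        self.learning_rate = learning_rate

    def predict_proba(self, x):
        z = self.bias + sum(w * xi for w, xi in zip(self.weights, x))
        z = max(-30.0, min(30.0, z))  # clamp to keep exp() well-behaved
        return 1.0 / (1.0 + math.exp(-z))

    def predict(self, x):
        return 1 if self.predict_proba(x) >= 0.5 else 0

    def learn_one(self, x, y):
        # Incorporate a single labelled example, then discard it.
        error = self.predict_proba(x) - y
        self.weights = [
            w - self.learning_rate * error * xi
            for w, xi in zip(self.weights, x)
        ]
        self.bias -= self.learning_rate * error


def synthetic_stream(n, seed=0):
    """Yield n labelled examples one by one (assumed toy data, not a real stream)."""
    rng = random.Random(seed)
    for _ in range(n):
        x = [rng.uniform(-1, 1), rng.uniform(-1, 1)]
        y = 1 if x[0] + x[1] > 0 else 0
        yield x, y


def run_stream(model, stream):
    """Predict on each example before learning from it; return running accuracy."""
    correct = 0
    seen = 0
    for x, y in stream:
        correct += int(model.predict(x) == y)  # test first
        model.learn_one(x, y)                  # then train
        seen += 1
    return correct / seen if seen else 0.0


if __name__ == "__main__":
    accuracy = run_stream(OnlineLogisticRegression(2), synthetic_stream(2000))
    print(f"accuracy over the stream: {accuracy:.3f}")
```

Because each example is used for prediction before it is used for training, the accuracy is measured on data the model has not yet seen, with no separate held-out set. This sketch does not detect or react to change; that would need an additional mechanism.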
MOA
• Massive Online Analysis (MOA) is a framework for online learning from data streams.
• It is closely related to WEKA.
• It includes a collection of offline and online methods as well as tools for evaluation:
  • classification, regression
  • clustering, frequent pattern mining
• Easy to extend; easy to design and run experiments
HOEFFDING TREE
P. Domingos and G. Hulten, “Mining High-Speed Data Streams,” KDD ’00
• Sample of stream enough for near optimal decision
• Estimate merit of alternatives from prefix of stream
• Choose sample size based on statistical principles
• When to expand a leaf?
• Let $x_1$ be the most informative attribute, $x_2$ the second most informative one
• Hoeffding bound: split if $G(x_1) - G(x_2) > \epsilon = \sqrt{\frac{R^2 \ln(1/\delta)}{2n}}$
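As a rough illustration of the split rule, the sketch below computes the Hoeffding bound and applies the test to two merit values. The function names `hoeffding_bound` and `should_split` are hypothetical, and the numbers in the usage section are made up for the example. Here `value_range` stands for R (the range of the merit function G), `delta` for δ, and `n` for the number of examples seen at the leaf.

```python
import math


def hoeffding_bound(value_range, delta, n):
    """Return epsilon = sqrt(R^2 * ln(1/delta) / (2n))."""
    if n <= 0:
        raise ValueError("n must be positive")
    if not 0.0 < delta < 1.0:
        raise ValueError("delta must be in (0, 1)")
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))


def should_split(best_merit, second_merit, value_range, delta, n):
    """Split when the observed merit gap exceeds the Hoeffding bound."""
    return (best_merit - second_merit) > hoeffding_bound(value_range, delta, n)


if __name__ == "__main__":
    # Assumed illustrative values: R = 1, delta = 1e-7.
    for n in (100, 1000, 10000):
        eps = hoeffding_bound(1.0, 1e-7, n)
        print(n, round(eps, 4), should_split(0.30, 0.25, 1.0, 1e-7, n))
```

The bound shrinks as n grows, so a small gap between the two best attributes that is inconclusive early on can justify a split once the leaf has seen enough examples.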
Adaptive Random Forest
• Why Random Forests?
  • Off-the-shelf learner
  • Good learning performance
• Based on the original Random Forest by Breiman
Related publication: Gomes, H. M.; Bifet, A.; Read, J.; Barddal, J. P.; Enembreck, F.; Pfahringer, B.; Holmes, G.; Abdessalem, T. “Adaptive random forests for evolving data stream classification.” Machine Learning, Springer, 2017.
ADWIN
APACHE SAMOA
http://samoa-project.net
G. De Francisci Morales, A. Bifet: “SAMOA: Scalable Advanced Massive Online Analysis”. JMLR (2014)
Summary
• Data streaming is useful for finding approximate solutions in a reasonable amount of time and with limited resources
• MOA: Massive Online Analysis
  • Available and open-source
  • http://moa.cms.waikato.ac.nz/
• SAMOA: A Platform for Mining Big Data Streams
  • Available and open-source (incubating @ASF)
  • http://samoa.incubator.apache.org
