data stream mining

Data Stream Mining. Albert Bifet (@abifet), Dagstuhl, 31 October 2017



  1. Data Stream Mining Albert Bifet @abifet Dagstuhl, 31 October 2017

  2. Projects • MOA (University of Waikato) (10,000 downloads/year, 50 contributors) • Apache SAMOA (Yahoo Labs)

  3. Standard Approach (diagram: Data Set → Classifier Algorithm builds Model → Model → Analytic) • Finite training sets • Static models

  4. Data Stream Approach (diagram: each incoming data item D updates the model M) • Infinite training sets • Dynamic models

  5. Data Stream Mining • Maintain models online • Incorporate data on the fly • Unbounded training sets • Resource efficient • Detect changes and adapt • Dynamic models
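The properties listed on this slide can be illustrated with a minimal sketch in plain Python. It is not MOA code: the class `OnlineMajorityClass`, its methods `learn_one` and `predict_one`, and the `decay` parameter are hypothetical names chosen for illustration. The learner sees each label once, keeps only one weight per class (bounded memory), and uses an assumed exponential decay so that recent data dominates after a change.

```python
from collections import defaultdict


class OnlineMajorityClass:
    """Hypothetical incremental learner: predicts the class with the largest weight."""

    def __init__(self, decay: float = 1.0):
        # decay = 1.0 keeps plain counts; decay < 1.0 gradually forgets old labels.
        self.decay = decay
        self.weights = defaultdict(float)

    def predict_one(self):
        # No data seen yet: nothing to predict.
        if not self.weights:
            return None
        return max(self.weights, key=self.weights.get)

    def learn_one(self, label):
        # Fade all existing weights, then credit the label just observed.
        for key in self.weights:
            self.weights[key] *= self.decay
        self.weights[label] += 1.0


# A stream whose majority label changes from "a" to "b" (concept change).
stream = ["a"] * 100 + ["b"] * 50

static_like = OnlineMajorityClass(decay=1.0)
adaptive = OnlineMajorityClass(decay=0.9)
for label in stream:
    static_like.learn_one(label)
    adaptive.learn_one(label)

print(static_like.predict_one(), adaptive.predict_one())
```

With plain counts the model still follows the old majority after the change, while the decayed version follows the recent labels. Real stream learners typically replace the fixed decay with an explicit change detector.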

  6. MOA • {M}assive {O}nline {A}nalysis is a framework for online learning from data streams. • It is closely related to WEKA. • It includes a collection of offline and online methods as well as tools for evaluation: • classification, regression • clustering, frequent pattern mining • Easy to extend; easy to design and run experiments


  8. HOEFFDING TREE (P. Domingos and G. Hulten, “Mining High-Speed Data Streams,” KDD ’00) • Sample of stream enough for near optimal decision • Estimate merit of alternatives from prefix of stream • Choose sample size based on statistical principles • When to expand a leaf? • Let $x_1$ be the most informative attribute, $x_2$ the second most informative one • Hoeffding bound: split if $G(x_1) - G(x_2) > \epsilon = \sqrt{\frac{R^2 \ln(1/\delta)}{2n}}$
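The split rule on this slide can be written out as a short Python sketch. The function names `hoeffding_bound` and `should_split` are hypothetical, and the example values are assumptions: R is taken to be 1, which corresponds to information gain on a two-class problem, and delta is set to 1e-7 purely for illustration.

```python
import math


def hoeffding_bound(value_range: float, delta: float, n: int) -> float:
    """epsilon = sqrt(R^2 * ln(1/delta) / (2n)) for n observations at a leaf."""
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))


def should_split(best_merit: float, second_merit: float,
                 value_range: float, delta: float, n: int) -> bool:
    """Split when the merit gap G(x1) - G(x2) exceeds epsilon."""
    return (best_merit - second_merit) > hoeffding_bound(value_range, delta, n)


# Assumed example: merit gap of 0.15 between the two best attributes.
for n in (200, 1000):
    eps = hoeffding_bound(1.0, 1e-7, n)
    print(n, round(eps, 4), should_split(0.30, 0.15, 1.0, 1e-7, n))
```

Because epsilon shrinks as n grows, the same merit gap that is too small to justify a split after 200 examples becomes sufficient after 1000, which is how the leaf decides when it has seen enough of the stream.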

  9. Adaptive Random Forest • Why Random Forests? • Off-the-shelf learner • Good learning performance • Based on the original Random Forest by Breiman • Related publication: Gomes, H. M.; Bifet, A.; Read, J.; Barddal, J. P.; Enembreck, F.; Pfahringer, B.; Holmes, G.; Abdessalem, T. “Adaptive random forests for evolving data stream classification.” Machine Learning, Springer, 2017.

  10. ADWIN

  11. ADWIN


  13. APACHE SAMOA • http://samoa-project.net • G. De Francisci Morales, A. Bifet: “SAMOA: Scalable Advanced Massive Online Analysis”. JMLR (2014)

  14. Summary • Data streaming is useful for finding approximate solutions in a reasonable amount of time with limited resources • MOA: Massive Online Analysis • Available and open-source • http://moa.cms.waikato.ac.nz/ • SAMOA: A Platform for Mining Big Data Streams • Available and open-source (incubating @ASF) • http://samoa.incubator.apache.org
