SAMOA: A Platform for Mining Big Data Streams
Nicolas Kourtellis Associate Researcher Telefonica I+D, Barcelona
SAMOA: Scalable Advanced Machine Online Analysis 30/09/15 1
Machine Learning
                   Batch                 Stream
Distributed        Hadoop / Mahout       S4, Storm / SAMOA
Non Distributed    R, WEKA, …            MOA
Figure: SAMOA connects Machine Learning Algorithms with Distributed Stream Processing Engines (engines shown include Flink)
TopologyBuilder builder; // initialization omitted on the slide
Processor sourceOne = new SourceProcessor();
builder.addProcessor(sourceOne);
Stream streamOne = builder.createStream(sourceOne);

Processor sourceTwo = new SourceProcessor();
builder.addProcessor(sourceTwo);
Stream streamTwo = builder.createStream(sourceTwo);

// The join reads streamOne with shuffle grouping and streamTwo with key grouping
Processor join = new JoinProcessor();
builder.addProcessor(join).connectInputShuffle(streamOne).connectInputKey(streamTwo);
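To make the wiring above concrete, here is a toy, single-process sketch of the same builder pattern. All types in it (Proc, Edge, MiniBuilder, RecordingJoin) are hypothetical stand-ins, not the Apache SAMOA API: they model processors as nodes and streams as edges from one source processor to its consumers, and they ignore parallelism and grouping entirely.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins that only mirror the wiring idea from the slide.
// These are NOT the Apache SAMOA classes.
interface Proc {
    void process(String event); // handle one event
}

// A stream: an edge from one source processor to any number of consumers.
final class Edge {
    final Proc source;
    final List<Proc> consumers = new ArrayList<>();

    Edge(Proc source) { this.source = source; }

    // Deliver one event emitted by the source to every connected consumer.
    void emit(String event) {
        for (Proc c : consumers) c.process(event);
    }
}

final class MiniBuilder {
    final List<Proc> procs = new ArrayList<>();
    final List<Edge> edges = new ArrayList<>();

    Proc addProcessor(Proc p) { procs.add(p); return p; }

    Edge createStream(Proc source) {
        Edge e = new Edge(source);
        edges.add(e);
        return e;
    }

    void connectInput(Proc consumer, Edge stream) { stream.consumers.add(consumer); }
}

// A toy "join" that records every event it receives from any input stream.
final class RecordingJoin implements Proc {
    final List<String> received = new ArrayList<>();
    public void process(String event) { received.add(event); }
}

final class MiniTopologyDemo {
    public static void main(String[] args) {
        MiniBuilder builder = new MiniBuilder();
        Proc sourceOne = builder.addProcessor(e -> { });
        Edge streamOne = builder.createStream(sourceOne);
        Proc sourceTwo = builder.addProcessor(e -> { });
        Edge streamTwo = builder.createStream(sourceTwo);
        RecordingJoin join = new RecordingJoin();
        builder.addProcessor(join);
        builder.connectInput(join, streamOne);
        builder.connectInput(join, streamTwo);
        streamOne.emit("a:1");
        streamTwo.emit("a:2");
        System.out.println(join.received); // prints [a:1, a:2]
    }
}
```

In a real deployment the platform bindings (S4, Storm) run each processor on the cluster; here `emit` simply calls the consumers in the same thread.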
§ SAMOA-API.jar: algorithm code depends only on this
§ SAMOA-S4.jar (S4 bindings) → samoa-s4-deployable.s4r → to S4 cluster
§ SAMOA-Storm.jar (Storm bindings) → samoa-storm-deployable.jar → to Storm cluster
bin/samoa storm target/SAMOA-Storm-0.3.0-SNAPSHOT.jar "PrequentialEvaluation
Figure: Horizontal Parallelism (components: Stream, Model, three Stats processors; labels: Instances, Histograms, Model Updates)
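One way to read the horizontal-parallelism figure: instances are split across several Stats processors, each keeping local histograms, which the Model then combines. The sketch below assumes nominal attributes, round-robin partitioning of instances, and plain count histograms per (attribute, value, class); HorizontalSketch and its methods are hypothetical, and the Model Updates path back to the workers is omitted.

```java
import java.util.ArrayList;
import java.util.List;

// Minimal sketch of horizontal parallelism, assuming nominal attributes with
// values 0..V-1 and classes 0..C-1. Instances are partitioned across k
// hypothetical stats workers; each keeps local counts (a histogram) per
// (attribute, value, class), and the model merges them by addition.
final class HorizontalSketch {
    static int[][][] newCounts(int attrs, int values, int classes) {
        return new int[attrs][values][classes];
    }

    // Local update performed by one stats worker for one instance.
    static void update(int[][][] counts, int[] x, int label) {
        for (int a = 0; a < x.length; a++) counts[a][x[a]][label]++;
    }

    // Model side: merge the local histograms received from the workers.
    static int[][][] merge(List<int[][][]> locals, int attrs, int values, int classes) {
        int[][][] total = newCounts(attrs, values, classes);
        for (int[][][] local : locals)
            for (int a = 0; a < attrs; a++)
                for (int v = 0; v < values; v++)
                    for (int c = 0; c < classes; c++)
                        total[a][v][c] += local[a][v][c];
        return total;
    }

    public static void main(String[] args) {
        int k = 3, attrs = 2, values = 2, classes = 2;
        int[][] xs = {{0, 1}, {1, 1}, {0, 0}, {1, 0}, {1, 1}};
        int[] ys = {0, 1, 0, 1, 1};
        List<int[][][]> locals = new ArrayList<>();
        for (int w = 0; w < k; w++) locals.add(newCounts(attrs, values, classes));
        for (int i = 0; i < xs.length; i++)
            update(locals.get(i % k), xs[i], ys[i]); // round-robin partitioning
        int[][][] total = merge(locals, attrs, values, classes);
        // attribute 0 has value 1 with class 1 in three of the instances above
        System.out.println(total[0][1][1]); // prints 3
    }
}
```

Because the counts are additive, the merged histogram equals what a single learner would have built from the whole stream, which is what lets instance-level partitioning scale with throughput.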
Figure: Vertical Parallelism (components: Stream, Model, three Stats processors; labels: Attributes, Splits)
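Under vertical parallelism the work is split by attribute rather than by instance: each Stats processor sees every instance but keeps statistics only for its own attributes and proposes a candidate split. This sketch assumes binary attributes, binary classes, information gain as the split criterion, and assignment of attribute a to worker a % k; VerticalSketch and Proposal are hypothetical names.

```java
// Minimal sketch of vertical parallelism, assuming binary attributes, binary
// classes and information gain as the split criterion. Attribute a is owned by
// hypothetical worker a % k: every worker sees every instance but only counts
// its own attributes, then proposes its best local split to the model.
final class VerticalSketch {
    // A worker's proposal: the attribute it would split on and its gain.
    static final class Proposal {
        final int attribute;
        final double gain;
        Proposal(int attribute, double gain) {
            this.attribute = attribute;
            this.gain = gain;
        }
    }

    // Entropy (in bits) of a two-class distribution with counts n0 and n1.
    static double entropy(int n0, int n1) {
        int n = n0 + n1;
        double h = 0.0;
        for (int c : new int[] {n0, n1}) {
            if (c > 0) {
                double p = (double) c / n;
                h -= p * Math.log(p) / Math.log(2);
            }
        }
        return h;
    }

    // Information gain of splitting on one attribute; counts[value][class].
    static double infoGain(int[][] counts) {
        int n0 = counts[0][0] + counts[1][0];
        int n1 = counts[0][1] + counts[1][1];
        int n = n0 + n1;
        double gain = entropy(n0, n1);
        for (int v = 0; v < 2; v++) {
            int nv = counts[v][0] + counts[v][1];
            if (nv > 0) gain -= (double) nv / n * entropy(counts[v][0], counts[v][1]);
        }
        return gain;
    }

    // Worker w inspects only attributes w, w + k, w + 2k, ... and reports its best one.
    static Proposal localBest(int w, int k, int[][][] counts) {
        Proposal best = new Proposal(-1, -1.0);
        for (int a = w; a < counts.length; a += k) {
            double g = infoGain(counts[a]);
            if (g > best.gain) best = new Proposal(a, g);
        }
        return best;
    }

    public static void main(String[] args) {
        int k = 2, attrs = 3;
        int[][] xs = {{0, 1, 0}, {1, 0, 1}, {0, 1, 1}, {1, 1, 0}};
        int[] ys = {0, 1, 1, 0}; // the class equals attribute 2 in this toy data
        // One array is shared for brevity; each worker touches only its own rows.
        int[][][] counts = new int[attrs][2][2];
        for (int w = 0; w < k; w++)               // each worker...
            for (int i = 0; i < xs.length; i++)    // ...sees every instance...
                for (int a = w; a < attrs; a += k) // ...but counts only its own attributes
                    counts[a][xs[i][a]][ys[i]]++;
        // Model side: keep the proposal with the highest reported gain.
        Proposal best = new Proposal(-1, -1.0);
        for (int w = 0; w < k; w++) {
            Proposal p = localBest(w, k, counts);
            if (p.gain > best.gain) best = p;
        }
        System.out.println(best.attribute); // prints 2
    }
}
```

Only the small proposals travel back to the model, so this layout pays off when instances have many attributes, for example the larger bag-of-words settings in the experiments that follow.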
Figure: processing topology (processors: Source (n), Model (n), Stats (n), Evaluator (1); streams: Instance, Control, Split, Result; groupings: Shuffle Grouping, Key Grouping, All Grouping)
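The figure names three ways a stream can be routed to the n parallel instances of a processor. Below is a sketch of those routing rules under simple assumptions: shuffle grouping is shown as round-robin (an engine may pick instances at random instead), key grouping hashes the key modulo n, and all grouping broadcasts to every instance. Groupings is a hypothetical class, not part of SAMOA.

```java
import java.util.ArrayList;
import java.util.List;

// Minimal sketch of shuffle, key and all grouping, assuming n parallel
// instances of the destination processor numbered 0..n-1.
final class Groupings {
    private int next = 0; // round-robin cursor for shuffle grouping
    private final int n;

    Groupings(int n) { this.n = n; }

    // Shuffle grouping: spread events evenly, regardless of content.
    int shuffle() {
        int target = next;
        next = (next + 1) % n;
        return target;
    }

    // Key grouping: equal keys always reach the same instance.
    int key(String key) {
        return Math.floorMod(key.hashCode(), n); // floorMod keeps the index non-negative
    }

    // All grouping: every instance receives the event.
    List<Integer> all() {
        List<Integer> targets = new ArrayList<>();
        for (int i = 0; i < n; i++) targets.add(i);
        return targets;
    }

    public static void main(String[] args) {
        Groupings g = new Groupings(4);
        System.out.println(g.shuffle() + " " + g.shuffle()); // prints 0 1
        System.out.println(g.key("attr-7") == g.key("attr-7")); // prints true
        System.out.println(g.all()); // prints [0, 1, 2, 3]
    }
}
```

Key grouping is what lets a partitioned processor keep consistent per-key state, since every event for a given key lands on the same instance; all grouping suits control messages that every instance must see.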
Figure: Classification Accuracy vs. Parallelism Level vs. Number of Attributes (y-axis: correct classification %; x-axis: parallelism level, local, 4, 8, 16; series: 100, 1000, and 10000 words)
Figure: Speedup vs. Parallelism Level vs. Number of Attributes (y-axis: speedup; x-axis: parallelism level, 4, 8, 16; series: 100, 1000, and 10000 words)
Albert Bifet, Gianmarco De Francisci Morales, Nicolas Kourtellis, Matthieu Morel, Arinto Murdopo, Olivier Van Laere
§ Released version 0.3.0 in July
§ Local FS
§ HDFS
§ Kafka [pending]
§ Vertical Hoeffding Tree (classification)
§ CluStream (clustering)
§ Adaptive Model Rules (regression)
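Hoeffding-tree learners such as the Vertical Hoeffding Tree decide when a leaf has seen enough instances to split by using the Hoeffding bound, ε = sqrt(R² ln(1/δ) / (2n)), where R is the range of the split criterion, δ the allowed error probability, and n the number of instances seen at the leaf. The sketch below shows that test as used in VFDT-style trees, with the usual tie-breaking threshold τ; the class name and parameter values are illustrative assumptions, and computing the gains themselves is left out.

```java
// Minimal sketch of the Hoeffding-tree split test (VFDT-style), assuming the
// gains of the best and second-best attributes are computed elsewhere.
// For information gain, the range R is log2(number of classes).
final class HoeffdingSplit {
    // epsilon = sqrt(R^2 * ln(1/delta) / (2n))
    static double bound(double range, double delta, long n) {
        return Math.sqrt(range * range * Math.log(1.0 / delta) / (2.0 * n));
    }

    // Split when the best attribute beats the runner-up by more than epsilon,
    // or when epsilon has shrunk below the tie threshold tau.
    static boolean shouldSplit(double bestGain, double secondGain,
                               double range, double delta, long n, double tau) {
        double eps = bound(range, delta, n);
        return bestGain - secondGain > eps || eps < tau;
    }

    public static void main(String[] args) {
        double eps = bound(1.0, 1e-7, 1000); // two classes, so R = log2(2) = 1
        System.out.printf("%.4f%n", eps); // prints 0.0898
        System.out.println(shouldSplit(0.30, 0.10, 1.0, 1e-7, 1000, 0.05)); // prints true
    }
}
```

Since ε shrinks as 1/sqrt(n), a leaf only needs to wait until enough instances have arrived for the gap between the two best candidates to be statistically clear.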
§ Distributed Naïve Bayes
§ Stochastic Gradient Descent
§ Adaptive + Boosting VHT
§ Parallelized Gradient Boosted Decision Tree
§ PARMA (frequent pattern mining)
§ …
@ApacheSAMOA
http://samoa.incubator.apache.org/
https://github.com/apache/incubator-samoa
Nicolas Kourtellis
@kourtellis
nicolas.kourtellis@telefonica.com