Distributed Streaming Albert Bifet May 2012 COMP423A/COMP523A Data Stream Mining



SLIDE 1

Distributed Streaming

Albert Bifet May 2012

SLIDE 2

COMP423A/COMP523A Data Stream Mining

Outline

  • 1. Introduction
  • 2. Stream Algorithmics
  • 3. Concept drift
  • 4. Evaluation
  • 5. Classification
  • 6. Ensemble Methods
  • 7. Regression
  • 8. Clustering
  • 9. Frequent Pattern Mining
  • 10. Distributed Streaming
SLIDE 3

Data Streams

Big Data & Real Time

SLIDE 4

Distributed Systems: Hadoop, S4 and Storm

SLIDE 5

Hadoop

SLIDE 6

Hadoop

Hadoop architecture

SLIDE 7

Apache Mahout

Mahout: open source framework

SLIDE 8

Pig

Pig: Similar to SQL

SLIDE 9

Pig

A = LOAD 'data' USING PigStorage() AS (f1:int, f2:int, f3:int);
B = GROUP A BY f1;
C = FOREACH B GENERATE COUNT(A);  -- count the tuples in each group
DUMP C;

Pig: Similar to SQL
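
For readers who do not know Pig Latin, the script on this slide loads three integer columns, groups the rows by f1, and counts the rows in each group. A rough plain-Python equivalent is sketched below. The sample rows and the function names `load` and `count_by_f1` are made up for illustration; the only detail taken from Pig itself is that PigStorage splits fields on tabs by default.

```python
from collections import Counter

# Hypothetical sample rows standing in for the 'data' file.
# Fields are tab-separated, matching PigStorage's default delimiter.
SAMPLE = "1\t2\t3\n4\t2\t1\n8\t3\t4\n4\t3\t3\n"


def load(text):
    """Roughly: A = LOAD 'data' ... AS (f1:int, f2:int, f3:int)."""
    return [
        tuple(int(field) for field in line.split("\t"))
        for line in text.splitlines()
        if line
    ]


def count_by_f1(rows):
    """Roughly: GROUP A BY f1, then a COUNT over each group."""
    return Counter(f1 for f1, _f2, _f3 in rows)


rows = load(SAMPLE)
counts = count_by_f1(rows)

# Roughly: DUMP C
for f1, n in sorted(counts.items()):
    print(f1, n)
```

This sketch keeps the group key next to each count. The Pig script emits only the counts; in Pig the key would appear by writing GENERATE group, COUNT(A).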

SLIDE 10

Apache S4

SLIDE 11

Apache S4

SLIDE 12

Storm

Storm from Twitter

SLIDE 13

Storm

Stream, Spout, Bolt, Topology
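
The four terms fit together as follows: a stream is a sequence of tuples, a spout is a source that emits a stream, a bolt consumes streams and emits new ones, and a topology is the graph that wires spouts to bolts. The toy below illustrates the idea in plain Python. It is not Storm's API: every name in it (`sentence_spout`, `split_bolt`, `count_bolt`, `run_topology`) is hypothetical, the stream is finite, and everything runs in a single process, whereas Storm streams are unbounded and a topology runs in parallel across a cluster.

```python
from collections import Counter
from typing import Callable, Iterable, Iterator, List, Tuple


def sentence_spout() -> Iterator[str]:
    """Spout: a source that emits tuples into a stream (finite here)."""
    for sentence in ["the cow jumped", "the moon"]:
        yield sentence


def split_bolt(stream: Iterable[str]) -> Iterator[str]:
    """Bolt: consumes a stream of sentences and emits a stream of words."""
    for sentence in stream:
        yield from sentence.split()


def count_bolt(stream: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Bolt: keeps running state and emits (word, count so far) tuples."""
    counts: Counter = Counter()
    for word in stream:
        counts[word] += 1
        yield word, counts[word]


def run_topology(spout: Callable[[], Iterable], bolts: List[Callable]) -> list:
    """Topology: the wiring of one spout into a linear chain of bolts."""
    stream = spout()
    for bolt in bolts:
        stream = bolt(stream)
    return list(stream)


result = run_topology(sentence_spout, [split_bolt, count_bolt])
for word, count in result:
    print(word, count)
```

Generators are used so that each tuple flows through the whole chain as soon as it is emitted, which mirrors the tuple-at-a-time processing of a streaming system more closely than building intermediate lists would.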