Partition and Compose: Parallel Complex Event Processing, Martin Hirzel (PowerPoint presentation)

SLIDE 1

Partition and Compose: Parallel Complex Event Processing

Martin Hirzel, IBM Research
DEBS, Tuesday, 17 July 2012

SLIDE 2

CEP = Stream Processing?

  • Event (stream) processing: aggregate, enrich, filter, join, parse, …
  • Complex event processing: use a pattern over “simple events” to detect and report “composite events”
  • CEP as an operator in a streaming language?

SLIDE 3

Background: SPL

  • SPL = IBM Streams Processing Language
  • SPL is the language for InfoSphere Streams (IBM product)
  • This paper is based on System S, the research branch of InfoSphere Streams

SLIDE 4

Scenario: Financial analysis


M-shape (double-top) stock pattern (source: http://www.cs.cornell.edu/bigreddata/cayuga/)

  • Series of rising peaks and troughs
  • Deep drop below start of match
SLIDE 5

M-Shape pattern in SPL


[Figure: SPL code for the M-shape pattern, with callouts marking the composite events, simple events, regular expression, aggregation, and key]

  • Operator only, no extensions to SPL syntax

SLIDE 6

Regular expressions


  • Pattern language familiar from string matching
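The analogy to string matching can be made concrete. The C++ sketch below is not the MatchRegex implementation; it assumes each simple event has already been classified into exactly one symbol (a hypothetical one-letter encoding) and then matches the M-shape pattern `. rise+ drop+ rise+ drop* deep` with an ordinary `std::regex`:

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical one-letter encoding of already-classified simple events:
// 'r' = rise, 'd' = drop, 'x' = deep, any other letter = an unremarkable event.
// The pattern ". rise+ drop+ rise+ drop* deep" then becomes a plain character
// regex in which '.' still means "any single event".
bool matchesMShape(const std::string& symbols) {
    static const std::regex mShape(".r+d+r+d*x");
    return std::regex_match(symbols, mShape);  // the whole sequence must match
}
```

For example, `matchesMShape("orrdrrx")` is true, while `matchesMShape("orrx")` is false because the pattern needs at least one drop and a second rise before the deep drop. The encoding is only an approximation: real predicates need not be mutually exclusive (a deep drop may also count as a drop), so one letter per event cannot always capture every predicate an event satisfies.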

SLIDE 7

Aggregations


  • Operator-specific intrinsic functions
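Aggregations let a predicate compare the current event with the partial match so far. The C++ sketch below shows how rise, drop, and deep predicates for the M-shape scenario might use `First(price)` and `Last(price)`. The predicate definitions are assumptions for illustration (rise and drop compare with the previous price, deep compares with the first price of the match), and `First`/`Last` are hypothetical helpers over the prices already in the partial match, not the operator's intrinsics:

```cpp
#include <cassert>
#include <vector>

// Hypothetical aggregation helpers over the prices already in a partial match.
// Aggregating over an empty match is treated as a programming error.
double First(const std::vector<double>& prices) {
    assert(!prices.empty());
    return prices.front();
}

double Last(const std::vector<double>& prices) {
    assert(!prices.empty());
    return prices.back();
}

// Assumed predicates for the M-shape scenario: each one checks the incoming
// price against the partial match so far (before the price is appended).
bool rise(const std::vector<double>& match, double price) { return price > Last(match); }
bool drop(const std::vector<double>& match, double price) { return price < Last(match); }
bool deep(const std::vector<double>& match, double price) { return price < First(match); }
```

With a partial match of prices 10, 12, 11, 13, an incoming price of 9 satisfies both `drop` (below 13) and `deep` (below 10), which is why one event can advance the pattern along more than one path.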

SLIDE 8

Matching semantics

  • Standard regular expression semantics
  • Non-greedy (right-minimal)
  • Partition-isolated
  • (Partition-)Contiguous
  • Non-overlapping

  • Submit longest (left-maximal)
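To pin these rules down, here is a brute-force reference matcher in C++, a sketch for checking outputs rather than the operator's algorithm. It assumes one partition's events are already encoded as one letter each, so partition isolation is handled outside the function. It reports a match at the earliest event where one ends (right-minimal), prefers the earliest start among matches ending there (left-maximal), only considers contiguous runs, and resumes after each reported match (non-overlapping). The function name `referenceMatches` is hypothetical:

```cpp
#include <cassert>
#include <cstddef>
#include <regex>
#include <string>
#include <utility>
#include <vector>

// Returns the (first, last) event indices of each reported match.
std::vector<std::pair<std::size_t, std::size_t>>
referenceMatches(const std::string& events, const std::regex& pattern) {
    std::vector<std::pair<std::size_t, std::size_t>> out;
    std::size_t from = 0;  // events before 'from' belong to an earlier match
    for (std::size_t last = 0; last < events.size(); ++last) {
        // Try the longest contiguous candidate first: earliest start wins.
        for (std::size_t first = from; first <= last; ++first) {
            std::string candidate = events.substr(first, last - first + 1);
            if (std::regex_match(candidate, pattern)) {
                out.emplace_back(first, last);  // report at the earliest end
                from = last + 1;                // flush: no overlapping matches
                break;
            }
        }
    }
    return out;
}
```

With the pattern `.r+d+r+d*x` (r = rise, d = drop, x = deep), the input `ordrxrdrx` yields only the match covering events 0 to 4, even though events 4 to 8 would also match on their own, because event 4 already belongs to the first match.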

SLIDE 9


Implementation overview

  • All C++ operators in SPL are code generators

[Figure: at compile time, the MatchRegex operator invocation (param, output) goes into the MatchRegex operator generator, which produces the automaton; at runtime, the MatchRegex operator instance runs that automaton, consuming simple events from an upstream operator instance and sending composite events to a downstream operator instance]
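Because MatchRegex is a code generator, work such as reading the pattern parameter happens at compile time. As a hedged illustration of that first step only, the C++ sketch below splits a pattern string such as `. rise+ drop+ rise+ drop* deep` into predicate names and quantifiers. The `Element` and `Quantifier` types and `parsePattern` are hypothetical, the sketch handles only a flat sequence of quantified predicates (no grouping or alternation), and the real generator, which also builds the automaton and emits C++, is not shown on the slides:

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

enum class Quantifier { One, Plus, Star };

struct Element {
    std::string predicate;  // "." means "any event"
    Quantifier quantifier;
};

// Splits a whitespace-separated pattern into elements, peeling off a
// trailing '+' or '*' quantifier from each token.
std::vector<Element> parsePattern(const std::string& pattern) {
    std::vector<Element> elements;
    std::istringstream in(pattern);
    std::string token;
    while (in >> token) {
        Quantifier q = Quantifier::One;
        if (token.size() > 1 && (token.back() == '+' || token.back() == '*')) {
            q = token.back() == '+' ? Quantifier::Plus : Quantifier::Star;
            token.pop_back();
        }
        elements.push_back({token, q});
    }
    return elements;
}
```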

SLIDE 10

Automaton

Pattern: . rise+ drop+ rise+ drop* deep

[Figure: NFA for this pattern, with numbered states and transitions labeled ., rise, drop, and deep]

  • NFA (non-deterministic finite automaton)
  • For each simple event:
    – Create new partial match
    – Update and filter partial matches
    – Report completed match and flush
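The three per-event steps can be sketched in C++ as an NFA simulation. This is a minimal sketch under stated assumptions, not the generated code: events are reduced to hypothetical predicate bit sets, the transition function is hand-written for `. rise+ drop+ rise+ drop* deep` with this sketch's own state numbers (which need not match the figure), and a partial match is just its state plus the index of its first event:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <set>
#include <utility>
#include <vector>

// Hypothetical predicate bits: an event may satisfy several predicates at
// once (e.g. a deep drop is also a drop), so the automaton is non-deterministic.
enum Label : std::uint8_t { RISE = 1, DROP = 2, DEEP = 4 };

constexpr int kAccept = 6;

// Transitions for ". rise+ drop+ rise+ drop* deep" (0 = start, 6 = accept).
std::vector<int> step(int state, std::uint8_t labels) {
    std::vector<int> next;
    auto has = [labels](Label l) { return (labels & l) != 0; };
    switch (state) {
        case 0: next.push_back(1); break;                  // '.' matches any event
        case 1: if (has(RISE)) next.push_back(2); break;   // first rise
        case 2: if (has(RISE)) next.push_back(2);
                if (has(DROP)) next.push_back(3); break;
        case 3: if (has(DROP)) next.push_back(3);
                if (has(RISE)) next.push_back(4); break;
        case 4: if (has(RISE)) next.push_back(4);
                if (has(DROP)) next.push_back(5);          // optional drops
                if (has(DEEP)) next.push_back(kAccept); break;
        case 5: if (has(DROP)) next.push_back(5);
                if (has(DEEP)) next.push_back(kAccept); break;
        default: break;
    }
    return next;
}

struct Match { std::size_t first, last; };  // event indices

std::vector<Match> runNfa(const std::vector<std::uint8_t>& events) {
    std::vector<Match> out;
    std::set<std::pair<int, std::size_t>> partial;  // (state, first event index)
    for (std::size_t i = 0; i < events.size(); ++i) {
        partial.insert({0, i});  // create a new partial match at every event
        std::set<std::pair<int, std::size_t>> next;
        for (const auto& [state, first] : partial)  // update; dead matches drop out
            for (int s : step(state, events[i])) next.insert({s, first});
        partial = std::move(next);
        bool found = false;
        std::size_t earliest = 0;
        for (const auto& [state, first] : partial)
            if (state == kAccept && (!found || first < earliest)) { earliest = first; found = true; }
        if (found) {                    // report the longest completed match
            out.push_back({earliest, i});
            partial.clear();            // and flush, so matches never overlap
        }
    }
    return out;
}
```

Keeping partial matches in a set merges duplicates that reach the same state from the same first event, so their number stays bounded by the number of states times the number of open start positions.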

SLIDE 11


Partitioning

[Figure: object diagram, shown next to the NFA. A :SimpleEvent (fields ts, symbol, price, size, seqNum) provides the key into a :PartitionMap, which maps keys to 0..* :PartialMatch objects (fields state, aggr)]
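A C++ sketch of a partition map along the lines of this diagram, assuming the key is `symbol` and using two example aggregates in place of `aggr`. It only creates and updates partial matches; advancing automaton states and filtering dead matches are left out. Type and field names follow the diagram where possible; `onEvent`, `partials`, and the aggregate fields are hypothetical:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

struct SimpleEvent {
    double ts;
    std::string symbol;  // partition key
    double price;
    int size;
    std::uint64_t seqNum;
};

struct PartialMatch {
    int state;                  // automaton state (not advanced in this sketch)
    std::uint64_t firstSeqNum;  // example aggregate: first sequence number
    std::size_t count;          // example aggregate: number of events
};

class PartitionMap {
public:
    // Touches only the partial matches for this event's key, so events for
    // other symbols can neither extend nor interrupt them.
    void onEvent(const SimpleEvent& e) {
        assert(!e.symbol.empty() && "every simple event needs a partition key");
        std::vector<PartialMatch>& partials = byKey_[e.symbol];
        for (PartialMatch& pm : partials) ++pm.count;      // update existing matches
        partials.push_back(PartialMatch{0, e.seqNum, 1});  // start a new one here
    }

    // Partial matches currently held for one key (empty if the key is unseen).
    const std::vector<PartialMatch>& partials(const std::string& key) const {
        static const std::vector<PartialMatch> kNone;
        auto it = byKey_.find(key);
        return it == byKey_.end() ? kNone : it->second;
    }

private:
    std::unordered_map<std::string, std::vector<PartialMatch>> byKey_;
};
```

Interleaving events for IBM and AAPL leaves each symbol's partial matches exactly as if that symbol's events had arrived alone, which is what partition-isolated, partition-contiguous matching requires.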

SLIDE 12

Generated C++ code


  • Incremental aggregation
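Incremental aggregation means each partial match keeps a small summary that is updated in constant time per event instead of re-scanning its events. A minimal C++ sketch of such a summary, assuming aggregations along the lines of First, Last, Count, Max, and Average over `price`; the struct and its field names are hypothetical, not the generated code:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Running aggregates over the prices of one partial match. Each update() is
// O(1), so extending a match never re-scans the events it already contains.
struct PriceAggregates {
    std::size_t count = 0;  // Count
    double first = 0.0;     // First(price)
    double last = 0.0;      // Last(price)
    double highest = 0.0;   // Max(price)
    double sum = 0.0;       // kept so that Average(price) is O(1)

    void update(double price) {
        if (count == 0) {
            first = price;
            highest = price;
        } else {
            highest = std::max(highest, price);
        }
        last = price;
        sum += price;
        ++count;
    }

    double average() const {
        assert(count > 0 && "Average over an empty match is undefined");
        return sum / static_cast<double>(count);
    }
};
```

In an NFA simulation, a partial match that branches into several automaton states needs its own copy of the summary per branch, and a fixed-size struct like this keeps that copy cheap.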

SLIDE 13


Parallelization

[Figure: before parallelization, an upstream operator sends simple events (:SimpleEvent with key symbol, …) to one MatchRegex operator, which keeps a PartitionMap, and composite events go to a downstream operator. After parallelization, the upstream operator hash-splits simple events by key across MatchRegex replicas 0, 1, and 2, each with its own PartitionMap, and their composite events go to the downstream operator. The same key serves both for the hash-split and for each replica's partition map.]

  • Schneider et al. [PACT’12]
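The split in front of the replicas must send all events with the same key to the same replica, so that each replica's PartitionMap sees complete, contiguous per-key histories. A C++ sketch of such a hash-split, assuming the key is the `symbol` string and `std::hash` as the hash function; the function name `replicaFor` is hypothetical:

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>

// Chooses the MatchRegex replica for an event from its partition key only,
// so every event of a given symbol is routed to the same replica.
std::size_t replicaFor(const std::string& key, std::size_t width) {
    assert(width > 0 && "need at least one replica");
    return std::hash<std::string>{}(key) % width;
}
```

Any deterministic hash would do; what matters is that the choice depends on the key alone, not on arrival order or replica load.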

SLIDE 14

Safety and determinism

  • SPL compiler checks …
    – Syntax and names in expressions
    – Expression and function types
  • MatchRegex operator checks …
    – Syntax and names in regular expression pattern
    – Starting predicate aggregation-free
  • Auto-parallelizer checks …
    – Partitioning
    – Absence of stateful expressions
    – Sequence numbers and pulses


  • Enables simple output validation with “diff”: the parallel run must produce exactly the same output as the sequential run
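Deterministic output across replicas can be obtained with sequence numbers. One plausible protocol, assumed here: each simple event carries a sequence number, and for every number it processes a replica emits either a composite event or a pulse (an empty marker meaning "processed, no output"), so the merger always knows whether it may move on. A batch-mode C++ sketch; `Item` and `orderedMerge` are hypothetical, and a live merger would wait for a missing number instead of stopping:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>
#include <vector>

// One replica output per processed sequence number; nullopt marks a pulse.
struct Item {
    std::uint64_t seqNum;
    std::optional<std::string> payload;
};

// Merges per-replica outputs (each sorted by seqNum, numbered from 0 with no
// gaps overall) back into the original sequential order, dropping pulses.
std::vector<std::string> orderedMerge(const std::vector<std::vector<Item>>& replicas) {
    std::vector<std::size_t> head(replicas.size(), 0);
    std::vector<std::string> out;
    std::size_t consumed = 0, total = 0;
    for (const auto& r : replicas) total += r.size();
    for (std::uint64_t expected = 0;; ++expected) {
        bool found = false;
        for (std::size_t r = 0; r < replicas.size() && !found; ++r) {
            if (head[r] < replicas[r].size() && replicas[r][head[r]].seqNum == expected) {
                const Item& item = replicas[r][head[r]++];
                if (item.payload) out.push_back(*item.payload);
                found = true;
                ++consumed;
            }
        }
        if (!found) break;  // batch mode: no replica holds this number
    }
    // A leftover item means some replica skipped a number without a pulse.
    assert(consumed == total && "gap in sequence numbers: missing output or pulse");
    return out;
}
```

Because the merged order depends only on sequence numbers, not on replica timing, the result can be compared byte for byte with the sequential run.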

SLIDE 15

Data sets … and benchmarks

SLIDE 16

Absolute throughput in events per second


  • Large speedup when sequential throughput is low

SLIDE 17

Speedups


[Figure: speedup charts for 1 machine x 8 cores and for 4 machines x 8 cores = 32 cores]

  • Motivates elasticity and an auto-width controller
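For reading these charts, the usual definitions apply, with T_1 the sequential throughput in events per second and T_p the throughput using p cores (notation introduced here, not taken from the slides):

```latex
S(p) = \frac{T_p}{T_1}, \qquad E(p) = \frac{S(p)}{p}, \qquad p_{\max} = 4 \times 8 = 32
```

As a hedged illustration only: if the peak speedup of 14x from the conclusions was measured at p = 32, the parallel efficiency would be E(32) = 14/32 ≈ 0.44.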

SLIDE 18

Related work

Engine / language | Complex events | Parallelism
NiagaraCQ / XML-QL | Algebraic | No
SQL-TS | Back-tracking | No
Amit | Back-tracking | No
NFAb / SASE | Automaton | No
MATCH_RECOGNIZE | ANSI proposal | No
EventScript | Automaton | No
Cayuga / CEL | Automaton | Yes, by hand
EventJava | Index data structures | Yes, per task
[Woods, Teubner VLDB] | Automaton | Yes, on FPGA
This paper | Automaton | Yes, partitioned

(Timeline: 2000 at the top to today at the bottom)

SLIDE 19

Conclusions

  • CEP as an SPL operator
    – Use CEP for pattern matching
    – Use other operators for filtering, enrichment, parsing, joining, etc.
  • Up to 830K events/second
    – Incremental aggregation
    – C++ code generation
    – Parallelism (up to 14x speedup)

SLIDE 20

Backup

SLIDE 21


Shuffle in twitter02 and twitter03

[Figure: a Source sends raw tweets as XML documents to ParseTweet replicas 0, 1, and 2; their output, tweets as simple events, is shuffled to MatchRegex replicas 0, 1, and 2, whose composite events go to a downstream operator]