
Partition and Compose: Parallel Complex Event Processing

Martin Hirzel, IBM Research. Tuesday, 17 July 2012, DEBS.


  1. Partition and Compose: Parallel Complex Event Processing. Martin Hirzel, IBM Research. Tuesday, 17 July 2012, DEBS.

  2. CEP = Stream Processing? Event (stream) processing: Aggregate, Enrich, Filter, Join, Parse, … Complex event processing: use a pattern over “simple events” to detect and report “composite events”. → CEP as an operator in a streaming language?

  3. Background: SPL
     • IBM Streams Processing Language
     • SPL is the language for InfoSphere Streams (IBM product)
     • This paper is based on System S = research branch of InfoSphere Streams

  4. Scenario: Financial analysis. M-shape (double-top) stock pattern: a series of rising peaks and troughs, then a deep drop below the start of the match. Source: http://www.cs.cornell.edu/bigreddata/cayuga/

  5. M-Shape pattern in SPL. [Code listing not preserved; its callouts mark the composite events, simple events, regular expression, key, and aggregation.] → Operator only, no extensions to SPL syntax

  6. Regular expressions → Pattern language familiar from string matching

  7. Aggregations → Operator-specific intrinsic functions

  8. Matching semantics
     • Standard regular expression semantics
     • Non-greedy (right-minimal)
     • Partition-isolated
     • (Partition-)Contiguous
     • Non-overlapping (submit longest: left-maximal)

  9. Implementation overview. At compile-time: MatchRegex operator invocation (param, output) → MatchRegex operator generator → automaton. At runtime: upstream operator instance → simple events → MatchRegex operator instance → composite events → downstream operator instance. → All C++ operators in SPL are code generators

  10. Automaton for the pattern ". rise+ drop+ rise+ drop* deep", with states 0 to 6. Steps for each event: update and filter partial matches; create a new partial match; report a completed match and flush. → NFA (non-deterministic finite automaton)
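The three per-event steps on this slide can be sketched in Python as follows. This is a minimal illustration, not the generated C++ of the MatchRegex operator. The predicate definitions (`rise`, `drop`, `deep`), the state numbering, and the `Partial` record are assumptions made for this example; the transition table follows the standard construction for `. rise+ drop+ rise+ drop* deep`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Partial:
    """One partial match (hypothetical layout): automaton state plus aggregates."""
    state: int
    start: int    # index of the first event in the match
    first: float  # price of the first event
    last: float   # price of the most recent event


# Assumed predicates: they compare the new price with the aggregates.
def rise(pm, price): return price > pm.last
def drop(pm, price): return price < pm.last
def deep(pm, price): return price < pm.first


# Assumed numbering: the '.' transition 0 -> 1 is taken when a partial
# match is created, so state 0 never appears in the table.
TRANSITIONS = {
    1: [(rise, 2)],
    2: [(rise, 2), (drop, 3)],
    3: [(drop, 3), (rise, 4)],
    4: [(rise, 4), (drop, 5), (deep, 6)],
    5: [(drop, 5), (deep, 6)],
}
FINAL = 6


def match_m_shape(prices):
    """Return (start, end) index pairs of non-overlapping matches."""
    matches, partials = [], []
    for i, price in enumerate(prices):
        survivors, completed = [], None
        # Update and filter: oldest partial match first, so the longest
        # (left-maximal) match wins when several could complete.
        for pm in partials:
            for pred, dst in TRANSITIONS[pm.state]:
                if pred(pm, price):
                    if dst == FINAL:
                        completed = (pm.start, i)
                        break
                    survivors.append(Partial(dst, pm.start, pm.first, price))
            if completed is not None:
                break
        if completed is not None:
            # Report the completed match and flush all partial matches.
            matches.append(completed)
            partials = []
            continue
        # Create a new partial match that starts at this event.
        survivors.append(Partial(1, i, price, price))
        partials = survivors
    return matches


print(match_m_shape([10, 12, 11, 13, 12, 9]))  # [(0, 5)]
```

Because a partial match is reported as soon as it reaches the final state, the sketch is non-greedy; a partial match with no enabled transition is simply dropped, which gives contiguity.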

  11. Partitioning. A :PartitionMap maps each key to 0..* :PartialMatch objects (state, aggr). A :SimpleEvent has the attributes ts, symbol, price, size, and seqNum; symbol is the key.
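A partition map of this kind can be sketched with a plain dictionary. The example below is a simplification under stated assumptions: it uses a much smaller hypothetical pattern (a fixed number of consecutive rises per symbol) and keeps one partial match per key, whereas the slide allows 0..* partial matches per key. The function name `match_per_key` is invented for this illustration.

```python
from dataclasses import dataclass


@dataclass
class SimpleEvent:
    ts: int
    symbol: str   # partition key
    price: float


@dataclass
class PartialMatch:
    state: int    # number of consecutive rises seen so far
    last: float   # aggregate: latest price in this partial match


def match_per_key(events, rises_needed=2):
    """Report (symbol, ts) whenever a symbol shows `rises_needed` rises in a row."""
    partition_map = {}   # key -> PartialMatch
    composite = []
    for ev in events:
        pm = partition_map.get(ev.symbol)
        if pm is None or ev.price <= pm.last:
            # No rise: restart the partial match for this key only.
            partition_map[ev.symbol] = PartialMatch(0, ev.price)
            continue
        pm.state += 1
        pm.last = ev.price
        if pm.state == rises_needed:
            composite.append((ev.symbol, ev.ts))
            del partition_map[ev.symbol]   # flush this partition
    return composite


events = [
    SimpleEvent(0, "IBM", 10.0), SimpleEvent(1, "ACME", 5.0),
    SimpleEvent(2, "IBM", 11.0), SimpleEvent(3, "ACME", 4.0),
    SimpleEvent(4, "IBM", 12.0), SimpleEvent(5, "ACME", 6.0),
]
print(match_per_key(events))  # [('IBM', 4)]
```

The interleaved ACME events do not interrupt the IBM match: each event only touches the partial match stored under its own key, which is what partition-isolated and partition-contiguous matching require.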

  12. Generated C++ code. [Code listing not preserved.] → Incremental aggregation
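Incremental aggregation means that each partial match carries running aggregates that are updated in constant time per event, instead of storing the matched events and recomputing. The sketch below is in Python rather than the generated C++, and the `Aggregates` record and its field names are hypothetical; the actual intrinsic functions of the operator are not listed in this transcript.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Aggregates:
    """Running aggregates for one partial match (hypothetical field names)."""
    count: int
    first: float
    last: float
    minimum: float
    maximum: float
    total: float

    @classmethod
    def start(cls, price):
        # A partial match with a single event.
        return cls(1, price, price, price, price, price)

    def extend(self, price):
        # O(1) update: no stored event list is needed.
        return Aggregates(
            self.count + 1,
            self.first,
            price,
            min(self.minimum, price),
            max(self.maximum, price),
            self.total + price,
        )


agg = Aggregates.start(10.0)
for p in (12.0, 11.0):
    agg = agg.extend(p)
print(agg.count, agg.first, agg.last, agg.minimum, agg.maximum, agg.total)
# 3 10.0 11.0 10.0 12.0 33.0
```

Returning a new immutable record on each update makes it cheap to fork a partial match when the automaton is non-deterministic, at the cost of one small allocation per event.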

  13. Parallelization (Schneider et al. [PACT’12]). Before: up-stream operator → simple events → MatchRegex (with its PartitionMap) → composite events → down-stream operator. After “Parallelize”: up-stream operator → simple events → MatchRegex replicas 0, 1, and 2, each with its own PartitionMap → composite events → down-stream operator. The symbol attribute of a :SimpleEvent is the key for the hash-split and the key for the partition map.
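The hash-split can be sketched as a routing function from key to replica. This is an illustration only: the names `replica_for` and `hash_split` are invented here, and `zlib.crc32` stands in for whatever hash function the runtime actually uses. A stable hash is chosen because Python's built-in `hash` for strings varies between processes.

```python
import zlib


def replica_for(key, width):
    """Pick a replica index for a key with a stable (process-independent) hash."""
    return zlib.crc32(key.encode("utf-8")) % width


def hash_split(events, width):
    """Route (key, payload) events to `width` channels, tagging each with a sequence number."""
    channels = [[] for _ in range(width)]
    for seq_num, (key, payload) in enumerate(events):
        channels[replica_for(key, width)].append((seq_num, key, payload))
    return channels


events = [("IBM", 1), ("ACME", 2), ("IBM", 3), ("XYZ", 4), ("ACME", 5)]
for index, channel in enumerate(hash_split(events, 3)):
    print(index, channel)
```

Since the split key equals the partition key, every event for a given symbol reaches the same replica, so each replica can keep a private partition map without any coordination.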

  14. Safety and determinism
      • SPL compiler checks …
        – Syntax and names in expressions
        – Expression and function types
      • MatchRegex operator checks …
        – Syntax and names in regular expression pattern
        – Starting predicate aggregation-free
      • Auto-parallelizer checks …
        – Partitioning
        – Absence of stateful expressions
        – Sequence numbers and pulses
      → Enables simple output validation with “diff”
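The role of sequence numbers can be illustrated with an ordered merge: if every replica emits its results tagged with the sequence number of the triggering event, merging by that number restores the sequential order, so parallel and sequential output can be compared directly. The sketch below assumes finite, already collected per-replica output lists and does not model pulses; `ordered_merge` is a hypothetical helper name.

```python
import heapq


def ordered_merge(replica_outputs):
    """Merge per-replica lists of (seq_num, payload), each sorted by seq_num."""
    merged = heapq.merge(*replica_outputs, key=lambda item: item[0])
    return [payload for _, payload in merged]


sequential = ["a", "b", "c", "d", "e"]
parallel = ordered_merge([
    [(0, "a"), (3, "d")],   # replica 0
    [(1, "b")],             # replica 1
    [(2, "c"), (4, "e")],   # replica 2
])
print(parallel == sequential)  # True
```

In a live stream a replica may stay silent for a long time, which would stall such a merge; the pulses named on the slide presumably address that case, but their mechanics are not described in this transcript.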

  15. Data sets … and benchmarks

  16. Absolute throughput in events per second → Large speedup when low sequential throughput

  17. Speedups. Configurations: 1 machine × 8 cores; 4 machines × 8 cores = 32 cores. → Motivates elasticity and auto-width controller

  18. Related work (ordered from 2000 to today)

      Engine / language        Complex events          Parallelism
      NiagaraCQ / XML-QL       Algebraic               No
      SQL-TS                   Back-tracking           No
      Amit                     Back-tracking           No
      NFA^b / SASE             Automaton               No
      MATCH_RECOGNIZE          ANSI proposal           No
      EventScript              Automaton               No
      Cayuga / CEL             Automaton               Yes, by hand
      EventJava                Index data structures   Yes, per task
      [Woods, Teubner VLDB]    Automaton               Yes, on FPGA
      This paper               Automaton               Yes, partitioned

  19. Conclusions
      • CEP as an SPL operator
        – Use CEP for pattern matching
        – Use other operators for filtering, enrichment, parsing, joining, etc.
      • Up to 830K events/second
        – Incremental aggregation
        – C++ code generation
        – Parallelism (up to 14x speedup)

  20. Backup

  21. Shuffle in twitter02 and twitter03. Source → raw tweets as XML documents → ParseTweet replicas 0, 1, and 2 → tweets as simple events → MatchRegex replicas 0, 1, and 2 → composite events → down-stream operator.
