
Complete Event Trend Detection in High-Rate Event Streams Olga Poppe*, Chuan Lei**, Salah Ahmed*, and Elke A. Rundensteiner* *Worcester Polytechnic Institute, **NEC Labs America SIGMOD May 16, 2017 Funded by NSF grants CRI 1305258, IIS


SLIDE 1

Complete Event Trend Detection in High-Rate Event Streams

Olga Poppe*, Chuan Lei**, Salah Ahmed*, and Elke A. Rundensteiner*

*Worcester Polytechnic Institute, **NEC Labs America

SIGMOD May 16, 2017

Funded by NSF grants CRI 1305258, IIS 1343620

SLIDE 2

Real-time Event Trend Analytics

Event trend = event sequence of any length

Application domains and their example event trends:

  • E-commerce: Items often bought together
  • Financial fraud: Circular check kite
  • Traffic control: Aggressive driving
  • Health care: Irregular heart rate
  • Cluster monitoring: Uneven load distribution
  • Stock market: Head-and-shoulders

SLIDE 3

Check Kiting Fraud

SLIDE 4

Check Kiting Fraud

  • In 2013, a bank fraud scheme netted $5 million from six New York City banks [FBI]
  • In 2014, 12 people were charged in a large-scale “bust out” scheme, costing banks over $15 million [The Press Enterprise]

SLIDE 5

Complete Event Trend Detection

Event Stream

  • Check deposit: C = event type, 1 = time stamp, A = source bank, B = destination bank
  • Cash withdrawal: W = event type, 9 = time stamp, B = source bank

CET Detection Query

PATTERN Check+ C[]
WHERE C.type = not-covered AND C.destination = Next(C).source
WITHIN 12 hours SLIDE 1 minute

CETs: Complete Event Trends
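
The WHERE and WITHIN clauses of the query can be read as a predicate over pairs of check events. The following is a minimal Python sketch under stated assumptions: the `Check` record, the `covered` flag, and the functions `adjacent` and `within_window` are hypothetical names chosen for illustration, time stamps are assumed to be in hours, and the 1-minute slide of the window is not modeled.

```python
from dataclasses import dataclass

WINDOW_HOURS = 12  # WITHIN 12 hours (time unit is an assumption)


@dataclass(frozen=True)
class Check:
    """Hypothetical check deposit event with the attributes from the slide."""
    time: int            # time stamp
    source: str          # source bank
    destination: str     # destination bank
    covered: bool = False


def adjacent(prev: Check, nxt: Check) -> bool:
    """True if nxt may directly follow prev in an event trend."""
    return (
        not prev.covered and not nxt.covered     # C.type = not-covered
        and prev.time < nxt.time                 # trends follow stream order
        and prev.destination == nxt.source       # C.destination = Next(C).source
    )


def within_window(trend: list) -> bool:
    """True if the whole trend fits into one 12-hour window."""
    return trend[-1].time - trend[0].time <= WINDOW_HOURS
```

For example, a check from bank A to bank B at time 1 can be followed by a check from bank B to bank C at time 2, but not the other way round.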

SLIDE 6

Problem Statement & Challenges

Problem Statement

The CET optimization problem is to detect all CETs matched by a Kleene query q in a stream I with minimal CPU processing costs while staying within memory M

Challenges

  • 1. Expressive yet efficient: Exponential number of event trends of arbitrary length
  • 2. Real-time yet lightweight: Common event sub-trend storage versus their re-computation
  • 3. Optimal yet feasible: NP-hard event stream partitioning problem

SLIDE 7

State-of-the-Art Approaches

  • 1. Limited expressive power: Neither Kleene closure nor the skip-till-any-match semantics is supported [1,2,3]
  • 2. Delayed system responsiveness: Common event sub-trends are re-computed [1,2,3,4]

[1] Flink. https://flink.apache.org/
[2] A. Demers, et al. Cayuga: A General Purpose Event Monitoring System. In CIDR’07.
[3] Y. Mei, et al. ZStream: A Cost-based Query Processor for Adaptively Detecting Composite Events. In SIGMOD’09.
[4] H. Zhang, et al. On Complexity and Optimization of Expensive Queries in Complex Event Processing. In SIGMOD’14.

SLIDE 8

Base-Line CET Detection

Cases of the base-line algorithm:

  • 1. Start a new CET

SLIDE 9

Base-Line CET Detection

Cases of the base-line algorithm:

  • 1. Start a new CET
  • 2. Append to an existing CET

SLIDE 10

Base-Line CET Detection

Cases of the base-line algorithm:

  • 1. Start a new CET
  • 2. Append to an existing CET
  • 3. Replicate the prefix of an existing CET and append to it

SLIDE 11

Base-Line CET Detection

Problem: Exponential time & space complexity
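
The three cases of the base-line algorithm can be sketched as follows. This is a minimal Python sketch, not the authors' implementation. It assumes that events are tuples `(time, source, destination)`, that adjacency is the check-kiting condition from the query, and that a CET is a trend whose first event has no predecessor and whose last event has no successor. The names `adjacent` and `baseline_cets` are hypothetical.

```python
def adjacent(prev, nxt):
    """Assumed adjacency: later in time and destination matches source."""
    return prev[0] < nxt[0] and prev[2] == nxt[1]


def baseline_cets(events):
    """Maintain the set of all CETs explicitly while scanning the stream."""
    trends = set()  # each trend is a tuple of events
    for e in events:
        # Every prefix of an existing trend that ends in a predecessor of e
        # can be extended by e.
        candidates = {
            t[: i + 1] + (e,)
            for t in trends
            for i, ev in enumerate(t)
            if adjacent(ev, e)
        }
        if not candidates:
            trends.add((e,))            # case 1: start a new CET
            continue
        for c in candidates:
            trends.discard(c[:-1])      # case 2: a whole CET is extended in place
            trends.add(c)               # case 3: otherwise a replicated prefix + e
    return trends


e1, e2, e3, e4 = (1, "A", "B"), (2, "B", "C"), (3, "B", "C"), (4, "C", "A")
cets = baseline_cets([e1, e2, e3, e4])
# cets holds two trends: (e1, e2, e4) and (e1, e3, e4)
```

In a stream where every event of one layer can be followed by every event of the next layer, the number of stored trends doubles with each two-event layer, which illustrates the exponential time and space costs named above.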

SLIDE 12

Overview of Our CET Approach

Input event stream

  • Step 1: Compact CET Encoding as CET Graph. Quadratic time & space complexity

CET graph

  • Step 2: Graph-based CET Detection. Trade-off between time & space complexity

Event trend output stream

SLIDE 13

Step 1: CET Graph Construction

[Figure: CET graph with vertex c1]

Cases of the graph construction algorithm:

  • 1. Start a new CET

SLIDE 14

Step 1: CET Graph Construction

[Figure: CET graph with vertices c1, c2]

Cases of the graph construction algorithm:

  • 1. Start a new CET
  • 2. Append to an existing CET

SLIDE 15

Step 1: CET Graph Construction

[Figure: CET graph with vertices c1, c2, c4]

Cases of the graph construction algorithm:

  • 1. Start a new CET
  • 2. Append to an existing CET
  • 3. Append to the prefix of an existing CET

SLIDE 16

Step 1: CET Graph Construction

Compact CET encoding = CET graph

  • Matched event = vertex
  • Event adjacency relation = edge
  • CET = Path through the graph

[Figure: CET graph with vertices c1, c2, c4, c6, c5, c7]

Quadratic time & space complexity
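
The graph encoding can be sketched in a few lines. This is a minimal Python sketch under the same assumptions as the query: events are tuples `(time, source, destination)` and `adjacent` is the check-kiting condition. The function names are hypothetical. Each new event is compared with every earlier event once, which is where the quadratic bound on time and on the number of edges comes from.

```python
def adjacent(prev, nxt):
    """Assumed adjacency: later in time and destination matches source."""
    return prev[0] < nxt[0] and prev[2] == nxt[1]


def build_cet_graph(events):
    """Return the CET graph as a map: vertex -> list of predecessor vertices."""
    preds = {}
    for e in events:
        # One comparison per earlier event; no trend is ever copied.
        preds[e] = [p for p in preds if adjacent(p, e)]
    return preds


def edge_count(preds):
    """Number of edges, at most n * (n - 1) / 2 for n events."""
    return sum(len(p) for p in preds.values())


e1, e2, e3, e4 = (1, "A", "B"), (2, "B", "C"), (3, "B", "C"), (4, "C", "A")
graph = build_cet_graph([e1, e2, e3, e4])
# e4 has two predecessors (e2 and e3); the shared prefix e1 is stored once.
```

Compared with storing every trend, a prefix shared by several CETs appears in the graph only once, as a vertex with several outgoing edges.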

SLIDE 17

Step 2: Graph-based CET Detection

Spectrum of CET Detection Algorithms

  • T-CET: Time-optimal BFS-based algorithm
  • M-CET: Memory-optimal DFS-based algorithm

Is a middle ground possible?
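
The two ends of the spectrum can be illustrated on a small graph. The sketch below is a simplified stand-in for the two strategies, not the T-CET and M-CET algorithms themselves. It assumes the graph is given as a map from each vertex to its successor list, with vertices listed in arrival order, and that a CET is a path from a vertex without predecessors to a vertex without successors. The graph and the function names are hypothetical.

```python
def dfs_cets(succs):
    """Memory-lean: keeps one path on the stack and emits CETs one by one."""
    has_pred = {v for targets in succs.values() for v in targets}

    def walk(path):
        if not succs[path[-1]]:
            yield tuple(path)           # reached a vertex without successors
            return
        for v in succs[path[-1]]:
            path.append(v)
            yield from walk(path)       # shared prefixes are re-traversed
            path.pop()

    for s in succs:
        if s not in has_pred:
            yield from walk([s])


def bfs_cets(succs):
    """Time-lean: stores all partial trends per vertex and reuses them."""
    order = list(succs)                 # assumed to be arrival order
    preds = {v: [] for v in order}
    for u in order:
        for v in succs[u]:
            preds[v].append(u)
    trends_at = {}
    for v in order:
        if not preds[v]:
            trends_at[v] = [(v,)]
        else:
            trends_at[v] = [t + (v,) for p in preds[v] for t in trends_at[p]]
    return [t for v in order if not succs[v] for t in trends_at[v]]


# Hypothetical graph: c1 -> c2 -> c4 and c1 -> c3 -> c4
graph = {"c1": ["c2", "c3"], "c2": ["c4"], "c3": ["c4"], "c4": []}
```

Both functions return the same set of paths. The first holds only the current path in memory, while the second keeps every partial trend so that no sub-trend is computed twice.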

SLIDE 18

Step 2: Graph-based CET Detection

Our Proposed H-CET (Hybrid) Algorithm

[Figure: CET graph partitioned into Graphlet 1 and Graphlet 2]

How do we partition the graph?
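
One way to picture a hybrid strategy is to store sub-trends inside each graphlet and to stitch them across graphlets depth-first. The sketch below follows that reading and is an assumption, not the H-CET algorithm as published. The graph is a map from each vertex to its successor list, the partitioning into graphlets is supplied by the caller, and all names are hypothetical.

```python
def hybrid_cets(succs, graphlets):
    """Yield all CETs; graphlets is a list of vertex lists covering the graph."""
    part = {v: i for i, g in enumerate(graphlets) for v in g}
    has_pred = {v for targets in succs.values() for v in targets}
    local = {u: [v for v in succs[u] if part[v] == part[u]] for u in succs}
    cross = {u: [v for v in succs[u] if part[v] != part[u]] for u in succs}

    # A path enters a graphlet at a global start vertex or via a cut edge.
    entries = {v for v in succs if v not in has_pred}
    entries |= {v for u in succs for v in cross[u]}

    def is_exit(v):
        # A path leaves a graphlet via a cut edge or ends at a global end vertex.
        return bool(cross[v]) or not succs[v]

    # Step A: per graphlet, store every segment from an entry to an exit.
    segments = {}
    for u in entries:
        found, frontier = [], [(u,)]
        while frontier:
            nxt = []
            for seg in frontier:
                if is_exit(seg[-1]):
                    found.append(seg)
                nxt.extend(seg + (v,) for v in local[seg[-1]])
            frontier = nxt
        segments[u] = found

    # Step B: connect stored segments across graphlets depth-first.
    def stitch(prefix, u):
        for seg in segments[u]:
            path = prefix + seg
            if not succs[seg[-1]]:
                yield path
            for v in cross[seg[-1]]:
                yield from stitch(path, v)

    for s in succs:
        if s not in has_pred:
            yield from stitch((), s)


# Hypothetical graph with one cut between {c1, c2, c3} and {c4, c5, c6}
graph = {"c1": ["c2", "c3"], "c2": ["c4"], "c3": ["c4"],
         "c4": ["c5", "c6"], "c5": [], "c6": []}
cets = set(hybrid_cets(graph, [["c1", "c2", "c3"], ["c4", "c5", "c6"]]))
```

With a single graphlet the sketch stores every trend, and with one graphlet per vertex it degenerates into a plain depth-first traversal, so the choice of graphlets moves it between the two ends of the spectrum.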

SLIDE 19

Graph Partitioning Search Space

[Figure: graph partitioning search space; label: Atomic graphlet]

  • Graph partitioning search is exponential in the number of atomic graphlets
  • Goal: Optimal graph partitioning plan
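
The size of the search space can be made concrete with a small enumeration. The sketch assumes that atomic graphlets form a chain ordered by time and that a partitioning plan decides, for each boundary between two neighbouring atomic graphlets, whether to cut there. Under that assumption k atomic graphlets give 2^(k-1) plans. The function name and the graphlet labels are hypothetical.

```python
from itertools import product


def partitioning_plans(atomic):
    """Yield every way to merge a chain of atomic graphlets into graphlets."""
    if not atomic:
        return
    # One cut/no-cut decision per boundary between neighbouring atomic graphlets.
    for cuts in product([False, True], repeat=len(atomic) - 1):
        plan, current = [], [atomic[0]]
        for cut, g in zip(cuts, atomic[1:]):
            if cut:
                plan.append(current)    # close the current graphlet
                current = [g]
            else:
                current.append(g)       # merge into the current graphlet
        plan.append(current)
        yield plan


plans = list(partitioning_plans(["g1", "g2", "g3", "g4"]))
# 4 atomic graphlets have 3 boundaries, hence 2 ** 3 = 8 plans
```

An exhaustive search has to cost every one of these plans, which is why the number of plans, and not the number of events, dominates the partitioning effort.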

SLIDE 20

Balanced Graph Partitioning

  • Theorem. The closer a graph partitioning is to balanced, the lower are the CPU & memory costs of CET detection.

[Figure: two example partitionings, with costs in slide order]

  • CPU: 27 connect operations, Memory: 42 events
  • CPU: 27 connect operations, Memory: 36 events

SLIDE 21

Graph Partitioning Algorithm

Pruning principles:

  • 1. Unbalanced node pruning

SLIDE 22

Number of Graphlets

  • 2 Graphlets: CPU: 27 connect operations, Memory: 42 events
  • 3 Graphlets: CPU: 38 connect operations, Memory: 18 events
  • Theorem. If we add a cut to the graph, the memory costs of CET detection go down, while the CPU processing time goes up.

SLIDE 23

Graph Partitioning Algorithm

Pruning principles:

  • 1. Unbalanced node pruning
  • 2. Infeasible level pruning

SLIDE 24

Graph Partitioning Algorithm

Pruning principles:

  • 1. Unbalanced node pruning
  • 2. Infeasible level pruning
  • 3. Inefficient branch pruning

SLIDE 25

Experimental Setup

Execution infrastructure: Java 7, 1 Linux machine with 16-core 3.4 GHz CPU and 128GB of RAM

Data sets:

  • Stock real data set (ST) [1]. CETs = Stock trends
  • Physical activity monitoring real data set (PA) [2]. CETs = Behavioral patterns per person
  • Financial transaction synthetic data set (FT). CETs = Circular check kites

[1] Stock trade traces. http://davis.wpi.edu/datasets/Stock Trace Data/
[2] A. Reiss and D. Stricker. Creating and benchmarking a new dataset for physical activity monitoring. In PETRA, pages 40:1-40:8, 2012.

SLIDE 26

Experimental Setup

CET detection algorithms:

  • Base line (BL) maintains a set of CETs
  • SASE++ is memory-optimized [1,2]
  • Flink is a popular open-source streaming engine that supports event pattern matching but not Kleene closure. Thus, we flatten our queries [3]

CET graph partitioning algorithms:

  • Exhaustive (Exh)
  • Greedy
  • Branch and bound (B&B)

[1] J. Agrawal, Y. Diao, D. Gyllstrom, and N. Immerman. Efficient pattern matching over event streams. In SIGMOD, pages 147-160, 2008.
[2] H. Zhang, Y. Diao, and N. Immerman. On complexity and optimization of expensive queries in Complex Event Processing. In SIGMOD, pages 217-228, 2014.
[3] Apache Flink. https://flink.apache.org/

SLIDE 27

CET Detection Algorithms

CET

  • utilizes available memory to achieve 42-fold speed-up compared to SASE++
  • is 2 orders of magnitude faster and requires 2 orders of magnitude less memory than Flink

[Figures: CPU costs (FT); Memory costs (FT)]

SLIDE 28

CET Graph Partitioning

  • B&B is 2 orders of magnitude faster than Exhaustive but 3-fold slower than Greedy
  • CET detection in a greedily partitioned CET graph is almost 3-fold slower than in an optimally partitioned CET graph

[Figures: Quality of partitioning plan (FT); Graph partitioning algorithms (FT)]

SLIDE 29

Conclusions

We are the first to enable real-time Kleene closure computation over event streams under memory constraints

  • 1. CET graph compactly encodes all CETs and defines the spectrum of CET detection algorithms
  • 2. Hybrid CET detection algorithm utilizes available memory to achieve 42-fold speed-up
  • 3. Graph partitioning algorithm prunes large portions of the search space to efficiently find an optimal graph partitioning

SLIDE 30

Acknowledgement

  • DSRG group at WPI
  • SIGMOD reviewers
  • NSF grants CRI 1305258, IIS 1343620