A Model for Continuous Query Latencies in Data Streams (R. Baldoni et al.)

SLIDE 1

Outline Introduction Problem Statement Model Example Conclusions and Future Works

A Model for Continuous Query Latencies in Data Streams

  • R. Baldoni
  • G. Di Luna
  • D. Firmani
  • G. Lodi

Sapienza, University of Rome

September 19, 2011

SLIDE 2

Outline

◮ Introduction
◮ Problem Statement
◮ Model
◮ Example
◮ Conclusions and Future Works

SLIDE 3

Data Streams Query Processing

In recent years we have witnessed an increased adoption of Data Streams Query Processing in several application domains.

◮ Database Management Systems
  ◮ mostly static data, ad-hoc one-time queries;
  ◮ fire the queries at the data, return result sets.
◮ Data Stream Management Systems / Complex Event Processing Systems
  ◮ mostly transient data, continuous queries;
  ◮ fire the data at the queries, incrementally update result streams.
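The contrast between firing queries at data and firing data at queries can be sketched in a few lines of Python. This is only an illustration under our own assumptions: the function names and the "count values above a threshold" query are hypothetical, and real DSMS/CEP engines manage state, windows and scheduling far more elaborately.

```python
from __future__ import annotations

from typing import Iterable, Iterator


def one_time_count(table: list[float], threshold: float) -> int:
    # DBMS style: the query is fired once at (mostly static) stored data
    # and returns a single result.
    return sum(1 for x in table if x > threshold)


def continuous_count(stream: Iterable[float], threshold: float) -> Iterator[int]:
    # DSMS/CEP style: each arriving item is fired at a standing query, which
    # keeps its state and emits an incrementally updated result when it changes.
    count = 0
    for x in stream:
        if x > threshold:
            count += 1
            yield count


data = [1.0, 5.0, 3.0, 7.0]
print(one_time_count(data, 4.0))                # 2
print(list(continuous_count(iter(data), 4.0)))  # [1, 2]
```

Both functions see the same values, but only the continuous version produces a result stream whose items become available while the input is still arriving.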

SLIDE 4

Programming Paradigm

At a very high level, in order to solve a continuous query, a programmer:

◮ defines a set of functions;
◮ describes how incoming flows of information, i.e., data streams or events, have to be processed to produce the target stream as output in a timely manner;
◮ does so by producing intermediate streams useful to the computation.
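A minimal Python sketch of this paradigm, assuming hypothetical (symbol, price) events: each function consumes a stream and produces a new one, so two intermediate streams (the prices of one symbol, then those prices paired with a moving average) feed the target stream of alerts. Generators stand in for a DSMS runtime here; the function names, the window size and the 0.75 ratio are illustrative choices, not part of the original slides.

```python
from __future__ import annotations

from collections import deque
from typing import Iterable, Iterator


def select_symbol(events: Iterable[tuple[str, float]], symbol: str) -> Iterator[float]:
    # Intermediate stream 1: prices of a single symbol.
    for sym, price in events:
        if sym == symbol:
            yield price


def moving_average(prices: Iterable[float], k: int) -> Iterator[tuple[float, float]]:
    # Intermediate stream 2: (price, average of the last k prices, this one included).
    window: deque[float] = deque(maxlen=k)
    for p in prices:
        window.append(p)
        yield p, sum(window) / len(window)


def drops(pairs: Iterable[tuple[float, float]], ratio: float) -> Iterator[float]:
    # Target stream: prices that fall below ratio * moving average.
    for p, avg in pairs:
        if p < ratio * avg:
            yield p


events = [("A", 10.0), ("B", 3.0), ("A", 10.0), ("A", 4.0), ("A", 10.0)]
alerts = list(drops(moving_average(select_symbol(events, "A"), 3), 0.75))
print(alerts)  # [4.0]
```

Chaining the generators mirrors the idea of describing the processing as a composition of functions over streams rather than as a single query over stored data.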

SLIDE 5

Application Domains

◮ Financial analysis: algorithmic trading, i.e., detect advantageous market conditions and automatically execute trades;
◮ Network Monitoring: intrusion detection;
◮ Fraud Detection;
◮ Sensor Networks: health care, habitat monitoring.

In all these domains, latency is a fundamental requirement.

SLIDE 6

Related Works

Much work has been done to minimize latency in DSMSs:

◮ optimized query evaluation planning;
◮ avoiding overload of operators in distributed environments;
◮ resilient operator placement;
◮ ...

Our focus is to propose a cost evaluation technique to estimate whether a given strategy for solving a query meets the QoS requirements, independently of the DSMS used and before an experimental validation phase.

SLIDE 7

How can we evaluate latency?

QoS requirements from the latency point of view:

◮ time cost to produce a new output stream item;
◮ rate of the output stream;
◮ possibility to improve the time needed to trigger a solution.

Our approach is to compare different data-flow graphs in a platform-independent framework.

SLIDE 8

Data Flow Graph: Definition

◮ EPU. An Event Processing Unit is a function that takes streams as input, performs a computation and originates a single stream as output for downstream consumption. An EPU can be:
  ◮ a relational operator (e.g., Esper);
  ◮ any user-defined operator (e.g., Spade).
◮ DFG. A data flow graph, which represents a strategy to solve a query, is a DAG G = (V, E) such that:
  ◮ V contains all the EPU nodes needed for the computation;
  ◮ E contains an edge (v, u) iff EPU v ∈ V produces an event stream that is consumed by EPU u ∈ V.
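The definition translates directly into a small graph builder. In this sketch (our own, with the hypothetical names EPU, build_dfg and is_acyclic) each EPU only records the EPUs whose output streams it consumes; E is derived from that producer/consumer relation, and a Kahn-style check rejects graphs that are not DAGs.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class EPU:
    name: str
    # Names of the EPUs whose output streams this EPU consumes.
    inputs: list[str] = field(default_factory=list)


def is_acyclic(V: set[str], E: set[tuple[str, str]]) -> bool:
    # Kahn's algorithm: a directed graph is a DAG iff repeatedly removing
    # nodes with no incoming edges eventually removes every node.
    indeg = {v: 0 for v in V}
    for _, u in E:
        indeg[u] += 1
    ready = [v for v in V if indeg[v] == 0]
    removed = 0
    while ready:
        v = ready.pop()
        removed += 1
        for a, b in E:
            if a == v:
                indeg[b] -= 1
                if indeg[b] == 0:
                    ready.append(b)
    return removed == len(V)


def build_dfg(epus: list[EPU]) -> tuple[set[str], set[tuple[str, str]]]:
    # V: all EPUs; E: (v, u) iff v produces a stream that u consumes.
    V = {e.name for e in epus}
    E = {(v, e.name) for e in epus for v in e.inputs}
    unknown = {v for v, _ in E if v not in V}
    if unknown:
        raise ValueError(f"inputs from unknown EPUs: {sorted(unknown)}")
    if not is_acyclic(V, E):
        raise ValueError("a data flow graph must be a DAG")
    return V, E


# The market data example: u1 -> u2 -> u3.
V, E = build_dfg([EPU("u1"), EPU("u2", ["u1"]), EPU("u3", ["u2"])])
print(sorted(E))  # [('u1', 'u2'), ('u2', 'u3')]
```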

SLIDE 9

Data Flow Graph: Example

[Figure: data-flow graph of the example: producer u1 (market data stream) → u2 (ticks per sec, time based) → consumer u3 (detect fall-off, event based)]

EPU | Operation
u1 | String symbol; FeedEnum feed; double bidPrice; double askPrice;
u2 | insert into TicksPerSecond select feed, count(*) as cnt from MarketDataEvent.win:time_batch(1 second) group by feed
u3 | select feed, avg(cnt) as avgCnt, cnt as feedCnt from TicksPerSecond.win:time(10 seconds) group by feed having cnt < avg(cnt) * 0.75

Query: Process a raw market data feed and detect when the data rate of a feed falls off unexpectedly, in order to alert when there is a possible problem with the feed.
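To make the query concrete, the two EPL statements can be roughly emulated in plain Python. This is a sketch, not Esper's exact semantics: we assume (hypothetically) that every known feed reports a count in every 1-second batch, possibly zero, and we approximate u3's 10-second time window by the last 10 per-feed counts, the newest one included. The function names are our own.

```python
from __future__ import annotations

from collections import Counter, defaultdict, deque


def ticks_per_second(ticks: list[tuple[float, str]],
                     feeds: list[str]) -> list[tuple[int, str, int]]:
    # u2 (time based): close a batch every second and emit (second, feed, cnt).
    # Assumption: a feed without ticks in a batch is reported with cnt = 0.
    if not ticks:
        return []
    per_batch = Counter((int(t), feed) for t, feed in ticks)
    last = max(int(t) for t, _ in ticks)
    return [(s, f, per_batch[(s, f)]) for s in range(last + 1) for f in feeds]


def detect_fall_off(tps: list[tuple[int, str, int]], window: int = 10,
                    ratio: float = 0.75) -> list[tuple[int, str, int, float]]:
    # u3 (event based): on every TicksPerSecond event, compare cnt with the
    # average count of the same feed over the last `window` batches.
    recent: defaultdict[str, deque[int]] = defaultdict(lambda: deque(maxlen=window))
    alerts = []
    for s, feed, cnt in tps:
        recent[feed].append(cnt)
        avg = sum(recent[feed]) / len(recent[feed])
        if cnt < avg * ratio:
            alerts.append((s, feed, cnt, avg))
    return alerts


# Hypothetical feed "A": 10 ticks/s for 5 seconds, then only 2 ticks.
ticks = [(s + i / 10, "A") for s in range(5) for i in range(10)] + [(5.0, "A"), (5.5, "A")]
print([a[:3] for a in detect_fall_off(ticks_per_second(ticks, ["A"]))])  # [(5, 'A', 2)]
```

At second 5 the window holds the counts 10, 10, 10, 10, 10, 2, whose average is about 8.67; since 2 < 0.75 × 8.67, the fall-off is reported.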

SLIDE 10

Metrics

[Figure: timeline (t) of the consumption of input stream #1 and input stream #2 and of the output production, marking the Output Latency, Activity Latency and Reactivity Latency intervals]

Given a data-flow graph G that produces an output stream from a set of input streams S, compute:

◮ Output Latency: start of the input streams → start of the output stream;
◮ Activity Latency: start of the input streams → end of the output stream;
◮ Reactivity Latency: end of the input streams → start of the output stream;
◮ Complexity: minimum size of the input streams necessary to produce a non-empty output stream.
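The three latency metrics can be read off recorded timestamps. A minimal sketch, assuming (our interpretation) that the start and end of the input are the earliest and latest timestamps over all input streams; the function name latency_metrics is hypothetical, and Complexity is left out because it requires minimizing over possible inputs rather than measuring a single run.

```python
from __future__ import annotations


def latency_metrics(inputs: dict[str, list[float]],
                    output: list[float]) -> dict[str, float]:
    # Assumption: start/end of the input = first/last timestamp over all
    # input streams; start/end of the output = first/last output timestamp.
    if not output or not any(inputs.values()):
        raise ValueError("need a non-empty output and at least one input event")
    in_start = min(min(ts) for ts in inputs.values() if ts)
    in_end = max(max(ts) for ts in inputs.values() if ts)
    out_start, out_end = min(output), max(output)
    return {
        "output_latency": out_start - in_start,
        "activity_latency": out_end - in_start,
        # Negative when the output starts before the input has ended.
        "reactivity_latency": out_start - in_end,
    }


print(latency_metrics({"s1": [0.0, 1.0, 2.0], "s2": [0.5, 1.5]}, [2.5, 3.0]))
# {'output_latency': 2.5, 'activity_latency': 3.0, 'reactivity_latency': 0.5}
```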

SLIDE 11

Model Abstraction

EPU behavior:

◮ ASB/O: All-Streams Batch/Online processing (logical and/or);
◮ EB/TB: Event/Time Based (e.g., detect fall-off is EB, ticks per sec is TB).

EPU parameters:

◮ t_u(v): time window of a TB EPU u with respect to the EPU v that produces u's input;
◮ n_u(v): average number of events that an EB EPU u consumes from v in order to produce a single event;
◮ n(u): dimension of the co-domain of the function computed by u;
◮ p(u): average time in which u computes its function (i.e., updates its output stream).
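The behaviors and parameters can be collected in one record per EPU. The sketch below is our own encoding, not the authors' tool: the class names are hypothetical, the ASB/O flag is mapped to batch = logical and, online = logical or following the order of the slide, and the consistency check in __post_init__ is an assumed rule (a TB EPU needs t_u(v), an EB EPU needs n_u(v), for every input v).

```python
from __future__ import annotations

from dataclasses import dataclass, field
from enum import Enum


class Trigger(Enum):
    EB = "event based"  # emits after consuming n_u(v) events from its inputs
    TB = "time based"   # emits once per time window t_u(v)


class Mode(Enum):
    BATCH = "all-streams batch (logical and)"
    ONLINE = "all-streams online (logical or)"


@dataclass
class EPUParams:
    name: str
    trigger: Trigger
    mode: Mode = Mode.BATCH
    inputs: list[str] = field(default_factory=list)       # I(u)
    t: dict[str, float] = field(default_factory=dict)     # t_u(v), TB only
    n_in: dict[str, float] = field(default_factory=dict)  # n_u(v), EB only
    n_out: int = 1                                        # n(u)
    p: float = 0.0                                        # p(u)

    def __post_init__(self) -> None:
        # Hypothetical sanity check: each input v needs the parameter that
        # matches the trigger (t_u(v) for TB, n_u(v) for EB).
        needed = self.t if self.trigger is Trigger.TB else self.n_in
        missing = [v for v in self.inputs if v not in needed]
        if missing:
            raise ValueError(f"{self.name}: no parameter for inputs {missing}")


# Market data example with made-up processing times (seconds).
# u1 is a source without inputs, so its trigger is arbitrary here.
u1 = EPUParams("u1", Trigger.EB, p=0.001)
u2 = EPUParams("u2", Trigger.TB, inputs=["u1"], t={"u1": 1.0}, p=0.002)
u3 = EPUParams("u3", Trigger.EB, inputs=["u2"], n_in={"u2": 1.0}, p=0.001)
```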

SLIDE 12

Evaluating EPU Metrics

The DFG metrics are evaluated on the basis of the metrics of each EPU with respect to its input consumption and output production.

[Figure: input and output sets of EPUs v, u and w, with the silence-out and silence-in periods between them; timeline of v feeding u annotated with σ_u(v), OL(v), AL(v) and AL(v) − n(u)/ρ(u)]

EPU metrics are evaluated by computing:

◮ input and output rates ρ_u(∗), ρ(u);
◮ input and output silence periods σ_u(∗), σ(u).
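Rates and silence periods can be estimated from observed timestamps. The definitions below are assumptions made for illustration, not the model's formal ones: the rate is the number of events divided by the stream's time span, and the silence period is the time from a reference instant (for example, when the consumer starts waiting) to the first event. With that rate, the start of a stream can be recovered from its end as end − n/ρ, which has the same shape as the AL(v) − n(u)/ρ(u) label on the slide's figure.

```python
from __future__ import annotations


def rate(ts: list[float]) -> float:
    # Assumption: average rate = number of events / time span of the stream.
    if len(ts) < 2 or max(ts) == min(ts):
        raise ValueError("need at least two distinct timestamps")
    return len(ts) / (max(ts) - min(ts))


def silence(ts: list[float], t_ref: float) -> float:
    # Assumption: silence period = time from t_ref until the first event.
    return min(ts) - t_ref


def start_from_end(t_end: float, n: float, rho: float) -> float:
    # Back-computes when a stream of n events at rate rho began, given its end.
    return t_end - n / rho


out = [2.0, 2.5, 3.0, 3.5, 4.0]
print(rate(out))                                  # 2.5
print(silence(out, 0.5))                          # 1.5
print(start_from_end(max(out), len(out), rate(out)))  # 2.0
```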

SLIDE 13

Evaluating DFG Metrics

Algorithms for computing DFG metrics.

◮ EPU metrics are evaluated following any topological sort of the data-flow graph G;
◮ the algorithm for a metric M(G) consists of a graph visit that finds the M-critical path, i.e., the set of EPUs that determines the final value of M(G).
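A minimal sketch of such a visit, under a simplifying assumption of ours: M is taken to be additive along a path (for example, a sum of per-EPU processing times p(u)), so the M-critical path is the heaviest path of the DAG, found with one pass in topological order. The model's actual metrics combine EPU metrics in more elaborate ways; critical_path is a hypothetical name, while graphlib.TopologicalSorter is from the Python standard library (3.9+).

```python
from __future__ import annotations

from graphlib import TopologicalSorter


def critical_path(preds: dict[str, list[str]],
                  cost: dict[str, float]) -> tuple[float, list[str]]:
    # preds maps each EPU u to the EPUs v with an edge (v, u) in E.
    # Assumption: M is additive along a path (e.g. a sum of p(u) values).
    best: dict[str, float] = {}
    parent: dict[str, str | None] = {}
    # static_order() yields every node after all of its predecessors
    # (it raises graphlib.CycleError if the graph is not a DAG).
    for u in TopologicalSorter(preds).static_order():
        ps = preds.get(u, [])
        if ps:
            v = max(ps, key=lambda x: best[x])  # heaviest incoming path
            best[u], parent[u] = best[v] + cost[u], v
        else:
            best[u], parent[u] = cost[u], None
    node: str | None = max(best, key=lambda x: best[x])
    total = best[node]
    path: list[str] = []
    while node is not None:
        path.append(node)
        node = parent[node]
    return total, path[::-1]


preds = {"u2": ["u1"], "u3": ["u2"], "u4": ["u1"], "u5": ["u3", "u4"]}
print(critical_path(preds, {"u1": 1, "u2": 2, "u3": 3, "u4": 10, "u5": 1}))  # (12, ['u1', 'u4', 'u5'])
print(critical_path(preds, {"u1": 1, "u2": 2, "u3": 3, "u4": 1, "u5": 1}))   # (7, ['u1', 'u2', 'u3', 'u5'])
```

The two calls differ only in the cost of u4, yet the critical path moves from the u4 branch to the u2, u3 branch, which is why the path has to be recomputed whenever EPU parameters change.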

SLIDE 14

A Simple Example

Back to the market data feed example. The EPU parameters are:

EPU | I(u) | n_u(∗) | t_u(∗) | n(u) | p(u)
u1 | {} | {} | {} | 1 | y1
u2 | {u1} | {} | {1} | 1 | y2
u3 | {u2} | {x3}, x3 ∈ [1, ∞) | {} | 1 | y3

Reactivity Latency evaluation:

◮ Let us assume that at time t_i a fall-off occurs in the market data stream, and that at time t_o the strategy represented by the DFG actually detects it;
◮ In the adopted abstraction, the relationship between t_i and t_o is given by the sum of the processing times of the EPUs and the timing of the TB EPU (u2).
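Under an explicitly hypothetical reading of this relationship, the reactivity latency can be computed as below. We assume x3 = 1 (u3 reacts to every TicksPerSecond event), that the fall-off happens offset_in_batch time units after u2's current batch opened, and that the alert then waits for the batch of length t_u2(u1) to close plus the processing times y1, y2, y3. The function name and the numbers (in milliseconds) are made up; the model's actual formula may differ.

```python
from __future__ import annotations


def reactivity_latency(p: dict[str, float], batch_window: float,
                       offset_in_batch: float) -> float:
    # Hypothetical reading: t_o - t_i = (time left until u2's batch closes)
    # + p(u1) + p(u2) + p(u3), assuming x3 = 1.
    if not 0.0 <= offset_in_batch <= batch_window:
        raise ValueError("offset_in_batch must lie within the batch window")
    return (batch_window - offset_in_batch) + sum(p.values())


y = {"u1": 2.0, "u2": 5.0, "u3": 1.0}  # made-up p(u) values, in ms
print(reactivity_latency(y, 1000.0, 0.0))    # 1008.0 (worst case)
print(reactivity_latency(y, 1000.0, 400.0))  # 608.0
```

In this reading the 1-second time window of the TB EPU dominates the millisecond-scale processing times, which suggests shortening u2's batch as the first lever for reducing reactivity latency.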

SLIDE 15

Future Works

[Figure: a complex DFG over TCP streams (syn, syn-ack, rst, rst-ack, ack) built from producer, filter, groupby, count, pattern (Ho-patternT/B/A/O, Hu-patternT/O, Cp-pattern), UDO (User Defined Operator) and consumer EPUs]

◮ The critical path changes as a function of the EPU parameters;
◮ Metrics may be complex functions, difficult to compute and study.

To this end:

◮ we implemented a software tool to handle complex DFGs;
◮ the experimental evaluation of the model is still in progress.

SLIDE 16

Conclusions

◮ We propose a formal model to evaluate some cost metrics of a continuous streaming computation, represented as a data-flow query graph where each node is a basic query (EPU).
◮ The model is able to associate several metrics with a data-flow graph in order to evaluate its expected latency before its actual implementation on a DSMS.

SLIDE 17

THANKS FOR YOUR ATTENTION. QUESTIONS?
