regular programming for quantitative properties of data streams - - PowerPoint PPT Presentation

regular programming for quantitative properties of data streams. Rajeev Alur, Dana Fisman, Mukund Raghothaman. ESOP 2016, University of Pennsylvania.


slide-1
SLIDE 1

regular programming for quantitative properties of data streams

Rajeev Alur, Dana Fisman, Mukund Raghothaman

ESOP 2016

University of Pennsylvania

slide-2
SLIDE 2

data streams

web-page requests

∙ P2 ∙ EOD ∙ P2 ∙ P1 ∙ P1 ∙ EOM ∙ P2 ∙ P2 ∙ P2 ∙ P1 ∙ EOD ∙ P2 ∙ P2 ∙ P1 ∙ P2 ∙ P1 ∙ EOM ∙ P2 ∙ P2 ∙ P2 ∙ P1 ∙ P2 ∙ EOM ∙ P1 ∙ P2 ∙ EOD ∙ P2 ∙ P2 ∙ P2 ∙ EOM ∙ P1 ∙ · · ·

1

slide-6
SLIDE 6

data streams

web-page requests

∙ What is the maximum number of daily requests for P2? iter-max(day-countP2)
∙ What is the number of requests for P1 in an average month? iter-avg(month-countP1)
∙ What is the total number of requests for P1? countP1 = iter-sum(day-countP1)
∙ What is the maximum of the total number of requests for P1 and P2? max(countP1, countP2)

2

slide-7
SLIDE 7

quantitative regular expressions

Languages Σ∗ → bool ≡ Regular expressions
Cost functions Σ∗ → R ≡ QREs

3

slide-8
SLIDE 8

function combinators

slide-9
SLIDE 9

function combinators

basic expressions, a → d

a → d
∙ If w = a, then output d, otherwise undefined
∙ P1 → 0, P1 → 1, EOD → 5, …
∙ Analogue of basic regular expressions

5

slide-10
SLIDE 10

function combinators

f else g

f else g
∙ If f(w) is defined, then output f(w), otherwise output g(w)
∙ Analogue of regular expression union
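The two combinators introduced so far can be sketched in Python by modelling a QRE as a partial function on streams, with `None` standing for "undefined". This is only an illustration under that modelling assumption; the names `basic`, `else_` and `is_p1` are hypothetical and do not come from the talk.

```python
from typing import Callable, Optional, Sequence

# Assumption: a QRE is modelled as a partial function on streams,
# returning None wherever the expression is undefined.
QRE = Callable[[Sequence[str]], Optional[float]]

def basic(a: str, d: float) -> QRE:
    """a -> d: defined only on the one-symbol stream [a]."""
    return lambda w: d if list(w) == [a] else None

def else_(f: QRE, g: QRE) -> QRE:
    """f else g: use f where it is defined, otherwise fall back to g."""
    def h(w: Sequence[str]) -> Optional[float]:
        v = f(w)
        return v if v is not None else g(w)
    return h

# Hypothetical example: 1 for a P1 request, 0 for a P2 request.
is_p1 = else_(basic("P1", 1), basic("P2", 0))

print(is_p1(["P1"]), is_p1(["P2"]), is_p1(["EOD"]))  # 1 0 None
```

As with regular expression union, the combined expression is defined on every stream where at least one of the two branches is defined.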

6

slide-11
SLIDE 11

function combinators

op

∙ f + g: Map w to f(w) + g(w)
∙ f − g: Map w to f(w) − g(w)
∙ max(f, g): Map w to max(f(w), g(w))
∙ …
∙ op(f1, f2, …, fk): Map w to op(f1(w), f2(w), …, fk(w))
∙ Analogue of regular expression intersection
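A pointwise cost operation can be sketched as follows, again assuming QREs are modelled as partial functions that return `None` when undefined. The helper `count_of` is a hypothetical stand-in for countP1 and countP2, written directly in Python rather than built from combinators.

```python
from typing import Callable, Optional, Sequence

QRE = Callable[[Sequence[str]], Optional[float]]

def op(fn: Callable[..., float], *fs: QRE) -> QRE:
    """op(f1, ..., fk): apply fn to all results; undefined if any fi is undefined."""
    def h(w: Sequence[str]) -> Optional[float]:
        vals = [f(w) for f in fs]
        if any(v is None for v in vals):
            return None
        return fn(*vals)
    return h

def count_of(page: str) -> QRE:
    """Hypothetical stand-in for countP1 / countP2: defined on request-only streams."""
    def f(w: Sequence[str]) -> Optional[float]:
        if any(s not in ("P1", "P2") for s in w):
            return None
        return sum(1 for s in w if s == page)
    return f

count_p1, count_p2 = count_of("P1"), count_of("P2")
busiest = op(max, count_p1, count_p2)
total = op(lambda a, b: a + b, count_p1, count_p2)

w = ["P2", "P1", "P2", "P2"]
print(busiest(w), total(w), busiest(["P1", "EOD"]))  # 3 4 None
```

The result is defined only where every argument is defined, which is why the slide calls this the analogue of intersection.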

7

slide-15
SLIDE 15

concatenation

split-op(f, g)

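split-plus(f, g), split-max(f, g), …, split-op(f, g)

[Diagram: the stream is split as w1 · w2; f is applied to w1 and g to w2, and the two results are combined by op (+, max, …) to give the output.]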
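A naive reading of split-op can be sketched by trying every split point. This is a brute-force illustration, not the single-pass evaluation discussed later in the talk, and treating a stream with zero or several valid splits as undefined is an assumption made here. The functions `count_p1` and `end_of_day` are hypothetical stand-ins.

```python
from typing import Callable, Optional, Sequence

QRE = Callable[[Sequence[str]], Optional[float]]

def split_op(fn: Callable[[float, float], float], f: QRE, g: QRE) -> QRE:
    """split-op(f, g): split w = w1 w2 and return fn(f(w1), g(w2)).

    Assumption: the result is undefined unless exactly one split is defined.
    """
    def h(w: Sequence[str]) -> Optional[float]:
        w = list(w)
        results = []
        for i in range(len(w) + 1):
            a, b = f(w[:i]), g(w[i:])
            if a is not None and b is not None:
                results.append(fn(a, b))
        return results[0] if len(results) == 1 else None
    return h

def count_p1(w: Sequence[str]) -> Optional[float]:
    """Hypothetical countP1: defined on streams of page requests only."""
    if any(s not in ("P1", "P2") for s in w):
        return None
    return sum(1 for s in w if s == "P1")

def end_of_day(w: Sequence[str]) -> Optional[float]:
    """Hypothetical EOD -> 0."""
    return 0 if list(w) == ["EOD"] else None

day_count_p1 = split_op(lambda a, b: a + b, count_p1, end_of_day)
print(day_count_p1(["P1", "P2", "P1", "EOD"]))  # 2
print(day_count_p1(["P1", "P2"]))               # None
```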

8

slide-16
SLIDE 16

iteration

iter-op(f)

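[Diagram: the stream is split as w1 · w2 · … · wk−1 · wk; f is applied to each piece and the results are combined by op.]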
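Iteration can be sketched the same way: enumerate every decomposition of the stream into non-empty pieces on which f is defined, and fold the results with op. This brute-force version is for illustration only. Requiring a unique decomposition, and leaving the empty stream undefined, are assumptions made here. `day_count_p2` is a hypothetical stand-in for day-countP2.

```python
from typing import Callable, List, Optional, Sequence

QRE = Callable[[Sequence[str]], Optional[float]]

def iter_op(fn: Callable[[List[float]], float], f: QRE) -> QRE:
    """iter-op(f): split w = w1 ... wk, apply f to each piece, fold with fn."""
    def parses(w: List[str]) -> List[List[float]]:
        # Every list [f(w1), ..., f(wk)] for a decomposition into non-empty pieces.
        if not w:
            return [[]]
        out = []
        for i in range(1, len(w) + 1):
            v = f(w[:i])
            if v is not None:
                out.extend([v] + rest for rest in parses(w[i:]))
        return out

    def h(w: Sequence[str]) -> Optional[float]:
        ps = parses(list(w))
        if len(ps) != 1 or not ps[0]:
            return None  # assumption: ambiguous or empty input is undefined
        return fn(ps[0])
    return h

def day_count_p2(w: Sequence[str]) -> Optional[float]:
    """Hypothetical day-countP2: requests followed by one EOD, counting P2."""
    w = list(w)
    if not w or w[-1] != "EOD" or any(s not in ("P1", "P2") for s in w[:-1]):
        return None
    return w[:-1].count("P2")

iter_max = iter_op(max, day_count_p2)
iter_sum = iter_op(sum, day_count_p2)

w = ["P2", "EOD", "P2", "P1", "P2", "EOD"]
print(iter_max(w), iter_sum(w), iter_max(["P2"]))  # 2 3 None
```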

9

slide-17
SLIDE 17

function combinators

attempt 1

Basic functions: a → d
Conditional choice: f else g
Concatenation: split-op(f, g)
Function iteration: iter-op(f)
Cost operations: op(f1, f2, …, fk)
∙ What about non-binary, non-commutative or non-associative operators?
∙ Are these operators “sufficient”? What if we add more operators?

10

slide-22
SLIDE 22

parse trees and computation trees

split-plus(countP1, Σ^5 → 0)

[Figure: parse tree of the input stream, with internal nodes ·, ∗ and Σ^5 over the leaves P1 … P2 P2 P1]

∙ Attempt 1 annotates the parse tree with cost operations to obtain the computation tree
∙ What if QREs map input streams to computation trees?

11

slide-23
SLIDE 23

parse trees and computation trees

split-plus(countP1, Σ^5 → 0)

[Figure: computation tree, with + nodes over the leaves 1 … 1, alongside the Σ^5 node]

∙ Attempt 1 annotates the parse tree with cost operations to obtain the computation tree
∙ What if QREs map input streams to computation trees?

11

slide-25
SLIDE 25

quantitative regular expressions

Key Insight: QREs map input streams to terms over the cost domain
∙ Terms contain parameters
∙ f(aaab) = 5 + p, where p is a parameter
∙ Parameter substitution is also an operation

12

slide-27
SLIDE 27

function combinators

substitution

Parameter substitution, f[p/g]
∙ Say f(w) = tf and g(w) = tg
∙ Then f[p/g](w) = tf[p/tg]
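Parameter substitution on terms can be sketched with a small term representation. The encoding is an assumption made for illustration: a term is a number, a parameter name (a string), or a tuple `(operator, argument, ...)`. The function `f` mirrors the slide's example f(aaab) = 5 + p, while `g` is a hypothetical function that counts the occurrences of `a`.

```python
OPS = {"+": lambda a, b: a + b, "max": max}

def substitute(t, p, r):
    """Return term t with every occurrence of parameter p replaced by term r."""
    if t == p:
        return r
    if isinstance(t, tuple):
        return (t[0],) + tuple(substitute(a, p, r) for a in t[1:])
    return t

def evaluate(t):
    """Evaluate a term that contains no remaining parameters."""
    if isinstance(t, tuple):
        return OPS[t[0]](*(evaluate(a) for a in t[1:]))
    if isinstance(t, str):
        raise ValueError(f"unbound parameter {t!r}")
    return t

def subst(f, p, g):
    """f[p/g]: on input w, substitute the term g(w) for p in the term f(w)."""
    def h(w):
        tf, tg = f(w), g(w)
        if tf is None or tg is None:
            return None
        return substitute(tf, p, tg)
    return h

f = lambda w: ("+", 5, "p") if w == "aaab" else None  # f(aaab) = 5 + p
g = lambda w: w.count("a")                            # hypothetical g

term = subst(f, "p", g)("aaab")
print(term, evaluate(term))  # ('+', 5, 3) 8
```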

13

slide-29
SLIDE 29

function combinators

concatenation, take 2

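split(f →p g)

[Diagram: the stream is split as w1 · w2; f is applied to w1 and g to w2, and the result of f is passed into the parameter p of g's term to give the output.]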

14

slide-31
SLIDE 31

function combinators

concatenation, take 2

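Simulating split-plus(f, g)

[Diagram: the stream is split as w1 · w2; f is applied to w1 and g′ to w2, and the result of f is passed into the parameter p of g′ to give the output.]

∙ split-plus(f, g) = split(f →p g′)
∙ g′(w) = p + g(w)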
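The simulation can be checked on a small example. In this sketch a term with one parameter p is modelled as a Python closure taking the value of p, which is a simplification of the syntactic terms used in the talk. All function names are hypothetical, and requiring a unique split is an assumption.

```python
from typing import List

def unique_split(w: List[str], left, right, combine):
    """Try every split w = w1 w2; return the combined result if exactly one split is defined."""
    out = []
    for i in range(len(w) + 1):
        a, b = left(w[:i]), right(w[i:])
        if a is not None and b is not None:
            out.append(combine(a, b))
    return out[0] if len(out) == 1 else None

def split_plus(f, g):
    """Attempt 1: add the two results."""
    return lambda w: unique_split(list(w), f, g, lambda a, b: a + b)

def split_right(f, g_term):
    """split(f ->p g): the value f(w1) is substituted for p in the term g(w2)."""
    return lambda w: unique_split(list(w), f, g_term, lambda a, t: t(a))

def plus_param(g):
    """g'(w) = p + g(w), as a one-parameter term modelled by a closure."""
    def g_prime(w):
        v = g(w)
        return None if v is None else (lambda p: p + v)
    return g_prime

def count_p1(w):
    if any(s not in ("P1", "P2") for s in w):
        return None
    return sum(1 for s in w if s == "P1")

def end_of_day(w):
    return 5 if list(w) == ["EOD"] else None  # EOD -> 5

w = ["P1", "P2", "P1", "EOD"]
print(split_plus(count_p1, end_of_day)(w),
      split_right(count_p1, plus_param(end_of_day))(w))  # 7 7
```

Because g′ decides how p is used, the same split operator also covers operations that are not binary, commutative or associative.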

15

slide-32
SLIDE 32

function combinators

iteration, take 2

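[Diagram: the stream is split as w1 · w2 · … · wk−1 · wk; f is applied to each piece, and the parameter p links the term of each piece to the term of the next.]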

16

slide-34
SLIDE 34

quantitative regular expressions

Basic functions: a → d, regex → term
Conditional choice: f else g
Concatenation: split(f →p g), split(f ←p g)
Function iteration: iter→(f), iter←(f)
Cost operations: op(f1, f2, …, fk), f[p/g]
∙ QREs map strings to terms
∙ Structural operators decoupled from cost operators

17

slide-36
SLIDE 36

quantitative regular expressions

∙ Was our choice of combinators ad-hoc? What functions can the formalism express?
∙ Given f and an input stream w, can we efficiently compute f(w)?

18

slide-37
SLIDE 37

expressiveness

slide-38
SLIDE 38

regular cost functions [lics 2013]

Languages Σ∗ → bool ≡ Finite automata
Cost functions Σ∗ → R ≡ ?
Regular languages have many appealing properties: many natural equi-expressive characterizations, robust closure properties, decidable analysis problems, practical utility

20

slide-39
SLIDE 39

cost register automata [lics 2013]

[Diagram: a two-state automaton with states q0 (start) and q1. Transitions are labelled guard / register update: d ≥ 0 / bal := bal + d; d = endm / bal := bal + 10; d < 0 / bal := bal + d; d ∈ R / bal := bal + d; d = endm.]

∙ Finite state space, finitely many registers
∙ Registers hold terms: Parameterized by set of operations
∙ Equivalent to MSO definable string-to-term transducers
∙ Closed under regular lookahead, input reversal, etc
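A cost register automaton of this kind can be simulated in a few lines. The transition structure below is an assumed reading of the flattened diagram, not something the transcript states: q0 loops on non-negative amounts, a negative amount moves to q1, q1 loops on every amount, and `endm` returns to q0, adding 10 to `bal` only when it is read in q0 (that is, after a month with no negative amount). The register here holds a number rather than a term.

```python
from typing import Iterable, Union

Event = Union[float, str]  # an amount d, or the end-of-month marker "endm"

def run_cra(events: Iterable[Event]) -> float:
    """Simulate the assumed two-state automaton with a single register bal."""
    state, bal = "q0", 0.0
    for d in events:
        if d == "endm":
            if state == "q0":
                bal += 10        # assumed: d = endm / bal := bal + 10 is the q0 loop
            state = "q0"         # assumed: endm from q1 returns to q0, bal unchanged
        elif state == "q0" and d >= 0:
            bal += d             # d >= 0 / bal := bal + d, stay in q0
        else:
            bal += d             # d < 0 from q0, or any d in q1
            state = "q1"
    return bal

print(run_cra([5, "endm", -2, 3, "endm", 1, "endm"]))  # 27.0
```

The simulation keeps one state and one register regardless of the length of the input, which is the "finite state space, finitely many registers" property from the slide.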

21

slide-40
SLIDE 40

cost register automata

Theorem: QREs can express exactly the same functions as CRAs

22

slide-41
SLIDE 41

cras → qres

Taste of the completeness proof
∙ Piggy-back on DFA to regular expression translation
  Construct Ri(q, q′): all strings from q to q′ while only traversing states less than qi
∙ Construct fi(q, q′, x): express final value of register x as a DReX expression
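For reference, the classical DFA-to-regular-expression translation that the proof builds on satisfies the following recurrence. The numbering of the states as q_1, …, q_n and the transition function δ are notation assumed here; the slide only names R_i(q, q′).

```latex
% Base case: no intermediate states are allowed.
R_1(q, q') = \{\, a \in \Sigma \mid \delta(q, a) = q' \,\} \;\cup\; \{\, \varepsilon \mid q = q' \,\}

% Inductive step: additionally allow q_i as an intermediate state.
R_{i+1}(q, q') = R_i(q, q') \;\cup\; R_i(q, q_i) \cdot R_i(q_i, q_i)^{*} \cdot R_i(q_i, q')
```

With n states, R_{n+1}(q, q′) describes all strings leading from q to q′. The slide's f_i(q, q′, x) follows the same induction on i while also tracking the value of register x.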

23

slide-42
SLIDE 42

cras → qres

Taste of the completeness proof
∙ Capture data flows using “shapes”
∙ Construct a partial order over shapes, and use as basis for induction

[Figure: a shape, drawn over two rows of registers x, y, z]

24

slide-43
SLIDE 43

fast evaluation algorithms

slide-44
SLIDE 44

evaluation algorithms

Given a QRE f and an input stream w, find f(w)
∙ QREs separate intent from evaluation
∙ If QREs are unambiguous, then f(w) can be computed with a single pass over w in time O(|w| · poly(|f|))

26

slide-49
SLIDE 49

evaluation algorithms

split-plus(countP1, Σ^5 → 0)
5 potential parse trees needed at each step
∙ Idea 1: Statically bound number of potential parse trees: O(poly(|f|)), irrespective of |w|
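For this particular expression, the bounded bookkeeping can be sketched as a one-pass evaluator that remembers only the last five symbols, which are the candidates for the Σ^5 suffix. This is a hand-written illustration of the idea for one expression, not the general algorithm from the talk, and it assumes countP1 is defined on every stream.

```python
from collections import deque
from typing import Iterable, Iterator, Optional

def stream_split_plus_count_p1(stream: Iterable[str], k: int = 5) -> Iterator[Optional[int]]:
    """After each symbol, yield split-plus(countP1, Sigma^k -> 0) of the prefix read so far."""
    pending = deque()  # the most recent symbols: the candidate Sigma^k suffix
    count = 0          # number of P1 in the part already committed to countP1
    for s in stream:
        pending.append(s)
        if len(pending) > k:
            # The oldest pending symbol can no longer belong to the suffix.
            if pending.popleft() == "P1":
                count += 1
        # Undefined (None) until at least k symbols have been read.
        yield count if len(pending) == k else None

w = ["P1", "P2", "P1", "P1", "P2", "P2", "P1", "P1"]
print(list(stream_split_plus_count_p1(w)))
# [None, None, None, None, 0, 1, 1, 2]
```

The state is k symbols plus one counter, so memory depends on the expression and not on the length of the stream.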

27

slide-53
SLIDE 53

evaluation algorithms

[Figure: three term trees showing compression: 3 + min(9, 2 + x) → min(3 + 9, 3 + 2 + x) → min(12, 5 + x)]

∙ Term compression ensures intermediate terms of bounded size
∙ Idea 2: If the only operators are ∗, +, −, min, max, avg, then space usage is polynomially bounded too!
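The compression step in the figure can be sketched for the restricted case of terms built from + and min over a single parameter x. Every such term can be kept in the normal form min(c, x + d), so applying another operation updates two numbers instead of growing a tree. The class name `MinPlus` and its methods are hypothetical, and the other operators named on the slide are not covered.

```python
import math
from dataclasses import dataclass

@dataclass(frozen=True)
class MinPlus:
    """The term min(c, x + d) in one parameter x; c = inf means there is no cap."""
    c: float = math.inf
    d: float = 0.0

    def add(self, k: float) -> "MinPlus":
        # k + min(c, x + d) = min(c + k, x + d + k)
        return MinPlus(self.c + k, self.d + k)

    def min_with(self, k: float) -> "MinPlus":
        # min(k, min(c, x + d)) = min(min(c, k), x + d)
        return MinPlus(min(self.c, k), self.d)

    def __call__(self, x: float) -> float:
        return min(self.c, x + self.d)

# Build 3 + min(9, 2 + x) step by step, starting from the bare parameter x.
t = MinPlus().add(2).min_with(9).add(3)
print(t)                # MinPlus(c=12, d=5.0)
print(t(0), t(10))      # 5.0 12
```

The result matches the last tree in the figure, min(12, 5 + x), and its size stays constant no matter how many operations are applied.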

28

slide-54
SLIDE 54

conclusion

slide-55
SLIDE 55

conclusion

∙ Introduced Quantitative Regular Expressions (QRE)
∙ Idea of function combinators
  ∙ Function descriptions are modular
  ∙ Regular parsing of the input data stream
∙ Simple, expressive programming model for stream processing, with strong theoretical foundations and fast evaluation algorithms

30

slide-56
SLIDE 56

quantitative regular expressions

Languages Σ∗ → bool ≡ Regular expressions
Cost functions Σ∗ → R ≡ QREs
∙ Also works for Σ∗ → N, Σ∗ → Q, …
∙ Σ∗ → D, for arbitrary cost domain D
∙ Key insight: Generalizing to string-to-term transformations

31

slide-57
SLIDE 57

conclusion

∙ Expressively equivalent to regular cost functions / cost register automata
∙ Fast one-pass evaluation algorithms for unambiguous expressions
∙ Low space usage if only operations used are ∗, +, −, min, max, avg

32

slide-58
SLIDE 58

conclusion

future work

∙ Approximate evaluation algorithms for certain operations such as iter-median (Jointly with Sanjeev Khanna and Kostas Mamouras)
∙ Exploring connections to streaming databases

33

slide-59
SLIDE 59

fin!

questions, comments, brickbats?