SLIDE 1
regular programming for quantitative properties of data streams
Rajeev Alur, Dana Fisman, Mukund Raghothaman
ESOP 2016
University of Pennsylvania
data streams
EOM P1 EOM P2 P2 P2 EOD P2 P1
SLIDE 2
SLIDE 6
data streams
web-page requests
∙ What is the maximum number of daily requests for P2? iter-max(day-countP2)
∙ What is the number of requests for P1 in an average month? iter-avg(month-countP1)
∙ What is the total number of requests for P1? countP1 = iter-sum(day-countP1)
∙ What is the maximum of the total number of requests for P1 and P2? max(countP1, countP2)
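As a point of reference, the four queries above can be computed directly in plain Python. This is only a sketch: the event names P1, P2, EOD and EOM follow the slides, while the helpers `split_on` and `count`, the sample stream, and the convention that a day (or month) is the run of events ending at an EOD (or EOM) marker are assumptions made for illustration.

```python
def split_on(stream, marker):
    """Split a stream into chunks, each ending at `marker` (marker dropped)."""
    chunks, current = [], []
    for event in stream:
        if event == marker:
            chunks.append(current)
            current = []
        else:
            current.append(event)
    return chunks  # events after the last marker are ignored


def count(events, page):
    """Number of requests for `page` in a list of events."""
    return sum(1 for e in events if e == page)


# Hypothetical sample stream: two months, three days in total.
stream = ["P1", "P2", "EOD", "P2", "P2", "P1", "EOD", "EOM",
          "P1", "EOD", "EOM"]

days = split_on(stream, "EOD")
months = split_on(stream, "EOM")

# iter-max(day-countP2)
max_daily_p2 = max(count(d, "P2") for d in days)
# iter-avg(month-countP1)
avg_monthly_p1 = sum(count(m, "P1") for m in months) / len(months)
# countP1 = iter-sum(day-countP1)
total_p1 = sum(count(d, "P1") for d in days)
# max(countP1, countP2)
max_total = max(total_p1, count(stream, "P2"))
```

Each query here is written by hand; the combinators introduced on the following slides aim to express the same computations compositionally.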
SLIDE 7
quantitative regular expressions
Languages (Σ∗ → bool) ≡ Regular expressions
Cost functions (Σ∗ → R) ≡ QREs
SLIDE 8
function combinators
SLIDE 9
function combinators
basic expressions, a → d
a → d
∙ If w = a, then output d, otherwise undefined
∙ P1 → 0, P1 → 1, EOD → 5, …
∙ Analogue of basic regular expressions
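A minimal Python sketch of the basic expression a → d. It assumes a QRE is modelled as a Python function from a list of symbols to a value, with `None` standing for "undefined"; the name `basic` is hypothetical.

```python
def basic(a, d):
    """QRE `a → d`: defined only on the one-symbol stream [a]."""
    def f(w):
        # None stands for "undefined" (an assumption of this sketch).
        return d if list(w) == [a] else None
    return f


p1_to_1 = basic("P1", 1)   # P1 → 1
eod_to_5 = basic("EOD", 5)  # EOD → 5
```

Under this encoding, `p1_to_1(["P1"])` yields 1, while any other stream (empty, longer, or a different symbol) is undefined.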
SLIDE 10
function combinators
f else g
∙ If f(w) is defined, then output f(w), otherwise output g(w)
∙ Analogue of regular expression union
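The conditional choice can be sketched in the same style. As before, the encoding of QREs as Python functions returning `None` when undefined, and the names `basic` and `else_`, are assumptions of this illustration.

```python
def basic(a, d):
    """QRE `a → d`: defined only on the one-symbol stream [a]."""
    def f(w):
        return d if list(w) == [a] else None
    return f


def else_(f, g):
    """QRE `f else g`: use f where it is defined, otherwise fall back to g."""
    def h(w):
        result = f(w)
        return result if result is not None else g(w)
    return h


# Maps the stream [P1] to 1 and the stream [P2] to 0.
page_cost = else_(basic("P1", 1), basic("P2", 0))
```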
SLIDE 11
function combinators
op
∙ f + g: Map w to f(w) + g(w)
∙ f − g: Map w to f(w) − g(w)
∙ max(f, g): Map w to max(f(w), g(w))
∙ · · ·
∙ op(f1, f2, …, fk): Map w to op(f1(w), f2(w), …, fk(w))
∙ Analogue of regular expression intersection
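A sketch of the pointwise cost operation, again assuming QREs are Python functions that return `None` when undefined. The helper name `op` mirrors the slide; `count_p1` and `count_p2` are hypothetical stand-ins for countP1 and countP2. The result is defined only where every argument is defined, which is where the analogy with intersection comes from.

```python
def op(fn, *fs):
    """QRE `op(f1, …, fk)`: apply fn to the values of f1 … fk on the same w."""
    def h(w):
        values = [f(w) for f in fs]
        if any(v is None for v in values):
            return None  # undefined unless every fi is defined on w
        return fn(*values)
    return h


def count_p1(w):
    return list(w).count("P1")


def count_p2(w):
    return list(w).count("P2")


max_count = op(max, count_p1, count_p2)            # max(countP1, countP2)
difference = op(lambda a, b: a - b, count_p1, count_p2)  # countP1 − countP2
```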
SLIDE 15
concatenation
split-op(f, g)
split-plus(f, g), split-max(f, g), …, split-op(f, g)
[Figure: the input is split as w1 w2; f is applied to w1 and g to w2; the Output combines the two results with +, max, …, op]
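A brute-force sketch of split-op(f, g), assuming QREs are Python functions returning `None` when undefined. It tries every split point and, as an assumption of this sketch, treats the result as undefined unless exactly one split works. The helpers `count_p1` and `any_five_to_zero` are hypothetical stand-ins for countP1 and Σ⁵ → 0.

```python
import operator


def split_op(fn, f, g):
    """QRE `split-op(f, g)`: split w = w1 w2 and return fn(f(w1), g(w2))."""
    def h(w):
        w = list(w)
        results = []
        for i in range(len(w) + 1):
            left, right = f(w[:i]), g(w[i:])
            if left is not None and right is not None:
                results.append(fn(left, right))
        # Assumption: defined only when the split is unique.
        return results[0] if len(results) == 1 else None
    return h


def count_p1(w):
    return w.count("P1")


def any_five_to_zero(w):
    # Defined (with value 0) exactly on streams of length five.
    return 0 if len(w) == 5 else None


# Counts P1 requests, ignoring the last five events.
count_p1_except_last_five = split_op(operator.add, count_p1, any_five_to_zero)
```

For example, on the eight-event stream P1 P2 P1 P1 P2 P2 P1 P2 only the first three events are counted, giving 2.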
SLIDE 16
iteration
iter-op(f)
[Figure: the input is split as w1 w2 … wk−1 wk, and the results on the pieces are combined with op]
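A brute-force sketch of iter-op(f) under the same encoding (QREs as Python functions returning `None` when undefined). It enumerates the ways to cut w into non-empty pieces on which f is defined and, as an assumption of this sketch, is undefined unless there is exactly one such parse or if the stream is empty. The function `day_count_p2` is a hypothetical stand-in for day-countP2.

```python
def iter_op(fold, f):
    """QRE `iter-op(f)`: split w into w1 … wk and fold [f(w1), …, f(wk)]."""
    def parses(w):
        # Yield the list [f(w1), …, f(wk)] for every valid split of w.
        if not w:
            yield []
            return
        for i in range(1, len(w) + 1):
            head = f(w[:i])
            if head is not None:
                for rest in parses(w[i:]):
                    yield [head] + rest

    def h(w):
        all_parses = list(parses(list(w)))
        # Assumption: undefined on ambiguous, failed, or empty parses.
        if len(all_parses) != 1 or not all_parses[0]:
            return None
        return fold(all_parses[0])
    return h


def day_count_p2(w):
    """Defined on a single day: events ending in EOD with no earlier EOD."""
    if not w or w[-1] != "EOD" or "EOD" in w[:-1]:
        return None
    return w.count("P2")


iter_max_p2 = iter_op(max, day_count_p2)  # iter-max(day-countP2)
iter_sum_p2 = iter_op(sum, day_count_p2)  # iter-sum(day-countP2)
```

This is a reference semantics for small inputs rather than an efficient evaluator.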
SLIDE 17
function combinators
attempt 1
Basic functions: a → d
Conditional choice: f else g
Concatenation: split-op(f, g)
Function iteration: iter-op(f)
Cost operations: op(f1, f2, …, fk)
∙ What about non-binary, non-commutative or non-associative operators?
∙ Are these operators “sufficient”? What if we add more operators?
SLIDE 22
parse trees and computation trees
split-plus(countP1, Σ⁵ → 0)
[Figure: parse tree with nodes ·, ∗ and Σ⁵ over the input symbols P1 … P2 P2 P1]
∙ Attempt 1 annotates the parse tree with cost operations to obtain the computation tree
∙ What if QREs map input streams to computation trees?
SLIDE 23
parse trees and computation trees
split-plus(countP1, Σ⁵ → 0)
[Figure: computation tree with + nodes over the leaves 1 … 1]
∙ Attempt 1 annotates the parse tree with cost operations to obtain the computation tree
∙ What if QREs map input streams to computation trees?
SLIDE 25
quantitative regular expressions
Key Insight: QREs map input streams to terms over the cost domain
∙ Terms contain parameters
∙ f(aaab) = 5 + p, where p is a parameter
∙ Parameter substitution is also an operation
SLIDE 27
function combinators
substitution
Parameter substitution, f[p/g]
∙ Say f(w) = t_f,
∙ and g(w) = t_g
∙ f[p/g](w) = t_f[p/t_g]
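A sketch of parameter substitution on terms. The representation is an assumption made for illustration: a term is a number, a parameter name (a string such as "p"), or a tuple `(operator, argument, …)`. The helpers `substitute`, `subst` and `evaluate` are hypothetical names.

```python
def substitute(term, param, replacement):
    """Return `term` with every occurrence of `param` replaced."""
    if term == param:
        return replacement
    if isinstance(term, tuple):
        operator_name, *args = term
        return (operator_name,
                *(substitute(a, param, replacement) for a in args))
    return term  # a constant, or a different parameter


def subst(f, param, g):
    """QRE `f[p/g]`: substitute the term g(w) for `param` in the term f(w)."""
    def h(w):
        tf, tg = f(w), g(w)
        if tf is None or tg is None:
            return None
        return substitute(tf, param, tg)
    return h


def evaluate(term):
    """Evaluate a closed term (one with no parameters left)."""
    if isinstance(term, tuple):
        operator_name, *args = term
        values = [evaluate(a) for a in args]
        return {"+": sum, "max": max, "min": min}[operator_name](values)
    return term


f = lambda w: ("+", 5, "p")  # f(w) = 5 + p, as on the slide
g = lambda w: 2              # hypothetical g with g(w) = 2
closed = subst(f, "p", g)("aaab")
```

Here `closed` is the term 5 + 2, which `evaluate` reduces to 7.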
SLIDE 29
function combinators
concatenation, take 2
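split(f →p g)
[Figure: the input is split as w1 w2; f is applied to w1 and g to w2; the result of f is passed to g through the parameter p to produce the Output]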
SLIDE 31
function combinators
concatenation, take 2
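Simulating split-plus(f, g)
[Figure: the input is split as w1 w2; f is applied to w1 and g′ to w2; the result of f is passed to g′ through the parameter p to produce the Output]
∙ split-plus(f, g) = split(f →p g′)
∙ g′(w) = p + g(w)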
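The simulation above can be sketched as follows. The sketch assumes that split(f →p g) substitutes f(w1) for p in the term g(w2), that it is defined only on a unique split, and that terms are numbers, parameter names, or tuples `(operator, argument, …)`. All helper names, including `day_count_p1`, are hypothetical.

```python
def substitute(term, param, replacement):
    """Return `term` with every occurrence of `param` replaced."""
    if term == param:
        return replacement
    if isinstance(term, tuple):
        operator_name, *args = term
        return (operator_name,
                *(substitute(a, param, replacement) for a in args))
    return term


def evaluate(term):
    """Evaluate a closed term built from + only (enough for this sketch)."""
    if isinstance(term, tuple):
        return sum(evaluate(a) for a in term[1:])
    return term


def split_right(f, param, g):
    """split(f →p g): on the unique split w1 w2, put f(w1) for p in g(w2)."""
    def h(w):
        w = list(w)
        results = []
        for i in range(len(w) + 1):
            tf, tg = f(w[:i]), g(w[i:])
            if tf is not None and tg is not None:
                results.append(substitute(tg, param, tf))
        return results[0] if len(results) == 1 else None
    return h


def day_count_p1(w):
    """Defined on a single day: events ending in EOD with no earlier EOD."""
    if not w or w[-1] != "EOD" or "EOD" in w[:-1]:
        return None
    return w.count("P1")


def g_prime(w):
    """g′(w) = p + g(w), with g = day_count_p1."""
    value = day_count_p1(w)
    return None if value is None else ("+", "p", value)


# split-plus(f, g) simulated as split(f →p g′), with f = g = day_count_p1.
two_day_total = split_right(day_count_p1, "p", g_prime)
term = two_day_total(["P1", "P1", "EOD", "P2", "P1", "EOD"])
```

The resulting term is 2 + 1, matching what split-plus would compute on the two days.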
SLIDE 32
function combinators
iteration, take 2
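[Figure: the input is split as w1 w2 … wk−1 wk; f is applied to each piece, and the parameter p links each application of f to the next]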
SLIDE 34
quantitative regular expressions
Basic functions: a → d, regex → term
Conditional choice: f else g
Concatenation: split(f →p g), split(f ←p g)
Function iteration: iter→(f), iter←(f)
Cost operations: op(f1, f2, …, fk), f[p/g]
∙ QREs map strings to terms
∙ Structural operators decoupled from cost operators
SLIDE 36
quantitative regular expressions
∙ Was our choice of combinators ad hoc? What functions can the formalism express?
∙ Given f and an input stream w, can we efficiently compute f(w)?
SLIDE 37
expressiveness
SLIDE 38
regular cost functions [lics 2013]
Languages (Σ∗ → bool) ≡ Finite automata
Cost functions (Σ∗ → R) ≡ ?
Regular languages have many appealing properties: many natural equi-expressive characterizations, robust closure properties, decidable analysis problems, practical utility
SLIDE 39
cost register automata [lics 2013]
[Figure: cost register automaton with states q0 (start) and q1 and a register bal; transition labels: d ≥ 0 / bal := bal + d; d = endm / bal := bal + 10; d < 0 / bal := bal + d; d ∈ R / bal := bal + d; d = endm]
∙ Finite state space, finitely many registers
∙ Registers hold terms: parameterized by set of operations
∙ Equivalent to MSO-definable string-to-term transducers
∙ Closed under regular lookahead, input reversal, etc.
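A small simulation of a cost register automaton with the transition labels listed above. The wiring of the transitions is an assumption, since the extracted slide does not show which label belongs to which edge: here q0 means "no withdrawal so far this month", a negative d moves to q1, endm in q0 adds a bonus of 10, and endm in q1 returns to q0 without a bonus. The initial value bal = 0 and the function name `run_cra` are also assumptions.

```python
def run_cra(stream):
    """Run the (assumed) two-state automaton and return the register bal."""
    state, bal = "q0", 0
    for d in stream:
        if d == "endm":
            if state == "q0":
                bal += 10  # d = endm / bal := bal + 10 (assumed: q0 only)
            state = "q0"   # a new month starts in q0
        else:
            bal += d       # every numeric d updates bal := bal + d
            if d < 0:
                state = "q1"  # a withdrawal was seen this month
    return bal


# First month has no withdrawal (bonus applies); second month has one.
balance = run_cra([5, 3, "endm", 4, -2, "endm"])
```

The automaton keeps one state and one register regardless of the stream length, which is the property the slide highlights.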
SLIDE 40
cost register automata
Theorem: QREs can express exactly the same functions as CRAs
SLIDE 41
cras → qres
Taste of the completeness proof
∙ Piggy-back on DFA to regular expression translation
  Construct Ri(q, q′): all strings from q to q′ while only traversing states less than qi
∙ Construct fi(q, q′, x): express final value of register x as a DReX expression
SLIDE 42
cras → qres
Taste of the completeness proof
∙ Capture data flows using “shapes”
∙ Construct a partial order over shapes, and use as basis for induction
[Figure: example shape over the registers x, y, z]
SLIDE 43
fast evaluation algorithms
SLIDE 44
evaluation algorithms
Given a QRE f and an input stream w, find f(w)
∙ QREs separate intent from evaluation
∙ If QREs are unambiguous, then f(w) can be computed with a single pass over w in time O(|w| · poly(|f|))
SLIDE 49
evaluation algorithms
split-plus(countP1, Σ⁵ → 0)
5 potential parse trees needed at each step
∙ Idea 1: Statically bound number of potential parse trees: O(poly(|f|)), irrespective of |w|
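To illustrate why bounded state suffices for this expression, here is a hand-written single-pass evaluator for split-plus(countP1, Σ⁵ → 0). It is a sketch specialised to this one expression, not the general evaluation algorithm from the talk; the class name and its methods are hypothetical. The state is a buffer of at most five pending symbols plus one counter, independent of the stream length.

```python
from collections import deque


class CountP1ExceptLastFive:
    """Streaming evaluator for split-plus(countP1, Σ⁵ → 0)."""

    def __init__(self):
        self.pending = deque()  # the most recent (at most five) symbols
        self.count = 0          # P1 occurrences before the pending symbols

    def push(self, symbol):
        """Consume one symbol of the stream."""
        self.pending.append(symbol)
        if len(self.pending) > 5:
            # The oldest symbol can no longer be among the last five.
            if self.pending.popleft() == "P1":
                self.count += 1

    def result(self):
        """Value on the stream seen so far; None if fewer than five symbols."""
        return self.count if len(self.pending) == 5 else None


evaluator = CountP1ExceptLastFive()
for symbol in ["P1", "P2", "P1", "P1", "P2", "P2", "P1", "P2"]:
    evaluator.push(symbol)
answer = evaluator.result()
```

Only the first three symbols of the eight-symbol stream are counted, so `answer` is 2.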
SLIDE 53
evaluation algorithms
[Figure: term compression, 3 + min(9, 2 + x) → min(3 + 9, 3 + 2 + x) → min(12, 5 + x)]
∙ Term compression ensures intermediate terms of bounded size
∙ Idea 2: If the only operators are ∗, +, −, min, max, avg, then space usage is polynomially bounded too!
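The compression step in the figure can be sketched for the special case of terms built from + and min over a single parameter x. Every such term can be kept in the normal form min(cap, shift + x), so it occupies two numbers no matter how many operations are applied. The class `MinPlusTerm` and its method names are hypothetical, and the sketch covers only + and min, not the full operator set from the slide.

```python
import math


class MinPlusTerm:
    """The term min(cap, shift + x) in one parameter x."""

    def __init__(self, cap=math.inf, shift=0):
        # The defaults represent the bare parameter x = min(inf, 0 + x).
        self.cap, self.shift = cap, shift

    def add(self, k):
        """k + min(cap, shift + x) = min(cap + k, (shift + k) + x)."""
        return MinPlusTerm(self.cap + k, self.shift + k)

    def min_with(self, k):
        """min(k, min(cap, shift + x)) = min(min(k, cap), shift + x)."""
        return MinPlusTerm(min(self.cap, k), self.shift)

    def apply(self, x):
        """Substitute a concrete value for the parameter x."""
        return min(self.cap, self.shift + x)


# Build 3 + min(9, 2 + x) step by step; it compresses to min(12, 5 + x).
term = MinPlusTerm().add(2).min_with(9).add(3)
```

After the three steps, `term.cap` is 12 and `term.shift` is 5, matching the last tree in the figure.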
SLIDE 54
conclusion
SLIDE 55
conclusion
∙ Introduced Quantitative Regular Expressions (QRE)
∙ Idea of function combinators
  ∙ Function descriptions are modular
  ∙ Regular parsing of the input data stream
∙ Simple, expressive programming model for stream processing, with strong theoretical foundations and fast evaluation algorithms
SLIDE 56
quantitative regular expressions
Languages (Σ∗ → bool) ≡ Regular expressions
Cost functions (Σ∗ → R) ≡ QREs
∙ Also works for Σ∗ → N, Σ∗ → Q, …
∙ Σ∗ → D, for arbitrary cost domain D
∙ Key insight: Generalizing to string-to-term transformations
SLIDE 57
conclusion
∙ Expressively equivalent to regular cost functions / cost register automata
∙ Fast one-pass evaluation algorithms for unambiguous expressions
∙ Low space usage if only operations used are ∗, +, −, min, max, avg
SLIDE 58
conclusion
future work
∙ Approximate evaluation algorithms for certain operations such as iter-median (jointly with Sanjeev Khanna and Kostas Mamouras)
∙ Exploring connections to streaming databases
SLIDE 59