Exploiting Incrementality with DBToaster



SLIDE 1

Exploiting Incrementality with DBToaster

SLIDE 2
SLIDE 3
SLIDE 4
SLIDE 5
SLIDE 6

Monitoring Programs

SLIDE 7

Network Monitoring

Server Status
Task Allocations
Task Properties
Servers Per Task > Task QOS?
Move Task to New Servers

SLIDE 8

Computational Advertising

User Clicks
Site Information
Available Ads
Which Ad To Show?
Good Ad Offers

SLIDE 9

Monitoring Programs

Views Reactions Actions State Updates Actions Internal State

Read On-Change

SLIDE 10

Monitoring Programs

Spec

Agile View

SLIDE 11
  • Existing Tools
  • DBToaster
  • Cumulus
SLIDE 12

Stream Processors

SLIDE 13

Stream Processors

SLIDE 14

Stream Processors

SLIDE 15

Stream Processors

SLIDE 16

Stream Processors

SLIDE 17

Stream Processors

SLIDE 18

Stream Processors

SLIDE 19

Stream Processors

No Persistent State

SLIDE 20

Stream Processors

No Persistent (but also dynamic) State

SLIDE 21

Incremental View Maintenance

QUERY

SLIDE 22

Incremental View Maintenance

QUERY := R ⋈ S ⋈ T

ON CHANGE:

SLIDE 23

Incremental View Maintenance

ON CHANGE: QUERY += Δ(R ⋈ S ⋈ T)

Simpler, but still slow
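A minimal Python sketch of why first-order maintenance is simpler but can still be slow. It assumes, purely for illustration, the schema R(A,B), S(B,C), T(C,D) with the aggregate SUM(A * D); the class and method names are hypothetical. Each insert evaluates only the delta query, yet that delta still scans and joins the other two stored relations.

```python
class FirstOrderIVM:
    """Maintains QUERY = SUM(A * D) over R(A,B) JOIN S(B,C) JOIN T(C,D)."""

    def __init__(self):
        self.R, self.S, self.T = [], [], []
        self.query = 0

    def on_insert_R(self, a, b):
        # Delta: the one new R tuple joined against all of S and T.
        self.query += sum(a * d
                          for (sb, c) in self.S if sb == b
                          for (tc, d) in self.T if tc == c)
        self.R.append((a, b))

    def on_insert_S(self, b, c):
        # Delta: matching R tuples times matching T tuples.
        sum_a = sum(a for (a, rb) in self.R if rb == b)
        sum_d = sum(d for (tc, d) in self.T if tc == c)
        self.query += sum_a * sum_d
        self.S.append((b, c))

    def on_insert_T(self, c, d):
        # Delta: the one new T tuple joined against all of R and S.
        self.query += sum(a * d
                          for (a, b) in self.R
                          for (sb, sc) in self.S if sb == b and sc == c)
        self.T.append((c, d))


def recompute(R, S, T):
    """Evaluates the query from scratch, for comparison."""
    return sum(a * d
               for (a, b) in R
               for (sb, c) in S if sb == b
               for (tc, d) in T if tc == c)
```

The delta is cheaper than `recompute`, but its cost still grows with the size of the stored relations.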

SLIDE 24
  • Existing Tools
  • DBToaster
  • Cumulus
SLIDE 25

DBToaster

C++

GCC

Spec

SLIDE 26

DBToaster

C++

GCC

Spec

SLIDE 27

DBToaster

  • Exploit Incrementality
  • Pick the Right Data Model
  • ... the right representation
  • ... understand the platform
  • Borrow (Liberally) from Other Fields
  • ... use set-at-a-time optimizations (PL)
  • ... generate machine code (Compilers)
  • ... and others

Deltas Materialization

Recursive View Compiler
Functional Optimizer
Materialization Optimizer
Delta Computation
Code Generators
Runtimes: C++, Hadoop, Cumulus, Multicore

SLIDE 28

Recursive Delta Compilation

ON ΔR: QUERY += Δ_ΔR(R ⋈ S ⋈ T)

SLIDE 29

Recursive Delta Compilation

ON ΔR: QUERY += Δ_ΔR(R ⋈ S ⋈ T)

Δ_ΔR(R⋈S⋈T) is (usually) simpler than R⋈S⋈T

Δ is closed

Δ_ΔR(R⋈S⋈T) (usually) has finite support

(Koch, PODS ’10)

SLIDE 30

Recursive Delta Compilation

R(A,B)  S(B,C)  T(C,D)

QUERY := SUM(R.A * T.D) over R(A,B) ⋈_B S(B,C) ⋈_C T(C,D)

SLIDE 31

Recursive Delta Compilation

ON +R(α,β):
    QUERY += SUM(α * T.D) over S(β,C) ⋈_C T(C,D)
           = α * ( SUM(T.D) over S(β,C) ⋈_C T(C,D) )

SLIDE 32

Recursive Delta Compilation

ON +R(α,β):
    QUERY += α * ( SUM(T.D) over S(β,C) ⋈_C T(C,D) )

m1[β] := SUM(T.D) over S(β,C) ⋈_C T(C,D)

SLIDE 33

Recursive Delta Compilation

ON +R(α,β):
    QUERY += α * m1[β]

ON +S(β’,ɣ):
    m1[β’] += SUM(T.D) over T(ɣ,D)

SLIDE 34

Recursive Delta Compilation

ON +R(α,β):
    QUERY += α * m1[β]

ON +S(β’,ɣ):
    m1[β’] += m2[ɣ]

m2[ɣ] := SUM(T.D) over T(ɣ,D)

SLIDE 35

Recursive Delta Compilation

ON +R(α,β):
    QUERY += α * m1[β]

ON +S(β’,ɣ):
    m1[β’] += m2[ɣ]

ON +T(ɣ’,δ):
    m2[ɣ’] += δ

SLIDE 36

Recursive Delta Compilation

ON +R(α,β):
    QUERY += α * m1[β]

ON +S(β’,ɣ):
    m1[β’] += m2[ɣ]

ON +T(ɣ’,δ):
    m2[ɣ’] += δ

[Diagram: views q, m1, and m2 linked by the +R, +S, and +T triggers]

SLIDE 37

[Diagram: hierarchy of views q and m1 through m9, with edges labeled +R, +S, +T]

View Hierarchy

SLIDE 38

Maintenance Program

ON +R[ A, B ]:
    QUERY[ ] += ( A * QUERY_dR[ B ] )
    QUERY_dT[ C ] += FORALL C:( A * QUERY_dR_dT[ B, C ] )
    QUERY_dS[ B ] += A

ON +S[ B, C ]:
    QUERY[ ] += ( QUERY_dS[ B ] * QUERY_dR_dS[ C ] )
    QUERY_dT[ C ] += QUERY_dS[ B ]
    QUERY_dR[ B ] += QUERY_dR_dS[ C ]
    QUERY_dR_dT[ B, C ] += 1

ON +T[ C, D ]:
    QUERY[ ] += ( QUERY_dT[ C ] * D )
    QUERY_dR[ B ] += FORALL B:( D * QUERY_dR_dT[ B, C ] )
    QUERY_dR_dS[ C ] += D
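The maintenance program above can be transcribed almost line for line into Python. This is a sketch for illustration, not DBToaster's generated C++. The class name, the attribute names (`q_dR` for `QUERY_dR`, and so on), and the `naive` checker are hypothetical, and only insertions are handled.

```python
from collections import defaultdict


class MaintenanceProgram:
    """Maintains QUERY = SUM(A * D) over R(A,B) JOIN S(B,C) JOIN T(C,D)."""

    def __init__(self):
        self.query = 0
        self.q_dR = defaultdict(int)     # QUERY_dR[B]: SUM(D) over S(B,C) JOIN T(C,D)
        self.q_dS = defaultdict(int)     # QUERY_dS[B]: SUM(A) over R(A,B)
        self.q_dT = defaultdict(int)     # QUERY_dT[C]: SUM(A) over R(A,B) JOIN S(B,C)
        self.q_dR_dS = defaultdict(int)  # QUERY_dR_dS[C]: SUM(D) over T(C,D)
        self.q_dR_dT = defaultdict(int)  # QUERY_dR_dT[B,C]: COUNT over S(B,C)

    def on_insert_R(self, a, b):
        self.query += a * self.q_dR[b]
        # FORALL C: visit every (B, C) entry whose B matches the new tuple.
        for (sb, c), n in list(self.q_dR_dT.items()):
            if sb == b:
                self.q_dT[c] += a * n
        self.q_dS[b] += a

    def on_insert_S(self, b, c):
        self.query += self.q_dS[b] * self.q_dR_dS[c]
        self.q_dT[c] += self.q_dS[b]
        self.q_dR[b] += self.q_dR_dS[c]
        self.q_dR_dT[(b, c)] += 1

    def on_insert_T(self, c, d):
        self.query += self.q_dT[c] * d
        # FORALL B: visit every (B, C) entry whose C matches the new tuple.
        for (b, sc), n in list(self.q_dR_dT.items()):
            if sc == c:
                self.q_dR[b] += d * n
        self.q_dR_dS[c] += d


def naive(R, S, T):
    """Recomputes the query from scratch, to check the triggers."""
    return sum(a * d
               for (a, b) in R
               for (sb, c) in S if sb == b
               for (tc, d) in T if tc == c)
```

No trigger ever joins the base relations: the top-level `QUERY` update is a single lookup and multiply, and the only loops are the `FORALL` scans over the auxiliary `QUERY_dR_dT` map.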

SLIDE 39

Maintenance Program

C++

(DBToaster; CIDR ’11)

SLIDE 40

But...

Nested Subqueries
Non-Equi-Joins

Δ_ΔR(R⋈S⋈T) is (usually) simpler than R⋈S⋈T

Δ_ΔR(R⋈S⋈T) (usually) has finite support

SLIDE 41

Nested Subqueries

R(A,B)  S(C)

QUERY := COUNT() over R(A,B) where A = ( SUM(C) over S(C) )

SLIDE 42

Nested Subqueries

Step 1: m1[] := SUM(C) over S(C)

Step 2: COUNT() over R(A,B) where A = [result of step 1]

SLIDE 43

Nested Subqueries

Step 2: COUNT() over R(A,B) where A = [result of step 1]

m2[A] := COUNT() over R(A,B)

QUERY := m2[ [result of step 1] ]
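A small Python sketch of this two-step materialization, under the assumption that only insertions occur; the class and method names are hypothetical. `m1` holds the inner aggregate, `m2` holds the outer count grouped by A, and the query result is a lookup of `m2` at the current value of `m1`.

```python
from collections import defaultdict


class NestedCount:
    """QUERY = COUNT() over R(A,B) where A = (SUM(C) over S(C))."""

    def __init__(self):
        self.m1 = 0                 # m1[]: SUM(C) over S(C)
        self.m2 = defaultdict(int)  # m2[A]: COUNT() over R(A,B), grouped by A

    def on_insert_S(self, c):
        self.m1 += c

    def on_insert_R(self, a, b):
        self.m2[a] += 1

    def query(self):
        # The comparison with the nested aggregate becomes a map lookup.
        return self.m2.get(self.m1, 0)
```

Neither map depends on the other, so each trigger is a constant-time update; the two parts are combined only when the result is read.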

SLIDE 44

Partial Materialization

Materialize the query in parts

m2[ m1[] ]

Perform computations at maintenance-time

SLIDE 45

Non-Equality Predicates

R(A)  S(B,C)

QUERY := COUNT() over R(A), S(B,C) where A < B

SLIDE 46

Non-Equality Predicates

ON +R(α):
    QUERY += COUNT() over S(B,C) where α < B

Partial Materialization

m1[B] := COUNT() over S(B,C) group by B

ON +R(α):
    QUERY += SUM(m1[B]) where α < B
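A Python sketch of this partially materialized trigger. The `+R` handler follows the slide. The `+S` handler and the `r_count` map are assumptions added so that the example is complete, and all names are hypothetical.

```python
from collections import defaultdict


class InequalityCount:
    """QUERY = COUNT() over R(A), S(B,C) where A < B."""

    def __init__(self):
        self.query = 0
        self.m1 = defaultdict(int)       # m1[B]: COUNT() over S(B,C) group by B
        self.r_count = defaultdict(int)  # assumed: COUNT() over R(A) group by A

    def on_insert_R(self, alpha):
        # The inequality cannot be turned into a key lookup, so the
        # trigger sums the materialized counts that satisfy it.
        self.query += sum(n for b, n in self.m1.items() if alpha < b)
        self.r_count[alpha] += 1

    def on_insert_S(self, b, c):
        # Assumed symmetric handler (not shown on the slide).
        self.query += sum(n for a, n in self.r_count.items() if a < b)
        self.m1[b] += 1
```

The trigger still iterates, but over one entry per distinct B value instead of one per S tuple.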

SLIDE 47

Partial Materialization

  • Nested Subqueries
  • Non-equality predicates
  • Memory Constraints
  • High Maintenance Cost
  • Specialized Datastructures

Materialization Optimizer

SLIDE 48

VWAP

[Figure: three panels plotted against Fraction of Stream Trace Processed: Rate of View Refreshing (Refreshes, 1000/s), Memory Usage (MB), and Cumulative Time (min). Series: Full Compilation, Depth 1 (IVM), Depth 0 (Repeated). Annotations: DBToaster; IVM & Naive.]

SELECT sum(b1.price * b1.volume)
FROM bids b1
WHERE 0.25 * (SELECT sum(b3.volume) FROM bids b3)
    > (SELECT sum(b2.volume) FROM bids b2 WHERE b2.price > b1.price);
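For reference, a direct Python reading of the VWAP query, re-evaluated from scratch on every call in the spirit of the "Depth 0 (Repeated)" baseline. The function name and the `(price, volume)` tuple layout are hypothetical. The sketch also assumes that a SUM over no rows evaluates to 0, whereas standard SQL would yield NULL.

```python
def vwap_naive(bids):
    """bids: list of (price, volume) pairs; re-scans everything per call."""
    total_volume = sum(v for _, v in bids)
    return sum(
        p * v
        for p, v in bids
        # Keep a bid if the volume offered at strictly higher prices
        # is less than a quarter of the total volume.
        if 0.25 * total_volume > sum(v2 for p2, v2 in bids if p2 > p)
    )
```

Each call costs time quadratic in the number of bids.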

SLIDE 49

TPC-H Q3

[Figure: three panels plotted against Fraction of Stream Trace Processed: Refreshes (1000/s), Memory (MB), and Time (min). Series: Full Compilation, Depth 1 (IVM), Depth 0 (Repeated).]

Half the Memory Usage

SELECT ORDERS.orderkey, ORDERS.orderdate, ORDERS.shippriority,
       SUM(extendedprice * (1 - discount))
FROM CUSTOMER, ORDERS, LINEITEM
WHERE CUSTOMER.mktsegment = 'BUILDING'
  AND ORDERS.custkey = CUSTOMER.custkey
  AND LINEITEM.orderkey = ORDERS.orderkey
  AND ORDERS.orderdate < DATE('1995-03-15')
  AND LINEITEM.SHIPDATE > DATE('1995-03-15')
GROUP BY ORDERS.orderkey, ORDERS.orderdate, ORDERS.shippriority;

SLIDE 50
  • Existing Tools
  • DBToaster
  • Cumulus
SLIDE 51

Maintenance Programs

ON +R[ A, B ]:
    QUERY[ ] += ( A * QUERY_dR[ B ] )
    QUERY_dT[ C ] += FORALL C:( A * QUERY_dR_dT[ B, C ] )
    QUERY_dS[ B ] += A

ON +S[ B, C ]:
    QUERY[ ] += ( QUERY_dS[ B ] * QUERY_dR_dS[ C ] )
    QUERY_dT[ C ] += QUERY_dS[ B ]
    QUERY_dR[ B ] += QUERY_dR_dS[ C ]
    QUERY_dR_dT[ B, C ] += 1

ON +T[ C, D ]:
    QUERY[ ] += ( QUERY_dT[ C ] * D )
    QUERY_dR[ B ] += FORALL B:( D * QUERY_dR_dT[ B, C ] )
    QUERY_dR_dS[ C ] += D

Data-Parallel Computations
Key/Value Style Datastructures

SLIDE 52

Execution Model

ON Event(param1, param2, …)
    Statement 1
    Statement 2
    Statement 3
    …

SLIDE 53

Statement Execution

Read [Old Version] → Compute → Write [New Version]

SLIDE 54

1 2

Epoch: 0

S

SLIDE 55

1 2

Epoch: 0

S:<0,0,0> S

SLIDE 56

1 2

Epoch: 0

S:<0,0,0>

SLIDE 57

1 2

Epoch: 0

R T

SLIDE 58

1 2

Epoch: 0

R:<0,0,1> R T:<0,1,0> T

SLIDE 59

1 2

Epoch: 0

R:<0,0,1> T:<0,1,0>

SLIDE 60

R

<0,1,3>

M2[1,3] += 2

<0,2,4>

SLIDE 61

History

E

< , 1 , 3 >

M2[…] += M3[…]*M4[…] M3[…] M4[…] M3[…]*M4[…] M2[…] += …

<0,1,3>

Σδ

R

<0,1,3>

SLIDE 62

M2[1,3] += 2

<0,2,4>

3

<0,2,5>

1

<0,3,1>

<1,3>→ 4

<0,1,3>

2

<0,1,2>

SLIDE 63

M2[1,3] += 2

<0,2,4>

3

<0,2,5>

1

<0,3,1>

<1,3>→ 4

<0,1,3>

2

<0,1,2>

History

E

< , 2 , 5 >

δ

<0,2,5>

2

<0,2,4>

SLIDE 64

Open Challenges

  • Data Placement
  • Migration
  • Batch Processing
  • Live Program Management
SLIDE 65