SLIDE 3 10/30/17 3
[Figure: scatterplot (Revenue, Profit) and histograms (Price, Product)]
Smoke: Fast Lineage + Interactions
backward_trace() view_refresh() refresh(backward_trace( ,input))
SPLOT = SELECT 8 AS radius, 'gray' AS stroke, 'gray' AS fill,
               lscale(revenue, sx) AS center_x,
               lscale(profit, sy) AS center_y
        FROM A, B, sx, sy WHERE …;
HIST = SELECT 4 AS width, 'blue' AS fill,
              hscale(price, hx) AS height
       FROM B, C, hx WHERE …;
render(SELECT * FROM SPLOT);
render(SELECT * FROM HIST);
BT = BACKWARD TRACE FROM HIST@vnow-1 AS HS, clicked
     WHERE clicked.id = HS.id TO A;
SPLOT = SELECT ..., 'red' AS fill FROM BT, B WHERE …
        UNION
        SELECT ..., 'gray' AS fill FROM (A EXCEPT BT), B WHERE …;
HIST = SELECT ..., 'red' AS fill FROM BT, C WHERE …
       UNION
       SELECT ..., 'blue' AS fill FROM (A EXCEPT BT), C WHERE …;
interaction(vis(database))
SQL(Lineage( )) SQL
Fine-grained Lineage Capture
Query: γ_{id, sum(qty∗$)}(A ⨝ B)

A
      id    $
a1     1   40
a2     2    5

B
      id  qty
b1     1    6
b2     1    1
b3     2    9

A ⨝ B
      id  qty    $
j1     1    6   40
j2     1    1   40
j3     2    9    5

γ (output)
      id  sum
       1  280
       2   45
Fine-grained Lineage Capture
Capture the lineage graph with low overhead to answer lineage queries efficiently
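As a rough illustration of what a captured lineage graph for this example might contain, the sketch below evaluates the join and aggregation by hand and records, for each output group, the positions of the A and B rows that fed it. The function name `run_with_lineage` and the dictionary layout are assumptions made for illustration, not the data structures of any particular system.

```python
from collections import defaultdict

# Example relations from the slide: A(id, $) and B(id, qty).
A = [(1, 40), (2, 5)]          # a1, a2
B = [(1, 6), (1, 1), (2, 9)]   # b1, b2, b3


def run_with_lineage(a_rows, b_rows):
    """Compute sum(qty * $) grouped by id over A join B and record backward lineage.

    Returns (result, lineage): result maps id -> sum, and lineage maps
    id -> {"A": [row positions], "B": [row positions]} (hypothetical layout).
    """
    result = defaultdict(int)
    lineage = defaultdict(lambda: {"A": [], "B": []})
    for ai, (a_id, price) in enumerate(a_rows):
        for bi, (b_id, qty) in enumerate(b_rows):
            if a_id != b_id:
                continue  # not a join match
            result[a_id] += qty * price
            # Record each contributing A row once and every matching B row.
            if ai not in lineage[a_id]["A"]:
                lineage[a_id]["A"].append(ai)
            lineage[a_id]["B"].append(bi)
    return dict(result), dict(lineage)


result, lineage = run_with_lineage(A, B)
print(result)      # {1: 280, 2: 45}
print(lineage[1])  # {'A': [0], 'B': [0, 1]}
```

With such a structure, a backward lineage query for the output row with id = 1 is a lookup instead of a re-execution; the price is the bookkeeping done while the query runs.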
How do people capture lineage today?
Lazy, aka don't capture
Eager via query rewrites
Eager via instrumentation
Lazy Approach
Rewrite lineage queries into SQL
Backward_trace(o1, B) = σ_{id=1}(B), where o1 is the output row with id = 1
[Cui et al. and Ikeda et al.]
PROS
  No capture overhead
  Good for high-selectivity
CONS
  Bad for low-selectivity
  No support for non-invertible ops
  Complex rewrite predicates
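The lazy rewrite can be sketched with SQLite through Python's `sqlite3` module. This is a minimal illustration under stated assumptions: the column `price` stands in for the slide's `$` column, and the helper `backward_trace_b` is a hypothetical name, not part of any cited system.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE A (id INTEGER, price INTEGER);  -- price stands in for '$'
    CREATE TABLE B (id INTEGER, qty INTEGER);
    INSERT INTO A VALUES (1, 40), (2, 5);
    INSERT INTO B VALUES (1, 6), (1, 1), (2, 9);
""")

# Base query: nothing about lineage is recorded while it runs.
base = conn.execute(
    "SELECT A.id, SUM(B.qty * A.price) "
    "FROM A JOIN B ON A.id = B.id "
    "GROUP BY A.id ORDER BY A.id"
).fetchall()


def backward_trace_b(conn, output_id):
    """Lazy backward trace to B: re-query B using the output row's group key."""
    return conn.execute(
        "SELECT id, qty FROM B WHERE id = ? ORDER BY rowid", (output_id,)
    ).fetchall()


o1_lineage = backward_trace_b(conn, base[0][0])
print(base)        # [(1, 280), (2, 45)]
print(o1_lineage)  # [(1, 6), (1, 1)]
```

The trace is a selection over B, so its cost depends on how many rows the predicate has to examine, and it only works because the grouping key is enough to identify the contributing rows.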
Eager Logical Denormalized
              A            B
 id    $      pid    $     pid  qty
  1  280        1   40       1    6
  1  280        1   40       1    1
  2   45        2    5       2    9
Rewrite the original query into a single big query
⨝’
γ’
PROS
  Leverage DB query
  Flexibility
  Use existing database
CONS
  Introduces redundancy
  Result must be further processed
  Index result to use it
  Addtl projection to get real result
[Perm, GProM, and DBNotes]
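One way to picture the denormalized rewrite is to join the aggregate back to the rows it was computed from. The sketch below does this in SQLite via Python's `sqlite3`; it is an illustration only, not the rewrite rules of the cited systems, and the column names `price`, `total`, `a_pid`, and `b_pid` are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE A (id INTEGER, price INTEGER);  -- price stands in for '$'
    CREATE TABLE B (id INTEGER, qty INTEGER);
    INSERT INTO A VALUES (1, 40), (2, 5);
    INSERT INTO B VALUES (1, 6), (1, 1), (2, 9);
""")

# Single rewritten query: each output row is repeated once per contributing
# (A row, B row) pair, with the input columns appended to the output columns.
rewritten = """
    SELECT o.id, o.total, A.id AS a_pid, A.price, B.id AS b_pid, B.qty
    FROM (SELECT A.id AS id, SUM(B.qty * A.price) AS total
          FROM A JOIN B ON A.id = B.id
          GROUP BY A.id) AS o
    JOIN A ON A.id = o.id
    JOIN B ON B.id = o.id
    ORDER BY o.id, B.rowid
"""
rows = conn.execute(rewritten).fetchall()
for row in rows:
    print(row)
# (1, 280, 1, 40, 1, 6)
# (1, 280, 1, 40, 1, 1)
# (2, 45, 2, 5, 2, 9)

# Additional projection (with duplicate elimination) to get the real result.
real_result = sorted({(r[0], r[1]) for r in rows})
print(real_result)  # [(1, 280), (2, 45)]
```

The output row for id = 1 appears twice, which is the redundancy listed under CONS, and the last step is the extra projection needed to recover the original query result.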