scaling up symbolic reasoning for relational queries
play

Scaling Up Symbolic Reasoning for Relational Queries or, speeding - PowerPoint PPT Presentation

Scaling Up Symbolic Reasoning for Relational Queries or, speeding up debugging & verification of database queries Chenglong Wang , Alvin Cheung, Ras Bodik University of Washington 1 Relational Queries The language between human and


  1. Scaling Up Symbolic Reasoning for Relational Queries or, speeding up debugging & verification of database queries Chenglong Wang , Alvin Cheung, Ras Bodik University of Washington � 1

  2. Relational Queries • The language between human and relational databases (tables) Select (filter & projection) Select val From T Where color = red; � 2

  3. Relational Queries • The language between human and relational databases (tables) Select Join (filter & projection) Select val From T T 1 Join T 2 On T 1 .color=T 2 .color; Where color = red; � 3

  4. Relational Queries • The language between human and relational databases (tables) Select Group Join (filter & projection) & Aggregation Select val Select color, Sum(val) From T From T T 1 Join T 2 On T 1 .color=T 2 .color; Where color = red; Group by color; � 4

  5. “The Count Bug” Rewrite rules for nested queries q 1 q 2 S’ = (Select S.c n , Agg (S.c m ) FROM S Select R.c k Group By S.cn): From R Where R.c h = (Select Agg (S.c m ) Select R.c k From S From R Where R.c h = (Select Agg (S’.c m ) Where S.c n = S.c p ); From S’ 1982 “On Optimizing an Where T’.c n = R.c p ); SQL-like Nested Query” (Kim Won) � 5

  6. “The Count Bug” Rewrite rules for nested queries q 1 q 2 S’ = (Select S.c n , Agg (S.c m ) FROM S Select R.c k Group By S.cn): From R Where R.c h = (Select Agg (S.c m ) Select R.c k From S From R Where R.c h = (Select Agg (S’.c m ) Where S.c n = S.c p ); From S’ 1982 “On Optimizing an Where T’.c n = R.c p ); SQL-like Nested Query” (Kim Won) Found a bug in the 1982 paper! q 1 q 2 1987 “Optimization of Nested SQL Queries Revisited” (Ganski & Wong) � 6

  7. Reasoning Tasks Verification q ≡ q’ q, q’ “Are two queries equivalent on ALL inputs” Property Checking (for optimization) q q(T) = empty “Can the query return empty output on SOME input?” Mutation testing / Grading q, q’ q(T) ≠ q’(T) “Find a distinguishing input between queries.” � 7

  8. Relational Queries tens to hundreds of HUGE tables …… complex analytical functions generated by computer highly optimized “analyze transition plays important roles in industry history” can’t a ff ord 5 years to find a bug! � 8

  9. Automatic Reasoning Check whether q 1 is equivalent to q 2 (on ALL inputs) assert q 1 ≠ q 2 q 1 , q 2 q 1 , q 2 assert q 1 ≠ q 2 (queries) (queries) (property) (property) found T, found T, unsatisfiable unsatisfiable q 1 (T) ≠ q 2 (T) q 1 (T) ≠ q 2 (T) (proved q 1 =q 2 ) (proved q 1 =q 2 ) � 9

  10. Automatic Reasoning (unbounded) equivalence is undecidable Check whether q 1 is equivalent to q 2 (on ALL inputs) assert q 1 ≠ q 2 q 1 , q 2 q 1 , q 2 assert q 1 ≠ q 2 (queries) (queries) (property) (property) found T, found T, unsatisfiable unsatisfiable q 1 (T) ≠ q 2 (T) q 1 (T) ≠ q 2 (T) (proved q 1 =q 2 ) (proved q 1 =q 2 ) � 10

  11. Symbolic Reasoning Check whether q 1 is equivalent to q 2 (on ALL inputs within a search space) tables with at most assert q 1 ≠ q 2 q 1 , q 2 q 1 , q 2 assert q 1 ≠ q 2 k rows (queries) (queries) (property) (property) (search space) found T, found T, unsatisfiable unsatisfiable q 1 (T) ≠ q 2 (T) q 1 (T) ≠ q 2 (T) (proved q 1 =q 2 ) (proved q 1 =q 2 ) � 11

  12. Symbolic Reasoning “Check whether q 1 , q 2 are equivalent on ALL tables with at most k tuples” (1) Target queries q 2 : Select id, val q 1 : Select id, val From T From T Where id ≠ 1 Where id > 1 (2) Search space tables with at most k rows Solver (3) Property q 1 (T) ≠ q 2 ( T) � 12

  13. Symbolic Reasoning “Check whether q 1 , q 2 are equivalent on ALL tables with at most k tuples” (1) Target queries q 2 : Select id, val q 1 : Select id, val From T From T id val Where id ≠ 1 Where id > 1 0 1 1 2 (2) Search space tables with at q 1 q 2 most k rows Solver id val id val (3) Property (empty) 0 1 q 1 (T) ≠ q 2 ( T) � 13

  14. Symbolic Reasoning “Check whether q 1 , q 2 are equivalent on ALL inputs within size k” Grouping (1) Target queries & aggregation q 1 , q 2 “ Select f(val) From T (2) Search space Group By id ” tables with at most k rows Solver (3) Property T out1 ≠ T out2 � 14

  15. Symbolic Reasoning “Check whether q 1 , q 2 are equivalent on ALL inputs within size k” id val Grouping 1 y 1 (1) Target queries & aggregation 1 y 2 … q 1 , q 2 “ Select f(val) 1 y k From T (2) Search space Group By id ” … id val id val Exponential ways to 1 y 1 tables with at x 1 y 1 1 y 2 partition the table most k rows x 2 y 2 … … 2 y k x k y k Solver id val (3) Property 1 y 1 T out1 ≠ T out2 2 y 2 … k y k � 15

  16. Symbolic Reasoning “Check whether q 1 , q 2 are equivalent on ALL inputs within size k” id val Grouping 1 y 1 (1) Target queries & aggregation 1 y 2 … q 1 , q 2 “ Select f(val) 1 y k From T (2) Search space Group By id ” … id val id val Exponential ways to 1 y 1 tables with at x 1 y 1 1 y 2 partition the table most k rows x 2 y 2 … … 2 y k x k y k Solver id val (3) Property 1 y 1 Computationally T out1 ≠ T out2 2 y 2 expensive … T out1 ⊂ T out2 & T out2 ⊂ T out1 k y k � 16

  17. Symbolic Reasoning “Check whether q 1 , q 2 are equivalent on ALL inputs within size k” id val Grouping 1 y 1 (1) Target queries & aggregation 1 y 2 … q 1 , q 2 “ Select f(val) 1 y k From T (2) Search space Group By id ” … Unsatisfying id val Scalability id val Exponential ways to 1 y 1 tables with at x 1 y 1 1 y 2 partition the table most k rows x 2 y 2 … … 2 y k x k y k Solver id val (3) Property 1 y 1 Computationally T out1 ≠ T out2 2 y 2 expensive … T out1 ⊂ T out2 & T out2 ⊂ T out1 k y k � 17

  18. Symbolic Reasoning “Check whether q 1 , q 2 are equivalent on ALL inputs within size k” Grouping (1) Target queries & aggregation “Small Model” q 1 , q 2 “ Select f(val) A smaller search space From T to achieve same (2) Search space Group By id ” reasoning guarantee Exponential ways to tables with at partition the table most k rows Solver (3) Property Computationally T out1 ≠ T out2 expensive T out1 ⊂ T out2 & T out2 ⊂ T out1 � 18

  19. Space Refinement “Check whether q 1 , q 2 are equivalent on ALL inputs within size k” T out1 ≠ q 1, q 2 T out2 (queries) (property) tables with at most k rows tables with at most k rows S’ (refined search space) S (search space) “Small Model” (1) If exists T ∈ S satisfying the property, we can find one in the S’ too. (2) If none of tables in S’ satisfying the property, then no T exists in S too. � 19

  20. Space Refinement “Check whether q 1 , q 2 are equivalent on ALL inputs within size k” T out1 ≠ q 1, q 2 T out2 (queries) (property) provenance analysis tables with at most k rows tables with at most k rows S’ (refined search space) S (search space) “Small Model” (1) If exists T ∈ S satisfying the property, we can find one in the S’ too. (2) If none of tables in S’ satisfying the property, then no T exists in S too. � 20

  21. Insight from Property • Many properties requires only one tuple in the output to invalidate. T from search space S “Check whether q 1 , q 2 are equivalent” q 1 (T) ≠ q 2 ( T ) q 1 q 2 Exists a row r with di ff erent r multiplicities in T out1 and T out2 T out1 , r ∉ T out2 → q 1 (T) ≠ q 2 ( T ) r ∈ � 21

  22. Insight from the Property • Many important properties requires only one tuple in the output to be invalidated. T from search space S T’ q 1 q 2 q 1 q 2 r r r ∈ T out1 , r ∉ T out2 r ∈ T out1 , r ∉ T out2 T’ can also distinguish q1 from q2! � 22

  23. Provenance Analysis Assume r=(a, b) is the output tuple showing the di ff erence between two queries r ∈ T out1 , r ∉ T out2 q 1 id val q 1 : Select id, max(val) … From T id val Group By id ? q 2 : Select id, min(val) id val q 2 From T a b Group By id … � 23

  24. Provenance Analysis Assume r=(a, b) is the output tuple showing the di ff erence between two queries r ∈ T out1 , r ∉ T out2 q 1 id val q 1 : Select id, max(val) … From T id val Group By id ??? q 2 : Select id, min(val) … id val q 2 From T a b Group By id … � 24

  25. Provenance Analysis Assume r=(a, b) is the output tuple showing the di ff erence between two queries r ∈ T out1 , r ∉ T out2 q 1 id val q 1 id val q 1 : Select id, max(val) … a c From T id val Group By id ??? id val a ? q 2 : Select id, min(val) … a ? id val q 2 From T id val q 2 a b Group By id a b … T’={r ∈ T| r.id = a} � 25

  26. Provenance Analysis Assume r=(a, b) is the output tuple showing the di ff erence between two queries r ∈ T out1 , r ∉ T out2 q 1 id val q 1 id val q 1 : Select id, max(val) … a c From T id val Group By id ??? id val a ? q 2 : Select id, min(val) … a ? id val q 2 From T id val q 2 a b Group By id a b … T’={r ∈ T| r.id = a} refine tables with at S most k rows tables with at most k rows S’ = {T ∈ S | T contain only one group} (the group with id “ a ”) � 26

Download Presentation
Download Policy: The content available on the website is offered to you 'AS IS' for your personal information and use only. It cannot be commercialized, licensed, or distributed on other websites without prior consent from the author. To download a presentation, simply click this link. If you encounter any difficulties during the download process, it's possible that the publisher has removed the file from their server.

Recommend


More recommend