Scaling Up Symbolic Reasoning for Relational Queries or, speeding - - PowerPoint PPT Presentation

scaling up symbolic reasoning for relational queries
SMART_READER_LITE
LIVE PREVIEW

Scaling Up Symbolic Reasoning for Relational Queries or, speeding - - PowerPoint PPT Presentation

Scaling Up Symbolic Reasoning for Relational Queries or, speeding up debugging & verification of database queries Chenglong Wang , Alvin Cheung, Ras Bodik University of Washington 1 Relational Queries The language between human and


slide-1
SLIDE 1

Scaling Up Symbolic Reasoning for Relational Queries

Chenglong Wang, Alvin Cheung, Ras Bodik University of Washington

1

  • r, speeding up debugging & verification of database queries
slide-2
SLIDE 2

Relational Queries

  • The language between human and relational databases (tables)

Select (filter & projection)

2

Select val From T Where color = red;

slide-3
SLIDE 3

Relational Queries

  • The language between human and relational databases (tables)

Select (filter & projection) Join

3

Select val From T Where color = red; T1 Join T2 On T1.color=T2.color;

slide-4
SLIDE 4

Relational Queries

  • The language between human and relational databases (tables)

Select (filter & projection) Join Group & Aggregation

4

Select val From T Where color = red; T1 Join T2 On T1.color=T2.color; Select color, Sum(val) From T Group by color;

slide-5
SLIDE 5

“The Count Bug”

5

1982 “On Optimizing an SQL-like Nested Query” (Kim Won)

Rewrite rules for nested queries

Select R.ck From R Where R.ch = (Select Agg(S.cm) From S Where S.cn = S.cp); S’ = (Select S.cn, Agg(S.cm) FROM S Group By S.cn): Select R.ck From R Where R.ch = (Select Agg(S’.cm) From S’ Where T’.cn = R.cp);

q1 q2

slide-6
SLIDE 6

“The Count Bug”

6

1982 “On Optimizing an SQL-like Nested Query” (Kim Won) 1987 “Optimization of Nested SQL Queries Revisited” (Ganski & Wong)

Rewrite rules for nested queries

Select R.ck From R Where R.ch = (Select Agg(S.cm) From S Where S.cn = S.cp); S’ = (Select S.cn, Agg(S.cm) FROM S Group By S.cn): Select R.ck From R Where R.ch = (Select Agg(S’.cm) From S’ Where T’.cn = R.cp);

q1 q2 q1 q2

Found a bug in the 1982 paper!

slide-7
SLIDE 7

Reasoning Tasks

Mutation testing / Grading

“Find a distinguishing input between queries.”

q

Property Checking (for optimization)

“Can the query return empty output on SOME input?”

Verification

“Are two queries equivalent on ALL inputs”

q, q’ q, q’ q ≡ q’ q(T) = empty q(T) ≠ q’(T)

7

slide-8
SLIDE 8

Relational Queries

…… tens to hundreds of HUGE tables generated by computer complex analytical functions plays important roles in industry “analyze transition history” highly optimized

can’t afford 5 years to find a bug!

8

slide-9
SLIDE 9

Automatic Reasoning

9

q1, q2

(queries)

assert q1≠q2

(property)

unsatisfiable (proved q1=q2) found T, q1(T)≠q2(T) q1, q2

(queries)

assert q1≠q2

(property)

unsatisfiable (proved q1=q2) found T, q1(T)≠q2(T) Check whether q1 is equivalent to q2 (on ALL inputs)

slide-10
SLIDE 10

Automatic Reasoning

10

q1, q2

(queries)

assert q1≠q2

(property)

unsatisfiable (proved q1=q2) found T, q1(T)≠q2(T) q1, q2

(queries)

assert q1≠q2

(property)

unsatisfiable (proved q1=q2) found T, q1(T)≠q2(T) Check whether q1 is equivalent to q2 (on ALL inputs) (unbounded) equivalence is undecidable

slide-11
SLIDE 11

Symbolic Reasoning

11

q1, q2

(queries)

assert q1≠q2

(property)

unsatisfiable (proved q1=q2) found T, q1(T)≠q2(T) q1, q2

(queries)

assert q1≠q2

(property)

unsatisfiable (proved q1=q2) found T, q1(T)≠q2(T)

(search space)

tables with at most k rows

Check whether q1 is equivalent to q2 (on ALL inputs within a search space)

slide-12
SLIDE 12

Symbolic Reasoning

Solver

“Check whether q1, q2 are equivalent on ALL tables with at most k tuples”

12

(1) Target queries (2) Search space (3) Property

q1(T) ≠ q2(T)

tables with at most k rows

q1: Select id, val From T

Where id > 1

q2: Select id, val From T

Where id ≠ 1

slide-13
SLIDE 13

Symbolic Reasoning

Solver

id val 1 1

2

q1 id val (empty) q2 id val 1

13

(1) Target queries (2) Search space (3) Property

q1(T) ≠ q2(T)

tables with at most k rows

q1: Select id, val From T

Where id > 1

q2: Select id, val From T

Where id ≠ 1

“Check whether q1, q2 are equivalent on ALL tables with at most k tuples”

slide-14
SLIDE 14

Symbolic Reasoning

Solver

(1) Target queries (2) Search space (3) Property

Tout1 ≠ Tout2 q1, q2

tables with at most k rows

Grouping & aggregation “ Select f(val) From T

Group By id ”

14

“Check whether q1, q2 are equivalent on ALL inputs within size k”

slide-15
SLIDE 15

Symbolic Reasoning

Solver

(1) Target queries (2) Search space (3) Property

Tout1 ≠ Tout2 q1, q2

tables with at most k rows

Grouping & aggregation “ Select f(val) From T

Group By id ”

15

“Check whether q1, q2 are equivalent on ALL inputs within size k”

id val x1 y1 x2 y2 … xk yk id val 1 y1 2 y2 … k yk id val 1 y1 1 y2 … 1 yk id val 1 y1 1 y2 … 2 yk

Exponential ways to partition the table

slide-16
SLIDE 16

Symbolic Reasoning

Solver

(1) Target queries (2) Search space (3) Property

Tout1 ≠ Tout2 q1, q2

tables with at most k rows

Grouping & aggregation Computationally expensive Tout1 ⊂ Tout2 & Tout2 ⊂ Tout1 “ Select f(val) From T

Group By id ”

16

“Check whether q1, q2 are equivalent on ALL inputs within size k”

id val x1 y1 x2 y2 … xk yk id val 1 y1 2 y2 … k yk id val 1 y1 1 y2 … 1 yk id val 1 y1 1 y2 … 2 yk

Exponential ways to partition the table

slide-17
SLIDE 17

Symbolic Reasoning

Solver

(1) Target queries (2) Search space (3) Property

Tout1 ≠ Tout2 q1, q2

tables with at most k rows

Grouping & aggregation Computationally expensive Tout1 ⊂ Tout2 & Tout2 ⊂ Tout1 “ Select f(val) From T

Group By id ”

17

“Check whether q1, q2 are equivalent on ALL inputs within size k”

id val x1 y1 x2 y2 … xk yk id val 1 y1 2 y2 … k yk id val 1 y1 1 y2 … 1 yk id val 1 y1 1 y2 … 2 yk

Unsatisfying Scalability

Exponential ways to partition the table

slide-18
SLIDE 18

Symbolic Reasoning

Solver

(1) Target queries (2) Search space (3) Property

Tout1 ≠ Tout2 q1, q2

tables with at most k rows

Grouping & aggregation Computationally expensive Tout1 ⊂ Tout2 & Tout2 ⊂ Tout1 “ Select f(val) From T

Group By id ”

18

“Check whether q1, q2 are equivalent on ALL inputs within size k”

“Small Model” A smaller search space to achieve same reasoning guarantee Exponential ways to partition the table

slide-19
SLIDE 19

Space Refinement

“Small Model” (1) If exists T ∈ S satisfying the property, we can find one in the S’ too. (2) If none of tables in S’ satisfying the property, then no T exists in S too.

q1, q2

(queries)

Tout1 ≠ Tout2

(property) S (search space) tables with at most k rows

19

“Check whether q1, q2 are equivalent on ALL inputs within size k”

S’ (refined search space) tables with at most k rows

slide-20
SLIDE 20

Space Refinement

“Small Model” (1) If exists T ∈ S satisfying the property, we can find one in the S’ too. (2) If none of tables in S’ satisfying the property, then no T exists in S too.

q1, q2

(queries)

Tout1 ≠ Tout2

(property) S (search space) tables with at most k rows

provenance analysis

20

“Check whether q1, q2 are equivalent on ALL inputs within size k”

S’ (refined search space) tables with at most k rows

slide-21
SLIDE 21

Insight from Property

“Check whether q1, q2 are equivalent”

  • Many properties requires only one tuple in the output to invalidate.

q1(T) ≠ q2(T) Exists a row r with different multiplicities in Tout1 and Tout2 q1 q2

T from search space S

r r ∈ Tout1, r∉Tout2 → q1(T) ≠ q2(T)

21

slide-22
SLIDE 22

Insight from the Property

  • Many important properties requires only one tuple in the output to be invalidated.

q1 q2

T from search space S

r r ∈ Tout1, r∉Tout2 q1 q2

T’

r r ∈ Tout1, r∉Tout2

T’ can also distinguish q1 from q2!

22

slide-23
SLIDE 23

Provenance Analysis

23

id val ? id val … id val a b …

q2 q1

q1: Select id, max(val) From T

Group By id

q2: Select id, min(val) From T

Group By id

Assume r=(a, b) is the output tuple showing the difference between two queries

r ∈ Tout1, r∉Tout2

slide-24
SLIDE 24

Provenance Analysis

24

id val ??? … id val … id val a b …

q2 q1

q1: Select id, max(val) From T

Group By id

q2: Select id, min(val) From T

Group By id

Assume r=(a, b) is the output tuple showing the difference between two queries

r ∈ Tout1, r∉Tout2

slide-25
SLIDE 25

Provenance Analysis

25

id val ??? … id val … id val a b …

q2 q1

q1: Select id, max(val) From T

Group By id

q2: Select id, min(val) From T

Group By id

id val a ? a ? id val a c id val a b

q2 q1

T’={r ∈ T| r.id = a}

Assume r=(a, b) is the output tuple showing the difference between two queries

r ∈ Tout1, r∉Tout2

slide-26
SLIDE 26

Provenance Analysis

26

id val ??? … id val … id val a b …

q2 q1

q1: Select id, max(val) From T

Group By id

q2: Select id, min(val) From T

Group By id

id val a ? a ? id val a c id val a b

q2 q1

S tables with at most k rows

T’={r ∈ T| r.id = a}

S’ = {T ∈ S | T contain only one group} (the group with id “a”) tables with at most k rows refine

Assume r=(a, b) is the output tuple showing the difference between two queries

r ∈ Tout1, r∉Tout2

slide-27
SLIDE 27

Space Refinement

27

id val 1 2 1 3 1 3 id val 2 1 3 id val 1 1 3

q2 q1

q1: Select id, max(val) From T

Group By id

q2: Select id, min(val) From T

Group By id

id val 1 2 id val 2 id val 1

q2 q1

tables with at most k rows S’ = {T ∈ S | T contain only one group}

slide-28
SLIDE 28

Space Refinement

28

id val 1 2 1 3 1 3 id val 2 1 3 id val 1 1 3

q2 q1

q1: Select id, max(val) From T

Group By id

q2: Select id, min(val) From T

Group By id

id val 1 2 id val 2 id val 1

q2 q1

tables with at most k rows S’ = {T ∈ S | T contain only one group}

slide-29
SLIDE 29

Symbolic Provenance Analysis

29

  • Inductively define the analysis rules for

different operators

  • How to combine provenance from

multiple queries S → S’ If exists T ∈ S satisfying the property, we can find one in the S’ too. Analysis complexity: linear to the query size.

slide-30
SLIDE 30

Experiment

30

Bounded Verification

“Verify two queries are equivalent on ALL inputs.” q1(T) ≣ q2(T) Benchmarks: 46 rules from Apache Calcite

Test generation

“Can the query return empty output?” q(T) = empty “Find a distinguishing input between queries.” q1(T) ≠ q2(T) Benchmarks: 15 student submissions & prior work

slide-31
SLIDE 31

Experiment

31

Bounded Verification

“Verify two queries are equivalent on ALL inputs.” q1(T) ≣ q2(T) Benchmarks: 46 rules from Apache Calcite

Test generation

“Can the query return empty output?” q(T) = empty “Find a distinguishing input between queries.” q1(T) ≠ q2(T) Benchmarks: 15 student submissions & prior work

Process

Measure solving speed with and without space refinement

1. Increase search space size until hitting 10 minutes limit without refinement 2. Re-run the same search space with space refinement

slide-32
SLIDE 32

Experiment

32

Bounded Verification

“Verify two queries are equivalent on ALL inputs.” q1(T) ≣ q2(T) Benchmarks: 46 rules from Apache Calcite

Test generation

“Can the query return empty output?” q(T) = empty “Find a distinguishing input between queries.” q1(T) ≠ q2(T) Benchmarks: 15 student submissions & prior work

Process

Measure solving speed with and without space refinement

1. Increase search space size until hitting 10 minutes limit without refinement 2. Re-run the same search space with space refinement

Cosette SQL Solver Qex SQL Solver

slide-33
SLIDE 33

Experiment — Verification

33

Bounded Verification

“Verify that two queries are equivalent on ALL inputs with no more than k tuples.”

Result

Cosette with and without refinement Qex with and without refinement

slide-34
SLIDE 34

Experiment — Verification

34

Bounded Verification

“Verify that two queries are equivalent on ALL inputs with no more than k tuples.”

Result

Cosette with and without refinement Qex with and without refinement

  • Up to 400x speed up & little
  • verhead.
  • Speedup is independent

from solver implementation!

slide-35
SLIDE 35

Experiment — Verification

35

Bounded Verification

“Verify that two queries are equivalent on ALL inputs with no more than k tuples.”

Result

Benefit from exponential reduction of the number

  • f groups

Provenance is not able to refine search space

slide-36
SLIDE 36

Experiment - Test Generation

36

Test generation

“Can the query return empty output on SOME input?” “Find a distinguishing input between queries.”

Result

cosette with and without refinement qex with and without refinement

slide-37
SLIDE 37

Experiment - Test Generation

37

Test generation

“Can the query return empty output on SOME input?” “Find a distinguishing input between queries.”

Result

Distinguishing input size is small enough, benefit is marginal Reduction of the number of groups

slide-38
SLIDE 38

Limitations

38

  • Property supported are those can be invalidate by one tuple in the output.

q1(T) ≠ q2(T) q1(T) ≠ empty exists r, ϕ(q(T)) “q1(T) contains exactly 5 tuples” “every tuple in q(T) has same multiplicity” Generalize to arbitrary property

  • Improving provenance analysis precision.
slide-39
SLIDE 39

Summary

Scaling Up Symbolic Reasoning for Relational Queries (1) Symbolic Reasoning (2) Scaling Up q

(query)

ϕ

(property) (search space)

tables with at most k rows

provenance analysis (3) Result

(1) smaller search space & easier to traverse (2) equivalent for reasoning

Low analysis overhead & over 100x speed up in

(1) bounded verification (2) test generation

39

tables with at most k rows tables with at most k rows

(refined search space)

q1, q2

(queries)

assert q1≠q2

(property)

unsatisfiable (proved q1=q2) found T, q1(T)≠q2(T)

(search space)

tables with at most k rows

slide-40
SLIDE 40

Hidden Slides!

40

slide-41
SLIDE 41

41

Symbolic Provenance Analysis

Multiple provenance exists for tout for a query q

(strong) (weak) Select id, min(val) From T Where val > 0 Group By id

id val a

  • 1

a 1 a 3 c 1 … id val a

  • 1

a 1 a 3 b 1 … id val a

  • 1

a 1 a 3 b 1 … id val a 1 … id val a

  • 1

a 1 a 3 b 1 … id val a

  • 1

a 1 a 3 c 1 …

Choice of abstraction trades between the analysis overhead and pruning power