A Framework for Testing Query Transformation Rules HICHAM G. - - PowerPoint PPT Presentation

a framework for testing query
SMART_READER_LITE
LIVE PREVIEW

A Framework for Testing Query Transformation Rules HICHAM G. - - PowerPoint PPT Presentation

A Framework for Testing Query Transformation Rules HICHAM G. ELMONGUI, VIVEK NARASAYYA, RAVI RAMAMURTHY SIGMOD09 Presenter: ZHANG, Yushan @Prism 1 2020/10/8 Outline Background of this research The Challenges Framework Design


slide-1
SLIDE 1

A Framework for Testing Query Transformation Rules

HICHAM G. ELMONGUI, VIVEK NARASAYYA, RAVI RAMAMURTHY SIGMOD’09 Presenter: ZHANG, Yushan @Prism

2020/10/8

1

slide-2
SLIDE 2

Outline

➢ Background of this research ➢ The Challenges ➢ Framework Design ➢ Experiment ➢ Reflections

2020/10/8

2

slide-3
SLIDE 3

Query Optimization: Overview

2020/10/8

3

Two Requirements: ➢Produce faster plans ➢Guarantee correct results

https://docs.oracle.com/database/121/TGSQL/tgsql_optcncpt.htm#TGSQL196

Background

slide-4
SLIDE 4

Transformation Rule: example

Let 𝐹 denote a relational algebra expression, 𝜄 is a predicate E.g. 𝜏𝜄1∧𝜄2 𝐹 ≡ 𝜏𝜄1(𝜏𝜄2𝐹), a cascade of 𝜏 𝐹1 ⋈ 𝜄 𝐹2 ≡ 𝐹2 ⋈𝜄 𝐹1, theta-join is commutative

Database System Concepts, 7th edition. P747-751

Background

2020/10/8

4

slide-5
SLIDE 5

Testing Aspects

➢ Coverage

➢ a given transformation rule (rule) should be executed with different queries

➢ Correctness

➢ a given transformation rule should NOT alter the results of a query ➢ a pair of transformation rules should NOT alter the results of a query

➢ Performance… (not considered in this paper)

2020/10/8

5

Background

slide-6
SLIDE 6

The Challenges

➢ Efficiently generate queries that exercise a particular rule (Generation Problem)

➢ Randomly generation [1][2] takes many trials before find a query - slow ➢ Randomly generated queries are hard to interpret - complex

➢ Efficiently execute test suites for correctness testing (Compression Problem) ➢ Intuitive approach: turn off the rules one by one, and compare the results ➢ Problem: un-optimized queries could be extremely slow

2020/10/8

6

Background The Challenges

[1] H. Bati, L. Giakoumakis, S.Herbert, A.Surna. A genetic approach for random testing of database systems. Proceedings of VLDB 2007. [2] D.Slutz, Massive Stochastic Testing of SQL. Proceeding of VLDB 1998.

slide-7
SLIDE 7

Generation with Rule Patterns

➢ Captures the sufficient condition for a rule to execute is hard

➢ Pull up GBAgg requires certain functional dependency

➢ Use Rule Patterns (necessary condition)

➢ A query exercising the given rule must have the corresponding pattern ➢ Initialize the “wholes” in the pattern ➢ Support rule composition (extends to test multiple rules)

Take-away message:

➢ reduce the trials with known necessary conditions

2020/10/8

7

Background The Challenges

Approach

slide-8
SLIDE 8

Test Suite Compression (TSC)

➢ Very high cost for executing the test suite

➢ A query could exercise multiple rules ➢ A rule is exercised by multiple rules

Intuition (bipartite graph problem):

A set of transformation rules: 𝑆 = 𝑠

1, 𝑠2, ⋯ 𝑠 𝑜

A test suite for a rule has 𝑙 queries Overall test suites for R: 𝑈𝑇 =ڂ𝑗 𝑈𝑇𝑗 (with cost) Nodes are rules and queries (with cost) Edge from 𝑠𝑗 to 𝑟𝑘 denotes 𝑠𝑗 is exercised when 𝑟𝑘 is optimized.

2020/10/8

8

Background The Challenges

Approach

𝑈𝑇1 = {𝑟1} 𝑈𝑇2 = {𝑟2} 𝐷𝑝𝑡𝑢(𝑈𝑇1) = 100 + 180 𝐷𝑝𝑡𝑢 𝑈𝑇2 = 100 + 120 𝐷𝑝𝑡𝑢(𝑈𝑇) = 𝐷𝑝𝑡𝑢(𝑈𝑇1) + 𝐷𝑝𝑡𝑢(𝑈𝑇2) and 𝑙 = 1 Goal: Minimize cost of TS Constraint: Each rule is exercised k times Question: what is the optimal solution for this example?

slide-9
SLIDE 9

Test Suite Compression (TSC)

➢ Very high cost for executing the test suite

➢ A query could exercise multiple rules ➢ A rule is exercised by multiple rules

Intuition (bipartite graph problem):

A set of transformation rules: 𝑆 = 𝑠

1, 𝑠2, ⋯ 𝑠 𝑜

A test suite for a rule has 𝑙 queries Overall test suites for R: 𝑈𝑇 =ڂ𝑗 𝑈𝑇𝑗 (with cost) Nodes are rules and queries (with cost) Edge from 𝑠𝑗 to 𝑟𝑘 denotes 𝑠𝑗 is exercised when 𝑟𝑘 is optimized.

2020/10/8

9

Background The Challenges

Approach

𝑈𝑇1 = {𝑟1} 𝑈𝑇2 = {𝑟2} 𝐷𝑝𝑡𝑢(𝑈𝑇1) = 100 + 180 𝐷𝑝𝑡𝑢 𝑈𝑇2 = 100 + 120 𝐷𝑝𝑡𝑢(𝑈𝑇) = 𝐷𝑝𝑡𝑢(𝑈𝑇1) + 𝐷𝑝𝑡𝑢(𝑈𝑇2) and 𝑙 = 1 Optimal solution: 𝐷𝑝𝑡𝑢(𝑈𝑇) = 100 + 120 + 120

slide-10
SLIDE 10

Test Suite Compression (TSC)

➢ Reduction from the Set Cover Problem to TSC (NP-Hard)

➢Only approximation algorithm is possible

➢ Two algorithms

➢ Applying the Set Cover Heuristic (ignores edge cost)

➢ Constrained Set Multicover algorithms (SMC)

➢ TopKIndependent algorithm (considers edge cost)

➢ Consider edge costs for each rule separately, sort them and use greedy picks ➢ Further optimization to boost the picking process

2020/10/8

10

Background The Challenges

Approach

slide-11
SLIDE 11

Framework Overview

2020/10/8

11

Background The Challenges Framework Design

slide-12
SLIDE 12

Experiment Setup

The approach is prototyped in Microsoft SQL Server Databases: TPC-H Used 30 rules of the optimizer ➢ The efficiency of query generation: random vs. patthern ➢ The effectiveness of two algorithms: SetMultiCover vs. TopKIndependent ➢ The importance of exploiting monotonicity (further optimization on TopKIndependent)

2020/10/8

12

Background The Challenges Framework Design

Experiments

slide-13
SLIDE 13

Random vs. Pattern based Generation

2020/10/8

13

Background The Challenges Framework Design

Experiments

slide-14
SLIDE 14

Compression Quality

2020/10/8

14

Background The Challenges Framework Design

Experiments

Y-axis: Log scale Optimizer estimated cost

slide-15
SLIDE 15

Summary

This papers considers the problem of testing transformation rules of a query optimizer Main Contributions: ➢ Efficiently generate queries to cover the given rules

➢ Use query patterns to reduce the random generations trials

➢ Efficiently execute test suites for correctness testing

➢ Use approximation to pick a subset of the queries

2020/10/8

15

slide-16
SLIDE 16

Reflections

➢ This paper has a very clean and clear writing style, which worth a detailed reading ➢ There is a well-established logic between the defined problem and the solution ➢ A typical example is used throughout the paper ➢ The approach illustration is simple to understand without the theory

2020/10/8

16