CS411 balanced search tree Database Systems hash table R etc - PDF document

One-pass Algorithms Duplicate elimination δ (R) • Need to keep a dictionary in memory: CS411 – balanced search tree Database Systems – hash table R – etc • Cost: B(R) 12: Query Optimization Scan • Assumption: B( δ (R)) <= M before? Input buffer Kazuhiro Minami Output M-1 buffers buffer One-pass Algorithms Optimization • Step 1: convert the SQL query to some logical Grouping: γ city, sum(price) (R) plan – Remove subqueries from conditions • Need to keep a dictionary in memory – Map the SFW statement into RA expression • Also store the sum(price) for each city • Step 2: find a better logical plan, find an • Cost: B(R) associated physical plan • Assumption: number of cities fits in memory – Algebraic laws: • foundation for every optimization – Two approaches to optimizations: • Heuristics: apply laws that seem to result in cheaper plans • Cost based: estimate size and cost of intermediate results, search systematically for best plan

Converting from SQL to Logical Plans Select a 1 , …, a n Select a 1 , …, a n From R 1 , …, R k Π a1,…,an ( σ C (R 1 × R 2 × … × R k )) From R 1 , …, R k SQL –> Logical Query Plans Where C Where C Select a 1 , …, a n Select a 1 , …, a n From R 1 , …, R k From R 1 , …, R k Where C Where C Group by b 1 , …, b l Group by b 1 , …, b l Π a1,…,an ( γ b1, …, bm, aggs ( σ C (R 1 × R 2 × … × R k ))) Some nested queries can be flattened Converting Nested Queries Q: Give a list of product-manufacture pairs where the color of the product is blue and its prices is the Select distinct product.name highest among the products with blue color from Select distinct product.name From product that manufacture. From product Where product.maker in (Select company.name Where product.maker in (Select company.name Select distinct x.name, x.maker Select distinct x.name, x.maker From company From company From product x From product x where company.city=“Urbana”) where company.city=“Urbana”) Where x.color= “blue” Where x.color= “blue” AND x.price >= ALL (Select y.price AND x.price >= ALL (Select y.price Select distinct product.name Select distinct product.name From product y From product y From product, company From product, company Where x.maker = y.maker Where x.maker = y.maker Where product.maker = company.name AND Where product.maker = company.name AND AND y.color=“blue”) company.city=“Urbana” AND y.color=“blue”) company.city=“Urbana” Q: How do we convert this one to logical plan ?

Converting Nested Queries Converting Nested Queries Let’s compute the complement first: This one becomes a query without subqueries: Select distinct x.name, x.maker Select distinct x.name, x.maker From product x From product x Select distinct x.name, x.maker Where x.color= “blue” Select distinct x.name, x.maker Where x.color= “blue” AND x.price < SOME (Select y.price AND x.price < SOME (Select y.price From product x, product y From product x, product y From product y From product y Where x.color= “blue” AND x.maker = y.maker Where x.color= “blue” AND x.maker = y.maker Where x.maker = y.maker Where x.maker = y.maker AND y.color=“blue” AND x.price < y.price AND y.color=“blue”) AND y.color=“blue” AND x.price < y.price AND y.color=“blue”) This returns exactly the products we DON’T want, so… A set difference operator finishes the job Now rewrite the logical plan to an equivalent but better one Will probably run faster (Select x.name, x.maker Same query (Select x.name, x.maker answer From product x From product x Optimizer uses algebraic Heuristic : Where x.color = “blue”) Where x.color = “blue”) laws likely to result in cheaper plans EXCEPT EXCEPT Equivalen (Select x.name, x.maker (Select x.name, x.maker Original t From product x, product y From product x, product y Cost-based : Where x.color= “blue” AND x.maker = y.maker Where x.color= “blue” AND x.maker = y.maker estimate size and cost AND y.color=“blue” AND x.price < y.price) AND y.color=“blue” AND x.price < y.price) of intermediate results, search systematically for best plan

Algebraic Laws • Commutative and Associative Laws – R ∪ S = S ∪ R, R ∪ (S ∪ T) = (R ∪ S) ∪ T – R ∩ S = S ∩ R, R ∩ (S ∩ T) = (R ∩ S) ∩ T Algebraic Laws – R ⋈ S = S ⋈ R, R ⋈ (S ⋈ T) = (R ⋈ S) ⋈ T • Distributive Laws – R ⋈ (S ∪ T) = (R ⋈ S) ∪ (R ⋈ T) R( A,B,C,D ) S( E,F,G ) Algebraic laws about selections σ F=3 (R ⋈ D=E S) = (R ⋈ D=E σ F=3 (S)) σ C AND D (R) = σ C ( σ D (R)) = σ C (R) ∩ σ D (R) σ C OR D (R) = σ C (R) ∪ σ D (R) σ C (R ∪ S) = σ C (R) ∪ σ C (S) σ A=5 AND G=9 (R ⋈ D=E S) = σ A=5 ( σ G=9 (R ⋈ D=E S)) = σ A=5 (R ⋈ D=E σ G=9 (S)) = σ A=5 (R) ⋈ D=E σ G=9 (S) o o l l v v e e s s C C i i n n v v i i f f σ C (R ⋈ S) = σ C (R) ⋈ S t t e e s s t t r r i i b b u u l l y y a a t t o o n n σ C (R – S) = σ C (R) – S o o f f R R σ C (R ∩ S) = σ C (R) ∩ S

Algebraic laws for grouping and Algebraic laws for projection aggregation Π M (R ⋈ S) = Π N ( Π P (R) ⋈ Π Q (S)) SUM δ ( γ A, agg(B) (R)) = γ A, agg(B) (R) where N, P, Q are appropriate subsets of attributes of M COUNT γ A, agg(B) ( δ (R)) = γ A, agg(B) (R) , AVG Π M ( Π N (R)) = Π M,N (R) MIN if agg is duplicate insensitive MAX R(A,B,C,D) S(E,F,G) The book describes additional algebraic laws, but even the book doesn’t cover them all. Π A,B,G (R ⋈ D=E S) = Π ? ( Π ? (R) ⋈ D=E Π ? (S)) Heuristic Based Optimizations • Query rewriting based on algebraic laws • Result in better queries most of the time Heuristics-based Optimization • Heuristics number 1: - or – – Push selections down Do projections and selections as • Heuristics number 2: early as possible – Sometimes push selections up, then down

For each company with a product costing Predicate Pushdown more than $100, find the max price of its products pname Select y.name, y.address, Select y.name, y.address, Select y.name, y.address, Select y.name, y.address, pname Max(x.price) Max(x.price) Max(x.price) Max(x.price) σ price>100 AND city=“Urbana” From product x, company y From product x, company y From product x, company y From product x, company y Where x.maker=y.name and Where x.maker = y.name Where x.maker=y.name and Where x.maker = y.name maker=name x.price > 100 GroupBy y.name x.price > 100 GroupBy y.name GroupBy y.name Having Max(x.price) > 100 maker=name GroupBy y.name Having Max(x.price) > 100 city=“Urbana” price>100 Having Max(x.price) > 100 Having Max(x.price) > 100 Company Product Company Product •Advantage: the size of the join will be smaller. • Requires transformation rules specific to the grouping/aggregation (but may cause us to lose an important ordering operators. of the tuples, if we use indexes). • Won’t work if we replace Max by Min. Query Rewrite: Pushing predicates up Pushing predicates up Select V2.category, V2.cname, V2.price Select V2.category, V2.cname, V2.price From V1, V2 Select V2.cname, V2.price From V1, V2 Select V2.cname, V2.price From V1, V2 Where V1.category = V2.category and From V1, V2 Where V1.category = V2.category and V1.p = V2.price Where V1.category = V2.category and V1.p = V2.price Where V1.category = V2.category and V1.p = V2.price AND V1.p < 20 V1.p = V2.price AND V1.p < 20 V1: categories and the cheapest V2: Company name, product price of the product category, and the price of the in that category under $20 product made by that company Create View V1 AS Create View V1 AS Create View V1 AS Create View V1 AS Select x.category, Select x.category, Create View V2 AS Create View V2 AS Select x.category, Create View V2 AS Select x.category, Create View V2 AS Min(x.price) AS p Select y.cname, x.category, x.price Min(x.price) AS p Select y.cname, x.category, x.price Min(x.price) AS p Min(x.price) AS p Select y.cname, x.category, x.price Select y.cname, x.category, x.price From product x From product x, company y From product x From product x, company y From product x From product x, company y From product x From product x, company y Where x.price < 20 Where x.price < 20 Where x.maker=y.cname Where x.maker=y.cname Where x.price < 20 Where x.price < 20 Where x.maker=y.cname Where x.maker=y.cname GroupBy x.category GroupBy x.category GroupBy x.category GroupBy x.category

CS411 balanced search tree Database Systems hash table R etc - PDF document

One-pass Algorithms Duplicate elimination (R) Need to keep a dictionary in memory: CS411 balanced search tree Database Systems hash table R etc Cost: B(R) 12: Query Optimization Scan Assumption: B( (R)) <=

CS411: Two Perspectives on DBMS User perspective CS411 how to use a database system

01: Introduction Please read the class syllabus, policies, and lecture schedule; ask if you

CS411 than $50,000 and pay taxes in Champaign-Urbana. Database Systems Could write in

CS411 data Has similar capabilities for queries to those in Database Systems relational

CS411 Database Systems Join Expressions 06: SQL Kazuhiro Minami Join Expressions Products and

System Aspects of SQL (Chapter 9: Four more ways to make SQL calls CS411 from outside the DBMS)

CS411 Concurrency control Database Systems Recovery Logging Redo 13: Logging

CS411 Pick application Conceptual design domain Database Systems What data do I What data do

CS411 Whats the purpose of main memory buffer? Database Systems Why do we need a

CS411 Database Systems 04: Relational Schema Design Kazuhiro Minami Primary Goal: Minimize

CS411 Database Systems Foreign Keys Local and Global Constraints 06: SQL Triggers Kazuhiro

CS411 Database Systems Concurrency Control 16: Final Review Session Kazuhiro Minami

How Might the Pandemic Affect Health Premiums, Utilization, and Outcomes in 2021 and Beyond?

Job Costs Every job is custom quoted because variables such as actual electricity usage, roof

T HI NKI NG A BOUT C ONSERVATI ON P ARADI GMS . D EVASTATION IN THE D UST B OWL E RA

REGULATORY UPDATE AND THE PRCS ROLE IN REFORM CENTER FOR RESEARCH IN REGULATED INDUSTRIES:

Announcements Brief web service HW discussion on Friday More detailed requirements for

Outline Unions, intersections, differences (6.2.5, 6.4.2) Lecture 03: SQL Subqueries

COMP 110-003 Introduction to Programming More Methods Constructors, Overloading and Static

Software and Programming I Lab 7: Construction of a Simulated Cash Register and a Student Class

From Evolution to Revolution Strategies for contending with Business Disruption in For Purpose

Atomicity via Source-to-Source Translation Benjamin Hindman Dan Grossman University of

self build housing Ted Stevens NaSBA Chair What is Self/Custom Build? Who does it/what is

- 1 - Correct ness Robust ness Secur it y Overview Process qualit y Fast