take everything from me but leave me the comprehension
play

Take Everything From Me, But Leave Me The Comprehension DBPL - PowerPoint PPT Presentation

Take Everything From Me, But Leave Me The Comprehension DBPL September 2017 Torsten Grust db.inf.uni-tuebingen.de Apologies, I am only a database person Torsten Grust U Tbingen 2 Apologies, I am only a database person It


  1. One Way to Teach SQL c ← ∅ ; ∅ SELECT A, B foreach x ∈ S do FROM S c ← c ⨄ {(x.A,x.B)}; ⨄ return c; c ← – ∞ ; – ∞ SELECT MAX(A) foreach x ∈ S do FROM S max 2 c ← max 2 (c,x.A); return c; c ← true; true 0 < ALL(SELECT A foreach x ∈ S do FROM S) c ← c ∧ (0 < x.A); ∧ return c; Torsten Grust U Tübingen 15

  2. One Way to Teach SQL ∅ c ← ∅ ; SELECT A, B foreach x ∈ S do FROM S ⨄ c ← c ⨄ {(x.A,x.B)}; return c; – ∞ c ← – ∞ ; SELECT MAX(A) foreach x ∈ S do FROM S max 2 c ← max 2 (c,x.A); return c; true c ← true; 0 < ALL(SELECT A foreach x ∈ S do ∧ FROM S) c ← c ∧ (0 < x.A); return c; Torsten Grust U Tübingen 15

  3. One Program Form for SQL Torsten Grust U Tübingen 16

  4. One Program Form for SQL c ← z ; foreach x ∈ xs do ≡ fold( z , f , xs ) c ← f (c , x ) ; return c; Torsten Grust U Tübingen 16

  5. One Program Form for SQL c ← z ; foreach x ∈ xs do ≡ fold( z , f , xs ) c ← f (c , x ) ; return c; lift M z M M carrier ⊕ M bag bag t { ⋅ } ∅ ⊎ set set t { ⋅ } ∅ ∪ list list t [ ⋅ ] [ ] + + all bool id ∧ true TRUE some bool id ∨ false FALSE sum num id + 0 max t (ordered) id max 2 - ∞ min t (ordered) id min 2 ∞ Torsten Grust U Tübingen 16

  6. One Program Form for SQL Torsten Grust U Tübingen 17

  7. One Program Form for SQL SELECT A fold( ∅ , ⊕ ,S) with FROM S (if (x.A > x.B) {x.A} ⊕ (c,x) = c ⊎ WHERE A > B else ∅ ) Torsten Grust U Tübingen 17

  8. One Program Form for SQL SELECT A fold( ∅ , ⊕ ,S) with FROM S (if (x.A > x.B) {x.A} ⊕ (c,x) = c ⊎ WHERE A > B else ∅ ) fold( ∅ , ⊕ ,R) with SELECT x.A,y.B ⊕ (c,x) = c ⊎ fold( ∅ , ⊗ ,S) with FROM R x,S y ⊗ (d,y) = d ⊎ {(x.A,y.B)} Torsten Grust U Tübingen 17

  9. fold(,,) Gets Ugly Quickly Torsten Grust U Tübingen 18

  10. fold(,,) Gets Ugly Quickly SELECT COUNT(*) FROM R x WHERE EXISTS (SELECT y FROM S y WHERE x.A = y.B) fold(0, ⊕ ,fold( ∅ , ⊗ ,R)) with with ⊕ (c,_) = c + 1 ⊗ (d,x) = d ⊎ if (fold(false, ⊙ ,S)) {x} else ∅ with ⊙ (e,y) = e ∨ (x.A = y.B) Torsten Grust U Tübingen 18

  11. fold(,,) Gets Ugly Quickly SELECT COUNT(*) FROM R x WHERE EXISTS (SELECT y FROM S y WHERE x.A = y.B) Algebraic Wonderland. fold(0, ⊕ ,fold( ∅ , ⊗ ,R)) with with ⊕ (c,_) = c + 1 ⊗ (d,x) = d ⊎ if (fold(false, ⊙ ,S)) {x} else ∅ with ⊙ (e,y) = e ∨ (x.A = y.B) Torsten Grust U Tübingen 18

  12. fold(,,) Gets Ugly Quickly SELECT COUNT(*) FROM R x WHERE EXISTS (SELECT y FROM S y WHERE x.A = y.B) Algebraic Wonderland. fold(0, ⊕ ,fold( ∅ , ⊗ ,R)) with REJECT! with ⊕ (c,_) = c + 1 ⊗ (d,x) = d ⊎ if (fold(false, ⊙ ,S)) {x} else ∅ with ⊙ (e,y) = e ∨ (x.A = y.B) Torsten Grust U Tübingen 18

  13. Comprehension Semantics Torsten Grust U Tübingen 19

  14. Comprehension Semantics [ e | ] M [ e | v 1 ← e 1 , q ] M [ e | p , q ] M Torsten Grust U Tübingen 19

  15. Comprehension Semantics ≡ [ e | ] M lift M ( e ) ≡ [ e | v 1 ← e 1 , q ] M ≡ [ e | p , q ] M Torsten Grust U Tübingen 19

  16. Comprehension Semantics ≡ [ e | ] M lift M ( e ) ≡ [ e | v 1 ← e 1 , q ] M fold( z M , ⊗ , e 1 ) with ⊗ (c, v 1 ) = c ⊕ M [ e | q ] M ≡ [ e | p , q ] M if ( p ) [ e | q ] M else z M Torsten Grust U Tübingen 19

  17. Comprehensible SQL Torsten Grust U Tübingen 20

  18. Comprehensible SQL SELECT COUNT(*) FROM R x EXISTS ( SELECT y WHERE FROM S y ) WHERE x.A = y.B Torsten Grust U Tübingen 20

  19. Comprehensible SQL SELECT COUNT(*) FROM R x EXISTS ( SELECT y WHERE FROM S y ) WHERE x.A = y.B [ y | y ← S, x.A = y.B ] bag Torsten Grust U Tübingen 20

  20. Comprehensible SQL SELECT COUNT(*) FROM R x EXISTS ( SELECT y WHERE FROM S y ) WHERE x.A = y.B [ true | _ ← ] some [ y | y ← S, x.A = y.B ] bag Torsten Grust U Tübingen 20

  21. Comprehensible SQL SELECT COUNT(*) FROM R x EXISTS ( SELECT y WHERE FROM S y ) WHERE x.A = y.B [ 1 | x ← R , [ true | _ ← ] some [ y | y ← S, x.A = y.B ] bag ] sum Torsten Grust U Tübingen 20

  22. Comprehensible SQL SELECT COUNT(*) FROM R x EXISTS ( SELECT y WHERE FROM S y ) WHERE x.A = y.B [ 1 | x ← R , [ true | _ ← ] some [ y | y ← S, x.A = y.B ] bag ] sum [ 1 | x ← R , [ x.A = y.B | y ← S ] some ] sum Torsten Grust U Tübingen 20

  23. Comprehension Unnesting [ e | qs 1 , v ← [ ] N , qs 3 ] M [ e | qs 1 , v ← [ e 2 ] N , qs 3 ] M [ e | qs 1 , v ← [ e 2 | qs 2 ] N , qs 3 ] M [ e | qs 1 , [ e 2 | qs 2 ] some , qs 3 ] M Torsten Grust U Tübingen 21

  24. Comprehension Unnesting [ e | qs 1 , v ← [ ] N , qs 3 ] M [ ] M [ e | qs 1 , v ← [ e 2 ] N , qs 3 ] M [ e [ e 2 / v ] | qs 1 , qs 3 [ e 2 / v ] ] M [ e | qs 1 , v ← [ e 2 | qs 2 ] N , qs 3 ] M [ e [ e 2 / v ] | qs 1 , qs 2 , qs 3 [ e 2 / v ] ] M [ e | qs 1 , [ e 2 | qs 2 ] some , qs 3 ] M ( ⊕ M idempotent) [ e | qs 1 , qs 2 , e 2 , qs 3 ] M Torsten Grust U Tübingen 21

  25. When Syntax Distracts On Optimizing an SQL-like Nested Query W. Kim, ACM TODS, 1982 Torsten Grust U Tübingen 22

  26. When Syntax Distracts On Optimizing an SQL-like Nested Query W. Kim, ACM TODS, 1982 Torsten Grust U Tübingen 22

  27. When Syntax Distracts On Optimizing an SQL-like Nested Query W. Kim, ACM TODS, 1982 Implemented in most RDBMSs to this day Torsten Grust U Tübingen 22

  28. When Syntax Distracts • Syntactic classification of nested SQL queries into 
 types N , Nx , D , J , A , JA , JA ( NA ), JA ( AA ), JA ( AN ), … • Classes are associated with their particular 
 SQL–level unnesting rewrites. Torsten Grust U Tübingen 23

  29. When Syntax Distracts • Syntactic classification of nested SQL queries into 
 types N , Nx , D , J , A , JA , JA ( NA ), JA ( AA ), JA ( AN ), … • Classes are associated with their particular 
 SQL–level unnesting rewrites. Torsten Grust U Tübingen 23

  30. When Syntax Distracts SELECT DISTINCT f ( x ) FROM R AS x WHERE p ( x ) IN (SELECT g ( y ) FROM S AS y WHERE q ( x , y ) ) Torsten Grust U Tübingen 24

  31. When Syntax Distracts SELECT DISTINCT f ( x ) FROM R AS x WHERE p ( x ) IN (SELECT g ( y ) FROM S AS y WHERE q ( x , y ) ) [ f ( x ) | x ← R , [ p ( x ) = v | v ← [ g ( y ) | y ← S , q ( x , y ) ] bag ] some ] set Torsten Grust U Tübingen 24

  32. When Syntax Distracts SELECT DISTINCT f ( x ) FROM R AS x WHERE p ( x ) IN (SELECT g ( y ) FROM S AS y WHERE q ( x , y ) ) [ f ( x ) | x ← R , [ p ( x ) = g ( y ) | y ← S , q ( x , y ) ] some ] set Torsten Grust U Tübingen 24

  33. When Syntax Distracts SELECT DISTINCT f ( x ) FROM R AS x WHERE p ( x ) IN (SELECT g ( y ) FROM S AS y WHERE q ( x , y ) ) [ f ( x ) | x ← R , y ← S , q ( x , y ), p ( x ) = g ( y ) ] set SELECT DISTINCT f ( x ) FROM R AS x, S AS y WHERE q ( x , y ) AND p ( x ) = g ( y ) Torsten Grust U Tübingen 24

  34. A Zoo of Query Representations Groupwise Processing of Relational Queries Damianos Chatziantoniou” Kenneth A. ROSS* Department of Computer Science, Columbia University damia,nos,kar@cs.columbia.edu Groupwise Processing of Relational Queries D. Chatziantoniou, K.A. Ross, VLDB 1997 1 Introduction With the recent interest in decision support systems and Abstract data warehousing has come a demand for techniques to evaluate and optimize very complex relational queries in- volving both aggregation and joins. Current commercial In this paper, we define and examine a particu- systems do not find good plans for many very complex lar class of queries called group queries. Group queries. queries are natural queries in many decision- In this paper we examine a particular class of complex support applications. The main characteristic of a queries, called group Many complex decision queries. group query is that it can be executed in a group- support queries describe the following idea: for each value z by-group fashion. In other words, the underlying in a dimension D (e.g. for each customer), evaluate a query relation(s) can be partitioned (based on some set Q’. This query Q’ can be something simple (e.g. compute of attributes) into disjoint groups, and each group avg(sales)) or complex (e.g. include joins, selections, fur- can be processed separately. We give a syntactic Torsten Grust U Tübingen 25 ther aggregations, etc). To specify this kind of complex criterion to identify these queries and prove its query in SQL, one has to embed D within many places sufficiency. We also prove the strong result that of a complex piece of SQL code. This may have many every group query has an equivalent formulation drawbacks in terms of performance if the optimizer is not that satisfies our syntactic criterion. We describe aware of the query’s structure. This particular structure a general evaluation technique for group queries, is often amenable to the following evaluation strategy: and demonstrate how an optimizer can determine partition the data along dimension D, and evaluate Q’ this plan. We then consider more complex queries independently on each partition. Therefore it is important whose components are group queries with poten- to be able to identify whether an SQL specification has the tially different partitioning attributes. We give form mentioned above. In that case, that SQL specification two methods to identify group query components constitutes a group query. Our main contributions are: within such a query. We also give some per- Group Queries (Section 2) We define the concept of formance results for group queries expressed in a group query. A query Q is a group query with respect standard SQL, comparing a commercial database to certain partitioning attributes S if, for all databases, system with our optimized plan on top of the it is possible to answer Q by (a) partitioning the data same commercial system. These results indicate according to the values for attribute(s) S, (b) evaluating that there are significant potential performance another query Q’ on each partition of the database, and (c) improvements. taking the union of the results. Common decision support queries require complex operations within groups and not just simple aggregation. We provide a syntactic criterion ‘This research was supported by a grant from the AT&T for identifying group queries and prove its sufficiency. We Foundation, by a David and Lucile Packard Foundation Fel- lowship in Science and Engineering, by a Sloan Foundation also prove the surprising result that every group query can Fellowship, by NSF CISE grant CDA-9625374, and by an NSF be expressed in a way that satisfies our syntactic criterion. Young Investigator award. We consider arbitrary relational queries expressed in SQL, Permission to copy without fee all or part of this material is and do not restrict ourselves to a special syntax. granted provided that the copies are not made or distributed for Execution Plan (Section 3) We demonstrate how a direct commercial advantage, the VLDB copyright notice and group query can be evaluated by partitioning the data and the title of the publication and its date appear, and notice is applying a significantly simpler query to each partition. given that copying is by permission of the Very Large Data Base Further, this evaluation plan can be automatically gener- Endowment. To copy otherwise, or to republish, requires a fee ated given a query that satisfies the criterion. The main and/or special permission from the Endowment. benefits of this approach are that the simpler query is often Proceedings of the 23rd VLDB Conference much easier to optimize and evaluate, and often partitions Athens, Greece, 1997 476

  35. A Zoo of Query Representations Groupwise Processing of Relational Queries Damianos Chatziantoniou” Kenneth A. ROSS* Department of Computer Science, Columbia University damia,nos,kar@cs.columbia.edu Groupwise Processing of Relational Queries D. Chatziantoniou, K.A. Ross, VLDB 1997 1 Introduction With the recent interest in decision support systems and SELECT f ( x ), agg ( g ( x )) Abstract data warehousing has come a demand for techniques to evaluate and optimize very complex relational queries in- FROM R AS x volving both aggregation and joins. Current commercial In this paper, we define and examine a particu- systems do not find good plans for many very complex lar class of queries called group queries. Group GROUP BY f ( x ) queries. queries are natural queries in many decision- In this paper we examine a particular class of complex support applications. The main characteristic of a queries, called group Many complex decision queries. group query is that it can be executed in a group- support queries describe the following idea: for each value z by-group fashion. In other words, the underlying in a dimension D (e.g. for each customer), evaluate a query relation(s) can be partitioned (based on some set Q’. This query Q’ can be something simple (e.g. compute of attributes) into disjoint groups, and each group avg(sales)) or complex (e.g. include joins, selections, fur- can be processed separately. We give a syntactic Torsten Grust U Tübingen 25 ther aggregations, etc). To specify this kind of complex criterion to identify these queries and prove its query in SQL, one has to embed D within many places sufficiency. We also prove the strong result that of a complex piece of SQL code. This may have many every group query has an equivalent formulation drawbacks in terms of performance if the optimizer is not that satisfies our syntactic criterion. We describe aware of the query’s structure. This particular structure a general evaluation technique for group queries, is often amenable to the following evaluation strategy: and demonstrate how an optimizer can determine partition the data along dimension D, and evaluate Q’ this plan. We then consider more complex queries independently on each partition. Therefore it is important whose components are group queries with poten- to be able to identify whether an SQL specification has the tially different partitioning attributes. We give form mentioned above. In that case, that SQL specification two methods to identify group query components constitutes a group query. Our main contributions are: within such a query. We also give some per- Group Queries (Section 2) We define the concept of formance results for group queries expressed in a group query. A query Q is a group query with respect standard SQL, comparing a commercial database to certain partitioning attributes S if, for all databases, system with our optimized plan on top of the it is possible to answer Q by (a) partitioning the data same commercial system. These results indicate according to the values for attribute(s) S, (b) evaluating that there are significant potential performance another query Q’ on each partition of the database, and (c) improvements. taking the union of the results. Common decision support queries require complex operations within groups and not just simple aggregation. We provide a syntactic criterion ‘This research was supported by a grant from the AT&T for identifying group queries and prove its sufficiency. We Foundation, by a David and Lucile Packard Foundation Fel- lowship in Science and Engineering, by a Sloan Foundation also prove the surprising result that every group query can Fellowship, by NSF CISE grant CDA-9625374, and by an NSF be expressed in a way that satisfies our syntactic criterion. Young Investigator award. We consider arbitrary relational queries expressed in SQL, Permission to copy without fee all or part of this material is and do not restrict ourselves to a special syntax. granted provided that the copies are not made or distributed for Execution Plan (Section 3) We demonstrate how a direct commercial advantage, the VLDB copyright notice and group query can be evaluated by partitioning the data and the title of the publication and its date appear, and notice is applying a significantly simpler query to each partition. given that copying is by permission of the Very Large Data Base Further, this evaluation plan can be automatically gener- Endowment. To copy otherwise, or to republish, requires a fee ated given a query that satisfies the criterion. The main and/or special permission from the Endowment. benefits of this approach are that the simpler query is often Proceedings of the 23rd VLDB Conference much easier to optimize and evaluate, and often partitions Athens, Greece, 1997 476

  36. A Zoo of Query Representations Groupwise Processing of Relational Queries Damianos Chatziantoniou” Kenneth A. ROSS* Department of Computer Science, Columbia University damia,nos,kar@cs.columbia.edu Groupwise Processing of Relational Queries D. Chatziantoniou, K.A. Ross, VLDB 1997 1 Introduction With the recent interest in decision support systems and Abstract data warehousing has come a demand for techniques to evaluate and optimize very complex relational queries in- volving both aggregation and joins. Current commercial In this paper, we define and examine a particu- systems do not find good plans for many very complex lar class of queries called group queries. Group [ ⟨ f ( x ), [ g ( y ) | y ← R , f ( y ) = f ( x ) ] agg ⟩ | x ← R ] set queries. queries are natural queries in many decision- In this paper we examine a particular class of complex support applications. The main characteristic of a queries, called group Many complex decision queries. group query is that it can be executed in a group- support queries describe the following idea: for each value z by-group fashion. In other words, the underlying in a dimension D (e.g. for each customer), evaluate a query relation(s) can be partitioned (based on some set Q’. This query Q’ can be something simple (e.g. compute of attributes) into disjoint groups, and each group avg(sales)) or complex (e.g. include joins, selections, fur- can be processed separately. We give a syntactic Torsten Grust U Tübingen 25 ther aggregations, etc). To specify this kind of complex criterion to identify these queries and prove its query in SQL, one has to embed D within many places sufficiency. We also prove the strong result that of a complex piece of SQL code. This may have many every group query has an equivalent formulation drawbacks in terms of performance if the optimizer is not that satisfies our syntactic criterion. We describe aware of the query’s structure. This particular structure a general evaluation technique for group queries, is often amenable to the following evaluation strategy: and demonstrate how an optimizer can determine partition the data along dimension D, and evaluate Q’ this plan. We then consider more complex queries independently on each partition. Therefore it is important whose components are group queries with poten- to be able to identify whether an SQL specification has the tially different partitioning attributes. We give form mentioned above. In that case, that SQL specification two methods to identify group query components constitutes a group query. Our main contributions are: within such a query. We also give some per- Group Queries (Section 2) We define the concept of formance results for group queries expressed in a group query. A query Q is a group query with respect standard SQL, comparing a commercial database to certain partitioning attributes S if, for all databases, system with our optimized plan on top of the it is possible to answer Q by (a) partitioning the data same commercial system. These results indicate according to the values for attribute(s) S, (b) evaluating that there are significant potential performance another query Q’ on each partition of the database, and (c) improvements. taking the union of the results. Common decision support queries require complex operations within groups and not just simple aggregation. We provide a syntactic criterion ‘This research was supported by a grant from the AT&T for identifying group queries and prove its sufficiency. We Foundation, by a David and Lucile Packard Foundation Fel- lowship in Science and Engineering, by a Sloan Foundation also prove the surprising result that every group query can Fellowship, by NSF CISE grant CDA-9625374, and by an NSF be expressed in a way that satisfies our syntactic criterion. Young Investigator award. We consider arbitrary relational queries expressed in SQL, Permission to copy without fee all or part of this material is and do not restrict ourselves to a special syntax. granted provided that the copies are not made or distributed for Execution Plan (Section 3) We demonstrate how a direct commercial advantage, the VLDB copyright notice and group query can be evaluated by partitioning the data and the title of the publication and its date appear, and notice is applying a significantly simpler query to each partition. given that copying is by permission of the Very Large Data Base Further, this evaluation plan can be automatically gener- Endowment. To copy otherwise, or to republish, requires a fee ated given a query that satisfies the criterion. The main and/or special permission from the Endowment. benefits of this approach are that the simpler query is often Proceedings of the 23rd VLDB Conference much easier to optimize and evaluate, and often partitions Athens, Greece, 1997 476

  37. A Zoo of Query Representations Groupwise Processing of Relational Queries Damianos Chatziantoniou” Kenneth A. ROSS* Department of Computer Science, Columbia University damia,nos,kar@cs.columbia.edu Groupwise Processing of Relational Queries D. Chatziantoniou, K.A. Ross, VLDB 1997 1 Introduction With the recent interest in decision support systems and Abstract Q f g agg R ≡ data warehousing has come a demand for techniques to evaluate and optimize very complex relational queries in- volving both aggregation and joins. Current commercial In this paper, we define and examine a particu- systems do not find good plans for many very complex lar class of queries called group queries. Group [ ⟨ f ( x ), [ g ( y ) | y ← R , f ( y ) = f ( x ) ] agg ⟩ | x ← R ] set queries. queries are natural queries in many decision- In this paper we examine a particular class of complex support applications. The main characteristic of a queries, called group Many complex decision queries. group query is that it can be executed in a group- support queries describe the following idea: for each value z by-group fashion. In other words, the underlying in a dimension D (e.g. for each customer), evaluate a query relation(s) can be partitioned (based on some set Q’. This query Q’ can be something simple (e.g. compute of attributes) into disjoint groups, and each group avg(sales)) or complex (e.g. include joins, selections, fur- can be processed separately. We give a syntactic Torsten Grust U Tübingen 25 ther aggregations, etc). To specify this kind of complex criterion to identify these queries and prove its query in SQL, one has to embed D within many places sufficiency. We also prove the strong result that of a complex piece of SQL code. This may have many every group query has an equivalent formulation drawbacks in terms of performance if the optimizer is not that satisfies our syntactic criterion. We describe aware of the query’s structure. This particular structure a general evaluation technique for group queries, is often amenable to the following evaluation strategy: and demonstrate how an optimizer can determine partition the data along dimension D, and evaluate Q’ this plan. We then consider more complex queries independently on each partition. Therefore it is important whose components are group queries with poten- to be able to identify whether an SQL specification has the tially different partitioning attributes. We give form mentioned above. In that case, that SQL specification two methods to identify group query components constitutes a group query. Our main contributions are: within such a query. We also give some per- Group Queries (Section 2) We define the concept of formance results for group queries expressed in a group query. A query Q is a group query with respect standard SQL, comparing a commercial database to certain partitioning attributes S if, for all databases, system with our optimized plan on top of the it is possible to answer Q by (a) partitioning the data same commercial system. These results indicate according to the values for attribute(s) S, (b) evaluating that there are significant potential performance another query Q’ on each partition of the database, and (c) improvements. taking the union of the results. Common decision support queries require complex operations within groups and not just simple aggregation. We provide a syntactic criterion ‘This research was supported by a grant from the AT&T for identifying group queries and prove its sufficiency. We Foundation, by a David and Lucile Packard Foundation Fel- lowship in Science and Engineering, by a Sloan Foundation also prove the surprising result that every group query can Fellowship, by NSF CISE grant CDA-9625374, and by an NSF be expressed in a way that satisfies our syntactic criterion. Young Investigator award. We consider arbitrary relational queries expressed in SQL, Permission to copy without fee all or part of this material is and do not restrict ourselves to a special syntax. granted provided that the copies are not made or distributed for Execution Plan (Section 3) We demonstrate how a direct commercial advantage, the VLDB copyright notice and group query can be evaluated by partitioning the data and the title of the publication and its date appear, and notice is applying a significantly simpler query to each partition. given that copying is by permission of the Very Large Data Base Further, this evaluation plan can be automatically gener- Endowment. To copy otherwise, or to republish, requires a fee ated given a query that satisfies the criterion. The main and/or special permission from the Endowment. benefits of this approach are that the simpler query is often Proceedings of the 23rd VLDB Conference much easier to optimize and evaluate, and often partitions Athens, Greece, 1997 476

  38. A Zoo of Query Representations Torsten Grust U Tübingen 26

  39. A Zoo of Query Representations Torsten Grust U Tübingen 26

  40. A Zoo of Query Representations Torsten Grust U Tübingen 26

  41. A Zoo of Query Representations Torsten Grust U Tübingen 26

  42. A Zoo of Query Representations Torsten Grust U Tübingen 26

  43. A Zoo of Query Representations P P P SELECT agg ( g ( x )) FROM P AS x Torsten Grust U Tübingen 26

  44. A Zoo of Query Representations P P P Q’ g agg P ≡ [ g ( y ) | y ← P ] agg Torsten Grust U Tübingen 26

  45. A Zoo of Query Representations P P P Q’ g agg P ≡ [ g ( y ) | y ← P ] agg Torsten Grust U Tübingen 26

  46. A Zoo of Query Representations Torsten Grust U Tübingen 27

  47. e d n e e d n s n s i o e x t e t h e r i b e d e s c w e a 9 7 ] [ C h ( I n g . a m i n r e n n e d i b a s J G i d , F Y I J o f o n s a r t i t i o p y t a r a t e l s e p i e d a p p l o r . ) p e r a t g o a m i n r e n t h e n d l e o h a t r . g e t h e t o o n e d u n i d b e c o u l u l t s r e s r t i a l e p a d t h a n V i e w s . o l l o w a s f S Q L r d a n d a i n s t s e d p r e s i s e x Q 3 u e r y Q a r e w e e t n t h t h a a d i o s e s r e S u p p e c t i o n h ) e s G r a p o f t h y o n Q u e r d u r a t i : ( g e 2 . 1 a v e r a n h e i n i t i o n s t D e f o n t a i V l c Q f o r V 2 r a p h e w y g d v i q u e r , a n t h e c t i o n o f s e o d e s w o r l d e n t h e T h t o Q . c c e s s u e r y h a a q e a c v e n e f o r e g i y s b d a i n Q e r . s e d s a f t e ) u d a y a b o v t h e e d d i n d e f i n r e a ( a s t i o n s n s s e c e r a t i o f t h e o p n o o n a l r a t i o e l a t i e d u h e r e r a g a r e t e a v t h g l e s , e c t a n a s r i n g l e e n i n s w r i t t u l t a r e r e s c k s t h e b l o g e t B a s i c r t o . o r d e i e w s i n s u b v V 2 i t s a n d a n d V l o i n W e j s . m o n d d i a k a s o l a c l o c k s u e t s e t b d d n d m i t t e s , a i s o c i r c l e o n a s u l a t i o c k s f o r m b l S Q L a t i o n u a l g r e g a c t a g T h e e s . t u p l ( o r v i e w h e t o t O G n g Y I - L p o n d i n , F o r r e s a g a i e l c n c e l a b O e n a 7 ] . s g i v C h a 9 c k i i n [ b l o a r s v e r y a p p e E n d c e a s p a o f n o d e o m g e f r c a n n e d e r y i s a r q u e r e m p l e T h a s i o c k . a n d t b l u t e f t h a t t r i b m e o a n a o n o n ) n e d r e l a t i r t i t i o e p a i d a n b c e n t h e d i n t i o n s m e s c u s s i s l l d i i f N s h a N s W e e l e d n . l a b r t i t i o n o d e h p a t o e a c N i o n e l e d t e l y l a b d s e p a r a p o s e b e A Zoo of o n e r a t i e t o p a s t o f u m e n a r g s a n N s i o r i f N i , e o f 4 . l a u s n 2 . R O M c e c t i o F i n S Q 4 e r y Q u N i s o c k b l a t i o n g r e g n a g i s a u r c e e s o w h o s g e n e d A n N i . i Query Representations m i n g s c o E d g e N . o f t e s t t r i b u g a u p i n g r o t h e w i t h e d l a b e l Framework Theoretical 2 a n i t h r w g e t h e t o n k e d a r e l i N o c k c b l b a s i n g l e a s i o f o u t g e s E d f N . n o n e n d i t i o d e f i c o a n d j o i n g y t h e n o l o b y t e r m i b e l e d o u r l a e o i n t l y o d u c n d j i n t r c , a w e a r c t i o n , s e t h i s I n 0 c t i c e d . y n t a a b e l a s n o t l g i v e a r e W e c k s . t b l o q u e r y o f s e u p o u t g r o n g y a c o m i a n b m e w e w h a t s t h i t h a t o v e d p r s a n u e r i e p q g r o u g n t i f y i n i d e f o r r i o n c r i t e r y q u e o u p r y g r e v e t h a t h o w l s o s l e W e a s i n g t . h a i c i e n w i t s s u f f a p h n i c g r n d i t i o c y c l i c o d a e c t e a d i r i s r a p h y g q u e r A o n . r i t e r i u r c e s o a t i s f i a t s h a l l m t h W e s f o r t . i n a r e s u l s e d e r y p r e s e q u e e x t h a n b n t i n g c r e s e r e p n o d e o t ) ( r o o u r c e s u p . ” n g o i n t i s “ p e d g e i t h h w g r a p t h e r a w d n a l l y e n t i o c o n v e s i n g l b e a y s a l w a w i l l e r e , t h a s e s o g y d a t a b m i n o l n T e r l a t i o n d l e - r e a s i n g n s F o r m p t i o s s u A 2 . 1 h e w s t s h o e 1 F i g u r . c t u r e e p i o f t h o p h e t a t t o d e n k n s i h w i t i e w s , o f v r m s i n t e e n w r i t t a r e e r i e s q u t h a t u m e a s s W e i o n p a r a t e s e e t h N o t i c Q 2 . n d Q l , a e s q u e r i f o r p h s g r a u e r y q a r e e r e e t h s i n c o n u m p t i a s s a l i d a v s i s T h i i e s . b q u e r o s u n V 3 B ) B , ( V 2 c k s c b l o b a s i i n t o Q l u e r y m q 3 f r o n d V 2 a o f V o r e ) o r m w o ( o f t j o i n a s a r y b q u e a s u r i t e r e w y s t o w a m a n y ) . V 3 A A , ( V 2 o c k s b l a t i o n g r e g d a g a n e t t h e t h a s u m a s t i a l l y e i n i ] . W P L S G , S a y 8 7 , D i m 8 2 s [ K v i e w s a t i o n r e l i p l e ( M u l t R . o n r e l a t i g l e a s i n n s o n t a i e c a b a s d a t Torsten Grust U Tübingen 27 i e w a v f b e i t s e l m a y R 2 . 4 . ) o n S e c t i i n e r e d n s i d e c o i l l b w e w i t o f v i i n t u r p o m o f r o b u t u e r y , e r q n o t h o f a s u l t e r e o r t h a s R w , i f i e w , ( I . e . a v b l e . t a l a t e d a p s u e n c s a n d a r e a t e i s t n t o R i o f i t i o n d e f i n t h e n g f o l d i u n s i d e r c o n o n ’ t w e d e n t h R . ) v e r e s o q u e r i A h . g r a p e r y a q u o f o t i o n e n w t h b e l o i n e d e f s h a l l W e W e s . a t i o n o p e r a l a t i o n e r e l t a r t h a o d e s a s n h h g r a p u e r y q s : a t i o n o p e r n a l l a t i o o f r e d s e k i n t h r e d e r c o n s i o f o n i n a t i o m b m e c s o k i s b l o c a s i c A b s B l o c k c B a s i I n o n s . e l a t i o f r j o i n t o a e d a p p l i o n s e l e c t i d s a n t i o n s r o j e c p R E W H E O M - T - F R L E C s S E d a r e s s e e x p a r e o n s p e r a t i h o s u c S Q L s e A b a . m i n g r e n a t e t t r i b u o r a t e s g r e g a a g o u t w i t h e r i e s q u l o c k . s i c b a b a a s t e d t r e a a l s o n i s l a t i o r e s i n - i s a o c k b l a t i o n g r e g n a g A o c k s B l n g a t i o g g r e A g u p i n g r o o n s . t o f o n d i t i a s e o i n c g O 2 : j i o n . c i f y i n E ) , , c o n d i t s p e j o i n o n 0 : e r a t i o p t i o n g r e g a a g g l e o m - e c t o b n s n c t i o f u g a t e a g g r e o f l i s t d a a n e s t r i b u t a t a l l y p t i o n n o c a a t i o n g r e g a g T h e Q2 p s . (b) query g r o u h e v e r t d o p u t e (4 query Ql r e s a a t i o n o p e r u c h L , s S Q I n o n . l e c t i a s e b y w e d f o l l o b e e r e w h e r i e s q u G A V I N B Y - H R O U P M - G h s s e d - F R O g r a p x p r e s L E C T e r y e s S E Q u a 1 : g u r e F i p l u s ( b u t e s a t t r i n g r o u p i a l l g e s c l u d e i n c l a u s h e t T E L E C S e s i n g l a t a i n s c o n a u s e O M c l F R t h e a n d ) , g a t e s g g r e m e a s o . a t i o n r e l s b u t e a t t r i e t o f e a s S b L e t g ) i o n i n P a r t i t 2 : ( 2 . o n f i n i t i D e p e r a - o e n t e d t - o r i e s e s t h p r e s s e x b l o c k S e t s B l o c k e t S t h e e n t p r e s D r e l e t a n d R , o n r e l a t i g e r l y i n u n d t h e o m f r s a t i o n f r e l c e o e r e n d i f f a n d i o n r s e c t i n t e n i o n , y u a m e l s , n t i o n y e v e r F o r g e . r a n a l u e s S v o f u p l e s h e t c h t w h i o v e r a i n d o m t s s t r u c c o n S Q L n g o n d i r e s p e c o r T h e m a . s c h a m e h e s t h t w i x ) N ( w r i t e w e E D x d f o r a n r a p h r y g q u e t h e N i n d e n o T . C E P E X a n d C T R S E I N T E O N , U N I a r e o f s i o n e x t e n t h e e n N w h d e n o e r a t n s w h e a n t t r e s e o r e p t e t h r e e s e g t h u s i n f i e d s p e c i b e c a n e r y q u o n a l r e l a t i A n y . 0 = ( R ) c r s = b y a c e d r e p l R i s s - n e c e s i f b l o c k w o t o t t i n e s p l i n b y c a q u e r A n s . e r a t i o o p a n d u s e R E c l a W B E a b o t h h a s e r y e q u f t h p l e i x a m o r e a r y , f s - e a t t o b a v e o t h o n S d i n t e s t t r i b u e a t t h t h a e r v e O b s t i o n g r e g a a g a s a n i t w r i t e n r e e c a n w t h e a u s e P B Y c l R O U a G e l l - e w o b ) t N ( x f o r e N n o d a t s w e r a n t h e o f b u t e s t r i w a l l o n o t d o w e t h a t N o t e k . b l o c a s i c a b d t o p p l i e k a b l o c ) . R ( z U z E D = t R t h a s e r v e o b A l s o , e d . d e f i n e r i b u t a t t a l l o w w e d o n o r e s , q u e r i i n n s e l a t i o n t r n s t a c o 7 8 4

  48. applied separately to partitions of FYIJJG based in id, renaming. (In [Cha97] we describe the extensions needed and the partial results could be unioned together. to handle the renaming operator.) Query Q3 is expressed in standard SQL as follows. View Vl contains the average duration of the sections read in the Definition days before each access to the world section, and view V2 2.1: (Query Graph) Suppose that we are given a query Q. the average duration The nodes of the query graph for Q of the sections read in the days after. are the relational We join Vl and V2 in order to get the result in single operations (as defined above) used in Q and its subviews. tuples. The actual SQL formulation Basic blocks are written is omitted as rectangles, due to lack aggregation of space and appears in [Cha97]. blocks as circles, and set blocks as diamonds. Once again, FYI-LOG Every block is given a label corresponding can be partitioned on id attribute and a simpler query can to the view (or relation) be posed separately name of that block. There is an edge from node on each partition. We shall discuss labeled Ni to node labeled Ns if Ns is mentioned Query Q4 in Section 2.4. in the FROM clause of Ni , or if Ns is an argument of a set operation in Ni. An edge whose source is an aggregation 2 Theoretical block N is Framework labeled with the grouping attributes of N. Edges coming out of a single basic block N are linked together with an e d In this section, we introduce n e e d n s n s i o e x t e t h e r i b e d e s c w e a 9 7 ] [ C h ( I n g . a m i n r e n our terminology n e d i b a s J G i d , F Y I J o f o n s and define a r t i t i o p y t a r a t e l s e p i e d a p p l arc, and jointly what we mean by a group query. labeled by the join condition o r . ) p e r a t g o a m i n r e n t h e n d l e o h a t r . g e t h e We give a syntactic t o of N. Edges o n e d u n i d b e c o u l u l t s r e s r t i a l e p a d t h a n coming out of set blocks are not labeled. criterion for identifying group queries and prove that this V i e w s . o l l o w a s f 0 S Q L r d a n d a i n s t s e d p r e s i s e x Q 3 u e r y Q condition is sufficient. We also show that every group query a r e w e e t n t h t h a a d i o s e s r e S u p p e c t i o n h ) e s G r a p o f t h y o n Q u e r d u r a t i : ( g e 2 . 1 a v e r a n h e i n i t i o n s t D e f o n t a i V l c can be expressed in a form that satisfies our criterion. Q f o r V 2 r a p h e w A query graph is a directed y g d v i q u e r , a n t h e c t i o n o f s e o d e s w o r l d e n t h e T h t o Q . c c e s s u e r y h a a q e a c v e n e f o r e g i y s b d a acyclic graph with a single i n Q e r . s e d s a f t e ) u d a y a b o v t h e e d d i n source (root) node representing d e f i n r e a ( a s t i o n s n s s e c e r a t i o f t h e o p n o o n a l r a t i o e l a t i e d u h e r e r a g a r e t e a v t h the query result. We shall g l e s , e c t a n a s r i n g l e e n i n s w r i t t u l t a r e r e s c k s t h e b l o g e t conventionally B a s i c r t o . o r d e i e w s i n s u b v 2.1 V 2 i t s a n d a n d V l o i n Assumptions draw the graph with edges “pointing W e j and Terminology up.” s . m o n d d i a k a s o l a c l o c k s u e t s e t b d d n d m i t t e s , a i s o c i r c l e o n a s u l a t i o c k s f o r m For single-relation b l S Q L a t i o n u a l g r e g a c t a g T h e e s . t u p l databases, there will always be a single We assume that queries are written ( o r v i e w h e t o t O G n g Y I - L p o n d i n , F o r r e s a g a i e l c n c e l a b O e n a 7 ] . s g i v C h a 9 c k i i n [ b l o a r s v e r y a p p e E sink node at the top of the picture. n d c e a in terms of views, with s p a o f Figure 1 shows the no subqueries. n o d e o m g e f r c a n n e d e r y i s a r q u e r e m p l e T h a s i o c k . This is a valid assumption a n d t b l u t e f t h a t t r i b m e o a n a o n o n ) n e d r e l a t i r t i t i o e p a i d a n b query graphs for queries Ql, and Q2. Notice the separation c since there are e n t h e d i many ways to rewrite a subquery as a join of two (or more) n t i o n s m e s c u s s i s l l d i i f N s h a N s W e e l e d n . l a b r t i t i o n o d e h p a t o e a c N i o n e l e d t e l y l a b d s e p a r a p o s e b e of V2 and V3 from query Ql into basic blocks (V2B, A Zoo of o n e r a t i V3B) e t o p a s t o f views [Kim82, Day87, SPLSG]. We initially u m e n a r g s a n N s i o r i f N i , e o f 4 . l a u s n 2 . R O M c e c t i o F i n S Q 4 e r y Q u and aggregation assume that the blocks (V2A, V3A). N i s o c k b l a t i o n g r e g n a g i s a database contains a single relation u r c e e s o w h o s g e n e d A n N i . i R. (Multiple Query Representations relations m i n g s c o E d g e N . o f t e s t t r i b u g a u p i n will be considered in Section 2.4.) R may itself be a view g r o t h e w i t h e d l a b e l Framework Theoretical 2 a n i t h r w g e t h e t o n k e d a r e l i N o c k c b l b a s i n g l e or the result of another query, but from our point of view it a s i o f o u t g e s E d f N . n o n e n d i t i o d e f i c o a n d j o i n g y t h e n o l o b y t e r m i b e l e d o u r l a e o i n t l y o d u c n d j i n t r c , a w e a r is treated as an encapsulated c t i o n , s e t h i s I n table. (I.e., if R was a view, 0 c t i c e d . y n t a a b e l a s n o t l g i v e a r e W e c k s . t b l o q u e r y o f s e u p o u t g r o n g y a c o m i a n b m e w e then we don’t consider unfolding w h a t the definition of R into s t h i t h a t o v e d p r s a n u e r i e p q g r o u g n t i f y i n i d e f o r r i o n c r i t e queries over R.) r y q u e o u p r y g r e v e t h a t h o w l s o s l e W e a s i n g t . h a i c i e n w i t s s u f f a p h n i c g r n d i t i o c y c l i c o d a e c t e a d i r i s r a p h y g q u e r We shall define below the notion of a query graph. A o n . r i t e r i A u r c e s o a t i s f i a t s h a l l m t h W e s f o r t . i n a r e s u l s e d e r y p r e s e q u e e x t h a n b n t i n g c r e s e r e p n o d e o t ) ( r o o u r c e s query graph has nodes that are relational operations. We u p . ” n g o i n t i s “ p e d g e i t h h w g r a p t h e r a w d n a l l y e n t i o c o n v consider three kinds of relational operations: e s i n g l b e a y s a l w a w i l l e r e , t h a s e s o g y d a t a b m i n o l n T e r l a t i o n d l e - r e a s i n g n s F o r m p t i o s s u A 2 . 1 Basic Blocks A basic block is some combination h e w s t s h o e 1 F i g u r . of c t u r e e p i o f t h o p h e t a t t o d e n k n s i projections h and selections applied to a join of relations. w i t i e w s , o f v r m s i n t e e n w r i t t a r e e r i e s q u t h a t u m e a s s W e i o n p a r a t e s e e t h N o t i c Q 2 . n d Q l , a In e s q u e r i f o r p h s g r a u e r y q SQL such operations a r e e r e e t h s i n c o n u m p t i a s s a l i d a v s are expressed as SELECT-FROM-WHERE i s T h i i e s . b q u e r o s u n V 3 B ) B , ( V 2 c k s c b l o b a s i i n t o Q l u e r y m q 3 f r o n d V 2 a o f V o r e ) queries without o r m w o ( o f t j o i n a s a r y b q u e a s u aggregates or attribute r i t e r e w y s t o w a m a n y renaming. ) . V 3 A A , ( V 2 o c k s A base b l a t i o n g r e g d a g a n e t t h e t h a relation is also treated as a basic block. s u m a s t i a l l y e i n i ] . W P L S G , S a y 8 7 , D i m 8 2 s [ K v i e w s a t i o n r e l i p l e ( M u l t R . o n r e l a t i g l e Aggregation a s i n n s o n t a i e c a b a s d a t Blocks An aggregation Torsten Grust U Tübingen 27 block is a sin- i e w a v f b e i t s e l m a y R 2 . 4 . ) o n S e c t i gle aggregation i n e r e d n s i d e c o i l l b w operation specifying a set of grouping e w i t o f v i i n t u r p o m o f r o b u t u e r y , e r q n o t h o f a E), s u l t , O2 attributes e r e o r t h : join conditions. and a list of aggregate functions 0 : join condition. to be com- a s R w , i f i e w , ( I . e . a v b l e . t a l a t e d a p s u e n c s a n d a r e a t e i s t puted over the groups. The aggregation can optionally n t o R i o f i t i o n d e f i n t h e n g f o l d i u n s i d e r c o n o n ’ t w e d e n t h be followed by a selection. (4 query In SQL, such operations Ql (b) query are Q2 R . ) v e r e s o q u e r i expressed as SELECT-FROM-GROUPBY-HAVING queries where A h . g r a p e r y a q u o f o t i o n e n w t h b e l o i n e d e f s h a l l W e the SELECT clause includes all grouping Figure 1: Query attributes W e s . a t i o n graphs o p e r (plus a l a t i o n e r e l t a r t h a o d e s a s n h h g r a p u e r y q some aggregates), and the FROM clause contains s : a t i o n a single o p e r n a l l a t i o o f r e d s e k i n t h r e d e r c o n s i relation. o f o n i n a t i o m b m e c s o k i s b l o c a s i c A b s B l o c k c B a s i Set Blocks Set blocks express the set-oriented I n o n s . e l a t i o f r Definition j o i n t o a e d a p p l i o n s e l e c t i opera- 2.2: d s a n t i o n s (Partitioning) r o j e c p Let S be a set of attributes tions, namely union, intersection R E W H E and difference of relations O M - T - F R L E C s S E from the underlying d a r e s s e e x p a r e o n s p e r a t i h o s u c S Q L relation R, and let D represent the with the same schema. The corresponding s e A b a . m i n g r e n a t e t t r i b u o r a t e s domain over which the tuples of S values range. For every g r e g a SQL constructs a g o u t w i t h e r i e s q u are UNION, INTERSECT and EXCEPT. l o c k . s i c b a b a a s t e d t r e a a l s o n i s node N in the query graph and for x E D we write N(x) l a t i o r e Any relational query can be specified using these three s i n - i s a o c k b l a t i o n g r e g n a g A o c k s B l n g a t i o to represent the answer at node N when the extension of g g r e A operations. A query can be split into two blocks if neces- g u p i n g r o o n s . t o f o n d i t i a s e o i n c g O 2 : j i o n . c i f y i n E ) , , c o n d i t s p e j o i n o n 0 : e r a t i o p t i o n g r e g a a g R is replaced by crs== (R). 0 g l e sary, for example if the query has both a WBERE o m - e c t o b n s n c t i o f u g a t e a g g r e o f l i s t d a a n e s t r i b u t a t clause and a GROUPBY a l l y p t i o n clause then we can rewrite it as an aggregation n o c a a t i o n g r e g a g T h e Q2 p s . (b) query g r o u h e v e r t d o p u t e (4 query Ql Observe that the attributes in S do not have to be at- block applied to a basic block. Note that we do not allow r e s a a t i o n o p e r u c h L , s S Q I n o n . l e c t i a s e b y w e d f o l l o b e tributes of the answer at node N for N(x) constant e r e w h e r i e s q u relations to be well- G in queries, nor do we allow attribute A V I N B Y - H R O U P M - G h s s e d - F R O g r a p x p r e s L E C T e r y e s S E Q u a 1 : g u r e F i defined. Also, observe that R = UzED R(z). p l u s ( b u t e s a t t r i n g r o u p i a l l g e s c l u d e i n c l a u s h e t T E L E C S e s i n g l a t a i n s c o n a u s e O M c l F R t h e a n d ) , g a t e s g g r e m e a s o . a t i o n r e l s b u t e a t t r i e t o f e a s S b L e t g ) i o n i n P a r t i t 2 : ( 2 . o n f i n i t i D e p e r a - o e n t e d t - o r i e s e s t h p r e s s e x b l o c k S e t s B l o c k e t S t h e e n t 478 p r e s D r e l e t a n d R , o n r e l a t i g e r l y i n u n d t h e o m f r s a t i o n f r e l c e o e r e n d i f f a n d i o n r s e c t i n t e n i o n , y u a m e l s , n t i o n y e v e r F o r g e . r a n a l u e s S v o f u p l e s h e t c h t w h i o v e r a i n d o m t s s t r u c c o n S Q L n g o n d i r e s p e c o r T h e m a . s c h a m e h e s t h t w i x ) N ( w r i t e w e E D x d f o r a n r a p h r y g q u e t h e N i n d e n o T . C E P E X a n d C T R S E I N T E O N , U N I a r e o f s i o n e x t e n t h e e n N w h d e n o e r a t n s w h e a n t t r e s e o r e p t e t h r e e s e g t h u s i n f i e d s p e c i b e c a n e r y q u o n a l r e l a t i A n y . 0 = ( R ) c r s = b y a c e d r e p l R i s s - n e c e s i f b l o c k w o t o t t i n e s p l i n b y c a q u e r A n s . e r a t i o o p a n d u s e R E c l a W B E a b o t h h a s e r y e q u f t h p l e i x a m o r e a r y , f s - e a t t o b a v e o t h o n S d i n t e s t t r i b u e a t t h t h a e r v e O b s t i o n g r e g a a g a s a n i t w r i t e n r e e c a n w t h e a u s e P B Y c l R O U a G e l l - e w o b ) t N ( x f o r e N n o d a t s w e r a n t h e o f b u t e s t r i w a l l o n o t d o w e t h a t N o t e k . b l o c a s i c a b d t o p p l i e k a b l o c ) . R ( z U z E D = t R t h a s e r v e o b A l s o , e d . d e f i n e r i b u t a t t a l l o w w e d o n o r e s , q u e r i i n n s e l a t i o n t r n s t a c o 7 8 4

  49. applied separately to partitions of FYIJJG based in id, renaming. (In [Cha97] we describe the extensions needed and the partial results could be unioned together. to handle the renaming operator.) Query Q3 is expressed in standard SQL as follows. View Vl contains the average duration of the sections read in the Definition days before each access to the world section, and view V2 2.1: (Query Graph) Suppose that we are given a query Q. the average duration The nodes of the query graph for Q of the sections read in the days after. are the relational We join Vl and V2 in order to get the result in single operations (as defined above) used in Q and its subviews. tuples. The actual SQL formulation Basic blocks are written is omitted as rectangles, due to lack aggregation of space and appears in [Cha97]. blocks as circles, and set blocks as diamonds. Once again, FYI-LOG Every block is given a label corresponding can be partitioned on id attribute and a simpler query can to the view (or relation) be posed separately name of that block. There is an edge from node on each partition. We shall discuss labeled Ni to node labeled Ns if Ns is mentioned Query Q4 in Section 2.4. applied separately in the and the partial to partitions FROM clause of Ni , or if Ns is an argument of a set operation of FYIJJG results could be unioned together. Query Q3 is expressed in standard SQL as follows. View based in i d , Vl contains the average duration renaming. in Ni. An edge whose source is an aggregation (In [Cha97] we describe the extensions needed 2 Theoretical days before each access to the world section, and view V2 block N is Framework to handle the renaming of the sections read in the the average duration labeled with the grouping operator.) attributes of N. Edges coming We join Vl of the sections read in the days after. Definition and V2 in order to get the result in single out of a single basic block N are linked together with an e d In this section, we introduce n e e d n s tuples. The actual SQL formulation n s i o e x t e t h e r i b e 2.1: d e s c w e a 9 7 ] [ C h ( I n g . a m i n (Query r e n our terminology given a query Q. n e d i b a s J G i d , F Y I J o f o n s and define a r t i t i o p Graph) y t a r a t e l s e p of space and appears in [Cha97]. i e d a p p l Suppose arc, and jointly are the relational The nodes of the query graph for Q what we mean by a group query. that labeled by the join condition o r . ) is omitted p e r a t g o a m i n can be partitioned r e n t h e n d l e o h we are a t r . g e t h e We give a syntactic t o of N. Edges o n e d due to lack u n i operations d b e c o u l and its subviews. u l t s r e s r t i a l e p a d t h a n Once again, FYI-LOG on be posed separately (as defined above) used in Q i d attribute coming out of set blocks are not labeled. criterion Basic blocks are written aggregation for identifying and a simpler query can group queries and prove that this V i e w s . o l l o w Query Q4 in Section 2.4. a s f 0 on each partition. S Q L r d blocks as circles, and set blocks as diamonds. a n d a i n s t s e d p r e s i s e x Q 3 u e r y Q Every block is given a label corresponding as rectangles, We shall discuss condition is sufficient. We also show that every group query relation) a r e w e e t n t h t h a a d i o s e s r e S u p p e c t i o n h ) e s G r a p o f t h y o n Q u e r d u r a t i : ( g e 2 . 1 a v e r a n h e name of that block. There is an edge from node i n i t i o n s t D e f o n t a i V l c 2 Theoretical labeled Ni to node labeled Ns if Ns is mentioned to the view (or can be expressed in a form that satisfies our criterion. Q f o r V 2 r a p h e w A query graph is a directed y g d v i q u e r , a n t h e Framework c t i o n o f s e o d e s w o FROM r l d e n t h e T h t o Q . c c e s s u e r y h a a q e a c v e n e f o r e g i y s b clause of Ni , or if Ns is an argument of a set operation d a acyclic graph with a single In this section, we introduce in Ni. An edge whose source is an aggregation i n Q e r . s e d s a f t e ) u d a y a b o v what we mean by a group query. t h e e d d i n source (root) node representing d e f i n r e a ( a s t i o n s n s s e c in the e r a t i o f t h e o p n o o n a l r a t i o e l a t i e d u h e r e r a g a r e t e a v t h labeled with the grouping our terminology the query result. We shall criterion and define for identifying out of a single basic block N are linked together with an g l e s , e c t a n a s r i n g l e e n i n s w r i t t u l t a r e r e s c k s t h e b l o g e t conventionally B a s i c r t o . o r d e attributes i e w s We give a syntactic i n block N is s u b v 2.1 V 2 i t s condition is sufficient. We also show that every group query a n d a n d V l o i n Assumptions draw the graph with edges “pointing W e j group queries and prove that this and of N. Edges coming Terminology arc, and jointly up.” can be expressed in a form that satisfies our criterion. s . m o n d d i a k a s labeled by the join condition o l a c l o c k s u e t s e t b d d n d m i t t e s , a i s o c i r c l e o coming out of set blocks are not labeled. n a s u l a t i o c k s f o r m For single-relation b l S Q L a t i o n u a l g r e g a c t a g T h e e s . t u p l databases, there will always be a single We assume that queries are written ( o r of N. Edges v i e w h e t o t O G n g Y I - L p o n d i n , F o r r e s a g a i e l c n c e l a b O e n a 7 ] . s g i v C h a 9 c k i i n [ b l o 2.1 a r s v e r y a p p e E sink node at the top of the picture. n d c e a in terms of views, with s p a o f Assumptions 0 A query graph is a directed Figure 1 shows the no subqueries. and n o d e o m g e f r c a n n e d e r y i s a r q u e r e m p l e T h Terminology source (root) node representing a s i o c k . This is a valid assumption a n d t b l u t e f t h a We assume that queries are written t t r i b m e o a n a o n o n ) n e d r e l a t i r t i t i o e p a i d a n b query graphs for queries Ql, and Q2. Notice the separation c since there are acyclic graph with a single conventionally no subqueries. e n t h e d i many ways to rewrite a subquery as a join of two (or more) n t i o n s m e s c u s s i s l l the query result. We shall d i i f N s h a N s W e e l e d n . l a b r t i t i o draw the graph with edges “pointing n o d e h p a t o e a c N i This is a valid assumption o n e l e d in terms of views, with t e l y l a b d s e p a r a For single-relation p o s e b e many ways to rewrite a subquery as a join of two (or more) of V2 and V3 from query Ql into basic blocks (V2B, A Zoo of databases, there will always be a single sink node at the top of the picture. o n views [Kim82, Day87, SPLSG]. We initially e r a t i V3B) e t o p a s t o f views [Kim82, Day87, SPLSG]. We initially u m e n since there are a r g s a n N s i o r i f N i , e o f 4 . l a u s n 2 . R O M c e c t i o F i n S Q 4 e r y up.” Q u and aggregation query graphs for queries Ql, and Q2. Notice the separation assume that the database contains a single relation blocks (V2A, V3A). N i s o c k b l a t i o n g r e g n a g Figure 1 shows the i s a database contains a single relation of V2 and V3 from query Ql into basic blocks (V2B, u r c e e s o will be considered in Section 2.4.) R may itself be a view w h o s g e n e d A assume that the n N i . i R. (Multiple Query Representations R. (Multiple relations and aggregation or the result of another query, but from our point of view it m i n g s c o E d g e N . o f relations t e s t t r i b u g a u p i n will be considered in Section 2.4.) R may itself be a view g r o t h e blocks (V2A, w i t h e d l a b e l is treated as an encapsulated Framework Theoretical 2 V3A). V3B) a n then we don’t consider unfolding i t h r w g e t h e t o n k e d a r e l i N o c k c b l b a s i n g l e or the result of another query, but from our point of view it a s i o f table. o u t (I.e., if R was queries over R.) g e s a E d f N . the definition n o v n e n d i t i o i e d e f i c o a n d j o i n w g y t h e , n o l o b y t e r m i b e l e d o u r l a e o i n t l y o d u c n d j i n t r c , a w e a r We shall define below the notion of a query graph. is treated as an encapsulated c t i o n , s e t h i s I n of R into table. (I.e., if R was a view, query graph has nodes that are relational 0 c t i c e d . y n t a a b e l a s n o t l g i v e a r e W e c k s . t b l o q u e r y o f s e u p o u t g r o n g y a c o m i a n b m e w e then we don’t consider unfolding w h a t consider three kinds of relational the definition of R into A s t h i t h a t operations. o v e Basic d p r s a n u e r i e p q g r o u g n t i f y i n i d e f o r Blocks r i o n c r i t e queries over R.) operations: We projections A basic block is some combination and selections applied to a join of relations. r y q u e o u p r y g r e v e t h a t h o w l s o s l e W e a s i n g SQL such operations t . h a i c i e n w i t s s u f f a p h n i c g r n d i t i o c y c l i c o d a e c t e a d i r i s r a p h y g q u e r We shall define below the notion of a query graph. A of queries without are expressed as SELECT-FROM-WHERE o n . r i t e r i A u r c e s o a t i s f i a t s h a l l m t h W e s f o r t . i n a r e s u l s e d e r y aggregates or attribute p r e s In e q u e e x t h a n b n t i n g c r e s e relation is also treated as a basic block. r e p n o d e o t ) ( r o o u r c e s query graph has nodes that are relational operations. renaming. Aggregation We u p . ” n g o i n t i s “ p e d g e i t h h w g r a p A base t h e r a w d n a l l y e n t i o c o n v Blocks consider three kinds of relational gle aggregation An aggregation operations: operation e s i n g l attributes b e a y s a l w a w i l l e r e block is a sin- , t h a s e s o g y d a t a b m i n o l n T e r l a t i o specifying n d l e - r e a s i n g n s F o r m p t i o s s u and a list of aggregate A 2 . 1 Basic Blocks a set of grouping puted over the groups. A basic block is some combination h e w s t s h o e 1 F i g u r . functions of c t u r e e p i o f t h o p h e t be followed by a selection. a t t o d e n k n s i The aggregation to be com- projections h and selections applied to a join of relations. w i t i e w s , o f v E r m s ) i n t e e , n expressed w r i t t , a r e O e r i e s 2 q u t h a t : u m e can optionally a s s W e j i o n o p a r a t i n e s e e t h N o t i c c Q 2 . In SQL, such operations o n d n Q l , a d In e s a q u e r i s i t f o r i o p h s S n g r a u e r y E s q L . E C the T - F R O S M E SQL such operations - a r e L G e r e e 0 t h E R s i n c C O o n u m p t i : T U a s s j clause includes all grouping P a l i d o a v i n B s are expressed as SELECT-FROM-WHERE i s Y T h i i e s . are c b q - u e r o o s u H n n A d V i t V 3 B ) some aggregates), I i o B , N ( V 2 G n c k s . c b l o b a s i i n t o queries where Q l (4 query u e r y m q 3 f r o n d V 2 a o f V SQL surface syntax, relational algebra, Ql relation. and the FROM o r e ) queries without o r m w o ( attributes o f t j o i n a s a r y b q u e a s u aggregates or attribute r i t e r e w y s t o w a m a n y (b) query (plus clause contains renaming. ) . V 3 A A , ( V 2 o c k s A base b l Q2 a t i o n g r e g d a g a n Set Blocks query graphs + annotations, iteration Figure a single e t t h e t h a relation is also treated as a basic block. s u m a s Set blocks express the set-oriented t i a l l y e i n i ] . W 1: Query P L S G , S tions, namely union, intersection a y 8 7 , D i m 8 2 s [ K v i e w graphs with the same schema. The corresponding s a t i o n r e l i p l e ( M u l t R . o n r e l a t i opera- g l e Aggregation and difference of relations a s i n n s o n t a i e c a b a s d a t Blocks Definition a r An aggregation e U Torsten Grust U Tübingen N 27 I O block is a sin- N , 2.2: I N T E R (Partitioning) S E from the underlying C i e w T a v f b e and EXCEPT. i t s e l Any relational m a y SQL constructs R 2 . 4 . ) o n S e c t i gle aggregation i n e r e d n s i d e c o i l l b w Let S be a set of attributes operation specifying query can be specified using these three operations. domain over which the tuples of S values range. For every relation a set of grouping R, and let D represent A query can be split into two blocks if neces- e w i t o f v i i n t u r p o m o f r o b u t sary, for example if the query has both a WBERE u e r y , e r node N in the query graph and for x E D we write N(x) q n o t h o f a E), s u l t , O2 attributes e r e o r t h : join conditions. and a list of aggregate the functions 0 : join condition. a GROUPBY to represent the answer at node N when the extension of to be com- a s R w , i f i e w , clause then we can rewrite it as an aggregation ( I . e . a v b l e . t a l a t e d a p s u e n c s a n d a r e a t e i s t puted over the groups. block applied to a basic block. Note that we do not allow R is replaced by crs== (R). 0 The aggregation clause and can optionally constant n t o R i o f i t i o n d e f i n t h e n g f o l d i u n s i d e r c o n relations o n ’ t w e d e n t h be followed by a selection. in queries, nor do we allow attribute Observe that the attributes (4 query In SQL, such operations Ql (b) query are Q2 R . ) v e r e s o tributes q u e r i expressed as SELECT-FROM-GROUPBY-HAVING of the answer at node N for N(x) in S do not have to be at- defined. Also, observe that R = UzED R(z). queries where A h . g r a p e r y a q u o f o t i o n e n w t h b e l o i n e d e f s h a l l W e the SELECT clause includes all grouping to be well- Figure 1: Query attributes W e s . a t i o n graphs o p e r (plus a l a t i o n e r e l t a r t h a o d e s a s n h h g r a p u e r y q some aggregates), and the FROM 478 clause contains s : a t i o n a single o p e r n a l l a t i o o f r e d s e k i n t h r e d e r c o n s i relation. o f o n i n a t i o m b m e c s o k i s b l o c a s i c A b s B l o c k c B a s i Set Blocks Set blocks express the set-oriented I n o n s . e l a t i o f r Definition j o i n t o a e d a p p l i o n s e l e c t i opera- 2.2: d s a n t i o n s (Partitioning) r o j e c p Let S be a set of attributes tions, namely union, intersection R E W H E and difference of relations O M - T - F R L E C s S E from the underlying d a r e s s e e x p a r e o n s p e r a t i h o s u c S Q L relation R, and let D represent the with the same schema. The corresponding s e A b a . m i n g r e n a t e t t r i b u o r a t e s domain over which the tuples of S values range. For every g r e g a SQL constructs a g o u t w i t h e r i e s q u are UNION, INTERSECT and EXCEPT. l o c k . s i c b a b a a s t e d t r e a a l s o n i s node N in the query graph and for x E D we write N(x) l a t i o r e Any relational query can be specified using these three s i n - i s a o c k b l a t i o n g r e g n a g A o c k s B l n g a t i o to represent the answer at node N when the extension of g g r e A operations. A query can be split into two blocks if neces- g u p i n g r o o n s . t o f o n d i t i a s e o i n c g O 2 : j i o n . c i f y i n E ) , , c o n d i t s p e j o i n o n 0 : e r a t i o p t i o n g r e g a a g R is replaced by crs== (R). 0 g l e sary, for example if the query has both a WBERE o m - e c t o b n s n c t i o f u g a t e a g g r e o f l i s t d a a n e s t r i b u t a t clause and a GROUPBY a l l y p t i o n clause then we can rewrite it as an aggregation n o c a a t i o n g r e g a g T h e Q2 p s . (b) query g r o u h e v e r t d o p u t e (4 query Ql Observe that the attributes in S do not have to be at- block applied to a basic block. Note that we do not allow r e s a a t i o n o p e r u c h L , s S Q I n o n . l e c t i a s e b y w e d f o l l o b e tributes of the answer at node N for N(x) constant e r e w h e r i e s q u relations to be well- G in queries, nor do we allow attribute A V I N B Y - H R O U P M - G h s s e d - F R O g r a p x p r e s L E C T e r y e s S E Q u a 1 : g u r e F i defined. Also, observe that R = UzED R(z). p l u s ( b u t e s a t t r i n g r o u p i a l l g e s c l u d e i n c l a u s h e t T E L E C S e s i n g l a t a i n s c o n a u s e O M c l F R t h e a n d ) , g a t e s g g r e m e a s o . a t i o n r e l s b u t e a t t r i e t o f e a s S b L e t g ) i o n i n P a r t i t 2 : ( 2 . o n f i n i t i D e p e r a - o e n t e d t - o r i e s e s t h p r e s s e x b l o c k S e t s B l o c k e t S t h e e n t 478 p r e s D r e l e t a n d R , o n r e l a t i g e r l y i n u n d t h e o m f r s a t i o n f r e l c e o e r e n d i f f a n d i o n r s e c t i n t e n i o n , y u a m e l s , n t i o n y e v e r F o r g e . r a n a l u e s S v o f u p l e s h e t c h t w h i o v e r a i n d o m t s s t r u c c o n S Q L n g o n d i r e s p e c o r T h e m a . s c h a m e h e s t h t w i x ) N ( w r i t e w e E D x d f o r a n r a p h r y g q u e t h e N i n d e n o T . C E P E X a n d C T R S E I N T E O N , U N I a r e o f s i o n e x t e n t h e e n N w h d e n o e r a t n s w h e a n t t r e s e o r e p t e t h r e e s e g t h u s i n f i e d s p e c i b e c a n e r y q u o n a l r e l a t i A n y . 0 = ( R ) c r s = b y a c e d r e p l R i s s - n e c e s i f b l o c k w o t o t t i n e s p l i n b y c a q u e r A n s . e r a t i o o p a n d u s e R E c l a W B E a b o t h h a s e r y e q u f t h p l e i x a m o r e a r y , f s - e a t t o b a v e o t h o n S d i n t e s t t r i b u e a t t h t h a e r v e O b s t i o n g r e g a a g a s a n i t w r i t e n r e e c a n w t h e a u s e P B Y c l R O U a G e l l - e w o b ) t N ( x f o r e N n o d a t s w e r a n t h e o f b u t e s t r i w a l l o n o t d o w e t h a t N o t e k . b l o c a s i c a b d t o p p l i e k a b l o c ) . R ( z U z E D = t R t h a s e r v e o b A l s o , e d . d e f i n e r i b u t a t t a l l o w w e d o n o r e s , q u e r i i n n s e l a t i o n t r n s t a c o 7 8 4

  50. A Uniform Query Representation Torsten Grust U Tübingen 28

  51. A Uniform Query Representation Q’ g agg P = [ g ( y ) | y ← P ] agg partition f xs = [ ⟨ f ( x ), [ y | y ← xs , f ( x ) = f ( y ) ] M ⟩ | x ← xs ] set map f xs = [ f ( x ) | x ← xs ] M Torsten Grust U Tübingen 28

  52. A Uniform Query Representation Q’ g agg P = [ g ( y ) | y ← P ] agg partition f xs = [ ⟨ f ( x ), [ y | y ← xs , f ( x ) = f ( y ) ] M ⟩ | x ← xs ] set map f xs = [ f ( x ) | x ← xs ] M map ( λ ⟨ x , P ⟩ . ⟨ x , Q’ g agg P ⟩ ( partition f xs ) Torsten Grust U Tübingen 28

  53. A Uniform Query Representation Q’ g agg P = [ g ( y ) | y ← P ] agg partition f xs = [ ⟨ f ( x ), [ y | y ← xs , f ( x ) = f ( y ) ] M ⟩ | x ← xs ] set map f xs = [ f ( x ) | x ← xs ] M map ( λ ⟨ x , P ⟩ . ⟨ x , Q’ g agg P ⟩ ( partition f xs ) [ ⟨ f ( x ), [ g ( y ) | y ← R , f ( y ) = f ( x ) ] agg ⟩ | x ← R ] set Torsten Grust U Tübingen 28

  54. A Uniform Query Representation Q’ g agg P = [ g ( y ) | y ← P ] agg partition f xs = [ ⟨ f ( x ), [ y | y ← xs , f ( x ) = f ( y ) ] M ⟩ | x ← xs ] set map f xs = [ f ( x ) | x ← xs ] M map ( λ ⟨ x , P ⟩ . ⟨ x , Q’ g agg P ⟩ ( partition f xs ) SELECT f ( x ), agg ( g ( x )) FROM R AS x GROUP BY f ( x ) Torsten Grust U Tübingen 28

  55. XPath Torsten Grust U Tübingen 29

  56. XPath Comprehensions Torsten Grust U Tübingen 30

  57. XPath Comprehensions /descendant::a[following::b]/child::c Torsten Grust U Tübingen 30

  58. XPath Comprehensions /descendant::a[following::b]/child::c 1. Normalize, simplify, flip XPath step expressions Torsten Grust U Tübingen 30

Download Presentation
Download Policy: The content available on the website is offered to you 'AS IS' for your personal information and use only. It cannot be commercialized, licensed, or distributed on other websites without prior consent from the author. To download a presentation, simply click this link. If you encounter any difficulties during the download process, it's possible that the publisher has removed the file from their server.

Recommend


More recommend