herodotos herodotou nedyalko borisov shivnath babu
play

Herodotos Herodotou Nedyalko Borisov Shivnath Babu Duke University - PowerPoint PPT Presentation

Herodotos Herodotou Nedyalko Borisov Shivnath Babu Duke University Table Partitioning Split parent table into smaller child tables (partitions) Partitioning methods: hash, range, list Sales Sales_Jan_11 Benefits id date . .


  1. Herodotos Herodotou Nedyalko Borisov Shivnath Babu Duke University

  2. Table Partitioning  Split parent table into smaller child tables (partitions)  Partitioning methods: hash, range, list Sales Sales_Jan_11  Benefits id date . . . id date . . .  Improve query performance  Faster data loading, archival, backup Sales_Feb_11 id date . . .  Efficient statistics maintenance  Better cardinality estimation Sales_Mar_11  Fine-grained control for tuning id date . . .  . . . 06/14/2011 Duke University 2

  3. Recent Trends  Growing usage due to increased data sizes  Growing user control due to new SQL extensions  Partitioning conditions for derived tables  DBA must satisfy multiple objectives and constraints regarding partitioning  Implications  DBA may have limited control over partitioning scheme  Diverse mix of partitioning schemes 06/14/2011 Duke University 3

  4. Partitioning Schemes  Multidimensional (Hierarchical) partitioning  Tables partitioned on same key but different ranges  Range partitioning with non-equi ranges Ads A Payments P A.d P.id 0 P 1 Sales S A 1 A 2 A 3 A 4 A 5 10 P 2 S.d S.d Loans L 20 P 3 L.id S.id 0 0 30 L 1 S 11 S 11 S 12 S 12 S 13 S 13 P 4 20 20 40 P 5 L 2 S 21 S 21 S 22 S 22 S 23 S 23 50 40 40 S 1 S 2 S 3 P 6 60 L 3 S 31 S 31 S 32 S 32 S 33 S 33 P 7 60 60 70 P 8 L 4 S 41 S 42 S 43 80 80 80 06/14/2011 Duke University 4

  5. Partition-aware Query Optimization SELECT * FROM Sales S, Payments P WHERE S.id = P.id AND S.d > Feb-15 AND P.id < 25 Payments P P.id 0 P 1 P 1 Sales S 10 P 2 P 2 S.d S.d 20 P 3 P 3 S.id 0 30 S 11 S 11 S 11 S 12 S 12 S 12 S 12 S 13 S 13 S 13 S 13 P 4 P 4 20 40 P 5 P 5 S 21 S 21 S 21 S 22 S 22 S 22 S 22 S 23 S 23 S 23 S 23 50 40 S 1 S 2 S 3 P 6 P 6 60 S 31 S 31 S 31 S 32 S 32 S 32 S 32 S 32 S 33 S 33 S 33 S 33 S 33 P 7 P 7 60 70 P 8 P 8 S 41 S 41 S 42 S 42 S 42 S 43 S 43 S 43 80 80 06/14/2011 Duke University 5

  6. Partition-aware Query Optimization Sales S Payments P SELECT * S.d P.id 0 FROM Sales S, Payments P S.id P 1 0 10 S 12 S 13 WHERE S.id = P.id P 2 20 20 AND S.d > Feb-15 S 22 S 23 P 3 30 40 AND P.id < 25 P 1 : “join of unions” P 2 : “union of (partition - wise) joins” Union HJ Union Union MJ HJ TS(P 1 ) TS(S 12 ) Union Union Union IS(P 3 ) TS(P 2 ) TS(S 13 ) TS(P 1 ) TS(S 12 ) TS(S 22 ) TS(P 3 ) TS(S 22 ) TS(P 2 ) TS(S 13 ) TS(S 23 ) TS(S 23 ) 06/14/2011 Duke University 6

  7. Partition-aware Query Optimization  More efficient partition-wise joins Legend S = Sales  More appropriate join orders P = Payments  More appropriate join operators L = Loans S ⋈ P ⋈ L S ⋈ P ⋈ L HJ Union Union Union MJ INLJ TS(L 1 ) MJ HJ MJ TS(L 1 ) HJ IS(P 3 ) TS(L 2 ) Union Union Union IS(P 3 ) Union Union Union TS(L 2 ) TS(P 1 ) TS(S 12 ) TS(S 22 ) TS(P 1 ) TS(S 12 ) TS(S 22 ) TS(P 2 ) TS(S 13 ) TS(S 23 ) TS(P 2 ) TS(S 13 ) TS(S 23 ) 06/14/2011 Duke University 7

  8. Problem & Challenges  Problem Definition  Given a partitioning scheme and a query find the optimal query execution plan  Challenges  Dealing with plan space explosion  Incorporating into state-of-the-art optimizers  Partitions as physical or logical properties?  Supporting a wide range of partitioning conditions 06/14/2011 Duke University 8

  9. Overview 3. Path Selection S.id = P.id MJ S P P 1 Union Union S 12 TS(S 22 ) TS(P 1 ) S 13 P 2 1. Matching TS(S 23 ) TS(P 2 ) S 22 P 3 P 1 S 12 S 23 HJ S 13 P 2 Union TS(P 3 ) S 22 P 3 2. Clustering TS(S 22 ) S 23 S 12 P 1 TS(S 23 ) S 13 P 2 S 22 P 3 S 23 06/14/2011 Duke University 9

  10. Matching Phase  Goal  Identify partition-wise join pairs that can generate output records  New data structure: Partition Index Tree (PIT)  Core idea: associate each partition with intervals  Functionalities: index intervals, efficient lookups  Implementation: Augmented red-black tree 06/14/2011 Duke University 10

  11. Matching Algorithm Inputs : Table S, Table P, Join condition J Step 1: Convert partitioning conditions to intervals Step 2: Build PIT with intervals of S Step 3: Probe PIT with intervals of P Output: Output : Partition-wise join pairs (S i , P j ) Partition Join Pairs P 2 [10, 20) Probe S 22 [20, 40) 40 S 22 [20, 40) 40 (S 12 , P 1 ) ✓ (S 12 , P 2 ) S 13 [0, 20) 40 S 13 [0, 20) 40 S 23 [20, 40) 40 S 23 [20, 40) 40 (S 13 , P 1 ) (S 13 , P 2 ) ✓ S 12 [0, 20) 40 S 12 [0, 20) 40 (S 22 , P 3 ) (S 23 , P 3 ) Partition Interval Max Partition Interval Max Notation: Notation: 06/14/2011 Duke University 11

  12. Matching Algorithm  Benefits  O(n log(n)), n = number of partitions of S  Θ(n) memory needs  Build & reuse PITs multiple times  Additional support  Numeric, dates, and string ranges or lists  Complex partitioning conditions (AND, OR)  Complex join conditions (AND, OR)  Non-equi joins  Details in the paper 06/14/2011 Duke University 12

  13. Clustering Phase  Goal Input : Partition join pairs (output from Matching Phase) Step 1: Build bipartite join partition graph  Minimize number of partition-wise join pairs Step 2: Find connected components using Breadth-First-Search Output : Clustered join pairs Input: Intermediate: Output: Partition Join Partition Graph Clustered Join Pairs Join Pairs S 12 P 1 (S 12 , P 1 ) ({S 12 ,S 13 },{P 1 ,P 2 }) S 13 P 2 (S 12 , P 2 ) ({S 22 ,S 23 },{P 3 }) (S 13 , P 1 ) S 12 (S 13 , P 2 ) P 3 (S 22 , P 3 ) S 13 (S 23 , P 3 ) 06/14/2011 Duke University 13

  14. Path Creation and Selection  Goal  Create and cost partition-wise join paths for child tables S ⋈ P ⋈ L Union MJ INLJ MJ TS(L 1 ) HJ IS(P 3 ) Union Union Union TS(L 2 ) TS(P 1 ) TS(S 12 ) TS(S 22 ) TS(P 2 ) TS(S 13 ) TS(S 23 ) 06/14/2011 Duke University 14

  15. Bottom-up Query Optimization Find & retain best 3-way join LSP paths per interesting order Find & retain best 2-way join LP LS SP paths per interesting order Find & retain best access L S P paths 06/14/2011 Duke University 15

  16. Bottom-up Query Optimization Logical Relation (Join) SP HJ MJ Union Union Union Union TS(S 12 ) TS(S 12 ) TS(P 1 ) IS(P 1 ) TS(P 2 ) TS(S 13 ) IS(P 2 ) TS(S 13 ) TS(P 3 ) IS(P 3 ) TS(S 22 ) TS(S 22 ) TS(S 23 ) TS(S 23 ) Best (physical) Best (physical) join path join path with interesting order 06/14/2011 Duke University 16

  17. Extended Enumeration HJ P j  Original enumeration Union Union  Create and cost join of unions TS(S 12 ) TS(P 1 )  Extended enumeration TS(P 2 ) TS(S 13 ) TS(P 3 )  Create and cost union of joins TS(S 22 ) TS(S 23 )  Retain best path per interesting order  Not enough! Union P u  Not considering entire MJ HJ plan space (e.g., Union Union Union TS(P 3 ) if P j is best, no 3-way TS(P 1 ) TS(S 12 ) TS(S 22 ) partition-wise joins) TS(P 2 ) TS(S 13 ) TS(S 23 ) 06/14/2011 Duke University 17

  18. Treating Partitions as Physical Properties  Interesting partitions in joins  Can make later joins less expensive  Approach  Retain best path for each interesting order  Retain best path for each interesting partition  Limitation  Not considering entire plan space (e.g., cannot create union of joins with different join orders) 06/14/2011 Duke University 18

  19. Treating Partitions as Logical Properties Logical parent join LSP L 1 S 12 S 13 P 1 P 2 Logical child joins L 2 S 22 S 23 P 3 Logical child joins LS LP SP L 1 S 12 S 13 L 1 P 1 P 2 S 12 S 13 P 1 P 2  Treated like L 2 P 3 L 2 S 22 S 23 S 22 S 23 P 3 logical parent joins  Retain best join L S P S 12 paths per L 1 P 1 S 13 interesting L 2 P 2 S 22 P 3 order S 23 06/14/2011 Duke University 19

  20. Treating Partitions as Logical Properties  Property 1  Interesting orders independent across child joins  Property 2  Child joins can have different join orders/operators  Property 3  Entire extended plan space is enumerated  Optimality guarantee  Our bottom-up optimizer will find the optimal plan in the extended plan space 06/14/2011 Duke University 20

  21. Experimental Evaluation  Prototype using PostgreSQL 8.3.7  TPC-H benchmark (scale 30)  Evaluation Methodology  DBA has full/limited control over partitioning scheme  State-of-the-art Vs. Our partition-aware optimizer  Optimizer evaluation metrics  Execution time  Optimization time  Memory utilization 06/14/2011 Duke University 21

  22. Evaluation: Execution Time State-of-the-art Partition aware Basic Advanced 140 Execution Time (min) 120 100 80 60 40 20 0 2 3 4 5 7 8 9 10 12 14 Query 06/14/2011 Duke University 22

  23. Evaluation: Optimization Time Basic/Intermediate Advanced Optimization Time (ms) 350 300 250 200 150 100 50 0 2 3 4 5 7 8 9 10 12 14 Query 06/14/2011 Duke University 23

  24. Evaluation: Memory Utilization PostgreSQL Partition-aware Basic Advanced 40 Memory Usage (MB) 30 20 10 0 2 3 4 5 7 8 9 10 12 14 Query 06/14/2011 Duke University 24

  25. Summary  Extended plan space to include plans with multiway partition-wise joins  Developed new partition-aware optimization techniques  Easy incorporation into bottom-up query optimizers 06/14/2011 Duke University 25

Download Presentation
Download Policy: The content available on the website is offered to you 'AS IS' for your personal information and use only. It cannot be commercialized, licensed, or distributed on other websites without prior consent from the author. To download a presentation, simply click this link. If you encounter any difficulties during the download process, it's possible that the publisher has removed the file from their server.

Recommend


More recommend