relax and let the database do the partitioning online
play

Relax and Let the Database do the Partitioning Online Alekh Jindal, - PowerPoint PPT Presentation

Information Systems Group Relax and Let the Database do the Partitioning Online Alekh Jindal, Jens Dittrich - presented by Stefan Schuh VLDB International Workshop on Real-Time Business Intelligence September 2, 2011 Thursday, September 1,


  1. Information Systems Group Relax and Let the Database do the Partitioning Online Alekh Jindal, Jens Dittrich - presented by Stefan Schuh VLDB International Workshop on Real-Time Business Intelligence September 2, 2011 Thursday, September 1, 2011

  2. Motivation: Offline Physical Database Design offline workload analysis BI Applications Advisory Tools offline transformation Database DBA 2 Thursday, September 1, 2011

  3. Offline Design Cheats! • Workloads infrequently change over time • DBAs always available • Physical design once-in-a-while process • DBAs make perfect decisions 3 Thursday, September 1, 2011

  4. Motivation: Offline Physical Database Design offline workload analysis BI Applications Advisory Tools offline transformation Database DBA 4 Thursday, September 1, 2011

  5. Motivation: Online Physical Database Design Sub-Problem Proposed Solution Online Indexing BI Applications Indexing Database Cracking Adaptive Indexing Materialized Views Dynamic Materialized Views WE! Partitioning Database 5 Thursday, September 1, 2011

  6. Challenges in Online Partitioning • Collecting online workload • Analyzing workload online • Querying with online workload analysis • Creating partitions online 6 Thursday, September 1, 2011

  7. Challenges in Online Partitioning • Collecting online workload • Analyzing workload online • Querying with online workload analysis • Creating partitions online 6 Thursday, September 1, 2011

  8. What is the Workload? • offline approach: take the last query log as workload (static) • online approach: collect incoming queries in a window and slide it when more queries come (dynamic) 7 Thursday, September 1, 2011

  9. What is the Workload? • offline approach: take the last query log as workload (static) • online approach: collect incoming queries in a window and slide it when more queries come (dynamic) Window Size = 5 Q 1 Q 2 Q 3 Q 4 Q 5 7 Thursday, September 1, 2011

  10. What is the Workload? • offline approach: take the last query log as workload (static) • online approach: collect incoming queries in a window and slide it when more queries come (dynamic) Window Size = 5 Q 1 Q 2 Q 3 Q 4 Q 5 Q 6 7 Thursday, September 1, 2011

  11. What is the Workload? • offline approach: take the last query log as workload (static) • online approach: collect incoming queries in a window and slide it when more queries come (dynamic) Window Size = 5 Q 1 Q 2 Q 3 Q 4 Q 5 Q 6 Q 7 7 Thursday, September 1, 2011

  12. What is the Workload? • offline approach: take the last query log as workload (static) • online approach: collect incoming queries in a window and slide it when more queries come (dynamic) Window Size = 5 Q 1 Q 2 Q 3 Q 4 Q 5 Q 6 Q 7 ..... 7 Thursday, September 1, 2011

  13. How to Express the Partitioning Problem? • Partitioning unit e.g. a 1 , a 2 , a 3 , a 4 , a 5 , a 6 set P u a 1 a 2 a 3 a 4 a 5 a 6 8 Thursday, September 1, 2011

  14. How to Express the Partitioning Problem? a 1 a 2 a 3 a 4 a 5 a 6 • Partitioning unit e.g. a 1 , a 2 , a 3 , a 4 , a 5 , a 6 set P u • ordering e.g. a 3 a 2 a 1 a 5 a 4 a 6 ordering ⇥ , set P u ordering ⇥ , ordering ⇥ , ordering ⇥ , ordering ⇥ , ordering ⇥ , ( ( ( ( ( ( a 1 a 2 a 3 a 4 a 5 a 6 a 3 a 2 a 1 a 5 a 4 a 6 8 Thursday, September 1, 2011

  15. How to Express the Partitioning Problem? a 1 a 2 a 3 a 4 a 5 a 6 • Partitioning unit e.g. a 1 , a 2 , a 3 , a 4 , a 5 , a 6 set P u a 3 a 2 a 1 a 5 a 4 a 6 a 1 a 2 a 3 a 4 a 5 a 6 • ordering e.g. a 3 a 2 a 1 a 5 a 4 a 6 ordering ⇥ , set P u ordering ⇥ , ordering ⇥ , ordering ⇥ , ordering ⇥ , ordering ⇥ , ( ( ( ( ( ( • Split line, Split vector e.g. [01001] ctor S [ ] 0 1 0 0 1 a 1 a 2 a 3 a 4 a 5 a 6 8 Thursday, September 1, 2011

  16. How to Express the Partitioning Problem? a 1 a 2 a 3 a 4 a 5 a 6 • Partitioning unit e.g. a 1 , a 2 , a 3 , a 4 , a 5 , a 6 set P u a 3 a 2 a 1 a 5 a 4 a 6 a 1 a 2 a 3 a 4 a 5 a 6 • ordering e.g. a 3 a 2 a 1 a 5 a 4 a 6 ordering ⇥ , set P u ordering ⇥ , ordering ⇥ , ordering ⇥ , ordering ⇥ , ordering ⇥ , [ 0 1 ] 0 0 1 ( ( ( ( ( ( • Split line, Split vector e.g. [01001] a 1 a 2 a 3 a 4 a 5 a 6 ctor S • Partition e.g. (a 1 , a 2 ) artition p m,r ( S, � ) ( a 1 , a 2 ) a 1 a 2 a 3 a 4 a 5 a 6 8 Thursday, September 1, 2011

  17. How to Express the Partitioning Problem? a 1 a 2 a 3 a 4 a 5 a 6 • Partitioning unit e.g. a 1 , a 2 , a 3 , a 4 , a 5 , a 6 set P u a 3 a 2 a 1 a 5 a 4 a 6 a 1 a 2 a 3 a 4 a 5 a 6 • ordering e.g. a 3 a 2 a 1 a 5 a 4 a 6 ordering ⇥ , set P u ordering ⇥ , ordering ⇥ , ordering ⇥ , ordering ⇥ , ordering ⇥ , [ 0 1 ] 0 0 1 ( ( ( ( ( ( • Split line, Split vector e.g. [01001] a 1 a 2 a 3 a 4 a 5 a 6 ctor S ( a 1 , a 2 ) • Partition e.g. (a 1 , a 2 ) a 1 a 2 a 3 a 4 a 5 a 6 artition p m,r ( S, � ) • Partitioning scheme e.g. (a 1 , a 2 ), (a 3 , a 4 , a 5 ), (a 6 ) scheme P ( S, � ) { (a 1 , a 2 ) (a 3 , a 4 , a 5 ) (a 6 ) } a 1 a 2 a 3 a 4 a 5 a 6 8 Thursday, September 1, 2011

  18. What about Horizontal Partitioning? • Just rotate the table by 90 degrees • abstraction allows us to solve both problems set P u • can be attributes, row-ranges, or any other table slice set P u a 1 a 2 a 3 a 4 a 5 a 6 a 1 a 2 a 3 a 4 a 5 a 6 r 6 r 5 r 4 r 3 r 2 r 1 9 Thursday, September 1, 2011

  19. Partitioning Problem: What to Analyze? • Partitioning unit e.g. a 1 , a 2 , a 3 , a 4 , a 5 , a 6 set P u • ordering e.g. a 3 a 2 a 1 a 5 a 4 a 6 ordering ⇥ , set P u ordering ⇥ , ordering ⇥ , ordering ⇥ , ordering ⇥ , ordering ⇥ , ( ( ( ( ( ( • Split line, Split vector e.g. [01001] ctor S • Partition e.g. (a 1 , a 2 ) artition p m,r ( S, � ) • Partitioning scheme e.g. (a 1 , a 2 ), (a 3 , a 4 , a 5 ), (a 6 ) scheme P ( S, � ) • Workload ad W t k • Problem statement Find , such that: S ⇥ S ⇥ = argmin � ⇥ ordering ⇥ , W t k , P ( S, � ) C est. S ( 10 Thursday, September 1, 2011

  20. How to Analyze the Workload? Step 1: Finding Partitioning Unit Ordering • offline approach: create affinity matrix and cluster it once, as proposed by Navathe et. al. • online approach: leverage the affinity idea, but dynamically update and cluster the affinity matrix 11 Thursday, September 1, 2011

  21. Offline Partitioning Unit Ordering • Create affinity matrix having attributes co-occurrences PartKey SuppKey Quantity PartKey 8 5 6 SuppKey 5 8 4 Quantity 6 4 9 • Cluster affinity matrix to maximize the affinity measure ⇥ ⇥ as M ( ⇥ ) = � x � x j =1 A ( a i , a j )[ A ( a i , a j − 1 )+ A ( a i , a j +1 )]. i =1 ) = A ( a , a ) = A ( a , a ) = A ( a , a )=0. For the = 404 = 440 e M ( ⇥ ) e M ( ⇥ ) matrix has matrix has PartKey SuppKey Quantity PartKey Quantity SuppKey PartKey 8 5 6 PartKey 8 6 5 SuppKey 5 8 4 Quantity 6 9 4 Quantity 6 4 9 SuppKey 5 4 8 12 Thursday, September 1, 2011

  22. Online Partitioning Unit Ordering • Update only the referenced in affinity matrix set P u PartKey Quantity SuppKey PartKey Quantity SuppKey (PartKey, SuppKey) PartKey 8 6 5 PartKey 9 6 6 Quantity 6 9 4 Quantity 6 9 4 SuppKey 5 4 8 SuppKey 6 4 9 • Re-cluster only the referenced in affinity matrix set P u +48 0 PartKey Quantity SuppKey SuppKey PartKey Quantity PartKey 9 6 6 SuppKey 6 9 6 Quantity 6 9 4 PartKey 4 6 9 SuppKey 6 4 9 Quantity 9 6 4 13 Thursday, September 1, 2011

  23. How to Analyze the Workload? Step 2: Enumerating Split Vectors • offline approach: consider all possible split vectors (brute force) a 1 a 2 a 3 .. ... a n is 2 n � 1 . Complexity: 14 Thursday, September 1, 2011

  24. How to Analyze the Workload? Step 2: Enumerating Split Vectors • offline approach: consider all possible split vectors (brute force) • online approach: One-dimensional Online Partitioning (O 2 P) Algorithm Technique 1: prune non-referenced partitioning units Technique 2: consider split vectors greedily Technique 3: save previous best split vectors using dynamic programming 15 Thursday, September 1, 2011

  25. Partitioning Unit Pruning Idea: Prune the unused (non-referenced) in set P u at most two separate partitions a 1 a 2 a 3 .. ... a n Complexity: For p leading and q trailing unused set P u is 2 n � p � q � 1 . 16 Thursday, September 1, 2011

  26. Greedy Split Vector Enumeration Idea: Mark only one (best) split vector at a time a 1 a 2 a 3 .. ... a n a 1 a 2 a 3 .. ... a n a 1 a 2 a 3 .. ... a n Complexity: worst case n 2 17 Thursday, September 1, 2011

  27. Dynamic Programming Idea: save best split vectors in un-split partitions a 1 a 2 a 3 .. ... a n Best Best a 1 a 2 a 3 .. ... a n a 1 a 2 a 3 .. ... a n 18 Thursday, September 1, 2011

  28. How to Amortize Partitioning Analysis? • offline approach: querying after computing and creating partitions Analysis Cost Query Cost • online approach: option 1 : interleave queries with partitioning analysis option 2 : queries in a separate thread Analysis Cost Query Cost 19 Thursday, September 1, 2011

  29. Goals of the Experiments • Does greedy partitioning hurt Quality? • How much is O 2 P faster? • Can such a system adapt to changing workload ? • Will our approach work on real systems? 20 Thursday, September 1, 2011

Download Presentation
Download Policy: The content available on the website is offered to you 'AS IS' for your personal information and use only. It cannot be commercialized, licensed, or distributed on other websites without prior consent from the author. To download a presentation, simply click this link. If you encounter any difficulties during the download process, it's possible that the publisher has removed the file from their server.

Recommend


More recommend