15 721
play

15-721 ADVANCED DATABASE SYSTEMS Lecture #18 Parallel Join - PowerPoint PPT Presentation

15-721 ADVANCED DATABASE SYSTEMS Lecture #18 Parallel Join Algorithms (Hashing) Andy Pavlo / / Carnegie Mellon University / / Spring 2016 @Andy_Pavlo // Carnegie Mellon University // Spring 2017 2 TODAYS AGENDA Background Parallel


  1. 15-721 ADVANCED DATABASE SYSTEMS Lecture #18 – Parallel Join Algorithms (Hashing) Andy Pavlo / / Carnegie Mellon University / / Spring 2016 @Andy_Pavlo // Carnegie Mellon University // Spring 2017

  2. 2 TODAY’S AGENDA Background Parallel Hash Join Hash Functions Hash Table Implementations Evaluation CMU 15-721 (Spring 2017)

  3. 3 PARALLEL JOIN ALGORITHMS Perform a join between two relations on multiple threads simultaneously to speed up operation. Two main approaches: → Hash Join → Sort-Merge Join We won’t discuss nested-loop joins… CMU 15-721 (Spring 2017)

  4. 4 OBSERVATION Many OLTP DBMSs don’t implement hash join. But a index nested-loop join with a small number of target tuples is more or less equivalent to a hash join. CMU 15-721 (Spring 2017)

  5. 5 HASHING VS. SORTING 1970s – Sorting 1980s – Hashing 1990s – Equivalent 2000s – Hashing 2010s – ??? CMU 15-721 (Spring 2017)

  6. 6 PARALLEL JOIN ALGORITHMS SO SORT VS. S. HASH SH REV EVISI SITED ED: FAST ST JOIN I JO N IMPLEMENT NTATION O N ON N MO MODERN MUL MULTI-CO CORE CPU CPUS VLDB 2009 → Hashing is faster than Sort-Merge. → Sort-Merge will be faster with wider SIMD. MASSI SSIVEL ELY PARA RALLEL S SORT ORT- MAI AIN-MEMORY ORY HASH JOI OINS ON ON MERG RGE JOI OINS I IN MAIN MEMORY ORY MULTI-CO MUL CORE CPU CPUS: : TUNING T TO MULTI-CORE D MUL E DATABASE SY SE SYST STEM EMS THE U UNDERL RLYI YING H HARD RDWARE RE VLDB 2012 ICDE 2013 → Sort-Merge is already faster, → New optimizations and results even without SIMD. for Radix Hash Join. Source: Cagri Balkesen CMU 15-721 (Spring 2017)

  7. 7 JOIN ALGORITHM DESIGN GOALS Goal #1: Minimize Synchronization → Avoid taking latches during execution. Goal #2: Minimize CPU Cache Misses → Ensure that data is always local to worker thread. CMU 15-721 (Spring 2017)

  8. 8 IMPROVING CACHE BEHAVIOR Factors that affect cache misses in a DBMS: → Cache + TLB capacity. → Locality (temporal and spatial). Non-Random Access (Scan): → Clustering to a cache line. → Execute more operations per cache line. Random Access (Lookups): → Partition data to fit in cache + TLB. Source: Johannes Gehrke CMU 15-721 (Spring 2017)

  9. 9 PARALLEL HASH JOINS Hash join is the most important operator in a DBMS for OLAP workloads. It’s important that we speed it up by taking advantage of multiple cores. → We want to keep all of the cores busy, without becoming memory bound DESIGN AND EVALUATION OF MAIN MEMORY HASH JOIN ALGORITHMS FOR MULTI-CORE CPUS SIGMOD 2011 CMU 15-721 (Spring 2017)

  10. 10 CLOUDERA IMPALA % of Total CPU Time Spent in Query Operators Workload: TPC-H Benchmark HASH JOIN 25.0% SEQ SCAN 49.6% UNION 3.1% AGGREGATE 19.9% OTHER 2.4% CMU 15-721 (Spring 2017)

  11. 11 HASH JOIN (R ⨝ S) Phase #1: Partition ( optional ) → Divide the tuples of R and S into sets using a hash on the join key. Phase #2: Build → Scan relation R and create a hash table on join key. Phase #3: Probe → For each tuple in S , look up its join key in hash table for R . If a match is found, output combined tuple. CMU 15-721 (Spring 2017)

  12. 12 PARTITION PHASE Split the input relations into partitioned buffers by hashing the tuples’ join key(s). → The hash function used for this phase should be different than the one used in the build phase. → Ideally the cost of partitioning is less than the cost of cache misses during build phase. Contents of buffers depends on storage model: → NSM : Either the entire tuple or a subset of attributes. → DSM : Only the columns needed for the join + offset. CMU 15-721 (Spring 2017)

  13. 13 PARTITION PHASE Approach #1: Non-Blocking Partitioning → Only scan the input relation once. → Produce output incrementally. Approach #2: Blocking Partitioning (Radix) → Scan the input relation multiple times. → Only materialize results all at once. CMU 15-721 (Spring 2017)

  14. 14 NON-BLOCKING PARTITIONING Scan the input relation only once and generate the output on-the-fly. Approach #1: Shared Partitions → Single global set of partitions that all threads update. → Have to use a latch to synchronize threads. Approach #2: Private Partitions → Each thread has its own set of partitions. → Have to consolidate them after all threads finish. CMU 15-721 (Spring 2017)

  15. 15 SHARED PARTITIONS Data Table A B C CMU 15-721 (Spring 2017)

  16. 15 SHARED PARTITIONS Data Table hash P (key) A B C # p # p # p CMU 15-721 (Spring 2017)

  17. 15 SHARED PARTITIONS Data Table Partitions hash P (key) A B C P 1 # p P 2 # p ⋮ # p P n CMU 15-721 (Spring 2017)

  18. 15 SHARED PARTITIONS Data Table Partitions hash P (key) A B C P 1 # p P 2 # p ⋮ # p P n CMU 15-721 (Spring 2017)

  19. 16 PRIVATE PARTITIONS Data Table Partitions hash P (key) A B C # p # p # p CMU 15-721 (Spring 2017)

  20. 16 PRIVATE PARTITIONS Data Table Partitions hash P (key) A B C # p # p # p CMU 15-721 (Spring 2017)

  21. 16 PRIVATE PARTITIONS Data Table Partitions Combined hash P (key) A B C P 1 # p P 2 # p ⋮ # p P n CMU 15-721 (Spring 2017)

  22. 17 RADIX PARTITIONING Scan the input relation multiple times to generate the partitions. Multi-step pass over the relation: → Step #1: Scan R and compute a histogram of the # of tuples per hash key for the radix at some offset. → Step #2: Use this histogram to determine output offsets by computing the prefix sum . → Step #3: Scan R again and partition them according to the hash key. CMU 15-721 (Spring 2017)

  23. 18 RADIX The radix is the value of an integer at a particular position (using its base). Input 89 12 23 08 41 64 CMU 15-721 (Spring 2017)

  24. 18 RADIX The radix is the value of an integer at a particular position (using its base). Input 89 12 23 08 41 64 Radix 9 2 3 8 1 4 CMU 15-721 (Spring 2017)

  25. 18 RADIX The radix is the value of an integer at a particular position (using its base). Input 89 12 23 08 41 64 Radix 8 1 2 0 4 6 CMU 15-721 (Spring 2017)

  26. 19 PREFIX SUM The prefix sum of a sequence of numbers ( x 0 , x 1 , …, x n ) is a second sequence of numbers ( y 0 , y1, …, y n ) that is a running total of the input sequence. Input 1 2 3 4 5 6 CMU 15-721 (Spring 2017)

  27. 19 PREFIX SUM The prefix sum of a sequence of numbers ( x 0 , x 1 , …, x n ) is a second sequence of numbers ( y 0 , y1, …, y n ) that is a running total of the input sequence. Input 1 2 3 4 5 6 Prefix Sum 1 CMU 15-721 (Spring 2017)

  28. 19 PREFIX SUM The prefix sum of a sequence of numbers ( x 0 , x 1 , …, x n ) is a second sequence of numbers ( y 0 , y1, …, y n ) that is a running total of the input sequence. Input 1 2 3 4 5 6 + Prefix Sum 1 3 CMU 15-721 (Spring 2017)

  29. 19 PREFIX SUM The prefix sum of a sequence of numbers ( x 0 , x 1 , …, x n ) is a second sequence of numbers ( y 0 , y1, …, y n ) that is a running total of the input sequence. Input 1 2 3 4 5 6 + + + + + Prefix Sum 1 3 6 10 15 21 CMU 15-721 (Spring 2017)

  30. 20 RADIX PARTITIONS Step #1: Inspect input, create histograms # 0 7 p # 1 8 p 0 hash P (key) # 1 9 p # 0 7 p # 0 3 p # 1 1 p 1 # 1 5 p # 1 0 p Source: Spyros Blanas CMU 15-721 (Spring 2017)

  31. 20 RADIX PARTITIONS Step #1: Inspect input, create histograms # 0 7 p # 1 8 p 0 hash P (key) # 1 9 p # 0 7 p # 0 3 p # 1 1 p 1 # 1 5 p # 1 0 p Source: Spyros Blanas CMU 15-721 (Spring 2017)

  32. 20 RADIX PARTITIONS Step #1: Inspect input, create histograms # 0 7 p # 1 8 p 0 hash P (key) # 1 9 p # 0 7 p # 0 3 p # 1 1 p 1 # 1 5 p # 1 0 p Source: Spyros Blanas CMU 15-721 (Spring 2017)

  33. 20 RADIX PARTITIONS Step #1: Inspect input, create histograms # 0 7 p # 1 8 p 0 hash P (key) # 1 9 p Partition 0: 2 # 0 7 Partition 1: 2 p # 0 3 p # 1 1 p 1 # 1 5 p Partition 0: 1 # 1 0 Partition 1: 3 p Source: Spyros Blanas CMU 15-721 (Spring 2017)

  34. 20 RADIX PARTITIONS Step #2: Compute output offsets Partition 0 , CPU 0 # 0 7 p # 1 8 p 0 Partition 0, CPU 1 hash P (key) # 1 9 p Partition 0: 2 Partition 1 , CPU 0 # 0 7 Partition 1: 2 p # 0 3 p Partition 1, CPU 1 # 1 1 p 1 # 1 5 p Partition 0: 1 # 1 0 Partition 1: 3 p Source: Spyros Blanas CMU 15-721 (Spring 2017)

  35. 20 RADIX PARTITIONS Step #3: Read input and partition Partition 0 , CPU 0 # 0 7 0 7 p # 1 8 p 0 Partition 0, CPU 1 hash P (key) # 1 9 0 3 p Partition 0: 2 Partition 1 , CPU 0 # 0 7 Partition 1: 2 p # 0 3 p Partition 1, CPU 1 # 1 1 p 1 # 1 5 p Partition 0: 1 # 1 0 Partition 1: 3 p Source: Spyros Blanas CMU 15-721 (Spring 2017)

  36. 20 RADIX PARTITIONS Step #3: Read input and partition Partition 0 , CPU 0 # 0 7 0 7 p # 1 8 0 7 p 0 Partition 0, CPU 1 hash P (key) # 1 9 0 3 p Partition 0: 2 Partition 1 , CPU 0 # 0 7 1 8 Partition 1: 2 p # 0 3 1 9 p Partition 1, CPU 1 # 1 1 1 1 p 1 # 1 5 1 5 p Partition 0: 1 # 1 0 1 0 Partition 1: 3 p Source: Spyros Blanas CMU 15-721 (Spring 2017)

  37. 20 RADIX PARTITIONS Partition 0 # 0 7 0 7 p # 1 8 0 7 p 0 hash P (key) # 1 9 0 3 p Partition 0: 2 Partition 1 # 0 7 1 8 Partition 1: 2 p # 0 3 1 9 p # 1 1 1 1 p 1 # 1 5 1 5 p Partition 0: 1 # 1 0 1 0 Partition 1: 3 p Source: Spyros Blanas CMU 15-721 (Spring 2017)

Download Presentation
Download Policy: The content available on the website is offered to you 'AS IS' for your personal information and use only. It cannot be commercialized, licensed, or distributed on other websites without prior consent from the author. To download a presentation, simply click this link. If you encounter any difficulties during the download process, it's possible that the publisher has removed the file from their server.

Recommend


More recommend