Main Memory Adaptive Indexing for Multi-core Systems

SIGMOD DaMoN, 23.06.2014
Main Memory Adaptive Indexing for Multi-core Systems
Felix Martin Schuhknecht, Victor Alvarez, Jens Dittrich, Stefan Richter
Information Systems Group, Saarland University, https://infosys.uni-saarland.de/

Problem:


  9. Single-threaded algorithms
     [Figure: accumulated query response time [s] (0 to 25) over the query sequence (1 to 10,000 queries, log scale); 1 thread, 100 million elements. Series: Standard Cracking (SC), Hybrid Crack Sort (HCS), Coarse-granular Index (CGI), Radix Sort (RS), STL std::sort (STL-S). Annotations: "750 Queries" and "> 10000 Queries".]
     [The Uncracked Pieces in Database Cracking. F. M. Schuhknecht, A. Jindal, J. Dittrich. In PVLDB 2013]

  10. Multi-threaded environments? Multi-threaded algorithms!

  11. Multi-threaded algorithms: Parallel Standard Cracking (P-SC)
     [Figure: queries Q1 and Q2 run on threads T1 and T2 over the same cracker column and request read (R) and write (W) locks on its pieces; some requests are granted (✓), others conflict (⚡).]
     Inter-query parallelism, but lock contention, and resources are underutilized (T3, T4, T5, ... remain unused).
     [Concurrency control for adaptive indexing. G. Graefe, F. Halim, S. Idreos, H. Kuno, S. Manegold. In PVLDB 2013]
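P-SC executes each query with standard cracking and protects the pieces of the cracker column with read/write locks. The following Python sketch shows only the per-query core, crack-in-two on the lower and upper bound of a range query, assuming integer keys. The names `CrackerColumn`, `crack` and `range_query` are hypothetical, and a single column-wide `threading.Lock` stands in for P-SC's piece-wise locks, so this sketch serializes queries even more than P-SC does.

```python
import bisect
import threading


class CrackerColumn:
    """Hypothetical sketch of standard cracking on one integer column.

    A single column-wide lock serializes queries; this is a simplification,
    coarser than P-SC's read/write locks on individual pieces.
    """

    def __init__(self, values):
        self.data = list(values)      # cracker column: a copy of the base column
        self.bounds = []              # pivots cracked so far, kept sorted
        self.pos = {}                 # pivot -> first position with value >= pivot
        self.lock = threading.Lock()

    def _piece(self, pivot):
        # Find the piece [lo, hi) whose value range contains `pivot`.
        i = bisect.bisect_left(self.bounds, pivot)
        lo = self.pos[self.bounds[i - 1]] if i > 0 else 0
        hi = self.pos[self.bounds[i]] if i < len(self.bounds) else len(self.data)
        return lo, hi

    def crack(self, pivot):
        """Crack-in-two: split one piece around `pivot`, return the split position."""
        if pivot in self.pos:         # already cracked on this value
            return self.pos[pivot]
        lo, hi = self._piece(pivot)
        d, i, j = self.data, lo, hi - 1
        while i <= j:                 # invariant: d[lo:i] < pivot, d[j+1:hi] >= pivot
            if d[i] < pivot:
                i += 1
            else:
                d[i], d[j] = d[j], d[i]
                j -= 1
        bisect.insort(self.bounds, pivot)
        self.pos[pivot] = i
        return i

    def range_query(self, low, high):
        """Return the values v with low <= v < high, cracking on both bounds."""
        with self.lock:
            start = self.crack(low)
            end = self.crack(high)
            return self.data[start:end]
```

For example, `CrackerColumn([7, 3, 9, 1, 5]).range_query(3, 8)` returns `[7, 3, 5]`: the qualifying values in cracked rather than sorted order, with two new cracks (at 3 and 8) left in the cracker index for later queries.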

  12. Multi-threaded algorithms: Parallel-chunked Standard Cracking (P-CSC) (slides 12 and 13)
     [Figure: the column is split into k chunks, each with its own cracker index. A query is processed by threads T1, T2, T3, ..., Tk, one per chunk, and each thread produces a local result.]
     Complete independence; fully utilizes resources; no consecutive result (the answer consists of k local results).
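A minimal sketch of the P-CSC layout, assuming integer keys and queries issued one at a time: the column is cut into k chunks, each chunk gets its own cracker (a compact version of standard cracking), and each query fans out to one thread per chunk. `Cracker` and `ChunkedCracker` are hypothetical names. Python threads only illustrate the structure here; because of the GIL they do not run the partitioning loops truly in parallel.

```python
import bisect
from concurrent.futures import ThreadPoolExecutor


class Cracker:
    """Hypothetical minimal cracker for one chunk (no locks: one owner thread)."""

    def __init__(self, values):
        self.data = list(values)
        self.bounds = []   # sorted pivots cracked so far
        self.pos = {}      # pivot -> first position with value >= pivot

    def crack(self, pivot):
        if pivot in self.pos:
            return self.pos[pivot]
        k = bisect.bisect_left(self.bounds, pivot)
        lo = self.pos[self.bounds[k - 1]] if k > 0 else 0
        hi = self.pos[self.bounds[k]] if k < len(self.bounds) else len(self.data)
        d, i, j = self.data, lo, hi - 1
        while i <= j:      # partition the piece d[lo:hi] around pivot
            if d[i] < pivot:
                i += 1
            else:
                d[i], d[j] = d[j], d[i]
                j -= 1
        bisect.insort(self.bounds, pivot)
        self.pos[pivot] = i
        return i

    def range_query(self, low, high):
        start, end = self.crack(low), self.crack(high)
        return self.data[start:end]


class ChunkedCracker:
    """Hypothetical P-CSC sketch: k chunks, each with its own cracker index."""

    def __init__(self, values, k):
        values = list(values)
        size = max(1, -(-len(values) // k))   # ceiling division
        self.chunks = [Cracker(values[i:i + size])
                       for i in range(0, len(values), size)]

    def range_query(self, low, high):
        # Assumes queries arrive one at a time: during a query every chunk is
        # touched by exactly one thread, so the chunks need no locks.
        with ThreadPoolExecutor(max_workers=max(1, len(self.chunks))) as pool:
            futures = [pool.submit(c.range_query, low, high) for c in self.chunks]
            # One local result per chunk; they are not merged into one range.
            return [f.result() for f in futures]
```

`range_query` returns a list of k local results, one per chunk, which mirrors the "no consecutive result" point on the slide.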

  14. Micro benchmark: reading 1% from k locations using one thread
     [Figure: time [s] (0 to 10) over the number of chunks k (1 to 1,000,000, log scale).] No problem for realistic k.

  15. Multi-threaded algorithms: Parallel Coarse-Granular Index (P-CGI)
     1. Range-partition column A into 1024 partitions while copying it into Index(A), the index on A.
     2. Perform P-SC on Index(A), whose cracker index starts out with the partition boundaries and is protected by read (R) and write (W) locks.
     This is like starting P-SC after 1000 cracks: it reduces the lock contention of P-SC and reduces variance, but adds a (small) initialization time.
     How to do step 1 in parallel?
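Step 1 of P-CGI can be sketched sequentially as a histogram pass followed by a scatter pass; the partition boundaries then seed the cracker index, which is why the slide likens it to starting after about 1000 cracks. The sketch assumes a non-empty list of integers and equal-width value ranges per partition (the slide does not say how boundaries are chosen). `coarse_granular_init` is a hypothetical name; its `pos` output maps each partition's lower bound to the first position holding a value at least that large.

```python
def coarse_granular_init(values, num_partitions=1024):
    """Hypothetical sketch: range-partition `values` while copying them and
    return (copy, sorted boundaries, initial cracker index).

    Assumes a non-empty list of ints; equal-width ranges are an assumption.
    """
    lo, hi = min(values), max(values)
    width = (hi - lo) // num_partitions + 1     # every value maps to 0..P-1

    def part(v):
        return (v - lo) // width

    # Pass 1: histogram of partition sizes.
    counts = [0] * num_partitions
    for v in values:
        counts[part(v)] += 1

    # Prefix sums give each partition's start offset in the copy.
    starts, total = [], 0
    for c in counts:
        starts.append(total)
        total += c

    # Pass 2: scatter every value into its partition of the copy.
    data = [None] * len(values)
    cursor = list(starts)
    for v in values:
        p = part(v)
        data[cursor[p]] = v
        cursor[p] += 1

    # Initial cracker index: lower bound of partition p -> start of partition p.
    bounds = [lo + p * width for p in range(1, num_partitions)]
    pos = {b: starts[p] for p, b in enumerate(bounds, start=1)}
    return data, bounds, pos
```

After this step a P-SC query only has to crack inside one small partition per query bound instead of scanning the whole column.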

  16. Multi-threaded algorithms: Parallel Coarse-Granular Index (P-CGI): Parallel Range Partitioning
     Source A holds n elements and is divided into k portions t1, ..., tk, one per thread (k threads). Each thread range-partitions its portion into destination B, where every partition reserves a separate region for each thread's entries.
     1. Build histogram
     2. Copy entries
     No locks required; fully utilizes resources. Caveat: NUMA-fragmented memory.
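The two phases on the slide can be sketched with k worker threads: per-thread histograms first, then prefix sums that reserve a private region for every (partition, thread) pair, then a copy phase in which each thread writes only into its own regions, so no locks are needed. This is a hedged Python sketch; the name `parallel_range_partition` and the explicit, strictly increasing `splitters` list are assumptions, and Python threads show the structure rather than a real multi-core speedup.

```python
import bisect
from concurrent.futures import ThreadPoolExecutor


def parallel_range_partition(src, splitters, k):
    """Hypothetical sketch: range-partition `src` into len(splitters) + 1
    partitions using k threads. Partition p holds values v with
    splitters[p-1] <= v < splitters[p]. Returns (dst, partition starts)."""
    n, P = len(src), len(splitters) + 1
    chunk = max(1, -(-n // k))
    portions = [(i, min(i + chunk, n)) for i in range(0, n, chunk)]

    def part(v):
        return bisect.bisect_right(splitters, v)

    # Phase 1: each thread builds a histogram of its own portion.
    def histogram(portion):
        lo, hi = portion
        h = [0] * P
        for v in src[lo:hi]:
            h[part(v)] += 1
        return h

    with ThreadPoolExecutor(max_workers=k) as pool:
        hists = list(pool.map(histogram, portions))

    # Prefix sums in partition-major, then thread order: thread t owns a
    # private region inside every partition, so no slot is written twice.
    offsets, total = [[0] * P for _ in portions], 0
    for p in range(P):
        for t in range(len(portions)):
            offsets[t][p] = total
            total += hists[t][p]

    dst = [None] * n

    # Phase 2: each thread copies its entries into its own regions (no locks).
    def copy(t):
        lo, hi = portions[t]
        cursor = offsets[t][:]
        for v in src[lo:hi]:
            p = part(v)
            dst[cursor[p]] = v
            cursor[p] += 1

    with ThreadPoolExecutor(max_workers=k) as pool:
        list(pool.map(copy, range(len(portions))))

    # Partition p starts where thread 0's region of partition p starts.
    starts = offsets[0][:] if portions else [0] * P
    return dst, starts
```

Each destination partition ends up with k sub-regions written by different threads, which is one way to picture the NUMA fragmentation the slide warns about when those threads run on different sockets.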

  17. Multi-threaded algorithms: Parallel-chunked Coarse-Granular Index (P-CCGI)
     P-CSC + range partitioning: each of the k chunks is range-partitioned and has its own cracker index. A query is processed by threads T1, T2, T3, ..., Tk, one per chunk, and each thread produces a local result.
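For P-CCGI, the following sketch covers only the initialization: each of k threads range-partitions its own chunk (histogram, prefix sum, scatter) and derives that chunk's initial cracker index; queries would then proceed per chunk as in P-CSC. The shared, strictly increasing `splitters` list and the names `partition_chunk` and `build_pccgi` are assumptions for illustration.

```python
import bisect
from concurrent.futures import ThreadPoolExecutor


def partition_chunk(chunk, splitters):
    """Hypothetical helper: range-partition one chunk and return the
    partitioned copy plus its initial cracker index {splitter: start}."""
    P = len(splitters) + 1
    ids = [bisect.bisect_right(splitters, v) for v in chunk]
    counts = [0] * P
    for p in ids:                      # histogram
        counts[p] += 1
    starts, total = [], 0
    for c in counts:                   # prefix sums
        starts.append(total)
        total += c
    data, cursor = [None] * len(chunk), starts[:]
    for v, p in zip(chunk, ids):       # scatter
        data[cursor[p]] = v
        cursor[p] += 1
    # splitters[p] is the lower bound of partition p + 1.
    index = {s: starts[p + 1] for p, s in enumerate(splitters)}
    return data, index


def build_pccgi(values, splitters, k):
    """Hypothetical P-CCGI initialization: k chunks, one thread per chunk."""
    size = max(1, -(-len(values) // k))   # ceiling division
    chunks = [values[i:i + size] for i in range(0, len(values), size)]
    with ThreadPoolExecutor(max_workers=k) as pool:
        return list(pool.map(lambda c: partition_chunk(c, splitters), chunks))
```

Each chunk is partitioned by a single thread and never touched by another, so the chunk-level independence of P-CSC is preserved.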

  18. Multi-threaded algorithms: Parallel Range-Partitioned Radix Sort (P-RPRS)
     1. Range-partition column A into 1024 partitions while copying it into Index(A) (this step is shared with P-CCGI).
     2. Perform an in-place radix sort on each partition.
     Result: Index(A) is fully sorted.
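P-RPRS can be sketched as range partitioning followed by an independent sort of each partition. The slide specifies an in-place radix sort; this hedged Python sketch substitutes a simpler out-of-place LSD radix sort and uses per-partition lists for the partitioning step. It assumes non-negative integer keys below 2**32 and strictly increasing `splitters`; `lsd_radix_sort` and `p_rprs` are hypothetical names.

```python
import bisect
from concurrent.futures import ThreadPoolExecutor


def lsd_radix_sort(values, key_bits=32, digit_bits=8):
    """Out-of-place LSD radix sort for non-negative ints < 2**key_bits
    (a stand-in for the in-place radix sort named on the slide)."""
    mask = (1 << digit_bits) - 1
    for shift in range(0, key_bits, digit_bits):
        buckets = [[] for _ in range(1 << digit_bits)]
        for v in values:               # stable distribution by current digit
            buckets[(v >> shift) & mask].append(v)
        values = [v for b in buckets for v in b]
    return values


def p_rprs(values, splitters, k):
    """Hypothetical P-RPRS sketch: range-partition, then sort each partition."""
    # Step 1: range-partition while copying (per-partition lists stand in for
    # the histogram-and-copy scheme of the parallel range partitioning).
    parts = [[] for _ in range(len(splitters) + 1)]
    for v in values:
        parts[bisect.bisect_right(splitters, v)].append(v)
    # Step 2: sort every partition independently, spread over k threads.
    with ThreadPoolExecutor(max_workers=k) as pool:
        sorted_parts = list(pool.map(lsd_radix_sort, parts))
    # Partitions cover ascending key ranges, so concatenation is fully sorted.
    return [v for p in sorted_parts for v in p]
```

Because the partitions cover disjoint, ascending key ranges, no merge step is needed: concatenating the sorted partitions yields the fully sorted Index(A).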
