Database System Implementation


DATABASE SYSTEM IMPLEMENTATION GT 4420/6422 // SPRING 2019 // @JOY_ARULRAJ
LECTURE #23: VECTORIZED EXECUTION

  2 ANATOMY OF A DATABASE SYSTEM: Process Manager (Connection Manager + Admission Control); Query Processor (Query Parser, Query Optimizer)


  1. 37 COMPILER HINTS
     The restrict keyword (standard in C99; typically available as __restrict in C++ compilers) tells the compiler that the arrays are distinct, non-overlapping locations in memory, so the loop can be vectorized safely.

         void add(int *restrict X,
                  int *restrict Y,
                  int *restrict Z) {
           for (int i = 0; i < MAX; i++) {
             Z[i] = X[i] + Y[i];
           }
         }

  2. 38 COMPILER HINTS
     The #pragma ivdep directive tells the compiler to ignore assumed loop dependencies for the vectors. It is up to you to make sure that this is correct.

         void add(int *X, int *Y, int *Z) {
           #pragma ivdep
           for (int i = 0; i < MAX; i++) {
             Z[i] = X[i] + Y[i];
           }
         }

  3. 39 EXPLICIT VECTORIZATION Use CPU intrinsics to manually marshal data between SIMD registers and execute vectorized instructions. Potentially not portable.

  4. 40 EXPLICIT VECTORIZATION
     Store the vectors in 128-bit SIMD registers, then invoke the intrinsic to add the vectors together and write the result to the output location.

         void add(int *X, int *Y, int *Z) {
           __m128i *vecX = (__m128i *)X;
           __m128i *vecY = (__m128i *)Y;
           __m128i *vecZ = (__m128i *)Z;
           for (int i = 0; i < MAX / 4; i++) {
             _mm_store_si128(vecZ++, _mm_add_epi32(*vecX++, *vecY++));
           }
         }

  5.–7. 41–43 VECTORIZATION DIRECTION
     Approach #1: Horizontal
     → Perform the operation on all elements together within a single vector (e.g., SIMD-add the lanes 0 1 2 3 of one vector to produce 6).
     Approach #2: Vertical
     → Perform the operation in an elementwise manner on the elements of each vector (e.g., SIMD-add the vector 0 1 2 3 and the vector 1 1 1 1 to produce 1 2 3 4).
     Source: Przemysław Karpiński
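     A minimal sketch of the two directions (not from the slides), using 128-bit SSE intrinsics; it assumes SSSE3 is available for _mm_hadd_epi32:

         #include <immintrin.h>
         #include <stdio.h>

         int main(void) {
             __m128i a = _mm_setr_epi32(0, 1, 2, 3);
             __m128i b = _mm_setr_epi32(1, 1, 1, 1);

             /* Vertical: element-wise add of two vectors -> {1, 2, 3, 4} */
             __m128i vert = _mm_add_epi32(a, b);

             /* Horizontal: combine the lanes of a single vector -> 0+1+2+3 = 6 */
             __m128i t = _mm_hadd_epi32(a, a);   /* {0+1, 2+3, 0+1, 2+3} */
             t = _mm_hadd_epi32(t, t);           /* every lane now holds 6 */
             int sum = _mm_cvtsi128_si32(t);

             int out[4];
             _mm_storeu_si128((__m128i *)out, vert);
             printf("vertical: %d %d %d %d  horizontal sum: %d\n",
                    out[0], out[1], out[2], out[3], sum);
             return 0;
         }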

  8. 44 EXPLICIT VECTORIZATION
     Linear Access Operators
     → Predicate evaluation
     → Compression
     Ad-hoc Vectorization
     → Sorting
     → Merging
     Composable Operations
     → Multi-way trees
     → Bucketized hash tables
     Source: Orestis Polychroniou

  9. 45 VECTORIZED DBMS ALGORITHMS
     Principles for efficient vectorization by using fundamental vector operations to construct more advanced functionality.
     → Favor vertical vectorization by processing different input data per lane.
     → Maximize lane utilization by executing different things per lane subset.
     RETHINKING SIMD VECTORIZATION FOR IN-MEMORY DATABASES (SIGMOD 2015)

  10. 46 FUNDAMENTAL OPERATIONS
     → Selective Load
     → Selective Store
     → Selective Gather
     → Selective Scatter

  11.–18. 47–54 FUNDAMENTAL VECTOR OPERATIONS: Selective Load
     [Animation] Vector: A B C D; Mask: 0 1 0 1; Memory: U V W X Y Z • • •
     The next contiguous values from memory (U, then V) are loaded into only the lanes whose mask bit is set (here the second and fourth lanes); lanes with a 0 mask bit keep their existing values.

  19.–24. 55–60 FUNDAMENTAL VECTOR OPERATIONS: Selective Load / Selective Store
     [Animation] Selective Load: Vector A B C D, Mask 0 1 0 1, Memory U V W X Y Z • • •; contiguous values from memory (U, V) fill only the masked lanes.
     Selective Store: Vector A B C D, Mask 0 1 0 1, Memory U V W X Y Z • • •; the values in the masked lanes (B, then D) are written out contiguously to memory.
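     A hedged sketch of these two operations using AVX-512 expand/compress intrinsics; this assumes an AVX-512F target (on CPUs without them, the slides note later that these operations must be emulated with permutations):

         #include <immintrin.h>
         #include <stdint.h>

         /* Selective load: pull the next contiguous values from mem into only
          * the lanes of vec whose mask bit is set; other lanes are unchanged. */
         static inline __m512i selective_load(__m512i vec, __mmask16 mask,
                                              const int32_t *mem) {
             return _mm512_mask_expandloadu_epi32(vec, mask, mem);
         }

         /* Selective store: write only the lanes of vec whose mask bit is set,
          * packed contiguously starting at mem. */
         static inline void selective_store(int32_t *mem, __mmask16 mask,
                                            __m512i vec) {
             _mm512_mask_compressstoreu_epi32(mem, mask, vec);
         }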

  25.–30. 61–66 FUNDAMENTAL VECTOR OPERATIONS: Selective Gather
     [Animation] Index Vector: 2 1 5 3; Memory (offsets 0–5): U V W X Y Z • • •; resulting Value Vector: W V Z X.
     Each lane reads the memory location named by its index: value[i] = memory[index[i]].

  31.–35. 67–71 FUNDAMENTAL VECTOR OPERATIONS: Selective Gather / Selective Scatter
     [Animation] Selective Gather: Index Vector 2 1 5 3 over Memory U V W X Y Z • • • (offsets 0–5) yields Value Vector W V Z X.
     Selective Scatter: Value Vector A B C D with Index Vector 2 1 5 3 writes each value to the memory location named by its index (memory[index[i]] = value[i]), so memory becomes U B A D Y C • • •.
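     A matching sketch of gather and scatter with AVX-512 intrinsics (again assuming AVX-512F; the masked variants _mm512_mask_i32gather_epi32 / _mm512_mask_i32scatter_epi32 add the per-lane selectivity shown above):

         #include <immintrin.h>
         #include <stdint.h>

         /* Gather: value[i] = mem[index[i]] for every lane. */
         static inline __m512i gather32(const int32_t *mem, __m512i index) {
             return _mm512_i32gather_epi32(index, mem, 4 /* scale = sizeof(int32_t) */);
         }

         /* Scatter: mem[index[i]] = value[i] for every lane. */
         static inline void scatter32(int32_t *mem, __m512i index, __m512i value) {
             _mm512_i32scatter_epi32(mem, index, value, 4);
         }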

  36. 72 ISSUES
     Gathers and scatters are not really executed in parallel because the L1 cache allows only one or two distinct accesses per cycle.
     Gathers are only supported in newer CPUs.
     Selective loads and stores are also emulated in Xeon CPUs using vector permutations.

  37. 73 VECTORIZED OPERATORS
     Selection Scans
     Hash Tables
     Partitioning
     Paper provides additional info:
     → Joins, Sorting, Bloom filters
     RETHINKING SIMD VECTORIZATION FOR IN-MEMORY DATABASES (SIGMOD 2015)

  38. 74 SELECTION SCANS SELECT * FROM table WHERE key >= $(low) AND key <= $(high)

  39.–40. 75–76 SELECTION SCANS: Scalar (Branching)

         i = 0
         for t in table:
           key = t.key
           if (key ≥ low) && (key ≤ high):
             copy(t, output[i])
             i = i + 1

  41.–43. 77–79 SELECTION SCANS: Scalar (Branching) vs. Scalar (Branchless)
     The branching code is the same as on the previous slide. The branchless variant always copies the tuple to the output and only advances the output cursor when the predicate matches, so there is no hard-to-predict branch:

         i = 0
         for t in table:
           copy(t, output[i])
           key = t.key
           m = (key ≥ low ? 1 : 0) && (key ≤ high ? 1 : 0)
           i = i + m

     Source: Bogdan Raducanu
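     A small C rendering of the branchless variant (not the authors' code; it assumes the keys sit in a plain int32 array and that only the matching offsets are recorded):

         #include <stdint.h>
         #include <stddef.h>

         /* out must have room for n entries, since out[i] is written on every
          * iteration even when the predicate fails. Returns the match count. */
         size_t scan_branchless(const int32_t *keys, size_t n,
                                int32_t low, int32_t high, uint32_t *out) {
             size_t i = 0;
             for (size_t t = 0; t < n; t++) {
                 out[i] = (uint32_t)t;                      /* always copy */
                 i += (keys[t] >= low) & (keys[t] <= high); /* advance on match */
             }
             return i;
         }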

  44.–49. 80–85 SELECTION SCANS: Vectorized
     Load a vector of keys, evaluate the predicate on all lanes at once to produce a mask, selectively store the matching tuples, and advance the output cursor by the number of set mask bits:

         i = 0
         for vt in table:
           simdLoad(vt.key, vk)
           vm = (vk ≥ low ? 1 : 0) && (vk ≤ high ? 1 : 0)
           simdStore(vt, vm, output[i])
           i = i + |vm ≠ false|

  50.–55. 86–91 SELECTION SCANS: Vectorized (example)
     SELECT * FROM table WHERE key >= "O" AND key <= "U"
     [Animation] Table (ID, KEY): (1, J), (2, O), (3, Y), (4, S), (5, U), (6, X).
     Key Vector J O Y S U X → SIMD Compare → Mask 0 1 0 1 1 0.
     All Offsets 0 1 2 3 4 5 → SIMD Store with the mask → Matched Offsets 1 3 4.
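     A hedged AVX-512 sketch of this loop in C (an illustration, not the paper's implementation): it compares 16 keys per iteration and uses a compress store as the selective store of matching offsets (late materialization). It assumes AVX-512F and, for brevity, that n is a multiple of 16.

         #include <immintrin.h>
         #include <stdint.h>
         #include <stddef.h>

         size_t select_between(const int32_t *keys, size_t n,
                               int32_t low, int32_t high, int32_t *out_offsets) {
             const __m512i v_low  = _mm512_set1_epi32(low);
             const __m512i v_high = _mm512_set1_epi32(high);
             const __m512i v_step = _mm512_set1_epi32(16);
             /* Lane offsets 0..15; advanced by 16 each iteration. */
             __m512i v_off = _mm512_setr_epi32(0, 1, 2, 3, 4, 5, 6, 7,
                                               8, 9, 10, 11, 12, 13, 14, 15);
             size_t i = 0;
             for (size_t pos = 0; pos + 16 <= n; pos += 16) {
                 __m512i v_key = _mm512_loadu_si512(keys + pos);
                 /* SIMD compare: low <= key AND key <= high -> 16-bit mask */
                 __mmask16 m = _mm512_cmpge_epi32_mask(v_key, v_low) &
                               _mm512_cmple_epi32_mask(v_key, v_high);
                 /* Selective store: pack the matching offsets contiguously. */
                 _mm512_mask_compressstoreu_epi32(out_offsets + i, m, v_off);
                 i += (size_t)__builtin_popcount((unsigned)m);  /* gcc/clang */
                 v_off = _mm512_add_epi32(v_off, v_step);
             }
             return i;  /* number of matching tuples */
         }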

  56.–61. 92–97 SELECTION SCANS: Evaluation
     [Charts] Throughput (billion tuples/sec) vs. selectivity (%) for Scalar (Branching), Scalar (Branchless), Vectorized (Early Mat), and Vectorized (Late Mat) on two platforms: MIC (Xeon Phi 7120P, 61 cores + 4×HT) and Multi-Core (Xeon E3-1275v3, 4 cores + 2×HT). A memory-bandwidth line marks the upper bound on each chart.

  62.–65. 98–101 HASH TABLES – PROBING: Scalar
     [Animation] Linear probing hash table with KEY | PAYLOAD slots.
     Hash the input key k1 to obtain a hash index h1, jump to that slot, and compare k1 against the key stored there (k9); on a mismatch, linear probing continues with the next slot.
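     A minimal sketch of the scalar probe shown above, assuming an open-addressing table with a power-of-two size, a load factor below 1, and an illustrative EMPTY sentinel:

         #include <stdint.h>
         #include <stdbool.h>

         typedef struct { int32_t key; int32_t payload; } Slot;
         #define EMPTY INT32_MIN   /* assumed sentinel for an unused slot */

         static inline uint32_t hash32(int32_t k) {
             return (uint32_t)k * 2654435761u;   /* any integer hash will do */
         }

         /* Probe the linear-probing table for key; walk forward until the key
          * or an empty slot is found. */
         bool probe(const Slot *table, uint32_t size /* power of two */,
                    int32_t key, int32_t *payload_out) {
             uint32_t h = hash32(key) & (size - 1);
             while (table[h].key != EMPTY) {
                 if (table[h].key == key) {
                     *payload_out = table[h].payload;
                     return true;
                 }
                 h = (h + 1) & (size - 1);        /* next slot (wraps around) */
             }
             return false;                        /* empty slot: key not present */
         }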
