
Lecture #22: Advanced Database Systems, Vectorized Execution



  1. Lecture #22: Advanced Database Systems, Vectorized Execution (Part II)
      @Andy_Pavlo // 15-721 // Spring 2018

  2. Topics: Bit-Slicing, Bit-Weaving, Relaxed Operator Fusion (Prashanth)
      CMU 15-721 (Spring 2018)

  4. Bitmap Encoding
      Original Data    Compressed Data (one bitmap per value of sex)
      id  sex          id  M  F
       1  M             1  1  0
       2  M             2  1  0
       3  M             3  1  0
       4  F             4  0  1
       6  M             6  1  0
       7  F             7  0  1
       8  M             8  1  0
       9  M             9  1  0
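
The equality-encoded bitmaps in the compressed table can be built in a few lines. This is a minimal sketch, not a DBMS implementation: it assumes each bitmap is a Python integer whose bit r stands for row r, and the helper names (`equality_encode`, `bitmap_to_bits`) are hypothetical.

```python
# Toy rows from the slide: (id, sex)
ROWS = [(1, "M"), (2, "M"), (3, "M"), (4, "F"), (6, "M"), (7, "F"), (8, "M"), (9, "M")]

def equality_encode(values):
    """Build one bitmap per distinct value; bit r is set iff row r holds that value."""
    bitmaps = {}
    for row, v in enumerate(values):
        bitmaps[v] = bitmaps.get(v, 0) | (1 << row)
    return bitmaps

def bitmap_to_bits(bitmap, nrows):
    """Render a bitmap as a list of 0/1 flags in row order."""
    return [(bitmap >> r) & 1 for r in range(nrows)]

sex_bitmaps = equality_encode([sex for _, sex in ROWS])
print(bitmap_to_bits(sex_bitmaps["M"], len(ROWS)))  # [1, 1, 1, 0, 1, 0, 1, 1]
print(bitmap_to_bits(sex_bitmaps["F"], len(ROWS)))  # [0, 0, 0, 1, 0, 1, 0, 0]
```

With this representation a predicate such as sex = 'F' is a single bitmap lookup, and a disjunction over several values is a bitwise OR of their bitmaps.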

  5. Bitmap Index: Encoding
      Approach #1: Equality Encoding
      → Basic scheme with one bitmap per unique value.
      Approach #2: Range Encoding
      → Use one bitmap per interval instead of one per value.
      Approach #3: Hierarchical Encoding
      → Use a tree to identify empty key ranges.
      Approach #4: Bit-Sliced Encoding
      → Use a bitmap per bit location across all values.
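
Range encoding, as described in approach #2, replaces per-value bitmaps with per-interval bitmaps. The sketch below assumes a hypothetical integer column, a fixed interval width of 10, and hypothetical helper names. It shows that intervals lying entirely below the constant can be ORed in directly, while the one interval that straddles the constant still needs a check against the raw values.

```python
# Hypothetical column of integer values (not from the slides).
VALUES = [3, 17, 25, 8, 31, 12, 40, 22]
BIN_WIDTH = 10  # hypothetical interval width: [0,10), [10,20), ...

def interval_encode(values, width):
    """One bitmap per interval [k*width, (k+1)*width) instead of one per distinct value."""
    bitmaps = {}
    for row, v in enumerate(values):
        k = v // width
        bitmaps[k] = bitmaps.get(k, 0) | (1 << row)
    return bitmaps

def rows_less_than(values, bitmaps, width, c):
    """Bitmap of rows with value < c."""
    result = 0
    for k, bm in bitmaps.items():
        if (k + 1) * width <= c:        # whole interval lies below c: take it as-is
            result |= bm
        elif k * width < c:             # interval straddles c: re-check candidates
            for row in range(len(values)):
                if (bm >> row) & 1 and values[row] < c:
                    result |= 1 << row
    return result

bitmaps = interval_encode(VALUES, BIN_WIDTH)
mask = rows_less_than(VALUES, bitmaps, BIN_WIDTH, 24)
matches = [VALUES[r] for r in range(len(VALUES)) if (mask >> r) & 1]
print(matches)  # [3, 17, 8, 12, 22]
```

The trade-off is fewer bitmaps in exchange for a recheck on the boundary interval.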

  6. Hierarchical Encoding
      Keys: 1, 3, 9, 12, 13, 14, 38, 40 (bit positions 1 to 64, grouped 4 bits per node)
      Root:    1010
      Level 2: 1011 0000 0100 0000
      Leaves:  1010 0000 1001 1100 0000 0000 0000 0000 0000 0101 0000 0000 0000 0000 0000 0000
      Source: "Hierarchical Bitmap Index: An Efficient and Scalable Indexing Technique for Set-Valued Attributes", Advances in Databases and Information Systems, 2003

  7. Hierarchical Encoding
      Keys: 1, 3, 9, 12, 13, 14, 38, 40
      Original: 8 bytes. Encoded: 4 bytes.
      Encoded nodes (all-zero nodes are not stored): 1010 | 1011 0100 | 1010 1001 1100 0101, i.e. 7 × 4 = 28 bits.
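
The hierarchical figure can be reproduced with a small sketch. It assumes 1-indexed key positions in a 64-bit universe and a fanout of 4 (the grouping drawn on the slide). Each parent bit is computed as the OR of its four children, and a node is emitted only if its parent bit is set. The function name is hypothetical, and this simple loop assumes the universe is a power of the fanout.

```python
def hierarchical_encode(keys, universe=64, fanout=4):
    """Return the non-empty nodes of a hierarchical bitmap, root first, as bit strings."""
    # Leaf level: the plain bitmap (keys are 1-indexed positions, as on the slide).
    leaves = [0] * universe
    for k in keys:
        leaves[k - 1] = 1
    levels = [leaves]
    # Each parent bit is 1 iff its group of `fanout` children contains a 1.
    while len(levels[-1]) > fanout:
        child = levels[-1]
        levels.append([int(any(child[i:i + fanout])) for i in range(0, len(child), fanout)])
    levels.reverse()  # root level first
    encoded = []
    for depth, level in enumerate(levels):
        for node in range(len(level) // fanout):
            # Keep the root, and any node whose bit in the level above is set.
            if depth == 0 or levels[depth - 1][node]:
                encoded.append("".join(str(b) for b in level[node * fanout:(node + 1) * fanout]))
    return encoded

KEYS = [1, 3, 9, 12, 13, 14, 38, 40]
nodes = hierarchical_encode(KEYS)
print(nodes)  # ['1010', '1011', '0100', '1010', '1001', '1100', '0101']
print(sum(len(n) for n in nodes), "bits")  # 28 bits, vs. 64 for the plain bitmap
```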

  14. Bit-Sliced Encoding
      Original data and its bit-slices (N? is the null flag; slices 16 to 0 hold the bits of each zipcode):
      id  zipcode  N?  16 15 14 13 12 11 10  9  8  7  6  5  4  3  2  1  0
       1  21042     0   0  0  1  0  1  0  0  1  0  0  0  1  1  0  0  1  0
       2  15217     0   0  0  0  1  1  1  0  1  1  0  1  1  1  0  0  0  1
       3  02903     0   0  0  0  0  0  1  0  1  1  0  1  0  1  0  1  1  1
       4  90220     0   1  0  1  1  0  0  0  0  0  0  1  1  0  1  1  0  0
       6  14623     0   0  0  0  1  1  1  0  0  1  0  0  0  1  1  1  1  1
       7  53703     0   0  1  1  0  1  0  0  0  1  1  1  0  0  0  1  1  1
      bin(21042) → 00101001000110010
      SELECT * FROM customer_dim WHERE zipcode < 15217
      → Walk each slice and construct a result bitmap.
      → Skip entries that have a 1 in any of the first three slices (16, 15, 14): 15217 has 0s there, so any such zipcode is at least 2^14 = 16384.
      Source: Jignesh Patel
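
One common way to walk the slices for `zipcode < 15217` is to keep two bitmaps while scanning from the most significant slice down: `lt` (rows already known to be smaller) and `eq` (rows whose bits so far match the constant). This sketch uses Python integers as row bitmaps, ignores the N? (null) column, and uses hypothetical names. It illustrates the technique; it is not the course's code.

```python
# Zipcodes from the slide, in row order for ids 1, 2, 3, 4, 6, 7.
IDS = [1, 2, 3, 4, 6, 7]
ZIPS = [21042, 15217, 2903, 90220, 14623, 53703]
NBITS = 17  # slices 16..0, as on the slide

def build_slices(values, nbits=NBITS):
    """slices[i] is a bitmap whose bit r equals bit i of values[r]."""
    slices = [0] * nbits
    for row, v in enumerate(values):
        for i in range(nbits):
            if (v >> i) & 1:
                slices[i] |= 1 << row
    return slices

def less_than(slices, c, nrows, nbits=NBITS):
    """Bitmap of rows with value < c, walking slices from most to least significant."""
    all_rows = (1 << nrows) - 1
    lt = 0          # rows already known to be < c
    eq = all_rows   # rows whose bits so far equal c's bits
    for i in reversed(range(nbits)):
        s = slices[i]
        if (c >> i) & 1:
            lt |= eq & ~s & all_rows  # c has a 1, row has a 0: row is smaller
            eq &= s                   # only rows that also have a 1 stay tied
        else:
            eq &= ~s & all_rows       # c has a 0, row has a 1: row is larger, drop it
    return lt

slices = build_slices(ZIPS)
result = less_than(slices, 15217, len(ZIPS))
print([IDS[r] for r in range(len(ZIPS)) if (result >> r) & 1])  # [3, 6]
```

After the first three slices (16, 15, 14), the rows for ids 1, 4 and 7 have already dropped out of `eq`. This is the early skip described on the slide.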

  15. Bit-Sliced Encoding
      Bit-slices can also be used for efficient aggregate computations.
      Example: SUM(attr)
      → First, count the number of 1s in slice 17 and multiply the count by 2^17.
      → Then, count the number of 1s in slice 16 and multiply the count by 2^16.
      → Repeat for the remaining slices and add up the products.
      Intel added the POPCNT (population count) instruction in 2008.
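
The SUM recipe translates directly into code. In this sketch, `popcount` stands in for the hardware POPCNT instruction (here it is Python's `bin(x).count("1")`). The slices are rebuilt from the slide's zipcodes so the block runs on its own, and the function names are hypothetical.

```python
ZIPS = [21042, 15217, 2903, 90220, 14623, 53703]
NBITS = 17

def build_slices(values, nbits=NBITS):
    """slices[i] is a bitmap whose bit r equals bit i of values[r]."""
    slices = [0] * nbits
    for row, v in enumerate(values):
        for i in range(nbits):
            if (v >> i) & 1:
                slices[i] |= 1 << row
    return slices

def popcount(x):
    """Number of 1 bits in x (software stand-in for POPCNT)."""
    return bin(x).count("1")

def bitsliced_sum(slices):
    """SUM(attr) = sum over slices i of popcount(slice i) * 2**i, highest slice first."""
    total = 0
    for i in reversed(range(len(slices))):
        total += popcount(slices[i]) << i
    return total

print(bitsliced_sum(build_slices(ZIPS)))  # 197708
```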

  17. Observation
      The bit width of compressed data does not always fit naturally into SIMD register lanes.
      → This means that the DBMS has to do extra work to transform data into the proper format.
      Just because the lanes are fully utilized does not mean the bits are fully utilized…
      [Figure: a SIMD Compare over lanes A, B, C, D produces the mask 0 1 0 1, leaving X B X D.]

  18. BitWeaving
      Alternative storage layout for columnar databases that is designed for efficient predicate evaluation on compressed data using SIMD.
      → Order-preserving dictionary encoding.
      → Bit-level parallelization.
      → Only requires common instructions (no scatter/gather).
      Implemented in Wisconsin's QuickStep engine, which became an Apache Incubator project in 2016.
      Source: "BitWeaving: Fast Scans for Main Memory Data Processing", SIGMOD 2013
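
Order-preserving dictionary encoding is what lets range predicates run directly on codes: if codes are assigned in sorted order of the values, `value < c` becomes `code < k` for a single integer k. The sketch assumes a hypothetical string column and hypothetical helper names, and uses Python's `bisect.bisect_left` to find k, which also works when c itself is not in the dictionary.

```python
import bisect

def build_dictionary(values):
    """Order-preserving dictionary: codes follow the sorted order of the distinct values."""
    distinct = sorted(set(values))
    return distinct, {v: code for code, v in enumerate(distinct)}

def code_bound_less_than(distinct, c):
    """Rewrite `value < c` as `code < k`, where k = number of dictionary values below c."""
    return bisect.bisect_left(distinct, c)

values = ["pear", "apple", "fig", "apple", "kiwi"]  # hypothetical column
distinct, to_code = build_dictionary(values)
codes = [to_code[v] for v in values]
bound = code_bound_less_than(distinct, "grape")
matches = [v for v, code in zip(values, codes) if code < bound]
print(matches)  # ['apple', 'fig', 'apple']
```

The scan itself only compares small integers, which is what makes the packed, bit-parallel layouts on the following slides possible.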

  20. BitWeaving Storage Layouts
      Approach #1: Horizontal
      → Row-oriented storage at the bit level.
      Approach #2: Vertical
      → Column-oriented storage at the bit level.

  21. Horizontal Storage
      3-bit codes:
      Segment #1: t0=1 (001), t1=5 (101), t2=6 (110), t3=1 (001), t4=6 (110), t5=4 (100), t6=0 (000), t7=7 (111)
      Segment #2: t8=4 (100), t9=3 (011)

  24. Horizontal Storage
      Each processor word v holds two fields; each field is a delimiter bit (0) followed by a 3-bit code.
      Segment #1:
        v0 = 0 001 0 110   (t0, t4)
        v1 = 0 101 0 100   (t1, t5)
        v2 = 0 110 0 000   (t2, t6)
        v3 = 0 001 0 111   (t3, t7)
      Segment #2:
        v4 = 0 100 0 011   (t8, t9)
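
The horizontal layout on slides 21 and 24 can be reproduced with a short sketch. It assumes 3-bit codes, 8-bit toy processor words (two 4-bit fields each), and 4 words per segment, as drawn on the slides. The column-first placement (word j of a segment holds codes j and j + 4) is read off the figure, and the handling of a short final segment is an assumption that happens to match v4. The function name is hypothetical.

```python
def weave_horizontal(codes, code_bits=3, word_bits=8, words_per_segment=4):
    """Pack codes into words, one delimiter bit per field, column-first within a segment."""
    field = code_bits + 1                      # delimiter bit + code bits
    per_word = word_bits // field              # codes per word (2 here)
    seg_len = per_word * words_per_segment     # codes per segment (8 here)
    words = []
    for start in range(0, len(codes), seg_len):
        seg = codes[start:start + seg_len]
        n_words = -(-len(seg) // per_word)     # ceil: a short last segment uses fewer words
        for j in range(n_words):
            word = 0
            for slot in range(per_word):
                idx = j + slot * n_words       # word j holds codes j, j + n_words, ...
                code = seg[idx] if idx < len(seg) else 0
                word = (word << field) | code  # delimiter bit is left as 0
            words.append(word)
    return words

T = [1, 5, 6, 1, 6, 4, 0, 7, 4, 3]  # t0..t9 from the slide
print([format(w, "08b") for w in weave_horizontal(T)])
# ['00010110', '01010100', '01100000', '00010111', '01000011']
```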

  27. BitWeaving/H Example
      SELECT * FROM table WHERE val < 5
      X    = 0 001 0 110   (t0, t4)
      Y    = 0 101 0 101   (5, 5)
      mask = 0 111 0 111
      (Y + (X ⊕ mask)) ∧ ¬mask = 1 000 0 000
      Selection vector: t0 = 1 (1 < 5), t4 = 0 (6 ≥ 5)
      Source: Jignesh Patel
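
The example's formula works because, within each 4-bit field, X ⊕ mask equals 7 - x, so Y + (X ⊕ mask) = 5 + 7 - x carries into the delimiter bit exactly when x < 5, and the field sum never exceeds 14, so no carry leaks into the neighbouring field. A minimal sketch of that word-level comparison, assuming 3-bit codes, a constant that is itself a valid code (0 to 7), and hypothetical helper names:

```python
CODE_BITS = 3            # k bits per dictionary code, as on the slide
FIELD = CODE_BITS + 1    # plus one delimiter bit

def pack(codes):
    """Pack codes into one word, first code in the most significant field; delimiters are 0."""
    word = 0
    for c in codes:
        word = (word << FIELD) | c
    return word

def less_than_word(x, const, nfields):
    """Delimiter bit of field i is 1 iff code i of x < const."""
    y = pack([const] * nfields)                    # Y: the constant in every field
    mask = pack([(1 << CODE_BITS) - 1] * nfields)  # 0111 0111 ...
    full = (1 << (FIELD * nfields)) - 1
    # Per field: const + (x XOR 0111) = const + (7 - x), which reaches 8 (sets the
    # delimiter) exactly when x < const and is at most 14, so fields stay independent.
    return (y + (x ^ mask)) & ~mask & full

def selection(result, nfields):
    """Read the delimiter bits back out as one flag per code, in packing order."""
    return [(result >> (FIELD * (nfields - 1 - i) + CODE_BITS)) & 1 for i in range(nfields)]

X = pack([1, 6])  # t0 = 1, t4 = 6
r = less_than_word(X, 5, 2)
print(format(r, "08b"), selection(r, 2))  # 10000000 [1, 0]
```

A whole word is compared with a handful of integer operations (XOR, ADD, AND), which is the "common instructions only" property listed for BitWeaving.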
