
Efficient Lightweight Compression Alongside Fast Scans



  1. Efficient Lightweight Compression Alongside Fast Scans
     Orestis Polychroniou, Kenneth A. Ross
     DaMoN 2015, Melbourne, Victoria, Australia

  2. Databases & Compression
     ❖ Process data on disk
       ❖ Nearly unlimited capacity
       ❖ Affects query optimization
         ❖ Minimize # of blocks fetched
         ❖ Minimize # of random block accesses
       ❖ Compress to improve disk speed
         ❖ Focused on compression rate since disks are “slow”
     [Figure: bar chart of read throughput (GB/s, axis 0 to 0.6) for 5400 RPM HDD, 7200 RPM HDD, and SSD (2014-2015).]

  3. Databases & Compression
     ❖ Process data on disk
       ❖ Nearly unlimited capacity
       ❖ Affects query optimization
         ❖ Minimize # of blocks fetched
         ❖ Minimize # of random block accesses
       ❖ Compress to improve disk speed
         ❖ Focused on compression rate since disks are “slow”
     ❖ Process data on RAM
       ❖ Always limited capacity
       ❖ Affects query optimization & query execution
         ❖ Minimize # of accesses (e.g. column stores & late materialization)
         ❖ Minimize # of random (out of CPU cache) accesses (e.g. partitioned join)
       ❖ Compress to improve RAM speed & avoid disk
         ❖ Focused on (de-) compression efficiency as RAM is “fast”
     [Figure: bar chart of read throughput (GB/s, axis 0 to 60) for 2-channel DDR3, 4-channel DDR3, and 4-channel DDR4.]

  4. Lightweight Compression
     ❖ Compression schemes
       ❖ Entropy compression
         ❖ Group nearby similar values
         ❖ e.g. run-length-encoding, frame-of-reference
     [Figure: frame-of-reference example. The 8 values 15, 17, 21, 14, 19, 14, 20, 17 take 8 * 32 = 256 bits. They are stored as min = 14 and max = 21 plus the offsets 1, 3, 7, 0, 5, 0, 6, 3, with b = log (max-min+1) = 3 bits per code, for 2 * 32 + 8 * b = 88 bits.]
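The frame-of-reference arithmetic on this slide can be checked with a minimal Python sketch. The function names (`for_encode`, `for_decode`, `for_size_bits`) are hypothetical, and the 32-bit header width is taken from the slide's `2 * 32` term; this is an illustration of the idea, not the authors' implementation.

```python
def for_encode(values):
    """Frame-of-reference: keep min and max once, store each value as an offset from min."""
    lo, hi = min(values), max(values)
    # (hi - lo).bit_length() equals ceil(log2(hi - lo + 1)); use at least 1 bit
    b = max(1, (hi - lo).bit_length())
    offsets = [v - lo for v in values]
    return lo, hi, b, offsets


def for_decode(lo, offsets):
    """Rebuild the original values by adding the reference back."""
    return [lo + off for off in offsets]


def for_size_bits(n, b, width=32):
    """Two full-width header values (min, max) plus n codes of b bits each."""
    return 2 * width + n * b
```

With the slide's eight values, the offsets fit in 3 bits each and the total drops from 256 to 88 bits.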

  5. Lightweight Compression
     ❖ Compression schemes
       ❖ Entropy compression
         ❖ Group nearby similar values
         ❖ e.g. run-length-encoding, frame-of-reference
       ❖ Symbol compression
         ❖ Assign a symbol to each distinct value
         ❖ e.g. dictionary compression
     [Figure: dictionary compression example. Original data A, C, A, B, A, D, C, B (n*W bits) maps through the dictionary A = 0, B = 1, C = 2, D = 3 to the compressed data 0, 2, 0, 1, 0, 3, 2, 1 (n*b bits), for a dictionary with D distinct values (b = log D).]
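A small Python sketch of dictionary compression, using the slide's example column. The names `dict_encode` and `dict_decode` are hypothetical, and sorting the dictionary is an assumption made here so that the codes match the figure (A = 0, B = 1, C = 2, D = 3).

```python
def dict_encode(values):
    """Map each distinct value to a small integer code."""
    dictionary = sorted(set(values))            # assumption: sorted dictionary
    code_of = {v: i for i, v in enumerate(dictionary)}
    codes = [code_of[v] for v in values]
    # bits per code: ceil(log2(D)) for D distinct values, at least 1
    b = max(1, (len(dictionary) - 1).bit_length())
    return dictionary, codes, b


def dict_decode(dictionary, codes):
    """Look each code up in the dictionary to recover the original values."""
    return [dictionary[c] for c in codes]
```

For the eight symbols on the slide, four distinct values give b = 2 bits per code.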

  6. Lightweight Compression
     ❖ Compression schemes
       ❖ Entropy compression
         ❖ Group nearby similar values
         ❖ e.g. run-length-encoding, frame-of-reference
       ❖ Symbol compression
         ❖ Assign a symbol to each distinct value
         ❖ e.g. dictionary compression
       ❖ Frequency (symbol) compression
         ❖ Compress frequent symbols with fewer bits
         ❖ e.g. Huffman coding (slow), multiple dictionaries (fast)

  7. Lightweight Compression
     ❖ Compression schemes
       ❖ Entropy compression
         ❖ Group nearby similar values
         ❖ e.g. run-length-encoding, frame-of-reference
       ❖ Symbol compression
         ❖ Assign a symbol to each distinct value
         ❖ e.g. dictionary compression
       ❖ Frequency (symbol) compression
         ❖ Compress frequent symbols with fewer bits
         ❖ e.g. Huffman coding (slow), multiple dictionaries (fast)
     ❖ DBMS integration
       ❖ Decompress during execution
         ❖ In CPU cache (non-integrated) or in registers (integrated)

  8. Lightweight Compression
     ❖ Compression schemes
       ❖ Entropy compression
         ❖ Group nearby similar values
         ❖ e.g. run-length-encoding, frame-of-reference
       ❖ Symbol compression
         ❖ Assign a symbol to each distinct value
         ❖ e.g. dictionary compression
       ❖ Frequency (symbol) compression
         ❖ Compress frequent symbols with fewer bits
         ❖ e.g. Huffman coding (slow), multiple dictionaries (fast)
     ❖ DBMS integration
       ❖ Decompress during execution
         ❖ In CPU cache (non-integrated) or in registers (integrated)
       ❖ Process compressed data without decompressing

  9. Bit Packing
     ❖ Definition
       ❖ Input code width is hardware-supported
         ❖ 8-bit, 16-bit, 32-bit, 64-bit
       ❖ Output code width b must be (almost) constant
         ❖ Either constant across the entire input
         ❖ Or constant for the next group of items (e.g. frame-of-reference)
     [Figure: original data A, C, A, B, A, D, C, B → dictionary mapping (not materialized) → codes 0, 2, 0, 1, 0, 3, 2, 1 → bit packing.]

  10. Bit Packing
      ❖ Layouts
        ❖ Horizontal bit packing
          ❖ Bits per code are contiguous
      [Figure: eight 5-bit codes, each shown in an 8-bit slot:
       00010 000  10100 000  11000 000  11111 000  01010 000  11001 000  10001 000  00100 000
       Packed horizontally:
       00010101 00110001 11110101 01100110 00100100]
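The horizontal layout can be reproduced with a short Python sketch. It assumes the slide writes bits from least to most significant (a later slide labels the same bytes "LSB MSB"), so the code `00010` is the integer 8. The helper names are hypothetical, and the arbitrary-precision integer is only a convenience for illustration, not how a production packer would work.

```python
def pack_horizontal(codes, b):
    """Place b-bit codes back to back; code i occupies bits [i*b, (i+1)*b)."""
    packed = 0
    for i, code in enumerate(codes):
        if not 0 <= code < (1 << b):
            raise ValueError("code does not fit in b bits")
        packed |= code << (i * b)
    nbytes = (len(codes) * b + 7) // 8
    return packed.to_bytes(nbytes, "little")


def unpack_horizontal(data, b, n):
    """Extract n codes of b bits each from the packed bytes."""
    packed = int.from_bytes(data, "little")
    mask = (1 << b) - 1
    return [(packed >> (i * b)) & mask for i in range(n)]


def lsb_first(byte):
    """Render a byte the way the slide does: least significant bit on the left."""
    return format(byte, "08b")[::-1]
```

Packing the eight codes 8, 5, 3, 31, 10, 19, 17, 4 with b = 5 yields five bytes whose LSB-first rendering matches the packed row in the figure.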

  11. Bit Packing
      ❖ Layouts
        ❖ Horizontal bit packing
          ❖ Bits per code are contiguous
        ❖ Vertical bit packing
          ❖ Bits of codes are interleaved
      [Figure: the same eight 5-bit codes:
       00010 000  10100 000  11000 000  11111 000  01010 000  11001 000  10001 000  00100 000
       Packed vertically with b = 5, k = 4:
       0111 0011 0101 1001 0001 0110 1100 0001 1000 0110]

  12. Bit Packing
      ❖ Layouts
        ❖ Horizontal bit packing
          ❖ Bits per code are contiguous
        ❖ Vertical bit packing
          ❖ Bits of codes are interleaved
      [Figure: the same eight 5-bit codes:
       00010 000  10100 000  11000 000  11111 000  01010 000  11001 000  10001 000  00100 000
       Packed vertically with b = 5, k = 4:
       0111 0011 0101 1001 0001 0110 1100 0001 1000 0110
       Packed vertically with b = 5, k = 8:
       01110110 00111100 01010001 10011000 00010110]
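The vertical layout in the figure can be reproduced as follows: for every group of k codes, word j collects bit j of each code in the group. This Python sketch uses hypothetical names and assumes the slide's LSB-first notation; it processes one bit at a time for clarity and is not meant to reflect a fast implementation.

```python
def pack_vertical(codes, b, k):
    """For each group of k codes emit b words; word j holds bit j of every code."""
    words = []
    for start in range(0, len(codes), k):
        group = codes[start:start + k]
        for j in range(b):
            word = 0
            for i, code in enumerate(group):
                word |= ((code >> j) & 1) << i
            words.append(word)
    return words


def unpack_vertical(words, b, k, n):
    """Inverse of pack_vertical for n codes."""
    codes = []
    for idx in range(n):
        g, i = divmod(idx, k)          # group number and position inside the group
        code = 0
        for j in range(b):
            code |= ((words[g * b + j] >> i) & 1) << j
        codes.append(code)
    return codes


def lsb_first(word, k):
    """Render a k-bit word with the least significant bit on the left."""
    return format(word, "0{}b".format(k))[::-1]
```

With k = 4 the eight codes form two groups of five words each; with k = 8 they form a single group, matching the two rows in the figure.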

  13. Outline
      ❖ Operations
        ❖ Packing
        ❖ Unpacking
        ❖ Scanning

  14. Outline
      ❖ Operations
        ❖ Packing
        ❖ Unpacking
        ❖ Scanning
      ❖ Horizontal layouts
        ❖ Fully packed
          ❖ Fast unpacking & scanning
        ❖ Word aligned
          ❖ Faster scanning

  15. Outline
      ❖ Operations
        ❖ Packing
        ❖ Unpacking
        ❖ Scanning
      ❖ Horizontal layouts
        ❖ Fully packed
          ❖ Fast unpacking & scanning
        ❖ Word aligned
          ❖ Faster scanning
      ❖ Vertical layout
        ❖ Known traits
          ❖ Fastest scanning
        ❖ New traits
          ❖ Fast packing & unpacking

  16. Horizontal Layout
      ❖ Fully packed
        ❖ No space wasted
        ❖ Codes can span across 2 packed words

  17. Horizontal Layout
      ❖ Fully packed
        ❖ No space wasted
        ❖ Codes can span across 2 packed words
      ❖ Packing
        ❖ Process 1 unpacked code per iteration
        ❖ Branch to store output packed word
      ❖ Unpacking
        ❖ Process 1 output code per iteration
        ❖ Branch to load input packed word
      [Figure: Pack and Unpack throughput (GB/s, axis 0 to 6) against number of bits (1 to 31).]
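The scalar loops described on this slide can be sketched in Python as below. The 32-bit packed word width `W` and the function names are assumptions for illustration; the point is the shape of the loop, with one code handled per iteration and a branch that fires whenever a packed word has to be stored or loaded.

```python
W = 32  # assumed packed word width in bits


def pack_scalar(codes, b):
    """Pack one code per iteration; branch when an output word is complete."""
    words, acc, filled = [], 0, 0
    for code in codes:
        acc |= code << filled
        filled += b
        if filled >= W:                      # branch: store the finished packed word
            words.append(acc & ((1 << W) - 1))
            acc >>= W                        # keep the bits that spilled into the next word
            filled -= W
    if filled:
        words.append(acc)                    # flush the last, partially filled word
    return words


def unpack_scalar(words, b, n):
    """Produce one code per iteration; branch when more input bits are needed."""
    mask = (1 << b) - 1
    out, acc, avail = [], 0, 0
    it = iter(words)
    for _ in range(n):
        if avail < b:                        # branch: load the next packed word
            acc |= next(it) << avail
            avail += W
        out.append(acc & mask)
        acc >>= b
        avail -= b
    return out
```

The spill handling in both loops is what allows a code to span two packed words, as the slide notes.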

  18. Horizontal Layout
      ❖ Fully packed
        ❖ No space wasted
        ❖ Codes can span across 2 packed words
      ❖ Packing
        ❖ Process 1 unpacked code per iteration
        ❖ Branch to store output packed word
      ❖ Unpacking
        ❖ Process 1 output code per iteration
        ❖ Branch to load input packed word
        ❖ Can be written in SIMD!
      [Figure: SIMD unpacking of 5-bit codes, bits shown LSB to MSB.
       packed input:    00010101 00110001 11110101 01100110
       8-bit → 4-bit:   0001 0101 0011 0001 1111 0101 0110 0110
       shuffle:         0001 0101 0101 0011 0011 0001 0001 1111
       4-bit → 8-bit:   00010101 01010011 00110001 00011111
       shift (<<):      00010101 1010011 0 110001 00 11111 000
       mask (&):        00010 000 10100 000 11000 000 11111 000
       Based on paper by T. Willhalm et al. @ VLDB 2009 (& improved using latest SIMD ISA).]
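The shuffle, shift, and mask steps in the figure can be emulated lane by lane in plain Python. This is a scalar stand-in, not SIMD code and not the authors' kernel: it gathers whole bytes rather than the 4-bit pieces shown on the slide, it assumes b <= 9 so that every code fits in a 2-byte window, and the name `unpack_lanes` is hypothetical. Because the slide draws bits LSB first, its `<<` appears here as a numeric right shift.

```python
def unpack_lanes(data, b, lanes):
    """Emulate per-lane shuffle / shift / mask unpacking (valid for b <= 9)."""
    if b > 9:
        raise ValueError("a 2-byte window only covers codes up to 9 bits")
    padded = bytes(data) + b"\x00"           # lets the last window read one byte past the data
    mask = (1 << b) - 1
    out = []
    for lane in range(lanes):
        bit = lane * b                       # position of this lane's code in the packed stream
        first = bit // 8
        window = padded[first] | (padded[first + 1] << 8)   # "shuffle": gather the bytes holding the code
        aligned = window >> (bit % 8)                        # "shift": a different amount per lane
        out.append(aligned & mask)                           # "mask": drop bits of neighbouring codes
    return out
```

In a real SIMD version all lanes would run these three steps at once, which is what removes the per-code branch of the scalar loop.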

  19. Horizontal Layout
      ❖ Fully packed
        ❖ No space wasted
        ❖ Codes can span across 2 packed words
      ❖ Packing
        ❖ Process 1 unpacked code per iteration
        ❖ Branch to store output packed word
      ❖ Unpacking
        ❖ Process 1 output code per iteration
        ❖ Branch to load input packed word
        ❖ Can be written in SIMD!
      [Figure: Scalar and SIMD unpacking throughput (GB/s, axis 0 to 60) against number of bits (1 to 31), annotated "up to 7X improvement from SIMD".]

  20. Horizontal Layout
      ❖ Fully packed
        ❖ No space wasted
        ❖ Codes can span across 2 packed words
      ❖ Packing
        ❖ Process 1 unpacked code per iteration
        ❖ Branch to store output packed word
      ❖ Unpacking
        ❖ Process 1 output code per iteration
        ❖ Branch to load input packed word
        ❖ Can be written in SIMD!
      ❖ Scanning
        ❖ Unpack the codes in CPU registers
        ❖ Evaluate selective predicates and append to bitmap
        ❖ Must unpack first thus bounded by O(n)

  21. Horizontal Layout
      ❖ Fully packed
        ❖ No space wasted
        ❖ Codes can span across 2 packed words
      ❖ Packing
        ❖ Process 1 unpacked code per iteration
        ❖ Branch to store output packed word
      ❖ Unpacking
        ❖ Process 1 output code per iteration
        ❖ Branch to load input packed word
        ❖ Can be written in SIMD!
      ❖ Scanning
        ❖ Unpack the codes in CPU registers
        ❖ Evaluate selective predicates and append to bitmap
        ❖ Must unpack first thus bounded by O(n)
        ❖ Can also be written in SIMD via SIMD unpacking
      [Figure: scan for "select … where column < C …".
       packed:          00010101 00110001 11110101 01100110
       unpacked:        00010 000 10100 000 11000 000 11111 000
       compare with C:  01100 000 01100 000 01100 000 01100 000
       result:          0000000 0 1111111 1 1111111 1 0000000 0
       extract:         0110]
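The scan in the figure (unpack, compare with C, extract one bit per code) can be sketched in Python as below. The function name is hypothetical, the predicate is fixed to `column < C` as on the slide, and the constant `01100` is read LSB first, giving C = 6.

```python
def scan_less_than(data, b, n, c):
    """Unpack each b-bit code and append the result of (code < c) to a bitmap."""
    packed = int.from_bytes(data, "little")
    mask = (1 << b) - 1
    bitmap = 0
    for i in range(n):
        code = (packed >> (i * b)) & mask    # unpack code i
        bitmap |= int(code < c) << i         # bit i of the bitmap is the predicate result
    return bitmap
```

Every code is unpacked before it is compared, which is why the slide calls this scan bounded by O(n).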

  22. Horizontal Layout
      ❖ Fully packed
        ❖ No space wasted
        ❖ Codes can span across 2 packed words
      ❖ Packing
        ❖ Process 1 unpacked code per iteration
        ❖ Branch to store output packed word
      ❖ Unpacking
        ❖ Process 1 output code per iteration
        ❖ Branch to load input packed word
        ❖ Can be written in SIMD!
      ❖ Scanning
        ❖ Unpack the codes in CPU registers
        ❖ Evaluate selective predicates and append to bitmap
        ❖ Must unpack first thus bounded by O(n)
        ❖ Can also be written in SIMD via SIMD unpacking
      [Figure: throughput (GB/s, axis 0 to 60) against number of bits (1 to 31) for Pack (scalar), Unpack (SIMD), and Scan (SIMD) with the predicate C1 <= column <= C2; the scan is annotated "slower than unpacking".]

  23. Horizontal Layout
      ❖ Word aligned
        ❖ Waste space to get alignment
        ❖ Pack b’ = w / (b+1) codes per processor word
        ❖ Extra bit per word used for scanning
      [Figure: the codes 01 10 11 00 fully packed, versus word aligned as 01 0 10 0 00 and 11 0 00 0 00, marking the unused extra bit per code and the unused high order bits per word.]
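The word-aligned layout in the figure can be reproduced with the Python sketch below. It assumes that w / (b+1) means integer division (the figure fits 2 codes of b = 2 bits into an 8-bit word), that the extra bit sits above each code and is left at zero, and that bits are drawn LSB first. The function names are hypothetical, and how the extra bit is used during scanning is not covered here.

```python
def pack_word_aligned(codes, b, w):
    """Store floor(w / (b + 1)) codes per word, each followed by one spare bit."""
    per_word = w // (b + 1)
    words = []
    for start in range(0, len(codes), per_word):
        word = 0
        for i, code in enumerate(codes[start:start + per_word]):
            word |= code << (i * (b + 1))    # slot i spans b code bits plus 1 extra bit
        words.append(word)
    return words


def unpack_word_aligned(words, b, w, n):
    """Inverse of pack_word_aligned for n codes; no code ever spans two words."""
    per_word = w // (b + 1)
    mask = (1 << b) - 1
    return [(words[i // per_word] >> ((i % per_word) * (b + 1))) & mask for i in range(n)]
```

Compared with the fully packed layout, this trades space for the guarantee that every code lies entirely inside one processor word.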
