Efficient Lightweight Compression Alongside Fast Scans
Orestis Polychroniou, Kenneth A. Ross
DaMoN 2015, Melbourne, Victoria, Australia

Databases & Compression
❖ Process data on disk
    ❖ Nearly unlimited capacity
    ❖ Affects query optimization
        ❖ Minimize # of blocks fetched
        ❖ Minimize # of random block accesses
    ❖ Compress to improve disk speed
        ❖ Focused on compression rate since disks are “slow”
[Figure: bar chart of read throughput (GB/s, axis 0 to 0.6) for HDD 5400 RPM, HDD 7200 RPM, and SSD (2014-2015)]

Databases & Compression
❖ Process data on disk
    ❖ Nearly unlimited capacity
    ❖ Affects query optimization
        ❖ Minimize # of blocks fetched
        ❖ Minimize # of random block accesses
    ❖ Compress to improve disk speed
        ❖ Focused on compression rate since disks are “slow”
❖ Process data on RAM
    ❖ Always limited capacity
    ❖ Affects query optimization & query execution
        ❖ Minimize # of accesses (e.g. column stores & late materialization)
        ❖ Minimize # of random (out of CPU cache) accesses (e.g. partitioned join)
    ❖ Compress to improve RAM speed & avoid disk
        ❖ Focused on (de-)compression efficiency as RAM is “fast”
[Figure: bar chart of read throughput (GB/s, axis 0 to 60) for 2-channel DDR3, 4-channel DDR3, and 4-channel DDR4]

Lightweight Compression
❖ Compression schemes
    ❖ Entropy compression
        ❖ Group nearby similar values
        ❖ e.g. run-length-encoding, frame-of-reference
Frame-of-reference example (eight 32-bit values):
    values:      15 17 21 14 19 14 20 17    (min = 14, max = 21)
    codes:        1  3  7  0  5  0  6  3    (value - min)
    b = log2(max - min + 1) = 3 bits per code
    original:    8 * 32 = 256 bits
    compressed:  2 * 32 + 8 * b = 88 bits   (min and max kept as two 32-bit values)
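
The frame-of-reference arithmetic on this slide can be checked with a minimal Python sketch. The function names `for_encode` and `for_decode` are hypothetical, and storing both min and max as 32-bit header values is taken from the slide's "2 * 32" term rather than from any stated file format.

```python
def for_encode(values, word_bits=32):
    """Frame-of-reference: store each value as an offset from the block minimum."""
    lo, hi = min(values), max(values)
    # (hi - lo).bit_length() equals ceil(log2(hi - lo + 1)) for hi > lo;
    # use at least 1 bit so a constant block still has a code width.
    b = max(1, (hi - lo).bit_length())
    codes = [v - lo for v in values]
    # Two header words (min and max) plus b bits per code, as on the slide.
    total_bits = 2 * word_bits + len(values) * b
    return lo, hi, b, codes, total_bits


def for_decode(lo, codes):
    """Invert for_encode by adding the block minimum back."""
    return [lo + c for c in codes]


values = [15, 17, 21, 14, 19, 14, 20, 17]   # the slide's example block
lo, hi, b, codes, total_bits = for_encode(values)
print(lo, hi, b, codes, total_bits)
```

For the slide's block this gives 3 bits per code and 88 bits in total, against 256 bits uncompressed.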

Lightweight Compression
❖ Compression schemes
    ❖ Entropy compression
        ❖ Group nearby similar values
        ❖ e.g. run-length-encoding, frame-of-reference
    ❖ Symbol compression
        ❖ Assign a symbol to each distinct value
        ❖ e.g. dictionary compression
Dictionary compression example:
    original data:    A C A B A D C B    (n * W bits)
    dictionary:       A B C D            (D distinct values)
    compressed data:  0 2 0 1 0 3 2 1    (n * b bits, b = log D)
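
The dictionary example above can be reproduced with a short Python sketch. The name `dict_encode` is hypothetical, and building the dictionary in sorted order is an assumption; the slide only shows that A, B, C, D map to 0, 1, 2, 3.

```python
def dict_encode(values):
    """Dictionary compression: replace each value by its index in the dictionary."""
    dictionary = sorted(set(values))            # assumption: sorted dictionary
    index = {v: i for i, v in enumerate(dictionary)}
    codes = [index[v] for v in values]
    # b = ceil(log2(D)) bits per code, with at least 1 bit.
    b = max(1, (len(dictionary) - 1).bit_length())
    return dictionary, codes, b


data = list("ACABADCB")                         # the slide's original data
dictionary, dict_codes, code_bits = dict_encode(data)
print(dictionary, dict_codes, code_bits)
```

With D = 4 distinct values each code needs 2 bits, so the column shrinks from n * W bits to n * 2 bits plus the dictionary itself.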

Lightweight Compression
❖ Compression schemes
    ❖ Entropy compression
        ❖ Group nearby similar values
        ❖ e.g. run-length-encoding, frame-of-reference
    ❖ Symbol compression
        ❖ Assign a symbol to each distinct value
        ❖ e.g. dictionary compression
    ❖ Frequency (symbol) compression
        ❖ Compress frequent symbols with fewer bits
        ❖ e.g. Huffman coding (slow), multiple dictionaries (fast)
❖ DBMS integration
    ❖ Decompress during execution
        ❖ In CPU cache (non-integrated) or in registers (integrated)
    ❖ Process compressed data without decompressing

Bit Packing
❖ Definition
    ❖ Input code width is hardware-supported
        ❖ 8-bit, 16-bit, 32-bit, 64-bit
    ❖ Output code width b must be (almost) constant
        ❖ Either constant across the entire input
        ❖ Or constant for the next group of items (e.g. frame-of-reference)
[Figure: original data (A C A B A D C B) -> dictionary mapping (not materialized) -> codes (0 2 0 1 0 3 2 1) -> bit packing]

Bit Packing
❖ Layouts
    ❖ Horizontal bit packing
        ❖ Bits per code are contiguous
Example (b = 5, eight codes, each shown in an 8-bit slot):
    unpacked:    00010 000  10100 000  11000 000  11111 000  01010 000  11001 000  10001 000  00100 000
    horizontal:  00010101 00110001 11110101 01100110 00100100

Bit Packing
❖ Layouts
    ❖ Horizontal bit packing
        ❖ Bits per code are contiguous
    ❖ Vertical bit packing
        ❖ Bits of codes are interleaved
Example (b = 5, eight codes, each shown in an 8-bit slot):
    unpacked:          00010 000  10100 000  11000 000  11111 000  01010 000  11001 000  10001 000  00100 000
    vertical (k = 4):  0111 0011 0101 1001 0001  0110 1100 0001 1000 0110
    vertical (k = 8):  01110110 00111100 01010001 10011000 00010110
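
Both layouts can be illustrated with a small Python sketch that works directly on bit strings written in the order the slides draw them. The function names are hypothetical, and using strings instead of machine words is a simplification for readability, not how a real implementation would store the data.

```python
def pack_horizontal(codes, word_bits=8):
    """Horizontal layout: the bits of each code stay contiguous in the stream."""
    stream = "".join(codes)
    return [stream[i:i + word_bits] for i in range(0, len(stream), word_bits)]


def pack_vertical(codes, k):
    """Vertical layout: word j of a group holds bit j of each of its k codes."""
    b = len(codes[0])
    words = []
    for start in range(0, len(codes), k):
        group = codes[start:start + k]
        for bit in range(b):
            words.append("".join(code[bit] for code in group))
    return words


# The eight 5-bit codes from the slides, written as drawn.
slide_codes = ["00010", "10100", "11000", "11111",
               "01010", "11001", "10001", "00100"]
horizontal = pack_horizontal(slide_codes)
vertical4 = pack_vertical(slide_codes, 4)
vertical8 = pack_vertical(slide_codes, 8)
print(horizontal, vertical4, vertical8, sep="\n")
```

The three results match the packed rows shown on the layout slides: five 8-bit words for the horizontal layout, ten 4-bit words for k = 4, and five 8-bit words for k = 8.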

Outline
❖ Operations
    ❖ Packing
    ❖ Unpacking
    ❖ Scanning
❖ Horizontal layouts
    ❖ Fully packed
        ❖ Fast unpacking & scanning
    ❖ Word aligned
        ❖ Faster scanning
❖ Vertical layout
    ❖ Known traits
        ❖ Fastest scanning
    ❖ New traits
        ❖ Fast packing & unpacking

Horizontal Layout
❖ Fully packed
    ❖ No space wasted
    ❖ Codes can span across 2 packed words
❖ Packing
    ❖ Process 1 unpacked code per iteration
    ❖ Branch to store output packed word
❖ Unpacking
    ❖ Process 1 output code per iteration
    ❖ Branch to load input packed word
[Figure: throughput (GB/s, axis 0 to 6) versus number of bits (1 to 31); series: Pack, Unpack]
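
The scalar packing and unpacking loops described above can be sketched in Python as follows. This is an illustration under stated assumptions: 32-bit packed words, codes placed starting at the least significant bit, and hypothetical names `pack` and `unpack`. The slides do not show the authors' code.

```python
WORD_BITS = 32
WORD_MASK = (1 << WORD_BITS) - 1


def pack(codes, b):
    """Fully packed horizontal layout, one unpacked code per iteration."""
    words, acc, used = [], 0, 0
    for code in codes:
        acc |= code << used              # append the code above the bits already used
        used += b
        if used >= WORD_BITS:            # branch: a packed word is complete, store it
            words.append(acc & WORD_MASK)
            acc >>= WORD_BITS            # keep the part of the code that spilled over
            used -= WORD_BITS
    if used:
        words.append(acc & WORD_MASK)    # flush the last, partially filled word
    return words


def unpack(words, b, n):
    """Inverse of pack, one output code per iteration."""
    codes, acc, avail, pos = [], 0, 0, 0
    mask = (1 << b) - 1
    for _ in range(n):
        if avail < b:                    # branch: not enough bits left, load next word
            acc |= words[pos] << avail
            pos += 1
            avail += WORD_BITS
        codes.append(acc & mask)
        acc >>= b
        avail -= b
    return codes


sample = [i % 128 for i in range(100)]   # hypothetical input: 100 codes of 7 bits
packed_words = pack(sample, 7)
print(len(packed_words), unpack(packed_words, 7, len(sample)) == sample)
```

The data-dependent branch in each loop is the cost the slide points at: it fires at irregular intervals whenever b does not divide the word width.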

Horizontal Layout
❖ Fully packed
    ❖ No space wasted
    ❖ Codes can span across 2 packed words
❖ Packing
    ❖ Process 1 unpacked code per iteration
    ❖ Branch to store output packed word
❖ Unpacking
    ❖ Process 1 output code per iteration
    ❖ Branch to load input packed word
    ❖ Can be written in SIMD!
SIMD unpacking example (b = 5, bits drawn LSB to MSB):
    packed:           00010101 00110001 11110101 01100110
    8-bit -> 4-bit:   0001 0101 0011 0001 1111 0101 0110 0110
    shuffle:          0001 0101  0101 0011  0011 0001  0001 1111
    4-bit -> 8-bit:   00010101 01010011 00110001 00011111
    shift (<<):       00010101 1010011 0 110001 00 11111 000
    mask (&):         00010 000 10100 000 11000 000 11111 000
Based on the paper by T. Willhalm et al. @ VLDB 2009 (& improved using the latest SIMD ISA)
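
The shuffle, shift, and mask steps can be emulated lane by lane in plain Python. This is only a sketch of the idea, not the SIMD code from the paper: the names `pack_bytes` and `simd_like_unpack` are hypothetical, each lane gathers a 4-byte window, and the code width is limited to b <= 25 so that a code always fits in that window.

```python
def pack_bytes(codes, b):
    """Reference packer: code i occupies bits [i*b, (i+1)*b) of a little-endian stream."""
    acc = 0
    for i, code in enumerate(codes):
        acc |= code << (i * b)
    return acc.to_bytes((len(codes) * b + 7) // 8, "little")


def simd_like_unpack(packed, b, n, lanes=4):
    """Emulate shuffle / shift / mask on groups of `lanes` codes at a time."""
    assert 1 <= b <= 25                  # assumption: code fits in a 4-byte window
    mask = (1 << b) - 1
    padded = packed + bytes(4)           # padding so the last gather stays in range
    out = []
    for base in range(0, n, lanes):
        idx = range(base, min(base + lanes, n))
        # shuffle: each lane gathers the 4 bytes that contain its code
        gathered = [int.from_bytes(padded[(i * b) // 8:(i * b) // 8 + 4], "little")
                    for i in idx]
        # shift: each lane drops the bits that precede its code (a per-lane shift)
        shifted = [g >> ((i * b) % 8) for g, i in zip(gathered, idx)]
        # mask: each lane keeps only its own b bits
        out.extend(s & mask for s in shifted)
    return out


# The slide's eight 5-bit codes, read LSB first: 00010 -> 8, 10100 -> 5, and so on.
lane_codes = [8, 5, 3, 31, 10, 19, 17, 4]
lane_packed = pack_bytes(lane_codes, 5)
print(simd_like_unpack(lane_packed, 5, len(lane_codes)))
```

Because every lane computes its byte offset and shift amount from its index alone, the loop body has no data-dependent branch, which is what makes the approach a good fit for SIMD.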

Horizontal Layout
❖ Fully packed
    ❖ No space wasted
    ❖ Codes can span across 2 packed words
❖ Packing
    ❖ Process 1 unpacked code per iteration
    ❖ Branch to store output packed word
❖ Unpacking
    ❖ Process 1 output code per iteration
    ❖ Branch to load input packed word
    ❖ Can be written in SIMD!
[Figure: unpacking throughput (GB/s, axis 0 to 60) versus number of bits (1 to 31); series: Scalar, SIMD; annotation: up to 7X improvement from SIMD]

Horizontal Layout
❖ Fully packed
    ❖ No space wasted
    ❖ Codes can span across 2 packed words
❖ Packing
    ❖ Process 1 unpacked code per iteration
    ❖ Branch to store output packed word
❖ Unpacking
    ❖ Process 1 output code per iteration
    ❖ Branch to load input packed word
    ❖ Can be written in SIMD!
❖ Scanning
    ❖ Unpack the codes in CPU registers
    ❖ Evaluate selective predicates and append to bitmap
    ❖ Must unpack first thus bounded by O(n)
    ❖ Can also be written in SIMD via SIMD unpacking
Scan example: select … where column < C …
    packed:           00010101 00110001 11110101 01100110
    unpacked:         00010 000 10100 000 11000 000 11111 000
    compare with C:   01100 000 01100 000 01100 000 01100 000
    result:           00000000 11111111 11111111 00000000
    extract:          0110
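
The scan above (unpack, compare with C, append one result bit per code) can be sketched in scalar Python. The names `pack_words` and `scan_less_than` are hypothetical, 32-bit packed words are an assumption, and the slide's bit patterns are read LSB first, so 00010 is 8 and C = 01100 is 6.

```python
def pack_words(codes, b):
    """Reference packer: fully packed horizontal layout in 32-bit words."""
    acc = 0
    for i, code in enumerate(codes):
        acc |= code << (i * b)
    nwords = (len(codes) * b + 31) // 32
    return [(acc >> (32 * j)) & 0xFFFFFFFF for j in range(nwords)]


def scan_less_than(words, b, n, c):
    """Evaluate `column < c` on packed codes; bit i of the result is set if code i qualifies."""
    mask = (1 << b) - 1
    bitmap, acc, avail, pos = 0, 0, 0, 0
    for i in range(n):
        if avail < b:                    # load the next packed word when needed
            acc |= words[pos] << avail
            pos += 1
            avail += 32
        code = acc & mask                # unpack one code
        acc >>= b
        avail -= b
        bitmap |= (code < c) << i        # append the predicate result to the bitmap
    return bitmap


column = [8, 5, 3, 31]                   # the slide's four codes
result = scan_less_than(pack_words(column, 5), 5, len(column), 6)
print(format(result, "04b"))
```

Every code is unpacked before it is compared, so the work grows linearly with the number of codes regardless of b, which is the O(n) bound stated on the slide.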

Horizontal Layout
❖ Fully packed
    ❖ No space wasted
    ❖ Codes can span across 2 packed words
❖ Packing
    ❖ Process 1 unpacked code per iteration
    ❖ Branch to store output packed word
❖ Unpacking
    ❖ Process 1 output code per iteration
    ❖ Branch to load input packed word
    ❖ Can be written in SIMD!
❖ Scanning
    ❖ Unpack the codes in CPU registers
    ❖ Evaluate selective predicates and append to bitmap
    ❖ Must unpack first thus bounded by O(n)
    ❖ Can also be written in SIMD via SIMD unpacking
[Figure: throughput (GB/s, axis 0 to 60) versus number of bits (1 to 31); series: Pack (scalar), Unpack (SIMD), Scan (SIMD); scan predicate: C1 <= column <= C2; annotation: slower than unpacking]

Horizontal Layout
❖ Word aligned
    ❖ Waste space to get alignment
    ❖ Pack b’ = floor(w / (b+1)) codes per processor word
    ❖ Extra bit per code used for scanning
Example (b = 2, w = 8):
    fully packed:   01 10 11 00
    word aligned:   01 0 10 0 00    11 0 00 0 00
                    (one unused extra bit after each code, unused high order bits at the end of each word)
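
The slide does not spell out how the extra bit is used during a scan. One common bit-parallel technique, shown here purely as an assumption about what is meant, adds a replicated constant to the complemented codes so that the carry into each extra bit answers `code < c` for all codes in a word at once. The function names and the example values are hypothetical.

```python
def pack_word_aligned(codes, b, w=64):
    """Word-aligned layout: floor(w / (b + 1)) codes per word, one spare bit above each code."""
    per_word = w // (b + 1)
    words = []
    for start in range(0, len(codes), per_word):
        word = 0
        for j, code in enumerate(codes[start:start + per_word]):
            word |= code << (j * (b + 1))
        words.append(word)
    return words


def scan_less_than_aligned(words, b, n, c, w=64):
    """Evaluate `code < c` (0 <= c < 2**b) for every code of a word with one add."""
    per_word = w // (b + 1)
    field = (1 << b) - 1
    code_mask = sum(field << (j * (b + 1)) for j in range(per_word))
    extra_mask = sum(1 << (j * (b + 1) + b) for j in range(per_word))
    c_rep = sum(c << (j * (b + 1)) for j in range(per_word))
    bitmap = 0
    for wi, word in enumerate(words):
        # word ^ code_mask turns each code x into (2**b - 1 - x); adding c sets
        # the spare bit exactly when x < c, and the sum never carries past it.
        hits = (c_rep + (word ^ code_mask)) & extra_mask
        for j in range(per_word):
            i = wi * per_word + j
            if i < n and (hits >> (j * (b + 1) + b)) & 1:
                bitmap |= 1 << i
    return bitmap


aligned = pack_word_aligned([1, 2, 3, 0], b=2, w=8)   # two codes per 8-bit word
aligned_hits = scan_less_than_aligned(aligned, 2, 4, 2, w=8)
print(aligned, format(aligned_hits, "04b"))
```

In this sketch the comparison itself touches whole words rather than individual codes; only the final loop that copies the spare bits into the bitmap is per code, and a real implementation would compact those bits with word-level operations instead.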
