Lightweight Compression Methods Achieving 120GBps and More
Piotr Przymus
Laboratoire d’Informatique Fondamentale de Marseille, Aix-Marseille University, France
GPU Technology Conference, Silicon Valley, May 2017

K. Kaczmarski and P. Przymus, Fixed Length Lightweight Compression for GPU Revised, Journal of Parallel and Distributed Computing, 2017.
A lightweight compression library for GPU: github.com/mis-wut/feathergpu, MIT-licensed.
This project was partly funded by National Science Centre, decision DEC-2012/07/D/ST6/02483.
Team: Krzysztof Kaczmarski (Warsaw University of Technology, Poland); Piotr Przymus (Aix-Marseille University, France; Nicolaus Copernicus University in Toruń, Poland).

Lightweight compression on GPU: motivation
Lightweight compression algorithms favour compression and decompression speed over compression ratio.
Improved data transfer: Disk ↔ RAM ↔ GPU. GPU ↔ GPU: exchange of already compressed data, compress → transfer → decompress.
Lower memory footprint: less disk space, less RAM, and less GPU memory used.
Improved internal memory access: in some cases, improved internal GPU memory access.

Fixed length compression
Fixed length (FL) is a simple, well-known compression scheme in which a fixed number of bits is suppressed from every word. The suppressed bits must be equal to 0.
00000001 00000010 00000011 00000100 00000101 00000110 00000111 00001000
Figure: Original data, only 4 bits are used in each byte.
00010010 00110100 01010110 01111000
Figure: Compressed data (each byte encodes two words of length 4 bits).
Compression ratio (CR) = uncompressed size / compressed size = 8 bytes / 4 bytes = 2.
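The 4-bit example on this slide can be reproduced with a minimal CPU-side sketch. The function names `fl_compress` and `fl_decompress` are hypothetical and are not taken from feathergpu; the sketch assumes non-negative integers and packs the first value into the most significant bits, which matches the bit order drawn in the figure.

```python
def fl_compress(values, bits):
    """Pack each value into exactly `bits` bits, first value in the top bits."""
    mask = (1 << bits) - 1
    if any(v < 0 or v > mask for v in values):
        raise ValueError("value does not fit in the chosen bit width")
    acc = 0
    for v in values:
        acc = (acc << bits) | v
    total_bits = bits * len(values)
    pad = (-total_bits) % 8  # zero bits appended so the output is whole bytes
    return (acc << pad).to_bytes((total_bits + pad) // 8, "big")


def fl_decompress(data, bits, count):
    """Inverse of fl_compress; `count` is the number of packed values."""
    total_bits = bits * count
    pad = (-total_bits) % 8
    acc = int.from_bytes(data, "big") >> pad
    mask = (1 << bits) - 1
    return [(acc >> (bits * (count - 1 - i))) & mask for i in range(count)]


original = [1, 2, 3, 4, 5, 6, 7, 8]       # one value per byte, as in the figure
packed = fl_compress(original, bits=4)    # two 4-bit words per byte
restored = fl_decompress(packed, bits=4, count=len(original))
ratio = len(bytes(original)) / len(packed)
```

Here `packed` holds the four bytes shown in the compressed figure and `ratio` is the compression ratio of the example. The bit width and the value count have to be stored alongside the payload, since neither can be recovered from the packed bytes alone.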

Fixed length compression
Fixed length (FL) compression is easy to implement and makes it easy to achieve high data throughput.
Many applications: database compression (columns, indexes), time series compression, graph compression, etc.
Many variants: Patched FL, Adaptive FL, DELTA-*.
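As an illustration of why variants are useful, the sketch below applies delta encoding before choosing the FL bit width. It assumes that DELTA-* refers to delta encoding followed by an FL-style packing step, which the slide does not spell out; the helper names and the sample timestamps are made up for the example, and the input is assumed to be non-empty and non-decreasing so that all deltas are non-negative.

```python
def bits_needed(values):
    """Smallest FL bit width that can hold every value (at least 1)."""
    return max(1, max(values).bit_length())


def delta_encode(values):
    """Keep the first value, then store differences between neighbours."""
    return [values[0]] + [b - a for a, b in zip(values, values[1:])]


def delta_decode(deltas):
    """Rebuild the original sequence with a running sum."""
    out, total = [], 0
    for d in deltas:
        total += d
        out.append(total)
    return out


timestamps = [1_000_000, 1_000_003, 1_000_007, 1_000_008, 1_000_015]
deltas = delta_encode(timestamps)

width_plain = bits_needed(timestamps)   # width if values are packed directly
width_delta = bits_needed(deltas[1:])   # width for the differences only
```

In this sketch the first value would be stored separately at full width, and only the differences are packed with the narrower width.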

Fixed length compression on GPU
Performance over flexibility (Fang et al. 2010): high performance, but a highly simplified version of the algorithm. Words are mapped to full bytes, e.g. a 4-bit word is mapped to 1 byte. Uses the map primitive. Coalesced reads and writes: YES. Direct memory access: YES.
Flexibility over performance (Nvbio and Kaczmarski, Przymus 2012-2017): no simplifications, at the cost of lower performance. Supports all possible bit encodings. Uses the allgather or gather primitive. Coalesced reads and writes: NO. Direct memory access: YES.
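The cost of mapping words to full bytes can be estimated with simple arithmetic. The sketch below is only a size comparison under the assumption that the simplified scheme rounds every word up to a whole number of bytes, as the 4-bit example on the slide suggests; the function names are hypothetical and nothing here models throughput.

```python
def byte_aligned_size(count, bits):
    """Bytes used when every word is rounded up to whole bytes."""
    return count * ((bits + 7) // 8)


def bit_packed_size(count, bits):
    """Bytes used when words are packed at exactly `bits` bits each."""
    return (count * bits + 7) // 8


count = 1024                 # number of 32-bit input values
raw_size = count * 4         # uncompressed size in bytes

# (bit width, CR with byte-aligned words, CR with exact bit packing)
comparison = [
    (bits,
     raw_size / byte_aligned_size(count, bits),
     raw_size / bit_packed_size(count, bits))
    for bits in (4, 12, 20)
]
```

For widths that are not a multiple of 8, exact bit packing always yields the smaller output, which is the flexibility the second approach pays for with uncoalesced access.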

Fixed length compression on GPU
Figure: Read pattern: GPU version of FL algorithm. [Diagram: rows labelled 0 to 31, each covering a run of consecutive element indices (0 to 31, 32 to 63, ...).]
Figure: Write pattern: GPU version of FL algorithm. [Diagram: consecutive element indices 0 to 95 in rows of 32.]

Fixed length on GPU (C+D, GTX Titan Black)
[Plot: bandwidth (GB/s, axis 0 to 120) against data size (MB, axis 0 to 1000). Series: int max, int min, long max, long min, int, long.]

Fixed length on GPU (1 GB of data, GTX Titan Black)
[Plot: compression and decompression throughput (GB/s, axes 50 to 300) against bit encoding (8 to 63). Series: int, long.]

Can we do better?
Aligned Fixed Length (AFL) algorithm. The FL algorithm is optimized for the CPU memory access scheme. We can do better with a GPU-friendly memory organisation scheme.
Features: no simplifications, high performance on GPU. Still works quite well on CPU, but loses some cache-hit benefits. Supports all possible bit encodings. Uses the allgather or gather primitive. Coalesced reads and writes: YES. Direct memory access: YES.
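The difference between the two memory organisations can be simulated on the CPU. The sketch below is an interpretation of the access patterns in the figures, not the feathergpu implementation: it assumes a warp of 32 threads, 32-bit compressed words, blocks of 32 x 32 values, and a layout in which thread t handles elements t, t + 32, t + 64, ... and stores its j-th compressed word at position j * 32 + t. All names are hypothetical.

```python
WARP = 32   # simulated threads per warp
WORD = 32   # bits per compressed word; each thread packs WORD values


def read_indices(step, aligned):
    """Element index read by each of the WARP threads at a given step."""
    if aligned:
        return [step * WARP + t for t in range(WARP)]   # neighbours: coalesced
    return [t * WORD + step for t in range(WARP)]       # stride of WORD elements


def afl_compress_block(values, bits):
    """Pack one block of WARP * WORD values into WARP * bits words."""
    assert len(values) == WARP * WORD and 0 < bits <= WORD
    mask = (1 << bits) - 1
    out = [0] * (WARP * bits)
    for t in range(WARP):                       # one iteration per simulated thread
        acc = 0
        for s in range(WORD):
            v = values[s * WARP + t]            # aligned read
            if not 0 <= v <= mask:
                raise ValueError("value does not fit in the chosen bit width")
            acc |= v << (s * bits)
        for j in range(bits):                   # WORD * bits bits = `bits` words
            out[j * WARP + t] = (acc >> (j * WORD)) & 0xFFFFFFFF   # aligned write
    return out


def afl_decompress_block(words, bits):
    """Inverse of afl_compress_block."""
    mask = (1 << bits) - 1
    values = [0] * (WARP * WORD)
    for t in range(WARP):
        acc = 0
        for j in range(bits):
            acc |= words[j * WARP + t] << (j * WORD)
        for s in range(WORD):
            values[s * WARP + t] = (acc >> (s * bits)) & mask
    return values


block = [(i * 7) % 32 for i in range(WARP * WORD)]   # every value fits in 5 bits
packed = afl_compress_block(block, bits=5)
restored = afl_decompress_block(packed, bits=5)
```

In the aligned layout the threads of a warp touch neighbouring addresses at every step, for both the input values and the compressed words, which is what allows the hardware to coalesce the accesses. In the unaligned variant the same step touches addresses 32 elements apart.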

Aligned FL on GPU
Figure: Read pattern: GPU version of Aligned FL algorithm. [Diagram: columns labelled 0 to 31 over rows of consecutive element indices (0 to 31, 32 to 63, ...).]
Figure: Write pattern: GPU version of Aligned FL algorithm. [Diagram: columns labelled 0 to 31 over consecutive element indices 0 to 95 in rows of 32.]

Aligned FL on GPU (C+D, GTX Titan Black)
[Plot: bandwidth (GB/s, axis 0 to 120) against data size (MB, axis 0 to 1000). Series: int max, int min, long max, long min, int, long.]
