Gzip Compression Using Altera OpenCL



  1. Gzip Compression Using Altera OpenCL
     Mohamed Abdelfattah (University of Toronto), Andrei Hagiescu, Deshanand Singh

  2. Gzip
     • Widely-used lossless compression program
     • Gzip = LZ77 + Huffman
     • Big data needs fast compression
       • Gigabyte-per-second
     • Lower disk space in data centers
     • Less power on communication networks

  3. LZ77 Compression Example
     • This sentence is an easy sentence to compress.
     1. Scan file byte by byte
     2. Look for matches
     3. Replace with a reference to previous occurrence

  9. LZ77 Compression Example
     • This sentence is an easy sentence to compress.
     1. Scan file byte by byte
     2. Look for matches
        • Match length
        • Match offset
     3. Replace with a reference to previous occurrence

  10. LZ77 Compression Example
     • This sentence is an easy sentence to compress.
     1. Scan file byte by byte
     2. Look for matches
        • Match length = 2
        • Match offset
     3. Replace with a reference to previous occurrence

  11. LZ77 Compression Example
     • This sentence is an easy sentence to compress.
     1. Scan file byte by byte
     2. Look for matches
        • Match length = 3
        • Match offset
     3. Replace with a reference to previous occurrence

  12. LZ77 Compression Example
     • This sentence is an easy sentence to compress.
       (annotation: Match offset = 20 bytes)
     1. Scan file byte by byte
     2. Look for matches
        • Match length = 8
        • Match offset
     3. Replace with a reference to previous occurrence

  13. LZ77 Compression Example
     • This sentence is an easy sentence to compress.
       (annotation: Match offset = 20 bytes)
     1. Scan file byte by byte
     2. Look for matches
        • Match length = 8
        • Match offset = 20
     3. Replace with a reference to previous occurrence

  14. LZ77 Compression Example
     • This sentence is an easy @(8,20) to compress.
     1. Scan file byte by byte
     2. Look for matches
        • Match length = 8
        • Match offset = 20
     3. Replace with a reference to previous occurrence
        • Marker, length, offset

  15. LZ77 Compression Example
     • This sentence is an easy sentence to compress.
     • This sentence is an easy @(8,20) to compress.
     • Saved 5 bytes!
     1. Scan file byte by byte
     2. Look for matches
        • Match length = 8
        • Match offset = 20
     3. Replace with a reference to previous occurrence
        • Marker, length, offset
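
The match in the example can be checked with a small C sketch. It is only an illustration, not the authors' implementation: the function names `match_length` and `bytes_saved` are hypothetical, and the saving assumes that the marker, length and offset each take one byte, which is consistent with the "Saved 5 bytes" figure.

```c
#include <assert.h>
#include <stddef.h>

/* Count how many bytes starting at `cur` equal the bytes `offset` positions
 * earlier, stopping at `max_len` or at the end of the buffer. */
static size_t match_length(const char *buf, size_t len, size_t cur,
                           size_t offset, size_t max_len)
{
    assert(offset > 0 && offset <= cur);  /* reference must point backwards */
    size_t n = 0;
    while (n < max_len && cur + n < len &&
           buf[cur + n] == buf[cur + n - offset])
        n++;
    return n;
}

/* Bytes saved when a match is replaced by marker + length + offset,
 * assuming (hypothetically) one byte for each of the three fields. */
static long bytes_saved(size_t match_len)
{
    return (long)match_len - 3;
}
```

With the example sentence, the second "sentence" starts at byte 25 and the first at byte 5, so `match_length(s, len, 25, 20, 8)` returns 8 and `bytes_saved(8)` returns 5. Without the cap of 8 the comparison would also absorb the space that follows both occurrences and return 9; the slide counts the word alone.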

  16. Altera OpenCL Compiler for FPGAs
     • Host code, for the host CPU: //host code, //Enqueue buffer, //Enqueue Kernel(s), //dequeue buffers, …
     • Host CPU connects to the FPGA accelerator over PCIe
     • OpenCL single-threaded code, compiled by Altera’s OpenCL Compiler for the FPGA accelerator:
       void kernel simple(global int *input, int size, global int *output) { for(i=1..size) { int x = input[i]; int y = input[i+1]; int z = x + y; output[i] = z; } }
     • Accelerator pipeline: Load x, Load y, Store z
     • DDRx Memory

  17. Altera OpenCL Compiler for FPGAs (same diagram; iteration 1 is in the accelerator pipeline)

  18. Altera OpenCL Compiler for FPGAs (same diagram; iterations 1 and 2 are in the accelerator pipeline)

  19. Altera OpenCL Compiler for FPGAs (same diagram; iterations 1, 2 and 3 are in the accelerator pipeline)
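
For readers who want to run the loop from the slide on a CPU, the kernel body might be written in plain C as follows. This is a sketch under stated assumptions: the slide's `for(i=1..size)` is read as inclusive of `size`, so `input` must hold at least `size + 2` elements and `output` at least `size + 1`. The OpenCL qualifiers (`kernel`, `global`) are dropped because this is host-side C, not a kernel.

```c
#include <assert.h>

/* Plain-C rendering of the slide's `simple` kernel.
 * Assumption: the loop runs for i = 1..size inclusive, so the caller must
 * provide input[0..size+1] and output[0..size]. */
static void simple(const int *input, int size, int *output)
{
    assert(size >= 0);
    for (int i = 1; i <= size; i++) {
        int x = input[i];      /* "Load x" stage */
        int y = input[i + 1];  /* "Load y" stage */
        int z = x + y;
        output[i] = z;         /* "Store z" stage */
    }
}
```

Each iteration is independent of the previous one, which is what allows a new iteration to enter the pipeline while earlier ones are still in flight, as the numbered markers on the slides suggest.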

  20. FPGAs can be VERY Custom
     • Host: CPU over PCIe, or ARM host on FPGA chip
     • IO Channels
     • FPGA accelerator: Load x, Load y, Store z
     • Different memory types: DDRx, QDR?, RDL?

  21. Implementation Overview
     1. Shift In New Data
     2. Dictionary Lookup/Update
     3. Match Search & Filtering
     4. Write to output

  22. 1. Shift In New Data
     • Current Window
     • Input from DDR memory

  23. 1. Shift In New Data
     • Current Window, e.g. "old_text"
     • Input: "sample_text"
     • Cycle boundary

  24. 1. Shift In New Data
     • Current Window, e.g. "old_text"
     • Input: "sample_text"
     • Cycle boundary
     • Use text in our example, but can be anything
     • VEC = 4

  25. 1. Shift In New Data
     • Current Window, e.g. "text" (the oldest 4 bytes have been shifted out)
     • Input: "sample_text"
     • Cycle boundary

  26. 1. Shift In New Data
     • Current Window, e.g. "textsamp"
     • Input: "le_text"
     • Cycle boundary
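
The shift-in step can be sketched in C as below. The window size of `2 * VEC` bytes is an assumption read off the 8-byte example with VEC = 4; the name `shift_in` is hypothetical.

```c
#include <assert.h>
#include <string.h>

#define VEC 4             /* bytes consumed per cycle, as on the slide */
#define WINDOW (2 * VEC)  /* assumed window size, matching the 8-byte example */

/* Drop the oldest VEC bytes of the window and append VEC new input bytes. */
static void shift_in(char window[WINDOW], const char *input, size_t input_len)
{
    assert(input_len >= VEC);  /* this sketch does not handle a short tail */
    memmove(window, window + VEC, WINDOW - VEC);
    memcpy(window + WINDOW - VEC, input, VEC);
}
```

Starting from the window "old_text" and the input "sample_text", one call produces "textsamp", leaving "le_text" as the remaining input, which matches slides 23 to 26.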

  28. 2. Dictionary Lookup/Update
     • Current Window: "textsamp"
     • Dictionary 0, Dictionary 1, Dictionary 2, Dictionary 3
     1. Compute hash
     2. Look for match in 4 dictionaries
     3. Update dictionaries
     • Dictionaries buffer the text that we have already processed

  29. 2. Dictionary Lookup/Update
     • Current Window: "textsamp"
     • Substrings passed to Hash: "text", "exts", "xtsa", "tsam"
     • Lookup for "text": Dictionary 0 → "tan_", Dictionary 1 → "text", Dictionary 2 → "texl", Dictionary 3 → "teen"

  30. 2. Dictionary Lookup/Update
     • As slide 29, plus lookup for "exts": Dictionary 0 → "eate", Dictionary 1 → "ears", Dictionary 2 → "eeps", Dictionary 3 → "ente"

  31. 2. Dictionary Lookup/Update
     • As slide 30, plus lookup for "xtsa": Dictionary 0 → "xant", Dictionary 1 → "xylo", Dictionary 2 → "xely", Dictionary 3 → "xirt"

  32. 2. Dictionary Lookup/Update
     • Possible matches from history (dictionaries)
     • Current Window: "textsamp"

       Substring   Dictionary 0   Dictionary 1   Dictionary 2   Dictionary 3
       text        tan_           text           texl           teen
       exts        eate           ears           eeps           ente
       xtsa        xant           xylo           xely           xirt
       tsam        tan_           tame           teal           teen

  33. 2. Dictionary Lookup/Update
     • Current Window: "textsamp"
     • Substrings "text", "exts", "xtsa", "tsam" → Hash → Dictionary 0 to Dictionary 3

  34. 2. Dictionary Lookup/Update
     • Current Window: "textsamp"
     • Dictionary 0: read ports RD00, RD01, RD02, RD03 (e.g. returning "tan_", "text", "texl", "teen"); write port W0 (writing "text")
     • Dictionary 1: read ports RD10, RD11, RD12, RD13; write port W1
     • Dictionary 2: read ports RD20, RD21, RD22, RD23; write port W2
     • Dictionary 3: read ports RD30, RD31, RD32, RD33; write port W3
     • Generate exactly the number of read/write ports that we need and the width
     • 256 read ports, 16 write ports: 128 bits
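
The three steps (compute hash, look up in four dictionaries, update) might look like this in C. Several details are assumptions, since the slides do not give them: the hash function `hash4`, the depth `DICT_ENTRIES`, and the rule that substring i is written into dictionary i (suggested by "text" being written through W0 on slide 34). All names are hypothetical.

```c
#include <assert.h>
#include <string.h>

#define VEC 4             /* substrings per cycle and bytes per substring */
#define NUM_DICT VEC      /* one dictionary per substring position */
#define DICT_ENTRIES 256  /* hypothetical depth; not given on the slides */

typedef struct {
    char entries[NUM_DICT][DICT_ENTRIES][VEC];
} dictionaries_t;

/* Hypothetical hash over VEC bytes; the slides do not specify one. */
static unsigned hash4(const char *s)
{
    unsigned h = 0;
    for (int i = 0; i < VEC; i++)
        h = h * 31u + (unsigned char)s[i];
    return h % DICT_ENTRIES;
}

/* For each of the VEC substrings of the window: read one candidate from
 * every dictionary, then write substring i into dictionary i. */
static void lookup_update(dictionaries_t *d, const char window[2 * VEC],
                          char candidates[VEC][NUM_DICT][VEC])
{
    unsigned h[VEC];
    assert(d != NULL);

    /* 1. Compute hash of each substring window[i .. i+VEC-1]. */
    for (int i = 0; i < VEC; i++)
        h[i] = hash4(window + i);

    /* 2. Lookup: substring i reads the entry at its hash in all dictionaries. */
    for (int i = 0; i < VEC; i++)
        for (int k = 0; k < NUM_DICT; k++)
            memcpy(candidates[i][k], d->entries[k][h[i]], VEC);

    /* 3. Update: dictionary i stores substring i at its hash. */
    for (int i = 0; i < VEC; i++)
        memcpy(d->entries[i][h[i]], window + i, VEC);
}
```

Every loop here has a fixed bound of VEC or NUM_DICT, so with VEC = 4 each call performs 16 dictionary reads and 4 writes, matching the RD00 to RD33 and W0 to W3 ports drawn on slide 34.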

  36. 3. Match Search & Filtering
     • Current Windows (the substrings) and their Comparison Windows (a set of candidate matches for each incoming substring):

       Current Window   Comparison Windows
       text             teen, texl, text, tan_
       exts             ente, eeps, ears, eate
       xtsa             xirt, xely, xylo, xant
       tsam             teen, teal, tame, tan_

     • Compare current window against each of its 4 compare windows

  37. 3. Match Search & Filtering
     • Current Window: "text"
     • Comparison Windows: "teen", "texl", "text", "tan_"
     • Comparators: compare each byte
     • Match Length: 2, 3, 4, 1
     • We have another 3 of those

  38. 3. Match Search & Filtering
     • Current Window: "text"
     • Comparison Windows: "teen", "texl", "text", "tan_"
     • Comparators → Match Length: 2, 3, 4, 1
     • Match Reduction → Best Length: 4

  39. 3. Match Search & Filtering

  40. 3. Match Search & Filtering

  41. 3. Match Search & Filtering

  42. 3. Match Search & Filtering
     • Typical C-code
     • Fixed loop bounds: compiler can unroll loop
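
The comparison and reduction from slides 37 and 38 might be written in C with fixed loop bounds, as in the sketch below. It is an illustration rather than the code shown in the talk; `match_len`, `best_match` and `NUM_CMP` are hypothetical names.

```c
#include <assert.h>

#define VEC 4      /* bytes per window */
#define NUM_CMP 4  /* comparison windows per current window */

/* Number of leading bytes on which the two windows agree. The loop always
 * runs VEC times, so a compiler can unroll it into VEC comparators. */
static int match_len(const char cur[VEC], const char cmp[VEC])
{
    int len = 0;
    int still_matching = 1;
    for (int i = 0; i < VEC; i++) {
        still_matching = still_matching && (cur[i] == cmp[i]);
        len += still_matching;
    }
    return len;
}

/* Compare against every comparison window and keep the longest match. */
static int best_match(const char cur[VEC], const char cmp[NUM_CMP][VEC],
                      int lengths[NUM_CMP])
{
    int best = 0;
    for (int k = 0; k < NUM_CMP; k++) {
        lengths[k] = match_len(cur, cmp[k]);
        if (lengths[k] > best)
            best = lengths[k];
    }
    assert(best >= 0 && best <= VEC);
    return best;
}
```

For the current window "text" and the comparison windows "teen", "texl", "text", "tan_", the lengths come out as 2, 3, 4, 1 and the best length is 4, as on the slides. Avoiding an early `break` keeps the trip count constant, which is the property slide 42 points to.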
