SLIDE 1

Non-Linear Compression: Gzip Me Not!

Michael F. Nowlan Bryan Ford Ramakrishna Gummadi

Decentralized and Distributed Systems Group Department of Computer Science Yale University

4th USENIX Workshop on Hot Topics in Storage and File Systems (HotStorage '12) June 13 – 14, Boston, MA

SLIDE 2

DeDiS Group, Yale CS HotStorage '12, Boston, MA 2

Linear Compression

Popular compression schemes (e.g., gzip, bzip2) are linear.

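[Diagram: over time t, comp takes state S0 and block B1 and produces compressed block C1 and state S1; comp then takes S1 and B2 and produces C2 and S2.]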

SLIDE 3

Linear Compression

Compression state accumulates sequentially, with each successive block of data that is compressed.

Any given state depends on all previous compression states.

SLIDE 4

Linear Compression

This dependency chain is restrictive.

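[Diagram: over time t, dcomp takes state S0 and compressed block C1 and produces block B1 and state S1; dcomp then takes S1 and C2 and produces B2 and S2.]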

SLIDE 5

Linear Compression

This dependency chain is restrictive. It forces decompression to proceed in the same order as compression (i.e., it prohibits random access).
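A rough illustration of this ordering constraint, using Python's standard `zlib` module with raw DEFLATE and a sync flush after each block. The block contents and the helper name `try_decode_alone` are invented for the sketch; it is not the authors' code.

```python
import zlib

# Two related blocks; the contents are invented for this sketch.
B1 = b"int func(int x) { return x * 2 + 1; } /* first version of foo.c */\n"
B2 = b"int func(int x) { return x * 2 + 1; } /* second version of foo.c */\n"

# One compressor carries state S0 -> S1 -> S2 across both blocks.
# wbits=-15 selects raw DEFLATE, so no stream header is involved.
comp = zlib.compressobj(wbits=-15)
C1 = comp.compress(B1) + comp.flush(zlib.Z_SYNC_FLUSH)  # state is now S1
C2 = comp.compress(B2) + comp.flush(zlib.Z_SYNC_FLUSH)  # state is now S2

# Decompressing in compression order works.
dcomp = zlib.decompressobj(wbits=-15)
assert dcomp.decompress(C1) == B1
assert dcomp.decompress(C2) == B2


def try_decode_alone(chunk):
    """Attempt to decode one chunk from a fresh state; return None on failure."""
    fresh = zlib.decompressobj(wbits=-15)
    try:
        return fresh.decompress(chunk)
    except zlib.error:
        return None


# C2 was produced from state S1, so a decoder starting at S0 lacks the
# history that C2 refers back to.
print("C1 decoded alone:", try_decode_alone(C1) == B1)
print("C2 decoded alone:", try_decode_alone(C2) == B2)
```

C2 is mostly back-references into B1, which is why it is small and also why it is useless without the state left behind by C1.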

SLIDE 6

Linear Compression

In summary: Popular compression schemes transform compression state linearly.

SLIDE 7

Outline

  • Linear Compression
  • Compression in Storage Systems
  • Storage Requirements
  • Linear Limitations
  • Non-Linear Compression
  • Architecture and API
  • Example Applications
  • Prototype Implementation
  • Preliminary Results
  • Future Work
SLIDE 9

Compression in Storage Systems

Storage systems that use compression generally perform:

  1) block compression, and/or
  2) delta-encoding.

Examples include:

  • De-duplicating file systems
  • Distributed source control management
  • Collaborative editing systems

[Diagram: a data source emitting blocks B1 and B2.]

SLIDE 10

Storage Requirements

Data blocks may be related, or not, and they may be available at different times (e.g., versions of a file), or all at once.

                 Inter-Block Content
Availability     Related     Unrelated
At once
Over time

SLIDE 11

Storage Requirements

Data blocks may be related, or not, and they may be available at different times (e.g., versions of a file), or all at once.

                 Inter-Block Content
Availability     Related     Unrelated
At once          Linear
Over time                    Linear

SLIDE 12

Storage Requirements

Data blocks may be related, or not, and they may be available at different times (e.g., versions of a file), or all at once.

                 Inter-Block Content
Availability     Related     Unrelated
At once          Linear      ???
Over time        ???         Linear

SLIDE 13

Linear Limitations

Random Access

                 Related     Unrelated
At once                      ???
Over time

SLIDE 14

Linear Limitations

Resetting compression state between blocks enables random access... but significantly reduces the compression ratio for small blocks.
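The trade-off can be measured with a small sketch, again using Python's `zlib` as a stand-in linear compressor. The record format, the block count, and the helper name `raw_deflate` are assumptions made for the example; the numbers it prints are not results from the talk.

```python
import zlib

# 64 small, related blocks (about 60 bytes each); contents are invented.
blocks = [
    b"record %04d: user=alice action=write path=/srv/data/foo.c\n" % i
    for i in range(64)
]


def raw_deflate(data):
    """Compress data from a fresh state (raw DEFLATE, no header or checksum)."""
    c = zlib.compressobj(wbits=-15)
    return c.compress(data) + c.flush()


# Option 1: reset state for every block. Any block can be decoded alone.
reset_sizes = [len(raw_deflate(b)) for b in blocks]

# Option 2: one linear stream. State carries over, but decode order is forced.
stream = zlib.compressobj(wbits=-15)
linear_sizes = [
    len(stream.compress(b) + stream.flush(zlib.Z_SYNC_FLUSH)) for b in blocks
]

print("raw bytes:         ", sum(len(b) for b in blocks))
print("reset per block:   ", sum(reset_sizes))
print("one linear stream: ", sum(linear_sizes))

# Random access survives only in option 1: block 40 decodes on its own.
alone = zlib.decompressobj(wbits=-15).decompress(raw_deflate(blocks[40]))
assert alone == blocks[40]
```

With blocks this small, a freshly reset compressor has almost no history to exploit, so the per-block option pays for its random access in output size.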

SLIDE 15

Linear Limitations

Reuse Compression State

                 Related     Unrelated
At once
Over time        ???

No abstraction for doing this!

SLIDE 16

Linear Limitations

Linear compression forces an all-or-nothing choice between random access and compression ratio (especially for blocks < 1 KB), and it has no notion of copying, or reusing, compression state.

SLIDE 18

NLC API

Linear Compression API

  • State initialize();
  • int compress(State, void*, int);
  • int decompress(State, void*, int);

Non-Linear Compression API (the calls above, plus)

  • State fork(State);
SLIDE 19

NLC Fork

[Diagram: Foo.c at v.1 branches into v.2a and v.2b (Alice, Bob).]

  • v.2a: small delta w/ content dependency
  • v.2b: small delta w/ content dependency; independent of v.2a

SLIDE 20

NLC Fork

Intuition: Fork copies compression state to allow independent compression, or decompression, using previous compression state.

[Diagram: compress v.1 (S0 to S1), fork S1, then compress independently (S1 to S2a and S1 to S2b).]
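A minimal sketch of the fork intuition, assuming Python's `zlib` as the underlying compressor: its `copy()` method on compression and decompression objects duplicates the current state, which is close in spirit to `fork(State)`. The file contents and the wrapper names `compress` and `fork` are illustrative, and unlike the slide API this sketch keeps separate compressor and decompressor objects.

```python
import zlib

SYNC = zlib.Z_SYNC_FLUSH

# Hypothetical file versions: v.1 is the base; v.2a and v.2b are small,
# independent edits by two authors.
v1 = b"".join(
    b"int helper_%02d(int x) { return x + %d; }\n" % (i, i) for i in range(20)
)
v2a = v1 + b"int func_alice(void) { return helper_03(1); }\n"
v2b = v1 + b"int func_bob(void) { return helper_07(2); }\n"


def compress(state, data):
    """Compress one block and leave the state ready for the next block."""
    return state.compress(data) + state.flush(SYNC)


def fork(state):
    """Copy a zlib (de)compression state so both copies evolve independently."""
    return state.copy()


# Compress v.1 (S0 -> S1), then fork S1 and compress each edit on its own copy.
s1 = zlib.compressobj(wbits=-15)
c1 = compress(s1, v1)
s2a, s2b = fork(s1), fork(s1)
c2a = compress(s2a, v2a)
c2b = compress(s2b, v2b)

# Decompression forks the same way: v.2b needs v.1's state but not v.2a's.
d1 = zlib.decompressobj(wbits=-15)
assert d1.decompress(c1) == v1
d2b = fork(d1)
assert d2b.decompress(c2b) == v2b
d2a = fork(d1)
assert d2a.decompress(c2a) == v2a

print("v.2b raw:", len(v2b), "bytes; compressed from forked S1:", len(c2b), "bytes")
```

Either branch can be decoded first, or alone, as long as the decoder has reached S1; neither branch depends on the other.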

SLIDE 21

NLC API

Linear Compression API

  • State initialize();
  • int compress(State, void*, int);
  • int decompress(State, void*, int);

Non-Linear Compression API (the calls above, plus)

  • State fork(State);
  • State merge(State, State);
SLIDE 22

NLC Merge

[Diagram: Foo.c. From v.1, Alice's v.2a contains int func_alice() { … } and Bob's v.2b contains int func_bob() { … }; both lead to v.3.]

SLIDE 23

NLC Merge

Intuition: Merge combines compression state to allow future compression to use all acquired state between two nodes.

[Diagram: S2a and S2b compress independently to S3a and S3b; merge combines them into S3.]
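One way to picture merge is with a toy statistical model. The sketch below is an assumption-laden stand-in, not the prototype's adaptive Huffman coder: the state is a table of byte counts, `compress` only reports an ideal code length in bits, and `merge` simply sums the two tables. The class name `State`, the sample sources, and the summing rule are all choices made for the example.

```python
import math
from collections import Counter


class State:
    """Toy order-0 adaptive model: byte counts observed so far."""

    def __init__(self, counts=None):
        self.counts = Counter(counts or {})

    def compress(self, block):
        """Return the ideal code length of block in bits, updating the model."""
        bits = 0.0
        total = sum(self.counts.values())
        for byte in block:
            # Add-one smoothing over the 256 possible byte values.
            p = (self.counts[byte] + 1) / (total + 256)
            bits += -math.log2(p)
            self.counts[byte] += 1
            total += 1
        return bits


def fork(state):
    """Copy a state so the copy can evolve independently."""
    return State(state.counts)


def merge(a, b):
    """Combine two states by summing counts (shared history is counted twice)."""
    return State(a.counts + b.counts)


v1 = b"int main(void) { return 0; }\n"
alice = b"int func_alice(void) { return 1; }\n"
bob = b"int func_bob(void) { return 2; }\n"

s1 = State()
s1.compress(v1)
s2a, s2b = fork(s1), fork(s1)
s2a.compress(alice)  # Alice's branch
s2b.compress(bob)    # Bob's branch

v3 = v1 + alice + bob
merged_bits = merge(s2a, s2b).compress(v3)
fresh_bits = State().compress(v3)
print(f"v.3 from a fresh state: {fresh_bits:.0f} bits; from merged state: {merged_bits:.0f} bits")
```

Summing counts is the simplest possible rule and over-weights the common ancestor; a real merge would have to reconcile whatever structure the chosen compression scheme keeps.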

SLIDE 25

NLC Architecture

  • NLC module provided by the OS.
  • Single abstraction for all outstanding state nodes.
  • Independent of any specific compression scheme.
  • Supports Huffman, Arithmetic, LZW, LZ77, etc.
  • No expectation of random access within a block.
  • Normal linear compression within blocks.
  • Application can use different paths through the DAG for logically distinct “streams” of data.
  • Application keeps compressor in sync with decompressor, but Future Work discusses potential NLC “naming”, or “identification”, schemes.

SLIDE 27

NLC – Parallel Compression

[Diagram: state graph over S0 to S6; legend: fork, merge, compress.]

SLIDE 28

NLC – Synchronized Streams

[Diagram: state graph over S0 to S5; legend: fork, merge, compress.]

SLIDE 29

NLC – Windowed Compression

[Diagram: states S0 to S6 (with S1', S2', S3'), a base state, and a cumulative state SCUM; legend: fork, merge, compress.]

For any given state, x, and current state, c, x is merged into the cumulative state when: x <= (c - w)

Window, w, = 3.
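The window rule can be checked with a few lines of Python. Treating states as the integers 1, 2, 3, ... and the function name `split_window` are assumptions made for this sketch.

```python
def split_window(current, window=3):
    """Partition state indices 1..current into (merged, in_window).

    A state x is folded into the cumulative state when x <= current - window.
    """
    merged = [x for x in range(1, current + 1) if x <= current - window]
    in_window = [x for x in range(1, current + 1) if x > current - window]
    return merged, in_window


for c in range(1, 7):
    merged, in_window = split_window(c)
    print(f"c={c}: cumulative={merged} window={in_window}")
```

With w = 3, nothing is folded into the cumulative state until the fourth state exists; from then on, each new state pushes the oldest windowed state into it.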

SLIDE 31

Prototype Implementation

  • We have an Adaptive Huffman compressor in C++
  • Proof-of-concept; not meant to compete head-to-head with gzip or other compressors.
  • Order of magnitude slower
  • Fork and Merge are very expensive
  • Compression ratios approach optimal depending on application fork/merge strategy.
  • Merge allows eventual usage of all compression state.

SLIDE 32

Preliminary Results

Block size = 128 bytes
Window size = 3 blocks

The cost for “unordered decompression” is paid in the first 10 KB.

SLIDE 34

Future Work – Challenges

  • Merge, Merge, Merge
      • It's computationally expensive and slow.
      • Is it even needed? Are approximation heuristics good enough?
  • Fork/Merge behaviors
      • Should we use Fork and Merge sparingly?
  • Block size vs. Memory overhead
      • As block sizes decrease, the compression overhead ratio increases.
  • State node “naming” or “identification”
      • NLC module should do it for the application.
SLIDE 35

Conclusion

  • Data Compression is used everywhere. However, the API is one-size-fits-all.
  • Non-Linear Compression aims to be a superset of the traditional compression API by offering Fork and Merge.
  • Fork and Merge allow compression state to follow the data's natural logical dependencies.
  • This provides localized compression and unordered decompression in many instances.

SLIDE 36

Thanks to Jana Iyengar, Avi Silberschatz, Michael Fischer, Rob Ross, the anonymous reviewers... And all of you for listening! Questions?

SLIDE 37

Outline

  • Compression in Storage
  • Modern Requirements
  • Non-Linear Compression
  • Linear Limitations
  • Architecture
  • API
  • Prototype Implementation
  • Future Work

SLIDE 38

Non-Linear Compression

[Diagram: state nodes S1 to S6.]
