Joint Source-Channel LZ'77 Coding



SLIDE 1

Joint Source-Channel LZ'77 Coding

Stefano Lonardi

University of California, Riverside

Wojciech Szpankowski

Purdue University, West Lafayette

Source vs. Channel coding

  • Source coding: represent the source information with the minimum of symbols
  • Channel coding: represent the source information in a manner that minimizes the error probability in decoding

SLIDE 2

Problem definition

  • How to achieve joint source and channel coding in LZ’77 (by adding error resiliency)
    – without significantly degrading the compression performance,
    – and keeping backward compatibility with the original LZ’77?

Encoding

[Figure: “Dear Bob, How are you doing today? …” → LZRS’77 → T.gz]

SLIDE 3

Decoding (no errors)

[Figure: T.gz → LZ’77 → “Dear Bob, How are you doing today? …”; T.gz → LZRS’77 → “Dear Bob, How are you doing today? …”]

Decoding (with errors)

[Figure: Corrupted T.gz → LZ’77 → ?; Corrupted T.gz → LZRS’77 → “Dear Bob, How are you doing today? …”]

SLIDE 4

Roadmap

  • We will show how to obtain extra redundant bits from LZ’77
  • We will show how to achieve error resiliency in LZ’77

[Figure: the history and the current position, with several candidate pointers into the history. LZ’77: which of these pointers do we choose?]

SLIDE 5

[Figure: the history and the current position, with four candidate pointers labeled 00, 01, 10, 11]

By choosing one of these pointers we are recovering two extra redundant bits. Note that we are not changing LZ’77.

Extra bits recovering

  • Definition: an LZ’77 phrase has multiplicity q if it has exactly q matches in the history
  • Given a phrase with multiplicity q, we can recover $\lfloor \log_2 q \rfloor$ bits
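The pointer-selection idea can be sketched in a few lines of Python. This is an illustrative sketch, not the authors' implementation: the function names (`longest_match_positions`, `choose_pointer`, `read_bits`) are hypothetical, the match search is a naive quadratic scan, and the extra bits are represented as a string of "0"/"1" characters. A phrase whose longest match occurs at q positions in the history carries floor(log2 q) bits, encoded as the index of the chosen occurrence.

```python
from math import floor, log2


def longest_match_positions(text: str, pos: int) -> tuple[int, list[int]]:
    """Return (length, starts): the longest prefix of text[pos:] that also
    begins somewhere in the history text[:pos], and every such start."""
    best_len, starts = 0, []
    for start in range(pos):
        length = 0
        # A match may run past pos (overlap), as LZ'77 permits.
        while pos + length < len(text) and text[start + length] == text[pos + length]:
            length += 1
        if length > best_len:
            best_len, starts = length, [start]
        elif length == best_len and length > 0:
            starts.append(start)
    return best_len, starts


def choose_pointer(starts: list[int], bits: str) -> tuple[int, str, str]:
    """Encoder side: consume floor(log2 q) bits to pick one of q pointers."""
    k = floor(log2(len(starts)))
    used = bits[:k].ljust(k, "0")  # pad with zeros if the bit stream runs out
    index = int(used, 2) if used else 0
    return starts[index], used, bits[k:]


def read_bits(starts: list[int], chosen: int) -> str:
    """Decoder side: recover the embedded bits from the chosen pointer."""
    k = floor(log2(len(starts)))
    return format(starts.index(chosen), f"0{k}b") if k else ""


text = "ab.ab.ab.ab-abX"
length, starts = longest_match_positions(text, 12)  # phrase "ab", q = 4
chosen, used, rest = choose_pointer(starts, "10110")
recovered = read_bits(starts, chosen)
```

Any of the q pointers decodes to the same phrase, so a standard LZ'77 decoder is unaffected by the choice. A decoder that knows the scheme recomputes the candidate list from the text decoded so far and reads the bits back.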

SLIDE 6

Average case analysis

  • Theorem: Let $Q_n$ be the random variable associated with the multiplicity q of a phrase in a string of length n. For a Markov source, $E[Q_n] = O(1)$ as $n \to \infty$

Average phrase multiplicity

[Figure: two plots of average phrase multiplicity (y-axis) against position in the text (x-axis), for the files news and paper2]

SLIDE 7

Recent results

  • Theorem: For memoryless sources

$$E[Q_n] = \frac{1}{H} + \text{small fluctuations}$$

$$P[Q_n = k] = \frac{p^k (1-p) + (1-p)^k p}{kH}$$

where H is the entropy of the source, and p is the probability of generating a “0”
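As a numerical sanity check on the memoryless-source result, the sketch below evaluates the leading terms, $P[Q_n = k] = (p^k(1-p) + (1-p)^k p)/(kH)$ and $E[Q_n] \approx 1/H$, ignoring the fluctuation term. The slide does not state the logarithm base of H. Taking H in nats is an assumption made here because it is the choice under which the probabilities sum to 1. The function names are hypothetical.

```python
from math import log


def entropy_nats(p: float) -> float:
    """Binary entropy with natural logarithms (an assumption, see lead-in)."""
    return -(p * log(p) + (1 - p) * log(1 - p))


def multiplicity_pmf(k: int, p: float) -> float:
    """Leading term of P[Q_n = k]; the fluctuation term is ignored."""
    return (p**k * (1 - p) + (1 - p) ** k * p) / (k * entropy_nats(p))


p = 0.3
# The terms decay geometrically, so truncating the series at k = 200 is harmless.
total = sum(multiplicity_pmf(k, p) for k in range(1, 200))
mean = sum(k * multiplicity_pmf(k, p) for k in range(1, 200))
predicted_mean = 1 / entropy_nats(p)
```

With this convention `total` comes out as 1 and `mean` agrees with `predicted_mean`, so the two formulas are consistent with each other. For a fair binary source (p = 0.5) the predicted mean multiplicity is 1/ln 2, about 1.44.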

Number of bits recovered

[Figure: four plots of bits extracted (y-axis) against position (x-axis), for the files mito, paper2, progc, and news]

Remark: more bits can be recovered by relaxing the greediness

SLIDE 8

Reed Solomon codes

  • RS codes are block-based error correcting codes (BCH family)
  • RS(a,b) code
    – $a = 2^s - 1$, where s is the datum size
    – has (a-b) “parity” bits
    – can correct up to (a-b)/2 errors
  • We used RS(255,255-2e), which can correct up to e errors
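The parameter arithmetic for RS(a,b) can be spelled out as a small helper. This is only a calculator for the quantities listed above, not a Reed-Solomon implementation, and the function and key names are hypothetical. In standard RS terminology the a-b redundant units are s-bit symbols, so with s = 8 each "parity" unit is one byte.

```python
def rs_parameters(s: int, e: int) -> dict[str, float]:
    """Parameters of RS(a, b) with a = 2^s - 1 and b = a - 2e."""
    a = 2**s - 1   # block length in s-bit symbols
    b = a - 2 * e  # data symbols per block
    return {
        "block_length": a,
        "data_symbols": b,
        "parity_symbols": a - b,
        "correctable_errors": (a - b) // 2,
        "overhead": (a - b) / b,  # redundancy relative to the data
    }


params = rs_parameters(s=8, e=2)  # RS(255, 251)
```

For s = 8 and e = 2 this gives RS(255, 251): 4 parity symbols per 251 data symbols, correcting up to 2 symbol errors per block, an overhead of about 1.6%.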

LZRS’77 encoder (off-line)

  • compress the file with LZ’77
  • break the compressed file into blocks $B_1, \dots, B_m$ of size 255-2e
  • for i←m downto 2
    – encode block $B_i$ with RS(255,255-2e)
    – embed the extra 2e parity bits in the pointers of block $B_{i-1}$
  • encode block $B_1$ with RS(255,255-2e)
  • store the extra parity bits at the beginning of the file

SLIDE 9

LZRS’77 encoding

[Figure: LZRS’77 encoding diagram; one element is labeled “optional”]

LZRS’77 decoder (on-line)

  • (assume $RS_i$ are the 2e parity bits for $B_i$)
  • decode and correct block $B_1+RS_1$
  • decompress block $B_1$ and recover $RS_2$
  • for i←2 to m
    – decode and correct block $B_i+RS_i$
    – decompress block $B_i$ and recover $RS_{i+1}$
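The order of operations in the encoder (backward, from block m down to block 1) and the decoder (forward, from block 1 to block m) can be illustrated with a toy round trip. This sketch makes strong simplifications, all of them assumptions: `toy_parity` is a hash-based stand-in that only detects corruption and is not a Reed-Solomon code, and the parity hidden in a block's pointer choices is modeled as an explicit `embedded` field stored next to the payload. Only the chaining is meant to match the scheme described above.

```python
import hashlib


def toy_parity(data: bytes, e: int) -> bytes:
    """Stand-in for the 2e RS parity units; detects errors, cannot correct."""
    return hashlib.blake2b(data, digest_size=2 * e).digest()


def encode(payloads: list[bytes], e: int) -> tuple[bytes, list[tuple[bytes, bytes]]]:
    m = len(payloads)
    embedded = [b""] * m  # embedded[i] models parity hidden in block i's pointers
    # Backward pass: block i must be final before its parity is computed,
    # because embedding parity into block i-1 changes block i-1 itself.
    for i in range(m - 1, 0, -1):
        embedded[i - 1] = toy_parity(payloads[i] + embedded[i], e)
    header = toy_parity(payloads[0] + embedded[0], e)  # stored at start of file
    return header, list(zip(payloads, embedded))


def decode(header: bytes, blocks: list[tuple[bytes, bytes]], e: int) -> bytes:
    expected, out = header, []
    for payload, hidden in blocks:
        if toy_parity(payload + hidden, e) != expected:
            raise ValueError("block failed its check")
        out.append(payload)
        expected = hidden  # parity of the next block, recovered from this one
    return b"".join(out)


header, blocks = encode([b"alpha", b"beta", b"gamma"], e=2)
restored = decode(header, blocks, e=2)
```

The backward pass in `encode` is what allows `decode` to run on-line: by the time block i is checked, its parity has already been recovered from block i-1 (or from the file header for the first block).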

SLIDE 10

Experiments: gzip

  • gzip issues pointers in a sliding window of 32 Kbytes (typically)
  • The length of phrases is represented by 8 bits (3-258)
  • Strings smaller than 3 symbols are encoded as literals

gzip

  • gzip always chooses the most “recent” occurrence of the longest prefix

“…the hash chains are searched starting from the most recent strings, to favor small distances and thus take advantage of the Huffman coding…”

SLIDE 11

“Hacking” gzip

  • We modified gzip-1.2.4 to evaluate the potential degradation of compression performance due to changing the rule of always choosing the most “recent” occurrence
  • As a preliminary experiment, we simply chose one pointer at random

gzip vs. gzipS

SLIDE 12

Error correction (simulation)

  • We chose e=1, e=2 and b=10, b=100
  • For b blocks, we injected 1,…,b uniformly distributed errors
  • We measured the number of times that the file was decoded correctly (out of a few hundred simulations)
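A simplified model of this kind of experiment can be written as a Monte Carlo simulation. The sketch below is not the authors' simulation code. It assumes that errors hit distinct symbol positions chosen uniformly over b blocks of 255 symbols, and that the file decodes correctly exactly when no block receives more than e errors. The function name and defaults are hypothetical.

```python
import random
from collections import Counter


def failure_probability(num_blocks: int, e: int, num_errors: int,
                        trials: int = 2000, block_size: int = 255,
                        seed: int = 0) -> float:
    """Estimate the probability that some block receives more than e errors."""
    rng = random.Random(seed)  # fixed seed keeps the estimate reproducible
    failures = 0
    for _ in range(trials):
        # Distinct error positions, uniform over the whole file.
        positions = rng.sample(range(num_blocks * block_size), num_errors)
        per_block = Counter(pos // block_size for pos in positions)
        if any(count > e for count in per_block.values()):
            failures += 1
    return failures / trials


# Failure estimates for e = 2, b = 100 and a few error counts.
curve = {k: failure_probability(100, 2, k) for k in (1, 2, 10, 40, 80)}
```

Under this model, e or fewer injected errors can never cause a failure, and the failure probability rises as more errors are injected into the same number of blocks.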

Error-correction

[Figure: probability of the file being incorrectly decoded (e=2, b=100); x-axis: number of injected errors, y-axis: probability]

SLIDE 13

Findings

  • Method to recover extra redundant bits from LZ’77
  • The extra bits make it possible to incorporate error resiliency in LZ’77
    – backward-compatible (deployment without disrupting service)
    – compression degradation due to the extra bits is almost negligible