Information Theory

Lecture 3

  • Lossless source coding algorithms:
  • Huffman: CT5.6–8
  • Shannon-Fano-Elias: CT5.9
  • Arithmetic: CT13.3
  • Lempel-Ziv: CT13.4–5

Mikael Skoglund, Information Theory 1/21

Zero-Error Source Coding

  • Huffman codes: algorithm & optimality
  • Shannon-Fano-Elias codes
  • connection to Shannon(-Fano) codes, Fano codes, and per-symbol arithmetic coding
  • within 2 (resp. 1) bits of the entropy per symbol
  • Arithmetic codes
  • adaptable, probabilistic model
  • within 2 bits of the entropy per sequence!
  • Lempel-Ziv codes
  • “basic” and “modified” LZ-algorithm
  • sketch of asymptotic optimality

Mikael Skoglund, Information Theory 2/21


Example: Encoding a Markov Source

  • 2-state Markov chain, P01 = P10 = 1/3 ⇒ µ0 = µ1 = 1/2

  • Sample sequence

s = 1000011010001111 = 1 0^4 1^2 0 1 0^3 1^4

  • Probabilities of 2-bit symbols

            p(00)  p(01)  p(10)  p(11)   H         L ≥
  sample    1/4    1/8    3/8    1/4     ≈ 1.9056  16
  model     1/3    1/6    1/6    1/3     ≈ 1.9183  16

  • Entropy rate

H(S) = h(1/3) ≈ 0.9183 ⇒ L ≥ ⌈16 · h(1/3)⌉ = ⌈14.6928⌉ = 15

Mikael Skoglund, Information Theory 3/21

Huffman Coding Algorithm

  • Greedy bottom-up procedure
  • Builds a complete D-ary code tree by combining the D symbols of lowest probabilities ⇒ need |X| ≡ 1 (mod D − 1) ⇒ add dummy symbols of 0 probability if necessary

  • Gives a prefix code
  • Probabilities of source symbols need to be available ⇒ coding long strings (“super symbols”) becomes complex
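As an added illustration (not part of the original slides), a minimal binary-Huffman sketch; the pmf is the sample-based 2-bit model from the Markov example, and all names are illustrative:

```python
# Minimal sketch of binary (D = 2) Huffman coding with a priority queue.
# The pmf below is the sample-based 2-bit model from the Markov example.
import heapq

def huffman_code(pmf):
    """Return a dict symbol -> binary codeword for the given probabilities."""
    # Each heap entry: (subtree probability, tie-breaker, {symbol: partial codeword})
    heap = [(p, i, {x: ""}) for i, (x, p) in enumerate(pmf.items())]
    heapq.heapify(heap)
    counter = len(heap)
    while len(heap) > 1:
        # combine the two least probable subtrees, prefixing '0' and '1'
        p0, _, c0 = heapq.heappop(heap)
        p1, _, c1 = heapq.heappop(heap)
        merged = {x: "0" + w for x, w in c0.items()}
        merged.update({x: "1" + w for x, w in c1.items()})
        counter += 1
        heapq.heappush(heap, (p0 + p1, counter, merged))
    return heap[0][2]

pmf = {"00": 1/4, "01": 1/8, "10": 3/8, "11": 1/4}     # sample-based model
code = huffman_code(pmf)
L = sum(p * len(code[x]) for x, p in pmf.items())
print(code, L)   # average length 2.0 bits/symbol (tree shape may vary with ties)
```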

Mikael Skoglund, Information Theory 4/21


Huffman Code Examples

[Code-tree figure: Huffman trees for the sample-based pmf (00: 1/4, 01: 1/8, 10: 3/8, 11: 1/4) and the model-based pmf (00: 1/3, 01: 1/6, 10: 1/6, 11: 1/3). Encoding the 16-bit sample s with the sample-based code gives |1000001110000101| = 16 bits; with the model-based code, |001010000010010111| = 18 bits.]

Mikael Skoglund, Information Theory 5/21

Optimal Symbol Codes

  • An optimal binary prefix code must satisfy

p(x) < p(y) ⇒ l(x) ≥ l(y)

  • there are at least two codewords of maximal length
  • the longest codewords can be relabeled such that the two least probable symbols differ only in their last bit

  • Huffman codes are optimal prefix codes (why?)
  • We know that

L = H(X) ⇔ l(x) = − log p(x) ⇒ Huffman will give L = H(X) when the − log p(x) are integers (a dyadic distribution)
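As a quick added illustration (not on the original slide): for the dyadic pmf p = (1/2, 1/4, 1/8, 1/8), Huffman assigns lengths (1, 2, 3, 3) = − log p(x), so L = 1/2 · 1 + 1/4 · 2 + 1/8 · 3 + 1/8 · 3 = 1.75 bits = H(X).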

Mikael Skoglund, Information Theory 6/21


Cumulative Distributions and Rounding

  • X ∈ X = {1, 2, . . . , m}; p(x) = Pr(X = x) > 0
  • Cumulative distribution function (cdf)

F(x) = Σ_{x′≤x} p(x′), x ∈ [0, m]

[Figure: staircase cdf F(x) on [0, m], rising to 1, with a jump of height p(x) at each x.]

  • Modified cdf

F̄(x) = Σ_{x′<x} p(x′) + (1/2) p(x), x ∈ X

  • only for x ∈ X
  • F̄(x) known ⇒ x known!

Mikael Skoglund, Information Theory 7/21

  • We know that l(x) ≈ − log p(x) gives a good code
  • Use the binary expansion of F̄(x) as the code for x; rounding needed
  • round to ≈ − log p(x) bits
  • Rounding: [0, 1) → {0, 1}^k
  • Use base-2 fractions

f ∈ [0, 1) ⇒ f = Σ_{i≥1} f_i 2^{−i}

  • Take the first k bits

⌊f⌋_k = f1 f2 · · · fk ∈ {0, 1}^k

  • For example, 2/3 = 0.101010 · · · ⇒ ⌊2/3⌋_5 = 10101
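An added sketch of the truncation ⌊f⌋_k in Python (the function name is not from the slides):

```python
# Rounding [0, 1) -> {0, 1}^k: keep the first k bits of the binary expansion of f.
def truncate(f, k):
    bits = []
    for _ in range(k):
        f *= 2
        bit = int(f)          # next binary digit f_i
        bits.append(str(bit))
        f -= bit
    return "".join(bits)

print(truncate(2/3, 5))       # '10101', matching the example above
```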

Mikael Skoglund, Information Theory 8/21


Shannon-Fano-Elias Codes

  • Shannon-Fano-Elias code (as it is described in CT)
  • l(x) = ⌈log(1/p(x))⌉ + 1 ⇒ L < H(X) + 2 [bits]
  • c(x) = ⌊F̄(x)⌋_l(x) = ⌊Σ_{x′<x} p(x′) + (1/2) p(x)⌋_l(x)
  • Prefix-free if the intervals [0.c(x), 0.c(x) + 2^−l(x)) are disjoint (why?) ⇒ instantaneous code (check)

  • Example:

                  sample-based                    model-based
  x        p(x)  l(x)  F̄(x)   c(x)         p(x)  l(x)  F̄(x)   c(x)
  1 (00)   1/4   3     1/8     001          1/3   3     1/6     001
  2 (01)   1/8   4     5/16    0101         1/6   4     5/12    0110
  3 (10)   3/8   3     9/16    100          1/6   4     7/12    1001
  4 (11)   1/4   3     7/8     111          1/3   3     5/6     110

           L = 3.125 < H(X) + 2                   L = 3.333 < H(X) + 2
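An added Python sketch of the Shannon-Fano-Elias encoder; with the sample-based pmf it reproduces the left half of the table above (names and interface are illustrative):

```python
# Shannon-Fano-Elias encoding sketch: l(x) = ceil(log2(1/p(x))) + 1 and
# c(x) = first l(x) bits of the modified cdf F̄(x).
from math import ceil, log2

def sfe_code(pmf):
    """pmf: list of (symbol, probability) in a fixed order; returns symbol -> codeword."""
    code, F = {}, 0.0                       # F = running sum of p(x') for x' < x
    for x, p in pmf:
        Fbar = F + p / 2                    # modified cdf F̄(x)
        l = ceil(log2(1 / p)) + 1           # codeword length
        bits, f = [], Fbar
        for _ in range(l):                  # take the first l bits of F̄(x)
            f *= 2
            bits.append(str(int(f)))
            f -= int(f)
        code[x] = "".join(bits)
        F += p
    return code

sample = [("00", 1/4), ("01", 1/8), ("10", 3/8), ("11", 1/4)]
print(sfe_code(sample))   # {'00': '001', '01': '0101', '10': '100', '11': '111'}
```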

Mikael Skoglund, Information Theory 9/21

  • Shannon (or Shannon-Fano) code (see HW Prob. 1)
  • order the probabilities
  • l(x) = ⌈log(1/p(x))⌉ ⇒ L < H(X) + 1
  • c(x) = ⌊F(x)⌋_l(x)
  • Fano code (see CT p. 123)
  • L < H(X) + 2
  • order the probabilities
  • recursively split into subsets as nearly equiprobable as possible
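A rough added sketch of the Fano splitting rule described above (names are illustrative, and ties may be split differently than in CT):

```python
# Fano coding sketch: sort by probability, then recursively split into two
# groups with sums as nearly equal as possible, appending '0' and '1'.
def fano_code(pmf):
    items = sorted(pmf.items(), key=lambda kv: kv[1], reverse=True)
    code = {x: "" for x in pmf}

    def split(group):
        if len(group) <= 1:
            return
        total, running = sum(p for _, p in group), 0.0
        best_i, best_gap = 1, float("inf")
        for i in range(1, len(group)):      # find the most equiprobable split point
            running += group[i - 1][1]
            gap = abs(2 * running - total)
            if gap < best_gap:
                best_i, best_gap = i, gap
        for x, _ in group[:best_i]:
            code[x] += "0"
        for x, _ in group[best_i:]:
            code[x] += "1"
        split(group[:best_i])
        split(group[best_i:])

    split(items)
    return code

print(fano_code({"00": 1/4, "01": 1/8, "10": 3/8, "11": 1/4}))
# e.g. 10 -> '0', 00 -> '10', 11 -> '110', 01 -> '111' (L = 2.0 bits here)
```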

Mikael Skoglund, Information Theory 10/21


Intervals

  • Dyadic intervals
  • A binary string can represent a subinterval of [0, 1)

x1x2 · · · xm ∈ {0, 1}^m ⇒ x = Σ_{i=1}^{m} xi 2^{m−i} ∈ {0, 1, . . . , 2^m − 1}

(the usual binary representation of x), then

x1x2 · · · xm → [x/2^m, (x + 1)/2^m) ⊂ [0, 1)

  • For example, 110 → [3/4, 7/8)

[Figure: the unit interval [0, 1) with the dyadic subinterval [3/4, 7/8) for the string 110 marked.]
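A few added lines illustrating the map from a binary string to its dyadic subinterval (the helper name is not from the slides):

```python
# Map a binary string x1 x2 ... xm to its dyadic subinterval [x/2^m, (x+1)/2^m).
def dyadic_interval(bits):
    m, x = len(bits), int(bits, 2)      # x is the usual binary representation
    return (x / 2**m, (x + 1) / 2**m)

print(dyadic_interval("110"))           # (0.75, 0.875), i.e. [3/4, 7/8)
```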

Mikael Skoglund, Information Theory 11/21

Arithmetic Coding – Symbol

  • “Algorithm”
  • No preset codeword lengths for rounding off
  • Instead, the largest dyadic interval inside the symbol interval gives the codeword for the symbol

  • Example: Shannon-Fano-Elias vs. arithmetic symbol code

[Figure: the unit interval partitioned into the symbol intervals for 00, 01, 10, 11, sample-based and model-based, with the codewords marked.]

                sample-based               model-based
  symbol        00   01    10   11         00   01    10    11
  SFE           001  0101  100  111        001  0110  1001  110
  arithmetic    00   010   10   11         00   011   100   11
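An added sketch of per-symbol arithmetic coding as just described: each codeword is the largest dyadic interval inside the symbol interval [F(x) − p(x), F(x)) (names are illustrative; with the model-based pmf it reproduces the arithmetic codewords above):

```python
# Per-symbol arithmetic coding sketch: for each symbol interval [lo, hi), find
# the largest dyadic interval [x/2^k, (x+1)/2^k) inside it and emit the k-bit
# string for x. Fractions keep the arithmetic exact.
from fractions import Fraction

def largest_dyadic_codeword(lo, hi):
    k = 1
    while True:
        x = -(-lo * 2**k // 1)              # ceil(lo * 2^k): leftmost candidate
        if Fraction(x + 1, 2**k) <= hi:     # [x/2^k, (x+1)/2^k) fits inside [lo, hi)
            return format(int(x), "0{}b".format(k))
        k += 1

def symbol_code(pmf):
    code, lo = {}, Fraction(0)
    for sym, p in pmf:                      # symbol interval [F(x) - p(x), F(x))
        code[sym] = largest_dyadic_codeword(lo, lo + p)
        lo += p
    return code

model = [("00", Fraction(1, 3)), ("01", Fraction(1, 6)),
         ("10", Fraction(1, 6)), ("11", Fraction(1, 3))]
print(symbol_code(model))   # {'00': '00', '01': '011', '10': '100', '11': '11'}
```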

Mikael Skoglund, Information Theory 12/21


Arithmetic Coding – Stream

  • Works for streams as well!
  • Consider binary strings, order strings according to their corresponding integers (e.g., 0111 < 1000), and let

F(x_1^N) = Σ_{y_1^N ≤ x_1^N} Pr(X_1^N = y_1^N) = Σ_{k: x_k = 1} p(x1 x2 · · · x_{k−1} 0) + p(x_1^N)

Sum over all strings to the left of x_1^N in a binary tree (with 00 · · · 0 to the far left)

Mikael Skoglund, Information Theory 13/21

  • Code x_1^N into the largest interval inside

[F(x_1^N) − p(x_1^N), F(x_1^N))

(a code sketch follows the example below)

  • Markov source example (model-based)

[Figure: nested subintervals of [0, 1) for the successive prefixes of s: 1 → 10 → 100 → 1000 → 10000 → 100001 → 1000011 → · · ·]

Mikael Skoglund, Information Theory 14/21
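Below is an added sketch of the stream encoder for the model-based Markov source of slide 3: the interval is narrowed one source bit at a time, and the codeword is the largest dyadic interval inside the final interval (function names and the stationary start are assumptions):

```python
# Stream arithmetic coding sketch for the 2-state Markov model (P01 = P10 = 1/3,
# started from the stationary distribution). The '0'-branch is placed to the
# left, matching the ordering of strings by their integer values.
from fractions import Fraction

def p_next_is_zero(prev_bit):
    """Model-based probability that the next bit is 0, given the previous bit."""
    if prev_bit is None:
        return Fraction(1, 2)               # first bit: stationary distribution
    return Fraction(2, 3) if prev_bit == "0" else Fraction(1, 3)

def encode_stream(bits):
    lo, hi, prev = Fraction(0), Fraction(1), None
    for b in bits:
        mid = lo + (hi - lo) * p_next_is_zero(prev)
        lo, hi = (lo, mid) if b == "0" else (mid, hi)
        prev = b
    k = 1                                    # largest dyadic interval inside [lo, hi)
    while True:
        x = -(-lo * 2**k // 1)               # ceil(lo * 2^k)
        if Fraction(x + 1, 2**k) <= hi:
            return format(int(x), "0{}b".format(k))
        k += 1

print(encode_stream("1000011010001111"))     # codeword for the sample sequence s
```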


Arithmetic Coding – Adaptive

  • Only the distribution of the current symbol conditioned on the past symbols is needed at every step ⇒ easily made adaptive: just estimate p(x_{n+1} | x_1^n)
  • One such estimate is given by the Laplace model

Pr(X_{n+1} = x | x_1^n) = (n_x + 1) / (n + |X|)

where n_x is the number of occurrences of x in x_1^n
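A minimal added sketch of the Laplace estimate (names are illustrative):

```python
# Laplace (add-one) estimate: Pr(x_{n+1} = x | x_1^n) = (n_x + 1) / (n + |X|).
from fractions import Fraction

def laplace_estimate(history, x, alphabet=("0", "1")):
    n_x = history.count(x)                   # occurrences of x in x_1^n
    return Fraction(n_x + 1, len(history) + len(alphabet))

print(laplace_estimate("10000", "0"))        # 5/7: four 0s among the 5 bits seen
```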

Mikael Skoglund, Information Theory 15/21

Lempel-Ziv: A Universal Code

  • Not a symbol code
  • Quite another philosophy: parsings, phrases, dictionary
  • A parsing divides x_1^n into phrases y_1^{c(n)}

x1 x2 · · · xn → y1, y2, . . . , y_{c(n)}

  • In a distinct parsing, phrases do not repeat
  • The LZ algorithm performs a greedy distinct parsing, whereby each new phrase extends an old phrase by just 1 bit ⇒ The LZ code for the new phrase is simply the dictionary index of the old phrase followed by the extra bit
  • There are several variants of LZ coding; we consider the “basic” and the “modified” LZ algorithms

Mikael Skoglund, Information Theory 16/21


The “Basic” Lempel-Ziv Algorithm

  • Lempel-Ziv parsing and “basic” encoding of s

  phrase     λ   1    0    00    01    10     100    011    11
  index      0   1    2    3     4     5      6      7      8
  encoding       ,1   0,0  10,0  10,1  001,0  101,0  100,1  001,1

  • Remarks
  • Parsing starts with empty string
  • First pointer sent is also empty
  • Only “important” index bits are used
  • Even so, “compressed” 16 bits to 25 bits
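An added Python sketch of the “basic” LZ encoder; on s it reproduces the parsing and the 25-bit encoding above (names are illustrative, and an incomplete final phrase is not handled):

```python
# "Basic" LZ sketch: greedy distinct parsing where each new phrase extends a
# dictionary phrase by one bit; each phrase is sent as the parent's index in
# ceil(log2(dictionary size)) bits, followed by the extra bit.
from math import ceil, log2

def lz_basic_encode(s):
    dictionary = {"": 0}                 # phrase -> index; starts with the empty phrase
    output, w = [], ""
    for bit in s:
        if w + bit in dictionary:        # keep extending the current match
            w += bit
            continue
        idx_bits = ceil(log2(len(dictionary)))   # 0 bits while only the empty phrase is stored
        pointer = format(dictionary[w], "b").zfill(idx_bits) if idx_bits else ""
        output.append(pointer + "," + bit)
        dictionary[w + bit] = len(dictionary)
        w = ""
    return output

code = lz_basic_encode("1000011010001111")
print(code)                              # [',1', '0,0', '10,0', '10,1', '001,0', ...]
print(sum(len(c) - 1 for c in code))     # 25 encoded bits in total
```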

Mikael Skoglund, Information Theory 17/21

The “Modified” Lempel-Ziv Algorithm

  • The second time a phrase occurs (as the prefix of a new phrase),
  • the extra bit is known
  • it cannot be extended a third time in a distinct parsing

⇒ the second extension may overwrite the parent

  • Lempel-Ziv parsing and “modified” encoding of s

  phrase     λ   1    0    00   01    10     100    011    11
  encoding       ,1   0,   0,0  00,   01,0   11,0   000,1  001,

⇒ saved 6 bits! (still 16:19 “compression”)

Mikael Skoglund, Information Theory 18/21


Asymptotic Optimality of LZ Coding

  • Codeword lengths of Lempel-Ziv codes satisfy (index + extra bit)

l(x_1^n) ≤ c(n)(log c(n) + 1)

  • Using a counting argument, the number of phrases c(n) in a distinct parsing of a length-n sequence is bounded as

c(n) ≤ (n / log n)(1 + o(1))

  • Ziv’s lemma relates distinct parsings and a kth-order Markov approximation of the underlying distribution.

Mikael Skoglund, Information Theory 19/21

  • Combining the above leads to the optimality result:
  • For a stationary and ergodic source {Xn},

lim sup_{n→∞} (1/n) l(X_1^n) ≤ H(S)   a.s.

Mikael Skoglund, Information Theory 20/21


Generating Discrete Distributions from Fair Coins

  • A natural inverse to data compression
  • Source encoders aim to produce i.i.d. fair bits (symbols)
  • Source decoders noiselessly reproduce the original source sequence (with the proper distribution) ⇒ “Optimal” source decoders provide an efficient way to generate discrete random variables

Mikael Skoglund, Information Theory 21/21