SLIDE 1

Lecture 2 Lossless Source Coding

I-Hsiang Wang

Department of Electrical Engineering, National Taiwan University
ihwang@ntu.edu.tw

October 2, 2016

SLIDE 2

The engineering problem motivating the study of this lecture:

For a (random) source sequence of length N, design an encoding scheme (mapping) to describe it using K bits, so that the decoder can reconstruct the source sequence at the destination from these K bits. How the encoding scheme works (the mapping) is known to the decoder a priori.

Fundamental questions:
- What is the minimum possible ratio K/N (the compression ratio/rate)?
- How do we achieve that fundamental limit?

In this lecture, we will demonstrate that, for most random sources, when we want to reconstruct the source losslessly, the fundamental limit is the entropy rate of the random process of the source.

SLIDE 3

The Source Coding Problem (Shannon's Abstraction)

(Block diagram: Source → Source Encoder → Source Decoder → Destination, with signals s[1:N] → b[1:K] → ŝ[1:N].)

Meta Description

1. Encoder: Represent the source sequence s[1:N] by a binary source codeword w ≜ b[1:K] ∈ {0, 1, ..., 2^K − 1}, with K as small as possible.
2. Decoder: From the source codeword w, reconstruct the source sequence either losslessly or within a certain distortion.
3. Efficiency: Determined by the code rate R ≜ K/N bits per symbol time.

SLIDE 4

Decoding Criteria


Naturally, one would think of two different decoding criteria for the source coding problem.

1. Exact: the reconstructed sequence ŝ[1:N] = s[1:N].
2. Lossy: the reconstructed sequence ŝ[1:N] ≠ s[1:N] in general, but is within a prescribed distortion.

SLIDE 5

Let us begin with some simple back-of-the-envelope analysis of the system under the exact recovery criterion to get some intuition. For fixed N, if the decoder would like to reconstruct s[1:N] exactly for all possible s[1:N] ∈ S^N, then it is simple to see that the smallest K must satisfy

2^(K−1) < |S|^N ≤ 2^K ⟹ K = ⌈N log|S|⌉. Why?

Because every possible sequence has to be uniquely represented by K bits! It seems impossible to have data compression if we require exact reconstruction.

What is going wrong?
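The counting bound above is easy to evaluate numerically. A minimal sketch (the alphabet size and block length below are arbitrary choices for illustration, not values from the lecture):

```python
import math

def min_bits_exact(alphabet_size: int, N: int) -> int:
    """Smallest K with 2^K >= |S|^N, i.e. K = ceil(N * log2|S|)."""
    return math.ceil(N * math.log2(alphabet_size))

# A ternary source of length 100 needs 159 bits, about 1.59 bits/symbol:
# no rate below log2(3) is possible under exact recovery of all sequences.
print(min_bits_exact(3, 100))  # 159
```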

SLIDE 6

Random Source

Recall: data compression is possible because there is redundancy in the source sequence. One of the simplest ways to capture redundancy is to model the source as a random process.

(Another reason to use a random source model comes from engineering practice, as mentioned in Lecture 1.)

Redundancy comes from the fact that different symbols in S are drawn with different probabilities. With a random source model, there are immediately two approaches one can take to demonstrate data compression:
- Allow variable codeword lengths for symbols with different probabilities, rather than fixing the length to K.
- Allow (almost) lossless reconstruction rather than exact recovery.

SLIDE 7

Block-to-Variable Source Coding

(Block diagram as before, with the codeword b[1:K] now of variable length.)

The key difference here is that we allow K to depend on the realization of the source, s[1:N]. Using variable codeword lengths is intuitive: for symbols with higher probability, we tend to use shorter codewords. The definition of the code rate is modified to R ≜ E[K]/N.

An optimal block-to-variable source code, the Huffman code, achieves the minimum compression rate for a given distribution of the random source. (See Chapter 5 of Cover & Thomas.) Note: the decoding criterion here is exact reconstruction (zero error).
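Huffman's construction is not developed in these slides (see Cover & Thomas, Ch. 5), but a compact sketch conveys the idea: repeatedly merge the two least probable subtrees. This is the standard textbook algorithm; the pmf below is an arbitrary illustrative choice.

```python
import heapq
from itertools import count

def huffman_code(pmf):
    """Build a binary prefix code for a pmf {symbol: probability}."""
    tiebreak = count()  # keeps heap comparisons away from the dict payloads
    heap = [(p, next(tiebreak), {sym: ""}) for sym, p in pmf.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        p0, _, c0 = heapq.heappop(heap)  # two least probable subtrees
        p1, _, c1 = heapq.heappop(heap)
        merged = {s: "0" + w for s, w in c0.items()}
        merged.update({s: "1" + w for s, w in c1.items()})
        heapq.heappush(heap, (p0 + p1, next(tiebreak), merged))
    return heap[0][2]

pmf = {"a": 0.5, "b": 0.25, "c": 0.125, "d": 0.125}
code = huffman_code(pmf)
avg_len = sum(pmf[s] * len(code[s]) for s in pmf)
print(code, avg_len)  # expected length 1.75 bits, which equals H(S) here
```

For this dyadic pmf the expected codeword length meets the entropy exactly; in general Huffman coding achieves an expected length within one bit of H(S).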

SLIDE 8

(Almost) Lossless Decoding Criterion

Another way to let the randomness kick in: allow non-exact recovery. To be precise, we turn our focus to finding the smallest possible R = K/N given that the error probability

P_e^(N) ≜ P{ Ŝ[1:N] ≠ S[1:N] } → 0 as N → ∞.

Key features of this approach:
- Focus on the asymptotic regime where N → ∞; instead of error-free reconstruction, the criterion is relaxed to a vanishing error probability.
- Compared to the previous approach, where the analysis is mainly combinatorial, the analysis here is mainly probabilistic.

SLIDE 9

Outline

In this lecture, we shall

1. First, focusing on memoryless sources, introduce a powerful tool called typical sequences, and use typical sequences to prove a lossless source coding theorem.
2. Second, extend the typical-sequence framework to sources with memory, and prove a similar lossless source coding theorem there.

We will show that the minimum compression rate is equal to the entropy of the random source. Let us begin with the simplest case, where the source {S[t] | t = 1, 2, ...} consists of i.i.d. random variables S[t] ∼ P_S; this is called a discrete memoryless source (DMS).

SLIDE 10

1. Typical Sequences and a Lossless Source Coding Theorem
   - Typicality and AEP
   - Lossless Source Coding Theorem

2. Weakly Typical Sequences and Sources with Memory
   - Entropy Rate of Random Processes
   - Typicality for Sources with Memory

SLIDE 11

Typicality and AEP

SLIDE 12

Overview of Typicality Methods

Goal: Understand and exploit the probabilistic asymptotic properties of an i.i.d. generated sequence S[1:N] for coding.

Key observation: As N → ∞, one often observes that a comparatively small set of sequences become "typical" and contribute almost the whole probability, while the others become "atypical".

(cf. Lecture 2 "Operational Meaning of Entropy")

For lossless reconstruction with vanishing error probability, we can use shorter codewords to label "typical" sequences and ignore "atypical" ones.

Note: There are several notions of typicality and various definitions in the literature. In this lecture, we give two definitions: (robust) typicality and weak typicality.

Notation: For convenience, we use the following interchangeably:

x[t] ↔ x_t, and x[1:N] ↔ x^N.

SLIDE 13

Typical Sequence

A (robust) typical sequence is a sequence whose empirical distribution is close to the true distribution. For a sequence x^n, its empirical p.m.f. is given by the frequency of occurrence of each symbol in x^n:

π(a|x^n) ≜ (1/n) Σ_{i=1}^n 1{x_i = a}.

Due to the law of large numbers, if X_i ∼ P_X i.i.d., then π(a|X^n) → P_X(a) in probability for all a ∈ X as n → ∞. That is, with high probability, the empirical p.m.f. does not deviate too much from the actual p.m.f.

Definition 1 (Typical Sequence)
For ε ∈ (0, 1), a sequence x^n is called ε-typical with respect to a random variable X ∼ P_X if

|π(a|x^n) − P_X(a)| ≤ ε P_X(a), ∀ a ∈ X.

The ε-typical set is T_ε^(n)(X) ≜ {x^n ∈ X^n | x^n is ε-typical with respect to X}.
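Definition 1 translates directly into a membership test. A small sketch (the pmf and sequences are illustrative choices):

```python
from collections import Counter

def is_robust_typical(x, pmf, eps):
    """Check |pi(a|x^n) - P_X(a)| <= eps * P_X(a) for every symbol a."""
    n = len(x)
    freq = Counter(x)
    return all(abs(freq.get(a, 0) / n - p) <= eps * p for a, p in pmf.items())

pmf = {0: 0.5, 1: 0.5}
print(is_robust_typical([0, 1, 1, 0, 1, 0, 0, 1, 0, 1], pmf, eps=0.2))  # True: five 0s
print(is_robust_typical([0, 0, 0, 0, 0, 0, 0, 1, 1, 1], pmf, eps=0.2))  # False: seven 0s
```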

SLIDE 14

Note: In the following, when the context is clear, we write "T_ε^(n)" instead of "T_ε^(n)(X)".

Example 1
Consider a random bit sequence generated i.i.d. according to Ber(1/2). Let us set ε = 0.2 and n = 10. What is T_ε^(n)? How large is the typical set?

sol: Based on the definition, an n-sequence x^n is ε-typical iff

π(0|x^n) ∈ [0.4, 0.6] and π(1|x^n) ∈ [0.4, 0.6].

In other words, the number of "0"s in the sequence should be 4, 5, or 6. Hence, T_ε^(n) consists of all length-10 sequences with 4, 5, or 6 "0"s. The size of T_ε^(n) is

(10 choose 4) + (10 choose 5) + (10 choose 6) = 672.
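A one-line computation confirms the count in Example 1:

```python
from math import comb

# Length-10 binary sequences with 4, 5, or 6 zeros.
print(sum(comb(10, k) for k in (4, 5, 6)))  # 672, out of 2^10 = 1024 sequences
```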

SLIDE 15

Properties of Typical Sequences

Let P(x^n) ≜ P{X^n = x^n} = Π_{i=1}^n P_X(x_i), that is, the probability that the DMS generates the sequence x^n. Similarly, P(A) ≜ P{X^n ∈ A} denotes the probability of a set A.

Proposition 1 (Properties of Typical Sequences and the Typical Set)

1. ∀ x^n ∈ T_ε^(n)(X): 2^(−n(H(X)+δ(ε))) ≤ P(x^n) ≤ 2^(−n(H(X)−δ(ε))), where δ(ε) = εH(X).
   (by the definition of typical sequences and entropy)
2. lim_{n→∞} P(T_ε^(n)(X)) = 1; in particular, P(T_ε^(n)(X)) ≥ 1 − ε for n large enough.
   (by the law of large numbers (LLN))
3. |T_ε^(n)(X)| ≤ 2^(n(H(X)+δ(ε))).
   (by summing the lower bound in Property 1 over the typical set)
4. |T_ε^(n)(X)| ≥ (1 − ε) 2^(n(H(X)−δ(ε))) for n large enough.
   (by the upper bound in Property 1, and Property 2)
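Properties 1 and 2 can be probed by simulation. The sketch below (source pmf, ε, n, and trial count are all arbitrary choices) checks the probability bounds on every typical sample and estimates P(T_ε^(n)):

```python
import math
import random
from collections import Counter

pmf = {0: 0.7, 1: 0.3}
eps, n, trials = 0.1, 500, 2000
H = -sum(p * math.log2(p) for p in pmf.values())
delta = eps * H

def is_typical(x):
    c = Counter(x)
    return all(abs(c.get(a, 0) / n - p) <= eps * p for a, p in pmf.items())

hits = 0
for _ in range(trials):
    x = random.choices(list(pmf), weights=list(pmf.values()), k=n)
    if is_typical(x):
        hits += 1
        # Property 1: every typical sequence obeys the probability bounds.
        logp = sum(math.log2(pmf[a]) for a in x)
        assert -n * (H + delta) <= logp <= -n * (H - delta)

print(f"estimated P(T_eps^(n)) = {hits / trials:.3f}")  # approaches 1 as n grows
```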

SLIDE 16

Asymptotic Equipartition Property (AEP)

(Figure: the typical set T_ε^(n)(X) drawn inside X^n, with P(T_ε^(n)(X)) → 1, |T_ε^(n)(X)| ≈ 2^(nH(X)), and P(x^n) ≈ 2^(−nH(X)) for each typical x^n.)

Observations:

1. The typical set has probability approaching 1 as n → ∞, while its size is roughly 2^(nH(X)), significantly smaller than |X^n| = 2^(n log|X|).
2. Within the typical set, all typical sequences have roughly the same probability 2^(−nH(X)).

SLIDE 17

Application to Data Compression

(Same figure as on the previous slide.)

In other words, as n → ∞, with probability approaching 1 the realization of the DMS is a typical sequence; typical sequences are roughly uniformly distributed over the typical set; and there are roughly 2^(nH(X)) of them.

Hence, we can use roughly nH(X) bits to uniquely describe each typical sequence, and ignore the atypical ones. Since the probability of getting an atypical sequence vanishes as n → ∞, so does the error probability.

SLIDE 18

Lossless Source Coding Theorem

SLIDE 19

Lossless Source Coding: Problem Setup (Formally)


1. A (2^(NR), N) lossless source code consists of
   - an encoding function (encoder) enc_N : S^N → {0,1}^K that maps each source sequence s^N to a bit sequence b^K, where K ≜ ⌊NR⌋;
   - a decoding function (decoder) dec_N : {0,1}^K → S^N that maps each bit sequence b^K to a reconstructed source sequence ŝ^N.

2. The error probability of a (2^(NR), N) code is defined as P_e^(N) ≜ P{Ŝ^N ≠ S^N}.

3. A rate R is said to be achievable if there exists a sequence of (2^(NR), N) codes such that P_e^(N) → 0 as N → ∞. The optimal compression rate is R* ≜ inf{R | R is achievable}.

SLIDE 20

A Lossless Source Coding Theorem


Theorem 1 (A Lossless Source Coding Theorem for the DMS)
For a DMS S, R* = H(S).

Remark: In information theory, to establish a coding theorem, one needs to prove two directions:

- Direct part (achievability): show that ∀ R > R* = H(S), there exists a sequence of (2^(NR), N) codes such that P_e^(N) → 0 as N → ∞.
- Converse part (converse): show that for every sequence of (2^(NR), N) codes such that P_e^(N) → 0 as N → ∞, the rate R ≥ R* = H(S).

SLIDE 21

Lossless Source Coding Theorem: Achievability Proof (1)


pf: Here we provide a simple proof based on typical sequences (typicality).

Codebook generation: Choose an ε > 0 and set R = H(S) + δ(ε) such that NR ∈ Z, where δ(ε) = εH(S). By Property 3 of Proposition 1, we have an upper bound on the number of typical sequences: |T_ε^(N)(S)| ≤ 2^(N(H(S)+δ(ε))) = 2^(NR).

Encoding: Hence, we can label each typical sequence with a length-NR bit sequence, which defines the encoding function for all s^N ∈ T_ε^(N). For s^N ∉ T_ε^(N), the encoding function maps it to the all-zero sequence.

Decoding: The decoding function simply maps the received bit sequence back to the corresponding typical sequence.
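For tiny block lengths the construction can be carried out literally by enumerating the typical set. No real system would do this (the set is exponentially large), but it makes the proof concrete; all parameters below are illustrative choices:

```python
import math
from collections import Counter
from itertools import product

pmf = {0: 0.8, 1: 0.2}
N, eps = 12, 0.3
H = -sum(p * math.log2(p) for p in pmf.values())

def is_typical(s):
    c = Counter(s)
    return all(abs(c.get(a, 0) / N - p) <= eps * p for a, p in pmf.items())

# Codebook: index the typical set; index 0 doubles as the fallback
# label for atypical sequences, mirroring the all-zero codeword above.
typical = [s for s in product(pmf, repeat=N) if is_typical(s)]
index = {s: i for i, s in enumerate(typical)}
K = math.ceil(math.log2(len(typical)))

def enc(s):                 # encoder: typical sequence -> index fitting in K bits
    return index.get(tuple(s), 0)

def dec(w):                 # decoder: index -> typical sequence
    return list(typical[w])

s = [0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 1]   # a typical realization (three 1s)
assert dec(enc(s)) == s                     # lossless on the typical set
print(f"K = {K} bits, rate K/N = {K/N:.2f}, H(S) = {H:.2f}")
```

For N = 12 the rate overhead over H(S) is still visible; the point of the theorem is that it vanishes as N grows.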

SLIDE 22

Lossless Source Coding Theorem: Achievability Proof (2)


Error probability analysis: Obviously an error occurs iff the generated sequence s^N ∉ T_ε^(N), since every typical sequence can be reconstructed uniquely and perfectly. Hence,

P_e^(N) ≤ P̄_e^(N) ≜ P{S^N ∉ T_ε^(N)} = 1 − P(T_ε^(N)) → 1 − 1 = 0 as N → ∞,

due to Property 2 of Proposition 1. Finally, since δ(ε) can be made arbitrarily small, we have shown that ∀ R > R* = H(S), there exists a sequence of (2^(NR), N) codes such that P_e^(N) → 0 as N → ∞.

SLIDE 23

Reflections

In proving achievability of coding theorems, we often upper bound P_e^(N) for some coding scheme; by showing that the upper bound → 0 as N → ∞, we show that the rate is achievable.

For the achievability of the lossless source coding problem, we found an upper bound on the error probability. Note that the typicality encoder need not be optimal. Hence, the optimal probability of error P_e^(N) ≤ P̄_e^(N) = P{S^N ∉ T_ε^(N)} = 1 − P(T_ε^(N)), which tends to 0 as N → ∞ due to the LLN.

Next, to prove the converse, we need to lower bound P_e^(N) over all possible coding schemes; since we require P_e^(N) → 0, forcing this lower bound → 0 as N → ∞ shows that any achievable rate has to satisfy a certain necessary condition.

In the following, we introduce an important lemma due to Robert Fano (Fano's inequality). Fano's inequality is widely used in converse proofs.

SLIDE 24

Fano's Inequality

Lemma 1 (Fano's Inequality)
For jointly distributed r.v.'s (U, V), let P_e ≜ P{U ≠ V}. Then H(U|V) ≤ H_b(P_e) + P_e log|U|.

pf: Define E ≜ 1{U ≠ V}, the indicator of the event {U ≠ V}, so E ∼ Ber(P_e). Using the chain rule and the non-negativity of conditional entropy, we have

H(U|V) ≤ H(U, E|V) = H(E|V) + H(U|V, E).

Note that H(E|V) ≤ H(E) = H_b(P_e), and

H(U|V, E) = P{E = 1} H(U|V, E = 1) + P{E = 0} H(U|V, E = 0),

where P{E = 1} = P_e, H(U|V, E = 1) ≤ log|U|, and H(U|V, E = 0) = 0 since U = V when E = 0.

Hence, H(U|V) ≤ H_b(P_e) + P_e log|U|.
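A quick numeric check of Lemma 1 on a small joint pmf (the distribution below is made up for illustration):

```python
import math

def Hb(p):  # binary entropy
    return 0.0 if p in (0, 1) else -p * math.log2(p) - (1 - p) * math.log2(1 - p)

# Arbitrary joint pmf P(u, v) on U = V = {0, 1}.
P = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.2, (1, 1): 0.3}
Pv = {v: sum(p for (u, w), p in P.items() if w == v) for v in (0, 1)}

H_UgV = sum(p * math.log2(Pv[v] / p) for (u, v), p in P.items())  # H(U|V)
Pe = sum(p for (u, v), p in P.items() if u != v)                  # P{U != V}
bound = Hb(Pe) + Pe * math.log2(2)                                # Hb(Pe) + Pe*log|U|

print(f"H(U|V) = {H_UgV:.3f} <= {bound:.3f}")  # about 0.875 <= 1.181
assert H_UgV <= bound
```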

SLIDE 25

Corollary 1 (Lower Bound on Error Probability)

P_e ≥ (H(U|V) − 1) / log|U|.

pf: From Lemma 1 and H_b(P_e) ≤ 1, we have

H(U|V) ≤ H_b(P_e) + P_e log|U| ≤ 1 + P_e log|U|.

Exercise 1
Show that Lemma 1 can be sharpened as follows:

H(U|V) ≤ H_b(P_e) + P_e log(|U| − 1),

if U and V both take values in U.

SLIDE 26

Lossless Source Coding Theorem: Converse Proof

Recall: we would like to show that for every sequence of (2^(NR), N) codes such that P_e^(N) → 0 as N → ∞, the rate R ≥ R* = H(S).

pf: Note that B^K is a r.v. because it is generated from another r.v., S^N.

K = NR ≥ H(B^K) ≥ I(B^K; S^N)                      (1)
       ≥ I(Ŝ^N; S^N) = H(S^N) − H(S^N | Ŝ^N)       (2)
       ≥ H(S^N) − (1 + P_e^(N) log|S^N|)            (3)

(1) is due to the upper bounds on entropy and mutual information.
(2) is due to the Markov chain S^N − B^K − Ŝ^N and the data processing inequality.
(3) is due to Fano's inequality.
SLIDE 27

We have the following inequality for every block length N and every source code:

NR ≥ H(S^N) − (1 + P_e^(N) log|S^N|).

Since the source is a DMS, we have H(S^N) = N·H(S). Dividing both sides of the above inequality by N, we get

R ≥ H(S) − 1/N − P_e^(N) log|S|.

If the rate R is achievable, then by definition, P_e^(N) → 0 as N → ∞. Taking N → ∞, we conclude that R ≥ H(S) if R is achievable.

Exercise 2
Prove the lossless source coding theorem for the DMS by using Theorem 1 in Lecture 2.

SLIDE 28

Weakly Typical Sequences and Sources with Memory

SLIDE 29

Entropy Rate of Random Processes

SLIDE 30

Beyond Memoryless Sources

Recap: So far we have established a coding theorem for (block-to-block) lossless source coding with discrete memoryless sources (DMS):
- Achievability: we used typicality to construct a simple code achieving any rate R > H(S).
- Converse: we used Fano's inequality and the data processing inequality to show that every achievable rate must satisfy R ≥ H(S).

Question: What if the source is not memoryless? (In other words, we cannot use a single p.m.f. P_S to describe the random process.)
- Affected: entropy, AEP.
- Unaffected: Fano and data processing in the converse proof.

For sources with memory, we develop the following to establish a lossless source coding theorem:

1. A measure of uncertainty for random processes
2. A general AEP for random processes with memory

SLIDE 31

Discrete Stationary Source

A (discrete-time) random process {X_i | i = 1, 2, ...} consists of an infinite sequence of r.v.'s. Such a random process is characterized by all of its joint p.m.f.'s P_{X_1, X_2, ..., X_n}, ∀ n = 1, 2, ....

Definition 2 (Stationary Random Process)
A random process {X_i} is stationary if for every shift l ∈ N,

P_{X_1, X_2, ..., X_n} = P_{X_{1+l}, X_{2+l}, ..., X_{n+l}}, ∀ n ∈ N.

When extending the source coding theorem, instead of discrete memoryless sources (DMS), we focus on discrete stationary sources (DSS), where the source process {S_i | i ∈ N} is stationary (but not necessarily memoryless).

SLIDE 32

Entropy Rate

For a discrete random process {X_i}, how do we measure its uncertainty?

- There are infinitely many r.v.'s in a process {X_i}, so it is meaningless to use H(X_1, X_2, ...) (it is likely to be ∞).
- Instead, we should measure the amount of uncertainty per symbol, or measure the marginal amount of uncertainty of the current symbol conditioned on the past.

We give the following intuitive definitions.

Definition 3 (Entropy Rate)
The entropy rate of a random process {X_i} is defined by

H({X_i}) ≜ lim_{n→∞} (1/n) H(X_1, X_2, ..., X_n),

if the limit exists. Alternatively, we can also define it by the following, if the limit exists:

H̄({X_i}) ≜ lim_{n→∞} H(X_n | X^(n−1)).

SLIDE 33

Example 2 (Entropy Rate of an i.i.d. Process)
Consider a random process {X_i} where X_1, X_2, ... are i.i.d. according to P_X. Does the entropy rate exist? If so, compute it.

sol: Since the r.v.'s are i.i.d., H(X_1, ..., X_n) = nH(X_1) and H(X_n | X^(n−1)) = H(X_n) = H(X_1). Hence,

H({X_i}) = H̄({X_i}) = H(X_1) = E_{P_X}[− log P_X(X)].

Exercise 3 (H and H̄ May Be Different)
Consider a random process {X_i} where X_1, X_3, ... are i.i.d. and X_{2k} = X_{2k−1} for all k ∈ N. Show that H({X_i}) exists, but H̄({X_i}) does not.

SLIDE 34

The Two Notions of Entropy Rate

In Definition 3, we have defined two notions of entropy rate: H and H̄. In Exercise 3, we see that the two notions are not equivalent in general.

Note: Let a_n ≜ (1/n) H(X_1, ..., X_n) and b_n ≜ H(X_n | X^(n−1)). Then:

- a_n = (1/n) Σ_{k=1}^n b_k, due to the chain rule.
- H({X_i}) = lim_{n→∞} a_n and H̄({X_i}) = lim_{n→∞} b_n.

The following lemma from calculus sheds some light on the relationship between these two notions.

Lemma 2 (Cesàro Mean)
lim_{n→∞} b_n = c ⟹ lim_{n→∞} a_n = c, where a_n ≜ (1/n) Σ_{k=1}^n b_k. The reverse direction is not true in general.

As a corollary, if H̄ exists, then so does H, and H = H̄.

SLIDE 35

Entropy Rate of Stationary Process

Lemma 3
For a stationary random process {X_i}, H(X_n | X^(n−1)) is non-increasing in n.

pf: Due to the fact that conditioning reduces entropy, we have

H(X_{n+1} | X^n) = H(X_{n+1} | X[2:n], X_1) ≤ H(X_{n+1} | X[2:n]).

Since {X_i} is stationary, H(X_{n+1} | X[2:n]) = H(X_n | X^(n−1)).

Exercise 4
Show that for a stationary {X_i}, (1/n) H(X_1, ..., X_n) is non-increasing in n, and H(X_n | X^(n−1)) ≤ (1/n) H(X_1, ..., X_n).

SLIDE 36

Theorem 2
For a stationary random process {X_i}, H({X_i}) = H̄({X_i}).

pf: Since b_n ≜ H(X_n | X^(n−1)) is non-increasing in n and bounded from below by 0, we conclude that b_n converges as n → ∞. Since (1/n) H(X_1, ..., X_n) = (1/n) Σ_{k=1}^n b_k, Lemma 2 completes the proof.

SLIDE 37

Markov Process and its Entropy Rate

A Markov process is one of the simplest random processes with memory.

Definition 4 (Markov Process)
A random process {X_i} is Markov if ∀ n ∈ N,

P_{X_n | X_{n−1}, X_{n−2}, ..., X_1} = P_{X_n | X_{n−1}}.

For a stationary Markov {X_i}, the entropy rate is simple to compute:

H({X_i}) = lim_{n→∞} H(X_n | X^(n−1))
         = lim_{n→∞} H(X_n | X_{n−1})    (Markovity)
         = H(X_2 | X_1).                 (stationarity)

SLIDE 38

Computation of Entropy Rate of Markov Process

Example 3 (Two-State Markov Process)
Consider a stationary two-state Markov process {X_i | i ∈ N} taking values in {0, 1} with probability transition matrix

P_{X_2|X_1} = [ 1 − α    α   ]
              [   β    1 − β ],

where α, β ∈ (0, 1). Find the marginal p.m.f. P_{X_n}(x) for all n ∈ N and the entropy rate H({X_i}).

(State diagram: from state 0, stay with probability 1 − α or move to state 1 with probability α; from state 1, stay with probability 1 − β or move to state 0 with probability β.)
SLIDE 39

sol: The stationary distribution [π(0) π(1)] of a Markov chain can be computed by solving the following linear equation: for all n ∈ N,

[π(0) π(1)] = [π(0) π(1)] [ 1 − α    α   ]
                          [   β    1 − β ]

⟹ [π(0) π(1)] = [ β/(α+β)  α/(α+β) ] = [ P_{X_n}(0)  P_{X_n}(1) ].

To compute H({X_i}), since it is equal to H(X_2 | X_1), we can easily compute it as follows:

H({X_i}) = H(X_2 | X_1) = π(0) H(X_2 | X_1 = 0) + π(1) H(X_2 | X_1 = 1)
         = (β/(α+β)) H_b(α) + (α/(α+β)) H_b(β).
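A short numeric check of Example 3 for one choice of α and β (the values are arbitrary):

```python
import math

def Hb(p):  # binary entropy in bits
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

alpha, beta = 0.2, 0.5
pi0, pi1 = beta / (alpha + beta), alpha / (alpha + beta)  # stationary distribution
rate = pi0 * Hb(alpha) + pi1 * Hb(beta)                   # H(X2 | X1)

# Sanity check: [pi0 pi1] is invariant under the transition matrix.
assert abs(pi0 * (1 - alpha) + pi1 * beta - pi0) < 1e-12
print(f"pi = ({pi0:.3f}, {pi1:.3f}), entropy rate = {rate:.3f} bits/symbol")
```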

SLIDE 40

Typicality for Sources with Memory

SLIDE 41

Typicality for Sources with Memory

How do we extend typicality to sources with memory?

Observations:
- A random process with memory is characterized by joint distributions of arbitrary length, because the depth of the memory is arbitrary.
- The empirical distribution π(a|x^n) is insufficient to determine whether the sequence x^n is typical, because it only describes the marginal p.m.f. P_X, not joint distributions of other lengths.

We should ask: what properties are critical for us to extend typicality to sources with memory? Recall: the most critical feature of typical sequences is the Asymptotic Equipartition Property (AEP).

SLIDE 42

(Same typical-set figure as before.)

Suppose now we want to give another definition of typicality and another kind of typical set A_ε^(n). The key properties we would like to keep are the two below. Why? Because the other two properties, regarding the size of A_ε^(n), can be derived from these two:

1. ∀ x^n ∈ A_ε^(n): 2^(−n(H(X)+δ(ε))) ≤ P(x^n) ≤ 2^(−n(H(X)−δ(ε))).   (definition)
2. lim_{n→∞} P(A_ε^(n)) = 1.   (by the LLN)
SLIDE 43

Weakly Typical Sequences (for i.i.d. Random Process)

Idea: why not directly define typical sequences as those that satisfy

2^(−n(H(X)+δ(ε))) ≤ P(x^n) ≤ 2^(−n(H(X)−δ(ε)))?

Again, let P(x^n) ≜ P{X^n = x^n} = Π_{i=1}^n P_X(x_i), that is, the probability that the DMS generates the sequence x^n.

Definition 5 (Weakly Typical Sequences for an i.i.d. Random Process)
For ε ∈ (0, 1), a sequence x^n is called weakly ε-typical with respect to a random variable X ∼ P_X if

| −(1/n) log P(x^n) − H(X) | ≤ ε.

The weakly ε-typical set is A_ε^(n)(X) ≜ {x^n ∈ X^n | x^n is weakly ε-typical with respect to X}.
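Definition 5 is again a one-line membership test. A sketch mirroring the earlier robust-typicality checker (pmf and sequences are illustrative):

```python
import math

def is_weakly_typical(x, pmf, eps):
    """Check |-(1/n) log2 P(x^n) - H(X)| <= eps for an i.i.d. source."""
    H = -sum(p * math.log2(p) for p in pmf.values())
    empirical = -sum(math.log2(pmf[a]) for a in x) / len(x)
    return abs(empirical - H) <= eps

pmf = {0: 0.7, 1: 0.3}
print(is_weakly_typical([0, 0, 1, 0, 0, 1, 0, 0, 1, 0], pmf, eps=0.1))  # True
print(is_weakly_typical([1] * 10, pmf, eps=0.1))                        # False
```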

SLIDE 44

Typical vs. Weakly Typical Sequences

Comparisons in definition:

- Typical sequence: the empirical p.m.f. π(·|x^n) = (1/n) Σ_{i=1}^n 1{x_i = ·} is compared against P_X(·).
- Weakly typical sequence: the quantity −(1/n) log P(x^n) = (1/n) Σ_{i=1}^n log(1/P_X(x_i)) is compared against H(X).

Exercise 5
Show that T_ε^(n) ⊆ A_δ^(n) with δ = εH(X), whereas in general, for some ε > 0, there is no δ′ > 0 such that A_{δ′}^(n) ⊆ T_ε^(n).

SLIDE 45

AEP with i.i.d. Source and Weakly Typical Sequences

Proposition 2

1. ∀ x^n ∈ A_ε^(n)(X): 2^(−n(H(X)+ε)) ≤ P(x^n) ≤ 2^(−n(H(X)−ε)).   (by definition)
2. lim_{n→∞} P(A_ε^(n)(X)) = 1.   (by the LLN)
3. |A_ε^(n)(X)| ≤ 2^(n(H(X)+ε)).   (by 1 and 2)
4. |A_ε^(n)(X)| ≥ (1 − ε) 2^(n(H(X)−ε)) for n large enough.   (by 1 and 2)

Note: We only need to check that Property 2 holds, since Property 1 is automatic by definition, and Properties 3 and 4 follow from 1 and 2. Property 2 holds due to the LLN: as n → ∞,

−(1/n) log P(X^n) = (1/n) Σ_{i=1}^n log(1/P_X(X_i)) → E[log(1/P_X(X))] = H(X) in probability.
SLIDE 46

Stationary Ergodic Random Processes

Let us turn to one of our original goals: finding an AEP and the definition of weakly typical sequences for random processes with memory. In other words, we would like to establish the following LLN-like key property for random processes with memory: as n → ∞,

−(1/n) log P(X^n) → H({X_i}) in probability.

It turns out that this is true for stationary ergodic processes. Roughly speaking, a stationary process {X_i} is ergodic if the time average (empirical average) converges to the ensemble average with probability 1. More specifically, ∀ k_1, k_2, ..., k_m ∈ N and all measurable f,

P{ lim_{n→∞} (1/n) Σ_{l=0}^{n−1} f(X_{k_1+l}, ..., X_{k_m+l}) = E[f(X_{k_1}, ..., X_{k_m})] } = 1.

SLIDE 47

AEP for Stationary Ergodic Processes

Theorem 3 (Shannon-McMillan-Breiman)
If H({X_i}) is the entropy rate of a stationary ergodic process {X_i}, then

P{ lim_{n→∞} −(1/n) log P(X^n) = H({X_i}) } = 1,

which implies that −(1/n) log P(X^n) → H({X_i}) in probability as n → ∞.

With the above theorem, we can redefine weakly typical sequences as we did in the i.i.d. case, with the substitution H(X) → H({X_i}), and derive the corresponding properties. As discussed before, the four key properties in Propositions 1 and 2 remain the same.
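For a stationary ergodic Markov chain, Theorem 3 can be observed directly: simulate a long path, evaluate its normalized log-probability, and compare with the entropy rate from Example 3. A sketch (the chain parameters are illustrative):

```python
import math
import random

alpha, beta = 0.2, 0.5                      # P(0 -> 1) = alpha, P(1 -> 0) = beta
pi0 = beta / (alpha + beta)                 # stationary probability of state 0
T = {0: {0: 1 - alpha, 1: alpha}, 1: {0: beta, 1: 1 - beta}}

def Hb(p):
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

rate = pi0 * Hb(alpha) + (1 - pi0) * Hb(beta)   # entropy rate H({X_i})

n = 100_000
state = 0 if random.random() < pi0 else 1       # start from stationarity
log_p = math.log2(pi0 if state == 0 else 1 - pi0)
for _ in range(n - 1):
    nxt = 0 if random.random() < T[state][0] else 1
    log_p += math.log2(T[state][nxt])
    state = nxt

print(f"-(1/n) log P(X^n) = {-log_p / n:.4f}, entropy rate = {rate:.4f}")
```

The two printed numbers agree to a few decimal places for large n, illustrating the almost-sure convergence the theorem asserts.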

SLIDE 48

Lossless Source Coding Theorem for Ergodic DSS


Theorem 4 (A Lossless Source Coding Theorem for Ergodic DSS)
For a discrete stationary ergodic source {S_i}, R* = H({S_i}).

Achievability can be proved as in the DMS case, based on weakly typical sequences; the proof is identical. The converse is also the same as in the DMS case, except that we make use of the following fact from Exercise 4:

(1/N) H(S^N) ↓ H({S_i}) as N → ∞.

SLIDE 49

Summary

SLIDE 50

Lossless source coding theorem: R* = H, where
1. H = H(S) for a DMS {S_i};
2. H = H({S_i}) for an ergodic DSS {S_i}.

Typical sequences T_ε^(n) vs. weakly typical sequences A_ε^(n).

Asymptotic Equipartition Property (AEP), for both typical sequences with a DMS and weakly typical sequences with an ergodic DSS:

1. ∀ x^n ∈ T_ε^(n)(X): 2^(−n(H(X)+δ(ε))) ≤ P(x^n) ≤ 2^(−n(H(X)−δ(ε)));
   ∀ x^n ∈ A_ε^(n)({X_i}): 2^(−n(H({X_i})+ε)) ≤ P(x^n) ≤ 2^(−n(H({X_i})−ε)).
2. lim_{n→∞} P(T_ε^(n)(X)) = 1; lim_{n→∞} P(A_ε^(n)({X_i})) = 1.
3. |T_ε^(n)(X)| ≤ 2^(n(H(X)+δ(ε))); |A_ε^(n)({X_i})| ≤ 2^(n(H({X_i})+ε)).
4. |T_ε^(n)(X)| ≥ (1 − ε) 2^(n(H(X)−δ(ε))) for n large enough; |A_ε^(n)({X_i})| ≥ (1 − ε) 2^(n(H({X_i})−ε)) for n large enough.

Fano's inequality.
