Lecture 3: Source Coding



  1. Lecture 3: Source Coding. I-Hsiang Wang, Department of Electrical Engineering, National Taiwan University, ihwang@ntu.edu.tw. October 8, 2014. Outline: Typical Sequences and a Lossless Source Coding Theorem; Weakly Typical Sequences and Sources with Memory; Summary.

  2. The Source Coding Problem (Meta Description). Setup: Source → Source Encoder → Source Decoder → Destination. 1 Encoder: represent the source sequence s[1:N] by a binary source codeword w := b[1:K] ∈ [0 : 2^K − 1]. 2 Decoder: from the source codeword w, reconstruct the source sequence, either losslessly or within a certain distortion. 3 Efficiency: determined by the code rate R := K/N bits/symbol time, with K as small as possible.

  3. Decoding Criteria. Naturally, one would think of two different decoding criteria for the source coding problem: 1 Exact: the reconstructed sequence ŝ[1:N] = s[1:N]. 2 Lossy: the reconstructed sequence ŝ[1:N] ≠ s[1:N], but it is within a prescribed distortion.

  4. Why? Let us begin with some simple analysis of the system under the exact recovery criterion. For fixed N, if the decoder would like to reconstruct s[1:N] exactly for all possible s[1:N] ∈ S^N, then it is simple to see that the smallest K must satisfy 2^(K−1) < |S|^N ≤ 2^K, i.e., K = ⌈N log|S|⌉. Why? Because every possible sequence has to be uniquely represented by K bits! As a consequence, it seems that if we require exact reconstruction, it is impossible to have data compression. What is going wrong?
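As a quick sanity check of this count, here is a minimal Python sketch (not from the slides; the example alphabet sizes are assumed) computing K = ⌈N log₂|S|⌉:

```python
import math

# Smallest K with |S|^N <= 2^K: every length-N sequence over an
# alphabet of size |S| must get its own K-bit index.
def exact_code_length(alphabet_size: int, N: int) -> int:
    return math.ceil(N * math.log2(alphabet_size))

print(exact_code_length(2, 100))   # 100 bits: binary source, no compression
print(exact_code_length(26, 100))  # 471 bits = ceil(100 * log2(26))
```

The resulting rate R = K/N ≈ log₂|S| bits per symbol regardless of N, which is exactly the point: without exploiting the source statistics, exact recovery buys no compression.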

  5. Random Source. Recall: data compression is possible because there is redundancy in the source sequence. One of the simplest ways to capture redundancy is to model the source as a random process. (Another reason to use a random source model is engineering practice, as mentioned in Lecture 1.) Redundancy comes from the fact that different symbols in S are drawn with different probabilities. With a random source model, there are immediately two approaches one can take to demonstrate data compression: 1 allow variable codeword lengths for symbols with different probabilities, rather than fixing the length to K; 2 allow (almost) lossless reconstruction rather than exact recovery.

  6. Block-to-Variable Source Coding. The key difference here is that we allow K to depend on the realization of the source, s[1:N]. The definition of the code rate is accordingly modified to R := E[K]/N. Using variable codeword lengths is intuitive: symbols with higher probability tend to get shorter codewords. In this lecture we will introduce an optimal block-to-variable source code, the Huffman code, which achieves the minimum compression rate for a given distribution of the random source; a sketch of the construction appears below. Note: the decoding criterion here is exact reconstruction.
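The Huffman construction is developed in full later in the lecture; as a preview, here is a sketch in Python of the standard textbook greedy construction (the example p.m.f. is assumed, not taken from the slides):

```python
import heapq

def huffman_code(pmf):
    """Standard greedy Huffman construction: repeatedly merge the two
    least probable subtrees. A textbook sketch, not the lecture's exact
    development."""
    # Heap entries: (probability, tiebreaker, {symbol: partial codeword}).
    # The integer tiebreaker keeps heapq from ever comparing the dicts.
    heap = [(p, i, {s: ""}) for i, (s, p) in enumerate(pmf.items())]
    heapq.heapify(heap)
    tiebreak = len(heap)
    while len(heap) > 1:
        p0, _, c0 = heapq.heappop(heap)   # two least probable subtrees
        p1, _, c1 = heapq.heappop(heap)
        merged = {s: "0" + w for s, w in c0.items()}
        merged.update({s: "1" + w for s, w in c1.items()})
        heapq.heappush(heap, (p0 + p1, tiebreak, merged))
        tiebreak += 1
    return heap[0][2]

pmf = {"a": 0.5, "b": 0.25, "c": 0.125, "d": 0.125}  # assumed example source
code = huffman_code(pmf)
print(code)                                           # {'a': '0', 'b': '10', 'c': '110', 'd': '111'}
print(sum(pmf[s] * len(w) for s, w in code.items()))  # 1.75 bits/symbol
```

For this dyadic p.m.f. the expected codeword length meets the source entropy H(S) = 1.75 bits exactly; in general, Huffman coding achieves H(S) ≤ R < H(S) + 1 bits per symbol.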

  7. (Almost) Lossless Decoding Criterion. Another way to let the randomness kick in: allow non-exact recovery. Key features of this approach: focus on the asymptotic regime where N → ∞, and instead of error-free reconstruction, relax the criterion to vanishing error probability. To be precise, we turn our focus to finding the smallest possible R = K/N such that the error probability P_e^(N) := Pr{Ŝ[1:N] ≠ S[1:N]} → 0 as N → ∞. Compared to the previous approach, where the analysis is mainly combinatorial, the analysis here is based mostly on probabilistic arguments.

  8. Outline. In this lecture, we shall 1 first introduce a powerful tool called typical sequences, and use typical sequences to prove a lossless source coding theorem; 2 second, introduce block-to-variable source coding schemes, especially Huffman codes, and prove their optimality. In both cases, we will show that the minimum compression rate is equal to the entropy of the random source. We shall begin with the simplest case, where the random process {S[t] | t = 1, 2, ...} consists of i.i.d. random variables S[t] ∼ p_S; such a source is called a discrete memoryless source (DMS).

  9. Outline: 1 Typical Sequences and a Lossless Source Coding Theorem; 2 Weakly Typical Sequences and Sources with Memory; 3 Summary.

  10. Overview of Typicality Methods. Goal: understand and exploit the probabilistic asymptotic properties of an i.i.d. randomly generated sequence S[1:N] for coding. Key observation: when N → ∞, one often observes that a relatively small set of sequences becomes "typical" and contributes almost the whole probability, while the others become "atypical". For lossless reconstruction with vanishing error probability, we can use shorter codewords to label "typical" sequences and ignore "atypical" ones. Note: there are several notions of typicality and various definitions in the literature. In this lecture, we give two definitions: (robust) typicality and weak typicality. Notation: for notational convenience, we shall use x[t] and x_t, as well as x[1:N] and x^N, interchangeably.

  11. Typical Sequence. Roughly speaking, a (robust) typical sequence is a sequence whose empirical p.m.f. does not deviate too much from the actual p.m.f. For a sequence x^n, the empirical p.m.f. is given by the frequency of occurrence of each symbol in the sequence: π(a | x^n) := (1/n) Σ_{i=1}^{n} I{x_i = a}. Due to the law of large numbers, if x^n is drawn i.i.d. based on p_X, then π(a | x^n) → p_X(a) for all a ∈ X as n → ∞; with high probability, the empirical distribution is close to the actual distribution. Definition 1 (Typical Sequence): for X ∼ p_X and ε ∈ (0, 1), a sequence x^n is called ε-typical if |π(a | x^n) − p_X(a)| ≤ ε p_X(a) for all a ∈ X. The typical set, denoted T_ε^(n)(X), is the collection of all ε-typical length-n sequences.
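Definition 1 transcribes directly into code. Below is a sketch in Python (the Ber(0.3) source and the value ε = 0.1 are assumed for illustration, not from the slides):

```python
from collections import Counter

def is_eps_typical(xs, pmf, eps):
    # Definition 1: |pi(a | x^n) - p_X(a)| <= eps * p_X(a) for every a.
    n = len(xs)
    counts = Counter(xs)  # counts[a] is 0 for symbols absent from xs
    return all(abs(counts[a] / n - p) <= eps * p for a, p in pmf.items())

pmf = {0: 0.7, 1: 0.3}                              # assumed Ber(0.3) source
print(is_eps_typical([0]*7 + [1]*3, pmf, eps=0.1))  # True: empirical pmf matches exactly
print(is_eps_typical([0]*5 + [1]*5, pmf, eps=0.1))  # False: pi(1) = 0.5, far from 0.3
```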

  12. Example 1. (Note: in what follows, when the context is clear, we write T_ε^(n) instead of T_ε^(n)(X).) Consider a random bit sequence generated i.i.d. based on Ber(1/2), and let us set ε = 0.2 and n = 10. What is T_ε^(n)? How large is the typical set? Sol: by the definition, a length-n sequence x^n is ε-typical iff π(0 | x^n) ∈ [0.4, 0.6] and π(1 | x^n) ∈ [0.4, 0.6]. In other words, the number of "0"s in the sequence should be 4, 5, or 6. Hence, T_ε^(n) consists of all length-10 sequences with 4, 5, or 6 "0"s, and its size is (10 choose 4) + (10 choose 5) + (10 choose 6) = 210 + 252 + 210 = 672.
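The count is small enough to verify by brute force; a short Python check (a sketch, not part of the slides):

```python
from itertools import product
from math import comb

eps, n = 0.2, 10
# eps-typical iff |pi(0 | x^n) - 0.5| <= eps * 0.5 = 0.1, i.e. 4 to 6 zeros.
brute = sum(1 for x in product([0, 1], repeat=n)
            if abs(x.count(0) / n - 0.5) <= eps * 0.5)
print(brute)                                    # 672
print(comb(10, 4) + comb(10, 5) + comb(10, 6))  # 672, matching the closed form
```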

  13. Properties of Typical Sequences. Proposition 1 (Properties of Typical Sequences and the Typical Set). Let p(x^n) := Pr{X^n = x^n} = Π_{i=1}^{n} p_X(x_i), the probability that the DMS generates the sequence x^n; similarly, p(A) := Pr{X^n ∈ A} denotes the probability of a set A. 1 For all x^n ∈ T_ε^(n)(X), 2^(−n(H(X)+δ(ε))) ≤ p(x^n) ≤ 2^(−n(H(X)−δ(ε))), where δ(ε) = ε H(X) (by the definition of typical sequences and entropy). 2 lim_{n→∞} p(T_ε^(n)(X)) = 1; in particular, p(T_ε^(n)(X)) ≥ 1 − ε for n large enough (by the law of large numbers (LLN)). 3 |T_ε^(n)(X)| ≤ 2^(n(H(X)+δ(ε))) (by summing the lower bound in property 1 over the typical set). 4 |T_ε^(n)(X)| ≥ (1 − ε) 2^(n(H(X)−δ(ε))) for n large enough (by the upper bound in property 1, together with property 2). A numerical illustration of property 2 follows below.
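For a Bernoulli source, p(T_ε^(n)) can be computed exactly by summing the binomial p.m.f. over the typical counts. A Python sketch (the Ber(0.3) source and ε = 0.2 are assumed example parameters):

```python
from math import comb

def p_typical_set(p, eps, n):
    """Exact p(T_eps^(n)) for an i.i.d. Ber(p) source: sum the binomial
    p.m.f. over the numbers of ones k that make the sequence eps-typical."""
    total = 0.0
    for k in range(n + 1):
        ones_ok = abs(k / n - p) <= eps * p
        zeros_ok = abs((n - k) / n - (1 - p)) <= eps * (1 - p)
        if ones_ok and zeros_ok:
            total += comb(n, k) * p**k * (1 - p)**(n - k)
    return total

for n in [10, 100, 1000]:
    print(n, round(p_typical_set(0.3, 0.2, n), 4))  # climbs toward 1 as n grows
```

Note that the convergence is slow for small n; the typical set carries almost all the probability only asymptotically, which is why the coding theorem is stated in the regime n → ∞.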
