Information Theory

Lecture 2

  • Sources and entropy rate: CT4
  • Typical sequences: CT3
  • Introduction to lossless source coding: CT5.1–5

Mikael Skoglund, Information Theory 1/23

Information Sources

[Diagram: source → X_n]

  • Source data: a speech signal, an image, a fax, a computer file, . . .
  • In practice source data is time-varying and unpredictable.
  • Bandlimited continuous-time signals (e.g. speech) can be sampled into discrete time and reproduced without loss.
  • A source S is defined by a discrete-time stochastic process {X_n}.

  • If X_n ∈ X, ∀n, the set X is the source alphabet.
  • The source is
    • stationary if {X_n} is stationary.
    • ergodic if {X_n} is ergodic.
    • memoryless if X_n and X_m are independent for n ≠ m.
    • iid if {X_n} is iid (independent and identically distributed).
    • stationary and memoryless ⇒ iid
    • continuous if X is a continuous set (e.g. the real numbers).
    • discrete if X is a discrete set (e.g. the integers {0, 1, 2, . . . , 9}).
    • binary if X = {0, 1}.


  • Consider a source S, described by {X_n}. Define X_1^N ≜ (X_1, X_2, . . . , X_N).
  • The entropy rate of S is defined as

      H(S) ≜ lim_{N→∞} (1/N) H(X_1^N)

    (when the limit exists).
  • H(X) is the entropy of a single random variable X, while the entropy rate defines the “entropy per unit time” of the stochastic process S = {X_n}.

  • A stationary source S always has a well-defined entropy rate, and it furthermore holds that

      H(S) = lim_{N→∞} (1/N) H(X_1^N) = lim_{N→∞} H(X_N | X_{N−1}, X_{N−2}, . . . , X_1).

    That is, H(S) is a measure of the information gained when observing a source symbol, given knowledge of the infinite past.
  • We note that for iid sources

      H(S) = lim_{N→∞} (1/N) H(X_1^N) = lim_{N→∞} (1/N) Σ_{m=1}^{N} H(X_m) = H(X_1)

  • Examples (from CT4): Markov chain, Markov process, random walk on a weighted graph, hidden Markov models, . . .
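
As a small illustration of the Markov-chain example, here is a minimal Python sketch (the two-state transition matrix is made up for illustration) that computes the entropy rate of a stationary Markov chain via H(S) = lim_{N→∞} H(X_N | X_{N−1}, . . . , X_1) = H(X_2 | X_1):

    import numpy as np

    # Hypothetical two-state chain: P[i, j] = Pr(X_{n+1} = j | X_n = i).
    P = np.array([[0.9, 0.1],
                  [0.4, 0.6]])

    # Stationary distribution of a two-state chain (closed form solution of pi P = pi).
    pi = np.array([P[1, 0], P[0, 1]]) / (P[0, 1] + P[1, 0])

    # For a stationary Markov chain, H(S) = H(X_2 | X_1) = sum_i pi_i H(P[i, :]).
    H_S = -sum(pi[i] * P[i, j] * np.log2(P[i, j])
               for i in range(2) for j in range(2) if P[i, j] > 0)
    print("entropy rate H(S) =", H_S, "bits per symbol")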


Typical Sequences

  • A binary iid source {b_n} with p = Pr(b_n = 1)
  • Let R be the number of 1s in a sequence b_1, . . . , b_N of length N
    ⇒ p(b_1^N) = p^R (1 − p)^{N−R}
  • P(r) ≜ Pr(R/N ≤ r) for N = 10, 50, 100, 500, with p = 0.3:

    [Figure: P(r) versus r, for N = 10, 50, 100, 500]
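
The curves can be reproduced numerically; below is a small Python sketch of P(r) = Pr(R/N ≤ r) for a Bernoulli(p = 0.3) source, computed exactly from the binomial distribution (no plotting shown):

    from math import comb

    def P(r, N, p=0.3):
        """Pr(R/N <= r), where R ~ Binomial(N, p) counts the 1s in b_1^N."""
        return sum(comb(N, k) * p**k * (1 - p)**(N - k) for k in range(int(r * N) + 1))

    # As N grows, the CDF sharpens into a step near r = p.
    for N in (10, 50, 100, 500):
        print(N, [round(P(r, N), 3) for r in (0.2, 0.3, 0.4)])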

  • As N grows, the probability that a sequence will satisfy R ≈ p · N is high
    ⇒ given a b_1^N that the source produced, it is likely that

      p(b_1^N) ≈ p^{pN} (1 − p)^{(1−p)N}

    In the sense that the above holds with high probability, the source will “only produce” sequences for which

      (1/N) log p(b_1^N) ≈ p log p + (1 − p) log(1 − p) = −H

    That is, for large N it holds with high probability that

      p(b_1^N) ≈ 2^{−N·H}

    where H is the entropy (entropy rate) of the source.
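
A minimal simulation (plain Python; seed and sequence lengths chosen arbitrarily) showing this concentration: for longer and longer sequences from a Bernoulli(0.3) source, −(1/N) log2 p(b_1^N) approaches H:

    import random
    from math import log2

    p = 0.3
    H = -(p * log2(p) + (1 - p) * log2(1 - p))      # entropy of one source symbol

    random.seed(0)
    for N in (10, 100, 1_000, 10_000):
        R = sum(1 for _ in range(N) if random.random() < p)   # number of 1s
        log_p_seq = R * log2(p) + (N - R) * log2(1 - p)        # log2 p(b_1^N)
        print(N, round(-log_p_seq / N, 4), " vs  H =", round(H, 4))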


  • A general discrete source that produces iid symbols X_n, with X_n ∈ X and Pr(X_n = x) = p(x). For all x_1^N ∈ X^N we have

      log p(x_1^N) = log p(x_1, . . . , x_N) = Σ_{m=1}^{N} log p(x_m).

    For an arbitrary random sequence X_1^N we hence get

      lim_{N→∞} (1/N) log p(X_1^N) = lim_{N→∞} (1/N) Σ_{m=1}^{N} log p(X_m) = E log p(X_1)   a.s.

    by the (strong) law of large numbers. That is, for large N

      p(X_1^N) ≈ 2^{−N·H(X_1)}

    holds with high probability.

  • The result (the Shannon–McMillan–Breiman theorem) can be extended to (discrete) stationary and ergodic sources (CT16.8). For a stationary and ergodic source S it holds that

      −lim_{N→∞} (1/N) log p(X_1^N) = H(S)   a.s.

    where H(S) is the entropy rate of the source.
  • We note that p(X_1^N) is a random variable. However, the right-hand side of

      p(X_1^N) ≈ 2^{−N·H(S)}

    is a constant
    ⇒ a constraint on the sequences the source “typically” produces!
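
A sketch of the same experiment for a source with memory, assuming the hypothetical two-state Markov chain used earlier: the empirical −(1/N) log2 p(X_1^N) should approach the entropy rate H(S) of the chain:

    import numpy as np

    rng = np.random.default_rng(0)
    P = np.array([[0.9, 0.1], [0.4, 0.6]])                 # hypothetical chain
    pi = np.array([P[1, 0], P[0, 1]]) / (P[0, 1] + P[1, 0])
    H_S = -sum(pi[i] * P[i, j] * np.log2(P[i, j]) for i in range(2) for j in range(2))

    N = 100_000
    x = [int(rng.choice(2, p=pi))]                         # start in stationarity
    for _ in range(N - 1):
        x.append(int(rng.choice(2, p=P[x[-1]])))

    # log2 p(X_1^N) = log2 pi(x_1) + sum_m log2 P(x_{m-1} -> x_m)
    log_p = np.log2(pi[x[0]]) + sum(np.log2(P[x[m - 1], x[m]]) for m in range(1, N))
    print("-1/N log2 p(X_1^N) =", -log_p / N, "   H(S) =", H_S)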


The Typical Set

  • For a given stationary and ergodic source S, the typical set A_ε^{(N)} is the set of sequences x_1^N ∈ X^N for which

      2^{−N(H(S)+ε)} ≤ p(x_1^N) ≤ 2^{−N(H(S)−ε)}

    1. x_1^N ∈ A_ε^{(N)} ⇒ −(1/N) log p(x_1^N) ∈ [H(S) − ε, H(S) + ε]
    2. Pr(X_1^N ∈ A_ε^{(N)}) > 1 − ε, for N sufficiently large
    3. |A_ε^{(N)}| ≤ 2^{N(H(S)+ε)}
    4. |A_ε^{(N)}| ≥ (1 − ε) 2^{N(H(S)−ε)}, for N sufficiently large

    That is, a large N and a small ε gives

      Pr(X_1^N ∈ A_ε^{(N)}) ≈ 1,   |A_ε^{(N)}| ≈ 2^{N·H(S)},   p(x_1^N) ≈ |A_ε^{(N)}|^{−1} ≈ 2^{−N·H(S)} for x_1^N ∈ A_ε^{(N)}
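
For a small iid binary example the typical set can be enumerated exhaustively. The Python sketch below (N, ε and p chosen arbitrarily; H(S) = H(p) for an iid source) checks the definition and properties 2 and 3 directly:

    from itertools import product
    from math import log2

    p, N, eps = 0.3, 12, 0.1
    H = -(p * log2(p) + (1 - p) * log2(1 - p))          # H(S) for an iid source

    def prob(x):                                        # p(x_1^N) = p^R (1-p)^(N-R)
        R = sum(x)
        return p**R * (1 - p)**(N - R)

    # A_eps^(N): sequences with 2^{-N(H+eps)} <= p(x_1^N) <= 2^{-N(H-eps)}
    A = [x for x in product((0, 1), repeat=N)
         if 2**(-N * (H + eps)) <= prob(x) <= 2**(-N * (H - eps))]

    print("|A| =", len(A), "  <=  2^{N(H+eps)} =", round(2**(N * (H + eps))))
    # Note: property 2 requires N "sufficiently large"; for N = 12 the probability
    # of the typical set is still noticeably below 1 - eps.
    print("Pr(A) =", round(sum(prob(x) for x in A), 3))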


The Typical Set and Source Coding

1. Fix ε (small) and N (large). Partition X^N into two subsets: A = A_ε^{(N)} and B = X^N \ A.
2. Observed sequences will “typically” belong to the set A. There are M = |A| ≤ 2^{N(H(S)+ε)} elements in A.
3. Let the different i ∈ {0, . . . , M − 1} enumerate the elements of A. An index i can be stored or transmitted spending no more than ⌈N · (H(S) + ε)⌉ bits.
4. Encoding. For each observed sequence x_1^N:
   1. if x_1^N ∈ A, produce the corresponding index i.
   2. if x_1^N ∈ B, let i = 0.
5. Decoding. Map each index i back into A ⊂ X^N.
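
A toy end-to-end version of this scheme, as a self-contained Python sketch (it reuses the Bernoulli(0.3), N = 12, ε = 0.1 example from above; all names are illustrative). Atypical sequences are deliberately mapped to index 0, which is exactly where the decoding errors come from:

    from itertools import product
    from math import log2

    p, N, eps = 0.3, 12, 0.1
    H = -(p * log2(p) + (1 - p) * log2(1 - p))
    prob = lambda x: p**sum(x) * (1 - p)**(N - sum(x))
    A = [x for x in product((0, 1), repeat=N)
         if 2**(-N * (H + eps)) <= prob(x) <= 2**(-N * (H - eps))]

    index_of = {x: i for i, x in enumerate(A)}           # step 3: enumerate A

    def encode(x):                                       # step 4: typical -> index,
        return index_of.get(x, 0)                        #         atypical -> 0

    def decode(i):                                       # step 5: index -> element of A
        return A[i]

    bits_per_block = (len(A) - 1).bit_length()           # <= ceil(N (H(S) + eps))
    x = (0, 1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0)             # one typical example sequence
    print(bits_per_block, "bits;  recovered:", decode(encode(x)) == x)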


  • An error appears with probability Pr(X_1^N ∈ B) ≤ ε for large N
    ⇒ the probability of error can be made to vanish as N → ∞
  • An “almost noiseless” source code that maps x_1^N into an index i, where i can be represented using at most ⌈N · (H(S) + ε)⌉ bits. However, since also M ≥ (1 − ε) 2^{N(H(S)−ε)} for a large enough N, we need at least ⌊log(1 − ε) + N(H(S) − ε)⌋ bits.
  • Thus, for large N it is possible to design a source code with rate

      H(S) − ε + (1/N)(log(1 − ε) − 1) < R ≤ H(S) + ε + 1/N

    bits per source symbol.
    ⇒ “Operational” meaning of entropy rate: the smallest rate at which a source can be coded with arbitrarily low error probability.
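
Where the two end points of the rate interval come from (a short derivation using only ⌈a⌉ ≤ a + 1 and ⌊a⌋ > a − 1; R_max and R_min are just names introduced here for the two per-symbol rates):

    R_max = ⌈N(H(S) + ε)⌉ / N ≤ (N(H(S) + ε) + 1) / N = H(S) + ε + 1/N
    R_min = ⌊log(1 − ε) + N(H(S) − ε)⌋ / N > (log(1 − ε) + N(H(S) − ε) − 1) / N
          = H(S) − ε + (log(1 − ε) − 1)/N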


Data Compression

  • For large N it is possible to design a source code with rate

      H(S) − ε + (1/N)(log(1 − ε) − 1) < R ≤ H(S) + ε + 1/N

    bits per symbol, having a vanishing probability of error.
  • The above is an existence result; it doesn’t tell us how to design codes.
  • For a fixed finite N, the typical-sequence codes discussed are “almost noiseless” fixed-length to fixed-length codes.
  • We will now start looking at concrete “zero-error” codes, their performance and how to design them.
  • Price to pay to get zero errors: fixed-length to variable-length

Various Classifications

  • Source alphabet
    • Discrete sources
    • Continuous sources
  • Recovery requirement
    • Lossless source coding
    • Lossy source coding
  • Coding method
    • Fixed-length to fixed-length
    • Fixed-length to variable-length
    • Variable-length to fixed-length
    • Variable-length to variable-length


Zero-Error Source Coding

  • Source coding theorem for symbol codes (today)
    • Symbol codes, code extensions
    • Uniquely decodable and instantaneous (prefix) codes
    • Kraft(-McMillan) inequality
    • Bounds on the optimal codelength
    • Source coding theorem for zero-error prefix codes
  • Specific code constructions (next time)
    • Symbol codes: Huffman codes, Shannon-Fano codes
    • Stream codes: arithmetic codes, Lempel-Ziv codes


What Is a Symbol Code?

  • D-ary symbol code C for a random variable X:

      C : X → {0, 1, . . . , D − 1}*

  • A* = set of finite-length strings of symbols from a finite set A
  • C(x) codeword for x ∈ X
  • l(x) length of C(x) (i.e. number of D-ary symbols)
  • Data compression ⇒ minimize expected length

      L(C, X) = Σ_{x∈X} p(x) l(x)

  • Extension of C is C* : X* → {0, 1, . . . , D − 1}*

      C*(x_1^n) = C(x_1)C(x_2) · · · C(x_n), n = 1, 2, . . .
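
A tiny concrete example, as a Python sketch (binary, D = 2; the alphabet, probabilities, and codewords are made up for illustration): it evaluates L(C, X) and the extension C*:

    # Toy binary symbol code for X in {a, b, c, d} with a dyadic pmf.
    p = {"a": 0.5, "b": 0.25, "c": 0.125, "d": 0.125}
    C = {"a": "0", "b": "10", "c": "110", "d": "111"}

    # Expected length L(C, X) = sum_x p(x) l(x); here it equals H(X) = 1.75 bits.
    L = sum(p[x] * len(C[x]) for x in p)

    def extension(xs):
        """C*(x_1^n) = C(x_1)C(x_2)...C(x_n): concatenate the codewords."""
        return "".join(C[x] for x in xs)

    print(L, extension("abad"))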


Example: Encoding Coin Flips

[Table: the outcomes X and codeword assignments for three candidate codes C0, Cu and Ci]


Uniquely Decodable Codes

  • C is uniquely decodable if

      ∀x, y ∈ X*,  x ≠ y ⇒ C*(x) ≠ C*(y)

  • Any uniquely decodable code must satisfy the Kraft inequality

      Σ_{x∈X} D^{−l(x)} ≤ 1

    (McMillan’s result, Karush’s proof in C&T)


Instantaneous Codes

  • C is instantaneous (or prefix) if prefix-free:
    • no codeword is a prefix of any other codeword
  • Instantaneous codes are uniquely decodable
    ⇒ prefix codes satisfy the Kraft inequality
  • Given a set of codeword lengths that satisfy the Kraft inequality, there exists a prefix code with those codeword lengths.
    ⇒ there is a prefix code for every set of codeword lengths that allow a uniquely decodable code
    ⇒ no loss of generality in studying only prefix codes
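
The “lengths ⇒ prefix code” direction is constructive. Below is a minimal Python sketch of one standard construction for the binary case (assign codewords, in order of increasing length, as the l-bit binary expansion of the running Kraft sum); the function names are ad hoc:

    def kraft_ok(lengths, D=2):
        """Check the Kraft inequality sum_x D^{-l(x)} <= 1."""
        return sum(D ** (-l) for l in lengths) <= 1 + 1e-12

    def prefix_code(lengths):
        """Build a binary prefix code with the given (Kraft-feasible) lengths:
        each codeword is the l-bit binary expansion of the running Kraft sum."""
        code, acc = {}, 0.0
        for i, l in sorted(enumerate(lengths), key=lambda t: t[1]):
            w = int(acc * 2**l)                 # acc written with l binary digits
            code[i] = format(w, "0{}b".format(l))
            acc += 2.0 ** (-l)
        return code

    lengths = [2, 2, 2, 3, 4, 4]
    print(kraft_ok(lengths), prefix_code(lengths))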


Most Compression Possible?

For any uniquely decodable D-ary symbol code C (defining H_D(X) ≜ −Σ_x p(x) log_D p(x)),

    L(C, X) = Σ_{x∈X} p(x) log_D D^{l(x)}
            = H_D(X) + Σ_{x∈X} p(x) log_D ( p(x) / D^{−l(x)} )
            ≥ H_D(X) + 1 · log_D ( 1 / Σ_{x∈X} D^{−l(x)} )        [log-sum inequality]
            ≥ H_D(X)                                              [Kraft inequality]

with equality iff p(x) = D^{−l(x)}, i.e. l(x) = −log_D p(x).
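
A quick numerical sanity check of this chain of (in)equalities, in Python, for one arbitrarily chosen pmf and one deliberately mismatched set of codeword lengths:

    from math import log2

    p = {"a": 0.5, "b": 0.25, "c": 0.25}
    l = {"a": 2, "b": 2, "c": 2}                  # valid but suboptimal lengths

    H = -sum(px * log2(px) for px in p.values())
    L = sum(p[x] * l[x] for x in p)
    kraft = sum(2 ** (-l[x]) for x in p)

    # L = H + sum_x p(x) log2( p(x) / 2^{-l(x)} ) >= H + log2(1 / kraft) >= H
    middle = H + sum(p[x] * log2(p[x] / 2 ** (-l[x])) for x in p)
    print(L, middle, H + log2(1 / kraft), H)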


How Close Can We Get?

  • The optimal length l(x) = log_D (1/p(x)) need not be an integer
  • Use l(x) = ⌈log_D (1/p(x))⌉
  • These codeword lengths satisfy the Kraft inequality:

      Σ_{x∈X} D^{−⌈log_D(1/p(x))⌉} ≤ Σ_{x∈X} D^{−log_D(1/p(x))} = Σ_{x∈X} p(x) = 1

    ⇒ There exists a (uniquely decodable) prefix code with these codeword lengths
  • For such a code C,

      l(x) < −log_D p(x) + 1 ⇒ L(C, X) < H_D(X) + 1
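
In Python, for an arbitrary made-up pmf (D = 2): compute l(x) = ⌈log2(1/p(x))⌉, then check that the Kraft inequality holds and that H(X) ≤ L(C, X) < H(X) + 1:

    from math import ceil, log2

    p = {"a": 0.4, "b": 0.3, "c": 0.2, "d": 0.1}
    l = {x: ceil(log2(1 / px)) for x, px in p.items()}   # "Shannon" codeword lengths

    H = -sum(px * log2(px) for px in p.values())
    L = sum(p[x] * l[x] for x in p)
    kraft = sum(2 ** (-lx) for lx in l.values())

    print(l)                              # {'a': 2, 'b': 2, 'c': 3, 'd': 4}
    print(kraft <= 1, H <= L < H + 1)     # Kraft holds, and H <= L < H + 1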


Source Coding Theorem

Uniquely Decodable Zero-Error Codes

  • The best uniquely decodable D-ary symbol code can compress to within 1 symbol of the entropy:

      min_{C prefix} L(C, X) ∈ [H_D(X), H_D(X) + 1)

  • Coding blocks of source symbols gives

      min_{C prefix} L(C, X_1^n) ∈ [H_D(X_1^n), H_D(X_1^n) + 1)

  • The minimum expected codeword length per symbol satisfies

      min_{C prefix} L(C, X_1^N) / N → H_D(S) as N → ∞,

    where H_D(S) is the entropy rate (base D) of the source.


Penalty for the Wrong Code

  • X ∼ p(x)
  • Cq : l(x) = ⌈log (1/q(x))⌉
  • Using Cq to code X, the expected codeword length satisfies

      H(p) + D(p‖q) ≤ L(Cq, X) ≤ H(p) + D(p‖q) + 1

    ⇒ D(p‖q) is the penalty for mismatch:

      L_q ≈ E_p log (1/q(X)) = E_p log ( p(X) / (p(X) q(X)) ) = E_p log (1/p(X)) + E_p log ( p(X)/q(X) ) = H(p) + D(p‖q)
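
A small numeric illustration in Python (p and q are made up; q is the uniform pmf): the lengths are designed for q while X ∼ p, and the resulting expected length sits between H(p) + D(p‖q) and H(p) + D(p‖q) + 1:

    from math import ceil, log2

    p = {"a": 0.5, "b": 0.25, "c": 0.125, "d": 0.125}
    q = {"a": 0.25, "b": 0.25, "c": 0.25, "d": 0.25}

    l = {x: ceil(log2(1 / q[x])) for x in q}            # lengths matched to q, not p
    H_p = -sum(px * log2(px) for px in p.values())
    D_pq = sum(p[x] * log2(p[x] / q[x]) for x in p)
    L_q = sum(p[x] * l[x] for x in p)

    # H(p) + D(p||q) <= L_q <= H(p) + D(p||q) + 1
    print(H_p, D_pq, L_q)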
