



Information Theory Lecture 2
Mikael Skoglund, Information Theory
• Sources and entropy rate: CT4
• Typical sequences: CT3
• Introduction to lossless source coding: CT5.1–5

Information Sources
[Figure: a source block emitting the symbol sequence $X_n$.]
• Source data: a speech signal, an image, a fax, a computer file, ...
• In practice, source data is time-varying and unpredictable.
• Bandlimited continuous-time signals (e.g. speech) can be sampled into discrete time and reproduced without loss.
• A source $S$ is defined by a discrete-time stochastic process $\{X_n\}$.

• If $X_n \in \mathcal{X}$, $\forall n$, the set $\mathcal{X}$ is the source alphabet.
• The source is
  • stationary if $\{X_n\}$ is stationary.
  • ergodic if $\{X_n\}$ is ergodic.
  • memoryless if $X_n$ and $X_m$ are independent for $n \neq m$.
  • iid if $\{X_n\}$ is iid (independent and identically distributed).
  • stationary and memoryless $\Rightarrow$ iid
  • continuous if $\mathcal{X}$ is a continuous set (e.g. the real numbers).
  • discrete if $\mathcal{X}$ is a discrete set (e.g. the integers $\{0, 1, 2, \ldots, 9\}$).
  • binary if $\mathcal{X} = \{0, 1\}$.

• Consider a source $S$, described by $\{X_n\}$. Define $X_1^N \triangleq (X_1, X_2, \ldots, X_N)$.
• The entropy rate of $S$ is defined as
  $$H(S) \triangleq \lim_{N\to\infty} \frac{1}{N} H(X_1^N)$$
  (when the limit exists).
• $H(X)$ is the entropy of a single random variable $X$, while the entropy rate gives the "entropy per unit time" of the stochastic process $S = \{X_n\}$.
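
A small numerical sketch of this definition (not part of the original slides): it computes $(1/N) H(X_1^N)$ by brute force for an example two-state stationary Markov source and compares it with the conditional entropy $H(X_2 \mid X_1)$, which is the entropy rate of a stationary Markov chain (a CT4 example). The transition matrix and its stationary distribution below are assumed values chosen only for illustration.

```python
# Sketch: entropy rate H(S) = lim (1/N) H(X_1^N) for an assumed two-state
# stationary Markov source (not from the lecture).
from itertools import product
from math import log2

P = [[0.9, 0.1],
     [0.4, 0.6]]      # P[i][j] = Pr(X_{n+1} = j | X_n = i), assumed for illustration
pi = [0.8, 0.2]       # stationary distribution (solves pi = pi P)

def seq_prob(x):
    """p(x_1^N) for a sequence x under the stationary Markov chain."""
    prob = pi[x[0]]
    for a, b in zip(x, x[1:]):
        prob *= P[a][b]
    return prob

def block_entropy(N):
    """H(X_1^N) in bits, by brute-force enumeration of all 2^N binary sequences."""
    probs = [seq_prob(x) for x in product([0, 1], repeat=N)]
    return -sum(q * log2(q) for q in probs if q > 0)

# For a stationary Markov chain the entropy rate equals H(X_2 | X_1),
# and (1/N) H(X_1^N) approaches it as N grows.
H_rate = -sum(pi[i] * P[i][j] * log2(P[i][j]) for i in range(2) for j in range(2))
for N in (1, 2, 4, 8, 14):
    print(f"N = {N:2d}:  H(X_1^N)/N = {block_entropy(N)/N:.4f}")
print(f"entropy rate H(X_2|X_1) = {H_rate:.4f}")
```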

• A stationary source $S$ always has a well-defined entropy rate, and it furthermore holds that
  $$H(S) = \lim_{N\to\infty} \frac{1}{N} H(X_1^N) = \lim_{N\to\infty} H(X_N \mid X_{N-1}, X_{N-2}, \ldots, X_1).$$
  That is, $H(S)$ is a measure of the information gained when observing a source symbol, given knowledge of the infinite past.
• We note that for iid sources
  $$H(S) = \lim_{N\to\infty} \frac{1}{N} H(X_1^N) = \lim_{N\to\infty} \frac{1}{N} \sum_{m=1}^{N} H(X_m) = H(X_1).$$
• Examples (from CT4): Markov chain, Markov process, random walk on a weighted graph, hidden Markov models, ...

Typical Sequences
• A binary iid source $\{b_n\}$ with $p = \Pr(b_n = 1)$.
• Let $R$ be the number of 1:s in a sequence $b_1, \ldots, b_N$ of length $N$ $\Rightarrow$ $p(b_1^N) = p^R (1-p)^{N-R}$.
• $P(r) \triangleq \Pr(R/N \le r)$ for $N = 10, 50, 100, 500$, with $p = 0.3$:
  [Figure: the CDF $P(r)$ versus $r$ for $N = 10, 50, 100, 500$; the curves concentrate around $r = p = 0.3$ as $N$ grows.]
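
A quick Monte Carlo sketch (not part of the slides; the parameters match the example above) of what the $P(r)$ figure shows: the fraction of ones $R/N$ concentrates around $p = 0.3$ as $N$ grows, so the empirical CDF steepens around $r = 0.3$.

```python
# Monte Carlo estimate of P(r) = Pr(R/N <= r) for the binary iid source with
# p = Pr(b_n = 1) = 0.3, for the block lengths used in the figure.
import random

p = 0.3
r_values = (0.1, 0.2, 0.3, 0.4, 0.5)
random.seed(0)

for N in (10, 50, 100, 500):
    fractions = []
    for _ in range(2000):                                  # 2000 simulated blocks b_1^N
        R = sum(random.random() < p for _ in range(N))     # number of 1:s in the block
        fractions.append(R / N)
    cdf = [sum(f <= r for f in fractions) / len(fractions) for r in r_values]
    print(f"N = {N:3d}: " + "  ".join(f"P({r}) = {v:.2f}" for r, v in zip(r_values, cdf)))
```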

• As $N$ grows, the probability that a sequence satisfies $R \approx p \cdot N$ is high $\Rightarrow$ given a $b_1^N$ that the source produced, it is likely that
  $$p(b_1^N) \approx p^{pN} (1-p)^{(1-p)N}.$$
  In the sense that the above holds with high probability, the source "will only produce" sequences for which
  $$\frac{1}{N} \log p(b_1^N) \approx p \log p + (1-p) \log(1-p) = -H.$$
  That is, for large $N$ it holds with high probability that
  $$p(b_1^N) \approx 2^{-N \cdot H},$$
  where $H$ is the entropy (entropy rate) of the source.
• A general discrete source that produces iid symbols $X_n$, with $X_n \in \mathcal{X}$ and $\Pr(X_n = x) = p(x)$. For all $x_1^N \in \mathcal{X}^N$ we have
  $$\log p(x_1^N) = \log p(x_1, \ldots, x_N) = \sum_{m=1}^{N} \log p(x_m).$$
  For an arbitrary random sequence $X_1^N$ we hence get
  $$\lim_{N\to\infty} \frac{1}{N} \log p(X_1^N) = \lim_{N\to\infty} \frac{1}{N} \sum_{m=1}^{N} \log p(X_m) = E[\log p(X_1)] \quad \text{a.s.}$$
  by the (strong) law of large numbers. That is, for large $N$,
  $$p(X_1^N) \approx 2^{-N \cdot H(X_1)}$$
  holds with high probability.
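
The law-of-large-numbers step can be checked numerically. The sketch below uses an assumed four-symbol iid example (not from the slides): it draws long sequences and compares $-(1/N)\log_2 p(X_1^N)$ with $H(X_1)$.

```python
# Numerical check of the LLN argument: for an iid source, (1/N) log2 p(X_1^N) is the
# sample mean of the iid terms log2 p(X_m), so -(1/N) log2 p(X_1^N) -> H(X_1) a.s.
# The four-symbol alphabet and its probabilities are assumed for illustration.
import random
from math import log2

pmf = {"a": 0.5, "b": 0.25, "c": 0.125, "d": 0.125}
H1 = -sum(q * log2(q) for q in pmf.values())        # H(X_1) = 1.75 bits here

random.seed(1)
symbols, weights = zip(*pmf.items())
for N in (10, 100, 1000, 10000):
    xs = random.choices(symbols, weights=weights, k=N)
    log2_p = sum(log2(pmf[x]) for x in xs)          # log2 p(X_1^N) = sum_m log2 p(X_m)
    print(f"N = {N:6d}:  -(1/N) log2 p(X_1^N) = {-log2_p / N:.4f}   (H(X_1) = {H1})")
```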

• The result (the Shannon–McMillan–Breiman theorem) can be extended to (discrete) stationary and ergodic sources (CT16.8). For a stationary and ergodic source $S$ it holds that
  $$-\lim_{N\to\infty} \frac{1}{N} \log p(X_1^N) = H(S) \quad \text{a.s.},$$
  where $H(S)$ is the entropy rate of the source.
• We note that $p(X_1^N)$ is a random variable. However, the right-hand side of $p(X_1^N) \approx 2^{-N \cdot H(S)}$ is a constant $\Rightarrow$ a constraint on the sequences the source "typically" produces!

The Typical Set
• For a given stationary and ergodic source $S$, the typical set $A_\varepsilon^{(N)}$ is the set of sequences $x_1^N \in \mathcal{X}^N$ for which
  $$2^{-N(H(S)+\varepsilon)} \le p(x_1^N) \le 2^{-N(H(S)-\varepsilon)}.$$
  1. $x_1^N \in A_\varepsilon^{(N)} \Rightarrow -N^{-1} \log p(x_1^N) \in [H(S)-\varepsilon,\, H(S)+\varepsilon]$
  2. $\Pr(X_1^N \in A_\varepsilon^{(N)}) > 1-\varepsilon$, for $N$ sufficiently large
  3. $|A_\varepsilon^{(N)}| \le 2^{N(H(S)+\varepsilon)}$
  4. $|A_\varepsilon^{(N)}| \ge (1-\varepsilon)\, 2^{N(H(S)-\varepsilon)}$, for $N$ sufficiently large
• That is, a large $N$ and a small $\varepsilon$ gives
  $$\Pr(X_1^N \in A_\varepsilon^{(N)}) \approx 1, \qquad |A_\varepsilon^{(N)}| \approx 2^{N H(S)}, \qquad p(x_1^N) \approx |A_\varepsilon^{(N)}|^{-1} \approx 2^{-N H(S)} \ \text{ for } x_1^N \in A_\varepsilon^{(N)}.$$
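
To see properties 2-4 at work, the following sketch (a binary iid source with assumed $p = 0.3$ and $\varepsilon = 0.05$, not from the slides) uses the fact that all sequences with the same number of ones $R$ have the same probability, so $A_\varepsilon^{(N)}$ can be summarized by the admissible values of $R$ even for fairly large $N$.

```python
# Size and probability of the typical set for a binary iid source, computed by
# grouping sequences by their number of ones R (assumed p and eps; not from the slides).
from math import comb, log2

p, eps = 0.3, 0.05
H = -(p * log2(p) + (1 - p) * log2(1 - p))      # entropy rate of the iid source

for N in (100, 1000, 5000):
    size = 0        # |A_eps^(N)| as an exact integer
    prob = 0.0      # Pr(X_1^N in A_eps^(N))
    for R in range(N + 1):
        nlp = -(R * log2(p) + (N - R) * log2(1 - p)) / N   # -(1/N) log2 p(x_1^N), R ones
        if H - eps <= nlp <= H + eps:
            size += comb(N, R)
            # comb(N, R) * p^R * (1-p)^(N-R), computed in the log domain to avoid underflow
            prob += 2.0 ** (log2(comb(N, R)) - N * nlp)
    print(f"N = {N:5d}:  Pr(A) = {prob:.4f}   (1/N) log2|A| = {log2(size)/N:.4f}   (H = {H:.4f})")
```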

The Typical Set and Source Coding
1. Fix $\varepsilon$ (small) and $N$ (large). Partition $\mathcal{X}^N$ into two subsets: $A = A_\varepsilon^{(N)}$ and $B = \mathcal{X}^N \setminus A$.
2. Observed sequences will "typically" belong to the set $A$. There are $M = |A| \le 2^{N(H(S)+\varepsilon)}$ elements in $A$.
3. Let the different $i \in \{0, \ldots, M-1\}$ enumerate the elements of $A$. An index $i$ can be stored or transmitted spending no more than $\lceil N \cdot (H(S)+\varepsilon) \rceil$ bits.
4. Encoding. For each observed sequence $x_1^N$:
   1. if $x_1^N \in A$, produce the corresponding index $i$;
   2. if $x_1^N \in B$, let $i = 0$.
5. Decoding. Map each index $i$ back into $A \subset \mathcal{X}^N$.

• An error appears with probability $\Pr(X_1^N \in B) \le \varepsilon$ for large $N$ $\Rightarrow$ the probability of error can be made to vanish as $N \to \infty$.
• This is an "almost noiseless" source code that maps $x_1^N$ into an index $i$, where $i$ can be represented using at most $\lceil N \cdot (H(S)+\varepsilon) \rceil$ bits. However, since also $M \ge (1-\varepsilon)\, 2^{N(H(S)-\varepsilon)}$ for a large enough $N$, we need at least $\lfloor \log(1-\varepsilon) + N(H(S)-\varepsilon) \rfloor$ bits.
• Thus, for large $N$ it is possible to design a source code with rate
  $$H(S) - \varepsilon + \frac{\log(1-\varepsilon) - 1}{N} < R \le H(S) + \varepsilon + \frac{1}{N}$$
  bits per source symbol. $\Rightarrow$ "Operational" meaning of entropy rate: the smallest rate at which a source can be coded with arbitrarily low error probability.
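
The scheme above can be tried end to end for a toy source. The sketch below (a low-entropy binary iid source with assumed parameters, brute-forced over all $2^N$ blocks) builds the index for $A_\varepsilon^{(N)}$, defines the encoder and decoder, and reports the block-error probability; with such a small $N$ that probability is still far from negligible, which is why the guarantees above are stated for $N$ sufficiently large.

```python
# Toy version of the "almost noiseless" typical-set code (assumed parameters,
# not the lecture's own code): index the typical set, encode blocks by index,
# decode by table lookup, and measure the block-error probability.
from itertools import product
from math import ceil, log2

p, N, eps = 0.1, 20, 0.2                 # a low-entropy binary iid source, H ~ 0.47
H = -(p * log2(p) + (1 - p) * log2(1 - p))

def prob(x):
    """p(x_1^N) = p^R (1-p)^(N-R) for a binary iid block x."""
    R = sum(x)
    return p**R * (1 - p)**(N - R)

# Steps 1-3: the typical set A, its enumeration, and the index length in bits
A = [x for x in product((0, 1), repeat=N)
     if H - eps <= -log2(prob(x)) / N <= H + eps]
index_of = {x: i for i, x in enumerate(A)}
bits = ceil(N * (H + eps))
print(f"|A| = {len(A)}, index needs {bits} bits = {bits/N:.2f} bits/symbol "
      f"(uncoded: 1 bit/symbol, H = {H:.2f})")

# Steps 4-5: encoder and decoder
def encode(x):
    return index_of.get(x, 0)            # blocks in B are mapped to index 0

def decode(i):
    return A[i]

P_error = 1.0 - sum(prob(x) for x in A)  # Pr(X_1^N in B)
print(f"Pr(block in B) = {P_error:.3f}  (vanishes as N -> infinity)")
```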

Data Compression
• For large $N$ it is possible to design a source code with rate
  $$H(S) - \varepsilon + \frac{\log(1-\varepsilon) - 1}{N} < R \le H(S) + \varepsilon + \frac{1}{N}$$
  bits per symbol, having a vanishing probability of error.
• The above is an existence result; it does not tell us how to design codes.
• For a fixed finite $N$, the typical-sequence codes discussed are "almost noiseless" fixed-length to fixed-length codes.
• We will now start looking at concrete "zero-error" codes, their performance and how to design them.
• Price to pay to get zero errors: fixed-length to variable-length coding.

Various Classifications
• Source alphabet
  • Discrete sources
  • Continuous sources
• Recovery requirement
  • Lossless source coding
  • Lossy source coding
• Coding method
  • Fixed-length to fixed-length
  • Fixed-length to variable-length
  • Variable-length to fixed-length
  • Variable-length to variable-length

Zero-Error Source Coding
• Source coding theorem for symbol codes (today)
  • Symbol codes, code extensions
  • Uniquely decodable and instantaneous (prefix) codes
  • Kraft(-McMillan) inequality
  • Bounds on the optimal codelength
  • Source coding theorem for zero-error prefix codes
• Specific code constructions (next time)
  • Symbol codes: Huffman codes, Shannon-Fano codes
  • Stream codes: arithmetic codes, Lempel-Ziv codes

What Is a Symbol Code?
• A $D$-ary symbol code $C$ for a random variable $X$ is a mapping
  $$C : \mathcal{X} \to \{0, 1, \ldots, D-1\}^*.$$
• $\mathcal{A}^*$ = the set of finite-length strings of symbols from a finite set $\mathcal{A}$.
• $C(x)$: the codeword for $x \in \mathcal{X}$.
• $l(x)$: the length of $C(x)$ (i.e. the number of $D$-ary symbols).
• Data compression $\Rightarrow$ minimize the expected length
  $$L(C, X) = \sum_{x \in \mathcal{X}} p(x)\, l(x).$$
• The extension of $C$ is $C^* : \mathcal{X}^* \to \{0, 1, \ldots, D-1\}^*$ with
  $$C^*(x_1^n) = C(x_1)\, C(x_2) \cdots C(x_n), \quad n = 1, 2, \ldots$$
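
A minimal sketch of these definitions (the code $C$ and the pmf of $X$ below are assumed examples, not taken from the lecture): computing the expected length $L(C, X)$ and applying the extension $C^*$ to a source string.

```python
# Expected length and code extension for a hypothetical binary (D = 2) symbol code.
pmf = {"a": 0.5, "b": 0.25, "c": 0.125, "d": 0.125}   # assumed pmf of X
C   = {"a": "0", "b": "10", "c": "110", "d": "111"}   # assumed prefix-free code

L = sum(pmf[x] * len(C[x]) for x in pmf)              # L(C, X) = sum_x p(x) l(x)
print("L(C, X) =", L, "bits/symbol")                  # 1.75, which equals H(X) here

def extension(xs):
    """C*(x_1^n) = C(x_1) C(x_2) ... C(x_n): concatenate the codewords."""
    return "".join(C[x] for x in xs)

print(extension("abad"))                              # -> 0100111
```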

Example: Encoding Coin Flips
[Table: the random variable $X$ and candidate symbol codes $C$, $C_u$, $C_i$; the codeword entries are garbled in the extracted text and are not reproduced here.]

Uniquely Decodable Codes
• $C$ is uniquely decodable if, for all $x, y \in \mathcal{X}^*$,
  $$x \neq y \Rightarrow C^*(x) \neq C^*(y).$$
• Any uniquely decodable code must satisfy the Kraft inequality
  $$\sum_{x \in \mathcal{X}} D^{-l(x)} \le 1$$
  (McMillan's result, Karush's proof in C&T).
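
A small check of the Kraft inequality on two hypothetical binary codes. Note the direction of the statement: every uniquely decodable code satisfies the inequality, so a code that violates it cannot be uniquely decodable, while satisfying it only guarantees that some prefix code with those codeword lengths exists.

```python
# Kraft inequality check sum_x D^(-l(x)) <= 1 for two hypothetical binary codes.
def kraft_sum(code, D=2):
    """Return the Kraft sum: sum over codewords of D^(-length)."""
    return sum(D ** -len(w) for w in code.values())

prefix_code = {"a": "0", "b": "10", "c": "110", "d": "111"}
bad_code    = {"a": "0", "b": "1", "c": "00", "d": "11"}     # too many short codewords

for name, code in [("prefix_code", prefix_code), ("bad_code", bad_code)]:
    s = kraft_sum(code)
    print(f"{name}: Kraft sum = {s}  ->",
          "may be uniquely decodable" if s <= 1 else "cannot be uniquely decodable")
```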
