Chapter 28: Entropy and Shannon's Theorem
CS 573: Algorithms, Fall 2013
December 10, 2013

28.1 Entropy

28.2 Extracting randomness

28.2.1 Enumerating binary strings with j ones

28.2.1.1 Storing all strings of length n with j ones
(A) $S_{n,j}$: set of all strings of length $n$ with $j$ ones in them.
(B) $T_{n,j}$: prefix tree storing all of $S_{n,j}$.
[Figure: the small prefix trees $T_{0,0}$, $T_{1,0}$, $T_{1,1}$.]

28.2.1.2 Prefix tree for all binary strings of length n with j ones
The tree $T_{n,j}$ has a left subtree $T_{n-1,j}$ (strings starting with 0) and a right subtree $T_{n-1,j-1}$ (strings starting with 1), so its number of leaves satisfies
$$|T_{n,j}| = |T_{n-1,j}| + |T_{n-1,j-1}|.$$
Since $\binom{n}{j} = \binom{n-1}{j} + \binom{n-1}{j-1}$, it follows that $|T_{n,j}| = \binom{n}{j}$.
[Figure: $T_{n,j}$ with edge 0 to the subtree $T_{n-1,j}$ and edge 1 to the subtree $T_{n-1,j-1}$.]
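The recurrence can be sanity-checked directly. Below is a minimal Python sketch (the name num_leaves is illustrative, not from the notes) that counts the leaves of $T_{n,j}$ via the recurrence and compares the result against $\binom{n}{j}$:

```python
from functools import lru_cache
from math import comb

@lru_cache(maxsize=None)
def num_leaves(n, j):
    """Number of leaves of the prefix tree T_{n,j}, i.e. |S_{n,j}|."""
    if j < 0 or j > n:
        return 0              # no string of length n can have j ones
    if n == 0:
        return 1              # the empty string (here j must be 0)
    # A string either starts with 0 (j ones among the remaining n-1 bits)
    # or starts with 1 (j-1 ones among the remaining n-1 bits).
    return num_leaves(n - 1, j) + num_leaves(n - 1, j - 1)

assert num_leaves(10, 4) == comb(10, 4)   # |T_{n,j}| = binom(n, j)
```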

[Figure: the trees $T_{n,0}, \ldots, T_{n,n}$, each splitting via edges labeled 0 and 1 into subtrees of the form $T_{n-1,\cdot}$.]

28.2.1.3 Encoding a string in S_{n,j}
(A) The leaves of $T_{n,j}$ correspond to the strings of $S_{n,j}$.
(B) Order all strings of $S_{n,j}$ lexicographically.
(C) This is equivalent to ordering the leaves of $T_{n,j}$ from left to right.
(D) Input: $s \in S_{n,j}$. Compute the index of $s$ in the sorted set $S_{n,j}$.
(E) EncodeBinomCoeff(s) denotes this polynomial-time procedure.

28.2.1.4 Decoding a string in S_{n,j}
(A) The leaves of $T_{n,j}$ correspond to the strings of $S_{n,j}$.
(B) Order all strings of $S_{n,j}$ lexicographically.
(C) This is equivalent to ordering the leaves of $T_{n,j}$ from left to right.
(D) Given $x \in \{1, \ldots, \binom{n}{j}\}$: compute the $x$-th string in $S_{n,j}$ in polynomial time.
(E) DecodeBinomCoeff(x) denotes this procedure.

28.2.1.5 Encoding/decoding strings of S_{n,j}
Lemma 28.2.1. Let $S_{n,j}$ be the set of binary strings of length $n$ with $j$ ones, sorted lexicographically.
(A) EncodeBinomCoeff($\alpha$): given a string $\alpha \in S_{n,j}$, computes the index $x$ of $\alpha$ in $S_{n,j}$, in time polynomial in $n$.
(B) DecodeBinomCoeff($x$): given an index $x \in \{1, \ldots, \binom{n}{j}\}$, outputs the $x$-th string $\alpha$ in $S_{n,j}$, in $O(\operatorname{polylog} n + n)$ time.
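Lemma 28.2.1 does not fix an implementation; the sketch below shows one standard way to realize the two procedures by lexicographic ranking/unranking, walking the string left to right and counting how many strings branch off with a 0 at each position. The snake_case names encode_binom_coeff and decode_binom_coeff are illustrative stand-ins for EncodeBinomCoeff and DecodeBinomCoeff.

```python
from math import comb

def encode_binom_coeff(s):
    """Rank of the bit string s among all length-n strings with the same
    number of ones, in lexicographic order ('0' < '1'); 1-based index."""
    n, j = len(s), s.count("1")
    rank = 0
    for i, bit in enumerate(s):
        if bit == "1":
            # All strings that agree on the prefix but have a '0' here come
            # first: they place the remaining j ones in the last n-i-1 slots.
            rank += comb(n - i - 1, j)
            j -= 1
    return rank + 1

def decode_binom_coeff(x, n, j):
    """Inverse map: the x-th (1-based) string of length n with j ones."""
    x -= 1
    out = []
    for i in range(n):
        zero_block = comb(n - i - 1, j)   # strings with '0' in this position
        if x < zero_block:
            out.append("0")
        else:
            out.append("1")
            x -= zero_block
            j -= 1
    return "".join(out)
```

For example, for $n = 3$ and $j = 2$ the sorted set is 011, 101, 110; encode_binom_coeff("101") returns 2, and decode_binom_coeff(2, 3, 2) returns "101".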

28.2.2 Extracting randomness

28.2.2.1 Extracting randomness
Theorem 28.2.2. Consider a coin that comes up heads with probability $p > 1/2$. For any constant $\delta > 0$ and for $n$ sufficiently large:
(A) One can extract, from an input sequence of $n$ flips, an output sequence of $(1-\delta)\, n H(p)$ (unbiased) independent random bits.
(B) One cannot extract more than $n H(p)$ bits from such a sequence.

28.2.2.2 Proof...
(A) There are $\binom{n}{j}$ input strings with exactly $j$ heads.
(B) Each has probability $p^{j}(1-p)^{n-j}$.
(C) Map such a string $s$ to its index in the set $S_j = \{1, \ldots, \binom{n}{j}\}$.
(D) Conditioned on the input string $s$ having $j$ ones (out of $n$ bits), this index is uniformly distributed on $S_{n,j}$.
(E) $x \leftarrow$ EncodeBinomCoeff($s$).
(F) $x$ is uniformly distributed in $\{1, \ldots, N\}$, where $N = \binom{n}{j}$.
(G) Seen in a previous lecture...
(H) ... one can extract, in expectation, at least $\lfloor \lg N \rfloor - 1$ bits from a uniform random variable with range $1, \ldots, N$.
(I) Extract the bits using ExtractRandomness($x$, $N$); a sketch follows 28.2.2.3 below.

28.2.2.3 Exciting proof continued...
(A) $Z$: random variable counting the number of heads in the input string $s$.
(B) $B$: number of random bits extracted. Then
$$\mathbb{E}[B] = \sum_{k=0}^{n} \Pr[Z = k]\, \mathbb{E}[B \mid Z = k].$$
(C) Know: $\mathbb{E}[B \mid Z = k] \ge \left\lfloor \lg \binom{n}{k} \right\rfloor - 1$.
(D) Let $\varepsilon < p - 1/2$ be a sufficiently small constant.
(E) For $n(p-\varepsilon) \le k \le n(p+\varepsilon)$:
$$\binom{n}{k} \ge \binom{n}{\lfloor n(p+\varepsilon) \rfloor} \ge \frac{2^{n H(p+\varepsilon)}}{n+1},$$
(F) ... since $2^{n H(p)}$ is a good approximation to $\binom{n}{np}$, as proved in a previous lecture.
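The extraction procedure from the earlier lecture is not restated here; the following is a minimal sketch of one standard block-decomposition extractor with the guarantee used in (H) of 28.2.2.2 (at least $\lfloor \lg N \rfloor - 1$ unbiased bits in expectation from a uniform $x \in \{1, \ldots, N\}$). The name extract_randomness mirrors ExtractRandomness, but this is an illustration, not necessarily the course's exact procedure.

```python
def extract_randomness(x, n):
    """Extract unbiased bits from x drawn uniformly from {1, ..., n}.

    Peel off the largest power-of-two block of the range: if x lands in a
    block of size 2^k, its offset inside that block is uniform, so its k-bit
    binary expansion gives k unbiased, independent bits.  In expectation this
    outputs at least floor(lg n) - 1 bits.
    """
    x -= 1                               # switch to the 0-based range {0, ..., n-1}
    bits = []
    while n >= 2:
        k = n.bit_length() - 1           # largest k with 2**k <= n
        if x < 2 ** k:
            bits.extend(int(b) for b in format(x, f"0{k}b"))
            break
        x -= 2 ** k                      # otherwise recurse on the leftover block
        n -= 2 ** k
    return bits
```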

28.2.2.4 Super exciting proof continued...
$$\begin{aligned}
\mathbb{E}[B] &= \sum_{k=0}^{n} \Pr[Z=k]\, \mathbb{E}[B \mid Z=k] \\
&\ge \sum_{k=\lfloor n(p-\varepsilon)\rfloor}^{\lceil n(p+\varepsilon)\rceil} \Pr[Z=k]\, \mathbb{E}[B \mid Z=k] \\
&\ge \sum_{k=\lfloor n(p-\varepsilon)\rfloor}^{\lceil n(p+\varepsilon)\rceil} \Pr[Z=k] \left( \left\lfloor \lg \binom{n}{k} \right\rfloor - 1 \right) \\
&\ge \sum_{k=\lfloor n(p-\varepsilon)\rfloor}^{\lceil n(p+\varepsilon)\rceil} \Pr[Z=k] \left( \lg \frac{2^{n H(p+\varepsilon)}}{n+1} - 2 \right) \\
&= \bigl( n H(p+\varepsilon) - \lg(n+1) - 2 \bigr) \Pr\bigl[|Z - np| \le \varepsilon n\bigr] \\
&\ge \bigl( n H(p+\varepsilon) - \lg(n+1) - 2 \bigr) \left( 1 - 2 \exp\!\left( -\frac{n\varepsilon^2}{4p} \right) \right),
\end{aligned}$$
since $\mu = \mathbb{E}[Z] = np$ and
$$\Pr\Bigl[ |Z - np| \ge \frac{\varepsilon}{p}\, pn \Bigr] \le 2 \exp\!\left( -\frac{np}{4} \Bigl(\frac{\varepsilon}{p}\Bigr)^{2} \right) = 2 \exp\!\left( -\frac{n\varepsilon^2}{4p} \right),$$
by the Chernoff inequality.

28.2.2.5 Hyper super exciting proof continued...
(A) Fix $\varepsilon > 0$ such that $H(p+\varepsilon) > (1 - \delta/4)\, H(p)$ ($p$ is fixed).
(B) $\Rightarrow n H(p) = \Omega(n)$.
(C) For $n$ sufficiently large: $-\lg(n+1) \ge -\frac{\delta}{10}\, n H(p)$.
(D) ... also $2 \exp\!\left(-\frac{n\varepsilon^2}{4p}\right) \le \frac{\delta}{10}$.
(E) For $n$ large enough:
$$\mathbb{E}[B] \ge \left(1 - \frac{\delta}{4} - \frac{\delta}{10}\right) n H(p) \left(1 - \frac{\delta}{10}\right) \ge (1-\delta)\, n H(p).$$

28.2.2.6 Hyper super duper exciting proof continued...
(A) Need to prove the upper bound.
(B) If input sequence $x$ has probability $\Pr[X = x]$, then $y = \mathrm{Ext}(x)$ has probability at least $\Pr[X = x]$ of being generated.
(C) All sequences of length $|y|$ have equal probability of being generated (by definition).
(D) $2^{|\mathrm{Ext}(x)|} \Pr[X = x] \le 2^{|\mathrm{Ext}(x)|} \Pr[y = \mathrm{Ext}(x)] \le 1$.
(E) $\Rightarrow |\mathrm{Ext}(x)| \le \lg\bigl(1/\Pr[X = x]\bigr)$.
(F) $\mathbb{E}[B] = \sum_x \Pr[X = x]\, |\mathrm{Ext}(x)| \le \sum_x \Pr[X = x] \lg \frac{1}{\Pr[X = x]} = H(X)$.

28.3 Coding: Shannon's Theorem

28.3.0.7 Shannon's Theorem
Definition 28.3.1. The input to a binary symmetric channel with parameter $p$ is a sequence of bits $x_1, x_2, \ldots$, and the output is a sequence of bits $y_1, y_2, \ldots$, such that $\Pr[x_i = y_i] = 1 - p$ independently for each $i$.

28.3.0.8 Encoding/decoding with noise
Definition 28.3.2. A $(k, n)$ encoding function $\mathrm{Enc}: \{0,1\}^k \to \{0,1\}^n$ takes as input a sequence of $k$ bits and outputs a sequence of $n$ bits. A $(k, n)$ decoding function $\mathrm{Dec}: \{0,1\}^n \to \{0,1\}^k$ takes as input a sequence of $n$ bits and outputs a sequence of $k$ bits.

28.3.0.9 Claude Elwood Shannon
Claude Elwood Shannon (April 30, 1916 – February 24, 2001), an American electrical engineer and mathematician, has been called "the father of information theory". His master's thesis showed how to build Boolean circuits for any Boolean function.
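The channel of Definition 28.3.1 is easy to simulate. Here is a minimal sketch (the name binary_symmetric_channel is illustrative), assuming bits are represented as Python ints 0/1:

```python
import random

def binary_symmetric_channel(bits, p, rng=random):
    """Transmit a bit sequence through a BSC with parameter p: each bit is
    flipped independently with probability p, so Pr[y_i = x_i] = 1 - p."""
    return [b ^ (rng.random() < p) for b in bits]

# Example: roughly a p-fraction of the positions differ.
sent = [random.randint(0, 1) for _ in range(10_000)]
received = binary_symmetric_channel(sent, p=0.1)
flips = sum(s != r for s, r in zip(sent, received))
print(flips / len(sent))   # ~ 0.1
```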

28.3.0.10 Shannon's theorem (1948)
Theorem 28.3.3 (Shannon's theorem). For a binary symmetric channel with parameter $p < 1/2$ and for any constants $\delta, \gamma > 0$, where $n$ is sufficiently large, the following holds:
(i) For any $k \le n(1 - H(p) - \delta)$ there exist $(k, n)$ encoding and decoding functions such that the probability the receiver fails to obtain the correct message is at most $\gamma$, for every possible $k$-bit input message.
(ii) There are no $(k, n)$ encoding and decoding functions with $k \ge n(1 - H(p) + \delta)$ such that the probability of decoding correctly is at least $\gamma$ for a $k$-bit input message chosen uniformly at random.

28.3.0.11 Intuition behind Shannon's theorem

28.3.0.12 When the sender sends a string...
[Figure: the sent string $S = s_1 s_2 \ldots s_n$ and the ring of strings at Hamming distance between $(1-\delta)np$ and $(1+\delta)np$ from $S$. "One ring to rule them all!"]

28.3.0.13 Some intuition...
(A) The sender sent the string $S = s_1 s_2 \ldots s_n$.
(B) The receiver got the string $T = t_1 t_2 \ldots t_n$.
(C) $p = \Pr[t_i \ne s_i]$, for all $i$.
(D) $U$: Hamming distance between $S$ and $T$: $U = \sum_i [s_i \ne t_i]$.
(E) By assumption, $\mathbb{E}[U] = pn$, and $U$ is a binomial variable.
(F) By the Chernoff inequality, $U \in [(1-\delta)np, (1+\delta)np]$ with high probability, where $\delta$ is a tiny constant.
(G) So $T$ lies in a ring $R$ centered at $S$, with inner radius $(1-\delta)np$ and outer radius $(1+\delta)np$.
(H) This ring contains
$$\sum_{i=(1-\delta)np}^{(1+\delta)np} \binom{n}{i} \le \alpha = 2 \binom{n}{(1+\delta)np} \le 2 \cdot 2^{n H((1+\delta)p)}$$
strings (see the numeric check below).
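Point (H) can be checked numerically for concrete parameters. The sketch below (the helper names H and ring_size are illustrative) sums the binomial coefficients over the ring and compares against the $2 \cdot 2^{n H((1+\delta)p)}$ bound:

```python
from math import comb, ceil, floor, log2

def H(q):
    """Binary entropy function H(q) = -q lg q - (1-q) lg(1-q)."""
    if q in (0, 1):
        return 0.0
    return -q * log2(q) - (1 - q) * log2(1 - q)

def ring_size(n, p, delta):
    """Number of strings at Hamming distance in [(1-d)np, (1+d)np] from a fixed S."""
    lo, hi = ceil((1 - delta) * n * p), floor((1 + delta) * n * p)
    return sum(comb(n, i) for i in range(lo, hi + 1))

n, p, delta = 200, 0.1, 0.05
exact = ring_size(n, p, delta)
bound = 2 * 2 ** (n * H((1 + delta) * p))
print(exact <= bound)   # True: the ring holds at most ~ 2 * 2^{n H((1+d)p)} strings
```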

28.3.0.14 Many rings for many codewords...

28.3.0.15 Some more intuition...
(A) Pick as many disjoint rings as possible: $R_1, \ldots, R_\kappa$.
(B) If every word in the hypercube were covered by some ring...
(C) ... then the $2^n$ words of the hypercube are split among the rings, so
$$\kappa \ge \frac{2^n}{|R|} \ge \frac{2^n}{2 \cdot 2^{n H((1+\delta)p)}} \approx 2^{n(1 - H((1+\delta)p))}.$$
(D) Consider all possible strings of length $k$ such that $2^k \le \kappa$.
(E) Map the $i$-th string in $\{0,1\}^k$ to the center $C_i$ of the $i$-th ring $R_i$.
(F) If $C_i$ is sent, the receiver gets a string in $R_i$.
(G) Decoding is easy: find the ring $R_i$ containing the received string, take its center string $C_i$, and output the original string it was mapped to (a toy version is sketched below, after 28.3.0.17).
(H) How many bits? $k = \lfloor \log \kappa \rfloor = n\bigl(1 - H((1+\delta)p)\bigr) \approx n(1 - H(p))$.

28.3.0.16 What is wrong with the above?

28.3.0.17 What is wrong with the above?
(A) One cannot find such a large set of disjoint rings.
(B) The reason is that when you pack rings (or balls) you end up with wasted space around them.
(C) To overcome this, allow the rings to overlap somewhat.
(D) This makes things considerably more involved.
(E) Details are in the class notes.
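For concreteness, here is a toy version of the ring-decoding step from 28.3.0.15(G), ignoring the packing issue just discussed. The names hamming and decode_by_ring are illustrative; this only sketches the intuition and is not the actual decoder from the class notes.

```python
def hamming(a, b):
    """Hamming distance between two equal-length bit strings."""
    return sum(x != y for x, y in zip(a, b))

def decode_by_ring(received, codewords, n, p, delta):
    """Return the index of a codeword whose ring [(1-d)np, (1+d)np] contains
    the received string (unique if the rings are disjoint), or None."""
    lo, hi = (1 - delta) * n * p, (1 + delta) * n * p
    for i, c in enumerate(codewords):
        if lo <= hamming(received, c) <= hi:
            return i
    return None

# Tiny example with two far-apart codewords, n = 8, p = 0.25, delta = 0.5:
codewords = ["00000000", "11111111"]
print(decode_by_ring("01000010", codewords, n=8, p=0.25, delta=0.5))   # -> 0
```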
