CS 573: Algorithms, Fall 2014
Entropy, Randomness, and Information
Lecture 23
November 13, 2014
Sariel (UIUC) CS573 1 Fall 2014 1 / 30
“If only once - only once - no matter where, no matter before what audience - I could better the record of the great Rastelli and juggle with thirteen balls, instead of my usual twelve, I would feel that I had truly accomplished something for my country. But I am not getting any younger, and although I am still at the peak of my powers there are moments - why deny it? - when I begin to doubt - and there is a time limit on all of us.” –Romain Gary, The talent scout.
The entropy in bits of a discrete random variable X is

H(X) = −∑_x Pr[X = x] lg Pr[X = x].

Equivalently, H(X) = E[ lg (1/Pr[X]) ].
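The definition above is easy to check numerically; a minimal sketch (the function name `entropy` is my own, not from the slides):

```python
from math import log2

def entropy(dist):
    """H(X) = -sum_x Pr[X = x] * lg Pr[X = x], in bits.

    `dist` is a list of probabilities; terms with p = 0 contribute
    nothing, matching the convention 0 lg 0 = 0."""
    return -sum(p * log2(p) for p in dist if p > 0)

print(entropy([0.5, 0.5]))   # a fair coin: 1.0 bit
print(entropy([1.0]))        # a deterministic variable: 0.0 bits
```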
Intuitively, H(X) is the number of fair coin flips' worth of randomness one gets when learning the value of X.
Consider a (huge) string S = s1 s2 . . . sn formed by picking characters independently according to X. Then nH(X) is (asymptotically) the minimum number of bits one needs to store the string S.
The binary entropy function H(p), for a random binary variable that is 1 with probability p, is

H(p) = −p lg p − (1 − p) lg(1 − p).

We define H(0) = H(1) = 0.
Q: How many truly random bits does one get from the result of flipping a single coin with probability p for heads?
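A direct implementation of H(p) (the name `binary_entropy` is mine) reproduces the specific values quoted on the next slide:

```python
from math import log2

def binary_entropy(p):
    """H(p) = -p lg p - (1 - p) lg(1 - p), with H(0) = H(1) = 0."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * log2(p) - (1 - p) * log2(1 - p)

print(round(binary_entropy(0.5), 4))    # 1.0
print(round(binary_entropy(0.75), 4))   # 0.8113
print(round(binary_entropy(0.875), 4))  # 0.5436
```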
H(p) = −p lg p − (1 − p) lg(1 − p)

[Figure: plot of the binary entropy function H(p) on [0, 1].]

1. H(p) is concave and symmetric around 1/2 on the interval [0, 1].
2. Its maximum is at p = 1/2.
3. H(3/4) ≈ 0.8113 and H(7/8) ≈ 0.5436.
4. ⇒ a coin with probability 3/4 for heads has a higher amount of "randomness" in it than a coin with probability 7/8 for heads.
1. H(p) = −p lg p − (1 − p) lg(1 − p).
2. H′(p) = −lg p + lg(1 − p) = lg((1 − p)/p).
3. H′′(p) = (p/(1 − p)) · (−1/p²) · (1/ln 2) = −1/(p(1 − p) ln 2).
4. ⇒ H′′(p) ≤ 0 for all p ∈ (0, 1), and hence H(·) is concave.
5. H′(1/2) = 0 ⇒ H(1/2) = 1 is the maximum of the binary entropy.
6. ⇒ a balanced coin has the largest amount of randomness in it.
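The derivative computations can be sanity-checked with finite differences; a sketch (tolerances and helper names are my own choices):

```python
from math import log2, log

def H(p):   # binary entropy
    return -p * log2(p) - (1 - p) * log2(1 - p)

def H1(p):  # claimed first derivative: lg((1 - p)/p)
    return log2((1 - p) / p)

def H2(p):  # claimed second derivative: -1/(p(1 - p) ln 2)
    return -1.0 / (p * (1 - p) * log(2))

eps = 1e-6
for p in (0.1, 0.3, 0.5, 0.7, 0.9):
    num1 = (H(p + eps) - H(p - eps)) / (2 * eps)    # numeric H'
    num2 = (H1(p + eps) - H1(p - eps)) / (2 * eps)  # numeric H''
    assert abs(num1 - H1(p)) < 1e-5
    assert abs(num2 - H2(p)) < 1e-4
    assert H2(p) < 0        # concavity on (0, 1)
assert H1(0.5) == 0.0       # critical point at p = 1/2
```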
...out of bad random bits...

1. b1, . . . , bn: the result of n coin flips...
2. From a faulty coin!
3. p: the probability for heads.
4. We need fair coin flips!
5. Convert b1, . . . , bn ⇒ b′1, . . . , b′m.
6. The new bits must be truly random: probability 1/2 for heads.
7. Q: How many truly random bits can we extract?
Squeezing good random bits out of bad random bits...

Given the result of n coin flips b1, . . . , bn from a faulty coin with probability p for heads, how many truly random bits can we extract? If we believe the intuition about entropy, then this number should be ≈ nH(p).
1. The entropy of X is H(X) = −∑_x Pr[X = x] lg Pr[X = x].
2. Entropy of a uniform variable: a random variable X that has probability 1/n to be i, for i = 1, . . . , n, has entropy H(X) = −∑_{i=1}^n (1/n) lg(1/n) = lg n.
3. Entropy is oblivious to the exact values the random variable can take.
4. ⇒ a random variable over {−1, +1} with equal probability has the same entropy (i.e., 1) as a fair coin.
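The lg n claim for a uniform variable checks out numerically (reusing a small entropy helper of my own naming):

```python
from math import log2

def entropy(dist):
    """H(X) in bits for a probability vector `dist`."""
    return -sum(p * log2(p) for p in dist if p > 0)

for n in (2, 4, 8, 100):
    uniform = [1.0 / n] * n
    assert abs(entropy(uniform) - log2(n)) < 1e-9

# Entropy ignores the actual values: {-1, +1} with equal
# probability has the same entropy as a fair coin, namely 1 bit.
assert entropy([0.5, 0.5]) == 1.0
```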
Let X and Y be two independent random variables, and let Z be the random variable (X, Y). Then H(Z) = H(X) + H(Y).
In the following, summations are over all possible values that the variables can take. By the independence of X and Y we have

H(Z) = ∑_{x,y} Pr[(X, Y) = (x, y)] lg ( 1 / Pr[(X, Y) = (x, y)] )
     = ∑_{x,y} Pr[X = x] Pr[Y = y] lg ( 1 / (Pr[X = x] Pr[Y = y]) )
     = ∑_{x,y} Pr[X = x] Pr[Y = y] lg ( 1 / Pr[X = x] ) + ∑_{x,y} Pr[X = x] Pr[Y = y] lg ( 1 / Pr[Y = y] )
     = ∑_x Pr[X = x] lg ( 1 / Pr[X = x] ) + ∑_y Pr[Y = y] lg ( 1 / Pr[Y = y] )
     = H(X) + H(Y).
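The additivity lemma can be verified on a small example: for independent X and Y, the entropy of the joint variable is the sum of the entropies (helper names are my own):

```python
from math import log2

def entropy(dist):
    return -sum(p * log2(p) for p in dist if p > 0)

# X: a biased coin, Y: a fair three-sided die; Z = (X, Y) with
# Pr[Z = (x, y)] = Pr[X = x] * Pr[Y = y] by independence.
X = [0.25, 0.75]
Y = [1 / 3, 1 / 3, 1 / 3]
Z = [px * py for px in X for py in Y]

assert abs(entropy(Z) - (entropy(X) + entropy(Y))) < 1e-12
```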
Let q ∈ [0, 1] be such that nq is an integer in the range [0, n]. Then

2^{nH(q)} / (n + 1) ≤ (n choose nq) ≤ 2^{nH(q)}.
The claim holds if q = 0 or q = 1, so assume 0 < q < 1. By the binomial theorem we have

(n choose nq) q^{nq} (1 − q)^{(1−q)n} ≤ (q + (1 − q))^n = 1.

Since q^{−nq} (1 − q)^{−(1−q)n} = 2^{n(−q lg q − (1−q) lg(1−q))} = 2^{nH(q)}, we have

(n choose nq) ≤ q^{−nq} (1 − q)^{−(1−q)n} = 2^{nH(q)}.
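Both bounds of the lemma can be tested directly for small n; a sketch using Python's `math.comb`:

```python
from math import comb, log2

def H(q):
    """Binary entropy, with H(0) = H(1) = 0."""
    if q in (0, 1):
        return 0.0
    return -q * log2(q) - (1 - q) * log2(1 - q)

for n, k in [(10, 3), (20, 5), (30, 15), (40, 8)]:
    q = k / n                       # nq = k is an integer by construction
    lo = 2 ** (n * H(q)) / (n + 1)  # lower bound 2^{nH(q)} / (n + 1)
    hi = 2 ** (n * H(q))            # upper bound 2^{nH(q)}
    assert lo <= comb(n, k) <= hi
```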
Other direction...

1. μ(k) = (n choose k) q^k (1 − q)^{n−k}.
2. ∑_{i=0}^n (n choose i) q^i (1 − q)^{n−i} = ∑_{i=0}^n μ(i) = 1.
3. Claim: μ(nq) = (n choose nq) q^{nq} (1 − q)^{n−nq} is the largest term in ∑_{k=0}^n μ(k) = 1.
4. Δk = μ(k) − μ(k + 1) = (n choose k) q^k (1 − q)^{n−k} ( 1 − ((n − k)/(k + 1)) · (q/(1 − q)) ).
5. The sign of Δk is the sign of the last factor...
6. sign(Δk) = sign( 1 − ((n − k)q)/((k + 1)(1 − q)) ) = sign( ((k + 1)(1 − q) − (n − k)q) / ((k + 1)(1 − q)) ).
1. (k + 1)(1 − q) − (n − k)q = k + 1 − kq − q − nq + kq = 1 + k − q − nq.
2. ⇒ Δk ≥ 0 when k ≥ nq + q − 1, and Δk < 0 otherwise.
3. μ(k) = (n choose k) q^k (1 − q)^{n−k}.
4. μ(k) < μ(k + 1) for k < nq, and μ(k) ≥ μ(k + 1) for k ≥ nq.
5. ⇒ μ(nq) is the largest term in ∑_{k=0}^n μ(k) = 1.
6. μ(nq) is larger than the average term in this sum.
7. ⇒ μ(nq) = (n choose nq) q^{nq} (1 − q)^{n−nq} ≥ 1/(n + 1).
8. ⇒ (n choose nq) ≥ (1/(n + 1)) q^{−nq} (1 − q)^{−(n−nq)} = (1/(n + 1)) 2^{nH(q)}.
We have:
(i) q ∈ [0, 1/2] ⇒ (n choose ⌊nq⌋) ≤ 2^{nH(q)}.
(ii) q ∈ [1/2, 1] ⇒ (n choose ⌈nq⌉) ≤ 2^{nH(q)}.
(iii) q ∈ [1/2, 1] ⇒ 2^{nH(q)}/(n + 1) ≤ (n choose ⌊nq⌋).
(iv) q ∈ [0, 1/2] ⇒ 2^{nH(q)}/(n + 1) ≤ (n choose ⌈nq⌉).

The proof is straightforward but tedious.
1. We proved that (n choose nq) ≈ 2^{nH(q)}.
2. The estimate is loose.
3. Sanity check... (I) Take a sequence of n bits generated by a coin with probability q for heads. (II) By the Chernoff inequality... there are roughly nq heads in this sequence. (III) The generated sequence Y belongs to a set of ≈ (n choose nq) sequences... (IV) ...all of similar probability. (V) ⇒ H(Y) = nH(q) ≈ lg (n choose nq).
Given a coin C with:
p: the probability for heads.
q = 1 − p: the probability for tails.

Describe an algorithm!
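One classical answer to "describe an algorithm" (not developed on these slides) is von Neumann's trick: read the flips in pairs, output the first bit of each HT or TH pair, and discard HH and TT. Since Pr[HT] = Pr[TH] = p(1 − p), every emitted bit is fair regardless of the unknown bias. A sketch, with a seeded stream for a reproducible demonstration:

```python
import random

def von_neumann_extract(bits):
    """Turn flips of a coin with unknown bias into fair bits by
    keeping the first bit of each unequal pair."""
    return [b0 for b0, b1 in zip(bits[::2], bits[1::2]) if b0 != b1]

random.seed(0)                 # deterministic demo
p = 0.75                       # a badly biased coin
flips = [1 if random.random() < p else 0 for _ in range(100_000)]
fair = von_neumann_extract(flips)

print(len(fair) / (len(flips) // 2))  # pair survival rate, about 2p(1-p) = 0.375
print(sum(fair) / len(fair))          # close to 1/2
```

Note this extracts only about p(1 − p) bits per input flip, well below the ≈ H(p) bits per flip that the entropy bound promises; it is a simple baseline, not the optimal scheme.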
Entropy can be interpreted as the number of unbiased random coin flips that can be extracted from a random variable.

An extraction function Ext takes as input the value of a random variable X and outputs a sequence of bits y such that

Pr[ Ext(X) = y | |y| = k ] = 1/2^k, whenever Pr[|y| = k] > 0,

where |y| denotes the length of y.
1. X: a uniform random integer variable out of 0, . . . , 7.
2. Ext(X): the binary representation of X as a three-bit number.
3. This is a valid extraction function.
4. Another example of an extraction scheme:
   1. X: a uniform random integer variable over 0, . . . , 11.
   2. Ext(x): output the binary representation of x as a three-bit number if 0 ≤ x ≤ 7.
   3. What if x is between 8 and 11?
   4. Idea... output the binary representation of x − 8 as a two-bit number.
   5. A valid extractor... Pr[Ext(X) = y | |y| = 3] = 1/8 and Pr[Ext(X) = y | |y| = 2] = 1/4, as required.
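The 0, . . . , 11 scheme can be written out and its defining property checked exhaustively (the function name `ext` is my own):

```python
def ext(x):
    """Extraction function for X uniform over {0, ..., 11}."""
    if 0 <= x <= 7:
        return format(x, '03b')      # three-bit binary representation
    return format(x - 8, '02b')      # two-bit representation of x - 8

# X is uniform, so conditioned on |y| = k, each k-bit string should
# appear exactly once among the inputs, giving Pr = 1/2^k.
outputs = [ext(x) for x in range(12)]
three = [y for y in outputs if len(y) == 3]  # 8 distinct values -> 1/8 each
two = [y for y in outputs if len(y) == 2]    # 4 distinct values -> 1/4 each
```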
The following is obvious, but we provide a proof anyway.

Let x/y be a fraction such that x/y < 1. Then, for any i > 0, we have x/y < (x + i)/(y + i).

We need to prove that x(y + i) − (x + i)y < 0. The left side is equal to i(x − y), and since y > x (as x/y < 1), this quantity is negative, as required.
1. X: a random variable chosen uniformly at random from {0, . . . , m − 1}.
2. Then there is an extraction function for X that outputs, in expectation, at least ⌊lg m⌋ − 1 = ⌊H(X)⌋ − 1 independent and unbiased bits.
1. m: a sum of distinct powers of 2, namely m = ∑_i a_i 2^i, where a_i ∈ {0, 1}.
2. Example: m = 14 = 2^3 + 2^2 + 2^1. [Figure: {0, . . . , 13} split into blocks of sizes 8, 4, and 2.]
3. This decomposes {0, . . . , m − 1} into a disjoint union of blocks whose sizes are powers of 2.
4. If x is in a block of size 2^k, output its relative location in the block as a k-bit binary number.
5. Example: x = 10 falls into the block of size 2^2, namely {8, . . . , 11}... its relative location is 2. Output 2 written using two bits: "10".
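The block-decomposition extractor can be sketched for general m (helper names `blocks` and `block_extract` are mine), and checked against the x = 10 example:

```python
def blocks(m):
    """Split {0, ..., m-1} into blocks whose sizes are the powers of 2
    in the binary representation of m, largest first.
    Returns (start, k) pairs: the block covers [start, start + 2^k)."""
    out, start, bit = [], 0, m.bit_length() - 1
    while bit >= 0:
        if m >> bit & 1:
            out.append((start, bit))
            start += 1 << bit
        bit -= 1
    return out

def block_extract(x, m):
    """Output x's relative location in its block as a k-bit string."""
    for start, k in blocks(m):
        if x < start + (1 << k):
            return format(x - start, '0%db' % k) if k else ''
    raise ValueError('x out of range')

# m = 14 = 8 + 4 + 2: blocks {0..7}, {8..11}, {12, 13}.
# x = 10 lies in the size-4 block at relative location 2.
print(block_extract(10, 14))   # prints "10"
```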
1. This is a valid extractor...
2. The theorem holds if m is a power of two: there is only one block.
3. What if m is not a power of 2...
4. If X falls in a block of size 2^k, then the extractor outputs k completely random bits... the entropy of the output is k.
5. Let 2^k be the biggest block, so that 2^k < m < 2^{k+1}.
6. u = ⌊lg(m − 2^k)⌋: there must be a block of size 2^u in the decomposition of m.
7. Consider the two blocks in the decomposition of m of sizes 2^k and 2^u.
8. These are the two largest blocks...
9. 2^k + 2 · 2^u > m ⇒ 2^{u+1} + 2^k − m > 0.
10. Y: random variable = the number of bits output by the extractor.
1. By the lemma, since (m − 2^k)/m < 1 and 2^(u+1) + 2^k − m > 0:
   (m − 2^k)/m ≤ (m − 2^k + (2^(u+1) + 2^k − m)) / (m + (2^(u+1) + 2^k − m)) = 2^(u+1) / (2^(u+1) + 2^k).
2. By induction (the theorem is assumed to hold for all numbers smaller than m):
   E[Y] ≥ (2^k/m)·k + ((m − 2^k)/m)·(⌊lg(m − 2^k)⌋ − 1)
        = (2^k/m)·k + ((m − 2^k)/m)·(k − k + u − 1)   [since ⌊lg(m − 2^k)⌋ = u]
        = k + ((m − 2^k)/m)·(u − k − 1).
Sariel (UIUC) CS573 28 Fall 2014 28 / 30
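The fraction bound in step 1 can be checked exhaustively for small m. This is my own sanity test of the lemma step, not code from the lecture:

```python
def fraction_bound_holds(m):
    """Check (m - 2^k)/m <= 2^(u+1) / (2^(u+1) + 2^k) for a single m,
    where 2^k is the largest block and 2^u the second largest."""
    k = m.bit_length() - 1      # 2^k is the largest power of two <= m
    r = m - (1 << k)            # what remains after removing the largest block
    if r == 0:                  # m is a power of two: base case, bound unused
        return True
    u = r.bit_length() - 1      # second-largest block has size 2^u
    return r / m <= (1 << (u + 1)) / ((1 << (u + 1)) + (1 << k))

# Exhaustive check over small m:
assert all(fraction_bound_holds(m) for m in range(1, 1 << 12))
```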
1. We have:
   E[Y] ≥ k + ((m − 2^k)/m)·(u − k − 1)
        ≥ k + (2^(u+1)/(2^(u+1) + 2^k))·(u − k − 1)
        = k − (2^(u+1)/(2^(u+1) + 2^k))·(1 + k − u),
   since u − k − 1 ≤ 0 as k > u.
2. If u = k − 1, then E[Y] ≥ k − (1/2)·2 = k − 1, as required.
3. If u = k − 2, then E[Y] ≥ k − (1/3)·3 = k − 1.
Sariel (UIUC) CS573 29 Fall 2014 29 / 30
1. Recall E[Y] ≥ k − (2^(u+1)/(2^(u+1) + 2^k))·(1 + k − u), and u − k − 1 ≤ 0 as k > u.
2. If u < k − 2 then
   E[Y] ≥ k − (2^(u+1)/2^k)·(1 + k − u)
        = k − (k − u + 1)/2^(k−u−1)
        = k − (2 + (k − u − 1))/2^(k−u−1)
        ≥ k − 1,
   since (2 + i)/2^i ≤ 1 for i ≥ 2.
Sariel (UIUC) CS573 30 Fall 2014 30 / 30
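The theorem just proved, E[Y] ≥ ⌊lg m⌋ − 1, can also be verified directly, since E[Y] has a closed form for this extractor. A sanity check of my own (not from the lecture):

```python
def expected_output_bits(m):
    """Exact E[Y] for the block extractor on a uniform X in {0,...,m-1}:
    each set bit b of m contributes a block of 2^b values, each emitting b bits."""
    return sum((1 << b) * b for b in range(m.bit_length()) if (m >> b) & 1) / m

# The bound E[Y] >= floor(lg m) - 1, checked exhaustively for small m
# (m.bit_length() - 1 equals floor(lg m)):
assert all(expected_output_bits(m) >= m.bit_length() - 2 for m in range(1, 1 << 12))
```

For instance, m = 12 gives E[Y] = (8·3 + 4·2)/12 = 8/3, comfortably above ⌊lg 12⌋ − 1 = 2.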