Application of Information Theory, Lecture 1
Basic Definitions and Facts
Iftach Haitner, Tel Aviv University
October 28, 2014

The entropy function
The (Shannon) entropy of a random variable $X$ is $H(X) = -\sum_x P_X(x) \log P_X(x)$; throughout, $\log$ denotes $\log_2$.
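As a running companion to the lecture, here is a minimal Python sketch of the entropy function (the helper name `entropy` is ours, not from the slides); the later snippets reuse it:

```python
import math

def entropy(dist):
    """Shannon entropy H(p_1, ..., p_m) in bits; 0 * log 0 is taken as 0."""
    assert abs(sum(dist) - 1.0) < 1e-9, "probabilities must sum to 1"
    return -sum(p * math.log2(p) for p in dist if p > 0)

print(entropy([0.5, 0.25, 0.25]))  # 1.5
```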


Examples

1. $X \sim (\tfrac12, \tfrac14, \tfrac14)$ (i.e., for some distinct $x_1, x_2, x_3$: $P_X(x_1) = \tfrac12$, $P_X(x_2) = \tfrac14$, $P_X(x_3) = \tfrac14$):
   $H(X) = -\tfrac12 \log \tfrac12 - \tfrac14 \log \tfrac14 - \tfrac14 \log \tfrac14 = \tfrac12 + \tfrac14 \cdot 2 + \tfrac14 \cdot 2 = 1\tfrac12$.
2. In this case we write $H(X) = H(\tfrac12, \tfrac14, \tfrac14)$.
3. $X$ uniformly distributed over $\{0,1\}^n$: $H(X) = -\sum_{i=1}^{2^n} \tfrac{1}{2^n} \log \tfrac{1}{2^n} = -\log \tfrac{1}{2^n} = n$.
   - $n$ bits are needed to describe $X$
   - $n$ bits are needed to create $X$
4. $X = X_1, \dots, X_n$, where the $X_i$'s are i.i.d. over $\{0,1\}$ with $P_{X_i}(1) = \tfrac13$. $H(X) = {?}$ (see the numeric check after this list)
5. $X \sim (p, q)$ with $p + q = 1$:
   - $H(X) = H(p, q) = -p \log p - q \log q$
   - $H(1, 0) = H(0, 1) = 0$
   - $H(\tfrac12, \tfrac12) = 1$
   - $h(p) := H(p, 1-p)$ is continuous
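Continuing the Python sketch above, Examples 1, 3, and 4 can be checked numerically; for Example 4 we use the (here merely asserted, not yet derived) additivity of entropy over independent coordinates, which gives $H(X) = n \cdot h(\tfrac13)$:

```python
# Example 1: H(1/2, 1/4, 1/4) = 1.5 bits
print(entropy([1/2, 1/4, 1/4]))         # 1.5

# Example 3: X uniform over {0,1}^n has H(X) = n
n = 5
print(entropy([2 ** -n] * 2 ** n))      # 5.0

# Example 4: n i.i.d. bits with P(X_i = 1) = 1/3; by additivity over
# independent coordinates (asserted here), H(X) = n * h(1/3)
print(n * entropy([1/3, 2/3]))          # ~4.5915
```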

Axiomatic derivation of the entropy function

Any other choices for defining entropy? Shannon's function is the only symmetric function (over probability distributions) satisfying the following three axioms:

A1 Continuity: $H(p, 1-p)$ is a continuous function of $p$.
A2 Normalization: $H(\tfrac12, \tfrac12) = 1$.
A3 Grouping axiom: $H(p_1, p_2, \dots, p_m) = H(p_1 + p_2, p_3, \dots, p_m) + (p_1 + p_2) \, H\!\left(\tfrac{p_1}{p_1 + p_2}, \tfrac{p_2}{p_1 + p_2}\right)$

Why A3? Intuitively, merging two outcomes into one should lose exactly the uncertainty of telling them apart, weighted by the probability that the merged outcome occurs.

It is not hard to prove that Shannon's entropy function satisfies the above axioms; proving that it is the only such function is more challenging.

Let $H$ be a symmetric function satisfying the above axioms. We prove (assuming an additional axiom, A4 below) that $H$ is the Shannon function.
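A quick numeric sanity check that Shannon's function satisfies A3 (a sketch reusing the `entropy` helper from above; the check itself is not from the slides):

```python
import random

def check_grouping(p):
    """A3: H(p_1,...,p_m) = H(p_1+p_2, p_3,...,p_m)
                            + (p_1+p_2) * H(p_1/(p_1+p_2), p_2/(p_1+p_2))."""
    s = p[0] + p[1]
    lhs = entropy(p)
    rhs = entropy([s] + p[2:]) + s * entropy([p[0] / s, p[1] / s])
    return abs(lhs - rhs) < 1e-9

raw = [random.random() for _ in range(5)]
print(check_grouping([x / sum(raw) for x in raw]))  # True
```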

Generalization of the grouping axiom

Fix $p = (p_1, \dots, p_m)$ and let $S_k = \sum_{i=1}^{k} p_i$.

Grouping axiom: $H(p_1, p_2, \dots, p_m) = H(S_2, p_3, \dots, p_m) + S_2 \, H\!\left(\tfrac{p_1}{S_2}, \tfrac{p_2}{S_2}\right)$.

Claim 1 (Generalized grouping axiom)
$H(p_1, p_2, \dots, p_m) = H(S_k, p_{k+1}, \dots, p_m) + S_k \cdot H\!\left(\tfrac{p_1}{S_k}, \dots, \tfrac{p_k}{S_k}\right)$

Proof: Let $h(q) = H(q, 1-q)$; note that $H\!\left(\tfrac{p_1}{S_2}, \tfrac{p_2}{S_2}\right) = h\!\left(\tfrac{p_2}{S_2}\right)$, since the two entries sum to 1. Applying the grouping axiom repeatedly:

$H(p_1, p_2, \dots, p_m) = H(S_2, p_3, \dots, p_m) + S_2 \, h\!\left(\tfrac{p_2}{S_2}\right) \qquad (1)$
$\quad = H(S_3, p_4, \dots, p_m) + S_3 \, h\!\left(\tfrac{p_3}{S_3}\right) + S_2 \, h\!\left(\tfrac{p_2}{S_2}\right)$
$\quad \;\vdots$
$\quad = H(S_k, p_{k+1}, \dots, p_m) + \sum_{i=2}^{k} S_i \, h\!\left(\tfrac{p_i}{S_i}\right)$

Hence, applying the same expansion to the distribution $\left(\tfrac{p_1}{S_k}, \dots, \tfrac{p_k}{S_k}\right)$:

$H\!\left(\tfrac{p_1}{S_k}, \dots, \tfrac{p_k}{S_k}\right) = H\!\left(\tfrac{S_{k-1}}{S_k}, \tfrac{p_k}{S_k}\right) + \sum_{i=2}^{k-1} \tfrac{S_i}{S_k} \, h\!\left(\tfrac{p_i / S_k}{S_i / S_k}\right) = \tfrac{1}{S_k} \sum_{i=2}^{k} S_i \, h\!\left(\tfrac{p_i}{S_i}\right) \qquad (2)$

The claim follows by combining equations (1) and (2).
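Claim 1 can likewise be spot-checked numerically (again reusing `entropy`; a sketch, not part of the slides):

```python
def check_claim1(p, k):
    """Claim 1: H(p) = H(S_k, p_{k+1},...,p_m) + S_k * H(p_1/S_k,...,p_k/S_k)."""
    S_k = sum(p[:k])
    lhs = entropy(p)
    rhs = entropy([S_k] + p[k:]) + S_k * entropy([p_i / S_k for p_i in p[:k]])
    return abs(lhs - rhs) < 1e-9

p = [0.1, 0.2, 0.15, 0.25, 0.3]
print(all(check_claim1(p, k) for k in range(2, len(p))))  # True
```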

Further generalization of the grouping axiom

Let $1 = k_1 < k_2 < \dots < k_q < m$ and let $C_t = \sum_{i=k_t}^{k_{t+1}-1} p_i$ (letting $k_{q+1} = m + 1$).

Claim 2 (Generalized++ grouping axiom)
$H(p_1, p_2, \dots, p_m) = H(C_1, \dots, C_q) + C_1 \cdot H\!\left(\tfrac{p_1}{C_1}, \dots, \tfrac{p_{k_2 - 1}}{C_1}\right) + \dots + C_q \cdot H\!\left(\tfrac{p_{k_q}}{C_q}, \dots, \tfrac{p_m}{C_q}\right)$

Proof: Follows from the generalized grouping axiom (Claim 1) and the symmetry of $H$.

Implication: Let $f(m) = H(\tfrac1m, \dots, \tfrac1m)$ ($m$ entries). Then (checked numerically in the sketch below):
- $f(3^2) = 2 f(3) = 2 H(\tfrac13, \tfrac13, \tfrac13)$, and iterating, $f(3^n) = n f(3)$.
- $f(mn) = f(m) + f(n)$ $\Longrightarrow$ $f(m^k) = k f(m)$.
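These identities are easy to confirm for Shannon's function, which the axioms are meant to characterize (the axiomatic argument above derives them without assuming the formula); a small check reusing `entropy`:

```python
def f(m):
    """f(m) = H(1/m, ..., 1/m); equals log2(m) for Shannon's function."""
    return entropy([1.0 / m] * m)

print(abs(f(9) - 2 * f(3)) < 1e-9)       # f(3^2) = 2 f(3)
print(abs(f(6) - (f(2) + f(3))) < 1e-9)  # f(mn)  = f(m) + f(n)
print(abs(f(8) - 3 * f(2)) < 1e-9)       # f(m^k) = k f(m)
```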

$f(m) = \log m$

We give a proof under the additional axiom

A4 Monotonicity: $f(m) < f(m+1)$

(you can Google for a proof using only A1–A3).

- For $n \in \mathbb{N}$ let $k = \lfloor n \log 3 \rfloor$, so that $2^k < 3^n < 2^{k+1}$ (strictly, since $\log 3$ is irrational).
- By A4, $f(2^k) < f(3^n) < f(2^{k+1})$.
- By the grouping axiom, $f(2^k) = k f(2) = k$ (using A2) and $f(3^n) = n f(3)$; hence $k < n f(3) < k + 1$.
- $\Longrightarrow \tfrac{\lfloor n \log 3 \rfloor}{n} < f(3) < \tfrac{\lfloor n \log 3 \rfloor + 1}{n}$ for every $n \in \mathbb{N}$
- $\Longrightarrow f(3) = \log 3$, since both bounds converge to $\log 3$.
- The proof extends to any integer (not only 3).
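The sandwich bound is easy to watch converge numerically; both sides close in on $\log 3 \approx 1.58496$ (a sketch, reusing the `math` import from above):

```python
# k/n < f(3) < (k+1)/n with k = floor(n * log2(3)); both bounds -> log2(3)
for n in (1, 10, 100, 1000):
    k = math.floor(n * math.log2(3))
    print(n, k / n, (k + 1) / n)  # brackets tighten around ~1.5849625
```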

$H(p, q) = -p \log p - q \log q$

- For rational $p, q$, write $p = \tfrac{k}{m}$ and $q = \tfrac{m-k}{m}$, where $m$ is the least common denominator.
- By the grouping axiom (split $m$ equiprobable outcomes into a group of $k$ and a group of $m - k$): $f(m) = H(p, q) + p \cdot f(k) + q \cdot f(m - k)$.
- Hence (using $p + q = 1$),
  $H(p, q) = \log m - p \log k - q \log (m - k)$
  $= p (\log m - \log k) + q (\log m - \log (m - k))$
  $= -p \log \tfrac{k}{m} - q \log \tfrac{m-k}{m} = -p \log p - q \log q$.
- By the continuity axiom, this holds for every $p, q$.
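A numeric check of the rational-case identity (our sketch, reusing `entropy` and `math`):

```python
# For p = k/m, q = (m-k)/m: H(p, q) = log m - p log k - q log(m - k)
k, m = 3, 8
p, q = k / m, (m - k) / m
lhs = entropy([p, q])
rhs = math.log2(m) - p * math.log2(k) - q * math.log2(m - k)
print(abs(lhs - rhs) < 1e-9)  # True
```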

$H(p_1, p_2, \dots, p_m) = -\sum_{i=1}^{m} p_i \log p_i$

We prove the case $m = 3$; the proof for arbitrary $m$ follows the same lines.

- For rational $p_1, p_2, p_3$, write $p_1 = \tfrac{k_1}{m}$, $p_2 = \tfrac{k_2}{m}$ and $p_3 = \tfrac{k_3}{m}$, where $m = k_1 + k_2 + k_3$ is the least common denominator.
- By Claim 2, $f(m) = H(p_1, p_2, p_3) + p_1 f(k_1) + p_2 f(k_2) + p_3 f(k_3)$.
- Hence,
  $H(p_1, p_2, p_3) = \log m - p_1 \log k_1 - p_2 \log k_2 - p_3 \log k_3$
  $= -p_1 \log \tfrac{k_1}{m} - p_2 \log \tfrac{k_2}{m} - p_3 \log \tfrac{k_3}{m}$
  $= -p_1 \log p_1 - p_2 \log p_2 - p_3 \log p_3$.
- By the continuity axiom, this holds for every $p_1, p_2, p_3$.
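And the same check for three outcomes:

```python
# H(k1/m, k2/m, k3/m) = log m - sum_i p_i log k_i, where m = k1 + k2 + k3
k1, k2, k3 = 2, 3, 5
m = k1 + k2 + k3
ps = [k1 / m, k2 / m, k3 / m]
rhs = math.log2(m) - sum(p * math.log2(k) for p, k in zip(ps, (k1, k2, k3)))
print(abs(entropy(ps) - rhs) < 1e-9)  # True
```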

$0 \le H(p_1, \dots, p_m) \le \log m$

- The bounds are tight:
  - $H(p_1, \dots, p_m) = 0$ for $(p_1, \dots, p_m) = (1, 0, \dots, 0)$.
  - $H(p_1, \dots, p_m) = \log m$ for $(p_1, \dots, p_m) = (\tfrac1m, \dots, \tfrac1m)$.
- Non-negativity is clear.
- A function $f$ is concave if for all $t_1, t_2$ and $\lambda \in [0, 1]$: $\lambda f(t_1) + (1 - \lambda) f(t_2) \le f(\lambda t_1 + (1 - \lambda) t_2)$.
  $\Longrightarrow$ (by induction) for all $t_1, \dots, t_k$ and $\lambda_1, \dots, \lambda_k \in [0, 1]$ with $\sum_i \lambda_i = 1$: $\sum_i \lambda_i f(t_i) \le f\!\left(\sum_i \lambda_i t_i\right)$
  $\Longrightarrow$ (Jensen's inequality): $\mathbb{E}[f(X)] \le f(\mathbb{E}[X])$ for any random variable $X$.
- $\log(x)$ is (strictly) concave for $x > 0$, since its second derivative $(-\tfrac{1}{x^2})$ is always negative.
- Hence, $H(p_1, \dots, p_m) = \sum_i p_i \log \tfrac{1}{p_i} \le \log \sum_i p_i \cdot \tfrac{1}{p_i} = \log m$.
- Alternatively, for $X$ over $\{1, \dots, m\}$: $H(X) = \mathbb{E}_X \log \tfrac{1}{P_X(X)} \le \log \mathbb{E}_X \tfrac{1}{P_X(X)} = \log m$.
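A brute-force check of the bounds on random distributions (reusing `entropy`, `math`, and `random` from above; our sketch):

```python
for _ in range(1000):
    m = random.randint(2, 10)
    raw = [random.random() for _ in range(m)]
    p = [x / sum(raw) for x in raw]
    assert -1e-9 <= entropy(p) <= math.log2(m) + 1e-9
print("0 <= H(p) <= log m held on all samples")
```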

$H(g(X)) \le H(X)$

Let $X$ be a random variable, and let $g$ be a function defined over $\mathrm{Supp}(X) := \{x : P_X(x) > 0\}$.

- $H(Y = g(X)) \le H(X)$.

Proof: Group the elements of $\mathrm{Supp}(X)$ according to their image under $g$. By Claim 2 (and the symmetry of $H$),
$H(X) = H(g(X)) + \sum_y P_{g(X)}(y) \cdot H\!\left(\left(\tfrac{P_X(x)}{P_{g(X)}(y)}\right)_{x : g(x) = y}\right) \ge H(g(X))$,
since every entropy term is non-negative.
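A quick check of the claim (the `pushforward` helper is our addition, not from the slides):

```python
from collections import defaultdict

def pushforward(px, g):
    """Distribution of Y = g(X), given the distribution of X as a dict."""
    py = defaultdict(float)
    for x, p in px.items():
        py[g(x)] += p
    return py

px = {0: 0.1, 1: 0.2, 2: 0.3, 3: 0.4}
py = pushforward(px, lambda x: x % 2)  # g collapses four values to two
print(entropy(list(py.values())) <= entropy(list(px.values())))  # True
```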
