


Application of Information Theory, Lecture 1

Basic Definitions and Facts

Iftach Haitner

Tel Aviv University.

October 28, 2014


The entropy function

X — a discrete random variable (taking finitely many values) over X, with probability mass function p = pX. The entropy of X is defined by:

H(X) := −∑_{x∈X} Pr[X = x] · log₂ Pr[X = x], taking 0 log 0 = 0.

◮ H(X) = −∑_x p(x) log p(x) = E_X[log(1/p(X))] = E_{Y=p(X)}[log(1/Y)]
◮ H(X) was introduced by Shannon as a measure of the uncertainty in X — the number of bits required to describe X, the information we do not have about X.
◮ When the natural logarithm is used, the unit is called a nat ("natural").
◮ Entropy is a function of p (sometimes referred to as H(p)).
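The definition translates directly into a few lines of code. Below is a minimal sketch (not part of the lecture); the function name `entropy` and the use of Python's standard `math` module are choices made here for illustration.

```python
import math

def entropy(probs, base=2.0):
    """Shannon entropy: -sum_x p(x) * log_base p(x), taking 0*log 0 = 0."""
    assert abs(sum(probs) - 1.0) < 1e-9, "probabilities must sum to 1"
    return -sum(p * math.log(p, base) for p in probs if p > 0)

# A fair coin carries exactly 1 bit; in natural-log units the same value is ~0.693 nats.
print(entropy([0.5, 0.5]))           # 1.0
print(entropy([0.5, 0.5], math.e))   # 0.6931...
```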

Examples

1. X ∼ (1/2, 1/4, 1/4):
   (i.e., for some distinct x1, x2, x3: PX(x1) = 1/2, PX(x2) = 1/4, PX(x3) = 1/4)

   H(X) = −(1/2) log(1/2) − (1/4) log(1/4) − (1/4) log(1/4) = 1/2 + (1/4)·2 + (1/4)·2 = 1 1/2.

2. H(X) = H(1/2, 1/4, 1/4).

3. X is uniformly distributed over {0, 1}^n:

   H(X) = −∑_{i=1}^{2^n} (1/2^n) log(1/2^n) = −log(1/2^n) = n.

   ◮ n bits are needed to describe X
   ◮ n bits are needed to create X

4. X = X1, . . . , Xn, where the Xi's are iid over {0, 1} with PXi(1) = 1/3. H(X) = ?

5. X ∼ (p, q), p + q = 1:

   ◮ H(X) = H(p, q) = −p log p − q log q
   ◮ H(1, 0) = H(0, 1) = 0
   ◮ H(1/2, 1/2) = 1
   ◮ h(p) := H(p, 1 − p) is continuous
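The example values can be verified numerically. A small sketch (not from the slides; the helper `entropy` and the choice n = 4 are for illustration):

```python
import math

def entropy(probs):
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Examples 1-2: H(1/2, 1/4, 1/4) = 1.5 bits.
print(entropy([1/2, 1/4, 1/4]))        # 1.5

# Example 3: uniform over {0,1}^n has entropy n bits (here n = 4).
n = 4
print(entropy([1 / 2**n] * 2**n))      # 4.0

# Example 5: the binary entropy function h(p) = H(p, 1 - p).
h = lambda p: entropy([p, 1 - p])
print(h(1.0), h(1/2), h(1/3))          # 0.0, 1.0, ~0.918

# Example 4 asks for H(X) with X = (X1, ..., Xn) iid, Pr[Xi = 1] = 1/3; by additivity
# of entropy over independent coordinates (a standard fact, not derived on this slide),
# it equals n * h(1/3).
```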

Axiomatic derivation of the entropy function

Any other choices for defining entropy? Shannon's function is the only symmetric function (over probability distributions) satisfying the following three axioms:

A1 Continuity: H(p, 1 − p) is a continuous function of p.
A2 Normalization: H(1/2, 1/2) = 1.
A3 Grouping axiom: H(p1, p2, . . . , pm) = H(p1 + p2, p3, . . . , pm) + (p1 + p2) · H(p1/(p1 + p2), p2/(p1 + p2)).

Why A3?

It is not hard to prove that Shannon's entropy function satisfies the above axioms; proving that it is the only such function is more challenging. Let H be a symmetric function satisfying the above axioms. We prove (assuming an additional axiom) that H is the Shannon function.
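As a quick sanity check (not part of the slides), one can verify numerically that the Shannon function satisfies A2 and the grouping axiom A3; the distribution below is an arbitrary choice.

```python
import math

def H(probs):
    return -sum(p * math.log2(p) for p in probs if p > 0)

p1, p2, rest = 0.1, 0.2, [0.3, 0.4]
s = p1 + p2

lhs = H([p1, p2] + rest)                         # H(p1, p2, p3, ..., pm)
rhs = H([s] + rest) + s * H([p1 / s, p2 / s])    # A3: group p1 and p2
print(abs(lhs - rhs) < 1e-9)                     # True
print(H([0.5, 0.5]))                             # A2: equals 1
```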

Generalization of the grouping axiom

Fix p = (p1, . . . , pm) and let Sk = ∑_{i=1}^{k} pi.

Grouping axiom: H(p1, p2, . . . , pm) = H(S2, p3, . . . , pm) + S2 · H(p1/S2, p2/S2).

Claim 1 (Generalized grouping axiom)
H(p1, p2, . . . , pm) = H(Sk, pk+1, . . . , pm) + Sk · H(p1/Sk, . . . , pk/Sk)

Proof: Let h(q) = H(q, 1 − q).

H(p1, p2, . . . , pm) = H(S2, p3, . . . , pm) + S2 · h(p2/S2)                       (1)
                      = H(S3, p4, . . . , pm) + S3 · h(p3/S3) + S2 · h(p2/S2)
                      . . .
                      = H(Sk, pk+1, . . . , pm) + ∑_{i=2}^{k} Si · h(pi/Si)

Hence,

H(p1/Sk, . . . , pk/Sk) = H(Sk−1/Sk, pk/Sk) + ∑_{i=2}^{k−1} (Si/Sk) · h((pi/Sk)/(Si/Sk))
                        = (1/Sk) · ∑_{i=2}^{k} Si · h(pi/Si)                        (2)

Claim follows by combining the above equations.
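A numerical check of Claim 1 (a sketch; the distribution and the choice k = 3 are arbitrary):

```python
import math

def H(probs):
    return -sum(p * math.log2(p) for p in probs if p > 0)

p = [0.05, 0.15, 0.2, 0.25, 0.35]
k = 3
Sk = sum(p[:k])

lhs = H(p)
rhs = H([Sk] + p[k:]) + Sk * H([pi / Sk for pi in p[:k]])
print(abs(lhs - rhs) < 1e-9)   # True
```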

Further generalization of the grouping axiom

Let 1 = k1 < k2 < . . . < kq < m and let Ct = ∑_{i=kt}^{kt+1−1} pi (letting kq+1 = m + 1).

Claim 2 (Generalized++ grouping axiom)
H(p1, p2, . . . , pm) = H(C1, . . . , Cq) + C1 · H(p1/C1, . . . , pk2−1/C1) + . . . + Cq · H(pkq/Cq, . . . , pm/Cq)

Proof: Follows from the generalized grouping axiom (Claim 1) and the symmetry of H.

Implication: Let f(m) = H(1/m, . . . , 1/m) (m terms).

◮ f(3^2) = 2f(3) = 2H(1/3, 1/3, 1/3)
  ⇒ f(3^n) = nf(3).
◮ f(mn) = f(m) + f(n)
  ⇒ f(m^k) = kf(m)
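For the Shannon function, f(m) = H(1/m, . . . , 1/m) works out to log₂ m, and the implications above can be checked directly (a sketch; the small arguments are arbitrary choices):

```python
import math

def f(m):
    # f(m) = H(1/m, ..., 1/m) for the Shannon function.
    return -sum((1 / m) * math.log2(1 / m) for _ in range(m))

print(abs(f(9) - 2 * f(3)) < 1e-9)        # f(3^2) = 2 f(3)
print(abs(f(6) - (f(2) + f(3))) < 1e-9)   # f(mn) = f(m) + f(n)
print(abs(f(8) - 3 * f(2)) < 1e-9)        # f(m^k) = k f(m)
```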

f(m) = log m

We give a proof under the additional axiom A4: f(m) < f(m + 1) (you can Google for a proof using only A1–A3).

◮ For n ∈ N let k = ⌊n log 3⌋.
◮ By A4, f(2^k) < f(3^n) < f(2^(k+1)).
◮ By the grouping axiom, k < nf(3) < k + 1.
  ⇒ ⌊n log 3⌋/n < f(3) < (⌊n log 3⌋ + 1)/n for any n ∈ N
  ⇒ f(3) = log 3.
◮ The proof extends to any integer (not only 3).
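The sandwich bound is easy to see numerically (a sketch): both endpoints squeeze toward log₂ 3 ≈ 1.585 as n grows.

```python
import math

for n in (1, 10, 100, 1000):
    k = math.floor(n * math.log2(3))
    print(n, k / n, (k + 1) / n)   # lower and upper bounds on f(3) = log2(3) ~ 1.58496
```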

H(p, q) = −p log p − q log q

◮ For rational p, q, let p = k/m and q = (m − k)/m, where m is the smallest common denominator.
◮ By the grouping axiom, f(m) = H(p, q) + p · f(k) + q · f(m − k).
◮ Hence,
  H(p, q) = log m − p log k − q log(m − k)
          = p(log m − log k) + q(log m − log(m − k))
          = −p log(k/m) − q log((m − k)/m)
          = −p log p − q log q
◮ By the continuity axiom, this holds for every p, q.
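A concrete rational instance (a sketch; k = 3, m = 8 is an arbitrary choice) checking the grouping step f(m) = H(p, q) + p·f(k) + q·f(m − k), with f(m) = log₂ m as derived above:

```python
import math

def H(probs):
    return -sum(p * math.log2(p) for p in probs if p > 0)

k, m = 3, 8
p, q = k / m, (m - k) / m
f = math.log2                     # f(n) = log2(n), established on the previous slide

lhs = f(m)
rhs = H([p, q]) + p * f(k) + q * f(m - k)
print(abs(lhs - rhs) < 1e-9)      # True
```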

H(p1, p2, . . . , pm) = −∑_{i=1}^{m} pi log pi

We prove it for m = 3; the proof for arbitrary m follows the same lines.

◮ For rational p1, p2, p3, let p1 = k1/m, p2 = k2/m and p3 = k3/m, where m = k1 + k2 + k3 is the smallest common denominator.
◮ f(m) = H(p1, p2, p3) + p1 · f(k1) + p2 · f(k2) + p3 · f(k3)
◮ Hence,
  H(p1, p2, p3) = log m − p1 log k1 − p2 log k2 − p3 log k3
                = −p1 log(k1/m) − p2 log(k2/m) − p3 log(k3/m)
                = −p1 log p1 − p2 log p2 − p3 log p3
◮ By the continuity axiom, this holds for every p1, p2, p3.
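The same identity for three outcomes, checked on an arbitrary rational distribution (a sketch): log m − ∑ pi log ki indeed equals −∑ pi log pi.

```python
import math

ks = [2, 3, 5]                # k1, k2, k3
m = sum(ks)                   # the common denominator
ps = [k / m for k in ks]      # p1, p2, p3

lhs = math.log2(m) - sum(p * math.log2(k) for p, k in zip(ps, ks))
rhs = -sum(p * math.log2(p) for p in ps)
print(abs(lhs - rhs) < 1e-9)   # True
```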

0 ≤ H(p1, . . . , pm) ≤ log m

◮ Tight bounds:
◮ H(p1, . . . , pm) = 0 for (p1, . . . , pm) = (1, 0, . . . , 0).
◮ H(p1, . . . , pm) = log m for (p1, . . . , pm) = (1/m, . . . , 1/m).
◮ Non-negativity is clear.
◮ A function f is concave if for all t1, t2 and λ ∈ [0, 1]:
  λf(t1) + (1 − λ)f(t2) ≤ f(λt1 + (1 − λ)t2)
  ⇒ (by induction) for all t1, . . . , tk and λ1, . . . , λk ∈ [0, 1] with ∑_i λi = 1:
  ∑_i λi f(ti) ≤ f(∑_i λi ti)
  ⇒ (Jensen's inequality): E[f(X)] ≤ f(E[X]) for any random variable X.
◮ log(x) is (strictly) concave for x > 0, since its second derivative (−1/x²) is always negative.
◮ Hence, H(p1, . . . , pm) = ∑_i pi log(1/pi) ≤ log(∑_i pi · (1/pi)) = log m.
◮ Alternatively, for X over {1, . . . , m}: H(X) = E_X[log(1/PX(X))] ≤ log(E_X[1/PX(X)]) = log m.
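A sketch illustrating the bounds for m = 4 (the distributions are arbitrary choices): every entropy value stays within [0, log₂ 4] = [0, 2], with the extremes attained at the point mass and the uniform distribution.

```python
import math

def H(probs):
    return -sum(p * math.log2(p) for p in probs if p > 0)

m = 4
for probs in ([1, 0, 0, 0], [0.7, 0.1, 0.1, 0.1], [0.25] * m):
    print(probs, H(probs), 0 <= H(probs) <= math.log2(m) + 1e-9)
```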

H(g(X)) ≤ H(X)

Let X be a random variable, and let g be a function defined over Supp(X) := {x : PX(x) > 0}.

◮ H(Y = g(X)) ≤ H(X).

Proof:
H(X) = −∑_x PX(x) log PX(x)
     = −∑_y ∑_{x : g(x)=y} PX(x) log PX(x)
     ≥ −∑_y PY(y) · max_{x : g(x)=y} log PX(x)
     ≥ −∑_y PY(y) log PY(y) = H(Y)

◮ If g is injective, then H(Y) = H(X).
  Proof: PX(X) = PY(Y).
◮ If g is non-injective (over Supp(X)), then H(Y) < H(X).
  Proof: ?
◮ H(X) = H(2X).
◮ H(cos(X)) < H(X), if 0, 2π ∈ Supp(X).
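A sketch comparing H(X) with H(g(X)) for a non-injective g (here X is uniform on {0, 1, 2, 3} and g(x) = x mod 2; both are choices made for illustration):

```python
import math
from collections import Counter

def H(probs):
    return -sum(p * math.log2(p) for p in probs if p > 0)

px = {0: 0.25, 1: 0.25, 2: 0.25, 3: 0.25}   # X uniform over {0, 1, 2, 3}
g = lambda x: x % 2                          # non-injective on Supp(X)

py = Counter()
for x, p in px.items():
    py[g(x)] += p                            # P_Y(y) = sum of P_X(x) over x with g(x) = y

print(H(px.values()), H(py.values()))        # 2.0 and 1.0, so H(g(X)) < H(X)
```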

Notation

◮ [n] = {1, . . . , n}
◮ PX(x) = Pr[X = x]
◮ Supp(X) := {x : PX(x) > 0}
◮ For a random variable X over X, let p(x) be its probability mass function: p(x) = PX(x). In other words, X ∼ p(x).
◮ For a random variable Y over Y, let p(y) be its probability mass function: p(y) = PY(y)...