
Chapter 3 Asymptotic Equipartition Property

Peng-Hua Wang

Graduate Inst. of Comm. Engineering National Taipei University

Peng-Hua Wang, April 2, 2012 Information Theory, Chap. 3 - p. 2/17

Chapter Outline

  • Chap. 3 Asymptotic Equipartition Property
    – 3.1 Asymptotic Equipartition Property Theorem
    – 3.2 Consequences of the AEP: Data Compression
    – 3.3 High-Probability Sets and the Typical Set


3.1 Asymptotic Equipartition Property Theorem


Definition of convergence

Given a sequence of random variables X1, X2, . . ., we say that the sequence converges to a random variable X

■ In probability if for every ε > 0,

lim_{n→∞} Pr{ |Xn − X| > ε } = 0

or, equivalently,

lim_{n→∞} Pr{ |Xn − X| < ε } = 1


Definition of convergence

■ In mean square if

lim_{n→∞} E[|Xn − X|^2] = 0

■ With probability 1 (also called almost surely) if

Pr{ lim_{n→∞} Xn = X } = 1

Weak law of large numbers

For i.i.d. random variables X1, X2, . . . , Xn with common mean m, we have

(1/n) Σ_{i=1}^n Xi → m

in probability. That is, for any ε > 0,

lim_{n→∞} Pr{ |(1/n) Σ_{i=1}^n Xi − m| > ε } = 0
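The weak law is easy to check numerically. The sketch below is an illustrative choice (Bernoulli(0.3) variables, helper names ours): it estimates how often the sample mean lands within ε of the true mean, a hit rate that should approach 1 as n grows.

```python
import random

random.seed(0)

def sample_mean(n, p=0.3):
    """Mean of n i.i.d. Bernoulli(p) draws."""
    return sum(random.random() < p for _ in range(n)) / n

def hit_rate(n, eps=0.05, trials=200, p=0.3):
    """Empirical Pr{ |(1/n) sum Xi - m| < eps } over repeated experiments."""
    hits = sum(abs(sample_mean(n, p) - p) < eps for _ in range(trials))
    return hits / trials
```

With these parameters hit_rate(10) sits well below 1, while hit_rate(5000) is essentially 1, matching the limit statement above.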

AEP

Theorem 3.1.1 (AEP) If X1, X2, . . . are i.i.d. ∼ p(x), then

−(1/n) log p(X1, X2, . . . , Xn) → H(X)

in probability.

Proof. Let Zi = − log p(Xi). The Zi are i.i.d. random variables with

E[Zi] = − Σ_x p(x) log p(x) = H(X).

By the weak law of large numbers,

(1/n) Σ_i Zi → H(X) in probability

⇒ −(1/n) Σ_i log p(Xi) → H(X) in probability

⇒ −(1/n) log p(X1, X2, . . . , Xn) → H(X) in probability,

where the last step uses independence: log p(X1, X2, . . . , Xn) = Σ_i log p(Xi). □
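A quick numerical illustration of the theorem (not from the slides; Bernoulli(0.2) is an arbitrary choice): for a long i.i.d. sample, −(1/n) log2 p(X1, . . . , Xn) should be close to H(X) ≈ 0.722 bits.

```python
import math
import random

random.seed(1)
p = 0.2
H = -p * math.log2(p) - (1 - p) * math.log2(1 - p)  # entropy of Bernoulli(0.2)

def per_symbol_log_prob(n):
    """-(1/n) log2 p(X1, ..., Xn) for one i.i.d. Bernoulli(p) sample path."""
    xs = [random.random() < p for _ in range(n)]
    log_p = sum(math.log2(p) if x else math.log2(1 - p) for x in xs)
    return -log_p / n
```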


Interpretation of AEP

■ When n is sufficiently large, p(X1, X2, . . . , Xn) ≈ 2^{−nH(X)} with high probability.

■ For example, let Xi be binary random variables with P[Xi = 1] = p and P[Xi = 0] = 1 − p = q. If X1, X2, . . . , Xn are i.i.d.,

p(X1, X2, . . . , Xn) = p^{Σ Xi} q^{n − Σ Xi}.

When n → ∞,

p(X1, X2, . . . , Xn) ≈ p^{np} q^{nq} = 2^{−nH},

since the number of 1's in the sequence is close to np with high probability, and all such sequences have roughly the same probability 2^{−nH}.


Interpretation of AEP

■ Thus for large n we can divide the sequences (x1, x2, . . . , xn) into two types: the typical sequences, each with probability roughly 2^{−nH}, and the non-typical sequences, consisting of all the others.


Typical set

Definition (Typical set) The typical set A_ε^{(n)} with respect to p(x) is the set of sequences (x1, x2, . . . , xn) ∈ X^n with the property

2^{−n(H(X)+ε)} ≤ p(x1, x2, . . . , xn) ≤ 2^{−n(H(X)−ε)}.

Theorem 3.1.2

1. If (x1, x2, . . . , xn) ∈ A_ε^{(n)}, then

H(X) − ε ≤ −(1/n) log p(x1, x2, . . . , xn) ≤ H(X) + ε.

Proof. Take logarithms in the defining inequalities of the typical set and divide by −n. □
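For a small alphabet and short blocks, the typical set can be enumerated directly. The sketch below (illustrative parameters p = 0.3, n = 10, ε = 0.2; not from the slides) builds A_ε^{(n)} straight from its definition; the bounds in parts 1, 3, and 4 of Theorem 3.1.2 can then be checked by inspection, though part 2 generally needs larger n.

```python
import math
from itertools import product

p, n, eps = 0.3, 10, 0.2                 # illustrative parameters
H = -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def prob(seq):
    """p(x1, ..., xn) for an i.i.d. Bernoulli(p) source."""
    return math.prod(p if x else 1 - p for x in seq)

# A sequence is typical iff 2^{-n(H+eps)} <= p(x^n) <= 2^{-n(H-eps)}.
lo, hi = 2 ** (-n * (H + eps)), 2 ** (-n * (H - eps))
typical = [s for s in product((0, 1), repeat=n) if lo <= prob(s) <= hi]
```

Here membership depends only on the number of 1's in the sequence (each 1 contributes −log2 p and each 0 contributes −log2 q to −log2 p(x^n)), so the typical set is a union of whole type classes.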


Theorems

Theorem 3.1.2

2. Pr{A_ε^{(n)}} > 1 − ε for n sufficiently large.

Proof. This follows directly from Theorem 3.1.1: convergence in probability means that for any δ > 0 and n sufficiently large,

Pr{ |−(1/n) log p(X1, X2, . . . , Xn) − H(X)| < ε } > 1 − δ.

Setting δ = ε, we obtain the desired result. □


Theorems

Theorem 3.1.2

3. |A_ε^{(n)}| ≤ 2^{n(H(X)+ε)}, where |A| denotes the number of elements in the set A.

Proof.

1 = Σ_{x ∈ X^n} p(x) ≥ Σ_{x ∈ A_ε^{(n)}} p(x) ≥ Σ_{x ∈ A_ε^{(n)}} 2^{−n(H(X)+ε)} = 2^{−n(H(X)+ε)} |A_ε^{(n)}| □


Theorems

Theorem 3.1.2

4. |A_ε^{(n)}| ≥ (1 − ε) 2^{n(H(X)−ε)} for n sufficiently large.

Proof. For n sufficiently large, Pr{A_ε^{(n)}} > 1 − ε, so that

1 − ε < Pr{A_ε^{(n)}} ≤ Σ_{x ∈ A_ε^{(n)}} 2^{−n(H(X)−ε)} = 2^{−n(H(X)−ε)} |A_ε^{(n)}| □


3.2 Consequences of the AEP: Data Compression


Typical set and source coding

■ There are |X|^n elements in the whole set X^n.

■ There are |A_ε^{(n)}| ≈ 2^{n(H+ε)} elements in the typical set. We need at most n(H + ε) + 1 bits to index these elements (the +1 covers rounding up to an integer), and one additional bit to indicate that the sequence is typical.

■ There are |X|^n − |A_ε^{(n)}| elements in the non-typical set. We can use n log |X| + 1 bits to index them, and one additional bit to indicate that the sequence is non-typical.
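The resulting length assignment can be written down directly. A minimal sketch, assuming the two-part scheme above (one flag bit plus an index rounded up to whole bits; the function name is ours):

```python
import math

def codeword_length(is_typical, n, H, eps, alphabet_size):
    """Length in bits of the two-part code: 1 flag bit + an integer index."""
    if is_typical:
        return 1 + math.ceil(n * (H + eps))             # index into the typical set
    return 1 + math.ceil(n * math.log2(alphabet_size))  # raw index into X^n
```

The scheme only pays off when H + ε is well below log |X|; for a binary source with H + ε close to 1, indexing the typical set saves almost nothing over sending the sequence raw.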


Average length of codeword

E[l(X^n)] = Σ_{x^n} p(x^n) l(x^n)

= Σ_{x^n ∈ A_ε^{(n)}} p(x^n) l(x^n) + Σ_{x^n ∈ (A_ε^{(n)})^c} p(x^n) l(x^n)

≤ Σ_{x^n ∈ A_ε^{(n)}} p(x^n) [n(H + ε) + 2] + Σ_{x^n ∈ (A_ε^{(n)})^c} p(x^n) [n log |X| + 2]

= Pr{A_ε^{(n)}} [n(H + ε) + 2] + Pr{(A_ε^{(n)})^c} [n log |X| + 2]

≤ n(H + ε) + εn log |X| + 2 = n(H + ε′),

where the last inequality uses Pr{(A_ε^{(n)})^c} ≤ ε for n sufficiently large, and

ε′ = ε + ε log |X| + 2/n.
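The bound can be verified exactly for a small source by enumerating X^n. The sketch below uses an illustrative ternary distribution (0.7, 0.2, 0.1) with n = 10 and ε = 0.2 (all parameter choices are ours): it computes the true expected length of the two-part code and compares it with n(H + ε′).

```python
import math
from itertools import product

pr = (0.7, 0.2, 0.1)            # illustrative source distribution
n, eps = 10, 0.2
H = -sum(q * math.log2(q) for q in pr)
log_alpha = math.log2(len(pr))  # log |X|

len_typ = 1 + math.ceil(n * (H + eps))   # flag bit + typical-set index
len_non = 1 + math.ceil(n * log_alpha)   # flag bit + raw index

# Exact expectation: weight each sequence's length by its probability.
lo, hi = 2 ** (-n * (H + eps)), 2 ** (-n * (H - eps))
expected = 0.0
for s in product(range(3), repeat=n):
    ps = math.prod(pr[x] for x in s)
    expected += ps * (len_typ if lo <= ps <= hi else len_non)

eps_prime = eps + eps * log_alpha + 2 / n
```

With these numbers len_typ < len_non, and expected stays below n(H + ε′), consistent with the derivation above.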


Theorems

Theorem 3.2.1 Let X^n be i.i.d. ∼ p(x) and let ε > 0. Then there exists a code that maps sequences x^n of length n into binary strings such that the mapping is one-to-one (and therefore invertible) and

E[(1/n) l(X^n)] ≤ H(X) + ε

for n sufficiently large.