SLIDE 1

Chapter 4 Entropy Rates of a Stochastic Process

Peng-Hua Wang

Graduate Inst. of Comm. Engineering National Taipei University

SLIDE 2

Chapter Outline

  • Chap. 4 Entropy Rates of a Stochastic Process

    4.1 Markov Chains
    4.2 Entropy Rate
    4.3 Example: Entropy Rate of a Random Walk on a Weighted Graph
    4.4 Second Law of Thermodynamics
    4.5 Functions of Markov Chains

SLIDE 3

4.1 Markov Chains

SLIDE 4

Stationary

Definition (Stationary) A stochastic process is said to be stationary if

Pr{X1 = x1, X2 = x2, . . . , Xn = xn} = Pr{X1+ℓ = x1, X2+ℓ = x2, . . . , Xn+ℓ = xn}

for every n and every shift ℓ.

■ The joint distribution of any subset of the sequence of random variables is invariant with respect to shifts in the time index.

SLIDE 5

Markov chain

Definition (Markov chain) A discrete stochastic process X1, X2, . . . is said to be a Markov chain or a Markov process if for n = 1, 2, . . . ,

Pr{Xn+1 = xn+1|Xn = xn, Xn−1 = xn−1, . . . , X1 = x1} = Pr{Xn+1 = xn+1|Xn = xn}.

■ The joint pmf can be written as

p(x1, x2, . . . , xn) = p(x1)p(x2|x1)p(x3|x2) · · · p(xn|xn−1).

Definition (Time invariant) The Markov chain is said to be time invariant if the transition probability p(xn+1|xn) does not depend on n; that is,

Pr{Xn+1 = b|Xn = a} = Pr{X2 = b|X1 = a}

for all a, b ∈ X.
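As a small illustration of this factorization (the two-state chain, its numbers, and the function name joint_pmf are all hypothetical, used only for the demo), the Python sketch below multiplies p(x1) by the one-step transition probabilities and checks that the probabilities of all length-4 paths sum to 1.

import itertools

# Hypothetical two-state chain used only for illustration.
p0 = [0.5, 0.5]                  # initial distribution p(x1)
P = [[0.9, 0.1],                 # P[i][j] = Pr{X_{n+1} = j | X_n = i}
     [0.4, 0.6]]

def joint_pmf(seq):
    """p(x1,...,xn) = p(x1) p(x2|x1) ... p(xn|x_{n-1}) for a Markov chain."""
    prob = p0[seq[0]]
    for a, b in zip(seq, seq[1:]):
        prob *= P[a][b]
    return prob

# Probability of one particular sample path.
print(joint_pmf((0, 0, 1, 1)))   # 0.5 * 0.9 * 0.1 * 0.6

# Sanity check: the probabilities of all length-4 paths sum to 1.
total = sum(joint_pmf(seq) for seq in itertools.product([0, 1], repeat=4))
print(abs(total - 1.0) < 1e-12)  # True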

SLIDE 6

Markov chain

■ We will assume that the Markov chain is time invariant.
■ Xn is called the state at time n.
■ A time invariant Markov chain is characterized by its initial state and a probability transition matrix P = [Pij], i, j ∈ {1, 2, . . . , m}, where

  Pij = Pr{Xn+1 = j|Xn = i}.

■ The pmf at time n + 1 is

  p(xn+1) = Σ_{xn} p(xn) P_{xn, xn+1}.

■ A distribution on the states such that the distribution at time n + 1 is the same as the distribution at time n is called a stationary distribution.
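As a sketch of this update rule (the 3-state transition matrix below is an arbitrary illustration, not taken from the slides): iterating p ← pP from any starting distribution converges to a stationary distribution when the chain is irreducible and aperiodic, which is one simple way to find it numerically.

# Hypothetical 3-state transition matrix; each row sums to 1.
P = [[0.5, 0.3, 0.2],
     [0.1, 0.8, 0.1],
     [0.2, 0.2, 0.6]]

def step(mu, P):
    """One update: mu_next[j] = sum_i mu[i] * P[i][j]."""
    m = len(P)
    return [sum(mu[i] * P[i][j] for i in range(m)) for j in range(m)]

mu = [1.0, 0.0, 0.0]              # arbitrary initial distribution
for _ in range(200):              # iterate p <- pP until it settles
    mu = step(mu, P)

print([round(x, 4) for x in mu])            # approximate stationary distribution
print([round(x, 4) for x in step(mu, P)])   # one more step leaves it unchanged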

SLIDE 7

Example 4.1.1

Consider a two-state Markov chain with a probability transition matrix

P = [ 1 − α     α   ]
    [   β     1 − β ]

Find its stationary distribution and entropy.

Solution. Let µ1, µ2 be the stationary distribution. Then

µ1 = µ1(1 − α) + µ2β
µ2 = µ1α + µ2(1 − β)

and

µ1 + µ2 = 1.

Solving gives µ1 = β/(α + β) and µ2 = α/(α + β).
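A quick numerical check of this solution (the parameter values are arbitrary; the last line previews the Section 4.2 quantity H(X2|X1) = µ1H(α) + µ2H(β), with H(·) the binary entropy function):

from math import log2

alpha, beta = 0.2, 0.6            # arbitrary example parameters

# Stationary distribution from mu = mu P together with mu1 + mu2 = 1.
mu1 = beta / (alpha + beta)
mu2 = alpha / (alpha + beta)

# Check the balance equations on the slide.
print(abs(mu1 - (mu1 * (1 - alpha) + mu2 * beta)) < 1e-12)      # True
print(abs(mu2 - (mu1 * alpha + mu2 * (1 - beta))) < 1e-12)      # True

def h(p):
    """Binary entropy in bits."""
    return 0.0 if p in (0.0, 1.0) else -p * log2(p) - (1 - p) * log2(1 - p)

# Entropy rate H(X2|X1) of the stationary chain (previewing Section 4.2).
print(mu1 * h(alpha) + mu2 * h(beta))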

SLIDE 8

4.2 Entropy Rate

SLIDE 9

Entropy Rate

Definition (Entropy Rate) The entropy rate of a random process {Xi} is defined by

H(X) = lim_{n→∞} (1/n) H(X1, X2, . . . , Xn).

Definition (Conditional Entropy Rate) The conditional entropy rate of a random process {Xi} is defined by

H′(X) = lim_{n→∞} H(Xn|X1, X2, . . . , Xn−1).

SLIDE 10

Entropy Rate

■ If X1, X2, . . . are i.i.d. random variables, then

  H(X) = lim_{n→∞} H(X1, X2, . . . , Xn)/n = lim_{n→∞} nH(X1)/n = H(X1).

■ If X1, X2, . . . are independent but not identically distributed, then

  H(X) = lim_{n→∞} (1/n) Σ_{i=1}^{n} H(Xi).

■ We can choose a sequence of distributions on X1, X2, . . . such that the limit does not exist (see the sketch below).
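One standard way to make the limit fail, sketched here as a hypothetical construction (not necessarily the one the textbook has in mind): let the Xi be independent bits whose per-symbol entropies H(Xi) equal 1 on blocks of exponentially growing length and 0 on the blocks in between, so the running average (1/n) Σ H(Xi) keeps oscillating.

# Independent bits: H(X_i) = 1 on odd-numbered blocks and 0 on even-numbered
# blocks, with block lengths 1, 2, 4, 8, ...  (hypothetical construction).
def running_averages(num_blocks):
    entropies, value, length = [], 1, 1
    for _ in range(num_blocks):
        entropies.extend([value] * length)
        value, length = 1 - value, 2 * length
    averages, total = [], 0.0
    for i, h in enumerate(entropies, start=1):
        total += h
        averages.append(total / i)
    return averages

avgs = running_averages(16)
# The running average keeps swinging between about 1/3 and 2/3,
# so lim (1/n) * sum_i H(X_i) does not exist.
print(min(avgs[1000:]), max(avgs[1000:]))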

SLIDE 11

Entropy Rate

Theorem 4.2.2 For a stationary stochastic process, H(Xn|Xn−1, . . . , X1) is nonincreasing in n and has a limit H′(X).

Proof.

H(Xn+1|X1, X2, . . . , Xn) ≤ H(Xn+1|X2, . . . , Xn)    (conditioning reduces entropy)
                           = H(Xn|X1, . . . , Xn−1)    (stationarity)

Since H(Xn|Xn−1, . . . , X1) is nonnegative and nonincreasing, it has a limit H′(X).

SLIDE 12

Entropy Rate

Theorem 4.2.1 For a stationary stochastic process, both H(X) and H′(X) exist and are equal: H(X) = H′(X).

Proof. By the chain rule,

(1/n) H(X1, X2, . . . , Xn) = (1/n) Σ_{i=1}^{n} H(Xi|Xi−1, . . . , X1),

that is, the entropy rate is the time average of the conditional entropies. Since the conditional entropies have a limit H′(X), the entropy rate has the same limit by the Cesàro mean theorem.
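A numerical check of this equality for the two-state chain of Example 4.1.1 (brute force over all length-n paths, so only small n are feasible; the parameter values are arbitrary): the sketch below computes (1/n)H(X1, . . . , Xn) and H(Xn|Xn−1, . . . , X1) and shows both settling on the same value.

import itertools
from math import log2

alpha, beta = 0.2, 0.6
P = [[1 - alpha, alpha], [beta, 1 - beta]]
mu = [beta / (alpha + beta), alpha / (alpha + beta)]   # start in stationarity

def joint_entropy(n):
    """H(X1,...,Xn) in bits, by brute force over all 2^n paths."""
    H = 0.0
    for seq in itertools.product([0, 1], repeat=n):
        p = mu[seq[0]]
        for a, b in zip(seq, seq[1:]):
            p *= P[a][b]
        if p > 0:
            H -= p * log2(p)
    return H

for n in range(2, 13):
    Hn, Hn_1 = joint_entropy(n), joint_entropy(n - 1)
    # per-symbol entropy (1/n)H(X1..Xn) vs conditional entropy H(Xn|X1..Xn-1)
    print(n, round(Hn / n, 4), round(Hn - Hn_1, 4))
# Both columns approach the same entropy rate H(X) = H'(X).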

SLIDE 13

Cesàro mean

Theorem (Cesàro mean) If an → a and bn = (1/n) Σ_{i=1}^{n} ai, then bn → a.

Proof. Let ε > 0. Since an → a, there exists a number N such that |an − a| ≤ ε for all n > N. Hence, for n > N,

|bn − a| = |(1/n) Σ_{i=1}^{n} (ai − a)|
         ≤ (1/n) Σ_{i=1}^{n} |ai − a|
         ≤ (1/n) Σ_{i=1}^{N} |ai − a| + ((n − N)/n) ε
         ≤ (1/n) Σ_{i=1}^{N} |ai − a| + ε
         ≤ 2ε

when n is large enough, since the first term tends to 0 as n → ∞.
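A tiny numerical illustration of the theorem (the sequence an = a + 1/n is an arbitrary choice): the running averages bn converge to the same limit a, just more slowly than an itself.

a = 2.0
# a_n -> a, and b_n = (1/n) * sum_{i=1}^{n} a_i -> a as well.
a_seq = [a + 1.0 / n for n in range(1, 100001)]

total, b_seq = 0.0, []
for i, an in enumerate(a_seq, start=1):
    total += an
    b_seq.append(total / i)

print(a_seq[-1] - a)   # about 1e-5: a_n is already very close to a
print(b_seq[-1] - a)   # about 1.2e-4: b_n converges too, just more slowly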