Average Redundancy of the Shannon Code for Markov Sources - PowerPoint PPT Presentation



SLIDE 1

Average Redundancy of the Shannon Code for Markov Sources

Neri Merhav and Wojciech Szpankowski Technion and Purdue University May 27, 2013

NSF STC Center for Science of Information

AofA, Menorca 2013 Dedicated to PHILIPPE FLAJOLET

SLIDE 2

Outline

  • 1. Source Coding
  • 2. Redundancy: Known Sources
  • 3. Shannon and Huffman Redundancy for Memoryless Sources
  • 4. Shannon Coding Redundancy for Markov Sources
SLIDE 3

Source Coding

A source code is a bijective mapping C : A∗ → {0, 1}∗ from sequences over the alphabet A to set {0, 1}∗ of binary sequences. The basic problem of source coding (i.e., data compression) is to find codes with shortest descriptions (lengths) either on average or for individual sequences. Three Basic Types of Source Coding:

  • Fixed-to-Variable (FV) length codes (e.g., Huffman and Shannon codes).
  • Variable-to-Fixed (VF) length codes (e.g., Tunstall and Khodak codes).
  • Variable-to-Variable (VV) length codes (e.g., Khodak VV code).
SLIDE 4

Prefix Codes

A prefix code is one in which no codeword is a prefix of another codeword. We write: $P(x)$ for the probability of $x \in A^*$; $L(C, x)$ for the code length of the source sequence $x \in A^*$; $H(P) = -\sum_{x \in A^*} P(x) \lg P(x)$ for the entropy.

Kraft's Inequality: A binary code is a prefix code iff the code lengths $\ell_1, \ell_2, \ldots, \ell_N$ satisfy

$$\sum_{i=1}^{N} 2^{-\ell_i} \le 1.$$

Shannon First Theorem: For any prefix code the average code length $E[L(C, X)]$ cannot be smaller than the entropy of the source $H(P)$, that is, $E[L(C, X)] \ge H(P)$.

Exercise: There exists at least one sequence $\tilde{x}_1^n$ such that $L(\tilde{x}_1^n) \ge -\log_2 P(\tilde{x}_1^n)$.
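Kraft's inequality is easy to exercise in code: the Shannon lengths $\ell(x) = \lceil -\lg P(x) \rceil$ always satisfy it, since $2^{-\lceil -\lg P(x) \rceil} \le P(x)$. A minimal sketch (the distribution below is an arbitrary illustrative choice, not from the slides):

```python
import math

def shannon_lengths(probs):
    """Shannon code lengths: l(x) = ceil(-lg P(x))."""
    return [math.ceil(-math.log2(p)) for p in probs]

def kraft_sum(lengths):
    """Left-hand side of Kraft's inequality: sum_i 2^{-l_i}."""
    return sum(2.0 ** -l for l in lengths)

probs = [0.4, 0.3, 0.2, 0.1]        # illustrative distribution
lengths = shannon_lengths(probs)     # [2, 2, 3, 4]
assert kraft_sum(lengths) <= 1.0     # a prefix code with these lengths exists
```

For a dyadic distribution such as $(1/2, 1/4, 1/4)$ the Kraft sum equals exactly 1 and the Shannon lengths are optimal.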

SLIDE 5

Redundancy

Known source $P$. The pointwise redundancy $R(x)$ and the average redundancy $\bar{R}$:

$$R(x) = L(C, x) + \lg P(x), \qquad \bar{R} = E[L(C, X)] - H(P) \ge 0.$$

Optimal code:

$$\min_L \sum_x L(x) P(x) \quad \text{subject to} \quad \sum_x 2^{-L(x)} \le 1.$$

Solution: By Lagrange multipliers we find $L_{\mathrm{opt}}(x) = -\lg P(x)$. The smaller the redundancy is, the better (closer to the optimal) the code is.
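Since the Shannon lengths round $-\lg P(x)$ up by less than one bit, the average redundancy they induce always lies in $[0, 1)$; a small sketch (distribution illustrative):

```python
import math

def avg_redundancy(probs):
    """Average redundancy E[L] - H(P) of the Shannon code for `probs`."""
    H = -sum(p * math.log2(p) for p in probs)                  # entropy
    EL = sum(p * math.ceil(-math.log2(p)) for p in probs)      # mean Shannon length
    return EL - H

r = avg_redundancy([0.4, 0.3, 0.2, 0.1])
assert 0.0 <= r < 1.0   # the ceiling adds strictly less than one bit per symbol
```

For a dyadic distribution the redundancy vanishes, since then $-\lg P(x)$ is already an integer.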

SLIDE 6

Outline Update

  • 1. Source Coding
  • 2. Redundancy: Known Sources
  • 3. Shannon and Huffman Redundancy for Memoryless Sources
  • 4. Shannon Coding Redundancy for Markov Sources
SLIDE 7

Redundancy for Huffman’s Code

We consider fixed-to-variable length codes: Shannon and Huffman codes. For a known source $P$, we consider fixed-length sequences $x_1^n = x_1 \ldots x_n$.

Huffman Code: The following optimization problem

$$\bar{R}_n = \min_{C_n \in \mathcal{C}} E_{x_1^n}\left[ L(C_n, x_1^n) + \log_2 P(x_1^n) \right]$$

is solved by Huffman's code. We first review the average redundancy for a binary memoryless source with $p$ denoting the probability of generating "0" and $q = 1 - p$. In 1994 Stubley proposed the following for Huffman's average redundancy:

$$\bar{R}_n^H = 2 - \sum_{k=0}^{n} \binom{n}{k} p^k q^{n-k} \langle \alpha k + \beta n \rangle - 2 \sum_{k=0}^{n} \binom{n}{k} p^k q^{n-k}\, 2^{-\langle \alpha k + \beta n \rangle} + o(1),$$

where

$$\alpha = \log_2 \frac{1-p}{p}, \qquad \beta = \log_2 \frac{1}{1-p},$$

and $\langle x \rangle = x - \lfloor x \rfloor$ is the fractional part of $x$.
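The exact quantity that Stubley's expression approximates can be computed for small $n$ by building the Huffman code over all $2^n$ binary blocks; a brute-force sketch using the standard sum-of-merged-weights trick (block length and $p$ are illustrative):

```python
import heapq
import math
from itertools import count

def huffman_avg_length(probs):
    """Average codeword length E[L] of a Huffman code for `probs`.

    E[L] equals the sum of all merged node weights in the Huffman construction.
    """
    ids = count()                        # tiebreaker: never compare equal floats further
    heap = [(p, next(ids)) for p in probs]
    heapq.heapify(heap)
    total = 0.0
    while len(heap) > 1:
        p1, _ = heapq.heappop(heap)
        p2, _ = heapq.heappop(heap)
        total += p1 + p2                 # every leaf under this merge gains one bit
        heapq.heappush(heap, (p1 + p2, next(ids)))
    return total

def huffman_block_redundancy(n, p):
    """Exact average redundancy of the Huffman code over all 2^n binary blocks."""
    q = 1.0 - p
    probs = [p**k * q**(n - k)
             for k in range(n + 1) for _ in range(math.comb(n, k))]
    entropy = n * (-p * math.log2(p) - q * math.log2(q))   # block entropy of an iid source
    return huffman_avg_length(probs) - entropy

r = huffman_block_redundancy(10, 1 / math.pi)   # 1024 leaves: fast
assert 0.0 < r < 0.15   # oscillates around 3/2 - 1/ln 2 ≈ 0.0573 (cf. Figure 1a)
```

The asserted range follows from Gallager's bound on Huffman redundancy, not from the asymptotic theorem itself.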
SLIDE 8

Main Result

Theorem 1 (W.S., 2000). Consider the Huffman block code of length $n$ over a binary memoryless source with $p < \frac{1}{2}$. Then as $n \to \infty$

$$\bar{R}_n^H = \begin{cases} \dfrac{3}{2} - \dfrac{1}{\ln 2} + o(1) \approx 0.057304, & \alpha \text{ irrational}, \\[2ex] \dfrac{3}{2} - \dfrac{1}{M}\left( \langle M n \beta \rangle - \dfrac{1}{2} \right) - \dfrac{1}{M(1 - 2^{-1/M})}\, 2^{-\langle M n \beta \rangle / M} + O(\rho^n), & \alpha = \dfrac{N}{M}, \end{cases}$$

where $N, M$ are integers such that $\gcd(N, M) = 1$ and $\rho < 1$.

Figure 1: The average redundancy of Huffman codes versus block size $n$ for: (a) irrational $\alpha = \log_2((1-p)/p)$ with $p = 1/\pi$; (b) rational $\alpha = \log_2((1-p)/p)$ with $p = 1/9$.
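A convenient consistency check between Stubley's expression and the rational branch of Theorem 1: for $p = 1/9$ we get $\alpha = \log_2 8 = 3$, an integer ($M = 1$), so $\langle \alpha k + \beta n \rangle = \langle \beta n \rangle$ for every $k$ and both reduce to the same closed form $2 - \langle \beta n \rangle - 2^{1 - \langle \beta n \rangle}$. A sketch of that check (block sizes arbitrary):

```python
import math

def frac(x):
    return x - math.floor(x)

def stubley_expression(n, p):
    """Stubley's two binomial sums combined as 2 - S1 - 2*S2 (the o(1) term dropped)."""
    q = 1.0 - p
    alpha = math.log2(q / p)
    beta = math.log2(1.0 / q)
    s1 = s2 = 0.0
    for k in range(n + 1):
        # binomial pmf in log space to stay safe for large n
        logw = (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
                + k * math.log(p) + (n - k) * math.log(q))
        w = math.exp(logw)
        t = frac(alpha * k + beta * n)
        s1 += w * t
        s2 += w * 2.0 ** (-t)
    return 2.0 - s1 - 2.0 * s2

p = 1.0 / 9.0                         # alpha = log2(8) = 3, i.e. N/M = 3/1, M = 1
beta = math.log2(1.0 / (1.0 - p))
for n in (50, 64, 75):
    b = frac(beta * n)
    closed_form = 2.0 - b - 2.0 ** (1.0 - b)   # Theorem 1, rational branch with M = 1
    assert abs(stubley_expression(n, p) - closed_form) < 1e-9
```

This is a formula-versus-formula identity for integer $\alpha$, not a proof of the asymptotics; the oscillation in $n$ comes entirely from $\langle \beta n \rangle$.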

SLIDE 9

Why Two Modes: Shannon Code

Consider the Shannon code that assigns the length

$$L(C_n^S, x_1^n) = \lceil -\lg P(x_1^n) \rceil$$

to the source sequence $x_1^n$. Observe that

$$P(x_1^n) = p^k (1-p)^{n-k},$$

where $p$ is the known probability of generating "0" and $k$ is the number of 0s. The Shannon code redundancy is

$$\bar{R}_n^S = \sum_{k=0}^{n} \binom{n}{k} p^k (1-p)^{n-k} \left( \lceil -\log_2(p^k (1-p)^{n-k}) \rceil + \log_2(p^k (1-p)^{n-k}) \right) = 1 - \sum_{k=0}^{n} \binom{n}{k} p^k (1-p)^{n-k} \langle \alpha k + \beta n \rangle,$$

where $\langle x \rangle = x - \lfloor x \rfloor$ is the fractional part of $x$, and

$$\alpha = \log_2 \frac{1-p}{p}, \qquad \beta = \log_2 \frac{1}{1-p}.$$
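The ceiling form and the fractional-part form of $\bar{R}_n^S$ above can be checked against each other directly ($n$ and $p$ below are illustrative):

```python
import math

def shannon_redundancy_ceil(n, p):
    """R_n^S from the definition: E[ceil(-lg P) + lg P]."""
    q = 1.0 - p
    total = 0.0
    for k in range(n + 1):
        w = math.comb(n, k) * p**k * q**(n - k)
        u = -(k * math.log2(p) + (n - k) * math.log2(q))   # -lg P(x) for k zeros
        total += w * (math.ceil(u) - u)
    return total

def shannon_redundancy_frac(n, p):
    """R_n^S via 1 - sum_k C(n,k) p^k q^(n-k) <alpha k + beta n>."""
    q = 1.0 - p
    alpha = math.log2(q / p)
    beta = math.log2(1.0 / q)
    s = sum(math.comb(n, k) * p**k * q**(n - k)
            * ((alpha * k + beta * n) % 1.0) for k in range(n + 1))
    return 1.0 - s

n, p = 12, 0.3
assert abs(shannon_redundancy_ceil(n, p) - shannon_redundancy_frac(n, p)) < 1e-6
```

The two agree because $\lceil u \rceil - u = 1 - \langle u \rangle$ whenever $u = \alpha k + \beta n$ is not an integer.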
SLIDE 10

Sketch of Proof

We need to understand the asymptotic behavior of the following sum (cf. Bernoulli distributed sequences modulo 1)

$$\sum_{k=0}^{n} \binom{n}{k} p^k (1-p)^{n-k} f(\langle \alpha k + y \rangle)$$

for fixed $p$ and some Riemann integrable function $f : [0, 1] \to \mathbb{R}$.

Lemma 1. Let $0 < p < 1$ be a fixed real number and $\alpha$ be an irrational number. Then for every Riemann integrable function $f : [0, 1] \to \mathbb{R}$

$$\lim_{n \to \infty} \sum_{k=0}^{n} \binom{n}{k} p^k (1-p)^{n-k} f(\langle \alpha k + y \rangle) = \int_0^1 f(t)\, dt,$$

where the convergence is uniform for all shifts $y \in \mathbb{R}$.

Lemma 2. Let $\alpha = \frac{N}{M}$ be a rational number with $\gcd(N, M) = 1$. Then for every bounded function $f : [0, 1] \to \mathbb{R}$

$$\sum_{k=0}^{n} \binom{n}{k} p^k (1-p)^{n-k} f(\langle \alpha k + y \rangle) = \frac{1}{M} \sum_{l=0}^{M-1} f\!\left( \frac{l}{M} + \frac{\langle M y \rangle}{M} \right) + O(\rho^n)$$

uniformly for all $y \in \mathbb{R}$ and some $\rho < 1$.
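Lemma 2 admits a quick spot check: with $\alpha = 1/2$ (so $N = 1$, $M = 2$), $y = 0$, and $f(t) = t$, the right-hand side is $(f(0) + f(1/2))/2 = 1/4$, and the binomial sum should be exponentially close to it (parameter values below are illustrative):

```python
import math

def binom_frac_sum(n, p, alpha, y, f):
    """sum_k C(n,k) p^k q^(n-k) f(<alpha k + y>)."""
    q = 1.0 - p
    return sum(math.comb(n, k) * p**k * q**(n - k) * f((alpha * k + y) % 1.0)
               for k in range(n + 1))

n, p = 200, 0.3
s = binom_frac_sum(n, p, alpha=0.5, y=0.0, f=lambda t: t)
# <k/2> is 1/2 for odd k and 0 for even k, and P(k odd) = (1 - (q-p)^n)/2 -> 1/2,
# so the sum converges to 1/4 at a geometric rate (here rho = |q - p| = 0.4).
assert abs(s - 0.25) < 1e-12
```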

SLIDE 11

Shannon Redundancy – Rational Case

Assume $\alpha = N/M$ where $\gcd(N, M) = 1$. Denote $p_{n,k} = \binom{n}{k} p^k q^{n-k}$. Then

$$S_n = \sum_{k=0}^{n} \binom{n}{k} p^k q^{n-k} \left\langle k \frac{N}{M} + \beta n \right\rangle = \sum_{\ell=0}^{M-1} \sum_{m:\, k = \ell + mM \le n} p_{n,k} \left\langle \ell \frac{N}{M} + \beta n \right\rangle = \sum_{\ell=0}^{M-1} \left\langle \frac{\ell}{M} + \beta n \right\rangle \sum_{m:\, k = \ell + mM \le n} p_{n,k},$$

where the last step uses that $\ell \mapsto \ell N \bmod M$ is a bijection on $\{0, \ldots, M-1\}$ since $\gcd(N, M) = 1$.

Lemma 3. For fixed $\ell < M$ and $M$, there exists $\rho < 1$ such that

$$\sum_{m:\, k = \ell + mM \le n} \binom{n}{k} p^k (1-p)^{n-k} = \frac{1}{M} + O(\rho^n).$$

Proof. Let $\omega_k = e^{2\pi i k / M}$ for $k = 0, 1, \ldots, M-1$ be the $M$th roots of unity, so that

$$\frac{1}{M} \sum_{k=0}^{M-1} \omega_k^n = \begin{cases} 1 & \text{if } M \mid n, \\ 0 & \text{otherwise}, \end{cases}$$

where $M \mid n$ means that $M$ divides $n$. Then

$$\sum_{m:\, k = \ell + mM \le n} \binom{n}{k} p^k q^{n-k} = \frac{1}{M} \sum_{r=0}^{M-1} \omega_r^{-\ell} (p\,\omega_r + q)^n = \frac{1}{M} + O(\rho^n),$$

since $|p\,\omega_r + q| = \sqrt{p^2 + q^2 + 2pq \cos(2\pi r/M)} < 1$ for $r \ne 0$.
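The roots-of-unity filter used in the proof of Lemma 3 is easy to verify numerically: the mass of $k \equiv \ell \pmod M$ under a binomial distribution equals $(1/M) \sum_r \omega_r^{-\ell} (p\,\omega_r + q)^n$ exactly, and is $1/M$ up to an exponentially small term (parameter choices illustrative):

```python
import cmath
import math

def residue_mass_direct(n, p, M, ell):
    """P(k ≡ ell mod M) for k ~ Binomial(n, p), by direct summation."""
    q = 1.0 - p
    return sum(math.comb(n, k) * p**k * q**(n - k)
               for k in range(ell, n + 1, M))

def residue_mass_filter(n, p, M, ell):
    """Same quantity via the filter (1/M) sum_r w_r^(-ell) (p w_r + q)^n."""
    q = 1.0 - p
    total = 0j
    for r in range(M):
        w = cmath.exp(2j * cmath.pi * r / M)
        total += w ** (-ell) * (p * w + q) ** n
    return (total / M).real

n, p, M = 100, 0.3, 5
for ell in range(M):
    d = residue_mass_direct(n, p, M, ell)
    f = residue_mass_filter(n, p, M, ell)
    assert abs(d - f) < 1e-9          # exact identity, up to float rounding
    assert abs(d - 1.0 / M) < 1e-3    # 1/M + O(rho^n) with rho < 1
```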

SLIDE 12

Finishing the Rational Case

We shall use the following Fourier series; for real $x$,

$$\langle x \rangle = \frac{1}{2} - \sum_{m=1}^{\infty} \frac{\sin 2\pi m x}{\pi m} = \frac{1}{2} - \sum_{m \in \mathbb{Z} \setminus \{0\}} c_m e^{2\pi i m x}, \qquad c_m = -\frac{i}{2\pi m}.$$

Continuing the derivation and using the above lemma we obtain

$$S_n = \frac{1}{M} \sum_{\ell=0}^{M-1} \left( \frac{1}{2} - \sum_{m \ne 0} c_m e^{2\pi i m (\ell/M + \beta n)} \right) = \frac{1}{2} - \sum_{m \ne 0} c_m e^{2\pi i m \beta n}\, \frac{1}{M} \sum_{\ell=0}^{M-1} e^{2\pi i m \ell / M} = \frac{1}{2} - \frac{1}{M} \sum_{m = kM \ne 0} c_{kM}\, e^{2\pi i k M \beta n} \cdot M = \frac{1}{2} - \frac{1}{M} \left( \frac{1}{2} - \langle M \beta n \rangle \right),$$

since the inner sum over $\ell$ vanishes unless $M \mid m$, and $c_{kM} = c_k / M$.
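The final identity $S_n = \frac{1}{2} - \frac{1}{M}\left(\frac{1}{2} - \langle M \beta n \rangle\right)$ can be tested with a source whose $\alpha$ is rational but not an integer: taking $p = 1/(1 + \sqrt{2})$ gives $q/p = \sqrt{2}$, hence $\alpha = 1/2$, i.e. $N = 1$, $M = 2$ (an illustrative choice):

```python
import math

def frac(x):
    return x - math.floor(x)

# p chosen so that alpha = log2(q/p) = 1/2 exactly (q/p = sqrt(2))
p = 1.0 / (1.0 + math.sqrt(2.0))
q = 1.0 - p
alpha, M = 0.5, 2
beta = math.log2(1.0 / q)

def S_direct(n):
    """S_n = sum_k C(n,k) p^k q^(n-k) <alpha k + beta n>, computed directly."""
    return sum(math.comb(n, k) * p**k * q**(n - k) * frac(alpha * k + beta * n)
               for k in range(n + 1))

def S_closed(n):
    """The closed form S_n = 1/2 - (1/M)(1/2 - <M beta n>)."""
    return 0.5 - (0.5 - frac(M * beta * n)) / M

for n in (100, 150, 200):
    assert abs(S_direct(n) - S_closed(n)) < 1e-6   # O(rho^n) term is negligible here
```

Here $\rho = |q - p| = (\sqrt{2} - 1)^2 \approx 0.17$, so the correction term is far below the tolerance at these block sizes.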
SLIDE 13

Outline Update

  • 1. Source Coding
  • 2. Redundancy: Known Sources
  • 3. Shannon and Huffman Redundancy for Memoryless Sources
  • 4. Shannon Coding Redundancy for Markov Sources
SLIDE 14

Markov Sources

The source sequence $X_1, X_2, \ldots$ over the alphabet $A = \{1, 2, \ldots, r\}$ is generated by a first-order Markov chain with a given transition matrix $P = \{p(j|k)\}_{j,k=1}^{r}$, initial state probabilities $p_k$, $k = 1, 2, \ldots, r$, and stationary state probabilities $\pi_k$, $k = 1, 2, \ldots, r$. For $x^n = (x_1, \ldots, x_n) \in A^n$, the probability under the given Markov source is

$$P(x^n) = p_{x_1} \prod_{t=2}^{n} p(x_t | x_{t-1}).$$

The average redundancy of the Shannon code is defined as

$$R_n = E[\lceil -\log P(X^n) \rceil + \log P(X^n)] = E[\varrho(-\log P(X^n))], \qquad \varrho(u) = \lceil u \rceil - u.$$
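For a small chain, $R_n = E[\varrho(-\log P(X^n))]$ can be computed exactly by enumerating $A^n$; the sketch below stores the transition matrix row-wise, i.e. `P[j][k]` is the probability of moving from state `j` to state `k` (the chain itself is an arbitrary illustrative example):

```python
import math
from itertools import product

def markov_shannon_redundancy(n, P, p0):
    """Exact R_n = E[rho(-lg P(X^n))] for a first-order Markov chain.

    P[j][k]: transition probability from state j to state k; p0: initial distribution.
    """
    r = len(p0)
    total = 0.0
    for x in product(range(r), repeat=n):
        prob = p0[x[0]]
        for t in range(1, n):
            prob *= P[x[t - 1]][x[t]]
        u = -math.log2(prob)
        total += prob * (math.ceil(u) - u)   # rho(u) = ceil(u) - u
    return total

P = [[0.7, 0.3], [0.4, 0.6]]   # illustrative 2-state chain
p0 = [0.5, 0.5]
R = markov_shannon_redundancy(10, P, p0)
assert 0.0 <= R < 1.0          # rho always lies in [0, 1)
```

Enumeration costs $r^n$ paths, so this only serves as ground truth for the analytic results that follow.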

SLIDE 15

Main Result for Markov Sources

Theorem 2 (Merhav & W.S.). Consider the Shannon code of length $n$ for an aperiodic and irreducible Markov source. Define

$$\alpha_{jk} = \log \frac{p(j|1)\, p(j|j)}{p(k|1)\, p(j|k)}, \qquad j, k \in \{1, 2, \ldots, r\}.$$

(a) If not all $\{\alpha_{jk}\}$ are rational, then $R_n = \frac{1}{2} + o(1)$.

(b) If all $\{\alpha_{jk}\}$ are rational, then let

$$\zeta_{jk}(n) = M\left[ -(n-1) \log p(1|1) + \log p(j|1) - \log p(k|1) - \log p_j \right]$$

and

$$\Omega_n = \frac{1}{2}\left(1 - \frac{1}{M}\right) + \frac{1}{M} \sum_{j=1}^{r} \sum_{k=1}^{r} p_j \pi_k\, \varrho[\zeta_{jk}(n)],$$

where $M$ is the smallest common multiple of the denominators of $\{\alpha_{jk}\}$. Then there exists a positive sequence $\xi_n \to 0$ such that

$$R_n \le \Omega_n + \frac{1}{M} \sum_{j=1}^{r} \sum_{k=1}^{r} p_j \pi_k\, I\{\varrho[\zeta_{jk}(n)] \notin (\xi_n, 1 - \xi_n)\} + o(1),$$

$$R_n \ge \Omega_n - \frac{1}{M} \sum_{j=1}^{r} \sum_{k=1}^{r} p_j \pi_k\, I\{\varrho[\zeta_{jk}(n)] \notin (\xi_n, 1 - \xi_n)\} - o(1).$$

SLIDE 16

Sketch of Proof

  • 1. We note that $\varrho(u)$ has the following Fourier series expansion

$$\varrho(u) = \frac{1}{2} + \sum_{m \ne 0} a_m e^{2\pi i m u}, \qquad a_m = \frac{1}{2\pi i m},$$

where $a_{mk} = a_m / k$ for integers $m, k$.

  • 2. Since $R_n = E[\varrho(-\log P(X^n))]$ (for an aperiodic irreducible Markov chain) we have

$$R_n = \frac{1}{2} + \sum_{m \ne 0} a_m\, E\left[ e^{-2\pi i m \log P(X^n)} \right],$$

which we can rewrite as

$$R_n = \frac{1}{2} + \sum_{m \ne 0} a_m \sum_{x^n \in A^n} p_{x_1} e^{-2\pi i m \log p_{x_1}} \prod_{t=2}^{n} p(x_t | x_{t-1})\, e^{-2\pi i m \log p(x_t | x_{t-1})},$$

since $P(x^n) = p_{x_1} \prod_{t=2}^{n} p(x_t | x_{t-1})$.
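In sine form the expansion of step 1 reads $\varrho(u) = \frac{1}{2} + \sum_{m \ge 1} \frac{\sin 2\pi m u}{\pi m}$, equivalent to the exponential form; a numerical spot check at a few non-integer points (truncation length arbitrary):

```python
import math

def rho(u):
    """rho(u) = ceil(u) - u."""
    return math.ceil(u) - u

def rho_fourier(u, N=50000):
    """Truncated Fourier series: 1/2 + sum_{m=1}^N sin(2 pi m u)/(pi m)."""
    return 0.5 + sum(math.sin(2 * math.pi * m * u) / (math.pi * m)
                     for m in range(1, N + 1))

for u in (0.3, 1.7, 4.25):            # away from the jumps at integer u
    assert abs(rho(u) - rho_fourier(u)) < 1e-3
```

Convergence is only $O(1/N)$ pointwise (and fails at the jumps), which is why the proof pairs the series with the exponentially decaying character sums rather than using it term by term.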

SLIDE 17

Continuation

  • 3. Define for $j, k = 1, \ldots, r$

$$a_{jk}(m) = p(k|j)\, e^{-2\pi i m \log p(k|j)}, \qquad A_m = [a_{jk}(m)]_{j,k=1}^{r},$$

and the $r$-dimensional column vectors

$$c_m = \left( p_1 e^{-2\pi i m \log p_1}, \ldots, p_r e^{-2\pi i m \log p_r} \right)^T, \qquad \mathbf{1} = (1, 1, \ldots, 1)^T.$$

Then

$$R_n = \frac{1}{2} + \sum_{m \ne 0} a_m\, c_m^T A_m^{n-1} \mathbf{1}.$$

  • 4. Let $l_{i,m}$ and $r_{i,m}$ be, respectively, the left eigenvector and the right eigenvector pertaining to the eigenvalue $\lambda_{i,m}$ of the matrix $A_m$, ordered so that

$$\rho(A_m) := |\lambda_{1,m}| \ge |\lambda_{2,m}| \ge \cdots \ge |\lambda_{r,m}|.$$

By the spectral representation of matrices,

$$A_m^{n-1} \mathbf{1} = \sum_{i=1}^{r} \lambda_{i,m}^{n-1}\, (l_{i,m}^T \mathbf{1})\, r_{i,m},$$

leading to

$$R_n = \frac{1}{2} + \sum_{m \ne 0} a_m \sum_{i=1}^{r} \lambda_{i,m}^{n-1}\, (l_{i,m}^T \mathbf{1})(c_m^T r_{i,m}).$$
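The identity $E[e^{-2\pi i m \log P(X^n)}] = c_m^T A_m^{n-1} \mathbf{1}$ underlying step 3 can be confirmed by brute force on a small chain (logs taken base 2 on both sides; the chain is illustrative, with rows of `P` giving next-state distributions):

```python
import math
from itertools import product

import numpy as np

P = np.array([[0.7, 0.3],
              [0.4, 0.6]])           # row j: distribution of the next state from state j
p0 = np.array([0.5, 0.5])            # initial state probabilities

def char_fn_bruteforce(n, m):
    """E[exp(-2 pi i m lg P(X^n))] by enumerating all r^n paths."""
    total = 0j
    for x in product(range(len(p0)), repeat=n):
        prob = p0[x[0]]
        for t in range(1, n):
            prob *= P[x[t - 1], x[t]]
        total += prob * np.exp(-2j * np.pi * m * math.log2(prob))
    return total

def char_fn_spectral(n, m):
    """The same quantity as c_m^T A_m^(n-1) 1."""
    A = P * np.exp(-2j * np.pi * m * np.log2(P))        # entrywise twist of P
    c = p0 * np.exp(-2j * np.pi * m * np.log2(p0))
    return c @ np.linalg.matrix_power(A, n - 1) @ np.ones(len(p0))

assert abs(char_fn_bruteforce(8, 1) - char_fn_spectral(8, 1)) < 1e-9
```

The matrix form is just the path sum factored transition by transition, which is what makes the eigenvalue analysis in step 4 possible.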

SLIDE 18

Conclusions

  • 5. (a) If all $|\lambda_{i,m}| < 1$, then by Fejér's theorem $R_n \to \frac{1}{2}$. (b) If some (largest) $|\lambda_{i,m}| = 1$, we have an oscillatory mode, provided $\rho(A_m) = \rho(P) = 1$.

Lemma 4. Let $F = \{f_{kj}\}$ and $G = \{g_{kj}\}$ be two $r \times r$ matrices. Assume that $F$ is a real, non-negative and irreducible matrix, $G$ is a complex matrix, and $f_{kj} = |g_{kj}|$, $k, j \in \{1, 2, \ldots, r\}$. Then $\rho(G) = \rho(F)$ if and only if there exist real numbers $s$ and $w_1, \ldots, w_r$ such that

$$G = e^{2\pi i s} D F D^{-1}, \qquad D = \mathrm{diag}\{e^{2\pi i w_1}, \ldots, e^{2\pi i w_r}\}.$$

Thus $\rho(A_m) = \rho(P) = 1$ if and only if there exist $s$ and $w_1, \ldots, w_r$ such that

$$-m \log p(j|k) = (s + w_k - w_j) \bmod 1, \qquad j, k = 1, \ldots, r,$$

where $x = y \bmod 1$ means that $x - y$ is an integer.
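The "if" direction of Lemma 4 is easy to confirm numerically: a diagonal similarity with unimodular entries, times a global unimodular factor, changes neither the entrywise moduli nor the spectral radius (the matrices below are arbitrary illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(0)
F = rng.random((3, 3)) + 0.1                  # real, strictly positive => irreducible
s = 0.3
w = np.array([0.1, 0.25, 0.7])
D = np.diag(np.exp(2j * np.pi * w))
G = np.exp(2j * np.pi * s) * D @ F @ np.linalg.inv(D)

def spectral_radius(M):
    return max(abs(np.linalg.eigvals(M)))

assert np.allclose(np.abs(G), F)                            # f_kj = |g_kj|
assert abs(spectral_radius(G) - spectral_radius(F)) < 1e-9  # rho(G) = rho(F)
```

The "only if" direction is the deep half (a Wielandt-type rigidity statement) and is what forces the arithmetic condition on $-m \log p(j|k)$.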

SLIDE 19

Extensions

We can extend our theorem to irreducible and periodic Markov chains, but not further than this, as the example below shows.

Example 2 (reducible Markov source). Consider the case $r = 2$ with the following transition matrix:

$$P = \begin{pmatrix} 1 - \alpha & \alpha \\ 0 & 1 \end{pmatrix}.$$

Assume also that $p_1 = 1$ and $p_2 = 0$. Direct computation shows that

$$R_n = \sum_{k \ge 0} \alpha (1 - \alpha)^k\, \varrho[-\log \alpha - k \log(1 - \alpha)] + o(1).$$

We see then that there is no oscillatory mode in this case, as $R_n$ always tends to a constant that depends on $\alpha$.
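The limit in Example 2 is easy to check numerically: the only length-$n$ paths are $1^j 2^{n-j}$, so the exact $R_n$ is a finite geometric sum plus a boundary term, and it rapidly approaches the truncated infinite sum ($\alpha$ below is illustrative):

```python
import math

def rho(u):
    return math.ceil(u) - u

def exact_Rn(n, a):
    """Exact R_n for the reducible chain: paths are 1^j 2^(n-j), j = 1..n."""
    total = sum(a * (1 - a)**k * rho(-math.log2(a) - k * math.log2(1 - a))
                for k in range(n - 1))
    # the all-ones path never jumps and carries probability (1-a)^(n-1)
    total += (1 - a)**(n - 1) * rho(-(n - 1) * math.log2(1 - a))
    return total

def limit_Rn(a, terms=5000):
    """Truncation of the limiting sum: sum_k a (1-a)^k rho(-lg a - k lg(1-a))."""
    return sum(a * (1 - a)**k * rho(-math.log2(a) - k * math.log2(1 - a))
               for k in range(terms))

a = 0.3
assert abs(exact_Rn(200, a) - limit_Rn(a)) < 1e-9   # no oscillation: a fixed constant
```

The jump time to the absorbing state is geometric, which is exactly why the fractional parts no longer depend on $n$ and the oscillatory mode disappears.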

SLIDE 20

That’s It

THANK YOU