

SLIDE 1

Lecture 7 Lossy Source Coding

I-Hsiang Wang

Department of Electrical Engineering National Taiwan University ihwang@ntu.edu.tw

December 2, 2015

SLIDE 2

The Block-to-Block Source Coding Problem

[Block diagram: Source → Source Encoder → Source Decoder → Destination; s[1:N] → b[1:K] → ŝ[1:N]]

Recall: in Lecture 03, we investigated the fundamental limit of (almost) lossless block-to-block (or fixed-to-fixed) source coding.
- The recovery criterion is a vanishing probability of error: $\lim_{N\to\infty} \mathrm{P}\{\hat S^N \neq S^N\} = 0$.
- The minimum compression ratio to fulfill lossless reconstruction is the entropy rate of the source: $R^* = H(\{S_i\})$, for stationary and ergodic $\{S_i\}$.

SLIDE 3

The Block-to-Block Source Coding Problem

[Block diagram: Source → Source Encoder → Source Decoder → Destination; s[1:N] → b[1:K] → ŝ[1:N]]

In this lecture, we turn our focus to lossy block-to-block source coding, where the setting is the same as before, except:
- The recovery criterion is reconstruction to within a given distortion $D$: $\limsup_{N\to\infty} \mathrm{E}\big[d(S^N, \hat S^N)\big] \le D$.
- The minimum compression ratio to fulfill reconstruction to within a given distortion $D$ is the rate-distortion function:
$$R(D) = \min_{p_{\hat S|S}:\ \mathrm{E}[d(S,\hat S)]\le D} I(S;\hat S),$$
for a DMS $\{S_i\}$.

SLIDE 4

Why lossy source coding?
- Sometimes it might be too expensive to reconstruct the source losslessly.
- Sometimes it is impossible to reconstruct the source losslessly. For example, if the source is continuous-valued, the entropy rate of the source is usually infinite!
- Lossy source coding has a wide range of applications, including quantization/digitization of continuous-valued signals, image/video/audio compression, etc.

In this lecture, we first focus on discrete memoryless sources (DMS). Then, we employ the discretization technique to extend the coding theorems from the discrete-source case to the continuous-source case. In particular, Gaussian sources will be our main focus.

SLIDE 5

Lossless vs. Lossy Source Coding

The general lossy source coding problem involves quantizing all possible source sequences $s^N \in \mathcal{S}^N$ into $2^K$ reconstruction sequences $\hat s^N \in \hat{\mathcal{S}}^N$, which can be represented by $K$ bits. The goal is to design the correspondence between $s^N$ and $\hat s^N$ so that the distortion (quantization error) is below a prescribed level $D$.

Lossy source coding has a couple of notable differences from lossless source coding:
- The source alphabet $\mathcal{S}$ and the reconstruction alphabet $\hat{\mathcal{S}}$ can be different in general.
- Performance is determined by the chosen distortion measure.

SLIDE 6

1. Lossy Source Coding Theorem for Memoryless Sources
   - Lossy Source Coding Theorem
   - Rate Distortion Function
2. Proof of the Coding Theorem
   - Converse Proof
   - Achievability

SLIDE 8

Distortion Measures

We begin with the definition of the distortion measure per symbol.

Definition 1 (Distortion Measure): A per-symbol distortion measure is a mapping $d(s,\hat s)$ from $\mathcal{S} \times \hat{\mathcal{S}}$ to $[0, \infty)$, understood as the cost of representing $s$ by $\hat s$. For two length-$N$ sequences $s^N$ and $\hat s^N$, the distortion between them is defined as the average of the per-symbol distortion:
$$d\big(s^N, \hat s^N\big) \triangleq \frac{1}{N} \sum_{i=1}^N d(s_i, \hat s_i).$$
Examples: below are two widely used distortion measures:
- Hamming distortion: $\hat{\mathcal{S}} = \mathcal{S}$, $d(s,\hat s) \triangleq 1\{s \neq \hat s\}$.
- Squared-error distortion: $\mathcal{S} = \hat{\mathcal{S}} = \mathbb{R}$, $d(s,\hat s) \triangleq (s - \hat s)^2$.
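As an illustration (not part of the original slides; the function names are mine), a minimal Python sketch of Definition 1 for the two example measures:

```python
def hamming_distortion(s, s_hat):
    """Per-symbol Hamming distortion d(s, s_hat) = 1{s != s_hat}."""
    return float(s != s_hat)

def squared_error_distortion(s, s_hat):
    """Per-symbol squared-error distortion d(s, s_hat) = (s - s_hat)^2."""
    return (s - s_hat) ** 2

def sequence_distortion(sN, sN_hat, d):
    """Average per-symbol distortion d(s^N, s_hat^N) = (1/N) * sum_i d(s_i, s_hat_i)."""
    assert len(sN) == len(sN_hat)
    return sum(d(s, s_hat) for s, s_hat in zip(sN, sN_hat)) / len(sN)

# Hamming distortion between two binary sequences: one mismatch out of four symbols.
print(sequence_distortion([0, 1, 1, 0], [0, 1, 0, 0], hamming_distortion))    # 0.25
# Squared-error distortion between two real-valued sequences.
print(sequence_distortion([0.0, 1.0], [0.5, 1.0], squared_error_distortion))  # 0.125
```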

SLIDE 9

Lossy Source Coding: Problem Setup

[Block diagram: Source → Source Encoder → Source Decoder → Destination; s[1:N] → b[1:K] → ŝ[1:N]]

1. A $\big(2^{NR}, N\big)$ source code consists of
   - an encoding function (encoder) $\mathrm{enc}_N : \mathcal{S}^N \to \{0,1\}^K$ that maps each source sequence $s^N$ to a bit sequence $b^K$, where $K \triangleq \lfloor NR \rfloor$;
   - a decoding function (decoder) $\mathrm{dec}_N : \{0,1\}^K \to \hat{\mathcal{S}}^N$ that maps each bit sequence $b^K$ to a reconstructed source sequence $\hat s^N$.
2. The expected distortion of the code is $D^{(N)} \triangleq \mathrm{E}\big[d(S^N, \hat S^N)\big]$.
3. A rate-distortion pair $(R, D)$ is said to be achievable if there exists a sequence of $\big(2^{NR}, N\big)$ codes such that $\limsup_{N\to\infty} D^{(N)} \le D$. The optimal compression rate is $R(D) \triangleq \inf\{R \mid (R, D) \text{ achievable}\}$.
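A toy instance of this setup (my own illustration, not from the lecture): the sketch below fixes $R = 1$ bit per symbol, so $K = \lfloor NR \rfloor = N$, uses a one-bit-per-symbol sign quantizer as $\mathrm{enc}_N$ and $\mathrm{dec}_N$ for a real-valued source, and estimates $D^{(N)}$ by Monte Carlo under squared-error distortion.

```python
import numpy as np

R = 1.0                      # rate in bits per source symbol (toy choice)
N = 10_000                   # block length
K = int(np.floor(N * R))     # number of bits per block, K = floor(N*R); here K == N

def enc_N(sN):
    """Encoder enc_N: S^N -> {0,1}^K, one sign bit per symbol (assumes K == N)."""
    return (sN >= 0).astype(int)

def dec_N(bK, level=0.8):
    """Decoder dec_N: {0,1}^K -> reconstruction sequence, each bit mapped to +/- level."""
    return np.where(bK == 1, level, -level)

# Estimate the expected distortion D^(N) = E[d(S^N, S_hat^N)] by Monte Carlo
# for an i.i.d. N(0,1) source under squared-error distortion.
rng = np.random.default_rng(0)
blocks = 100
D_N = np.mean([np.mean((sN - dec_N(enc_N(sN))) ** 2)
               for sN in rng.standard_normal((blocks, N))])
print(f"K = {K} bits per block, estimated D^(N) ~= {D_N:.3f}")
```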

SLIDE 10

Rate Distortion Trade-off

$D_{\min} \triangleq \min_{\hat s(\cdot)} \mathrm{E}\big[d\big(S, \hat s(S)\big)\big]$: the minimum possible target distortion for which the required rate is finite. Even if the decoder knew the entire $s^N$ and picked a best representative $\hat s^N(s^N)$, the expected distortion would still be $D_{\min}$.

$D_{\max} \triangleq \min_{\hat s} \mathrm{E}\big[d(S, \hat s)\big]$

[Figure: the rate-distortion curve $R(D)$ on $[D_{\min}, D_{\max}]$; the region above the curve is achievable, below it is not; $R(D_{\min}) \le H(S)$ and $R(D) = 0$ for $D \ge D_{\max}$.]

Let $\hat s^* \triangleq \arg\min_{\hat s} \mathrm{E}\big[d(S, \hat s)\big]$. Then for target distortion $D \ge D_{\max}$, we can use the single representative $\hat s^{*N} \triangleq (\hat s^*, \hat s^*, \ldots, \hat s^*)$ to reconstruct all $s^N \in \mathcal{S}^N$ (the rate is 0!), and
$$D^{(N)} = \mathrm{E}\big[d\big(S^N, \hat s^{*N}\big)\big] = \frac{1}{N} \sum_{i=1}^N \mathrm{E}\big[d(S_i, \hat s^*)\big] = D_{\max} \le D.$$
Hence, $R(D) = 0$ for all $D \ge D_{\max}$.
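A quick numeric check of this zero-rate argument (my own illustration; the Bernoulli/Hamming choice anticipates Example 1 below): with the single representative $\hat s^* = 0$ for a Ber($p$) source with $p \le \tfrac12$, the measured average Hamming distortion is about $\min(p, 1-p) = D_{\max}$.

```python
import numpy as np

p, N, blocks = 0.3, 10_000, 50
rng = np.random.default_rng(1)

s_star = 0 if p <= 0.5 else 1                         # best single representative
samples = (rng.random((blocks, N)) < p).astype(int)   # i.i.d. Ber(p) source blocks
D_N = np.mean(samples != s_star)                      # average Hamming distortion at rate 0
print(f"D^(N) ~= {D_N:.3f}, Dmax = min(p, 1-p) = {min(p, 1 - p):.3f}")
```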

SLIDE 11

Lossy Source Coding Theorem

[Block diagram: Source → Source Encoder → Source Decoder → Destination; s[1:N] → b[1:K] → ŝ[1:N]]

Theorem 1 (A Lossy Source Coding Theorem for DMS): For a discrete memoryless source $\{S_i \mid i \in \mathbb{N}\}$,
$$R(D) = \min_{p_{\hat S|S}:\ \mathrm{E}[d(S,\hat S)]\le D} I(S;\hat S). \qquad (1)$$
Interpretation: $H(S) - H(S\,|\,\hat S) = I(S;\hat S)$, that is,
(uncertainty of the source $S$) − (uncertainty of $S$ after learning $\hat S$) = the rate used in compressing $S$ into $\hat S$.

SLIDE 13

Properties of Rate Distortion Function

[Figure: the rate-distortion curve $R(D)$ on $[D_{\min}, D_{\max}]$; the region above the curve is achievable, below it is not; $R(D_{\min}) \le H(S)$ and $R(D) = 0$ for $D \ge D_{\max}$.]

A rate distortion function $R(D)$ satisfies the following properties:
1. Nonnegative.
2. Non-increasing in $D$.
3. Convex in $D$.
4. Continuous in $D$.
5. $R(D_{\min}) \le H(S)$.
6. $R(D) = 0$ if $D \ge D_{\max}$.

These properties are all quite intuitive. Below we sketch the proofs of these properties.

SLIDE 14

Monotonicity: Clear from the definition.

Convexity: The goal is to prove that for $D_1, D_2 \ge D_{\min}$ and $\lambda \in (0,1)$, $\bar\lambda \triangleq 1-\lambda$,
$$R(\lambda D_1 + \bar\lambda D_2) \le \lambda R(D_1) + \bar\lambda R(D_2).$$
Let $p_i(\hat s|s) \triangleq \arg\min_{p_{\hat S|S}:\ \mathrm{E}[d(S,\hat S)]\le D_i} I(S;\hat S)$ be the optimizing conditional distribution that achieves distortion $D_i$, for $i = 1, 2$. Let $p_\lambda \triangleq \lambda p_1 + \bar\lambda p_2$. Under $p_\lambda(\hat s|s)$, the expected distortion between $S$ and $\hat S$ is at most $\lambda D_1 + \bar\lambda D_2$, because
$$\mathrm{E}_{p(s)p_\lambda(\hat s|s)}\big[d(S,\hat S)\big] = \sum_{s,\hat s} p(s)\big[\lambda p_1(\hat s|s) + \bar\lambda p_2(\hat s|s)\big]\, d(s,\hat s).$$
The proof is complete since $I(S;\hat S)$ is convex in $p_{\hat S|S}$ for fixed $p_S$:
$$R(\lambda D_1 + \bar\lambda D_2) \le I(S;\hat S)\big|_{p_\lambda} \le \lambda\, I(S;\hat S)\big|_{p_1} + \bar\lambda\, I(S;\hat S)\big|_{p_2} = \lambda R(D_1) + \bar\lambda R(D_2).$$

SLIDE 15

Nonnegativity: Clear from the definition.

Continuity: It is well known that convexity on an open interval implies continuity on that open interval.

[Figure: two sketches of candidate $R(D)$ curves; a jump or dip in the interior of $(D_{\min}, D_{\max})$ would violate convexity, so a discontinuity is possible only at the boundary.]

Hence, the only point where $R(D)$ might be discontinuous is the boundary $D = D_{\min}$. The proof is technical and can be found in Gallager [2].

SLIDE 16

Example: Bernoulli Source with Hamming Distortion

Source (binary): $S_i \in \mathcal{S} = \{0,1\}$, and $S_i \overset{\mathrm{i.i.d.}}{\sim} \mathrm{Ber}(p)$ for all $i$.
Distortion (Hamming): $d(s,\hat s) = 1\{s \neq \hat s\}$.

Example 1: Derive the rate distortion function of the Bernoulli-$p$ source with Hamming distortion and show that it is given by
$$R(D) = \begin{cases} H_b(p) - H_b(D), & 0 \le D \le \min(p, 1-p)\\ 0, & D > \min(p, 1-p).\end{cases}$$
This is the first example of how to compute the rate distortion function, that is, how to solve (1) in the lossy source coding theorem.

SLIDE 17

sol. The first step is to identify $D_{\min}$ and $D_{\max}$.
- $D_{\min} = 0$, because one can choose $\hat s(s) = s$.
- $D_{\max} = \min(p, 1-p)$, because one can choose $\hat s = 0$ if $p \le \tfrac12$ and $\hat s = 1$ if $p \ge \tfrac12$.

The next step is to lower bound $I(S;\hat S) = H(S) - H(S\,|\,\hat S)$, which is equivalent to upper bounding $H(S\,|\,\hat S)$:
$$H(S\,|\,\hat S) = H(S \oplus \hat S \,|\, \hat S) \le H(S \oplus \hat S) = H_b(q),$$
where we assume that $S \oplus \hat S \sim \mathrm{Ber}(q)$ for some $q \in [0,1]$. Observe that $d(S,\hat S) \equiv S \oplus \hat S$. Hence, $\mathrm{E}[d(S,\hat S)] \le D \implies q \le D$. Since $D \le D_{\max} \le \tfrac12$, over $q \le D$ the binary entropy $H_b(q)$ is maximized at $q = D$.

Hence, $I(S;\hat S) \ge H_b(p) - H_b(D)$.

SLIDE 18

Final step: show that the lower bound $H_b(p) - H_b(D)$ can be attained.

The goal is to find a probability transition matrix $p(\hat s|s)$ such that
- $\hat S \perp (S \oplus \hat S)$, so that $H(S \oplus \hat S\,|\,\hat S) = H(S \oplus \hat S)$, and
- $\mathrm{P}\{S \oplus \hat S = 1\} = D$.
At first glance this looks hard. The difficulty can be resolved via an auxiliary reverse channel: consider a channel with input $\hat S$, output $S$, and additive noise $Z \sim \mathrm{Ber}(D)$ independent of $\hat S$, so that
$$S = \hat S \oplus Z \implies Z = S \oplus \hat S.$$
The reverse channel specifies the joint distribution $p(s,\hat s)$ and hence $p(\hat s|s)$!

[Figure: binary reverse channel from $\hat S \sim \mathrm{Ber}(\alpha)$ to $S \sim \mathrm{Ber}(p)$ with crossover probability $D$.]

Matching the marginal of $S$:
$$p = (1-\alpha)D + \alpha(1-D) \implies \alpha = \frac{p-D}{1-2D}.$$
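A numeric sanity check of this reverse-channel construction (my own, not from the slides; the particular values of $p$ and $D$ are arbitrary): the induced marginal of $S$ is Ber($p$), the expected Hamming distortion equals $D$, and $I(S;\hat S) = H_b(p) - H_b(D)$.

```python
import numpy as np

def Hb(q):
    """Binary entropy function in bits."""
    return 0.0 if q in (0.0, 1.0) else -q * np.log2(q) - (1 - q) * np.log2(1 - q)

p, D = 0.3, 0.1
alpha = (p - D) / (1 - 2 * D)            # P{S_hat = 1} from the reverse-channel construction

# Joint pmf p(s_hat, s) induced by S = S_hat xor Z, with Z ~ Ber(D) independent of S_hat.
p_hat = np.array([1 - alpha, alpha])
joint = np.array([[p_hat[sh] * (D if s != sh else 1 - D) for s in (0, 1)] for sh in (0, 1)])

p_S = joint.sum(axis=0)                  # marginal of S; should be [1-p, p]
E_d = joint[0, 1] + joint[1, 0]          # expected Hamming distortion P{S != S_hat}; should be D
I = sum(joint[sh, s] * np.log2(joint[sh, s] / (p_hat[sh] * p_S[s]))
        for sh in (0, 1) for s in (0, 1))  # mutual information I(S; S_hat)

print(p_S[1], E_d, I, Hb(p) - Hb(D))     # expect 0.3, 0.1, and the last two numbers equal
```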

SLIDE 19

Example: Gaussian Source with Squared Error Distortion

Source (Gaussian): $S_i \in \mathcal{S} = \mathbb{R}$, and $S_i \overset{\mathrm{i.i.d.}}{\sim} \mathcal{N}(\mu, \sigma^2)$ for all $i$.
Distortion (squared error): $d(s,\hat s) = |s - \hat s|^2$.

Example 2: Derive the rate distortion function of the Gaussian source with squared error distortion and show that it is given by
$$R(D) = \begin{cases} \frac12 \log\frac{\sigma^2}{D}, & 0 \le D \le \sigma^2 \\ 0, & D > \sigma^2.\end{cases}$$
Remark: Although the source is continuous, one can use weak typicality or the discretization method used in channel coding to extend the lossy source coding theorem from discrete memoryless sources to continuous ones.
Note: In particular, $R(0) = \infty$, which is quite intuitive!

SLIDE 20

sol. First step: identify $D_{\min}$ and $D_{\max}$.
- $D_{\min} = 0$, because one can choose $\hat s(s) = s$.
- $D_{\max} = \sigma^2$, because one can choose $\hat s = \mu$, the mean of $S$.

Next step: lower bound $I(S;\hat S) = h(S) - h(S\,|\,\hat S)$, which is equivalent to upper bounding $h(S\,|\,\hat S)$:
$$h(S\,|\,\hat S) = h(S - \hat S\,|\,\hat S) \le h(S - \hat S) \le \tfrac12 \log(2\pi e D),$$
where the last inequality holds since $\mathrm{Var}[S - \hat S] \le \mathrm{E}[|S - \hat S|^2] \le D$ (and the Gaussian maximizes differential entropy under a variance constraint).

Hence, $I(S;\hat S) \ge \tfrac12 \log(2\pi e\,\sigma^2) - \tfrac12 \log(2\pi e D) = \tfrac12 \log\frac{\sigma^2}{D}$.

SLIDE 21

Final step: show that the lower bound $\tfrac12 \log\frac{\sigma^2}{D}$ can be attained.

The goal is to find a conditional distribution $p(\hat s|s)$ such that
- $\hat S \perp (S - \hat S)$, so that $h(S - \hat S\,|\,\hat S) = h(S - \hat S)$, and
- $(S - \hat S) \sim \mathcal{N}(0, D)$.
Again, this can be done via an auxiliary reverse channel: consider a channel with input $\hat S$, output $S$, and additive noise $Z \sim \mathcal{N}(0, D)$ independent of $\hat S$, so that
$$S = \hat S + Z \implies Z = S - \hat S.$$
The reverse channel specifies the joint distribution $p(s,\hat s)$ and hence $p(\hat s|s)$!

[Figure: additive reverse channel with input $\hat S \sim \mathcal{N}(\mu, \sigma^2 - D)$, noise $Z \sim \mathcal{N}(0, D)$, and output $S = \hat S + Z \sim \mathcal{N}(\mu, \sigma^2)$.]
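A small Monte Carlo check of the Gaussian reverse channel (my own illustration; the values of $\mu$, $\sigma^2$, $D$ are arbitrary): drawing $\hat S \sim \mathcal{N}(\mu, \sigma^2 - D)$ and $Z \sim \mathcal{N}(0, D)$ independently and setting $S = \hat S + Z$ reproduces the source law and the target distortion, and the corresponding rate is $\tfrac12\log_2\frac{\sigma^2}{D}$ bits.

```python
import numpy as np

mu, sigma2, D = 1.0, 4.0, 1.0
rng = np.random.default_rng(2)
n = 1_000_000

# Reverse channel: S_hat ~ N(mu, sigma^2 - D), Z ~ N(0, D) independent, S = S_hat + Z.
S_hat = rng.normal(mu, np.sqrt(sigma2 - D), n)
Z = rng.normal(0.0, np.sqrt(D), n)
S = S_hat + Z

print(np.var(S))                   # ~ sigma^2 = 4: the construction matches the source law
print(np.mean((S - S_hat) ** 2))   # ~ D = 1: expected squared-error distortion
print(0.5 * np.log2(sigma2 / D))   # R(D) = (1/2) log2(sigma^2 / D) = 1 bit per symbol
```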

SLIDE 22

Example: Source Alphabet ≠ Reconstruction Alphabet

Source (ternary): $S_i \in \mathcal{S} = \{0, *, 1\}$, and $S_i \overset{\mathrm{i.i.d.}}{\sim} p_S$ for all $i$, where $p_S(0) = p_S(1) = \varepsilon \le \tfrac12$.
Reconstruction (binary): $\hat{\mathcal{S}} = \{0, 1\}$.
Distortion:
$$d(s,\hat s) = \begin{cases} 1, & s \neq *\ \text{and}\ \hat s \neq s\\ 0, & s = *\ \text{or}\ \hat s = s.\end{cases}$$
In other words, there is a don't-care symbol $*$, and $\mathcal{S} \neq \hat{\mathcal{S}}$.

Example 3 (HW5): Derive the rate distortion function and show that it is given by
$$R(D) = \begin{cases} 2\varepsilon\left(1 - H_b\!\left(\tfrac{D}{2\varepsilon}\right)\right), & 0 \le D \le \varepsilon\\ 0, & D > \varepsilon.\end{cases}$$

SLIDE 25

Proof of the Converse of Theorem 1

We aim to show that for any sequence of $\big(2^{NR}, N\big)$ source codes with $\limsup_{N\to\infty} D^{(N)} \le D$, the rate $R$ must satisfy $R \ge R(D)$ (as defined in (1)). We begin with similar steps as in lossless source coding (cf. Lecture 03).

pf: Note that $B^K$ is a random variable because it is generated by another random variable, $S^N$.
$$NR \ge K \ge H(B^K) \ge I(B^K; S^N) \overset{(a)}{\ge} I(\hat S^N; S^N) \overset{(b)}{=} \sum_{i=1}^N I(S_i; \hat S^N \,|\, S^{i-1}) \overset{(c)}{=} \sum_{i=1}^N I(S_i; \hat S^N, S^{i-1}) \ge \sum_{i=1}^N I(S_i; \hat S_i).$$
- (a) is due to $S^N - B^K - \hat S^N$ and the data processing inequality.
- (b) is due to the chain rule.
- (c) is due to $S_i \perp S^{i-1}$ (memoryless source).
So far, we have not yet used the condition on the distortion.

SLIDE 26

Further working on the inequality:
$$NR \ge \sum_{i=1}^N I(S_i;\hat S_i) \overset{(d)}{\ge} \sum_{i=1}^N R\big(\mathrm{E}[d(S_i,\hat S_i)]\big) = N \sum_{i=1}^N \tfrac1N R\big(\mathrm{E}[d(S_i,\hat S_i)]\big) \overset{(e)}{\ge} N R\Big(\sum_{i=1}^N \tfrac1N \mathrm{E}[d(S_i,\hat S_i)]\Big) = N R\Big(\mathrm{E}\Big[\tfrac1N \sum_{i=1}^N d(S_i,\hat S_i)\Big]\Big) = N R\big(\mathrm{E}[d(S^N,\hat S^N)]\big) = N R\big(D^{(N)}\big).$$
- (d) is due to the definition of $R(D)$ in (1).
- (e) is due to the convexity of $R(D)$ and Jensen's inequality.
Hence,
$$R \ge \limsup_{N\to\infty} R\big(D^{(N)}\big) \overset{(f)}{\ge} R\Big(\limsup_{N\to\infty} D^{(N)}\Big) \overset{(g)}{\ge} R(D).$$
- (f) is due to the continuity of $R(D)$.
- (g) is due to $\limsup_{N\to\infty} D^{(N)} \le D$ and $R(D)$ being non-increasing.

SLIDE 27

Remarks

You might note that the above proof of the converse does not make use of lower bounds on the error probability such as Fano's inequality. This is because in our formulation of the lossy source coding problem, the reconstruction criterion is placed on the expected distortion. Instead of the criterion $\limsup_{N\to\infty} D^{(N)} \le D$, where $D^{(N)} \triangleq \mathrm{E}[d(S^N,\hat S^N)]$, we could use a stronger criterion as follows:
$$P_e^{(N,\delta)} \triangleq \mathrm{P}\big\{d(S^N,\hat S^N) > D + \delta\big\},\ \delta > 0 \quad \text{(probability of error)}$$
$$\lim_{N\to\infty} P_e^{(N,\delta)} = 0,\ \forall\,\delta > 0 \quad \text{(reconstruction criterion)}$$
Under this stronger criterion, we can then give a new operational definition of the rate distortion function. It turns out Theorem 1 remains the same! (Its converse is implied by our converse.)

SLIDE 29

Idea of Constructing Good Source Code

Key steps in source coding:
1. Find a good set of representatives (quantization codewords).
2. For each source sequence, determine which codeword to use.

Main tools we have used so far in developing the achievability of coding theorems:
1. Random coding: construct the codebook randomly and show that at least one realization achieves the desired target performance.
2. Typicality: helps give bounds in the performance analysis.

In the following, we prove the achievability part of Theorem 1 by
1. Random coding – show the existence of a good quantization codebook.
2. Typicality encoding – determine which codeword to use.

SLIDE 30

[Figure: the space of source sequences $S^N$ mapped to a set of reconstruction codewords $\hat S^N$; random coding generates the codewords, typicality encoding picks one for each source sequence.]

SLIDE 31

Proof Program

1. Random codebook generation: generate a random ensemble of quantization codebooks, each of which contains $2^K$ codewords.
2. Analysis of the expected distortion: the goal is to show that $\limsup_{N\to\infty} \mathrm{E}_{\mathcal{C},S^N}\big[d(S^N,\hat S^N)\big] \le D$, and conclude that there must exist a codebook $c$ such that the expected distortion satisfies $\limsup_{N\to\infty} \mathrm{E}_{S^N}\big[d(S^N,\hat S^N)\big] \le D$.

Note that for a source sequence $s^N$, the optimal encoder chooses an index $w \in [1:2^K]$, that is, a codeword $\hat s^N(w)$ in the codebook, so that $d\big(s^N, \hat s^N(w)\big)$ is minimized. However, similar to ML decoding in channel coding, such an optimal encoder is hard to analyze. To simplify the analysis, we shall introduce a suboptimal encoder based on typicality.

SLIDE 32

Random Codebook Generation

Fix the conditional p.m.f. that attains $R\big(\tfrac{D}{1+\varepsilon}\big)$:
$$q_{\hat S|S} = \arg\min_{p_{\hat S|S}:\ \mathrm{E}[d(S,\hat S)]\le \frac{D}{1+\varepsilon}} I(S;\hat S) \qquad (2)$$
Based on the chosen $q_{\hat S|S}$ and the source distribution $p_S$, calculate $p_{\hat S}$, the marginal distribution of the reconstruction $\hat S$.

Generate $2^K$ codewords $\{\hat s^N(w) \mid w = 1, 2, \ldots, 2^K\}$, i.i.d. according to $p(\hat s^N) = \prod_{i=1}^N p_{\hat S}(\hat s_i)$. In other words, if we think of the quantization codebook as a $2^K \times N$ matrix $\mathcal{C}$, the elements of $\mathcal{C}$ are i.i.d. according to $p_{\hat S}$.

Remark: observe the resemblance with the channel coding achievability.

SLIDE 33

Encoding and Decoding

Encoding: unlike channel coding, the encoding process in source coding is usually much more involved. We use typicality encoding (resembling typicality decoding in channel coding):
- Given a source sequence $s^N$, find an index $w \in [1:2^K]$ such that $\big(s^N, \hat s^N(w)\big) \in \mathcal{T}_\varepsilon^{(N)}\big(p_{S,\hat S}\big)$. Recall the joint distribution $p_{S,\hat S} = p_S \times q_{\hat S|S}$ as defined in (2).
- If there is no such index, or more than one, randomly pick one $w \in [1:2^K]$.
- Send out the bit sequence that represents the chosen $w$.

Decoding: upon receiving the bit sequence representing $w$, generate the reconstruction $\hat s^N(w)$ by looking up the quantization codebook.
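To see the random-codebook construction in action, here is a minimal simulation (my own sketch, not from the lecture). It follows the codebook generation above for the Bernoulli/Hamming setting, but for simplicity it uses the minimum-distortion encoder mentioned on the previous slide rather than the typicality encoder used in the analysis; parameters such as $N$, the rate margin, and the block count are arbitrary. For moderate $N$ the achieved distortion typically sits somewhat above the target $D$ and approaches it as $N$ and the rate margin grow.

```python
import numpy as np

def Hb(q):
    return 0.0 if q in (0.0, 1.0) else -q * np.log2(q) - (1 - q) * np.log2(1 - q)

# Bernoulli(p) source, Hamming distortion, target distortion D (illustrative values).
p, D, N = 0.3, 0.1, 25
R_D = Hb(p) - Hb(D)                      # rate-distortion function value
R = R_D + 0.2                            # operate a bit above R(D)
K = int(np.floor(N * R))                 # number of index bits
alpha = (p - D) / (1 - 2 * D)            # reconstruction marginal P{S_hat = 1}

rng = np.random.default_rng(3)
# Random codebook: 2^K codewords of length N, entries i.i.d. ~ Ber(alpha), the marginal p_S_hat.
codebook = (rng.random((2 ** K, N)) < alpha).astype(int)

def encode(sN):
    """Minimum-distortion encoder over the random codebook (the 'optimal' encoder;
    the proof instead analyzes the typicality encoder, which is easier to bound)."""
    return int(np.argmin(np.mean(codebook != sN, axis=1)))

blocks = 50
source = (rng.random((blocks, N)) < p).astype(int)
D_N = np.mean([np.mean(sN != codebook[encode(sN)]) for sN in source])
print(f"R(D) = {R_D:.3f}, operating rate R = {R:.3f}, "
      f"achieved distortion ~= {D_N:.3f} (target D = {D})")
```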

SLIDE 34

Analysis of Expected Distortion

Why a typicality encoder? The typical average lemma (Lemma 2, Lecture 04): for any nonnegative function $g(x)$ on $\mathcal{X}$, if $x^n \in \mathcal{T}_\varepsilon^{(n)}(X)$, then
$$(1-\varepsilon)\,\mathrm{E}[g(X)] \le \frac1n \sum_{i=1}^n g(x_i) \le (1+\varepsilon)\,\mathrm{E}[g(X)].$$
In analyzing $\mathrm{E}_{\mathcal{C},S^N}\big[d(S^N,\hat S^N)\big]$, we can then distinguish two cases, $\mathcal{E} \triangleq \big\{(S^N,\hat S^N) \notin \mathcal{T}_\varepsilon^{(N)}\big\}$ and $\mathcal{E}^c \triangleq \big\{(S^N,\hat S^N) \in \mathcal{T}_\varepsilon^{(N)}\big\}$:
$$\mathrm{P}\{\mathcal{E}\}\,\mathrm{E}_{\mathcal{C},S^N}\big[d(S^N,\hat S^N)\,\big|\,\mathcal{E}\big] + \mathrm{P}\{\mathcal{E}^c\}\,\mathrm{E}_{\mathcal{C},S^N}\big[d(S^N,\hat S^N)\,\big|\,\mathcal{E}^c\big] \le \mathrm{P}\{\mathcal{E}\}\max_{s,\hat s} d(s,\hat s) + \mathrm{P}\{\mathcal{E}^c\}(1+\varepsilon)\tfrac{D}{1+\varepsilon} \le \mathrm{P}\{\mathcal{E}\}\max_{s,\hat s} d(s,\hat s) + D.$$
Hence, as long as $\mathrm{P}\{\mathcal{E}\}$ vanishes as $N \to \infty$, we are done.

SLIDE 35

Analysis of Expected Distortion → Analysis of P {E}

With typicality encoding, the analysis of the expected distortion becomes easy: we just need to control $\mathrm{P}\{\mathcal{E}\}$, where $\mathcal{E} \triangleq \big\{(S^N,\hat S^N) \notin \mathcal{T}_\varepsilon^{(N)}\big\}$.

Let us look at the event $\mathcal{E}$: it is the event that the reconstruction $\hat S^N$ is not jointly typical with $S^N$, which can only happen when none of the quantization codewords in the codebook is jointly typical with $S^N$. Hence,
$$\mathcal{E} \subseteq \bigcap_{w=1}^{2^K} \mathcal{A}_w^c, \quad \text{where } \mathcal{A}_w \triangleq \big\{\big(S^N, \hat S^N(w)\big) \in \mathcal{T}_\varepsilon^{(N)}\big\} \implies \mathrm{P}\{\mathcal{E}\} \le \mathrm{P}\Big\{\bigcap_{w=1}^{2^K} \mathcal{A}_w^c\Big\}.$$
Unfortunately, the events $\{\mathcal{A}_w^c \mid w = 1, \ldots, 2^K\}$ may not be mutually independent, because they all involve a common random sequence $S^N$. However, for a fixed $s^N$, the events
$$\mathcal{A}_w^c(s^N) \triangleq \big\{\big(s^N, \hat S^N(w)\big) \notin \mathcal{T}_\varepsilon^{(N)}\big\}, \quad w = 1, \ldots, 2^K,$$
are indeed mutually independent!

SLIDE 36

Analysis of $\mathrm{P}\{\mathcal{E}\}$, where $\mathcal{E} \triangleq \big\{(S^N,\hat S^N) \notin \mathcal{T}_\varepsilon^{(N)}\big\}$

Motivated by the above observation, we give an alternative upper bound:
$$\mathrm{P}\{\mathcal{E}\} \le \sum_{s^N \in \mathcal{S}^N} p(s^N)\,\mathrm{P}\Big\{\bigcap_{w=1}^{2^K} \mathcal{A}_w^c(s^N)\Big\} = \sum_{s^N \in \mathcal{S}^N} p(s^N) \prod_{w=1}^{2^K} \mathrm{P}\big\{\mathcal{A}_w^c(s^N)\big\} = \sum_{s^N \in \mathcal{S}^N} p(s^N) \prod_{w=1}^{2^K} \Big(1 - \mathrm{P}\big\{\mathcal{A}_w(s^N)\big\}\Big).$$
Question: is there a way to lower bound $\mathrm{P}\big\{\mathcal{A}_w(s^N)\big\} \triangleq \mathrm{P}\big\{\big(s^N, \hat S^N(w)\big) \in \mathcal{T}_\varepsilon^{(N)}(p_{S,\hat S})\big\}$?

Yes – as long as $s^N \in \mathcal{T}_{\varepsilon'}^{(N)}(p_S)$ for some $\varepsilon' < \varepsilon$, Lemma 1 (next slide) guarantees that, for sufficiently large $N$,
$$\mathrm{P}\big\{\mathcal{A}_w(s^N)\big\} \ge 2^{-N(I(S;\hat S)+\delta(\varepsilon))},$$
where $\lim_{\varepsilon\to 0}\delta(\varepsilon) = 0$.

SLIDE 37

Joint Typicality Lemma

The following lemma formally states the bounds. (The proof is omitted – see Section 2.5 of El Gamal & Kim [6].)

Lemma 1 (Joint Typicality Lemma): Consider a joint p.m.f. $p_{X,Y} = p_X \cdot p_{Y|X} = p_Y \cdot p_{X|Y}$. Then there exists $\delta(\varepsilon) > 0$ with $\lim_{\varepsilon\to 0}\delta(\varepsilon) = 0$ such that:
1. For an arbitrary sequence $x^n$ and random $Y^n \sim \prod_{i=1}^n p_Y(y_i)$,
$$\mathrm{P}\big\{(x^n, Y^n) \in \mathcal{T}_\varepsilon^{(n)}(p_{X,Y})\big\} \le 2^{-n(I(X;Y)-\delta(\varepsilon))}.$$
2. For an $\varepsilon'$-typical sequence $x^n \in \mathcal{T}_{\varepsilon'}^{(n)}(p_X)$ with $\varepsilon' < \varepsilon$, and random $Y^n \sim \prod_{i=1}^n p_Y(y_i)$, for sufficiently large $n$,
$$\mathrm{P}\big\{(x^n, Y^n) \in \mathcal{T}_\varepsilon^{(n)}(p_{X,Y})\big\} \ge 2^{-n(I(X;Y)+\delta(\varepsilon))}.$$
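A small Monte Carlo check of part 2 of the lemma (my own illustration; the joint pmf, $n$, and $\varepsilon$ are arbitrary, and the typicality test used is the usual $|\pi(x,y) - p(x,y)| \le \varepsilon\, p(x,y)$ condition): the empirical probability that an independently drawn $Y^n$ is jointly typical with a fixed typical $x^n$ is on the order of $2^{-n I(X;Y)}$.

```python
import numpy as np

Hb = lambda q: -q * np.log2(q) - (1 - q) * np.log2(1 - q)

# Joint pmf: X ~ Ber(0.5), Y = X xor Z with Z ~ Ber(0.2), so I(X;Y) = 1 - Hb(0.2).
pXY = np.array([[0.4, 0.1],
                [0.1, 0.4]])                       # rows: x in {0,1}, columns: y in {0,1}
pY = pXY.sum(axis=0)                               # marginal of Y = [0.5, 0.5]
I_XY = 1.0 - Hb(0.2)

n, eps, trials = 40, 0.25, 200_000
rng = np.random.default_rng(4)

x = np.tile([0, 1], n // 2)                        # a fixed, exactly typical x^n
Y = (rng.random((trials, n)) < pY[1]).astype(int)  # independent Y^n ~ prod_i pY(y_i)

# Empirical joint types of (x^n, Y^n) for each trial, in the order (0,0), (0,1), (1,0), (1,1).
n01 = (Y * (1 - x)).sum(axis=1)
n10 = ((1 - Y) * x).sum(axis=1)
n11 = (Y * x).sum(axis=1)
n00 = n - n01 - n10 - n11
types = np.stack([n00, n01, n10, n11], axis=1) / n
target = pXY.reshape(-1)

# Joint typicality test: |pi(x,y) - p(x,y)| <= eps * p(x,y) for every symbol pair.
typical = np.all(np.abs(types - target) <= eps * target, axis=1)

print(f"empirical P{{(x^n, Y^n) jointly typical}} ~= {typical.mean():.2e}")
print(f"2^(-n I(X;Y)) = {2 ** (-n * I_XY):.2e}  (lemma: between 2^(-n(I+delta)) and 2^(-n(I-delta)))")
```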

SLIDE 38

Finalizing the Proof

Invoking Lemma 1, the condition that $s^N \in \mathcal{T}_{\varepsilon'}^{(N)}(p_S)$ for some $\varepsilon' < \varepsilon$ motivates us to split the upper bound on $\mathrm{P}\{\mathcal{E}\}$ as follows:
$$\begin{aligned}
\mathrm{P}\{\mathcal{E}\} &\le \sum_{s^N \in \mathcal{S}^N} p(s^N) \prod_{w=1}^{2^K}\Big(1 - \mathrm{P}\big\{\mathcal{A}_w(s^N)\big\}\Big)\\
&\le \sum_{s^N \notin \mathcal{T}_{\varepsilon'}^{(N)}(p_S)} p(s^N) + \sum_{s^N \in \mathcal{T}_{\varepsilon'}^{(N)}(p_S)} p(s^N) \prod_{w=1}^{2^K}\Big(1 - \mathrm{P}\big\{\mathcal{A}_w(s^N)\big\}\Big)\\
&\le \mathrm{P}\big\{S^N \notin \mathcal{T}_{\varepsilon'}^{(N)}(p_S)\big\} + \sum_{s^N \in \mathcal{T}_{\varepsilon'}^{(N)}(p_S)} p(s^N)\Big(1 - 2^{-N(I(S;\hat S)+\delta(\varepsilon))}\Big)^{2^K}\\
&\le \mathrm{P}\big\{S^N \notin \mathcal{T}_{\varepsilon'}^{(N)}(p_S)\big\} + \Big(1 - 2^{-N(I(S;\hat S)+\delta(\varepsilon))}\Big)^{2^K}\\
&\le \mathrm{P}\big\{S^N \notin \mathcal{T}_{\varepsilon'}^{(N)}(p_S)\big\} + \exp\Big(-2^K \times 2^{-N(I(S;\hat S)+\delta(\varepsilon))}\Big).
\end{aligned}$$
The last step is due to $(1-x)^r \le e^{-rx}$ for $x \in [0,1]$ and $r \ge 0$.
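To get a feel for how quickly the second term dies once $R$ exceeds $I(S;\hat S)+\delta(\varepsilon)$, here is a small numeric illustration (the values of $I+\delta$ and $R$ are mine, chosen only to expose the behaviour): the exponent is about $-2^{N(R - I - \delta)}$, so the bound collapses double-exponentially fast in $N$.

```python
import numpy as np

I_plus_delta, R = 0.45, 0.50                # illustrative: R - (I + delta) = 0.05 > 0
for N in (100, 200, 400, 800):
    K = int(np.floor(N * R))                # K = floor(N*R) index bits
    exponent = -(2.0 ** K) * (2.0 ** (-N * I_plus_delta))
    print(f"N = {N:4d}: exp({exponent:.3e}) = {np.exp(exponent):.3e}")
```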

SLIDE 39

We obtain a nice upper bound:
$$\mathrm{P}\{\mathcal{E}\} \le \mathrm{P}\big\{S^N \notin \mathcal{T}_{\varepsilon'}^{(N)}(p_S)\big\} + \exp\Big(-2^K \times 2^{-N(I(S;\hat S)+\delta(\varepsilon))}\Big).$$
- The first term vanishes as $N \to \infty$ due to the AEP.
- The second term vanishes as $N \to \infty$ if $R > I(S;\hat S) + \delta(\varepsilon) = R\big(\tfrac{D}{1+\varepsilon}\big) + \delta(\varepsilon)$.
Hence, any $R > R\big(\tfrac{D}{1+\varepsilon}\big) + \delta(\varepsilon)$ can achieve average distortion $\le D$. Finally, due to the continuity of the rate-distortion function, we take $\varepsilon \to 0$ and complete the proof.
