SLIDE 1

Coding Theorems for Reversible Embedding

Frans Willems and Ton Kalker
T.U. Eindhoven / Philips Research
DIMACS, March 16-19, 2003

SLIDE 2

Outline

  1. Gelfand-Pinsker coding theorem
  2. Noise-free embedding
  3. Reversible embedding
  4. Robust and reversible embedding
  5. Partially reversible embedding
  6. Remarks

SLIDE 3

I. The Gelfand-Pinsker Coding Theorem

[Block diagram: the source P_s(x) produces side information X^N; the encoder forms Y^N = e(W, X^N); the channel P_c(z|y, x) outputs Z^N; the decoder forms Ŵ = d(Z^N).]

Messages: Pr{W = w} = 1/M for w ∈ {1, 2, ..., M}.
Side information: Pr{X^N = x^N} = ∏_{n=1}^{N} P_s(x_n) for x^N ∈ X^N.
Channel: discrete memoryless {Y × X, P_c(z|y, x), Z}.
Error probability: P_E = Pr{Ŵ ≠ W}.
Rate: R = (1/N) log₂(M).

SLIDE 4

Capacity

The side-information capacity C_si is the largest ρ such that for all ε > 0 there exist, for all large enough N, encoders and decoders with R ≥ ρ − ε and P_E ≤ ε.

THEOREM (Gelfand-Pinsker [1980]):

  C_si = max_{P_t(u,y|x)} [I(U; Z) − I(U; X)].   (1)

Achievability proof: Fix a test channel P_t(u, y|x). Consider sets A_ε(·) of strongly typical sequences, etc.

(a) For each message index w ∈ {1, ..., 2^{NR}}, generate 2^{NR_u} sequences u^N at random according to P(u) = Σ_{x,y} P_s(x) P_t(u, y|x). Give these sequences the label w.

(b) When message index w has to be transmitted, choose a sequence u^N having label w such that (u^N, x^N) ∈ A_ε(U, X). Such a sequence exists almost always if R_u > I(U; X) (roughly).

SLIDE 5

(c) The input sequence y^N results from applying the "channel" P(y|u, x) = P_t(y, u|x) / Σ_y P_t(u, y|x) to u^N and x^N. Then y^N is transmitted.

(d) The decoder, upon receiving z^N, looks for the unique sequence u^N such that (u^N, z^N) ∈ A_ε(U, Z). If R + R_u < I(U; Z) (roughly), such a unique sequence exists. The message index is the label of u^N.

Conclusion: R < I(U; Z) − I(U; X) is achievable.

Observations

A: As an intermediate result the decoder recovers the sequence u^N.
B: The transmitted u^N is jointly typical with the side-info sequence x^N, i.e. (u^N, x^N) ∈ A_ε(U, X), thus their joint composition is OK. Note that P(u, x) = Σ_y P_s(x) P_t(u, y|x).
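For a fixed test channel, the Gelfand-Pinsker rate I(U; Z) − I(U; X) is a finite computation over joint distributions. A minimal Python sketch; the binary side-information source, the choice U = Y, and the noiseless channel Z = Y are illustrative assumptions, not part of the theorem:

```python
import numpy as np

def mutual_information(pab):
    """I(A;B) in bits from a joint probability matrix p[a, b]."""
    pa = pab.sum(axis=1, keepdims=True)
    pb = pab.sum(axis=0, keepdims=True)
    prod = pa @ pb
    mask = pab > 0
    return float(np.sum(pab[mask] * np.log2(pab[mask] / prod[mask])))

Ps = np.array([0.5, 0.5])            # side information X, uniform
flip = 0.1                           # test channel: Y = X xor S, S ~ Bernoulli(0.1)
joint = np.zeros((2, 2, 2))          # joint p[x, u, z] with U = Y and Z = Y
for x in range(2):
    for s, q in [(0, 1 - flip), (1, flip)]:
        y = x ^ s
        joint[x, y, y] += Ps[x] * q

pux = joint.sum(axis=2).T            # marginal p(u, x)
puz = joint.sum(axis=0)              # marginal p(u, z)
R = mutual_information(puz) - mutual_information(pux)
print(R)                             # equals H(Y|X) = h(0.1) ≈ 0.469 here
```

With U = Y and a noiseless channel the rate collapses to I(Y; Y) − I(Y; X) = H(Y|X), which is exactly the noise-free embedding rate of Section II.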

SLIDE 6

II. Noise-free Embedding

[Block diagram: the host source P_s(x) produces X^N; message W and X^N enter the encoder, Y^N = e(W, X^N); the decoder forms Ŵ = d(Y^N).]

Messages: Pr{W = w} = 1/M for w ∈ {1, 2, ..., M}.
Source (host): Pr{X^N = x^N} = ∏_{n=1}^{N} P_s(x_n) for x^N ∈ X^N.
Error probability: P_E = Pr{Ŵ ≠ W}.
Rate: R = (1/N) log₂(M).
Embedding distortion: D_xy = E[(1/N) Σ_{n=1}^{N} D_xy(X_n, e_n(W, X^N))] for some distortion matrix {D_xy(x, y), x ∈ X, y ∈ Y}.

SLIDE 7

Achievable region for noise-free embedding

A rate-distortion pair (ρ, ∆_xy) is said to be achievable if for all ε > 0 there exist, for all large enough N, encoders and decoders such that R ≥ ρ − ε, D_xy ≤ ∆_xy + ε, P_E ≤ ε.

THEOREM (Chen [2000], Barron [2000]): The set of achievable rate-distortion pairs is equal to G_nfe, defined as

  G_nfe = {(ρ, ∆_xy) : 0 ≤ ρ ≤ H(Y|X), ∆_xy ≥ Σ_{x,y} P(x, y) D_xy(x, y), for P(x, y) = P_s(x) P_t(y|x)}.   (2)

Again {X, P_t(y|x), Y} is called the test channel.
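As a concrete (hypothetical, purely illustrative) instance of (2): take a binary host, Hamming distortion, and a symmetric test channel that flips x with probability d. Then H(Y|X) = h(d) and the expected distortion is exactly d, so (ρ, ∆_xy) = (h(d), d) lies on the boundary of G_nfe:

```python
import numpy as np

def h(p):
    """Binary entropy in bits."""
    p = np.clip(p, 1e-12, 1 - 1e-12)
    return float(-p * np.log2(p) - (1 - p) * np.log2(1 - p))

# Symmetric test channel with crossover d: boundary point (h(d), d) of G_nfe.
for d in [0.05, 0.1, 0.25, 0.5]:
    print(f"Delta = {d:.2f}  max rate H(Y|X) = {h(d):.3f} bits")
```

At d = 1/2 the rate saturates at one bit per host symbol, as it must for a binary cover signal.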

SLIDE 8

Proof:

Achievability: In the Gelfand-Pinsker achievability proof, note that Z = Y (noiseless channel) and take the auxiliary random variable U = Y. Then (x^N, y^N) ∈ A_ε(X, Y), hence D_xy is OK. For the embedding rate we obtain

  R = I(U; Z) − I(U; X) = I(Y; Y) − I(Y; X) = H(Y|X).

Converse, rate part:

  log₂(M) ≤ H(W) − H(W|Ŵ) + Fano term
          ≤ H(W|X^N) − H(W|X^N, Y^N) + Fano term
          = I(W; Y^N|X^N) + Fano term
          ≤ H(Y^N|X^N) + Fano term
          ≤ Σ_{n=1}^{N} H(Y_n|X_n) + Fano term
          ≤ N·H(Y|X) + Fano term,

SLIDE 9

where X and Y are random variables with

  Pr{(X, Y) = (x, y)} = (1/N) Σ_{n=1}^{N} Pr{(X_n, Y_n) = (x, y)},  for x ∈ X and y ∈ Y.

Note that Pr{X = x} = P_s(x) for x ∈ X.

Distortion part:

  D_xy = Σ_{x^N,y^N} Pr{(X^N, Y^N) = (x^N, y^N)} (1/N) Σ_n D_xy(x_n, y_n)
       = Σ_{x,y} Pr{(X, Y) = (x, y)} D_xy(x, y).

Let P_E ↓ 0, etc.

SLIDE 10

III. Reversible Embedding

[Block diagram: the host source P_s(x) produces X^N; message W and X^N enter the encoder, Y^N = e(W, X^N); the decoder forms (Ŵ, X̂^N_1) = d(Y^N).]

Messages: Pr{W = w} = 1/M for w ∈ {1, 2, ..., M}.
Source (host): Pr{X^N = x^N} = ∏_{n=1}^{N} P_s(x_n) for x^N ∈ X^N.
Error probability: P_E = Pr{Ŵ ≠ W ∨ X̂^N_1 ≠ X^N}.
Rate: R = (1/N) log₂(M).
Embedding distortion: D_xy = E[(1/N) Σ_{n=1}^{N} D_xy(X_n, e_n(W, X^N))] for some distortion matrix {D_xy(x, y), x ∈ X, y ∈ Y}.

Inspired by Fridrich, Goljan, and Du, "Lossless data embedding for all image formats," Proc. SPIE, Security and Watermarking of Multimedia Contents, San Jose, CA, 2002.

SLIDE 11

Achievable region for reversible embedding

A rate-distortion pair (ρ, ∆_xy) is said to be achievable if for all ε > 0 there exist, for all large enough N, encoders and decoders such that R ≥ ρ − ε, D_xy ≤ ∆_xy + ε, P_E ≤ ε.

RESULT (Kalker-Willems [2002]): The set of achievable rate-distortion pairs is equal to G_re, defined as

  G_re = {(ρ, ∆_xy) : 0 ≤ ρ ≤ H(Y) − H(X), ∆_xy ≥ Σ_{x,y} P(x, y) D_xy(x, y), for P(x, y) = P_s(x) P_t(y|x)}.   (3)

Note that {X, P_t(y|x), Y} is the test channel.
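The gap between the noise-free rate H(Y|X) in (2) and the reversible rate H(Y) − H(X) in (3) can be checked numerically. A small sketch; the binary host (px = 0.2) and symmetric test channel (flip probability 0.1) are assumed for illustration:

```python
import numpy as np

def entropy(p):
    """Entropy in bits of a probability vector."""
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

Ps = np.array([0.8, 0.2])              # host: Pr{X=1} = 0.2
Pt = np.array([[0.9, 0.1],
               [0.1, 0.9]])            # test channel Pt[x, y], flip prob. 0.1
Pxy = Ps[:, None] * Pt                 # joint P(x, y)
Py = Pxy.sum(axis=0)

H_X = entropy(Ps)
H_Y = entropy(Py)
H_Y_given_X = entropy(Pxy.ravel()) - H_X

print(f"noise-free rate H(Y|X)      = {H_Y_given_X:.3f}")
print(f"reversible rate H(Y) - H(X) = {H_Y - H_X:.3f}")
```

Reversibility costs exactly H(X|Y) bits per symbol, since H(Y) − H(X) = H(Y|X) − H(X|Y).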

SLIDE 12

Proof:

Achievability: In the Gelfand-Pinsker achievability proof, note that Z = Y (noiseless channel) and take the auxiliary random variable U = [X, Y]. Then x^N can be reconstructed by the decoder and (x^N, y^N) ∈ A_ε(X, Y), hence D_xy is OK. For the embedding rate we obtain

  R = I(U; Z) − I(U; X) = I([X, Y]; Y) − I([X, Y]; X) = H(Y) − H(X).

Converse, rate part:

  log₂(M) ≤ H(W) − H(W, X^N|Ŵ, X̂^N_1) + Fano term
          = H(W, X^N) − H(W, X^N|Ŵ, X̂^N_1) − H(X^N) + Fano term
          ≤ H(W, X^N) − H(W, X^N|Y^N, Ŵ, X̂^N_1) − H(X^N) + Fano term
          = I(W, X^N; Y^N) − H(X^N) + Fano term
          = H(Y^N) − H(X^N) + Fano term
          ≤ Σ_{n=1}^{N} [H(Y_n) − H(X_n)] + Fano term
          ≤ N[H(Y) − H(X)] + Fano term,

SLIDE 13

where X and Y are random variables with

  Pr{(X, Y) = (x, y)} = (1/N) Σ_{n=1}^{N} Pr{(X_n, Y_n) = (x, y)},  for x ∈ X and y ∈ Y.

Note that Pr{X = x} = P_s(x) for x ∈ X.

Distortion part:

  D_xy = Σ_{x^N,y^N} Pr{(X^N, Y^N) = (x^N, y^N)} (1/N) Σ_n D_xy(x_n, y_n)
       = Σ_{x,y} Pr{(X, Y) = (x, y)} D_xy(x, y).

Let P_E ↓ 0, etc.

SLIDE 14

Example: Binary source, Hamming distortion

[Figure: test channel from X (Pr{X = 1} = px) to Y (Pr{Y = 1} = py) with crossover probabilities d0 = Pt(1|0) and d1 = Pt(0|1).]

Since
  ∆_xy ≥ px·d1 + (1 − px)·d0,
  py = px(1 − d1) + (1 − px)·d0,
we can write py ≤ ∆_xy + px(1 − 2·d1). Assume w.l.o.g. that px ≤ 1/2. First let ∆_xy be such that ∆_xy + px ≤ 1/2, or ∆_xy ≤ 1/2 − px. Then we have

  py ≤ ∆_xy + px ≤ 1/2,

SLIDE 15

and hence ρ ≤ h(py) − h(px) ≤ h(px + ∆_xy) − h(px). However, ρ = h(px + ∆_xy) − h(px) is achievable with distortion ∆_xy by taking d1 = 0 and d0 = ∆_xy/(1 − px). Note that the test channel is not symmetric and that d0 = ∆_xy/(1 − px) ≤ (1/2 − px)/(1 − px) ≤ 1/2. For ∆_xy + px ≥ 1/2 the rate is bounded as ρ ≤ 1 − h(px), which is also achievable.
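The boundary just derived is easy to tabulate. A short sketch; px = 0.2 matches the plotted region on the following slide:

```python
import numpy as np

def h(p):
    """Binary entropy in bits."""
    p = np.clip(p, 1e-12, 1 - 1e-12)
    return float(-p * np.log2(p) - (1 - p) * np.log2(1 - p))

def reversible_rate(px, delta):
    """Boundary of G_re for a binary host with Hamming distortion (px <= 1/2):
    rho = h(px + delta) - h(px), saturating at 1 - h(px) once px + delta >= 1/2."""
    return h(min(px + delta, 0.5)) - h(px)

px = 0.2
for delta in [0.05, 0.1, 0.2, 0.3, 0.4]:
    print(f"delta = {delta:.2f}  rho = {reversible_rate(px, delta):.3f}")
```

For ∆_xy ≥ 1/2 − px the curve is flat at the maximum embedding rate 1 − h(px); for px = 0.2 this is ≈ 0.278 bit.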

SLIDE 16

Plot of the rate-distortion region G_re

[Plot: horizontal axis ∆_xy (0 to 0.5), vertical axis ρ in bits (0 to 0.35), for px = 0.2. Maximum embedding rate 1 − h(0.2) ≈ 0.278.]

SLIDE 17

Another perspective

[Figure: two consecutive blocks; in block k the pair (w(k), x^N(k)) produces y^N(k), and block k + 1 additionally carries the bits needed to restore x^N(k) from y^N(k).]

Consider a blocked system with blocks of length N. In block k, message bits can be embedded (noise-free) at rate H(Y|X) with the corresponding distortion. Then in block k + 1, message bits are embedded that allow reconstruction of x^N(k) given y^N(k). This requires N·H(X|Y) bits. Therefore the resulting embedding rate is

  R = H(Y|X) − H(X|Y) = H(Y, X) − H(X) − H(X|Y) = H(Y) − H(X).
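The chain of identities above can be sanity-checked on an arbitrary joint distribution. A throwaway Python check; the 3×4 joint is random and purely illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

def entropy(p):
    """Entropy in bits of a probability vector."""
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

# Verify H(Y|X) - H(X|Y) = H(Y) - H(X) for an arbitrary joint P(x, y).
Pxy = rng.random((3, 4))
Pxy /= Pxy.sum()
H_XY = entropy(Pxy.ravel())
H_X = entropy(Pxy.sum(axis=1))
H_Y = entropy(Pxy.sum(axis=0))
lhs = (H_XY - H_X) - (H_XY - H_Y)    # H(Y|X) - H(X|Y)
rhs = H_Y - H_X
print(lhs, rhs)                      # identical up to rounding
```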

SLIDE 18

IV. Robust and Reversible Embedding

[Block diagram: the host source P_s(x) produces X^N; the encoder forms Y^N = e(W, X^N); the channel P_c(z|y) outputs Z^N; the decoder forms (Ŵ, X̂^N_1) = d(Z^N).]

Messages: Pr{W = w} = 1/M for w ∈ {1, 2, ..., M}.
Source (host): Pr{X^N = x^N} = ∏_{n=1}^{N} P_s(x_n) for x^N ∈ X^N.
Channel: discrete memoryless {Y, P_c(z|y), Z}.
Error probability: P_E = Pr{Ŵ ≠ W ∨ X̂^N_1 ≠ X^N}.
Rate: R = (1/N) log₂(M).
Embedding distortion: D_xy = E[(1/N) Σ_{n=1}^{N} D_xy(X_n, e_n(W, X^N))] for some distortion matrix {D_xy(x, y), x ∈ X, y ∈ Y}.

SLIDE 19

Achievable region for robust and reversible embedding

A rate-distortion pair (ρ, ∆_xy) is said to be achievable if for all ε > 0 there exist, for all large enough N, encoders and decoders such that R ≥ ρ − ε, D_xy ≤ ∆_xy + ε, P_E ≤ ε.

RESULT (Willems-Kalker [2003]): The set of achievable rate-distortion pairs is equal to G_rre, defined as

  G_rre = {(ρ, ∆_xy) : 0 ≤ ρ ≤ I(Y; Z) − H(X), ∆_xy ≥ Σ_{x,y} P(x, y) D_xy(x, y), for P(x, y, z) = P_s(x) P_t(y|x) P_c(z|y)}.   (4)
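For a binary host over a binary symmetric channel, the rate bound in (4) reduces to a one-line computation. A sketch; parameterizing the test channel by its output marginal py is an assumed simplification for illustration:

```python
import numpy as np

def h(p):
    """Binary entropy in bits."""
    p = np.clip(p, 1e-12, 1 - 1e-12)
    return float(-p * np.log2(p) - (1 - p) * np.log2(1 - p))

def rre_rate(px, py, alpha):
    """I(Y;Z) - H(X) for binary host Pr{X=1} = px, test-channel output
    marginal Pr{Y=1} = py, and a BSC with crossover alpha."""
    pz = py * (1 - alpha) + (1 - py) * alpha
    return (h(pz) - h(alpha)) - h(px)

px = alpha = 0.1
print(rre_rate(px, 0.5, alpha))   # maximum: 1 - h(0.1) - h(0.1) ≈ 0.062
print(rre_rate(px, 0.2, alpha))   # can be negative: embedding then impossible
```

The maximum over py is attained at py = 1/2, giving 1 − h(α) − h(px).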

SLIDE 20

Proof:

Achievability: In the Gelfand-Pinsker achievability proof again take the auxiliary random variable U = [X, Y]. Then x^N can be reconstructed by the decoder and, since (x^N, y^N) ∈ A_ε(X, Y), the embedding distortion D_xy is OK. For the embedding rate we obtain

  R = I(U; Z) − I(U; X) = I([X, Y]; Z) − I([X, Y]; X) = I(Y; Z) − H(X).

Converse, rate part:

  log₂(M) ≤ H(W) − H(W, X^N|Ŵ, X̂^N_1) + Fano term
          = H(W, X^N) − H(W, X^N|Ŵ, X̂^N_1) − H(X^N) + Fano term
          ≤ H(W, X^N) − H(W, X^N|Z^N, Ŵ, X̂^N_1) − H(X^N) + Fano term
          = I(W, X^N; Z^N) − H(X^N) + Fano term
          = I(Y^N; Z^N) − H(X^N) + Fano term
          ≤ Σ_{n=1}^{N} [I(Y_n; Z_n) − H(X_n)] + Fano term
          ≤ N[I(Y; Z) − H(X)] + Fano term,

SLIDE 21

where X, Y and Z are random variables with

  Pr{(X, Y, Z) = (x, y, z)} = (1/N) Σ_{n=1}^{N} Pr{(X_n, Y_n, Z_n) = (x, y, z)},  for x ∈ X, y ∈ Y, and z ∈ Z.

Note that Pr{X = x} = P_s(x) for x ∈ X, and Pr{Z = z|Y = y} = P_c(z|y) for y ∈ Y and z ∈ Z.

Distortion part:

  D_xy = Σ_{x^N,y^N} Pr{(X^N, Y^N) = (x^N, y^N)} (1/N) Σ_n D_xy(x_n, y_n)
       = Σ_{x,y} Pr{(X, Y) = (x, y)} D_xy(x, y).

Let P_E ↓ 0, etc.

SLIDE 22

Example: Binary source, Hamming distortion, binary symmetric channel

[Figure: cascade of the test channel from X (Pr{X = 1} = px) to Y (Pr{Y = 1} = py) with crossovers d0 and d1, followed by a binary symmetric channel with crossover α from Y to Z (Pr{Z = 1} = pz).]

Similar analysis as before.

SLIDE 23

Plot of the achievable region G_rre

[Plot: horizontal axis ∆_xy (0 to 0.5), vertical axis ρ in bits (−0.3 to 0.1), for px = α = 0.1. Minimal distortion ≈ 0.218, maximum embedding rate 1 − h(0.1) − h(0.1) ≈ 0.062.]

SLIDE 24

The zero-rate case: Robustification

[Block diagram: the host source P_s(x) produces X^N; the encoder forms Y^N = e(X^N); the channel P_c(z|y) outputs Z^N; the decoder forms X̂^N_1 = d(Z^N).]

Source (host): Pr{X^N = x^N} = ∏_{n=1}^{N} P_s(x_n) for x^N ∈ X^N.
Channel: discrete memoryless {Y, P_c(z|y), Z}.
Error probability: P_E = Pr{X̂^N_1 ≠ X^N}.
Robustification distortion: D_xy = E[(1/N) Σ_{n=1}^{N} D_xy(X_n, e_n(X^N))] for some distortion matrix {D_xy(x, y), x ∈ X, y ∈ Y}.

SLIDE 25

Achievable distortions for robustification

A distortion ∆_xy is said to be achievable if for all ε > 0 there exist, for all large enough N, encoders and decoders such that D_xy ≤ ∆_xy + ε, P_E ≤ ε.

RESULT: The set of achievable distortions is equal to G_rob, defined as

  G_rob = {∆_xy : ∆_xy ≥ Σ_{x,y} P(x, y) D_xy(x, y), for P(x, y, z) = P_s(x) P_t(y|x) P_c(z|y) such that H(X) ≤ I(Y; Z)}.   (5)

Related to Shannon's separation principle! Robustification is not possible if H(X) > max_{P(y)} I(Y; Z).
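Condition (5) gives a simple feasibility test. A sketch for a binary host over a BSC(α); this specific instance is assumed for illustration:

```python
import numpy as np

def h(p):
    """Binary entropy in bits."""
    p = np.clip(p, 1e-12, 1 - 1e-12)
    return float(-p * np.log2(p) - (1 - p) * np.log2(1 - p))

def robustifiable(px, alpha):
    """Robustification over a BSC(alpha) is possible only if the host
    entropy h(px) does not exceed the channel capacity 1 - h(alpha)."""
    return h(px) <= 1 - h(alpha)

print(robustifiable(0.1, 0.1))   # h(0.1) ≈ 0.469 <= 0.531 -> True
print(robustifiable(0.2, 0.1))   # h(0.2) ≈ 0.722 >  0.531 -> False
```

This is the separation-principle flavor of the result: the host must be compressible below the channel capacity.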

SLIDE 26

V. Partially Reversible Embedding

[Block diagram: the host source P_s(x) produces X^N; the encoder forms Y^N = e(W, X^N); the decoder outputs Ŵ = d(Y^N) and a restoration sequence V^N = f(Y^N).]

Messages: Pr{W = w} = 1/M for w ∈ {1, 2, ..., M}.
Source (host): Pr{X^N = x^N} = ∏_{n=1}^{N} P_s(x_n) for x^N ∈ X^N.
Error probability: P_E = Pr{Ŵ ≠ W}.
Rate: R = (1/N) log₂(M).
Embedding distortion: D_xy = E[(1/N) Σ_{n=1}^{N} D_xy(X_n, e_n(W, X^N))] for some distortion matrix {D_xy(x, y), x ∈ X, y ∈ Y}.
Restoration distortion: D_xv = E[(1/N) Σ_{n=1}^{N} D_xv(X_n, f_n(Y^N))] for a distortion matrix {D_xv(x, v), x ∈ X, v ∈ V}.

SLIDE 27

Achievable region for partially reversible embedding

A rate-distortion triple (ρ, ∆_xy, ∆_xv) is said to be achievable if for all ε > 0 there exist, for all large enough N, encoders and decoders such that R ≥ ρ − ε, D_xy ≤ ∆_xy + ε, D_xv ≤ ∆_xv + ε, P_E ≤ ε.

RESULT (Willems-Kalker [2002]): The set of achievable rate-distortion triples is given by G_pre, defined as

  G_pre = {(ρ, ∆_xy, ∆_xv) : 0 ≤ ρ ≤ H(Y) − I(X; Y, V), ∆_xy ≥ Σ_{x,y,v} P(x, y, v) D_xy(x, y), ∆_xv ≥ Σ_{x,y,v} P(x, y, v) D_xv(x, v), for P(x, y, v) = P_s(x) P_t(y, v|x)}.   (6)
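The rate expression H(Y) − I(X; Y, V) interpolates between the earlier regions. A Python sketch with an assumed binary host (px = 0.2, flip probability 0.1), checking the two extremes V = X (full reversibility) and V constant (no restoration):

```python
import numpy as np

def entropy(p):
    """Entropy in bits of a probability vector."""
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

def pre_rate(Pxyv):
    """H(Y) - I(X;Y,V) from a joint array P[x, y, v]."""
    H_Y = entropy(Pxyv.sum(axis=(0, 2)))
    H_X = entropy(Pxyv.sum(axis=(1, 2)))
    H_XYV = entropy(Pxyv.ravel())
    H_YV = entropy(Pxyv.sum(axis=0).ravel())
    return H_Y - (H_X + H_YV - H_XYV)

Ps = np.array([0.8, 0.2])
Pt = np.array([[0.9, 0.1], [0.1, 0.9]])   # Pt[x, y]
Pxy = Ps[:, None] * Pt

P1 = np.zeros((2, 2, 2))                  # extreme 1: V = X
for x in range(2):
    P1[x, :, x] = Pxy[x, :]
P2 = np.zeros((2, 2, 2))                  # extreme 2: V constant
P2[:, :, 0] = Pxy

print(pre_rate(P1))   # H(Y) - H(X): the reversible rate of (3)
print(pre_rate(P2))   # H(Y|X): the noise-free rate of (2)
```

V = X forces I(X; Y, V) = H(X), recovering (3); a constant V gives I(X; Y, V) = I(X; Y), recovering (2).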

SLIDE 28

Proof:

Achievability: In the Gelfand-Pinsker achievability proof, note again that Z = Y (noiseless channel) and take the auxiliary random variable U = [Y, V]. Then v^N can be reconstructed by the decoder and, since (x^N, y^N, v^N) ∈ A_ε(X, Y, V), both D_xy and D_xv are OK. For the embedding rate we obtain

  R = I(U; Z) − I(U; X) = I([Y, V]; Y) − I([Y, V]; X) = H(Y) − I(X; Y, V).

Converse, rate part:

  log₂(M) = H(Y^N, V^N, W) − H(Y^N, V^N|W)
          ≤ H(Y^N) + H(W|Ŵ) − I(Y^N, V^N; X^N|W)
          ≤ H(Y^N) − H(X^N|W) + H(X^N|W, Y^N, V^N) + Fano term
          ≤ H(Y^N) − H(X^N) + H(X^N|Y^N, V^N) + Fano term
          ≤ Σ_{n=1}^{N} H(Y_n) − Σ_{n=1}^{N} H(X_n) + Σ_{n=1}^{N} H(X_n|Y_n, V_n) + Fano term
          ≤ N[H(Y) − H(X) + H(X|Y, V)] + Fano term
          = N[H(Y) − I(X; Y, V)] + Fano term,

SLIDE 29

where X, Y and V are random variables with

  Pr{(X, Y, V) = (x, y, v)} = (1/N) Σ_{n=1}^{N} Pr{(X_n, Y_n, V_n) = (x, y, v)},  for x ∈ X, y ∈ Y, and v ∈ V.

Note that Pr{X = x} = P_s(x) for x ∈ X.

Distortion parts:

  D_xy = Σ_{x^N,y^N} Pr{(X^N, Y^N) = (x^N, y^N)} (1/N) Σ_n D_xy(x_n, y_n)
       = Σ_{x,y} Pr{(X, Y) = (x, y)} D_xy(x, y),

  D_xv = Σ_{x^N,v^N} Pr{(X^N, V^N) = (x^N, v^N)} (1/N) Σ_n D_xv(x_n, v_n)
       = Σ_{x,v} Pr{(X, V) = (x, v)} D_xv(x, v).

Let P_E ↓ 0, etc.

SLIDE 30

The other perspective again

Consider a blocked system with blocks of length N. In block k a message can be embedded (noise-free) at rate H(Y|X) with the corresponding distortion. Then in block k + 1 data is embedded that specifies a restoration sequence v^N(k) given y^N(k). This requires N·I(X; V|Y) bits. Therefore the remaining embedding rate is

  R = H(Y|X) − I(X; V|Y) = H(Y|X) − H(X|Y) + H(X|Y, V) = H(Y) − H(X) + H(X|Y, V) = H(Y) − I(X; Y, V).

SLIDE 31

The zero-rate case: Self-Embedding

[Block diagram: the host source P_s(x) produces X^N; the encoder forms Y^N = e(X^N); the decoder outputs the restoration sequence V^N = f(Y^N).]

Source (host): Pr{X^N = x^N} = ∏_{n=1}^{N} P_s(x_n) for x^N ∈ X^N.
Embedding distortion: D_xy = E[(1/N) Σ_{n=1}^{N} D_xy(X_n, e_n(X^N))] for some distortion matrix {D_xy(x, y), x ∈ X, y ∈ Y}.
Restoration distortion: D_xv = E[(1/N) Σ_{n=1}^{N} D_xv(X_n, f_n(Y^N))] for a distortion matrix {D_xv(x, v), x ∈ X, v ∈ V}.

SLIDE 32

Achievable distortions for self-embedding

A distortion pair (∆_xy, ∆_xv) is said to be achievable if for all ε > 0 there exist, for all large enough N, encoders and decoders such that D_xy ≤ ∆_xy + ε, D_xv ≤ ∆_xv + ε.

RESULT (Willems-Kalker [2002]): The set of achievable distortion pairs is equal to G_se, defined as

  G_se = {(∆_xy, ∆_xv) : ∆_xy ≥ Σ_{x,y,v} P(x, y, v) D_xy(x, y), ∆_xv ≥ Σ_{x,y,v} P(x, y, v) D_xv(x, v), for P(x, y, v) = P_s(x) P_t(y, v|x) such that H(Y) ≥ I(X; Y, V)}.   (7)

Self-embedding amounts to putting a vector quantizer into a scalar quantizer, i.e. making an abstract index to a restoration vector v^N meaningful.

SLIDE 33

VI. Remarks

  1. Our results are related to results of Sutivong, Cover, et al., though the setups differ slightly (embedding distortion).
  2. We cannot do the partially reversible AND robust case. An achievable region would be similar to the Sutivong, Cover, Chiang, Kim [2002] region. No converse.
  3. Coding techniques for the reversible case have been studied (with Deran Maas [2002]).
  4. Open problems: (A) Arimoto-Blahut methods to compute the rate-distortion functions, (B) coding techniques, especially for the zero-rate cases.