Coding Theorems for Reversible Embedding
Frans Willems and Ton Kalker
T.U. Eindhoven, Philips Research
DIMACS, March 16-19, 2003
Outline
- 1. Gelfand-Pinsker coding theorem
- 2. Noise-free embedding
- 3. Reversible embedding
- 4. Robust and reversible embedding
- 5. Partially reversible embedding
- 6. Remarks
I. The Gelfand-Pinsker Coding Theorem
[Block diagram: side-information source Ps(x) produces X^N; encoder forms Y^N = e(W, X^N); channel Pc(z|y, x) outputs Z^N; decoder forms Ŵ = d(Z^N).]
Messages: Pr{W = w} = 1/M for w ∈ {1, 2, · · · , M}.
Side information: Pr{X^N = x^N} = Π_{n=1..N} Ps(xn) for x^N ∈ X^N.
Channel: discrete memoryless {Y × X, Pc(z|y, x), Z}.
Error probability: PE = Pr{Ŵ ≠ W}.
Rate: R = (1/N) log2(M).
Capacity
The side-information capacity Csi is the largest ρ such that for all ε > 0 there exist, for all large enough N, encoders and decoders with R ≥ ρ − ε and PE ≤ ε.

THEOREM (Gelfand-Pinsker [1980]):

Csi = max_{Pt(u,y|x)} [I(U; Z) − I(U; X)].   (1)

Achievability proof: Fix a test channel Pt(u, y|x). Consider sets Aε(·) of strongly typical sequences, etc.

(a) For each message index w ∈ {1, · · · , 2^{NR}}, generate 2^{N·Ru} sequences u^N at random according to P(u) = Σ_{x,y} Ps(x)Pt(u, y|x). Give these sequences the label w.

(b) When message index w has to be transmitted, choose a sequence u^N having label w such that (u^N, x^N) ∈ Aε(U, X). Such a sequence exists almost always if Ru > I(U; X) (roughly).
(c) The input sequence y^N results from applying the "channel" P(y|u, x) = Pt(u, y|x) / Σ_y Pt(u, y|x) to u^N and x^N. Then y^N is transmitted.

(d) The decoder, upon receiving z^N, looks for the unique sequence u^N such that (u^N, z^N) ∈ Aε(U, Z). If R + Ru < I(U; Z) (roughly), such a unique sequence exists. The message index is the label of u^N.

The conclusion is that R < I(U; Z) − I(U; X) is achievable.
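The capacity expression (1) is easy to evaluate numerically once a test channel is fixed; maximizing the result over Pt(u, y|x) then gives Csi. Below is a minimal sketch, assuming Python with numpy; the function names and array-index conventions are my own, not from the talk.

```python
import numpy as np

def mutual_information(Pab):
    """I(A; B) in bits for a joint distribution Pab[a, b]."""
    Pa = Pab.sum(axis=1, keepdims=True)
    Pb = Pab.sum(axis=0, keepdims=True)
    mask = Pab > 0
    return float(np.sum(Pab[mask] * np.log2(Pab[mask] / (Pa @ Pb)[mask])))

def gelfand_pinsker_rate(Ps, Pt, Pc):
    """Evaluate I(U; Z) - I(U; X) of (1) for one fixed test channel.

    Ps : side-information distribution Ps[x], shape (|X|,)
    Pt : test channel Pt[x, u, y] = Pt(u, y | x); each Pt[x] sums to 1
    Pc : channel Pc[y, x, z] = Pc(z | y, x); each Pc[y, x] sums to 1
    """
    nX, nU, nY = Pt.shape
    nZ = Pc.shape[2]
    P = np.zeros((nU, nX, nY, nZ))      # joint P(u, x, y, z)
    for x in range(nX):
        # P(u, x, y, z) = Ps(x) Pt(u, y | x) Pc(z | y, x)
        P[:, x, :, :] = Ps[x] * Pt[x][:, :, None] * Pc[:, x, :][None, :, :]
    Puz = P.sum(axis=(1, 2))            # marginal P(u, z)
    Pux = P.sum(axis=(2, 3))            # marginal P(u, x)
    return mutual_information(Puz) - mutual_information(Pux)
```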
Observations
A: As an intermediate result the decoder recovers the sequence u^N.
B: The transmitted u^N is jointly typical with the side-info sequence x^N, i.e. (u^N, x^N) ∈ Aε(U, X), thus their joint composition is OK. Note that P(u, x) = Σ_y Ps(x)Pt(u, y|x).
II. Noise-free Embedding
[Block diagram: host source Ps(x) produces X^N; encoder forms Y^N = e(W, X^N); decoder forms Ŵ = d(Y^N).]
Messages: Pr{W = w} = 1/M for w ∈ {1, 2, · · · , M}.
Source (host): Pr{X^N = x^N} = Π_{n=1..N} Ps(xn) for x^N ∈ X^N.
Error probability: PE = Pr{Ŵ ≠ W}.
Rate: R = (1/N) log2(M).
Embedding distortion: Dxy = E[(1/N) Σ_{n=1..N} Dxy(Xn, en(W, X^N))] for some distortion matrix {Dxy(x, y), x ∈ X, y ∈ Y}.
Achievable region for noise-free embedding
A rate-distortion pair (ρ, ∆xy) is said to be achievable if for all ε > 0 there exist, for all large enough N, encoders and decoders such that R ≥ ρ − ε, Dxy ≤ ∆xy + ε, PE ≤ ε.

THEOREM (Chen [2000], Barron [2000]): The set of achievable rate-distortion pairs is equal to Gnfe, defined as

Gnfe = {(ρ, ∆xy) : 0 ≤ ρ ≤ H(Y|X), ∆xy ≥ Σ_{x,y} P(x, y)Dxy(x, y), for P(x, y) = Ps(x)Pt(y|x)}.   (2)

Again, {X, Pt(y|x), Y} is called the test channel.
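As a numerical companion to (2), the sketch below (again assuming numpy; the name nfe_point is mine) evaluates the point (H(Y|X), Σ_{x,y} P(x, y)Dxy(x, y)) of Gnfe determined by one test channel Pt(y|x). For a binary host, Hamming distortion, and a symmetric test channel with crossover 0.1 it returns roughly (0.469, 0.1).

```python
import numpy as np

def nfe_point(Ps, Pt, Dxy):
    """Return (H(Y|X), expected distortion) for host Ps[x],
    test channel Pt[x, y] (rows sum to 1) and distortion matrix Dxy[x, y]."""
    Pxy = Ps[:, None] * Pt                               # joint P(x, y)
    logs = Pt * np.log2(np.clip(Pt, 1e-300, None))       # Pt log2 Pt, with 0 log 0 = 0
    H_Y_given_X = float(-np.sum(Ps * logs.sum(axis=1)))
    distortion = float(np.sum(Pxy * Dxy))
    return H_Y_given_X, distortion

print(nfe_point(np.array([0.8, 0.2]),
                np.array([[0.9, 0.1], [0.1, 0.9]]),
                np.array([[0.0, 1.0], [1.0, 0.0]])))     # ~ (0.469, 0.1)
```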
Proof:
Achievability: In the Gelfand-Pinsker achievability proof, note that Z = Y (noiseless channel) and take the auxiliary random variable U = Y. Then (x^N, y^N) ∈ Aε(X, Y), hence Dxy is OK. For the embedding rate we obtain

R = I(U; Z) − I(U; X) = I(Y; Y) − I(Y; X) = H(Y|X).

Converse: Rate part:

log2(M) ≤ H(W) − H(W|Ŵ) + Fano term
≤ H(W|X^N) − H(W|X^N, Y^N) + Fano term
= I(W; Y^N|X^N) + Fano term
≤ H(Y^N|X^N) + Fano term
≤ Σ_{n=1..N} H(Yn|Xn) + Fano term
≤ N·H(Y|X) + Fano term,
where X and Y are random variables with Pr{(X, Y) = (x, y)} = (1/N) Σ_{n=1..N} Pr{(Xn, Yn) = (x, y)}, for x ∈ X and y ∈ Y. Note that Pr{X = x} = Ps(x) for x ∈ X.

Distortion part:

Dxy = Σ_{x^N, y^N} Pr{(X^N, Y^N) = (x^N, y^N)} (1/N) Σ_n Dxy(xn, yn) = Σ_{x,y} Pr{(X, Y) = (x, y)} Dxy(x, y).

Let PE ↓ 0, etc.
III. Reversible Embedding
[Block diagram: host source Ps(x) produces X^N; encoder forms Y^N = e(W, X^N); decoder forms (Ŵ, X̂^N) = d(Y^N).]
Messages: Pr{W = w} = 1/M for w ∈ {1, 2, · · · , M}.
Source (host): Pr{X^N = x^N} = Π_{n=1..N} Ps(xn) for x^N ∈ X^N.
Error probability: PE = Pr{Ŵ ≠ W ∨ X̂^N ≠ X^N}.
Rate: R = (1/N) log2(M).
Embedding distortion: Dxy = E[(1/N) Σ_{n=1..N} Dxy(Xn, en(W, X^N))] for some distortion matrix {Dxy(x, y), x ∈ X, y ∈ Y}.
Inspired by Fridrich, Goljan, and Du, "Lossless data embedding for all image formats," Proc. SPIE, Security and Watermarking of Multimedia Contents, San Jose, CA, 2002.
Achievable region for reversible embedding
A rate-distortion pair (ρ, ∆xy) is said to be achievable if for all ε > 0 there exist, for all large enough N, encoders and decoders such that R ≥ ρ − ε, Dxy ≤ ∆xy + ε, PE ≤ ε.

RESULT (Kalker-Willems [2002]): The set of achievable rate-distortion pairs is equal to Gre, defined as

Gre = {(ρ, ∆xy) : 0 ≤ ρ ≤ H(Y) − H(X), ∆xy ≥ Σ_{x,y} P(x, y)Dxy(x, y), for P(x, y) = Ps(x)Pt(y|x)}.   (3)

Note that {X, Pt(y|x), Y} is the test channel.
Proof:
Achievability: In the Gelfand-Pinsker achievability proof, note that Z = Y (noiseless channel) and take the auxiliary random variable U = [X, Y]. Then x^N can be reconstructed by the decoder and (x^N, y^N) ∈ Aε(X, Y), hence Dxy is OK. For the embedding rate we obtain

R = I(U; Z) − I(U; X) = I([X, Y]; Y) − I([X, Y]; X) = H(Y) − H(X).

Converse: Rate part:

log2(M) ≤ H(W) − H(W, X^N|Ŵ, X̂^N) + Fano term
= H(W, X^N) − H(W, X^N|Ŵ, X̂^N) − H(X^N) + Fano term
≤ H(W, X^N) − H(W, X^N|Y^N, Ŵ, X̂^N) − H(X^N) + Fano term
= I(W, X^N; Y^N) − H(X^N) + Fano term
= H(Y^N) − H(X^N) + Fano term
≤ Σ_{n=1..N} [H(Yn) − H(Xn)] + Fano term
≤ N[H(Y) − H(X)] + Fano term,
where X and Y are random variables with Pr{(X, Y) = (x, y)} = (1/N) Σ_{n=1..N} Pr{(Xn, Yn) = (x, y)}, for x ∈ X and y ∈ Y. Note that Pr{X = x} = Ps(x) for x ∈ X.

Distortion part:

Dxy = Σ_{x^N, y^N} Pr{(X^N, Y^N) = (x^N, y^N)} (1/N) Σ_n Dxy(xn, yn) = Σ_{x,y} Pr{(X, Y) = (x, y)} Dxy(x, y).

Let PE ↓ 0, etc.
Example: Binary source, Hamming distortion
[Figure: binary test channel from X (Pr{X = 1} = px) to Y (Pr{Y = 1} = py) with crossover probabilities d0 = Pt(1|0) and d1 = Pt(0|1).]
Since ∆xy ≥ px·d1 + (1 − px)·d0 and py = px(1 − d1) + (1 − px)d0, we can write py ≤ ∆xy + px(1 − 2d1). Assume w.l.o.g. that px ≤ 1/2. First let ∆xy be such that ∆xy + px ≤ 1/2, or ∆xy ≤ 1/2 − px. Then we have

py ≤ ∆xy + px ≤ 1/2,

and hence ρ ≤ h(py) − h(px) ≤ h(px + ∆xy) − h(px). However, ρ = h(px + ∆xy) − h(px) is achievable with distortion ∆xy by taking d1 = 0 and d0 = ∆xy/(1 − px). Note that the test channel is not symmetric and that d0 = ∆xy/(1 − px) ≤ (1/2 − px)/(1 − px) ≤ 1/2. For ∆xy + px ≥ 1/2 the rate is bounded as ρ ≤ 1 − h(px), which is also achievable.
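A quick numerical check of this corner point (a sketch assuming numpy; the variable names are mine): with d1 = 0 and d0 = ∆xy/(1 − px) the output probability, distortion and rate come out exactly as derived above.

```python
import numpy as np

def h(p):
    """Binary entropy in bits."""
    p = np.clip(p, 1e-12, 1 - 1e-12)
    return -p * np.log2(p) - (1 - p) * np.log2(1 - p)

px, Delta = 0.2, 0.15                  # example with Delta <= 1/2 - px
d1, d0 = 0.0, Delta / (1 - px)         # the asymmetric test channel from the slide
py = px * (1 - d1) + (1 - px) * d0     # output probability  -> px + Delta = 0.35
dist = px * d1 + (1 - px) * d0         # Hamming distortion  -> Delta = 0.15
rate = h(py) - h(px)                   # embedding rate      -> h(px + Delta) - h(px) ~ 0.212
print(py, dist, rate)
```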
Plot of rate-distortion region Gre
Horizontal axis ∆xy, vertical axis ρ, for px = 0.2. Maximum embedding rate 1 − h(0.2) ≈ 0.278.
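The boundary in this plot can be reproduced with the short script below; this is a sketch assuming numpy and matplotlib, using the piecewise expression derived on the previous slide (ρ = h(px + ∆xy) − h(px) for ∆xy + px ≤ 1/2, and 1 − h(px) beyond that).

```python
import numpy as np
import matplotlib.pyplot as plt

def h(p):
    """Binary entropy in bits."""
    p = np.clip(p, 1e-12, 1 - 1e-12)
    return -p * np.log2(p) - (1 - p) * np.log2(1 - p)

px = 0.2
D = np.linspace(0.0, 0.5, 201)                        # distortion axis
rho = np.where(D + px <= 0.5, h(px + D) - h(px), 1 - h(px))
print("maximum embedding rate:", 1 - h(px))           # ~ 0.278

plt.plot(D, rho)
plt.xlabel("DISTORTION")
plt.ylabel("RATE in BITS")
plt.title("Boundary of Gre for px = 0.2")
plt.show()
```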
Another perspective
[Diagram: two consecutive blocks; w(k) is embedded in x^N(k) giving y^N(k), and w(k + 1) is embedded in x^N(k + 1) giving y^N(k + 1).]
Consider a blocked system with blocks of length N. In block k message bits can be (noise-free) embedded with rate H(Y|X) and corresponding distortion. Then in block k + 1 message bits are embedded that allow for reconstruction of x^N(k) given y^N(k). This requires N·H(X|Y) bits. Therefore the resulting embedding rate is

R = H(Y|X) − H(X|Y) = H(Y, X) − H(X) − H(X|Y) = H(Y) − H(X).
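The entropy identity used in this step, H(Y|X) − H(X|Y) = H(Y) − H(X), is the chain rule applied twice; a tiny numerical confirmation on an arbitrary joint distribution (sketch, assuming numpy; helper names are mine):

```python
import numpy as np

def H(P):
    """Entropy in bits of a (joint) distribution given as an array."""
    P = P[P > 0]
    return float(-np.sum(P * np.log2(P)))

rng = np.random.default_rng(0)
Pxy = rng.random((3, 4)); Pxy /= Pxy.sum()   # arbitrary joint P(x, y)
Px, Py = Pxy.sum(axis=1), Pxy.sum(axis=0)

H_Y_given_X = H(Pxy) - H(Px)                 # chain rule: H(X, Y) - H(X)
H_X_given_Y = H(Pxy) - H(Py)
assert np.isclose(H_Y_given_X - H_X_given_Y, H(Py) - H(Px))
```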
IV. Robust and Reversible Embedding
[Block diagram: host source Ps(x) produces X^N; encoder forms Y^N = e(W, X^N); channel Pc(z|y) outputs Z^N; decoder forms (Ŵ, X̂^N) = d(Z^N).]
Messages: Pr{W = w} = 1/M for w ∈ {1, 2, · · · , M}.
Source (host): Pr{X^N = x^N} = Π_{n=1..N} Ps(xn) for x^N ∈ X^N.
Channel: discrete memoryless {Y, Pc(z|y), Z}.
Error probability: PE = Pr{Ŵ ≠ W ∨ X̂^N ≠ X^N}.
Rate: R = (1/N) log2(M).
Embedding distortion: Dxy = E[(1/N) Σ_{n=1..N} Dxy(Xn, en(W, X^N))] for some distortion matrix {Dxy(x, y), x ∈ X, y ∈ Y}.
Achievable region for robust and reversible embedding
A rate-distortion pair (ρ, ∆xy) is said to be achievable if for all ε > 0 there exist, for all large enough N, encoders and decoders such that R ≥ ρ − ε, Dxy ≤ ∆xy + ε, PE ≤ ε.

RESULT (Willems-Kalker [2003]): The set of achievable rate-distortion pairs is equal to Grre, defined as

Grre = {(ρ, ∆xy) : 0 ≤ ρ ≤ I(Y; Z) − H(X), ∆xy ≥ Σ_{x,y} P(x, y)Dxy(x, y), for P(x, y, z) = Ps(x)Pt(y|x)Pc(z|y)}.   (4)
Proof:
Achievability: In the Gelfand-Pinsker achievability proof, again take the auxiliary random variable U = [X, Y]. Then x^N can be reconstructed by the decoder and, since (x^N, y^N) ∈ Aε(X, Y), the embedding distortion Dxy is OK. For the embedding rate we obtain

R = I(U; Z) − I(U; X) = I([X, Y]; Z) − I([X, Y]; X) = I(Y; Z) − H(X).

Converse: Rate part:

log2(M) ≤ H(W) − H(W, X^N|Ŵ, X̂^N) + Fano term
= H(W, X^N) − H(W, X^N|Ŵ, X̂^N) − H(X^N) + Fano term
≤ H(W, X^N) − H(W, X^N|Z^N, Ŵ, X̂^N) − H(X^N) + Fano term
= I(W, X^N; Z^N) − H(X^N) + Fano term
= I(Y^N; Z^N) − H(X^N) + Fano term
≤ Σ_{n=1..N} [I(Yn; Zn) − H(Xn)] + Fano term
≤ N[I(Y; Z) − H(X)] + Fano term,
where X, Y and Z are random variables with Pr{(X, Y, Z) = (x, y, z)} = (1/N) Σ_{n=1..N} Pr{(Xn, Yn, Zn) = (x, y, z)}, for x ∈ X, y ∈ Y, and z ∈ Z. Note that Pr{X = x} = Ps(x) for x ∈ X, and Pr{Z = z|Y = y} = Pc(z|y) for y ∈ Y and z ∈ Z.

Distortion part:

Dxy = Σ_{x^N, y^N} Pr{(X^N, Y^N) = (x^N, y^N)} (1/N) Σ_n Dxy(xn, yn) = Σ_{x,y} Pr{(X, Y) = (x, y)} Dxy(x, y).

Let PE ↓ 0, etc.
Example: Binary source, Hamming distortion, binary symmetric channel
[Figure: binary test channel from X (Pr{X = 1} = px) to Y (Pr{Y = 1} = py) with crossovers d0 and d1, followed by a binary symmetric channel with crossover α from Y to Z (Pr{Z = 1} = pz).]
Similar analysis as before.
Plot of achievable region Grre
Horizontal axis ∆xy, vertical axis ρ for px = α = 0.1. Minimal distortion ≈ 0.218, maximum embedding rate 1 − h(0.1) − h(0.1) ≈ 0.062.
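The two numbers in this caption can be checked with the sketch below (assuming numpy; as in the noise-free binary example I take d1 = 0 on the boundary, so py = px + ∆xy, and the rate for such a test channel is I(Y; Z) − H(X) = h(pz) − h(α) − h(px)).

```python
import numpy as np

def h(p):
    """Binary entropy in bits."""
    p = np.clip(p, 1e-12, 1 - 1e-12)
    return -p * np.log2(p) - (1 - p) * np.log2(1 - p)

px, alpha = 0.1, 0.1
D = np.linspace(0.0, 0.5, 5001)          # candidate distortions; d1 = 0, so py = px + D
py = px + D
pz = py * (1 - 2 * alpha) + alpha        # output probability of the BSC(alpha)
rate = h(pz) - h(alpha) - h(px)          # I(Y; Z) - H(X) for this test channel

print("maximum embedding rate:", 1 - h(alpha) - h(px))   # ~ 0.062
print("minimal distortion:", D[rate >= 0][0])             # ~ 0.218
```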
The zero-rate case: Robustification
[Block diagram: host source Ps(x) produces X^N; encoder forms Y^N = e(X^N); channel Pc(z|y) outputs Z^N; decoder forms X̂^N = d(Z^N).]
Source (host): Pr{X^N = x^N} = Π_{n=1..N} Ps(xn) for x^N ∈ X^N.
Channel: discrete memoryless {Y, Pc(z|y), Z}.
Error probability: PE = Pr{X̂^N ≠ X^N}.
Robustification distortion: Dxy = E[(1/N) Σ_{n=1..N} Dxy(Xn, en(X^N))] for some distortion matrix {Dxy(x, y), x ∈ X, y ∈ Y}.
Achievable distortions for robustification
A distortion ∆xy is said to be achievable if for all ε > 0 there exist, for all large enough N, encoders and decoders such that Dxy ≤ ∆xy + ε, PE ≤ ε.

RESULT: The set of achievable distortions is equal to Grob, defined as

Grob = {∆xy : ∆xy ≥ Σ_{x,y} P(x, y)Dxy(x, y), for P(x, y, z) = Ps(x)Pt(y|x)Pc(z|y) such that H(X) ≤ I(Y; Z)}.   (5)

Related to Shannon's separation principle! Robustification is not possible if H(X) > max_{Pt(y)} I(Y; Z).
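For the binary symmetric setting of the previous example, the feasibility condition in (5) reduces to h(px) ≤ 1 − h(α), i.e. the host entropy must fit under the channel capacity; a one-line check (sketch, assuming numpy):

```python
import numpy as np

def h(p):
    """Binary entropy in bits."""
    p = np.clip(p, 1e-12, 1 - 1e-12)
    return -p * np.log2(p) - (1 - p) * np.log2(1 - p)

px, alpha = 0.1, 0.1
# Robustification is possible only if H(X) <= max I(Y; Z) = 1 - h(alpha).
print("feasible:", h(px) <= 1 - h(alpha))   # h(0.1) ~ 0.469 <= 0.531 -> True
```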
V. Partially Reversible Embedding
[Block diagram: host source Ps(x) produces X^N; encoder forms Y^N = e(W, X^N); decoder forms Ŵ = d(Y^N) and a restoration sequence V^N = f(Y^N).]
Messages: Pr{W = w} = 1/M for w ∈ {1, 2, · · · , M}.
Source (host): Pr{X^N = x^N} = Π_{n=1..N} Ps(xn) for x^N ∈ X^N.
Error probability: PE = Pr{Ŵ ≠ W}.
Rate: R = (1/N) log2(M).
Embedding distortion: Dxy = E[(1/N) Σ_{n=1..N} Dxy(Xn, en(W, X^N))] for some distortion matrix {Dxy(x, y), x ∈ X, y ∈ Y}.
Restoration distortion: Dxv = E[(1/N) Σ_{n=1..N} Dxv(Xn, fn(Y^N))] for a distortion matrix {Dxv(x, v), x ∈ X, v ∈ V}.
Achievable region for partially reversible embedding
A rate-distortion triple (ρ, ∆xy, ∆xv) is said to be achievable if for all ε > 0 there exist, for all large enough N, encoders and decoders such that R ≥ ρ − ε, Dxy ≤ ∆xy + ε, Dxv ≤ ∆xv + ε, PE ≤ ε.

RESULT (Willems-Kalker [2002]): The set of achievable rate-distortion triples is given by Gpre, defined as

Gpre = {(ρ, ∆xy, ∆xv) : 0 ≤ ρ ≤ H(Y) − I(X; Y, V), ∆xy ≥ Σ_{x,y,v} P(x, y, v)Dxy(x, y), ∆xv ≥ Σ_{x,y,v} P(x, y, v)Dxv(x, v), for P(x, y, v) = Ps(x)Pt(y, v|x)}.   (6)
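To evaluate a point of Gpre for a concrete test channel Pt(y, v|x), one needs the rate bound H(Y) − I(X; Y, V) and the two expected distortions; a generic sketch (assuming numpy; function and argument names are mine). Since the self-embedding result (7) later in the talk only asks for H(Y) ≥ I(X; Y, V), a distortion pair is feasible there exactly when the rate returned here is non-negative.

```python
import numpy as np

def H(P):
    """Entropy in bits of a distribution given as an array of any shape."""
    P = P[P > 0]
    return float(-np.sum(P * np.log2(P)))

def pre_point(Ps, Pt, Dxy, Dxv):
    """Rate bound and distortions of Gpre for one test channel.

    Ps  : host distribution Ps[x], shape (|X|,)
    Pt  : test channel Pt[x, y, v] = Pt(y, v | x); each Pt[x] sums to 1
    Dxy : embedding distortion matrix Dxy[x, y]
    Dxv : restoration distortion matrix Dxv[x, v]
    """
    Pxyv = Ps[:, None, None] * Pt                      # joint P(x, y, v)
    Py = Pxyv.sum(axis=(0, 2))                         # P(y)
    I_X_YV = H(Ps) + H(Pxyv.sum(axis=0)) - H(Pxyv)     # I(X; Y, V)
    rate = H(Py) - I_X_YV
    dxy = float(np.sum(Pxyv.sum(axis=2) * Dxy))        # E[Dxy(X, Y)]
    dxv = float(np.sum(Pxyv.sum(axis=1) * Dxv))        # E[Dxv(X, V)]
    return rate, dxy, dxv
```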
Proof:
Achievability: In the Gelfand-Pinsker achievability proof, note again that Z = Y (noiseless channel) and take the auxiliary random variable U = [Y, V]. Then v^N can be reconstructed by the decoder and, since (x^N, y^N, v^N) ∈ Aε(X, Y, V), both Dxy and Dxv are OK. For the embedding rate we obtain

R = I(U; Z) − I(U; X) = I([Y, V]; Y) − I([Y, V]; X) = H(Y) − I(X; Y, V).

Converse: Rate part:

log2(M) = H(Y^N, V^N, W) − H(Y^N, V^N|W)
≤ H(Y^N) + H(W|Ŵ) − I(Y^N, V^N; X^N|W)
≤ H(Y^N) − H(X^N|W) + H(X^N|W, Y^N, V^N) + Fano term
≤ H(Y^N) − H(X^N) + H(X^N|Y^N, V^N) + Fano term
≤ Σ_{n=1..N} H(Yn) − Σ_{n=1..N} H(Xn) + Σ_{n=1..N} H(Xn|Yn, Vn) + Fano term
≤ N[H(Y) − H(X) + H(X|Y, V)] + Fano term
= N[H(Y) − I(X; Y, V)] + Fano term,
where X, Y and V are random variables with Pr{(X, Y, V) = (x, y, v)} = (1/N) Σ_{n=1..N} Pr{(Xn, Yn, Vn) = (x, y, v)}, for x ∈ X, y ∈ Y, and v ∈ V. Note that Pr{X = x} = Ps(x) for x ∈ X.

Distortion parts:

Dxy = Σ_{x^N, y^N} Pr{(X^N, Y^N) = (x^N, y^N)} (1/N) Σ_n Dxy(xn, yn) = Σ_{x,y} Pr{(X, Y) = (x, y)} Dxy(x, y).

Dxv = Σ_{x^N, v^N} Pr{(X^N, V^N) = (x^N, v^N)} (1/N) Σ_n Dxv(xn, vn) = Σ_{x,v} Pr{(X, V) = (x, v)} Dxv(x, v).

Let PE ↓ 0, etc.
The other perspective again
Consider a blocked system with blocks of length N. In block k a message can be (noise-free) embedded with rate H(Y|X) and the corresponding distortion. Then in block k + 1 data is embedded that specifies a restoration sequence v^N(k) given y^N(k). This requires N·I(X; V|Y) bits. Therefore the remaining embedding rate is

R = H(Y|X) − I(X; V|Y) = H(Y|X) − H(X|Y) + H(X|Y, V) = H(Y) − H(X) + H(X|Y, V) = H(Y) − I(X; Y, V).
The zero-rate case: Self-Embedding
[Block diagram: host source Ps(x) produces X^N; encoder forms Y^N = e(X^N); the restoration sequence V^N = f(Y^N) is extracted from Y^N.]
Source (host): Pr{X^N = x^N} = Π_{n=1..N} Ps(xn) for x^N ∈ X^N.
Embedding distortion: Dxy = E[(1/N) Σ_{n=1..N} Dxy(Xn, en(X^N))] for some distortion matrix {Dxy(x, y), x ∈ X, y ∈ Y}.
Restoration distortion: Dxv = E[(1/N) Σ_{n=1..N} Dxv(Xn, fn(Y^N))] for a distortion matrix {Dxv(x, v), x ∈ X, v ∈ V}.
Achievable distortions for self-embedding
A distortion pair (∆xy, ∆xv) is said to be achievable if for all ε > 0 there exist, for all large enough N, encoders and decoders such that Dxy ≤ ∆xy + ε, Dxv ≤ ∆xv + ε.

RESULT (Willems-Kalker [2002]): The set of achievable distortion pairs is equal to Gse, defined as

Gse = {(∆xy, ∆xv) : ∆xy ≥ Σ_{x,y,v} P(x, y, v)Dxy(x, y), ∆xv ≥ Σ_{x,y,v} P(x, y, v)Dxv(x, v), for P(x, y, v) = Ps(x)Pt(y, v|x) such that H(Y) ≥ I(X; Y, V)}.   (7)

Self-embedding is putting a vector quantizer into a scalar quantizer, or making an abstract index to a restoration vector v^N meaningful.
VI. Remarks
- 1. Our results are related to results of Sutivong, Cover, et al. The setups are slightly different, however (embedding distortion).
- 2. We cannot do the partially reversible AND robust case. An achievable region would be similar to the Sutivong, Cover, Chiang, Kim [2002] region. No converse.
- 3. Coding techniques for the reversible case have been studied (with Deran Maas [2002]).
- 4. Open problems: (A) Arimoto-Blahut methods to compute the rate-distortion functions, (B) coding techniques, especially for the zero-rate cases.