Refined Strong Converse for the Constant Composition Codes (PowerPoint presentation)



SLIDE 1

Refined Strong Converse for the Constant Composition Codes

Hao-Chung Cheng¹  Barış Nakiboğlu²

¹Department of Applied Mathematics and Theoretical Physics, University of Cambridge

²Department of Electrical and Electronics Engineering, Middle East Technical University

ISIT 2020 arXiv:2002.11414

SLIDE 2

Probability of Error in Channel Coding

[Figure: $P_e^{(n)}$ versus the rate $R$, with the capacity $C$ separating the two regimes.]

◮ $R < C$: the probability of erroneous decoding decays exponentially (error exponent regime)
◮ $R > C$: the probability of erroneous decoding converges to one (strong converse regime)

SLIDE 3

Historical Remarks on Strong Converse

Exponential strong converse: $P_e^{(n)} \ge 1 - e^{-n E_{sc}(R)}$

◮ Arimoto established an exponential strong converse bound in 1973
◮ One-shot bounds for more general channels [Aug78, She82, PV10, Nak19b]
◮ Classical-quantum channels & classical data compression with quantum side information (via the data-processing inequality of the quantum sandwiched Rényi divergence) [Nag01, WWY14, MO17, CHDH18b, CHDH18a]
◮ $E_{sc}(R)$ is optimal for constant composition codes, Gaussian channels, and DSPCs [Omu75, DK79, Ooh17]
◮ $E_{sc}(R)$ is optimal for classical-quantum channels & classical data compression with quantum side information [MO17, MO18, CHDH18b]

SLIDE 4

Question

Error exponent regime:
$$P_e^{(n)} \approx n^{-\frac{1 - E_{sp}'(R)}{2}}\, e^{-n E_{sp}(R)}, \qquad \forall R \in [R_{\mathrm{crit}}, C]$$
for certain symmetric channels, Gaussian channels, and constant composition codes [Eli55, Sha59, Dob62, AW14, AW19, Nak20].

Strong converse regime:
$$P_e^{(n)} \;\stackrel{?}{\ge}\; 1 - O\!\left(n^{-\frac{1 - E_{sc}'(R)}{2}}\, e^{-n E_{sc}(R)}\right)$$

SLIDE 5

Main Contributions

1. Refined strong converse for hypothesis testing:
$$1 - A_1\, n^{-\frac{1}{2\alpha}}\, e^{-n D_1(w_\alpha^q\|w)} \;\ge\; P_e^0 \;\ge\; 1 - A_2\, n^{-\frac{1}{2\alpha}}\, e^{-n D_1(w_\alpha^q\|w)}$$
provided that $P_e^1 = e^{-n D_1(w_\alpha^q\|q)}$ for an $\alpha \ge 1$ and $w \prec q$ (the constants $A_1, A_2 > 0$ may differ).

2. Refined strong converse for the constant composition codes in channel coding:
$$P_e^{(n)} \ge 1 - O\!\left(n^{-\frac{1 - E_{sc}'(R)}{2}}\, e^{-n E_{sc}(R)}\right)$$

3. Exponent trade-off in the error exponent saturation regime
SLIDE 6

Exponents Trade-Off in Hypothesis Testing ($D_1(w\|q) < \infty$)

[Figure: two exponent trade-off curves. Left: the error exponent $\lim_{n\to\infty} -\frac1n \ln P_e^0$ versus $\lim_{n\to\infty} -\frac1n \ln P_e^1$, ending at $D_1(w\|q)$. Right: the strong converse exponent $\lim_{n\to\infty} -\frac1n \ln\!\left(1 - P_e^0\right)$ versus $\lim_{n\to\infty} -\frac1n \ln P_e^1$, starting at $D_1(w\|q)$. Both are traced by the divergence trade-off $\big(D_1(w_\alpha^q\|w),\, D_1(w_\alpha^q\|q)\big)$, with $D_1(w_1^q\|q) = D_1(w\|q)$ at $\alpha = 1$; $\alpha = 0$ and $\alpha \uparrow \infty$ mark the endpoints.]

Divergence trade-off

SLIDE 7

(Same trade-off figure as Slide 6.)

SLIDE 8

Exponents Trade-Off in Hypothesis Testing ($\lim_{\alpha\uparrow 1} D_1(w_\alpha^q\|q) = \infty$)

[Figure: exponent trade-off with axes $\lim_{n\to\infty} -\frac1n \ln P_e^0$ and $\lim_{n\to\infty} -\frac1n \ln P_e^1$; the marked levels are $D_1(q\|w)$ and $\ln\frac{1}{w_{ac}(\mathcal Y)}$.]

No strong converse regime!

$\lim_{\alpha\uparrow 1} D_1(w_\alpha^q\|q) = \infty$ holds iff either
◮ $w \prec q$ and $D_1(w\|q) = \infty$, or
◮ $w \not\prec q$ and $D_1\!\left(\frac{w_{ac}}{w_{ac}(\mathcal Y)} \,\middle\|\, q\right) = \infty$
SLIDE 9

Exponents Trade-Off in Hypothesis Testing ($w \not\prec q$ & $\lim_{\alpha\uparrow 1} D_1(w_\alpha^q\|q) < \infty$)

[Figure: exponent trade-off with an error exponent saturation segment between $D_1(w_1^q\|q)$ and $D_1(w_1^q\|w)$; in the saturation regime the relevant exponent is $\lim_{n\to\infty} -\frac1n \ln\big(w_{ac}(\mathcal Y)^n - P_e^0\big)$ versus $\lim_{n\to\infty} -\frac1n \ln P_e^1$.]

$w \not\prec q$ & $\lim_{\alpha\uparrow 1} D_1(w_\alpha^q\|q) < \infty$
$$\Leftrightarrow\quad \lim_{\alpha\uparrow 1} w_\alpha^q = \frac{w_{ac}}{w_{ac}(\mathcal Y)} =: w_1^q \ne w \quad\text{and}\quad \lim_{\alpha\uparrow 1} D_1(w_\alpha^q\|q) = D_1(w_1^q\|q)$$
SLIDE 10

(Same figure as Slide 9, now with the divergence trade-off overlaid: $D_1(w_\alpha^q\|w)$ versus $D_1(w_\alpha^q\|q)$, running from $D_1(w_1^q\|w)$ to $D_1(w_1^q\|q)$ as $\alpha$ varies.)

SLIDE 11

Layout

◮ Motivation & Our Contributions
◮ The Binary Hypothesis Testing Problem & The Refined Strong Converse
◮ Hypothesis Testing and Tilting
◮ Refined Strong Converse for Channel Coding
◮ Main Result
◮ Discussion

SLIDE 12

Main Result: Refined Strong Converse for Hypothesis Testing

Lemma. Let $w = \otimes_{t=1}^n w_t$ and $q = \otimes_{t=1}^n q_t$ with $w_t, q_t \in \mathcal P(\mathcal Y_t)$, and let $w_{t,ac}$ be the component of $w_t$ that is absolutely continuous in $q_t$. For any $\alpha \in (1, \infty)$ and any $\mathcal E \subset \mathcal Y_1^n$ satisfying $q(\mathcal E) \le e^{-D_1(w_\alpha^q\|q)}$, there exists an $A > 0$ such that
$$w(\mathcal Y_1^n \setminus \mathcal E) \;\ge\; \prod_{t=1}^n w_{t,ac}(\mathcal Y_t) \;-\; A\, n^{-\frac{1}{2\alpha}}\, e^{-D_1(w_\alpha^q\|w)}.$$

◮ The tilted distribution $w_\alpha^q$ will be introduced later
◮ When $w \prec q$, $\prod_{t=1}^n w_{t,ac}(\mathcal Y_t) = 1$
◮ The term $n^{-\frac{1}{2\alpha}}\, e^{-D_1(w_\alpha^q\|w)}$ is optimal up to a multiplicative constant; see the matching bound in arXiv:2002.11414

SLIDE 13

Proof Strategy

How to employ the Berry–Esseen theorem to obtain a refined strong converse?

1. Introduce auxiliary decision intervals for $\ln\frac{dw}{dq}$
2. Properly control the probability evaluated on those intervals
◮ Use a change of measure via the proposed tilted distribution
◮ Apply the Berry–Esseen theorem to bound the probability on each interval
◮ Use the formula for the sum of a geometric series

SLIDE 14

A New Tilted Distribution

For $w$ and $q$, the tilted distribution was previously defined by
$$\frac{dw_\alpha^q}{d\nu} = e^{(1-\alpha) D_\alpha(w\|q)} \left(\frac{dw}{d\nu}\right)^{\!\alpha}\left(\frac{dq}{d\nu}\right)^{\!1-\alpha} \quad \text{[Nak20]}$$
◮ Error exponent trade-off: $D_1(w_\alpha^q\|w)$ vs. $D_1(w_\alpha^q\|q)$ for $\alpha \in (0,1)$

However, this is not defined for $w \not\prec q$ and $\alpha \ge 1$. New definition: for $\alpha \in \mathbb{R}_+$ satisfying $D_\alpha(w_1^q\|q) < \infty$,
$$\frac{dw_\alpha^q}{dq} = e^{(1-\alpha) D_\alpha(w_1^q\|q)} \left(\frac{dw_1^q}{dq}\right)^{\!\alpha}, \qquad w_1^q = \frac{w_{ac}}{w_{ac}(\mathcal Y)}.$$

◮ $w_\alpha^q$ converges in total variation to $w_1^q$, rather than to $w$
◮ $\lim_{\alpha\uparrow 1} D_1(w_\alpha^q\|q) = D_1(w_1^q\|q)$, instead of $D_1(w\|q)$
◮ consistent with the previous definition for $\alpha \in (0,1)$

Change of measure:
$$\ln\frac{dw_\alpha^q}{dq} = D_1(w_\alpha^q\|q) + \alpha\left(\ln\frac{dw}{dq} - \mathbb{E}_{w_\alpha^q}\!\left[\ln\frac{dw}{dq}\right]\right) \qquad q\text{-a.s.}$$
$$\ln\frac{dw_\alpha^q}{dw} = D_1(w_\alpha^q\|w) + (\alpha - 1)\left(\ln\frac{dw}{dq} - \mathbb{E}_{w_\alpha^q}\!\left[\ln\frac{dw}{dq}\right]\right) \qquad q\text{-a.s.}$$
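The trade-off driven by the tilt can be sketched numerically on a finite alphabet (an illustrative sketch, not the paper's construction; the distributions `w`, `q` and the helper names are made up here). For $w \prec q$ the tilted distribution reduces to the familiar exponential-family form $w_\alpha^q \propto w^\alpha q^{1-\alpha}$:

```python
import numpy as np

def tilt(w, q, alpha):
    # w_alpha^q proportional to w**alpha * q**(1 - alpha)  (valid here: w << q)
    t = w ** alpha * q ** (1.0 - alpha)
    return t / t.sum()

def kl(p, r):
    # D_1(p || r) in nats
    m = p > 0
    return float(np.sum(p[m] * np.log(p[m] / r[m])))

w = np.array([0.5, 0.3, 0.2])
q = np.array([0.2, 0.3, 0.5])

# alpha = 1 recovers w itself, so D_1(w_1 || w) = 0 and D_1(w_1 || q) = D_1(w || q)
assert np.allclose(tilt(w, q, 1.0), w)

# as alpha grows past 1, both D_1(w_alpha || w) and D_1(w_alpha || q) increase:
# a larger strong converse exponent is bought at a larger divergence from w
pairs = [(kl(tilt(w, q, a), w), kl(tilt(w, q, a), q)) for a in (1.0, 2.0, 4.0)]
```

Plotting such pairs for a sweep of $\alpha$ reproduces the divergence trade-off curves of the earlier slides.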
SLIDE 15

Proof

Goal: bound $w(\mathcal Y_1^n \setminus \mathcal E)$ from below, given $q(\mathcal E) \le e^{-D_1(w_\alpha^q\|q)}$.

Decision regions: for $\kappa \in \mathbb{Z}$,
$$\mathcal B_\kappa = \left\{ y_1^n : \tau + \kappa \le \ln\frac{dw}{dq} - \mathbb{E}_{w_\alpha^q}\!\left[\ln\frac{dw}{dq}\right] < \tau + (\kappa + 1) \right\}$$

$$w(\mathcal Y_1^n \setminus \mathcal E) \ge w(\cup_\kappa \mathcal B_\kappa \setminus \mathcal E) = w(\cup_\kappa \mathcal B_\kappa) - \sum_{\kappa \le 0} w(\mathcal E \cap \mathcal B_\kappa) - \sum_{\kappa > 0} w(\mathcal E \cap \mathcal B_\kappa)$$
$$= \prod_{t=1}^n w_{t,ac}(\mathcal Y_t) - \sum_{\kappa \le 0} w(\mathcal E \cap \mathcal B_\kappa) - \sum_{\kappa > 0} w(\mathcal E \cap \mathcal B_\kappa)$$

◮ It remains to show
$$\sum_{\kappa \le 0} w(\mathcal E \cap \mathcal B_\kappa) = O\!\left(n^{-\frac{1}{2\alpha}}\, e^{-D_1(w_\alpha^q\|w)}\right) \quad\text{and}\quad \sum_{\kappa > 0} w(\mathcal E \cap \mathcal B_\kappa) = O\!\left(n^{-\frac{1}{2\alpha}}\, e^{-D_1(w_\alpha^q\|w)}\right)$$
SLIDE 16

Proof (Bounding the First Term)

To show $\displaystyle\sum_{\kappa \le 0} w(\mathcal E \cap \mathcal B_\kappa) = O\!\left(n^{-\frac{1}{2\alpha}}\, e^{-D_1(w_\alpha^q\|w)}\right)$:

$$\begin{aligned}
w(\mathcal E \cap \mathcal B_\kappa) &\le w_\alpha^q(\mathcal E \cap \mathcal B_\kappa)\, e^{-D_1(w_\alpha^q\|w) - (\alpha-1)\tau - (\alpha-1)\kappa} && \text{by change of measure}\\
&\le q(\mathcal E \cap \mathcal B_\kappa)\, e^{-D_1(w_\alpha^q\|w) + D_1(w_\alpha^q\|q) + \tau + \alpha + \kappa} && \text{by change of measure}\\
&\le e^{-D_1(w_\alpha^q\|w) + \tau + \alpha + \kappa} && \because\ q(\mathcal E) \le e^{-D_1(w_\alpha^q\|q)}
\end{aligned}$$

$$\Rightarrow\quad \sum_{\kappa \le 0} w(\mathcal E \cap \mathcal B_\kappa) \le c_1\, e^{-D_1(w_\alpha^q\|w) + \tau}$$
by the formula for the sum of a geometric series. Choosing $\tau \approx -\frac{\ln n}{2\alpha}$ arrives at the desired bound.
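The geometric-series step can be written out explicitly (a sketch; $c_1$ is simply whatever constant the sum produces):
$$\sum_{\kappa \le 0} e^{-D_1(w_\alpha^q\|w) + \tau + \alpha + \kappa} = e^{-D_1(w_\alpha^q\|w) + \tau + \alpha} \sum_{j=0}^{\infty} e^{-j} = \frac{e^{\alpha}}{1 - e^{-1}}\, e^{-D_1(w_\alpha^q\|w) + \tau},$$
so one may take $c_1 = \frac{e^{\alpha}}{1 - e^{-1}}$, and the choice $\tau = -\frac{\ln n}{2\alpha}$ turns $e^{\tau}$ into the claimed $n^{-\frac{1}{2\alpha}}$ factor.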

SLIDE 17

Proof (Bounding the Second Term)

To show $\displaystyle\sum_{\kappa > 0} w(\mathcal E \cap \mathcal B_\kappa) = O\!\left(n^{-\frac{1}{2\alpha}}\, e^{-D_1(w_\alpha^q\|w)}\right)$:

$$\begin{aligned}
w(\mathcal E \cap \mathcal B_\kappa) &\le w_\alpha^q(\mathcal E \cap \mathcal B_\kappa)\, e^{-D_1(w_\alpha^q\|w) - (\alpha-1)\tau - (\alpha-1)\kappa} && \text{by change of measure}\\
&\le c_2\, n^{-\frac12}\, e^{-D_1(w_\alpha^q\|w) - (\alpha-1)\tau - (\alpha-1)\kappa} && \text{by the Berry–Esseen theorem}
\end{aligned}$$

$$\Rightarrow\quad \sum_{\kappa > 0} w(\mathcal E \cap \mathcal B_\kappa) \le c_3\, n^{-\frac12}\, e^{-D_1(w_\alpha^q\|w) + (1-\alpha)\tau}$$
by the formula for the sum of a geometric series. Finally, choosing $\tau \approx -\frac{\ln n}{2\alpha}$ proves the claim.
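The Berry–Esseen step, sketched under the independence assumptions of the lemma: writing $a_2$ and $a_3$ for the per-letter variance and third absolute central moment of $\ln\frac{dw_t}{dq_t}$ under $w_\alpha^q$, the probability that the centered sum falls in any interval of unit length obeys
$$w_\alpha^q(\mathcal B_\kappa) \le \frac{1}{\sqrt{2\pi n a_2}} + \frac{2 \cdot 0.56\, a_3}{a_2^{3/2} \sqrt{n}} = O\!\left(n^{-\frac12}\right),$$
where $0.56$ is a valid Berry–Esseen constant for sums of independent terms; this is the source of the $n^{-\frac12}$ above and of the constant $\Delta$ in the main theorem. Note also the exponent bookkeeping: $e^{(1-\alpha)\tau}\, n^{-\frac12} = n^{\frac{\alpha-1}{2\alpha}}\, n^{-\frac12} = n^{-\frac{1}{2\alpha}}$ for $\tau = -\frac{\ln n}{2\alpha}$.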

SLIDE 18

Product Channels and Constant Composition Codes

[Diagram: $\mathcal M \xrightarrow{\ \Psi\ } \mathcal X_1^n \xrightarrow{\ W_{[1,n]}\ } \mathcal Y_1^n \xrightarrow{\ \Theta\ } \widehat{\mathcal M}$]

(Component) channel $W : \mathcal X \to \mathcal P(\mathcal Y)$. Product channel $W_{[1,n]} : \mathcal X_1^n \to \mathcal P(\mathcal Y_1^n)$ such that
$$W_{[1,n]}(x_1^n) = \otimes_{t=1}^n W(x_t) \qquad \forall x_1^n \in \mathcal X_1^n.$$

Encoding function $\Psi : \mathcal M \to \mathcal X_1^n$, where $\mathcal M = \{1, \ldots, M\}$.
Decoding function $\Theta : \mathcal Y_1^n \to \widehat{\mathcal M}$, where $\widehat{\mathcal M} = \{\mathcal L : \mathcal L \subset \mathcal M \ \&\ |\mathcal L| \le L\}$ (decoding with list size $L$).

$$P_e^m = \mathbb{E}_{W_{[1,n]}(\Psi(m))}\!\left[\mathbb{1}\{m \notin \Theta(Y_1^n)\}\right], \qquad P_e = \frac1M \sum_{m \in \mathcal M} P_e^m.$$

Constant composition codes: the empirical distribution of $\Psi(m)$ is the same for all $m \in \mathcal M$.
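The defining property is easy to state operationally; here is a minimal sketch (hypothetical helper names, not from the paper) that checks whether a codebook is constant composition:

```python
from collections import Counter

def is_constant_composition(codewords):
    # constant composition: every codeword has the same empirical distribution,
    # i.e. the same multiset of symbol counts
    compositions = {tuple(sorted(Counter(cw).items())) for cw in codewords}
    return len(compositions) == 1

# "aab", "aba", "baa" are permutations of one another: composition {a: 2, b: 1}
ok = is_constant_composition(["aab", "aba", "baa"])
bad = is_constant_composition(["aab", "abb"])
```

Because every codeword shares one composition $p$, a single Augustin mean $q_{\alpha,p}$ works for all messages in the reduction on the following slides.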

SLIDE 19

Strong Converse Exponent and Tilting

Definition (Strong Converse Exponent). For any rate $R \in \mathbb{R}_+$, channel $W : \mathcal X \to \mathcal P(\mathcal Y)$, and input distribution $p \in \mathcal P(\mathcal X)$,
$$E_{sc}(R, W, p) = \sup_{\alpha \in (1,\infty)} \frac{1-\alpha}{\alpha}\left(I_\alpha(p; W) - R\right)$$
Augustin information: $I_\alpha(p; W) = \inf_{q \in \mathcal P(\mathcal Y)} D_\alpha(W\|q|p) = D_\alpha(W\|q_{\alpha,p}|p)$, where $q_{\alpha,p}$ is the Augustin mean.

Definition (Tilted Channel). Given any $x \in \mathcal X$, the tilted channel $W_\alpha^{q_{\alpha,p}}$ outputs the tilted measure of $W(x)$ and $q_{\alpha,p}$.

◮ Using the fixed-point property [Nak19a], $\sum_x p(x)\, W_\alpha^{q_{\alpha,p}}(x) = q_{\alpha,p}$, so $I_1\!\left(p; W_\alpha^{q_{\alpha,p}}\right) = D_1\!\left(W_\alpha^{q_{\alpha,p}} \middle\| q_{\alpha,p} \middle| p\right)$
◮ $I_1\!\left(p; W_\alpha^{q_{\alpha,p}}\right)$ is nondecreasing in $\alpha \in [1,\infty)$, starting from $I_1(p; W)$ [Nak19a]
◮ If $I_1(p; W) < R < \lim_{\alpha\uparrow\infty} I_1\!\left(p; W_\alpha^{q_{\alpha,p}}\right)$, then there exists a unique $\alpha \in (1,\infty)$ such that
$$E_{sc}(R, W, p) = D_1\!\left(W_\alpha^{q_{\alpha,p}} \middle\| W \middle| p\right) \quad\text{with}\quad R = D_1\!\left(W_\alpha^{q_{\alpha,p}} \middle\| q_{\alpha,p} \middle| p\right)$$

[Figure: $E_{sc}(R, W, p)$ as a function of $R$, equal to zero up to $I_1(p; W)$.]
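For a small discrete channel these quantities can be computed by brute force (an illustrative sketch with made-up helper names; a grid search over binary output distributions stands in for the actual minimization over $\mathcal P(\mathcal Y)$):

```python
import numpy as np

def renyi_div(w, q, alpha):
    # Renyi divergence D_alpha(w || q) in nats, finite alphabets
    if abs(alpha - 1.0) < 1e-12:
        m = w > 0
        return float(np.sum(w[m] * np.log(w[m] / q[m])))
    return float(np.log(np.sum(w ** alpha * q ** (1.0 - alpha))) / (alpha - 1.0))

def augustin_information(p, W, alpha, grid=1000):
    # I_alpha(p; W) = inf_q sum_x p(x) D_alpha(W(x) || q), grid search over binary q
    best = np.inf
    for t in np.linspace(1e-9, 1.0 - 1e-9, grid):
        q = np.array([t, 1.0 - t])
        best = min(best, sum(px * renyi_div(Wx, q, alpha) for px, Wx in zip(p, W)))
    return best

def strong_converse_exponent(R, p, W):
    # E_sc(R, W, p) = sup_{alpha > 1} (1 - alpha)/alpha * (I_alpha(p; W) - R)
    alphas = np.linspace(1.05, 4.0, 60)
    return max((1 - a) / a * (augustin_information(p, W, a) - R) for a in alphas)

# binary symmetric channel with crossover 0.1, uniform input
p = np.array([0.5, 0.5])
W = [np.array([0.9, 0.1]), np.array([0.1, 0.9])]
I1 = augustin_information(p, W, 1.0)            # = mutual information, about 0.368 nats
Esc = strong_converse_exponent(I1 + 0.1, p, W)  # positive for rates above I_1
```

At $\alpha = 1$ the Augustin information coincides with the mutual information, and the computed $E_{sc}$ is positive exactly when the rate exceeds it, matching the picture above.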

SLIDE 20

Refined Strong Converse & Hypothesis Testing Problem

For any code of length $n$, list size $L$, and message set size $M$, let $R = \frac1n \ln\frac ML$.

◮ For each $m$, let $w_m = \otimes_{t=1}^n W(\Psi_t(m))$ and $q = \otimes_{t=1}^n q_{\alpha,p}$. Then, for all $m \in \{1, \ldots, M\}$,
$$D_1(w_\alpha^q \,\|\, w_m) = n\, D_1\!\left(W_\alpha^{q_{\alpha,p}} \middle\| W \middle| p\right) = n\, E_{sc}(R, W, p), \qquad D_1(w_\alpha^q \,\|\, q) = n\, D_1\!\left(W_\alpha^{q_{\alpha,p}} \middle\| q_{\alpha,p} \middle| p\right) = n R.$$
◮ Apply the previous lemma for each $m$ with $\mathcal E = \{y_1^n : m \in \Theta(y_1^n)\}$:
$$P_e^m \ge 1 - 2\, e^{\alpha} \left(\frac{\Delta}{\alpha - 1}\right)^{\!\frac1\alpha} \left(q(m \in \Theta)\, \frac ML\right)^{\!\frac{\alpha-1}{\alpha}} \frac{e^{-n E_{sc}\left(\frac1n \ln\frac ML,\, W,\, p\right)}}{n^{\frac{1}{2\alpha}}}.$$
◮ Concavity of $z \mapsto z^{\frac{\alpha-1}{\alpha}}$ together with Jensen's inequality implies
$$\frac1M \sum_{m \in \mathcal M} \left(q(m \in \Theta)\, \frac ML\right)^{\!\frac{\alpha-1}{\alpha}} \le \left(\frac1M \sum_{m \in \mathcal M} q(m \in \Theta)\, \frac ML\right)^{\!\frac{\alpha-1}{\alpha}} \le 1.$$
◮ $\frac{\partial E_{sc}(R, W, p)}{\partial R} = \frac{\alpha - 1}{\alpha}$ implies $n^{-\frac{1}{2\alpha}} = n^{-\frac{1 - E_{sc}'(R, W, p)}{2}}$.

SLIDE 21

Main Result

Theorem. For any $W : \mathcal X \to \mathcal P(\mathcal Y)$; $M, L, n \in \mathbb{Z}_+$; and $p \in \mathcal P(\mathcal X)$ satisfying
$$I_1(p; W) < \frac1n \ln\frac ML < \lim_{\alpha\uparrow\infty} I_1\!\left(p; W_\alpha^{q_{\alpha,p}}\right),$$
and any $(M, L)$ channel code of length $n$ whose codewords all have the same composition $p$,
$$P_e^{(n)} \ge 1 - 2\, e^{\alpha} \left(\frac{\Delta}{\alpha - 1}\right)^{\!\frac1\alpha} n^{-\frac{1}{2\alpha}}\, e^{-n E_{sc}\left(\frac1n \ln\frac ML,\, W,\, p\right)}$$
for the $\alpha$ satisfying $I_1\!\left(p; W_\alpha^{q_{\alpha,p}}\right) = \frac1n \ln\frac ML$, where
$$a_2 = \mathbb{E}_{p \circledast W_\alpha^{q_{\alpha,p}}}\!\left[\left(\ln\frac{dW}{dq_{\alpha,p}} - \mathbb{E}_{W_\alpha^{q_{\alpha,p}}}\!\left[\ln\frac{dW}{dq_{\alpha,p}}\right]\right)^{\!2}\right], \qquad a_3 = \mathbb{E}_{p \circledast W_\alpha^{q_{\alpha,p}}}\!\left[\left|\ln\frac{dW}{dq_{\alpha,p}} - \mathbb{E}_{W_\alpha^{q_{\alpha,p}}}\!\left[\ln\frac{dW}{dq_{\alpha,p}}\right]\right|^{3}\right],$$
$$\Delta = \frac{1}{e\sqrt{a_2}}\left(\frac{1}{\sqrt{2\pi}} + \frac{2 \cdot 0.56\, a_3}{a_2}\right).$$
SLIDE 22

High Rate Regime

If $\frac1n \ln\frac ML \ge \lim_{\alpha\uparrow\infty} I_1\!\left(p; W_\alpha^{q_{\alpha,p}}\right)$, then we only have
$$P_e^{(n)} \ge 1 - e^{-n E_{sc}\left(\frac1n \ln\frac ML,\, W,\, p\right)},$$
i.e., for rates at or above $\lim_{\alpha\uparrow\infty} I_1\!\left(p; W_\alpha^{q_{\alpha,p}}\right)$ there is no polynomial refinement.

This is not surprising, because
$$\frac1n \ln\frac ML \ge \lim_{\alpha\uparrow\infty} I_1\!\left(p; W_\alpha^{q_{\alpha,p}}\right) \;\Rightarrow\; E_{sc}'\!\left(\frac1n \ln\frac ML, W, p\right) = 1 \;\Rightarrow\; n^{-\frac{1 - E_{sc}'\left(\frac1n \ln\frac ML,\, W,\, p\right)}{2}} = 1.$$
SLIDE 23

Channel with the Error Exponent Saturation Regime

A shift-invariant channel $W$ satisfying $Y = X + Z - \lfloor X + Z \rfloor$ for an additive noise $Z$ such that
$$\mathbb{P}[Z \le z] = \begin{cases} 0 & z < 0 \\ \frac{1 + z^2}{2} & z \in [0, 1] \\ 1 & z > 1 \end{cases} \quad\Rightarrow\quad C_{1,W} = \infty$$

$$P_e^{(n)} \ge \begin{cases} A\, n^{-\frac{1}{2\alpha}}\, e^{-n\left[R + \frac{1}{1+\alpha}\right]} & R < \frac12 \ln\frac4e \\[4pt] 1 - A\, n^{-\frac{1}{2\alpha}}\, e^{-n\left[R + \frac{1}{1+\alpha} - \ln 2\right]} & R > \frac12 \ln\frac4e \end{cases}$$
where $A$ depends on $R$ and $\alpha$ is the unique root of the equation $\ln(1+\alpha) - \frac{\alpha}{1+\alpha} = R$.

[Figure: $-\frac1n \ln P_e^{(n)}$ versus $R$; the saturation rate is $\frac12 \ln\frac4e$, and the level $\ln 2$ is marked. The divergence trade-off for $\alpha \in (0,1)$ gives the sphere packing exponent for $R \in (0, \frac12 \ln\frac4e)$; the divergence trade-off for $\alpha \in (1,\infty)$ gives the error exponent for $R \in (\frac12 \ln\frac4e, \infty)$.]
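The defining equation for $\alpha$ can be solved numerically; a sketch (hypothetical function names) using bisection, exploiting that $f(\alpha) = \ln(1+\alpha) - \frac{\alpha}{1+\alpha}$ is strictly increasing with $f(0) = 0$:

```python
import math

def alpha_root(R, lo=1e-9, hi=1e6, iters=200):
    # unique root of ln(1 + a) - a/(1 + a) = R; f is strictly increasing,
    # since f'(a) = a / (1 + a)**2 > 0, with f(0) = 0 and f(a) -> infinity
    f = lambda a: math.log(1.0 + a) - a / (1.0 + a)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if f(mid) < R:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# at the saturation rate R = (1/2) ln(4/e) = ln 2 - 1/2, the root is alpha = 1
alpha_at_saturation = alpha_root(math.log(2.0) - 0.5)
```

Note that $f(1) = \ln 2 - \frac12 = \frac12 \ln\frac4e$, so the saturation rate corresponds exactly to $\alpha = 1$, the boundary between the two cases of the bound above.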

SLIDE 24

Layout

◮ Motivation & Our Contributions
◮ The Binary Hypothesis Testing Problem & The Refined Strong Converse
◮ Discussion

SLIDE 25

Error Exponent Saturation Regime: $D_1(w_1^q\|q) \le D_1(w\|q)$

1. $D_1(w_1^q\|q) = D_1(w\|q) < \infty$: the threshold $D_1(w\|q)$ determines whether the error goes to 0 or 1; error exponent regime $[0, D_1(w_1^q\|q))$, strong converse regime $(D_1(w\|q), \infty]$
2. $\infty = D_1(w_1^q\|q) = D_1(w\|q)$: strong converse regime $\emptyset$
3. $D_1(w_1^q\|q) < D_1(w\|q) = \infty$ (i.e., $w \not\prec q$ & $D_1(w_1^q\|q) < \infty$): a stationary point for the exponents; error exponent saturation regime $(D_1(w_1^q\|q), D_1(w\|q)]$

SLIDE 26

Discussion

◮ Hypothesis testing between $w = \otimes_{t=1}^n w_t$ and $q = \otimes_{t=1}^n q_t$ with $w \prec q$ & $D_1(w_1^q\|q) < \infty$:
$$P_e^0 = 1 - \Theta\!\left(n^{-\frac{1}{2\alpha}}\, e^{-n D_1(w_\alpha^q\|w)}\right) \quad \text{if } P_e^1 = e^{-n D_1(w_\alpha^q\|q)}$$
◮ Hypothesis testing between $w = \otimes_{t=1}^n w_t$ and $q = \otimes_{t=1}^n q_t$ with $w \not\prec q$ & $D_1(w_1^q\|q) < \infty$:
$$P_e^0 = \prod_{t=1}^n w_{t,ac}(\mathcal Y_t) - \Theta\!\left(n^{-\frac{1}{2\alpha}}\, e^{-D_1(w_\alpha^q\|w)}\right) \quad \text{if } P_e^1 = e^{-n D_1(w_\alpha^q\|q)}$$
$$\Longleftrightarrow\quad P_e^0 = \left(1 - \Theta\!\left(n^{-\frac{1}{2\alpha}}\, e^{-D_1(w_\alpha^q\|w_1^q)}\right)\right) \prod_{t=1}^n w_{t,ac}(\mathcal Y_t) \quad \text{if } P_e^1 = e^{-n D_1(w_\alpha^q\|q)}$$
◮ The refined strong converse applies to AWGN channels with quadratic cost functions and to the Rényi symmetric channels [Nak20]
◮ Exact multiplicative constants for the $\Theta(\cdot)$'s can be found either via [Str62, Thm. 1.1] of Strassen, as in [CL71], or via the saddle-point approximation, as in [VVFKL18, LOD+20]
◮ Open question: a refined strong converse for classical-quantum channels?

SLIDE 27

Questions?

Barış Nakiboğlu: bnakib@metu.edu.tr
Hao-Chung Cheng: HaoChung.Ch@gmail.com

SLIDE 28

References

[Aug78] Udo Augustin. Noisy Channels. Habilitation thesis, Universität Erlangen-Nürnberg, 1978. (http://bit.ly/2ID8h7m)
[AW14] Y. Altuğ and A. B. Wagner. Refinement of the random coding bound. IEEE Transactions on Information Theory, 60(10):6005–6023, Oct 2014.
[AW19] Y. Altuğ and A. B. Wagner. On exact asymptotics of the error probability in channel coding: symmetric channels. arXiv:1908.11419 [cs.IT], 2019.
[CHDH18a] H.-C. Cheng, E. P. Hanson, N. Datta, and M.-H. Hsieh. Duality between source coding with quantum side information and c-q channel coding. arXiv:1809.11143 [quant-ph], 2018.
[CHDH18b] H.-C. Cheng, E. P. Hanson, N. Datta, and M.-H. Hsieh. Non-asymptotic classical data compression with quantum side information. arXiv:1803.07505 [quant-ph], 2018.
[CL71] I. Csiszár and G. Longo. On the error exponent for source coding and for testing simple statistical hypotheses. Studia Scientiarum Mathematicarum Hungarica, 6:181–191, 1971. (http://real-j.mtak.hu/id/eprint/5457)
[DK79] G. Dueck and J. Körner. Reliability function of a discrete memoryless channel at rates above capacity (corresp.). IEEE Transactions on Information Theory, 25(1):82–85, Jan 1979.
[Dob62] R. Dobrushin. Asymptotic estimates of the probability of error for transmission of messages over a discrete memoryless communication channel with a symmetric transition probability matrix. Theory of Probability & Its Applications, 7(3):270–300, 1962.
[Eli55] P. Elias. Coding for two noisy channels. In Proceedings of the Third London Symposium on Information Theory, pages 61–74, London, 1955. Butterworth Scientific.
[LOD+20] A. Lancho, J. Östman, G. Durisi, T. Koch, and G. Vazquez-Vilar. Saddlepoint approximations for short-packet wireless communications. IEEE Transactions on Wireless Communications (Early Access), 2020. (arXiv:1904.10442 [cs.IT])
[MO17] M. Mosonyi and T. Ogawa. Strong converse exponent for classical-quantum channel coding. Communications in Mathematical Physics, 355(1):373–426, Oct 2017.
[MO18] M. Mosonyi and T. Ogawa. Divergence radii and the strong converse exponent of classical-quantum channel coding with constant compositions. arXiv:1811.10599v6 [cs.IT], 2018.
[Nag01] H. Nagaoka. Strong converse theorems in quantum information theory. In Proceedings of the ERATO Conference on Quantum Information Science (EQIS), volume 33, 2001. (Also appeared as Chap. 4 of Asymptotic Theory of Quantum Statistical Inference: Selected Papers, ed. M. Hayashi.)
[Nak19a] B. Nakiboğlu. The Augustin capacity and center. Problems of Information Transmission, 55(4):299–342, October 2019. (arXiv:1803.07937 [cs.IT])
[Nak19b] B. Nakiboğlu. The sphere packing bound via Augustin's method. IEEE Transactions on Information Theory, 65(2):816–840, Feb 2019. (arXiv:1611.06924 [cs.IT])
[Nak20] B. Nakiboğlu. A simple derivation of the refined sphere packing bound under certain symmetry hypotheses. Turkish Journal of Mathematics, 44(3):919–948, 2020. (arXiv:1904.12780 [cs.IT])
[Omu75] J. K. Omura. A lower bounding method for channel and source coding probabilities. Information and Control, 27(2):148–177, 1975.
[Ooh17] Y. Oohama. The optimal exponent function for the additive white Gaussian noise channel at rates above the capacity. In 2017 IEEE International Symposium on Information Theory (ISIT), pages 1053–1057, Aachen, Germany, June 2017.
[PV10] Y. Polyanskiy and S. Verdú. Arimoto channel coding converse and Rényi divergence. In 48th Annual Allerton Conference on Communication, Control, and Computing, pages 1327–1333, Oct 2010.
[Sha59] C. E. Shannon. Probability of error for optimal codes in a Gaussian channel. The Bell System Technical Journal, 38(3):611–656, May 1959.
[She82] A. Yu. Sheverdyaev. Lower bound for error probability in a discrete memoryless channel with feedback. Problems of Information Transmission, 18(4):5–15, 1982.
[Str62] V. Strassen. Asymptotische Abschätzungen in Shannons Informationstheorie. In Trans. Third Prague Conf. Inf. Theory, pages 689–723, 1962. (http://www.math.cornell.edu/~pmlut/strassen.pdf)
[VVFKL18] G. Vazquez-Vilar, A. Guillén i Fàbregas, T. Koch, and A. Lancho. Saddlepoint approximation of the error probability of binary hypothesis testing. In 2018 IEEE International Symposium on Information Theory (ISIT), pages 2306–2310, June 2018.
[WWY14] M. M. Wilde, A. Winter, and D. Yang. Strong converse for the classical capacity of entanglement-breaking and Hadamard channels via a sandwiched Rényi relative entropy. Communications in Mathematical Physics, 331(2):593–622, Oct 2014.