SLIDE 1

Rényi divergences and hypothesis testing problems

Milán Mosonyi 1,2

1 Física Teòrica: Informació i Fenòmens Quàntics,
Universitat Autònoma de Barcelona

2 Mathematical Institute,
Budapest University of Technology and Economics

Paris 2015

SLIDE 2

Binary state discrimination

  • Two candidates for the true state of a system: H0 : ρ vs. H1 : σ

  • Many identical copies are available: H0 : ρ^{⊗n} vs. H1 : σ^{⊗n}

  • Decision is based on a binary POVM (T, I − T) on H^{⊗n}.

  • Error probabilities:

    α_n(T) := Tr ρ^{⊗n}(I_n − T)  (first kind)
    β_n(T) := Tr σ^{⊗n} T  (second kind)

  • Trade-off:

    min_{0≤T≤I} {α_n(T) + β_n(T)} > 0 unless ρ^{⊗n} ⊥ σ^{⊗n}

SLIDE 3

Binary state discrimination

  • Two candidates for the true state of a system: H0 : ρ vs. H1 : σ

  • Many identical copies are available: H0 : ρ^{⊗n} vs. H1 : σ^{⊗n}

  • Decision is based on a binary POVM (T, I − T) on H^{⊗n}.

  • Error probabilities:

    α_n(T) := Tr ρ^{⊗n}(I_n − T)  (first kind)
    β_n(T) := Tr σ^{⊗n} T  (second kind)

  • Trade-off:

    min_{0≤T≤I} {α_n(T) + β_n(T)} > 0 unless ρ^{⊗n} ⊥ σ^{⊗n}

  • Quantum Stein's lemma:¹

    α_n(T_n) → 0 ⟹ β_n(T_n) ∼ e^{−n D_1(ρ‖σ)} is the optimal decay,
    D_1(ρ‖σ) := Tr ρ(log ρ − log σ)  (relative entropy²)

¹ Hiai, Petz, 1991; Ogawa, Nagaoka, 2001; ² Umegaki, 1962
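The relative entropy governing the Stein exponent is easy to evaluate numerically. A minimal sketch (the qubit states `rho` and `sigma` are illustrative choices, not taken from the slides):

```python
import numpy as np
from scipy.linalg import logm

def relative_entropy(rho, sigma):
    """Umegaki relative entropy D_1(rho||sigma) = Tr rho (log rho - log sigma)."""
    return float(np.real(np.trace(rho @ (logm(rho) - logm(sigma)))))

# two full-rank qubit states (illustrative values, not from the slides)
rho = np.array([[0.7, 0.2], [0.2, 0.3]])
sigma = np.array([[0.5, 0.0], [0.0, 0.5]])

D1 = relative_entropy(rho, sigma)
print(D1)  # Stein exponent: beta_n decays like exp(-n * D1)
print(relative_entropy(rho, rho))  # 0 when the hypotheses coincide
```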

SLIDE 4

Relative entropy

  • The (quantum) Stein's lemma gives an operational interpretation to the (quantum) relative entropy (Kullback–Leibler divergence).

  • Notion of "distance" on the state space.

  • All relevant information measures are derived from it:

    entropy: H(ρ) := −D_1(ρ‖I)
    mutual information: I(A : B)_ρ := D_1(ρ_AB‖ρ_A ⊗ ρ_B)
    all sorts of channel capacities, etc.

SLIDE 5

Relative entropy

  • The (quantum) Stein's lemma gives an operational interpretation to the (quantum) relative entropy (Kullback–Leibler divergence).

  • Notion of "distance" on the state space.

  • All relevant information measures are derived from it (?)

    entropy: H(ρ) := −D_1(ρ‖I)
    mutual information: I(A : B)_ρ := D_1(ρ_AB‖ρ_A ⊗ ρ_B)
    all sorts of channel capacities, etc.

  • Statistical divergence ∆ on the state space:

    (1) ∆(ρ‖σ) ≥ 0, and ∆(ρ‖σ) = 0 ⟺ ρ = σ
    (2) ∆(Φ(ρ)‖Φ(σ)) ≤ ∆(ρ‖σ) for every stochastic map Φ (example: f-divergences)
    (3) operational interpretation?

SLIDE 6

Other statistical divergences

  • Trace-norm distance: H0 : ρ vs. H1 : σ

    min_{0≤T≤I} {α(T) + β(T)} = 1 − (1/2)‖ρ − σ‖_1

    ∆_Tr(ρ‖σ) := (1/2)‖ρ − σ‖_1

SLIDE 7

Other statistical divergences

  • Trace-norm distance: H0 : ρ vs. H1 : σ

    min_{0≤T≤I} {α(T) + β(T)} = 1 − (1/2)‖ρ − σ‖_1

    ∆_Tr(ρ‖σ) := (1/2)‖ρ − σ‖_1

  • Chernoff bound theorem:¹

    1 − (1/2)‖ρ^{⊗n} − σ^{⊗n}‖_1 ∼ e^{−n C(ρ,σ)}

    C(ρ, σ) := − inf_{0<α<1} (α − 1) D_α(ρ‖σ)  (Chernoff divergence)
    D_α(ρ‖σ) := (1/(α − 1)) log Tr ρ^α σ^{1−α}  (Rényi divergences)

¹ Nussbaum, Szkoła, 2006; Audenaert et al., 2006
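The trace-norm identity for the minimal total error can be checked directly: the optimal test projects onto the positive eigenspace of ρ − σ. A small numerical sketch (the states are illustrative choices):

```python
import numpy as np

rho = np.array([[0.7, 0.2], [0.2, 0.3]])
sigma = np.array([[0.5, 0.0], [0.0, 0.5]])

# Helstrom test: project onto the positive eigenspace of rho - sigma
w, v = np.linalg.eigh(rho - sigma)
T = sum(np.outer(v[:, i], v[:, i]) for i in range(len(w)) if w[i] > 0)

alpha = float(np.real(np.trace(rho @ (np.eye(2) - T))))   # first-kind error
beta = float(np.real(np.trace(sigma @ T)))                # second-kind error
trace_norm = float(np.abs(w).sum())                       # ||rho - sigma||_1

print(alpha + beta)            # minimal total error ...
print(1 - 0.5 * trace_norm)    # ... equals 1 - (1/2)||rho - sigma||_1
```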

SLIDE 8

Quantifying the trade-off

  • Stein's lemma:

    α_n(T_n) → 0 ⟹ β_n(T_n) ∼ e^{−n D_1(ρ‖σ)}

SLIDE 9

Quantifying the trade-off

  • Stein's lemma:

    α_n(T_n) → 0 ⟹ β_n(T_n) ∼ e^{−n D_1(ρ‖σ)}

  • Direct domain (quantum Hoeffding bound¹):

    β_n(T_n) ∼ e^{−nr} ⟹ α_n(T_n) ∼ e^{−n H_r},  r < D_1(ρ‖σ)

  • Converse domain (quantum Han–Kobayashi bound²):

    β_n(T_n) ∼ e^{−nr} ⟹ α_n(T_n) ∼ 1 − e^{−n H*_r},  r > D_1(ρ‖σ)

  • Hoeffding divergences:

    H_r := sup_{0<α<1} ((α − 1)/α) [r − D_α(ρ‖σ)]
    H*_r := sup_{α>1} ((α − 1)/α) [r − D*_α(ρ‖σ)]

¹ Hayashi; Nagaoka; Audenaert, Nussbaum, Szkoła, Verstraete; 2006; ² Mosonyi, Ogawa, 2013
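The Hoeffding exponent is a one-dimensional optimization over α, which is easy to evaluate in the classical (commuting) case. A sketch with a crude grid search standing in for the sup, on illustrative distributions:

```python
import numpy as np

def renyi(p, q, a):
    """Classical Renyi divergence D_alpha(p||q)."""
    return np.log(np.sum(p**a * q**(1.0 - a))) / (a - 1.0)

# illustrative distributions, not from the slides
p = np.array([0.8, 0.2])
q = np.array([0.4, 0.6])

D1 = float(np.sum(p * (np.log(p) - np.log(q))))  # relative entropy (alpha -> 1)
r = 0.5 * D1                                     # a rate in the direct domain, r < D1

# grid search standing in for sup over 0 < alpha < 1
alphas = np.linspace(0.01, 0.99, 999)
H_r = max((a - 1.0) / a * (r - renyi(p, q, a)) for a in alphas)
print(H_r)  # positive: the first-kind error still decays exponentially
```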

SLIDE 10

Quantum Rényi divergences

  • p, q probability distributions on X, α ∈ [0, +∞) \ {1}:

    D_α(p‖q) := (1/(α − 1)) log Σ_x p(x)^α q(x)^{1−α}
slide-11
SLIDE 11

Quantum Rényi divergences

  • p, q probability distributions on X,

α ∈ [0, +∞) \ {1}: Dα (pq) := 1 α − 1 log

  • x p(x)αq(x)1−α
  • Quantum Rényi divergences:1

Dα (ρ σ) := 1 α − 1 log Tr ρασ1−α D∗

α (ρ σ) :=

1 α − 1 log Tr

  • ρ

1 2 σ 1−α α ρ 1 2

α

  • The right quantum extension is

Dq

α(ρσ) :=

  • Dα(ρσ),

α ∈ [0, 1), D∗

α(ρσ),

α ∈ (1, +∞].

1Petz 1986;

Müller-Lennert, Dupuis, Szehr, Fehr, Tomamichel, 2013; Wilde, Winter, Yang, 2013
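Both families can be computed in a few lines from matrix powers; this also lets one observe the Araki–Lieb–Thirring relation D*_α ≤ D_α and its saturation for commuting states. A sketch with illustrative full-rank qubit states:

```python
import numpy as np
from scipy.linalg import fractional_matrix_power as mpow

def petz(rho, sigma, a):
    """Petz Renyi divergence D_alpha(rho||sigma)."""
    return np.log(np.real(np.trace(mpow(rho, a) @ mpow(sigma, 1.0 - a)))) / (a - 1.0)

def sandwiched(rho, sigma, a):
    """Sandwiched Renyi divergence D*_alpha(rho||sigma)."""
    s = mpow(sigma, (1.0 - a) / (2.0 * a))
    return np.log(np.real(np.trace(mpow(s @ rho @ s, a)))) / (a - 1.0)

# illustrative full-rank qubit states
rho = np.array([[0.7, 0.2], [0.2, 0.3]])
sigma = np.array([[0.6, -0.1], [-0.1, 0.4]])

a = 2.0
d_petz, d_sand = petz(rho, sigma, a), sandwiched(rho, sigma, a)
print(d_sand <= d_petz)  # Araki-Lieb-Thirring: D* <= D

# commuting (diagonal) states give equality
pc, qc = np.diag([0.8, 0.2]), np.diag([0.4, 0.6])
print(np.isclose(petz(pc, qc, a), sandwiched(pc, qc, a)))
```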

SLIDE 12

Mathematical properties

  • Both D_α and D*_α are monotone increasing in α, and

    lim_{α→1} D^{(v)}_α(ρ‖σ) = D_1(ρ‖σ) := D(ρ‖σ) := Tr ρ(log ρ − log σ)

  • Araki–Lieb–Thirring inequality:

    D*_α(ρ‖σ) ≤ D_α(ρ‖σ),  α ∈ [0, +∞]

    Equality for α = 1 and for commuting states.

  • Monotonicity:

    D_α(Φ(ρ)‖Φ(σ)) ≤ D_α(ρ‖σ),  α ∈ [0, 2]
    D*_α(Φ(ρ)‖Φ(σ)) ≤ D*_α(ρ‖σ),  α ∈ [1/2, +∞]

    ⟹ D^q_α(Φ(ρ)‖Φ(σ)) ≤ D^q_α(ρ‖σ),  α ∈ [0, +∞]
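The monotonicity (data-processing) statements can be spot-checked numerically with any concrete CPTP map. A sketch using a qubit depolarizing channel at α = 2, a value inside both stated ranges (states and noise level are illustrative):

```python
import numpy as np
from scipy.linalg import fractional_matrix_power as mpow

def petz(rho, sigma, a):
    return np.log(np.real(np.trace(mpow(rho, a) @ mpow(sigma, 1.0 - a)))) / (a - 1.0)

def sandwiched(rho, sigma, a):
    s = mpow(sigma, (1.0 - a) / (2.0 * a))
    return np.log(np.real(np.trace(mpow(s @ rho @ s, a)))) / (a - 1.0)

def depolarize(X, p):
    """Qubit depolarizing channel: a simple CPTP (stochastic) map."""
    return (1.0 - p) * X + p * np.trace(X) * np.eye(2) / 2.0

rho = np.array([[0.7, 0.2], [0.2, 0.3]])
sigma = np.array([[0.6, -0.1], [-0.1, 0.4]])
a, p = 2.0, 0.3  # alpha = 2 lies in both monotonicity ranges

before_p, after_p = petz(rho, sigma, a), petz(depolarize(rho, p), depolarize(sigma, p), a)
before_s, after_s = sandwiched(rho, sigma, a), sandwiched(depolarize(rho, p), depolarize(sigma, p), a)
print(after_p <= before_p, after_s <= before_s)  # data processing holds for both
```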

SLIDE 13

The fidelity

    D*_α(ρ‖σ) := (1/(α − 1)) log Tr (ρ^{1/2} σ^{(1−α)/α} ρ^{1/2})^α

  • α = 1/2:

    D*_{1/2}(ρ‖σ) = −2 log Tr (ρ^{1/2} σ ρ^{1/2})^{1/2} = −2 log F(ρ, σ)

    Operational interpretation??
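The α = 1/2 identity follows by plugging α = 1/2 into the sandwiched formula, and is easy to confirm numerically (illustrative states):

```python
import numpy as np
from scipy.linalg import sqrtm, fractional_matrix_power as mpow

def sandwiched(rho, sigma, a):
    s = mpow(sigma, (1.0 - a) / (2.0 * a))
    return np.log(np.real(np.trace(mpow(s @ rho @ s, a)))) / (a - 1.0)

def fidelity(rho, sigma):
    """F(rho, sigma) = Tr sqrt(rho^{1/2} sigma rho^{1/2})."""
    r = sqrtm(rho)
    return float(np.real(np.trace(sqrtm(r @ sigma @ r))))

rho = np.array([[0.7, 0.2], [0.2, 0.3]])
sigma = np.array([[0.6, -0.1], [-0.1, 0.4]])

lhs = sandwiched(rho, sigma, 0.5)
rhs = -2.0 * np.log(fidelity(rho, sigma))
print(np.isclose(lhs, rhs))
```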

SLIDE 14

More Rényi divergences

  • In classical information theory, trade-offs in many problems are quantified by Rényi divergences and derived quantities.

  • How about quantum? Probably also.

  • Do we get any other notions of Rényi divergences apart from D_α and D*_α? Probably not.

  • What are the right (= operational) definitions of the Rényi extensions of information quantities? E.g., Rényi mutual information, Rényi capacity, Rényi conditional mutual information?

SLIDE 15

More Rényi divergences

  • Rényi mutual information:

    I^{(v)}_α(A : B)_ρ := inf_{σ_B} D^{(v)}_α(ρ_AB‖ρ_A ⊗ σ_B)

    Operational interpretation? Yes, for all quantum values.¹
    Hypothesis testing H0 : ρ_AB^{⊗n} vs. H1 : ρ_A^{⊗n} ⊗ S(H_B^{⊗n}).

  • Rényi–Holevo capacities: W : X → S(H_B) channel,

    χ^{(v)}_α(W) := sup { I^{(v)}_α(X : B)_ρ : ρ_XB = Σ_x p(x)|x⟩⟨x|_X ⊗ W(x) }

    Operational interpretation² for α > 1 and (v) = ∗:
    strong converse exponent of classical-quantum channel coding.

  • Channel Rényi mutual information: N : A → B CPTP,

    I^{(v)}_α(N) := sup_{ψ_RA} I^{(v)}_α(R : B)_{N(ψ_RA)}

    Partial results (Cooney, Mosonyi, Wilde, 2014).

¹ Hayashi, Tomamichel, 2014; ² Mosonyi, Ogawa, 2014
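The infimum over σ_B in the Rényi mutual information has no simple closed form in general, but a crude random search gives an upper-bound estimate. A hedged sketch (the noisy Bell state, the sampling scheme, and the sample count are all ad hoc choices for illustration):

```python
import numpy as np
from scipy.linalg import fractional_matrix_power as mpow

def sandwiched(rho, sigma, a):
    s = mpow(sigma, (1.0 - a) / (2.0 * a))
    return np.log(np.real(np.trace(mpow(s @ rho @ s, a)))) / (a - 1.0)

def random_state(rng, d=2):
    """Random full-rank density matrix (ad hoc sampling)."""
    G = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    M = G @ G.conj().T + 1e-6 * np.eye(d)
    return M / np.real(np.trace(M))

# illustrative correlated two-qubit state: noisy Bell state
phi = np.zeros(4); phi[0] = phi[3] = 1.0 / np.sqrt(2.0)
rho_AB = 0.7 * np.outer(phi, phi) + 0.3 * np.eye(4) / 4.0
rho_A = rho_B = np.eye(2) / 2.0   # both marginals are maximally mixed here

a = 2.0
rng = np.random.default_rng(1)
candidates = [rho_B] + [random_state(rng) for _ in range(200)]
I_est = min(sandwiched(rho_AB, np.kron(rho_A, sB), a) for sB in candidates)
print(I_est)  # upper bound on I*_2(A : B)
```

Since σ_B = ρ_B is among the candidates, the estimate never exceeds the (common but generally suboptimal) choice σ_B = ρ_B.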

SLIDE 16

More Rényi divergences

  • Channel Rényi divergences: N_i : A → B CPTP,

    D^{(v)}_α(N_1‖N_2) := sup_{ψ_RA} D^{(v)}_α(N_1(ψ_RA)‖N_2(ψ_RA))

    Operational interpretation? Trivial one for all quantum values.
    Non-trivial one for α > 1, (v) = ∗, and N_2(·) = R_σ(·) := σ Tr(·) the replacer channel (Cooney, Mosonyi, Wilde, 2014).

SLIDE 17

Binary channel discrimination

  • Two candidates for the identity of a channel: H0 : N_0 vs. H1 : N_1

    n independent uses: H0 : N_0^{⊗n} vs. H1 : N_1^{⊗n}

  • Adaptive discrimination strategy: binary measurement at the end.

  • Non-adaptive strategy:

    input ϕ_{R^nA^n} ⟹ output N_i^{⊗n}(ϕ_{R^nA^n})

    Product strategy: ϕ_{R^nA^n} = ϕ_RA^{⊗n} ⟹ output (N_i(ϕ_RA))^{⊗n}

SLIDE 18

Binary channel discrimination

  • Output: ρ_{R^nB^n} (N = N_0) or σ_{R^nB^n} (N = N_1); measurement (T_n, I − T_n) at the end.

  • Error probabilities:

    β^x_ε(N_0^{⊗n}‖N_1^{⊗n}) := inf { Tr σ_{R^nB^n} T_n : Tr ρ_{R^nB^n}(I − T_n) ≤ ε }
    α^x_r(N_0^{⊗n}‖N_1^{⊗n}) := inf { Tr ρ_{R^nB^n}(I − T_n) : Tr σ_{R^nB^n} T_n ≤ 2^{−nr} }

    x = pr (product) or x = ad (adaptive)

SLIDE 19

Trade-off exponents with product strategies

  • Error probabilities:

    β^x_ε(N_0^{⊗n}‖N_1^{⊗n}) := inf { Tr σ_{R^nB^n} T_n : Tr ρ_{R^nB^n}(I − T_n) ≤ ε }
    α^x_r(N_0^{⊗n}‖N_1^{⊗n}) := inf { Tr ρ_{R^nB^n}(I − T_n) : Tr σ_{R^nB^n} T_n ≤ 2^{−nr} }

  • If only product strategies are allowed (x = pr):

    lim_{n→+∞} −(1/n) log β^x_ε(N_0^{⊗n}‖N_1^{⊗n}) = D(N_0‖N_1) := sup_{ψ_RA} D(N_0(ψ_RA)‖N_1(ψ_RA)),

    lim_{n→+∞} −(1/n) log α^x_{n,r} = H_r(N_0‖N_1) := sup_{ψ_RA} H_r(N_0(ψ_RA)‖N_1(ψ_RA)),

    lim_{n→+∞} −(1/n) log(1 − α^x_{n,r}) = H*_r(N_0‖N_1) := inf_{ψ_RA} H*_r(N_0(ψ_RA)‖N_1(ψ_RA)).

SLIDE 20

Channel divergences

  • Channel Hoeffding (anti-)divergences:

    H_r(N_0‖N_1) = sup_{ψ_RA} H_r(N_0(ψ_RA)‖N_1(ψ_RA)),
    H*_r(N_0‖N_1) = inf_{ψ_RA} H*_r(N_0(ψ_RA)‖N_1(ψ_RA)).

  • Alternative expressions (due to minimax):

    H_r(N_0‖N_1) = sup_{0<α<1} ((α − 1)/α) [r − D_α(N_0‖N_1)],
    H*_r(N_0‖N_1) = sup_{α>1} ((α − 1)/α) [r − D*_α(N_0‖N_1)],

    where D_α(N_0‖N_1) and D*_α(N_0‖N_1) are the channel Rényi divergences:

    D_α(N_0‖N_1) := sup_{ψ_RA} D_α(N_0(ψ_RA)‖N_1(ψ_RA)),
    D*_α(N_0‖N_1) := sup_{ψ_RA} D*_α(N_0(ψ_RA)‖N_1(ψ_RA)).

SLIDE 21

Adaptive strategies

  • What about adaptive strategies?

  • Classical channels: no difference between the error exponents (Hayashi, 2009).

  • Replacer channels: N_0(·) = R_ρ(·) := ρ Tr(·), N_1(·) = R_σ(·) := σ Tr(·).

    All channel divergences coincide with the state divergences; e.g. D_α(R_ρ‖R_σ) = D_α(ρ‖σ).

    Expectation: adaptive strategies don't make a difference, and the channel discrimination problem reduces to a state discrimination problem.
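That replacer channels reduce to state discrimination can be seen on a single use: for any input ψ_RA the outputs are ψ_R ⊗ ρ and ψ_R ⊗ σ, and the common ψ_R factor drops out of the divergence. A numerical sketch (states and the entangled input are illustrative):

```python
import numpy as np
from scipy.linalg import fractional_matrix_power as mpow

def petz(rho, sigma, a):
    return np.log(np.real(np.trace(mpow(rho, a) @ mpow(sigma, 1.0 - a)))) / (a - 1.0)

rho = np.array([[0.7, 0.2], [0.2, 0.3]])
sigma = np.array([[0.6, -0.1], [-0.1, 0.4]])

# entangled input psi_RA; a replacer R_tau maps it to psi_R (x) tau
psi = np.zeros(4); psi[0], psi[3] = np.sqrt(0.7), np.sqrt(0.3)
psi_RA = np.outer(psi, psi)
psi_R = np.trace(psi_RA.reshape(2, 2, 2, 2), axis1=1, axis2=3)  # partial trace over A

a = 2.0
out0 = np.kron(psi_R, rho)    # output under R_rho
out1 = np.kron(psi_R, sigma)  # output under R_sigma
print(np.isclose(petz(out0, out1, a), petz(rho, sigma, a)))  # entanglement gives no gain
```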

SLIDE 22

Channel vs. state discrimination

  • Interpolation between state discrimination and channel discrimination:

    H0 : N_0 = N general channel,  H1 : N_1 = R_σ replacer channel

  • Theorem:¹ The strong converse exponent is given by

    lim_{n→+∞} −(1/n) log(1 − α^ad_r(N^{⊗n}‖R_σ^{⊗n})) = sup_{α>1} ((α − 1)/α) [r − D*_α(N‖R_σ)].

    Adaptive strategies don't give a benefit over product strategies.

  • Corollary (Stein's lemma):

    lim_{n→+∞} −(1/n) log β^ad_ε(N^{⊗n}‖R_σ^{⊗n}) = D(N‖R_σ).

¹ Cooney, Mosonyi, Wilde, 2014

SLIDE 23

Proof of achievability

    α^ad_{n,r} := α^ad_r(N^{⊗n}‖R_σ^{⊗n})

    limsup_{n→+∞} −(1/n) log(1 − α^ad_{n,r})
    ≤ inf_{ψ_RA} limsup_{n→+∞} −(1/n) log(1 − α_r(N(ψ_RA)^{⊗n}‖R_σ(ψ_RA)^{⊗n}))
    = inf_{ψ_RA} H*_r(N(ψ_RA)‖R_σ(ψ_RA))
    = inf_{ψ_RA} sup_{α>1} ((α − 1)/α) [r − D*_α(N(ψ_RA)‖ψ_R ⊗ σ)]
    = sup_{α>1} inf_{ψ_RA} ((α − 1)/α) [r − D*_α(N(ψ_RA)‖ψ_R ⊗ σ)]
    = sup_{α>1} ((α − 1)/α) [r − D*_α(N‖R_σ)]
    = H*_r(N‖R_σ)

SLIDE 24

Proof of optimality

  • Post-measurement probabilities:

    p_n := Tr T ρ_{R^nB^n},  q_n := Tr T σ_{R^nB^n}

  • Monotonicity of Rényi divergences: for α > 1,

    D*_α(ρ_{R^nB^n}‖σ_{R^nB^n}) ≥ D_α((p_n, 1 − p_n)‖(q_n, 1 − q_n)) ≥ (1/(α − 1)) log p_n^α q_n^{1−α}

    ⟹ (1/n) log p_n ≤ ((α − 1)/α) [(1/n) log q_n + (1/n) D*_α(ρ_{R^nB^n}‖σ_{R^nB^n})]

  • Assume q_n ≤ e^{−nr}; since p_n = 1 − α^ad_{n,r} for the optimal test,

    (1/n) log(1 − α^ad_{n,r}) ≤ ((α − 1)/α) [−r + (1/n) D*_α(ρ_{R^nB^n}‖σ_{R^nB^n})]

SLIDE 25

Proof of optimality

    D*_α(ρ_{R^nB^n}‖σ_{R^nB^n})

    = D*_α(N_{A_n→B_n}(ρ_{R^nA_n})‖σ_{R^n} ⊗ σ)

    = (α/(α − 1)) log ‖ (σ_{R^n} ⊗ σ)^{(1−α)/(2α)} N_{A_n→B_n}(ρ_{R^nA_n}) (σ_{R^n} ⊗ σ)^{(1−α)/(2α)} ‖_α

    = (α/(α − 1)) log ‖ (Θ_{σ^{(1−α)/α}} ∘ N_{A_n→B_n}) ( σ_{R^n}^{(1−α)/(2α)} ρ_{R^nA_n} σ_{R^n}^{(1−α)/(2α)} ) ‖_α

    where Θ_X(·) := X^{1/2}(·)X^{1/2}.

SLIDE 26

Proof of optimality

    ‖ (Θ_{σ^{(1−α)/α}} ∘ N_{A_n→B_n}) ( σ_{R^n}^{(1−α)/(2α)} ρ_{R^nA_n} σ_{R^n}^{(1−α)/(2α)} ) ‖_α

    = ( ‖ (Θ_{σ^{(1−α)/α}} ∘ N_{A_n→B_n}) ( σ_{R^n}^{(1−α)/(2α)} ρ_{R^nA_n} σ_{R^n}^{(1−α)/(2α)} ) ‖_α / ‖ σ_{R^n}^{(1−α)/(2α)} ρ_{R^n} σ_{R^n}^{(1−α)/(2α)} ‖_α ) · ‖ σ_{R^n}^{(1−α)/(2α)} ρ_{R^n} σ_{R^n}^{(1−α)/(2α)} ‖_α

    ≤ ( sup_{X_{R^nA_n} ≥ 0} ‖ (Θ_{σ^{(1−α)/α}} ∘ N_{A_n→B_n})(X_{R^nA_n}) ‖_α / ‖ X_{R^n} ‖_α ) · ‖ σ_{R^n}^{(1−α)/(2α)} ρ_{R^n} σ_{R^n}^{(1−α)/(2α)} ‖_α

    = ‖ Θ_{σ^{(1−α)/α}} ∘ N ‖_{CB,1→α} · ‖ σ_{R^n}^{(1−α)/(2α)} ρ_{R^n} σ_{R^n}^{(1−α)/(2α)} ‖_α.

SLIDE 27

Proof of optimality

    D*_α(ρ_{R^nB^n}‖σ_{R^nB^n})

    ≤ (α/(α − 1)) log ‖Θ_{σ^{(1−α)/α}} ∘ N‖_{CB,1→α} + (α/(α − 1)) log ‖ σ_{R^n}^{(1−α)/(2α)} ρ_{R^n} σ_{R^n}^{(1−α)/(2α)} ‖_α

    = (α/(α − 1)) log ‖Θ_{σ^{(1−α)/α}} ∘ N‖_{CB,1→α} + D*_α(ρ_{R^n}‖σ_{R^n})

    ≤ (α/(α − 1)) log ‖Θ_{σ^{(1−α)/α}} ∘ N‖_{CB,1→α} + D*_α(ρ_{R^nA_n}‖σ_{R^nA_n})

    ≤ (α/(α − 1)) log ‖Θ_{σ^{(1−α)/α}} ∘ N‖_{CB,1→α} + D*_α(ρ_{R^{n−1}B^{n−1}}‖σ_{R^{n−1}B^{n−1}})

    ⋮

    ≤ n (α/(α − 1)) log ‖Θ_{σ^{(1−α)/α}} ∘ N‖_{CB,1→α}
slide-28
SLIDE 28

Proof of optimality

1 n log(1 − αad

n,r)

≤ α − 1 α

  • −r + 1

nD∗

α(ρRnBnσRnBn)

α − 1 α

  • −r +

α α − 1 log

  • Θ

σ

1−α α

  • N
  • CB,1→α
  • =

α − 1 α [−r + D∗

α(NR σ)] .

lim inf

n→+∞ − 1

n log(1 − αad

n,r)

≥ sup

α>1

α − 1 α [r − D∗

α(NR σ)]

= H∗

r (NR σ)

SLIDE 29

Channel vs. state discrimination

    H0 : N_0 = N general channel,  H1 : N_1 = R_σ replacer channel

  • Theorem:² The strong converse exponent is given by

    lim_{n→+∞} −(1/n) log(1 − α^ad_r(N^{⊗n}‖R_σ^{⊗n})) = sup_{α>1} ((α − 1)/α) [r − D*_α(N‖R_σ)].

    Adaptive strategies don't give a benefit over product strategies.

  • Corollary (Stein's lemma):

    lim_{n→+∞} −(1/n) log β^ad_ε(N^{⊗n}‖R_σ^{⊗n}) = D_1(N‖R_σ).

    Proof: lim_{α↘1} D*_α(N‖R_σ) = D_1(N‖R_σ).

² Cooney, Mosonyi, Wilde, 2014
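The corollary rests on the limit D*_α → D_1 as α ↘ 1; for a pair of states this convergence (from above, by monotonicity in α) is easy to observe numerically (illustrative states):

```python
import numpy as np
from scipy.linalg import logm, fractional_matrix_power as mpow

def sandwiched(rho, sigma, a):
    s = mpow(sigma, (1.0 - a) / (2.0 * a))
    return np.log(np.real(np.trace(mpow(s @ rho @ s, a)))) / (a - 1.0)

rho = np.array([[0.7, 0.2], [0.2, 0.3]])
sigma = np.array([[0.6, -0.1], [-0.1, 0.4]])

D1 = float(np.real(np.trace(rho @ (logm(rho) - logm(sigma)))))
vals = [sandwiched(rho, sigma, a) for a in (1.5, 1.1, 1.01, 1.001)]
print(vals)  # decreases monotonically towards D1
print(D1)
```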

SLIDE 30

Summary

  • Rényi divergences quantify the trade-off between two competing operational quantities describing a problem.

  • Divergence measures on the state space with operational relevance.

  • Two different families of Rényi divergences are needed in the quantum case.