SLIDE 1

Rényi divergences and hypothesis testing problems

Milán Mosonyi 1,2

1 Física Teòrica: Informació i Fenòmens Quàntics,
Universitat Autònoma de Barcelona

2 Mathematical Institute,
Budapest University of Technology and Economics

Paris 2015

SLIDE 2

Binary state discrimination

  • Two candidates for the true state of a system: H0 : ρ vs. H1 : σ

  • Many identical copies are available: H0 : ρ^{⊗n} vs. H1 : σ^{⊗n}

  • Decision is based on a binary POVM (T, I − T) on H^{⊗n}.

  • Error probabilities:

    α_n(T) := Tr ρ^{⊗n}(I_n − T)  (first kind)
    β_n(T) := Tr σ^{⊗n} T  (second kind)

  • Trade-off:

    min_{0≤T≤I} {α_n(T) + β_n(T)} > 0 unless ρ^{⊗n} ⊥ σ^{⊗n}

SLIDE 3

Binary state discrimination

  • Two candidates for the true state of a system: H0 : ρ vs. H1 : σ

  • Many identical copies are available: H0 : ρ^{⊗n} vs. H1 : σ^{⊗n}

  • Decision is based on a binary POVM (T, I − T) on H^{⊗n}.

  • Error probabilities:

    α_n(T) := Tr ρ^{⊗n}(I_n − T)  (first kind)
    β_n(T) := Tr σ^{⊗n} T  (second kind)

  • Trade-off:

    min_{0≤T≤I} {α_n(T) + β_n(T)} > 0 unless ρ^{⊗n} ⊥ σ^{⊗n}

  • Quantum Stein's lemma:¹

    α_n(T_n) → 0 ⟹ β_n(T_n) ∼ e^{−n D_1(ρ‖σ)} is the optimal decay,
    D_1(ρ‖σ) := Tr ρ(log ρ − log σ)  (relative entropy²)

¹ Hiai, Petz, 1991; Ogawa, Nagaoka, 2001; ² Umegaki, 1962
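The relative entropy governing the Stein exponent is easy to evaluate numerically. A minimal sketch (the qubit states `rho` and `sigma` are illustrative choices, not taken from the slides):

```python
import numpy as np
from scipy.linalg import logm

def relative_entropy(rho, sigma):
    """Umegaki relative entropy D_1(rho||sigma) = Tr rho (log rho - log sigma)."""
    return float(np.real(np.trace(rho @ (logm(rho) - logm(sigma)))))

# two full-rank qubit states (illustrative values, not from the slides)
rho = np.array([[0.7, 0.2], [0.2, 0.3]])
sigma = np.array([[0.5, 0.0], [0.0, 0.5]])

D1 = relative_entropy(rho, sigma)
print(D1)  # Stein exponent: beta_n decays like exp(-n * D1)
print(relative_entropy(rho, rho))  # 0 when the hypotheses coincide
```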

SLIDE 4

Relative entropy

  • The (quantum) Stein's lemma gives an operational interpretation to the (quantum) relative entropy (Kullback–Leibler divergence).

  • Notion of "distance" on the state space.

  • All relevant information measures are derived from it:

    entropy: H(ρ) := −D_1(ρ‖I)
    mutual information: I(A : B)_ρ := D_1(ρ_AB‖ρ_A ⊗ ρ_B)
    all sorts of channel capacities, etc.

SLIDE 5

Relative entropy

  • The (quantum) Stein's lemma gives an operational interpretation to the (quantum) relative entropy (Kullback–Leibler divergence).

  • Notion of "distance" on the state space.

  • All relevant information measures are derived from it (?)

    entropy: H(ρ) := −D_1(ρ‖I)
    mutual information: I(A : B)_ρ := D_1(ρ_AB‖ρ_A ⊗ ρ_B)
    all sorts of channel capacities, etc.

  • Statistical divergence ∆ on the state space:

    (1) ∆(ρ‖σ) ≥ 0, and ∆(ρ‖σ) = 0 ⟺ ρ = σ
    (2) ∆(Φ(ρ)‖Φ(σ)) ≤ ∆(ρ‖σ) for every stochastic map Φ (example: f-divergences)
    (3) operational interpretation?

SLIDE 6

Other statistical divergences

  • Trace-norm distance: H0 : ρ vs. H1 : σ

    min_{0≤T≤I} {α(T) + β(T)} = 1 − (1/2)‖ρ − σ‖_1

    ∆_Tr(ρ‖σ) := (1/2)‖ρ − σ‖_1

SLIDE 7

Other statistical divergences

  • Trace-norm distance: H0 : ρ vs. H1 : σ

    min_{0≤T≤I} {α(T) + β(T)} = 1 − (1/2)‖ρ − σ‖_1

    ∆_Tr(ρ‖σ) := (1/2)‖ρ − σ‖_1

  • Chernoff bound theorem:¹

    1 − (1/2)‖ρ^{⊗n} − σ^{⊗n}‖_1 ∼ e^{−n C(ρ,σ)}

    C(ρ, σ) := − inf_{0<α<1} (α − 1) D_α(ρ‖σ)  (Chernoff divergence)
    D_α(ρ‖σ) := (1/(α − 1)) log Tr ρ^α σ^{1−α}  (Rényi divergences)

¹ Nussbaum, Szkoła, 2006; Audenaert et al., 2006
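The trace-norm identity for the minimal total error can be checked directly: the optimal test projects onto the positive eigenspace of ρ − σ. A small numerical sketch (the states are illustrative choices):

```python
import numpy as np

rho = np.array([[0.7, 0.2], [0.2, 0.3]])
sigma = np.array([[0.5, 0.0], [0.0, 0.5]])

# Helstrom test: project onto the positive eigenspace of rho - sigma
w, v = np.linalg.eigh(rho - sigma)
T = sum(np.outer(v[:, i], v[:, i]) for i in range(len(w)) if w[i] > 0)

alpha = float(np.real(np.trace(rho @ (np.eye(2) - T))))   # first-kind error
beta = float(np.real(np.trace(sigma @ T)))                # second-kind error
trace_norm = float(np.abs(w).sum())                       # ||rho - sigma||_1

print(alpha + beta)            # minimal total error ...
print(1 - 0.5 * trace_norm)    # ... equals 1 - (1/2)||rho - sigma||_1
```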

SLIDE 8

Quantifying the trade-off

  • Stein's lemma:

    α_n(T_n) → 0 ⟹ β_n(T_n) ∼ e^{−n D_1(ρ‖σ)}

SLIDE 9

Quantifying the trade-off

  • Stein's lemma:

    α_n(T_n) → 0 ⟹ β_n(T_n) ∼ e^{−n D_1(ρ‖σ)}

  • Direct domain (quantum Hoeffding bound¹):

    β_n(T_n) ∼ e^{−nr} ⟹ α_n(T_n) ∼ e^{−n H_r},  r < D_1(ρ‖σ)

  • Converse domain (quantum Han–Kobayashi bound²):

    β_n(T_n) ∼ e^{−nr} ⟹ α_n(T_n) ∼ 1 − e^{−n H*_r},  r > D_1(ρ‖σ)

  • Hoeffding divergences:

    H_r := sup_{0<α<1} ((α − 1)/α) [r − D_α(ρ‖σ)]
    H*_r := sup_{α>1} ((α − 1)/α) [r − D*_α(ρ‖σ)]

¹ Hayashi; Nagaoka; Audenaert, Nussbaum, Szkoła, Verstraete; 2006; ² Mosonyi, Ogawa, 2013
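The Hoeffding exponent is a one-dimensional optimization over α, which is easy to evaluate in the classical (commuting) case. A sketch with a crude grid search standing in for the sup, on illustrative distributions:

```python
import numpy as np

def renyi(p, q, a):
    """Classical Renyi divergence D_alpha(p||q)."""
    return np.log(np.sum(p**a * q**(1.0 - a))) / (a - 1.0)

# illustrative distributions, not from the slides
p = np.array([0.8, 0.2])
q = np.array([0.4, 0.6])

D1 = float(np.sum(p * (np.log(p) - np.log(q))))  # relative entropy (alpha -> 1)
r = 0.5 * D1                                     # a rate in the direct domain, r < D1

# grid search standing in for sup over 0 < alpha < 1
alphas = np.linspace(0.01, 0.99, 999)
H_r = max((a - 1.0) / a * (r - renyi(p, q, a)) for a in alphas)
print(H_r)  # positive: the first-kind error still decays exponentially
```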

SLIDE 10

Quantum Rényi divergences

  • p, q probability distributions on X, α ∈ [0, +∞) \ {1}:

    D_α(p‖q) := (1/(α − 1)) log Σ_x p(x)^α q(x)^{1−α}
slide-11
SLIDE 11

Quantum Rényi divergences

  • p, q probability distributions on X,

α ∈ [0, +∞) \ {1}: Dα (pq) := 1 α − 1 log

  • x p(x)αq(x)1−α
  • Quantum Rényi divergences:1

Dα (ρ σ) := 1 α − 1 log Tr ρασ1−α D∗

α (ρ σ) :=

1 α − 1 log Tr

  • ρ

1 2 σ 1−α α ρ 1 2

α

  • The right quantum extension is

Dq

α(ρσ) :=

  • Dα(ρσ),

α ∈ [0, 1), D∗

α(ρσ),

α ∈ (1, +∞].

1Petz 1986;

Müller-Lennert, Dupuis, Szehr, Fehr, Tomamichel, 2013; Wilde, Winter, Yang, 2013
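Both families can be computed in a few lines from matrix powers; this also lets one observe the Araki–Lieb–Thirring relation D*_α ≤ D_α and its saturation for commuting states. A sketch with illustrative full-rank qubit states:

```python
import numpy as np
from scipy.linalg import fractional_matrix_power as mpow

def petz(rho, sigma, a):
    """Petz Renyi divergence D_alpha(rho||sigma)."""
    return np.log(np.real(np.trace(mpow(rho, a) @ mpow(sigma, 1.0 - a)))) / (a - 1.0)

def sandwiched(rho, sigma, a):
    """Sandwiched Renyi divergence D*_alpha(rho||sigma)."""
    s = mpow(sigma, (1.0 - a) / (2.0 * a))
    return np.log(np.real(np.trace(mpow(s @ rho @ s, a)))) / (a - 1.0)

# illustrative full-rank qubit states
rho = np.array([[0.7, 0.2], [0.2, 0.3]])
sigma = np.array([[0.6, -0.1], [-0.1, 0.4]])

a = 2.0
d_petz, d_sand = petz(rho, sigma, a), sandwiched(rho, sigma, a)
print(d_sand <= d_petz)  # Araki-Lieb-Thirring: D* <= D

# commuting (diagonal) states give equality
pc, qc = np.diag([0.8, 0.2]), np.diag([0.4, 0.6])
print(np.isclose(petz(pc, qc, a), sandwiched(pc, qc, a)))
```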

SLIDE 12

Mathematical properties

  • Both D_α and D*_α are monotone increasing in α, and

    lim_{α→1} D^{(v)}_α(ρ‖σ) = D_1(ρ‖σ) := D(ρ‖σ) := Tr ρ(log ρ − log σ)

  • Araki–Lieb–Thirring inequality:

    D*_α(ρ‖σ) ≤ D_α(ρ‖σ),  α ∈ [0, +∞]

    Equality for α = 1 and for commuting states.

  • Monotonicity:

    D_α(Φ(ρ)‖Φ(σ)) ≤ D_α(ρ‖σ),  α ∈ [0, 2]
    D*_α(Φ(ρ)‖Φ(σ)) ≤ D*_α(ρ‖σ),  α ∈ [1/2, +∞]

    ⟹ D^q_α(Φ(ρ)‖Φ(σ)) ≤ D^q_α(ρ‖σ),  α ∈ [0, +∞]
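The monotonicity (data-processing) statements can be spot-checked numerically with any concrete CPTP map. A sketch using a qubit depolarizing channel at α = 2, a value inside both stated ranges (states and noise level are illustrative):

```python
import numpy as np
from scipy.linalg import fractional_matrix_power as mpow

def petz(rho, sigma, a):
    return np.log(np.real(np.trace(mpow(rho, a) @ mpow(sigma, 1.0 - a)))) / (a - 1.0)

def sandwiched(rho, sigma, a):
    s = mpow(sigma, (1.0 - a) / (2.0 * a))
    return np.log(np.real(np.trace(mpow(s @ rho @ s, a)))) / (a - 1.0)

def depolarize(X, p):
    """Qubit depolarizing channel: a simple CPTP (stochastic) map."""
    return (1.0 - p) * X + p * np.trace(X) * np.eye(2) / 2.0

rho = np.array([[0.7, 0.2], [0.2, 0.3]])
sigma = np.array([[0.6, -0.1], [-0.1, 0.4]])
a, p = 2.0, 0.3  # alpha = 2 lies in both monotonicity ranges

before_p, after_p = petz(rho, sigma, a), petz(depolarize(rho, p), depolarize(sigma, p), a)
before_s, after_s = sandwiched(rho, sigma, a), sandwiched(depolarize(rho, p), depolarize(sigma, p), a)
print(after_p <= before_p, after_s <= before_s)  # data processing holds for both
```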

SLIDE 13

The fidelity

    D*_α(ρ‖σ) := (1/(α − 1)) log Tr (ρ^{1/2} σ^{(1−α)/α} ρ^{1/2})^α

  • α = 1/2:

    D*_{1/2}(ρ‖σ) = −2 log Tr (ρ^{1/2} σ ρ^{1/2})^{1/2} = −2 log F(ρ, σ)

    Operational interpretation??
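The α = 1/2 identity follows by plugging α = 1/2 into the sandwiched formula, and is easy to confirm numerically (illustrative states):

```python
import numpy as np
from scipy.linalg import sqrtm, fractional_matrix_power as mpow

def sandwiched(rho, sigma, a):
    s = mpow(sigma, (1.0 - a) / (2.0 * a))
    return np.log(np.real(np.trace(mpow(s @ rho @ s, a)))) / (a - 1.0)

def fidelity(rho, sigma):
    """F(rho, sigma) = Tr sqrt(rho^{1/2} sigma rho^{1/2})."""
    r = sqrtm(rho)
    return float(np.real(np.trace(sqrtm(r @ sigma @ r))))

rho = np.array([[0.7, 0.2], [0.2, 0.3]])
sigma = np.array([[0.6, -0.1], [-0.1, 0.4]])

lhs = sandwiched(rho, sigma, 0.5)
rhs = -2.0 * np.log(fidelity(rho, sigma))
print(np.isclose(lhs, rhs))
```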

SLIDE 14

More Rényi divergences

  • In classical information theory, trade-offs in many problems are quantified by Rényi divergences and derived quantities.

  • How about quantum? Probably also.

  • Do we get any other notions of Rényi divergences apart from D_α and D*_α? Probably not.

  • What are the right (= operational) definitions of the Rényi extensions of information quantities? E.g., Rényi mutual information, Rényi capacity, Rényi conditional mutual information?

SLIDE 15

More Rényi divergences

  • Rényi mutual information:

    I^{(v)}_α(A : B)_ρ := inf_{σ_B} D^{(v)}_α(ρ_AB‖ρ_A ⊗ σ_B)

    Operational interpretation? Yes, for all quantum values.¹
    Hypothesis testing H0 : ρ_AB^{⊗n} vs. H1 : ρ_A^{⊗n} ⊗ S(H_B^{⊗n}).

  • Rényi–Holevo capacities: W : X → S(H_B) channel,

    χ^{(v)}_α(W) := sup { I^{(v)}_α(X : B)_ρ : ρ_XB = Σ_x p(x)|x⟩⟨x|_X ⊗ W(x) }

    Operational interpretation² for α > 1 and (v) = ∗:
    strong converse exponent of classical-quantum channel coding.

  • Channel Rényi mutual information: N : A → B CPTP,

    I^{(v)}_α(N) := sup_{ψ_RA} I^{(v)}_α(R : B)_{N(ψ_RA)}

    Partial results (Cooney, Mosonyi, Wilde, 2014).

¹ Hayashi, Tomamichel, 2014; ² Mosonyi, Ogawa, 2014
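The infimum over σ_B in the Rényi mutual information has no simple closed form in general, but a crude random search gives an upper-bound estimate. A hedged sketch (the noisy Bell state, the sampling scheme, and the sample count are all ad hoc choices for illustration):

```python
import numpy as np
from scipy.linalg import fractional_matrix_power as mpow

def sandwiched(rho, sigma, a):
    s = mpow(sigma, (1.0 - a) / (2.0 * a))
    return np.log(np.real(np.trace(mpow(s @ rho @ s, a)))) / (a - 1.0)

def random_state(rng, d=2):
    """Random full-rank density matrix (ad hoc sampling)."""
    G = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    M = G @ G.conj().T + 1e-6 * np.eye(d)
    return M / np.real(np.trace(M))

# illustrative correlated two-qubit state: noisy Bell state
phi = np.zeros(4); phi[0] = phi[3] = 1.0 / np.sqrt(2.0)
rho_AB = 0.7 * np.outer(phi, phi) + 0.3 * np.eye(4) / 4.0
rho_A = rho_B = np.eye(2) / 2.0   # both marginals are maximally mixed here

a = 2.0
rng = np.random.default_rng(1)
candidates = [rho_B] + [random_state(rng) for _ in range(200)]
I_est = min(sandwiched(rho_AB, np.kron(rho_A, sB), a) for sB in candidates)
print(I_est)  # upper bound on I*_2(A : B)
```

Since σ_B = ρ_B is among the candidates, the estimate never exceeds the (common but generally suboptimal) choice σ_B = ρ_B.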

SLIDE 16

More Rényi divergences

  • Channel Rényi divergences: N_i : A → B CPTP,

    D^{(v)}_α(N_1‖N_2) := sup_{ψ_RA} D^{(v)}_α(N_1(ψ_RA)‖N_2(ψ_RA))

    Operational interpretation? Trivial one for all quantum values.
    Non-trivial one for α > 1, (v) = ∗, and N_2(·) = R_σ(·) := σ Tr(·) the replacer channel (Cooney, Mosonyi, Wilde, 2014).

SLIDE 17

Binary channel discrimination

  • Two candidates for the identity of a channel: H0 : N_0 vs. H1 : N_1

    n independent uses: H0 : N_0^{⊗n} vs. H1 : N_1^{⊗n}

  • Adaptive discrimination strategy: binary measurement at the end.

  • Non-adaptive strategy:

    input ϕ_{R^nA^n} ⟹ output N_i^{⊗n}(ϕ_{R^nA^n})

    Product strategy: ϕ_{R^nA^n} = ϕ_RA^{⊗n} ⟹ output (N_i(ϕ_RA))^{⊗n}

SLIDE 18

Binary channel discrimination

  • Output: ρ_{R^nB^n} (N = N_0) or σ_{R^nB^n} (N = N_1); measurement (T_n, I − T_n) at the end.

  • Error probabilities:

    β^x_ε(N_0^{⊗n}‖N_1^{⊗n}) := inf { Tr σ_{R^nB^n} T_n : Tr ρ_{R^nB^n}(I − T_n) ≤ ε }
    α^x_r(N_0^{⊗n}‖N_1^{⊗n}) := inf { Tr ρ_{R^nB^n}(I − T_n) : Tr σ_{R^nB^n} T_n ≤ 2^{−nr} }

    x = pr (product) or x = ad (adaptive)

SLIDE 19

Trade-off exponents with product strategies

  • Error probabilities:

    β^x_ε(N_0^{⊗n}‖N_1^{⊗n}) := inf { Tr σ_{R^nB^n} T_n : Tr ρ_{R^nB^n}(I − T_n) ≤ ε }
    α^x_r(N_0^{⊗n}‖N_1^{⊗n}) := inf { Tr ρ_{R^nB^n}(I − T_n) : Tr σ_{R^nB^n} T_n ≤ 2^{−nr} }

  • If only product strategies are allowed (x = pr):

    lim_{n→+∞} −(1/n) log β^x_ε(N_0^{⊗n}‖N_1^{⊗n}) = D(N_0‖N_1) := sup_{ψ_RA} D(N_0(ψ_RA)‖N_1(ψ_RA)),

    lim_{n→+∞} −(1/n) log α^x_{n,r} = H_r(N_0‖N_1) := sup_{ψ_RA} H_r(N_0(ψ_RA)‖N_1(ψ_RA)),

    lim_{n→+∞} −(1/n) log(1 − α^x_{n,r}) = H*_r(N_0‖N_1) := inf_{ψ_RA} H*_r(N_0(ψ_RA)‖N_1(ψ_RA)).

SLIDE 20

Channel divergences

  • Channel Hoeffding (anti-)divergences:

    H_r(N_0‖N_1) = sup_{ψ_RA} H_r(N_0(ψ_RA)‖N_1(ψ_RA)),
    H*_r(N_0‖N_1) = inf_{ψ_RA} H*_r(N_0(ψ_RA)‖N_1(ψ_RA)).

  • Alternative expressions (due to minimax):

    H_r(N_0‖N_1) = sup_{0<α<1} ((α − 1)/α) [r − D_α(N_0‖N_1)],
    H*_r(N_0‖N_1) = sup_{α>1} ((α − 1)/α) [r − D*_α(N_0‖N_1)],

    where D_α(N_0‖N_1) and D*_α(N_0‖N_1) are the channel Rényi divergences:

    D_α(N_0‖N_1) := sup_{ψ_RA} D_α(N_0(ψ_RA)‖N_1(ψ_RA)),
    D*_α(N_0‖N_1) := sup_{ψ_RA} D*_α(N_0(ψ_RA)‖N_1(ψ_RA)).

SLIDE 21

Adaptive strategies

  • What about adaptive strategies?

  • Classical channels: no difference between the error exponents (Hayashi, 2009).

  • Replacer channels: N_0(·) = R_ρ(·) := ρ Tr(·), N_1(·) = R_σ(·) := σ Tr(·).

    All channel divergences coincide with the state divergences; e.g. D_α(R_ρ‖R_σ) = D_α(ρ‖σ).

    Expectation: adaptive strategies don't make a difference, and the channel discrimination problem reduces to a state discrimination problem.
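That replacer channels reduce to state discrimination can be seen on a single use: for any input ψ_RA the outputs are ψ_R ⊗ ρ and ψ_R ⊗ σ, and the common ψ_R factor drops out of the divergence. A numerical sketch (states and the entangled input are illustrative):

```python
import numpy as np
from scipy.linalg import fractional_matrix_power as mpow

def petz(rho, sigma, a):
    return np.log(np.real(np.trace(mpow(rho, a) @ mpow(sigma, 1.0 - a)))) / (a - 1.0)

rho = np.array([[0.7, 0.2], [0.2, 0.3]])
sigma = np.array([[0.6, -0.1], [-0.1, 0.4]])

# entangled input psi_RA; a replacer R_tau maps it to psi_R (x) tau
psi = np.zeros(4); psi[0], psi[3] = np.sqrt(0.7), np.sqrt(0.3)
psi_RA = np.outer(psi, psi)
psi_R = np.trace(psi_RA.reshape(2, 2, 2, 2), axis1=1, axis2=3)  # partial trace over A

a = 2.0
out0 = np.kron(psi_R, rho)    # output under R_rho
out1 = np.kron(psi_R, sigma)  # output under R_sigma
print(np.isclose(petz(out0, out1, a), petz(rho, sigma, a)))  # entanglement gives no gain
```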

SLIDE 22

Channel vs. state discrimination

  • Interpolation between state discrimination and channel discrimination:

    H0 : N_0 = N general channel,  H1 : N_1 = R_σ replacer channel

  • Theorem:¹ The strong converse exponent is given by

    lim_{n→+∞} −(1/n) log(1 − α^ad_r(N^{⊗n}‖R_σ^{⊗n})) = sup_{α>1} ((α − 1)/α) [r − D*_α(N‖R_σ)].

    Adaptive strategies don't give a benefit over product strategies.

  • Corollary (Stein's lemma):

    lim_{n→+∞} −(1/n) log β^ad_ε(N^{⊗n}‖R_σ^{⊗n}) = D(N‖R_σ).

¹ Cooney, Mosonyi, Wilde, 2014

SLIDE 23

Proof of achievability

    α^ad_{n,r} := α^ad_r(N^{⊗n}‖R_σ^{⊗n})

    limsup_{n→+∞} −(1/n) log(1 − α^ad_{n,r})
    ≤ inf_{ψ_RA} limsup_{n→+∞} −(1/n) log(1 − α_r(N(ψ_RA)^{⊗n}‖R_σ(ψ_RA)^{⊗n}))
    = inf_{ψ_RA} H*_r(N(ψ_RA)‖R_σ(ψ_RA))
    = inf_{ψ_RA} sup_{α>1} ((α − 1)/α) [r − D*_α(N(ψ_RA)‖ψ_R ⊗ σ)]
    = sup_{α>1} inf_{ψ_RA} ((α − 1)/α) [r − D*_α(N(ψ_RA)‖ψ_R ⊗ σ)]
    = sup_{α>1} ((α − 1)/α) [r − D*_α(N‖R_σ)]
    = H*_r(N‖R_σ)

SLIDE 24

Proof of optimality

  • Post-measurement probabilities:

    p_n := Tr T ρ_{R^nB^n},  q_n := Tr T σ_{R^nB^n}

  • Monotonicity of Rényi divergences: for α > 1,

    D*_α(ρ_{R^nB^n}‖σ_{R^nB^n}) ≥ D_α((p_n, 1 − p_n)‖(q_n, 1 − q_n)) ≥ (1/(α − 1)) log p_n^α q_n^{1−α}

    ⟹ (1/n) log p_n ≤ ((α − 1)/α) [(1/n) log q_n + (1/n) D*_α(ρ_{R^nB^n}‖σ_{R^nB^n})]

  • Assume q_n ≤ e^{−nr}; since p_n = 1 − α^ad_{n,r} for the optimal test,

    (1/n) log(1 − α^ad_{n,r}) ≤ ((α − 1)/α) [−r + (1/n) D*_α(ρ_{R^nB^n}‖σ_{R^nB^n})]

SLIDE 25

Proof of optimality

    D*_α(ρ_{R^nB^n}‖σ_{R^nB^n})

    = D*_α(N_{A_n→B_n}(ρ_{R^nA_n})‖σ_{R^n} ⊗ σ)

    = (α/(α − 1)) log ‖ (σ_{R^n} ⊗ σ)^{(1−α)/(2α)} N_{A_n→B_n}(ρ_{R^nA_n}) (σ_{R^n} ⊗ σ)^{(1−α)/(2α)} ‖_α

    = (α/(α − 1)) log ‖ (Θ_{σ^{(1−α)/α}} ∘ N_{A_n→B_n}) ( σ_{R^n}^{(1−α)/(2α)} ρ_{R^nA_n} σ_{R^n}^{(1−α)/(2α)} ) ‖_α

    where Θ_X(·) := X^{1/2}(·)X^{1/2}.

SLIDE 26

Proof of optimality

    ‖ (Θ_{σ^{(1−α)/α}} ∘ N_{A_n→B_n}) ( σ_{R^n}^{(1−α)/(2α)} ρ_{R^nA_n} σ_{R^n}^{(1−α)/(2α)} ) ‖_α

    = ( ‖ (Θ_{σ^{(1−α)/α}} ∘ N_{A_n→B_n}) ( σ_{R^n}^{(1−α)/(2α)} ρ_{R^nA_n} σ_{R^n}^{(1−α)/(2α)} ) ‖_α / ‖ σ_{R^n}^{(1−α)/(2α)} ρ_{R^n} σ_{R^n}^{(1−α)/(2α)} ‖_α ) · ‖ σ_{R^n}^{(1−α)/(2α)} ρ_{R^n} σ_{R^n}^{(1−α)/(2α)} ‖_α

    ≤ ( sup_{X_{R^nA_n} ≥ 0} ‖ (Θ_{σ^{(1−α)/α}} ∘ N_{A_n→B_n})(X_{R^nA_n}) ‖_α / ‖ X_{R^n} ‖_α ) · ‖ σ_{R^n}^{(1−α)/(2α)} ρ_{R^n} σ_{R^n}^{(1−α)/(2α)} ‖_α

    = ‖ Θ_{σ^{(1−α)/α}} ∘ N ‖_{CB,1→α} · ‖ σ_{R^n}^{(1−α)/(2α)} ρ_{R^n} σ_{R^n}^{(1−α)/(2α)} ‖_α.

SLIDE 27

Proof of optimality

    D*_α(ρ_{R^nB^n}‖σ_{R^nB^n})

    ≤ (α/(α − 1)) log ‖Θ_{σ^{(1−α)/α}} ∘ N‖_{CB,1→α} + (α/(α − 1)) log ‖ σ_{R^n}^{(1−α)/(2α)} ρ_{R^n} σ_{R^n}^{(1−α)/(2α)} ‖_α

    = (α/(α − 1)) log ‖Θ_{σ^{(1−α)/α}} ∘ N‖_{CB,1→α} + D*_α(ρ_{R^n}‖σ_{R^n})

    ≤ (α/(α − 1)) log ‖Θ_{σ^{(1−α)/α}} ∘ N‖_{CB,1→α} + D*_α(ρ_{R^nA_n}‖σ_{R^nA_n})

    ≤ (α/(α − 1)) log ‖Θ_{σ^{(1−α)/α}} ∘ N‖_{CB,1→α} + D*_α(ρ_{R^{n−1}B^{n−1}}‖σ_{R^{n−1}B^{n−1}})

    ⋮

    ≤ n (α/(α − 1)) log ‖Θ_{σ^{(1−α)/α}} ∘ N‖_{CB,1→α}
slide-28
SLIDE 28

Proof of optimality

1 n log(1 − αad

n,r)

≤ α − 1 α

  • −r + 1

nD∗

α(ρRnBnσRnBn)

α − 1 α

  • −r +

α α − 1 log

  • Θ

σ

1−α α

  • N
  • CB,1→α
  • =

α − 1 α [−r + D∗

α(NR σ)] .

lim inf

n→+∞ − 1

n log(1 − αad

n,r)

≥ sup

α>1

α − 1 α [r − D∗

α(NR σ)]

= H∗

r (NR σ)

SLIDE 29

Channel vs. state discrimination

    H0 : N_0 = N general channel,  H1 : N_1 = R_σ replacer channel

  • Theorem:² The strong converse exponent is given by

    lim_{n→+∞} −(1/n) log(1 − α^ad_r(N^{⊗n}‖R_σ^{⊗n})) = sup_{α>1} ((α − 1)/α) [r − D*_α(N‖R_σ)].

    Adaptive strategies don't give a benefit over product strategies.

  • Corollary (Stein's lemma):

    lim_{n→+∞} −(1/n) log β^ad_ε(N^{⊗n}‖R_σ^{⊗n}) = D_1(N‖R_σ).

    Proof: lim_{α↘1} D*_α(N‖R_σ) = D_1(N‖R_σ).

² Cooney, Mosonyi, Wilde, 2014
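The corollary rests on the limit D*_α → D_1 as α ↘ 1; for a pair of states this convergence (from above, by monotonicity in α) is easy to observe numerically (illustrative states):

```python
import numpy as np
from scipy.linalg import logm, fractional_matrix_power as mpow

def sandwiched(rho, sigma, a):
    s = mpow(sigma, (1.0 - a) / (2.0 * a))
    return np.log(np.real(np.trace(mpow(s @ rho @ s, a)))) / (a - 1.0)

rho = np.array([[0.7, 0.2], [0.2, 0.3]])
sigma = np.array([[0.6, -0.1], [-0.1, 0.4]])

D1 = float(np.real(np.trace(rho @ (logm(rho) - logm(sigma)))))
vals = [sandwiched(rho, sigma, a) for a in (1.5, 1.1, 1.01, 1.001)]
print(vals)  # decreases monotonically towards D1
print(D1)
```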

SLIDE 30

Summary

  • Rényi divergences quantify the trade-off between two competing operational quantities describing a problem.

  • Divergence measures on the state space with operational relevance.

  • Two different families of Rényi divergences are needed in the quantum case.