
Comparison of Local and Global Contraction Coefficients for KL Divergence

Anuran Makur and Lizhong Zheng

EECS Department, Massachusetts Institute of Technology

5 November 2015


Outline

1. Introduction to Contraction Coefficients: Measuring Ergodicity; Contraction Coefficients of Strong Data Processing Inequalities
2. Motivation from Inference
3. Contraction Coefficients for KL and χ²-Divergences
4. Bounds between Contraction Coefficients

Measuring Ergodicity

Consider an ergodic Markov chain with n × n column-stochastic transition matrix W.

- Irreducible ⇒ unique stationary distribution π: Wπ = π.
- Aperiodic ⇒ W^k → π1^T (a rank-1 matrix).
- Rate of convergence? Perron-Frobenius: 1 = λ₁(W) > |λ₂(W)| ≥ ⋯ ≥ |λₙ(W)|, and the rate of convergence is determined by |λ₂(W)|, the coefficient of ergodicity (illustrated numerically below).
- Want: a guarantee on the relative improvement, i.e. for any distribution p, W^{k+1}p is "closer" to π than W^k p.
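
The following minimal Python/numpy sketch (not part of the original deck; the 3-state chain W is a toy example chosen here) illustrates that the ℓ₁ error ‖W^k p − π‖₁ shrinks roughly like |λ₂(W)|^k:

    import numpy as np

    # Toy 3-state ergodic chain; columns sum to 1 (column-stochastic, as in the slides).
    W = np.array([[0.6, 0.2, 0.1],
                  [0.3, 0.5, 0.3],
                  [0.1, 0.3, 0.6]])

    # Stationary distribution: eigenvector of W at eigenvalue 1, normalized to a pmf.
    vals, vecs = np.linalg.eig(W)
    pi = np.real(vecs[:, np.argmin(np.abs(vals - 1))])
    pi /= pi.sum()

    lam2 = sorted(np.abs(vals))[-2]   # |lambda_2(W)|
    p = np.array([1.0, 0.0, 0.0])     # arbitrary initial distribution
    for k in range(1, 6):
        p = W @ p
        print(k, np.abs(p - pi).sum(), lam2 ** k)  # l1 error vs. |lambda_2|^k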

Measuring Ergodicity

Let d : P × P → [0, ∞] be a divergence measure on the simplex P.

Want: ∀p ∈ P, d(Wp, Wπ) ≤ ηd(π, W) d(p, π) for some contraction coefficient ηd(π, W) ∈ [0, 1]. This would mean that ∀p ∈ P, d(W^k p, π) ≤ ηd(π, W)^k d(p, π), so ηd(π, W) < 1 implies W^k p → π (in d) geometrically fast with rate ηd(π, W).

So ηd(π, W) is a coefficient of ergodicity, and we define it as:

$$\eta_d(\pi, W) \triangleq \sup_{p \,:\, p \neq \pi} \frac{d(Wp, W\pi)}{d(p, \pi)}.$$

Measuring Ergodicity

Can we define notions of distance between distributions which make W a contraction? Does the ℓ₂-norm work?

$$\|W\pi - Wp\|_2 = \|W(\pi - p)\|_2 \le \|W\|_2 \, \|\pi - p\|_2$$

where the spectral norm ‖W‖₂ is the largest singular value of W. But ‖W‖₂ > 1 is possible...

Dobrushin-Doeblin Coefficient of Ergodicity: the ℓ₁-norm (total variation distance) works!

$$\|W\pi - Wp\|_1 = \|W(\pi - p)\|_1 \le \eta_{\mathsf{TV}}(\pi, W) \, \|\pi - p\|_1$$

where $\eta_{\mathsf{TV}}(\pi, W) \triangleq \sup_{p : p \neq \pi} \frac{\|W\pi - Wp\|_1}{\|\pi - p\|_1} \in [0, 1]$ is the Dobrushin-Doeblin contraction coefficient.
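
As a numerical aside (not in the deck), ηTV can be computed by Dobrushin's classical formula: half the largest ℓ₁ distance between columns of W. The sketch below, with the same toy W assumed, also spot-checks the contraction on random pmfs:

    import numpy as np

    def dobrushin(W):
        # Dobrushin-Doeblin coefficient: (1/2) max over column pairs of l1 distance.
        n = W.shape[1]
        return 0.5 * max(np.abs(W[:, i] - W[:, j]).sum()
                         for i in range(n) for j in range(n))

    W = np.array([[0.6, 0.2, 0.1],
                  [0.3, 0.5, 0.3],
                  [0.1, 0.3, 0.6]])
    eta_tv = dobrushin(W)

    vals, vecs = np.linalg.eig(W)
    pi = np.real(vecs[:, np.argmin(np.abs(vals - 1))])
    pi /= pi.sum()

    rng = np.random.default_rng(0)
    for _ in range(1000):
        p = rng.dirichlet(np.ones(3))   # random pmf on 3 atoms
        # ||W p - W pi||_1 <= eta_TV * ||p - pi||_1 should hold for every p.
        assert np.abs(W @ p - W @ pi).sum() <= eta_tv * np.abs(p - pi).sum() + 1e-12
    print("eta_TV =", eta_tv)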

Csiszár f-Divergence

Definition (Csiszár f-Divergence): Given distributions RX and PX on 𝒳, we define their f-divergence as:

$$D_f(R_X \| P_X) \triangleq \sum_{x \in \mathcal{X}} P_X(x) \, f\!\left(\frac{R_X(x)}{P_X(x)}\right)$$

where f : R⁺ → R is convex and f(1) = 0.

- Non-negativity: Df(RX‖PX) ≥ 0 with equality iff RX = PX.
- Data Processing Inequality: For a fixed channel PY|X: ∀RX, PX, Df(RY‖PY) ≤ Df(RX‖PX), where RY and PY are the output pmfs corresponding to RX and PX.

Csiszár f-Divergence

Theorem [Amari and Cichocki, 2010]: A decomposable divergence measure satisfies data processing if and only if it is an f-divergence.

Definition: A divergence d is decomposable if it can be written as

$$d(R_X, P_X) = \sum_{x \in \mathcal{X}} g\left(R_X(x), P_X(x)\right)$$

for some function g : [0, 1]² → R.

Csiszár f-Divergence: Some Examples

- Total Variation Distance: f(t) = |t − 1| produces Df(RX‖PX) = ‖RX − PX‖₁.
- KL Divergence: f(t) = t log(t) produces $D_f(R_X \| P_X) = D(R_X \| P_X) = \sum_{x \in \mathcal{X}} R_X(x) \log \frac{R_X(x)}{P_X(x)}$.
- χ²-Divergence: f(t) = (t − 1)² produces $D_f(R_X \| P_X) = \chi^2(R_X, P_X) = \sum_{x \in \mathcal{X}} \frac{(R_X(x) - P_X(x))^2}{P_X(x)}$.
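
A small numpy sketch (pmf values assumed for illustration) evaluates all three examples through the single formula Df(R‖P) = Σₓ P(x) f(R(x)/P(x)):

    import numpy as np

    def f_divergence(R, P, f):
        # D_f(R || P) = sum_x P(x) f(R(x)/P(x)); assumes full-support pmfs.
        return float(np.sum(P * f(R / P)))

    R = np.array([0.5, 0.3, 0.2])
    P = np.array([0.4, 0.4, 0.2])

    tv  = f_divergence(R, P, lambda t: np.abs(t - 1))   # ||R - P||_1
    kl  = f_divergence(R, P, lambda t: t * np.log(t))   # D(R||P), in nats
    chi = f_divergence(R, P, lambda t: (t - 1) ** 2)    # chi^2(R, P)
    print(tv, kl, chi)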

Contraction Coefficients

Definition (Contraction Coefficient for f-Divergence): For a fixed source distribution PX and channel PY|X, we define the contraction coefficient for f-divergence as:

$$\eta_f\left(P_X, P_{Y|X}\right) \triangleq \sup_{R_X : R_X \neq P_X} \frac{D_f(R_Y \| P_Y)}{D_f(R_X \| P_X)}$$

where RY is the output distribution when RX passes through PY|X.

Strong Data Processing Inequality: For fixed PX and PY|X, we have: ∀RX, Df(RY‖PY) ≤ ηf(PX, PY|X) Df(RX‖PX).

We will use the following instances of contraction coefficients:

1. f(t) = t log(t): ηf(PX, PY|X) = ηKL(PX, PY|X)
2. f(t) = (t − 1)²: ηf(PX, PY|X) = ηχ²(PX, PY|X)

Outline

1. Introduction to Contraction Coefficients
2. Motivation from Inference: Inference Problem; Unsupervised Model Selection
3. Contraction Coefficients for KL and χ²-Divergences
4. Bounds between Contraction Coefficients

Motivation: Inference Problem

Problem: Infer a hidden variable U about a "person X" given some data Y_1, ..., Y_m ∈ 𝒴 about the person that is conditionally independent given U:

U → (Y_1, ..., Y_m)

Assume U is binary with P(U = −1) = P(U = 1) = 1/2.

Example: U ∈ {conservative, liberal} and Y = movies watched on Netflix.

Log-likelihood Ratio Test: Construct a sufficient statistic Z, with U → (Y_1, ..., Y_m) → Z:

$$Z \triangleq \sum_{i=1}^{m} \log \frac{P_{Y|U}(Y_i \,|\, 1)}{P_{Y|U}(Y_i \,|\, {-1})}$$

Maximum Likelihood Estimate: $\hat{U} = \mathrm{sign}(Z)$.
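
A minimal simulation of this test (the observation model PY|U below is hypothetical, chosen only for illustration):

    import numpy as np

    rng = np.random.default_rng(1)

    # Hypothetical binary observation alphabet Y = {0, 1}.
    p_y_given_u = {1: np.array([0.3, 0.7]), -1: np.array([0.6, 0.4])}

    u = rng.choice([-1, 1])                        # hidden label, uniform prior
    y = rng.choice(2, size=100, p=p_y_given_u[u])  # conditionally i.i.d. data Y_1..Y_m

    # Sufficient statistic Z = sum_i log P(Y_i|1)/P(Y_i|-1); ML estimate is sign(Z).
    z = np.sum(np.log(p_y_given_u[1][y]) - np.log(p_y_given_u[-1][y]))
    u_hat = int(np.sign(z))                        # (ties at Z = 0 ignored here)
    print("true U:", u, " estimate:", u_hat)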

Motivation: Unsupervised Model Selection

How do we learn PY|U? Given i.i.d. training data (X_1, Y_1), ..., (X_n, Y_n) with U_i → X_i → Y_i for each i, where each X_i ∈ 𝒳 = {1, 2, ..., |𝒳|} and X indexes different people.

The training data gives us the empirical distribution Pⁿ_{X,Y}:

$$\forall (x, y) \in \mathcal{X} \times \mathcal{Y}, \quad P^n_{X,Y}(x, y) \triangleq \frac{1}{n} \sum_{i=1}^{n} \mathbb{I}(X_i = x, Y_i = y)$$

We assume that the true distribution PX,Y = Pⁿ_{X,Y} (motivated by concentration of measure results).
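
For concreteness, a tiny sketch of the empirical distribution (toy samples assumed):

    import numpy as np

    def empirical_joint(xs, ys, nx, ny):
        # P^n_{X,Y}(x, y) = (1/n) * #{i : X_i = x, Y_i = y}
        P = np.zeros((nx, ny))
        for x, y in zip(xs, ys):
            P[x, y] += 1.0
        return P / len(xs)

    xs = [0, 1, 1, 2, 0, 2, 1]   # toy X samples over {0, 1, 2}
    ys = [0, 1, 1, 0, 1, 1, 0]   # toy Y samples over {0, 1}
    print(empirical_joint(xs, ys, nx=3, ny=2))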

Motivation: Unsupervised Model Selection

Model Selection Problem: Given U ~ Bernoulli(1/2) and the joint pmf PX,Y for the Markov chain

U → X → Y (with PU, PX|U, PX, PY|X, PY),

find PX|U that maximizes the proportion of information that passes through the Markov chain:

$$\max \; \frac{I(U; Y)}{I(U; X)}.$$

Remark: I(U;Y)/I(U;X) = 1 ⇒ I(U;Y) = I(U;X), which means Y is a sufficient statistic for U.

Outline

1. Introduction to Contraction Coefficients
2. Motivation from Inference
3. Contraction Coefficients for KL and χ²-Divergences: Data Processing Inequalities; Contraction Coefficient for KL Divergence; Local Approximation of KL Divergence; Local Contraction Coefficient
4. Bounds between Contraction Coefficients

Data Processing Inequalities

- Data Processing Inequality for KL Divergence: Fix PX and PY|X. Then, for any RX: D(RY‖PY) ≤ D(RX‖PX), where RY is the output when RX passes through PY|X.
- Strong Data Processing Inequality for KL Divergence: Fix PX and PY|X. Then, for any RX: D(RY‖PY) ≤ ηKL(PX, PY|X) D(RX‖PX).
- Data Processing Inequality for Mutual Information: Given a Markov chain U → X → Y: I(U;Y) ≤ I(U;X).
- Strong Data Processing Inequality for Mutual Information: For fixed PX and PY|X: I(U;Y) ≤ ηKL(PX, PY|X) I(U;X).

Contraction Coefficient for KL Divergence

Definition (Contraction Coefficient for KL Divergence): For a fixed source distribution PX and channel PY|X, we define the contraction coefficient for KL divergence and mutual information as:

$$\eta_{\mathsf{KL}}\left(P_X, P_{Y|X}\right) \triangleq \sup_{R_X : R_X \neq P_X} \frac{D(R_Y \| P_Y)}{D(R_X \| P_X)} = \sup_{\substack{P_U, P_{X|U} : \\ U \to X \to Y}} \frac{I(U; Y)}{I(U; X)}$$

where the second equality is proven in [Anantharam et al., 2013] and [Polyanskiy and Wu, 2016].

- This provides an optimization criterion which finds both PU and PX|U for our model selection problem.
- The problem is not concave, so it is difficult to solve.
- Observation: D(RY‖PY) ≤ D(RX‖PX) is tight when RX = PX, but the sequence of pmfs RX achieving the supremum does not tend to PX.

Local Approximation of KL Divergence

Idea: Find a sequence of pmfs RX → PX that maximizes D(RY‖PY)/D(RX‖PX).

Consider the trajectory:

$$\forall x \in \mathcal{X}, \quad R_X^{(\epsilon)}(x) = P_X(x) + \epsilon \sqrt{P_X(x)}\, K_X(x)$$

where we can think of KX and √PX as vectors, and KXᵀ√PX = 0.

Taylor's theorem:

$$D(R_X^{(\epsilon)} \| P_X) = \frac{1}{2} \underbrace{\epsilon^2 \|K_X\|_2^2}_{=\,\chi^2(R_X^{(\epsilon)},\, P_X)} + \, o(\epsilon^2)$$

$$D(R_Y^{(\epsilon)} \| P_Y) = \frac{1}{2} \underbrace{\epsilon^2 \|B K_X\|_2^2}_{=\,\chi^2(R_Y^{(\epsilon)},\, P_Y)} + \, o(\epsilon^2)$$

where R_Y^{(ε)} = PY|X · R_X^{(ε)}, and B captures the effect of the channel on KX:

$$B \triangleq \mathrm{diag}\left(\sqrt{P_Y}\right)^{-1} \cdot P_{Y|X} \cdot \mathrm{diag}\left(\sqrt{P_X}\right).$$
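
The second-order behavior is easy to check numerically. In the sketch below (source, channel, and perturbation direction are toy values assumed here), both KL divergences match their ½ε²‖·‖² approximations up to o(ε²) corrections:

    import numpy as np

    PX  = np.array([0.5, 0.3, 0.2])            # toy source
    PYX = np.array([[0.7, 0.2, 0.1],           # toy channel; column x holds P_{Y|X}(.|x)
                    [0.3, 0.8, 0.9]])
    PY  = PYX @ PX

    B = np.diag(1 / np.sqrt(PY)) @ PYX @ np.diag(np.sqrt(PX))

    def kl(R, P):
        return float(np.sum(R * np.log(R / P)))

    # A direction K_X with K_X^T sqrt(P_X) = 0 (note ||sqrt(P_X)||_2 = 1).
    K = np.array([1.0, -1.0, 0.5])
    K -= (K @ np.sqrt(PX)) * np.sqrt(PX)

    eps = 1e-3
    RX = PX + eps * np.sqrt(PX) * K
    RY = PYX @ RX
    print(kl(RX, PX), 0.5 * eps**2 * (K @ K))            # input side
    print(kl(RY, PY), 0.5 * eps**2 * (B @ K) @ (B @ K))  # output side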

Local Contraction Coefficient

Theorem (Local Contraction Coefficient) [Makur and Zheng, 2015]: For random variables X and Y with joint pmf PX,Y, we have:

$$\lim_{\epsilon \to 0} \; \sup_{\substack{R_X : R_X \neq P_X \\ D(R_X \| P_X) = \frac{1}{2}\epsilon^2}} \frac{D(R_Y \| P_Y)}{D(R_X \| P_X)} \;=\; \max_{\substack{K_X : K_X \neq \mathbf{0} \\ K_X^T \sqrt{P_X} = 0}} \frac{\|B K_X\|_2^2}{\|K_X\|_2^2} \;=\; \eta_{\chi^2}\left(P_X, P_{Y|X}\right)$$

where $B = \mathrm{diag}(\sqrt{P_Y})^{-1} \cdot P_{Y|X} \cdot \mathrm{diag}(\sqrt{P_X})$, and the RHS is maximized by K*_X, the right singular vector of B corresponding to its "largest" singular value.

- The trajectory ∀x ∈ 𝒳, R_X^{(ε)}(x) = PX(x) + ε√PX(x) K*_X(x) achieves the supremum in the LHS as ε → 0.
- This formulation admits an easy solution using the SVD (a numerical sketch follows below).
- Model Selection Solution: for fixed small ε,
  ∀x ∈ 𝒳, PX|U(x|1) = PX(x) + ε√PX(x) K*_X(x) and PX|U(x|−1) = PX(x) − ε√PX(x) K*_X(x).
- ηχ²(PX, PY|X) is also equal to the squared Hirschfeld-Gebelein-Rényi maximal correlation.
- Other singular vectors of B can be used to decompose information into "mutually orthogonal" parts [Makur et al., 2015].
- Next: compare ηχ²(PX, PY|X) and ηKL(PX, PY|X).
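
A minimal SVD sketch of this recipe (same toy PX and PY|X as above, assumed for illustration):

    import numpy as np

    PX  = np.array([0.5, 0.3, 0.2])
    PYX = np.array([[0.7, 0.2, 0.1],
                    [0.3, 0.8, 0.9]])
    PY  = PYX @ PX
    B   = np.diag(1 / np.sqrt(PY)) @ PYX @ np.diag(np.sqrt(PX))

    # Top singular value of B is 1 with right singular vector ±sqrt(P_X); the
    # constraint K_X ⟂ sqrt(P_X) therefore picks out the second singular vector.
    U, s, Vt = np.linalg.svd(B)
    eta_chi2 = s[1] ** 2
    K_star   = Vt[1]

    # Local model selection: P_{X|U}(.|±1) = P_X ± eps * sqrt(P_X) * K*_X.
    eps = 0.05
    print("eta_chi2 =", eta_chi2)
    print(PX + eps * np.sqrt(PX) * K_star)   # P_{X|U}(.|+1)
    print(PX - eps * np.sqrt(PX) * K_star)   # P_{X|U}(.|-1)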

Outline

1. Introduction to Contraction Coefficients
2. Motivation from Inference
3. Contraction Coefficients for KL and χ²-Divergences
4. Bounds between Contraction Coefficients: Contraction Coefficient Bound; Upper Bound on Contraction Coefficient of KL Divergence; Bounding KL Divergence with χ²-Divergence; Binary Symmetric Channel Example

Contraction Coefficient Bound

Theorem (Contraction Coefficient Bound) [Makur and Zheng, 2015]: For a fixed source distribution PX and channel PY|X, we have:

$$\eta_{\chi^2}\left(P_X, P_{Y|X}\right) \;\le\; \eta_{\mathsf{KL}}\left(P_X, P_{Y|X}\right) \;\le\; \frac{\eta_{\chi^2}\left(P_X, P_{Y|X}\right)}{\min_{x \in \mathcal{X}} P_X(x)}.$$

Remark: Our local model selection method cannot perform "too poorly."

Lower Bound:

$$\underbrace{\lim_{\epsilon \to 0} \; \sup_{\substack{R_X : R_X \neq P_X \\ D(R_X \| P_X) = \frac{1}{2}\epsilon^2}} \frac{D(R_Y \| P_Y)}{D(R_X \| P_X)}}_{\eta_{\chi^2}(P_X,\, P_{Y|X})} \;\le\; \underbrace{\sup_{R_X : R_X \neq P_X} \frac{D(R_Y \| P_Y)}{D(R_X \| P_X)}}_{\eta_{\mathsf{KL}}(P_X,\, P_{Y|X})}$$

This result is known in the literature, and the inequality can be strict, as demonstrated in [Anantharam et al., 2013].

Upper Bound on Contraction Coefficient of KL Divergence

Upper Bound Proof Sketch: Suppose we have

D(RY‖PY) ≤ α ‖BKX‖₂², for some α,
D(RX‖PX) ≥ β ‖KX‖₂², for some β,

where ∀x ∈ 𝒳, RX(x) = PX(x) + √PX(x) KX(x). Then we can prove an upper bound because:

$$\frac{D(R_Y \| P_Y)}{D(R_X \| P_X)} \;\le\; \frac{\alpha}{\beta} \cdot \frac{\|B K_X\|_2^2}{\|K_X\|_2^2}.$$

Bounding KL Divergence with χ²-Divergence

KL Divergence Lower Bound:

[Figure: a convex function G over a convex set Q, with its tangent "plane" G(y₀) + ∇G(y₀)ᵀ(y − y₀) at a point y₀; the Bregman divergence G(y₁) − G(y₀) − ∇G(y₀)ᵀ(y₁ − y₀) is the gap between G and this tangent at y₁.]

Bregman Divergence: Given convex F : P → R:

$$\forall x_1, x_0 \in \mathcal{P}, \quad B_F(x_1, x_0) \triangleq F(x_1) - F(x_0) - \nabla F(x_0)^T (x_1 - x_0).$$

Bounding KL Divergence with χ²-Divergence

KL Divergence Lower Bound: Let Hn : P𝒳 → R be the negative Shannon entropy function:

$$\forall Q \in \mathcal{P}_{\mathcal{X}}, \quad H_n(Q) \triangleq \sum_{x \in \mathcal{X}} Q(x) \log\left(Q(x)\right).$$

KL divergence is a Bregman divergence [Banerjee et al., 2005]:

$$D(R_X \| P_X) = H_n(R_X) - H_n(P_X) - \nabla H_n(P_X)^T (R_X - P_X).$$

Hn : P𝒳 → R is strongly convex because ∇²Hn(Q) = diag(Q)⁻¹ ⪰ I, where I denotes the identity matrix. Hence:

$$H_n(R_X) \ge H_n(P_X) + \nabla H_n(P_X)^T (R_X - P_X) + \frac{1}{2} \|R_X - P_X\|_2^2$$

$$\Rightarrow \quad D(R_X \| P_X) \ge \frac{1}{2} \|R_X - P_X\|_2^2.$$

Using ∀x ∈ 𝒳, RX(x) = PX(x) + √PX(x) KX(x), we see that:

$$D(R_X \| P_X) \ge \frac{1}{2} \|R_X - P_X\|_2^2 \ge \frac{\min_{x \in \mathcal{X}} P_X(x)}{2} \|K_X\|_2^2.$$

Bounding KL Divergence with χ²-Divergence

Lemma (KL Divergence Lower Bound): Given pmfs PX and RX, we have:

$$D(R_X \| P_X) \ge \frac{\min_{x \in \mathcal{X}} P_X(x)}{2} \|K_X\|_2^2$$

where ∀x ∈ 𝒳, RX(x) = PX(x) + √PX(x) KX(x). This can be improved to:

Lemma (KL Divergence Lower Bound, improved): Given pmfs PX and RX, we have:

$$D(R_X \| P_X) \ge \min_{x \in \mathcal{X}} P_X(x) \, \|K_X\|_2^2.$$

Bounding KL Divergence with χ²-Divergence

Lemma (KL Divergence Upper Bound): Given pmfs PX and RX, we have:

$$D(R_X \| P_X) \le \log\left(1 + \|K_X\|_2^2\right) \le \|K_X\|_2^2$$

where ∀x ∈ 𝒳, RX(x) = PX(x) + √PX(x) KX(x).

Proof: By Jensen's inequality,

$$D(R_X \| P_X) = \mathbb{E}_{R_X}\left[\log \frac{R_X(X)}{P_X(X)}\right] \le \log\left(\mathbb{E}_{R_X}\left[\frac{R_X(X)}{P_X(X)}\right]\right).$$

Simplify:

$$\mathbb{E}_{R_X}\left[\frac{R_X(X)}{P_X(X)}\right] = \sum_{x \in \mathcal{X}} \frac{R_X(x)^2}{P_X(x)} = 1 + \|K_X\|_2^2.$$

Hence, we have D(RX‖PX) ≤ log(1 + ‖KX‖₂²) ≤ ‖KX‖₂², using the fact that ∀x > −1, log(1 + x) ≤ x.
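
Both lemmas are easy to spot-check numerically (random pmfs, natural-log KL; a sketch, not a proof):

    import numpy as np

    rng = np.random.default_rng(2)

    def kl(R, P):
        return float(np.sum(R * np.log(R / P)))

    for _ in range(1000):
        P = rng.dirichlet(np.ones(4))
        R = rng.dirichlet(np.ones(4))
        K = (R - P) / np.sqrt(P)      # so that R = P + sqrt(P) * K
        k2 = float(K @ K)             # ||K||_2^2 = chi^2(R, P)
        # improved lower bound <= D(R||P) <= log(1 + ||K||^2)
        assert P.min() * k2 <= kl(R, P) <= np.log1p(k2) + 1e-12
    print("lower and upper bounds hold on all sampled pmfs")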

Contraction Coefficient Bound

For a fixed source distribution PX and channel PY|X, we have:

$$D(R_X \| P_X) \ge \min_{x \in \mathcal{X}} P_X(x) \, \|K_X\|_2^2 \qquad \text{and} \qquad D(R_Y \| P_Y) \le \|B K_X\|_2^2$$

where RY is the output when RX passes through PY|X, and B = diag(√PY)⁻¹ · PY|X · diag(√PX). Combining the two bounds gives:

Theorem (Contraction Coefficient Bound) [Makur and Zheng, 2015]: For a fixed source distribution PX and channel PY|X, we have:

$$\eta_{\chi^2}\left(P_X, P_{Y|X}\right) \le \eta_{\mathsf{KL}}\left(P_X, P_{Y|X}\right) \le \frac{\eta_{\chi^2}\left(P_X, P_{Y|X}\right)}{\min_{x \in \mathcal{X}} P_X(x)}.$$

Example of Contraction Coefficient Bound

Binary Symmetric Channel Bounds:

$$\eta_{\chi^2}\left(P_X, P_{Y|X}\right) \le \eta_{\mathsf{KL}}\left(P_X, P_{Y|X}\right) \le \frac{\eta_{\chi^2}\left(P_X, P_{Y|X}\right)}{\min_{x \in \mathcal{X}} P_X(x)}$$
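
A numerical sketch of these two bounds for a binary symmetric channel (crossover δ and input pmfs assumed here; for the uniform input, ηχ² should come out as (1 − 2δ)², the squared maximal correlation):

    import numpy as np

    def bounds(PX, PYX):
        # Returns (eta_chi2, eta_chi2 / min_x PX(x)): the sandwich on eta_KL.
        PY = PYX @ PX
        B = np.diag(1 / np.sqrt(PY)) @ PYX @ np.diag(np.sqrt(PX))
        s = np.linalg.svd(B, compute_uv=False)
        eta = s[1] ** 2
        return eta, eta / PX.min()

    delta = 0.1                                 # BSC crossover probability
    PYX = np.array([[1 - delta, delta],
                    [delta, 1 - delta]])
    for px in [0.5, 0.3, 0.1]:
        lo, hi = bounds(np.array([px, 1 - px]), PYX)
        print(px, lo, hi)                       # eta_chi2 <= eta_KL <= eta_chi2 / min PX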

Conclusion

Theorem (Contraction Coefficient Bound) [Makur and Zheng, 2015]: For a fixed source distribution PX and channel PY|X, we have:

$$\eta_{\chi^2}\left(P_X, P_{Y|X}\right) \le \eta_{\mathsf{KL}}\left(P_X, P_{Y|X}\right) \le \frac{\eta_{\chi^2}\left(P_X, P_{Y|X}\right)}{\min_{x \in \mathcal{X}} P_X(x)}.$$

Summary:
- The contraction coefficient for KL divergence can perform model selection, but there is no simple algorithm for computing it.
- The contraction coefficient for χ²-divergence performs (suboptimal) model selection using the SVD.
- Bounds exist between these contraction coefficients.


References

- Amari, S. and Cichocki, A. (2010). Information geometry of divergence functions. Bulletin of the Polish Academy of Sciences, Technical Sciences, 58(1):183-195.
- Anantharam, V., Gohari, A., Kamath, S., and Nair, C. (2013). On maximal correlation, hypercontractivity, and the data processing inequality studied by Erkip and Cover. arXiv:1304.6133 [cs.IT].
- Banerjee, A., Merugu, S., Dhillon, I. S., and Ghosh, J. (2005). Clustering with Bregman divergences. Journal of Machine Learning Research, 6:1705-1749.
- Makur, A., Kozynski, F., Huang, S.-L., and Zheng, L. (2015). An efficient algorithm for information decomposition and extraction. In Proceedings of the 53rd Annual Allerton Conference on Communication, Control, and Computing, pages 972-979, Allerton House, UIUC, Illinois, USA.
- Makur, A. and Zheng, L. (2015). Bounds between contraction coefficients. In Proceedings of the 53rd Annual Allerton Conference on Communication, Control, and Computing, pages 1422-1429, Allerton House, UIUC, Illinois, USA.
- Polyanskiy, Y. and Wu, Y. (2016). Dissipation of information in channels with input constraints. IEEE Transactions on Information Theory, 62(1):35-55.