SLIDE 1 (1/13)

Stein Point Markov Chain Monte Carlo

Wilson Chen, Institute of Statistical Mathematics, Japan. June 11, 2019 @ ICML, Long Beach.

SLIDE 2 (2/13)

Collaborators

Alessandro Barp, François-Xavier Briol, Jackson Gorham, Mark Girolami, Lester Mackey, Chris Oates

SLIDE 3 (3/13)

Empirical Approximation Problem

A major problem in machine learning and modern statistics is to approximate some difficult-to-compute density $p$ defined on a domain $\mathcal{X} \subseteq \mathbb{R}^d$ whose normalisation constant is unknown, i.e. $p(x) = \tilde{p}(x)/Z$ where $Z > 0$ is unknown. We consider an empirical approximation of $p$ with points $\{x_i\}_{i=1}^n$:

$$\hat{p}_n(x) = \frac{1}{n} \sum_{i=1}^{n} \delta(x - x_i),$$

so that for a test function $f : \mathcal{X} \to \mathbb{R}$:

$$\int_{\mathcal{X}} f(x)\, p(x)\, dx \approx \frac{1}{n} \sum_{i=1}^{n} f(x_i).$$

A popular approach is Markov chain Monte Carlo.
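Not on the slide, but to make the MCMC connection concrete: below is a minimal random-walk Metropolis sketch in NumPy. Everything here (the function names, the Gaussian example target, the step size) is illustrative; the point it demonstrates is that the accept/reject ratio only ever uses the unnormalised $\tilde{p}$, so the unknown $Z$ cancels.

```python
import numpy as np

def rwm_samples(log_p_tilde, x0, n, step=0.5):
    """Random-walk Metropolis targeting p = p_tilde / Z.

    Only the unnormalised log-density log_p_tilde is evaluated:
    the constant Z cancels in the acceptance ratio.
    """
    x = np.asarray(x0, dtype=float)
    lp = log_p_tilde(x)
    xs = []
    for _ in range(n):
        prop = x + step * np.random.randn(*x.shape)
        lp_prop = log_p_tilde(prop)
        if np.log(np.random.rand()) < lp_prop - lp:  # accept
            x, lp = prop, lp_prop
        xs.append(x.copy())
    return np.array(xs)

# Empirical approximation: E_p[f(X)] ~ (1/n) * sum_i f(x_i).
xs = rwm_samples(lambda x: -0.5 * np.sum(x**2), x0=np.zeros(2), n=5000)
print(xs.mean(axis=0))  # close to the zero mean of the standard Gaussian target
```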

SLIDE 4 (4/13)

Discrepancy

Idea: construct a measure of discrepancy $D(\hat{p}_n, p)$ with desirable features:

  • Detects (non)convergence, i.e. $D(\hat{p}_n, p) \to 0$ only if $\hat{p}_n \Rightarrow p$.
  • Efficiently computable with limited access to $p$.

Unfortunately this is not the case for many popular discrepancy measures:

  • Kullback-Leibler divergence,
  • Wasserstein distance,
  • Maximum mean discrepancy (MMD).
SLIDE 5 (5/13)

Kernel Embedding and MMD

The kernel embedding of a distribution $p$ is

$$\mu_p(\cdot) = \int k(x, \cdot)\, p(x)\, dx$$

(a function in the RKHS $\mathcal{K}$). Consider the maximum mean discrepancy (MMD) as an option for $D$:

$$D(\hat{p}_n, p) := \|\mu_{\hat{p}_n} - \mu_p\|_{\mathcal{K}} =: D_{k,p}(\{x_i\}_{i=1}^n)$$

$$\therefore \; D_{k,p}(\{x_i\}_{i=1}^n)^2 = \|\mu_{\hat{p}_n} - \mu_p\|_{\mathcal{K}}^2 = \langle \mu_{\hat{p}_n} - \mu_p,\, \mu_{\hat{p}_n} - \mu_p \rangle = \langle \mu_{\hat{p}_n}, \mu_{\hat{p}_n} \rangle - 2 \langle \mu_{\hat{p}_n}, \mu_p \rangle + \langle \mu_p, \mu_p \rangle$$

We are faced with intractable integrals w.r.t. $p$! For a Stein kernel $k_0$:

$$\mu_p(\cdot) = \int k_0(x, \cdot)\, p(x)\, dx = 0.$$

$$\therefore \; \|\mu_{\hat{p}_n} - \mu_p\|_{\mathcal{K}_0}^2 = \|\mu_{\hat{p}_n}\|_{\mathcal{K}_0}^2 =: D_{k_0,p}(\{x_i\}_{i=1}^n)^2 =: \mathrm{KSD}^2!$$
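To see the intractability concretely: the cross term $\langle \mu_{\hat{p}_n}, \mu_p \rangle = \frac{1}{n}\sum_{i=1}^n \int k(x_i, x)\, p(x)\, dx$ is an integral under $p$, so with an ordinary kernel it could only be estimated from draws of $p$, as in the hypothetical V-statistic helper sketched below (not from the slides). The Stein kernel construction makes those terms vanish instead.

```python
import numpy as np

def mmd2(xs, ys, k):
    """V-statistic estimate of MMD^2 between the empirical measures of
    xs (our point set) and ys (draws from p, which the KSD lets us avoid)."""
    kxx = np.mean([k(a, b) for a in xs for b in xs])
    kyy = np.mean([k(a, b) for a in ys for b in ys])
    kxy = np.mean([k(a, b) for a in xs for b in ys])
    return kxx - 2.0 * kxy + kyy
```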

SLIDE 6 (6/13)

Kernel Stein Discrepancy (KSD)

The kernel Stein discrepancy (KSD) is given by

$$D_{k_0,p}(\{x_i\}_{i=1}^n) = \frac{1}{n} \sqrt{\sum_{i=1}^{n} \sum_{j=1}^{n} k_0(x_i, x_j)},$$

where $k_0$ is the Stein kernel

$$k_0(x, x') := \mathcal{T}_p \mathcal{T}'_p\, k(x, x') = \nabla_x \cdot \nabla_{x'} k(x, x') + \langle \nabla_x \log p(x), \nabla_{x'} k(x, x') \rangle + \langle \nabla_{x'} \log p(x'), \nabla_x k(x, x') \rangle + \langle \nabla_x \log p(x), \nabla_{x'} \log p(x') \rangle\, k(x, x'),$$

with $\mathcal{T}_p f = \nabla \cdot (p f)/p$ ($\mathcal{T}_p$ is a Stein operator). A code sketch follows the list below.

  • This is computable without the normalisation constant.
  • Requires gradient information $\nabla \log p(x_i)$.
  • Detects (non)convergence for an appropriately chosen $k$.
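As a sketch of how the formula above is evaluated in practice, here is a NumPy implementation of $k_0$ for the inverse multiquadric (IMQ) base kernel $k(x, x') = (c^2 + \|x - x'\|^2)^{\beta}$, a common choice for convergence detection (Gorham & Mackey, 2017). The function names and the defaults for $c$ and $\beta$ are our own; the gradient formulas follow by direct differentiation of the IMQ kernel.

```python
import numpy as np

def k0_imq(x, y, sx, sy, c=1.0, beta=-0.5):
    """Stein kernel k0(x, y) for the IMQ base kernel (c^2 + ||x-y||^2)^beta.
    sx, sy are the score vectors grad log p(x) and grad log p(y)."""
    d = x.shape[0]
    r = x - y
    b = c**2 + r @ r                         # c^2 + ||x - y||^2
    k = b**beta                              # base kernel k(x, y)
    gx = 2.0 * beta * b**(beta - 1) * r      # grad_x k
    gy = -gx                                 # grad_y k
    div = (-4.0 * beta * (beta - 1.0) * b**(beta - 2) * (r @ r)
           - 2.0 * beta * d * b**(beta - 1)) # div_x grad_y k
    return div + sx @ gy + sy @ gx + (sx @ sy) * k

def ksd(xs, grad_log_p, c=1.0, beta=-0.5):
    """KSD of the empirical measure: (1/n) * sqrt(sum_{i,j} k0(x_i, x_j))."""
    scores = [grad_log_p(x) for x in xs]
    total = sum(k0_imq(xi, xj, si, sj, c, beta)
                for xi, si in zip(xs, scores)
                for xj, sj in zip(xs, scores))
    return np.sqrt(total) / len(xs)
```

Note that, as the bullets say, only $\nabla \log p$ enters the computation, so $\tilde{p}$ suffices.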
SLIDE 7 (7/13)

Stein Points (SP)

The main idea of Stein Points is the greedy minimisation of KSD:

$$x_j \mid x_1, \ldots, x_{j-1} \leftarrow \arg\min_{x \in \mathcal{X}} D_{k_0,p}\big(\{x_i\}_{i=1}^{j-1} \cup \{x\}\big) = \arg\min_{x \in \mathcal{X}} \left[ k_0(x, x) + 2 \sum_{i=1}^{j-1} k_0(x, x_i) \right].$$

A global optimisation step is needed at each iteration.
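A minimal sketch of this greedy step, reusing `k0_imq` from the previous slide's sketch, with a finite candidate grid standing in for the true global optimisation over $\mathcal{X}$ (an assumption made for brevity):

```python
import numpy as np

def stein_points(candidates, grad_log_p, n, c=1.0, beta=-0.5):
    """Greedy KSD minimisation over a finite candidate set."""
    scores = [grad_log_p(x) for x in candidates]
    # Objective for adding point x: k0(x, x) + 2 * sum_i k0(x, x_i).
    diag = [k0_imq(x, x, s, s, c, beta) for x, s in zip(candidates, scores)]
    chosen = []
    for _ in range(n):
        objs = [diag[m] + 2.0 * sum(k0_imq(candidates[m], candidates[i],
                                           scores[m], scores[i], c, beta)
                                    for i in chosen)
                for m in range(len(candidates))]
        chosen.append(int(np.argmin(objs)))
    return [candidates[i] for i in chosen]
```

Each iteration scans every candidate, which is exactly the expensive global search that SP-MCMC (next slide) replaces with a local one.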

SLIDE 8 (8/13)

Stein Point Markov Chain Monte Carlo (SP-MCMC)

We propose to replace the global minimisation at each iteration $j$ of the SP method with a local search based on a $p$-invariant Markov chain of length $m_j$. The proposed SP-MCMC method proceeds as follows (a code sketch follows below):

1. Fix an initial point $x_1 \in \mathcal{X}$.
2. For $j = 2, \ldots, n$:
   a. Select $i^* \in \{1, \ldots, j-1\}$ according to a criterion $\mathrm{crit}(\{x_i\}_{i=1}^{j-1})$.
   b. Generate $(y_{j,i})_{i=1}^{m_j}$ from a $p$-invariant Markov chain with $y_{j,1} = x_{i^*}$.
   c. Set $x_j \leftarrow \arg\min_{x \in \{y_{j,i}\}_{i=1}^{m_j}} D_{k_0,p}\big(\{x_i\}_{i=1}^{j-1} \cup \{x\}\big)$.

For crit, three different approaches are considered:

  • LAST selects the point added last: $i^* := j - 1$.
  • RAND selects $i^*$ uniformly at random from $\{1, \ldots, j-1\}$.
  • INFL selects $i^*$ to be the index of the most influential point in $\{x_i\}_{i=1}^{j-1}$, where $x_{i^*}$ is called most influential if removing it from the point set creates the greatest increase in KSD.
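A compact sketch of the loop above, with LAST and RAND implemented and INFL omitted for brevity. Here `mcmc_step` stands for one transition of any $p$-invariant kernel (e.g. MALA or RWM) and is assumed to be supplied by the caller; `ksd` is the helper from the KSD slide's sketch.

```python
import numpy as np

def sp_mcmc(x1, grad_log_p, mcmc_step, n, m, crit="LAST", c=1.0, beta=-0.5):
    """SP-MCMC: grow the point set by local MCMC exploration + KSD selection."""
    points = [x1]
    for j in range(2, n + 1):
        # (a) Pick the restart index i* according to the chosen criterion.
        i_star = len(points) - 1 if crit == "LAST" else np.random.randint(len(points))
        # (b) Run a p-invariant chain of length m started at x_{i*}.
        y, candidates = points[i_star], []
        for _ in range(m):
            y = mcmc_step(y)
            candidates.append(y)
        # (c) Keep the candidate minimising the KSD of the augmented set.
        points.append(min(candidates,
                          key=lambda x: ksd(points + [x], grad_log_p, c, beta)))
    return points
```

The minimisation in step (c) is over only the $m$ chain states, so no global search over $\mathcal{X}$ is required.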

SLIDE 9 (9/13)

Gaussian Mixture Model Experiment

[Figure: Gaussian mixture experiment. Panels for standard MCMC and for SP-MCMC under the LAST, RAND and INFL criteria, each showing log KSD against iteration j (up to 1000), squared jump distance (Jump²), and the resulting density; the legend distinguishes SP-MCMC from MCMC.]

SLIDE 10 (10/13)

IGARCH Experiment (d = 2)

[Figure: log E_P against log n_eval for MALA, RWM, SVGD, MED, SP, SP-MALA LAST, SP-MALA INFL, SP-RWM LAST and SP-RWM INFL.]

SP-MCMC methods are compared against the original SP (Chen et al., 2018), MED (Roshan Joseph et al., 2015) and SVGD (Liu & Wang, 2016), as well as the Metropolis-adjusted Langevin algorithm (MALA) and random-walk Metropolis (RWM).

SLIDE 11 (11/13)

ODE Experiment (d = 10)

[Figure: log KSD against log n_eval for MALA, RWM, SVGD, MED, SP, SP-MALA LAST, SP-MALA INFL, SP-RWM LAST and SP-RWM INFL.]

The methods compared are the same as in the IGARCH experiment.

SLIDE 12 (12/13)

Theoretical Guarantees

The convergence of the proposed SP-MCMC method is established, with an explicit bound on the KSD in terms of the $V$-uniform ergodicity of the Markov transition kernel.

Example: SP-MALA convergence. Let $(m_j)_{j=1}^n \subset \mathbb{N}$ be a fixed sequence and let $\{x_i\}_{i=1}^n$ denote the SP-MALA output, based on Markov chains $(y_{j,l})_{l=1}^{m_j}$, $j \in \mathbb{N}$. Under certain regularity conditions, MALA is $V$-uniformly ergodic for $V(x) = 1 + \|x\|^2$ and there exists $C > 0$ such that

$$\mathbb{E}\!\left[ D_{k_0,p}(\{x_i\}_{i=1}^n)^2 \right] \le \frac{C}{n} \sum_{i=1}^{n} \frac{\log(n \wedge m_i)}{n \wedge m_i}.$$

SLIDE 13 (13/13)

Paper, Code and Poster

  • Paper is available at:

https://arxiv.org/pdf/1905.03673.pdf

  • Code is available at:

https://github.com/wilson-ye-chen/sp-mcmc

  • Check out the poster at Pacific Ballroom #216 from 6:30pm to 8pm!