SLIDE 1

Adaptive HMC via the Infinite Exponential Family

Arthur Gretton

⋆Gatsby Unit, CSML, University College London

RegML, 2017

Arthur Gretton (Gatsby Unit, UCL) Adaptive HMC via the Infinite Exponential Family 30/03/2016 1 / 38

SLIDE 2

Setting: MCMC for intractable non-linear targets

Using samples to compute expectations

We have a density of the form p(x) = π(x)/Z, where Z = ∫ π(x) dx is often impractical to compute.

Goal: compute expectations of functions, E_p[f(x)] = ∫ f(x) p(x) dx.

SLIDE 3

Setting: MCMC for intractable non-linear targets

Using samples to compute expectations

We have a density of the form p(x) = π(x)/Z, where Z = ∫ π(x) dx is often impractical to compute.

Goal: compute expectations of functions, E_p[f(x)] = ∫ f(x) p(x) dx.

Given samples {x_i}_{i=1}^n with distribution p(x),

E_p[f(x)] ≈ (1/n) Σ_{i=1}^n f(x_i)
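The sample-average estimator on the slide above can be sketched in a few lines; the standard normal target and the test function f(x) = x² are illustrative choices, not from the talk:

```python
import numpy as np

rng = np.random.default_rng(0)

# When we can sample p directly (here a standard normal), E_p[f(x)] is
# estimated by the sample mean of f over the draws.
samples = rng.normal(size=100_000)

def f(x):
    return x ** 2  # E[x^2] = Var(x) = 1 under the standard normal

estimate = np.mean(f(samples))  # close to 1 for large n
```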

SLIDE 4

Setting: MCMC for intractable non-linear targets

Metropolis-Hastings MCMC

A visual guide...

SLIDE 5

Setting: MCMC for intractable non-linear targets

Metropolis-Hastings MCMC

Unnormalized target π(x) ∝ p(x). Generate a Markov chain with invariant distribution p:

Initialize x₀ ∼ p₀. At iteration t ≥ 0, propose a move to state x′ ∼ q(·|x_t). Accept or reject proposals based on the ratio:

x_{t+1} = x′ with probability min( 1, [π(x′) q(x_t|x′)] / [π(x_t) q(x′|x_t)] ), and x_{t+1} = x_t otherwise.
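A minimal sketch of the accept/reject rule above, assuming a symmetric Gaussian proposal (so the q terms cancel); the target and step size are toy choices:

```python
import numpy as np

def metropolis_hastings(log_pi, x0, n_steps, step=1.0, seed=0):
    """Random-walk Metropolis-Hastings with a symmetric Gaussian proposal.

    With a symmetric proposal, q(x_t | x') = q(x' | x_t), so the acceptance
    ratio reduces to pi(x') / pi(x_t)."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float).copy()
    chain = [x.copy()]
    for _ in range(n_steps):
        proposal = x + step * rng.normal(size=x.shape)
        # Accept w.p. min(1, pi(x') / pi(x_t)); done in log space for stability.
        if np.log(rng.uniform()) < log_pi(proposal) - log_pi(x):
            x = proposal
        chain.append(x.copy())
    return np.array(chain)

# Toy target: unnormalized standard normal, log pi(x) = -||x||^2 / 2.
chain = metropolis_hastings(lambda x: -0.5 * np.sum(x ** 2), np.zeros(1), 20_000)
```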

SLIDE 6

Setting: MCMC for intractable non-linear targets

Metropolis-Hastings MCMC

Unnormalized target π(x) ∝ p(x). Generate a Markov chain with invariant distribution p:

Initialize x₀ ∼ p₀. At iteration t ≥ 0, propose a move to state x′ ∼ q(·|x_t). Accept or reject proposals based on the ratio:

x_{t+1} = x′ with probability min( 1, [π(x′) q(x_t|x′)] / [π(x_t) q(x′|x_t)] ), and x_{t+1} = x_t otherwise.

What proposal q(·|x_t)?

Too narrow or too broad → slow convergence. Does not conform to the support of the target → slow convergence.

SLIDE 7

Setting: MCMC for intractable non-linear targets

Adaptive MCMC

Adaptive Metropolis (Haario, Saksman & Tamminen, 2001): update the proposal q_t(·|x_t) = N(x_t, ν² Σ̂_t), using estimates of the target covariance.

SLIDE 8

Setting: MCMC for intractable non-linear targets

Adaptive MCMC

Adaptive Metropolis (Haario, Saksman & Tamminen, 2001): update the proposal q_t(·|x_t) = N(x_t, ν² Σ̂_t), using estimates of the target covariance.

Locally miscalibrated for strongly non-linear targets: the directions of large variance depend on the current location.
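A sketch of the adaptive Metropolis idea, assuming the common scaling ν² = 2.38²/d; recomputing the sample covariance over the whole history is for clarity only (the paper uses a cheap recursive update):

```python
import numpy as np

def adaptive_metropolis(log_pi, x0, n_steps, seed=0):
    """Sketch of adaptive Metropolis: the Gaussian proposal covariance is a
    running estimate of the target covariance, scaled by nu^2 = 2.38^2 / d."""
    rng = np.random.default_rng(seed)
    d = len(x0)
    x = np.array(x0, dtype=float)
    chain = [x.copy()]
    eps = 1e-6  # small jitter keeps the proposal covariance non-singular
    for t in range(n_steps):
        if t < 100:
            cov = np.eye(d)  # fixed proposal during a short warm-up
        else:
            cov = (2.38 ** 2 / d) * (np.cov(np.array(chain).T) + eps * np.eye(d))
        proposal = rng.multivariate_normal(x, np.atleast_2d(cov))
        if np.log(rng.uniform()) < log_pi(proposal) - log_pi(x):
            x = proposal
        chain.append(x.copy())
    return np.array(chain)

# Anisotropic Gaussian target with standard deviations (1, 2).
chain = adaptive_metropolis(lambda v: -0.5 * (v[0] ** 2 + v[1] ** 2 / 4.0),
                            np.zeros(2), 4000)
```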

SLIDE 9

Setting: MCMC for intractable non-linear targets

Alternative adaptive sampler: the Kameleon

Idea: fit a Gaussian in feature space, and take local steps in the directions of maximum principal components.

D. Sejdinovic, H. Strathmann, M. Lomeli, C. Andrieu, and A. Gretton, ICML 2014

SLIDE 10

Setting: MCMC for intractable non-linear targets

Hamiltonian Monte Carlo

HMC: distant moves, high acceptance probability. Potential energy U(x) = −log π(x), auxiliary momentum p ∼ exp(−K(p)); simulate for t ∈ ℝ along the Hamiltonian flow of H(p, x) = K(p) + U(x), using the operator

(∂K/∂p) (∂/∂x) − (∂U/∂x) (∂/∂p)

Numerical simulation (e.g. leapfrog) depends on gradient information.

[Figure: HMC trajectory over a sliced posterior in (θ2, θ7).]
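The leapfrog integrator mentioned above, in a minimal sketch assuming kinetic energy K(p) = ‖p‖²/2; the quadratic potential used to check energy conservation is a toy choice:

```python
import numpy as np

def leapfrog(grad_U, x, p, step_size, n_steps):
    """Leapfrog integration of Hamiltonian dynamics with K(p) = ||p||^2 / 2,
    so dx/dt = p and dp/dt = -grad U(x)."""
    x, p = np.array(x, dtype=float), np.array(p, dtype=float)
    p = p - 0.5 * step_size * grad_U(x)        # initial half step in momentum
    for _ in range(n_steps - 1):
        x = x + step_size * p                  # full step in position
        p = p - step_size * grad_U(x)          # full step in momentum
    x = x + step_size * p
    p = p - 0.5 * step_size * grad_U(x)        # final half step in momentum
    return x, p

# Toy check: for U(x) = x^2 / 2 the flow is a rotation in (x, p), and the
# total energy H = U + K is nearly conserved by leapfrog.
x0, p0 = np.array([1.0]), np.array([0.0])
x1, p1 = leapfrog(lambda x: x, x0, p0, step_size=0.1, n_steps=100)
H0 = 0.5 * float(x0 @ x0 + p0 @ p0)
H1 = 0.5 * float(x1 @ x1 + p1 @ p1)
```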

SLIDE 11

Setting: MCMC for intractable non-linear targets

Intractable & Non-linear Target in GPC

Sliced posterior over hyperparameters of a Gaussian Process classifier on the UCI Glass dataset, obtained using Pseudo-Marginal MCMC.

[Figure: sliced posterior over (θ2, θ7).]

Can you learn an HMC sampler?

SLIDE 12

Setting: MCMC for intractable non-linear targets

Outline for remainder of talk

[Figure: sliced posterior over (θ2, θ7).]

Infinite dimensional exponential family (Sriperumbudur et al. 2014): an exponential family with RKHS-valued natural parameter, learned via score matching, with no log-partition function.

Kernel Adaptive Hamiltonian Monte Carlo (Strathmann et al. 2015): a global estimate of the gradient of the log target density from previous samples. Mixing performance close to ideal "known density" HMC.

SLIDE 13

MCMC Kameleon

Infinite dimensional exponential family density estimator

Bharath Sriperumbudur, Kenji Fukumizu, Arthur Gretton, Revant Kumar, and Aapo Hyvarinen, JMLR 2017, to appear

(slides adapted from Bharath's talk)

SLIDE 14

MCMC Kameleon

The Exponential Family of Distributions

Natural form: p_θ(x) = q₀(x) exp(θᵀT(x) − A(θ)), where

• θ ∈ Θ ⊂ ℝᵐ (natural parameter)
• q₀: probability density defined over Ω ⊂ ℝᵈ
• A(θ): log-partition function, A(θ) = log ∫ exp(θᵀT(x)) q₀(x) dx
• T(x): sufficient statistic

Includes many commonly used distributions: Normal, Binomial, Poisson, Exponential, ...

SLIDE 15

MCMC Kameleon

Infinite Dimensional Generalization

P = { p_f(x) = exp(f(x) − A(f)) q₀(x), x ∈ Ω : f ∈ F }, where

F = { f ∈ H : A(f) = log ∫ exp(f(x)) q₀(x) dx < ∞ }

(Canu and Smola, 2005; Fukumizu, 2009): H is a reproducing kernel Hilbert space (RKHS).

SLIDE 16

MCMC Kameleon

Reproducing kernel Hilbert space

Exponentiated quadratic kernel: k(x, x′) = exp(−‖x − x′‖²/(2σ²)) = Σ_{i=1}^∞ φ_i(x) φ_i(x′)

f(x) = Σ_{i=1}^∞ f_i φ_i(x), with Σ_{i=1}^∞ f_i² < ∞.

SLIDE 17

MCMC Kameleon

Reproducing kernel Hilbert space

Function with the exponentiated quadratic kernel:

f(x) := Σ_{i=1}^m α_i k(x_i, x) = Σ_{i=1}^m α_i ⟨φ(x_i), φ(x)⟩_H = ⟨ Σ_{i=1}^m α_i φ(x_i), φ(x) ⟩_H

[Figure: f(x) as a weighted sum of kernel bumps.]
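The expansion f(x) = Σ_i α_i k(x_i, x) above is easy to evaluate directly; the centres and coefficients below are toy values for illustration:

```python
import numpy as np

def gaussian_kernel(x, y, sigma=1.0):
    """Exponentiated quadratic kernel k(x, y) = exp(-(x - y)^2 / (2 sigma^2))
    for scalar inputs (vectorised over arrays)."""
    return np.exp(-np.subtract.outer(x, y) ** 2 / (2 * sigma ** 2))

# An RKHS function f(x) = sum_i alpha_i k(x_i, x), represented by its
# centres x_i and coefficients alpha_i (toy values).
centres = np.array([-2.0, 0.0, 3.0])
alphas = np.array([0.5, -1.0, 0.8])

def f(x):
    return gaussian_kernel(np.atleast_1d(x), centres) @ alphas
```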

SLIDE 18

MCMC Kameleon

Reproducing kernel Hilbert space

Function with the exponentiated quadratic kernel:

f(x) := Σ_{i=1}^m α_i k(x_i, x) = Σ_{i=1}^m α_i ⟨φ(x_i), φ(x)⟩_H = ⟨ Σ_{i=1}^m α_i φ(x_i), φ(x) ⟩_H = Σ_{ℓ=1}^∞ f_ℓ φ_ℓ(x) = ⟨f(·), φ(x)⟩_H

where f_ℓ := Σ_{i=1}^m α_i φ_ℓ(x_i).

[Figure: f(x) as a weighted sum of kernel bumps.]

Possible to write functions of infinitely many features!

SLIDE 19

MCMC Kameleon

RKHS-Based Exponential Family

H is an RKHS:

P = { p_f(x) = exp(⟨f, φ(x)⟩_H − A(f)) q₀(x), x ∈ Ω, f ∈ F }, where

F = { f ∈ H : A(f) = log ∫ exp(f(x)) q₀(x) dx < ∞ }.

Finite dimensional RKHS: one-to-one correspondence between the finite dimensional exponential family and the RKHS. The sufficient statistic T(x) induces the kernel k(x, y) = ⟨T(x), T(y)⟩; conversely, a kernel k(x, y) = ⟨Φ(x), Φ(y)⟩ supplies the sufficient statistic Φ(x).

SLIDE 20

MCMC Kameleon

Examples

• Exponential: Ω = ℝ₊₊, k(x, y) = xy.
• Normal: Ω = ℝ, k(x, y) = xy + x²y².
• Beta: Ω = (0, 1), k(x, y) = log x log y + log(1 − x) log(1 − y).
• Gamma: Ω = ℝ₊₊, k(x, y) = log x log y + xy.
• Inverse Gaussian: Ω = ℝ₊₊, k(x, y) = xy + 1/(xy).
• Poisson: Ω = ℕ ∪ {0}, k(x, y) = xy, q₀(x) = (x! e)⁻¹.
• Geometric: Ω = ℕ ∪ {0}, k(x, y) = xy, q₀(x) = 1.
• Binomial: Ω = {0, ..., m}, k(x, y) = xy, q₀(x) = 2⁻ᵐ (m choose x).

SLIDE 21

MCMC Kameleon

Problem: Given random samples X₁, ..., X_n drawn i.i.d. from an unknown density p₀ := p_{f₀} ∈ P, estimate p₀.

SLIDE 22

MCMC Kameleon

Maximum Likelihood Estimation

f_ML = argmax_{f ∈ F} Σ_{i=1}^n log p_f(X_i) = argmax_{f ∈ F} [ Σ_{i=1}^n f(X_i) − n log ∫ exp(f(x)) q₀(x) dx ].

SLIDE 23

MCMC Kameleon

Maximum Likelihood Estimation

f_ML = argmax_{f ∈ F} Σ_{i=1}^n log p_f(X_i) = argmax_{f ∈ F} [ Σ_{i=1}^n f(X_i) − n log ∫ exp(f(x)) q₀(x) dx ].

Solving the above yields that f_ML satisfies

(1/n) Σ_{i=1}^n φ(X_i) = ∫ φ(x) p_{f_ML}(x) dx

Can we solve this?

SLIDE 24

MCMC Kameleon

Solving max. likelihood equations

Finite dimensional case: Normal distribution N(µ, σ²), with φ(x) = (x, x²)ᵀ. The maximum likelihood equations give

(1/n) Σ_{i=1}^n (x_i, x_i²)ᵀ = ∫ (x, x²)ᵀ p_{f_ML}(x) dx = (µ_ML, σ²_ML + µ²_ML)ᵀ

System of likelihood equations: solvable.
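A quick numerical check of the moment-matching form of the Normal MLE above; the true (µ, σ) values are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(1)
mu, sigma = 2.0, 1.5
x = rng.normal(mu, sigma, size=200_000)

# Moment matching: the sample moments of phi(x) = (x, x^2) equal the model
# moments (mu_ML, sigma_ML^2 + mu_ML^2), which we can invert directly.
mu_ml = x.mean()
sigma2_ml = (x ** 2).mean() - mu_ml ** 2
```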

SLIDE 25

MCMC Kameleon

Solving max. likelihood equations

Finite dimensional case: Normal distribution N(µ, σ²), with φ(x) = (x, x²)ᵀ. The maximum likelihood equations give

(1/n) Σ_{i=1}^n (x_i, x_i²)ᵀ = ∫ (x, x²)ᵀ p_{f_ML}(x) dx = (µ_ML, σ²_ML + µ²_ML)ᵀ

System of likelihood equations: solvable.

Infinite dimensional case, characteristic kernel: ill-posed! (Fukumizu, 2009): a method of sieves, involving a pseudo-MLE that restricts P to a series of finite dimensional submanifolds, which enlarge as the sample size increases.

SLIDE 26

MCMC Kameleon

Score matching (general version)

Score matching (Hyvärinen, 2005), since MLE can be intractable even in finite dimensions when A(θ) is not easily computable. Assuming p_f is differentiable (w.r.t. x) and ∫ p₀(x) ‖∇_x log p_f(x)‖² dx < ∞ for all f ∈ F:

D_F(p₀‖p_f) := (1/2) ∫ p₀(x) ‖∇_x log p₀(x) − ∇_x log p_f(x)‖² dx

SLIDE 27

MCMC Kameleon

Score matching: 1-D proof

D_F(p₀, p_f) = (1/2) ∫_a^b p₀(x) ( d log p₀(x)/dx − d log p_f(x)/dx )² dx

SLIDE 28

MCMC Kameleon

Score matching: 1-D proof

D_F(p₀, p_f) = (1/2) ∫_a^b p₀(x) ( d log p₀(x)/dx − d log p_f(x)/dx )² dx

= (1/2) ∫_a^b p₀(x) (d log p₀(x)/dx)² dx + (1/2) ∫_a^b p₀(x) (d log p_f(x)/dx)² dx − ∫_a^b p₀(x) (d log p_f(x)/dx)(d log p₀(x)/dx) dx

SLIDE 29

MCMC Kameleon

Score matching: 1-D proof

D_F(p₀, p_f) = (1/2) ∫_a^b p₀(x) ( d log p₀(x)/dx − d log p_f(x)/dx )² dx

= (1/2) ∫_a^b p₀(x) (d log p₀(x)/dx)² dx + (1/2) ∫_a^b p₀(x) (d log p_f(x)/dx)² dx − ∫_a^b p₀(x) (d log p_f(x)/dx)(d log p₀(x)/dx) dx

Final term: ∫_a^b p₀(x) (d log p_f(x)/dx)(d log p₀(x)/dx) dx

SLIDE 30

MCMC Kameleon

Score matching: 1-D proof

D_F(p₀, p_f) = (1/2) ∫_a^b p₀(x) ( d log p₀(x)/dx − d log p_f(x)/dx )² dx

= (1/2) ∫_a^b p₀(x) (d log p₀(x)/dx)² dx + (1/2) ∫_a^b p₀(x) (d log p_f(x)/dx)² dx − ∫_a^b p₀(x) (d log p_f(x)/dx)(d log p₀(x)/dx) dx

Final term: ∫_a^b p₀(x) (d log p_f(x)/dx)(d log p₀(x)/dx) dx

= ∫_a^b p₀(x) (d log p_f(x)/dx) (1/p₀(x)) (dp₀(x)/dx) dx, where the p₀(x) factors cancel

SLIDE 31

MCMC Kameleon

Score matching: 1-D proof

D_F(p₀, p_f) = (1/2) ∫_a^b p₀(x) ( d log p₀(x)/dx − d log p_f(x)/dx )² dx

= (1/2) ∫_a^b p₀(x) (d log p₀(x)/dx)² dx + (1/2) ∫_a^b p₀(x) (d log p_f(x)/dx)² dx − ∫_a^b p₀(x) (d log p_f(x)/dx)(d log p₀(x)/dx) dx

Final term: ∫_a^b p₀(x) (d log p_f(x)/dx)(d log p₀(x)/dx) dx

= ∫_a^b p₀(x) (d log p_f(x)/dx) (1/p₀(x)) (dp₀(x)/dx) dx, where the p₀(x) factors cancel

= [ (d log p_f(x)/dx) p₀(x) ]_a^b − ∫_a^b p₀(x) (d² log p_f(x)/dx²) dx, by integration by parts.

SLIDE 32

MCMC Kameleon

Score matching (general version)

Score matching (Hyvärinen, 2005), since MLE can be intractable even in finite dimensions when A(θ) is not easily computable. Assuming p_f is differentiable (w.r.t. x) and ∫ p₀(x) ‖∇_x log p_f(x)‖² dx < ∞ for all f ∈ F:

D_F(p₀‖p_f) := (1/2) ∫ p₀(x) ‖∇_x log p₀(x) − ∇_x log p_f(x)‖² dx

(a)
= ∫ p₀(x) Σ_{i=1}^d [ (1/2) (∂ log p_f(x)/∂x_i)² + ∂² log p_f(x)/∂x_i² ] dx + (1/2) ∫ p₀(x) ‖∂ log p₀(x)/∂x‖² dx,

where partial integration is used in (a), under the condition that p₀(x) ∂ log p_f(x)/∂x_i → 0 as x_i → ±∞, for all i = 1, ..., d.

SLIDE 33

MCMC Kameleon

Empirical Estimator

p_n represents n i.i.d. samples from p₀.

SLIDE 34

MCMC Kameleon

Empirical Estimator

p_n represents n i.i.d. samples from p₀.

D_F(p_n‖p_f) := (1/n) Σ_{a=1}^n Σ_{i=1}^d [ (1/2) (∂ log p_f(X_a)/∂x_i)² + ∂² log p_f(X_a)/∂x_i² ] + C

Since D_F(p_n, p_f) is independent of A(f),

f*_n = argmin_{f ∈ F} D_F(p_n, p_f)

should be easily computable, unlike the MLE.

SLIDE 35

MCMC Kameleon

Empirical Estimator

p_n represents n i.i.d. samples from p₀.

D_F(p_n‖p_f) := (1/n) Σ_{a=1}^n Σ_{i=1}^d [ (1/2) (∂ log p_f(X_a)/∂x_i)² + ∂² log p_f(X_a)/∂x_i² ] + C

Since D_F(p_n, p_f) is independent of A(f),

f*_n = argmin_{f ∈ F} D_F(p_n, p_f)

should be easily computable, unlike the MLE.

Add an extra term λ‖f‖²_H to regularize.
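To make the empirical score-matching objective concrete, here is a hypothetical finite-dimensional version (a quadratic log model, i.e. a Gaussian fit), not the RKHS estimator itself; setting the objective's gradient to zero gives a small linear system:

```python
import numpy as np

rng = np.random.default_rng(2)
mu, sigma = 1.0, 0.5
x = rng.normal(mu, sigma, size=100_000)

# Hypothetical model: log p_f(x) = theta1 * x + theta2 * x^2 + const, so
#   d/dx   log p_f = theta1 + 2 * theta2 * x
#   d2/dx2 log p_f = 2 * theta2.
# The empirical score-matching objective is
#   J = mean[ 0.5 * (theta1 + 2 theta2 x)^2 + 2 theta2 ],
# and setting dJ/dtheta = 0 gives a 2x2 linear system in (theta1, theta2).
m1, m2 = x.mean(), (x ** 2).mean()
A = np.array([[1.0, 2 * m1],
              [m1, 2 * m2]])
b = np.array([0.0, -1.0])
theta1, theta2 = np.linalg.solve(A, b)

# For a Gaussian, theta1 = mu / sigma^2 = 4 and theta2 = -1 / (2 sigma^2) = -2.
```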

SLIDE 36

MCMC Kameleon

How do we get a computable solution?

p_f(x) = exp(⟨f, φ(x)⟩_H − A(f)) q₀(x). Thus

∂/∂x log p_f(x) = ∂/∂x ⟨f, φ(x)⟩_H + ∂/∂x log q₀(x).

SLIDE 37

MCMC Kameleon

How do we get a computable solution?

p_f(x) = exp(⟨f, φ(x)⟩_H − A(f)) q₀(x). Thus

∂/∂x log p_f(x) = ∂/∂x ⟨f, φ(x)⟩_H + ∂/∂x log q₀(x).

Kernel trick for derivatives: ∂f(X)/∂x_i = ⟨f, ∂φ(X)/∂x_i⟩_H

SLIDE 38

MCMC Kameleon

How do we get a computable solution?

p_f(x) = exp(⟨f, φ(x)⟩_H − A(f)) q₀(x). Thus

∂/∂x log p_f(x) = ∂/∂x ⟨f, φ(x)⟩_H + ∂/∂x log q₀(x).

Kernel trick for derivatives: ∂f(X)/∂x_i = ⟨f, ∂φ(X)/∂x_i⟩_H

Dot product between feature derivatives: ⟨∂φ(X)/∂x_i, ∂φ(X′)/∂x_j⟩_H = ∂²k(X, X′)/∂x_i ∂x_{d+j}
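The mixed-derivative identity above is easy to verify numerically for the exponentiated quadratic kernel in one dimension; the analytic formula below is standard for this kernel:

```python
import numpy as np

sigma = 1.0

def k(x, y):
    """Exponentiated quadratic kernel, scalar inputs."""
    return np.exp(-(x - y) ** 2 / (2 * sigma ** 2))

def d2k_dxdy(x, y):
    """Analytic mixed derivative d^2 k / (dx dy), i.e. the inner product
    <d phi(x)/dx, d phi(y)/dy>_H from the derivative kernel trick."""
    return k(x, y) * (1 / sigma ** 2 - (x - y) ** 2 / sigma ** 4)

# Numerical check against a central finite difference in both arguments.
x0, y0, h = 0.3, -0.8, 1e-4
fd = (k(x0 + h, y0 + h) - k(x0 + h, y0 - h)
      - k(x0 - h, y0 + h) + k(x0 - h, y0 - h)) / (4 * h ** 2)
```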

SLIDE 39

MCMC Kameleon

How do we get a computable solution?

p_f(x) = exp(⟨f, φ(x)⟩_H − A(f)) q₀(x). Thus

∂/∂x log p_f(x) = ∂/∂x ⟨f, φ(x)⟩_H + ∂/∂x log q₀(x).

Kernel trick for derivatives: ∂f(X)/∂x_i = ⟨f, ∂φ(X)/∂x_i⟩_H

Dot product between feature derivatives: ⟨∂φ(X)/∂x_i, ∂φ(X′)/∂x_j⟩_H = ∂²k(X, X′)/∂x_i ∂x_{d+j}

By the representer theorem:

f*_n = α ξ̂ + Σ_{ℓ=1}^n Σ_{j=1}^d β_{ℓj} ∂φ(X_ℓ)/∂x_j

SLIDE 40

MCMC Kameleon

Consistency and Rates for f_{λ,n}

Suppose the existence assumptions hold.

(i) (Consistency) If f₀ ∈ R(C), then ‖f_{λ,n} − f₀‖_H → 0 as λ√n → ∞, λ → 0, and n → ∞.

(ii) (Rates of convergence) Suppose f₀ ∈ R(C^β) for some β > 0. Then ‖f_{λ,n} − f₀‖_H = O_{p₀}( n^{−min(1/4, β/(2(β+1)))} ) with λ = n^{−max(1/4, 1/(2(β+1)))} as n → ∞.

SLIDE 41

MCMC Kameleon

Kernel Adaptive Hamiltonian Monte Carlo (KMC)

Heiko Strathmann, Dino Sejdinovic, Samuel Livingstone, Zoltan Szabo, and Arthur Gretton, NIPS 2015

SLIDE 42

MCMC Kameleon

Hamiltonian Monte Carlo

HMC: distant moves, high acceptance probability. Potential energy U(x) = −log π(x), auxiliary momentum p ∼ exp(−K(p)); simulate for t ∈ ℝ along the Hamiltonian flow of H(p, x) = K(p) + U(x), using the operator

(∂K/∂p) (∂/∂x) − (∂U/∂x) (∂/∂p)

Numerical simulation (e.g. leapfrog) depends on gradient information.

[Figure: HMC trajectory over a sliced posterior in (θ2, θ7).]

SLIDE 43

MCMC Kameleon

Infinite dimensional exponential families

Proposal is the RKHS exponential family model [Fukumizu, 2009; Sriperumbudur et al. 2014], but accept using the correct MH ratio (to correct for both the model and the leapfrog discretisation):

const × π(x) ≈ exp( ⟨f, k(x, ·)⟩_H − A(f) )

Sufficient statistic: the feature map k(·, x) ∈ H, which satisfies f(x) = ⟨f, k(x, ·)⟩_H for any f ∈ H. Natural parameter: f ∈ H.

Estimation of unnormalised density models from samples via score matching [Sriperumbudur et al. 2014]. Expensive: the full solution requires solving a (td + 1)-dimensional linear system.

SLIDE 44

MCMC Kameleon

Approximate solution: KMC lite

f(x) = Σ_{i=1}^m α_i k(z_i, x), where z ⊆ x is a sub-sample, m < n. α comes from the linear system

α̂_λ = −(σ/2) (C + λI)⁻¹ b

where C ∈ ℝ^{m×m} and b ∈ ℝᵐ depend on the kernel matrix. Cost O(m³ + m²d) (or cheaper with low-rank approximations and conjugate gradient). Geometrically ergodic on log-concave targets (fast convergence).

[Figure: gradient norm, Gaussian target vs KMC lite estimate.]

SLIDE 45

MCMC Kameleon

Approximate solution: KMC finite

f(x) = θᵀφ_x, with Random Fourier Features: φ_xᵀ φ_y ≈ k(x, y).

θ ∈ ℝᵐ can be computed from θ̂_λ := (C + λI)⁻¹ b, where

b := −(1/t) Σ_{i=1}^t Σ_{ℓ=1}^d φ̈_{x_i}^ℓ

C := (1/t) Σ_{i=1}^t Σ_{ℓ=1}^d φ̇_{x_i}^ℓ (φ̇_{x_i}^ℓ)ᵀ

with φ̇_x^ℓ := ∂φ_x/∂x_ℓ and φ̈_x^ℓ := ∂²φ_x/∂x_ℓ².

On-line updates cost O(dm²). Updates are fast, and use the full Markov chain history. Caveat: needs correct initialisation.

[Figure: gradient norm, Gaussian target vs KMC finite estimate.]
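The Random Fourier Feature approximation above can be sketched as follows; the inputs and feature count are illustrative:

```python
import numpy as np

def rff_features(X, omega, bias):
    """Random Fourier features: phi_x[j] = sqrt(2/m) * cos(omega_j . x + b_j),
    so that phi_x . phi_y approximates the Gaussian kernel k(x, y)."""
    m = omega.shape[1]
    return np.sqrt(2.0 / m) * np.cos(X @ omega + bias)

rng = np.random.default_rng(3)
d, m, sigma = 2, 5000, 1.0
# For k(x, y) = exp(-||x - y||^2 / (2 sigma^2)), draw omega ~ N(0, I / sigma^2).
omega = rng.normal(scale=1.0 / sigma, size=(d, m))
bias = rng.uniform(0.0, 2.0 * np.pi, size=m)

x = np.array([[0.2, -0.5]])
y = np.array([[1.0, 0.3]])
exact = float(np.exp(-np.sum((x - y) ** 2) / (2 * sigma ** 2)))
approx = float(rff_features(x, omega, bias) @ rff_features(y, omega, bias).T)
```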

SLIDE 46

MCMC Kameleon

Does kernel HMC work in high dimensions?

Challenging Gaussian target (top): eigenvalues λ_i ∼ Exp(1); covariance diag(λ₁, ..., λ_d), randomly rotated. Use a Rational Quadratic kernel to account for the resulting highly 'non-singular' length-scales. KMC scales up to d ≈ 30.

An easy, isotropic Gaussian target (bottom): more smoothness allows KMC to scale up to d ≈ 100.

[Figure: heatmaps over n and d for the two targets.]

SLIDE 47

Experiments

Gaussian Process Classification on UCI data

Standard GPC model: p(f, y, θ) = p(θ) p(f|θ) p(y|f), where p(f|θ) is a GP and p(y|f) is a sigmoidal likelihood. Goal: sample from p(θ|y) ∝ p(θ) p(y|θ). Unbiased estimate p̂(y|θ) via importance sampling. No access to the likelihood or its gradient.

[Figure: sliced posterior over (θ2, θ7).]

SLIDE 48

Experiments

Gaussian Process Classification on UCI data

Standard GPC model: p(f, y, θ) = p(θ) p(f|θ) p(y|f), where p(f|θ) is a GP and p(y|f) is a sigmoidal likelihood. Goal: sample from p(θ|y) ∝ p(θ) p(y|θ). Unbiased estimate p̂(y|θ) via importance sampling. No access to the likelihood or its gradient.

[Figure: MMD from ground truth vs iteration, for KMC, KAMH, and RW.]

Significant mixing improvements over the state of the art.

SLIDE 49

Experiments

Conclusions

Kernel HMC: a simple, versatile, gradient-free adaptive MCMC sampler. The derivative of the log density is fit to samples and used as the proposal in HMC. Outperforms existing adaptive approaches on nonlinear target distributions.

Future work: how does the convergence rate degrade with increasing dimension?

Kernel HMC code: https://github.com/karlnapf/kernel_hmc

SLIDE 50

Bayesian Gaussian Process Classification

Our case: the target π(·) and its log gradient are not computable: Pseudo-Marginal MCMC.

SLIDE 51

Bayesian Gaussian Process Classification

Our case: the target π(·) and its log gradient are not computable: Pseudo-Marginal MCMC.

Example: when is the target not computable? GPC model: latent process f, labels y (with covariate matrix X), and hyperparameters θ:

p(f, y, θ) = p(θ) p(f|θ) p(y|f)

f|θ ∼ N(0, K_θ), a GP with covariance K_θ

SLIDE 52

Bayesian Gaussian Process Classification

Our case: the target π(·) and its log gradient are not computable: Pseudo-Marginal MCMC.

Example: when is the target not computable? GPC model: latent process f, labels y (with covariate matrix X), and hyperparameters θ:

p(f, y, θ) = p(θ) p(f|θ) p(y|f)

f|θ ∼ N(0, K_θ), a GP with covariance K_θ

Automatic Relevance Determination (ARD) covariance:

(K_θ)_{ij} = κ(x_i, x_j|θ) = exp( −(1/2) Σ_{s=1}^d (x_{i,s} − x_{j,s})² / exp(θ_s) )

SLIDE 53

Bayesian Gaussian Process Classification

Our case: the target π(·) and its log gradient are not computable: Pseudo-Marginal MCMC.

Example: when is the target not computable? GPC model: latent process f, labels y (with covariate matrix X), and hyperparameters θ:

p(f, y, θ) = p(θ) p(f|θ) p(y|f)

f|θ ∼ N(0, K_θ), a GP with covariance K_θ

Automatic Relevance Determination (ARD) covariance:

(K_θ)_{ij} = κ(x_i, x_j|θ) = exp( −(1/2) Σ_{s=1}^d (x_{i,s} − x_{j,s})² / exp(θ_s) )

Classification: p(y|f) = Π_{i=1}^n p(y_i|f(x_i)), where p(y_i|f(x_i)) = (1 + exp(−y_i f(x_i)))⁻¹, y_i ∈ {−1, 1}.
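The ARD covariance above can be sketched directly; the function below is a minimal assumed implementation, vectorised over two sets of points:

```python
import numpy as np

def ard_kernel(X, Y, theta):
    """Sketch of the ARD covariance: dimension s has length-scale exp(theta_s).
    X is (n, d), Y is (m, d); returns the (n, m) covariance matrix."""
    diff = X[:, None, :] - Y[None, :, :]            # pairwise differences
    return np.exp(-0.5 * np.sum(diff ** 2 / np.exp(theta), axis=-1))

X = np.array([[0.0, 1.0],
              [2.0, -1.0]])
K = ard_kernel(X, X, np.zeros(2))  # theta = 0 gives unit length-scales
```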

SLIDE 54

Pseudo-Marginal MCMC

Example: when is the target not computable? Gaussian process classification, with latent process f:

p(θ|y) ∝ p(θ) p(y|θ) = p(θ) ∫ p(f|θ) p(y|f, θ) df =: π(θ)

... but we cannot integrate out f.

SLIDE 55

Pseudo-Marginal MCMC

Example: when is the target not computable? Gaussian process classification, with latent process f:

p(θ|y) ∝ p(θ) p(y|θ) = p(θ) ∫ p(f|θ) p(y|f, θ) df =: π(θ)

... but we cannot integrate out f.

MH ratio:

α(θ, θ′) = min( 1, [p(θ′) p(y|θ′) q(θ|θ′)] / [p(θ) p(y|θ) q(θ′|θ)] )

SLIDE 56

Pseudo-Marginal MCMC

Example: when is the target not computable? Gaussian process classification, with latent process f:

p(θ|y) ∝ p(θ) p(y|θ) = p(θ) ∫ p(f|θ) p(y|f, θ) df =: π(θ)

... but we cannot integrate out f.

MH ratio:

α(θ, θ′) = min( 1, [p(θ′) p(y|θ′) q(θ|θ′)] / [p(θ) p(y|θ) q(θ′|θ)] )

Filippone & Girolami, 2013 use Pseudo-Marginal MCMC: an unbiased estimate of p(y|θ) via importance sampling:

p̂(θ|y) ∝ p(θ) p̂(y|θ) ≈ p(θ) (1/n_imp) Σ_{i=1}^{n_imp} p(y|f⁽ⁱ⁾) p(f⁽ⁱ⁾|θ) / Q(f⁽ⁱ⁾)
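The importance-sampling estimate above can be illustrated in a toy model where the marginal is known in closed form; all the values below are assumed for illustration, not the actual GPC model:

```python
import numpy as np

rng = np.random.default_rng(4)

# Toy unbiased importance-sampling estimate of a marginal likelihood
# p(y) = integral of p(y|f) p(f) df, with all quantities Gaussian so the
# truth is available in closed form.
y, sigma_lik, sigma_prior = 0.7, 0.5, 1.0

def p_y_given_f(f):
    return np.exp(-0.5 * (y - f) ** 2 / sigma_lik ** 2) / (sigma_lik * np.sqrt(2 * np.pi))

# Proposal Q = prior p(f), so the weights p(y|f) p(f) / Q(f) reduce to p(y|f).
f_samples = rng.normal(0.0, sigma_prior, size=500_000)
p_hat = float(np.mean(p_y_given_f(f_samples)))

# Closed form: y ~ N(0, sigma_prior^2 + sigma_lik^2).
var = sigma_prior ** 2 + sigma_lik ** 2
p_true = float(np.exp(-0.5 * y ** 2 / var) / np.sqrt(2 * np.pi * var))
```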

SLIDE 57

Pseudo-Marginal MCMC

Example: when is the target not computable? Gaussian process classification, with latent process f:

p(θ|y) ∝ p(θ) p(y|θ) = p(θ) ∫ p(f|θ) p(y|f, θ) df =: π(θ)

... but we cannot integrate out f.

Estimated MH ratio:

α(θ, θ′) = min( 1, [p(θ′) p̂(y|θ′) q(θ|θ′)] / [p(θ) p̂(y|θ) q(θ′|θ)] )

SLIDE 58

Pseudo-Marginal MCMC

Example: when is the target not computable? Gaussian process classification, with latent process f:

p(θ|y) ∝ p(θ) p(y|θ) = p(θ) ∫ p(f|θ) p(y|f, θ) df =: π(θ)

... but we cannot integrate out f.

Estimated MH ratio:

α(θ, θ′) = min( 1, [p(θ′) p̂(y|θ′) q(θ|θ′)] / [p(θ) p̂(y|θ) q(θ′|θ)] )

Replacing the marginal likelihood p(y|θ) with an unbiased estimate p̂(y|θ) still results in the correct invariant distribution [Beaumont, 2003; Andrieu & Roberts, 2009].

SLIDE 59

How rich is our density family?

Let q₀ ∈ C₀(Ω) be a probability density such that q₀(x) > 0 for all x ∈ Ω ⊂ ℝᵈ. Define

P_c := { p ∈ C₀(Ω) : ∫ p(x) dx = 1, p(x) ≥ 0 for all x ∈ Ω, and ‖p/q₀‖_∞ < ∞ }.

SLIDE 60

How rich is our density family?

Let q₀ ∈ C₀(Ω) be a probability density such that q₀(x) > 0 for all x ∈ Ω ⊂ ℝᵈ. Define

P_c := { p ∈ C₀(Ω) : ∫ p(x) dx = 1, p(x) ≥ 0 for all x ∈ Ω, and ‖p/q₀‖_∞ < ∞ }.

Suppose k satisfies

(∗) k(·, x) ∈ C₀(Ω) for all x ∈ Ω ⊂ ℝᵈ;
(∗∗) ∫∫ k(x, y) dµ(x) dµ(y) > 0 for all µ ∈ M_b(Ω)\{0}.

Then P is dense in P_c w.r.t. the KL divergence, total variation and Hellinger distances.