  1. Approximate Posterior Sampling via Stochastic Optimisation
     Connie Trojan
     Supervisor: Srshti Putcha
     6th September 2019

  2. Background
     Large scale machine learning models rely on stochastic optimisation techniques to learn parameters of interest.
     It is useful to understand parameter uncertainty using Bayesian inference.
     The Bayesian posterior is usually simulated using Markov chain Monte Carlo (MCMC) sampling algorithms.
     Stochastic gradient MCMC methods combine stochastic optimisation methods with MCMC to reduce computation time.

  3. Notation
     In the Bayesian approach, the unknown parameter θ is treated as a random variable. Writing x = (x_1, ..., x_N) for the observed data, the Bayesian posterior distribution π(θ | x) has the form
         \pi(\theta \mid x) \propto p(\theta)\, \ell(x \mid \theta) = p(\theta) \prod_{i=1}^{N} \ell(x_i \mid \theta),
     where:
     - p(θ) is the prior distribution
     - ℓ(x_i | θ) is the likelihood associated with observation i
     - N is the size of the dataset

  4. Notation
     In particular, gradient-based MCMC algorithms use the log posterior f(θ) to propose moves:
         f(\theta) = k + f_0(\theta) + \sum_{i=1}^{N} f_i(\theta) \equiv k + \log p(\theta) + \sum_{i=1}^{N} \log \ell(x_i \mid \theta),
     where k is a constant that does not depend on θ.
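     For concreteness, here is a small worked instance that is not on the slides: assume a hypothetical Gaussian model with likelihood x_i | θ ~ N(θ, 1) and prior θ ~ N(0, 1). The terms above then become
         f_0(\theta) = \log p(\theta) = -\tfrac{1}{2}\theta^2 + \text{const}, \qquad f_i(\theta) = \log \ell(x_i \mid \theta) = -\tfrac{1}{2}(x_i - \theta)^2 + \text{const},
     so the gradient used by the algorithms that follow is
         \nabla f(\theta) = \nabla f_0(\theta) + \sum_{i=1}^{N} \nabla f_i(\theta) = -\theta + \sum_{i=1}^{N} (x_i - \theta).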

  5. Stochastic Optimisation
     Stochastic optimisation is an efficient way of learning model parameters, typically used in machine learning.
     Stochastic Gradient Ascent (SGA): set a starting value θ_0, a batch size n ≪ N, and step sizes ε_t, then iterate:
     1. Take a subsample S_t of size n from the data.
     2. Estimate the gradient at θ_t by
            \nabla \hat{f}(\theta_t) = \nabla f_0(\theta_t) + \frac{N}{n} \sum_{x_i \in S_t} \nabla f_i(\theta_t)
     3. Set
            \theta_{t+1} = \theta_t + \epsilon_t \nabla \hat{f}(\theta_t)
     There are many ways of speeding up convergence, such as adding a momentum term to step 3:
            \theta_{t+1} = \theta_t + \epsilon_t \nabla \hat{f}(\theta_t) + \gamma(\theta_t - \theta_{t-1})
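     A minimal Python sketch of SGA as described above. It assumes user-supplied functions grad_log_prior(theta) for ∇f_0 and grad_log_lik(batch, theta) for the summed ∇f_i over a minibatch, a NumPy array of observations, and a step-size schedule; these names and the interface are illustrative, not from the slides.

```python
import numpy as np

def sga(grad_log_prior, grad_log_lik, data, theta0, n, n_iters, step_size, momentum=0.0):
    """Stochastic gradient ascent on the log posterior f (sketch).

    grad_log_prior(theta):      gradient of f_0 at theta
    grad_log_lik(batch, theta): sum of the gradients of f_i over the minibatch
    step_size(t):               step size epsilon_t at iteration t
    momentum:                   gamma in the momentum variant (0 gives plain SGA)
    """
    N = len(data)
    theta = np.asarray(theta0, dtype=float)
    prev_theta = theta.copy()
    for t in range(n_iters):
        # 1. subsample S_t of size n without replacement
        batch = data[np.random.choice(N, size=n, replace=False)]
        # 2. unbiased gradient estimate: grad f_0 + (N / n) * sum over the batch
        grad_est = grad_log_prior(theta) + (N / n) * grad_log_lik(batch, theta)
        # 3. ascent step, plus the optional momentum term gamma * (theta_t - theta_{t-1})
        new_theta = theta + step_size(t) * grad_est + momentum * (theta - prev_theta)
        prev_theta, theta = theta, new_theta
    return theta
```

     With momentum = 0 this is the plain update from step 3; a positive value adds the momentum term shown above.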

  6. Stochastic Optimisation
     Robbins-Monro criteria for convergence: if
         \sum_{t=1}^{\infty} \epsilon_t = \infty \quad \text{and} \quad \sum_{t=1}^{\infty} \epsilon_t^2 < \infty,
     then θ_t will converge to a local maximum.
     The step sizes are usually set to ε_t = (αt + β)^{-γ} with γ ∈ (0.5, 1].
     These algorithms only converge to a point estimate of the posterior mode.
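     A schedule of this form can be passed as the step_size argument of the SGA sketch above. The default values of alpha, beta and gamma below are arbitrary illustrations, not taken from the slides.

```python
def make_step_size(alpha=1.0, beta=10.0, gamma=0.55):
    """Polynomially decaying schedule epsilon_t = (alpha * t + beta) ** (-gamma).

    For gamma in (0.5, 1] the Robbins-Monro conditions hold: the step sizes
    sum to infinity while their squares have a finite sum.
    """
    return lambda t: (alpha * t + beta) ** (-gamma)
```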

  7. MCMC
     Many problems for which Bayesian inference would be useful involve non-standard distributions and a large number of parameters, making exact inference challenging. MCMC algorithms aim to generate random samples from the posterior. These samplers construct a Markov chain, often a random walk, which converges to the desired stationary distribution.

  8. Metropolis-Adjusted Langevin Algorithm (MALA)
     The Langevin diffusion describes dynamics which converge to π(θ):
         d\theta(t) = \tfrac{1}{2} \nabla f(\theta(t))\, dt + db(t),
     where b(t) is standard Brownian motion. MALA uses the following discretisation to propose samples:
         \theta_{t+1} = \theta_t + \frac{\sigma^2}{2} \nabla f(\theta_t) + \sigma \eta_t.
     A Metropolis-Hastings accept/reject step is then used to correct the discretisation error, ensuring convergence to the desired stationary distribution.
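     Equivalently (not stated on the slide, but it follows directly from the discretisation), the proposal is Gaussian, which gives the proposal density q that appears in the acceptance probability on the next slide:
         q(\theta^* \mid \theta_t) = N\!\left(\theta^* \;;\; \theta_t + \tfrac{\sigma^2}{2} \nabla f(\theta_t),\; \sigma^2 I\right).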

  9. MALA algorithm
     Set a starting value θ_0 and step size σ². Iterate the following:
     1. Propose
            \theta^* = \theta_t + \frac{\sigma^2}{2} \nabla f(\theta_t) + \sigma \eta_t, \quad \text{where } \eta_t \sim N(0, I)
     2. Accept and set θ_{t+1} = θ* with probability
            a(\theta^*, \theta_t) = \min\left(1, \; \frac{\pi(\theta^*)\, q(\theta_t \mid \theta^*)}{\pi(\theta_t)\, q(\theta^* \mid \theta_t)}\right),
        where q(x | y) is the density of proposing x from the current state y.
     3. If the proposal is rejected, set θ_{t+1} = θ_t.
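     A minimal Python sketch of this algorithm, assuming user-supplied functions log_post(theta) (the log posterior f up to a constant) and grad_log_post(theta) (its gradient), with theta a NumPy array; these names are illustrative, not from the slides.

```python
import numpy as np

def mala(log_post, grad_log_post, theta0, sigma, n_iters):
    """Metropolis-adjusted Langevin algorithm (sketch)."""
    theta = np.asarray(theta0, dtype=float)
    samples = []

    def log_q(x, y):
        # log density (up to a constant) of the Langevin proposal x given current state y
        mean = y + 0.5 * sigma ** 2 * grad_log_post(y)
        return -np.sum((x - mean) ** 2) / (2 * sigma ** 2)

    for _ in range(n_iters):
        # 1. Langevin proposal: half a gradient step plus Gaussian noise
        prop = theta + 0.5 * sigma ** 2 * grad_log_post(theta) \
               + sigma * np.random.standard_normal(theta.shape)
        # 2. Metropolis-Hastings correction for the discretisation error
        log_a = (log_post(prop) + log_q(theta, prop)) - (log_post(theta) + log_q(prop, theta))
        if np.log(np.random.rand()) < log_a:
            theta = prop
        # 3. if rejected, theta is unchanged and the current state is recorded again
        samples.append(theta.copy())
    return np.array(samples)
```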

  10. MALA
      [Figure: MALA output for three step sizes, with the corresponding acceptance rates: σ = 0.03 (a = 0.99), σ = 0.13 (a = 0.57), σ = 0.20 (a = 0.13).]

  11. Stochastic Gradient Langevin Dynamics (SGLD)
      SGLD aims to reduce the computational cost of MALA by replacing the full gradient calculation in the proposal with the stochastic approximation ∇f̂(θ):
          \theta_{t+1} = \theta_t + \frac{\epsilon_t}{2} \nabla \hat{f}(\theta_t) + \sqrt{\epsilon_t}\, \eta_t
      Here, the step sizes ε_t decrease to 0 as in SGA. Since the Metropolis-Hastings acceptance rate tends to 1 as the step size decreases, the costly accept/reject step is omitted.
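      A minimal Python sketch of SGLD under the same hypothetical interface as the SGA sketch above (grad_log_prior, grad_log_lik, a NumPy data array, and a decreasing step-size schedule):

```python
import numpy as np

def sgld(grad_log_prior, grad_log_lik, data, theta0, n, n_iters, step_size):
    """SGLD sketch: stochastic gradient steps with injected Gaussian noise of
    variance epsilon_t, and no Metropolis-Hastings accept/reject step."""
    N = len(data)
    theta = np.asarray(theta0, dtype=float)
    samples = []
    for t in range(n_iters):
        eps = step_size(t)
        # minibatch estimate of the gradient of the log posterior, as in SGA
        batch = data[np.random.choice(N, size=n, replace=False)]
        grad_est = grad_log_prior(theta) + (N / n) * grad_log_lik(batch, theta)
        # Langevin-style update: half a gradient step plus N(0, eps) noise
        theta = theta + 0.5 * eps * grad_est \
                + np.sqrt(eps) * np.random.standard_normal(theta.shape)
        samples.append(theta.copy())
    return np.array(samples)
```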
