SLIDE 1

Approximate Posterior Sampling via Stochastic Optimisation

Connie Trojan
Supervisor: Srshti Putcha
6th September 2019

SLIDES 2-5

Background

  • Large-scale machine learning models rely on stochastic optimisation techniques to learn parameters of interest
  • It is useful to quantify parameter uncertainty using Bayesian inference
  • The Bayesian posterior is usually simulated using Markov chain Monte Carlo (MCMC) sampling algorithms
  • Stochastic gradient MCMC methods combine stochastic optimisation with MCMC to reduce computation time

SLIDE 6

Notation

In the Bayesian approach, the unknown parameter θ is treated as a random variable. The Bayesian posterior distribution π(θ|x) has the form:

$$\pi(\theta \mid x) \propto p(\theta)\,\ell(x \mid \theta) = p(\theta)\prod_{i=1}^{N}\ell(x_i \mid \theta),$$

where:
  • p(θ) is the prior distribution
  • ℓ(x_i|θ) is the likelihood associated with observation i
  • N is the size of the dataset

SLIDE 7

Notation

In particular, gradient-based MCMC algorithms use the log posterior f(θ) to propose moves:

$$f(\theta) = k + f_0(\theta) + \sum_{i=1}^{N} f_i(\theta) \equiv k + \log p(\theta) + \sum_{i=1}^{N} \log \ell(x_i \mid \theta)$$
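
To make the algorithm sketches that follow concrete, here is a minimal NumPy setup for a toy model: a N(0, 10²) prior on a scalar θ and a N(θ, 1) likelihood. This setup is an editorial illustration, not the model used in the talk; the dataset, prior scale, and function names are all assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
theta_true = 0.5
x = rng.normal(theta_true, 1.0, size=10_000)   # toy dataset of size N
N = len(x)

def grad_log_prior(theta):
    """∇f0(θ) for a N(0, 10²) prior."""
    return -theta / 10.0**2

def grad_log_lik(theta, xi):
    """∇fi(θ) for each observation xi under a N(θ, 1) likelihood (vectorised over xi)."""
    return xi - theta
```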

SLIDES 8-13

Stochastic Optimisation

An efficient way of learning model parameters, typically used in machine learning.

Stochastic Gradient Ascent (SGA)
Set starting value θ0, batch size n ≪ N, and step sizes ǫ_t. Iterate:

1. Take a subsample S_t of size n from the data
2. Estimate the gradient at θ_t by
   $$\hat\nabla f(\theta_t) = \nabla f_0(\theta_t) + \frac{N}{n}\sum_{x_i \in S_t} \nabla f_i(\theta_t)$$
3. Set $\theta_{t+1} = \theta_t + \epsilon_t \hat\nabla f(\theta_t)$

There are many ways of speeding up convergence, such as adding a momentum term γ(θ_t − θ_{t−1}) to the update in step 3.
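
A minimal NumPy sketch of the SGA loop above, including the optional momentum term, reusing the toy grad_log_prior/grad_log_lik functions from the earlier notation sketch. The fixed step size, batch size, momentum value, and iteration count are illustrative choices, not taken from the slides.

```python
import numpy as np

def sga(x, grad_log_prior, grad_log_lik, theta0, n=100, eps=1e-4,
        gamma=0.9, iters=2000, rng=None):
    """Stochastic gradient ascent on the log posterior, with a momentum term."""
    rng = rng or np.random.default_rng()
    N = len(x)
    theta, theta_prev = theta0, theta0
    for _ in range(iters):
        batch = x[rng.choice(N, size=n, replace=False)]                # step 1: subsample
        grad_hat = (grad_log_prior(theta)
                    + (N / n) * grad_log_lik(theta, batch).sum())      # step 2: gradient estimate
        theta, theta_prev = (theta + eps * grad_hat
                             + gamma * (theta - theta_prev)), theta    # step 3: update + momentum
    return theta

# e.g. theta_hat = sga(x, grad_log_prior, grad_log_lik, theta0=0.0)
```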

SLIDES 14-16

Stochastic Optimisation

Robbins-Monro criteria for convergence:
  • If $\sum_{t=1}^{\infty}\epsilon_t = \infty$ and $\sum_{t=1}^{\infty}\epsilon_t^2 < \infty$, then θ_t will converge to a local maximum
  • Usually set $\epsilon_t = (\alpha t + \beta)^{-\gamma}$ with γ ∈ (0.5, 1]
  • These algorithms only converge to a point estimate of the posterior mode
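
As a quick sanity check on the schedule, a decay of the form ǫ_t = (αt + β)^(-γ) with γ ∈ (0.5, 1] satisfies both Robbins-Monro conditions; the particular α, β, γ values below are illustrative.

```python
def step_size(t, alpha=0.1, beta=10.0, gamma=0.55):
    """Robbins-Monro-compatible decaying step size ε_t = (α t + β)^(-γ)."""
    # Σ ε_t² < ∞ because 2γ > 1, while Σ ε_t = ∞ because γ ≤ 1.
    return (alpha * t + beta) ** (-gamma)
```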

SLIDE 17

MCMC

Many problems for which Bayesian inference would be useful involve non-standard distributions and a large number of parameters, making exact inference challenging. MCMC algorithms aim to generate random samples from the posterior. These samplers construct a Markov chain, often a random walk, which converges to the desired stationary distribution.

SLIDE 18

Metropolis-Adjusted Langevin Algorithm (MALA)

The Langevin diffusion describes dynamics which converge to π(θ):

$$d\theta(t) = \tfrac{1}{2}\nabla f(\theta(t))\,dt + db(t)$$

MALA uses the following discretisation to propose samples:

$$\theta_{t+1} = \theta_t + \frac{\sigma^2}{2}\nabla f(\theta_t) + \sigma\eta_t$$

A Metropolis-Hastings accept/reject step is then used to correct discretisation errors, ensuring convergence to the desired stationary distribution.

SLIDES 19-22

MALA algorithm

Set starting value θ0 and step size σ². Iterate the following:

1. Set $\theta^* = \theta_t + \frac{\sigma^2}{2}\nabla f(\theta_t) + \sigma\eta_t$, where η_t ∼ N(0, I)
2. Accept and set θ_{t+1} = θ* with probability
   $$a(\theta^*, \theta_t) = \min\left(1,\ \frac{\pi(\theta^*)\,q(\theta_t \mid \theta^*)}{\pi(\theta_t)\,q(\theta^* \mid \theta_t)}\right),$$
   where q(x|y) is the proposal density of θ* = x given θ_t = y
3. If rejected, set θ_{t+1} = θ_t
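
A compact NumPy sketch of this MALA loop for the toy scalar model above. The log-posterior and full-data gradient helpers shown in the comment, and the step size, are illustrative assumptions rather than the talk's implementation.

```python
import numpy as np

def mala(x, log_post, grad_log_post, theta0, sigma=0.1, iters=5000, rng=None):
    """Metropolis-adjusted Langevin algorithm (scalar θ for simplicity)."""
    rng = rng or np.random.default_rng()

    def log_q(to, frm):
        # Log proposal density q(to | frm) of the Langevin proposal.
        mean = frm + 0.5 * sigma**2 * grad_log_post(frm)
        return -0.5 * ((to - mean) / sigma) ** 2

    theta = theta0
    samples = []
    for _ in range(iters):
        prop = theta + 0.5 * sigma**2 * grad_log_post(theta) + sigma * rng.standard_normal()
        log_a = (log_post(prop) - log_post(theta)
                 + log_q(theta, prop) - log_q(prop, theta))   # Metropolis-Hastings correction
        if np.log(rng.uniform()) < log_a:
            theta = prop                                        # accept
        samples.append(theta)                                   # if rejected, keep θ_t
    return np.array(samples)

# For the toy model: log_post = lambda th: -th**2 / 200 - 0.5 * np.sum((x - th)**2)
#                    grad_log_post = lambda th: -th / 100 + np.sum(x - th)
```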

SLIDE 23

MALA

[Figure: MALA samples for step sizes σ = 0.03, 0.13, 0.20, with corresponding acceptance rates a = 0.99, 0.57, 0.13.]

SLIDE 24

Stochastic Gradient Langevin Dynamics (SGLD)

SGLD aims to reduce the computational cost of MALA by replacing the full gradient calculation in the proposal with the stochastic approximation ∇̂f(θ):

$$\theta_{t+1} = \theta_t + \frac{\epsilon_t}{2}\hat\nabla f(\theta_t) + \sqrt{\epsilon_t}\,\eta_t$$

Here, the ǫ_t decrease to 0 as in SGA. Since the Metropolis-Hastings acceptance rate tends to 1 as the step size decreases, the costly accept/reject step is omitted.

SLIDES 25-29

SGLD algorithm

Set starting value θ0, batch size n, and step sizes ǫ_t. Iterate:

1. Take a subsample S_t of size n from the data
2. Estimate the gradient at θ_t by
   $$\hat\nabla f(\theta_t) = \nabla f_0(\theta_t) + \frac{N}{n}\sum_{x_i \in S_t} \nabla f_i(\theta_t)$$
3. Set $\theta_{t+1} = \theta_t + \frac{\epsilon_t}{2}\hat\nabla f(\theta_t) + \sqrt{\epsilon_t}\,\eta_t$, where η_t ∼ N(0, I)

In practice, a fixed step size often works and is far easier to tune.
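
A minimal NumPy sketch of SGLD for the toy model, reusing grad_log_prior and grad_log_lik from the earlier setup. A fixed step size is used, in line with the practical note above; its value, the batch size, and the iteration count are illustrative.

```python
import numpy as np

def sgld(x, grad_log_prior, grad_log_lik, theta0, n=100, eps=1e-5,
         iters=10_000, rng=None):
    """Stochastic gradient Langevin dynamics with a fixed step size."""
    rng = rng or np.random.default_rng()
    N = len(x)
    theta = theta0
    samples = np.empty(iters)
    for t in range(iters):
        batch = x[rng.choice(N, size=n, replace=False)]               # step 1: subsample
        grad_hat = (grad_log_prior(theta)
                    + (N / n) * grad_log_lik(theta, batch).sum())     # step 2: gradient estimate
        theta = (theta + 0.5 * eps * grad_hat
                 + np.sqrt(eps) * rng.standard_normal())              # step 3: Langevin update
        samples[t] = theta
    return samples
```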

SLIDE 30

SGLD

[Figure: SGLD samples for step sizes ǫ = 0.0001, 0.0005, 0.0013, 0.0050.]

SLIDES 31-33

SGLD with Control Variates (SGLD-CV)

  • The gradient estimate in SGLD is simple
  • The variance of the gradient estimator can be reduced using control variates
  • This is achieved by finding θ̂, a value of θ close to the mode, called the centering value. The gradient estimates in the sampler will condition on θ̂

SLIDES 34-36

SGLD-CV

Since
$$\nabla f(\theta_t) = \nabla f(\hat\theta) + \left(\nabla f(\theta_t) - \nabla f(\hat\theta)\right),$$
we can take a subsample S_t of the data and estimate ∇f(θ_t) by
$$\tilde\nabla f(\theta_t) = \nabla f(\hat\theta) + \left(\hat\nabla f(\theta_t) - \hat\nabla f(\hat\theta)\right).$$
Here, ∇̂f is the simple estimate used in SGLD. In full, our new estimate ∇̃f(θ_t) is:
$$\nabla f(\hat\theta) + \left(\nabla f_0(\theta_t) - \nabla f_0(\hat\theta)\right) + \frac{N}{n}\sum_{x_i \in S_t}\left(\nabla f_i(\theta_t) - \nabla f_i(\hat\theta)\right)$$
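
A sketch of this control-variate gradient estimator in NumPy, again using the toy grad_log_prior/grad_log_lik helpers. Precomputing the full-data gradient at θ̂ once up front is the key saving; all names here are illustrative.

```python
import numpy as np

def make_cv_grad(x, grad_log_prior, grad_log_lik, theta_hat):
    """Return a function computing the SGLD-CV gradient estimate ∇̃f(θ) on a batch."""
    # Full-data gradient at the centering value θ̂, computed once.
    full_grad_at_hat = grad_log_prior(theta_hat) + grad_log_lik(theta_hat, x).sum()
    N = len(x)

    def cv_grad(theta, batch):
        n = len(batch)
        return (full_grad_at_hat
                + (grad_log_prior(theta) - grad_log_prior(theta_hat))
                + (N / n) * (grad_log_lik(theta, batch)
                             - grad_log_lik(theta_hat, batch)).sum())

    return cv_grad
```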

SLIDES 37-42

SGLD-CV algorithm

  • Use stochastic optimisation to find θ̂, a value close to a mode
  • Calculate the full gradient ∇f(θ̂)
  • Set starting value θ̂, batch size n, and step sizes ǫ_t. Iterate:

1. Take a subsample S_t of size n from the data
2. Estimate the gradient at θ_t by ∇̃f(θ_t)
3. Set $\theta_{t+1} = \theta_t + \frac{\epsilon_t}{2}\tilde\nabla f(\theta_t) + \sqrt{\epsilon_t}\,\eta_t$, where η_t ∼ N(0, I)
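
Putting the pieces together, a brief sketch of the full SGLD-CV sampler that chains the earlier sga and make_cv_grad sketches; the step size, batch size, and iteration count are illustrative.

```python
import numpy as np

def sgld_cv(x, grad_log_prior, grad_log_lik, n=100, eps=1e-5,
            iters=10_000, rng=None):
    """SGLD with control variates: optimise to find θ̂, then sample starting from it."""
    rng = rng or np.random.default_rng()
    # Stochastic optimisation to find the centering value θ̂ (SGA sketched earlier).
    theta_hat = sga(x, grad_log_prior, grad_log_lik, theta0=0.0, n=n, rng=rng)
    # Full-data gradient at θ̂, wrapped in the control-variate estimator.
    cv_grad = make_cv_grad(x, grad_log_prior, grad_log_lik, theta_hat)
    # SGLD updates started from θ̂, using ∇̃f in place of ∇̂f.
    N, theta = len(x), theta_hat
    samples = np.empty(iters)
    for t in range(iters):
        batch = x[rng.choice(N, size=n, replace=False)]
        theta = (theta + 0.5 * eps * cv_grad(theta, batch)
                 + np.sqrt(eps) * rng.standard_normal())
        samples[t] = theta
    return samples
```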

SLIDE 43

Comparison

[Figure: Kernel Stein Discrepancy against passes through the data for MALA, SGLD (n = 100, 50, 10), and SGLD-CV.]

SLIDE 44

Comparison

[Figure: samples from MALA, SGLD, and SGLD-CV on a multimodal target.]

Comparison of the samplers for a more complicated multimodal target distribution.

Data distribution: $x \sim \tfrac{1}{2}N(\mu_1, \sigma) + \tfrac{1}{2}N(\mu_2, \sigma)$

Each sampler was given 500 passes through the data and 20 passes of burn-in or optimisation.
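
For reference, a one-line simulator for data from a two-component Gaussian mixture of this form; the component means and scale below are placeholders, since the slides do not give the values used.

```python
import numpy as np

def sample_mixture(size, mu1=-1.0, mu2=1.0, sigma=0.5, rng=None):
    """Draw x ~ ½ N(mu1, sigma²) + ½ N(mu2, sigma²)."""
    rng = rng or np.random.default_rng()
    means = rng.choice([mu1, mu2], size=size)   # pick a component with equal probability
    return rng.normal(means, sigma)
```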

SLIDE 45

The Covertype Dataset

The sampling algorithms discussed above were used to fit a binary logistic regression model to the covertype dataset. The aim was to predict the class of tree cover from 54 forest terrain factors.


SLIDE 46

The Covertype Dataset

Features:
  • Elevation (m)
  • Aspect (degrees azimuth)
  • Slope (degrees)
  • Horizontal distance to nearest surface water (m)
  • Vertical distance to nearest surface water (m)
  • Horizontal distance to nearest roadway (m)
  • Hillshade 9am (0-255)
  • Hillshade Noon (0-255)
  • Hillshade 3pm (0-255)
  • Horizontal distance to wildfire ignition points (m)
  • Wilderness area designation x4 (binary)
  • Soil type x40 (binary)

Class (1-7): 1: Spruce/Fir, 2: Lodgepole Pine, 3: Ponderosa Pine, 4: Willow/Cottonwood, 5: Aspen, 6: Douglas Fir, 7: Krummholz

SLIDES 47-49

The Covertype Dataset

The problem was converted to a binary classification problem aiming to separate class 2 from the others. Instead of class, we used the response variable y where:

$$y_i = \begin{cases} 1, & \text{if } \operatorname{class}(x_i) = 2 \\ 0, & \text{else} \end{cases}$$

$$P(y_i = 1 \mid x_i) = \sigma(\beta_0 + \beta^T x_i) \equiv \frac{1}{1 + \exp\left[-(\beta_0 + \beta^T x_i)\right]}$$

The training dataset had 570 000 observations and an additional 10 000 were used to test the model.
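
To plug this model into the samplers above, one needs the per-observation log-likelihood gradients ∇f_i(β). A minimal NumPy sketch follows, under a standard-normal prior on the coefficients; that prior choice is an assumption, as the slides do not state the prior used.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def grad_log_prior(beta):
    """∇f0(β), assuming a N(0, I) prior on the coefficients (illustrative choice)."""
    return -beta

def grad_log_lik(beta, X_batch, y_batch):
    """Sum of per-observation gradients ∇fi(β) for binary logistic regression.

    X_batch includes a leading column of ones for the intercept β0."""
    residual = y_batch - sigmoid(X_batch @ beta)   # y_i - P(y_i = 1 | x_i)
    return X_batch.T @ residual
```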

SLIDE 50

The Covertype Dataset

[Figure: log loss on the test set against passes through the training data, for SGLD and SGLD-CV.]

Performance measure: log loss
$$-\frac{1}{|T|}\sum_{y_i \in T}\left[y_i \log(\hat p_i) + (1 - y_i)\log(1 - \hat p_i)\right]$$
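
A quick NumPy sketch of this log-loss metric, with a small clipping constant added for numerical stability (an implementation detail not mentioned in the slides).

```python
import numpy as np

def log_loss(y_test, p_hat, eps=1e-12):
    """Mean negative log-likelihood of the test labels under predicted probabilities."""
    p_hat = np.clip(p_hat, eps, 1.0 - eps)   # avoid log(0)
    return -np.mean(y_test * np.log(p_hat) + (1.0 - y_test) * np.log(1.0 - p_hat))
```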

SLIDES 51-55

Conclusions and Further Work

  • MALA is very impractical with large datasets
  • SGLD-CV consistently outperforms the other algorithms
  • Tuning SGLD is very difficult: a wide range of step sizes has to be tested, using a metric like kernel Stein discrepancy (KSD) to assess performance
  • SGLD-CV also has a high tuning burden, since both the optimisation and the sampling stages have to be tuned
  • Gradient calculations had to be done by hand, making it difficult to implement more complicated models
  • It is more practical to use automatic differentiation for this (e.g. the sgmcmc package for R)

SLIDE 56

References

  • Gareth O. Roberts and Richard L. Tweedie. Exponential Convergence of Langevin Distributions and Their Discrete Approximations. https://www.jstor.org/stable/3318418
  • Rémi Bardenet, Arnaud Doucet, and Chris Holmes. On Markov chain Monte Carlo methods for tall data. http://jmlr.org/papers/v18/15-205.html
  • Max Welling and Yee W. Teh. Bayesian Learning via Stochastic Gradient Langevin Dynamics. https://www.ics.uci.edu/~welling/publications/papers/stoclangevin_v6.pdf
  • Jack Baker, Paul Fearnhead, Emily B. Fox, and Christopher Nemeth. Control Variates for Stochastic Gradient MCMC. https://arxiv.org/abs/1706.05439

SLIDE 57

Any Questions?
