Approximate Bayesian Computation
Michael Gutmann
https://sites.google.com/site/michaelgutmann
University of Helsinki and Aalto University
1st December 2015

Content — two parts:
1. The basics of approximate Bayesian computation (ABC)
2. ABC algorithms: standard algorithms and recent developments
What is ABC? A set of methods for approximate Bayesian inference which can be used whenever sampling from the model is possible.
Bayesian inference Inference for simulator-based models Recap Simulator-based models
◮ The ingredients for Bayesian parameter inference:
◮ Observed data yo ∈ Y ⊂ Rn
◮ A statistical model for the data generating process: a family of pdfs py|θ
◮ A prior probability density function (pdf) for the parameters θ: pθ
◮ The mechanics of Bayesian inference: the posterior is obtained via Bayes' theorem, pθ|y(θ|yo) ∝ py|θ(yo|θ) pθ(θ)
◮ Often written without subscripts (“function overloading”): p(θ|yo) ∝ p(yo|θ) p(θ)
◮ Likelihood function: L(θ) = p(yo|θ)
◮ For discrete random variables: L(θ) = Pr(y = yo | θ)
◮ For continuous random variables: L(θ) = lim_{ǫ→0} Pr(y ∈ Bǫ(yo) | θ) / Vol(Bǫ(yo)), where Bǫ(yo) is an ǫ-ball around yo
◮ L(θ) indicates to which extent different values of the model parameters are consistent with the observed data
◮ Example: inference of the mean of a Gaussian with unit variance, pdf ∝ (1/√(2π)) exp(−(·)²/2), using a Gaussian prior with standard deviation 4, pdf ∝ (1/√(2π · 4²)) exp(−(·)²/(2 · 4²)).

[Figure: three panels over the mean — prior pdf, likelihood, and posterior pdf — illustrating how prior, likelihood function, posterior, data, and model relate.]
◮ The statistical model was defined via the family of pdfs p(y|θ).
◮ Statistical models can be specified in other ways as well.
◮ In this lecture: models which are specified via a mechanism (a simulator) for generating data.
◮ Example: instead of the pdf p(y|θ) ∝ exp(−(y − θ)²/2), specify the mechanism y = θ + ν, ν ∼ N(0, 1).
◮ Sampling from the model is straightforward. For example: draw ν ∼ N(0, 1) and set y = θ + ν.
◮ Enables direct modeling of how data are generated.
◮ Names for models specified via a data generating mechanism:
◮ Generative models
◮ Implicit models
◮ Stochastic simulation models
◮ Simulator-based models
◮ Astrophysics: simulation of the formation of structure in the universe
◮ Evolutionary biology: simulation of evolutionary scenarios
◮ Health science: simulation of disease transmission dynamics
◮ . . .
Dark matter density simulated by the Illustris collaboration (figure from http://www.illustris-project.org)
◮ Simulation of different hypothesized evolutionary scenarios
◮ Interaction between early modern humans (Homo sapiens) and Neanderthals in Europe
Immigration of modern humans into Europe from the Near East. Light gray: Neanderthal population. Dark: Homo sapiens. The numbers in the figures indicate generations. From Currat and Excoffier, PLoS Biology, 2004, 10.1371/journal.pbio.0020421; see also Pinhasi et al., The genetic history of Europeans, Trends in Genetics, 2012.
◮ Simulation of bacterial transmission dynamics in child day care centers

[Figure: colonization states of the attendees of a day care center over time; each panel shows which individual carries which strain at a given time point.]
◮ Let (Ω, F, P) be a probability space.
◮ A simulator-based model is a collection of (measurable) functions g(·, θ) parametrized by θ: ω ∈ Ω ↦ yθ = g(ω, θ).
◮ For any fixed θ, yθ = g(·, θ) is a random variable.
◮ Direct implementation of hypotheses of how the observed data were generated.
◮ Neat interface with physical or biological models of data.
◮ Modeling by replicating the mechanisms of nature which produced the data.
◮ Possibility to perform experiments in silico.
◮ Generally elude analytical treatment.
◮ Can easily be made more complicated than necessary.
◮ Statistical inference is difficult . . . but possible!
Bayesian inference Inference for simulator-based models Likelihood function Exact inference Approximate inference
◮ For any fixed θ, the output of the simulator yθ = g(·, θ) is a random variable.
◮ Generally, it is impossible to write down the pdf of yθ in closed form:
◮ No closed-form formulae available for p(y|θ).
◮ The simulator defines the model pdfs p(y|θ) implicitly.
◮ The implicit definition of the model pdfs implies an implicit definition of the likelihood function.
◮ For continuous random variables: L(θ) = lim_{ǫ→0} Lǫ(θ), where Lǫ(θ) ∝ Pr(yθ ∈ Bǫ(yo) | θ).
◮ To compute the likelihood function, we need to compute the probability that the simulator generates data equal (or close) to yo.
◮ No analytical expression is available.
◮ But we can empirically test whether simulated data equal yo.
◮ This property will be exploited to perform inference for simulator-based models.
◮ For discrete random variables, we can perform exact Bayesian inference by rejection sampling.
◮ Idea: the posterior is obtained by conditioning p(θ, y) on the event y = yo.
◮ Given tuples (θi, yi) where
◮ θi ∼ pθ
◮ yi = g(ωi, θi), ωi ∼ P
retain those tuples for which yi = yo.
◮ The θi from the retained tuples are samples from the posterior p(θ|yo).
◮ Example: posterior inference of the success probability θ in a Bernoulli model.
◮ Data: yo = 1
◮ Prior: pθ = 1 on (0, 1) (uniform)
◮ Data generating process:
◮ Given θi ∼ pθ
◮ ωi ∼ U(0, 1)
◮ yi = 1 if ωi < θi, and yi = 0 otherwise
◮ Retain those θi for which yi = yo.
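The exact rejection scheme above can be sketched in a few lines; the helper name `sample_posterior` and the trial count are illustrative choices, not from the lecture. For yo = 1 and a uniform prior, the true posterior is Beta(2, 1) with density 2θ on (0, 1) and mean 2/3.

```python
# Exact rejection sampling for the Bernoulli example: prior theta ~ U(0, 1),
# observed data yo = 1, simulator y_i = 1 if omega_i < theta_i else 0.
import random

def sample_posterior(n_trials, yo=1, seed=0):
    rng = random.Random(seed)
    accepted = []
    for _ in range(n_trials):
        theta = rng.random()           # theta_i ~ p_theta = U(0, 1)
        omega = rng.random()           # omega_i ~ U(0, 1)
        y = 1 if omega < theta else 0  # y_i = g(omega_i, theta_i)
        if y == yo:                    # retain tuples with y_i = yo
            accepted.append(theta)
    return accepted

samples = sample_posterior(100_000)
# The sample mean should be close to the true posterior mean 2/3.
print(sum(samples) / len(samples))
```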
◮ The method produces samples from the posterior.
◮ Monte Carlo error when summarizing the samples as an estimated posterior pdf (histogram for N simulated tuples (θi, yi)).

[Figure: estimated posterior pdf (histogram), true posterior pdf, and prior pdf over the success probability for N = 1,000, N = 10,000, and N = 100,000; the estimate approaches the true posterior as N grows.]
◮ The method produces samples from p(θ|yo). This is good.
◮ But it is only applicable to discrete random variables.
◮ And even for discrete random variables, it does not scale:
◮ Reason: the probability of the event yθ = yo becomes smaller as the dimension of the data increases.
◮ Out of N simulated tuples only a small fraction will be retained.
◮ The small number of accepted samples does not represent the posterior well.
◮ Large Monte Carlo errors. This is bad.
◮ Settle for approximate yet computationally feasible inference.
◮ Introduce two types of approximations:
◮ Accept simulated data which are merely close to yo: replace the event yθ = yo with ∆θ ≤ ǫ, where ∆θ is a discrepancy between simulated and observed data.
◮ Measure closeness via summary statistics of the data.
◮ In other words: find parameter values for which the simulated data resemble the observed data.
◮ Defines an approximate likelihood function L̃ǫ(θ) ∝ Pr(∆θ ≤ ǫ | θ).
◮ Example: inference of the mean θ of a Gaussian with unit variance.
◮ For continuous data, Pr(y = yo|θ) = 0.
◮ Discrepancy: ∆θ = (μ̂o − μ̂θ)², where μ̂o and μ̂θ are the sample means (1/n) Σᵢ yᵢ of the observed and simulated data, respectively.

[Figure: realizations of the discrepancy ∆θ as a function of θ, together with its mean and 0.1/0.9 quantiles.]

◮ The discrepancy ∆θ is a random variable.
[Figure: approximate likelihood L̃ǫ(θ) (rescaled) for threshold ǫ = 0.1, compared with the (rescaled) true likelihood; mean and 0.1/0.9 quantiles of the discrepancy shown for reference.]
◮ Here, T(y) = (1/n) Σⁿᵢ₌₁ yᵢ is a sufficient statistic for the inference of the mean.
◮ The only approximation is ǫ > 0.
◮ In general, the summary statistics will not be sufficient, which introduces a further approximation.
◮ The two approximations made yield the rejection algorithm for approximate Bayesian computation (ABC):
◮ Sample θi ∼ pθ, simulate yi, retain θi if ∆θi ≤ ǫ.
◮ This is the basic ABC algorithm.
◮ It produces samples θ ∼ p̃(θ|yo) ∝ L̃ǫ(θ) pθ(θ), i.e. samples from an approximate posterior.
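A minimal sketch of rejection ABC for the Gaussian-mean example. The function names, the flat prior on (0, 4), the tolerance, and the trial count are illustrative assumptions, not prescriptions from the lecture.

```python
# Rejection ABC: summary statistic T(y) = mean(y), discrepancy
# Delta = (T(yo) - T(y_theta))^2, accept when Delta <= eps.
import random

def abc_rejection(yo, prior_sample, simulate, eps, n_trials, seed=0):
    rng = random.Random(seed)
    t_obs = sum(yo) / len(yo)
    accepted = []
    for _ in range(n_trials):
        theta = prior_sample(rng)
        y = simulate(theta, rng, len(yo))
        t_sim = sum(y) / len(y)
        if (t_obs - t_sim) ** 2 <= eps:   # accept when discrepancy <= eps
            accepted.append(theta)
    return accepted

rng0 = random.Random(42)
yo = [2.0 + rng0.gauss(0, 1) for _ in range(50)]   # "observed" data, true mean 2
post = abc_rejection(
    yo,
    prior_sample=lambda r: r.uniform(0, 4),        # flat prior on (0, 4)
    simulate=lambda th, r, n: [th + r.gauss(0, 1) for _ in range(n)],
    eps=0.01, n_trials=20_000,
)
print(len(post), sum(post) / len(post))
```

The accepted samples concentrate around the observed sample mean; shrinking `eps` sharpens the approximation but lowers the acceptance rate.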
Standard algorithms Recent developments Critique of rejection ABC Regression ABC Sequential Monte Carlo ABC
◮ Simulator-based models: models which are specified via a data generating mechanism.
◮ By construction, we can sample from simulator-based models.
◮ Rejection ABC: a trial-and-error scheme to find parameter values for which the simulated data resemble the observed data.
◮ Simulated data resemble the observed data if some discrepancy ∆θ is below a threshold ǫ.
◮ The rejection ABC algorithm works.
◮ But it is computationally inefficient.
◮ The probability of the event ∆θ ≤ ǫ is usually small when ǫ is small, so most simulations are wasted.
◮ In the Gaussian example, the probability of the event ∆θ ≤ ǫ can be computed in closed form:
◮ ∆θ = (μ̂o − μ̂θ)² with μ̂θ ∼ N(θ, 1/n), so that
Pr(∆θ ≤ ǫ | θ) = Φ(√n (μ̂o + √ǫ − θ)) − Φ(√n (μ̂o − √ǫ − θ)),
where Φ(x) = ∫₋∞ˣ (1/√(2π)) exp(−u²/2) du.
◮ For nǫ small: L̃ǫ(θ) ∝ Pr(∆θ ≤ ǫ | θ) ≈ 2√(nǫ) φ(√n (μ̂o − θ)), which is proportional to the true likelihood.
◮ For small ǫ we thus obtain a good approximation of the likelihood.
◮ But for small ǫ, Pr(∆θ ≤ ǫ) ≈ 0: hardly any proposed θ is accepted.

[Figure: approximate likelihood (rescaled) for threshold 0.1 versus the (rescaled) true likelihood, with mean and 0.1/0.9 quantiles of the discrepancy.]
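The closed-form acceptance probability for the Gaussian example can be checked against simulation. This is a sketch under the assumption μ̂θ ∼ N(θ, 1/n); all numeric settings (θ, μ̂o, n, ǫ) are chosen for illustration.

```python
# Check: Pr(Delta <= eps) = Phi(sqrt(n)(mu_o + sqrt(eps) - theta))
#                         - Phi(sqrt(n)(mu_o - sqrt(eps) - theta))
import math, random

def phi(x):  # standard normal cdf via the error function
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def accept_prob_exact(theta, mu_o, n, eps):
    s = math.sqrt(eps)
    return (phi(math.sqrt(n) * (mu_o + s - theta))
            - phi(math.sqrt(n) * (mu_o - s - theta)))

def accept_prob_mc(theta, mu_o, n, eps, trials=100_000, seed=1):
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        mu_theta = rng.gauss(theta, 1.0 / math.sqrt(n))  # sample mean of n draws
        hits += (mu_o - mu_theta) ** 2 <= eps
    return hits / trials

p_exact = accept_prob_exact(theta=2.3, mu_o=2.0, n=10, eps=0.1)
p_mc = accept_prob_mc(theta=2.3, mu_o=2.0, n=10, eps=0.1)
print(p_exact, p_mc)
```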
◮ Two widely used algorithms which improve upon rejection ABC: regression ABC and sequential Monte Carlo (SMC) ABC.
◮ Both use rejection ABC as a building block.
◮ Sequential Monte Carlo (SMC) ABC is also known as population Monte Carlo (PMC) ABC.
◮ Regression ABC consists in running rejection ABC with a relatively large threshold ǫ and then adjusting the accepted samples via a regression of the parameters on the summary statistics.
◮ Sequential Monte Carlo ABC consists in sampling θ from an adaptively constructed proposal distribution instead of the prior, while gradually shrinking the threshold.
◮ The summary statistics tθ = T(yθ) and the parameters θ have a joint distribution.
◮ Let ti be the summary statistics for simulated data yi ∼ p(y|θi).
◮ We can learn a regression model between the summary statistics and the parameters.
◮ The training data for the regression are typically the tuples (θi, ti) accepted by rejection ABC.
◮ Fitting the regression model to the training data (θi, ti) yields an estimate f̂ of the mapping from summary statistics to parameters.
◮ Regression ABC consists in replacing the θi with adjusted values θ*ᵢ = θi + f̂(to) − f̂(ti), where to = T(yo).
◮ This corresponds to an adjustment of the θi.
◮ If the relation between t and θ is learned correctly, the θ*ᵢ are samples from a better approximation of the posterior.
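A minimal sketch of the adjustment for scalar θ and a scalar summary statistic, using a simple linear (least-squares) regression model; the function name and the perfectly linear toy data are illustrative, not from the lecture.

```python
# Linear regression adjustment of ABC samples: theta*_i = theta_i + beta*(t_obs - t_i).
def regression_adjust(thetas, ts, t_obs):
    n = len(thetas)
    t_bar = sum(ts) / n
    th_bar = sum(thetas) / n
    # Least-squares slope of theta on t among the accepted tuples (theta_i, t_i).
    beta = (sum((t - t_bar) * (th - th_bar) for t, th in zip(ts, thetas))
            / sum((t - t_bar) ** 2 for t in ts))
    # Shift each sample to what it "would have been" had its summary
    # statistic matched the observed one.
    return [th + beta * (t_obs - t) for th, t in zip(thetas, ts)]

ts = [0.0, 1.0, 2.0, 3.0]
thetas = [2 * t + 1 for t in ts]   # exactly linear relation, for illustration
adjusted = regression_adjust(thetas, ts, t_obs=1.5)
print(adjusted)   # every sample collapses to 2 * 1.5 + 1 = 4.0
```

With an exactly linear relation the adjusted samples all coincide; with noisy data the adjustment only shifts them toward values compatible with the observed summary statistic.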
◮ We may modify the rejection ABC algorithm and use a proposal distribution φ(θ) instead of the prior to generate the θi.
◮ The retained samples then follow a distribution proportional to φ(θ) L̃ǫ(θ).
◮ Parameters θi weighted with wi ∝ pθ(θi)/φ(θi) correct for the use of the proposal distribution.
◮ This can be used to iteratively morph the prior into a posterior:
◮ Use a sequence of shrinking thresholds ǫt.
◮ Run rejection ABC with ǫ0.
◮ Define φt at iteration t based on the weighted samples from iteration t − 1 (e.g. by perturbing them with a smoothing kernel).
◮ More efficient than rejection ABC: φt(θ) is close to the approximate posterior, so proposed parameters are accepted more often.
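The iteration above can be sketched for the Gaussian-mean toy model. The threshold schedule, Gaussian perturbation kernel, kernel-width heuristic, and population size are all illustrative choices, not prescriptions from the lecture.

```python
# Sketch of SMC (PMC) ABC: rejection ABC from the prior at the first threshold,
# then propose from perturbed weighted particles at shrinking thresholds.
import math, random, statistics

def gauss_pdf(x, m, s):
    return math.exp(-0.5 * ((x - m) / s) ** 2) / (s * math.sqrt(2 * math.pi))

def pmc_abc(t_obs, n_data, eps_schedule, pop_size, seed=0):
    rng = random.Random(seed)
    particles, weights = [], []
    # Iteration 0: plain rejection ABC with the first (large) threshold.
    while len(particles) < pop_size:
        th = rng.uniform(0, 4)                    # theta ~ prior U(0, 4)
        t_sim = sum(th + rng.gauss(0, 1) for _ in range(n_data)) / n_data
        if (t_obs - t_sim) ** 2 <= eps_schedule[0]:
            particles.append(th)
            weights.append(1.0)
    # Later iterations: propose from phi_t (perturbed weighted samples).
    for eps in eps_schedule[1:]:
        tau = 2.0 * statistics.stdev(particles)   # kernel width (heuristic)
        new_particles, new_weights = [], []
        while len(new_particles) < pop_size:
            base = rng.choices(particles, weights=weights)[0]
            th = base + rng.gauss(0, tau)
            if not 0.0 < th < 4.0:
                continue                          # zero prior density
            t_sim = sum(th + rng.gauss(0, 1) for _ in range(n_data)) / n_data
            if (t_obs - t_sim) ** 2 <= eps:
                phi = sum(w * gauss_pdf(th, p, tau)
                          for w, p in zip(weights, particles))
                new_particles.append(th)
                new_weights.append(0.25 / phi)    # prior density / proposal
        particles, weights = new_particles, new_weights
    return particles, weights

parts, ws = pmc_abc(t_obs=2.0, n_data=25,
                    eps_schedule=[1.0, 0.25, 0.05], pop_size=300)
post_mean = sum(w * p for w, p in zip(ws, parts)) / sum(ws)
print(post_mean)
```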
Standard algorithms Recent developments Bayesian optimization for ABC Application
◮ Evaluating ∆θ is computationally costly, and we are only interested in regions where ∆θ tends to be small.
◮ We could increase the computational efficiency by evaluating ∆θ preferentially in those regions.
◮ Idea: use a combination of probabilistic modeling of ∆θ and optimization.

[Figure: realizations of the discrepancy as a function of θ, with mean and 0.1/0.9 quantiles; the discrepancy tends to be small only in a narrow region of the parameter space.]
◮ The approximate likelihood function is L̃ǫ(θ) ∝ Pr(∆θ ≤ ǫ | θ).
◮ If we knew the distribution of ∆θ, we could compute L̃ǫ(θ) without further runs of the simulator.
◮ In recent work, we proposed to learn a model of ∆θ and to use the model to compute the approximate likelihood.
◮ The model is learned more accurately in regions where ∆θ tends to be small (Bayesian optimization).
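The modeling idea can be illustrated with a minimal Gaussian-process surrogate of the discrepancy and a lower-confidence-bound acquisition rule. This is only a sketch (fixed hand-picked kernel parameters, a cheap stand-in for the costly discrepancy), not the actual method proposed in the work referenced here.

```python
# Minimal GP surrogate of Delta(theta) plus one acquisition step.
import numpy as np

def rbf(a, b, scale=1.0, ell=0.5):
    """Squared-exponential kernel between two 1-D arrays of inputs."""
    return scale * np.exp(-0.5 * ((a[:, None] - b[None, :]) / ell) ** 2)

def gp_posterior(x_train, y_train, x_test, noise=0.1):
    """GP posterior mean/variance, with the empirical mean of y as prior mean."""
    y0 = y_train.mean()
    K = rbf(x_train, x_train) + noise * np.eye(len(x_train))
    Ks = rbf(x_train, x_test)
    alpha = np.linalg.solve(K, y_train - y0)
    mean = y0 + Ks.T @ alpha
    var = np.diag(rbf(x_test, x_test) - Ks.T @ np.linalg.solve(K, Ks))
    return mean, var

# A few "costly" discrepancy evaluations; here a cheap stand-in whose minimum
# is near theta = 2, mimicking the discrepancy in the Gaussian example.
rng = np.random.default_rng(0)
x = np.array([0.5, 1.0, 2.5, 3.5])
y = (x - 2.0) ** 2 + 0.1 * rng.standard_normal(len(x))

grid = np.linspace(0.0, 4.0, 200)
mean, var = gp_posterior(x, y, grid)

# Lower-confidence-bound acquisition: evaluate next where the discrepancy is
# plausibly smallest (exploitation via the mean, exploration via the variance).
acq = mean - 2.0 * np.sqrt(np.maximum(var, 0.0))
theta_next = float(grid[np.argmin(acq)])
print(theta_next)
```

In a full loop, the simulator would be run at `theta_next`, the surrogate refitted, and the approximate likelihood derived from the learned model of ∆θ.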
[Figure: illustration of Bayesian optimization of the discrepancy. Starting from a single evaluation of ∆θ, a probabilistic model (belief) of the discrepancy is formed; an acquisition function, trading off exploration and exploitation, determines where to evaluate next; the belief is updated via Bayes' theorem after each new data point (1, 2, 3, ... points), gradually concentrating around the target.]
◮ Inference about bacterial transmission dynamics in child day care centers.

[Figure: colonization states of the attendees of a day care center over time; each panel shows which individual carries which strain at a given time point.]
[Figure: example data from a day care center (DCC). Each square indicates an attendee colonized with a particular strain.]
◮ Simulator-based model: latent continuous-time Markov chain for the transmission dynamics (Numminen et al., Biometrics, 2013).
◮ The model has three parameters:
◮ β: rate of infections within a DCC
◮ Λ: rate of infections outside a DCC
◮ θ: probability to be infected with multiple strains
◮ The likelihood is intractable: data at a single time point only are available (cross-sectional data).
◮ Comparison of the model-based approach with a standard PMC-ABC approach.
◮ Roughly equal results were obtained using 1000 times fewer simulations.
◮ The minimizer of the regression function serves as a point estimate for comparison.

[Figure: inferred values of β versus the log10 number of simulated data sets, for the minimizer of the regression function, the model-based posterior, PMC-ABC, and the final result of Numminen et al. Posterior means: solid lines with markers; credibility intervals: shaded areas or dashed lines.]
◮ Comparison of the model-based approach with a standard PMC-ABC approach (continued).

[Figure: inferred values of Λ and θ versus the log10 number of simulated data sets, for the minimizer of the regression function, the model-based posterior, PMC-ABC, and the final result of Numminen et al. Posterior means are shown as solid lines with markers, credibility intervals as shaded areas or dashed lines.]
◮ The topic was Bayesian inference for models specified via a data generating mechanism (simulator-based models).
◮ Introduced approximate Bayesian computation (ABC).
◮ Principle of ABC: find parameter values which yield simulated data resembling the observed data.
◮ Covered three classical algorithms: rejection ABC, regression ABC, and sequential Monte Carlo ABC.
◮ Introduced recent work which uses Bayesian optimization to increase the computational efficiency of ABC.
◮ Not covered: how to choose the summary statistics / the discrepancy measure.