
Approximate Bayesian Computation Michael Gutmann - PowerPoint PPT Presentation


  1. Approximate Bayesian Computation Michael Gutmann https://sites.google.com/site/michaelgutmann University of Helsinki and Aalto University 1st December 2015

  2. Content
     Two parts:
     1. The basics of approximate Bayesian computation (ABC)
     2. ABC methods used in practice
     What is ABC? A set of methods for approximate Bayesian inference which can be used whenever sampling from the model is possible.
     Michael Gutmann ABC 2 / 47

  3. Part I Basic ABC

  4. Bayesian inference Recap Inference for simulator-based models Simulator-based models
     Recap of Bayesian inference
     ◮ The ingredients for Bayesian parameter inference:
       ◮ Observed data $y_o \in \mathcal{Y} \subset \mathbb{R}^n$
       ◮ A statistical model for the data generating process, $p_{y|\theta}$, parametrized by $\theta \in \Theta \subset \mathbb{R}^d$
       ◮ A prior probability density function (pdf) for the parameters $\theta$, $p_\theta$
     ◮ The mechanics of Bayesian inference:
       $p_{\theta|y}(\theta \mid y_o) \propto p_{y|\theta}(y_o \mid \theta) \times p_\theta(\theta)$  (1)
       posterior $\propto$ likelihood function $\times$ prior  (2)
     ◮ Often written without subscripts ("function overloading"):
       $p(\theta \mid y_o) \propto p(y_o \mid \theta) \times p(\theta)$  (3)

  5. Likelihood function
     ◮ Likelihood function: $L(\theta) = p(y_o \mid \theta)$
     ◮ For discrete random variables:
       $L(\theta) = p(y_o \mid \theta) = \Pr(y = y_o \mid \theta)$  (4)
       Probability that data generated from the model, when using parameter value $\theta$, are equal to $y_o$.
     ◮ For continuous random variables:
       $L(\theta) = p(y_o \mid \theta) = \lim_{\epsilon \to 0} \frac{\Pr(y \in B_\epsilon(y_o) \mid \theta)}{\mathrm{Vol}(B_\epsilon(y_o))}$  (5)
       Proportional to the probability that the generated data are in a small ball $B_\epsilon(y_o)$ around $y_o$.
     ◮ $L(\theta)$ indicates the extent to which different values of the model parameters are consistent with the observed data.

  6. Example
     ◮ Data: $y_o = 2$
     ◮ Model: $p(y \mid \theta) = \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{(y - \theta)^2}{2}\right)$
     ◮ Prior: $p(\theta) = \frac{1}{\sqrt{2\pi \cdot 4^2}} \exp\left(-\frac{\theta^2}{2 \cdot 4^2}\right)$
     [Figure: three panels plotted against the mean on $(-15, 15)$: prior pdf, likelihood function, posterior pdf]
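
The posterior for this example can be checked numerically. The sketch below is an illustration rather than part of the slides: it multiplies likelihood and prior on a grid, normalises with a Riemann sum, and compares the result with the standard closed-form posterior for a Gaussian prior and Gaussian likelihood. The names `normal_pdf`, `grid`, `step`, and the grid resolution are assumptions made for the example.

```python
import math

def normal_pdf(x, mean, std):
    """Density of a Gaussian with the given mean and standard deviation."""
    return math.exp(-0.5 * ((x - mean) / std) ** 2) / (std * math.sqrt(2.0 * math.pi))

y_o = 2.0          # observed data from the slide
prior_std = 4.0    # prior: theta ~ N(0, 4^2)
noise_std = 1.0    # model: y | theta ~ N(theta, 1)

# Unnormalised posterior on a grid over (-15, 15): likelihood times prior.
step = 0.01
grid = [-15.0 + i * step for i in range(3001)]
unnorm = [normal_pdf(y_o, t, noise_std) * normal_pdf(t, 0.0, prior_std) for t in grid]

# Normalise with a Riemann sum so the grid values integrate to one.
evidence = sum(unnorm) * step
posterior = [u / evidence for u in unnorm]
grid_mean = sum(t * p for t, p in zip(grid, posterior)) * step

# Closed-form Gaussian posterior (conjugate result) for comparison.
post_var = 1.0 / (1.0 / prior_std**2 + 1.0 / noise_std**2)
post_mean = post_var * y_o / noise_std**2

print(f"grid mean {grid_mean:.4f}, closed-form mean {post_mean:.4f}, variance {post_var:.4f}")
```

The closed-form mean is $32/17$ and the variance $16/17$, so the posterior sits close to the observation and is much narrower than the prior, in line with the posterior panel of the figure.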

  7. Different kinds of statistical models
     ◮ The statistical model was defined via the family of pdfs $p(y \mid \theta)$.
     ◮ Statistical models can be specified in other ways as well.
     ◮ In this lecture: models which are specified via a mechanism (rule) for generating data
     ◮ Example: Instead of
       $p(y \mid \theta) = \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{(y - \theta)^2}{2}\right)$  (6)
       we could have specified the model via
       $y = z + \theta, \quad z = \sqrt{-2 \log(\omega)} \cos(2\pi\nu)$  (7)
       where $\omega$ and $\nu$ are independent random variables uniformly distributed on $(0, 1)$. Advantage?

  8. Simulator-based models
     ◮ Sampling from the model is straightforward. For example:
       1. Sampling $\omega_i$ and $\nu_i$ from the uniform random variables $\omega$ and $\nu$,
       2. computing the nonlinear transformation
          $y_i = f(\omega_i, \nu_i, \theta) = \theta + \sqrt{-2 \log(\omega_i)} \cos(2\pi\nu_i)$
       produces samples $y_i \sim p(y \mid \theta)$.
     ◮ Enables direct modeling of how data are generated.
     ◮ Names for models specified via a data generating mechanism:
       ◮ Generative models
       ◮ Implicit models
       ◮ Stochastic simulation models
       ◮ Simulator-based models
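
The two sampling steps can be sketched in a few lines of Python. This is a minimal illustration, not code from the lecture: the function name `simulate`, the seed, and the choice $\theta = 3$ are assumptions. Drawing $\omega$ as `1 - rng.random()` keeps it in $(0, 1]$ so that the logarithm is always defined.

```python
import math
import random

def simulate(theta, rng):
    """One draw y = theta + sqrt(-2 log(omega)) * cos(2 pi nu)."""
    omega = 1.0 - rng.random()   # uniform on (0, 1], avoids log(0)
    nu = rng.random()            # uniform on [0, 1)
    z = math.sqrt(-2.0 * math.log(omega)) * math.cos(2.0 * math.pi * nu)
    return theta + z

rng = random.Random(0)           # fixed seed for reproducibility (arbitrary choice)
theta = 3.0
samples = [simulate(theta, rng) for _ in range(100_000)]

# If the samples follow the Gaussian pdf with mean theta and unit variance,
# the empirical mean and variance should be close to theta and 1.
sample_mean = sum(samples) / len(samples)
sample_var = sum((s - sample_mean) ** 2 for s in samples) / (len(samples) - 1)
print(f"mean {sample_mean:.3f}, variance {sample_var:.3f}")
```

The simulator never evaluates a pdf; it only transforms uniform random numbers, which is the defining feature of the model class discussed here.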

  9. Examples
     Simulator-based models are used in:
     ◮ Astrophysics: Simulating the formation of galaxies, stars, or planets
     ◮ Evolutionary biology: Simulating the evolution of life
     ◮ Health science: Simulating the spread of an infectious disease
     ◮ ...
     [Figure: Dark matter density simulated by the Illustris collaboration (from http://www.illustris-project.org)]

  10. Examples (evolutionary biology)
      ◮ Simulation of different hypothesized evolutionary scenarios
      ◮ Interaction between early modern humans (Homo sapiens) and their Neanderthal contemporaries in Europe
      [Figure: Immigration of modern humans into Europe from the Near East. Light gray: Neanderthal population. Dark: Homo sapiens. The numbers in the figures indicate generations. From Currat and Excoffier, PLoS Biology, 2004, 10.1371/journal.pbio.0020421.]
      See also Pinhasi et al., The genetic history of Europeans, Trends in Genetics, 2012.

  11. Examples (health science)
      ◮ Simulation of bacterial transmission dynamics in child day care centers (Numminen et al., Biometrics, 2013)
      [Figure: sequence of panels over time; axes: Individual, Strain]

  12. Formal definition of a simulator-based model
      ◮ Let $(\Omega, \mathcal{F}, P)$ be a probability space.
      ◮ A simulator-based model is a collection of (measurable) functions $g(\cdot, \theta)$ parametrized by $\theta$,
        $\omega \in \Omega \mapsto y = g(\omega, \theta) \in \mathcal{Y}$  (8)
      ◮ For any fixed $\theta$, $y_\theta = g(\cdot, \theta)$ is a random variable.
      [Figure label: Simulation / Sampling]

  13. Advantages of simulator-based models
      ◮ Direct implementation of hypotheses of how the observed data were generated.
      ◮ Neat interface with physical or biological models of data.
      ◮ Modeling by replicating the mechanisms of nature which produced the observed/measured data. ("Analysis by synthesis")
      ◮ Possibility to perform experiments in silico.

  14. Disadvantages of simulator-based models
      ◮ Generally elude analytical treatment.
      ◮ Can be easily made more complicated than necessary.
      ◮ Statistical inference is difficult ... but possible!
      This lecture is about inference for simulator-based models.

  15. Likelihood function Bayesian inference Exact inference Inference for simulator-based models Approximate inference
      Family of pdfs induced by the simulator
      ◮ For any fixed $\theta$, the output of the simulator $y_\theta = g(\cdot, \theta)$ is a random variable.
      ◮ Generally, it is impossible to write down the pdf of $y_\theta$ analytically in closed form.
      ◮ No closed-form formulae available for $p(y \mid \theta)$.
      ◮ Simulator defines the model pdfs $p(y \mid \theta)$ implicitly.

  16. Implicit definition of the model pdfs
      [Figure; surviving label: A]

  17. Implicit definition of the likelihood function
      ◮ The implicit definition of the model pdfs implies an implicit definition of the likelihood function.
      ◮ For discrete random variables: $L(\theta) = \Pr(y = y_o \mid \theta)$, as in (4).

  18. Implicit definition of the likelihood function
      ◮ For continuous random variables: $L(\theta) = \lim_{\epsilon \to 0} L_\epsilon(\theta)$

  19. Implicit definition of the likelihood function
      ◮ To compute the likelihood function, we need to compute the probability that the simulator generates data close to $y_o$,
        $\Pr(y = y_o \mid \theta)$ or $\Pr(y \in B_\epsilon(y_o) \mid \theta)$
      ◮ No analytical expression available.
      ◮ But we can empirically test whether simulated data equals $y_o$ or is in $B_\epsilon(y_o)$.
      ◮ This property will be exploited to perform inference for simulator-based models.
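
The empirical test can be turned into a rough Monte Carlo estimate of the likelihood. The sketch below is an illustration under stated assumptions, not the lecture's method: it uses a one-dimensional Gaussian stand-in simulator ($y = \theta$ plus standard normal noise), takes $B_\epsilon(y_o)$ to be the interval of half-width $\epsilon$ around $y_o$, and divides the hit fraction by the interval length $2\epsilon$. The names `simulate` and `approx_likelihood`, the seed, and the values of `eps` and `n_sim` are hypothetical.

```python
import random

def simulate(theta, rng):
    """Stand-in simulator (assumption): y = theta + standard Gaussian noise."""
    return theta + rng.gauss(0.0, 1.0)

def approx_likelihood(theta, y_o, eps, n_sim, rng):
    """Fraction of simulated y within eps of y_o, divided by the interval length 2*eps."""
    hits = sum(abs(simulate(theta, rng) - y_o) <= eps for _ in range(n_sim))
    return hits / n_sim / (2.0 * eps)

rng = random.Random(1)   # fixed seed for reproducibility (arbitrary choice)
y_o = 2.0

# A parameter value that matches the data should score higher than one far away.
estimate_at_2 = approx_likelihood(theta=2.0, y_o=y_o, eps=0.1, n_sim=200_000, rng=rng)
estimate_at_5 = approx_likelihood(theta=5.0, y_o=y_o, eps=0.1, n_sim=200_000, rng=rng)
print(f"theta = 2: {estimate_at_2:.3f}, theta = 5: {estimate_at_5:.4f}")
```

For this stand-in simulator the true likelihood at $\theta = y_o$ is $1/\sqrt{2\pi} \approx 0.399$, which gives a reference point for the estimate. Smaller `eps` reduces the approximation error but also reduces the number of hits, so more simulations are needed.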

  20. Exact inference for discrete random variables
      ◮ For discrete random variables, we can perform exact Bayesian inference without knowing the likelihood function.
      ◮ Idea: the posterior is obtained by conditioning $p(\theta, y)$ on the event $y = y_o$:
        $p(\theta \mid y_o) = \frac{p(\theta, y_o)}{p(y_o)} = \frac{p(\theta, y = y_o)}{p(y = y_o)}$  (9)
      ◮ Given tuples $(\theta_i, y_i)$ where
        ◮ $\theta_i \sim p_\theta$ (iid from the prior)
        ◮ $y_i = g(\omega_i, \theta_i)$ (obtained by running the simulator)
        retain only those where $y_i = y_o$.
      ◮ The $\theta_i$ from the retained tuples are samples from the posterior $p(\theta \mid y_o)$.
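
One way to see why the retained $\theta_i$ follow the posterior is to write down the density of a prior draw conditional on its tuple being retained. The short derivation below is a sketch that uses only the quantities defined on this slide; the word "retained" for the acceptance event is notation introduced here.

```latex
\[
p(\theta \mid \text{retained})
  = \frac{p(\theta)\,\Pr(y = y_o \mid \theta)}
         {\int p(\theta')\,\Pr(y = y_o \mid \theta')\,\mathrm{d}\theta'}
  = \frac{p(\theta, y = y_o)}{p(y = y_o)}
  = p(\theta \mid y_o)
\]
```

The numerator is the prior density times the probability that the simulator reproduces $y_o$, and the denominator is the overall probability of retaining a tuple, which matches equation (9).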

  21. Example
      ◮ Posterior inference of the success probability $\theta$ in a Bernoulli trial.
      ◮ Data: $y_o = 1$
      ◮ Prior: $p_\theta = 1$ on $(0, 1)$
      ◮ Data generating process:
        ◮ Given $\theta_i \sim p_\theta$
        ◮ $\omega_i \sim U(0, 1)$
        ◮ $y_i = 1$ if $\omega_i < \theta_i$, and $y_i = 0$ otherwise
      ◮ Retain those $\theta_i$ for which $y_i = y_o$.
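
The Bernoulli example can be run directly. The following is a minimal sketch of the retain-if-equal scheme described above; the function name `rejection_sample`, the seed, and the number of draws are assumptions for illustration. With a uniform prior and $y_o = 1$, Bayes' rule gives the posterior density $2\theta$ on $(0, 1)$, whose mean is $2/3$, so the retained samples can be checked against that value.

```python
import random

def rejection_sample(y_o, n_draws, rng):
    """Keep prior draws theta_i whose simulated y_i equals y_o exactly."""
    kept = []
    for _ in range(n_draws):
        theta_i = rng.random()                 # theta_i ~ U(0, 1), the prior
        omega_i = rng.random()                 # omega_i ~ U(0, 1)
        y_i = 1 if omega_i < theta_i else 0    # run the simulator
        if y_i == y_o:
            kept.append(theta_i)
    return kept

rng = random.Random(2)   # fixed seed for reproducibility (arbitrary choice)
n_draws = 100_000
posterior_samples = rejection_sample(y_o=1, n_draws=n_draws, rng=rng)

# About half of the draws should be retained, since Pr(y = 1) = 1/2 under the prior.
acceptance_rate = len(posterior_samples) / n_draws
posterior_mean = sum(posterior_samples) / len(posterior_samples)
print(f"acceptance rate {acceptance_rate:.3f}, posterior mean {posterior_mean:.3f}")
```

No likelihood is evaluated anywhere in the loop: the only operations are sampling from the prior, running the simulator, and comparing the output with the observed data.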
