Approximate Bayesian Computation for Astrostatistics
Jessi Cisewski, Department of Statistics, Yale University
October 24, 2016, SAMSI Undergraduate Workshop
Our goals
- Introduction to Bayesian methods: likelihoods, priors, posteriors
- Learn our ABCs... Approximate Bayesian Computation
Image from http://livingstontownship.org/lycsblog/?p=241
Suppose we randomly select a sample of n women and measure their heights: x1, x2, x3, ..., xn.
Let's assume the heights follow a Normal distribution: X ∼ N(µ, σ), with density

f(x; µ, σ) = (1/√(2πσ²)) exp(−(x − µ)²/(2σ²))
Stellar Initial Mass Function
Probability density function (pdf) → the function f(x; µ, σ), with µ fixed and x variable:

f(x; µ, σ) = (1/√(2πσ²)) exp(−(x − µ)²/(2σ²))

[Figure: density of heights with µ marked; each observation is a draw from this density.]
Likelihood → the same function f(x; µ, σ), now with µ variable and x fixed:

f(x; µ, σ) = (1/√(2πσ²)) exp(−(x − µ)²/(2σ²))

[Figure: likelihood as a function of µ, peaked near the data.]
Consider a random sample of size n (assuming independence, and a known σ): X1, ..., Xn ∼ N(3, 2). The joint likelihood is

f(x1, ..., xn; µ, σ) = ∏_{i=1}^{n} (1/√(2πσ²)) exp(−(xi − µ)²/(2σ²))

[Figure: normalized likelihood in µ for n = 50, 100, 250, 500; the likelihood concentrates around the true µ = 3 as n grows.]
Likelihood Principle
All of the information in a sample is contained in the likelihood function.
The data are modeled by a likelihood function. How do we infer µ, or any unknown parameter(s) θ?
Maximum likelihood estimation

The parameter value θ̂ that maximizes the likelihood:

θ̂ = argmax_θ f(x1, ..., xn; θ)

[Figure: likelihood as a function of µ, with the MLE marked.]

max_µ f(x1, ..., xn; µ, σ) = max_µ ∏_{i=1}^{n} (1/√(2πσ²)) exp(−(xi − µ)²/(2σ²))

Hence, µ̂ = (∑_{i=1}^{n} xi)/n = x̄.
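The closed-form result above can be checked numerically; a minimal sketch (the simulated sample, grid range, and variable names are my own) that maximizes the Normal log-likelihood over a grid of µ values and compares the maximizer with the sample mean:

```python
import numpy as np

rng = np.random.default_rng(0)
sigma = 2.0                        # known standard deviation
x = rng.normal(3.0, sigma, 50)     # simulated sample of n = 50 heights

def log_likelihood(mu, x, sigma):
    """Normal log-likelihood as a function of mu, with x fixed and sigma known."""
    return np.sum(-0.5 * np.log(2 * np.pi * sigma ** 2)
                  - (x - mu) ** 2 / (2 * sigma ** 2))

# Maximize over a grid of candidate mu values.
grid = np.linspace(1.0, 5.0, 2001)
loglik = [log_likelihood(mu, x, sigma) for mu in grid]
mu_hat = grid[np.argmax(loglik)]

# The grid maximizer agrees with the closed-form MLE, the sample mean.
print(mu_hat, x.mean())
```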
Bayesian framework

Suppose θ is our parameter(s) of interest.
- Classical or Frequentist methods for inference consider θ to be fixed and unknown → performance of methods evaluated by repeated sampling → consider all possible data sets
- Bayesian methods consider θ to be random → only the observed data set and prior information are considered
Posterior distribution

π(θ | x) = [Likelihood × Prior] / Evidence = f(x | θ) · π(θ) / f(x)

where f(x) = ∫_Θ f(x | θ) π(θ) dθ, so that

π(θ | x) ∝ f(x | θ) π(θ)

- The prior distribution allows you to "easily" incorporate your beliefs about the parameter(s) of interest
- The posterior is a distribution on the parameter space given the observed data
Data: y1, ..., y4 ∼ N(µ = 3, σ = 2), ȳ = 1.278
Prior: N(µ0 = −3, σ0 = 5)  →  Posterior: N(µ1 = 1.114, σ1 = 0.981)

[Figure: likelihood, prior, and posterior densities in µ.]
Data: y1, ..., y4 ∼ N(µ = 3, σ = 2), ȳ = 1.278
Prior: N(µ0 = −3, σ0 = 1)  →  Posterior: N(µ1 = −0.861, σ1 = 0.707)

[Figure: likelihood, prior, and posterior densities in µ; the tighter prior pulls the posterior toward µ0.]
Data: y1, ..., y200 ∼ N(µ = 3, σ = 2), ȳ = 2.999
Prior: N(µ0 = −3, σ0 = 1)  →  Posterior: N(µ1 = 2.881, σ1 = 0.140)

[Figure: likelihood, prior, and posterior densities in µ; with n = 200 the data dominate the prior.]
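The posterior parameters on these slides follow from the standard conjugate Normal-Normal update with known σ; a small sketch (the function name `normal_posterior` is mine) that reproduces the quoted numbers:

```python
import numpy as np

def normal_posterior(ybar, n, sigma, mu0, sigma0):
    """Conjugate Normal-Normal update with known sigma: combine a
    N(mu0, sigma0) prior with the mean ybar of n draws from N(mu, sigma)."""
    prec = n / sigma ** 2 + 1 / sigma0 ** 2            # posterior precision
    mu1 = (n * ybar / sigma ** 2 + mu0 / sigma0 ** 2) / prec
    return mu1, prec ** -0.5                           # posterior mean, sd

# The two 4-observation slides: same data, different priors.
print(normal_posterior(1.278, 4, 2.0, -3.0, 5.0))   # ~ (1.114, 0.981)
print(normal_posterior(1.278, 4, 2.0, -3.0, 1.0))   # ~ (-0.861, 0.707)
```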
Prior distribution

- The prior distribution allows you to "easily" incorporate your beliefs about the parameter(s) of interest
- If one has a specific prior in mind, then it fits nicely into the definition of the posterior
- But how do you go from prior information to a prior distribution?
- And what if you don't actually have prior information?
- And what if you don't know the likelihood??
Approximate Bayesian Computation

- "Likelihood-free" approach to approximating π(θ | xobs), i.e. f(xobs | θ) is not specified
- Proceeds via simulation of the forward process

Why would we not know f(xobs | θ)?
1. Physical model too complex to write as a closed-form function fθ
2. Strong dependency in data
3. Observational limitations

ABC in Astronomy: Ishida et al. (2015); Akeret et al. (2015); Weyant et al. (2013); Schafer and Freeman (2012); Cameron and Pettitt (2012)
Basic ABC algorithm

For the observed data xobs and prior π(θ):
Algorithm*
1. Sample θprop from the prior π(θ)
2. Generate xprop from the forward process f(x | θprop)
3. Accept θprop if xobs = xprop
4. Return to step 1

*Introduced in Pritchard et al. (1999) (population genetics)
Step 3: Accept θprop if xobs = xprop

- Waiting for proposals such that xobs = xprop would be computationally prohibitive
- Instead, accept proposals with ∆(xobs, xprop) ≤ ε for some distance ∆ and some tolerance threshold ε
- When x is high-dimensional, ε would have to be made too large to keep the acceptance probability reasonable. Instead, reduce the dimension by comparing summaries S(xprop) and S(xobs)
Gaussian illustration

- Data xobs consists of 25 iid draws from Normal(µ, 1)
- Summary statistic: S(x) = x̄
- Distance function: ∆(S(xprop), S(xobs)) = |x̄prop − x̄obs|
- Tolerance: ε = 0.50 and 0.10
- Prior: π(µ) = Normal(0, 10)
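This setup runs as plain rejection ABC; a minimal sketch under the stated prior, summary, and distance (the input value µ = 0.5, the seed, and the number of retained draws are illustrative assumptions, not the talk's settings):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 25
mu_input = 0.5                            # illustrative input value (assumption)
x_obs = rng.normal(mu_input, 1.0, n)
s_obs = x_obs.mean()                      # summary statistic S(x) = xbar

def abc_rejection(s_obs, eps, n_keep=500):
    """Rejection ABC: propose mu from the Normal(0, 10) prior, run the
    forward process (25 draws from Normal(mu, 1)), and keep mu whenever
    |xbar_prop - xbar_obs| <= eps."""
    kept = []
    while len(kept) < n_keep:
        mu_prop = rng.normal(0.0, 10.0)           # step 1: sample from prior
        x_prop = rng.normal(mu_prop, 1.0, n)      # step 2: forward process
        if abs(x_prop.mean() - s_obs) <= eps:     # step 3: compare summaries
            kept.append(mu_prop)
    return np.array(kept)

post = abc_rejection(s_obs, eps=0.10)
# The retained draws approximate the posterior of mu; their mean sits near xbar.
print(post.mean(), post.std())
```

With the wide Normal(0, 10) prior, the true posterior is close to N(x̄, 1/√25), which the retained draws approximate as ε shrinks.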
Gaussian illustration: posteriors for µ

[Figure: ABC posterior vs. true posterior for µ, with the input value marked; left panel tolerance 0.5, right panel tolerance 0.1 (N = 1000, n = 25).]
How to pick a tolerance, ε?

Instead of restarting the ABC algorithm with a smaller tolerance ε, decrease the tolerance and use the already-sampled particle system as a proposal distribution, rather than drawing from the prior distribution.
Particle system: (1) the retained sampled values, (2) importance weights
Beaumont et al. (2009); Moral et al. (2011); Bonassi and West (2004)
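A hedged sketch of this sequential scheme in the style of Beaumont et al. (2009) for the Gaussian illustration; the tolerance schedule, particle count, and kernel-variance choice here are illustrative assumptions, not the talk's exact settings:

```python
import numpy as np

rng = np.random.default_rng(2)
x_obs = rng.normal(0.5, 1.0, 25)
s_obs = x_obs.mean()
N = 500                                  # particle count (assumption)
eps_seq = [1.0, 0.5, 0.25, 0.1]          # decreasing tolerances (assumption)

def simulate_summary(mu):
    """Forward process plus summary: mean of 25 draws from Normal(mu, 1)."""
    return rng.normal(mu, 1.0, 25).mean()

def prior_pdf(mu):
    """Density of the Normal(0, 10) prior."""
    return np.exp(-mu ** 2 / 200.0) / np.sqrt(200.0 * np.pi)

# First step: plain rejection ABC from the prior at the largest tolerance.
particles = []
while len(particles) < N:
    mu = rng.normal(0.0, 10.0)
    if abs(simulate_summary(mu) - s_obs) <= eps_seq[0]:
        particles.append(mu)
particles = np.array(particles)
weights = np.full(N, 1.0 / N)

# Later steps: perturb particles drawn by weight, and reweight by
# prior density over the mixture proposal density (importance weights).
for eps in eps_seq[1:]:
    tau2 = 2.0 * np.cov(particles, aweights=weights)   # kernel variance
    new_p, new_w = [], []
    while len(new_p) < N:
        mu_star = rng.choice(particles, p=weights)
        mu_prop = rng.normal(mu_star, np.sqrt(tau2))
        if abs(simulate_summary(mu_prop) - s_obs) <= eps:
            kern = (np.exp(-(mu_prop - particles) ** 2 / (2.0 * tau2))
                    / np.sqrt(2.0 * np.pi * tau2))
            new_p.append(mu_prop)
            new_w.append(prior_pdf(mu_prop) / np.sum(weights * kern))
    particles = np.array(new_p)
    weights = np.array(new_w) / np.sum(new_w)

print(np.sum(weights * particles))   # weighted posterior mean, near xbar
```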
Gaussian illustration: sequential posteriors

[Figure: sequence of ABC posteriors for µ approaching the true posterior (N = 1000, n = 25).]

Tolerance sequence ε1:10: 1.00, 0.75, 0.53, 0.38, 0.27, 0.19, 0.15, 0.11, 0.08, 0.06
Stellar Initial Mass Function
Stellar Initial Mass Function: the distribution of star masses after a star formation event within a specified volume of space Molecular cloud → Protostars → Stars
Image: adapted from http://www.astro.ljmu.ac.uk
Broken power-law (Kroupa, 2001)

Φ(M) ∝ M^(−αi),  M1i ≤ M ≤ M2i
- α1 = 0.3 for 0.01 ≤ M/MSun* ≤ 0.08  [sub-stellar]
- α2 = 1.3 for 0.08 ≤ M/MSun ≤ 0.50
- α3 = 2.3 for 0.50 ≤ M/MSun ≤ Mmax
Many other models, e.g. Salpeter (1955); Chabrier (2003)

*1 MSun = 1 Solar Mass (the mass of our Sun)
ABC for the Stellar Initial Mass Function

[Figure: image (left) and a histogram of star masses, frequency vs. mass (right).]
Image (left): Adapted from http://www.astro.ljmu.ac.uk
IMF Likelihood

Start with a power-law distribution: each star's mass is independently drawn from a power law with density

f(m) = (1 − α)/(Mmax^(1−α) − Mmin^(1−α)) · m^(−α),  m ∈ (Mmin, Mmax)

Then the likelihood is

L(α | m1:ntot) = [(1 − α)/(Mmax^(1−α) − Mmin^(1−α))]^ntot × ∏_{i=1}^{ntot} mi^(−α)

ntot = total number of stars in the cluster
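A sketch of this truncated power-law density, an inverse-CDF sampler for it, and a grid maximization of the likelihood (the function names, the endpoints Mmin = 2 and Mmax = 60 from the later simulation study, and the grid are my choices):

```python
import numpy as np

def powerlaw_pdf(m, alpha, m_min=2.0, m_max=60.0):
    """Truncated power-law density
    f(m) = (1 - alpha) / (m_max^(1-alpha) - m_min^(1-alpha)) * m^(-alpha)."""
    norm = (1 - alpha) / (m_max ** (1 - alpha) - m_min ** (1 - alpha))
    return norm * m ** (-alpha)

def powerlaw_sample(size, alpha, m_min=2.0, m_max=60.0, rng=None):
    """Draw masses by inverting the truncated power-law CDF."""
    if rng is None:
        rng = np.random.default_rng()
    u = rng.uniform(size=size)
    a, b = m_min ** (1 - alpha), m_max ** (1 - alpha)
    return (a + u * (b - a)) ** (1 / (1 - alpha))

def log_lik(alpha, masses):
    """Log-likelihood of a sample of masses under the truncated power law."""
    return np.sum(np.log(powerlaw_pdf(masses, alpha)))

rng = np.random.default_rng(3)
m = powerlaw_sample(5000, 2.35, rng=rng)
grid = np.linspace(1.5, 3.5, 401)
alpha_hat = grid[np.argmax([log_lik(a, m) for a in grid])]
print(alpha_hat)   # close to the input slope 2.35
```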
Observational limitations: aging

- The lifecycle of a star depends on its mass → more massive stars die faster
- For a cluster of age τ Myr → only observe stars with masses < Tage ≈ τ^(−2/5) × 10^(8/5)
- If the age is 30 Myr, the aging cutoff is Tage ≈ 10 MSun

Then the likelihood is

L(α | m1:nobs, ntot) = [(1 − α)/(Tage^(1−α) − Mmin^(1−α))]^nobs × ∏_{i=1}^{nobs} mi^(−α) × P(M > Tage)^(ntot − nobs)

ntot = # of stars in the cluster; nobs = # of stars observed in the cluster
Image: http://scioly.org
Observational limitations: completeness

Completeness function:
P(observing star | m) =
  0,                           m < Cmin
  (m − Cmin)/(Cmax − Cmin),    m ∈ [Cmin, Cmax]
  1,                           m > Cmax

- The probability of observing a particular star given its mass
- Depends on the flux limit, stellar crowding, etc.
Image: NASA, J. Trauger (JPL), J. Westphal (Caltech)
Observational limitations: measurement error

Incorporating log-normal measurement error gives the final likelihood:

L(α | m1:nobs, ntot) =
  [ P(M > Tage) + (1 − α)/(Mmax^(1−α) − Mmin^(1−α)) · ∫_{Cmin}^{Cmax} M^(−α) (1 − (M − Cmin)/(Cmax − Cmin)) dM ]^(ntot − nobs)
  × ∏_{i=1}^{nobs} ∫_{Mmin}^{Tage} (2πσ²)^(−1/2) mi^(−1) exp( −(log(mi) − log(M))²/(2σ²) )
      × (1 − α)/(Mmax^(1−α) − Mmin^(1−α)) · M^(−α)
      × [ I{M > Cmax} + ((M − Cmin)/(Cmax − Cmin)) · I{Cmin ≤ M ≤ Cmax} ] dM
IMF

[Figure: the IMF density (left) and the mass function with aging, completeness, and measurement error (right).]
Sample size = 1000 stars, [Cmin, Cmax] = [2, 4], σ = 0.25
Simulation Study: forward model

1. Draw from f(m) = (1 − α)/(60^(1−α) − 2^(1−α)) · m^(−α),  m ∈ (2, 60)
2. Aged 30 Myr
3. Observational completeness:
   P(obs | m) = 0 for m < 2;  (m − 2)/2 for m ∈ [2, 4];  1 for m > 4
4. Uncertainty: log M = log m + 0.25η (with η ∼ N(0, 1))
5. Prior: α ∼ U[0, 6]
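The forward process listed above can be sketched directly (the function name, seed, and draw count are mine; each step mirrors the listed ingredients):

```python
import numpy as np

def forward_model(alpha, n_total, rng):
    """Sketch of the slide's forward process for the IMF simulation study."""
    # 1. Draw true masses from f(m) on (2, 60) via the inverse CDF.
    u = rng.uniform(size=n_total)
    a, b = 2.0 ** (1 - alpha), 60.0 ** (1 - alpha)
    m = (a + u * (b - a)) ** (1 / (1 - alpha))
    # 2. Aging over 30 Myr: stars above T_age = 30^(-2/5) * 10^(8/5)
    #    (about 10 MSun) have died and are removed.
    t_age = 30.0 ** (-2.0 / 5.0) * 10.0 ** (8.0 / 5.0)
    m = m[m < t_age]
    # 3. Completeness: observe each star with probability 0 below 2,
    #    (m - 2)/2 on [2, 4], and 1 above 4.
    p_obs = np.clip((m - 2.0) / 2.0, 0.0, 1.0)
    m = m[rng.uniform(size=m.size) < p_obs]
    # 4. Log-normal measurement error: log M = log m + 0.25 * eta.
    return np.exp(np.log(m) + 0.25 * rng.normal(size=m.size))

rng = np.random.default_rng(4)
obs = forward_model(2.35, 1000, rng)
print(obs.size)   # fewer than 1000 stars survive aging and incompleteness
```

In ABC this function plays the role of f(x | θ): it is simulated, never evaluated.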
Simulation Study: summary statistics

We want to account for the following with our summary statistics and distance functions:

1. Shape of the observed mass function:

ρ1(msim, mobs) = [ ∫ ( f̂_{log msim}(x) − f̂_{log mobs}(x) )² dx ]^(1/2)

where f̂ denotes a density estimate of the log masses.
2. Number of stars observed:

ρ2(msim, mobs) = |1 − nsim/nobs|

msim = masses of the stars simulated from the forward model; mobs = masses of the observed stars; nsim = number of stars simulated from the forward model; nobs = number of observed stars
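A sketch of these two distances, using a Gaussian KDE for the density estimates f̂; the KDE choice, the log-mass grid, and the mock samples are illustrative assumptions:

```python
import numpy as np
from scipy.stats import gaussian_kde

def rho1(m_sim, m_obs, grid=None):
    """Shape distance: L2 distance between density estimates of log masses."""
    if grid is None:
        grid = np.linspace(0.0, 3.0, 300)   # log-mass grid (assumption)
    f_sim = gaussian_kde(np.log(m_sim))(grid)
    f_obs = gaussian_kde(np.log(m_obs))(grid)
    dx = grid[1] - grid[0]
    return np.sqrt(np.sum((f_sim - f_obs) ** 2) * dx)

def rho2(m_sim, m_obs):
    """Relative mismatch in the number of observed stars."""
    return abs(1.0 - m_sim.size / m_obs.size)

rng = np.random.default_rng(5)
m_obs = np.exp(rng.normal(1.0, 0.4, 400))      # mock observed masses
m_close = np.exp(rng.normal(1.0, 0.4, 380))    # similar mass function
m_far = np.exp(rng.normal(1.8, 0.4, 380))      # shifted mass function
# A forward simulation matching the observed distribution scores a smaller rho1.
print(rho1(m_close, m_obs), rho1(m_far, m_obs), rho2(m_close, m_obs))
```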
Simulation Study

1. Draw n = 10³ stars
2. IMF slope α = 2.35 with Mmin = 2 and Mmax = 60
3. N = 10³ particles
4. T = 30 sequential time steps

[Figure: observed mass function, density vs. mass.]
[Figure: the IMF (left) and the observed mass function (right); sample size = 1000 stars, [Cmin, Cmax] = [2, 4], σ = 0.25.]
Simulation Study results

1. Draw n = 10³ stars
2. IMF slope α = 2.35 with Mmin = 2 and Mmax = 60
3. N = 10³ particles
4. T = 30 sequential time steps

[Figure (A): observed mass function (log10 M vs. log10 fM) with the posterior median and 95% credible band. Figure (B): initial, intermediate, and final ABC posteriors for α, with the true posterior and the input value.]
Summary

- ABC can be a useful tool when data are too complex to define a reasonable likelihood, e.g. the Stellar Initial Mass Function
- Selection of good summary statistics is crucial for the ABC posterior to be meaningful

THANK YOU!
Bibliography I

Akeret, J., Refregier, A., Amara, A., Seehars, S., and Hasner, C. (2015), "Approximate Bayesian computation for forward modeling in cosmology," Journal of Cosmology and Astroparticle Physics, 2015, 043.
Bate, M. R. (2012), "Stellar, brown dwarf and multiple star properties from a radiation hydrodynamical simulation of star cluster formation," Monthly Notices of the Royal Astronomical Society, 419, 3115-3146.
Beaumont, M. A., Cornuet, J.-M., Marin, J.-M., and Robert, C. P. (2009), "Adaptive approximate Bayesian computation," Biometrika, 96, 983-990.
Bonassi, F. V. and West, M. (2004), "Sequential Monte Carlo with Adaptive Weights for Approximate Bayesian Computation," Bayesian Analysis, 1, 1-19.
Cameron, E. and Pettitt, A. N. (2012), "Approximate Bayesian Computation for Astronomical Model Analysis: A Case Study in Galaxy Demographics and Morphological Transformation at High Redshift," Monthly Notices of the Royal Astronomical Society, 425, 44-65.
Chabrier, G. (2003), "The galactic disk mass function: Reconciliation of the Hubble Space Telescope and nearby determinations," The Astrophysical Journal Letters, 586, L133.
Ishida, E. E. O., Vitenti, S. D. P., Penna-Lima, M., Cisewski, J., de Souza, R. S., Trindade, A. M. M., Cameron, E., and Busti, V. C. (2015), "cosmoabc: Likelihood-free inference via Population Monte Carlo Approximate Bayesian Computation," Astronomy and Computing, 13, 1-11.
Kroupa, P. (2001), "On the variation of the initial mass function," Monthly Notices of the Royal Astronomical Society, 322, 231-246.
Moral, P. D., Doucet, A., and Jasra, A. (2011), "An adaptive sequential Monte Carlo method for approximate Bayesian computation," Statistics and Computing, 22, 1009-1020.
Pritchard, J. K., Seielstad, M. T., and Perez-Lezaun, A. (1999), "Population Growth of Human Y Chromosomes: A Study of Y Chromosome Microsatellites," Molecular Biology and Evolution, 16, 1791-1798.
Salpeter, E. E. (1955), "The luminosity function and stellar evolution," The Astrophysical Journal, 121, 161.
Schafer, C. M. and Freeman, P. E. (2012), Statistical Challenges in Modern Astronomy V, Springer, chap. 1, Lecture Notes in Statistics, pp. 3-19.
Weyant, A., Schafer, C., and Wood-Vasey, W. M. (2013), "Likelihood-free cosmological inference with type Ia supernovae: approximate Bayesian computation for a complete treatment of uncertainty," The Astrophysical Journal, 764, 116.