
SLIDE 1

Approximate Bayesian Computation for Astrostatistics

Jessi Cisewski
Department of Statistics, Yale University
October 24, 2016
SAMSI Undergraduate Workshop

SLIDE 2

Our goals

- Introduction to Bayesian methods: likelihoods, priors, posteriors
- Learn our ABCs... Approximate Bayesian Computation

Image from http://livingstontownship.org/lycsblog/?p=241


SLIDE 3

Suppose we randomly select a sample of n women and measure their heights: x1, x2, x3, . . . , xn. Let's assume height follows a Normal distribution, X ∼ N(µ, σ):

f(x; \mu, \sigma) = \frac{1}{\sqrt{2\pi\sigma^2}} \, e^{-\frac{(x-\mu)^2}{2\sigma^2}}

SLIDE 4

Stellar Initial Mass Function


SLIDE 5

Probability density function (pdf) → the function f(x; µ, σ), with µ fixed and x variable:

f(x; \mu, \sigma) = \frac{1}{\sqrt{2\pi\sigma^2}} \, e^{-\frac{(x-\mu)^2}{2\sigma^2}}

[Figure: Normal density of heights, Height (in) vs. Density, centered at µ_Height]

Each observation is a draw from this distribution.

SLIDE 6

Probability density function (pdf) → the function f(x; µ, σ), with µ fixed and x variable:

f(x; \mu, \sigma) = \frac{1}{\sqrt{2\pi\sigma^2}} \, e^{-\frac{(x-\mu)^2}{2\sigma^2}}

Each observation is a draw from this distribution.

Likelihood → the same function f(x; µ, σ), now with µ variable and x fixed:

f(x; \mu, \sigma) = \frac{1}{\sqrt{2\pi\sigma^2}} \, e^{-\frac{(x-\mu)^2}{2\sigma^2}}

[Figures: the density as a function of x (Height (in) vs. Density) and the likelihood as a function of µ (µ vs. Likelihood), peaked near x̄]

SLIDE 7

Consider a random sample of size n (assuming independence, and a known σ): X1, . . . , Xn ∼ N(3, 2). The likelihood is

f(x; \mu, \sigma) = f(x_1, \ldots, x_n; \mu, \sigma) = \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi\sigma^2}} \, e^{-\frac{(x_i-\mu)^2}{2\sigma^2}}

[Figure: normalized likelihood of µ for n = 50, 100, 250, 500; the curves concentrate around µ = 3 as n grows]
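A minimal sketch in Python (NumPy only; the data are simulated here, not from the deck) of the normalized likelihood curves in the figure above:

```python
import numpy as np

rng = np.random.default_rng(0)
mu_true, sigma = 3.0, 2.0
mu_grid = np.linspace(1.0, 5.0, 401)

for n in (50, 100, 250, 500):
    x = rng.normal(mu_true, sigma, size=n)
    # log-likelihood of mu on a grid (sigma known); additive constants dropped
    loglik = np.array([-0.5 * np.sum((x - m) ** 2) / sigma**2 for m in mu_grid])
    lik = np.exp(loglik - loglik.max())          # normalize so the peak equals 1
    width = np.ptp(mu_grid[lik > 0.5])           # width of the high-likelihood region
    print(n, round(mu_grid[np.argmax(lik)], 2), round(width, 2))  # narrows as n grows
```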

SLIDE 8

Likelihood Principle

All of the information in a sample is contained in the likelihood function, a density or distribution function. The data are modeled by a likelihood function.

How do we infer µ? Or any unknown parameter(s) θ?


SLIDE 9

Maximum likelihood estimation

The parameter value θ that maximizes the likelihood:

\hat{\theta} = \arg\max_{\theta} f(x_1, \ldots, x_n; \theta)

[Figure: likelihood of µ, with the MLE marked at its peak]

\max_{\mu} f(x_1, \ldots, x_n; \mu, \sigma) = \max_{\mu} \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi\sigma^2}} \, e^{-\frac{(x_i-\mu)^2}{2\sigma^2}}

Hence, \hat{\mu} = \frac{\sum_{i=1}^{n} x_i}{n} = \bar{x}
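As a quick check, a sketch (assuming SciPy is available) that maximizes the likelihood numerically and recovers the closed form µ̂ = x̄:

```python
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(1)
x = rng.normal(3.0, 2.0, size=100)   # simulated sample; sigma = 2 known
sigma = 2.0

# negative log-likelihood in mu (additive constants dropped)
nll = lambda mu: 0.5 * np.sum((x - mu) ** 2) / sigma**2

res = minimize_scalar(nll, bounds=(0.0, 6.0), method="bounded")
print(res.x, x.mean())   # the numerical MLE agrees with mu_hat = x_bar
```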

SLIDE 10

Bayesian framework

Suppose θ is our parameter(s) of interest.

- Classical or frequentist methods for inference consider θ to be fixed and unknown → performance of methods is evaluated by repeated sampling → considers all possible data sets
- Bayesian methods consider θ to be random → considers only the observed data set and prior information

SLIDE 11

Posterior distribution

\pi(\theta \mid \overbrace{x}^{\text{Data}}) = \frac{\overbrace{f(x \mid \theta)}^{\text{Likelihood}} \cdot \overbrace{\pi(\theta)}^{\text{Prior}}}{f(x)}, \qquad f(x) = \int_{\Theta} f(x \mid \theta)\, \pi(\theta)\, d\theta

so that \pi(\theta \mid x) \propto f(x \mid \theta)\, \pi(\theta).

The prior distribution allows you to "easily" incorporate your beliefs about the parameter(s) of interest. The posterior is a distribution on the parameter space given the observed data.

SLIDE 12

Data: y1, . . . , y4 ∼ N(µ = 3, σ = 2), ȳ = 1.278
Prior: N(µ0 = −3, σ0 = 5)
Posterior: N(µ1 = 1.114, σ1 = 0.981)

[Figure: likelihood, prior, and posterior densities for µ]
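These values follow from the standard Normal–Normal conjugate update with known σ; a quick Python sketch reproduces the numbers on this slide and the two that follow:

```python
import numpy as np

def normal_posterior(ybar, n, sigma, mu0, sigma0):
    """Conjugate Normal-Normal update for mu with known sigma."""
    prec = 1 / sigma0**2 + n / sigma**2                    # posterior precision
    mu1 = (mu0 / sigma0**2 + n * ybar / sigma**2) / prec   # posterior mean
    return mu1, np.sqrt(1 / prec)

print(normal_posterior(1.278, 4, 2, -3, 5))    # ~ (1.114, 0.981), SLIDE 12
print(normal_posterior(1.278, 4, 2, -3, 1))    # ~ (-0.861, 0.707), SLIDE 13
print(normal_posterior(2.999, 200, 2, -3, 1))  # ~ (2.881, 0.140), SLIDE 14
```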

SLIDE 13

Data: y1, . . . , y4 ∼ N(µ = 3, σ = 2), ȳ = 1.278
Prior: N(µ0 = −3, σ0 = 1)
Posterior: N(µ1 = −0.861, σ1 = 0.707)

[Figure: likelihood, prior, and posterior densities for µ; the tighter prior pulls the posterior toward µ0]

SLIDE 14

Data: y1, . . . , y200 ∼ N(µ = 3, σ = 2), ȳ = 2.999
Prior: N(µ0 = −3, σ0 = 1)
Posterior: N(µ1 = 2.881, σ1 = 0.140)

[Figure: likelihood, prior, and posterior densities for µ; with n = 200 the data overwhelm the prior]

SLIDE 15

Prior distribution

The prior distribution allows you to "easily" incorporate your beliefs about the parameter(s) of interest. If one has a specific prior in mind, then it fits nicely into the definition of the posterior.

But how do you go from prior information to a prior distribution? And what if you don't actually have prior information?

SLIDE 16

Prior distribution

The prior distribution allows you to "easily" incorporate your beliefs about the parameter(s) of interest. If one has a specific prior in mind, then it fits nicely into the definition of the posterior.

But how do you go from prior information to a prior distribution? And what if you don't actually have prior information?

And what if you don't know the likelihood??

SLIDE 17

Approximate Bayesian Computation

A "likelihood-free" approach to approximating π(θ | xobs), i.e. f(xobs | θ) is not specified. It proceeds via simulation of the forward process.

Why would we not know f(xobs | θ)?
1. Physical model too complex to write as a closed-form function f_θ
2. Strong dependency in data
3. Observational limitations

ABC in astronomy: Ishida et al. (2015); Akeret et al. (2015); Weyant et al. (2013); Schafer and Freeman (2012); Cameron and Pettitt (2012)

SLIDE 18

[Image-only slide]

SLIDE 19

Basic ABC algorithm

For the observed data xobs and prior π(θ):

Algorithm∗
1. Sample θ_prop from the prior π(θ)
2. Generate x_prop from the forward process f(x | θ_prop)
3. Accept θ_prop if x_prop = xobs
4. Return to step 1

∗Introduced in Pritchard et al. (1999) (population genetics)
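A minimal sketch of steps 1–4 in Python; `prior_sample` and `forward_process` are hypothetical placeholders for the user's model, and the toy Binomial example is for illustration only (exact matching is feasible only for low-dimensional discrete data, as the next slide explains):

```python
import numpy as np

rng = np.random.default_rng(2)

def abc_rejection(x_obs, prior_sample, forward_process, n_accept):
    """Basic rejection ABC: keep proposals whose simulated data equal x_obs."""
    accepted = []
    while len(accepted) < n_accept:
        theta = prior_sample()            # step 1: sample from the prior
        x = forward_process(theta)        # step 2: simulate from the forward process
        if x == x_obs:                    # step 3: accept on an exact match
            accepted.append(theta)
        # step 4: repeat
    return np.array(accepted)

# Toy example: x ~ Binomial(10, theta) with a U(0, 1) prior on theta
post = abc_rejection(x_obs=7,
                     prior_sample=lambda: rng.uniform(0, 1),
                     forward_process=lambda th: rng.binomial(10, th),
                     n_accept=500)
print(post.mean())  # ~ 8/12 = 0.667, the mean of the exact Beta(8, 4) posterior
```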

SLIDE 20

Step 3: Accept θ_prop if x_prop = xobs

Waiting for proposals such that x_prop = xobs would be computationally prohibitive. Instead, accept proposals with

∆(xobs, x_prop) ≤ ε

for some distance ∆ and some tolerance threshold ε.

When x is high-dimensional, ε would have to be made too large to keep the acceptance probability reasonable. Instead, reduce the dimension by comparing summaries S(x_prop) and S(xobs).

SLIDE 21

Gaussian illustration

- Data xobs: 25 iid draws from Normal(µ, 1)
- Summary statistic: S(x) = x̄
- Distance function: ∆(S(x_prop), S(xobs)) = |x̄_prop − x̄_obs|
- Tolerances: ε = 0.50 and 0.10
- Prior: π(µ) = Normal(0, 10)
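A sketch of this setup in Python (treating the second argument of Normal(0, 10) as a standard deviation, consistent with the deck's N(µ, σ) notation; the "observed" data are simulated for illustration):

```python
import numpy as np

rng = np.random.default_rng(3)
x_obs = rng.normal(0.5, 1.0, size=25)    # stand-in for the observed data
s_obs = x_obs.mean()                     # summary statistic S(x) = x_bar

def abc_gaussian(eps, n_accept=1000):
    accepted = []
    while len(accepted) < n_accept:
        mu = rng.normal(0, 10)                   # prior pi(mu) = Normal(0, 10)
        x = rng.normal(mu, 1.0, size=25)         # forward process
        if abs(x.mean() - s_obs) <= eps:         # distance on the summaries
            accepted.append(mu)
    return np.array(accepted)

for eps in (0.50, 0.10):
    post = abc_gaussian(eps)
    print(eps, post.mean(), post.std())  # smaller eps -> closer to the true posterior
```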

SLIDE 22

Gaussian illustration: posteriors for µ

[Figure: ABC posterior vs. true posterior for µ, with the input value marked; left panel tolerance 0.5, right panel tolerance 0.1 (N = 1000, n = 25). The smaller tolerance brings the ABC posterior closer to the true posterior.]

SLIDE 23

How to pick a tolerance, ε?

Instead of restarting the ABC algorithm with a smaller tolerance ε, decrease the tolerance sequentially and use the already-sampled particle system as a proposal distribution, rather than drawing from the prior distribution. The sketch after this slide illustrates the idea.

Particle system: (1) the retained sampled values, (2) importance weights.

Beaumont et al. (2009); Del Moral et al. (2011); Bonassi and West (2004)
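A compact sketch of the sequential (population Monte Carlo) idea in the spirit of Beaumont et al. (2009), continuing the Gaussian illustration; the Gaussian perturbation kernel with twice the weighted particle variance is one common choice, not necessarily the one behind these slides:

```python
import numpy as np

rng = np.random.default_rng(4)
x_obs = rng.normal(0.5, 1.0, size=25)    # stand-in for the observed data
s_obs = x_obs.mean()
N = 1000
eps_seq = [1.00, 0.75, 0.53, 0.38, 0.27, 0.19, 0.15, 0.11, 0.08, 0.06]

prior_logpdf = lambda mu: -0.5 * (mu / 10) ** 2   # N(0, 10) prior, constants dropped

particles = weights = tau = None
for t, eps in enumerate(eps_seq):
    new = np.empty(N)
    for i in range(N):
        while True:
            if t == 0:
                mu = rng.normal(0, 10)            # first step: draw from the prior
            else:                                  # later: perturb a weighted particle
                mu = rng.choice(particles, p=weights) + tau * rng.normal()
            x = rng.normal(mu, 1.0, size=25)       # forward process
            if abs(x.mean() - s_obs) <= eps:
                new[i] = mu
                break
    if t == 0:
        w = np.full(N, 1.0 / N)
    else:
        # importance weight: prior / (weighted mixture of perturbation kernels);
        # normalizing constants cancel when w is renormalized below
        kern = np.exp(-0.5 * ((new[:, None] - particles[None, :]) / tau) ** 2)
        w = np.exp(prior_logpdf(new)) / (kern @ weights)
    w /= w.sum()
    particles, weights = new, w
    m = np.average(particles, weights=weights)
    tau = np.sqrt(2 * np.average((particles - m) ** 2, weights=weights))
    print(f"eps={eps:.2f}  posterior mean ~ {m:.3f}")
```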

SLIDE 24

Gaussian illustration: sequential posteriors

[Figure: sequence of ABC posteriors for µ closing in on the true posterior, with the input value marked (N = 1000, n = 25)]

Tolerance sequence, ε1:10: 1.00, 0.75, 0.53, 0.38, 0.27, 0.19, 0.15, 0.11, 0.08, 0.06

SLIDE 25

Stellar Initial Mass Function


SLIDE 26

Stellar Initial Mass Function: the distribution of star masses after a star formation event within a specified volume of space.

Molecular cloud → Protostars → Stars

Image: adapted from http://www.astro.ljmu.ac.uk


SLIDE 27

Broken power-law

(Kroupa, 2001)

\Phi(M) \propto M^{-\alpha_i}, \quad M_{1,i} \le M \le M_{2,i}

- α1 = 0.3 for 0.01 ≤ M/M_Sun∗ ≤ 0.08 [sub-stellar]
- α2 = 1.3 for 0.08 ≤ M/M_Sun ≤ 0.50
- α3 = 2.3 for 0.50 ≤ M/M_Sun ≤ M_max

Many other models, e.g. Salpeter (1955); Chabrier (2003)

∗1 M_Sun = 1 solar mass (the mass of our Sun)

SLIDE 28

ABC for the Stellar Initial Mass Function

[Figure: image of a star-forming region (left) and a histogram of the resulting star masses, Mass vs. Frequency (right)]

Image (left): Adapted from http://www.astro.ljmu.ac.uk

SLIDE 29

IMF Likelihood

Start with a power-law distribution: each star's mass is independently drawn from a power-law distribution with density

f(m) = \frac{1-\alpha}{M_{max}^{1-\alpha} - M_{min}^{1-\alpha}} \, m^{-\alpha}, \quad m \in (M_{min}, M_{max})

Then the likelihood is

L(\alpha \mid m_{1:n_{tot}}) = \left( \frac{1-\alpha}{M_{max}^{1-\alpha} - M_{min}^{1-\alpha}} \right)^{n_{tot}} \times \prod_{i=1}^{n_{tot}} m_i^{-\alpha}

n_tot = total number of stars in the cluster
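A sketch of this model in Python: inverse-CDF sampling from the truncated power law (a standard technique, valid for α ≠ 1) together with the log of the likelihood above:

```python
import numpy as np

def sample_powerlaw(alpha, m_min, m_max, n, rng):
    """Inverse-CDF draws from f(m) ∝ m^(-alpha) on (m_min, m_max), alpha != 1."""
    u = rng.uniform(size=n)
    a1 = 1.0 - alpha
    return (m_min**a1 + u * (m_max**a1 - m_min**a1)) ** (1.0 / a1)

def loglik(alpha, m, m_min, m_max):
    """Log of L(alpha | m_1:ntot) from this slide."""
    a1 = 1.0 - alpha
    log_norm = np.log(a1 / (m_max**a1 - m_min**a1))
    return m.size * log_norm - alpha * np.sum(np.log(m))

rng = np.random.default_rng(5)
m = sample_powerlaw(2.35, 2.0, 60.0, 1000, rng)
grid = np.linspace(1.5, 3.5, 201)
print(grid[np.argmax([loglik(a, m, 2.0, 60.0) for a in grid])])  # ~ 2.35
```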

SLIDE 30

Observational limitations: aging

The lifecycle of a star depends on its mass → more massive stars die faster. For a cluster age of τ Myr, we only observe stars with masses below T_age ≈ τ^{−2/5} × 10^{8/5}. If the age is τ = 30 Myr, the aging cutoff is T_age ≈ 30^{−2/5} × 10^{8/5} ≈ 0.26 × 39.8 ≈ 10 M_Sun.

Then the likelihood is

L(\alpha \mid m_{1:n_{obs}}, n_{tot}) = \left( \frac{1-\alpha}{T_{age}^{1-\alpha} - M_{min}^{1-\alpha}} \right)^{n_{obs}} \prod_{i=1}^{n_{obs}} m_i^{-\alpha} \times P(M > T_{age})^{n_{tot} - n_{obs}}

n_tot = number of stars in the cluster
n_obs = number of stars observed in the cluster

Image: http://scioly.org

SLIDE 31

Observational limitations: completeness

Completeness function:

P(\text{observing star} \mid m) = \begin{cases} 0, & m < C_{min} \\ \dfrac{m - C_{min}}{C_{max} - C_{min}}, & m \in [C_{min}, C_{max}] \\ 1, & m > C_{max} \end{cases}

This is the probability of observing a particular star given its mass; it depends on the flux limit, stellar crowding, etc.

Image: NASA, J. Trauger (JPL), J. Westphal (Caltech)

SLIDE 32

Observational limitations: measurement error

Incorporating log-normal measurement error gives our final likelihood:

L(\alpha \mid m_{1:n_{obs}}, n_{tot}) = \left[ P(M > T_{age}) + \frac{1-\alpha}{M_{max}^{1-\alpha} - M_{min}^{1-\alpha}} \int_{C_{min}}^{C_{max}} M^{-\alpha} \left( 1 - \frac{M - C_{min}}{C_{max} - C_{min}} \right) dM \right]^{n_{tot} - n_{obs}}
\times \prod_{i=1}^{n_{obs}} \int_{M_{min}}^{T_{age}} (2\pi\sigma^2)^{-\frac{1}{2}} \, m_i^{-1} \, e^{-\frac{1}{2\sigma^2}\left(\log(m_i) - \log(M)\right)^2} \, \frac{1-\alpha}{M_{max}^{1-\alpha} - M_{min}^{1-\alpha}} \, M^{-\alpha} \left[ I\{M > C_{max}\} + \frac{M - C_{min}}{C_{max} - C_{min}} \, I\{C_{min} \le M \le C_{max}\} \right] dM
SLIDE 33

IMF

[Figure: two histograms of Mass vs. Density: the full IMF, and the mass function after aging, completeness, and measurement error are applied]

Sample size = 1000 stars, [C_min, C_max] = [2, 4], σ = 0.25

SLIDE 34

Simulation Study: forward model

- Draw from f(m) = \frac{1-\alpha}{60^{1-\alpha} - 2^{1-\alpha}} \, m^{-\alpha}, \quad m \in (2, 60)
- Age the cluster 30 Myr
- Observational completeness: P(obs | m) = 0 for m < 2; (m − 2)/2 for m ∈ [2, 4]; 1 for m > 4
- Uncertainty: log M = log m + 0.25η (with η ∼ N(0, 1))
- Prior: α ∼ U[0, 6]∗
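A sketch of this forward process under the stated settings; the aging cutoff T_age ≈ 10 M_Sun follows from the SLIDE 30 formula, and the helper repeats the inverse-CDF sampler from the SLIDE 29 example:

```python
import numpy as np

def sample_powerlaw(alpha, m_min, m_max, n, rng):
    # inverse-CDF draws from f(m) ∝ m^(-alpha) on (m_min, m_max)
    u = rng.uniform(size=n)
    a1 = 1.0 - alpha
    return (m_min**a1 + u * (m_max**a1 - m_min**a1)) ** (1.0 / a1)

def forward_model(alpha, n, rng, m_min=2.0, m_max=60.0, tau=30.0,
                  c_min=2.0, c_max=4.0, sigma=0.25):
    """Simulate an observed mass function for IMF slope alpha."""
    m = sample_powerlaw(alpha, m_min, m_max, n, rng)      # cluster masses
    t_age = tau ** (-2 / 5) * 10 ** (8 / 5)               # aging cutoff, ~10 M_Sun
    m = m[m < t_age]                                      # massive stars have died
    p_obs = np.clip((m - c_min) / (c_max - c_min), 0, 1)  # completeness function
    m = m[rng.uniform(size=m.size) < p_obs]               # thin by completeness
    return m * np.exp(sigma * rng.normal(size=m.size))    # log-normal error

rng = np.random.default_rng(6)
m_obs = forward_model(2.35, 1000, rng)
print(m_obs.size, round(m_obs.min(), 2), round(m_obs.max(), 2))
```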

SLIDE 35

Simulation Study: summary statistics

We want to account for the following with our summary statistics and distance functions:

1. Shape of the observed mass function:

\rho_1(m_{sim}, m_{obs}) = \left( \int \left( \hat{f}_{\log m_{sim}}(x) - \hat{f}_{\log m_{obs}}(x) \right)^2 dx \right)^{1/2}

SLIDE 36
2. Number of stars observed:

\rho_2(m_{sim}, m_{obs}) = \left| 1 - n_{sim}/n_{obs} \right|

m_sim = masses of the stars simulated from the forward model
m_obs = masses of the observed stars
n_sim = number of stars simulated from the forward model
n_obs = number of observed stars
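A sketch of both distances in Python; the deck does not specify the density estimator f̂, so a Gaussian KDE (SciPy) on a fixed grid is assumed here:

```python
import numpy as np
from scipy.stats import gaussian_kde

def rho1(m_sim, m_obs, grid=np.linspace(0.0, 3.0, 300)):
    """L2 distance between density estimates of the log masses (shape)."""
    f_sim = gaussian_kde(np.log(m_sim))(grid)
    f_obs = gaussian_kde(np.log(m_obs))(grid)
    dx = grid[1] - grid[0]
    return np.sqrt(np.sum((f_sim - f_obs) ** 2) * dx)

def rho2(m_sim, m_obs):
    """Relative difference in the number of stars observed."""
    return abs(1 - m_sim.size / m_obs.size)

# Usage: compare a forward-model draw (see the SLIDE 34 sketch) against m_obs:
# print(rho1(m_sim, m_obs), rho2(m_sim, m_obs))
```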

SLIDE 37

Simulation Study

1. Draw n = 10³ stars
2. IMF slope α = 2.35 with M_min = 2 and M_max = 60
3. N = 10³ particles
4. T = 30 sequential time steps

[Figure: histogram of the observed mass function, Mass vs. Density]

SLIDE 38

[Figure: histograms of the full IMF and the observed MF, Mass vs. Density]

Sample size = 1000 stars, [C_min, C_max] = [2, 4], σ = 0.25

SLIDE 39

Simulation Study results

1. Draw n = 10³ stars
2. IMF slope α = 2.35 with M_min = 2 and M_max = 60
3. N = 10³ particles
4. T = 30 sequential time steps

[Figure (A): the observed mass function, Log10(M) vs. Log10(fM), with 95% credible band and posterior median. Figure (B): posteriors for α, showing the initial, intermediate, and final ABC posteriors against the true posterior and the input value.]

SLIDE 40

Summary

- ABC can be a useful tool when data are too complex to define a reasonable likelihood, e.g. the Stellar Initial Mass Function
- Selection of good summary statistics is crucial for the ABC posterior to be meaningful

SLIDE 41

Summary

- ABC can be a useful tool when data are too complex to define a reasonable likelihood, e.g. the Stellar Initial Mass Function
- Selection of good summary statistics is crucial for the ABC posterior to be meaningful

THANK YOU!

SLIDE 42

Bibliography I

Akeret, J., Refregier, A., Amara, A., Seehars, S., and Hasner, C. (2015), "Approximate Bayesian computation for forward modeling in cosmology," Journal of Cosmology and Astroparticle Physics, 2015, 043.
Bate, M. R. (2012), "Stellar, brown dwarf and multiple star properties from a radiation hydrodynamical simulation of star cluster formation," Monthly Notices of the Royal Astronomical Society, 419, 3115–3146.
Beaumont, M. A., Cornuet, J.-M., Marin, J.-M., and Robert, C. P. (2009), "Adaptive approximate Bayesian computation," Biometrika, 96, 983–990.
Bonassi, F. V. and West, M. (2004), "Sequential Monte Carlo with Adaptive Weights for Approximate Bayesian Computation," Bayesian Analysis, 1, 1–19.
Cameron, E. and Pettitt, A. N. (2012), "Approximate Bayesian Computation for Astronomical Model Analysis: A Case Study in Galaxy Demographics and Morphological Transformation at High Redshift," Monthly Notices of the Royal Astronomical Society, 425, 44–65.
Chabrier, G. (2003), "The galactic disk mass function: Reconciliation of the Hubble Space Telescope and nearby determinations," The Astrophysical Journal Letters, 586, L133.
Del Moral, P., Doucet, A., and Jasra, A. (2011), "An adaptive sequential Monte Carlo method for approximate Bayesian computation," Statistics and Computing, 22, 1009–1020.
Ishida, E. E. O., Vitenti, S. D. P., Penna-Lima, M., Cisewski, J., de Souza, R. S., Trindade, A. M. M., Cameron, E., and Busti, V. C. (2015), "cosmoabc: Likelihood-free inference via Population Monte Carlo Approximate Bayesian Computation," Astronomy and Computing, 13, 1–11.
Kroupa, P. (2001), "On the variation of the initial mass function," Monthly Notices of the Royal Astronomical Society, 322, 231–246.
Pritchard, J. K., Seielstad, M. T., and Perez-Lezaun, A. (1999), "Population Growth of Human Y Chromosomes: A Study of Y Chromosome Microsatellites," Molecular Biology and Evolution, 16, 1791–1798.
Salpeter, E. E. (1955), "The luminosity function and stellar evolution," The Astrophysical Journal, 121, 161.
Schafer, C. M. and Freeman, P. E. (2012), Statistical Challenges in Modern Astronomy V, Springer, chap. 1, Lecture Notes in Statistics, pp. 3–19.
Weyant, A., Schafer, C., and Wood-Vasey, W. M. (2013), "Likelihood-free cosmological inference with type Ia supernovae: approximate Bayesian computation for a complete treatment of uncertainty," The Astrophysical Journal, 764, 116.