Approximate Bayesian Computation for Astrostatistics
Jessi Cisewski, Department of Statistics, Yale University
October 24, 2016, SAMSI Undergraduate Workshop
Our goals
- Introduction to Bayesian methods: likelihoods, priors, posteriors
- Learn our ABCs... Approximate Bayesian Computation
Image from http://livingstontownship.org/lycsblog/?p=241
Suppose we randomly select a sample of n women and measure their heights: x1, x2, x3, ..., xn.
Let's assume the heights follow a Normal distribution: X ∼ N(µ, σ), with density

f(x; µ, σ) = (1/√(2πσ²)) exp(−(x − µ)²/(2σ²))
Stellar Initial Mass Function
Probability density function (pdf) → the function f(x; µ, σ), with µ fixed and x variable:

f(x; µ, σ) = (1/√(2πσ²)) exp(−(x − µ)²/(2σ²))

[Figure: density of heights with µ marked; each observation is a draw from this density.]
Likelihood → the same function f(x; µ, σ), now with µ variable and x fixed:

f(x; µ, σ) = (1/√(2πσ²)) exp(−(x − µ)²/(2σ²))

[Figure: likelihood as a function of µ, peaked near the data.]
Consider a random sample of size n (assuming independence, and a known σ): X1, ..., Xn ∼ N(3, 2). The joint likelihood is

f(x1, ..., xn; µ, σ) = ∏_{i=1}^{n} (1/√(2πσ²)) exp(−(xi − µ)²/(2σ²))

[Figure: normalized likelihood in µ for n = 50, 100, 250, 500; the likelihood concentrates around the true µ = 3 as n grows.]
Likelihood Principle
All of the information in a sample is contained in the likelihood function.
The data are modeled by a likelihood function. How do we infer µ, or any unknown parameter(s) θ?
Maximum likelihood estimation

The parameter value θ̂ that maximizes the likelihood:

θ̂ = argmax_θ f(x1, ..., xn; θ)

[Figure: likelihood as a function of µ, with the MLE marked.]

max_µ f(x1, ..., xn; µ, σ) = max_µ ∏_{i=1}^{n} (1/√(2πσ²)) exp(−(xi − µ)²/(2σ²))

Hence, µ̂ = (∑_{i=1}^{n} xi)/n = x̄.
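The closed-form result above can be checked numerically; a minimal sketch (the simulated sample, grid range, and variable names are my own) that maximizes the Normal log-likelihood over a grid of µ values and compares the maximizer with the sample mean:

```python
import numpy as np

rng = np.random.default_rng(0)
sigma = 2.0                        # known standard deviation
x = rng.normal(3.0, sigma, 50)     # simulated sample of n = 50 heights

def log_likelihood(mu, x, sigma):
    """Normal log-likelihood as a function of mu, with x fixed and sigma known."""
    return np.sum(-0.5 * np.log(2 * np.pi * sigma ** 2)
                  - (x - mu) ** 2 / (2 * sigma ** 2))

# Maximize over a grid of candidate mu values.
grid = np.linspace(1.0, 5.0, 2001)
loglik = [log_likelihood(mu, x, sigma) for mu in grid]
mu_hat = grid[np.argmax(loglik)]

# The grid maximizer agrees with the closed-form MLE, the sample mean.
print(mu_hat, x.mean())
```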
Bayesian framework

Suppose θ is our parameter(s) of interest.
- Classical or Frequentist methods for inference consider θ to be fixed and unknown → performance of methods evaluated by repeated sampling → consider all possible data sets
- Bayesian methods consider θ to be random → only the observed data set and prior information are considered
Posterior distribution

π(θ | x) = [Likelihood × Prior] / Evidence = f(x | θ) · π(θ) / f(x)

where f(x) = ∫_Θ f(x | θ) π(θ) dθ, so that

π(θ | x) ∝ f(x | θ) π(θ)

- The prior distribution allows you to "easily" incorporate your beliefs about the parameter(s) of interest
- The posterior is a distribution on the parameter space given the observed data
Data: y1, ..., y4 ∼ N(µ = 3, σ = 2), ȳ = 1.278
Prior: N(µ0 = −3, σ0 = 5)  →  Posterior: N(µ1 = 1.114, σ1 = 0.981)

[Figure: likelihood, prior, and posterior densities in µ.]
Data: y1, ..., y4 ∼ N(µ = 3, σ = 2), ȳ = 1.278
Prior: N(µ0 = −3, σ0 = 1)  →  Posterior: N(µ1 = −0.861, σ1 = 0.707)

[Figure: likelihood, prior, and posterior densities in µ; the tighter prior pulls the posterior toward µ0.]
Data: y1, ..., y200 ∼ N(µ = 3, σ = 2), ȳ = 2.999
Prior: N(µ0 = −3, σ0 = 1)  →  Posterior: N(µ1 = 2.881, σ1 = 0.140)

[Figure: likelihood, prior, and posterior densities in µ; with n = 200 the data dominate the prior.]
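The posterior parameters on these slides follow from the standard conjugate Normal-Normal update with known σ; a small sketch (the function name `normal_posterior` is mine) that reproduces the quoted numbers:

```python
import numpy as np

def normal_posterior(ybar, n, sigma, mu0, sigma0):
    """Conjugate Normal-Normal update with known sigma: combine a
    N(mu0, sigma0) prior with the mean ybar of n draws from N(mu, sigma)."""
    prec = n / sigma ** 2 + 1 / sigma0 ** 2            # posterior precision
    mu1 = (n * ybar / sigma ** 2 + mu0 / sigma0 ** 2) / prec
    return mu1, prec ** -0.5                           # posterior mean, sd

# The two 4-observation slides: same data, different priors.
print(normal_posterior(1.278, 4, 2.0, -3.0, 5.0))   # ~ (1.114, 0.981)
print(normal_posterior(1.278, 4, 2.0, -3.0, 1.0))   # ~ (-0.861, 0.707)
```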
Prior distribution

- The prior distribution allows you to "easily" incorporate your beliefs about the parameter(s) of interest
- If one has a specific prior in mind, then it fits nicely into the definition of the posterior
- But how do you go from prior information to a prior distribution?
- And what if you don't actually have prior information?
- And what if you don't know the likelihood??
Approximate Bayesian Computation

- "Likelihood-free" approach to approximating π(θ | xobs), i.e. f(xobs | θ) is not specified
- Proceeds via simulation of the forward process

Why would we not know f(xobs | θ)?
1. Physical model too complex to write as a closed-form function fθ
2. Strong dependency in data
3. Observational limitations

ABC in Astronomy: Ishida et al. (2015); Akeret et al. (2015); Weyant et al. (2013); Schafer and Freeman (2012); Cameron and Pettitt (2012)
Basic ABC algorithm

For the observed data xobs and prior π(θ):
Algorithm*
1. Sample θprop from the prior π(θ)
2. Generate xprop from the forward process f(x | θprop)
3. Accept θprop if xobs = xprop
4. Return to step 1

*Introduced in Pritchard et al. (1999) (population genetics)
Step 3: Accept θprop if xobs = xprop

- Waiting for proposals such that xobs = xprop would be computationally prohibitive
- Instead, accept proposals with ∆(xobs, xprop) ≤ ε for some distance ∆ and some tolerance threshold ε
- When x is high-dimensional, ε would have to be made too large to keep the acceptance probability reasonable. Instead, reduce the dimension by comparing summaries S(xprop) and S(xobs)
Gaussian illustration

- Data xobs consists of 25 iid draws from Normal(µ, 1)
- Summary statistic: S(x) = x̄
- Distance function: ∆(S(xprop), S(xobs)) = |x̄prop − x̄obs|
- Tolerance: ε = 0.50 and 0.10
- Prior: π(µ) = Normal(0, 10)
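This setup runs as plain rejection ABC; a minimal sketch under the stated prior, summary, and distance (the input value µ = 0.5, the seed, and the number of retained draws are illustrative assumptions, not the talk's settings):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 25
mu_input = 0.5                            # illustrative input value (assumption)
x_obs = rng.normal(mu_input, 1.0, n)
s_obs = x_obs.mean()                      # summary statistic S(x) = xbar

def abc_rejection(s_obs, eps, n_keep=500):
    """Rejection ABC: propose mu from the Normal(0, 10) prior, run the
    forward process (25 draws from Normal(mu, 1)), and keep mu whenever
    |xbar_prop - xbar_obs| <= eps."""
    kept = []
    while len(kept) < n_keep:
        mu_prop = rng.normal(0.0, 10.0)           # step 1: sample from prior
        x_prop = rng.normal(mu_prop, 1.0, n)      # step 2: forward process
        if abs(x_prop.mean() - s_obs) <= eps:     # step 3: compare summaries
            kept.append(mu_prop)
    return np.array(kept)

post = abc_rejection(s_obs, eps=0.10)
# The retained draws approximate the posterior of mu; their mean sits near xbar.
print(post.mean(), post.std())
```

With the wide Normal(0, 10) prior, the true posterior is close to N(x̄, 1/√25), which the retained draws approximate as ε shrinks.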
Gaussian illustration: posteriors for µ

[Figure: ABC posterior vs. true posterior for µ, with the input value marked; left panel tolerance 0.5, right panel tolerance 0.1 (N = 1000, n = 25).]
How to pick a tolerance, ε?

Instead of restarting the ABC algorithm with a smaller tolerance ε, decrease the tolerance and use the already-sampled particle system as a proposal distribution, rather than drawing from the prior distribution.
Particle system: (1) the retained sampled values, (2) importance weights
Beaumont et al. (2009); Moral et al. (2011); Bonassi and West (2004)
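A hedged sketch of this sequential scheme in the style of Beaumont et al. (2009) for the Gaussian illustration; the tolerance schedule, particle count, and kernel-variance choice here are illustrative assumptions, not the talk's exact settings:

```python
import numpy as np

rng = np.random.default_rng(2)
x_obs = rng.normal(0.5, 1.0, 25)
s_obs = x_obs.mean()
N = 500                                  # particle count (assumption)
eps_seq = [1.0, 0.5, 0.25, 0.1]          # decreasing tolerances (assumption)

def simulate_summary(mu):
    """Forward process plus summary: mean of 25 draws from Normal(mu, 1)."""
    return rng.normal(mu, 1.0, 25).mean()

def prior_pdf(mu):
    """Density of the Normal(0, 10) prior."""
    return np.exp(-mu ** 2 / 200.0) / np.sqrt(200.0 * np.pi)

# First step: plain rejection ABC from the prior at the largest tolerance.
particles = []
while len(particles) < N:
    mu = rng.normal(0.0, 10.0)
    if abs(simulate_summary(mu) - s_obs) <= eps_seq[0]:
        particles.append(mu)
particles = np.array(particles)
weights = np.full(N, 1.0 / N)

# Later steps: perturb particles drawn by weight, and reweight by
# prior density over the mixture proposal density (importance weights).
for eps in eps_seq[1:]:
    tau2 = 2.0 * np.cov(particles, aweights=weights)   # kernel variance
    new_p, new_w = [], []
    while len(new_p) < N:
        mu_star = rng.choice(particles, p=weights)
        mu_prop = rng.normal(mu_star, np.sqrt(tau2))
        if abs(simulate_summary(mu_prop) - s_obs) <= eps:
            kern = (np.exp(-(mu_prop - particles) ** 2 / (2.0 * tau2))
                    / np.sqrt(2.0 * np.pi * tau2))
            new_p.append(mu_prop)
            new_w.append(prior_pdf(mu_prop) / np.sum(weights * kern))
    particles = np.array(new_p)
    weights = np.array(new_w) / np.sum(new_w)

print(np.sum(weights * particles))   # weighted posterior mean, near xbar
```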
Gaussian illustration: sequential posteriors

[Figure: sequence of ABC posteriors for µ approaching the true posterior (N = 1000, n = 25).]

Tolerance sequence ε1:10: 1.00, 0.75, 0.53, 0.38, 0.27, 0.19, 0.15, 0.11, 0.08, 0.06
Stellar Initial Mass Function
Stellar Initial Mass Function: the distribution of star masses after a star formation event within a specified volume of space Molecular cloud → Protostars → Stars
Image: adapted from http://www.astro.ljmu.ac.uk
Broken power-law (Kroupa, 2001)

Φ(M) ∝ M^(−αi),  M1i ≤ M ≤ M2i
- α1 = 0.3 for 0.01 ≤ M/MSun* ≤ 0.08  [sub-stellar]
- α2 = 1.3 for 0.08 ≤ M/MSun ≤ 0.50
- α3 = 2.3 for 0.50 ≤ M/MSun ≤ Mmax
Many other models, e.g. Salpeter (1955); Chabrier (2003)

*1 MSun = 1 Solar Mass (the mass of our Sun)
ABC for the Stellar Initial Mass Function

[Figure: image (left) and a histogram of star masses, frequency vs. mass (right).]
Image (left): Adapted from http://www.astro.ljmu.ac.uk
IMF Likelihood

Start with a power-law distribution: each star's mass is independently drawn from a power law with density

f(m) = (1 − α)/(Mmax^(1−α) − Mmin^(1−α)) · m^(−α),  m ∈ (Mmin, Mmax)

Then the likelihood is

L(α | m1:ntot) = [(1 − α)/(Mmax^(1−α) − Mmin^(1−α))]^ntot × ∏_{i=1}^{ntot} mi^(−α)

ntot = total number of stars in the cluster
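A sketch of this truncated power-law density, an inverse-CDF sampler for it, and a grid maximization of the likelihood (the function names, the endpoints Mmin = 2 and Mmax = 60 from the later simulation study, and the grid are my choices):

```python
import numpy as np

def powerlaw_pdf(m, alpha, m_min=2.0, m_max=60.0):
    """Truncated power-law density
    f(m) = (1 - alpha) / (m_max^(1-alpha) - m_min^(1-alpha)) * m^(-alpha)."""
    norm = (1 - alpha) / (m_max ** (1 - alpha) - m_min ** (1 - alpha))
    return norm * m ** (-alpha)

def powerlaw_sample(size, alpha, m_min=2.0, m_max=60.0, rng=None):
    """Draw masses by inverting the truncated power-law CDF."""
    if rng is None:
        rng = np.random.default_rng()
    u = rng.uniform(size=size)
    a, b = m_min ** (1 - alpha), m_max ** (1 - alpha)
    return (a + u * (b - a)) ** (1 / (1 - alpha))

def log_lik(alpha, masses):
    """Log-likelihood of a sample of masses under the truncated power law."""
    return np.sum(np.log(powerlaw_pdf(masses, alpha)))

rng = np.random.default_rng(3)
m = powerlaw_sample(5000, 2.35, rng=rng)
grid = np.linspace(1.5, 3.5, 401)
alpha_hat = grid[np.argmax([log_lik(a, m) for a in grid])]
print(alpha_hat)   # close to the input slope 2.35
```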
Observational limitations: aging

- The lifecycle of a star depends on its mass → more massive stars die faster
- For a cluster of age τ Myr → only observe stars with masses < Tage ≈ τ^(−2/5) × 10^(8/5)
- If the age is 30 Myr, the aging cutoff is Tage ≈ 10 MSun

Then the likelihood is

L(α | m1:nobs, ntot) = [(1 − α)/(Tage^(1−α) − Mmin^(1−α))]^nobs × ∏_{i=1}^{nobs} mi^(−α) × P(M > Tage)^(ntot − nobs)

ntot = # of stars in the cluster; nobs = # of stars observed in the cluster
Image: http://scioly.org
Observational limitations: completeness

Completeness function:
P(observing star | m) =
  0,                           m < Cmin
  (m − Cmin)/(Cmax − Cmin),    m ∈ [Cmin, Cmax]
  1,                           m > Cmax

- The probability of observing a particular star given its mass
- Depends on the flux limit, stellar crowding, etc.
Image: NASA, J. Trauger (JPL), J. Westphal (Caltech)
Observational limitations: measurement error

Incorporating log-normal measurement error gives the final likelihood:

L(α | m1:nobs, ntot) =
  [ P(M > Tage) + (1 − α)/(Mmax^(1−α) − Mmin^(1−α)) · ∫_{Cmin}^{Cmax} M^(−α) (1 − (M − Cmin)/(Cmax − Cmin)) dM ]^(ntot − nobs)
  × ∏_{i=1}^{nobs} ∫_{Mmin}^{Tage} (2πσ²)^(−1/2) mi^(−1) exp( −(log(mi) − log(M))²/(2σ²) )
      × (1 − α)/(Mmax^(1−α) − Mmin^(1−α)) · M^(−α)
      × [ I{M > Cmax} + ((M − Cmin)/(Cmax − Cmin)) · I{Cmin ≤ M ≤ Cmax} ] dM
IMF

[Figure: the IMF density (left) and the mass function with aging, completeness, and measurement error (right).]
Sample size = 1000 stars, [Cmin, Cmax] = [2, 4], σ = 0.25
Simulation Study: forward model

1. Draw from f(m) = (1 − α)/(60^(1−α) − 2^(1−α)) · m^(−α),  m ∈ (2, 60)
2. Aged 30 Myr
3. Observational completeness:
   P(obs | m) = 0 for m < 2;  (m − 2)/2 for m ∈ [2, 4];  1 for m > 4
4. Uncertainty: log M = log m + 0.25η (with η ∼ N(0, 1))
5. Prior: α ∼ U[0, 6]
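The forward process listed above can be sketched directly (the function name, seed, and draw count are mine; each step mirrors the listed ingredients):

```python
import numpy as np

def forward_model(alpha, n_total, rng):
    """Sketch of the slide's forward process for the IMF simulation study."""
    # 1. Draw true masses from f(m) on (2, 60) via the inverse CDF.
    u = rng.uniform(size=n_total)
    a, b = 2.0 ** (1 - alpha), 60.0 ** (1 - alpha)
    m = (a + u * (b - a)) ** (1 / (1 - alpha))
    # 2. Aging over 30 Myr: stars above T_age = 30^(-2/5) * 10^(8/5)
    #    (about 10 MSun) have died and are removed.
    t_age = 30.0 ** (-2.0 / 5.0) * 10.0 ** (8.0 / 5.0)
    m = m[m < t_age]
    # 3. Completeness: observe each star with probability 0 below 2,
    #    (m - 2)/2 on [2, 4], and 1 above 4.
    p_obs = np.clip((m - 2.0) / 2.0, 0.0, 1.0)
    m = m[rng.uniform(size=m.size) < p_obs]
    # 4. Log-normal measurement error: log M = log m + 0.25 * eta.
    return np.exp(np.log(m) + 0.25 * rng.normal(size=m.size))

rng = np.random.default_rng(4)
obs = forward_model(2.35, 1000, rng)
print(obs.size)   # fewer than 1000 stars survive aging and incompleteness
```

In ABC this function plays the role of f(x | θ): it is simulated, never evaluated.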
Simulation Study: summary statistics

We want to account for the following with our summary statistics and distance functions:

1. Shape of the observed mass function:

ρ1(msim, mobs) = [ ∫ ( f̂_{log msim}(x) − f̂_{log mobs}(x) )² dx ]^(1/2)

where f̂ denotes a density estimate of the log masses.
2. Number of stars observed:

ρ2(msim, mobs) = |1 − nsim/nobs|

msim = masses of the stars simulated from the forward model; mobs = masses of the observed stars; nsim = number of stars simulated from the forward model; nobs = number of observed stars
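A sketch of these two distances, using a Gaussian KDE for the density estimates f̂; the KDE choice, the log-mass grid, and the mock samples are illustrative assumptions:

```python
import numpy as np
from scipy.stats import gaussian_kde

def rho1(m_sim, m_obs, grid=None):
    """Shape distance: L2 distance between density estimates of log masses."""
    if grid is None:
        grid = np.linspace(0.0, 3.0, 300)   # log-mass grid (assumption)
    f_sim = gaussian_kde(np.log(m_sim))(grid)
    f_obs = gaussian_kde(np.log(m_obs))(grid)
    dx = grid[1] - grid[0]
    return np.sqrt(np.sum((f_sim - f_obs) ** 2) * dx)

def rho2(m_sim, m_obs):
    """Relative mismatch in the number of observed stars."""
    return abs(1.0 - m_sim.size / m_obs.size)

rng = np.random.default_rng(5)
m_obs = np.exp(rng.normal(1.0, 0.4, 400))      # mock observed masses
m_close = np.exp(rng.normal(1.0, 0.4, 380))    # similar mass function
m_far = np.exp(rng.normal(1.8, 0.4, 380))      # shifted mass function
# A forward simulation matching the observed distribution scores a smaller rho1.
print(rho1(m_close, m_obs), rho1(m_far, m_obs), rho2(m_close, m_obs))
```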
Simulation Study

1. Draw n = 10³ stars
2. IMF slope α = 2.35 with Mmin = 2 and Mmax = 60
3. N = 10³ particles
4. T = 30 sequential time steps

[Figure: observed mass function, density vs. mass.]
[Figure: the IMF (left) and the observed mass function (right); sample size = 1000 stars, [Cmin, Cmax] = [2, 4], σ = 0.25.]
Simulation Study results

1. Draw n = 10³ stars
2. IMF slope α = 2.35 with Mmin = 2 and Mmax = 60
3. N = 10³ particles
4. T = 30 sequential time steps

[Figure (A): observed mass function (log10 M vs. log10 fM) with the posterior median and 95% credible band. Figure (B): initial, intermediate, and final ABC posteriors for α, with the true posterior and the input value.]
Summary

- ABC can be a useful tool when data are too complex to define a reasonable likelihood, e.g. the Stellar Initial Mass Function
- Selection of good summary statistics is crucial for the ABC posterior to be meaningful

THANK YOU!
Bibliography I

Akeret, J., Refregier, A., Amara, A., Seehars, S., and Hasner, C. (2015), "Approximate Bayesian computation for forward modeling in cosmology," Journal of Cosmology and Astroparticle Physics, 2015, 043.
Bate, M. R. (2012), "Stellar, brown dwarf and multiple star properties from a radiation hydrodynamical simulation of star cluster formation," Monthly Notices of the Royal Astronomical Society, 419, 3115-3146.
Beaumont, M. A., Cornuet, J.-M., Marin, J.-M., and Robert, C. P. (2009), "Adaptive approximate Bayesian computation," Biometrika, 96, 983-990.
Bonassi, F. V. and West, M. (2004), "Sequential Monte Carlo with Adaptive Weights for Approximate Bayesian Computation," Bayesian Analysis, 1, 1-19.
Cameron, E. and Pettitt, A. N. (2012), "Approximate Bayesian Computation for Astronomical Model Analysis: A Case Study in Galaxy Demographics and Morphological Transformation at High Redshift," Monthly Notices of the Royal Astronomical Society, 425, 44-65.
Chabrier, G. (2003), "The galactic disk mass function: Reconciliation of the Hubble Space Telescope and nearby determinations," The Astrophysical Journal Letters, 586, L133.
Ishida, E. E. O., Vitenti, S. D. P., Penna-Lima, M., Cisewski, J., de Souza, R. S., Trindade, A. M. M., Cameron, E., and Busti, V. C. (2015), "cosmoabc: Likelihood-free inference via Population Monte Carlo Approximate Bayesian Computation," Astronomy and Computing, 13, 1-11.
Kroupa, P. (2001), "On the variation of the initial mass function," Monthly Notices of the Royal Astronomical Society, 322, 231-246.
Moral, P. D., Doucet, A., and Jasra, A. (2011), "An adaptive sequential Monte Carlo method for approximate Bayesian computation," Statistics and Computing, 22, 1009-1020.
Pritchard, J. K., Seielstad, M. T., and Perez-Lezaun, A. (1999), "Population Growth of Human Y Chromosomes: A Study of Y Chromosome Microsatellites," Molecular Biology and Evolution, 16, 1791-1798.
Salpeter, E. E. (1955), "The luminosity function and stellar evolution," The Astrophysical Journal, 121, 161.
Schafer, C. M. and Freeman, P. E. (2012), Statistical Challenges in Modern Astronomy V, Springer, chap. 1, Lecture Notes in Statistics, pp. 3-19.
Weyant, A., Schafer, C., and Wood-Vasey, W. M. (2013), "Likelihood-free cosmological inference with type Ia supernovae: approximate Bayesian computation for a complete treatment of uncertainty," The Astrophysical Journal, 764, 116.