SLIDE 1 21st-Century Statistical Computation for Exoplanet Studies
Tom Loredo
- Dept. of Astronomy, Cornell University
http://www.astro.cornell.edu/staff/loredo/ With David Chernoff, Merlise Clyde, Jim Berger, Floyd Bullard, Jim Crooks Work very much in progress MaxEnt — July 9, 2007
SLIDE 2
A Productive Methodological Union
!"#$%&"'()*"*&%*&+%
!"#$%$"&$'()$&*+*,"(-.$,%/'( !"#,%012*,"(-.$,%/
,-.*&/&0$'%&1'".( 2'*$34"*&1'
345124%$'(6,"2$(31%7,'( 636386$2%,9,7*+:;1+2*"<+===
!""#$%&'()*%+$),'-.).$+.$/+
SLIDE 3 An Unproductive Sociological Divide
!"#$%&'(&)*"*'%*'+%
!"#$%&'"()*+%&$*&,-.*//-%& 0"1"2*%&3-.$#*+444
5678.8(*/9-7)%&:-8-.;89("/-7)%& )87-8#82+%&<)+7=8#82+444>
!"#$%&'(&*,$&
3"<#"7*444%&?*;;9*+)444% @8A%&?"+.*)%&BC##%&0D-##-.2% E9*//=89)/444
!"#$%&'()$"*+,)-",-
SLIDE 4 An Example
Cosmological Parameters from WMAP & SDSS Tegmark et al. 2004
- Looked at over 2 dozen models with 2 – 10 params
- Used random walk MCMC in “rotated” params
- Typical run length 3 – 5 ×105; a few runs trapped
- 30 CPU years of effort
SLIDE 5
Outline
1 Introducing Exoplanets 2 Bayesian Methods for Exoplanets 3 Some 21st Century Algorithms
A Bayesian pipeline Improved MCMC for parameter estimation Marginal likelihood calculation
SLIDE 6
Outline
1 Introducing Exoplanets 2 Bayesian Methods for Exoplanets 3 Some 21st Century Algorithms
A Bayesian pipeline Improved MCMC for parameter estimation Marginal likelihood calculation
SLIDE 7 Extrasolar Planets
Exoplanet detection/measurement methods:
- Direct: Transits, gravitational lensing, imaging,
interferometric nulling
- Indirect: Keplerian reflex motion (line-of-sight velocity,
astrometric wobble)
The Sun’s Wobble From 10 pc
500 1000
500 1000
SLIDE 8
Radial Velocity Technique
As of July 2007, 245 planets found, including 26 multiple-planet systems Vast majority (233) found via Doppler radial velocity (RV) measurements
SLIDE 9
Parameters for an Orbit — Single Planet
Intrinsic geometry: semimajor axis a, eccentricity e Orientation: 3 Euler angles, i, ω, Ω Time: period τ, origin Mp RV parameters: semi-amplitude K(a, e, τ), τ, e, Mp, ω, COM velocity v0 Physics: min. mass m sin i, size a, eccentricity e (Ultimate goal: multiple planets, astrometry → ∼ 30+ parameters!)
SLIDE 10 Conventional RV Data Analysis
Analysis method: Identify best candidate period via periodogram; fit parameters with nonlinear least squares
!"#"$%&%'"( )"#"*&$' +",-."-"#"*&%*"/01 2"#"*&%3"45"
6-,7)8")9"2:&"%**'
SLIDE 11 Issues with Periodogram + Least Squares
- Multimodality, nonlinearity, sparse data → “∆χ2”
uncertainties not valid
- Reporting uncertainties in derived parameters (m sin i, a)
- Lomb-Scargle periodogram not optimal for eccentric and
multiple planets
- Handling marginal detections
- Combining info from many systems for pop’n studies
- Scheduling future observations
Bayesian statistics can address all these within a unified framework!
SLIDE 12
Outline
1 Introducing Exoplanets 2 Bayesian Methods for Exoplanets 3 Some 21st Century Algorithms
A Bayesian pipeline Improved MCMC for parameter estimation Marginal likelihood calculation
SLIDE 13 Scientific Goals ⇐ ⇒ Bayesian Methodology
- Improved system inferences — Detection (marginal
likelihoods, Bayes factors), estimation (joint & marginal posteriors)
- Improved population inferences — Propagate system
uncertainties (including detection uncertainties and selection effects) to population level (hierarchical/multi-level Bayes)
- Adaptive scheduling — Plan future observations to most
quickly reduce uncertainties (Bayesian experimental design)
Prior information & data Combined information Interim Strategy New data Predictions Strategy
Observ’n Design Inference
results
SLIDE 14 Keplerian Radial Velocity Model
Parameters for single planet
- τ = orbital period (days)
- e = orbital eccentricity
- K = velocity amplitude (m/s)
- Argument of pericenter ω
- Mean anomaly of pericenter
passage Mp
- System center-of-mass velocity
v0
Velocity vs. time
v(t) = v0 + K (e cos ω + cos[ω + υ(t)]) True anomaly υ(t) found via Kepler’s equation for eccentric anomaly: E(t) − e sin E(t) = 2πt τ − Mp; tan υ 2 = 1 + e 1 − e 1/2 tan E 2 A strongly nonlinear model!
SLIDE 15 The Likelihood Function
Keplerian velocity model with parameters θ = {K, τ, e, Mp, ω, v0}: di = v(ti; θ) + ǫi For measurement errors with std dev’n σi, and additional “jitter” with std dev’n σJ, L(θ, σJ) ≡ p({di}|θ, σJ) =
N
1 2π
i + σ2 J
exp
2 [di − v(ti; θ)]2 σ2
i + σ2 J
i
1 2π
i + σ2 J
exp
2χ2(θ)
χ2(θ, σJ) ≡
[di − v(ti; θ)]2 σ2
i + σ2 J
Ignore jitter for now . . .
SLIDE 16 What To Do With It
Parameter estimation
Posterior dist’n for parameters of model Mi with i planets: p(θ|D, M) ∝ p(θ|M)Li(θ) Summarize with mode, means, credible regions (found by integrating over θ)
Detection
Calculate probability for no planets (M0), one planet (M1) . . . . Let I = {Mi}. p(Mi|D, I) ∝ p(Mi|I)L(Mi) where L(Mi) =
Marginal likelihood L(Mi) includes “Occam factor”
SLIDE 17 Design
Predict future datum dt at time t, accounting for model uncertainties: p(dt|D, Mi) =
- dθ p(dt|θ, Mi) p(θ|D, Mi).
1st factor is Gaussian for dt with known model; 2nd term & integral account for uncertainty. Information theory → best time has largest dt uncertainty.
Population analysis
Multi-level (“hierarchical”) model: Parameterize the orbit parameter prior and infer it. Include probabilities for star having (1, 2, . . . ) planets → Bayes factors weight system contributions. Need all data, well-characterized surveys.
SLIDE 18 Current MCMC Methods
Random Walk Metropolis (Ford) Performance depends a lot
Parallel Tempering (Gregory) General purpose, but inefficient/unwieldy
SLIDE 19
Outline
1 Introducing Exoplanets 2 Bayesian Methods for Exoplanets 3 Some 21st Century Algorithms
A Bayesian pipeline Improved MCMC for parameter estimation Marginal likelihood calculation
SLIDE 20 Improving Exoplanet Bayesian Calculation
Seek more efficient and less unwieldy samplers → abandon “general purpose” algorithms Keep all tasks in mind: Estimation, detection, design, pop’n . . . Guidelines for modest dimensional Bayesian computation:
- Know thine enemy — Understand scales in various
dimensions, multimodality, locate modes; significant exploration should precede & guide sampling
- Reduce dimensionality — Analytic/numerical techniques
may help
- Clever priors — If a clever prior (that doesn’t do too much
violence to real prior info) helps, use it; you can weight and resample/Rao-Blackwellize later
- Try different samplers — Nearly 20 years of ideas
SLIDE 21
Know Thine Enemy I: Linear vs. Nonlinear Parameters
v(t) = v0 + K (e cos ω + cos[ω + υ(t)]) = A1 + A2[e + cos υ(t)] + A3 sin υ(t) A1 = v0; A2 = K cos ω; A3 = −Ksinω Model is linear wrt Ai, nonlinear wrt e, Mp, ω → p(θ) = p(τ, e, Mp) p({Ai}|τ, e, Mp) ||D, M1 with p({Ai}|τ, e, Mp) conditionally normal if we adopt a flat or conjugate prior for {Ai}. Flat prior → p(K|M1) ∝ K. Using this interim prior, dimension can be reduced by 3.
SLIDE 22
Know Thine Enemy II — Slices
Extreme multimodality in τ; challenging multimodality in Mp; smooth in e
SLIDE 23 Periodogram Connections
Bayesian periodograms (Jaynes-Bretthorst algorithm) Data are superposition of periodic functions + noise: di =
M
Aαgα(ti; ω; θ) + ei Calculate L({A}, ω, θ) using χ2. Integrate out A’s → least squares + volume factors: p(ω, θ|D) ∝ p(ω, θ)J(ω, θ) exp
2
- r2(ω, θ) is squared residual, found by diagonalizing metric
ηαβ = gα · gβ Integrate out θ numerically → p(ω|D); S(ω) ≡ ln [p(ω|D)] Generalizes Schuster periodogram & LSP.
SLIDE 24
Circular Orbits
Assume circular orbits: θ = {K, τ, φ, v0} Frequentist For given τ, maximize likelihood over K and φ (set v0 to data mean, ¯ v) → profile likelihood: log Lp(τ, ¯ v) ∝ Lomb-Scargle periodogram Bayesian For given τ, integrate (“marginalize”) likelihood × prior over K and φ (set v0 to data mean, ¯ v) → marginal posterior: log p(τ, ¯ v|D) ∝ Lomb-Scargle periodogram Additionally marginalize over v0 → floating-mean LSP
SLIDE 25
Radial Kepler Periodogram for Eccentric Orbits
v(t) = A1 + A2[e + cos υ(t)] + A3 sin υ(t) For given (τ, e, Mp), analytically marginalize {Ai}: ln p(τ, e, Mp|D, Mp) = (Radial) Kepler-gram For given τ, marginalize over (e, Mp): log p(τ|D) ∝ (Radial) Kepler periodogram This requires a 2-d numerical integral.
SLIDE 26
Terminology for Generalized Periodograms
Sinusoid model
A1 cos ωt + A2 sin ωt ln p(ω|D, M) ∝ Schuster periodogram
Chirp model
A1 cos(ωt + αt2) + A2 sin(ωt + αt2) ln p(ω, α|D, M) ∝ Chirpogram (Jaynes)
Keplerian reflex motion model
ln p(τ, e, Mp|D, Mp) = (Radial) Kepler-gram log p(τ|D) ∝ (Radial) Kepler periodogram
SLIDE 27 A Bayesian Pipeline
- Choose priors to enable analytic calculations to improve
MCMC performance; fix priors with importance weighting
- Use modern, adaptive MCMC algorithms
!"#" $%&'%( )%(*+,+-(". )/τ)0 1%2τ0314&2τ 5,"&#*6% 4747
{τ, %034&8
9.&+(#":;% <%*-=#*:- )+&>'"#*+: 5:"'?@*@ )(*+(@ 5,"&#*6% A;=%,>'*:- !%#%;#*+:3B 4%"@>(%.%:#
SLIDE 28 A Bayesian Pipeline
- Choose priors to enable analytic calculations to improve
MCMC performance; fix priors with importance weighting
- Use modern, adaptive MCMC algorithms
!"#" $%&'%( )%(*+,+-(". )/τ)0 1%2τ0314&2τ 5,"&#*6% 4747
{τ, %034&8
9.&+(#":;% <%*-=#*:- )+&>'"#*+: 5:"'?@*@ )(*+(@ 5,"&#*6% A;=%,>'*:- !%#%;#*+:3B 4%"@>(%.%:#
SLIDE 29 Some Modern MCMC Themes
- Adaptive MCMC: Proposal dist’ns that change scale/shape
- Coupled chains: Parallel tempering, parallel marginalization
- Population-based MCMC: Coupled chains, all sampling the
posterior; pop’n defines the proposal
- Mixing samplers: Local exploration + mode-hopping
(Note: Skilling’s BayeSys3 has some of this; large statistics literature)
SLIDE 30 Differential Evolution MCMC
Ter Braak 2006 — Combine evolutionary computing & MCMC Follow a population of states, where a randomly selected state is considered for updating via the (scaled) vector difference between two other states.
!"#$#%&' (&)*+*&,-./#".0$*&,- !"#$#%-*.*+%$'&1-2-),
Behaves roughly like RWM, but with a proposal distribution that automatically adjusts to shape & scale of posterior Step scale: Optimal γ ≈ 2.38/ √ 2d, but occassionally switch to γ = 1 for mode-swapping
SLIDE 31 Differential Evolution for Exoplanets
Use Kepler periodogram to reduce dimensionality to 3-d (τ, e, Mp). Use Kepler periodogram results (p(τ), moments of e, Mp) to define initial population for DEMC: Marsaglia’s 5-table sampler for τ Conditional beta and Von Mises samplers for e, Mp Once we have {τ, e, Mp}, get associated {K, ω, v0} samples from their exact conditional distribution. Advantages vs. PT & RWM:
- Only 2 tuning parameters (# of parallel chains; mode swapping)
- Updates all parameters at once
- Candidate distribution adapts its shape and size
- All of the parallel chains are usable
- Simple!
SLIDE 32
Results for HD 222582
24 Keck RV observations spanning 683 days; long period; hi e Kepler Periodogram
SLIDE 33 Differential Evolution MCMC Performance Reaches convergence dramatically faster than PT or RWM, with
- nly one tunable parameter (pop’n size — unexplored here!)
Conspiracy of three factors: Reduced dimensionality, adaptive proposals, good starting population (from K-gram)
SLIDE 34
Results for HD 73526 (Early Data)
Exoplanet system with 3 important modes in τ DEMC can quickly explore within modes, but swaps between modes → longer convergence. Partitioning allows fast exploration within modes, can be automated. Does seem to be swapping at “right” frequency; the two minor modes are well-separated (though weak). What should our goal be for multimodal sampling? Should we weight small but not-insignificant modes to increase their sampling frequency?
SLIDE 35 Exoplanet MCMC Work-In-Progress
- Normal kernel coupler (Warnes 2000)
- Stochastic approximation Monte Carlo (SAMC) — Better
explore modes
- Derivatives of the Jaynes-Bretthorst algorithm:
- May acclerate MCMC via hybrid Monte Carlo
- Score-based output analysis (Fan, Brooks & Gelman
2006)
- Gradient/geodesic mode-hopping
- 1
1 2 3 -1
0.5 1
0.5 1
1 2
0.5
SLIDE 36 Marginal Likelihood Calculation
Unpromising methods
- Harmonic mean
- Weighted harmonic mean
- Thermodynamic integration
- Nested sampling
Promising methods
- Restricted importance sampler
- Regional mixture importance sampler
- Hybrid ratio estimator
- Adaptive kernel density importance sampler
- Adaptive simplex cubature
SLIDE 37 Adaptive Simplex Cubature
(With a nod to Ken Hanson)
Motivation: Use MCMC sample locations and densities
q
θ
Suppose you were given {θi, qi} and told to estimate Z =
this 1-d q. Use a quadrature approximation that doesn’t require specific abscissas: histogram, trapezoid, etc.. These weight by “volume” factors: Z =
(length) × (avg. height) In 2-d intervals are triangles (2-simplices); length→area. Make the triangles via Delaunay triangulation.
SLIDE 38
Higher dimensions: Combine n-d Delaunay triangulation and n-d simplex trapezoidal rules of Lyness & Genz
SLIDE 39 Performance
Explored up to 6-d with a variety of standard test-case normal mixtures, using samples as vertices. Qhull used for triangulation. Triangulation is expensive → use a small number of vertices. In few-d, requires many fewer points that subregion-adaptive cubature (DCUHRE), but underestimates integrals in > 4-D. There is lots of volume in the outer “shell” so even though density is low, it contributes a lot.
Modifications
- Tempered/derivative-weighted resampling (seems to work to 6- or
7-D)
- Non-optimal triangulations
SLIDE 40 Summary: Guidelines for Modest-Dimension Bayes
- Know thine enemy — Understand scales in various
dimensions, multimodality, locate modes; significant exploration should precede & guide sampling
- Reduce dimensionality — Analytic/numerical techniques
may help
- Clever priors — If a clever prior (that doesn’t do too much
violence to real prior info) helps, use it; you can weight and resample/Rao-Blackwellize later
- Try different samplers — Nearly 20 years of ideas
Cross the sociological divide! It’s worth it!