21st-Century Statistical Computation for Exoplanet Studies Tom - - PowerPoint PPT Presentation

21st century statistical computation for exoplanet studies
SMART_READER_LITE
LIVE PREVIEW

21st-Century Statistical Computation for Exoplanet Studies Tom - - PowerPoint PPT Presentation

21st-Century Statistical Computation for Exoplanet Studies Tom Loredo Dept. of Astronomy, Cornell University http://www.astro.cornell.edu/staff/loredo/ With David Chernoff, Merlise Clyde, Jim Berger, Floyd Bullard, Jim Crooks Work very much


slide-1
SLIDE 1

21st-Century Statistical Computation for Exoplanet Studies

Tom Loredo

  • Dept. of Astronomy, Cornell University

http://www.astro.cornell.edu/staff/loredo/ With David Chernoff, Merlise Clyde, Jim Berger, Floyd Bullard, Jim Crooks Work very much in progress MaxEnt — July 9, 2007

slide-2
SLIDE 2

A Productive Methodological Union

!"#$%&"'()*"*&%*&+%

!"#$%$"&$'()$&*+*,"(-.$,%/'( !"#,%012*,"(-.$,%/

,-.*&/&0$'%&1'".( 2'*$34"*&1'

345124%$'(6,"2$(31%7,'( 636386$2%,9,7*+:;1+2*"<+===

!""#$%&'()*%+$),'-.).$+.$/+

slide-3
SLIDE 3

An Unproductive Sociological Divide

!"#$%&'(&)*"*'%*'+%

!"#$%&'"()*+%&$*&,-.*//-%& 0"1"2*%&3-.$#*+444

5678.8(*/9-7)%&:-8-.;89("/-7)%& )87-8#82+%&<)+7=8#82+444>

!"#$%&'(&*,$&

  • ,#%'+".&)+'$(+$%

3"<#"7*444%&?*;;9*+)444% @8A%&?"+.*)%&BC##%&0D-##-.2% E9*//=89)/444

!"#$%&'()$"*+,)-",-

slide-4
SLIDE 4

An Example

Cosmological Parameters from WMAP & SDSS Tegmark et al. 2004

  • Looked at over 2 dozen models with 2 – 10 params
  • Used random walk MCMC in “rotated” params
  • Typical run length 3 – 5 ×105; a few runs trapped
  • 30 CPU years of effort
slide-5
SLIDE 5

Outline

1 Introducing Exoplanets 2 Bayesian Methods for Exoplanets 3 Some 21st Century Algorithms

A Bayesian pipeline Improved MCMC for parameter estimation Marginal likelihood calculation

slide-6
SLIDE 6

Outline

1 Introducing Exoplanets 2 Bayesian Methods for Exoplanets 3 Some 21st Century Algorithms

A Bayesian pipeline Improved MCMC for parameter estimation Marginal likelihood calculation

slide-7
SLIDE 7

Extrasolar Planets

Exoplanet detection/measurement methods:

  • Direct: Transits, gravitational lensing, imaging,

interferometric nulling

  • Indirect: Keplerian reflex motion (line-of-sight velocity,

astrometric wobble)

The Sun’s Wobble From 10 pc

  • 1000
  • 500

500 1000

  • 1000
  • 500

500 1000

slide-8
SLIDE 8

Radial Velocity Technique

As of July 2007, 245 planets found, including 26 multiple-planet systems Vast majority (233) found via Doppler radial velocity (RV) measurements

slide-9
SLIDE 9

Parameters for an Orbit — Single Planet

Intrinsic geometry: semimajor axis a, eccentricity e Orientation: 3 Euler angles, i, ω, Ω Time: period τ, origin Mp RV parameters: semi-amplitude K(a, e, τ), τ, e, Mp, ω, COM velocity v0 Physics: min. mass m sin i, size a, eccentricity e (Ultimate goal: multiple planets, astrometry → ∼ 30+ parameters!)

slide-10
SLIDE 10

Conventional RV Data Analysis

Analysis method: Identify best candidate period via periodogram; fit parameters with nonlinear least squares

!"#"$%&%'"( )"#"*&$' +",-."-"#"*&%*"/01 2"#"*&%3"45"

6-,7)8")9"2:&"%**'

slide-11
SLIDE 11

Issues with Periodogram + Least Squares

  • Multimodality, nonlinearity, sparse data → “∆χ2”

uncertainties not valid

  • Reporting uncertainties in derived parameters (m sin i, a)
  • Lomb-Scargle periodogram not optimal for eccentric and

multiple planets

  • Handling marginal detections
  • Combining info from many systems for pop’n studies
  • Scheduling future observations

Bayesian statistics can address all these within a unified framework!

slide-12
SLIDE 12

Outline

1 Introducing Exoplanets 2 Bayesian Methods for Exoplanets 3 Some 21st Century Algorithms

A Bayesian pipeline Improved MCMC for parameter estimation Marginal likelihood calculation

slide-13
SLIDE 13

Scientific Goals ⇐ ⇒ Bayesian Methodology

  • Improved system inferences — Detection (marginal

likelihoods, Bayes factors), estimation (joint & marginal posteriors)

  • Improved population inferences — Propagate system

uncertainties (including detection uncertainties and selection effects) to population level (hierarchical/multi-level Bayes)

  • Adaptive scheduling — Plan future observations to most

quickly reduce uncertainties (Bayesian experimental design)

Prior information & data Combined information Interim Strategy New data Predictions Strategy

Observ’n Design Inference

results

slide-14
SLIDE 14

Keplerian Radial Velocity Model

Parameters for single planet

  • τ = orbital period (days)
  • e = orbital eccentricity
  • K = velocity amplitude (m/s)
  • Argument of pericenter ω
  • Mean anomaly of pericenter

passage Mp

  • System center-of-mass velocity

v0

Velocity vs. time

v(t) = v0 + K (e cos ω + cos[ω + υ(t)]) True anomaly υ(t) found via Kepler’s equation for eccentric anomaly: E(t) − e sin E(t) = 2πt τ − Mp; tan υ 2 = 1 + e 1 − e 1/2 tan E 2 A strongly nonlinear model!

slide-15
SLIDE 15

The Likelihood Function

Keplerian velocity model with parameters θ = {K, τ, e, Mp, ω, v0}: di = v(ti; θ) + ǫi For measurement errors with std dev’n σi, and additional “jitter” with std dev’n σJ, L(θ, σJ) ≡ p({di}|θ, σJ) =

N

  • i=1

1 2π

  • σ2

i + σ2 J

exp

  • −1

2 [di − v(ti; θ)]2 σ2

i + σ2 J

 

i

1 2π

  • σ2

i + σ2 J

  exp

  • −1

2χ2(θ)

  • where

χ2(θ, σJ) ≡

  • i

[di − v(ti; θ)]2 σ2

i + σ2 J

Ignore jitter for now . . .

slide-16
SLIDE 16

What To Do With It

Parameter estimation

Posterior dist’n for parameters of model Mi with i planets: p(θ|D, M) ∝ p(θ|M)Li(θ) Summarize with mode, means, credible regions (found by integrating over θ)

Detection

Calculate probability for no planets (M0), one planet (M1) . . . . Let I = {Mi}. p(Mi|D, I) ∝ p(Mi|I)L(Mi) where L(Mi) =

  • dθ p(θ|Mi)L(θ)

Marginal likelihood L(Mi) includes “Occam factor”

slide-17
SLIDE 17

Design

Predict future datum dt at time t, accounting for model uncertainties: p(dt|D, Mi) =

  • dθ p(dt|θ, Mi) p(θ|D, Mi).

1st factor is Gaussian for dt with known model; 2nd term & integral account for uncertainty. Information theory → best time has largest dt uncertainty.

Population analysis

Multi-level (“hierarchical”) model: Parameterize the orbit parameter prior and infer it. Include probabilities for star having (1, 2, . . . ) planets → Bayes factors weight system contributions. Need all data, well-characterized surveys.

slide-18
SLIDE 18

Current MCMC Methods

Random Walk Metropolis (Ford) Performance depends a lot

  • n parameterization

Parallel Tempering (Gregory) General purpose, but inefficient/unwieldy

slide-19
SLIDE 19

Outline

1 Introducing Exoplanets 2 Bayesian Methods for Exoplanets 3 Some 21st Century Algorithms

A Bayesian pipeline Improved MCMC for parameter estimation Marginal likelihood calculation

slide-20
SLIDE 20

Improving Exoplanet Bayesian Calculation

Seek more efficient and less unwieldy samplers → abandon “general purpose” algorithms Keep all tasks in mind: Estimation, detection, design, pop’n . . . Guidelines for modest dimensional Bayesian computation:

  • Know thine enemy — Understand scales in various

dimensions, multimodality, locate modes; significant exploration should precede & guide sampling

  • Reduce dimensionality — Analytic/numerical techniques

may help

  • Clever priors — If a clever prior (that doesn’t do too much

violence to real prior info) helps, use it; you can weight and resample/Rao-Blackwellize later

  • Try different samplers — Nearly 20 years of ideas
slide-21
SLIDE 21

Know Thine Enemy I: Linear vs. Nonlinear Parameters

v(t) = v0 + K (e cos ω + cos[ω + υ(t)]) = A1 + A2[e + cos υ(t)] + A3 sin υ(t) A1 = v0; A2 = K cos ω; A3 = −Ksinω Model is linear wrt Ai, nonlinear wrt e, Mp, ω → p(θ) = p(τ, e, Mp) p({Ai}|τ, e, Mp) ||D, M1 with p({Ai}|τ, e, Mp) conditionally normal if we adopt a flat or conjugate prior for {Ai}. Flat prior → p(K|M1) ∝ K. Using this interim prior, dimension can be reduced by 3.

slide-22
SLIDE 22

Know Thine Enemy II — Slices

Extreme multimodality in τ; challenging multimodality in Mp; smooth in e

slide-23
SLIDE 23

Periodogram Connections

Bayesian periodograms (Jaynes-Bretthorst algorithm) Data are superposition of periodic functions + noise: di =

M

  • α=1

Aαgα(ti; ω; θ) + ei Calculate L({A}, ω, θ) using χ2. Integrate out A’s → least squares + volume factors: p(ω, θ|D) ∝ p(ω, θ)J(ω, θ) exp

  • −r2(ω, θ)

2

  • r2(ω, θ) is squared residual, found by diagonalizing metric

ηαβ = gα · gβ Integrate out θ numerically → p(ω|D); S(ω) ≡ ln [p(ω|D)] Generalizes Schuster periodogram & LSP.

slide-24
SLIDE 24

Circular Orbits

Assume circular orbits: θ = {K, τ, φ, v0} Frequentist For given τ, maximize likelihood over K and φ (set v0 to data mean, ¯ v) → profile likelihood: log Lp(τ, ¯ v) ∝ Lomb-Scargle periodogram Bayesian For given τ, integrate (“marginalize”) likelihood × prior over K and φ (set v0 to data mean, ¯ v) → marginal posterior: log p(τ, ¯ v|D) ∝ Lomb-Scargle periodogram Additionally marginalize over v0 → floating-mean LSP

slide-25
SLIDE 25

Radial Kepler Periodogram for Eccentric Orbits

v(t) = A1 + A2[e + cos υ(t)] + A3 sin υ(t) For given (τ, e, Mp), analytically marginalize {Ai}: ln p(τ, e, Mp|D, Mp) = (Radial) Kepler-gram For given τ, marginalize over (e, Mp): log p(τ|D) ∝ (Radial) Kepler periodogram This requires a 2-d numerical integral.

slide-26
SLIDE 26

Terminology for Generalized Periodograms

Sinusoid model

A1 cos ωt + A2 sin ωt ln p(ω|D, M) ∝ Schuster periodogram

Chirp model

A1 cos(ωt + αt2) + A2 sin(ωt + αt2) ln p(ω, α|D, M) ∝ Chirpogram (Jaynes)

Keplerian reflex motion model

ln p(τ, e, Mp|D, Mp) = (Radial) Kepler-gram log p(τ|D) ∝ (Radial) Kepler periodogram

slide-27
SLIDE 27

A Bayesian Pipeline

  • Choose priors to enable analytic calculations to improve

MCMC performance; fix priors with importance weighting

  • Use modern, adaptive MCMC algorithms

!"#" $%&'%( )%(*+,+-(". )/τ)0 1%2τ0314&2τ 5,"&#*6% 4747

{τ, %034&8

9.&+(#":;% <%*-=#*:- )+&>'"#*+: 5:"'?@*@ )(*+(@ 5,"&#*6% A;=%,>'*:- !%#%;#*+:3B 4%"@>(%.%:#

slide-28
SLIDE 28

A Bayesian Pipeline

  • Choose priors to enable analytic calculations to improve

MCMC performance; fix priors with importance weighting

  • Use modern, adaptive MCMC algorithms

!"#" $%&'%( )%(*+,+-(". )/τ)0 1%2τ0314&2τ 5,"&#*6% 4747

{τ, %034&8

9.&+(#":;% <%*-=#*:- )+&>'"#*+: 5:"'?@*@ )(*+(@ 5,"&#*6% A;=%,>'*:- !%#%;#*+:3B 4%"@>(%.%:#

slide-29
SLIDE 29

Some Modern MCMC Themes

  • Adaptive MCMC: Proposal dist’ns that change scale/shape
  • Coupled chains: Parallel tempering, parallel marginalization
  • Population-based MCMC: Coupled chains, all sampling the

posterior; pop’n defines the proposal

  • Mixing samplers: Local exploration + mode-hopping

(Note: Skilling’s BayeSys3 has some of this; large statistics literature)

slide-30
SLIDE 30

Differential Evolution MCMC

Ter Braak 2006 — Combine evolutionary computing & MCMC Follow a population of states, where a randomly selected state is considered for updating via the (scaled) vector difference between two other states.

!"#$#%&' (&)*+*&,-./#".0$*&,- !"#$#%-*.*+%$'&1-2-),

Behaves roughly like RWM, but with a proposal distribution that automatically adjusts to shape & scale of posterior Step scale: Optimal γ ≈ 2.38/ √ 2d, but occassionally switch to γ = 1 for mode-swapping

slide-31
SLIDE 31

Differential Evolution for Exoplanets

Use Kepler periodogram to reduce dimensionality to 3-d (τ, e, Mp). Use Kepler periodogram results (p(τ), moments of e, Mp) to define initial population for DEMC: Marsaglia’s 5-table sampler for τ Conditional beta and Von Mises samplers for e, Mp Once we have {τ, e, Mp}, get associated {K, ω, v0} samples from their exact conditional distribution. Advantages vs. PT & RWM:

  • Only 2 tuning parameters (# of parallel chains; mode swapping)
  • Updates all parameters at once
  • Candidate distribution adapts its shape and size
  • All of the parallel chains are usable
  • Simple!
slide-32
SLIDE 32

Results for HD 222582

24 Keck RV observations spanning 683 days; long period; hi e Kepler Periodogram

slide-33
SLIDE 33

Differential Evolution MCMC Performance Reaches convergence dramatically faster than PT or RWM, with

  • nly one tunable parameter (pop’n size — unexplored here!)

Conspiracy of three factors: Reduced dimensionality, adaptive proposals, good starting population (from K-gram)

slide-34
SLIDE 34

Results for HD 73526 (Early Data)

Exoplanet system with 3 important modes in τ DEMC can quickly explore within modes, but swaps between modes → longer convergence. Partitioning allows fast exploration within modes, can be automated. Does seem to be swapping at “right” frequency; the two minor modes are well-separated (though weak). What should our goal be for multimodal sampling? Should we weight small but not-insignificant modes to increase their sampling frequency?

slide-35
SLIDE 35

Exoplanet MCMC Work-In-Progress

  • Normal kernel coupler (Warnes 2000)
  • Stochastic approximation Monte Carlo (SAMC) — Better

explore modes

  • Derivatives of the Jaynes-Bretthorst algorithm:
  • May acclerate MCMC via hybrid Monte Carlo
  • Score-based output analysis (Fan, Brooks & Gelman

2006)

  • Gradient/geodesic mode-hopping
  • 1

1 2 3 -1

  • 0.5

0.5 1

  • 1
  • 0.5

0.5 1

  • 1

1 2

  • 1
  • 0.5

0.5

slide-36
SLIDE 36

Marginal Likelihood Calculation

Unpromising methods

  • Harmonic mean
  • Weighted harmonic mean
  • Thermodynamic integration
  • Nested sampling

Promising methods

  • Restricted importance sampler
  • Regional mixture importance sampler
  • Hybrid ratio estimator
  • Adaptive kernel density importance sampler
  • Adaptive simplex cubature
slide-37
SLIDE 37

Adaptive Simplex Cubature

(With a nod to Ken Hanson)

Motivation: Use MCMC sample locations and densities

q

θ

Suppose you were given {θi, qi} and told to estimate Z =

  • dθq(θ) for

this 1-d q. Use a quadrature approximation that doesn’t require specific abscissas: histogram, trapezoid, etc.. These weight by “volume” factors: Z =

  • intervals

(length) × (avg. height) In 2-d intervals are triangles (2-simplices); length→area. Make the triangles via Delaunay triangulation.

slide-38
SLIDE 38

Higher dimensions: Combine n-d Delaunay triangulation and n-d simplex trapezoidal rules of Lyness & Genz

slide-39
SLIDE 39

Performance

Explored up to 6-d with a variety of standard test-case normal mixtures, using samples as vertices. Qhull used for triangulation. Triangulation is expensive → use a small number of vertices. In few-d, requires many fewer points that subregion-adaptive cubature (DCUHRE), but underestimates integrals in > 4-D. There is lots of volume in the outer “shell” so even though density is low, it contributes a lot.

Modifications

  • Tempered/derivative-weighted resampling (seems to work to 6- or

7-D)

  • Non-optimal triangulations
slide-40
SLIDE 40

Summary: Guidelines for Modest-Dimension Bayes

  • Know thine enemy — Understand scales in various

dimensions, multimodality, locate modes; significant exploration should precede & guide sampling

  • Reduce dimensionality — Analytic/numerical techniques

may help

  • Clever priors — If a clever prior (that doesn’t do too much

violence to real prior info) helps, use it; you can weight and resample/Rao-Blackwellize later

  • Try different samplers — Nearly 20 years of ideas

Cross the sociological divide! It’s worth it!