ABC for Temporally Sampled Genetic Data Mark A. Beaumont, Schools - - PowerPoint PPT Presentation

abc for temporally sampled genetic data
SMART_READER_LITE
LIVE PREVIEW

ABC for Temporally Sampled Genetic Data Mark A. Beaumont, Schools - - PowerPoint PPT Presentation

Nature Precedings : doi:10.1038/npre.2011.5953.1 : Posted 13 May 2011 ABC for Temporally Sampled Genetic Data Mark A. Beaumont, Schools of Biological Sciences and Mathematics, The University of Bristol, Bristol, UK 05 April 2011 Mark A.


slide-1
SLIDE 1

ABC for Temporally Sampled Genetic Data

Mark A. Beaumont, Schools of Biological Sciences and Mathematics, The University of Bristol, Bristol, UK 05 April 2011

Mark A. Beaumont, Schools of Biological Sciences and Mathematics, The University of Bristol, Bristol, UK () ABC for Temporally Sampled Genetic Data 05 April 2011 1 / 25

Nature Precedings : doi:10.1038/npre.2011.5953.1 : Posted 13 May 2011

slide-2
SLIDE 2

Temporal Change in Gene Frequency with Admixture

Aim is to infer parameters in a state space model of changes in gene frequency in the presence of admixture.

t6 t1 t0 t3 t2 t4 t5

F0 F1 F2 F3 F4 F5 F6 μ6 μ5 μ4 μ3 μ2 μ1 N6 N5 N4 N3 N2 N1 X0 α0 β1 X2 X3 X4 X5 X6 X1 α1 α2 α3 α4 α5 α6 β2 β3 β4 β5 β6 Mark A. Beaumont, Schools of Biological Sciences and Mathematics, The University of Bristol, Bristol, UK () ABC for Temporally Sampled Genetic Data 05 April 2011 2 / 25

Nature Precedings : doi:10.1038/npre.2011.5953.1 : Posted 13 May 2011

slide-3
SLIDE 3

Temporal Change in Gene Frequency with Admixture

Temporally sampled genetic data is quite commonly obtained. Changes are usually attributed to genetic drift (a function of the population size). However admixture and replacement of populations over time may be confounded with drift. This is a major issue for ancient DNA samples

Mark A. Beaumont, Schools of Biological Sciences and Mathematics, The University of Bristol, Bristol, UK () ABC for Temporally Sampled Genetic Data 05 April 2011 3 / 25

Nature Precedings : doi:10.1038/npre.2011.5953.1 : Posted 13 May 2011

slide-4
SLIDE 4

Importance Sampling, Particles, and MCMC

Beaumont (Genetics, 2003); GIMH algorithm; using noisy estimates

  • f likelihood obtained from sequential importance sampling in MCMC.

Becquet and Przeworski (Genome Research, 2007); application of GIMH idea to MCMC-ABC algorithm of Marjoram et al (PNAS, 2003). Andrieu and Roberts (Annal. Stat. 2009)Pseudo-marginal method: convergence proofs and generalization of GIMH. Andrieu, Doucet, and Holenstein (RSSB, 2010); Particle MCMC Peters and Cornebise (RSSB, discussion of A,D,&H, 2010); ABC and particle MCMC.

Mark A. Beaumont, Schools of Biological Sciences and Mathematics, The University of Bristol, Bristol, UK () ABC for Temporally Sampled Genetic Data 05 April 2011 4 / 25

Nature Precedings : doi:10.1038/npre.2011.5953.1 : Posted 13 May 2011

slide-5
SLIDE 5

Framework for Temporal Model with Admixture (1)

S temporal samples are taken. ti Time of ith sample (i = 0, . . . , S). ∆tj Difference between time of jth and (j − 1)th sample (j = 1, . . . , S). Nj Effective population size for jth interval. µj Admixture proportion for jth interval. Fi FST of ith admixing population.

Mark A. Beaumont, Schools of Biological Sciences and Mathematics, The University of Bristol, Bristol, UK () ABC for Temporally Sampled Genetic Data 05 April 2011 5 / 25

Nature Precedings : doi:10.1038/npre.2011.5953.1 : Posted 13 May 2011

slide-6
SLIDE 6

Framework for Temporal Model with Admixture (2)

Use a Dirichlet rather than coalescent to model variance in allele frequencies: Laval et al., (Genetics, 2003) Kitakado et al., (Genetics, 2006) This does not give the same allele frequency distribution as the coalescent, but for a given FST, the variance is the same (see discussant contributions to Nicholson et al (RSSB, 2002)).

Mark A. Beaumont, Schools of Biological Sciences and Mathematics, The University of Bristol, Bristol, UK () ABC for Temporally Sampled Genetic Data 05 April 2011 6 / 25

Nature Precedings : doi:10.1038/npre.2011.5953.1 : Posted 13 May 2011

slide-7
SLIDE 7

Framework for Temporal Model with Admixture (3)

For frequency vector αi of length K alleles, sampled at time ti, we model the change in frequency due to drift over the interval ∆ti with effective size Ni as αi ∼ D(φiα′

(i−1),1, . . . , φiα′ (i−1),K)

where φi = exp(−∆ti/Ni) (1 − exp(−∆ti/Ni))· The observed frequencies, Xi are assumed to be multinomial samples from αi

Mark A. Beaumont, Schools of Biological Sciences and Mathematics, The University of Bristol, Bristol, UK () ABC for Temporally Sampled Genetic Data 05 April 2011 7 / 25

Nature Precedings : doi:10.1038/npre.2011.5953.1 : Posted 13 May 2011

slide-8
SLIDE 8

Framework for Temporal Model with Admixture (4)

Admixture is modelled as α′

i−1 = (1 − µi)αi−1 + µiβi·

The admixing frequencies βj, (j = 1, . . . , S), and the initial α0, are drawn from Dirichlet distributions, parameterized by Fi (i = 0, . . . , S), and metapopulation frequency M. E.g: β1 ∼ D(θ1M1, . . . , θ1MK) with θ1 = 1 F1 − 1 (Sewall Wright’s infinite island model)

Mark A. Beaumont, Schools of Biological Sciences and Mathematics, The University of Bristol, Bristol, UK () ABC for Temporally Sampled Genetic Data 05 April 2011 8 / 25

Nature Precedings : doi:10.1038/npre.2011.5953.1 : Posted 13 May 2011

slide-9
SLIDE 9

t6 t1 t0 t3 t2 t4 t5

F0 F1 F2 F3 F4 F5 F6 μ6 μ5 μ4 μ3 μ2 μ1 N6 N5 N4 N3 N2 N1 X0 α0 β1 X2 X3 X4 X5 X6 X1 α1 α2 α3 α4 α5 α6 β2 β3 β4 β5 β6

Mark A. Beaumont, Schools of Biological Sciences and Mathematics, The University of Bristol, Bristol, UK () ABC for Temporally Sampled Genetic Data 05 April 2011 9 / 25

Nature Precedings : doi:10.1038/npre.2011.5953.1 : Posted 13 May 2011

slide-10
SLIDE 10

MCMC implementation of TMA

Aim is to infer parameters in this model in a Bayesian framework. The likelihood is: P(X0|α0)P(α0|F0, M) × S

i=1 {P(Xi|αi)P(αi|αi−1, Ni, ∆ti, µi, βi)P(βi|Fi, M)}

The tis are known. Assume a hierarchical prior on Ni (Gaussian on log-scale) Assume beta priors on µi and Fi Assume Dirichlet prior on M Update parameters using Metropolis-Hastings.

Mark A. Beaumont, Schools of Biological Sciences and Mathematics, The University of Bristol, Bristol, UK () ABC for Temporally Sampled Genetic Data 05 April 2011 10 / 25

Nature Precedings : doi:10.1038/npre.2011.5953.1 : Posted 13 May 2011

slide-11
SLIDE 11

Application to Bryozoan data

1 Data from a freshwater Bryozoan, Cristatella mucedo, studied by

Beth Okamura (NHM, London) and Sophia Ahmed (Roscoff, France).

2 8 highly polymorphic microsatellite loci genotyped by Sophia Ahmed. 3 Sampled over 7 time periods. 4 Gene frequencies change markedly; unlikely to be due to drift. 5 Aim is to estimate effective population sizes, admixture proportions,

and FST of putative admixing populations.

Mark A. Beaumont, Schools of Biological Sciences and Mathematics, The University of Bristol, Bristol, UK () ABC for Temporally Sampled Genetic Data 05 April 2011 11 / 25

Nature Precedings : doi:10.1038/npre.2011.5953.1 : Posted 13 May 2011

slide-12
SLIDE 12

Bryozoan data: results with MCMC algorithm

10 20 30 40 50 0.00 0.02 0.04 0.06 0.08 N density 6 5 4 3 2 1

Mark A. Beaumont, Schools of Biological Sciences and Mathematics, The University of Bristol, Bristol, UK () ABC for Temporally Sampled Genetic Data 05 April 2011 12 / 25

Nature Precedings : doi:10.1038/npre.2011.5953.1 : Posted 13 May 2011

slide-13
SLIDE 13

Bryozoan data: results with MCMC algorithm

0.0 0.2 0.4 0.6 0.8 1.0 2 4 6 mu density 6 5 4 3 2 1

Mark A. Beaumont, Schools of Biological Sciences and Mathematics, The University of Bristol, Bristol, UK () ABC for Temporally Sampled Genetic Data 05 April 2011 13 / 25

Nature Precedings : doi:10.1038/npre.2011.5953.1 : Posted 13 May 2011

slide-14
SLIDE 14

Bryozoan data: results with MCMC algorithm

0.0 0.2 0.4 0.6 0.8 1.0 50 100 150 FST density 1 2 3 4 5 6

Mark A. Beaumont, Schools of Biological Sciences and Mathematics, The University of Bristol, Bristol, UK () ABC for Temporally Sampled Genetic Data 05 April 2011 14 / 25

Nature Precedings : doi:10.1038/npre.2011.5953.1 : Posted 13 May 2011

slide-15
SLIDE 15

Convergence of MCMC

Comparison of runs with likelihood held constant, to check for recovery of priors. Data sampled at 4 time points, 2 loci, 5 alleles each. Histogram — αi held constant Red line — αi updated Black line — prior N(4,1)

Mean log N Density 2 4 6 8 0.0 0.1 0.2 0.3 0.4

Mark A. Beaumont, Schools of Biological Sciences and Mathematics, The University of Bristol, Bristol, UK () ABC for Temporally Sampled Genetic Data 05 April 2011 15 / 25

Nature Precedings : doi:10.1038/npre.2011.5953.1 : Posted 13 May 2011

slide-16
SLIDE 16

Particle MCMC Implementation of TMA

Aim is to avoid MCMC updates for α1, . . . , αS, but use MCMC for all

  • ther parameters (including α0).

At each MCMC step, use importance sampling of the αi to compute noisy likelihood estimate, conditioning on all parameter values at that stage in the MCMC.

Mark A. Beaumont, Schools of Biological Sciences and Mathematics, The University of Bristol, Bristol, UK () ABC for Temporally Sampled Genetic Data 05 April 2011 16 / 25

Nature Precedings : doi:10.1038/npre.2011.5953.1 : Posted 13 May 2011

slide-17
SLIDE 17

Particle MCMC Implementation of TMA

Schematic Algorithm

1 For sample point 1: 1

set φ1 =

exp(−∆t1/N1) (1−exp(−∆t1/N1))·

2

Simulate M particles: α(j)

1

∼ q(α(j)

1 ) := D({φ1 + X1}α′ 0).

3

Compute importance weight W (j)

1

= p(X1|α(j)

1 )p(α(j) 1 |α′ 0, φ1)/q(α(j) 1 ).

4

Set ˜ L1 = 1/M W (j)

1 .

2 For sample points i > 1: 1

Set φi.

2

Simulate M particles: α(j)

i

∼ q(α(j)

i ) := D({φi + Xi}α′(j) i−1),

where α′(j)

i−1 = (1 − µi)α(j) i−1 + µiβi

where α(j)

i−1 is sampled from particles at step i − 1 with weight W (l) i−1,

l = 1, . . . , M

3

Compute weights etc. as for time step 1.

3 Set ˜

L = P(X0|α0) S

i=1 ˜

Li.

Mark A. Beaumont, Schools of Biological Sciences and Mathematics, The University of Bristol, Bristol, UK () ABC for Temporally Sampled Genetic Data 05 April 2011 17 / 25

Nature Precedings : doi:10.1038/npre.2011.5953.1 : Posted 13 May 2011

slide-18
SLIDE 18

Results from Particle MCMC

Trace of mean N

  • 1000

2000 3000 4000 5000 2 3 4 5 6

MCMC algorithm

Index mean log N

  • 1000

2000 3000 4000 5000 6000 2 3 4 5 6

GIMH algorithm

Index mean log N

Mark A. Beaumont, Schools of Biological Sciences and Mathematics, The University of Bristol, Bristol, UK () ABC for Temporally Sampled Genetic Data 05 April 2011 18 / 25

Nature Precedings : doi:10.1038/npre.2011.5953.1 : Posted 13 May 2011

slide-19
SLIDE 19

2 3 4 5 6 0.0 0.2 0.4 0.6 0.8 mean N density 0.0 0.2 0.4 0.6 0.8 1.0 0.2 0.4 0.6 0.8 1.0 1.2 1.4 mu 2 density 0.0 0.2 0.4 0.6 0.8 1.0 0.5 1.0 1.5 2.0 FST 3 density

Mark A. Beaumont, Schools of Biological Sciences and Mathematics, The University of Bristol, Bristol, UK () ABC for Temporally Sampled Genetic Data 05 April 2011 19 / 25

Nature Precedings : doi:10.1038/npre.2011.5953.1 : Posted 13 May 2011

slide-20
SLIDE 20

ABC and Particle MCMC

Motivation is to see whether this approach is feasible and competitive with particle MCMC. Replace importance estimate of ˜ Li with proportion of simulated points that are within tolerance interval.

Mark A. Beaumont, Schools of Biological Sciences and Mathematics, The University of Bristol, Bristol, UK () ABC for Temporally Sampled Genetic Data 05 April 2011 20 / 25

Nature Precedings : doi:10.1038/npre.2011.5953.1 : Posted 13 May 2011

slide-21
SLIDE 21

ABC and Particle MCMC: application to TMA

Implementation

1 For sample point 1: 1

set φ1 =

exp(−∆t1/N1) (1−exp(−∆t1/N1))·

2

Simulate M particles: α(j)

1

∼ D(φ1α′

0), X ′ 1 ∼ Multinom(α(j) 1 ).

3

Compute (0,1) weight W (j)

1

= I(|X ′

1 − X1| < δ).

4

Set ˜ L1 = 1/M W (j)

1 .

2 For sample points i > 1: 1

Set φi.

2

Simulate M particles: α(j)

i

∼ D(φiα′(j)

i−1),

where α′(j)

i−1 = (1 − µi)α(j) i−1 + µiβi

where α(j)

i−1 is sampled from particles at step i − 1 with weight W (l) i−1,

l = 1, . . . , M.

3

Compute weights etc. as for time step 1.

3 Set ˜

L = P(X0|α0) S

i=1 ˜

Li.

Mark A. Beaumont, Schools of Biological Sciences and Mathematics, The University of Bristol, Bristol, UK () ABC for Temporally Sampled Genetic Data 05 April 2011 21 / 25

Nature Precedings : doi:10.1038/npre.2011.5953.1 : Posted 13 May 2011

slide-22
SLIDE 22

Summary Statistics

1 Allele frequency is used. 2 Compute Q = 1/(K − 1) (X ′

i − Xi)/(Xi + g) for alleles

i = 1, . . . , K.

3 For threshold R, accept if Q < R. 4 In examples, R = 0·3 or 0·4 and g = 1. Mark A. Beaumont, Schools of Biological Sciences and Mathematics, The University of Bristol, Bristol, UK () ABC for Temporally Sampled Genetic Data 05 April 2011 22 / 25

Nature Precedings : doi:10.1038/npre.2011.5953.1 : Posted 13 May 2011

slide-23
SLIDE 23

Results from Particle MCMC with ABC

  • 1000

2000 3000 4000 5000 2 3 4 5 6

MCMC algorithm

Index mean log N

  • 1000

3000 5000 2 3 4 5 6

GIMH algorithm

Index mean log N

  • 1000

2000 3000 4000 2 3 4 5 6 7

ABC algorithm, thresh=0.4

Index mean log N

  • 1000

2000 3000 4000 2 3 4 5 6 7

ABC algorithm, thresh=0.3

Index mean log N

Mark A. Beaumont, Schools of Biological Sciences and Mathematics, The University of Bristol, Bristol, UK () ABC for Temporally Sampled Genetic Data 05 April 2011 23 / 25

Nature Precedings : doi:10.1038/npre.2011.5953.1 : Posted 13 May 2011

slide-24
SLIDE 24

Results from Particle MCMC with ABC

2 3 4 5 6 0.0 0.2 0.4 0.6 0.8 mean N density 0.0 0.2 0.4 0.6 0.8 1.0 0.2 0.4 0.6 0.8 1.0 1.2 1.4 mu 2 density 0.0 0.2 0.4 0.6 0.8 1.0 0.5 1.0 1.5 2.0 FST 3 density

Mark A. Beaumont, Schools of Biological Sciences and Mathematics, The University of Bristol, Bristol, UK () ABC for Temporally Sampled Genetic Data 05 April 2011 24 / 25

Nature Precedings : doi:10.1038/npre.2011.5953.1 : Posted 13 May 2011

slide-25
SLIDE 25

Acknowledgments

Beth Okamura Sophia Ahmed

Mark A. Beaumont, Schools of Biological Sciences and Mathematics, The University of Bristol, Bristol, UK () ABC for Temporally Sampled Genetic Data 05 April 2011 25 / 25

Nature Precedings : doi:10.1038/npre.2011.5953.1 : Posted 13 May 2011