A Bayesian method to detect targets of selection in - - PowerPoint PPT Presentation


A Bayesian method to detect targets of selection in Evolve-and-Resequence experiments Rui Borges, Carolina Barata and Carolin Kosiol SMBE Satellite Meeting 13 February 2018 Natural selection Natural selection is a key force in evolution


slide-1
SLIDE 1

A Bayesian method to detect targets of selection in Evolve-and-Resequence experiments

Rui Borges, Carolina Barata and Carolin Kosiol SMBE Satellite Meeting 13 February 2018

slide-2
SLIDE 2

Natural selection

◮ Natural selection is a key force in evolution
◮ Mechanisms by which populations can adapt to external pressures
◮ We focus on the adaptive evolution due to standing variation in fixed-size populations

slide-3
SLIDE 3

Selection and experimental evolution

[Figure: experimental design. Replicate cages evolve under a selective pressure and are sequenced at times t0, t1, ..., tN]

◮ Evolve-and-Resequence experiments: pool-seq time series data

slide-4
SLIDE 4

Testing/measuring adaptive scenarios

Empirical methods

◮ Fisher exact test
◮ CMH test (Agresti and Kateri 2011)
◮ Gaussian processes (Topa et al. 2015)

Mechanistic methods

◮ WFABC (Foll et al. 2015)
◮ Clear (Iranmehr et al. 2017)
◮ LLS (Taus et al. 2017)

slide-5
SLIDE 5

Measuring adaptive scenarios

WFABC

◮ Bayes factors
◮ Highly dependent on the summary statistics

Clear

◮ LRTs
◮ Empirical p-values based on genome-wide drift simulations

LLS

◮ Least squares
◮ Empirical p-values based on simulations of allele trajectories

slide-6
SLIDE 6

Improvements to existing methods

◮ Full picture: distribution of σ
◮ Computationally fast, allowing genome-wide applications

slide-7
SLIDE 7

Defining population states

◮ Consider two alleles, A and a, and a population of fixed size N
◮ Population states: {nA, (N − n)a}

Example state: {2A, 8a}

slide-8
SLIDE 8

Evolving population states

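[Figure: example trajectory through the states {1A, 9a}, {3A, 7a}, {7A, 3a}]

◮ Trajectory Xt is a collection of states {nA, (N − n)a} giving the number of alleles A at time t
◮ According to the Moran model with selection (Moran 1958), the transition rates are

$$n \to n - 1: \quad \frac{n(N - n)}{N} \qquad\qquad n \to n + 1: \quad \frac{n(N - n)}{N}\,(1 + \sigma)$$

◮ Neutrality: σ = 0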
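The rates above can be turned into a small simulator. The following is a minimal sketch, assuming a continuous-time chain simulated with the Gillespie algorithm; the function names `moran_rates` and `simulate_trajectory` are hypothetical, and nothing here is taken from the authors' implementation.

```python
import random


def moran_rates(n, N, sigma):
    """Rates out of state {nA, (N - n)a}: (n -> n - 1, n -> n + 1)."""
    base = n * (N - n) / N
    return base, base * (1.0 + sigma)


def simulate_trajectory(n0, N, sigma, t_max, rng):
    """Gillespie simulation (an assumed scheme); returns a list of (time, n) records."""
    t, n = 0.0, n0
    path = [(t, n)]
    while True:
        down, up = moran_rates(n, N, sigma)
        total = down + up
        if total == 0.0:  # allele A is lost (n = 0) or fixed (n = N)
            break
        t += rng.expovariate(total)  # waiting time to the next event
        if t > t_max:
            break
        n += 1 if rng.random() < up / total else -1
        path.append((t, n))
    return path


# Hypothetical run: start from the state {2A, 8a} with sigma = 0.5.
path = simulate_trajectory(n0=2, N=10, sigma=0.5, t_max=20.0, rng=random.Random(1))
print(path[-1])
```

With σ = 0 the two rates are equal, which is the neutral case on the slide; any σ > 0 tilts each event toward gaining a copy of A.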

slide-9
SLIDE 9

The likelihood function

◮ Xt: number of alleles A at time t
◮ T: number of time points
◮ R: number of replicates

$$p(X \mid \sigma) = \prod_{r=1}^{R} p(X_0^r = x_0^r) \prod_{t=1}^{T} p(X_t^r = x_t^r \mid X_{t-1}^r = x_{t-1}^r, \sigma)$$
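The product over replicates and time points can be sketched in code. The slides do not say how the transition probabilities are computed, so this sketch assumes a continuous-time Moran chain with the rates from the previous slide and obtains P(Δt) = exp(QΔt) by uniformization. All function names are hypothetical, the sampling times are assumed to be in Moran-model time units, and the initial-state term is treated as a constant that does not depend on σ.

```python
import math


def rate_matrix(N, sigma):
    """Generator Q of the Moran chain on the A-allele counts n = 0..N."""
    Q = [[0.0] * (N + 1) for _ in range(N + 1)]
    for n in range(1, N):
        base = n * (N - n) / N
        Q[n][n - 1] = base                  # n -> n - 1
        Q[n][n + 1] = base * (1.0 + sigma)  # n -> n + 1
        Q[n][n] = -(Q[n][n - 1] + Q[n][n + 1])
    return Q


def matmul(A, B):
    size = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(size)) for j in range(size)]
            for i in range(size)]


def transition_matrix(Q, dt, tol=1e-12, max_terms=10000):
    """P(dt) = exp(Q * dt) by uniformization (a truncated Poisson-weighted sum)."""
    size = len(Q)
    lam = max(-Q[i][i] for i in range(size)) or 1.0
    U = [[float(i == j) + Q[i][j] / lam for j in range(size)] for i in range(size)]
    power = [[float(i == j) for j in range(size)] for i in range(size)]  # U^0
    weight = math.exp(-lam * dt)  # Poisson weight for k = 0
    covered = weight
    P = [[weight * v for v in row] for row in power]
    for k in range(1, max_terms):
        if 1.0 - covered <= tol:
            break
        power = matmul(power, U)
        weight *= lam * dt / k
        covered += weight
        for i in range(size):
            for j in range(size):
                P[i][j] += weight * power[i][j]
    return P


def log_likelihood(replicates, times, N, sigma, log_p0=0.0):
    """log p(X | sigma); replicates holds one list of A counts per replicate."""
    Q = rate_matrix(N, sigma)
    cache = {}
    total = 0.0
    for x in replicates:
        total += log_p0  # p(X_0^r = x_0^r), assumed independent of sigma
        for t in range(1, len(times)):
            dt = times[t] - times[t - 1]
            if dt not in cache:
                cache[dt] = transition_matrix(Q, dt)
            prob = cache[dt][x[t - 1]][x[t]]
            if prob <= 0.0:
                return float("-inf")  # impossible transition
            total += math.log(prob)
    return total


# Hypothetical data: two replicates, three time points, N = 10.
times = [0.0, 1.0, 2.0]
replicates = [[2, 4, 7], [3, 5, 8]]
for sigma in (0.0, 0.5, 1.0):
    print(sigma, log_likelihood(replicates, times, N=10, sigma=sigma))
```

The example treats the trajectories as exact counts; in practice they are observed through sequencing reads, which the next slide addresses.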

slide-10
SLIDE 10

Sequencing noise

◮ Allele counts indirectly inform on the frequency of an allele in a population
◮ Binomial sampling process

$$p(\{nA, (N - n)a\} \mid c) \propto \binom{C}{c} \left(\frac{n}{N}\right)^{c} \left(1 - \frac{n}{N}\right)^{C - c}$$
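As a small illustration of the binomial sampling step, the sketch below evaluates the expression for every state n = 0..N and normalizes it. It assumes that c is the number of reads carrying allele A and C is the coverage at the site, and normalizing over states amounts to assuming a uniform prior on n; the slide only gives the proportionality. The name `state_weights` is hypothetical.

```python
import math


def state_weights(c, C, N):
    """Normalized weights for the states {nA, (N - n)a}, n = 0..N, given c of C reads."""
    raw = [
        math.comb(C, c) * (n / N) ** c * (1 - n / N) ** (C - c)
        for n in range(N + 1)
    ]
    total = sum(raw)
    return [w / total for w in raw]


# Hypothetical site: 8 reads of allele A at 40x coverage, N = 10.
weights = state_weights(c=8, C=40, N=10)
for n, w in enumerate(weights):
    print(n, round(w, 4))
```

The weights peak at the state whose frequency n/N is closest to the observed read fraction c/C, and they spread out as the coverage C decreases.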

slide-11
SLIDE 11

Algorithm

◮ Calculates the allele trajectories
◮ Calculates the likelihood/posterior given σ
◮ Adjusts the log posterior using orthogonal polynomials
◮ Calculates summary statistics: σ̂ and BF

[Figure: pipeline from sync file to allele trajectories to log posterior]
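The slides do not say which orthogonal polynomials are used or how the fit is carried out. One possible reading of the third step, sketched here purely as an assumption, is to evaluate the log posterior at a few values of σ, fit a Chebyshev expansion, and read σ̂ off the fitted curve. The log posterior below is a toy stand-in, and all function names are hypothetical.

```python
import math


def chebyshev_fit(f, a, b, degree, n_nodes):
    """Chebyshev coefficients of f on [a, b] from its values at n_nodes Chebyshev nodes."""
    if degree >= n_nodes:
        raise ValueError("degree must be smaller than n_nodes")
    angles = [math.pi * (j + 0.5) / n_nodes for j in range(n_nodes)]
    values = [f(0.5 * (b - a) * math.cos(th) + 0.5 * (a + b)) for th in angles]
    return [
        2.0 / n_nodes * sum(v * math.cos(k * th) for v, th in zip(values, angles))
        for k in range(degree + 1)
    ]


def chebyshev_eval(coeffs, a, b, sigma):
    """Evaluate the fitted expansion at sigma, using T_k(x) = cos(k * arccos(x))."""
    x = (2.0 * sigma - (a + b)) / (b - a)
    theta = math.acos(max(-1.0, min(1.0, x)))
    return 0.5 * coeffs[0] + sum(
        c * math.cos(k * theta) for k, c in enumerate(coeffs) if k > 0
    )


def toy_log_posterior(sigma):
    """Stand-in for an expensive log posterior (hypothetical, peaked at 0.1)."""
    return -50.0 * (sigma - 0.1) ** 2


a, b = -0.5, 0.5
coeffs = chebyshev_fit(toy_log_posterior, a, b, degree=4, n_nodes=8)

# Locate the posterior mode on a fine grid of the cheap polynomial surrogate.
grid = [a + (b - a) * i / 1000 for i in range(1001)]
sigma_hat = max(grid, key=lambda s: chebyshev_eval(coeffs, a, b, s))
print(sigma_hat)
```

The appeal of this kind of approach is that the expensive log posterior is evaluated only at the nodes, while the summary statistics are computed from the cheap polynomial surrogate.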

slide-12
SLIDE 12

Algorithm: example

Neσ = 10, C = 40x, BF = 5.7
CPU time = 1 second
Burke et al. (2014) dataset (75500 sites) = 21 hours
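The two timings are consistent if the 1 second is read as CPU time per site, which is an assumption here, as is a single-core, sequential run. A quick check of the arithmetic:

```python
# Assumption: 1 second of CPU time per site, sites processed one after another.
sites = 75500
seconds_per_site = 1.0

hours = sites * seconds_per_site / 3600
print(f"{hours:.1f} hours")
```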

slide-13
SLIDE 13

Simulated data

Experimental conditions:

◮ Number of replicates
◮ Time schemes
◮ Coverage

Population scenarios:

◮ Effective population size
◮ Strength of selection
◮ Allele initial frequency

slide-14
SLIDE 14

Number of replicates

◮ A higher number of replicates leads to unbiased and narrower σ̂
◮ Two replicates are likely to lead to erroneous conclusions, especially in regimes of selection

[Figure: scaled selection coefficient against number of replicates (2, 5, 10), by initial frequency (0.01, 0.05, 0.1, 0.5), with the true σ marked; Ne = 300, Neσ = 10; genetic drift < selection]

slide-15
SLIDE 15

Time schemes

[Figure: scaled selection coefficient against number of time points (2, 5, 10), by initial frequency (0.01, 0.05, 0.1, 0.5), with the true σ marked. Time schemes: uniform, more sampling at the beginning, more sampling at the end. Panels: Ne = 300, Neσ = 10, Tmax = Ne/2 and Ne = 300, Neσ = 10, Tmax = Ne/4; genetic drift < selection in both]

◮ σ̂ is less biased for more time points
◮ More sampling at the beginning improves σ̂, especially in regimes of selection
◮ Two time points are likely to lead to erroneous conclusions

slide-16
SLIDE 16

Coverage

◮ Coverage does not seem to significantly interfere with σ̂

[Figure: scaled selection coefficient against coverage (20x, 60x, 100x, 200x), by initial frequency (0.01, 0.05, 0.1, 0.5), with the true σ marked; Ne = 300, Neσ = 10; genetic drift < selection]

slide-17
SLIDE 17

Our method vs. LLS

[Figure: σ̂ from LLS against σ̂ from our algorithm, showing the bias of σ̂ with each method and the true σ. Panels: Ne = 300, Neσ = 1, p0 = 0.01 and Ne = 300, Neσ = 1, p0 = 0.5]

◮ LLS overestimates σ for trajectories starting with lower frequencies
◮ Both methods perform quite similarly for trajectories starting with higher frequencies

slide-18
SLIDE 18

Application to real data

Drosophila simulans dataset (Barghi et al. 2019)

◮ 10 replicates
◮ Sequencing at 7 time points: 0, 10, ..., 60
◮ Ne ≈ 300

[Figure: |log BF| against genomic position on chromosome X, with selective and neutral SNPs marked]

slide-19
SLIDE 19

Application to real data

[Figure: average σ and variance of σ for SNPs on chromosome 3R, with neutral and selective SNPs marked]

◮ Variance of σ measures the heterogeneity of allele trajectories among replicates
◮ Identifies SNPs with different adaptive strategies

slide-20
SLIDE 20

Summary and future work

◮ Distribution of σ
◮ Statistically cleaner
◮ Computationally fast
◮ Flexible tool
◮ More testing in real datasets (suggestions?)
◮ Multiple testing scheme for the BFs

slide-21
SLIDE 21

Acknowledgements

◮ Claus Vogel
◮ Neda Barghi
◮ Marta Pelizzola
◮ WWTF Project MA16-061

Institute of Population Genetics, Vetmeduni Vienna
Centre for Biological Diversity, University of St Andrews