SLIDE 1
A Bayesian method to detect targets of selection in Evolve-and-Resequence experiments
Rui Borges, Carolina Barata and Carolin Kosiol
SMBE Satellite Meeting, 13 February 2018
Natural selection
◮ Natural selection is a key force in evolution
SLIDE 2
SLIDE 3
Selection and experimental evolution
◮ Evolve-and-Resequence experiments: pool-seq time series data
[Figure labels: Cage, Replicates, Pressure, Evolve, Sequencing, Time t0, Time t1, ..., Time tN]
SLIDE 4
Testing/measuring adaptive scenarios
Empirical methods
◮ Fisher exact test
◮ CMH test (Agresti and Kateri 2011)
◮ Gaussian processes (Topa et al. 2015)
Mechanistic methods
◮ WFABC (Foll et al. 2015)
◮ Clear (Iranmehr et al. 2017)
◮ LLS (Taus et al. 2017)
SLIDE 5
Measuring adaptive scenarios
WFABC
◮ Bayes factors
◮ Highly dependent on the summary statistics
Clear
◮ LRTs
◮ Empirical p-values based on genome-wide drift simulations
LLS
◮ Least squares
◮ Empirical p-values based on simulations of allele trajectories
SLIDE 6
Improvements to existing methods
◮ Full picture: distribution of σ
◮ Computationally fast, allowing genome-wide applications
SLIDE 7
Defining population states
◮ Consider two alleles A and a and a population of fixed size N
◮ Population states: {nA, (N − n)a}
◮ Example: {2A, 8a}
SLIDE 8
Evolving population states
◮ Example states along a trajectory: {1A,9a}, {3A,7a}, {7A,3a}
◮ Trajectory Xt is a collection of states {nA, (N − n)a} giving the number of A alleles at time t
◮ Transition rates according to the Moran model with selection (Moran 1958):
$$n \to n - 1: \quad \frac{n(N-n)}{N}$$
$$n \to n + 1: \quad \frac{n(N-n)}{N}\,(1 + \sigma)$$
◮ Neutrality: σ = 0
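The transition rates above can be sketched in a few lines of Python. This is an illustration only, not the authors' implementation: the function names `moran_rates` and `simulate_trajectory` are hypothetical, the simulation uses a standard Gillespie scheme, and time is assumed to be measured in the Moran model's own units.

```python
import random

def moran_rates(n, N, sigma):
    """Return (rate of n -> n-1, rate of n -> n+1) for the state {nA, (N-n)a}."""
    base = n * (N - n) / N
    return base, base * (1.0 + sigma)

def simulate_trajectory(n0, N, sigma, sample_times, seed=None):
    """Gillespie-style simulation (an assumed scheme, not from the slides).

    Returns the number of A alleles at each requested sample time.
    """
    rng = random.Random(seed)
    times = sorted(sample_times)
    t, n, i = 0.0, n0, 0
    samples = []
    while i < len(times):
        down, up = moran_rates(n, N, sigma)
        total = down + up
        if total == 0.0:
            # n = 0 or n = N: the allele is lost or fixed, nothing changes.
            samples.extend([n] * (len(times) - i))
            break
        t_next = t + rng.expovariate(total)
        # Record the current state for every sample time before the next jump.
        while i < len(times) and times[i] < t_next:
            samples.append(n)
            i += 1
        t = t_next
        n += 1 if rng.random() < up / total else -1
    return samples

# Usage: one trajectory for N = 10, starting from {2A, 8a}, with sigma = 0.5.
trajectory = simulate_trajectory(2, 10, 0.5, [0.0, 1.0, 2.0, 5.0], seed=1)
```

Setting `sigma = 0` makes both rates equal, which is the neutral case on the slide.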
SLIDE 9
The likelihood function
◮ Xt: number of A alleles at time t
◮ T: number of time points
◮ R: number of replicates
$$p(X \mid \sigma) = \prod_{r=1}^{R} p(X^r_0 = x^r_0) \prod_{t=1}^{T} p(X^r_t = x^r_t \mid X^r_{t-1} = x^r_{t-1}, \sigma)$$
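The double product in the likelihood can be illustrated with a short Python sketch that works on the log scale. The names `log_likelihood`, `toy_initial` and `toy_transition` are hypothetical. The toy transition kernel is only a stand-in (the one-step jump chain of the Moran rates, with a uniform initial distribution over N = 10); it is not the transition probability between sequenced time points used in the talk.

```python
import math

def safe_log(p):
    """Natural log that returns -inf for zero probability."""
    return math.log(p) if p > 0.0 else float("-inf")

def log_likelihood(trajectories, sigma, initial_prob, transition_prob):
    """Sum over replicates r and time points t of the log terms in p(X | sigma).

    trajectories: one list of allele counts [x_0, x_1, ..., x_T] per replicate.
    initial_prob(x0) and transition_prob(x_prev, x_curr, sigma) are supplied by
    the caller; both are assumptions of this sketch.
    """
    total = 0.0
    for traj in trajectories:
        total += safe_log(initial_prob(traj[0]))
        for x_prev, x_curr in zip(traj, traj[1:]):
            total += safe_log(transition_prob(x_prev, x_curr, sigma))
    return total

def toy_initial(x0, N=10):
    """Toy assumption: uniform prior over the N + 1 possible states."""
    return 1.0 / (N + 1)

def toy_transition(x_prev, x_curr, sigma):
    """Toy assumption: a single Moran jump between consecutive observations."""
    up = (1.0 + sigma) / (2.0 + sigma)
    if x_curr == x_prev + 1:
        return up
    if x_curr == x_prev - 1:
        return 1.0 - up
    return 0.0

# Usage: two replicates, three observations each.
replicates = [[2, 3, 4], [2, 3, 2]]
ll_neutral = log_likelihood(replicates, 0.0, toy_initial, toy_transition)
```

Working with sums of logs rather than products of probabilities avoids numerical underflow when R and T grow.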
SLIDE 10
Sequencing noise
◮ Allele counts indirectly inform on the frequency of an allele in a population
◮ Binomial sampling process
$$p(\{nA, (N-n)a\} \mid c) \propto \binom{C}{c} \left(\frac{n}{N}\right)^{c} \left(1 - \frac{n}{N}\right)^{C-c}$$
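As a rough illustration of the binomial sampling step, the sketch below evaluates the expression above for every state n = 0..N and normalises the result. It assumes that c is the number of reads supporting allele A and C is the total coverage at the site; the function name `state_weights` is hypothetical.

```python
from math import comb

def state_weights(c, C, N):
    """Normalised weights p({nA, (N-n)a} | c) for n = 0..N.

    Assumption of this sketch: c reads out of C carry allele A.
    """
    raw = [
        comb(C, c) * (n / N) ** c * (1.0 - n / N) ** (C - c)
        for n in range(N + 1)
    ]
    total = sum(raw)
    if total == 0.0:
        raise ValueError("no population state is compatible with the counts")
    return [w / total for w in raw]

# Usage: 4 A-reads at 40x coverage, population of size N = 10.
weights = state_weights(4, 40, 10)
most_likely_n = max(range(len(weights)), key=weights.__getitem__)
```

The binomial coefficient cancels in the normalisation; it is kept here only to mirror the formula on the slide.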
SLIDE 11
Algorithm
◮ Calculates the allele trajectories
◮ Calculates the likelihood/posterior given σ
◮ Adjusts the log posterior using orthogonal polynomials
◮ Calculates summary statistics: σ̂ and BF
[Figure labels: sync file, allele trajectories, log posterior]
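The slides do not say which orthogonal polynomials are used or how the fit is done. As one possible illustration, the sketch below approximates a toy log posterior with Chebyshev polynomials evaluated at Chebyshev nodes, then reads off a point estimate from the fitted curve. Everything here is an assumption: the function names, the Chebyshev family, the toy quadratic log posterior, and the grid search for σ̂.

```python
import math

def chebyshev_fit(f, lo, hi, degree):
    """Chebyshev coefficients of f on [lo, hi], using degree + 1 Chebyshev nodes."""
    m = degree + 1
    nodes = [math.cos(math.pi * (k + 0.5) / m) for k in range(m)]
    values = [f(0.5 * (hi - lo) * x + 0.5 * (hi + lo)) for x in nodes]
    coeffs = []
    for j in range(m):
        # Discrete orthogonality: T_j(node_k) = cos(j * pi * (k + 0.5) / m).
        s = sum(v * math.cos(j * math.pi * (k + 0.5) / m)
                for k, v in enumerate(values))
        coeffs.append(2.0 * s / m)
    coeffs[0] *= 0.5
    return coeffs

def chebyshev_eval(coeffs, lo, hi, sigma):
    """Evaluate the fitted series at sigma, using T_j(x) = cos(j * arccos(x))."""
    x = (2.0 * sigma - (hi + lo)) / (hi - lo)
    theta = math.acos(max(-1.0, min(1.0, x)))
    return sum(c * math.cos(j * theta) for j, c in enumerate(coeffs))

def toy_log_posterior(sigma):
    """Toy stand-in: a quadratic log posterior peaked at sigma = 0.03."""
    return -50.0 * (sigma - 0.03) ** 2

lo, hi = -0.1, 0.2
coeffs = chebyshev_fit(toy_log_posterior, lo, hi, degree=4)

# Point estimate: the grid value of sigma that maximises the fitted curve.
grid = [lo + i * (hi - lo) / 1000 for i in range(1001)]
sigma_hat = max(grid, key=lambda s: chebyshev_eval(coeffs, lo, hi, s))
```

A polynomial surrogate of this kind needs only a handful of expensive log posterior evaluations, after which the curve is cheap to evaluate anywhere in the interval. That is presumably part of what makes a genome-wide scan feasible.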
SLIDE 12
Algorithm: example
Neσ = 10, C = 40x, BF = 5.7
CPU time = 1 second
Burke et al. (2014) dataset (75500 sites) = 21 hours
SLIDE 13
Simulated data
Experimental conditions:
◮ Number of replicates
◮ Time schemes
◮ Coverage
Population scenarios:
◮ Effective population size
◮ Strength of selection
◮ Allele initial frequency
SLIDE 14
Number of replicates
◮ A higher number of replicates leads to unbiased σ̂ with narrower distributions
◮ Two replicates are likely to lead to erroneous conclusions, especially in selection regimes
[Figure: scaled selection coefficient for 2, 5 and 10 replicates, with the true σ marked, by initial frequency (0.01, 0.05, 0.1, 0.5); Genetic drift < Selection, Ne=300, Neσ=10]
SLIDE 15
Time schemes
[Figure: scaled selection coefficient for 2, 5 and 10 time points, with the true σ marked, by initial frequency (0.01, 0.05, 0.1, 0.5) and time scheme (uniform, more sampling at the beginning, more sampling at the end); two panels, both Genetic drift < Selection: Ne=300, Neσ=10, Tmax=Ne/2 and Ne=300, Neσ=10, Tmax=Ne/4]
◮ σ̂ is less biased with more time points
◮ More sampling at the beginning improves σ̂, especially in selection regimes
◮ Two time points are likely to lead to erroneous conclusions
SLIDE 16
Coverage
◮ Coverage does not seem to significantly interfere with σ̂
[Figure: scaled selection coefficient for coverages of 20x, 60x, 100x and 200x, with the true σ marked, by initial frequency (0.01, 0.05, 0.1, 0.5); Genetic drift < Selection, Ne=300, Neσ=10]
SLIDE 17
Our method vs. LLS
[Figure: two panels, Ne=300, Neσ=1, p0=0.01 and Ne=300, Neσ=1, p0=0.5; axes: σ̂ (LLS) and σ̂ (our algorithm); legend: bias of σ̂ with our algorithm, bias of σ̂ with LLS, true σ]
◮ LLS overestimates σ for trajectories starting with lower frequencies
◮ Both methods perform quite similarly for trajectories starting with higher frequencies
SLIDE 18
Application to real data
Drosophila simulans dataset (Barghi et al. 2019)
◮ 10 replicates
◮ Sequencing at 7 time points: 0, 10, ..., 60
◮ Ne ≈ 300
[Figure: |log BF| by genomic position along chromosome X; selective SNPs and neutral SNPs]
SLIDE 19
Application to real data
[Figure: chromosome 3R; axes: average σ and variance σ; neutral SNPs and selective SNPs]
◮ Variance of σ measures the heterogeneity of allele trajectories among replicates
◮ Identifies SNPs with different adaptive strategies
SLIDE 20
Summary and future work
◮ Distribution of σ
◮ Statistically cleaner
◮ Computationally fast
◮ Flexible tool
◮ More testing in real datasets (suggestions?)
◮ Multiple testing scheme for the BFs
SLIDE 21