Hitch-hiking and polygenic adaptation
Kevin Thornton
Ecology and Evolutionary Biology, UC Irvine
1
Linked selection vs. fates of selected mutations
Hudson & Kaplan, 1995; De Vladar & Barton, 2014
2
Modeling traditions
Population genetics | Evol. quantitative genetics
Fixed effect sizes | Variable effect sizes
Single selected site | Many sites
Directional selection | Stabilizing selection
Partial linkage | LE or QLE
3
Tree structures
[Figure panels: Neutral; Recent hard sweep]
4
Patterns reflect the tree structures
5
Linked selection during polygenic adaptation
- Use forward simulations
- fwdpy11 is a Python package
- Uses a C++ back-end (Thornton, 2014, Genetics)
6
Simulation scheme
Fitness of an individual with trait value z: $w = e^{-(z - z_o)^2 / (2 V_S)}$
- 10 unlinked loci, θ = ρ = 1,000 per locus
- Additive mutations arise at rate µ, Θ = 4Nµ
- Two thetas cannot possibly be confusing.
- N = 5,000 diploids
- Evolve under Gaussian stabilizing selection (GSS) for 10N generations with optimal trait value of 0
- Shift optimal trait value to z_o > 0 and evolve for 10N more generations
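As a worked illustration of the scheme above, the following Python sketch evaluates the Gaussian fitness function and the scaled mutation rate Θ = 4Nµ. The function names are hypothetical, and the µ values are the ones that appear in the figure panels of this talk. Whether those are per-locus or total rates is not stated, so the Θ values here are simply 4Nµ for those numbers. This is plain arithmetic, not fwdpy11 code.

```python
import math


def gaussian_fitness(z, z_opt=0.0, vs=1.0):
    """Fitness w = exp(-(z - z_opt)^2 / (2 * VS)) for trait value z."""
    return math.exp(-((z - z_opt) ** 2) / (2.0 * vs))


def scaled_mutation_rate(n_diploids, mu):
    """Theta = 4 * N * mu for N diploids and mutation rate mu."""
    return 4.0 * n_diploids * mu


N = 5000  # diploid population size from the slide

# Before the shift the optimum is 0, so a trait value of 0 has fitness 1.
w_before = gaussian_fitness(0.0, z_opt=0.0, vs=1.0)

# Right after a shift to z_o = 1 with VS = 1, the same trait value
# has fitness exp(-1/2), roughly 0.61.
w_after = gaussian_fitness(0.0, z_opt=1.0, vs=1.0)

# Theta for each mutation rate shown in the figure panels.
thetas = {mu: scaled_mutation_rate(N, mu) for mu in (0.00025, 0.001, 0.005)}

for mu, theta in thetas.items():
    print(f"mu = {mu}: Theta = {theta:g}")
```

The drop in fitness from `w_before` to `w_after` is what drives the rapid response to the optimum shift.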
7
Adaptation occurs rapidly and before fixation
[Figure 1 panels: three columns for µ = 2.5 × 10^-4, 0.001, and 0.005, each with z_o = 1 (panel titles also list a further parameter equal to 0.045, 0.089, and 0.200). Top row: mean trait value and 5 × V(G) over time. Lower rows: mutation frequency against time since optimum shift (units of N generations), with each trajectory labelled by its effect size and origination time.]
Figure 1: Large optimum shift, z_o = 1 with V_S = 1.
8
Contributions of different loci
[Figure 2 panels: mean genetic value of each locus against time since optimum shift (units of N generations), for mutation rates µ = 0.00025 and µ = 0.005, both with optimal trait value z_o = 1.]
Figure 2: Mean trait value per locus, colored by rank.
9
Sweeps from standing genetic variation (SGV) start out rare
[Figure panels: effect size and number of haplotypes with the mutation, for z_o = 1.00 and µ = 0.00025, 0.001, and 0.005.]
This predicts “hard” sweep signals due to sweeps from large-effect SGV.
10
Temporal and spatial patterns of “selection signals”
[Figure 3 panels: mean H' and mean z-score per window against time since optimum shift (units of N generations), by distance from the window with causal mutations (1 to 5), for µ = 0.00025, 0.001, and 0.005 with z_o = 1.]
Figure 3: Mean statistic per window over time for a large optimum shift. z-scores are for the nSL statistic (Ferrer-Admetlla et al., 2014, MBE).
11
Similar patterns for new mutations vs SGV
[Figure 4 panels: mean H' and mean z-score against time since optimum shift (units of N generations), for µ = 0.00025, 0.001, and 0.005 with z_o = 1, shown separately for new mutations and standing variation.]
Figure 4: Same data, but conditioning on fixations of large effect
12
Mutational variance matters
[Figure 5 panels: mean H' against time since optimum shift (units of N generations), by distance from the window with causal mutations (1 to 5), for µ = 0.00025, 0.001, and 0.005, with Pr(|γ| ≥ γ̂) = 0.05 and 0.75.]
Figure 5: Choose σ_µ so that the probability of a large-effect mutation is constant. Time scale is determined by δq of fixations.
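One way to read the caption, sketched here under an assumption: if effect sizes are Gaussian with mean zero and standard deviation σ_µ, then fixing Pr(|γ| ≥ γ̂) = p determines σ_µ = γ̂ / Φ⁻¹(1 − p/2), where Φ is the standard normal CDF. The threshold value used below is a placeholder, since the talk does not give γ̂, and the function names are hypothetical.

```python
from statistics import NormalDist


def sigma_for_tail_probability(gamma_hat, p):
    """Return sigma such that Pr(|gamma| >= gamma_hat) = p
    when gamma ~ N(0, sigma^2)."""
    if gamma_hat <= 0 or not 0.0 < p < 1.0:
        raise ValueError("need gamma_hat > 0 and 0 < p < 1")
    # Two-sided tail: p = 2 * (1 - Phi(gamma_hat / sigma)),
    # so gamma_hat / sigma = Phi^{-1}(1 - p / 2).
    z = NormalDist().inv_cdf(1.0 - p / 2.0)
    return gamma_hat / z


def tail_probability(gamma_hat, sigma):
    """Pr(|gamma| >= gamma_hat) for gamma ~ N(0, sigma^2)."""
    return 2.0 * (1.0 - NormalDist(0.0, sigma).cdf(gamma_hat))


gamma_hat = 0.1  # hypothetical threshold, not a value from the talk
sigmas = {p: sigma_for_tail_probability(gamma_hat, p) for p in (0.05, 0.75)}

for p, sigma in sigmas.items():
    # Round trip: the recovered tail probability should match p.
    print(p, sigma, tail_probability(gamma_hat, sigma))
```

A larger target probability requires a larger σ_µ for the same threshold, which is why the two rows of panels correspond to different mutational variances.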
13
Implications
- Patterns unique to “soft sweeps” are not generated by this model!
- We are using supervised machine learning (Schrider/Kern) to further investigate this.
- Hitch-hiking signals decrease as Θ increases
- Keep in mind that our “tests” are usually designed to detect hard sweeps
Data not shown:
- Small optimum shifts leave less dramatic patterns
14
Tree sequences: representing genetic data using tables
Kelleher, et al. 2016. PLoS Computational Biology, a.k.a. "The msprime paper"
[Figure: an example tree sequence with axes for time ago and genomic position (ticks at 5 and 10), showing the tree topologies and mutations for each interval alongside the tables that encode them.]
Nodes: columns ID, time
Edges: columns left, right, parent, child
Sites: columns ID, position, ancestral state (positions 2.5 and 7.5, ancestral states A and G)
Mutations: columns ID, site, node, derived state
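To make the table idea concrete, here is a minimal pure-Python sketch of node, edge, site, and mutation tables, together with recovering the local tree at a position and the allelic state of each sample. The two site positions and ancestral states follow the slide. Every other row is a made-up illustrative value, the names `Node`, `Edge`, `Site`, `Mutation`, `tree_at`, and `state_at_site` are hypothetical, and none of this is the tskit API.

```python
from typing import NamedTuple


class Node(NamedTuple):
    time: float  # how long ago the node lived


class Edge(NamedTuple):
    left: float   # inclusive start of the genomic interval
    right: float  # exclusive end of the genomic interval
    parent: int   # node ID, i.e. an index into the node table
    child: int


class Site(NamedTuple):
    position: float
    ancestral_state: str


class Mutation(NamedTuple):
    site: int  # index into the site table
    node: int  # node above which the mutation occurred
    derived_state: str


# Illustrative tables: samples are nodes 0-2, ancestors are nodes 3 and 4.
nodes = [Node(0.0), Node(0.0), Node(0.0), Node(1.0), Node(2.0)]
edges = [
    Edge(0.0, 10.0, 3, 0),
    Edge(0.0, 5.0, 3, 1),
    Edge(5.0, 10.0, 3, 2),
    Edge(5.0, 10.0, 4, 1),
    Edge(0.0, 5.0, 4, 2),
    Edge(0.0, 10.0, 4, 3),
]
sites = [Site(2.5, "A"), Site(7.5, "G")]
mutations = [Mutation(0, 3, "T"), Mutation(1, 1, "C")]


def tree_at(position):
    """Return {child: parent} for the tree covering `position`."""
    return {e.child: e.parent for e in edges if e.left <= position < e.right}


def state_at_site(sample, site_id):
    """Walk from the sample towards the root; the first mutation met
    on the way determines the state, otherwise it is ancestral."""
    site = sites[site_id]
    parent = tree_at(site.position)
    on_node = {m.node: m.derived_state for m in mutations if m.site == site_id}
    u = sample
    while u is not None:
        if u in on_node:
            return on_node[u]
        u = parent.get(u)  # None once we step past the root
    return site.ancestral_state


left_tree = tree_at(2.5)
right_tree = tree_at(7.5)
genotypes = ["".join(state_at_site(s, j) for s in range(3)) for j in range(2)]
```

Edges that span both intervals, such as the one from node 3 to node 0, are stored once. That sharing between adjacent trees is what keeps the tables compact.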
15
Tree sequence simplification...
Kelleher, et al. 2018. PLoS Computational Biology
16
... can be done in FAST linear time...
17
... and gives a huge performance boost...
[Figure: speedup due to pedigree recording (10 to 50) against scaled recombination rate (4Nr, from 10^3 to 10^5), for N = 1e+03, N = 1e+04, and N = 5e+04.]
18
... allowing chromosome-scale simulations in large N
[Figure 6 panels: Θ = 10 and Θ = 100. Expected proportion of singleton mutations (0.110 to 0.116) against distance from trees with selected mutations (units of 4Nr), by generations since optimum shift (50 to 250). Annotated regions: "Complete and partial sweeps" domain and "Polygenic adaptation" domain.]
Figure 6: N = 2 × 10^5 diploids, ρ = 10^5 (≈ 100 Mb in humans), γ ∼ N(0, 0.25), V_S = 1. Analysis based on n = 3,000 diploids.
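For orientation on the scale of the singleton axis, the standard neutral model (constant population size, no selection) gives an expected unfolded site frequency spectrum proportional to 1/i, so the expected proportion of singletons is 1/a_n with a_n = Σ_{i=1}^{n−1} 1/i for n sampled chromosomes. The sketch below computes that ratio. Treating 3,000 diploids as 6,000 chromosomes is an assumption, the function name is hypothetical, and this neutral baseline is not necessarily the reference used in the figure.

```python
def neutral_singleton_proportion(n_chromosomes):
    """Expected fraction of segregating sites that are singletons under
    the standard neutral model, using E[xi_i] proportional to 1/i."""
    if n_chromosomes < 2:
        raise ValueError("need at least two sampled chromosomes")
    # a_n is the harmonic sum over derived-allele counts 1 .. n-1.
    a_n = sum(1.0 / i for i in range(1, n_chromosomes))
    return 1.0 / a_n


# Assumption: n = 3,000 diploids corresponds to 6,000 sampled chromosomes.
baseline = neutral_singleton_proportion(2 * 3000)
print(f"neutral expectation: {baseline:.4f}")
```

Sweeps tend to raise the proportion of rare variants near the selected site, so departures from a baseline of this kind are what the figure tracks along the chromosome.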
19
Facilitates better testing
- Methods for detecting polygenic adaptation of continuous traits shouldn’t be evaluated with simulations of strong sweeps.
- Methods assuming linkage equilibrium need to be tested using simulations involving partial linkage
- etc.
20
Resources
- fwdpy11: https://fwdpy11.readthedocs.org
- msprime: https://msprime.readthedocs.org
- Tree sequence tutorials: https://tskit-dev.github.io/tutorials/
- The tree sequence toolkit: https://github.com/tskit-dev/tskit (“almost ready”)
21
Thanks
- David Lawrie
- Khoi Hyunh
- Jaleal Sanjak
- Tony Long
- Jerome Kelleher, Jaime Ashander, Peter Ralph
- NIH for funding
- UCI HPCC for computing support