Balancing Selection and Beyond:
Machine learning approaches for determining selection scenarios in a complex parameter space
Thursday 14th February 2019
Kaileigh Ahlquist
Ramachandran Lab, Brown University
Pauling, Linus, et al. 1949. Science
Ingram, V.M. 1957. Nature
[Figure: genotypes AA, AT, and TT under malaria disease pressure. AA: malaria susceptible. AT: malaria resistant. TT: sickle cell anemia.]
Uyenoyama, M.K., and E. Newbigin. 2000. Plant Cell
Kamau, Esther, and Deborah Charlesworth. 2005. Current Biology
[Figure: plant reproductive system. Genotypes shown: S1 S2 and S1 S3. Incompatible: S1, S2. Compatible: S3, S4.]
“The MHC is one of the most prominent examples of balancing selection in the vertebrate genome”
Lenz, Tobias L., et al. 2016. Molecular Biology and Evolution
“…several alleles within the MHC region show evidence for recent selective sweeps.”
De Bakker, et al. 2006. Nature Genetics
“Searching human and chimpanzee gene sequences for trans-specific polymorphism, we uncovered little evidence for long-term balancing selection”
Charlesworth, Deborah. 2006. PLoS Genetics
Problem 1: Distinguishing multiple modes of selection
Human chromosome 6: Major Histocompatibility Complex
Example of heterozygote advantage/overdominance
Example of negative frequency dependent selection
Problem 2: Identifying multiple types of balancing selection
Problem 3: Variable selection parameters and detection limits
“…specifically tailored to uncover regions of long-term balancing selection”
Cheng, Xiaoheng, and Michael DeGiorgio. 2018. bioRxiv preprint
NCD statistic
“…the new methods have limited power to detect young balanced polymorphisms”
DeGiorgio, Michael, Kirk E. Lohmueller, and Rasmus Nielsen. 2014. Molecular Biology and Evolution
T1, T2 BALLET statistics
[Figure: training sites (Known Balanced Site, Known Sweep Site) and testing sites (Unknown 1 to Unknown 4), each scored with a Sweep Detection Statistic and a Balancing Detection Statistic, with some values undefined. Sites are labeled Balanced, Sweep, or ? under two schemes: Balancing Statistic Only and Combined Statistics.]
Data Classification
SWIF(r) = SWeep Inference Framework (controlling for correlation)
“can be run without imputing undefined statistics”
“explicitly learns pairwise joint distributions … substantial gains in power”
“computes the per-site calibrated probability … interpretable and does not require comparison with a genome-wide distribution”
Any selective phenomena with training examples
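The first quoted feature, running without imputing undefined statistics, can be illustrated with a much simpler classifier than SWIF(r). The sketch below is a plain naive Bayes stand-in, not SWIF(r) itself: it does not learn pairwise joint distributions, and the class names, Gaussian parameters, and priors are hypothetical values chosen only for illustration. An undefined statistic (None) is simply left out of the product.

```python
import math

# Hypothetical per-class (mean, sd) for two summary statistics.
# These numbers are illustrative assumptions, not values from the talk.
PARAMS = {
    "Neutral":  [(0.0, 1.0), (0.0, 1.0)],
    "Sweep":    [(2.0, 1.0), (0.0, 1.0)],
    "Balanced": [(0.0, 1.0), (2.0, 1.0)],
}
PRIORS = {"Neutral": 1 / 3, "Sweep": 1 / 3, "Balanced": 1 / 3}


def gaussian_pdf(x, mean, sd):
    """Density of a normal distribution at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def posterior(stats, params=PARAMS, priors=PRIORS):
    """Posterior probability of each class for one site.

    stats is a list with one entry per statistic; None marks an
    undefined statistic, which is skipped rather than imputed.
    """
    joint = {}
    for cls, prior in priors.items():
        likelihood = prior
        for value, (mean, sd) in zip(stats, params[cls]):
            if value is None:
                continue  # undefined statistic contributes nothing
            likelihood *= gaussian_pdf(value, mean, sd)
        joint[cls] = likelihood
    total = sum(joint.values())
    return {cls: v / total for cls, v in joint.items()}


# A site where the second statistic is undefined.
site_posterior = posterior([2.0, None])
```

With every statistic undefined, this sketch falls back to the priors, which is one reasonable behavior but an assumption of the sketch rather than a documented property of SWIF(r).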
[Figure: frequency distributions of Statistic 1 and of Statistic 2, and a plot of Statistic 2 against Statistic 1. Annotation: Hard to separate.]
Simulation Pipeline
Haller, Benjamin C., and Philipp W. Messer. 2017. “SLiM 2: Flexible, Interactive Forward Genetic Simulations.” Molecular Biology and Evolution
Control: Neutrally evolving population
Population experiencing selection
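[Figure: distributions of Statistic 1, Statistic 2, Statistic 3, and Statistic 4 for the Neutral, Sweep, and Balanced classes.]
[Figure: SWIF(r) Classification versus True Classification for the Neutral, Sweep, and Balanced classes.]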
Probability of class “Sweep”: 0.472015870602, 0.524628895006, 0.661572464304, 0.62060257772, 0.577741418065, 0.58133184056, 0.509290518833, 0.537009355657, 0.599232903772, 0.773543482346, ...
SWIF(r) Classification: Highest Probability
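The highest-probability rule can be sketched as follows. The class names match the slide; the probability values in the example are made up for illustration and are not results from the talk.

```python
def classify(class_probabilities):
    """Return the class with the highest probability for one site."""
    return max(class_probabilities, key=class_probabilities.get)


# Hypothetical per-site class probabilities (illustrative only).
sites = [
    {"Neutral": 0.20, "Sweep": 0.47, "Balanced": 0.33},
    {"Neutral": 0.10, "Sweep": 0.25, "Balanced": 0.65},
]
labels = [classify(site) for site in sites]
```

Note that a site can be assigned to a class with a probability below 0.5 when three classes compete, so the probability itself is worth reporting alongside the label.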
Problem 1: Distinguishing multiple modes of selection
Heterozygote advantage/overdominance
Heterozygote fitness > Homozygote fitness
2 alleles present, stable at 50%
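Why two alleles settle at 50% can be checked with the standard one-locus, two-allele selection recursion. This is a minimal deterministic sketch (infinite population, no drift), assuming symmetric fitnesses in which both homozygotes are equally less fit than the heterozygote; the fitness values are illustrative, not parameters from the talk.

```python
def next_frequency(p, w_hom1, w_het, w_hom2):
    """One generation of selection on the frequency p of allele 1."""
    q = 1.0 - p
    mean_fitness = p * p * w_hom1 + 2.0 * p * q * w_het + q * q * w_hom2
    return (p * p * w_hom1 + p * q * w_het) / mean_fitness


def iterate(p0, generations, w_hom1=0.9, w_het=1.0, w_hom2=0.9):
    """Apply the recursion repeatedly and return the final frequency."""
    p = p0
    for _ in range(generations):
        p = next_frequency(p, w_hom1, w_het, w_hom2)
    return p


# Starting from a rare or a common allele, the frequency approaches 0.5.
from_rare = iterate(0.05, 2000)
from_common = iterate(0.95, 2000)
```

With unequal homozygote fitnesses the same recursion still has a stable interior equilibrium, but it is no longer at 50%.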
Negative frequency dependent selection
Fitness adjusted in each generation depending on allele frequency
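A minimal sketch of fitness that is recomputed every generation from the current allele frequency. It assumes a simple haploid model in which an allele's fitness falls linearly as it becomes more common; the function names and the strength parameter s are hypothetical, and the model used in the talk's simulations may differ.

```python
def allele_fitnesses(p, s=0.1):
    """Fitness of allele A and allele a given the frequency p of A.

    Each allele is penalized in proportion to its own frequency,
    so the rarer allele always has the higher fitness.
    """
    return 1.0 - s * p, 1.0 - s * (1.0 - p)


def simulate_nfds(p0, generations, s=0.1):
    """Deterministic trajectory of p under negative frequency dependence."""
    trajectory = [p0]
    p = p0
    for _ in range(generations):
        w_A, w_a = allele_fitnesses(p, s)  # fitness updated each generation
        mean_fitness = p * w_A + (1.0 - p) * w_a
        p = p * w_A / mean_fitness
        trajectory.append(p)
    return trajectory


# A common allele (p = 0.9) is pushed back toward intermediate frequency.
trajectory = simulate_nfds(0.9, 2000)
```

In this symmetric sketch the stable frequency is also 0.5, which gives some intuition for why this mode can resemble heterozygote advantage in summary statistics.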
[Figure: distributions of summary statistics (panels labeled Statistic 1, Statistic 3, Statistic 2, Statistic 2) for the Neutral, Overdominance (b01), and Frequency Dependent (b02) classes.]
[Figure: SWIF(r) Classification versus True Classification for Neutral, Overdominance (O), and Frequency-Dependent (FD).]
SWIF(r) Classification: Highest Probability
Problem 2: Identifying multiple types of balancing selection
Problem 3: Variable selection parameters and detection limits
Problem 1: Distinguishing multiple modes of selection
a probability of each mode
Problem 2: Identifying multiple types of balancing selection
how similar or distinct they are
Problem 3: Variable selection parameters and detection limits
Molecular Biology, Cell Biology and Biochemistry Graduate Program
Committee Members: Mark Johnson, David Rand, Daniel Weinreich, Sohini Ramachandran
Lauren Alpert Sugden, Katherine Brunson, Michael Turchin, Wei Cheng, Priyanka Nakka, Sahar Shahamatdar, Sam Smith