Balancing Selection and Beyond: Machine learning approaches for - - PowerPoint PPT Presentation



slide-1
SLIDE 1

Balancing Selection and Beyond:

Machine learning approaches for determining selection scenarios in a complex parameter space

Thursday 14th February 2019

Kaileigh Ahlquist

Ramachandran Lab, Brown University

slide-2
SLIDE 2

Balancing selection maintains malaria resistance and sickle cell anemia

Pauling, Linus, et al. 1949. Science; Ingram, V.M. 1957. Nature

[Figure: genotypes AA (malaria susceptible), AT (malaria resistant) and TT (sickle cell anemia), under malaria disease pressure]

slide-3
SLIDE 3

Balancing selection creates diversity in self-incompatibility systems

Uyenoyama, M.K., and E. Newbigin. 2000. Plant Cell; Kamau, Esther, and Deborah Charlesworth. 2005. Current Biology

[Figure: plant reproductive system. Labels: S1 S2; S1 S3; Incompatible: S1, S2; Compatible: S3, S4]

slide-4
SLIDE 4

Balanced sites in the genome are important and there is potential for more discovery

“The MHC is one of the most prominent examples of balancing selection in the vertebrate genome”

Lenz, Tobias L., et al. 2016. Molecular Biology and Evolution

“…several alleles within the MHC region show evidence for recent selective sweeps.”

De Bakker, et al. 2006. Nature Genetics

“Searching human and chimpanzee gene sequences for trans-specific polymorphism, we uncovered little evidence for long-term balancing selection”

Charlesworth, Deborah. 2006. PLoS Genetics

Problem 1: Distinguishing multiple modes of selection

Human chromosome 6 Major Histocompatibility Complex

slide-5
SLIDE 5

There are multiple types of balancing selection

[Figure panels: example of heterozygote advantage/overdominance; example of negative frequency dependent selection]

Problem 2: Identifying multiple types of balancing selection

Problem 3: Variable selection parameters and detection limits

slide-6
SLIDE 6

Core problems in balancing selection

Problem 1: Distinguishing multiple modes of selection

  • Example: positive selection, background selection, balancing selection

Problem 2: Identifying multiple types of balancing selection

  • Example: overdominance, frequency dependent

Problem 3: Variable selection parameters and detection limits

  • Example: age of mutation, selection strength, overlapping events
slide-7
SLIDE 7

Methods to detect balancing selection have focused on older events and polymorphic sites

“…specifically tailored to uncover regions of long-term balancing selection”

Cheng, Xiaoheng, and Michael DeGiorgio. 2018. bioRxiv preprint

NCD statistic

“…the new methods have limited power to detect young balanced polymorphisms”

DeGiorgio, Michael, Kirk E. Lohmueller, and Rasmus Nielsen. 2014. Molecular Biology and Evolution

T1, T2 BALLET statistics

Methods to detect positive selection tend to focus on sweeps

slide-8
SLIDE 8

[Figure: example table of six sites (Known Balanced Site, Known Sweep Site, Unknown 1 to Unknown 4), divided into training and testing data. Each site has a sweep detection statistic and a balancing detection statistic, some of which are undefined. Sites are classified as Balanced, Sweep or "?" using the balancing statistic only and using the combined statistics.]

slide-9
SLIDE 9

SWIF(r) is a machine learning approach that uses multiple statistics, handles missing data, and can compare multiple selection scenarios

SWIF(r) = SWeep Inference Framework (controlling for correlation)

“can be run without imputing undefined statistics”

“explicitly learns pairwise joint distributions of selection statistics, which gives substantial gains in power”

“computes the per-site calibrated probability of selective sweep, which is immediately interpretable and does not require comparison with a genome-wide distribution”

any selective phenomena with training examples
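The idea of combining several statistics, skipping undefined ones, and reporting a probability per scenario can be sketched in a few lines. This is not the SWIF(r) implementation: per the quotes above, SWIF(r) learns pairwise joint distributions, whereas this sketch assumes each statistic is independent and normally distributed within a class (a plain naive Bayes simplification). The statistic names, training values and priors are all made up for illustration.

```python
import math

def gaussian_pdf(x, mean, sd):
    """Density of a normal distribution at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))

def fit(training):
    """Estimate a mean and standard deviation per class and per statistic.

    training maps a class label to a list of sites; each site is a dict of
    statistic name -> value, where None marks an undefined statistic.
    """
    model = {}
    for label, sites in training.items():
        model[label] = {}
        names = {name for site in sites for name in site}
        for name in names:
            values = [s[name] for s in sites if s.get(name) is not None]
            if not values:
                continue  # statistic never defined for this class
            mean = sum(values) / len(values)
            var = sum((v - mean) ** 2 for v in values) / len(values)
            model[label][name] = (mean, max(math.sqrt(var), 1e-6))
    return model

def posterior(model, site, priors):
    """Per-class probability for one site, skipping undefined statistics."""
    scores = {}
    for label, stats in model.items():
        score = priors[label]
        for name, value in site.items():
            if value is None or name not in stats:
                continue  # undefined statistic: leave it out, no imputation
            mean, sd = stats[name]
            score *= gaussian_pdf(value, mean, sd)
        scores[label] = score
    total = sum(scores.values())
    return {label: score / total for label, score in scores.items()}

# Made-up training values for two hypothetical statistics.
training = {
    "neutral": [{"sweep_stat": 0.1, "balancing_stat": 0.2},
                {"sweep_stat": -0.3, "balancing_stat": None},
                {"sweep_stat": 0.2, "balancing_stat": -0.1}],
    "sweep": [{"sweep_stat": 2.1, "balancing_stat": -0.2},
              {"sweep_stat": 1.8, "balancing_stat": 0.1},
              {"sweep_stat": 2.4, "balancing_stat": None}],
    "balanced": [{"sweep_stat": 0.0, "balancing_stat": 2.2},
                 {"sweep_stat": None, "balancing_stat": 1.9},
                 {"sweep_stat": 0.3, "balancing_stat": 2.5}],
}
priors = {label: 1.0 / len(training) for label in training}
model = fit(training)

# A test site whose sweep statistic is undefined.
probabilities = posterior(model, {"sweep_stat": None, "balancing_stat": 2.0}, priors)
best_label = max(probabilities, key=probabilities.get)
```

Because an undefined statistic simply contributes no factor, a site with every statistic undefined falls back to the prior probabilities.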

slide-10
SLIDE 10

SWIF(r) joint distributions gain power over individual statistics

[Figure: frequency distributions of Statistic 1 and Statistic 2, and a joint plot of Statistic 2 against Statistic 1; annotation: "Hard to separate"]
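A small synthetic example can illustrate why a joint distribution may separate classes that each statistic alone cannot. The data below are not selection statistics: two hypothetical classes, "A" and "B", are drawn so that each statistic has the same distribution in both classes and only the sign of the correlation differs. The noise level, sample sizes and seed are arbitrary assumptions.

```python
import math
import random

def simulate(sign, n, rng):
    """Draw n (stat1, stat2) pairs; sign sets the direction of correlation."""
    pairs = []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        y = sign * x + rng.gauss(0.0, 0.5)
        pairs.append((x, y))
    return pairs

def fit(pairs):
    """Means, standard deviations and correlation of the two statistics."""
    n = len(pairs)
    mx = sum(x for x, _ in pairs) / n
    my = sum(y for _, y in pairs) / n
    sx = math.sqrt(sum((x - mx) ** 2 for x, _ in pairs) / n)
    sy = math.sqrt(sum((y - my) ** 2 for _, y in pairs) / n)
    rho = sum((x - mx) * (y - my) for x, y in pairs) / (n * sx * sy)
    return mx, my, sx, sy, rho

def log_marginal(x, params):
    """Log density of statistic 1 alone (one-dimensional normal)."""
    mx, _, sx, _, _ = params
    z = (x - mx) / sx
    return -0.5 * z * z - math.log(sx * math.sqrt(2.0 * math.pi))

def log_joint(x, y, params):
    """Log density of the pair under a bivariate normal."""
    mx, my, sx, sy, rho = params
    zx, zy = (x - mx) / sx, (y - my) / sy
    one_minus = 1.0 - rho * rho
    q = (zx * zx - 2.0 * rho * zx * zy + zy * zy) / one_minus
    return -0.5 * q - math.log(2.0 * math.pi * sx * sy * math.sqrt(one_minus))

rng = random.Random(0)
fit_a = fit(simulate(+1, 2000, rng))
fit_b = fit(simulate(-1, 2000, rng))
test_set = ([(pair, "A") for pair in simulate(+1, 1000, rng)]
            + [(pair, "B") for pair in simulate(-1, 1000, rng)])

# Classify each test pair with statistic 1 only, then with the joint density.
single_hits = 0
joint_hits = 0
for (x, y), label in test_set:
    single_guess = "A" if log_marginal(x, fit_a) > log_marginal(x, fit_b) else "B"
    joint_guess = "A" if log_joint(x, y, fit_a) > log_joint(x, y, fit_b) else "B"
    single_hits += single_guess == label
    joint_hits += joint_guess == label

single_accuracy = single_hits / len(test_set)
joint_accuracy = joint_hits / len(test_set)
```

In this construction the single-statistic classifier should perform close to chance, while the joint classifier uses the direction of the correlation to tell the classes apart.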

slide-11
SLIDE 11

SWIF(r) is trained on simulated data to create many instances where the selection scenario is known

Simulation Pipeline

Haller, Benjamin C., and Philipp W. Messer. 2017. “SLiM 2: Flexible, Interactive Forward Genetic Simulations.” Molecular Biology and Evolution

[Figure labels: Neutrally evolving population; Control: Neutrally evolving population; Population experiencing selection]
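The talk's pipeline uses SLiM. As a rough stand-in only, the toy single-locus Wright-Fisher model below shows the general idea of generating labeled examples where the scenario is known by construction. It is not SLiM and does not produce the statistics SWIF(r) uses. The fitness values, population size, starting frequency and replicate counts are hypothetical.

```python
import random

def next_frequency(p, fitness, n_diploid, rng):
    """One Wright-Fisher generation: viability selection, then binomial drift."""
    w_aa, w_ab, w_bb = fitness
    q = 1.0 - p
    mean_w = p * p * w_aa + 2.0 * p * q * w_ab + q * q * w_bb
    p_selected = (p * p * w_aa + p * q * w_ab) / mean_w
    # Sample 2N gene copies for the next generation.
    copies = sum(rng.random() < p_selected for _ in range(2 * n_diploid))
    return copies / (2 * n_diploid)

def simulate(fitness, p0, n_diploid, generations, rng):
    """Final allele frequency after the given number of generations."""
    p = p0
    for _ in range(generations):
        p = next_frequency(p, fitness, n_diploid, rng)
    return p

# Hypothetical scenarios: genotype fitnesses (AA, AB, BB).
SCENARIOS = {
    "neutral": (1.0, 1.0, 1.0),
    "overdominance": (1.0, 1.2, 1.0),
}

rng = random.Random(1)
training_set = [
    (label, simulate(fitness, p0=0.1, n_diploid=100, generations=100, rng=rng))
    for label, fitness in SCENARIOS.items()
    for _ in range(20)
]

def mean_frequency(label):
    values = [freq for name, freq in training_set if name == label]
    return sum(values) / len(values)
```

Each entry of `training_set` pairs a known scenario label with a simulated outcome, which is the property a classifier needs from its training data.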

slide-12
SLIDE 12

SWIF(r) classification with multiple modes of selection

[Figure: plots with axes labeled Statistic 1, Statistic 2, Statistic 3 and Statistic 4]

slide-13
SLIDE 13

SWIF(r) can usefully express ambiguity

[Figure: SWIF(r) classification (highest probability) against true classification for the classes Neutral, Sweep and Balanced]

Probability of class “Sweep”: 0.472015870602, 0.524628895006, 0.661572464304, 0.62060257772, 0.577741418065, 0.58133184056, 0.509290518833, 0.537009355657, 0.599232903772, 0.773543482346, ...
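Picking the highest-probability class while still reporting how uncertain that pick is can be sketched as follows. The 0.6 cutoff and the three-class example probabilities are made up for illustration. Only the list of "Sweep" probabilities comes from the slide, and the slide does not give the matching probabilities for the other classes.

```python
def classify(probabilities, min_confidence=0.6):
    """Return (label, ambiguous) for one site.

    label is the class with the highest probability; ambiguous is True when
    that probability falls below min_confidence (a hypothetical cutoff).
    """
    label = max(probabilities, key=probabilities.get)
    return label, probabilities[label] < min_confidence

# Probabilities of class "Sweep" listed on the slide.
sweep_probabilities = [
    0.472015870602, 0.524628895006, 0.661572464304, 0.62060257772,
    0.577741418065, 0.58133184056, 0.509290518833, 0.537009355657,
    0.599232903772, 0.773543482346,
]
low_confidence = [p for p in sweep_probabilities if p < 0.6]

# Made-up probabilities for a single site across three classes.
example = classify({"Neutral": 0.30, "Sweep": 0.52, "Balanced": 0.18})
```

Under this cutoff, 7 of the 10 listed values would count as low confidence, which is the kind of ambiguity a bare class label hides.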

Problem 1: Distinguishing multiple modes of selection

slide-14
SLIDE 14

Finding detection limits with similar selection scenarios

Heterozygote advantage/overdominance

Heterozygote fitness > Homozygote fitness

2 alleles present

Stable at 50%
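A deterministic one-locus recursion shows why the heterozygote advantage case settles at a stable intermediate frequency. This is the standard textbook viability-selection model, not the simulation used in the talk, and the fitness values are hypothetical. The equilibrium sits at exactly 50% only when both homozygotes have equal fitness, which is the symmetric case assumed here. More generally, with homozygote fitnesses 1 - s and 1 - t relative to the heterozygote, the first allele equilibrates at t / (s + t).

```python
def next_allele_frequency(p, w_hom1, w_het, w_hom2):
    """Frequency of allele 1 after one generation of viability selection."""
    q = 1.0 - p
    mean_fitness = p * p * w_hom1 + 2.0 * p * q * w_het + q * q * w_hom2
    return (p * p * w_hom1 + p * q * w_het) / mean_fitness

def iterate(p0, fitness, generations=2000):
    """Apply the recursion repeatedly, starting from frequency p0."""
    p = p0
    for _ in range(generations):
        p = next_allele_frequency(p, *fitness)
    return p

# Hypothetical symmetric fitnesses: both homozygotes 0.9, heterozygote 1.0.
symmetric = (0.9, 1.0, 0.9)
from_rare = iterate(0.05, symmetric)
from_common = iterate(0.95, symmetric)

# Hypothetical asymmetric case: s = 0.2, t = 0.1, expected equilibrium 1/3.
asymmetric = iterate(0.05, (0.8, 1.0, 0.9))
```

Starting from either a rare or a common allele, the symmetric case converges to 0.5, so both alleles are kept in the population.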

Negative frequency dependent selection

Fitness adjusted in each generation depending on allele frequency
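A minimal sketch of negative frequency dependence, assuming a simple haploid-style model in which each allele's fitness falls linearly as it becomes more common. The linear form and the strength `s = 0.5` are illustrative assumptions, not the parameters used in the talk.

```python
def allele_fitness(p, s):
    """Fitness of allele A and allele B given the current frequency p of A.

    Each allele's fitness declines in proportion to its own frequency,
    so the rarer allele is always the fitter one.
    """
    return 1.0 - s * p, 1.0 - s * (1.0 - p)

def next_frequency_nfds(p, s):
    """Recompute fitness from the current frequency, then apply selection."""
    w_a, w_b = allele_fitness(p, s)
    return p * w_a / (p * w_a + (1.0 - p) * w_b)

def iterate_nfds(p0, s=0.5, generations=500):
    """Apply the recursion repeatedly, starting from frequency p0."""
    p = p0
    for _ in range(generations):
        p = next_frequency_nfds(p, s)
    return p

rare_start = iterate_nfds(0.05)
common_start = iterate_nfds(0.95)
```

In this model both starting points converge to 0.5, the same stable frequency as symmetric overdominance, which is one reason the two scenarios can be hard to tell apart from allele frequencies alone.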

slide-15
SLIDE 15

SWIF(r) can identify ambiguity between similar modes of selection

[Figure: plots with axes labeled Statistic 1, Statistic 2 and Statistic 3; legend: Neutral, Overdominance (b01), Frequency Dependent (b02)]

slide-16
SLIDE 16

SWIF(r) can identify ambiguity between similar modes of selection

[Figure: SWIF(r) classification (highest probability) against true classification for the classes Neutral, Overdominance (O) and Frequency-Dependent (FD)]

Problem 2: Identifying multiple types of balancing selection

Problem 3: Variable selection parameters and detection limits
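Comparing predicted against true classes, as in the figure on this slide, amounts to tabulating a confusion matrix. The sketch below reports each row as percentages of the true class. The label lists are invented for illustration and are not results from the talk.

```python
from collections import Counter

def confusion_matrix(true_labels, predicted_labels, classes):
    """Row-normalized confusion matrix, in percent.

    matrix[t][p] is the percentage of sites whose true class is t
    that were assigned to class p.
    """
    counts = Counter(zip(true_labels, predicted_labels))
    matrix = {}
    for t in classes:
        row_total = sum(counts[(t, p)] for p in classes)
        matrix[t] = {
            p: (100.0 * counts[(t, p)] / row_total if row_total else 0.0)
            for p in classes
        }
    return matrix

classes = ["Neutral", "Overdominance", "Frequency-Dependent"]

# Invented labels for ten sites.
true_labels = ["Neutral"] * 2 + ["Overdominance"] * 4 + ["Frequency-Dependent"] * 4
predicted_labels = [
    "Neutral", "Neutral",
    "Overdominance", "Overdominance", "Frequency-Dependent", "Frequency-Dependent",
    "Frequency-Dependent", "Frequency-Dependent", "Overdominance", "Frequency-Dependent",
]

matrix = confusion_matrix(true_labels, predicted_labels, classes)
```

Large off-diagonal entries between two classes, here the two invented balancing classes, are what signal that the scenarios are being confused with each other.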

slide-17
SLIDE 17

Addressing core problems in balancing selection

Problem 1: Distinguishing multiple modes of selection

  • With SWIF(r) we can compare multiple modes, even if some data is missing, and get a probability of each mode

Problem 2: Identifying multiple types of balancing selection

  • With SWIF(r) we can measure different types of balancing selection and determine how similar or distinct they are

Problem 3: Variable selection parameters and detection limits

  • We can use SWIF(r) to find detection limits
slide-18
SLIDE 18

Applications and Future Directions

  • Understanding ambiguity:
      • Allows us to assess claims made in the literature more accurately
      • Encourages targeted development of new methods
  • Accurately classifying sites:
      • Identifies targets for experimentation, modification or gene therapy
slide-19
SLIDE 19

Acknowledgements

Molecular Biology, Cell Biology and Biochemistry Graduate Program

Committee Members: Mark Johnson, David Rand, Daniel Weinreich, Sohini Ramachandran, Lauren Alpert Sudgen, Katherine Brunson, Michael Turchin, Wei Cheng, Priyanka Nakka, Sahar Shahamatdar, Sam Smith