Diffusion Models in Population Genetics
Laura Kubatko (kubatko.2@osu.edu)
MBI Workshop on Spatially-varying stochastic differential equations, with application to the biological sciences
July 10, 2015

Population Genetics
Population genetics: the study of genetic variation within a population
Assume that a gene has two alleles, call them A and a
The population is composed of $N$ individuals who have two copies of each gene, so the possible genotypes are AA, Aa, and aa
The population evolves over time
We are interested in the composition of the population at generation $t$
We need a model for how a generation is derived from the previous generation

Wright-Fisher Model
Assumptions:
◮ Population of $2N$ gene copies
◮ Discrete, non-overlapping generations of equal size
◮ Parents of the next generation of $2N$ genes are picked randomly with replacement from the preceding generation (genetic differences have no fitness consequences)
◮ The probability of a specific parent for a gene in the next generation is $1/(2N)$

Wright-Fisher Model
Source: Popvizard, a Python program to simulate evolution under the WF and other models, written by Peter Beerli (http://people.sc.fsu.edu/~pbeerli/popvizard.tar.gz)

The Wright-Fisher Model
View the Wright-Fisher model as a discrete-time Markov process
Let $Y_t$ = number of alleles of type A in the population at generation $t$, with $0 \le Y_t \le 2N$ for $t = 0, 1, \ldots$
Define $p_{ij} = P(Y_{t+1} = j \mid Y_t = i)$. Then
$$p_{ij} = \begin{cases} \binom{2N}{j} \left(\frac{i}{2N}\right)^{j} \left(\frac{2N - i}{2N}\right)^{2N - j}, & j = 0, 1, \ldots, 2N \\ 0, & \text{otherwise} \end{cases}$$
States 0 and $2N$ are absorbing states: once the process enters one of them, it never leaves
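
The transition probabilities above can be tabulated directly. The following is a minimal sketch, not part of the original slides; the function name `wf_transition_matrix` and the choice $2N = 10$ are illustrative assumptions.

```python
import math

import numpy as np


def wf_transition_matrix(two_n):
    """Return the (2N+1) x (2N+1) matrix with entries P(Y_{t+1} = j | Y_t = i).

    Row i is the Binomial(2N, i / 2N) probability mass function.
    (Hypothetical helper name, chosen for this illustration.)
    """
    P = np.zeros((two_n + 1, two_n + 1))
    for i in range(two_n + 1):
        p = i / two_n  # current frequency of allele A
        for j in range(two_n + 1):
            P[i, j] = math.comb(two_n, j) * p**j * (1 - p) ** (two_n - j)
    return P


# Illustrative population of 2N = 10 gene copies.
P = wf_transition_matrix(10)

# Rows 0 and 2N put all their mass on the diagonal: the absorbing states.
print(P[0, 0], P[10, 10])
```

Each row sums to one, and multiplying the matrix by the vector $(0, 1, \ldots, 2N)$ returns the same vector, which is the matrix form of the statement that the expected allele count does not change from one generation to the next.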

The Wright-Fisher Model
Note that:
◮ $E(Y_{t+1} \mid Y_t = i) = 2N \left(\frac{i}{2N}\right) = i$
◮ $\mathrm{Var}(Y_{t+1} \mid Y_t = i) = 2N \left(\frac{i}{2N}\right)\left(1 - \frac{i}{2N}\right)$
◮ So the expected number of A alleles remains the same, but the actual number may vary between 0 and $2N$
Classical approach: look at the limit as the population size $N \to \infty$ (Kingman's Coalescent Process)
◮ Widely used in population genetics and phylogenetics
◮ Difficult to extend to handle features of the evolutionary process, such as selection
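
A quick Monte Carlo check of the mean and variance above, plus one simulated trajectory. This is a sketch under assumed values ($2N = 100$, starting count $i = 30$, 200,000 draws); the helper `simulate_wf` is a hypothetical name, not code from the talk.

```python
import numpy as np


def simulate_wf(two_n, y0, generations, rng):
    """Simulate allele counts Y_0, ..., Y_generations under the neutral WF model."""
    counts = np.empty(generations + 1, dtype=int)
    counts[0] = y0
    for t in range(generations):
        # Each of the 2N gene copies picks its parent at random with replacement,
        # so the next count is Binomial(2N, current frequency).
        counts[t + 1] = rng.binomial(two_n, counts[t] / two_n)
    return counts


rng = np.random.default_rng(0)
two_n, i = 100, 30  # assumed values for illustration

# One-generation moments, estimated from many independent draws given Y_t = i.
draws = rng.binomial(two_n, i / two_n, size=200_000)
sample_mean = draws.mean()
sample_var = draws.var()
theory_var = two_n * (i / two_n) * (1 - i / two_n)

print(f"mean: {sample_mean:.3f} (theory {i})")
print(f"variance: {sample_var:.3f} (theory {theory_var:.3f})")

# A single trajectory: the count wanders until it hits 0 or 2N and then stays there.
path = simulate_wf(two_n, i, generations=2000, rng=rng)
```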

Wright-Fisher Model as a Diffusion Process
Define a diffusion process $\{X_t\}_{t \ge 0}$ as a continuous-time Markov process with approximately Gaussian increments over small time intervals and for which the following three conditions hold for small $\delta t$ and $X_t = x$:
◮ $E(X_{t+\delta t} - X_t \mid X_t = x) = \mu(t, x)\,\delta t + o(\delta t)$
◮ $E((X_{t+\delta t} - X_t)^2 \mid X_t = x) = \sigma^2(t, x)\,\delta t + o(\delta t)$
◮ $E((X_{t+\delta t} - X_t)^k \mid X_t = x) = o(\delta t)$ for $k > 2$
From Radu's slides, we had $dX_t = S(X_t)\,dt + \sigma(X_t)\,dW_t$, where $S(X_t)$ is the drift coefficient and $\sigma(X_t)$ is the diffusion coefficient.
For standard Brownian Motion, $\mu(t, x) = 0$ and $\sigma^2(t, x) = 1$.

Wright-Fisher Model as a Diffusion Process
Let $Y_t$ be the number of A alleles in the population at generation $t$
Let $X_t$ = proportion of A alleles in the population at generation $t$; $X_t = Y_t / (2N)$
Let $X_t$ represent the continuous-time process (eventually measure time in units of $2N$ generations, as before)
Define $\Delta Y_t = Y_{t+1} - Y_t$ and $\Delta X_t = X_{t+1} - X_t$
Then
$E(Y_{t+1} \mid X_t = x) = 2Nx$
$E(\Delta Y_t \mid X_t = x) = 0$
$E[(\Delta Y_t)^2 \mid X_t = x] = 2Nx(1 - x)$
$E(\Delta X_t \mid X_t = x) = 0 = \mu(t, x) = \mu(x)$
$E((\Delta X_t)^2 \mid X_t = x) = \frac{x(1 - x)}{2N} = \sigma^2(t, x) = \sigma^2(x)$
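
For readers who want the intermediate step, the second moment of $\Delta X_t$ follows from the binomial variance of $Y_{t+1}$ by rescaling. The worked lines below are a sketch that uses only the quantities already defined on this slide.

```latex
\begin{align*}
\operatorname{Var}(\Delta X_t \mid X_t = x)
  &= \operatorname{Var}\!\left(\frac{Y_{t+1}}{2N} \,\middle|\, Y_t = 2Nx\right)
   = \frac{1}{(2N)^2}\, 2N x (1 - x)
   = \frac{x(1 - x)}{2N}, \\
E\big((\Delta X_t)^2 \mid X_t = x\big)
  &= \operatorname{Var}(\Delta X_t \mid X_t = x)
     + \big(E(\Delta X_t \mid X_t = x)\big)^2
   = \frac{x(1 - x)}{2N} + 0 .
\end{align*}
```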

Wright-Fisher Model as a Diffusion Process
Now re-define $\Delta Y_t = Y_{t+\Delta t} - Y_t$ and $\Delta X_t = X_{t+\Delta t} - X_t$, where $\Delta t = \frac{1}{2N}$, and let $N \to \infty$, so that
$E((\Delta X_t)^2 \mid X_t) = X_t(1 - X_t)\,\Delta t$
The corresponding SDE is
$$dX_t = \sqrt{X_t(1 - X_t)}\,dW_t, \quad X_t \in [0, 1]$$
where $W_t$ is standard Brownian Motion (see Pardoux, 2009, for a rigorous proof)

The Wright-Fisher Model with Selection
Model for selection:
◮ Suppose that allele A is superior to allele a, so that when the current frequency of A is $x$, the probability of sampling an A allele for the next generation is $p_x = \frac{2Nx(1 + s)}{2Nx(1 + s) + (2N - 2Nx)}$
◮ As before, let $N \to \infty$ and define $s = \beta / (2N)$.
◮ $E(\Delta X_t \mid X_t) \approx \beta X_t(1 - X_t)\,\Delta t$
◮ $E((\Delta X_t)^2 \mid X_t) \approx X_t(1 - X_t)\,\Delta t$
The corresponding SDE is
$$dX_t = \beta X_t(1 - X_t)\,dt + \sqrt{X_t(1 - X_t)}\,dW_t, \quad X_t \in [0, 1]$$
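
The drift term can be recovered in a few lines. This sketch assumes, as in the neutral case, that the next generation is a binomial sample, here with success probability $p_x$, so that $E(X_{t+1} \mid X_t = x) = p_x$; it also uses $\Delta t = 1/(2N)$ from the neutral derivation.

```latex
\begin{align*}
p_x &= \frac{2Nx(1+s)}{2Nx(1+s) + (2N - 2Nx)} = \frac{x(1+s)}{1 + sx}, \\
E(\Delta X_t \mid X_t = x) &= p_x - x
   = \frac{x(1+s) - x(1+sx)}{1+sx}
   = \frac{s\,x(1-x)}{1+sx}, \\
\text{with } s = \frac{\beta}{2N} = \beta\,\Delta t: \quad
E(\Delta X_t \mid X_t = x)
  &= \frac{\beta\,x(1-x)\,\Delta t}{1 + \beta x\,\Delta t}
   \approx \beta\,x(1-x)\,\Delta t .
\end{align*}
```

The approximation in the last line drops the $\beta x\,\Delta t$ term in the denominator, which vanishes as $N \to \infty$.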

The Wright-Fisher Diffusion with Selection: Intuition
Use the Euler Method (see Radu's lectures) to simulate from the WF Diffusion model:
$$X(t_{i+1}) = X(t_i) + \beta X(t_i)(1 - X(t_i))(t_{i+1} - t_i) + \sqrt{t_{i+1} - t_i}\,\sqrt{X(t_i)(1 - X(t_i))}\,Z$$
where $Z \sim N(0, 1)$
Python code to simulate this:
◮ $T = 0.05$
◮ Define $0 = t_0 < t_1 < \cdots < t_{N-1} < t_N = T$, equally spaced
◮ Vary $\beta$
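
The original Python code is not included in the slides, so the following is only a sketch of what the Euler update above might look like. The starting frequency `x0 = 0.5`, the number of grid steps, the function name, and the clipping of each step to $[0, 1]$ are all assumptions made for this illustration.

```python
import numpy as np


def simulate_wf_diffusion(x0, beta, T=0.05, n_steps=1000, rng=None):
    """Euler scheme for dX = beta X (1 - X) dt + sqrt(X (1 - X)) dW on [0, T].

    Returns the path on an equally spaced grid of n_steps + 1 time points.
    """
    rng = np.random.default_rng() if rng is None else rng
    dt = T / n_steps
    x = np.empty(n_steps + 1)
    x[0] = x0
    for i in range(n_steps):
        z = rng.standard_normal()
        drift = beta * x[i] * (1 - x[i]) * dt
        noise = np.sqrt(dt) * np.sqrt(x[i] * (1 - x[i])) * z
        # Assumption: clip to [0, 1] so the discretized path stays a valid
        # frequency and the square root in the next step stays real.
        x[i + 1] = min(max(x[i] + drift + noise, 0.0), 1.0)
    return x


# Vary beta, starting every path from the assumed frequency 0.5.
rng = np.random.default_rng(42)
paths = {
    beta: simulate_wf_diffusion(x0=0.5, beta=beta, rng=rng)
    for beta in (0.0, 10.0, 50.0)
}
for beta, path in paths.items():
    print(f"beta = {beta:5.1f}: final frequency {path[-1]:.3f}")
```

Clipping is the simplest way to keep the Euler path inside the state space; it is a convenience of this sketch, and other boundary treatments are possible.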

The Wright-Fisher Diffusion with Selection: Intuition
[Figure: $\beta = 0$, varying $N$]

The Wright-Fisher Diffusion with Selection: Intuition
[Figure: $N = 1000$, varying $\beta$]

Application: Inferring Selection From Genome-scale Data
Diffusion models are becoming more widely used in analyzing genome-scale data.
Example: Williamson, S. H. et al. 2005. Simultaneous inference of selection and population growth from patterns of variation in the human genome. PNAS 102(22): 7882-7887.
Data: NIEHS Environmental Genome Project web site (http://egp.gs.washington.edu)
◮ Sequenced 301 genes associated with variation in response to environmental exposure
◮ 90 individuals: 24 African Americans, 24 Asian Americans, 24 European Americans, 12 Mexican Americans, and 6 Native Americans
Goal: Detect selection in different types of mutations; distinguish selection from demographic factors, such as population size change

Application: Inferring Selection From Genome-scale Data
Data are recorded as SNPs: bases in the DNA sequence at which there is variation across individuals
Example data:
    Taxon          Sequence
    (A) Human      GCCGATGCCGATGCCGAA
    (B) Chimp      GCCGTTGCCGTTGCCGTT
    (C) Gorilla    GCGGAAGCGGAAGCGGAA
Recorded as SNPs, this would be
    Taxon          Sequence
    (A) Human      CATCATCAA
    (B) Chimp      CTTCTTCTT
    (C) Gorilla    GAAGAAGAA
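
The reduction from full sequences to SNPs amounts to keeping only the variable alignment columns. A minimal sketch using the three example sequences (the function name `extract_snps` is hypothetical):

```python
# Example alignment from the slide, one sequence per taxon.
alignment = {
    "Human": "GCCGATGCCGATGCCGAA",
    "Chimp": "GCCGTTGCCGTTGCCGTT",
    "Gorilla": "GCGGAAGCGGAAGCGGAA",
}


def extract_snps(alignment):
    """Keep only the alignment columns at which the taxa do not all agree."""
    taxa = list(alignment)
    columns = zip(*(alignment[t] for t in taxa))
    variable = [col for col in columns if len(set(col)) > 1]
    return {t: "".join(col[k] for col in variable) for k, t in enumerate(taxa)}


snps = extract_snps(alignment)
for taxon, seq in snps.items():
    print(f"{taxon:8s} {seq}")
```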

Application: Inferring Selection From Genome-scale Data
Example SNP data:
    Taxon          Sequence
    (A) Human      CATCATCAA
    (B) Chimp      CTTCTTCTT
    (C) Gorilla    GAAGAAGAA
Record this as the site frequency spectrum (SFS), denoted by the vector $u$, where entry $u_i$ = number of SNP sites with $i$ copies of the derived allele
For the example (assuming that the ancestral state is that found in Gorilla), we have $u = (4, 5)$
If we let Human be ancestral, we'd have $u = (9, 0)$
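
The two spectra quoted above can be reproduced with a short function. This is a sketch that treats each taxon as one sampled gene copy and counts any base differing from the ancestral one as derived; both are simplifying assumptions of the illustration, and the function name is hypothetical.

```python
# SNP data from the slide, one sequence per taxon.
snps = {
    "Human": "CATCATCAA",
    "Chimp": "CTTCTTCTT",
    "Gorilla": "GAAGAAGAA",
}


def site_frequency_spectrum(snps, ancestral_taxon):
    """Return u, where u[i - 1] is the number of sites with i derived copies."""
    taxa = list(snps)
    n = len(taxa)
    anc_index = taxa.index(ancestral_taxon)
    u = [0] * (n - 1)  # i ranges over 1, ..., n - 1
    for site in zip(*(snps[t] for t in taxa)):
        ancestral = site[anc_index]
        derived = sum(base != ancestral for base in site)
        if derived >= 1:  # skip sites with no derived copies
            u[derived - 1] += 1
    return tuple(u)


u_gorilla = site_frequency_spectrum(snps, "Gorilla")
u_human = site_frequency_spectrum(snps, "Human")
print(u_gorilla, u_human)
```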

Application: Inferring Selection From Genome-scale Data
Idea of analysis:
◮ Write the likelihood function and obtain MLEs of the parameters of interest
◮ Likelihood function for a sample of $K$ SNPs:
$$L(\beta) = \prod_{k=1}^{K} \Pr(i_k, n_k \mid \beta)$$
where $\Pr(i_k, n_k)$ is the probability that SNP $k$ is at frequency $i_k / n_k$
$\Pr(i_k, n_k)$ comes from the diffusion model. How?
◮ Williamson et al. (2005): use numerical methods to approximate the diffusion
◮ Today: use a naive sampling method based on the Euler approximation
◮ Ongoing work (with Radu Herbei and Jeff Gory): use exact sampling from the WF diffusion to implement a Bayesian version of the model

Application: Inferring Selection From Genome-scale Data
Naive method:
1. Use the Euler method to simulate a path from the WF diffusion with selection parameter $\beta$, and record the final allele frequency, $q$.
2. For the $q$ from step 1, simulate the data for a SNP by drawing $Y \sim \mathrm{Bin}(2n, q)$, where $n$ is the number of "people" in the sample.
3. Repeat steps 1-2 a large number of times, say $M$ (the larger, the better), to generate a set of observed $Y$ values, $Y_1, Y_2, \ldots, Y_M$.
4. Form the estimates $\hat{P}_i(\beta) = \frac{1}{M} \sum_{m=1}^{M} I(Y_m = i)$
The approximate likelihood is then
$$\hat{L}(\beta) = \prod_{k=1}^{K} \hat{P}_{i_k}(\beta)$$
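
The four steps can be sketched in Python as follows. This is an illustration under stated assumptions rather than the code behind the talk: the starting frequency 0.5, the 100 Euler steps, the clipping to $[0, 1]$, the floor on the estimated probabilities (to avoid taking the log of zero), and all function names are choices made here. The sample size of 15 people and 100 SNPs follows the simulation settings quoted in the talk.

```python
import numpy as np


def final_frequencies(beta, x0, T, n_steps, M, rng):
    """Step 1 for M paths at once: Euler scheme, returning the final frequencies q."""
    dt = T / n_steps
    x = np.full(M, x0, dtype=float)
    for _ in range(n_steps):
        z = rng.standard_normal(M)
        x = x + beta * x * (1 - x) * dt + np.sqrt(dt * x * (1 - x)) * z
        x = np.clip(x, 0.0, 1.0)  # assumption: keep frequencies in [0, 1]
    return x


def estimate_probs(beta, n, M, rng, x0=0.5, T=0.05, n_steps=100):
    """Steps 1-4: Monte Carlo estimate of P_i(beta) for i = 0, ..., 2n."""
    q = final_frequencies(beta, x0, T, n_steps, M, rng)  # step 1, repeated M times
    y = rng.binomial(2 * n, q)                           # step 2
    return np.bincount(y, minlength=2 * n + 1) / M       # step 4


def log_likelihood(beta, counts, n, M, rng):
    """Log of the approximate likelihood: sum over SNPs of log P_hat_{i_k}(beta)."""
    p_hat = estimate_probs(beta, n, M, rng)
    # Assumption: floor the estimates so a count never simulated does not give log(0).
    return np.sum(np.log(np.maximum(p_hat[counts], 0.5 / M)))


rng = np.random.default_rng(2015)
n, K, M = 15, 100, 20_000  # 15 people, 100 SNPs; M is an illustrative choice

# Simulated data set with an assumed true beta of 2.0.
data = rng.binomial(2 * n, final_frequencies(2.0, 0.5, 0.05, 100, K, rng))

betas = [0.0, 1.0, 2.0, 4.0, 8.0]
log_lik = [log_likelihood(b, data, n, M, rng) for b in betas]
for b, ll in zip(betas, log_lik):
    print(f"beta = {b:4.1f}: approximate log-likelihood {ll:.2f}")
```

Working on the log scale avoids underflow in the product over $K$ SNPs. Because each $\hat{P}_i(\beta)$ is a Monte Carlo estimate, the resulting likelihood surface is noisy, and the noise shrinks as $M$ grows.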

Application: Inferring Selection From Genome-scale Data
Does it work? Simulate data for 15 people and 100 SNPs with various values of $\beta$ and $M$
[Figure: $\beta = 0.2$]

Application: Inferring Selection From Genome-scale Data
Does it work? Simulate data for 15 people and 100 SNPs with various values of $\beta$ and $M$
[Figure: $\beta = 2.0$]
