2017-07-29 codon substitution models and the analysis of natural - PDF document

codon substitution models and the analysis of natural selection pressure
Joseph P. Bielawski
Department of Biology
Department of Mathematics & Statistics
Dalhousie University

The goals and the plan
part 1: introduction
part 2: mechanistic process
part 3: data analysis
part 4: phenomenological load

topics:
• neutral theory
• dN/dS
• mechanistic process
• phenomenological outcomes
• MutSel framework
• freq dependent selection
• episodic selection
• shifting balance
• types of models
• 3 analysis tasks
• analysis of deviance
• biological inferences

part 1: introduction

population time-scale vs. macroevolutionary time-scale

evolutionary rate depends on intensity of selection
• selectively constrained = slower than neutral (drift alone)
• adaptive divergence = faster than neutral (drift alone)

GTG CTG TCT CCT GCC GAC AAG ACC AAC GTC AAG GCC GCC TGG GGC AAG GTT GGC GCG CAC
... ... ... G.C ... ... ... T.. ..T ... ... ... ... ... ... ... ... ... .GC A..
... ... ... ..C ..T ... ... ... ... A.. ... A.T ... ... .AA ... A.C ... AGC ...
... ..C ... G.A .AT ... ..A ... ... A.. ... AA. TG. ... ..G ... A.. ..T .GC ..T
... ..C ..G GA. ..T ... ... ..T C.. ..G ..A ... AT. ... ..T ... ..G ..A .GC ...

• conserved sites: slower than neutral?
• fast sites: neutral? or faster than neutral?

What is the neutral expectation?

neutral theory of molecular evolution (Kimura 1968)
• the number of new mutations arising in a diploid population: $2N\mu$
• the fixation probability of a new mutant by drift: $\frac{1}{2N}$
• the substitution (fixation) rate, $k$:

$$k = 2N\mu \times \frac{1}{2N} = \mu$$

the elegant simplicity of neutral theory: $k = \mu$, independent of population size

genetic code determines impact of a mutation (Kimura 1968)
• $d_S$: number of synonymous substitutions per synonymous site ($K_S$)
• $d_N$: number of nonsynonymous substitutions per nonsynonymous site ($K_A$)
• $\omega$: the ratio $d_N / d_S$; it measures selection at the protein level

[figure: the genetic code and the encoded polypeptide; source: http://www.langara.bc.ca/biology/mario/Assets/Geneticode.jpg]

The genetic code determines how random changes to the gene brought about by the process of mutation will impact the function of the encoded protein.
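The cancellation of population size in the neutral rate can be checked numerically. The following is a minimal sketch; the function name and the example mutation rate are illustrative assumptions, not values from the lecture.

```python
def neutral_substitution_rate(pop_size: int, mu: float) -> float:
    """k = (new mutations per generation) x (fixation probability by drift)."""
    new_mutations = 2 * pop_size * mu    # 2N gene copies in a diploid population
    fixation_prob = 1 / (2 * pop_size)   # a new neutral mutant starts at frequency 1/(2N)
    return new_mutations * fixation_prob


# Hypothetical mutation rate; k should equal mu whatever the population size.
for n in (100, 10_000, 1_000_000):
    print(n, neutral_substitution_rate(n, mu=1e-8))
```

Large populations produce more mutations, but each one is proportionally less likely to fix, so the two effects cancel.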

an index of selection pressure

rate ratio    mode                                example
dN/dS < 1     purifying (negative) selection      histones
dN/dS = 1     neutral evolution                   pseudogenes
dN/dS > 1     diversifying (positive) selection   MHC, Lysin

an index of selection pressure

Why use $d_N$ and $d_S$? (Why not use raw counts?)

example of counts: 300 codon gene from a pair of species
• 5 synonymous differences
• 5 nonsynonymous differences
• 5/5 = 1

why don't we conclude that rates are equal (i.e., neutral evolution)?

the genetic code & mutational opportunities

Relative proportion of different types of mutations in hypothetical protein coding sequence. Expected number of changes (proportion, %):

Type              All 3 positions   1st positions   2nd positions   3rd positions
Total mutations   549 (100)         183 (100)       183 (100)       183 (100)
Synonymous        134 (25)          8 (4)           0 (0)           126 (69)
Nonsynonymous     392 (71)          166 (91)        176 (96)        50 (27)
Nonsense          23 (4)            9 (5)           7 (4)           7 (4)

Modified from Li and Graur (1991). Note that we assume a hypothetical model where all codons are used equally and that all types of point mutations are equally likely.

Why do we use $d_N$ and $d_S$?

same example, but using $d_N$ and $d_S$ (site proportions exclude nonsense mutations: 134 / (134 + 392) = 25.5% synonymous):
• Synonymous sites = 25.5%, so S = 300 × 3 × 25.5% = 229.5
• Nonsynonymous sites = 74.5%, so N = 300 × 3 × 74.5% = 670.5
• So, $d_S$ = 5/229.5 = 0.0218
• $d_N$ = 5/670.5 = 0.0075
• $d_N / d_S$ ($\omega$) = 0.34, purifying selection!
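The arithmetic of this example can be reproduced in a few lines. This is a sketch of the simple counting approach only: the function name is an assumption, and no correction for multiple substitutions at the same site is applied.

```python
def dn_ds(syn_diffs, nonsyn_diffs, n_codons, syn_fraction):
    """Return (dN, dS, omega) from raw difference counts.

    syn_fraction is the proportion of sites counted as synonymous
    (0.255 in the lecture's equal-codon-usage example).
    """
    total_sites = 3 * n_codons
    syn_sites = total_sites * syn_fraction            # S
    nonsyn_sites = total_sites * (1 - syn_fraction)   # N
    d_s = syn_diffs / syn_sites
    d_n = nonsyn_diffs / nonsyn_sites
    return d_n, d_s, d_n / d_s


d_n, d_s, omega = dn_ds(syn_diffs=5, nonsyn_diffs=5, n_codons=300, syn_fraction=0.255)
print(f"dS = {d_s:.4f}, dN = {d_n:.4f}, omega = {omega:.2f}")
```

Equal raw counts (5 vs. 5) give very different per-site rates because there are roughly three times as many nonsynonymous sites as synonymous ones.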

an index of selection pressure acting on the protein

GTG CTG TCT CCT GCC GAC AAG ACC AAC GTC AAG GCC GCC TGG GGC AAG GTT GGC GCG CAC
... ... ... G.C ... ... ... T.. ..T ... ... ... ... ... ... ... ... ... .GC A..
... ... ... ..C ..T ... ... ... ... A.. ... A.T ... ... .AA ... A.C ... AGC ...
... ..C ... G.A .AT ... ..A ... ... A.. ... AA. TG. ... ..G ... A.. ..T .GC ..T
... ..C ..G GA. ..T ... ... ..T C.. ..G ..A ... AT. ... ..T ... ..G ..A .GC ...

• conserved sites: dN/dS < 1
• fast sites: dN/dS > 1

conclusion: dN differs from dS due to the effect of selection on the protein.

mutational opportunity vs. physical site

Relative proportion of different types of mutations in hypothetical protein coding sequence. Expected number of changes (proportion, %):

Type              All 3 positions   1st positions   2nd positions   3rd positions
Total mutations   549 (100)         183 (100)       183 (100)       183 (100)
Synonymous        134 (25)          8 (4)           0 (0)           126 (69)
Nonsynonymous     392 (71)          166 (91)        176 (96)        50 (27)
Nonsense          23 (4)            9 (5)           7 (4)           7 (4)

Note that by framing the counting of sites in this way we are using a "mutational opportunity" definition of the sites. Thus, a synonymous or nonsynonymous site is not considered a physical entity!

Note that we assume a hypothetical model where all codons are used equally and that all types of point mutations are equally likely.
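The "mutational opportunity" counts can be checked by brute force: take each of the 61 sense codons of the standard genetic code, apply all nine single-nucleotide changes, and classify each outcome. The sketch below makes the same assumptions as the table (equal codon usage, all point mutations equally likely); the names `CODE` and `count_mutation_types` are illustrative.

```python
from collections import Counter

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... ("*" = stop).
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {a + b + c: AMINO[16 * i + 4 * j + k]
        for i, a in enumerate(BASES)
        for j, b in enumerate(BASES)
        for k, c in enumerate(BASES)}


def count_mutation_types():
    """Count single-nucleotide mutations of sense codons by type and codon position."""
    counts = Counter()
    for codon, aa in CODE.items():
        if aa == "*":
            continue  # only the 61 sense codons are mutated
        for pos in range(3):
            for base in BASES:
                if base == codon[pos]:
                    continue
                mutant = codon[:pos] + base + codon[pos + 1:]
                new_aa = CODE[mutant]
                if new_aa == "*":
                    kind = "nonsense"
                elif new_aa == aa:
                    kind = "synonymous"
                else:
                    kind = "nonsynonymous"
                counts[(kind, pos + 1)] += 1
    return counts


counts = count_mutation_types()
for kind in ("synonymous", "nonsynonymous", "nonsense"):
    per_position = [counts[(kind, p)] for p in (1, 2, 3)]
    print(kind, sum(per_position), per_position)
```

The totals recover the 549 mutations of the table (134 synonymous, 392 nonsynonymous, 23 nonsense), which illustrates that a "site" here is a share of mutational opportunities rather than a physical nucleotide.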

real data have biases (Drosophila GstD1 gene)

transitions (A ↔ G, C ↔ T) vs. transversions: ts/tv = 2.71

preferred vs. un-preferred codons: partial codon usage table for the GstD gene of Drosophila

------------------------------------------------------------------------------
Phe F TTT  0 | Ser S TCT  0 | Tyr Y TAT  1 | Cys C TGT  0
      TTC 27 |       TCC 15 |       TAC 22 |       TGC  6
Leu L TTA  0 |       TCA  0 | *** * TAA  0 | *** * TGA  0
      TTG  1 |       TCG  1 |       TAG  0 | Trp W TGG  8
------------------------------------------------------------------------------
Leu L CTT  2 | Pro P CCT  1 | His H CAT  0 | Arg R CGT  1
      CTC  2 |       CCC 15 |       CAC  4 |       CGC  7
      CTA  0 |       CCA  3 | Gln Q CAA  0 |       CGA  0
      CTG 29 |       CCG  1 |       CAG 14 |       CGG  0
------------------------------------------------------------------------------

an index of selection pressure acting on the protein

$$\omega = \frac{d_N}{d_S}$$

Don't worry: we will improve upon the counting method later in this lecture via likelihood!

correcting dS and dN for underlying mutational process of the DNA makes them sensitive to assumptions about the process of evolution!

reconciling evolutionary time scales

population time-scale:
• mutation: $\mu_{ij}$
• drift: $N$
• selection: $s_{ij}$

macroevolutionary time-scale:
• $\langle d_S \rangle$
• $\langle d_N \rangle$

mechanistic models vs. phenomenological models

[figure: mechanistic models link $\mu$ at the population time-scale to $k$ at the macroevolutionary time-scale]

mechanistic models: "MutSel models"
• Wright-Fisher population
• drift: $N$
• mutation: $\mu_{ij}$
• selection: $s_{ij}$
• $s_{ij}$ vary among sites AND amino acids
• expected $d_N^h / d_S^h$

$$\Pr = \begin{cases} \mu_{ij} N \times \dfrac{1}{N} = \mu_{ij} & \text{if neutral} \\[2ex] \mu_{ij} N \times \dfrac{2 s_{ij}}{1 - e^{-2 N s_{ij}}} & \text{if selected} \end{cases}$$

where $s_{ij} = \Delta f_{ij}$.

Halpern and Bruno (1998)

fixation probability with selection

population genetics at a single codon site ($h$)
• fitness coefficients: $f^h = (f_1, \dots, f_{61})$
• selection coefficients: $s_{ij}^h = f_j^h - f_i^h$
• fixation probability (Kimura, 1962):

$$\Pr(s_{ij}^h) = \frac{2 s_{ij}^h}{1 - e^{-2 N s_{ij}^h}}$$

fixation probability with selection

MutSel: selection favours amino acids with higher fitness (if $N$ is large enough)
1. ATA (Ile) → TTA (Leu): $\Delta f^h_{Ile \to Leu}$ (conservative)
2. ATA (Ile) → AAA (Lys): $\Delta f^h_{Ile \to Lys}$ (radical)

realism: fitness expected to differ among sites and amino acids according to protein function

the cost of realism: too complex to fit such a model to real data (but simplified versions will allow new ways of data analysis)
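The fixation probability and the resulting MutSel substitution rate can be sketched as follows. The fitness values, population size, and mutation rate are hypothetical numbers chosen only to contrast a conservative change with a radical one; the function names are assumptions as well.

```python
import math


def fixation_probability(s: float, pop_size: float) -> float:
    """2s / (1 - exp(-2Ns)), with the neutral limit 1/N as s -> 0."""
    x = 2.0 * pop_size * s
    if abs(x) < 1e-9:
        return 1.0 / pop_size      # neutral case: drift alone
    if x < -700.0:
        return 0.0                 # strongly deleterious: avoid float overflow
    return 2.0 * s / -math.expm1(-x)   # -expm1(-x) == 1 - exp(-x)


def mutsel_rate(mu_ij: float, f_i: float, f_j: float, pop_size: float) -> float:
    """Substitution rate i -> j: mu_ij * N * Pr(fixation), with s_ij = f_j - f_i."""
    s_ij = f_j - f_i
    return mu_ij * pop_size * fixation_probability(s_ij, pop_size)


# Hypothetical site-specific fitnesses (not from the lecture).
N, mu = 1000, 1e-8
fitness = {"Ile": 0.0, "Leu": -0.0005, "Lys": -0.01}
for target in ("Leu", "Lys"):
    rate = mutsel_rate(mu, fitness["Ile"], fitness[target], N)
    print(f"Ile -> {target}: rate relative to neutral = {rate / mu:.3g}")
```

With these made-up fitnesses the conservative change fixes at a moderately reduced rate, while the radical change is almost never fixed.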

phenomenological models

[figure: phenomenological models describe outcomes at the macroevolutionary time-scale rather than the population time-scale]

phenomenological models: "omega models"
• phenomenological parameters
• ts/tv ratio: $\kappa$
• codon frequencies: $\pi_j$
• $\omega = d_N / d_S$
• parameter estimation via ML
• stationary process

$$q_{ij} = \begin{cases} 0 & \text{if } i \text{ and } j \text{ differ by} > 1 \\ \pi_j & \text{for synonymous tv.} \\ \kappa \pi_j & \text{for synonymous ts.} \\ \omega \pi_j & \text{for non-synonymous tv.} \\ \omega \kappa \pi_j & \text{for non-synonymous ts.} \end{cases}$$

Goldman and Yang (1994)
Muse and Gaut (1994)
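The case structure of $q_{ij}$ maps directly onto a small function. This is a minimal sketch, assuming the standard genetic code, a state space of the 61 sense codons (changes to or from stop codons get rate 0), and equal codon frequencies; the parameter values are placeholders, and the diagonal elements and rate scaling of a full matrix are not handled.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... ("*" = stop).
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {a + b + c: AMINO[16 * i + 4 * j + k]
        for i, a in enumerate(BASES)
        for j, b in enumerate(BASES)
        for k, c in enumerate(BASES)}
TRANSITIONS = {("A", "G"), ("G", "A"), ("C", "T"), ("T", "C")}


def codon_rate(codon_i, codon_j, kappa, omega, pi):
    """Off-diagonal rate q_ij between two codons under the omega model."""
    if CODE[codon_i] == "*" or CODE[codon_j] == "*":
        return 0.0                       # assumption: stop codons are not states
    diffs = [(a, b) for a, b in zip(codon_i, codon_j) if a != b]
    if len(diffs) != 1:
        return 0.0                       # identical codons or more than one change
    q = pi[codon_j]
    if diffs[0] in TRANSITIONS:
        q *= kappa                       # transition
    if CODE[codon_i] != CODE[codon_j]:
        q *= omega                       # non-synonymous
    return q


# Placeholder parameter values and equal frequencies over the 61 sense codons.
sense = [c for c, aa in CODE.items() if aa != "*"]
pi = {c: 1.0 / len(sense) for c in sense}
kappa, omega = 2.0, 0.34
for i, j in [("CTT", "CTA"), ("TTT", "TTC"), ("TTT", "TTA"), ("TTT", "CTT"), ("TTT", "CCT")]:
    print(i, "->", j, codon_rate(i, j, kappa, omega, pi))
```

The five example pairs cover, in order, a synonymous transversion, a synonymous transition, a non-synonymous transversion, a non-synonymous transition, and a double change.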
