
2015-07-21

part II
codon substitution models and the analysis of natural selection pressure

Joseph P. Bielawski
Department of Biology
Department of Mathematics & Statistics
Dalhousie University

part II: model based inference
1. 3 inference tasks
2. example gene analysis (& experimental validation)
3. MLE instabilities
4. false biological conclusions
5. example evolutionary survey (& experimental validation)
6. closing thoughts

1. three tasks

model based inference: 3 analytical tasks
Task 1. Parameter estimation (e.g., ω)
Task 2. Hypothesis testing
Task 3. Make predictions (e.g., sites having ω > 1)

1. analytical task-1

task 1: parameter estimation
• t, κ, ω = unknown constants estimated by ML
• π's = empirical [GY: F3×4 or F61 in Lab]
• use a numerical hill-climbing algorithm to maximize the likelihood function
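As a rough illustration of maximizing a likelihood numerically, the sketch below estimates a single parameter for a toy problem. It is not the codon model from the slides: it assumes the much simpler JC69 nucleotide model (a stand-in chosen because its MLE has a closed form to check against), it uses golden-section search as the optimizer rather than whatever hill-climbing routine the lab software uses, and the site counts are hypothetical.

```python
import math

def jc69_log_likelihood(d, n_sites, n_diff):
    """Log-likelihood of distance d (substitutions per site) under JC69."""
    # Probability that the two sequences differ at a site.
    p = 0.75 * (1.0 - math.exp(-4.0 * d / 3.0))
    return n_diff * math.log(p) + (n_sites - n_diff) * math.log(1.0 - p)

def maximize(f, lo, hi, tol=1e-8):
    """Golden-section search for the maximum of a unimodal function on [lo, hi]."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    while b - a > tol:
        c = b - inv_phi * (b - a)
        d = a + inv_phi * (b - a)
        if f(c) > f(d):
            b = d  # the maximum lies in [a, d]
        else:
            a = c  # the maximum lies in [c, b]
    return (a + b) / 2.0

# Hypothetical data: 120 differences observed over 1000 aligned sites.
n_sites, n_diff = 1000, 120
d_hat = maximize(lambda d: jc69_log_likelihood(d, n_sites, n_diff), 1e-6, 5.0)

# Closed-form JC69 estimate, used only to check the numerical answer.
d_closed = -0.75 * math.log(1.0 - 4.0 * n_diff / (3.0 * n_sites))
print(d_hat, d_closed)
```

For a codon model the same idea applies, but the search runs over several parameters at once (t, κ, ω), which is why a multidimensional optimizer is needed in practice.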

1. analytical task-1

parameter estimation
Parameters: t and ω
Gene: acetylcholine α receptor
[Figure: two-sequence tree joining human and mouse at their common ancestor; lnL = -2399]

1. three tasks

How do we know that the estimate is significant?
Task 1. Parameter estimation (e.g., ω) ✔
Task 2. Hypothesis testing: LRT
Task 3. Prediction / Site identification

1. analytical task-2

LRT No. 1: Does selection pressure vary among sites?
H0: uniform selective pressure among sites (M0)
H1: variable selective pressure among sites (M3)
Compare $2\Delta\ell = 2(\ell_1 - \ell_0)$ with a $\chi^2$ distribution
[Figure: bar charts of the estimated ω distribution, vertical axis 0 to 1. Model 0: $\hat{\omega} = 0.65$. Model 3: $\hat{\omega} = 0.01$, $\hat{\omega} = 0.90$, $\hat{\omega} = 5.55$.]

1. analytical task-2

LRT No. 2: Have some sites evolved under positive selection?
H0: variable selective pressure but NO positive selection (M1)
H1: variable selective pressure with positive selection (M2)
Compare $2\Delta\ell = 2(\ell_1 - \ell_0)$ with a $\chi^2$ distribution
[Figure: bar charts of the estimated ω distribution. Model 1a (vertical axis 0 to 0.7): $\hat{\omega} = 0.5$ and ($\omega = 1$). Model 2a (vertical axis 0 to 1): $\hat{\omega} = 0.5$, ($\omega = 1$), and $\hat{\omega} = 3.25$.]
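The two tests above share the same arithmetic, which can be sketched as follows. The log-likelihood values are hypothetical, the function names are illustrative, and the degrees of freedom are assumptions taken from common practice (2 for M1a vs M2a, 4 for M0 vs M3 with three site classes). The χ² tail probability uses the closed form that holds for even degrees of freedom, so no external library is needed.

```python
import math

def chi2_sf_even_df(x, df):
    """Upper-tail probability of a chi-square variable with an even number of df."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form needs a positive even df")
    half = x / 2.0
    term, total = 1.0, 0.0
    # P(X > x) = exp(-x/2) * sum_{j=0}^{df/2 - 1} (x/2)^j / j!
    for j in range(df // 2):
        if j > 0:
            term *= half / j
        total += term
    return math.exp(-half) * total

def lrt(lnl_null, lnl_alt, df):
    """Return the LRT statistic 2*(l1 - l0) and its chi-square p-value."""
    stat = max(2.0 * (lnl_alt - lnl_null), 0.0)  # guard against tiny negative values
    return stat, chi2_sf_even_df(stat, df)

# Hypothetical log-likelihoods for a null (M1a) and an alternative (M2a) fit.
stat, p_value = lrt(lnl_null=-2410.5, lnl_alt=-2399.0, df=2)
print(stat, p_value)
```

A p-value obtained this way treats the χ² distribution as the null distribution of the statistic, which is the approximation the slides go on to examine.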

1. analytical task-2

the LRT does not follow the χ² distribution
[Figure: histogram of simulated $2\Delta\ell$ values compared with the $\chi^2_4$ distribution; horizontal axis $2\Delta\ell$ (0.5 to 9.5), vertical axis frequency (0 to 0.2).]
Data from: Anisimova, Bielawski, and Yang (2001) MBE 18: 1585-1592

1. analytical task-2

the LRT is conservative
Type I error: number of cases out of 100 for which the null hypothesis was rejected by the LRT at the α = 1% (5%) significance levels

Exp.  Simulation  LRT       Taxa  κ  ω                   t      N = 100  N = 500
A     M0          M0 & M3   6     2  0.40                0.11   0 (0)    0 (0)
                                                         1.1    0 (0)    0 (0)
                                                         11     0 (0)    0 (0)
B     M0          M0 & M3   17    2  0.40                2.11   0 (0)    0 (1)
                                                         8.44   0 (0)    0 (1)
                                                         16.88  0 (1)    0 (0)
C     M0          M0 & M3   5     5  0.25                0.91   0 (0)    0 (0)
                                                         9.1    0 (0)    0 (1)
                                                         18.2   0 (1)    2 (3)
D     M7          M7 & M8   6     2  p = 0.41, q = 1.10  0.11   N/A      0 (0)
                                                         1.1    N/A      1 (5)
                                                         11     N/A      1 (4)

NOTE: Here t denotes total tree length (sum of all branch lengths in the tree)
Data from: Anisimova, Bielawski, and Yang (2001) MBE 18: 1585-1592

1. analytical task-2

the LRT can be powerful
Power of the LRT: number of replicates out of 100 in which positive selection was indicated by parameter estimates (P+), or detected by the LRT at the 1% (P+s, 0.01) and 5% (P+s, 0.05, in parentheses) significance levels

Simulation: M3    LRT: M0 & M3    Taxa: 17    κ: 2
ω distribution: ω0 = 0.018, p0 = 0.386; ω1 = 0.304, p1 = 0.535; ω2 = 1.691, p2 = 0.079

        P+                    P+s, 0.01 (0.05)
t       LC = 100  LC = 500    LC = 100   LC = 500
0.38    61        80          10 (17)    66 (72)
2.11    93        100         91 (92)    100 (100)
8.44    99        100         99 (99)    100 (100)
16.88   99        99          99 (99)    99 (99)
105.5   31        58          31 (31)    58 (58)

NOTE: Here t denotes total tree length (sum of all branch lengths in the tree)
Data from: Anisimova, Bielawski, and Yang (2001) MBE 18: 1585-1592

1. three tasks

How do we identify the selected sites?
Task 1. Parameter estimation (e.g., ω) ✔
Task 2. Hypothesis testing ✔
Task 3. Prediction / Site identification: Bayes' rule

1. analytical task-3

Which sites have ω > 1?
[Figure: bar chart of the estimated ω distribution, vertical axis 0 to 1.]
model: 9% have ω > 1
Bayes' rule: site 4, 12 & 13
structure: sites are in contact

Alignment (a dot marks identity with the first sequence):
GTG CTG TCT CCT GCC GAC AAG ACC AAC GTC AAG GCC GCC TGG GGC AAG GTT GGC GCG CAC
... ... ... G.C ... ... ... T.. ..T ... ... ... ... ... ... ... ... ... .GC A..
... ... ... ..C ..T ... ... ... ... A.. ... A.T ... ... .AA ... A.C ... AGC ...
... ..C ... G.A .AT ... ..A ... ... A.. ... AA. TG. ... ..G ... A.. ..T .GC ..T
... ..C ..G GA. ..T ... ... ..T C.. ..G ..A ... AT. ... ..T ... ..G ..A .GC ...

1. analytical task-3

yet another (silly) example of Bayes' rule
Suppose that a population consists of 60% males and 40% females, and a disease occurs at the rate 1% in males and 0.1% in females.

Q1: What is the probability that any individual carries the disease?
A1: $P(D) = P(M)\,P(D \mid M) + P(F)\,P(D \mid F) = 0.6 \times 0.01 + 0.4 \times 0.001 = 0.0064$

See Yang and Bielawski (2000) TREE 15:496-503 for a detailed presentation of this example

1. analytical task-3

yet another (silly) example of Bayes' rule
Q2: Given that an individual carries the disease, what is the probability that it is a male?
A2: $P(M \mid D) = \dfrac{P(M)\,P(D \mid M)}{P(D)} = \dfrac{0.6 \times 0.01}{0.0064} = 0.94$

See Yang and Bielawski (2000) TREE 15:496-503 for a detailed presentation of this example

From Paul Lewis' lecture: Bayes' rule in statistics

$$\Pr(\theta \mid D) = \frac{\Pr(D \mid \theta)\,\Pr(\theta)}{\sum_{\theta} \Pr(D \mid \theta)\,\Pr(\theta)}$$

• $\Pr(\theta \mid D)$: posterior probability of hypothesis θ
• $\Pr(D \mid \theta)$: likelihood of hypothesis θ
• $\Pr(\theta)$: prior probability of hypothesis θ
• denominator: marginal probability of the data (marginalizing over hypotheses)
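The two answers in the disease example can be checked with a few lines of arithmetic. The variable names below are illustrative only; the numbers are the ones given in the example.

```python
# Priors: composition of the population.
p_male, p_female = 0.6, 0.4
# Likelihoods: disease rate within each group.
p_d_given_male, p_d_given_female = 0.01, 0.001

# Q1: total (marginal) probability of disease.
p_disease = p_male * p_d_given_male + p_female * p_d_given_female

# Q2: posterior probability of being male given disease (Bayes' rule).
p_male_given_disease = p_male * p_d_given_male / p_disease

print(round(p_disease, 4), round(p_male_given_disease, 2))  # prints: 0.0064 0.94
```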

1. analytical task-3

identifying selected sites under a codon model

$$P(x_h) = \sum_{i=0}^{K-1} p(\omega_i)\, P(x_h \mid \omega_i)$$

• $P(x_h)$: total probability
• $p(\omega_i)$: prior
• $P(x_h \mid \omega_i)$: likelihood

ω0 = 0.03, p0 = 0.85
ω1 = 0.40, p1 = 0.10
ω2 = 14.1, p2 = 0.05

1. analytical task-3

Bayes' rule for identifying selected sites
Site class 0: ω0 = .03, 85% of codon sites
Site class 1: ω1 = .40, 10% of codon sites
Site class 2: ω2 = 14, 5% of codon sites

$$P(\omega_2 \mid x_h) = \frac{P(\omega_2)\, P(x_h \mid \omega_2)}{\sum_{i=0}^{K-1} P(\omega_i)\, P(x_h \mid \omega_i)}$$

• $P(\omega_2 \mid x_h)$: posterior probability of hypothesis (ω2)
• $P(\omega_2)$: prior probability of hypothesis (ω2)
• $P(x_h \mid \omega_2)$: likelihood of hypothesis (ω2)
• denominator: marginal probability (total probability) of the data
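A minimal sketch of the posterior calculation for one site, using the class proportions from the slide as priors. The per-class site likelihoods P(x_h | ω_i) are hypothetical numbers invented for illustration (in a real analysis they come from the codon model and the tree), and the function name is not from any particular software.

```python
def site_class_posteriors(priors, site_likelihoods):
    """Posterior probability of each site class for one site, by Bayes' rule."""
    joint = [p * lik for p, lik in zip(priors, site_likelihoods)]
    total = sum(joint)  # total probability of the site data, P(x_h)
    return [j / total for j in joint]

omegas = [0.03, 0.40, 14.1]  # site-class ω values from the slide
priors = [0.85, 0.10, 0.05]  # site-class proportions from the slide

# Hypothetical likelihoods of the data at site h under each class.
site_likelihoods = [1e-9, 2e-8, 5e-7]

posteriors = site_class_posteriors(priors, site_likelihoods)
for omega, post in zip(omegas, posteriors):
    print(f"omega = {omega}: posterior = {post:.3f}")
```

With these made-up likelihoods the positive-selection class receives most of the posterior weight even though its prior is only 5%, which is the situation in which a site would be flagged.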

1. analytical task-3

Bayes' rule for identifying selected sites
[Figure: posterior probability (vertical axis, 0 to 1) of each site class along the sequence (sites 1 to 201), with a "Rapidly evolving region" and a "Conserved region" marked.]
Site class 0: ω0 = .03 (strong purifying selection)
Site class 1: ω1 = .40 (weak purifying selection)
Site class 2: ω2 = 14 (positive selection)

NOTE: The posterior probability should NOT be interpreted as a “P-value”; it can be interpreted as a measure of relative support, although there is rarely any attempt at “calibration”

1. analytical task-3

Bayes' rule for identifying selected sites: Empirical Bayes

Naive Empirical Bayes (NEB)
• Nielsen and Yang, 1998
• assumes no MLE errors

Bayes Empirical Bayes (BEB)
• Yang et al., 2005
• accommodates MLE errors

1. analytical task-3

Bayes Empirical Bayes (BEB)
1. assign a prior to ω distribution parameters
2. fix branch lengths to MLEs
3. integrate over uncertainty
4. BEB is faster than “Full Bayes” (FB)

False classification rates
Small datasets: FB/BEB < NEB
Large datasets: FB/BEB ≈ NEB*
* exception: extreme parameter estimates

See: Yang Z, Wong WS, Nielsen R. 2005. Bayes empirical Bayes inference of amino acid sites under positive selection. Mol Biol Evol. 22(4):1107-1118.

model based inference progress …
Task 1. Parameter estimation (e.g., ω)
Task 2. Hypothesis testing
Task 3. Prediction / Site identification
let's put this into practice …

model based inference progress
1. 3 inference tasks ✔
2. example gene analysis (& experimental validation)
3. MLE instabilities
4. false biological conclusions
5. example evolutionary survey (& experimental validation)
6. closing thoughts

2. example analysis

colour diversity of coral pigments
Red/blue colour morphs of the great star coral Montastraea cavernosa
o Is colour diversity tuned by natural selection?
o Is there a relationship between colour and endosymbiotic algae?

See Field et al. 2006 J. Mol. Evol. 62(3):332-9 for details.
