Phylogenetics: Likelihood COMP 571 Luay Nakhleh, Rice University - PowerPoint PPT Presentation

1 Phylogenetics: Likelihood. COMP 571, Luay Nakhleh, Rice University.

2 The Problem. Input: a multiple alignment of a set S of sequences. Output: a tree T leaf-labeled with S.

3 Assumptions. Characters are mutually independent. Following a speciation event, characters continue to evolve independently.

Phylogenetics-Likelihood - March 30, 2017

4 The likelihood of model M given data D, denoted by L(M|D), is p(D|M). For example, consider the following data D that result from tossing a coin 10 times: HTTTTHTTTT

5 Model M1: a fair coin (p(H) = p(T) = 0.5). $L(M_1 \mid D) = p(D \mid M_1) = 0.5^{10}$

6 Model M2: a biased coin (p(H) = 0.8, p(T) = 0.2). $L(M_2 \mid D) = p(D \mid M_2) = 0.8^{2} \cdot 0.2^{8}$

7 Model M3: a biased coin (p(H) = 0.1, p(T) = 0.9). $L(M_3 \mid D) = p(D \mid M_3) = 0.1^{2} \cdot 0.9^{8}$

8 The problem of interest is to infer the model M from the (observed) data D.

9 The maximum likelihood estimate, or MLE, is: $\hat{M} \leftarrow \arg\max_{M} \, p(D \mid M)$
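The comparison of the three coin models can be checked numerically. The following is a minimal sketch, assuming independent tosses; the function name `likelihood` and the dictionary `models` are illustrative choices, not part of the slides.

```python
from math import prod

def likelihood(data: str, p_heads: float) -> float:
    """Return p(D | M) for independent tosses with P(H) = p_heads."""
    return prod(p_heads if toss == "H" else 1.0 - p_heads for toss in data)

D = "HTTTTHTTTT"
# Each candidate model is fully described by its probability of heads.
models = {"M1": 0.5, "M2": 0.8, "M3": 0.1}

likelihoods = {name: likelihood(D, p) for name, p in models.items()}
# The MLE among the candidates is the model with the largest p(D | M).
best = max(likelihoods, key=likelihoods.get)

for name, value in likelihoods.items():
    print(f"L({name}|D) = {value:.3e}")
print("MLE among the three models:", best)
```

Restricting the search to a finite set of candidate models turns the argmax into a simple maximum over a dictionary.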

10 D = HTTTTHTTTT. M1: p(H) = p(T) = 0.5. M2: p(H) = 0.8, p(T) = 0.2. M3: p(H) = 0.1, p(T) = 0.9. The MLE (among the three models) is M3, since $0.1^{2} \cdot 0.9^{8} \approx 4.3 \times 10^{-3}$ exceeds both $0.5^{10} \approx 9.8 \times 10^{-4}$ and $0.8^{2} \cdot 0.2^{8} \approx 1.6 \times 10^{-6}$.

11 A more complex example: the model M is an HMM, and the data D is a sequence of observations. Baum-Welch is an algorithm for obtaining the MLE M from the data D.

12 The model parameters that we seek to learn can vary for the same data and model. For example, in the case of HMMs, there are three possibilities:
(1) The parameters are the states and the transition and emission probabilities (no parameter values in the model are known).
(2) The parameters are the transition and emission probabilities (the states are known).
(3) The parameters are the transition probabilities (the states and emission probabilities are known).

13 Back to Phylogenetic Trees. What are the data D? A multiple sequence alignment (or, a matrix of taxa/characters).

14 Back to Phylogenetic Trees. What is the (generative) model M? The tree topology, the branch lengths, and the model of evolution (JC, ...).

15 Back to Phylogenetic Trees. What is the (generative) model M? The tree topology, T; the branch lengths, λ; and the model of evolution (JC, ...), E.

16 Back to Phylogenetic Trees. The likelihood is $p(D \mid T, \lambda, E)$. The MLE is $(\hat{T}, \hat{\lambda}, \hat{E}) \leftarrow \arg\max_{(T, \lambda, E)} \, p(D \mid T, \lambda, E)$

17 Back to Phylogenetic Trees. In practice, the model of evolution is estimated from the data first, and in the phylogenetic inference it is assumed to be known. In this case, given D and E, the MLE is $(\hat{T}, \hat{\lambda}) \leftarrow \arg\max_{(T, \lambda)} \, p(D \mid T, \lambda)$

18 Assumptions. Characters are independent. Markov process: the probability of a node having a given label depends only on the label of the parent node and the branch length t between them.

19 Maximum Likelihood. Input: a matrix D of taxa-characters. Output: a tree T leaf-labeled by the set of taxa, with branch lengths λ, so as to maximize the likelihood $P(D \mid T, \lambda)$.

20 $P(D \mid T, \lambda)$

$$P(D \mid T, \lambda) = \prod_{\text{site } j} p(D_j \mid T, \lambda) = \prod_{\text{site } j} \Big( \sum_{R} p(D_j, R \mid T, \lambda) \Big) = \prod_{\text{site } j} \Big( \sum_{R} \Big[ p(\text{root}) \cdot \prod_{\text{edge } u \to v} p_{u \to v}(t_{uv}) \Big] \Big)$$

Here R ranges over the assignments of states to the internal nodes of T at site j.

21 What is $p_{i \to j}(t_{uv})$ for a branch u → v in the tree, where i and j are the states of the site at nodes u and v, respectively?
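The product-of-sums on slide 20 can be evaluated directly on a tiny tree by enumerating every labeling R of the internal nodes. The sketch below is illustrative only: it assumes the Jukes-Cantor transition probabilities given on the following slides, a uniform root distribution p(root) = 1/4, and a hypothetical edge-list representation of the tree (the names `site_likelihood`, `edges`, and `internal_nodes` are not from the slides).

```python
from itertools import product
from math import exp, prod

STATES = "ACGT"

def jc_prob(i, j, nu):
    """Jukes-Cantor p_{i->j}(nu), with nu in expected substitutions per site."""
    decay = exp(-4.0 * nu / 3.0)
    return 0.25 * (1.0 + 3.0 * decay) if i == j else 0.25 * (1.0 - decay)

def site_likelihood(edges, internal_nodes, leaf_states, root_prob=0.25):
    """Sum over all labelings R of the internal nodes for a single site."""
    total = 0.0
    for labels in product(STATES, repeat=len(internal_nodes)):
        state = {**leaf_states, **dict(zip(internal_nodes, labels))}
        # p(root) times the product of transition probabilities over all edges.
        # root_prob is the same for every root state (uniform assumption).
        total += root_prob * prod(jc_prob(state[u], state[v], t) for u, v, t in edges)
    return total

def alignment_likelihood(edges, internal_nodes, alignment):
    """Product over sites, which relies on the independence assumption."""
    n_sites = len(next(iter(alignment.values())))
    return prod(
        site_likelihood(edges, internal_nodes,
                        {taxon: seq[j] for taxon, seq in alignment.items()})
        for j in range(n_sites)
    )

# Hypothetical rooted tree ((a, b), c); each edge is (parent, child, branch length).
edges = [("r", "x", 0.05), ("x", "a", 0.1), ("x", "b", 0.2), ("r", "c", 0.3)]
internal_nodes = ["r", "x"]  # must include the root
alignment = {"a": "ACGT", "b": "ACGA", "c": "AAGA"}  # made-up toy data
print(alignment_likelihood(edges, internal_nodes, alignment))
```

The enumeration visits k^(number of internal nodes) labelings per site, so it is only practical for very small trees.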

22 For the Jukes-Cantor model with the parameter μ (the overall substitution rate), we have

$$p_{i \to j}(t) = \begin{cases} \frac{1}{4}\left(1 + 3e^{-t\mu}\right) & i = j \\ \frac{1}{4}\left(1 - e^{-t\mu}\right) & i \neq j \end{cases}$$

23 If branch lengths are measured in expected number of mutations per site, ν (for JC: ν = (μ/4 + μ/4 + μ/4)t = (3/4)μt), then

$$p_{i \to j}(\nu) = \begin{cases} \frac{1}{4}\left(1 + 3e^{-4\nu/3}\right) & i = j \\ \frac{1}{4}\left(1 - e^{-4\nu/3}\right) & i \neq j \end{cases}$$

24 The ML problem is NP-hard (that is, finding the MLE (T, λ) is very hard computationally). Heuristics involve searching the tree space while computing the likelihood of trees. Computing the likelihood of a leaf-labeled tree T with branch lengths can be done efficiently using dynamic programming.
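The two parameterizations of the Jukes-Cantor transition probabilities on slides 22 and 23 can be written as short functions. This is a minimal sketch; the function names `jc_prob_rate` and `jc_prob` are illustrative and not from the slides.

```python
from math import exp

STATES = "ACGT"

def jc_prob_rate(i, j, t, mu):
    """p_{i->j}(t) with overall substitution rate mu and time t (slide 22)."""
    decay = exp(-t * mu)
    return 0.25 * (1.0 + 3.0 * decay) if i == j else 0.25 * (1.0 - decay)

def jc_prob(i, j, nu):
    """p_{i->j}(nu), nu = expected number of mutations per site (slide 23)."""
    decay = exp(-4.0 * nu / 3.0)
    return 0.25 * (1.0 + 3.0 * decay) if i == j else 0.25 * (1.0 - decay)

# The two forms agree when nu = (3/4) * mu * t.
mu, t = 0.5, 2.0
nu = 0.75 * mu * t
print(jc_prob_rate("A", "C", t, mu), jc_prob("A", "C", nu))

# Each row of the transition matrix sums to 1.
print(sum(jc_prob("A", j, nu) for j in STATES))
```

As the branch length grows, every entry approaches 1/4, which is why long branches carry little information about the ancestral state.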

25 $P(D \mid T, \lambda)$. Let $C_j(x, v) = P(\text{subtree whose root is } v \mid v_j = x)$.

Initialization: for a leaf v and state x,

$$C_j(x, v) = \begin{cases} 1 & v_j = x \\ 0 & \text{otherwise} \end{cases}$$

Recursion: for a node v with children u and w,

$$C_j(x, v) = \Big[ \sum_{y} C_j(y, u) \cdot P_{x \to y}(t_{vu}) \Big] \cdot \Big[ \sum_{y} C_j(y, w) \cdot P_{x \to y}(t_{vw}) \Big]$$

Termination:

$$L = \prod_{j=1}^{m} \Big[ \sum_{x} C_j(x, \text{root}) \cdot P(x) \Big]$$

26 Running Time. Takes time $O(nk^{2}m)$, where n is the number of leaves in the tree, m is the number of sites, and k is the maximum number of states per site (for DNA, k = 4).

27 Unidentifiability of the Root. If the base substitution model is reversible (most of them are!), then rooting the same tree differently doesn’t change the likelihood.
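The initialization, recursion, and termination steps on slide 25 can be sketched as a short recursive function. This is a minimal illustration under stated assumptions: the Jukes-Cantor model with branch lengths in expected mutations per site, a uniform root distribution P(x) = 1/4, and a hypothetical nested-tuple tree encoding in which a leaf is a taxon name and an internal node is `(left, left_length, right, right_length)`. The alignment is made-up toy data.

```python
from math import exp, prod

STATES = "ACGT"

def jc_prob(i, j, nu):
    """Jukes-Cantor p_{i->j}(nu), nu in expected mutations per site."""
    decay = exp(-4.0 * nu / 3.0)
    return 0.25 * (1.0 + 3.0 * decay) if i == j else 0.25 * (1.0 - decay)

def conditional(node, site, alignment):
    """Return {x: C_j(x, node)} for the given site j."""
    if isinstance(node, str):
        # Initialization: a leaf has C = 1 for its observed state, else 0.
        observed = alignment[node][site]
        return {x: 1.0 if x == observed else 0.0 for x in STATES}
    u, t_vu, w, t_vw = node
    c_u = conditional(u, site, alignment)
    c_w = conditional(w, site, alignment)
    # Recursion: multiply the two per-child sums over the child state y.
    return {
        x: sum(c_u[y] * jc_prob(x, y, t_vu) for y in STATES)
           * sum(c_w[y] * jc_prob(x, y, t_vw) for y in STATES)
        for x in STATES
    }

def tree_likelihood(tree, alignment):
    """Termination: product over sites of sum_x C_j(x, root) * P(x)."""
    n_sites = len(next(iter(alignment.values())))
    root_prior = 1.0 / len(STATES)  # assumption: uniform P(x), as under JC
    return prod(
        sum(c * root_prior for c in conditional(tree, j, alignment).values())
        for j in range(n_sites)
    )

alignment = {"a": "ACGT", "b": "ACGA", "c": "AAGA"}
tree = (("a", 0.1, "b", 0.2), 0.05, "c", 0.3)
# Same unrooted tree, with the root moved along the edge leading to c.
rerooted = (("a", 0.1, "b", 0.2), 0.25, "c", 0.1)
print(tree_likelihood(tree, alignment))
print(tree_likelihood(rerooted, alignment))
```

Each internal node does O(k^2) work per site, which matches the O(nk^2 m) bound on slide 26. Because Jukes-Cantor is reversible, the two rootings should yield the same likelihood up to floating-point error, in line with slide 27.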

28 Questions?
