
CSCE 471/871 Lecture 5: Building Phylogenetic Trees



  1. CSCE 471/871 Lecture 5: Building Phylogenetic Trees
     - Stephen Scott, sscott@cse.unl.edu

  2. Outline
     - Phylogenetic trees
     - Building trees from pairwise distances
     - Parsimony
     - Simultaneous sequence alignment and phylogeny

  3. Phylogenetic Trees
     - Assumption: all organisms on Earth have a common ancestor
       ⇒ all species are related in some way
     - Relationships are represented by phylogenetic trees
     - Trees can represent relationships between orthologs or paralogs
     - Orthologs: genes in different species that evolved from a common ancestral gene by speciation (evolution of one species out of another)
     - Normally, orthologs retain the same function in the course of evolution
     - Paralogs: genes related by duplication within a genome
     - In contrast to orthologs, paralogs evolve new functions

  4. Phylogenetic Trees (2)
     - We'll use binary trees, both rooted and unrooted
     - Rooted for when we know the direction of evolution (i.e., the common ancestor)
     - Can sometimes find the root by adding a distantly related organism/sequence to an existing tree (Fig 7.1)

  5. Phylogenetic Trees (3)
     - A weighted tree, where each weight (edge length) is an estimate of evolutionary time between events
     - Based on a distance measure (e.g., substitution scoring matrices) between sequences
     - Gives a reasonably accurate approximation of relative evolutionary times, despite the fact that sequences can evolve at different rates
     - The number of possible binary trees on $n$ leaves grows super-exponentially in $n$
     - E.g., $n = 20$ has about $2.2 \times 10^{20}$ trees
     - We'll use heuristics, of course
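
The count quoted on this slide can be checked with a short sketch. The slide does not state a formula; the code below assumes the standard result that the number of unrooted binary topologies on $n$ labeled leaves is $(2n-5)!!$ (and $(2n-3)!!$ for rooted ones), and that the $2.2 \times 10^{20}$ figure refers to unrooted trees. The function names are illustrative only.

```python
def double_factorial(m: int) -> int:
    """Return m * (m - 2) * (m - 4) * ... down to 1 (1 for m <= 1)."""
    result = 1
    while m > 1:
        result *= m
        m -= 2
    return result


def num_unrooted_trees(n_leaves: int) -> int:
    """Unrooted binary topologies on n labeled leaves: (2n - 5)!! (assumed formula)."""
    if n_leaves < 3:
        raise ValueError("need at least 3 leaves")
    return double_factorial(2 * n_leaves - 5)


def num_rooted_trees(n_leaves: int) -> int:
    """Rooted binary topologies on n labeled leaves: (2n - 3)!! (assumed formula)."""
    if n_leaves < 2:
        raise ValueError("need at least 2 leaves")
    return double_factorial(2 * n_leaves - 3)


if __name__ == "__main__":
    # Print the unrooted count for 20 leaves in scientific notation.
    print(f"{num_unrooted_trees(20):.3e}")
```

Because the count is a product of about $n$ factors that each grow with $n$, exhaustive enumeration is out of reach even for modest $n$, which is why the remaining slides rely on heuristics.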

  6. Building Trees from Pairwise Distances: UPGMA
     - Start with some distance measure between sequences, e.g., Jukes-Cantor: $d_{ij} = -0.75 \log(1 - 4 f_{ij}/3)$, where $f_{ij}$ is the fraction of residues that differ between sequences $x_i$ and $x_j$ when pairwise aligned
     - UPGMA (unweighted pair group method average) algorithm
     - One of a family of hierarchical clustering algorithms
     - Basic idea of the algorithmic family: find the minimum inter-cluster distance $d_{ij}$ in the current distance matrix, merge clusters $i$ and $j$, then update the distance matrix
     - Differences among algorithms lie in the matrix update
     - For phylogenetic trees, also add edge lengths
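
The Jukes-Cantor distance above can be sketched as follows. This is a minimal illustration, assuming the logarithm is the natural log and that alignment columns containing a gap character `-` are skipped when computing $f_{ij}$; the slide specifies neither, and the function names are hypothetical.

```python
import math


def fraction_differing(x: str, y: str) -> float:
    """Fraction f of aligned, ungapped positions at which x and y differ."""
    if len(x) != len(y):
        raise ValueError("sequences must be aligned to the same length")
    # Assumption: columns with a gap in either sequence are ignored.
    pairs = [(a, b) for a, b in zip(x, y) if a != "-" and b != "-"]
    if not pairs:
        raise ValueError("no ungapped columns to compare")
    return sum(a != b for a, b in pairs) / len(pairs)


def jukes_cantor(f: float) -> float:
    """d = -0.75 * ln(1 - 4f/3); undefined (infinite) once f reaches 0.75."""
    if f >= 0.75:
        return math.inf
    return -0.75 * math.log(1 - 4 * f / 3)


if __name__ == "__main__":
    f = fraction_differing("ACGTACGT", "ACGTACGA")
    print(f, jukes_cantor(f))
```

The correction makes $d_{ij}$ slightly larger than $f_{ij}$, since some observed positions are assumed to have changed more than once.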

  7. Building Trees from Pairwise Distances: UPGMA (2)
     1. $\forall i$, assign sequence $x_i$ to cluster $C_i$ and give it its own leaf, with height 0
     2. While there are more than two clusters:
        1. Find the minimum $d_{ij}$ in the distance matrix
        2. Add to the clustering cluster $C_k = C_i \cup C_j$ and delete $C_i$ and $C_j$
        3. For each cluster $C_\ell \notin \{C_k, C_i, C_j\}$, set $d_{k\ell} = \frac{1}{|C_k|\,|C_\ell|} \sum_{p \in C_k,\, q \in C_\ell} d_{pq}$ [Shortcut: Eq. (7.2)]
        4. Add to the tree node $k$ with children $i$ and $j$, with height $d_{ij}/2$
     3. When only $C_i$ and $C_j$ remain, place the root at height $d_{ij}/2$

     Example: Fig 7.4
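
The UPGMA steps above can be sketched in Python. This is an illustrative sketch, not the lecture's code: it assumes distances are supplied as a dict keyed by `frozenset({a, b})`, represents each merged cluster as a nested tuple of its children, and uses the size-weighted update $d_{k\ell} = (|C_i| d_{i\ell} + |C_j| d_{j\ell}) / (|C_i| + |C_j|)$, which is assumed here to be the shortcut the slide cites as Eq. (7.2) and is equivalent to averaging over all leaf pairs. Merging until one cluster remains is the same as the slide's separate final root step.

```python
from itertools import combinations


def upgma(names, dist):
    """Cluster leaves by UPGMA.

    names: list of leaf labels.
    dist:  dict mapping frozenset({a, b}) -> distance between leaves a and b.
    Returns (root, heights): root is a nested tuple of leaf labels, and
    heights maps every node (leaf label or nested tuple) to its height.
    """
    d = dict(dist)                           # working copy, extended with new clusters
    sizes = {name: 1 for name in names}      # current clusters and their leaf counts
    heights = {name: 0.0 for name in names}  # leaves sit at height 0

    while len(sizes) > 1:
        # Closest pair of current clusters.
        i, j = min(combinations(sizes, 2), key=lambda pair: d[frozenset(pair)])
        d_ij = d[frozenset((i, j))]
        n_i, n_j = sizes.pop(i), sizes.pop(j)
        k = (i, j)                           # new cluster, named by its children
        # Size-weighted average of the children's distances (assumed shortcut).
        for other in sizes:
            d[frozenset((k, other))] = (
                n_i * d[frozenset((i, other))] + n_j * d[frozenset((j, other))]
            ) / (n_i + n_j)
        sizes[k] = n_i + n_j
        heights[k] = d_ij / 2

    (root,) = sizes
    return root, heights
```

Edge lengths follow from the heights: the edge from a node to its parent has length equal to the parent's height minus the node's height.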

  8. Building Trees from Pairwise Distances: UPGMA (3)
     - If the rate of evolution is the same at all points in the original (target) phylogenetic tree, then UPGMA will recover the correct tree
     - This occurs iff the lengths of all paths from the root to the leaves are equal in terms of evolutionary time
     - If this is not the case, then UPGMA may find an incorrect topology (Fig. 7.5, p. 170)
     - Can avoid this if the distances satisfy the ultrametric condition: for any three sequences $x_i$, $x_j$, $x_k$, the distances $d_{ij}$, $d_{jk}$, $d_{ik}$ are either all equal, or two are equal and one is smaller
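
The ultrametric condition can be tested directly on a distance matrix. The sketch below assumes the same `frozenset`-keyed distance dict as a hypothetical input format and uses a numeric tolerance `tol`, which is an added parameter rather than part of the definition. It relies on the observation that "all equal, or two equal and one smaller" is the same as "the two largest of the three distances are equal".

```python
from itertools import combinations


def is_ultrametric(names, dist, tol=1e-9):
    """Check the three-point ultrametric condition for every triple of leaves.

    names: list of leaf labels.
    dist:  dict mapping frozenset({a, b}) -> distance.
    """
    for a, b, c in combinations(names, 3):
        smallest, middle, largest = sorted(
            (dist[frozenset((a, b))], dist[frozenset((b, c))], dist[frozenset((a, c))])
        )
        # The two largest distances must coincide; the smallest is unconstrained.
        if abs(largest - middle) > tol:
            return False
    return True
```

Checking every triple takes $O(n^3)$ time, which is acceptable as a diagnostic before deciding whether UPGMA is appropriate.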

  9. Building Trees from Pairwise Distances: Neighbor Joining
     - If the ultrametric property doesn't hold, can still recover the original tree if additivity holds
     - Additivity: in the original tree, the distance between any pair of leaves = sum of the lengths of the edges on the path connecting them
     - If additivity holds, neighbor joining finds the original tree
     - First, find a pair of neighboring leaves $i$ and $j$, assign them parent $k$, then replace $i$ and $j$ with $k$, where for all other leaves $m$, $d_{km} = (d_{im} + d_{jm} - d_{ij})/2$
     - But it does NOT work to simply choose the pair $(i, j)$ with minimum $d_{ij}$ (Fig. 7.7)
     - Instead, choose $(i, j)$ minimizing $D_{ij} = d_{ij} - (r_i + r_j)$, where $L$ is the current set of "leaves" and $r_i = \frac{1}{|L| - 2} \sum_{k \in L} d_{ik}$

  10. Building Trees from Pairwise Distances: Neighbor Joining (2)
      1. Initialize $L = T =$ set of leaves
      2. While $|L| > 2$:
         1. Choose $i$ and $j$ minimizing $D_{ij}$
         2. Define new node $k$ and set $d_{km} = (d_{im} + d_{jm} - d_{ij})/2$ for all $m \in L$
         3. Add $k$ to $T$ with edges of lengths $d_{ik} = (d_{ij} + r_i - r_j)/2$ and $d_{jk} = d_{ij} - d_{ik}$
         4. Update $L = \{k\} \cup L \setminus \{i, j\}$
      3. Add the final, length-$d_{ij}$ edge between the final nodes $i$ and $j$
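
The neighbor-joining loop above can be sketched as follows. This is a minimal illustration under stated assumptions: distances arrive as a dict keyed by `frozenset({a, b})`, internal nodes get hypothetical names `node0`, `node1`, ... (assumed not to collide with leaf labels), and the result is returned as a plain edge list. Ties in $D_{ij}$ are broken by taking the first minimal pair.

```python
from itertools import combinations


def neighbor_joining(names, dist):
    """Build an unrooted tree by neighbor joining.

    names: list of leaf labels.
    dist:  dict mapping frozenset({a, b}) -> distance.
    Returns a list of edges (node_a, node_b, length).
    """
    d = dict(dist)
    leaves = list(names)          # the current set L
    edges = []                    # the tree T, as an edge list
    next_id = 0

    while len(leaves) > 2:
        n = len(leaves)
        # r_i: sum of distances from i to the other current leaves, over |L| - 2.
        r = {
            i: sum(d[frozenset((i, m))] for m in leaves if m != i) / (n - 2)
            for i in leaves
        }
        # Choose the pair minimizing D_ij = d_ij - (r_i + r_j).
        i, j = min(
            combinations(leaves, 2),
            key=lambda p: d[frozenset(p)] - (r[p[0]] + r[p[1]]),
        )
        d_ij = d[frozenset((i, j))]
        k = f"node{next_id}"      # hypothetical internal-node label
        next_id += 1
        # Distances from the new node k to every remaining leaf m.
        for m in leaves:
            if m != i and m != j:
                d[frozenset((k, m))] = (
                    d[frozenset((i, m))] + d[frozenset((j, m))] - d_ij
                ) / 2
        d_ik = (d_ij + r[i] - r[j]) / 2
        edges.append((i, k, d_ik))
        edges.append((j, k, d_ij - d_ik))
        leaves = [m for m in leaves if m != i and m != j] + [k]

    i, j = leaves
    edges.append((i, j, d[frozenset((i, j))]))
    return edges
```

For example, on the additive but non-ultrametric distances $d_{AB}=4$, $d_{AC}=4$, $d_{AD}=7$, $d_{BC}=6$, $d_{BD}=9$, $d_{CD}=5$, the sketch joins $A$ with $B$ and $C$ with $D$ and recovers edge lengths 1, 3, 1, 4 for the leaves plus an internal edge of length 2.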

  11. Parsimony
      - Widely used approach for tree building
      - Scores a tree based on the cost of substitutions going from each node to its child
        ⇒ Will assign hypothetical ancestral sequences to internal nodes, e.g., Figure 7.9
      - Generally consists of two components:
        1. Computing the cost of tree $T$ over $n$ aligned sequences
        2. Searching through the space of possible trees for the min-cost one
      - Treat each site independently of the others, so for a length-$m$ alignment, run the scoring algorithm on each of the $m$ sites separately
      - Let $S(a, b)$ be the cost of substituting $b$ for $a$
      - When scoring site $u \in \{1, \ldots, m\}$, let $S_k(a)$ be the minimal cost for the assignment of symbol (residue) $a$ to node $k$

  12. Parsimony (2)
      1. Initialize $k = 2n - 1$ (index of the root node)
      2. Recursively compute $S_k(a)$ for all $a$ in the alphabet:
         1. If $k$ is a leaf, set $S_k(a) = 0$ for $a = x^k_u$ and $S_k(a) = \infty$ otherwise
            ⇒ $a$ must match the $u$th symbol in the sequence
         2. Else $S_k(a) = \min_b (S_i(b) + S(a, b)) + \min_b (S_j(b) + S(a, b))$, where $i$ and $j$ are $k$'s children
      3. Return $\min_a \{S_{2n-1}(a)\}$ as the minimum cost of the tree

      Can recover ancestral residues by tracking where the min comes from in the recursive step
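
The recursion above can be sketched for a single site. The representation is an assumption made for illustration: a leaf is the one-character residue observed at that site, an internal node is a 2-tuple of its children, and `unit_cost` is a hypothetical substitution cost of 1 per mismatch standing in for $S(a, b)$. For a full alignment, the per-site costs would be summed over the $m$ sites.

```python
import math


def unit_cost(a: str, b: str) -> int:
    """Hypothetical substitution cost S(a, b): 0 for a match, 1 for a mismatch."""
    return 0 if a == b else 1


def site_costs(tree, cost, alphabet):
    """Return {a: S_k(a)} for the subtree rooted at `tree`, at a single site.

    tree: a leaf is a one-character str; an internal node is a (left, right) tuple.
    """
    if isinstance(tree, str):
        # Leaf: only the observed residue has finite cost.
        return {a: (0 if a == tree else math.inf) for a in alphabet}
    left, right = (site_costs(child, cost, alphabet) for child in tree)
    return {
        a: min(left[b] + cost(a, b) for b in alphabet)
        + min(right[b] + cost(a, b) for b in alphabet)
        for a in alphabet
    }


def parsimony_cost(tree, cost=unit_cost, alphabet="ACGT"):
    """Minimum cost of the tree at one site: min over root symbols a of S_root(a)."""
    return min(site_costs(tree, cost, alphabet).values())


if __name__ == "__main__":
    print(parsimony_cost((("A", "A"), ("C", "G"))))
```

The sketch returns only the cost; recovering ancestral residues would additionally require storing, for each node and symbol, which $b$ achieved each minimum.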

  13. Parsimony (3): Searching for a Tree
      - Not practical to enumerate the entire set of possible trees and score them all
      - Will use branch and bound to speed it up (though no guarantee of an efficient algorithm)
      - When incrementally building a tree, adding edges will never decrease its cost
      - Thus if a tree's cost already exceeds the final cost of the best tree so far, we can discard it
      - Algorithm: systematically grow an existing tree by adding edges, stopping expansion if the current tree's cost exceeds the final cost of the best tree so far

  14. Hein's Algorithm
      - For simultaneously finding alignment and phylogeny
      - Similar to parsimony in that, given a topology, it infers ancestral sequences
      - But this algorithm uses an affine gap penalty model (separate penalties for opening and extending gaps)
      - First, it ascends the tree from the leaves, determining the set of sequences that best align with the leaf sequences
      - Represents such a set of sequences as a digraph
      - Then it works its way up toward the root, at each step inferring the set of sequences that best align with the child graphs
      - Finally, it descends from the root to the leaves, fixing the specific ancestral sequences
