
Outline: Searching Through Trees. 1. Optimizing branch lengths in ML.



  1. 2/25/09. CSCI1950-Z Computational Methods for Biology, Lecture 8. Ben Raphael, February 18, 2009. http://cs.brown.edu/courses/csci1950-z/
     Outline: Searching Through Trees
     1. Optimizing branch lengths in ML.
     2. Computing distances between trees.

  2. Probabilistic Model
     $\Pr[x \mid y, t]$ = probability that $y$ mutates to $x$ in time $t$.
     Given a tree $(T, t^*)$ with leaves labeled by characters in $M$, $\Pr[M \mid T, t^*]$ is the probability of the leaf labeling, summed over all labelings of the ancestral nodes.
     Assume:
     1. Characters evolve independently: $\Pr[M \mid T, t^*] = \prod_j \Pr[M_j \mid T, t^*]$, so consider each character separately.
     2. Constant rate of mutation on each branch.
     3. State of a vertex depends only on parent and branch length, i.e. $\Pr[x \mid y, t]$ depends only on $y$ and $t$ (Markov process).
     Probabilistic Model
     $n$ species: $x_1, x_2, \dots, x_n$.
     Let $\alpha(i)$ = ancestor of node $i$.
     Let $a_{n+1}, a_{n+2}, \dots, a_{2n-1}$ = characters on internal nodes, where nodes are numbered from internal vertices up to the root.
     $$\Pr[x_1, \dots, x_n \mid T, t_1, \dots, t_{2n-2}] = \sum_{a_{n+1}, a_{n+2}, \dots, a_{2n-1}} q_{a_{2n-1}} \prod_{i=n+1}^{2n-2} \Pr[a_i \mid a_{\alpha(i)}, t_i] \prod_{i=1}^{n} \Pr[x_i \mid a_{\alpha(i)}, t_i]$$
     Follows from the Law of Total Probability: $P(X) = \sum_i P(X \mid Y_i)\, P(Y_i)$.
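
As a minimal sketch of assumption 3, the snippet below uses the Jukes-Cantor model as one possible choice of $\Pr[x \mid y, t]$. The function names `jc_prob` and `two_step`, and the convention that $t$ is measured in expected substitutions per site, are assumptions made for illustration and are not taken from the slides. It checks numerically that evolving for time $s$ and then time $t$ gives the same probabilities as evolving for time $s + t$, which is what the Markov property requires.

```python
import math

STATES = "ACGT"

def jc_prob(x, y, t):
    """Jukes-Cantor Pr[x | y, t]: probability that state y becomes x after time t.

    Assumption: t is in expected substitutions per site.
    """
    e = math.exp(-4.0 * t / 3.0)
    return 0.25 + 0.75 * e if x == y else 0.25 - 0.25 * e

def two_step(x, y, s, t):
    """Probability of y -> x in time s + t, summing over the intermediate state z."""
    return sum(jc_prob(x, z, t) * jc_prob(z, y, s) for z in STATES)

if __name__ == "__main__":
    # Each conditional distribution sums to 1.
    print(sum(jc_prob(x, "A", 0.3) for x in STATES))
    # Markov property: both numbers should agree up to rounding.
    print(two_step("C", "A", 0.1, 0.2), jc_prob("C", "A", 0.3))
```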

  3. Felsenstein’s Algorithm
     Let $\Pr[T_k \mid a]$ = probability of the leaf nodes “below” node $k$, given $a_k = a$.
     Compute via dynamic programming. For a node $k$ with children $i$ and $j$:
     $$\Pr[T_k \mid a] = \Big(\sum_b \Pr[b \mid a, t_i]\, \Pr[T_i \mid b]\Big) \Big(\sum_c \Pr[c \mid a, t_j]\, \Pr[T_j \mid c]\Big)$$
     Initial conditions. For $k = 1, \dots, n$ (leaf nodes): $\Pr[T_k \mid a] = 1$ if $a = x_k$, and $0$ otherwise.
     Maximum Likelihood when T unknown
     Find $T, t^*$ that maximize:
     $$\Pr[x_1, \dots, x_n \mid T, t^*] = \sum_a \Pr[T_{2n-1} \mid a]\, q_a$$
     Must search over all trees $T$. Complexity unknown until recently:
     - Felsenstein book (2004): “There has also been no proof that the problem is NP-hard (as there has been for many other methods)”
     - Shamir notes (2000): “[Maximum likelihood] not proven to be NP-complete.”
     - ML is NP-hard (B. Chor and T. Tuller, RECOMB 2005).
       - Use Jukes-Cantor model.
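
The recursion above can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not the lecture's implementation: the tree encoding (a leaf is a one-character string, an internal node is a pair of `(child, branch_length)` tuples), the names `conditional` and `likelihood`, the uniform root distribution `q`, and the Jukes-Cantor transition function are all choices made here for the example.

```python
import math

STATES = "ACGT"

def jc_prob(x, y, t):
    """Jukes-Cantor Pr[x | y, t] (assumed model; t in expected substitutions per site)."""
    e = math.exp(-4.0 * t / 3.0)
    return 0.25 + 0.75 * e if x == y else 0.25 - 0.25 * e

def conditional(node):
    """Return {a: Pr[T_k | a]} for the subtree rooted at `node`.

    A leaf is a one-character string; an internal node is a tuple
    ((left_child, t_left), (right_child, t_right)).
    """
    if isinstance(node, str):
        # Initial condition: 1 if a matches the observed leaf character, else 0.
        return {a: 1.0 if a == node else 0.0 for a in STATES}
    (left, t_i), (right, t_j) = node
    lik_left, lik_right = conditional(left), conditional(right)
    return {
        a: sum(jc_prob(b, a, t_i) * lik_left[b] for b in STATES)
        * sum(jc_prob(c, a, t_j) * lik_right[c] for c in STATES)
        for a in STATES
    }

def likelihood(root, q=None):
    """Sum over root states a of q_a * Pr[T_root | a]; q defaults to uniform."""
    q = q or {a: 0.25 for a in STATES}
    cond = conditional(root)
    return sum(q[a] * cond[a] for a in STATES)

if __name__ == "__main__":
    inner = (("A", 0.1), ("C", 0.2))
    root = ((inner, 0.3), ("G", 0.4))
    print(likelihood(root))
```

Each node is visited once and does a constant amount of work per pair of states, so the cost grows linearly in the number of nodes rather than exponentially in the number of internal labelings.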

  4. Unknown branch lengths
     - $T$ fixed, branch lengths $t^*$ are unknown.
     - Use a local optimization routine, e.g. Newton’s method or Expectation Maximization.
     Finding the Optimal tree
     Large Parsimony Problem
       Input: M, an n x m character matrix.
       Output: A tree T with:
         - n leaves labeled by the n rows of matrix M
         - a labeling of the internal vertices of T minimizing the parsimony score over all possible trees and all possible labelings of internal vertices.
     Maximum Likelihood
       Input: M, an n x m character matrix.
       Output: A tree T and branch lengths t* such that:
         - the n leaves are labeled by the n rows of matrix M
         - Pr[M | T, t*] is maximized.
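
To illustrate the Newton's method option on the smallest possible case, the sketch below optimizes the single branch length between two aligned sequences. It assumes the Jukes-Cantor model with $t$ in expected substitutions per site; the function name `jc_distance_newton`, the starting point `k / m`, and the halving safeguard are illustrative choices, not part of the lecture. For this special case the optimum has a closed form, $t = -\tfrac{3}{4}\ln(1 - \tfrac{4}{3}\cdot\tfrac{k}{m})$, which makes it easy to check the iteration.

```python
import math

def jc_distance_newton(m, k, tol=1e-12, max_iter=100):
    """ML branch length between two sequences with m sites and k mismatches.

    Maximizes f(t) = (m - k) * log(1 + 3e) + k * log(1 - e), e = exp(-4t/3),
    which is the Jukes-Cantor log-likelihood up to an additive constant.
    """
    if not 0 < k < 0.75 * m:
        raise ValueError("need 0 < k < 3m/4 for a finite positive optimum")
    t = k / m  # start at the observed proportion of differences
    for _ in range(max_iter):
        e = math.exp(-4.0 * t / 3.0)
        h = 3.0 * (m - k) / (1.0 + 3.0 * e) - k / (1.0 - e)
        dh = -9.0 * (m - k) / (1.0 + 3.0 * e) ** 2 - k / (1.0 - e) ** 2
        grad = -(4.0 / 3.0) * e * h                      # f'(t)
        hess = (16.0 / 9.0) * e * (h + e * dh)           # f''(t)
        new_t = t - grad / hess                          # Newton update
        if new_t <= 0.0:
            new_t = t / 2.0                              # keep the length positive
        if abs(new_t - t) < tol:
            return new_t
        t = new_t
    return t

if __name__ == "__main__":
    m, k = 100, 10
    print(jc_distance_newton(m, k))
    print(-0.75 * math.log(1.0 - 4.0 * k / (3.0 * m)))  # closed form for comparison
```

On a full tree the same idea would be applied to one branch at a time with the other branches held fixed, using likelihoods computed by the pruning recursion in place of the two-sequence formula.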

  5. Finding the Optimal tree
     - Both problems are NP-hard.
     - Possible search space is huge, especially as n increases:
       - (2n - 3)!! possible rooted trees
       - (2n - 5)!! possible unrooted trees
     - Exhaustive search only possible with small n (< 10).
     - Thus, heuristic search techniques (branch and bound, simulated annealing, genetic algorithms, etc.) are used.
     Heuristic Search
     1. Start with an arbitrary tree T.
     2. Check the “neighbors” of T.
     3. Move to the neighbor that provides the best improvement in parsimony/likelihood score, if there is one.
     Caveat: the search can get stuck in a local optimum and never reach the global optimum.
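
The two ideas on this slide can be sketched together: how fast the tree counts grow, and the shape of a greedy neighborhood search. The names `double_factorial`, `num_rooted_trees`, `num_unrooted_trees`, and `hill_climb` are hypothetical, and the search is written against generic `neighbors` and `score` callables (higher score is better, so a parsimony score would be negated) rather than against real trees.

```python
def double_factorial(k):
    """k!! = k * (k - 2) * (k - 4) * ... down to 1 or 2; returns 1 for k <= 1."""
    result = 1
    while k > 1:
        result *= k
        k -= 2
    return result

def num_rooted_trees(n):
    """Number of rooted binary trees on n labeled leaves, (2n - 3)!!, for n >= 2."""
    return double_factorial(2 * n - 3)

def num_unrooted_trees(n):
    """Number of unrooted binary trees on n labeled leaves, (2n - 5)!!, for n >= 3."""
    return double_factorial(2 * n - 5)

def hill_climb(start, neighbors, score):
    """Greedy search: move to the best-scoring neighbor until none improves."""
    current = start
    while True:
        best = max(neighbors(current), key=score, default=None)
        if best is None or score(best) <= score(current):
            return current  # local optimum; not necessarily global
        current = best

if __name__ == "__main__":
    for n in (4, 10, 20):
        print(n, num_rooted_trees(n), num_unrooted_trees(n))
    # Toy search space: integers, with x - 1 and x + 1 as neighbors.
    print(hill_climb(0, lambda x: [x - 1, x + 1], lambda x: -(x - 3) ** 2))
```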

  6. Tree Perturbation
     Simple operation: add or remove an edge.
     $$\rho(T_1, T_2) = \min\{\, k : \text{there exist simple operations } \alpha_1, \dots, \alpha_k \text{ such that } \alpha_k \circ \alpha_{k-1} \circ \cdots \circ \alpha_1 (T_1) = T_2 \,\}$$
     Trees and Splits
     Given a set $X$, a split is a partition of $X$ into two non-empty subsets $A$ and $B$, written $A \mid B$.
     For a phylogenetic tree $T$ with leaves $L$, each edge $e$ defines a split $L_e = A \mid B$, where $A$ and $B$ are the leaves in the subtrees obtained by removing $e$.

  7. Computing the Splits Metric
     A phylogenetic tree $T$ defines a collection of splits $\Sigma(T) = \{ L_e \mid e \text{ is an edge in } T \}$.
     Theorem: $\rho(T_1, T_2) = |\Sigma(T_1) \setminus \Sigma(T_2)| + |\Sigma(T_2) \setminus \Sigma(T_1)| = |\Sigma(T_1)| + |\Sigma(T_2)| - 2\,|\Sigma(T_1) \cap \Sigma(T_2)|$
     Proof: (whiteboard)
     Notation: $A \setminus B = \{x : x \in A,\ x \notin B\}$
     Example
     $|\Sigma(T_1)| = |E(T_1)| = 8$.
     $|\Sigma(T_2)| = |E(T_2)| = 8$.
     $|\Sigma(T_1) \cap \Sigma(T_2)| = |E(T)| = 6$.
     Hence $\rho(T_1, T_2) = 8 + 8 - 2 \cdot 6 = 4$.
     From: Semple and Steel (2003)
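
The theorem turns the distance into a set computation, which the following sketch illustrates. The nested-tuple tree encoding, the function names `leaf_set`, `splits`, and `splits_distance`, and the two five-leaf example trees are assumptions made for this example (they are not the trees from the Semple and Steel figure). Every edge contributes a split, including the trivial splits at leaf edges, so that the number of splits equals the number of edges as in the example above.

```python
def leaf_set(tree):
    """All leaf labels below a node; a leaf is a string, an internal node a tuple."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaf_set(child) for child in tree))

def splits(tree):
    """Set of splits A | B, one per edge, each stored as frozenset({A, B})."""
    all_leaves = leaf_set(tree)
    found = set()

    def visit(node, is_root):
        below = leaf_set(node)
        if not is_root:
            # The edge above `node` separates `below` from every other leaf.
            found.add(frozenset([below, all_leaves - below]))
        if not isinstance(node, str):
            for child in node:
                visit(child, False)

    visit(tree, True)
    return found

def splits_distance(t1, t2):
    """Size of the symmetric difference of the two split sets."""
    return len(splits(t1) ^ splits(t2))

if __name__ == "__main__":
    t1 = (("A", "B"), "C", ("D", "E"))
    t2 = (("A", "C"), "B", ("D", "E"))
    s1, s2 = splits(t1), splits(t2)
    print(len(s1), len(s2), len(s1 & s2))
    print(splits_distance(t1, t2))
```

Writing the unrooted tree with a degree-three top-level tuple keeps one split per edge; if the top-level tuple had only two children, both child edges would give the same split and the set would store it once.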

  8. Splits Metric
     Note: $\rho(T_1, T_2) = |\Sigma(T_1) \setminus \Sigma(T_2)| + |\Sigma(T_2) \setminus \Sigma(T_1)| = |\Sigma(T_1) \,\Delta\, \Sigma(T_2)|$ (symmetric difference)
     Also called the Robinson-Foulds Metric (1981).
     Nearest Neighbor Interchange: A Greedy Algorithm
     - A branch swapping algorithm
     - Only evaluates a subset of all possible trees
     - Defines a neighbor of a tree as one reachable by a nearest neighbor interchange
       - A rearrangement of the four subtrees defined by one internal edge
       - Only three different arrangements per edge (the current one and two alternatives)

  9. Nearest Neighbor Interchange
     Rearrange four subtrees defined by one internal edge.
     Figure: Jones and Pevzner
     Nearest Neighbor Interchange
     $B(n)$ := (unrooted) binary phylogenetic trees with $n$ leaves.
     Theorem (Robinson 1971): For all $T$ and $T'$ in $B(n)$, there is a sequence of NNIs that transforms $T$ into $T'$.

  10. Nearest Neighbor Interchange
      $$\rho_{NNI}(T_1, T_2) = \min\{\, k : \text{there exist NNIs } \beta_1, \dots, \beta_k \text{ such that } \beta_k \circ \beta_{k-1} \circ \cdots \circ \beta_1 (T_1) = T_2 \,\}$$
      Claim: $\rho \le 2\,\rho_{NNI}$
      Proof: Every NNI can be obtained by deleting an edge and inserting an edge, so a sequence of $k$ NNIs gives a sequence of $2k$ simple operations.
      Nearest Neighbor Interchange
      Computing $\rho_{NNI}$ for binary trees is NP-complete (Li and Zhang 1999).

  11. Nearest Neighbor Interchange
      Claim: The number of NNI neighbors of a binary tree with $n$ leaves is $2(n - 3)$.
      Proof: (whiteboard)
      Neighboring Trees
      Figure: NNI neighborhood for trees with 5 leaves; parsimony scores for trees.
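
The count in the claim can be checked on a small case by enumerating neighbors directly. The sketch below assumes an unrooted binary tree stored as an adjacency dictionary with string node names; the names `nni_neighbors` and `leaf_splits` and the five-leaf example are illustrative, not from the lecture. For each internal edge it keeps one subtree fixed and swaps a second subtree with each of the two subtrees on the other side, which yields the two alternative arrangements for that edge.

```python
def nni_neighbors(adj):
    """All trees one NNI away from `adj` (dict: node -> set of neighbor nodes)."""
    result = []
    for u in adj:
        for v in adj[u]:
            # Internal edges only, each taken once (u < v).
            if len(adj[u]) != 3 or len(adj[v]) != 3 or not u < v:
                continue
            _, b = sorted(adj[u] - {v})  # b is the subtree of u that moves
            for c in sorted(adj[v] - {u}):
                new = {x: set(nbrs) for x, nbrs in adj.items()}
                # Swap subtree b (attached to u) with subtree c (attached to v).
                new[u].remove(b); new[b].remove(u)
                new[v].remove(c); new[c].remove(v)
                new[u].add(c); new[c].add(u)
                new[v].add(b); new[b].add(v)
                result.append(new)
    return result

def leaf_splits(adj):
    """Split of the leaf set induced by each edge, as frozenset({A, B})."""
    leaves = frozenset(x for x in adj if len(adj[x]) == 1)
    out = set()
    for u in adj:
        for v in adj[u]:
            seen, stack = {u, v}, [v]  # explore v's side without crossing to u
            while stack:
                x = stack.pop()
                for w in adj[x] - seen:
                    seen.add(w)
                    stack.append(w)
            side = leaves & (seen - {u})
            out.add(frozenset([side, leaves - side]))
    return out

if __name__ == "__main__":
    tree = {
        "A": {"u1"}, "B": {"u1"}, "C": {"u2"}, "D": {"u3"}, "E": {"u3"},
        "u1": {"A", "B", "u2"}, "u2": {"u1", "C", "u3"}, "u3": {"u2", "D", "E"},
    }
    neighbors = nni_neighbors(tree)
    print(len(neighbors))  # compare with 2 * (n - 3) for n = 5
    print([len(leaf_splits(t) ^ leaf_splits(tree)) for t in neighbors])
```

Comparing trees through their split sets avoids depending on internal node names, and the second printed line shows how far each neighbor is from the starting tree in the splits metric.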

  12. Nearest Neighbor Interchange
      Subtree Pruning and Regrafting (SPR)
      1. Remove a branch.
      2. Reconnect the incident vertex by subdividing a branch.
      Figure source: http://artedi.ebc.uu.se/course/BioInfo-10p-2001/Phylogeny/Phylogeny-TreeSearch/SPR.gif

  13. Tree Bisection and Reconnection (TBR)
      1. Remove a branch.
      2. Reconnect the subtrees by adding a new branch that subdivides branches in both.

