
Outline Probabilistic Models of Phylogeny 1. Models of nucleotide - PDF document

2/15/09 CSCI1950-Z Computational Methods for Biology, Lecture 7. Ben Raphael, February 9, 2009. http://cs.brown.edu/courses/csci1950-z/


  1. Outline: Probabilistic Models of Phylogeny

     1. Models of nucleotide change
     2. Computing likelihood of trees

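  2. Distances from Sequences

     Chimpanzee: CCTGCCAGTTAGCAAACGG
     Ancestor:   CCCGCGACTTAACAAACGC
     Human:      CCTGCGAGTTAACAAACGA

     [Figure: substitutions along the two lineages are annotated by type: multiple, convergent, parallel, back, coincident.]

     Hamming distance = 3. $D_S$ = differences per site = 3/20. 12 total mutations.

     Jukes-Cantor Model

     Rows and columns are ordered A, C, G, T:

     $$Q = \begin{pmatrix} -3\alpha & \alpha & \alpha & \alpha \\ \alpha & -3\alpha & \alpha & \alpha \\ \alpha & \alpha & -3\alpha & \alpha \\ \alpha & \alpha & \alpha & -3\alpha \end{pmatrix}$$

     $$P(t) = e^{Qt}$$

     $$p_{xy}(t) = \Pr[X_t = x \mid X_0 = y] = \begin{cases} \tfrac{1}{4} + \tfrac{3}{4} e^{-4\alpha t} & \text{if } x = y \\ \tfrac{1}{4} - \tfrac{1}{4} e^{-4\alpha t} & \text{if } x \neq y \end{cases}$$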
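The closed-form Jukes-Cantor probabilities above can be evaluated directly. The following is a minimal sketch; the function name `jc_prob` and the sample values of `alpha` and `t` are illustrative assumptions, not part of the lecture.

```python
import math

BASES = "ACGT"

def jc_prob(x: str, y: str, alpha: float, t: float) -> float:
    """Jukes-Cantor probability p_xy(t) = Pr[X_t = x | X_0 = y]."""
    decay = math.exp(-4.0 * alpha * t)
    if x == y:
        return 0.25 + 0.75 * decay   # nucleotide unchanged after time t
    return 0.25 - 0.25 * decay       # one specific different nucleotide

# For a fixed starting nucleotide, the four probabilities sum to 1.
# alpha = 0.1 and t = 2.0 are arbitrary illustrative values.
total = sum(jc_prob(x, "A", alpha=0.1, t=2.0) for x in BASES)
```

As `t` grows, every entry approaches 1/4, so the starting nucleotide is eventually forgotten.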

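  3. Observed Differences vs. Jukes-Cantor

     [Plot: observed differences per site compared with the Jukes-Cantor model; axis quantity $\alpha t$.]

     Other Models

     In biology, not all substitutions are equally likely.

     {A, G}: purines. {C, T}: pyrimidines.
     Transition: a substitution within the purines or within the pyrimidines.
     Transversion: a substitution between a purine and a pyrimidine.

     Kimura 2-parameter model (rows and columns ordered A, C, G, T; $\alpha$ is the transition rate, $\beta$ the transversion rate):

     $$Q = \begin{pmatrix} -2\beta-\alpha & \beta & \alpha & \beta \\ \beta & -2\beta-\alpha & \beta & \alpha \\ \alpha & \beta & -2\beta-\alpha & \beta \\ \beta & \alpha & \beta & -2\beta-\alpha \end{pmatrix}$$

     Other models
     • HKY model for DNA
     • Other models for protein sequences (20 x 20 matrices)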
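To make the Kimura matrix concrete, the sketch below builds $Q$ and approximates $P(t) = e^{Qt}$ with a truncated Taylor series. The series is a stand-in chosen to keep the example dependency-free (the slides do not say how $e^{Qt}$ is evaluated), and the helper names and the rate values are assumptions.

```python
import math

BASES = "ACGT"
# Transitions stay within purines {A, G} or within pyrimidines {C, T}.
TRANSITIONS = {("A", "G"), ("G", "A"), ("C", "T"), ("T", "C")}

def kimura_rate_matrix(alpha, beta):
    """Kimura 2-parameter Q: alpha = transition rate, beta = transversion rate."""
    q = []
    for x in BASES:
        row = []
        for y in BASES:
            if x == y:
                row.append(-2.0 * beta - alpha)
            elif (x, y) in TRANSITIONS:
                row.append(alpha)
            else:
                row.append(beta)
        q.append(row)
    return q

def mat_mul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def mat_exp(q, t, terms=40):
    """Truncated Taylor series for e^{Qt}; only adequate when Q*t is small."""
    n = len(q)
    qt = [[q[i][j] * t for j in range(n)] for i in range(n)]
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]           # holds (Qt)^k / k!
    for k in range(1, terms):
        term = [[v / k for v in row] for row in mat_mul(term, qt)]
        result = [[result[i][j] + term[i][j] for j in range(n)]
                  for i in range(n)]
    return result

# Illustrative rates only: transitions four times as fast as transversions.
Q = kimura_rate_matrix(alpha=0.2, beta=0.05)
P = mat_exp(Q, t=1.0)
```

With these rates, `P[0][2]` (A to G, a transition) comes out larger than `P[0][1]` (A to C, a transversion), which is the asymmetry the model is meant to capture.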

  4. Probabilistic Model

     $\Pr[x \mid y, t]$ = probability that $y$ mutates to $x$ in time $t$.

     [Figure: a node $y$ joined to its descendant $x$ by a branch of length $t$.]

     Given a character matrix $M$, what is the "most likely" tree $T$ that generated $M$?

     Rat:        ACAGTGACGCCCCAAACGT
     Mouse:      ACAGTGACGCTACAAACGT
     Gorilla:    CCTGTGACGTAACAAACGA
     Chimpanzee: CCTGTGACGTAGCAAACGA
     Human:      CCTGTGACGTAGCAAACGA

     Likelihood

     Given data $D$ and a model (hypothesis) $H$ for generation of $D$, the likelihood of $H$ is the quantity $L[H; D] = \Pr[D \mid H]$.

     Note: the likelihood of $H$ is not the probability of $H$.

  5. Probabilistic Model

     $\Pr[x \mid y, t]$ = probability that $y$ mutates to $x$ in time $t$.

     Given a character matrix $M$, what is the "most likely" tree $T$ that generated $M$?

     Assume:
     1. Characters evolve independently.
     2. Constant rate of mutation on each branch.
     3. State of a node depends only on parent and branch length, i.e. $\Pr[x \mid y, t]$ depends only on $y$ and $t$ (Markov process).

     Maximum Likelihood

     Given a character matrix $M$ and a tree $T$ with branch lengths $t^* = t_1, \dots, t_{2n-2}$:

     $$L(T, t^*) = \Pr[M \mid T, t^*]$$

     is called the likelihood.

     Maximum likelihood: find $\arg\max_{T, t^*} L(T, t^*)$.

     First: how to compute $L(T, t^*)$?

  6. Probabilistic Model

     $\Pr[x \mid y, t]$ = probability that $y$ mutates to $x$ in time $t$.

     Given a tree $(T, t^*)$ with leaves labeled by characters in $M$, $\Pr[M \mid T, t^*]$ is obtained from the probabilities of the possible labelings of the ancestral nodes.

     Assume:
     1. Characters evolve independently: $\Pr[M \mid T, t^*] = \prod_i \Pr[M_i \mid T, t^*]$, so consider each character separately.
     2. Constant rate of mutation on each branch.
     3. State of a vertex depends only on parent and branch length, i.e. $\Pr[x \mid y, t]$ depends only on $y$ and $t$ (Markov process).

     Example

     [Figure: a tree with root $x$ joined to internal nodes $y$ and $z$ by branches $t_5$ and $t_6$; four leaves, labeled with the characters G, C, A, T, hang from $y$ and $z$ on branches $t_1, \dots, t_4$.]

  7. Probabilistic Model

     $\Pr[x \mid y, t]$ = probability that $y$ mutates to $x$ in time $t$.

     Two species

     $T$ = tree topology
     $x_1, x_2$: characters for each species
     $a$: character for ancestor

     [Figure: ancestor $a$ joined to $x_1$ by a branch of length $t_1$ and to $x_2$ by a branch of length $t_2$.]

     $$\Pr[x_1, x_2, a \mid T, t_1, t_2] = q_a \Pr[x_1 \mid a, t_1] \Pr[x_2 \mid a, t_2]$$

     where $q_a = \Pr[\text{ancestor has character } a]$.

     $$\Pr[x_1, x_2 \mid T, t_1, t_2] = \sum_a q_a \Pr[x_1 \mid a, t_1] \Pr[x_2 \mid a, t_2]$$

     Follows from the Law of Total Probability: $P(X) = \sum_i P(X \mid Y_i) P(Y_i)$.
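The two-species sum over the ancestral character can be sketched as follows. This is a sketch under stated assumptions: Jukes-Cantor is used for $\Pr[x \mid a, t]$, the root distribution $q_a$ is taken to be uniform, and the function names, branch lengths, and rate are all illustrative. Sites are combined by adding log-likelihoods, which relies on the independence assumption.

```python
import math

BASES = "ACGT"

def jc_prob(x, y, alpha, t):
    """Jukes-Cantor Pr[X_t = x | X_0 = y] (assumed substitution model)."""
    decay = math.exp(-4.0 * alpha * t)
    return 0.25 + 0.75 * decay if x == y else 0.25 - 0.25 * decay

def pair_site_likelihood(x1, x2, t1, t2, alpha=1.0, q=None):
    """Pr[x1, x2 | T, t1, t2] = sum_a q_a Pr[x1 | a, t1] Pr[x2 | a, t2]."""
    if q is None:
        q = {a: 0.25 for a in BASES}   # assumption: uniform root distribution
    return sum(q[a] * jc_prob(x1, a, alpha, t1) * jc_prob(x2, a, alpha, t2)
               for a in BASES)

def pair_log_likelihood(seq1, seq2, t1, t2, alpha=1.0):
    """Independent sites: the site likelihoods multiply, so their logs add."""
    return sum(math.log(pair_site_likelihood(a, b, t1, t2, alpha))
               for a, b in zip(seq1, seq2))

# Sequences taken from the "Distances from Sequences" slide;
# the branch lengths 0.05 are arbitrary illustrative values.
chimp = "CCTGCCAGTTAGCAAACGG"
human = "CCTGCGAGTTAACAAACGA"
loglik = pair_log_likelihood(chimp, human, t1=0.05, t2=0.05)
```

Working in log space avoids numerical underflow when many site likelihoods are multiplied together.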

  8. Probabilistic Model

     $n$ species: $x_1, x_2, \dots, x_n$.
     Let $\alpha(i)$ = ancestor of node $i$.
     Let $a_{n+1}, a_{n+2}, \dots, a_{2n-1}$ = characters on internal nodes, where nodes are numbered from the internal vertices up to the root.

     $$\Pr[x_1, \dots, x_n \mid T, t_1, \dots, t_{2n-2}] = \sum_{a_{n+1}, a_{n+2}, \dots, a_{2n-1}} q_{a_{2n-1}} \prod_{i=n+1}^{2n-2} \Pr[a_i \mid a_{\alpha(i)}, t_i] \prod_{i=1}^{n} \Pr[x_i \mid a_{\alpha(i)}, t_i]$$

     Follows from the Law of Total Probability: $P(X) = \sum_i P(X \mid Y_i) P(Y_i)$.

     Felsenstein's Algorithm

     Let $\Pr[T_k \mid a]$ = probability of leaf nodes "below" node $k$, given $a_k = a$.

     Compute via dynamic programming. For an internal node $k$ with character $a$ whose children $i$ and $j$ have characters $b$ and $c$:

     $$\Pr[T_k \mid a] = \sum_b \Pr[b \mid a, t_i] \Pr[T_i \mid b] \; \sum_c \Pr[c \mid a, t_j] \Pr[T_j \mid c]$$

     Initial conditions. For $k = 1, \dots, n$ (leaf nodes):

     $$\Pr[T_k \mid a] = \begin{cases} 1 & \text{if } a = x_k \\ 0 & \text{otherwise} \end{cases}$$
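The recursion and its initial conditions translate almost line for line into a recursive function. The sketch below is one possible reading, with several assumptions: Jukes-Cantor supplies $\Pr[b \mid a, t]$, the root distribution is uniform, and a tree is encoded as a hypothetical nested tuple `(left, t_left, right, t_right)` with leaves given as single characters. The site likelihood is taken to be the root's table weighted by $q_a$.

```python
import math

BASES = "ACGT"

def jc_prob(x, y, alpha, t):
    """Jukes-Cantor Pr[X_t = x | X_0 = y] (assumed substitution model)."""
    decay = math.exp(-4.0 * alpha * t)
    return 0.25 + 0.75 * decay if x == y else 0.25 - 0.25 * decay

def conditional_likelihoods(node, alpha=1.0):
    """Return {a: Pr[T_k | a]} for the subtree rooted at `node`."""
    if isinstance(node, str):
        # Initial condition at a leaf: 1 if a matches the observed character.
        return {a: 1.0 if a == node else 0.0 for a in BASES}
    left, t_i, right, t_j = node
    below_i = conditional_likelihoods(left, alpha)
    below_j = conditional_likelihoods(right, alpha)
    table = {}
    for a in BASES:
        sum_b = sum(jc_prob(b, a, alpha, t_i) * below_i[b] for b in BASES)
        sum_c = sum(jc_prob(c, a, alpha, t_j) * below_j[c] for c in BASES)
        table[a] = sum_b * sum_c
    return table

def site_likelihood(tree, alpha=1.0, q=None):
    """Weight the root's conditional likelihoods by the root distribution."""
    if q is None:
        q = {a: 0.25 for a in BASES}   # assumption: uniform root distribution
    root = conditional_likelihoods(tree, alpha)
    return sum(root[a] * q[a] for a in BASES)

# Four-leaf tree shaped like the earlier example; which leaf sits on which
# branch, and all branch lengths, are illustrative guesses.
tree = (("G", 0.1, "C", 0.2), 0.3, ("A", 0.1, "T", 0.2), 0.3)
lik = site_likelihood(tree)
```

Each node is visited once and does a constant amount of work per pair of characters, so the cost is linear in the number of nodes rather than exponential in the number of internal labelings.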

  9. Computing the Likelihood

     Let $\Pr[T_k \mid a]$ = probability of leaf nodes "below" node $k$, given $a_k = a$.

     $$\Pr[x_1, \dots, x_n \mid T, t^*] = \sum_a \Pr[T_{2n-1} \mid a] \, q_a$$

     Note: the root is node $2n-1$.

     Maximum Likelihood when T unknown

     Find $T, t^*$ that maximize:

     $$\Pr[x_1, \dots, x_n \mid T, t^*] = \sum_a \Pr[T_{2n-1} \mid a] \, q_a$$

     Must search over all trees $T$. Complexity unknown until recently:
     – Felsenstein book (2004): "There has also been no proof that the problem is NP-hard (as there has been for many other methods)"
     – Shamir notes (2000): "[Maximum likelihood] not proven to be NP-complete."
     • ML is NP-hard (B. Chor and T. Tuller, RECOMB 2005).

  10. Unknown branch lengths

     • $T$ fixed, branch lengths $t^*$ are unknown.
     • Use a local optimization routine, e.g. Newton's method or Expectation Maximization.

     Finding Ancestral States

     Let $\Pr[T_k \mid a]$ = probability of the best assignment of ancestral states to nodes "below" node $k$, given $a_k = a$.

     $$\Pr[T_k \mid a] = \max_b \big( \Pr[b \mid a, t_i] \Pr[T_i \mid b] \big) \; \max_c \big( \Pr[c \mid a, t_j] \Pr[T_j \mid c] \big)$$

     Traceback as before with Sankoff's algorithm.

  11. Max. Parsimony vs. Max. Likelihood

     • Set $\delta_{ij} = -\log P(j \mid i)$ in weighted parsimony (Sankoff algorithm).
     • Weighted parsimony produces "maximum probability" assignments, ignoring branch lengths.

     Searching over Tree Space

     • How to find $T$ with maximum likelihood?
     • How to find $T$ with maximum parsimony?
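The weights $\delta_{ij} = -\log P(j \mid i)$ can be tabulated once and handed to a weighted-parsimony routine. In the sketch below, $P(j \mid i)$ is taken from the Jukes-Cantor model at a single fixed time `t`; using one shared `t` is an assumption that reflects parsimony ignoring branch lengths, and the function names and parameter values are illustrative.

```python
import math

BASES = "ACGT"

def jc_prob(x, y, alpha, t):
    """Jukes-Cantor Pr[X_t = x | X_0 = y] (assumed substitution model)."""
    decay = math.exp(-4.0 * alpha * t)
    return 0.25 + 0.75 * decay if x == y else 0.25 - 0.25 * decay

def parsimony_weights(alpha, t):
    """delta[i][j] = -log P(j | i), with one shared time t for every branch."""
    return {i: {j: -math.log(jc_prob(j, i, alpha, t)) for j in BASES}
            for i in BASES}

# alpha = 1.0 and t = 0.1 are arbitrary illustrative values.
delta = parsimony_weights(alpha=1.0, t=0.1)
```

Because probabilities multiply along a tree while costs add, minimizing the total of these weights corresponds to maximizing the product of the substitution probabilities.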
