
2/25/09
CSCI1950-Z Computational Methods for Biology
Lecture 8
Ben Raphael
February 18, 2009
http://cs.brown.edu/courses/csci1950-z/

Outline: Searching Through Trees
1. Optimizing branch lengths in ML.
2. Computing distances between trees.

Probabilistic Model
Pr[x | y, t] = probability that y mutates to x in time t (y is the parent, x the child, and t the length of the branch between them).

Given a tree (T, t*) with leaves labeled by the characters in M, Pr[M | T, t*] is the probability of the leaf labeling M, summed over all possible labelings of the ancestral (internal) nodes.

Assume:
1. Characters evolve independently: $\Pr[M \mid T, t^*] = \prod_j \Pr[M_j \mid T, t^*]$, so each character can be considered separately.
2. Constant rate of mutation on each branch.
3. The state of a vertex depends only on its parent and the branch length, i.e. Pr[x | y, t] depends only on y and t (Markov process).

n species: x_1, x_2, …, x_n
Let α(i) = ancestor (parent) of node i, and t_i = length of the branch from node i to α(i).
Let a_{n+1}, a_{n+2}, …, a_{2n-1} = characters on the internal nodes, where internal nodes are numbered upward so that node 2n-1 is the root.

$$\Pr[x_1,\dots,x_n \mid T, t_1,\dots,t_{2n-2}] = \sum_{a_{n+1},\dots,a_{2n-1}} q_{a_{2n-1}} \prod_{i=n+1}^{2n-2} \Pr[a_i \mid a_{\alpha(i)}, t_i] \prod_{i=1}^{n} \Pr[x_i \mid a_{\alpha(i)}, t_i]$$

where q_a is the probability that the root has state a.

This follows from the Law of Total Probability, P(X) = Σ_i P(X | Y_i) P(Y_i), summing over all labelings Y_i of the internal nodes.
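The sum over internal labelings can be evaluated directly for a tiny tree. The Python sketch below is illustrative only: it assumes the DNA alphabet ACGT, Jukes-Cantor transition probabilities for Pr[x | y, t] (the lecture names this model only in the hardness result), and a uniform root distribution q_a = 1/4. The names `jc_prob` and `likelihood_brute_force` and the `parent` dictionary encoding α(i) are hypothetical. It enumerates all 4^(n-1) labelings of the internal nodes, so it is only usable for very small n.

```python
import itertools
import math
from typing import Dict, List

ALPHABET = "ACGT"


def jc_prob(x: str, y: str, t: float) -> float:
    """Pr[x | y, t] under the Jukes-Cantor model (an assumption for this sketch)."""
    e = math.exp(-4.0 * t / 3.0)
    return 0.25 + 0.75 * e if x == y else 0.25 - 0.25 * e


def likelihood_brute_force(leaves: List[str], parent: Dict[int, int],
                           t: Dict[int, float]) -> float:
    """Pr[x_1..x_n | T, t] for one character, summing over all internal labelings.

    Nodes 1..n are leaves (leaf i has state leaves[i-1]), nodes n+1..2n-1 are
    internal, node 2n-1 is the root; parent[i] is alpha(i) and t[i] is the
    length of the branch from i to alpha(i).
    """
    n = len(leaves)
    root = 2 * n - 1
    internal = list(range(n + 1, root + 1))
    total = 0.0
    for labels in itertools.product(ALPHABET, repeat=len(internal)):
        a = dict(zip(internal, labels))                          # a_{n+1}, ..., a_{2n-1}
        a.update({i: leaves[i - 1] for i in range(1, n + 1)})   # x_1, ..., x_n
        p = 0.25                                                 # q_{a_{2n-1}}: uniform root prior (assumed)
        for i in range(1, root):                                 # one factor per branch, i = 1..2n-2
            p *= jc_prob(a[i], a[parent[i]], t[i])
        total += p
    return total


# Usage: nodes 1 and 2 hang off internal node 4; nodes 3 and 4 hang off the root 5.
parent = {1: 4, 2: 4, 3: 5, 4: 5}
t = {1: 0.1, 2: 0.2, 3: 0.3, 4: 0.05}
print(likelihood_brute_force(["A", "A", "C"], parent, t))
```

Summing this likelihood over all 4^n leaf patterns gives 1, which is a quick sanity check on the model.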

Felsenstein's Algorithm
Let Pr[T_k | a] = probability of the leaf labels "below" node k (in the subtree T_k rooted at k), given a_k = a.
Compute via dynamic programming. If node k (state a) has children i and j (states b and c) on branches of lengths t_i and t_j:

$$\Pr[T_k \mid a] = \Big(\sum_b \Pr[b \mid a, t_i]\,\Pr[T_i \mid b]\Big)\Big(\sum_c \Pr[c \mid a, t_j]\,\Pr[T_j \mid c]\Big)$$

Initial conditions. For k = 1, …, n (leaf nodes):
Pr[T_k | a] = 1 if a = x_k, and 0 otherwise.

Maximum Likelihood When T Is Unknown
Find T, t* that maximize:

$$\Pr[x_1,\dots,x_n \mid T, t^*] = \sum_a \Pr[T_{2n-1} \mid a]\, q_a$$

Must search over all trees T. The complexity was unknown until recently:
– Felsenstein book (2004): "There has also been no proof that the problem is NP-hard (as there has been for many other methods)."
– Shamir notes (2000): "[Maximum likelihood] not proven to be NP-complete."
• ML is NP-hard (B. Chor and T. Tuller, RECOMB 2005).
– Uses the Jukes-Cantor model.
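A minimal Python sketch of the recurrence, again as a toy under stated assumptions: ACGT alphabet, Jukes-Cantor probabilities, uniform q_a = 1/4, and hypothetical names (`felsenstein`, `cond`, and a `children` dictionary mapping each internal node to its two children). Each call `cond(k)` returns the table Pr[T_k | a] for every state a, so the work grows linearly with the number of nodes (times |alphabet|²) rather than exponentially in n.

```python
import math
from typing import Dict, List, Tuple

ALPHABET = "ACGT"


def jc_prob(x: str, y: str, t: float) -> float:
    """Pr[x | y, t] under the Jukes-Cantor model (an assumption for this sketch)."""
    e = math.exp(-4.0 * t / 3.0)
    return 0.25 + 0.75 * e if x == y else 0.25 - 0.25 * e


def felsenstein(leaves: List[str], children: Dict[int, Tuple[int, int]],
                t: Dict[int, float]) -> float:
    """Likelihood of one character: sum_a Pr[T_{2n-1} | a] q_a.

    Nodes 1..n are leaves, children[k] = (i, j) for each internal node k,
    t[i] is the length of the branch above node i, and node 2n-1 is the root.
    """
    n = len(leaves)

    def cond(k: int) -> Dict[str, float]:
        """Pr[T_k | a] for every state a."""
        if k <= n:  # initial condition: 1 if a equals the observed leaf state x_k
            return {a: 1.0 if a == leaves[k - 1] else 0.0 for a in ALPHABET}
        i, j = children[k]
        below_i, below_j = cond(i), cond(j)  # each subtree is solved exactly once
        return {
            a: sum(jc_prob(b, a, t[i]) * below_i[b] for b in ALPHABET)
            * sum(jc_prob(c, a, t[j]) * below_j[c] for c in ALPHABET)
            for a in ALPHABET
        }

    root_table = cond(2 * n - 1)
    return sum(0.25 * root_table[a] for a in ALPHABET)  # q_a = 1/4 (uniform, assumed)


# Usage: root 5 has children 4 and 3; node 4 has children 1 and 2.
children = {4: (1, 2), 5: (4, 3)}
t = {1: 0.1, 2: 0.2, 3: 0.3, 4: 0.05}
print(felsenstein(["A", "A", "C"], children, t))
```

The recursion follows the slide directly; for very deep trees an explicit post-order traversal would avoid Python's recursion limit.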

Unknown Branch Lengths
• T fixed, branch lengths t* are unknown.
• Use a local optimization routine, e.g. Newton's method or Expectation Maximization.

Finding the Optimal Tree
Large Parsimony Problem
Input: M: an n x m character matrix.
Output: A tree T with:
• n leaves labeled by the n rows of matrix M
• a labeling of the internal vertices of T
minimizing the parsimony score over all possible trees and all possible labelings of internal vertices.

Maximum Likelihood
Input: M: an n x m character matrix.
Output: A tree T and branch lengths t* with n leaves labeled by the n rows of matrix M, such that Pr[M | T, t*] is maximized.
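As a minimal sketch of the local optimization step, the Python below applies Newton's method to the simplest case that can be checked by hand: two sequences joined by a single branch of length t under the Jukes-Cantor model (an assumption; the slide does not fix a model), with k mismatches among m sites. Up to a constant, the log-likelihood is ℓ(t) = (m − k) log s(t) + k log d(t), with s(t) = 1/4 + 3/4 e^(−4t/3) and d(t) = 1/4 − 1/4 e^(−4t/3). The function name and starting value are illustrative. This case has the closed-form optimum t = −(3/4) ln(1 − 4p/3) with p = k/m, which makes the result easy to verify.

```python
import math


def newton_branch_length(k: int, m: int, t0: float = 0.1,
                         tol: float = 1e-12, max_iter: int = 100) -> float:
    """ML branch length for two sequences with k mismatches out of m sites (Jukes-Cantor).

    Requires 0 < k/m < 3/4; otherwise the maximum is at t = 0 or t = infinity.
    """
    if not 0 < k / m < 0.75:
        raise ValueError("need 0 < k/m < 3/4")
    t = t0
    for _ in range(max_iter):
        e = math.exp(-4.0 * t / 3.0)
        s, ds, d2s = 0.25 + 0.75 * e, -e, 4.0 * e / 3.0        # Pr[same state] and derivatives
        d, dd, d2d = 0.25 - 0.25 * e, e / 3.0, -4.0 * e / 9.0  # Pr[one specific other state]
        grad = (m - k) * ds / s + k * dd / d                   # l'(t)
        hess = (m - k) * (d2s / s - (ds / s) ** 2) + k * (d2d / d - (dd / d) ** 2)  # l''(t)
        t_new = t - grad / hess                                # Newton step
        if t_new <= 0:
            t_new = t / 2                                      # keep the branch length positive
        if abs(t_new - t) < tol:
            return t_new
        t = t_new
    return t


k, m = 20, 100
print(newton_branch_length(k, m))              # Newton iterate
print(-0.75 * math.log(1 - 4 * (k / m) / 3))   # closed-form value for comparison
```

With many branches the same idea is applied one branch at a time (or jointly), with the per-site likelihoods coming from Felsenstein's algorithm; Newton's method is only locally convergent, so a safeguard like the positivity check above is usually needed.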

Finding the Optimal Tree
• Both problems are NP-hard.
• The search space is huge and grows rapidly with the number of leaves n:
– (2n – 3)!! possible rooted binary trees
– (2n – 5)!! possible unrooted binary trees
• Exhaustive search is only possible for small n (< 10).
• Thus, heuristic search techniques (branch and bound, simulated annealing, genetic algorithms, etc.) are used.

Heuristic Search
1. Start with an arbitrary tree T.
2. Check the "neighbors" of T.
3. Move to the neighbor that provides the best improvement in parsimony/likelihood score; repeat until no neighbor improves the score.
Caveat: the search can get stuck in a local optimum and fail to reach the global optimum.
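The tree counts in the first list can be tabulated with a short Python helper (the function names are illustrative) to see why exhaustive search stops being feasible around n = 10.

```python
def double_factorial(k: int) -> int:
    """k!! = k * (k - 2) * (k - 4) * ... (taken to be 1 for k <= 1)."""
    result = 1
    while k > 1:
        result *= k
        k -= 2
    return result


def num_rooted_trees(n: int) -> int:
    """Rooted binary trees with n >= 2 labeled leaves: (2n - 3)!!."""
    return double_factorial(2 * n - 3)


def num_unrooted_trees(n: int) -> int:
    """Unrooted binary trees with n >= 3 labeled leaves: (2n - 5)!!."""
    return double_factorial(2 * n - 5)


for n in (5, 10, 20, 50):
    print(n, num_rooted_trees(n), num_unrooted_trees(n))
```

For example, n = 10 already gives 2,027,025 unrooted and 34,459,425 rooted trees.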

Tree Perturbation
Simple operation: add or remove an edge.

$$\rho(T_1, T_2) = \min\{k : \text{there exist } \alpha_1,\dots,\alpha_k \text{ such that } \alpha_k \circ \alpha_{k-1} \circ \dots \circ \alpha_1(T_1) = T_2\}$$

where each α_i is a single edge addition or removal.

Trees and Splits
Given a set X, a split is a partition of X into two non-empty subsets A and B, written X = A | B.
For a phylogenetic tree T with leaf set L, each edge e defines a split L_e = A | B, where A and B are the leaf sets of the two subtrees obtained by removing e.
(Figure: edge e separating leaf set A from leaf set B.)
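One way to compute Σ(T) is to remove each edge in turn and collect the leaves on one side. This Python sketch assumes an unrooted tree stored as an adjacency dictionary whose degree-1 vertices are the leaves; the helper names and the internal vertex labels (x, y) are hypothetical. Each split is stored as an unordered pair {A, B}, so the split reached from either endpoint of an edge is counted once.

```python
from typing import Dict, FrozenSet, Iterable, Set, Tuple

Tree = Dict[str, Set[str]]          # adjacency sets; degree-1 vertices are leaves
Split = FrozenSet[FrozenSet[str]]   # the unordered pair {A, B}


def tree_from_edges(edges: Iterable[Tuple[str, str]]) -> Tree:
    tree: Tree = {}
    for u, v in edges:
        tree.setdefault(u, set()).add(v)
        tree.setdefault(v, set()).add(u)
    return tree


def leaves_beyond(tree: Tree, u: str, v: str) -> FrozenSet[str]:
    """Leaves in the subtree containing v once the edge u-v is removed."""
    seen, stack, found = {u, v}, [v], set()
    while stack:
        x = stack.pop()
        if len(tree[x]) == 1:
            found.add(x)
        for y in tree[x]:
            if y not in seen:
                seen.add(y)
                stack.append(y)
    return frozenset(found)


def splits(tree: Tree) -> Set[Split]:
    """Sigma(T): the split L_e = A | B for every edge e of T."""
    all_leaves = frozenset(x for x, nbrs in tree.items() if len(nbrs) == 1)
    result: Set[Split] = set()
    for u in tree:
        for v in tree[u]:
            b = leaves_beyond(tree, u, v)
            result.add(frozenset({all_leaves - b, b}))
    return result


# Usage: the quartet tree ab|cd with hypothetical internal vertices x and y.
quartet = tree_from_edges([("a", "x"), ("b", "x"), ("x", "y"), ("y", "c"), ("y", "d")])
for s in splits(quartet):
    print(" | ".join("".join(sorted(side)) for side in sorted(s, key=sorted)))
```

The quartet has five edges and therefore five splits: four from the pendant edges and the single nontrivial split ab | cd.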

Computing the Splits Metric
A phylogenetic tree T defines a collection of splits Σ(T) = {L_e | e is an edge in T}.

Theorem:
$$\rho(T_1, T_2) = |\Sigma(T_1) \setminus \Sigma(T_2)| + |\Sigma(T_2) \setminus \Sigma(T_1)| = |\Sigma(T_1)| + |\Sigma(T_2)| - 2\,|\Sigma(T_1) \cap \Sigma(T_2)|$$
Proof: (whiteboard)
Notation: A \ B = {x : x ∈ A, x ∉ B}

Example: |Σ(T_1)| = |E(T_1)| = 8, |Σ(T_2)| = |E(T_2)| = 8, |Σ(T_1) ∩ Σ(T_2)| = |E(T)| = 6, so ρ(T_1, T_2) = 8 + 8 − 2·6 = 4.
From: Semple and Steel (2003)
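Given the split sets, the theorem reduces ρ to set arithmetic. In this Python sketch (hypothetical names; each split encoded as an unordered pair of frozensets), the two 5-leaf trees are made up for illustration: they share every pendant split and the split de | abc, and differ only in ab | cde versus ac | bde.

```python
from typing import FrozenSet, Iterable, Set

Split = FrozenSet[FrozenSet[str]]


def split(a: Iterable[str], b: Iterable[str]) -> Split:
    """Encode the split A | B as an unordered pair of leaf sets."""
    return frozenset({frozenset(a), frozenset(b)})


def rf_distance(sigma1: Set[Split], sigma2: Set[Split]) -> int:
    r"""rho(T1, T2) = |Sigma(T1) \ Sigma(T2)| + |Sigma(T2) \ Sigma(T1)|."""
    return len(sigma1 - sigma2) + len(sigma2 - sigma1)


leaves = "abcde"
trivial = {split(x, set(leaves) - {x}) for x in leaves}      # pendant edges, shared by every tree
sigma1 = trivial | {split("ab", "cde"), split("de", "abc")}  # hypothetical T1
sigma2 = trivial | {split("ac", "bde"), split("de", "abc")}  # hypothetical T2

d = rf_distance(sigma1, sigma2)
# The other forms of the same quantity give the same number.
assert d == len(sigma1) + len(sigma2) - 2 * len(sigma1 & sigma2)
assert d == len(sigma1 ^ sigma2)  # symmetric difference
print(d)  # prints 2
```

Here |Σ(T_1)| = |Σ(T_2)| = 7 and the trees share 6 splits, so ρ = 7 + 7 − 2·6 = 2.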

Splits Metric
Note:
$$\rho(T_1, T_2) = |\Sigma(T_1) \setminus \Sigma(T_2)| + |\Sigma(T_2) \setminus \Sigma(T_1)| = |\Sigma(T_1) \,\Delta\, \Sigma(T_2)| \quad \text{(symmetric difference)}$$
Also called the Robinson-Foulds metric (1981).

Nearest Neighbor Interchange: A Greedy Algorithm
• A branch swapping algorithm
• Only evaluates a subset of all possible trees
• Defines a neighbor of a tree as one reachable by a nearest neighbor interchange (NNI):
– a rearrangement of the four subtrees defined by one internal edge
– only three different arrangements per edge: the original and two alternatives
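The Heuristic Search loop with this NNI neighborhood is a best-improvement hill climb. The Python sketch below is generic and its names are hypothetical: `neighbors` would return the NNI neighbors of a tree and `score` its parsimony score (for likelihood, minimize the negative log-likelihood). Here both are replaced by a toy one-dimensional landscape, a made-up example chosen to show the local-optimum caveat.

```python
from typing import Callable, Iterable, TypeVar

State = TypeVar("State")


def hill_climb(start: State, neighbors: Callable[[State], Iterable[State]],
               score: Callable[[State], float]) -> State:
    """Best-improvement local search that minimizes score (e.g. a parsimony score)."""
    current, current_score = start, score(start)
    while True:
        best, best_score, improved = current, current_score, False
        for candidate in neighbors(current):          # step 2: check all neighbors
            s = score(candidate)
            if s < best_score:
                best, best_score, improved = candidate, s, True
        if not improved:                              # no neighbor improves: local optimum
            return current
        current, current_score = best, best_score     # step 3: move to the best neighbor


# Toy landscape: positions 0..9 with a local minimum at 2 and the global minimum at 7.
scores = [5, 4, 3, 4, 5, 2, 1, 0, 1, 2]


def line_neighbors(i: int) -> Iterable[int]:
    return [j for j in (i - 1, i + 1) if 0 <= j < len(scores)]


print(hill_climb(0, line_neighbors, scores.__getitem__))  # stops at 2 (local optimum)
print(hill_climb(9, line_neighbors, scores.__getitem__))  # reaches 7 (global optimum)
```

Restarting from several starting trees is a common way to reduce, though not eliminate, the risk of stopping at a local optimum.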

Nearest Neighbor Interchange
Rearrange the four subtrees defined by one internal edge.
(Figure: Jones and Pevzner.)

B(n) := the set of (unrooted) binary phylogenetic trees with n leaves.
Theorem (Robinson 1971): For all T and T' in B(n), there is a sequence of NNIs that transforms T into T'.

Nearest Neighbor Interchange
$$\rho_{NNI}(T_1, T_2) = \min\{k : \text{there exist } \beta_1,\dots,\beta_k \text{ such that } \beta_k \circ \beta_{k-1} \circ \dots \circ \beta_1(T_1) = T_2\}$$
where each β_i is a single NNI.

Claim: ρ ≤ 2 ρ_NNI.
Proof: Every NNI can be obtained by deleting an edge and inserting an edge, so a sequence of k NNIs is a sequence of 2k edge operations.

Computing ρ_NNI for binary trees is NP-complete (Li and Zhang 1999).

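Nearest Neighbor Interchange
Claim: The number of NNI neighbors of an unrooted binary tree with n leaves is 2(n − 3).
Proof: (whiteboard). The tree has n − 3 internal edges, and each yields two new arrangements.

Neighboring Trees
(Figure: NNI neighborhood for trees with 5 leaves, with parsimony scores for the trees.)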
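An NNI can be sketched on an adjacency-dictionary tree (hypothetical representation and names): for each internal edge u-v, one subtree hanging off u is swapped with each of the two subtrees hanging off v. The example checks the 2(n − 3) count on a 6-leaf caterpillar tree, and that every neighbor differs from the original in exactly one split (so ρ = 2 for a single NNI, consistent with ρ ≤ 2ρ_NNI).

```python
from typing import Dict, FrozenSet, Iterable, List, Set, Tuple

Tree = Dict[str, Set[str]]  # unrooted binary tree as adjacency sets; leaves have degree 1


def tree_from_edges(edges: Iterable[Tuple[str, str]]) -> Tree:
    tree: Tree = {}
    for u, v in edges:
        tree.setdefault(u, set()).add(v)
        tree.setdefault(v, set()).add(u)
    return tree


def nni_neighbors(tree: Tree) -> List[Tree]:
    """Both NNI rearrangements across every internal edge u-v."""
    result = []
    for u in sorted(tree):
        for v in sorted(tree[u]):
            if u > v or len(tree[u]) != 3 or len(tree[v]) != 3:
                continue  # internal edges only, each visited once
            b = sorted(tree[u] - {v})[1]     # one of the two subtrees hanging off u
            for c in sorted(tree[v] - {u}):  # swap it with each subtree hanging off v
                new = {x: set(nbrs) for x, nbrs in tree.items()}
                new[u].remove(b)
                new[b].remove(u)
                new[v].remove(c)
                new[c].remove(v)
                new[u].add(c)
                new[c].add(u)
                new[v].add(b)
                new[b].add(v)
                result.append(new)
    return result


def splits(tree: Tree) -> Set[FrozenSet[str]]:
    """Sigma(T), each split A | B stored as the side without a fixed reference leaf."""
    leaves = frozenset(x for x, nbrs in tree.items() if len(nbrs) == 1)
    ref = min(leaves)
    result = set()
    for u in tree:
        for v in tree[u]:
            seen, stack, side = {u, v}, [v], set()
            while stack:  # collect the leaves on v's side of edge u-v
                x = stack.pop()
                if len(tree[x]) == 1:
                    side.add(x)
                for y in tree[x]:
                    if y not in seen:
                        seen.add(y)
                        stack.append(y)
            result.add(frozenset(leaves - side) if ref in side else frozenset(side))
    return result


caterpillar = tree_from_edges([
    ("l1", "i1"), ("l2", "i1"), ("i1", "i2"), ("l3", "i2"),
    ("i2", "i3"), ("l4", "i3"), ("i3", "i4"), ("l5", "i4"), ("l6", "i4"),
])
neighbors = nni_neighbors(caterpillar)
print(len(neighbors))  # 2 * (6 - 3) = 6
print({len(splits(caterpillar) ^ splits(nb)) for nb in neighbors})  # {2}
```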

Nearest Neighbor Interchange

Subtree Pruning and Regrafting (SPR)
1. Remove a branch.
2. Reconnect the pruned subtree, at the vertex incident to the removed branch, by subdividing a branch of the remaining tree.
(Figure: http://artedi.ebc.uu.se/course/BioInfo-10p-2001/Phylogeny/Phylogeny-TreeSearch/SPR.gif)
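A sketch of one way to enumerate SPR moves on an adjacency-dictionary tree (hypothetical representation and names). Removing branch u-v prunes the subtree containing v; the now degree-2 vertex u is suppressed, and u is then re-created by subdividing another branch of the remaining tree, with v reattached to it. The list may contain the same topology more than once, since different moves can lead to the same tree.

```python
from typing import Dict, Iterable, List, Set, Tuple

Tree = Dict[str, Set[str]]  # unrooted binary tree as adjacency sets; leaves have degree 1


def tree_from_edges(edges: Iterable[Tuple[str, str]]) -> Tree:
    tree: Tree = {}
    for u, v in edges:
        tree.setdefault(u, set()).add(v)
        tree.setdefault(v, set()).add(u)
    return tree


def edges_from(tree: Tree, start: str) -> List[Tuple[str, str]]:
    """All edges of the connected component containing start."""
    seen, stack, edges = {start}, [start], []
    while stack:
        x = stack.pop()
        for y in tree[x]:
            if y not in seen:
                seen.add(y)
                stack.append(y)
                edges.append((x, y))
    return edges


def spr_neighbors(tree: Tree) -> List[Tree]:
    """Prune the subtree on v's side of a branch u-v and regraft it elsewhere."""
    result = []
    for u in sorted(tree):
        if len(tree[u]) != 3:
            continue  # u must be internal so it can be suppressed and reused
        for v in sorted(tree[u]):
            base = {x: set(nbrs) for x, nbrs in tree.items()}
            # 1. Remove the branch u-v; the subtree containing v is pruned.
            base[u].remove(v)
            base[v].remove(u)
            p, q = sorted(base.pop(u))
            base[p].remove(u)
            base[q].remove(u)
            base[p].add(q)  # suppress the degree-2 vertex u
            base[q].add(p)
            # 2. Re-create u by subdividing a branch x-y of the remaining tree; attach v.
            for x, y in edges_from(base, p):
                if {x, y} == {p, q}:
                    continue  # subdividing p-q again would rebuild the original tree
                new = {w: set(nbrs) for w, nbrs in base.items()}
                new[x].remove(y)
                new[y].remove(x)
                new[u] = {x, y, v}
                new[x].add(u)
                new[y].add(u)
                new[v].add(u)
                result.append(new)
    return result


caterpillar = tree_from_edges([
    ("l1", "i1"), ("l2", "i1"), ("i1", "i2"), ("l3", "i2"),
    ("i2", "i3"), ("l4", "i3"), ("i3", "i4"), ("l5", "i4"), ("l6", "i4"),
])
print(len(spr_neighbors(caterpillar)))  # number of SPR moves (with duplicates)
```

Every NNI is also an SPR move (regrafting onto a branch next to the original attachment point), so this neighborhood contains the NNI neighborhood.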

Tree Bisection and Reconnection (TBR)
1. Remove a branch, splitting the tree into two subtrees.
2. Reconnect the subtrees by adding a new branch that subdivides a branch in each of them.
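TBR can be sketched in the same way (hypothetical representation and names). After removing branch u-v, both endpoints are suppressed; each is then re-created by subdividing a branch of its own subtree (a subtree consisting of a single leaf is reconnected at that leaf), and a new branch u-v joins them. As in the SPR sketch, duplicate topologies are not filtered out.

```python
from typing import Dict, FrozenSet, Iterable, List, Optional, Set, Tuple

Tree = Dict[str, Set[str]]        # unrooted binary tree as adjacency sets; leaves have degree 1
Edge = Optional[Tuple[str, str]]  # None stands for "connect directly to a lone leaf"


def tree_from_edges(edges: Iterable[Tuple[str, str]]) -> Tree:
    tree: Tree = {}
    for u, v in edges:
        tree.setdefault(u, set()).add(v)
        tree.setdefault(v, set()).add(u)
    return tree


def edges_from(tree: Tree, start: str) -> List[Tuple[str, str]]:
    """All edges of the connected component containing start."""
    seen, stack, edges = {start}, [start], []
    while stack:
        x = stack.pop()
        for y in tree[x]:
            if y not in seen:
                seen.add(y)
                stack.append(y)
                edges.append((x, y))
    return edges


def edge_key(e: Edge) -> Optional[FrozenSet[str]]:
    return None if e is None else frozenset(e)


def detach(tree: Tree, s: str) -> Tuple[List[Edge], Optional[FrozenSet[str]]]:
    """Suppress endpoint s; return its side's attachment options and the original one."""
    if not tree[s]:              # s is a leaf: its side is just s itself
        return [None], None
    a, b = sorted(tree.pop(s))   # s now has degree 2: join a and b directly
    tree[a].remove(s)
    tree[b].remove(s)
    tree[a].add(b)
    tree[b].add(a)
    return edges_from(tree, a), frozenset((a, b))


def attach(tree: Tree, s: str, edge: Edge) -> None:
    """Re-create s by subdividing edge (nothing to do for a lone leaf)."""
    if edge is None:
        return
    x, y = edge
    tree[x].remove(y)
    tree[y].remove(x)
    tree[s] = {x, y}
    tree[x].add(s)
    tree[y].add(s)


def tbr_neighbors(tree: Tree) -> List[Tree]:
    result = []
    for u in sorted(tree):
        for v in sorted(tree[u]):
            if u > v:
                continue                                # handle each branch once
            base = {x: set(nbrs) for x, nbrs in tree.items()}
            base[u].remove(v)                           # 1. bisect the tree
            base[v].remove(u)
            options_u, old_u = detach(base, u)
            options_v, old_v = detach(base, v)
            for eu in options_u:
                for ev in options_v:
                    if edge_key(eu) == old_u and edge_key(ev) == old_v:
                        continue                        # would rebuild the original tree
                    new = {x: set(nbrs) for x, nbrs in base.items()}
                    attach(new, u, eu)
                    attach(new, v, ev)
                    new[u].add(v)                       # 2. add the reconnecting branch
                    new[v].add(u)
                    result.append(new)
    return result


caterpillar = tree_from_edges([
    ("l1", "i1"), ("l2", "i1"), ("i1", "i2"), ("l3", "i2"),
    ("i2", "i3"), ("l4", "i3"), ("i3", "i4"), ("l5", "i4"), ("l6", "i4"),
])
print(len(tbr_neighbors(caterpillar)))  # number of TBR moves (with duplicates)
```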

