CSI5126. Algorithms in bioinformatics. Phylogeny. Marcel Turcotte

Small parsimony problem
Problem: Find the most parsimonious labelling of the internal vertices in a given evolutionary tree.
Input: A tree T with each leaf labelled by an m-character array.
Output: Labels (m-character arrays) for all the internal nodes such that $\sum_{(u,v)} d_H(u, v)$, taken over all the edges $(u, v)$, is minimum; $d_H$ is the Hamming distance.
[Figure: example tree with leaves AGC, AGT, CGT, ATT, ACC]

Observation
Notice that the characters are independent. The total number of changes is the sum of the number of changes for the first character, the second character, and the third character. Thus, it suffices to develop a method that works for a single character and to apply it to all the characters. Proposals?
[Figure: the example tree with internal nodes labelled AGT, AGT, AGT, ATT, leaves AGC, AGT, CGT, ATT, ACC, and edge costs 1, 1, 1, 2]
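
The independence of the characters can be checked numerically. The sketch below scores a fully labelled tree edge by edge with the Hamming distance, then site by site, and compares the two totals. The node labels come from the example; the topology encoded in `EDGES` and the helper name `hamming` are assumptions made for illustration, since the figure itself is not available.

```python
def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))

# Assumed topology (hypothetical): each edge is (parent label, child label).
EDGES = [
    ("AGT", "AGT"),  # root -> first internal node
    ("AGT", "AGC"),  # first internal node -> leaf
    ("AGT", "AGT"),  # first internal node -> leaf
    ("AGT", "AGT"),  # root -> second internal node
    ("AGT", "CGT"),  # second internal node -> leaf
    ("AGT", "ATT"),  # second internal node -> third internal node
    ("ATT", "ATT"),  # third internal node -> leaf
    ("ATT", "ACC"),  # third internal node -> leaf
]

# Score computed edge by edge, all characters at once.
by_edge = sum(hamming(parent, child) for parent, child in EDGES)

# Score computed one character (site) at a time.
by_site = [sum(parent[k] != child[k] for parent, child in EDGES) for k in range(3)]

print(by_edge, by_site, sum(by_site))
```

Both ways of counting give the same total, which is why a method for a single character is enough.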

Small parsimony problem
Let's define $s_c(u)$ as the minimum parsimony score of the subtree rooted at u, obtained when u is labelled with c. How to compute $s_c(u)$? What do you need to know? What are the dependencies?
[Figure: node u with children v and w]
$s_c(u) = \ldots$

Small parsimony problem
For instance, what would be the most parsimonious score if u was labelled with A?
[Figure: node u with children v and w]
From v: $s_A(v) + 0$, $s_C(v) + 1$, $s_G(v) + 1$, $s_T(v) + 1$
From w: $s_A(w) + 0$, $s_C(w) + 1$, $s_G(w) + 1$, $s_T(w) + 1$

Small parsimony problem
[Figure: node u with children v and w]
$s_A(u) = \min\{ s_A(v) + 0,\ s_C(v) + 1,\ s_G(v) + 1,\ s_T(v) + 1 \} + \min\{ s_A(w) + 0,\ s_C(w) + 1,\ s_G(w) + 1,\ s_T(w) + 1 \}$

Weighted small parsimony problem (Sankoff 1975)
[Figure: node u with children v and w]
$s_c(u) = \min_i \{ s_i(v) + \delta_{i,c} \} + \min_j \{ s_j(w) + \delta_{j,c} \}$
where $\delta$ is a $k \times k$ scoring matrix.

Examples of scoring matrices
     A     C     G     T
A    0     1     1     1
C    1     0     1     1
G    1     1     0     1
T    1     1     1     0

     A     C     G     T
A    0     1     0.33  1
C    1     0     1     0.33
G    0.33  1     0     1
T    1     0.33  1     0

Solving the small parsimony problem
General case.
$s_c(u) = \min_i \{ s_i(v) + \delta_{i,c} \} + \min_j \{ s_j(w) + \delta_{j,c} \}$
[Figure: node u with children v and w, each carrying a score table indexed by A, C, G, T]

Solving the small parsimony problem
Initialisation. For each leaf v, $s_c(v) = 0$ if character c is found at that node and $\infty$ otherwise.
[Figure: node u with children v and w; each leaf table holds 0 in the entry of its observed character]

Small parsimony problem
[Figure: tree with root u and internal nodes v, w, x over the leaves C, C, G, A, C; every node carries a score table indexed by A, C, G, T, and each leaf table holds 0 for its observed character]

Is the solution unique? How do you retrieve a solution?
$s_A(w) = 2 = \min\{\infty + 0, 0 + 1, \infty + 1, \infty + 1\} + \min\{\infty + 0, 0 + 1, \infty + 1, \infty + 1\}$
$s_C(w) = 0 = \min\{\infty + 1, 0 + 0, \infty + 1, \infty + 1\} + \min\{\infty + 1, 0 + 0, \infty + 1, \infty + 1\}$
$s_G(w) = 2 = \min\{\infty + 1, 0 + 1, \infty + 0, \infty + 1\} + \min\{\infty + 1, 0 + 1, \infty + 0, \infty + 1\}$
$s_T(w) = 2 = \min\{\infty + 1, 0 + 1, \infty + 1, \infty + 0\} + \min\{\infty + 1, 0 + 1, \infty + 1, \infty + 0\}$
$s_A(x) = 1 = \min\{0 + 0, \infty + 1, \infty + 1, \infty + 1\} + \min\{\infty + 0, 0 + 1, \infty + 1, \infty + 1\}$
$s_C(x) = 1 = \min\{0 + 1, \infty + 0, \infty + 1, \infty + 1\} + \min\{\infty + 1, 0 + 0, \infty + 1, \infty + 1\}$
$s_G(x) = 2 = \min\{0 + 1, \infty + 1, \infty + 0, \infty + 1\} + \min\{\infty + 1, 0 + 1, \infty + 0, \infty + 1\}$
$s_T(x) = 2 = \min\{0 + 1, \infty + 1, \infty + 1, \infty + 0\} + \min\{\infty + 1, 0 + 1, \infty + 1, \infty + 0\}$
$s_A(v) = 2 = \min\{2 + 0, 0 + 1, 2 + 1, 2 + 1\} + \min\{\infty + 0, \infty + 1, 0 + 1, \infty + 1\}$
$s_C(v) = 1 = \min\{2 + 1, 0 + 0, 2 + 1, 2 + 1\} + \min\{\infty + 1, \infty + 0, 0 + 1, \infty + 1\}$
$s_G(v) = 1 = \min\{2 + 1, 0 + 1, 2 + 0, 2 + 1\} + \min\{\infty + 1, \infty + 1, 0 + 0, \infty + 1\}$
$s_T(v) = 2 = \min\{2 + 1, 0 + 1, 2 + 1, 2 + 0\} + \min\{\infty + 1, \infty + 1, 0 + 1, \infty + 0\}$
$s_A(u) = 3 = \min\{2 + 0, 1 + 1, 1 + 1, 2 + 1\} + \min\{1 + 0, 1 + 1, 2 + 1, 2 + 1\}$
$s_C(u) = 2 = \min\{2 + 1, 1 + 0, 1 + 1, 2 + 1\} + \min\{1 + 1, 1 + 0, 2 + 1, 2 + 1\}$
$s_G(u) = 3 = \min\{2 + 1, 1 + 1, 1 + 0, 2 + 1\} + \min\{1 + 1, 1 + 1, 2 + 0, 2 + 1\}$
$s_T(u) = 4 = \min\{2 + 1, 1 + 1, 1 + 1, 2 + 0\} + \min\{1 + 1, 1 + 1, 2 + 1, 2 + 0\}$
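
One way to answer the two questions is to run the recurrence bottom-up and then walk back down from the best root label. The sketch below is a minimal version under stated assumptions: the nested-tuple tree encoding, the function names `sankoff` and `traceback`, and the unit-cost `delta` are illustrative choices, and the topology (u has children v and x, v has children w and the leaf G, w has leaves C and C, x has leaves A and C) is read off the worked calculations.

```python
import math

ALPHABET = "ACGT"

def unit_cost(i, c):
    """delta(i, c): 0 when the two states agree, 1 otherwise."""
    return 0 if i == c else 1

def sankoff(node, delta=unit_cost):
    """Return (score table, child results) for a node.
    A leaf is a one-letter string; an internal node is a tuple of children."""
    if isinstance(node, str):
        return {c: 0 if c == node else math.inf for c in ALPHABET}, []
    children = [sankoff(child, delta) for child in node]
    scores = {
        c: sum(min(s[i] + delta(i, c) for i in ALPHABET) for s, _ in children)
        for c in ALPHABET
    }
    return scores, children

def traceback(result, c, delta=unit_cost):
    """Recover one optimal labelling given that this node is labelled c.
    Ties are broken by alphabet order, so other optimal labellings may exist."""
    _, children = result
    if not children:
        return c
    picks = [min(ALPHABET, key=lambda i, s=s: s[i] + delta(i, c)) for s, _ in children]
    return (c, *[traceback(child, i, delta) for child, i in zip(children, picks)])

tree = ((("C", "C"), "G"), ("A", "C"))      # u = (v, x), v = (w, G)
result = sankoff(tree)
root_scores = result[0]
best_label = min(root_scores, key=root_scores.get)
print(root_scores, best_label, traceback(result, best_label))
```

The solution is not unique whenever a minimum is attained by more than one state, at the root or at any child during the traceback; this sketch reports only the first such choice.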

Small parsimony problem
[Figure: tree with root u and internal nodes v, w, x over the leaves G, A, C, A, C; every node carries a score table indexed by A, C, G, T, and each leaf table holds 0 for its observed character]

$s_A(w) = 1 = \min\{\infty + 0, \infty + 1, 0 + 1, \infty + 1\} + \min\{0 + 0, \infty + 1, \infty + 1, \infty + 1\}$
$s_C(w) = 2 = \min\{\infty + 1, \infty + 0, 0 + 1, \infty + 1\} + \min\{0 + 1, \infty + 0, \infty + 1, \infty + 1\}$
$s_G(w) = 1 = \min\{\infty + 1, \infty + 1, 0 + 0, \infty + 1\} + \min\{0 + 1, \infty + 1, \infty + 0, \infty + 1\}$
$s_T(w) = 2 = \min\{\infty + 1, \infty + 1, 0 + 1, \infty + 0\} + \min\{0 + 1, \infty + 1, \infty + 1, \infty + 0\}$
$s_A(x) = 1 = \min\{0 + 0, \infty + 1, \infty + 1, \infty + 1\} + \min\{\infty + 0, 0 + 1, \infty + 1, \infty + 1\}$
$s_C(x) = 1 = \min\{0 + 1, \infty + 0, \infty + 1, \infty + 1\} + \min\{\infty + 1, 0 + 0, \infty + 1, \infty + 1\}$
$s_G(x) = 2 = \min\{0 + 1, \infty + 1, \infty + 0, \infty + 1\} + \min\{\infty + 1, 0 + 1, \infty + 0, \infty + 1\}$
$s_T(x) = 2 = \min\{0 + 1, \infty + 1, \infty + 1, \infty + 0\} + \min\{\infty + 1, 0 + 1, \infty + 1, \infty + 0\}$
$s_A(v) = 2 = \min\{1 + 0, 2 + 1, 1 + 1, 2 + 1\} + \min\{\infty + 0, 0 + 1, \infty + 1, \infty + 1\}$
$s_C(v) = 2 = \min\{1 + 1, 2 + 0, 1 + 1, 2 + 1\} + \min\{\infty + 1, 0 + 0, \infty + 1, \infty + 1\}$
$s_G(v) = 2 = \min\{1 + 1, 2 + 1, 1 + 0, 2 + 1\} + \min\{\infty + 1, 0 + 1, \infty + 0, \infty + 1\}$
$s_T(v) = 3 = \min\{1 + 1, 2 + 1, 1 + 1, 2 + 0\} + \min\{\infty + 1, 0 + 1, \infty + 1, \infty + 0\}$
$s_A(u) = 3 = \min\{2 + 0, 2 + 1, 2 + 1, 3 + 1\} + \min\{1 + 0, 1 + 1, 2 + 1, 2 + 1\}$
$s_C(u) = 3 = \min\{2 + 1, 2 + 0, 2 + 1, 3 + 1\} + \min\{1 + 1, 1 + 0, 2 + 1, 2 + 1\}$
$s_G(u) = 4 = \min\{2 + 1, 2 + 1, 2 + 0, 3 + 1\} + \min\{1 + 1, 1 + 1, 2 + 0, 2 + 1\}$
$s_T(u) = 5 = \min\{2 + 1, 2 + 1, 2 + 1, 3 + 0\} + \min\{1 + 1, 1 + 1, 2 + 1, 2 + 0\}$

Large parsimony problem
Problem: Find a tree having the minimum parsimony score.
Input: An n × m matrix (alignment).
Output: A tree T with n leaves labelled by the n rows (m characters) of the input matrix. The internal nodes are labelled with arrays of m characters such that the overall parsimony score is minimum.
The problem is known to be NP-complete.

Exhaustive approach: 4 to 15 sequences
For a small number of species, say fewer than 15, it is possible to exhaustively enumerate all the trees, and for each tree calculate the minimum parsimony score. The tree that has the overall minimum parsimony score is reported.

# Species    # unrooted trees
4            3
5            15
6            105
7            945
8            10,395
9            135,135
10           2,027,025
11           34,459,425
12           654,729,075
13           13,749,310,575
14           316,234,143,225
15           7,905,853,580,625
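
The counts in the table follow the product 3 × 5 × ... × (2n − 5): an unrooted binary tree with k − 1 leaves has 2k − 5 edges, and the k-th species can be attached to any of them. The sketch below reproduces the table; the function name `count_unrooted_trees` is an illustrative choice.

```python
def count_unrooted_trees(n):
    """Number of unrooted binary trees on n labelled leaves: 3 * 5 * ... * (2n - 5)."""
    if n < 3:
        raise ValueError("need at least three species")
    count = 1
    for k in range(4, n + 1):
        # A tree with k - 1 leaves has 2(k - 1) - 3 = 2k - 5 edges,
        # each one a possible attachment point for species k.
        count *= 2 * k - 5
    return count

for n in range(4, 16):
    print(n, f"{count_unrooted_trees(n):,}")
```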

Sequential addition strategy
Given three species, there is a single unrooted tree.
[Figure: the unrooted tree on species A, B, C]
Each branch can serve as an insertion point, adding a new branch off the middle of any existing branch.
Therefore producing 3 four-species unrooted trees.
[Figure: the three unrooted trees on species A, B, C, D]
The same process is applied to all 3 four-species trees.
A four-species unrooted tree has 5 edges, thus leading to 5 new unrooted trees.
[Figure: the five unrooted trees obtained by adding species E to one four-species tree]
There will be 15 five-species unrooted trees.
[Figure: the full search tree, from the three-species tree through the 3 four-species trees to the 15 five-species trees]
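
The sequential addition strategy can be sketched as a short generator. The edge-list encoding (internal nodes as integers, leaves as species names) and the function names `add_species` and `all_unrooted_trees` are assumptions made for this illustration.

```python
def add_species(tree, leaf):
    """Yield every tree obtained by attaching `leaf` to the middle of one edge.
    A tree is a list of edges; internal nodes are ints, leaves are strings."""
    new_node = 1 + max(x for edge in tree for x in edge if isinstance(x, int))
    for i, (a, b) in enumerate(tree):
        # Split edge (a, b) with a new internal node and hang the leaf off it.
        yield tree[:i] + tree[i + 1:] + [(a, new_node), (new_node, b), (new_node, leaf)]

def all_unrooted_trees(species):
    """Enumerate all unrooted binary trees by adding one species at a time."""
    # With three species there is a single tree: a star around internal node 0.
    trees = [[(0, species[0]), (0, species[1]), (0, species[2])]]
    for leaf in species[3:]:
        trees = [t for tree in trees for t in add_species(tree, leaf)]
    return trees

for names in ("ABCD", "ABCDE", "ABCDEF"):
    print(len(names), "species:", len(all_unrooted_trees(names)), "trees")
```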

Branch and bound
Branch and bound is used to solve optimization problems. Herein, for simplicity, let's assume a minimization problem is to be solved.
Just like backtracking, branch and bound is a state space search algorithm.
Just like backtracking, non-promising nodes, and their descendants, are pruned from the search space.
For each node, the algorithm computes a bound. The bound generally consists of two terms: the cost for the partial solution up to that node, as well as a lower bound for the minimum cost of extending the solution (visiting the yet unseen states).
The descendants of a node are pruned (not visited) if the best solution that could be found in the sub-tree would be worse than the best solution found so far.

Branch and bound (continued)
No prescribed order to search the tree:
Queue: breadth-first search with branch and bound pruning;
Stack: depth-first search with branch and bound pruning;
Priority queue: best-first search with branch and bound pruning.
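
The effect of the container on the visiting order can be seen on a toy frontier. In this sketch the nodes are hypothetical (bound, name) pairs that are all inserted up front; a real search would also push the children of each visited node. The function name `visit_order` is an illustrative choice.

```python
import heapq
from collections import deque

def visit_order(nodes, strategy):
    """Return the order in which (bound, name) pairs leave the frontier."""
    if strategy == "queue":       # breadth-first: first in, first out
        frontier = deque(nodes)
        return [frontier.popleft() for _ in range(len(nodes))]
    if strategy == "stack":       # depth-first: last in, first out
        frontier = list(nodes)
        return [frontier.pop() for _ in range(len(nodes))]
    if strategy == "priority":    # best-first: smallest bound first
        frontier = list(nodes)
        heapq.heapify(frontier)
        return [heapq.heappop(frontier) for _ in range(len(nodes))]
    raise ValueError(f"unknown strategy: {strategy}")

nodes = [(7, "t1"), (5, "t2"), (6, "t3")]   # hypothetical partial trees
for strategy in ("queue", "stack", "priority"):
    print(strategy, visit_order(nodes, strategy))
```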

Branch and bound (version 1.0): 4 to 20 sequences
Let L, the minimum parsimony score so far, be infinity.
Create two empty lists, open and solutions.
Create an unrooted tree for three species and add it to open.
While open is not empty
    Remove the front element of the list and call it current.
    Foreach tree t created by a sequential addition to current do
        If the minimum parsimony score of t is larger than L then
            discard t
        Else if the minimum parsimony score of t is lower than L
            If t has n leaves:
                clear solutions
                add t to solutions
                set L to the minimum parsimony score of t
            Else add t to the rear of open
        Else (equals case)
            If t has n leaves: add t to solutions
            Else add t to the rear of open
solutions is the list of all the solutions; their score is L.
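
The pseudocode of version 1.0 can be turned into a small runnable sketch. Everything below is an assumption made for illustration: the edge-list tree encoding, the function names, the unit-cost scoring of each site (the small parsimony recurrence rooted at internal node 0), and the order in which species are added. The five sequences are the leaf labels of the earlier example.

```python
import math
from collections import deque

ALPHABET = "ACGT"

def parsimony_score(tree, seqs):
    """Minimum number of changes on an unrooted tree, summed over all sites.
    tree: list of edges; internal nodes are ints, leaves are keys of seqs."""
    adj = {}
    for a, b in tree:
        adj.setdefault(a, []).append(b)
        adj.setdefault(b, []).append(a)

    def table(node, parent, site):
        if node in seqs:  # leaf: 0 for the observed state, infinity otherwise
            return {c: 0 if c == seqs[node][site] else math.inf for c in ALPHABET}
        scores = dict.fromkeys(ALPHABET, 0)
        for child in adj[node]:
            if child != parent:
                s = table(child, node, site)
                for c in ALPHABET:
                    scores[c] += min(s[i] + (i != c) for i in ALPHABET)
        return scores

    n_sites = len(next(iter(seqs.values())))
    return sum(min(table(0, None, site).values()) for site in range(n_sites))

def add_species(tree, leaf):
    """Yield the trees obtained by attaching `leaf` to each edge of `tree`."""
    new_node = 1 + max(x for edge in tree for x in edge if isinstance(x, int))
    for i, (a, b) in enumerate(tree):
        yield tree[:i] + tree[i + 1:] + [(a, new_node), (new_node, b), (new_node, leaf)]

def branch_and_bound(seqs):
    names = list(seqs)
    n = len(names)
    start = [(0, names[0]), (0, names[1]), (0, names[2])]
    if n == 3:
        return parsimony_score(start, seqs), [start]
    best, solutions = math.inf, []        # L and solutions in the pseudocode
    open_list = deque([start])
    while open_list:
        current = open_list.popleft()
        k = (len(current) + 3) // 2       # a tree with k leaves has 2k - 3 edges
        for t in add_species(current, names[k]):
            score = parsimony_score(t, seqs)
            if score > best:
                continue                  # discard t
            if k + 1 < n:
                open_list.append(t)       # partial tree: keep extending it
            elif score < best:
                best, solutions = score, [t]
            else:
                solutions.append(t)       # equals case
    return best, solutions

seqs = {"a": "AGC", "b": "AGT", "c": "CGT", "d": "ATT", "e": "ACC"}
best, solutions = branch_and_bound(seqs)
print(best, len(solutions))
```

With a queue and an initial bound of infinity, pruning can only start once the first complete tree has been scored, which motivates the improvements in the later versions.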

[Figure: the search tree of partial trees generated by sequential addition, annotated with "Bound = L"]

Branch and bound (version 2.0)
How can you improve our algorithm?
Estimating the cost of extending a solution (adding the remaining n − k species to our tree, which already contains k sequences).
How? Each site (character) introducing new states (nucleotide) will increase the parsimony score.

Branch and bound (version 2.0)
[Table: species 1 to 6 (rows) by sites (characters) α, β, γ, δ, ϵ (columns), each cell holding a nucleotide]

Branch and bound (version 2.0)
Let L, the minimum parsimony score so far, be infinity.
Create two empty lists, open and solutions.
Create an unrooted tree for three species and add it to open.
While open is not empty
    Remove the front element of the list and call it current.
    Foreach tree t created by a sequential addition to current do
        Let Lt be the minimum parsimony score of t + extension cost
        If Lt is larger than L then
            discard t
        Else if Lt is lower than L
            If t has n leaves:
                clear solutions
                add t to solutions
                set L to the minimum parsimony score of t
            Else add t to the rear of open
        Else (equals case)
            If t has n leaves: add t to solutions
            Else add t to the rear of open
solutions is the list of all the solutions; their score is L.
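
One simple reading of the extension cost is to count, site by site, the states that occur only in species not yet placed in the tree, since each such state needs at least one additional change. The sketch below follows that reading; the function name `extension_cost` and the alignment are hypothetical, and tighter bounds exist.

```python
def extension_cost(seqs, placed):
    """Count, site by site, the states carried only by species not yet in the tree."""
    remaining = [name for name in seqs if name not in placed]
    n_sites = len(next(iter(seqs.values())))
    cost = 0
    for site in range(n_sites):
        seen = {seqs[name][site] for name in placed}
        new_states = {seqs[name][site] for name in remaining} - seen
        cost += len(new_states)
    return cost

# Hypothetical alignment: 5 species, 3 sites.
alignment = {"s1": "AAG", "s2": "AAG", "s3": "ACG", "s4": "TCG", "s5": "TGG"}

print(extension_cost(alignment, ["s1", "s2", "s3"]))         # T at site 1, G at site 2
print(extension_cost(alignment, ["s1", "s2", "s3", "s4"]))   # only G at site 2 is new
```

For a complete tree no species remain, so the extension cost is 0 and Lt reduces to the parsimony score of t.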

Branch and bound (version 3.0)
How can you improve our algorithm?
Generate a realistic bound, using neighbour-joining, at the start of the algorithm.

Branch and bound (version 3.0)
Generate an initial tree T (using the neighbour-joining method, for instance).
Compute L, the minimum parsimony score of T (lowest score so far).
Create two empty lists, open and solutions.
Create an unrooted tree for three species and add it to open.
While open is not empty
    Remove the front element of the list and call it current.
    Foreach tree t created by a sequential addition to current do
        Let Lt be the minimum parsimony score of t + extension cost
        If Lt is larger than L then
            discard t
        Else if Lt is lower than L
            If t has n leaves:
                clear solutions
                add t to solutions
                set L to the minimum parsimony score of t
            Else add t to the rear of open
        Else (equals case)
            If t has n leaves: add t to solutions
            Else add t to the rear of open
solutions is the list of all the solutions; their score is L.

Branch and bound
Other ideas to improve the algorithm:
Use a priority queue to store the partial solutions, thus always looking at the most promising solutions first.
Derive a tighter bound: compatibility; Zharkikh rules. See [4].

Greedy algorithm
1. Generate an initial topology (using neighbour-joining, for instance);
2. Apply nearest neighbour interchange (NNI) transformations to all the internal edges;
3. Select the minimum parsimony tree;
4. Go to step 2, stopping when no interchange lowers the score.

n > 20: Nearest-neighbour interchange (NNI)
[Figure: the three arrangements of the subtrees u, v, w, x around an internal branch]
Other heuristics include: subtree pruning and regrafting (SPR) or tree bisection and reconnection (TBR).

Searching the tree space
Nearest-neighbour interchanges (NNI). Given an internal branch and its four connected nodes, u, v, w, x, NNI generates two novel solutions: one by exchanging the positions of v and w, the other by exchanging the positions of v and x. Moves are very “local”.
Subtree pruning and regrafting (SPR). Disconnects a subtree and reconnects that subtree in one of the branches of the remaining tree. Wider search.
Tree bisection and reconnection (TBR). Remove one branch, thus creating two subtrees. Consider all possible trees that are created by connecting one branch of the first subtree to another branch of the second subtree.
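
The NNI move itself is small enough to write down directly. In this sketch the arrangement around the internal branch is encoded as a pair of pairs, ((u, v), (w, x)), where each of u, v, w, x may itself be a whole subtree; the encoding and the function name `nni_neighbours` are assumptions made for illustration.

```python
def nni_neighbours(u, v, w, x):
    """Given the arrangement ((u, v), (w, x)) around an internal branch,
    return the two alternative arrangements produced by NNI."""
    swap_v_w = ((u, w), (v, x))   # exchange the positions of v and w
    swap_v_x = ((u, x), (w, v))   # exchange the positions of v and x
    return [swap_v_w, swap_v_x]

# Leaves stand in for subtrees here; nested tuples would work the same way.
for neighbour in nni_neighbours("u", "v", "w", "x"):
    print(neighbour)
```

A greedy search would apply this move to every internal branch, score each resulting tree, and keep the best one.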

Discussion
What are the drawbacks of greedy approaches?
Finds a local optimum!

Remarks: distance-based vs character-based
Distance-based methods compute the pairwise sequence distances
1) directly,
2) in isolation,
3) before inferring the tree topology.
Instead, for character-based methods,
1) extant sequences are never compared directly,
2) the pairwise distances depend on the reconstructed ancestral sequences, and
3) this process (solving the small phylogeny problem) takes all the sequences into account.

Remarks
The particular methods that were presented are not modelling the base substitutions accurately.
Specifically, these methods are ignoring the fact that multiple substitutions (for a given site) are likely to occur in any given branch of the tree (time interval).

[Figure: two diagrams compared ("vs"), the first with node labels A, A, T and the second with node labels A, G, C, A, T, A, T]

III. Maximum likelihood methods
Informal discussion!

III. Maximum likelihood methods
$P(D \mid \Theta)$ denotes the probability of the data given some model $\Theta$ (set of parameters, such as tree topology, branch length, evolutionary model…).
Let $L(\Theta) = P(D \mid \Theta)$ be the likelihood function. The maximum likelihood estimate is the value of $\Theta$ that maximizes $L(\Theta)$.

. Maximum likelihood . . . . . . . . . Preamble Character-based Preamble . Character-based Maximum likelihood III. Maximum likelihood methods is defjned as the probability of the data (generally sequences) for a given tree (topology, branch length, A maximum likelihood approach fjnds a tree, amongst tree explains best the data. [ See Felsenstein 2004, pages 251–253. ] Assumption s that are generally made: 1. Sites are independent 2. Lineages are independent Marcel Turcotte . . . . . . . . . . . . . . . . . . . . . . . . . . . . . CSI5126 . Algorithms in bioinformatics Let L (Θ) be the likelihood of a phylogenetic tree. L (Θ) evolutionary model), P ( observed sequences | tree ) . all possible trees, with the largest value of L (Θ) . Such

Probability of a tree
[Figure: tree with root i at time t_2, internal node j at time t_1, and leaves A, A, T]
$\sum_i \sum_j p_i \, q_{iA}(t_2) \, q_{ij}(t_2 - t_1) \, q_{jA}(t_1) \, q_{jT}(t_1)$

Probability of a tree (cont.)
[Figure: tree with root i at time t_2, internal node j at time t_1, and leaves k, l, m]
$\sum_i \sum_j p_i \, q_{ik}(t_2) \, q_{ij}(t_2 - t_1) \, q_{jl}(t_1) \, q_{jm}(t_1)$
where k, l, m are the nucleotide types found at the given sequence position in the 3 organisms under study.
Assuming that the positions (sites) are independent of one another (are evolving independently), the probability of the tree would be the product over all site probabilities.
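The double sum and the product over sites can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names (`q_jc69`, `site_likelihood`, `log_likelihood`) are hypothetical, the root frequencies p_i default to uniform, and the Jukes-Cantor formula is swapped in for q_ij(t) simply because it is the easiest model to write down. Logarithms are summed instead of multiplying probabilities, a common way to avoid numerical underflow.

```python
import math

NUCLEOTIDES = "ACGT"


def q_jc69(i, j, t, alpha=1.0):
    """Assumed substitution model: Jukes-Cantor probability of j given ancestor i after time t."""
    e = math.exp(-4.0 * alpha * t)
    return 0.25 + 0.75 * e if i == j else 0.25 - 0.25 * e


def site_likelihood(k, l, m, t1, t2, q=q_jc69, p=None):
    """Sum over root state i and internal state j for one site.

    Leaf k hangs directly from the root (branch length t2); leaves l and m
    hang from the internal node j (branch length t1 each).
    """
    if p is None:
        p = {n: 0.25 for n in NUCLEOTIDES}  # assumption: uniform root frequencies
    total = 0.0
    for i in NUCLEOTIDES:
        for j in NUCLEOTIDES:
            total += (p[i] * q(i, k, t2) * q(i, j, t2 - t1)
                      * q(j, l, t1) * q(j, m, t1))
    return total


def log_likelihood(seq_k, seq_l, seq_m, t1, t2, q=q_jc69):
    """Sites are assumed independent, so per-site log-probabilities add up."""
    return sum(math.log(site_likelihood(k, l, m, t1, t2, q))
               for k, l, m in zip(seq_k, seq_l, seq_m))
```

For instance, `site_likelihood("A", "A", "T", 0.1, 0.3)` evaluates the sum for the A, A, T site, and `log_likelihood("AGC", "AGT", "CGT", 0.1, 0.3)` scores three aligned sequences on the same tree; the branch times here are arbitrary illustrative values.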

Probability of a tree (cont.)
The $q_{ij}(t)$ terms give the probability of finding the nucleotide type j at a given site knowing that its ancestor had the nucleotide type i at the same position at time t (earlier).
Examples of substitution schemes modeling multiple substitutions for a given time interval include Jukes-Cantor one-parameter model and Kimura's two-parameter model.
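As a rough illustration of what the q_ij(t) terms look like for the two models just named, here is a sketch of the standard closed-form solutions. The parameterization is an assumption: `alpha` and `beta` are per-target substitution rates (in the one-parameter model every change has rate `alpha`; in the two-parameter model `alpha` is the transition rate and `beta` the transversion rate), and the function names are hypothetical.

```python
import math

NUCLEOTIDES = "ACGT"
PURINES = {"A", "G"}  # the pyrimidines are C and T


def q_jc69(i, j, t, alpha=1.0):
    """Jukes-Cantor: one rate alpha for every substitution."""
    e = math.exp(-4.0 * alpha * t)
    return 0.25 + 0.75 * e if i == j else 0.25 - 0.25 * e


def q_k80(i, j, t, alpha=2.0, beta=1.0):
    """Kimura two-parameter: transition rate alpha, transversion rate beta."""
    e1 = math.exp(-4.0 * beta * t)
    e2 = math.exp(-2.0 * (alpha + beta) * t)
    if i == j:
        return 0.25 + 0.25 * e1 + 0.5 * e2
    if (i in PURINES) == (j in PURINES):
        # Transition: purine <-> purine or pyrimidine <-> pyrimidine.
        return 0.25 + 0.25 * e1 - 0.5 * e2
    # Transversion: purine <-> pyrimidine.
    return 0.25 - 0.25 * e1
```

Both functions return a proper probability distribution over j for any fixed i and t, and `q_k80` reduces to `q_jc69` when `alpha == beta`.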

Probability of a tree: model of evolution
[Figure: substitution diagram over the nucleotides C, G, T, A. Transition rate: blue; transversion rate: red.]

Probability of a tree: model of evolution
JC69: Jukes and Cantor 1969; bases are equiprobable; transition rate = transversion rate
K80: Kimura 1980; bases are equiprobable; transition rate ≠ transversion rate
F81: Felsenstein 1981; variable base composition; transition rate = transversion rate
HKY85: Hasegawa et al. 1985; variable base composition; transition rate ≠ transversion rate; variable transition and transversion rates
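One way to see how the four models relate is to write them as special cases of a single instantaneous rate matrix. The sketch below assumes a common HKY85 parameterization, in which the rate from i to j is the frequency of j, multiplied by a ratio `kappa` when the change is a transition. The function names and the unscaled rates are illustrative choices, not taken from the slides.

```python
NUCLEOTIDES = "ACGT"
PURINES = {"A", "G"}
UNIFORM = {n: 0.25 for n in NUCLEOTIDES}


def hky85_rate_matrix(pi, kappa):
    """Build Q as a dict of dicts: Q[i][j] is the rate of change from i to j."""
    Q = {}
    for i in NUCLEOTIDES:
        row = {}
        for j in NUCLEOTIDES:
            if i != j:
                is_transition = (i in PURINES) == (j in PURINES)
                row[j] = (kappa if is_transition else 1.0) * pi[j]
        # Diagonal entry makes each row sum to zero.
        row[i] = -sum(row.values())
        Q[i] = row
    return Q


def jc69_rate_matrix():
    """Equal base frequencies, transition rate = transversion rate."""
    return hky85_rate_matrix(UNIFORM, 1.0)


def k80_rate_matrix(kappa):
    """Equal base frequencies, transition rate != transversion rate."""
    return hky85_rate_matrix(UNIFORM, kappa)


def f81_rate_matrix(pi):
    """Variable base frequencies, transition rate = transversion rate."""
    return hky85_rate_matrix(pi, 1.0)
```

Under this parameterization JC69, K80, and F81 are obtained from HKY85 by fixing the base frequencies, the ratio `kappa`, or both.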

