2/4/09
CSCI1950-Z Computational Methods for Biology
Lecture 4
Ben Raphael
February 2, 2009
http://cs.brown.edu/courses/csci1950-z/

Algorithm Summary

Category        Method                      Input               Output
Parsimony       Sankoff's & Fitch's Alg.    Characters, T       A, B
                Perfect Phylogeny           Characters          A, B, T
Probabilistic   Felsenstein                 Characters, T, B    A

T = tree topology, B = branch lengths, A = ancestral states

Pairwise Compatibility Test (Wilson 1965)
Binary characters i and j are pairwise compatible if and only if:
j is homogeneous with respect to $i_0$ or $i_1$ (j takes a single state on all species in $i_0$, or on all species in $i_1$).
Equivalently (with 0 as the ancestral state): $i_1$ and $j_1$ are disjoint or one contains the other.
Equivalently: the four rows (0,0), (0,1), (1,0), (1,1) do not all occur.

Species   i   j   k
A         0   0   0     (i_0)
B         0   0   0     (i_0)
C         0   1   1     (i_0)
D         1   0   1     (i_1)
E         1   0   0     (i_1)

Here i and j are compatible (no species has (1,1)) and j and k are compatible (no (1,0)), but i and k are not: all four rows occur.

Pairwise Compatibility Theorem (Estabrook et al. 1976)
A set S of binary characters is mutually compatible if and only if all pairs c and c' of characters in S are pairwise compatible.
Pairwise compatibility ⇔ mutual compatibility.
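A minimal Python sketch of the test, not taken from the lecture (function names are illustrative). It uses the "not all four rows occur" formulation on the i, j, k example above, and checks mutual compatibility pair by pair, which is what the Pairwise Compatibility Theorem allows.

```python
from itertools import combinations

# Example from the slide: character -> states of species A, B, C, D, E.
MATRIX = {
    "i": [0, 0, 0, 1, 1],
    "j": [0, 0, 1, 0, 0],
    "k": [0, 0, 1, 1, 0],
}


def pairwise_compatible(col_a, col_b):
    """Two binary characters are compatible iff the state pairs
    (0,0), (0,1), (1,0), (1,1) do not all occur among the species."""
    observed = set(zip(col_a, col_b))
    return len(observed) < 4


def mutually_compatible(matrix):
    """Pairwise Compatibility Theorem: checking every pair suffices."""
    return all(
        pairwise_compatible(matrix[a], matrix[b])
        for a, b in combinations(matrix, 2)
    )


if __name__ == "__main__":
    for a, b in combinations(MATRIX, 2):
        print(a, b, pairwise_compatible(MATRIX[a], MATRIX[b]))
    print("mutually compatible:", mutually_compatible(MATRIX))
```

On this matrix the pairs (i, j) and (j, k) pass and (i, k) fails, so the whole set is not mutually compatible.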

Perfect Phylogeny
A set of mutually compatible binary characters gives a perfect phylogeny:

          traits
Species   1  2  3  4  5
A         1  1  0  0  0
B         0  0  1  0  0
C         1  1  0  1  0
D         0  0  1  0  1
E         1  0  0  0  0

1. Evolutionary model
 – Binary characters {0,1}
 – Each character changes state only once in evolutionary history (no homoplasy!).
2. Tree in which every mutation is on an edge of the tree.
 – All the species in one subtree contain a 0, and all species in the other contain a 1.
 – For simplicity, assume root = (0, 0, 0, 0, 0).
Last time: algorithm to reconstruct a tree.

Trees and Splits
• Given a set X, a split is a partition of X into two non-empty subsets A and B, written X = A | B.
• For a phylogenetic tree T with leaves L, each edge e defines a split $L_e = A \mid B$, where A and B are the leaves in the two subtrees obtained by removing e.
In a perfect phylogeny, the edge where binary character i changes state gives the split $i_0 \mid i_1$. We will return to splits in a future lecture.
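As an illustration (a sketch, not the lecture's reconstruction algorithm), the Python below derives the split $i_0 \mid i_1$ induced by each trait of the example matrix. Assuming, as on the slide, that the root is all 0s, each 1-set must be a clade, so any two 1-sets must be disjoint or nested; `is_laminar` (a hypothetical helper name) checks exactly that.

```python
from itertools import combinations

# Perfect phylogeny example from the slide: species -> traits 1..5.
MATRIX = {
    "A": (1, 1, 0, 0, 0),
    "B": (0, 0, 1, 0, 0),
    "C": (1, 1, 0, 1, 0),
    "D": (0, 0, 1, 0, 1),
    "E": (1, 0, 0, 0, 0),
}


def character_splits(matrix):
    """For each character c, return the split (c_0, c_1) of the species."""
    species = set(matrix)
    n_chars = len(next(iter(matrix.values())))
    splits = []
    for c in range(n_chars):
        ones = frozenset(s for s in species if matrix[s][c] == 1)
        splits.append((frozenset(species - ones), ones))
    return splits


def is_laminar(sets):
    """Any two sets are disjoint or one contains the other."""
    return all(not (a & b) or a <= b or b <= a for a, b in combinations(sets, 2))


if __name__ == "__main__":
    splits = character_splits(MATRIX)
    for c, (zeros, ones) in enumerate(splits, start=1):
        print(f"trait {c}: {sorted(zeros)} | {sorted(ones)}")
    print("perfect phylogeny (root all 0s):", is_laminar([o for _, o in splits]))
```

Here the 1-sets are {A, C, E} ⊃ {A, C} ⊃ {C} and {B, D} ⊃ {D}, so the check succeeds.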

Splits Equivalence Theorem
A phylogenetic tree T defines a collection of splits $\Sigma(T) = \{L_e \mid e \text{ is an edge in } T\}$.
Splits $A_1 \mid B_1$ and $A_2 \mid B_2$ are pairwise compatible if at least one of $A_1 \cap A_2$, $A_1 \cap B_2$, $B_1 \cap A_2$, and $B_1 \cap B_2$ is the empty set.
Splits Equivalence Theorem: Let Σ be a collection of splits. There is a phylogenetic tree T such that Σ(T) = Σ if and only if the splits in Σ are pairwise compatible.
The Pairwise Compatibility Theorem (for binary characters) follows from this theorem.

Outline
Distance-based methods for phylogenetic tree reconstruction.
• Review of distances/metrics.
• Tree distances and additive distances
 – Small and large phylogeny problems.
• Non-additive distances and clustering
 – UPGMA and ultrametric distances.
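The split compatibility test translates directly into code. This Python sketch (names are illustrative) represents a split as a pair of frozensets and applies the "one of the four intersections is empty" rule to every pair, which by the Splits Equivalence Theorem decides whether some tree realizes the whole collection.

```python
from itertools import combinations


def splits_compatible(split1, split2):
    """A1|B1 and A2|B2 are compatible iff some pairwise intersection is empty."""
    a1, b1 = split1
    a2, b2 = split2
    return any(not (p & q) for p in (a1, b1) for q in (a2, b2))


def all_pairwise_compatible(splits):
    """True iff a tree T with Sigma(T) = splits exists (Splits Equivalence)."""
    return all(splits_compatible(s, t) for s, t in combinations(splits, 2))


def split(side, universe):
    """Hypothetical helper: build the split side | (universe - side)."""
    side = frozenset(side)
    return (side, frozenset(universe) - side)


if __name__ == "__main__":
    X = "ABCDE"
    sigma = [split("AC", X), split("ACE", X), split("BD", X)]
    print(all_pairwise_compatible(sigma))                     # True
    print(splits_compatible(split("AB", X), split("BC", X)))  # False
```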

Distances
A distance on a set X is a function $d: X \times X \to \mathbb{R}$ satisfying:
• $d(x, y) \ge 0$, with equality iff $x = y$.
• For all $x, y \in X$, $d(x, y) = d(y, x)$ [symmetry].
• For all $x, y, z \in X$, $d(x, z) \le d(x, y) + d(y, z)$ [triangle inequality].
Examples:
• X = real numbers: $d(x, y) = |x - y|$ is a distance.
• X = strings of equal length over some alphabet: $d_H(s, t)$ = number of positions where s and t differ is called the Hamming distance.

Distances in Biological Data
• String distances (e.g., Hamming distance, edit distance) on DNA/protein sequence data.
• Substitution models (Jukes-Cantor, Kimura, etc.): scores for particular changes A → T, C → G, etc.

Rat:        ACAGTCACGCCCCACACGT
Mouse:      ACAGTGACGCCACACACGT
Gorilla:    CCTGTGACGTAACAAACGA
Chimpanzee: CCTGTGAGGTAGCAAACGA
Human:      CCTGTGAGGTAGCACACGA
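A small Python sketch of the Hamming distance on the five sequences above, together with a brute-force check of the three distance axioms on a finite set of points. The helper names are illustrative; `is_distance` assumes the points passed in are distinct.

```python
from itertools import permutations

# Sequences from the slide (all of length 19).
SEQS = {
    "Rat": "ACAGTCACGCCCCACACGT",
    "Mouse": "ACAGTGACGCCACACACGT",
    "Gorilla": "CCTGTGACGTAACAAACGA",
    "Chimpanzee": "CCTGTGAGGTAGCAAACGA",
    "Human": "CCTGTGAGGTAGCACACGA",
}


def hamming(s, t):
    """Number of positions at which two equal-length strings differ."""
    if len(s) != len(t):
        raise ValueError("Hamming distance needs equal-length strings")
    return sum(a != b for a, b in zip(s, t))


def is_distance(points, d):
    """Check the distance axioms on a finite set of distinct points."""
    if any(d(x, x) != 0 for x in points):
        return False
    for x, y in permutations(points, 2):
        if d(x, y) <= 0 or d(x, y) != d(y, x):
            return False
    for x, y, z in permutations(points, 3):
        if d(x, z) > d(x, y) + d(y, z):  # triangle inequality
            return False
    return True


if __name__ == "__main__":
    print(hamming(SEQS["Rat"], SEQS["Mouse"]))       # 2
    print(is_distance(list(SEQS.values()), hamming))  # True
```

Squared difference on real numbers is a useful counterexample: it fails the triangle inequality, since $(0-3)^2 = 9 > (0-1)^2 + (1-3)^2 = 5$.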

Distance Matrix
• For n species, form the n × n distance matrix $D_{ij}$.
• Example: $D_{ij}$ = edit distance between a gene in species i and species j.

Mouse:      ACAGTGACGCCACACACGT
Gorilla:    CCTGCGACGTAACAAACGC
Chimpanzee: CCTGCCAGTTAGCAAACGC
Human:      CCTGCCAGTTAGCACACGA

             Mouse  Gorilla  Chimpanzee  Human
Mouse          0       7        11        10
Gorilla        7       0         4         6
Chimpanzee    11       4         0         2
Human         10       6         2         0

Alignment vs. Distance Matrix
Sequencing a gene of length m in n species gives an n × m alignment matrix, which can be transformed into an n × n distance matrix. The reverse transformation is not possible, due to loss of information.
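The alignment-to-matrix transformation can be sketched in a few lines of Python. The slide calls the entries edit distances; for these four aligned rows the listed values coincide with column-by-column mismatch counts (Hamming distance), so this sketch uses Hamming distance as a stand-in. A true edit distance would need a dynamic-programming routine, which is not shown here.

```python
# Aligned gene sequences from the slide (n = 4 species, m = 19 columns).
NAMES = ["Mouse", "Gorilla", "Chimpanzee", "Human"]
ALIGNMENT = [
    "ACAGTGACGCCACACACGT",
    "CCTGCGACGTAACAAACGC",
    "CCTGCCAGTTAGCAAACGC",
    "CCTGCCAGTTAGCACACGA",
]


def hamming(s, t):
    """Count mismatched columns between two aligned rows."""
    return sum(a != b for a, b in zip(s, t))


def distance_matrix(rows, dist=hamming):
    """Turn an n x m alignment into an n x n distance matrix."""
    n = len(rows)
    return [[dist(rows[i], rows[j]) for j in range(n)] for i in range(n)]


if __name__ == "__main__":
    D = distance_matrix(ALIGNMENT)
    for name, row in zip(NAMES, D):
        print(f"{name:<11}", *row)
```

Only the n × n summary survives this step, which is why the original columns cannot be recovered from D.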

Distances in Trees
Given a tree T with a positive weight w(e) on each edge, we define the tree distance $d_T$ on the set L of leaves by:
$d_T(i, j)$ = sum of the weights of the edges on the unique path from i to j.
In evolutionary biology, the weights are sometimes called branch lengths.

Distance in Trees: an Example
$d_T(1, 4) = 12 + 13 + 14 + 17 + 13 = 69$
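The definition of $d_T$ can be sketched as a path sum in Python. Since the example figure is not reproduced, the sketch instead uses the weighted tree from this lecture's additive-distance example (leaves v, w, x, y, z; internal nodes a, b, c; edges v–a 6, w–a 4, a–c 3, c–z 7, c–b 3, b–x 5, b–y 4). Function names are illustrative.

```python
from collections import defaultdict

# Weighted tree from the additive-distance example: (node, node, weight).
EDGES = [
    ("v", "a", 6), ("w", "a", 4), ("a", "c", 3),
    ("c", "z", 7), ("c", "b", 3), ("b", "x", 5), ("b", "y", 4),
]


def build_tree(edges):
    """Adjacency list: node -> list of (neighbor, weight)."""
    adj = defaultdict(list)
    for p, q, weight in edges:
        adj[p].append((q, weight))
        adj[q].append((p, weight))
    return adj


def tree_distance(adj, i, j):
    """Sum of edge weights on the unique path from i to j (iterative DFS).
    Tracking only the parent is enough because a tree has no cycles."""
    stack = [(i, None, 0)]
    while stack:
        node, parent, dist = stack.pop()
        if node == j:
            return dist
        for nbr, weight in adj[node]:
            if nbr != parent:
                stack.append((nbr, node, dist + weight))
    raise ValueError(f"{j!r} is not reachable from {i!r}")


if __name__ == "__main__":
    adj = build_tree(EDGES)
    leaves = ["v", "w", "x", "y", "z"]
    for p in leaves:
        print(p, [tree_distance(adj, p, q) for q in leaves])
```

The printed leaf-to-leaf distances reproduce the 5 × 5 matrix D used in the additive-distance example, so that D is additive.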

Distance vs. Tree Distance
• n × n distance matrix for n species.
• Note that $d_T(i, j)$, the tree distance between i and j, is not necessarily equal to $D_{ij}$ as given by the distance matrix.

Rat:        ACAGTGACGCCCCAAACGT
Mouse:      ACAGTGACGCTACAAACGT
Gorilla:    CCTGTGACGTAACAAACGA
Chimpanzee: CCTGTGACGTAGCAAACGA
Human:      CCTGTGACGTAGCAAACGA

Fitting a Distance Matrix
• Given n species, we can compute the n × n distance matrix $D_{ij}$.
• The evolution of these species is described by a tree that we don't know.
• We need an algorithm to construct a tree that best fits the distance matrix $D_{ij}$.
Find a tree T such that $D_{ij} = d_T(i, j)$, where $D_{ij}$ is the (known) distance between species i and j, and $d_T(i, j)$ is the length of the path between them in the (unknown) tree T.

Distance-Based Phylogeny Problem
Goal: Reconstruct an evolutionary tree from a distance matrix.
Input: n × n distance matrix $D_{ij}$
Output: weighted tree T with n leaves fitting D
The unknown topology of the tree makes evolutionary tree reconstruction hard! The number of unrooted binary trees with n leaves is
$T(n) = (2n-5)!! = \frac{(2n-4)!}{(n-2)!\,2^{n-2}}$,
so for n = 24, $T(n) \approx 5.64 \times 10^{26}$.
If D is additive, this problem has a solution and there is a simple algorithm to solve it.

Distance-based vs. character-based
Key difference: distance-based methods do not reconstruct ancestral states.

     A  B  C  D
A    0  1  2  2
B    1  0  1  1
C    2  1  0  0
D    2  1  0  0

Note that C and D are identical.
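To make the growth concrete, here is a small Python sketch (function names are illustrative) that evaluates the count both as a ratio of factorials and as the odd double factorial $1 \cdot 3 \cdot 5 \cdots (2n-5)$, using exact integer arithmetic.

```python
from math import factorial


def num_unrooted_binary_trees(n):
    """(2n-4)! / ((n-2)! * 2**(n-2)) unrooted binary trees on n labeled leaves."""
    if n < 3:
        raise ValueError("need at least 3 leaves")
    return factorial(2 * n - 4) // (factorial(n - 2) * 2 ** (n - 2))


def odd_double_factorial_count(n):
    """The same count as the product 1 * 3 * 5 * ... * (2n - 5)."""
    prod = 1
    for k in range(1, 2 * n - 4, 2):
        prod *= k
    return prod


if __name__ == "__main__":
    for n in (3, 4, 5, 10):
        print(n, num_unrooted_binary_trees(n))
    print(f"{num_unrooted_binary_trees(24):.2e}")  # 5.64e+26
```

Each extra leaf can be attached to any of the 2n − 5 edges of a tree on n − 1 leaves, which is where the odd-number product comes from.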

Reconstructing a 3-Leaved Tree
• Tree reconstruction for a 3 × 3 matrix is straightforward.
• We have 3 leaves i, j, k and a center vertex c.
Observe:
$d_{ic} + d_{jc} = D_{ij}$
$d_{ic} + d_{kc} = D_{ik}$
$d_{jc} + d_{kc} = D_{jk}$

Reconstructing a 3-Leaved Tree (cont'd)
Adding the first two equations:
$2d_{ic} + d_{jc} + d_{kc} = D_{ij} + D_{ik}$
Substituting $d_{jc} + d_{kc} = D_{jk}$:
$2d_{ic} + D_{jk} = D_{ij} + D_{ik}$
$d_{ic} = (D_{ij} + D_{ik} - D_{jk})/2$
Similarly,
$d_{jc} = (D_{ij} + D_{jk} - D_{ik})/2$
$d_{kc} = (D_{ki} + D_{kj} - D_{ij})/2$
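The three closed-form branch lengths translate directly into Python. In this sketch (the function name is illustrative), the example input uses leaves v, w, x of this lecture's additive matrix.

```python
def three_leaf_branch_lengths(d_ij, d_ik, d_jk):
    """Branch lengths (d_ic, d_jc, d_kc) from leaves i, j, k to the center c."""
    d_ic = (d_ij + d_ik - d_jk) / 2
    d_jc = (d_ij + d_jk - d_ik) / 2
    d_kc = (d_ik + d_jk - d_ij) / 2
    return d_ic, d_jc, d_kc


if __name__ == "__main__":
    # Leaves v, w, x of the additive example: D_vw = 10, D_vx = 17, D_wx = 15.
    print(three_leaf_branch_lengths(10, 17, 15))  # (6.0, 4.0, 11.0)
```

If D satisfies the triangle inequality, none of the three lengths is negative, so any 3 × 3 distance matrix fits a 3-leaved tree.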

Trees with > 3 Leaves
• A binary tree with n leaves has 2n − 3 edges.
• Fitting a given tree to a distance matrix D requires solving a system of n(n − 1)/2 equations in 2n − 3 variables.
• A solution is not always possible for n > 3.

Additive Distance Matrices
Matrix D is ADDITIVE if there exists a tree T with $d_{ij}(T) = D_{ij}$, and NON-ADDITIVE otherwise.
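One standard way to test additivity, which is not covered on this slide, is the four-point condition: a metric D is a tree metric (allowing zero-length edges) iff for every four points, the two largest of $D_{ij} + D_{kl}$, $D_{ik} + D_{jl}$, $D_{il} + D_{jk}$ are equal. A minimal Python sketch, assuming D is already a symmetric metric given as a list of lists:

```python
from itertools import combinations


def four_point_ok(D, i, j, k, l, tol=1e-9):
    """Among the three pairings of {i, j, k, l}, the two largest sums tie."""
    sums = sorted([D[i][j] + D[k][l], D[i][k] + D[j][l], D[i][l] + D[j][k]])
    return abs(sums[2] - sums[1]) <= tol


def is_additive(D, tol=1e-9):
    """Four-point condition over all quadruples of distinct indices."""
    return all(
        four_point_ok(D, *quad, tol=tol) for quad in combinations(range(len(D)), 4)
    )


if __name__ == "__main__":
    # The 5 x 5 matrix from the additive-distance example (v, w, x, y, z).
    D = [[0, 10, 17, 16, 16], [10, 0, 15, 14, 14], [17, 15, 0, 9, 15],
         [16, 14, 9, 0, 14], [16, 14, 15, 14, 0]]
    # Shortest-path distances on a 4-cycle: a metric that no tree realizes.
    C4 = [[0, 1, 2, 1], [1, 0, 1, 2], [2, 1, 0, 1], [1, 2, 1, 0]]
    print(is_additive(D), is_additive(C4))  # True False
```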

Additive Distance Phylogeny
Small Additive Distance Phylogeny: Given a phylogenetic tree T and a distance matrix D, determine branch lengths such that $d_T(i, j) = D_{ij}$.
Large Additive Distance Phylogeny: Given a distance matrix D, find T and branch lengths such that $d_T(i, j) = D_{ij}$.
Both of these problems can be solved efficiently.

Reconstructing Additive Distances
Given T and D:

     v   w   x   y   z
v    0  10  17  16  16
w        0  15  14  14
x            0   9  15
y                0  14
z                    0

[Figure: tree T on leaves v, w, x, y, z with edge lengths 6, 4, 3, 3, 7, 5, 4.]
If we know T and D, but do not know the length of each edge, we can reconstruct those lengths.

Reconstructing Additive Distances
Given T and the same matrix D as above, but not the edge lengths of T:
[Figure: tree T on leaves v, w, x, y, z; edge lengths unknown.]

Reconstructing Additive Distances (cont'd)
Find neighbors v, w in T (common parent a) and replace them by a:
$d_{ax} = \tfrac{1}{2}(d_{vx} + d_{wx} - d_{vw})$
$d_{ay} = \tfrac{1}{2}(d_{vy} + d_{wy} - d_{vw})$
$d_{az} = \tfrac{1}{2}(d_{vz} + d_{wz} - d_{vw})$

D1:
     a   x   y   z
a    0  11  10  10
x        0   9  15
y            0  14
z                0
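The single compression step can be sketched in Python on a dict-of-dicts matrix. `compress_neighbors` is a hypothetical name; the caller must supply a pair that really are neighbors in T, which this step takes as given.

```python
# Additive example from the slide, as a symmetric dict-of-dicts.
LEAVES = ["v", "w", "x", "y", "z"]
ROWS = [
    [0, 10, 17, 16, 16],
    [10, 0, 15, 14, 14],
    [17, 15, 0, 9, 15],
    [16, 14, 9, 0, 14],
    [16, 14, 15, 14, 0],
]
D = {p: {q: ROWS[a][b] for b, q in enumerate(LEAVES)} for a, p in enumerate(LEAVES)}


def compress_neighbors(D, i, j, k):
    """Replace neighboring leaves i and j by their parent k, using
    D_km = (D_im + D_jm - D_ij) / 2 for every remaining node m."""
    others = [m for m in D if m not in (i, j)]
    new = {m: {q: D[m][q] for q in others} for m in others}
    new[k] = {k: 0}
    for m in others:
        d_km = (D[i][m] + D[j][m] - D[i][j]) / 2
        new[k][m] = d_km
        new[m][k] = d_km
    return new


if __name__ == "__main__":
    D1 = compress_neighbors(D, "v", "w", "a")
    print(D1["a"])  # {'a': 0, 'x': 11.0, 'y': 10.0, 'z': 10.0}
```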

Reconstructing Additive Distances (cont'd)
Next, find neighbors x, y (common parent b) in D1 and replace them by b, giving D2; then replace b, z by their common parent c, giving D3.

D2:
     a   b   z
a    0   6  10
b        0  10
z            0

D3:
     a   c
a    0   3
c        0

Working back out:
$d(a, c) = 3$
$d(b, c) = d(a, b) - d(a, c) = 3$
$d(c, z) = d(a, z) - d(a, c) = 7$
$d(b, x) = d(a, x) - d(a, b) = 5$
$d(b, y) = d(a, y) - d(a, b) = 4$
$d(a, w) = d(z, w) - d(a, z) = 4$
$d(a, v) = d(z, v) - d(a, z) = 6$
Correct!!!

Trees and Neighbors
The previous algorithm relied only on finding neighboring leaves:
1. Find neighboring leaves i and j with parent k.
2. Remove the rows and columns of i and j.
3. Add a new row and column corresponding to k, where the distance from k to any other leaf m can be computed as:
$D_{km} = (D_{im} + D_{jm} - D_{ij})/2$
Compress i and j into k, and iterate the algorithm for the rest of the tree.
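The iteration above can be sketched in Python for the small problem, where T is known. Assumption: the caller lists the neighbor pairs read off T as `(i, j, k)` triples, in an order where i and j are both current nodes. Each limb is computed at merge time as $d(i, k) = (D_{ij} + D_{im} - D_{jm})/2$ for any third node m, a rearrangement of the same formulas, rather than by the back-substitution shown on the slide. `small_additive_phylogeny` is a hypothetical name.

```python
# Additive example from the slide, as a symmetric dict-of-dicts.
LEAVES = ["v", "w", "x", "y", "z"]
ROWS = [
    [0, 10, 17, 16, 16],
    [10, 0, 15, 14, 14],
    [17, 15, 0, 9, 15],
    [16, 14, 9, 0, 14],
    [16, 14, 15, 14, 0],
]
D = {p: {q: ROWS[a][b] for b, q in enumerate(LEAVES)} for a, p in enumerate(LEAVES)}


def small_additive_phylogeny(D, merges):
    """Recover edge lengths of a known topology from an additive matrix D.
    merges: (i, j, k) triples meaning "i and j are neighbors with parent k".
    Returns {(node, node): length}."""
    D = {p: dict(row) for p, row in D.items()}  # work on a copy
    edges = {}
    for i, j, k in merges:
        others = [m for m in D if m not in (i, j)]
        m = others[0]  # any third node gives the same limb when D is additive
        d_ik = (D[i][j] + D[i][m] - D[j][m]) / 2
        edges[(i, k)] = d_ik
        edges[(j, k)] = D[i][j] - d_ik
        # New row/column for k: D_kq = (D_iq + D_jq - D_ij) / 2.
        D[k] = {k: 0}
        for q in others:
            D[k][q] = D[q][k] = (D[i][q] + D[j][q] - D[i][j]) / 2
        for p in (i, j):
            del D[p]
        for row in D.values():
            row.pop(i, None)
            row.pop(j, None)
    if len(D) == 2:  # the last two nodes are joined by a single edge
        p, q = D
        edges[(p, q)] = D[p][q]
    return edges


if __name__ == "__main__":
    merges = [("v", "w", "a"), ("x", "y", "b"), ("b", "z", "c")]
    for edge, length in small_additive_phylogeny(D, merges).items():
        print(edge, length)
```

With the merge order used on the slides (v, w → a; x, y → b; b, z → c), it recovers the same seven edge lengths.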

Finding Neighboring Leaves
To find neighboring leaves, we simply select a pair of closest leaves. WRONG!

     i   j   k   l
i    0  13  21  22
j        0  12  13
k            0  13
l                0

i and j are neighbors, but $(d_{ij} = 13) > (d_{jk} = 12)$.
Finding a pair of neighboring leaves is a nontrivial problem!

Degenerate Triples
• A degenerate triple is a set of three distinct elements $1 \le i, j, k \le n$ where $D_{ij} + D_{jk} = D_{ik}$.
• Element j in a degenerate triple i, j, k lies on the evolutionary path from i to k (or is attached to this path by an edge of length 0).
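A Python sketch contrasting the naive closest-pair rule with the correct answer on the i, j, k, l matrix, plus a brute-force search for degenerate triples. Assumption: `four_leaf_neighbors` uses a standard fact not stated on the slide: for four leaves with an additive D, the pairing {p, q} | {r, s} that minimizes $D_{pq} + D_{rs}$ groups the true neighbor pairs. All names are illustrative.

```python
from itertools import combinations, permutations

LEAVES = ["i", "j", "k", "l"]
ROWS = [
    [0, 13, 21, 22],
    [13, 0, 12, 13],
    [21, 12, 0, 13],
    [22, 13, 13, 0],
]
D = {p: {q: ROWS[a][b] for b, q in enumerate(LEAVES)} for a, p in enumerate(LEAVES)}


def closest_pair(D):
    """The naive (wrong) rule: the pair with the smallest distance."""
    return min(combinations(D, 2), key=lambda pq: D[pq[0]][pq[1]])


def four_leaf_neighbors(D):
    """For 4 leaves with additive D: the pairing with the smallest sum."""
    a, b, c, d = D
    pairings = [((a, b), (c, d)), ((a, c), (b, d)), ((a, d), (b, c))]
    return min(pairings, key=lambda pr: D[pr[0][0]][pr[0][1]] + D[pr[1][0]][pr[1][1]])


def degenerate_triples(D):
    """Triples (i, j, k) of distinct elements with D_ij + D_jk == D_ik,
    i.e. j lies on the path from i to k; each reported once with i < k."""
    return [
        (i, j, k)
        for i, j, k in permutations(D, 3)
        if i < k and D[i][j] + D[j][k] == D[i][k]
    ]


if __name__ == "__main__":
    print(closest_pair(D))         # ('j', 'k')
    print(four_leaf_neighbors(D))  # (('i', 'j'), ('k', 'l'))
    print(degenerate_triples(D))   # []
```

Here $D_{ij} + D_{kl} = 26$ while the other two pairings both sum to 34, so i and j are neighbors even though j and k are closer. The A, B, C, D matrix from the distance-based vs. character-based comparison, by contrast, contains degenerate triples such as (A, B, C), since $1 + 1 = 2$.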
