Phylogenetics Introduction to Bioinformatics Dortmund, - PowerPoint PPT Presentation

Phylogenetics
Introduction to Bioinformatics
Dortmund, 16-20 July 2007
Lectures: Sven Rahmann
Exercises: Udo Feldkamp, Michael Wurst

Phylogenetics
● phylon (Greek) = tribe, lineage; the root of “phylum”
● phylogenetics: reconstruction of evolutionary trees
● phylogeny: an evolutionary tree (German: “Stammbaum”, family tree)

Tree Of Life Web Project
URL: http://www.tolweb.org

Software Collection
● URL: http://evolution.genetics.washington.edu/phylip/software.html

PHYLIP
● PHYLIP is one of the most widely used software packages for phylogenetic analysis.
● PHYLIP project homepage: http://evolution.genetics.washington.edu/phylip.html
● Online server URL: http://bioweb.pasteur.fr/seqanal/phylogeny/phylip-uk.html

Trees
● A tree T = (V,E) is a graph; it consists of vertices and edges
● V = vertices, also called nodes
– leaves L, inner nodes N, root r (for rooted trees)
● E = edges (connect vertices)
● Trees can be rooted or unrooted
● Trees are connected, acyclic graphs
● Unrooted binary trees satisfy: |E| = 2|L| - 3 and |N| = |L| - 2
● Rooted binary trees have one more edge and one more inner node (the root): |E| = 2|L| - 2 and |N| = |L| - 1

Unrooted and Rooted Trees
[Figure: an unrooted and a rooted tree, with labels: inner node, leaf, root, edge]

Number of Trees
● unrooted binary trees (labeled leaves)
– 3 leaves: 1
– 4 leaves: 3
– 5 leaves: 3*5 = 15
– 6 leaves: 15*7 = 105
● rooted binary trees (labeled leaves)
– 3 leaves: 3
– 4 leaves: 3*5 = 15
– 5 leaves: 15*7 = 105
● super-exponentially many trees
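The counts above follow a product of odd numbers: each new leaf can be attached to any existing edge. A minimal sketch, assuming binary trees with labeled leaves (the function names are illustrative and not from the slides):

```python
def odd_product(k: int) -> int:
    """Return 1 * 3 * 5 * ... * k for odd k (and 1 if k < 3)."""
    result = 1
    for factor in range(3, k + 1, 2):
        result *= factor
    return result


def count_unrooted(n_leaves: int) -> int:
    """Number of unrooted binary trees on n >= 3 labeled leaves: (2n-5)!!"""
    return odd_product(2 * n_leaves - 5)


def count_rooted(n_leaves: int) -> int:
    """Number of rooted binary trees on n >= 2 labeled leaves: (2n-3)!!"""
    return odd_product(2 * n_leaves - 3)


if __name__ == "__main__":
    for n in range(3, 11):
        print(n, count_unrooted(n), count_rooted(n))
```

A rooted tree on n leaves corresponds to an unrooted tree on n + 1 leaves (the extra leaf marks the root position), which is why the two columns are shifted by one.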

Principles in Phylogenetics
● Parsimony methods: Occam's razor, simplest (= shortest) explanation is best
● Distance-based methods: distances in the tree should resemble pairwise distances between sequences
● Maximum Likelihood methods: what's the most plausible (not: probable!) evolutionary scenario?
● Bayesian methods: what's the most probable scenario, considering prior knowledge and the sequence data?

Small Parsimony Problem
● Given a tree, sequences at the leaves, a multiple alignment, a cost matrix for substitutions,
● find sequences at inner nodes of the tree to minimize overall change cost along all edges
● Efficient algorithms:
– Fitch: O(|V|*|Alphabet|) per alignment column
– Sankoff
[Figure: a tree with leaves labeled A, G, G, A, G and unknown inner node labels marked “?”]
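For a single alignment column with unit substitution costs, the bottom-up pass of Fitch's algorithm can be sketched as follows. The tree shape below is hypothetical (the slide's figure topology is not recoverable); only the leaf labels A, G, G, A, G are taken from the slide. Trees are written as nested pairs, which is an assumption of this sketch.

```python
def fitch(node):
    """Bottom-up Fitch pass for one alignment column.

    A leaf is a one-character string; an inner node is a pair (left, right).
    Returns (candidate state set for this node, minimum number of changes
    in the subtree below it), assuming unit cost for every substitution.
    """
    if isinstance(node, str):
        return {node}, 0
    left, right = node
    left_set, left_cost = fitch(left)
    right_set, right_cost = fitch(right)
    common = left_set & right_set
    if common:
        # Children agree on at least one state: no extra change needed here.
        return common, left_cost + right_cost
    # Children disagree: keep all candidates and pay one substitution.
    return left_set | right_set, left_cost + right_cost + 1


if __name__ == "__main__":
    # Hypothetical rooted binary tree over the leaf labels A, G, G, A, G.
    tree = (("A", "G"), ("G", ("A", "G")))
    root_states, changes = fitch(tree)
    print(root_states, changes)
```

A top-down pass (not shown) would then pick one state per inner node from the candidate sets. For a general cost matrix, Sankoff's dynamic program replaces the sets with a cost per state.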

Big Parsimony Problem
● Given sequences (at the leaves),
● find a tree and the best labeling of inner nodes with minimal substitution cost
● No efficient algorithm known, problem is NP-hard
● Essentially have to enumerate all trees. Super-exponentially many trees!

Distance-Based Methods
● Given sequences, first compute a pairwise distance matrix using
– edit distance
– edit distance “corrected” for multiple substitutions at the same site, which a minimal edit distance cannot see (“Jukes-Cantor correction”, “Kimura correction”)
– distance based on an evolutionary model, i.e. a time-continuous Markov process
● Then find a tree (unrooted or rooted) and edge lengths, such that distances in the tree match all pairwise distances in the distance matrix
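As an illustration of such a correction, the Jukes-Cantor formula turns the observed fraction p of differing sites into an estimated number of substitutions per site, d = -3/4 ln(1 - 4p/3). A minimal sketch, assuming two aligned DNA sequences of equal length without gaps (the function names are illustrative):

```python
import math


def p_distance(seq_a: str, seq_b: str) -> float:
    """Fraction of aligned positions at which the two sequences differ."""
    if len(seq_a) != len(seq_b) or not seq_a:
        raise ValueError("sequences must be aligned, non-empty, equal length")
    mismatches = sum(1 for a, b in zip(seq_a, seq_b) if a != b)
    return mismatches / len(seq_a)


def jukes_cantor(p: float) -> float:
    """Jukes-Cantor corrected distance for an observed mismatch fraction p."""
    if p >= 0.75:
        # Saturation: the sequences look random relative to each other.
        return math.inf
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


if __name__ == "__main__":
    p = p_distance("ACGTACGTAC", "ACGTTCGTAA")
    print(p, jukes_cantor(p))
```

The corrected distance is always at least as large as p, and the gap widens as the sequences diverge.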

Fitting Distances on a Tree: Problems
● More pairwise distance values in the matrix than edges in the tree: problem overdetermined. A perfectly fitting tree does not always exist!
● Metric: typical distance properties (non-negativity, symmetry, triangle inequality)
● Tree metric: distance values fit a tree
● Ultrametric: distance values fit a rooted tree in which all paths from the root to the leaves have the same length
● Good algorithms: find the correct tree + edge lengths if one exists, find a good approximation otherwise
● UPGMA for ultrametric, NJ for tree metric
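Both properties can be tested directly on a distance matrix: a metric is ultrametric if in every triple the two largest distances are equal (three-point condition), and a tree metric if among the three pairwise sums of every quadruple the two largest are equal (four-point condition). A minimal sketch, assuming the input already is a metric stored as a symmetric dict of dicts; the helper names and the example values are assumptions of this sketch:

```python
from itertools import combinations


def symmetric(pairs):
    """Build a symmetric dict-of-dicts from {(a, b): distance}."""
    d = {}
    for (a, b), value in pairs.items():
        d.setdefault(a, {})[b] = value
        d.setdefault(b, {})[a] = value
    return d


def is_ultrametric(d, labels, tol=1e-9):
    """Three-point condition: the two largest distances of each triple agree."""
    for x, y, z in combinations(labels, 3):
        _, mid, largest = sorted([d[x][y], d[x][z], d[y][z]])
        if largest - mid > tol:
            return False
    return True


def is_tree_metric(d, labels, tol=1e-9):
    """Four-point condition: the two largest pair sums of each quadruple agree."""
    for w, x, y, z in combinations(labels, 4):
        sums = sorted([d[w][x] + d[y][z], d[w][y] + d[x][z], d[w][z] + d[x][y]])
        if sums[2] - sums[1] > tol:
            return False
    return True


if __name__ == "__main__":
    labels = ["A", "B", "C", "D"]
    # Illustrative distances that fit an unrooted tree but no molecular clock.
    d = symmetric({("A", "B"): 30, ("A", "C"): 40, ("A", "D"): 50,
                   ("B", "C"): 50, ("B", "D"): 40, ("C", "D"): 70})
    print(is_tree_metric(d, labels), is_ultrametric(d, labels))
```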

Clustering Algorithm “UPGMA”
● Unweighted pair group method using averages
● always returns a rooted ultrametric tree (all leaves have same distance from the root)
● correct tree returned if distances are ultrametric
● Algorithm:
– While more than one object remains:
● Find the pair of objects x,y with the smallest distance
● Replace them with a single object (x,y)
● Compute distances from (x,y) to other objects a,b,c... by averaging d(x,a) and d(y,a), ..., weighted by the number of leaves in x and in y
– Order in which objects are grouped defines a tree
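The loop above can be sketched in a few lines. This is a minimal, quadratic-memory illustration rather than an efficient implementation; the data layout (distances keyed by `frozenset` pairs, trees as nested tuples) and the function name are assumptions of this sketch.

```python
from itertools import combinations


def upgma(labels, dist):
    """Cluster `labels` with UPGMA.

    `dist` maps frozenset({a, b}) to the distance between leaves a and b.
    Returns (tree as nested tuples, height of the root above the leaves).
    """
    clusters = {label: 1 for label in labels}          # cluster -> leaf count
    d = {frozenset(p): dist[frozenset(p)] for p in combinations(labels, 2)}
    height = 0.0
    while len(clusters) > 1:
        # Find the closest pair of current clusters.
        x, y = min(combinations(clusters, 2), key=lambda p: d[frozenset(p)])
        height = d[frozenset((x, y))] / 2.0
        size_x, size_y = clusters.pop(x), clusters.pop(y)
        merged = (x, y)
        # Size-weighted average keeps every original leaf equally weighted.
        for other in clusters:
            d[frozenset((merged, other))] = (
                size_x * d[frozenset((x, other))]
                + size_y * d[frozenset((y, other))]
            ) / (size_x + size_y)
        clusters[merged] = size_x + size_y
    tree = next(iter(clusters))
    return tree, height


if __name__ == "__main__":
    # Illustrative ultrametric distances on four leaves.
    dist = {frozenset(k): v for k, v in {
        ("A", "B"): 2, ("A", "C"): 4, ("B", "C"): 4,
        ("A", "D"): 6, ("B", "D"): 6, ("C", "D"): 6}.items()}
    print(upgma(["A", "B", "C", "D"], dist))
```

Each merge places the new inner node at half the distance between the two merged clusters, which is what makes the result ultrametric.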

Neighbor Joining (NJ)
● creates an unrooted tree by iteratively joining two subtrees, taking into account their distance and also their distances to all other subtrees.
● The two closest sequences need not be neighbors!
[Figure: unrooted four-leaf tree with topology AC || BD; edge labels of 10 near A and B, 30 near C and D. d(A,B) = 30 is the smallest distance, but A and B are not neighbors.]
● NJ finds the correct tree if the distances admit one.
● NJ finds a “good” tree otherwise (heuristic)
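How NJ avoids joining the two closest leaves can be seen from its selection criterion Q(i,j) = (n-2) d(i,j) - r_i - r_j, where r_i is the sum of distances from i to all other objects. The sketch below shows only this pair-selection step, not the full algorithm. The distance values assume leaf edges of 10 (A, B) and 30 (C, D) and an inner edge of 10; the inner edge length is an assumption chosen to be consistent with d(A,B) = 30.

```python
from itertools import combinations


def symmetric(pairs):
    """Build a symmetric dict-of-dicts from {(a, b): distance}."""
    d = {}
    for (a, b), value in pairs.items():
        d.setdefault(a, {})[b] = value
        d.setdefault(b, {})[a] = value
    return d


def nj_pick_pair(labels, d):
    """Return the pair minimizing the neighbor-joining criterion Q."""
    n = len(labels)
    row_sum = {i: sum(d[i][j] for j in labels if j != i) for i in labels}

    def q(pair):
        i, j = pair
        return (n - 2) * d[i][j] - row_sum[i] - row_sum[j]

    return min(combinations(labels, 2), key=q)


if __name__ == "__main__":
    labels = ["A", "B", "C", "D"]
    d = symmetric({("A", "B"): 30, ("A", "C"): 40, ("A", "D"): 50,
                   ("B", "C"): 50, ("B", "D"): 40, ("C", "D"): 70})
    closest = min(combinations(labels, 2), key=lambda p: d[p[0]][p[1]])
    print("closest pair:", closest)
    print("pair joined by NJ:", nj_pick_pair(labels, d))
```

Subtracting the row sums penalizes pairs whose members are also far from everything else, so the long-edged leaves C and D are each joined to their true neighbor.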

Probabilistic Methods
● Require a model of an evolutionary process, sometimes limited to substitutions (no gaps)
● Maximum Likelihood (ML)
– Assuming a tree topology and edge lengths (T,L), what is the total probability P(seqs | T,L), summed over all choices of inner node sequences, that (T,L) generates the observed sequences?
– This is the likelihood of (T,L).
– Maximize this over all possible choices (T,L)
● Bayesian (more natural question)
– Using prior knowledge / personal bias, for each choice (T,L) as above, compute P((T,L) | seqs), the conditional probability of (T,L), given the seqs.
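The sum over all inner node sequences does not have to be enumerated explicitly: for one alignment column it can be computed bottom-up (Felsenstein's pruning algorithm). The sketch below is an illustration under stated assumptions and is not taken from the slides: it uses the Jukes-Cantor substitution model with uniform base frequencies, edge lengths measured in expected substitutions per site, and a tree encoded as nested tuples of (child, edge length) pairs.

```python
import math

STATES = "ACGT"


def jc_prob(a: str, b: str, t: float) -> float:
    """Jukes-Cantor probability of observing b after time t, starting at a."""
    decay = math.exp(-4.0 * t / 3.0)
    return 0.25 + 0.75 * decay if a == b else 0.25 - 0.25 * decay


def partial(node):
    """Return {state: P(leaves below node | node is in that state)}.

    A leaf is a one-character string; an inner node is a tuple of
    (child, edge_length) pairs.
    """
    if isinstance(node, str):
        return {s: 1.0 if s == node else 0.0 for s in STATES}
    result = {s: 1.0 for s in STATES}
    for child, t in node:
        below = partial(child)
        for s in STATES:
            # Sum over all states the child could be in.
            result[s] *= sum(jc_prob(s, c, t) * below[c] for c in STATES)
    return result


def site_likelihood(tree) -> float:
    """P(column | T, L) with a uniform distribution over the root state."""
    root = partial(tree)
    return sum(0.25 * root[s] for s in STATES)


if __name__ == "__main__":
    inner = (("A", 0.1), ("G", 0.1))
    tree = ((inner, 0.05), ("A", 0.3))
    print(site_likelihood(tree))
```

If alignment columns are assumed independent, the likelihood of the full alignment is the product of the per-column values. Maximizing over (T,L) then requires a search over trees and edge lengths, which is not shown here.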

Probabilistic Models
● require understanding of time-continuous Markov processes as evolutionary substitution models
● require understanding of probability theory
● Top-level view: an approach similar to, but “softer” than, parsimony methods. There is not “one” solution, but each tree has a certain likelihood / probability.
● No details on algorithms given here

Which Method should I use? (Personal Opinion)
● Distance-based methods usually work fine. As a good first choice, run some NJ variant.
● Parsimony may underestimate the true number of evolutionary changes, as it looks for the “shortest” possible explanation. OK when sequences are closely related. Problem when sequences are distantly related! Parsimony has no “edge lengths”!
● Probabilistic methods might return more accurate trees than distance methods, but are usually slower.

How robust is the tree?
● Robustness := tree does not change (fundamentally) when small errors are introduced into the data
● Robustness is not accuracy!
● Accuracy := the tree is (close to) the biologically correct one
● A tree that is not robust, however, is unstable, and unlikely to be accurate.
● Accuracy: hard to measure (except in simulations)
● Robustness: easy to measure

Measuring Robustness
● Basic idea:
– For as many times as possible,
● modify original sequences / alignment slightly
● compute and store tree for modified data
– Finally, compare original tree with those trees
– Or, compute a consensus tree (multifurcating?)
● Frequently done using “bootstrap”:
– Randomly draw alignment columns with replacement, as many as the original alignment has, so that some columns appear several times and others not at all
● Phylip contains a program for generating bootstrap trees.
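The resampling step of the bootstrap can be sketched as follows, assuming the alignment is given as a list of equal-length strings (one per sequence). The function name is illustrative; tree construction on each replicate and the consensus step are not shown.

```python
import random


def bootstrap_alignment(alignment, rng):
    """Return one bootstrap replicate of a multiple alignment.

    Columns are drawn with replacement; the replicate has the same number
    of rows and columns as the original alignment.
    """
    n_cols = len(alignment[0])
    if any(len(row) != n_cols for row in alignment):
        raise ValueError("all rows of the alignment must have equal length")
    chosen = [rng.randrange(n_cols) for _ in range(n_cols)]
    return ["".join(row[c] for c in chosen) for row in alignment]


if __name__ == "__main__":
    alignment = ["ACGTAC", "ACGTTC", "ACCTAA"]
    rng = random.Random(0)  # fixed seed only to make the demo repeatable
    for _ in range(3):
        print(bootstrap_alignment(alignment, rng))
```

Because whole columns are drawn, every replicate column is a column of the original alignment, and the correspondence between rows is preserved.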
