Phylogenetic Trees and Networks Konstantinos Mampentzidis PhD - PowerPoint PPT Presentation

Comparison and Construction of Phylogenetic Trees and Networks
Konstantinos Mampentzidis
PhD Defense
Aarhus University, Aarhus, Denmark
24 October 2019

Publications
▪ Gerth Stølting Brodal and Konstantinos Mampentzidis. Cache Oblivious Algorithms for Computing the Triplet Distance between Trees. In ESA 2017, Vienna, Austria.
▪ Jesper Jansson, Konstantinos Mampentzidis, Ramesh Rajaby, and Wing-Kin Sung. Computing the Rooted Triplet Distance Between Phylogenetic Networks. In IWOCA 2019, Pisa, Italy.
▪ Jesper Jansson, Konstantinos Mampentzidis, and Sandhya Thekkumpadan Puthiyaveedu. Building a Small and Informative Phylogenetic Supertree. In WABI 2019, Niagara Falls, USA.

Algorithmic Theory and Practice
▪ Algorithm: sequence of steps for solving a computational problem
▪ Theory: algorithms are first designed & analyzed in a model of computation
▪ Practice: then implemented in a programming language (C, C++, Python, …)

Models of computation
▪ RAM model (John von Neumann 1945): CPU and memory of unbounded size
▪ I/O model (Aggarwal and Vitter 1988): CPU, cache of size M, memory of unbounded size, I/Os in blocks of size B
▪ Cache Oblivious model (Frigo, Leiserson, Prokop, Ramachandran 1999): CPU, cache of size M, memory of unbounded size, I/Os in blocks of size B

Gap between Theory and Practice
▪ Computer architecture continues becoming more complicated

Algorithm Engineering
[Figure: cycle of Design, Analysis, Implementation, Experiments]
▪ Term first used by G. F. Italiano who organized the “Workshop on Algorithm Engineering” in Venice, Italy, 1997
▪ Bridges the gap between theory and practice

Problems in Phylogenetics
[Figures: a rooted phylogenetic tree; a rooted phylogenetic network (DAG) with reticulation vertices]
▪ Different available data/construction algorithms can lead to trees/networks that look different
▪ Quantifying this difference can improve evolutionary inferences
  o ESA 2017: Given two rooted phylogenetic trees T1 and T2 over n species, how different are they?
  o IWOCA 2019: Given two rooted phylogenetic networks N1 and N2 over n species, how different are they?
▪ How are the trees and networks created to begin with?
  o WABI 2019: Given an input set of biological data, build a rooted phylogenetic tree that best represents it

Comparing Phylogenetic Trees
[Figure: two rooted phylogenetic trees T1 and T2]
QUESTION: Given two rooted phylogenetic trees T1 and T2 over n species, how different are they?
▪ Tree types: rooted/unrooted, binary/arbitrary degree d
▪ Distance measures: rooted triplet distance, unrooted quartet distance, Robinson-Foulds, …

Rooted Triplet Distance (Trees)
▪ A rooted triplet is defined by 3 leaf labels and their induced tree topology
▪ A triplet is induced by a tree T’ if it appears as an embedded subtree in T’
[Figure: a tree T’ with leaves x, y, z, w, inducing the resolved triplet xy | z and the fan triplet x | z | w]

Rooted Triplet Distance (Trees), Dobson [Combinatorial Mathematics III 1975]
Let T1 and T2 be two rooted trees built on the same leaf label set Λ of size n
Shared triplets = triplets that are induced by both T1 and T2
$S(T_1, T_2) = \#\,\text{shared triplets} \le \binom{n}{3}$
Rooted triplet distance: $D(T_1, T_2) = \binom{n}{3} - S(T_1, T_2) = \#\,\text{non-shared triplets}$

Rooted Triplet Distance (Trees): Example
[Figure: two rooted trees T1 and T2 on the leaves a1, …, a5]
With n = 5 there are $\binom{5}{3} = 10$ triplets in total.
▪ Shared triplets (3): a3 a4 | a5, a3 a4 | a1, a1 | a2 | a5
▪ Non-shared triplets, listed by leaf set (7): {a1, a2, a3}, {a2, a3, a5}, {a1, a3, a5}, {a2, a4, a3}, {a1, a2, a4}, {a2, a4, a5}, {a1, a4, a5}
D(T1, T2) = 10 − 3 = 7
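
The definition above can be checked directly on small inputs. The following is a minimal brute-force sketch, not one of the algorithms from the talk: it assumes trees are given as nested Python tuples with string leaf labels (a representation chosen here for illustration), and the helper names `leaves`, `topology`, and `triplet_distance` are hypothetical. It inspects all $\binom{n}{3}$ leaf sets, so it is only useful as a reference for testing faster implementations.

```python
from itertools import combinations
from math import comb


def leaves(t):
    """Leaf labels below node t (a leaf is a str, an internal node a tuple of children)."""
    if isinstance(t, str):
        return {t}
    return set().union(*(leaves(c) for c in t))


def topology(t, triple):
    """Topology that tree t induces on a set of three leaf labels.

    Returns frozenset({x, y}) for the resolved triplet xy|z, or "fan".
    """
    while True:
        # Split the three leaves according to the child subtree that holds them.
        groups = [triple & leaves(c) for c in t]
        groups = [g for g in groups if g]
        if len(groups) == 1:
            # All three lie below one child, so t is not yet LCA(x, y, z): descend.
            t = next(c for c in t if triple <= leaves(c))
        elif len(groups) == 2:
            # Two leaves share a child of the anchor node: resolved triplet.
            return frozenset(max(groups, key=len))
        else:
            return "fan"


def triplet_distance(t1, t2):
    """D(T1, T2) = C(n, 3) - S(T1, T2), computed by enumerating every leaf triple."""
    labels = leaves(t1)
    assert labels == leaves(t2), "both trees must share one leaf label set"
    shared = sum(
        topology(t1, set(x)) == topology(t2, set(x))
        for x in combinations(sorted(labels), 3)
    )
    return comb(len(labels), 3) - shared


# Illustrative inputs (not the trees from the slide).
balanced = (("a", "b"), ("c", "d"))
caterpillar = ((("a", "b"), "c"), "d")
print(triplet_distance(balanced, caterpillar))
```

In this illustrative pair, the triplets on {a, b, c} and {a, b, d} are shared, while those on {a, c, d} and {b, c, d} are resolved differently in the two trees.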

Previous and New Results

Reference                         | Time          | I/Os                  | Space        | Non-Binary Trees
----------------------------------|---------------|-----------------------|--------------|-----------------
Critchlow et al. [Sys. Biology 1996] | O(n²)         | O(n²)                 | O(n²)        | no
Bansal et al. [TCS 2011]          | O(n²)         | O(n²)                 | O(n²)        | yes
Sand et al. [BMC Bioinform. 2013] | O(n ∙ log² n) | O(n ∙ log² n)         | O(n)         | no
Brodal et al. [SODA 2013]         | O(n ∙ log n)  | O(n ∙ log n)          | O(n ∙ log n) | yes
Jansson & Rajaby [JCB 2017]       | O(n ∙ log³ n) | O(n ∙ log³ n)         | O(n ∙ log n) | yes
new [ESA 2017]                    | O(n ∙ log n)  | O(n/B ∙ log₂(n/M))    | O(n)         | yes

Implementation available

▪ All previous solutions rely heavily on random memory access
  o Penalized by cache performance
  o Do not scale to external memory
▪ The new algorithms rely on scanning contiguous chunks of memory
  o Scanning s elements requires O(s/B) I/Os in the cache oblivious model
  o Scale to external memory
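
The contrast between scanning and random access can be illustrated with a small simulation. This is a sketch under stated assumptions, not a measurement: it models the cache as M/B blocks of B elements with LRU replacement (the cache oblivious model assumes an ideal replacement policy; LRU is substituted here for simplicity), and the function name `count_ios` is hypothetical.

```python
from collections import OrderedDict


def count_ios(accesses, B, M):
    """Count block transfers for a sequence of element indices.

    Assumption: a cache holding M // B blocks of B elements each,
    with least-recently-used eviction.
    """
    cache = OrderedDict()
    capacity = M // B
    ios = 0
    for i in accesses:
        block = i // B
        if block in cache:
            cache.move_to_end(block)       # hit: mark block as most recently used
        else:
            ios += 1                       # miss: one block transfer
            cache[block] = True
            if len(cache) > capacity:
                cache.popitem(last=False)  # evict the least recently used block
    return ios


s, B, M = 1024, 16, 64
scan = range(s)                                                   # contiguous scan
strided = [k * B + off for off in range(B) for k in range(s // B)]  # jumps a block per access
print(count_ios(scan, B, M), count_ios(strided, B, M))
```

Both access sequences touch each of the s elements exactly once. The contiguous scan pays one transfer per block, that is s/B transfers, while the strided order changes block on every access and, with a cache this small, pays one transfer per element.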

Previous Approaches – Quadratic Algorithm
▪ Basis for all O(n ∙ polylog n) results: O(n²) algorithm for binary trees in [BMC Bioinform. 2013]
[Figure: T1 and T2 of arbitrary height over the leaves 1, …, n; a triplet xy | z anchored in node u of T1 and in node v of T2, with s(u) = {xy | z, …}]
▪ Every triplet with leaves x, y, and z is anchored in LCA(x, y, z) (anchor node)
▪ s(u): set containing all triplets anchored in u
▪ $S(T_1, T_2) = \sum_{u \in T_1} \sum_{v \in T_2} |s(u) \cap s(v)|$
[Figure: the leaves below the two children of u in T1 are colored red and blue; node v of T2 has children l and r]
▪ With $l_{red}$, $l_{blue}$ (and $r_{red}$, $r_{blue}$) the numbers of red and blue leaves below child l (and child r) of v:
$|s(u) \cap s(v)| = \binom{l_{red}}{2} r_{blue} + \binom{l_{blue}}{2} r_{red} + \binom{r_{red}}{2} l_{blue} + \binom{r_{blue}}{2} l_{red}$
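
The counting formula above translates into a short quadratic-time sketch for binary trees. This is an illustration written for this text, not the implementation from [BMC Bioinform. 2013]: it assumes binary trees given as nested 2-tuples with string leaf labels, assumes that the leaves below the first child of u are colored red and those below the second child blue, and uses hypothetical helper names.

```python
from math import comb


def leaf_set(t):
    """Leaf labels below node t (a leaf is a str, an internal node a 2-tuple)."""
    return {t} if isinstance(t, str) else leaf_set(t[0]) | leaf_set(t[1])


def internal_nodes(t):
    """Yield every internal node of a binary tree."""
    if not isinstance(t, str):
        yield t
        yield from internal_nodes(t[0])
        yield from internal_nodes(t[1])


def count_for_coloring(v, red, blue):
    """Return (#red, #blue, sum of |s(u) ∩ s(v')| over all v' in the subtree of v)."""
    if isinstance(v, str):
        return int(v in red), int(v in blue), 0
    l_red, l_blue, l_sum = count_for_coloring(v[0], red, blue)
    r_red, r_blue, r_sum = count_for_coloring(v[1], red, blue)
    # Two leaves of one color below one child of v, one leaf of the other color
    # below the other child: exactly the triplets anchored in both u and v.
    here = (comb(l_red, 2) * r_blue + comb(l_blue, 2) * r_red
            + comb(r_red, 2) * l_blue + comb(r_blue, 2) * l_red)
    return l_red + r_red, l_blue + r_blue, l_sum + r_sum + here


def shared_triplets(t1, t2):
    """S(T1, T2): one bottom-up pass over T2 for every internal node u of T1."""
    return sum(
        count_for_coloring(t2, leaf_set(u[0]), leaf_set(u[1]))[2]
        for u in internal_nodes(t1)
    )


def triplet_distance(t1, t2):
    n = len(leaf_set(t1))
    return comb(n, 3) - shared_triplets(t1, t2)


# Illustrative inputs (not the trees from the slides).
balanced = (("a", "b"), ("c", "d"))
caterpillar = ((("a", "b"), "c"), "d")
print(shared_triplets(balanced, caterpillar), triplet_distance(balanced, caterpillar))
```

Each internal node u of T1 triggers one traversal of T2, which gives the quadratic behavior; the subquadratic algorithms avoid recomputing the counts from scratch for every u.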

Previous Approaches – Subquadratic Algorithms
[Figure: T1 and T2 of arbitrary height over the leaves 1, …, n; the hierarchical decomposition HDT(T2) of T2 has height O(log n)]
▪ For u ∈ T1 the HDT(T2) maintains $\sum_{v \in T_2} |s(u) \cap s(v)|$
▪ Each leaf color change in T1 yields an update to HDT(T2)
  o Θ(n log n) updates, with each update corresponding to a leaf to root path traversal of HDT(T2)
  o Bad I/O performance

Reference                         | Time          | HDT(T2)
----------------------------------|---------------|------------------------------------
Sand et al. [BMC Bioinform. 2013] | O(n ∙ log² n) | Static
Brodal et al. [SODA 2013]         | O(n ∙ log n)  | Dynamic/Contraction
Jansson & Rajaby [JCB 2017]       | O(n ∙ log³ n) | Static (heavy-light decomposition)

The New Algorithm for Binary Trees (ESA 2017)
▪ New order of visiting nodes of T1 based on DFS traversal of an HDT(T1)
▪ HDT(T1) = modified centroid decomposition
[Figure: a component of T1 of size s is split around nodes c and c’ (with LCA(x, c’)) into parts of size ≤ s/2]
▪ Lemma 2: height(HDT(T1)) ≤ 2 + 2 ∙ log s = O(log n)
[Figure: T1 with nodes u, u1, u2, u3 and the corresponding HDT(T1) of height O(log n)]
▪ Order to visit the nodes in T1: DFS traversal of HDT(T1), where the children of a node u are visited from left to right
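
The logarithmic height comes from repeatedly splitting a component into parts of at most half its size. The sketch below shows this with the standard centroid decomposition, which is an assumption made for illustration: the talk uses a modified variant whose details are not reproduced here, and its Lemma 2 bound differs from the standard bound of floor(log₂ n) + 1 levels. The tree is assumed to be an undirected adjacency dictionary, and the function name is hypothetical.

```python
def centroid_decomposition_levels(adj):
    """Number of levels of the standard centroid decomposition of a tree.

    adj: dict mapping each node to a list of its neighbours (undirected tree).
    """
    removed = set()

    def subtree_sizes(root):
        # Iterative DFS over the current component; parents precede children in `order`.
        parent, order, stack = {root: None}, [], [root]
        while stack:
            v = stack.pop()
            order.append(v)
            for w in adj[v]:
                if w not in removed and w != parent[v]:
                    parent[w] = v
                    stack.append(w)
        size = {v: 1 for v in order}
        for v in reversed(order):
            if parent[v] is not None:
                size[parent[v]] += size[v]
        return size, parent

    def find_centroid(root):
        size, parent = subtree_sizes(root)
        total, v = size[root], root
        while True:
            # Walk towards a child subtree holding more than half of the component.
            heavy = [w for w in adj[v]
                     if w not in removed and w != parent[v] and size[w] > total // 2]
            if not heavy:
                return v
            v = heavy[0]

    def levels(root):
        c = find_centroid(root)
        removed.add(c)  # removing c leaves components of size <= total // 2
        return 1 + max((levels(w) for w in adj[c] if w not in removed), default=0)

    return levels(next(iter(adj)))


# A path on 15 nodes, the worst case for depth-based traversals.
n = 15
path = {i: [j for j in (i - 1, i + 1) if 0 <= j < n] for i in range(n)}
print(centroid_decomposition_levels(path))
```

Even though the path has height n − 1 as a rooted tree, the decomposition has only logarithmically many levels, which is the property the visiting order on this slide relies on.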

The New Algorithm for Binary Trees (ESA 2017)
[Figure: a node u of HDT(T1) (height O(log n)) corresponds to a component C_u of T1; T2 is contracted into T2(u) of size O(|C_u|)]
For every node u in HDT(T1) we scan T2(u) to count $\sum_{v \in T_2} |s(u) \cap s(v)|$
▪ RAM model: O(n) time per level of HDT(T1) → O(n ∙ log n)
▪ To scale to external memory: store every component/contracted tree in memory following a proper layout such that scanning a component/contracted tree of size s takes O(s/B) I/Os

The New Algorithm for General Trees (ESA 2017)
1. Anchor triplets in edges instead of nodes
2. Capture triplets with 4 colors
[Figure: node u of T1 with k children, one of them c, and leaves x, y, z, w; annotated O(n²)]
3. Transform T1 into a binary tree b(T1)
[Figure: T1 and its binary version b(T1), with child c and leaves x, y, z, w; annotated O(n ∙ log n)]

RAM Experiments – Time Performance
[Plots: two panels, Binary trees and General trees, comparing [JCB 2017], [SODA 2013], and the new algorithm; axis labels seconds/n and log₂ n]
Source code: https://github.com/kmampent/CacheTD

I/O Experiments – Time Performance
([JCB 2017] and [SODA 2013] are the previous best)

Binary Trees
n    | [JCB 2017] | [SODA 2013] | New
-----|------------|-------------|--------
2^15 | 1s         | 1s          | 1s
2^16 | 1s         | 2s          | 1s
2^17 | 1s         | 4s          | 1s
2^18 | 2s         | 1m:03s      | 1s
2^19 | 4s         | 1h:21m      | 1s
2^20 | 9s         | ≥ 10h       | 1s
2^21 | 13m:12s    |             | 3s
2^22 | ≥ 10h      |             | 9s
2^23 |            |             | 3m:37s
2^24 |            |             | 10m:35s

General Trees
n    | [JCB 2017] | [SODA 2013] | New
-----|------------|-------------|--------
2^15 | 1s         | 1s          | 1s
2^16 | 1s         | 1s          | 1s
2^17 | 1s         | 3s          | 1s
2^18 | 3s         | 7s          | 1s
2^19 | 7s         | 5m:20s      | 1s
2^20 | 3m:43s     | ≥ 10h       | 2s
2^21 | ≥ 10h      |             | 20s
2^22 |            |             | 2m:02s
2^23 |            |             | 10m:42s
2^24 |            |             | 42m:06s

Source code: https://github.com/kmampent/CacheTD

Rooted Phylogenetic Networks
[Figures: a rooted phylogenetic tree; a rooted phylogenetic network (DAG) with reticulation vertices]
An “example” of a hybrid animal
