PhyloGeoTool Exploring large-scale phylogenies in an - PowerPoint PPT Presentation

PhyloGeoTool: Exploring large-scale phylogenies in an epidemiological context
Ewout Vanden Eynden
Clinical and Evolutionary Virology, Rega
Arevir Meeting, April 29th, 2016

Background
• Large-scale databases of clinical and demographic information
• Opportunities for surveillance of epidemics and outbreaks of viral pathogens
• Tracking individual variants with specific characteristics (e.g. risk group, drug resistance, …) can elucidate their relation to geographic or phylogenetic spread
• Inferring large phylogenies is computationally and methodologically possible
Fig. 1 Circular tree representation of the dataset

Problems
• Efficient visual navigation of these phylogenies in current stand-alone tree viewers is challenging
• Characterization of the complementing virus and patient data, associated with sequence clusters, requires adaptation of metadata
• Fast and accurate placement of novel sequence data in an existing phylogenetic tree, without reconstructing the phylogeny
Fig. 2 Radial tree representation of the dataset

Objectives
• Automatic partitioning of a phylogeny into a defined number of clusters
• Design of a GUI that provides a concise visualization of the tree of clusters at each level and also shows their respective position within the entire phylogeny
• Represent a summary of different attributes at each partitioning step of the phylogenetic tree. The summary is shown in a histogram, while any geographical data is shown on a map
• Support for the placement of novel data into the phylogeny without recalculating the whole phylogeny and its intrinsic cluster calculations

Full view of the tool
Fig. 3 Full view of the PhyloGeoTool when hovering over a node

PhyloGeoTool
Fig. 4 Radial colored tree representation of the dataset
Fig. 5 Circular clustered tree representation of the dataset

Investigate cluster
Fig. 6 Radial colored tree representation of a specific cluster
Fig. 7 Circular clustered tree representation of a specific cluster

Investigate cluster
Fig. 8 Radial colored tree representation of the dataset
Fig. 9 Circular clustered tree representation of the dataset

Extra information on each cluster
• More detailed information on each cluster
• The tree is linked to a CSV file
• Each column is read as a different attribute
• Geographical information (if available) is shown on the world map
• Users can add extra information to the CSV file themselves

Sample CSV file
Fig. 14 Sample CSV file with attributes “Year of Birth”, “Gender”, “Country of origin (en)”, “Country of origin (iso)”, “Ethnic Group” and “Risk Group”

Representation in the tool Fig. 15 Representation of the sample CSV file as summarized data in a histogram
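As a rough illustration of how per-cluster attribute summaries could be derived from such a CSV file, the sketch below counts the values of every column for the sequences in one cluster. It is not the tool's own code. The `id` column, the function name `summarize_cluster` and the sample rows are hypothetical and only serve the example; the column names "Gender" and "Risk Group" are taken from the sample file.

```python
import csv
import io
from collections import Counter


def summarize_cluster(csv_text, cluster_ids, id_column="id"):
    """Count attribute values for the rows whose id belongs to the cluster.

    Returns a dict mapping each attribute (column name) to a Counter of its
    values, i.e. the data behind one histogram per attribute.
    """
    counts = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        if row[id_column] not in cluster_ids:
            continue  # sequence is not part of this cluster
        for attribute, value in row.items():
            if attribute != id_column:
                counts.setdefault(attribute, Counter())[value] += 1
    return counts


# Invented rows, for illustration only.
SAMPLE = """id,Gender,Risk Group
s1,F,A
s2,M,B
s3,F,B
s4,M,A
"""

summary = summarize_cluster(SAMPLE, {"s1", "s2", "s3"})
for attribute, histogram in summary.items():
    print(attribute, dict(histogram))
```

Because every column other than the identifier is treated as an attribute, adding a column to the CSV file adds a histogram without any change to the code.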

How to cluster (1)?
• Start from a rooted tree
• Top-down iterative clustering approach
1. Take the root node (A) of the biggest cluster (the root node of the tree if no clusters have been defined yet)
2. Replace the biggest cluster by:
o Cluster 1 with root node B, which is the first child of A
o Cluster 2 with root node C, which is the second child of A
3. If the required number of clusters has not been reached, go to step 1 and repeat
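The steps above can be sketched in a few lines of Python. This is a minimal illustration, not the tool's implementation: the `Node` class and the `partition` function are hypothetical names, "biggest" is assumed to mean the cluster with the most leaves, and the tree is assumed to be binary (with multifurcating nodes one split could produce more than k clusters).

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(eq=False)  # compare nodes by identity, not by content
class Node:
    name: str
    children: List["Node"] = field(default_factory=list)

    def leaves(self) -> List[str]:
        """Names of all tips below (or at) this node."""
        if not self.children:
            return [self.name]
        return [tip for child in self.children for tip in child.leaves()]


def partition(root: Node, k: int) -> List[Node]:
    """Split the tree top-down into at most k clusters (subtree roots)."""
    clusters = [root]  # no clusters defined yet: start from the tree root
    while len(clusters) < k:
        # Step 1: take the root node of the biggest cluster.
        biggest = max(clusters, key=lambda node: len(node.leaves()))
        if not biggest.children:
            break  # only single tips are left, nothing more to split
        # Step 2: replace it by the clusters rooted at its children.
        clusters.remove(biggest)
        clusters.extend(biggest.children)
        # Step 3: the loop repeats until k clusters are reached.
    return clusters


# Toy tree ((a,b),(c,(d,e))), for illustration only.
tree = Node("A", [
    Node("B", [Node("a"), Node("b")]),
    Node("C", [Node("c"), Node("D", [Node("d"), Node("e")])]),
])
for cluster in partition(tree, 3):
    print(cluster.name, cluster.leaves())
```

Recomputing the leaf list on every iteration keeps the sketch short; for large phylogenies the leaf counts would be cached per node.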

Starting tree Fig. 16 Phylogenetic tree representation of a random sample dataset with 20 sequences

K = 2 Fig. 17 Visual representation of the sample phylogenetic tree for a clustering with k=2

How to cluster (2)?
• Minimizing intra-cluster distances
• Maximizing inter-cluster distances
• Subtype Diversity Ratio, SDR (Archer et al., Bioinformatics, 2007)
• Ratio of the mean intra-cluster pairwise distance to the mean inter-cluster pairwise distance (Rambaut et al., Nature, 2001)
• The clustering with the lowest SDR is the best
• Distances are taken directly from the phylogenetic tree
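The ratio described above can be written down directly. The sketch below pools all within-cluster pairs and all between-cluster pairs before averaging, which is one straightforward reading of the definition; the cited papers may weight clusters differently. The function name `sdr` and the toy distances (positions on a line instead of tree path lengths) are assumptions made for the example.

```python
from itertools import combinations
from statistics import mean


def sdr(clusters, dist):
    """Mean intra-cluster pairwise distance divided by mean inter-cluster one.

    clusters: list of lists of tip names; dist: function (tip, tip) -> float.
    Needs at least one within-cluster pair and one between-cluster pair,
    otherwise statistics.mean raises StatisticsError.
    """
    label = {tip: i for i, members in enumerate(clusters) for tip in members}
    intra, inter = [], []
    for a, b in combinations(label, 2):
        (intra if label[a] == label[b] else inter).append(dist(a, b))
    return mean(intra) / mean(inter)


# Toy distances for illustration: tips placed on a line.
position = {"a": 0, "b": 1, "c": 10, "d": 11}


def toy_dist(x, y):
    return abs(position[x] - position[y])


tight = sdr([["a", "b"], ["c", "d"]], toy_dist)  # close tips grouped together
loose = sdr([["a", "c"], ["b", "d"]], toy_dist)  # distant tips grouped together
print(tight, loose)
```

Grouping close tips together yields the smaller ratio, matching the rule that the clustering with the lowest SDR is preferred.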

Which clustering is the best?
• Cluster for k=2 to k=50, where k is the number of clusters
• For each k, calculate the SDR score
• The clustering with the lowest SDR value is the best clustering
• Problem: more clusters usually give a lower SDR, because the individual points are grouped more tightly
• The aim is to find a balance between the number of clusters and the quality of the clustering
• The second derivative is used to find the biggest drop in SDR value
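One plausible reading of the second-derivative criterion is the usual "elbow" search: compute the discrete second difference of the SDR curve and pick the k where it is largest, i.e. where a large drop is followed by a much smaller one. The slides do not spell out the exact rule, so the sketch below is an assumption; `pick_k` and the SDR values are invented for illustration, and k is assumed to run over consecutive integers.

```python
def pick_k(sdr_by_k):
    """Return the k with the largest discrete second difference of SDR.

    sdr_by_k: dict mapping k (consecutive integers) to its SDR score.
    The first and last k have no second difference and cannot be chosen.
    """
    ks = sorted(sdr_by_k)
    best_k, best_bend = None, float("-inf")
    for prev, k, nxt in zip(ks, ks[1:], ks[2:]):
        # f(k-1) - 2 f(k) + f(k+1): large when the curve bends sharply at k
        bend = sdr_by_k[prev] - 2 * sdr_by_k[k] + sdr_by_k[nxt]
        if bend > best_bend:
            best_k, best_bend = k, bend
    return best_k


# Invented SDR scores: a large drop from k=3 to k=4, then the curve flattens.
scores = {2: 0.90, 3: 0.80, 4: 0.40, 5: 0.35, 6: 0.33}
print(pick_k(scores))
```

With these scores the sharpest bend is at k=4, after which extra clusters barely lower the SDR.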

Future perspectives
• Integration as a web application into the EuResist Integrated Data Base (EIDB)
• Phylogenetic placement (using the PPlacer software)
• …

Acknowledgements
Clinical and Epidemiological Virology, KU LEUVEN: Pieter Libin*, Ewout Vanden Eynden*, Anne-Mieke Vandamme, Kristof Theys
Computational Evolutionary Virology, KU LEUVEN: Guy Baele*
Artificial Intelligence Lab, VUB: Pieter Libin, Ann Nowe
EuResist network and the European HIV coreceptor study panel (eucohiv)
VIROGENESIS receives funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No 634650
