
slide-1
SLIDE 1

PhyloGeoTool

Exploring large-scale phylogenies in an epidemiological context

Ewout Vanden Eynden, Clinical and Evolutionary Virology, Rega

Arevir Meeting, April 29th, 2016

slide-2
SLIDE 2

Background

  • Fig. 1 Circular tree representation of the dataset
  • Large-scale databases of clinical and demographic information
  • Opportunities for surveillance of epidemics and outbreaks of viral pathogens
  • Tracking individual variants with specific characteristics (e.g. risk group, drug resistance, …) can reveal their relation to geographic or phylogenetic spread
  • It is computationally and methodologically possible to infer large phylogenies

slide-3
SLIDE 3

Problems

  • Fig. 2 Radial tree representation of the dataset
  • Efficient visual navigation of these phylogenies in current stand-alone tree viewers is challenging
  • Characterizing the complementary virus and patient data associated with sequence clusters requires adaptation of the metadata
  • Fast and accurate placement of novel sequence data in an existing phylogeny, without reconstructing the phylogeny

slide-4
SLIDE 4

Objectives

  • Automatic partitioning of a phylogeny into a defined number of clusters
  • Design of a GUI that gives a concise visualization of the tree of clusters at each level and also shows each cluster's position within the entire phylogeny
  • Representation of a summary of different attributes at each partitioning step of the phylogenetic tree: the summary is shown in a histogram, while any geographical data is shown on a map
  • Support for placing novel data into the phylogeny without recalculating the whole phylogeny and its clusters

slide-5
SLIDE 5

Full view of the tool

  • Fig. 3 Full view of the PhyloGeoTool when hovering over a node
slide-6
SLIDE 6

PhyloGeoTool

  • Fig. 4 Radial colored tree representation of the dataset
  • Fig. 5 Circular clustered tree representation of the dataset
slide-7
SLIDE 7

Investigate cluster

  • Fig. 6 Radial colored tree representation of a specific cluster
  • Fig. 7 Circular clustered tree representation of a specific cluster

slide-8
SLIDE 8

Investigate cluster

  • Fig. 8 Radial colored tree representation of the dataset
  • Fig. 9 Circular clustered tree representation of the dataset

slide-9
SLIDE 9

Investigate cluster

  • Fig. 10 Radial colored tree representation of the dataset
  • Fig. 11 Circular clustered tree representation of the dataset

slide-10
SLIDE 10

Investigate cluster

  • Fig. 12 Radial colored tree representation of the dataset
  • Fig. 13 Circular clustered tree representation of the dataset

slide-11
SLIDE 11

Extra information on each cluster

  • More detailed information on each cluster
  • The tree is linked to a CSV file
  • Each column is read as a separate attribute
  • Geographical information (if available) is shown on the world map
  • Users can add extra information to the CSV file themselves
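A minimal Python sketch of how such a CSV file could be linked to the tree follows. It assumes, hypothetically, that one column (here called `id`) holds the sequence names used as leaf labels in the tree; every other column is read as an attribute, and empty cells are treated as missing. The function name `load_attributes` and the sample rows are made up for illustration and are not the tool's actual code or data.

```python
import csv
import io


def load_attributes(csv_text: str, id_column: str = "id") -> dict[str, dict[str, str]]:
    """Key every CSV row by its sequence ID (assumed to match a tree leaf label).

    Every other column becomes one attribute, so a column added by the user
    simply appears as a new attribute. Empty or absent cells are skipped.
    """
    table: dict[str, dict[str, str]] = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        seq_id = row.pop(id_column)
        table[seq_id] = {name: value for name, value in row.items() if value}
    return table


# Hypothetical toy file; the column names are taken loosely from Fig. 14.
sample = """id,Gender,Year of Birth
seq1,M,1970
seq2,F,
"""
attrs = load_attributes(sample)
print(attrs)  # {'seq1': {'Gender': 'M', 'Year of Birth': '1970'}, 'seq2': {'Gender': 'F'}}
```

Keying rows by leaf label keeps the metadata independent of the tree file, which fits the idea that users can extend the CSV without touching the phylogeny.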

slide-12
SLIDE 12

Sample csv file

  • Fig. 14 Sample CSV file with attributes “Year of Birth”, “Gender”, “Country of origin (en)”, “Country of origin (iso)”, “Ethnic Group” and “Risk Group”
slide-13
SLIDE 13

Representation in the tool

  • Fig. 15 Representation of the sample CSV file as summarized data in a histogram
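To illustrate how per-cluster histogram data could be derived, the sketch below counts the values of one attribute over the leaves of a cluster and renders them as a text histogram. The names `summarize` and `text_histogram`, the toy records, and the "unknown" label for missing values are assumptions for illustration. The tool's own summary and plotting code is not shown in these slides.

```python
from collections import Counter


def summarize(attributes: dict[str, dict[str, str]], cluster_leaves: list[str], attribute: str) -> Counter:
    """Count how often each value of one attribute occurs among a cluster's leaves."""
    counts: Counter = Counter()
    for leaf in cluster_leaves:
        value = attributes.get(leaf, {}).get(attribute)
        # Leaves without a value for this attribute are grouped as "unknown" (an assumption).
        counts[value if value is not None else "unknown"] += 1
    return counts


def text_histogram(counts: Counter) -> str:
    """One line per value, most frequent first, with one '#' per leaf."""
    width = max(len(str(value)) for value in counts)
    return "\n".join(f"{str(value):<{width}} {'#' * n}" for value, n in counts.most_common())


# Hypothetical per-leaf records for a four-leaf cluster.
attributes = {"s1": {"Gender": "M"}, "s2": {"Gender": "F"}, "s3": {"Gender": "M"}, "s4": {}}
counts = summarize(attributes, ["s1", "s2", "s3", "s4"], "Gender")
print(text_histogram(counts))
```

The same counting applies to a country-code column, whose counts could then feed the world map instead of a histogram.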
slide-14
SLIDE 14

How to cluster (1)?

  • Start from a rooted tree
  • Top down iterative clustering approach
  • 1. Take the root node (A) of the biggest cluster (the root node of the tree if no clusters have been defined yet)
  • 2. Replace the biggest cluster by:
  • Cluster 1 with root node B, the first child of A
  • Cluster 2 with root node C, the second child of A
  • 3. If the required number of clusters has not been reached, go to step 1 and repeat
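The steps above can be sketched in Python as follows. The sketch assumes a rooted binary tree and takes "biggest" to mean "most leaves", because the slides do not state the size measure. The `Node` class and the function names are hypothetical. A max-heap keyed on cluster size always yields the next cluster to split.

```python
from __future__ import annotations

import heapq
import itertools
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical minimal tree node; leaves have no children."""
    name: str
    children: list[Node] = field(default_factory=list)


def leaves(node: Node) -> list[str]:
    """Names of all leaves below (and including) this node."""
    if not node.children:
        return [node.name]
    return [leaf for child in node.children for leaf in leaves(child)]


def top_down_clusters(root: Node, k: int) -> list[Node]:
    """Split the biggest cluster into its children until k clusters exist."""
    tie = itertools.count()  # tie-breaker so Node objects are never compared
    heap = [(-len(leaves(root)), next(tie), root)]  # negated size gives a max-heap
    while len(heap) < k:  # step 3: repeat until k clusters are reached
        neg_size, order, biggest = heapq.heappop(heap)  # step 1: root A of the biggest cluster
        if not biggest.children:
            # The biggest cluster is a single leaf, so nothing can be split further.
            heapq.heappush(heap, (neg_size, order, biggest))
            break
        for child in biggest.children:  # step 2: replace A's cluster by B's and C's
            heapq.heappush(heap, (-len(leaves(child)), next(tie), child))
    return [node for _, _, node in sorted(heap)]  # biggest clusters first


tree = Node("root", [
    Node("X", [Node("a"), Node("b")]),
    Node("Y", [Node("c"), Node("Z", [Node("d"), Node("e")])]),
])
for k in (2, 3, 4):
    print(k, [leaves(cluster) for cluster in top_down_clusters(tree, k)])
```

On this five-leaf toy tree, k = 3 gives the clusters {a, b}, {d, e} and {c}. At a multifurcating node a single split adds more than one cluster, so the result could overshoot k; resolving polytomies beforehand is one option.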

slide-15
SLIDE 15

Starting tree

  • Fig. 16 Phylogenetic tree representation of a random sample dataset with 20 sequences
slide-16
SLIDE 16

K = 2

  • Fig. 17 Visual representation of the sample phylogenetic tree for a clustering with k=2
slide-17
SLIDE 17

K = 3

  • Fig. 18 Visual representation of the sample phylogenetic tree for a clustering with k=3
slide-18
SLIDE 18

K = 4

  • Fig. 19 Visual representation of the sample phylogenetic tree for a clustering with k=4
slide-19
SLIDE 19

How to cluster (2)?

  • Minimizing intra-cluster distances
  • Maximizing inter-cluster distances
  • Subtype Diversity Ratio, SDR (Archer et al., Bioinformatics, 2007)
  • Ratio of the mean intra-cluster pairwise distance to the mean inter-cluster pairwise distance (Rambaut et al., Nature, 2001)
  • The clustering with the lowest SDR is the best
  • Distances taken directly from the phylogenetic tree
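One reading of this definition pools all within-cluster leaf pairs into a single mean, and likewise all between-cluster pairs: SDR = mean intra-cluster distance / mean inter-cluster distance. Distances are measured along the branches of the tree (patristic distances). The sketch below implements that reading. The pooling choice, the `Node` class with its `length` field, and the function names are assumptions for illustration, not code from Archer et al. or from the tool.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from itertools import combinations
from statistics import mean


@dataclass
class Node:
    """Hypothetical tree node; `length` is the branch length to the parent."""
    name: str
    length: float = 0.0
    children: list[Node] = field(default_factory=list)


def leaf_depths(node: Node, depth: float = 0.0, ancestors: dict | None = None, out: dict | None = None) -> dict:
    """For every leaf, store its root-to-leaf distance and the depth of each ancestor."""
    depth += node.length
    ancestors = {**(ancestors or {}), id(node): depth}
    if out is None:
        out = {}
    if not node.children:
        out[node.name] = (depth, ancestors)
    for child in node.children:
        leaf_depths(child, depth, ancestors, out)
    return out


def patristic(info: dict, a: str, b: str) -> float:
    """Tree distance: depth(a) + depth(b) - 2 * depth(lowest common ancestor)."""
    depth_a, anc_a = info[a]
    depth_b, anc_b = info[b]
    lca_depth = max(anc_a[n] for n in anc_a.keys() & anc_b.keys())  # deepest shared ancestor
    return depth_a + depth_b - 2 * lca_depth


def sdr(tree: Node, clusters: list[list[str]]) -> float:
    """Mean intra-cluster distance divided by mean inter-cluster distance (pooled)."""
    info = leaf_depths(tree)
    label = {leaf: i for i, members in enumerate(clusters) for leaf in members}
    intra, inter = [], []
    for a, b in combinations(sorted(label), 2):
        (intra if label[a] == label[b] else inter).append(patristic(info, a, b))
    return mean(intra) / mean(inter)


tree = Node("root", children=[
    Node("X", 1.0, [Node("a", 1.0), Node("b", 1.0)]),
    Node("Y", 1.0, [Node("c", 1.0), Node("d", 1.0)]),
])
print(sdr(tree, [["a", "b"], ["c", "d"]]))  # 0.5
print(sdr(tree, [["a", "c"], ["b", "d"]]))  # 4.0 / 3.0, a worse partition
```

Grouping sister leaves gives the lower, and therefore better, SDR. `statistics.mean` raises `StatisticsError` when there are no intra-cluster pairs (all clusters are singletons) or no inter-cluster pairs (a single cluster).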
slide-20
SLIDE 20

Which clustering is the best

  • Cluster for k = 2 to k = 50, where k is the number of clusters
  • For each k, calculate the SDR score
  • The clustering with the lowest SDR value is the best clustering
  • Problem: more clusters usually means a better clustering, as the individual points are grouped more tightly (thus a lower SDR)
  • The aim is to find the balance between the number of clusters and the quality of the clustering
  • The second derivative of the SDR curve (SDR as a function of k) is used to find the biggest drop in SDR value
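One possible reading of this selection rule is sketched below. For each interior k it computes the discrete second difference SDR(k - 1) - 2·SDR(k) + SDR(k + 1) and picks the k where this value is largest, which is where a steep drop in SDR is followed by a flattening curve. This exact rule, the function names, and the SDR values in the example are assumptions for illustration, not results from the tool.

```python
def second_differences(scores: dict[int, float]) -> dict[int, float]:
    """Discrete second derivative of SDR as a function of k (interior k only)."""
    ks = sorted(scores)
    if ks != list(range(ks[0], ks[-1] + 1)):
        raise ValueError("expected SDR scores for consecutive values of k")
    return {k: scores[k - 1] - 2 * scores[k] + scores[k + 1] for k in ks[1:-1]}


def pick_k(scores: dict[int, float]) -> int:
    """Choose the k where the SDR curve bends most sharply (largest second difference)."""
    diffs = second_differences(scores)
    return max(diffs, key=diffs.get)


# Made-up SDR values: a large drop up to k = 4, then only small improvements.
scores = {2: 0.90, 3: 0.60, 4: 0.35, 5: 0.30, 6: 0.28, 7: 0.27}
print(pick_k(scores))  # 4
```

At least three consecutive values of k are needed; with fewer, `second_differences` returns an empty dict and `max` raises `ValueError`.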

slide-21
SLIDE 21

Future perspectives

  • Integration as a web application into the EuResist Integrated Data Base (EIDB)
  • Phylogenetic placement (using the pplacer software)
  • ….
slide-22
SLIDE 22

Acknowledgements

Clinical and Epidemiological Virology, KU LEUVEN: Pieter Libin*, Ewout Vanden Eynden*, Anne-Mieke Vandamme, Kristof Theys

Computational Evolutionary Virology, KU LEUVEN: Guy Baele*

Artificial Intelligence Lab, VUB: Pieter Libin, Ann Nowe

EuResist network and the European HIV coreceptor study panel (eucohiv)

VIROGENESIS receives funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No 634650

slide-23
SLIDE 23

Demo