Evolutionary Analysis From trees to networks Dr. Taoyang Wu School - PowerPoint PPT Presentation

Degrees
◮ $G_{\mathrm{NNI}}(n)$ is regular with degree $2(n-3)$ (Robinson 1971).
◮ $G_{\mathrm{SPR}}(n)$ is regular with degree $2(n-3)(2n-7)$ (Allen & Steel 2001).
◮ $G_{\mathrm{TBR}}(n)$ is not regular; the maximal degree is attained by caterpillar trees (Humphries 2008).
Figure: A caterpillar tree.
T. Wu Evolutionary Analysis

Our result
Theorem (Humphries-W, TCBB 2013). For each vertex $T \in \mathcal{T}^*_n$ with $n \ge 3$, its degree in $G_{\mathrm{TBR}}(n)$ is
$$4\Gamma(T) - (8n^2 - 18n + 6),$$
with
$$\Gamma(T) := \sum_{\{u,v\} \subseteq L(T)} \mathrm{dist}_T(u,v)$$
denoting the sum of the distances between all pairs of leaves of $T$.
For the vertices in $G_{\mathrm{TBR}}(n)$:
◮ Maximal degree: caterpillar trees
◮ Minimal degree: semi-regular trees (see also [Szekely-Wang-W, DM 2011])
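The degree formula can be evaluated directly once $\Gamma(T)$ is known. The following is a minimal sketch, assuming that $T$ is an unrooted binary tree stored as an adjacency dictionary and that distances are counted in edges; the function names `leaf_distance_sum` and `tbr_degree` are illustrative and do not come from the talk.

```python
from collections import deque

def leaf_distance_sum(adj):
    """Gamma(T): sum of path lengths (in edges) over all unordered leaf pairs."""
    leaves = [v for v, nbrs in adj.items() if len(nbrs) == 1]
    total = 0
    for source in leaves:
        # Breadth-first search gives the distance from this leaf to every vertex.
        dist = {source: 0}
        queue = deque([source])
        while queue:
            v = queue.popleft()
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
        total += sum(dist[leaf] for leaf in leaves)
    return total // 2  # every unordered pair was counted twice

def tbr_degree(adj):
    """Degree of T in the TBR graph via 4*Gamma(T) - (8n^2 - 18n + 6)."""
    n = sum(1 for nbrs in adj.values() if len(nbrs) == 1)
    return 4 * leaf_distance_sum(adj) - (8 * n * n - 18 * n + 6)

# The quartet tree ab|cd: internal vertices u and v joined by one edge.
quartet = {
    "a": ["u"], "b": ["u"], "c": ["v"], "d": ["v"],
    "u": ["a", "b", "v"], "v": ["c", "d", "u"],
}
print(leaf_distance_sum(quartet), tbr_degree(quartet))  # prints: 16 2
```

For four leaves there are only three unrooted binary trees, so a degree of 2 means each tree is adjacent to both of the others, which also agrees with the NNI and SPR degree formulas at $n = 4$.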

A key lemma
Lemma. For two "distinct" TBR operations $\theta$ and $\theta'$, $\theta(T) = \theta'(T)$ implies that both $\theta$ and $\theta'$ are NNI operations.
Note: Here two TBR operations are distinct if
◮ they delete different edges in the bisection step, or
◮ they use different edges in the reconnection step.

The PDA model
◮ The number of trees in $\mathcal{T}_n$ is
$$\varphi(n) := (2n-3)!! = 1 \cdot 3 \cdots (2n-3).$$
◮ Under the proportional to distinguishable arrangements (PDA) model, each tree has the same probability of being generated, that is,
$$P_u(T) = \frac{1}{\varphi(n)} \qquad (1)$$
for every $T$ in $\mathcal{T}_n$.
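As a small worked illustration of equation (1), the sketch below computes $\varphi(n)$ as a product of odd numbers and the resulting uniform probability. The helper names `num_trees` and `pda_probability` are hypothetical.

```python
from fractions import Fraction

def num_trees(n):
    """phi(n) = (2n-3)!! = 1 * 3 * ... * (2n-3), for n >= 2."""
    result = 1
    for odd in range(1, 2 * n - 2, 2):   # odd numbers 1, 3, ..., 2n-3
        result *= odd
    return result

def pda_probability(n):
    """P_u(T) for any single tree T on n leaves under the PDA model."""
    return Fraction(1, num_trees(n))

for n in range(2, 7):
    print(n, num_trees(n), pda_probability(n))
```

The count grows very quickly (1, 3, 15, 105, 945 for $n = 2, \dots, 6$), which is one reason exhaustive enumeration of tree space is impractical beyond small $n$.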

The YHK model
Under the Yule–Harding model [Yule 1925, Harding 1971]:
◮ Beginning with a two-leafed tree, we "grow" it by repeatedly splitting a leaf into two new leaves.
◮ The leaf to split is chosen uniformly at random among all leaves of the current tree.
◮ After obtaining an unlabeled tree with $n$ leaves, we label its leaves with labels sampled uniformly at random (without replacement) from $\{1, \dots, n\}$.
When branch lengths are ignored, the Yule–Harding model is shown [Aldous 1996] to be equivalent to the trees generated by Kingman's coalescent process, and so we call it the YHK model.
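The three steps above can be sketched as a short simulation. This is only an illustration under assumed conventions: the tree is stored as a dictionary from internal node ids to pairs of child ids, and the names `sample_yhk_tree` and `count_cherries` are hypothetical.

```python
import random

def sample_yhk_tree(n, rng):
    """Grow a rooted binary tree with n >= 2 leaves by uniform leaf splitting."""
    children = {0: (1, 2)}                  # internal node id -> (child, child)
    leaves = [1, 2]                         # start from the two-leafed tree
    next_id = 3
    while len(leaves) < n:
        i = rng.randrange(len(leaves))      # uniform choice of the leaf to split
        split = leaves[i]
        children[split] = (next_id, next_id + 1)
        leaves[i] = next_id                 # the split leaf is replaced by two new leaves
        leaves.append(next_id + 1)
        next_id += 2
    labels = list(range(1, n + 1))
    rng.shuffle(labels)                     # uniform labelling without replacement
    return children, dict(zip(leaves, labels))

def count_cherries(children, leaf_labels):
    """Number of internal nodes whose two children are both leaves."""
    return sum(1 for left, right in children.values()
               if left in leaf_labels and right in leaf_labels)

rng = random.Random(2016)
children, leaf_labels = sample_yhk_tree(8, rng)
print(len(leaf_labels), count_cherries(children, leaf_labels))
```

Passing an explicit `random.Random` instance keeps the sketch reproducible without touching the global random state.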

Subtree Pattern
◮ Cherry: a subtree with two leaves
◮ Pitchfork: a subtree with three leaves
Figure: A tree with three cherries and one pitchfork.

Subtree Pattern II
Given a phylogenetic tree $T$, let
◮ $A(T)$: the number of pitchforks;
◮ $C(T)$: the number of cherries.
For $n \ge 2$, consider the random variables
◮ $A_n$: the number of pitchforks in a random tree;
◮ $C_n$: the number of cherries in a random tree.
What are the joint distributions of $A_n$ and $C_n$?

Joint distributions: formulae
Theorem (W-Choi, 2016). For $n > 3$ and $1 < b < n$, we have
$$
\begin{aligned}
P_y(A_{n+1}=a,\, C_{n+1}=b) ={}& \frac{2a}{n}\, P_y(A_n=a,\, C_n=b) + \frac{a+1}{n}\, P_y(A_n=a+1,\, C_n=b-1) \\
&+ \frac{2(b-a+1)}{n}\, P_y(A_n=a-1,\, C_n=b) + \frac{n-a-2b+2}{n}\, P_y(A_n=a,\, C_n=b-1).
\end{aligned}
$$
Note: A similar formula holds for the PDA model.
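The recurrence lends itself to a dynamic-programming computation. The sketch below is not the authors' code: it pushes probability forward from $n$ to $n+1$, reading the four terms as the four kinds of leaf that can be split (an interpretation that is not stated on the slide), and it assumes rooted binary trees with the base case $(A_3, C_3) = (1, 1)$. The name `joint_distribution` is hypothetical.

```python
from collections import defaultdict
from fractions import Fraction

def joint_distribution(n_target):
    """Return {(a, b): P_y(A_n = a, C_n = b)} for n = n_target >= 3."""
    # Assumed base case: a rooted binary tree on 3 leaves is one pitchfork
    # containing one cherry, so (A_3, C_3) = (1, 1) with probability 1.
    dist = {(1, 1): Fraction(1)}
    for n in range(3, n_target):
        nxt = defaultdict(Fraction)
        for (a, b), p in dist.items():
            # Forward form of the recurrence: one of the n leaves is split.
            nxt[(a, b)] += p * Fraction(2 * a, n)              # cherry leaf inside a pitchfork
            nxt[(a - 1, b + 1)] += p * Fraction(a, n)          # third leaf of a pitchfork
            nxt[(a + 1, b)] += p * Fraction(2 * (b - a), n)    # leaf of a cherry outside any pitchfork
            nxt[(a, b + 1)] += p * Fraction(n - a - 2 * b, n)  # any other leaf
        # Drop states that received zero probability.
        dist = {state: p for state, p in nxt.items() if p > 0}
    return dist

print(joint_distribution(4))
```

For $n = 4$ this gives probability $2/3$ for $(a, b) = (1, 1)$, the caterpillar shape, and $1/3$ for $(0, 2)$, the balanced shape. Exact fractions are used so that the values can be compared with the closed-form recurrence without rounding error.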

Statistical properties
◮ A dynamic approach to computing the joint distributions.
◮ A unified approach to calculating the moments of the joint (and the marginal) distributions.
◮ The cherry distributions are log-concave. That is, for $n > 2$ and $1 < k < n$, we have
$$P_y(C_n = k)^2 \ge P_y(C_n = k+1)\, P_y(C_n = k-1).$$
◮ There exists a unique change point for the cherry distributions between the YHK and the PDA models.
◮ Similar results hold for clade sizes and clan sizes [Zhu-Than-W, 2015].
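The log-concavity statement can be checked numerically for small $n$. The sketch below assumes a marginal recurrence for the cherry count under the YHK model, obtained by summing the joint recurrence over the pitchfork count: splitting one of the $2k$ cherry leaves keeps the count at $k$, and splitting any other leaf raises it to $k+1$. This derivation and the names `cherry_distribution` and `is_log_concave` are assumptions, not taken from the slides.

```python
from fractions import Fraction

def cherry_distribution(n_target):
    """Return [P_y(C_n = k) for k = 0..n_target], for n_target >= 3."""
    probs = {1: Fraction(1)}                     # assumed base case: C_3 = 1
    for n in range(3, n_target):
        nxt = {}
        for k, p in probs.items():
            # Splitting a leaf of a cherry leaves the number of cherries unchanged.
            nxt[k] = nxt.get(k, 0) + p * Fraction(2 * k, n)
            # Splitting any other leaf creates one new cherry.
            nxt[k + 1] = nxt.get(k + 1, 0) + p * Fraction(n - 2 * k, n)
        probs = nxt
    return [probs.get(k, Fraction(0)) for k in range(n_target + 1)]

def is_log_concave(seq):
    """True if seq[k]^2 >= seq[k-1] * seq[k+1] for every interior index k."""
    return all(seq[k] ** 2 >= seq[k - 1] * seq[k + 1]
               for k in range(1, len(seq) - 1))

print(all(is_log_concave(cherry_distribution(n)) for n in range(3, 16)))
```

A finite check of this kind only illustrates the inequality; it is no substitute for the proof behind the stated result.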

Part III: Phylogenetic Networks

The tangled tree of life

From trees to networks
Phylogenetic trees are useful, but networks provide a better tool for studying
◮ conflicting signals
◮ recombination
◮ gene flow
◮ hybridization
◮ horizontal gene transfer
◮ ...

Phylogenetic Networks: Unrooted
Figure: A phylogenetic tree and network relating 15 plant species from the genus Solanum; from [Bastkowski-Moulton-Spillner-Wu, 2015, Bull. Math. Biol.]

Network thinking: pedigree
Figure: A partial pedigree of Prince Charles; from [Gusfield, 2014].

Recombination
Figure: A history with recombination; from [Gusfield, 2014].

Phylogenetic Networks
A (rooted) phylogenetic network:
◮ a directed acyclic graph
◮ a unique root
◮ leaves are labelled by taxa
◮ no vertex with one parent and one child
◮ binary
A central problem: How to reconstruct phylogenetic networks?
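The listed conditions can be turned into a simple validity check. The sketch below makes an assumption about what "binary" means, since the slide does not spell it out: the root has two children, every other internal vertex has either one parent and two children or two parents and one child, and every leaf has exactly one parent. The function name and the edge-list input format are illustrative.

```python
from collections import defaultdict, deque

def is_binary_phylogenetic_network(edges, taxa):
    """Check the listed conditions for a digraph given as (parent, child) pairs."""
    children, parents = defaultdict(list), defaultdict(list)
    for u, v in edges:
        children[u].append(v)
        parents[v].append(u)
    nodes = set(children) | set(parents)

    roots = [v for v in nodes if len(parents[v]) == 0]
    if len(roots) != 1:
        return False                                 # a unique root
    root = roots[0]

    leaves = {v for v in nodes if len(children[v]) == 0}
    if leaves != set(taxa):
        return False                                 # leaves are labelled by taxa
    if any(len(parents[v]) != 1 for v in leaves):
        return False                                 # assumed: each leaf has one parent

    for v in nodes - leaves:
        degrees = (len(parents[v]), len(children[v]))
        allowed = {(0, 2)} if v == root else {(1, 2), (2, 1)}
        if degrees not in allowed:
            return False                             # binary, no one-parent-one-child vertex

    # Kahn's algorithm: all vertices get removed exactly when the graph is acyclic.
    indegree = {v: len(parents[v]) for v in nodes}
    queue, removed = deque([root]), 0
    while queue:
        v = queue.popleft()
        removed += 1
        for w in children[v]:
            indegree[w] -= 1
            if indegree[w] == 0:
                queue.append(w)
    return removed == len(nodes)

# A network on taxa a, b, c with one reticulation vertex h.
network = [("r", "x"), ("r", "y"), ("x", "a"), ("x", "h"),
           ("y", "h"), ("y", "c"), ("h", "b")]
print(is_binary_phylogenetic_network(network, "abc"))
```

Checking acyclicity last keeps the cheap degree tests first; the degree conditions alone do not rule out directed cycles, so both parts are needed.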

Assembling trees: Supertree
Figure: Input trees.
◮ A tree is encoded by its subtrees on three leaves.
◮ A polynomial-time algorithm to assemble trees [Aho et al. 1981].
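The assembly of a tree from three-leaf subtrees can be illustrated with a compact recursive procedure in the spirit of Aho et al., often called BUILD in the literature. This is a sketch under assumptions rather than a transcription of the original algorithm: a rooted triple $ab|c$ is written as the hypothetical tuple `((a, b), c)`, the output tree is a nested tuple, and `clusters` is a helper added only to compare trees independently of child order.

```python
def build(leaves, triples):
    """Return a nested-tuple tree displaying all triples ((a, b), c), or None."""
    leaves = set(leaves)
    if len(leaves) == 1:
        return next(iter(leaves))
    # Union-find over the leaves: join a and b for every triple ab|c
    # whose three leaves all lie in the current leaf set.
    parent = {x: x for x in leaves}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    relevant = [t for t in triples if {t[0][0], t[0][1], t[1]} <= leaves]
    for (a, b), _ in relevant:
        parent[find(a)] = find(b)
    blocks = {}
    for x in leaves:
        blocks.setdefault(find(x), set()).add(x)
    if len(blocks) == 1:
        return None                    # no single tree displays all the triples
    subtrees = []
    for block in blocks.values():
        sub = build(block, relevant)   # recurse on each connected block
        if sub is None:
            return None
        subtrees.append(sub)
    return tuple(subtrees)

def clusters(tree):
    """Leaf sets of all subtrees, as a set of frozensets."""
    if not isinstance(tree, tuple):
        return {frozenset([tree])}
    found, leaves = set(), set()
    for child in tree:
        sub = clusters(child)
        found |= sub
        leaves |= max(sub, key=len)    # the largest cluster of a subtree is its leaf set
    found.add(frozenset(leaves))
    return found

tree = build("abcd", [(("a", "b"), "c"), (("c", "d"), "a")])
print(sorted("".join(sorted(c)) for c in clusters(tree)))
```

When two triples conflict, for example $ab|c$ together with $ac|b$, all leaves fall into one block and `build` returns `None`.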

A Quiz!
Question: Are networks encoded by their trees?
Figure: A network $N$ and two trees $T_1$, $T_2$, each with root $\rho$ and leaves $a$, $b$, $c$.

Answer
Question: Are networks encoded by their trees?
Figure: Two networks $N$ and $N'$ and two trees $T_1$, $T_2$, each with root $\rho$ and leaves $a$, $b$, $c$.
Answer: No.

Another quiz!
Question: Are networks encoded by their subnetworks?
Figure: An example of a subnetwork.

A nontrivial answer
Theorem (Huber-Iersel-Moulton-Wu, 2015, Syst. Biol.). For every $n \ge 3$, there exist two non-isomorphic phylogenetic networks $N_1$ and $N_2$ with $n$ leaves such that they display the same set of subnetworks (and the same set of trees).
Figure: Two networks on the leaves $a$, $b$, $c$, $d$.

Level-1 networks
In [Huber-Moulton, 2013, Algorithmica], it is shown that level-1 networks are encoded by their subnetworks.
Figure: A level-1 network $N$ on the leaves $a, \dots, j$; level-1 = all undirected cycles are disjoint.

Trinets
Figure: Eight types of level-1 networks on three leaves: $T_1(x,y;z)$, $N_1(x,y;z)$, $N_2(x,y;z)$, $S_1(x,y;z)$, $N_3(x;y;z)$, $N_4(x;y;z)$, $N_5(x;y;z)$, $S_2(x;y;z)$.

Assembling Trinets
Input: A collection of trinets.
Task:
(1) Decide whether there exists a binary level-1 phylogenetic network displaying the collection of trinets.
(2) Construct such a network if it exists.
Figure: Input trinets.

Incomplete data
In [Huber-Iersel-Moulton-Scornavacca-Wu, in revision for Algorithmica], we show that when some trinets are missing,
◮ the trinet assembling problem is NP-hard;
◮ it can be solved by an $O(3^n\, \mathrm{poly}(n))$ algorithm.
Question: How about 'real data' (often noisy and containing conflicting signals)?

Trilonet
Figure: A schematic view of the Trinet-based Level One Network reconstructor (Trilonet): from an alignment on $X = \{a, \dots, j\}$, via a dense set of trinets and the identification of a suitable subset of taxa, to a network $N$; from [Oldman*-Wu*-Iersel-Moulton, in revision for MBE].

Trilonet: a case study
Figure: The inferred phylogeny of 7 Giardia strains by Trilonet; data from [Cooper et al., Curr. Biol., 2007]. Leaf labels: Giardia_lamblia_ATCC_50803_WB, Giardia_intestinalis_isolate_246, Giardia_intestinalis_isolate_303, Giardia_intestinalis_isolate_305, Giardia_intestinalis_isolate_55, Giardia_intestinalis_isolate_JH, Giardia_intestinalis_isolate_335; the figure also carries the label #H1.

Trilonet
Trilonet is an algorithm for inferring level-1 networks:
◮ Constructs a network directly from sequence data (without using breakpoints or gene trees).
◮ Efficient, and robust for noisy data.
◮ Implemented in Java, and will be available at https://www.uea.ac.uk/computing/trilonet
◮ Consistent.
Future improvements include
◮ level-k networks
◮ statistical consistency

Part IV: Future Directions

Network models and inference
More realistic models:
◮ Superimposing molecular evolutionary models on edges
◮ Quantifying the contribution made by reticulate processes
Reconstructing networks:
◮ Rigorous statistical frameworks (Maximum Likelihood or Bayesian)
◮ Accounting for non-tree-like patterns resulting from
  ◮ Sequencing errors (e.g. SNP calling)
  ◮ Incomplete Lineage Sorting (see, e.g., Yu et al. 2014 PNAS)
◮ Efficient algorithms for searching the network space

Space of phylogenetic networks
Figure: Space of level-1 networks with four taxa; from [Huber-Linz-Moulton-Wu, J. Math. Biol., 2016]

Network operation
Figure: A generalisation of the NNI operation on networks; panels (i) and (ii) show $T$, $T'$, $N$ and $N'$.

Evolutionary Analysis: From trees to networks
Dr. Taoyang Wu, School of Computing Sciences, University of East Anglia
Shanghai Jiao Tong University, August 2016
Research interests Discrete Mathematics
