SLIDE 1

Evolutionary Analysis

From trees to networks

  • Dr. Taoyang Wu

School of Computing Sciences, University of East Anglia

Shanghai Jiao Tong University August 2016

  • T. Wu

Evolutionary Analysis

SLIDE 2

Research interests

◮ Discrete Mathematics
  ◮ Optimal realisations (DM 2012 & 2015, DAM 2013)
  ◮ Trees and graphs (JDA 2009, DM 2011, SIAM DM 2014)
  ◮ Distance problems for permutation groups (DM 2009 & 2010)
◮ Real-world networks: protein-protein interaction (PPI)
  ◮ Inferring PPI evolution (TCBB 2013)
  ◮ Modelling PPI networks (TCS 2013)
◮ Phylogenetics
  ◮ Tree space (TCBB 2013, BMB 2014, AiAM 2015)
  ◮ Tree reconciliation (BMC Bioinfor. 2011, COCOA 2013)
  ◮ Tree shape statistics (TPB 2016; JMB 2015)
  ◮ Phylogenetic networks (SB 2015; MBE 2016; JMB 2016; Algorithmica 2016)

SLIDE 4

Outline

1 Introduction
2 Phylogenetic Trees
  ◮ Tree inference
  ◮ Combinatorial properties
  ◮ Statistical properties
3 Phylogenetic Networks
  ◮ Information bottleneck
  ◮ Network reconstruction

SLIDE 5

Part I: Introduction

SLIDE 6

Phylogenetic Trees

Figure: Examples of Phylogenetic Trees

SLIDE 7

The Tree of Architecture

Figure: From A History of Architecture on the Comparative Method for the Student, Craftsman, and Amateur, 1954.

SLIDE 8

The Tree of Languages

Figure: Part of the Tree of Languages; from the internet.

SLIDE 9

Tree of Flowering Plants

Figure: Molecular phylogeny for 31,749 species of seed plants; from [Zanne et al, Nature, 2014].

SLIDE 10

Deep evolution of N2-fixation

Figure: Angiosperm phylogeny of 3,467 species; from [Werner et al, Nature Communications, 2014].

SLIDE 11

Part II: Phylogenetic Trees

SLIDE 14

Phylogenetic trees

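◮ Tree T = (V, E): a connected, acyclic graph.
◮ Semi-labelled tree: a tree whose leaves are labelled.
◮ Phylogenetic tree: a binary semi-labelled tree (every interior vertex has degree 3, except the root of a rooted tree, which has degree 2).
◮ Rooted vs unrooted.
◮ Motivation: evolutionary relationships in biology, etc.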

Figure: A rooted phylogenetic tree (left) and an unrooted phylogenetic tree (right).

SLIDE 17

Tree Space

Definition

$\mathcal{T}_n$ and $\mathcal{T}^*_n$: the collections of rooted and unrooted phylogenetic trees, respectively, with leaf set $\{1, \ldots, n\}$. Example:

Figure: A glimpse at the tree space

In general, we have (Schröder, 1870)

$$|\mathcal{T}^*_n| = 1 \times 3 \times \cdots \times (2n-5) \quad (n \geq 3) \qquad \text{and} \qquad |\mathcal{T}_n| = 1 \times 3 \times \cdots \times (2n-3) \quad (n \geq 2).$$
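The counts can be checked with a small sketch. Assuming an ad hoc encoding (each tree is a frozenset of undirected edges, leaves are labelled 1..n and interior vertices are negative integers; all function names are illustrative), the code grows trees by stepwise leaf addition: leaf k can be attached to any of the 2k − 5 edges of a tree on k − 1 leaves, which is where the product 1 × 3 × ⋯ × (2n − 5) comes from. It then compares split sets to confirm that no tree is produced twice.

```python
# Sketch: enumerate unrooted binary phylogenetic trees by stepwise leaf addition.
# Assumed encoding (not from the slides): a tree is a frozenset of undirected
# edges; leaves are the labels 1..n and interior vertices are negative integers.


def double_factorial(k: int) -> int:
    """Return k!! = k * (k - 2) * ... (and 1 for k <= 1)."""
    result = 1
    while k > 1:
        result *= k
        k -= 2
    return result


def enumerate_unrooted_trees(n: int) -> list:
    """All unrooted binary trees with leaf set {1, ..., n}, for n >= 3."""
    # The only tree on three leaves: a star centred at interior vertex -1.
    trees = [frozenset({frozenset({-1, 1}), frozenset({-1, 2}), frozenset({-1, 3})})]
    for leaf in range(4, n + 1):
        hub = -(leaf - 2)  # fresh interior vertex created in this step
        grown = []
        for tree in trees:
            # Subdivide each of the 2*leaf - 5 edges and hang the new leaf there.
            for edge in tree:
                u, v = tuple(edge)
                grown.append((tree - {edge}) | {frozenset({u, hub}),
                                                frozenset({hub, v}),
                                                frozenset({hub, leaf})})
        trees = grown
    return trees


def splits(tree: frozenset) -> frozenset:
    """Canonical form: the leaf bipartitions induced by the edges of the tree."""
    adj = {}
    for edge in tree:
        u, v = tuple(edge)
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    leaves = frozenset(x for x in adj if x > 0)
    result = set()
    for edge in tree:
        u, v = tuple(edge)
        # Leaves reachable from v once the edge {u, v} is removed.
        side, stack, seen = set(), [v], {u, v}
        while stack:
            x = stack.pop()
            if x > 0:
                side.add(x)
            for y in adj[x]:
                if y not in seen:
                    seen.add(y)
                    stack.append(y)
        # Record each split by the side that does not contain leaf 1.
        result.add(leaves - side if 1 in side else frozenset(side))
    return frozenset(result)


for n in range(3, 8):
    trees = enumerate_unrooted_trees(n)
    distinct = {splits(t) for t in trees}
    print(n, len(trees), len(distinct), double_factorial(2 * n - 5))
```

Attaching one extra leaf at the root turns a rooted tree on n leaves into an unrooted tree on n + 1 leaves (and vice versa), which is why the rooted count is (2n − 3)!!.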

SLIDE 21

A Central Challenge

Given a dataset on a taxon set X, find an optimal phylogenetic tree to explain the evolutionary relationships among the taxa in X.

Data

◮ Morphological data (traits)
◮ Genetic data (molecular sequences)

Criteria

◮ Maximum Parsimony: minimises the total number of character-state changes.
◮ Maximum Likelihood: maximises the likelihood of the data under a model of evolution.
◮ Minimum Evolution: minimises the sum of the edge lengths.
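To make the Maximum Parsimony criterion concrete, here is a minimal sketch of Fitch's algorithm, which computes the parsimony score of one fixed rooted binary tree (a standard technique swapped in for illustration, not something taken from these slides). Trees are encoded as nested 2-tuples and the alignment as a dict; both encodings and the toy data are hypothetical. The hard part of Maximum Parsimony is the search over all trees, not this scoring step.

```python
# Sketch of Fitch's small-parsimony algorithm: the minimum number of
# character-state changes on a FIXED rooted binary tree. Trees are nested
# 2-tuples of taxon names (an assumed, illustrative encoding).


def fitch(tree, states):
    """Return (candidate root states, number of changes) for one character."""
    if isinstance(tree, str):          # a leaf: its observed state, no changes
        return {states[tree]}, 0
    left, right = tree
    left_set, left_cost = fitch(left, states)
    right_set, right_cost = fitch(right, states)
    common = left_set & right_set
    if common:                          # children can agree: no extra change
        return common, left_cost + right_cost
    return left_set | right_set, left_cost + right_cost + 1


def parsimony_score(tree, alignment):
    """Sum the Fitch scores over all columns of an alignment {taxon: sequence}."""
    length = len(next(iter(alignment.values())))
    return sum(
        fitch(tree, {taxon: seq[i] for taxon, seq in alignment.items()})[1]
        for i in range(length)
    )


alignment = {"a": "AAG", "b": "AAT", "c": "GCT", "d": "GCT"}
t1 = (("a", "b"), ("c", "d"))
t2 = (("a", "c"), ("b", "d"))
print(parsimony_score(t1, alignment), parsimony_score(t2, alignment))  # 3 5
```

On this toy alignment t1 needs 3 changes and t2 needs 5, so the parsimony criterion prefers t1.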

SLIDE 23

Computational Challenges

The tree inference problem is NP-hard for

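◮ Maximum Parsimony [Foulds-Graham 1982]
◮ Maximum Likelihood [Chor-Tuller 2005]
◮ Minimum Evolution [Bastkowski-Moulton-Spillner-W 2016]

The search space (i.e., tree space) is large and complicated: by Schröder's formula, the number of unrooted trees on n leaves, (2n − 5)!!, grows super-exponentially in n.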

SLIDE 25

Tree operations

Motivation:

  • Measuring the similarity between two trees;
  • Local search moves in heuristic algorithms.

Three operations:

◮ NNI (Nearest neighbour interchange)
◮ SPR (Subtree prune and regraft)
◮ TBR (Tree bisection and reconnection)

SLIDE 26

NNI

Nearest neighbour interchange (NNI)

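Figure: A schematic representation of the NNI operation on four subtrees A, B, C, D

An NNI operation picks an interior edge and exchanges two of the subtrees attached at its ends, one from each side.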

SLIDE 27

SPR

Subtree prune and regraft (SPR)

Figure: A schematic representation of the SPR operation: a subtree (attached at u) is pruned by deleting the edge e and regrafted onto the remaining tree by inserting a new edge f.

Note: All degree-two vertices are suppressed.

SLIDE 31

TBR

Tree bisection and reconnection (TBR)

Figure: A schematic representation of the TBR operation

A TBR operation consists of two steps:

◮ Bisection: delete an edge e, splitting the tree into two subtrees;
◮ Reconnection: insert a new edge f that reconnects the two subtrees.

Note: All degree-two vertices are suppressed.

SLIDE 34

TBR Graphs

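$G_{\mathrm{TBR}}(n) = (V_n, E_n)$ with

◮ $V_n$: the trees in $\mathcal{T}^*_n$;
◮ $E_n$: two trees $T_1$ and $T_2$ are adjacent if there exists a TBR operation $\theta$ such that $T_1 = \theta(T_2)$.

Similarly, we can define $G_{\mathrm{NNI}}(n)$ and $G_{\mathrm{SPR}}(n)$. Note that all three operations are symmetric (if $T_2$ can be transformed into $T_1$, then $T_1$ can be transformed back into $T_2$ by an operation of the same type), so these graphs are undirected, and

$$\mathrm{NNI} \subseteq \mathrm{SPR} \subseteq \mathrm{TBR},$$

that is, every NNI operation is an SPR operation, and every SPR operation is a TBR operation.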
SLIDE 38

Degrees

◮ $G_{\mathrm{NNI}}(n)$ is regular with degree $2(n-3)$ (Robinson 1971);
◮ $G_{\mathrm{SPR}}(n)$ is regular with degree $2(n-3)(2n-7)$ (Allen & Steel 2001);
◮ $G_{\mathrm{TBR}}(n)$ is not regular; the maximal degree is attained by caterpillar trees (Humphries 2008).

Figure: A caterpillar tree
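The NNI degree 2(n − 3) can be checked numerically. A sketch, assuming an adjacency-dict encoding (leaves 1..n, interior vertices negative; all function names are hypothetical): each of the n − 3 interior edges yields two NNI neighbours, and comparing split sets confirms that they are all distinct, both for caterpillars and for the non-caterpillar ("snowflake") tree on six leaves.

```python
# Sketch: generate the NNI neighbours of an unrooted binary tree and count the
# distinct ones. Assumed encoding (illustrative only): an adjacency dict in which
# leaves are the labels 1..n and interior vertices are negative integers.


def from_edges(edges):
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def caterpillar_edges(n):
    """Edges of the caterpillar tree on leaves 1..n (n >= 4)."""
    spine = [-i for i in range(1, n - 1)]  # the n - 2 interior vertices, as a path
    edges = [(spine[0], 1), (spine[0], 2), (spine[-1], n - 1), (spine[-1], n)]
    edges += [(spine[i], i + 2) for i in range(1, n - 3)]
    edges += list(zip(spine, spine[1:]))
    return edges


def splits(adj):
    """Canonical form: the set of leaf bipartitions induced by the edges."""
    leaves = frozenset(x for x in adj if x > 0)
    result = set()
    for u in adj:
        for v in adj[u]:
            side, stack, seen = set(), [v], {u, v}
            while stack:
                x = stack.pop()
                if x > 0:
                    side.add(x)
                for y in adj[x]:
                    if y not in seen:
                        seen.add(y)
                        stack.append(y)
            result.add(leaves - side if 1 in side else frozenset(side))
    return frozenset(result)


def nni_neighbours(adj):
    """For each interior edge {u, v}, swap one subtree at u with each subtree at v."""
    result = []
    for u in adj:
        for v in adj[u]:
            if u < 0 and v < 0 and u < v:       # visit each interior edge once
                b = min(adj[u] - {v})           # one fixed subtree hanging off u
                for c in adj[v] - {u}:          # two choices -> two neighbours
                    new = {x: set(ys) for x, ys in adj.items()}
                    new[u] = (new[u] - {b}) | {c}
                    new[v] = (new[v] - {c}) | {b}
                    new[b] = (new[b] - {u}) | {v}
                    new[c] = (new[c] - {v}) | {u}
                    result.append(new)
    return result


def nni_degree(adj):
    original = splits(adj)
    return len({splits(t) for t in nni_neighbours(adj)} - {original})


snowflake = from_edges([(-1, -2), (-1, -3), (-1, -4), (-2, 1), (-2, 2),
                        (-3, 3), (-3, 4), (-4, 5), (-4, 6)])
for n in range(4, 9):
    print(n, nni_degree(from_edges(caterpillar_edges(n))), 2 * (n - 3))
print(nni_degree(snowflake))  # 6, i.e. 2(6 - 3)
```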
SLIDE 41

Our result

Theorem (Humphries-W, TCBB 2013)

For each vertex $T \in \mathcal{T}^*_n$ with $n \geq 3$, its degree in $G_{\mathrm{TBR}}(n)$ is

$$4\,\Gamma(T) - (8n^2 - 18n + 6), \qquad \text{with} \quad \Gamma(T) := \sum_{\{u,v\} \subseteq L(T)} \mathrm{dist}_T(u, v)$$

denoting the sum of the distances between all pairs of leaves of $T$. For the vertices in $G_{\mathrm{TBR}}(n)$:

◮ Maximal degree: caterpillar trees
◮ Minimal degree: semi-regular trees (see also [Szekely-Wang-W, DM 2011])
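The theorem turns a TBR degree into a distance computation. The sketch below (a hypothetical adjacency-dict encoding with leaves 1..n and negative interior vertices; names are illustrative) computes Γ(T) by breadth-first search from every leaf and plugs it into 4Γ(T) − (8n² − 18n + 6). For n = 4 it returns 2, matching the fact that each of the three quartet trees is adjacent to the other two.

```python
# Sketch: evaluate the TBR degree formula 4*Gamma(T) - (8n^2 - 18n + 6).
# Trees use an assumed adjacency-dict encoding (leaves 1..n, interior
# vertices negative); Gamma(T) is computed by BFS from every leaf.
from collections import deque


def from_edges(edges):
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def gamma(adj):
    """Sum of path lengths (in edges) over all unordered pairs of leaves."""
    leaves = sorted(x for x in adj if x > 0)
    total = 0
    for source in leaves:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            x = queue.popleft()
            for y in adj[x]:
                if y not in dist:
                    dist[y] = dist[x] + 1
                    queue.append(y)
        total += sum(dist[t] for t in leaves if t > source)  # each pair once
    return total


def tbr_degree(adj):
    n = sum(1 for x in adj if x > 0)
    return 4 * gamma(adj) - (8 * n * n - 18 * n + 6)


quartet = from_edges([(-1, 1), (-1, 2), (-1, -2), (-2, 3), (-2, 4)])
caterpillar6 = from_edges([(-1, 1), (-1, 2), (-1, -2), (-2, 3), (-2, -3),
                           (-3, 4), (-3, -4), (-4, 5), (-4, 6)])
snowflake6 = from_edges([(-1, -2), (-1, -3), (-1, -4), (-2, 1), (-2, 2),
                         (-3, 3), (-3, 4), (-4, 5), (-4, 6)])
print(tbr_degree(quartet))       # 2
print(tbr_degree(caterpillar6))  # 34 (Gamma = 55)
print(tbr_degree(snowflake6))    # 30 (Gamma = 54)
```

For n = 6 the caterpillar and the snowflake are the only two tree shapes, and the caterpillar indeed has the larger TBR degree.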

SLIDE 45

A key lemma

Lemma

For two “distinct” TBR operations θ and θ′, θ(T) = θ′(T) implies that both θ and θ′ are NNI operations.

Note: Here two TBR operations are distinct if

◮ they delete different edges in the bisection step, or
◮ they use different edges in the reconnection step.

SLIDE 47

The PDA model

◮ The number of trees in $\mathcal{T}_n$ is

$$\varphi(n) := (2n-3)!! = 1 \cdot 3 \cdots (2n-3).$$

◮ Under the proportional to distinguishable arrangements (PDA) model, each tree is generated with the same probability, that is,

$$P_u(T) = \frac{1}{\varphi(n)} \qquad (1)$$

for every $T$ in $\mathcal{T}_n$.
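One standard way to sample from the PDA model (stepwise addition, shown as an illustration rather than as the method of the talk) attaches leaf k to one of the 2k − 3 edges of the current rooted tree, counting an extra edge above the root, chosen uniformly at random. Each tree in T_n arises from exactly one sequence of choices, so every tree has probability 1/ϕ(n). The encoding and all names below are hypothetical.

```python
# Sketch: sampling from the PDA model by stepwise addition (illustrative encoding:
# leaves 1..n, interior vertices negative; names are hypothetical).
import random
from collections import Counter


def phi(n):
    """phi(n) = (2n - 3)!! = 1 * 3 * ... * (2n - 3)."""
    result = 1
    for k in range(1, 2 * n - 2, 2):
        result *= k
    return result


def sample_pda(n, rng):
    """Return (children, root) for a uniformly random tree in T_n (n >= 2)."""
    children = {-1: [1, 2]}
    parent = {1: -1, 2: -1}
    root = -1
    for leaf in range(3, n + 1):
        # Every non-root vertex names the edge above it; `root` stands for a new
        # edge above the current root: 2 * leaf - 3 equally likely choices.
        target = rng.choice(list(parent) + [root])
        new = -(leaf - 1)
        children[new] = [target, leaf]
        parent[leaf] = new
        if target == root:
            root = new
        else:
            p = parent[target]
            children[p][children[p].index(target)] = new
            parent[new] = p
        parent[target] = new
    return children, root


def newick(v, children):
    """Canonical Newick-like string (children sorted), equal for equal trees."""
    if v not in children:
        return str(v)
    return "(" + ",".join(sorted(newick(c, children) for c in children[v])) + ")"


rng = random.Random(1)
counts = Counter(newick(r, ch) for ch, r in
                 (sample_pda(4, rng) for _ in range(15000)))
print(len(counts), phi(4))                         # distinct trees seen vs. phi(4)
print(min(counts.values()), max(counts.values()))  # each should be near 1000
```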

SLIDE 51

The YHK model

Under the Yule–Harding model [Yule 1925, Harding 1971],

◮ Beginning with a two-leaf tree, we “grow” it by repeatedly splitting a leaf into two new leaves.
◮ The leaf to be split is chosen uniformly at random among all leaves of the current tree.
◮ After obtaining an unlabelled tree with n leaves, we label its leaves with labels sampled uniformly at random (without replacement) from {1, …, n}.

When branch lengths are ignored, the Yule–Harding model was shown [Aldous, 1996] to induce the same distribution on trees as Kingman’s coalescent process, and so we call it the YHK model.
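The three steps of the YHK model translate almost line by line into code. In the sketch below (hypothetical names; vertices are integer ids, and an interior vertex is simply a former leaf that has been split), the simulated mean number of cherries for n = 10 can be compared with the YHK expectation n/3, which also follows from the joint-distribution recurrence given later.

```python
# Sketch: sample a tree under the Yule-Harding (YHK) model as described:
# grow from two leaves by splitting a uniformly chosen leaf, then assign the
# labels 1..n uniformly at random. Vertex ids are an illustrative encoding.
import random


def sample_yhk(n, rng):
    """Return (children, labels): children maps interior ids to their two children."""
    children = {0: [1, 2]}
    leaves = [1, 2]
    next_id = 3
    while len(leaves) < n:
        # Choose the leaf to split uniformly among the current leaves.
        x = leaves.pop(rng.randrange(len(leaves)))
        children[x] = [next_id, next_id + 1]
        leaves += [next_id, next_id + 1]
        next_id += 2
    # Label the leaves by sampling from {1, ..., n} without replacement.
    labels = dict(zip(leaves, rng.sample(range(1, n + 1), n)))
    return children, labels


def cherries(children):
    """Number of interior vertices whose two children are both leaves."""
    return sum(all(c not in children for c in kids) for kids in children.values())


rng = random.Random(7)
samples = [sample_yhk(10, rng) for _ in range(20000)]
mean = sum(cherries(ch) for ch, _ in samples) / len(samples)
print(round(mean, 3), round(10 / 3, 3))  # simulated mean vs. n / 3
```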

SLIDE 53

Subtree Pattern

◮ Cherry: a subtree with two leaves
◮ Pitchfork: a subtree with three leaves

Figure: A tree with three cherries and one pitchfork.

SLIDE 56

Subtree Pattern II

Given a phylogenetic tree T, let

◮ A(T): the number of pitchforks in T;
◮ C(T): the number of cherries in T.

For n ≥ 2, consider the random variables

◮ $A_n$: the number of pitchforks in a random tree with n leaves;
◮ $C_n$: the number of cherries in a random tree with n leaves.

What is the joint distribution of $A_n$ and $C_n$ (under the YHK and PDA models)?

SLIDE 58

Joint distributions: formulae

Theorem (W-Choi, 2016)

For n > 3 and 1 < b < n, we have

$$\begin{aligned}
P_y(A_{n+1} = a, C_{n+1} = b) &= \frac{2a}{n}\, P_y(A_n = a, C_n = b) + \frac{a+1}{n}\, P_y(A_n = a+1, C_n = b-1) \\
&\quad + \frac{2(b-a+1)}{n}\, P_y(A_n = a-1, C_n = b) + \frac{n-a-2b+2}{n}\, P_y(A_n = a, C_n = b-1).
\end{aligned}$$

Note: A similar recurrence holds for the PDA model.
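The YHK recurrence can be run forward as a small dynamic program. The sketch below (hypothetical names) pushes probability from each state (a, b) at n leaves to the possible states at n + 1 leaves, according to which kind of leaf is split; collecting the incoming terms for a fixed (a, b) gives exactly the four terms of the theorem. Exact rational arithmetic via `fractions.Fraction` avoids rounding.

```python
# Sketch: the YHK joint distribution of (A_n, C_n) by dynamic programming,
# running the recurrence "forward". Names are illustrative.
from fractions import Fraction


def yhk_joint(n_max):
    """dist[n][(a, b)] = P_y(A_n = a, C_n = b) for 2 <= n <= n_max."""
    dist = {2: {(0, 1): Fraction(1)}}  # the two-leaf tree is a single cherry
    for n in range(2, n_max):
        nxt = {}
        for (a, b), p in dist[n].items():
            # The n leaves fall into four classes; splitting one moves (a, b) to:
            moves = [
                ((a, b), 2 * a),              # a leaf in the cherry of a pitchfork
                ((a - 1, b + 1), a),          # the third leaf of a pitchfork
                ((a + 1, b), 2 * (b - a)),    # a leaf in a cherry outside pitchforks
                ((a, b + 1), n - a - 2 * b),  # any other leaf
            ]
            for state, weight in moves:
                if weight > 0:
                    nxt[state] = nxt.get(state, 0) + p * Fraction(weight, n)
        dist[n + 1] = nxt
    return dist


dist = yhk_joint(8)
print(dist[4])  # {(1, 1): Fraction(2, 3), (0, 2): Fraction(1, 3)}
for n in range(3, 9):
    mean_a = sum(a * p for (a, _), p in dist[n].items())
    mean_c = sum(b * p for (_, b), p in dist[n].items())
    print(n, mean_a, mean_c)
```

Taking expectations in the recurrence gives E[C_n] = n/3 for n ≥ 3 and E[A_n] = n/6 for n ≥ 4, which the printed means reproduce; the same table can also be used to check log-concavity of the cherry distribution for small n.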

SLIDE 63

Statistical properties

◮ A dynamic approach to computing the joint distributions.
◮ A unified approach to calculating the moments of the joint (and the marginal) distributions.
◮ The cherry distributions are log-concave. That is, for n > 2 and 1 < k < n, we have

$$P_y(C_n = k)^2 \geq P_y(C_n = k+1)\, P_y(C_n = k-1).$$

◮ There exists a unique change point for the cherry distributions between the YHK and the PDA models.
◮ Similar results hold for clade sizes and clan sizes [Zhu-Than-W, 2015].

SLIDE 64

Part III: Phylogenetic Networks

SLIDE 65

The tangled tree of life

SLIDE 66

From trees to networks

Phylogenetic trees are useful, but networks provide a better tool for studying

◮ conflicting signals
◮ recombination
◮ gene flow
◮ hybridization
◮ horizontal gene transfer
◮ …

SLIDE 67

Phylogenetic Networks: Unrooted

Figure: A phylogenetic tree and a network relating 15 plant species from the genus Solanum; from [Bastkowski-Moulton-Spillner-Wu, 2015, Bull. Math. Biol.].
SLIDE 68

Network thinking: pedigree

Figure: A partial pedigree of Prince Charles; from [Gusfield, 2014].

SLIDE 69

Recombination

Figure: A history with recombination; from [Gusfield, 2014].

SLIDE 70

Phylogenetic Networks

A (rooted) phylogenetic network:

◮ a directed acyclic graph
◮ a unique root
◮ leaves are labelled by taxa
◮ no vertex with one parent and one child
◮ binary

A central problem: how can phylogenetic networks be reconstructed?
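The defining conditions can be checked mechanically. This sketch assumes a network is given as a dict from each vertex to its list of children, with leaves named by their taxa, and reads "binary" as: the root has out-degree 2, tree vertices have in-degree 1 and out-degree 2, reticulations have in-degree 2 and out-degree 1, and leaves have in-degree 1. Parallel edges are not handled, and the function and vertex names are hypothetical.

```python
# Sketch: check the listed conditions for a rooted binary phylogenetic network.
# Assumed encoding (illustrative): dict vertex -> list of children; leaves are
# named by their taxa. Parallel edges are not handled.
from collections import Counter


def network_problems(children, taxa):
    """Return the violated conditions; an empty list means all checks pass."""
    vertices = set(children) | {c for kids in children.values() for c in kids}
    indeg = Counter(c for kids in children.values() for c in kids)
    outdeg = {v: len(children.get(v, [])) for v in vertices}
    problems = []

    # Directed acyclic: Kahn's algorithm must be able to remove every vertex.
    remaining = {v: indeg[v] for v in vertices}
    queue = [v for v in vertices if remaining[v] == 0]
    removed = 0
    while queue:
        v = queue.pop()
        removed += 1
        for c in children.get(v, []):
            remaining[c] -= 1
            if remaining[c] == 0:
                queue.append(c)
    if removed != len(vertices):
        problems.append("not acyclic")

    if sum(1 for v in vertices if indeg[v] == 0) != 1:
        problems.append("no unique root")
    if {v for v in vertices if outdeg[v] == 0} != set(taxa):
        problems.append("leaves do not match the taxa")
    if any(indeg[v] == 1 and outdeg[v] == 1 for v in vertices):
        problems.append("vertex with one parent and one child")

    # Binary: root (0, 2), tree vertex (1, 2), reticulation (2, 1), leaf (1, 0).
    allowed = {(0, 2), (1, 2), (2, 1), (1, 0)}
    if any((indeg[v], outdeg[v]) not in allowed for v in vertices):
        problems.append("not binary")
    return problems


network = {"rho": ["u", "v"], "u": ["a", "r"], "v": ["r", "c"], "r": ["b"]}
print(network_problems(network, {"a", "b", "c"}))  # []
unsuppressed = {"p": ["u", "c"], "u": ["w"], "w": ["a", "b"]}
print(network_problems(unsuppressed, {"a", "b", "c"}))
# ['vertex with one parent and one child', 'not binary']
```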

SLIDE 74

Assembling trees: Supertree

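Figure: Input trees on subsets of the taxa a, b, c, d, e and a supertree on all five taxa.

◮ A rooted phylogenetic tree is encoded by its subtrees on three leaves (its rooted triplets).
◮ There is a polynomial-time algorithm to assemble trees, or to report that no supertree exists [Aho et al. 1981].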
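A compact sketch of the BUILD algorithm of Aho et al. (1981) for rooted triplets follows. It forms the Aho graph (join a and b whenever some triplet ab|c lies inside the current leaf set), splits the leaves along its connected components and recurses; a single component means the triplets are incompatible. The nested-tuple and frozenset encodings and all function names are assumptions for illustration, and the toy input trees are not those of the slide.

```python
# Sketch of the BUILD algorithm of Aho et al. (1981) for rooted triplets ab|c.
# Input trees are nested tuples of taxon names; BUILD returns nested frozensets
# (unordered, possibly non-binary) or None if the triplets are incompatible.
from itertools import combinations


def clusters(tree, out):
    """Append the leaf set of every interior vertex to `out`; return the leaf set."""
    if isinstance(tree, str):
        return frozenset({tree})
    leaves = frozenset().union(*(clusters(child, out) for child in tree))
    out.append(leaves)
    return leaves


def triplets(tree):
    """All triplets ab|c displayed by a rooted tree (a, b share a cluster without c)."""
    found = []
    leaves = clusters(tree, found)
    result = set()
    for x, y, z in combinations(sorted(leaves), 3):
        for a, b, c in ((x, y, z), (x, z, y), (y, z, x)):
            if any(a in s and b in s and c not in s for s in found):
                result.add((frozenset({a, b}), c))
    return result


def build(leaves, triples):
    leaves = frozenset(leaves)
    if len(leaves) == 1:
        return next(iter(leaves))
    # Aho graph: join a and b whenever a triplet ab|c lies entirely inside `leaves`.
    adj = {x: set() for x in leaves}
    for pair, c in triples:
        if pair <= leaves and c in leaves:
            a, b = tuple(pair)
            adj[a].add(b)
            adj[b].add(a)
    components, seen = [], set()
    for start in leaves:
        if start in seen:
            continue
        comp, stack = set(), [start]
        seen.add(start)
        while stack:
            x = stack.pop()
            comp.add(x)
            for y in adj[x] - seen:
                seen.add(y)
                stack.append(y)
        components.append(frozenset(comp))
    if len(components) == 1:
        return None  # no valid split at this level: incompatible triplets
    subtrees = []
    for comp in components:
        sub = build(comp, [(p, c) for p, c in triples if p <= comp and c in comp])
        if sub is None:
            return None
        subtrees.append(sub)
    return frozenset(subtrees)


def canonical(tree):
    return tree if isinstance(tree, str) else frozenset(canonical(t) for t in tree)


tree = ((("a", "b"), "c"), ("d", "e"))
print(build("abcde", triplets(tree)) == canonical(tree))  # True
# Two input trees on overlapping taxa: ((a,b),c) and ((c,d),e).
inputs = triplets((("a", "b"), "c")) | triplets((("c", "d"), "e"))
print(build("abcde", inputs))
```

The second call returns a non-binary supertree with root children {a, b}, {c, d} and e, which displays both input trees.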

SLIDE 76

A Quiz!

Question: Are networks encoded by their trees?

Figure: Two trees T1 and T2 and a network N, each with root ρ and leaves a, b, c.

SLIDE 77

Answer

Question: Are networks encoded by their trees?

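Figure: Networks N and N′ and trees T1 and T2, each with root ρ and leaves a, b, c.

Answer: No. Non-isomorphic networks (such as N and N′) can display the same set of trees.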
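The trees displayed by a network can be listed by choosing, for each reticulation, which incoming edge to keep. The sketch below (hypothetical encoding and names; trees are returned as nested frozensets) does this, pruning leafless branches and suppressing vertices left with a single child. Comparing these sets for two networks only tests whether they display the same trees, which, as the answer above says, does not determine the network.

```python
# Sketch: list the trees displayed by a rooted binary network. For every way of
# keeping exactly one incoming edge per reticulation, walk down from the root,
# prune dead ends and suppress vertices left with a single child. The
# dict-of-children encoding and the vertex names are illustrative assumptions.
from itertools import product


def walk(v, children, kept):
    """Tree below v after deleting unkept reticulation edges (None if no leaves)."""
    if not children.get(v):
        return v  # a labelled leaf
    subtrees = []
    for c in children[v]:
        if kept.get(c, v) != v:  # reticulation c keeps a different parent
            continue
        t = walk(c, children, kept)
        if t is not None:
            subtrees.append(t)
    if not subtrees:
        return None
    if len(subtrees) == 1:
        return subtrees[0]  # suppress a vertex with one child
    return frozenset(subtrees)


def displayed_trees(children):
    parents = {}
    for p, kids in children.items():
        for c in kids:
            parents.setdefault(c, []).append(p)
    root = next(v for v in children if v not in parents)
    reticulations = [v for v, ps in parents.items() if len(ps) > 1]
    trees = set()
    for choice in product(*(parents[r] for r in reticulations)):
        trees.add(walk(root, children, dict(zip(reticulations, choice))))
    return trees


# A network on a, b, c with one reticulation r (parents u and v).
network = {"rho": ["u", "v"], "u": ["a", "r"], "v": ["r", "c"], "r": ["b"]}
for t in displayed_trees(network):
    print(t)
```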

SLIDE 79

Another quiz!

Question: Are networks encoded by their subnetworks?

Figure: An example of a subnetwork.

SLIDE 81

A nontrivial answer

Theorem (Huber-Iersel-Moulton-Wu, 2015, Syst. Biol.)

For every n ≥ 3, there exist two non-isomorphic phylogenetic networks N1 and N2 with n leaves such that they display the same set of subnetworks (and the same set of trees).

Figure: An example of two such networks on leaves a, b, c, d.

SLIDE 82

Level-1 networks

In [Huber-Moulton, 2013, Algorithmica], it is shown that level-1 networks are encoded by their subnetworks.

Figure: A level-1 network N on leaves a, …, j (level-1: all undirected cycles are vertex-disjoint).

SLIDE 83

Trinets

Figure: Eight types of level-1 networks on three leaves: T1(x, y; z), S1(x, y; z), S2(x; y; z), N1(x, y; z), N2(x, y; z), N3(x; y; z), N4(x; y; z) and N5(x; y; z).

SLIDE 85

Assembling Trinets

Figure: A collection of input trinets.

Input: A collection of trinets.
Task: (1) Decide whether there exists a binary level-1 phylogenetic network displaying the collection of trinets; (2) construct such a network if it exists.

SLIDE 88

Incomplete data

In [Huber-Iersel-Moulton-Scornavacca-Wu, in revision for Algorithmica], we show that when some trinets are missing,

◮ the trinet assembling problem is NP-hard;
◮ it can be solved by an $O(3^n \mathrm{poly}(n))$ algorithm.

Question: What about 'real data' (often noisy and containing conflicting signals)?

SLIDE 89

Trilonet

Figure: A schematic view of Trilonet (Trinet-based Level-one Network reconstructor), from [Oldman∗-Wu∗-Iersel-Moulton, in revision for MBE]. The panels show an alignment on X = {a, …, j}, the step “identify a suitable subset of taxa”, a dense set of trinets, and the resulting level-1 network N.

SLIDE 90

Trilonet: a case study

Figure: The inferred phylogeny of 7 Giardia strains by Trilonet (leaves: Giardia lamblia ATCC 50803 WB and Giardia intestinalis isolates 246, 55, JH, 335, 303 and 305; reticulation node labelled #H1); data from [Cooper et al, Curr. Biol., 2007].

SLIDE 93

Trilonet

Trilonet is an algorithm for inferring level-1 networks:

◮ It constructs a network directly from sequence data (without using breakpoints or gene trees).
◮ It is efficient and robust to noisy data.
◮ It is implemented in Java and will be available at https://www.uea.ac.uk/computing/trilonet
◮ It is consistent.

Future improvements include

◮ level-k networks
◮ statistical consistency

SLIDE 94

Part IV: Future Directions

SLIDE 98

Network models and inference

More realistic models:

◮ Superimposing molecular evolutionary models on edges
◮ Quantifying the contribution made by reticulate processes

Reconstructing networks

◮ Rigorous statistical frameworks (Maximum Likelihood or Bayesian)
◮ Accounting for non-tree-like patterns resulting from
  ◮ sequencing errors (e.g. SNP calling)
  ◮ incomplete lineage sorting (see, e.g., Yu et al. 2014 PNAS)
◮ Efficient algorithms for searching the network space

SLIDE 99

Space of phylogenetic networks

Figure: Space of level-1 networks with four taxa; from [Huber-Linz-Moulton-Wu, J. Math. Biol., 2016]

SLIDE 100

Network operation

Figure: A generalisation of the NNI operation on networks (panels (i) and (ii): trees T, T′ and networks N, N′ on leaves A, B, C, D).

SLIDE 101

Space of phylogenetic networks: II

Figure: Space of networks with four taxa (panels labelled N0(X), N1(X) and N2(X)); from [Huber-Moulton-Wu, J. Theoretical Biol., in press]

SLIDE 104

Conclusion

◮ An exciting research field
  ◮ scientific curiosity
  ◮ practical impact
◮ A growing field of research
  ◮ new types of data (big data, time series)
  ◮ new applications (cancer evolution, cultural evolution)
◮ A genuinely multidisciplinary area
  ◮ mathematics (combinatorics, optimisation, probability, statistics)
  ◮ computer science (algorithms, data science)
  ◮ biology, etc.

SLIDE 108

Thanks

◮ Vincent Moulton and Katharina Huber (UEA)
◮ Mike Steel (Canterbury, NZ)
◮ Kwok Pui Choi (NUS)
◮ Leo van Iersel (TU Delft, the Netherlands), Céline Scornavacca (Montpellier, France), Simone Linz (Auckland, NZ), Andreas Spillner (Greifswald, Germany), Cuong Than (Tübingen, Germany), Joe Zhu (Oxford)
◮ Previous and current students: Sarah Bastkowski (TGAC) and James Oldman (UEA)
◮ Your attention!
