The status of cophylogenetic analysis Michael Charleston University - - PowerPoint PPT Presentation



slide-1
SLIDE 1

The status of cophylogenetic analysis

Michael Charleston

University of Sydney

Phylomania 2010.11.04-05

MAC (USyd) The status of cophylogenetic analysis Phylomania 1 / 50

slide-2
SLIDE 2

Part I Background


slide-3
SLIDE 3

Some motivation

[Plot: number of papers (vertical axis, 5 to 25) against year (horizontal axis, 1985 to 2005)]

Figure 1: Numbers of papers cited in PubMed with co[-](speciat|diverg)* in the title or abstract

About 75% of emergent human diseases are zoonoses (SARS, HIV, Ebola, H1N1, . . . ). Understanding where an organism came from (e.g., invading pests) can tell us how better to combat it.


slide-4
SLIDE 4

Different systems can coevolve at the macroscopic level

hosts and their parasites or pathogens;

whole organisms and their genes;

geographical areas and the species which inhabit them.

[Figure labels: codivergence/cospeciation; duplication/independent speciation; loss/extinction; horizontal transfer/host switch; vicariant speciation; invasion]


slide-5
SLIDE 5

Introduction

The goal is to determine, for two groups of ecologically linked taxa, what evolutionary paths they took with respect to each other. We aim to answer questions like:

How long is the association between host and parasite?

Did they cospeciate?

Were there host switches or lateral gene transfers?

What kind of risk of cross-infection does this pathogen present to its sister species?


slide-6
SLIDE 6

Problem Instance

Given

a host phylogeny H,

an associate phylogeny P, and

known associations ϕ of the tips of P with those of H,

we call a problem instance a tanglegram, T = (H, P, ϕ). The object is to find out the ancestral relationships between P and H. This mostly comes down to an optimization problem.
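A tanglegram T = (H, P, ϕ) can be sketched as a small data structure. This is only an illustration: the class names, the child-to-parent encoding of the trees and the validate method are assumptions made for the example, not part of any cophylogeny package.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tree:
    """Rooted tree stored as a child -> parent map; the root maps to None."""
    parent: dict

    def tips(self):
        # A tip is any node that never appears as somebody's parent.
        internal = {p for p in self.parent.values() if p is not None}
        return set(self.parent) - internal


@dataclass(frozen=True)
class Tanglegram:
    """T = (H, P, phi): host tree, associate tree, tip associations."""
    host: Tree
    parasite: Tree
    phi: dict  # parasite tip -> host tip (hypothetical encoding)

    def validate(self):
        # phi must be defined on exactly the tips of P ...
        if set(self.phi) != self.parasite.tips():
            raise ValueError("phi must map every tip of P")
        # ... and must land on tips of H.
        if not set(self.phi.values()) <= self.host.tips():
            raise ValueError("phi must map onto tips of H")
        return True


H = Tree({"h0": None, "a": "h0", "b": "h0"})
P = Tree({"p0": None, "p": "p0", "q": "p0"})
T = Tanglegram(H, P, {"p": "a", "q": "b"})
```

Storing ϕ as a dictionary restricts each parasite tip to a single host tip; multi-host parasites would need a tip-to-set mapping instead.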


slide-7
SLIDE 7

Coevolutionary events

[Figure: host and pathogen trees with labelled events: codivergence, duplication, host switch, unsuccessful host switch, extinction, "miss the boat", failure to diverge, and an untraceable ghost lineage]


slide-8
SLIDE 8


Definition

A codivergence event occurs when internal vertices p ∈ V (P) and h ∈ V (H) are coincident, and the children of p diversify on the children of h.


slide-9
SLIDE 9


Definition

A duplication occurs when p is associated with an arc of H rather than a vertex; this corresponds to a speciation or divergence of p that is independent of a divergence event in the host.


slide-10
SLIDE 10


Definition

A host switch occurs for some arc (p, q) ∈ A(P) where p is associated with a location in H that is contemporary with, but not ancestral to, the location in H with which q is associated.


slide-11
SLIDE 11


Definition

A loss occurs as the result of one of three things, which are indistinguishable: extinction of some p, failure to track both hosts after a host divergence event (“missing the boat”) and simple failure to sample the pathogen p.


slide-12
SLIDE 12

What we can recover

Ronquist confirmed in 2002[9] that these are the only four types of recoverable event for this problem:

1 codivergence,

2 duplication,

3 loss, and

4 host switching.

All methods (attempt to) recover codivergence, but not all can recover host switching. Some only recover codivergence, duplication and loss. We would also like to recover failure-to-diverge events, where a parasite of a speciating host continues to parasitize it, without divergence.


slide-13
SLIDE 13

Event costs

We can assign a cost to each event type, subject to simple constraints:[2]

the biological rule, c < d, ℓ, w (for codivergence, duplication, loss and host switch respectively), and

the pragmatic rule, 0 ≤ c, d, w, ℓ (which allows a dynamic program to solve the optimization problem).
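The two rules and the resulting total cost can be written out as a short sketch. The class and function names are invented for illustration, the biological rule is read here as "c is strictly smaller than each of d, ℓ and w", and the cost values and event counts in the example are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EventCosts:
    c: float  # codivergence
    d: float  # duplication
    w: float  # host switch
    l: float  # loss (written ℓ on the slide)

    def satisfies_rules(self):
        # Biological rule (as read here): c < d, c < l and c < w.
        biological = self.c < min(self.d, self.l, self.w)
        # Pragmatic rule: no event cost is negative.
        pragmatic = min(self.c, self.d, self.w, self.l) >= 0
        return biological and pragmatic


def total_cost(costs, n_codiv, n_dup, n_switch, n_loss):
    """Cost of a reconstruction: each event count weighted by its cost."""
    return (costs.c * n_codiv + costs.d * n_dup
            + costs.w * n_switch + costs.l * n_loss)


# Hypothetical cost scheme and event counts, for illustration only.
costs = EventCosts(c=0, d=1, w=1, l=2)
example_total = total_cost(costs, n_codiv=2, n_dup=1, n_switch=1, n_loss=3)
# 0*2 + 1*1 + 1*1 + 2*3 = 8
```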


slide-14
SLIDE 14

Event costs

Jane 1 & 2 use dynamic programming to minimise total cost, with event costs prescribed.

Event          Symbol   Jane cost   TreeMap cost
Cospeciation   c
Duplication    d        1           1
Host switch    w        1           1
Loss/sorting   ℓ        2           1

TreeMap puts default costs on events but does not normally use them: it finds a Pareto front of solutions, and has worst-case exponential running time. Jane 1 uses an O(n^7) algorithm to find minimal cost reconstructions; Jane 2 has an algorithm down to O(n^3). The penalty in O(n^7) → O(n^3) is a loss of approximately 0.1% in performance, which is definitely acceptable.


slide-15
SLIDE 15

Part II Everything Is Against Us


slide-16
SLIDE 16

Myzled by the implickit paradidgem

[Figure: tanglegram of gophers and their lice with associations; p < 0.01; much apparent congruence & codivergence]

Fahrenholz's Rule[5], that “parasite phylogeny mirrors host phylogeny,” has misled us. The classic cophylogeny example of gophers and lice (left) is wonderful, but in fact such cases are rare: more and more studies are showing lack of evidence for codivergence, despite similarity.


slide-17
SLIDE 17

This is more common

Many studies look for congruence with inappropriate tools

[Figure: tanglegram of host and parasite trees with associations]

Program crashes: ∴ problem with program


slide-18
SLIDE 18

Empirical evidence of complexity

[Plot: # maps (log scale, 1 to 10000) against # taxa (2 to 7); series: POpt, Feasible]

The number of feasible maps increases rapidly for even modest numbers of taxa. The number of maps in the Pareto front – those which could be optimal for some scheme of event costs – also increases quickly.
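As a rough illustration of what a Pareto front of maps means, the sketch below keeps only the maps whose event counts are not dominated by another map. It uses plain non-domination as a working definition, which is an assumption of the example and not a description of how TreeMap computes its front; the event-count vectors are hypothetical.

```python
def dominates(a, b):
    """True if map a is no worse than b in every event count and
    strictly better in at least one."""
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))


def pareto_front(maps):
    """Keep the maps that no other map dominates."""
    return [m for m in maps if not any(dominates(o, m) for o in maps)]


# Hypothetical maps as (duplications, host switches, losses).
candidate_maps = [(3, 0, 4), (2, 1, 2), (1, 2, 1), (2, 2, 3)]
front = pareto_front(candidate_maps)
# (2, 2, 3) is dropped because (2, 1, 2) is at least as good everywhere.
```

A dominated map can never be cheapest under non-negative event costs, which is why only the front needs to be reported.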

from Charleston 2003[3]


slide-19
SLIDE 19

Empirical evidence of complexity

[Plot: log-scale axis (1 to 10000) against degree of fit (min. NCEs, 2 to 14); series: n = 2, n = 3, n = 4, n = 5, n = 6, n = 7]

The number of feasible maps is also highly correlated with the degree of incongruence.

from Charleston 2003[3]


slide-20
SLIDE 20

Cophylogeny mapping is NPC

We begin with the Generalized Cophylogeny Reconstruction Problem (Gcrp). This is a 6-tuple (H = (VH, EH), P = (VP , EP ), tH, tP , ϕ, κ) where H is the host network, P the parasite network, tH and tP are timing functions for H and P that map each vertex to a set of permitted times, ϕ is defined as before, and κ is a 4-tuple cost vector κ = (c, d, w, ℓ) for codivergence, duplication, host switch and loss respectively. The objective is to find a mapping Φ : P → H that extends ϕ, can be constructed using the usual events with respect to the timing functions, and is of minimum total cost.


slide-21
SLIDE 21

Gcrp

Theorem

Gcrp is solvable in polynomial time for the set of instances (H = (VH, EH), P = (VP, EP), tH, tP, ϕ, κ) such that (i) P is a tree and (ii) for all v ∈ VH, |tH(v)| = 1.

(Proof is by construction of a polynomial time algorithm for this case using a dynamic program. See Libeskind-Hadas & Charleston[7] for details.)


slide-22
SLIDE 22

Gcrdp

We first define the Generalized Cophylogeny Reconstruction Decision Problem (Gcrdp) as follows: Instance: Given (H = (VH, EH), P = (VP , EP ), tH, tP , ϕ, κ) and a cost K. Question: Does there exist a reconstruction whose cost is K or less?

Theorem

The decision problem associated with Gcrp is NP-complete for the set of instances (H = (VH, EH), P = (VP, EP), tH, tP, ϕ, κ) such that (i) P is a tree and (ii) for all v ∈ VH, |tH(v)| ≤ 2.

(Proof is by reduction from 3-Sat: see Libeskind-Hadas & Charleston[7] for details.)


slide-23
SLIDE 23

Cophylogeny mapping is NP-Complete

Ovadia et al. define the Cophylogeny Reconstruction Decision Problem as follows:

Definition

An instance of the Cophylogeny Reconstruction Decision Problem (Crdp) is a 4-tuple (H, P, ϕ, B) where H and P are rooted host and parasite trees, ϕ : L(P) → L(H) maps the tips of P to the tips of H, and B is a 4-tuple (BC, BD, BS, BL) of upper bounds on the number of cospeciation, duplication, host switch and loss events respectively. The decision question is: Does there exist a mapping Φ that extends ϕ and whose cost is strictly less than B?


slide-24
SLIDE 24

Cophylogeny mapping is NP-Complete

Theorem

The Crdp is NP-complete.

(Proof is by showing that a related problem, where ψ : L(P) → 2^L(H), is NP-complete by a reduction from 3-Sat; then such an instance can be transformed into a corresponding instance of Crdp with the same answer, thus implying Crdp is NP-complete. See Ovadia et al.[8] for details.)


slide-25
SLIDE 25

Cophylogeny is probably in APX-Hard

A related problem is the Lateral Gene Transfer Problem, LGTP.

Definition

A lateral transfer scheme for a species tree S is a pair (S′, A′) where S′ is a subdivision of S and A′ ⊆ {(x, y) : x, y ∈ V (S′) \ V (S), x ≠ y} such that

1 the mixed graph S′ ∪ ε(A′) does not contain a directed mixed cycle;

2 the tail of each arc in A′ has in-degree 1 and out-degree 2 in S′ ∪ A′;

3 the head of each arc in A′ has in-degree 2 and out-degree 1 in S′ ∪ A′.

The lateral transfer scheme indicates where gene transfers could have occurred in the species tree.

A scheme is called “α-active” if there are at most α gene copies on any one lineage of the species tree at any one time.


slide-26
SLIDE 26

Cophylogeny is probably hard to approximate

Sensu DasGupta et al. (2005), the LGT problem for a species tree S and gene tree G is to find a gene transfer scheme that has minimal cost.

Theorem

There does not exist a polynomial time approximation scheme for the 1-active LGT problem with performance guarantee of 1 + ε where ε ≥ 3/370024.

Theorem

There does not exist a polynomial time approximation scheme for the α-active LGT problem with performance guarantee of 1 + ε where ε ≥ 3/378068 with α ≥ 1.

(See DasGupta et al. 2005[4] for details.)

This suggests that the cophylogeny reconstruction problem probably doesn’t have a polynomial time approximation scheme (PTAS), i.e. that it is APX-Hard.


slide-27
SLIDE 27

Part III Method Comparison


slide-28
SLIDE 28

Parsimony/event-based methods

Brooks came up with an early solution, called Brooks’ Parsimony Analysis (BPA)[1]. BPA recodes the known associations ϕ and the host and parasite/pathogen trees H, P as binary characters and then puts them into a parsimony-based tree reconstruction method.


slide-29
SLIDE 29

BPA method

Brooks’ Parsimony Analysis works by

1 assigning IDs to internal nodes of both P and H, and

2 using those to create an “alignment” of binary characters,

3 which are then used by a parsimony program to find the most parsimonious assignment of states to the internal branches of H.

4 The pattern of these states is then interpreted to determine what (co)evolutionary events must have taken place.


slide-30
SLIDE 30

Generating Unique Paths

Algorithm 1: BPAPathAssign (T)

    /* T is a bifurcating tree */
    let n = |L(T)|
    let k = |V (T)|
    without loss of generality let the nodes be labelled v1 . . . vk
    let S be a list of binary strings (S1, . . . , Sk), initially all 0’s
    for each (vi ∈ L(T)) do {
        for (j = 1 up to k) do {
            if (vj is on the path from vi to the root) then {
                Si,j ← 1
            }
        }
    }
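The path-assignment step can be sketched in Python as follows. This is an illustrative reading of Algorithm 1, not the implementation used for the talk: the child-to-parent tree encoding, the alphabetical node labelling and the function name are assumptions, and the leaf itself is counted as lying on its own path to the root.

```python
def bpa_path_assign(parent):
    """Give each leaf a binary string with a 1 at position j whenever
    node v_j lies on the path from that leaf to the root.

    parent: dict mapping each node to its parent (the root maps to None).
    """
    nodes = sorted(parent)                       # fixed labelling v_1..v_k
    index = {v: j for j, v in enumerate(nodes)}
    internal = {p for p in parent.values() if p is not None}
    leaves = [v for v in nodes if v not in internal]

    codes = {}
    for leaf in leaves:
        bits = ["0"] * len(nodes)
        v = leaf
        # Walk up to the root instead of testing every node j in turn.
        while v is not None:
            bits[index[v]] = "1"
            v = parent[v]
        codes[leaf] = "".join(bits)
    return codes


# The tree ((a,b)x,c)r, with nodes ordered a, b, c, r, x.
example_tree = {"a": "x", "b": "x", "c": "r", "x": "r", "r": None}
example_codes = bpa_path_assign(example_tree)
# {'a': '10011', 'b': '01011', 'c': '00110'}
```

Walking from each leaf to the root produces the same strings as the double loop in the pseudocode while visiting only the nodes on the path.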


slide-31
SLIDE 31

BPA implementation

1 There is no freely available implementation of Brooks’ Parsimony Analysis (BPA) method, or Secondary BPA (SBPA) which was proposed later as a fix for some of the issues with BPA.

2 An implementation was created to obey Brooks’ descriptions in the literature (there is no pseudocode description of BPA/SBPA in the literature either).

3 BPA works by assigning IDs to nodes in both H and P, based on their position in the tree. This is from tip to root, and takes in worst case O(n^2) runtime.

4 In order to increase efficiency we¹ implemented a randomized version of SBPA, Randomized SBPA, with hoped-for runtime of Θ(n log n).

¹ i.e., Ben

slide-32
SLIDE 32

Encoding order matters

Code   SBPA                    Randomized SBPA
1      100001001  100000011    100001001  100001001
2      010000111  010001011    010001001  010000011
3      001000111  001001011    010000011  010001001
4      000100011  000100101    000100111  000100111
5      000011001  000010101    000010111  000010111

These sequences can be² interpreted post-hoc to infer the coevolutionary events.

² = have to be

slide-33
SLIDE 33

BPA Fail

[Figure: tanglegram of H (tips a to e) and P (tips p to t) with associations, node IDs 1 to 9 assigned in one order]

SBPA suggests 1 codivergence, 3 duplications and 3 host switches.

[Figure: the same tanglegram with node IDs 1 to 9 assigned in a different order]

SBPA suggests 2 codivergences, 2 duplications and 2 host switches.


slide-34
SLIDE 34

ParaFit⁴

ParaFit[6, 10] works by converting both trees to distance matrices.

Algorithm 2: ParaFit

    let T = (H, P, ϕ) be a tanglegram
    A ← the associations ϕ expressed as a matrix
    B ← the principal coordinates matrix of P
    C ← the principal coordinates matrix of H
    D ← C Aᵀ B
    M ← the block matrix [A B; C D]

The test statistic is trace(DᵀD) = Σ d²ij. Its significance is assessed by recalculating D using the original B and C matrices while randomizing the associations represented by A³.

³ NB: in AxParaFit the random number seed is the same each time!

⁴ And its faster version, AxParaFit[10]
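The statistic and its permutation test can be sketched in plain Python. This is a simplified illustration and not the ParaFit or AxParaFit code: it assumes the principal coordinate matrices B and C have already been computed, that A has parasites as rows and hosts as columns, and that randomization shuffles the entries within each row of A; all function names are made up for the example.

```python
import random


def transpose(X):
    return [list(col) for col in zip(*X)]


def matmul(X, Y):
    """Plain matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)]
            for row in X]


def parafit_statistic(A, B, C):
    """trace(D^T D) with D = C A^T B, i.e. the sum of squared entries of D."""
    D = matmul(matmul(C, transpose(A)), B)
    return sum(d * d for row in D for d in row)


def permutation_p_value(A, B, C, n_perm=999, seed=None):
    """Share of randomized association matrices scoring at least as high
    as the observed one (the observed matrix is counted as one draw)."""
    rng = random.Random(seed)
    observed = parafit_statistic(A, B, C)
    hits = 0
    for _ in range(n_perm):
        shuffled = [rng.sample(row, len(row)) for row in A]
        if parafit_statistic(shuffled, B, C) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


# Tiny hypothetical instance: 2 parasites, 2 hosts, one coordinate axis each.
A = [[1, 0], [0, 1]]   # parasite i lives on host i
B = [[1.0], [-1.0]]    # parasite coordinates (parasites x axes)
C = [[1.0, -1.0]]      # host coordinates (axes x hosts)
stat = parafit_statistic(A, B, C)   # D = [[2.0]], so the statistic is 4.0
p = permutation_p_value(A, B, C, n_perm=99, seed=1)
```

Passing a different seed for each run avoids repeating the same sequence of randomizations.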

slide-35
SLIDE 35

ParaFit confidence

[Plot: AxParaFit confidence; % confidence (50 to 100) against ntax (5 to 20)]

Confidence is expressed as a percentage, as 100% × (1 − p).

250 tanglegrams were created at random for each trial.

red: random [Figure: tanglegram Host1 / associations / Para1 with unrelated trees]

blue: identical [Figure: tanglegram Host1 / associations / Para1 with identical trees]


slide-36
SLIDE 36

Part IV Beetles, Bugs & Butterflies


slide-37
SLIDE 37

The beetles: Arthropoda. The bugs: Wolbachia bacteria

Data: A huge sample with hundreds of arthropods and the Wolbachia infecting them, data collected by Patricia Simões in Tahiti, Moorea and Raiatea.

[Map: Raiatea, Tahiti, Moorea]


slide-38
SLIDE 38

Wolbachia on Moorea

[Figure: tanglegram Host_Moorea / associations / Wolbachia_Moorea; several hundred host tips (sp…), many without an associated Wolbachia tip (wm…)]

Moorea (original)


slide-39
SLIDE 39

Wolbachia on Moorea

With missing associations removed it’s not much better:

[Figure: tanglegram Host_Moorea / associations / Wolbachia_Moorea, restricted to host tips (sp…) with an associated Wolbachia tip (wm…)]

Moorea (cleaned)


slide-40
SLIDE 40

Wolbachia on Raiatea

[Figure: tanglegram Host_Raiatea / associations / Wolbachia_Raiatea; host tips sp…, Wolbachia tips wr…]

Raiatea


slide-41
SLIDE 41

Wolbachia on Tahiti

[Figure: tanglegram Host_Tahiti / associations / Wolbachia_Tahiti; host tips sp…, Wolbachia tips wt…]

Tahiti


slide-42
SLIDE 42

The butterflies: Heliconius

Heliconius butterflies have a complex mimic/model system. Here we show two clades of the Heliconius genus, being different races of the target/model species erato and the mimic species melpomene.

[Figure: tanglegram of erato tips (era…) and melpomene tips (mel…) with associations]

Tip labels such as eraEasety11 encode: species (erato or melpomene), geographical region, race (based on pattern), clade number.

We can identify the (race × region) tips with almost no difficulty (these result in two different resolutions) into mimicry complexes: just the race and the region matter.
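The label scheme lends itself to a small parser. The sketch below assumes that every label is made of three letters for the species, three for the region and three for the race, followed by digits for the clade number; that fixed-width reading is inferred from the example eraEasety11 and is an assumption, as are the function names.

```python
import re

# Assumed layout: species (era/mel), region (e.g. Eas), race (e.g. ety), digits.
LABEL = re.compile(r"^(era|mel)([A-Z][a-z]{2})([a-z]{3})(\d+)$")


def parse_tip(label):
    """Split a tip label into its parts; return None if it does not fit."""
    match = LABEL.match(label)
    if match is None:
        return None
    species, region, race, clade = match.groups()
    return {"species": species, "region": region,
            "race": race, "clade": clade}


def complex_key(label):
    """Drop the trailing digits, leaving the species/region/race tip."""
    parsed = parse_tip(label)
    if parsed is None:
        return None
    return parsed["species"] + parsed["region"] + parsed["race"]


parsed_example = parse_tip("eraEasety11")
collapsed = sorted({complex_key(t) for t in
                    ["melPanros21", "melPanros41", "eraPanpet11"]})
# collapsed == ['eraPanpet', 'melPanros']
```

Labels that do not follow the assumed pattern, such as p26, come back as None and would need to be handled by hand.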


slide-43
SLIDE 43

Heliconius mimicry complexes

eraFreera melFrethe eraPanpet melPanros eraEasety p-26 eraColhyd melColmel eraPerfav melPerama eraBraphy melBranan eraPanhyd melPanmel eraFrehyd melFremel eraPeremm melPeragl eraCospet melCosros eraWescyr melWescyt eraTrihyd melTrimel


slide-50
SLIDE 50

Possible histories

[Figure: reconstruction with erato tips: eraPanhyd eraFrehyd eraEasety eraPeremm eraCospet eraPanpet eraTrihyd eraColhyd eraFreera eraWescyr eraBraphy eraPerfav]

map 1/4-31.204105meliconia->erato

14 Codivergences, 7 Duplications, 1 Host switch, 13 Losses

slide-51
SLIDE 51

Possible histories

[Figure: reconstruction with erato tips: eraPanhyd eraFrehyd eraEasety eraPeremm eraCospet eraPanpet eraTrihyd eraColhyd eraFreera eraWescyr eraBraphy eraPerfav]

map 2/4-31.204105meliconia->erato

14 Codivergences, 7 Duplications, 1 Host switch, 13 Losses

slide-52
SLIDE 52

Possible histories

[Figure: reconstruction with erato tips: eraPanhyd eraFrehyd eraEasety eraPeremm eraCospet eraPanpet eraTrihyd eraColhyd eraFreera eraWescyr eraBraphy eraPerfav]

map 3/4-29.781637meliconia->erato

16 Codivergences, 6 Duplications, 13 Losses

slide-53
SLIDE 53

Possible histories

[Figure: reconstruction with erato tips: eraPanhyd eraFrehyd eraEasety eraPeremm eraCospet eraPanpet eraTrihyd eraColhyd eraFreera eraWescyr eraBraphy eraPerfav]

map 4/4-31.096415meliconia->erato

12 Codivergences, 8 Duplications, 2 Host switches, 13 Losses

slide-54
SLIDE 54

Congruence found!

Randomizing both P and ϕ in Jane 2 we can estimate significance. All 1000 randomizations of P have higher cost than the original: p < ≈0.0005 in both cases.


slide-55
SLIDE 55

Part V Closing


slide-56
SLIDE 56

Summary

There often isn’t the kind of quality of data that we initially hoped for.

Codivergence is not the norm.

There are more algorithms emerging, and more programs becoming available; these will need thorough testing.

There has been some interesting progress in understanding the complexity of the cophylogeny mapping problem.

There remain many challenging open questions in biology, mathematics & statistics, and computer science in this area.

Thanks!


slide-57
SLIDE 57

Thanks to. . .

Jennifer Hoyal Cuthill (mimicry analysis)

Patricia Simões (Wolbachia)

INRIA (France)

eee

Ben Drinkwater (simulations)

Australian Research Council

$$$


slide-58
SLIDE 58

References I

[1] D. R. Brooks. How to do BPA, really. Journal of Biogeography, 28:345–358, 2001.

[2] M. A. Charleston. Jungles: a new solution to the host/parasite phylogeny reconciliation problem. Mathematical Biosciences, 149(2):191–223, May 1998.

[3] M. A. Charleston. Recent results in cophylogeny mapping, volume 54 of Advances in Parasitology, pages 303–330. Elsevier Academic Press, Amsterdam, 2003.

[4] B. DasGupta, S. Ferrarini, U. Gopalakrishnan, and N. R. Paryani. Inapproximability results for the lateral gene transfer problem. In ICTCS, pages 182–195, 2005.

[5] H. Fahrenholz. Ectoparasiten und Abstammungslehre. Zoologischer Anzeiger, 41:371–374, 1913.

[6] P. Legendre, Y. Desdevises, and E. Bazin. A statistical test for host-parasite coevolution. Systematic Biology, 51(2):217–234, 2002.

[7] R. Libeskind-Hadas and M. Charleston. On the computational complexity of the reticulate cophylogeny reconstruction problem. Journal of Computational Biology, 16(1):105–117, 2009. doi:10.1089/cmb.2008.0084.

slide-59
SLIDE 59

References II

[8] Y. Ovadia, D. Fielder, C. Conow, and R. Libeskind-Hadas. The cophylogeny reconstruction problem is NP-complete. Journal of Computational Biology, 2010. doi:10.1089/cmb.2009.0240.

[9] F. Ronquist. Parsimony analysis of coevolving species associations. In R. D. M. Page, editor, Tangled trees: phylogeny, cospeciation, and coevolution, pages 22–64. Chicago University Press, Chicago, 2002.

[10] A. Stamatakis, A. F. Auch, J. Meier-Kolthoff, and M. Göker. AxPcoords & parallel AxParafit: statistical co-phylogenetic analyses on thousands of taxa. BMC Bioinformatics, 8:405, 2007.

slide-60
SLIDE 60

Work underway

Implementing the discrete likelihood model in TreeMap∗

Implementing the optimization problem as a Linear Program/SAT

∗ and other methods


slide-61
SLIDE 61

Desirable

Faster heuristics for optimizing solutions

Consensus methods


slide-62
SLIDE 62

Open questions

Is there a bigger ε such that the LGT problem can be solved to within 1 + ε in polynomial time?

What to do about multi-host parasites?
