
The status of cophylogenetic analysis: Michael Charleston, University of Sydney


  1. The status of cophylogenetic analysis Michael Charleston University of Sydney Phylomania 2010.11.04-05 MAC (USyd) The status of cophylogenetic analysis Phylomania 1 / 50

  2. Part I: Background

  3. Some motivation
     About 75% of emergent human diseases are zoonoses (SARS, HIV, Ebola, H1N1, ...). Understanding where an organism came from (e.g., invading pests) can tell us how better to combat them.
     [Figure 1: Numbers of papers cited in PubMed with co[-](speciat|diverg)* in the title or abstract, plotted by year (1985 to 2005).]

  4. Different systems can coevolve at the macroscopic level
     Examples: hosts and their parasites or pathogens; whole organisms and their genes; geographical areas and the species which inhabit them.
     [Figure: event labels shown for these systems: horizontal transfer / host switch / invasion; codivergence / cospeciation / vicariant speciation; loss / extinction; duplication / independent speciation.]

  5. Introduction
     The goal is to determine, for two groups of ecologically linked taxa, what evolutionary paths they took with respect to each other. We aim to answer questions like: How long is the association between host and parasite? Did they cospeciate? Were there host switches or lateral gene transfers? What kind of risk of cross-infection does this pathogen present to its sister species?

  6. Problem instance
     Given: a host phylogeny H, an associate phylogeny P, and known associations ϕ of the tips of P with those of H. We call a problem instance a tanglegram, such as T = (H, P, ϕ). The object is to find out the ancestral relationships between P and H. This mostly comes down to an optimization problem.
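One way a tanglegram T = (H, P, ϕ) might be held in code is sketched below. The class names `Tree` and `Tanglegram`, the child-to-parent encoding, and the restriction that every tip of P maps to exactly one tip of H are all assumptions made for illustration; the slides do not prescribe a representation.

```python
from dataclasses import dataclass


@dataclass
class Tree:
    """Rooted tree stored as a child -> parent map (the root has no entry)."""
    parent: dict

    def nodes(self):
        return set(self.parent) | set(self.parent.values())

    def tips(self):
        # Tips are the nodes that never appear as a parent.
        return self.nodes() - set(self.parent.values())


@dataclass
class Tanglegram:
    host: Tree      # H
    parasite: Tree  # P
    phi: dict       # tip of P -> tip of H (assumed one host per parasite tip)

    def is_valid(self):
        """True if phi is defined on exactly the tips of P and lands on tips of H."""
        if set(self.phi) != self.parasite.tips():
            return False
        return all(h in self.host.tips() for h in self.phi.values())


# Hypothetical three-taxon example: H = ((h1,h2)hA,h3)hR, P = ((p1,p2)pA,p3)pR.
H = Tree({"h1": "hA", "h2": "hA", "hA": "hR", "h3": "hR"})
P = Tree({"p1": "pA", "p2": "pA", "pA": "pR", "p3": "pR"})
T = Tanglegram(H, P, {"p1": "h1", "p2": "h2", "p3": "h3"})
print(T.is_valid())
```

The child-to-parent map keeps ancestor queries simple, which is what the event definitions on the following slides rely on.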

  7. Coevolutionary events
     [Figure: host and pathogen trees annotated with the event types codivergence, duplication, host switch, unsuccessful host switch, extinction, "miss the boat", failure to diverge and ghost; the legend distinguishes host, pathogen and untraceable lineages.]

  8. Definition (codivergence)
     A codivergence event occurs when internal vertices p ∈ V(P) and h ∈ V(H) are coincident, and the children of p diversify on the children of h.

  9. Definition (duplication)
     A duplication occurs when p is associated with an arc of H rather than a vertex; this corresponds to a speciation or divergence of p that is independent of a divergence event in the host.

  10. Definition (host switch)
      A host switch occurs for some arc (p, q) ∈ A(P) where p is associated with a location in H that is contemporary with, but not ancestral to, the location in H with which q is associated.
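The "not ancestral to" part of the host-switch definition can be checked mechanically. The sketch below is a simplification under stated assumptions: host locations are named by host nodes (an arc is identified by the node at its lower end), timing is ignored so the "contemporary with" condition is not tested, and the function names are hypothetical.

```python
def is_ancestor_or_self(parent, a, b):
    """True if a lies on the path from b up to the root (a == b counts)."""
    node = b
    while node is not None:
        if node == a:
            return True
        node = parent.get(node)  # None once we step past the root
    return False


def host_switch_arcs(host_parent, parasite_parent, location):
    """Return parasite arcs (p, q) whose host location for p is not ancestral to q's.

    host_parent / parasite_parent: child -> parent maps.
    location: parasite node -> host location (assumed encoding, see lead-in).
    """
    switches = []
    for q, p in parasite_parent.items():
        if not is_ancestor_or_self(host_parent, location[p], location[q]):
            switches.append((p, q))
    return sorted(switches)


# Hypothetical example: pA sits on h1; child p1 stays on h1, child p2 lands on h3.
host_parent = {"h1": "hA", "h2": "hA", "hA": "hR", "h3": "hR"}
parasite_parent = {"p1": "pA", "p2": "pA"}
location = {"pA": "h1", "p1": "h1", "p2": "h3"}
print(host_switch_arcs(host_parent, parasite_parent, location))  # [('pA', 'p2')]
```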

  11. Definition (loss)
      A loss occurs as the result of one of three things, which are indistinguishable: extinction of some p, failure to track both hosts after a host divergence event ("missing the boat"), and simple failure to sample the pathogen p.

  12. What we can recover
      Ronquist confirmed in 2002 [9] that these are the only four types of recoverable event for this problem: (1) codivergence, (2) duplication, (3) loss, and (4) host switching. All methods (attempt to) recover codivergence, but not all can recover host switching. Some only recover codivergence, duplication and loss. We would also like to recover failure-to-diverge events, where a parasite of a speciating host continues to parasitize it, without divergence.

  13. Event costs
      We can assign a cost to each event type, subject to simple constraints [2]: the biological rule, c < d, ℓ, w (for codivergence, duplication, loss and host switch respectively), and the pragmatic rule, 0 ≤ c, d, w, ℓ (which allows a dynamic program to solve the optimization problem).
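The two rules are easy to check for any proposed cost scheme. The sketch below assumes, as is conventional but not stated on the slide, that the total cost of a reconstruction is the sum of each event count multiplied by its cost; `EventCosts` and `total_cost` are hypothetical names, and the example values are illustrative.

```python
from dataclasses import dataclass


@dataclass
class EventCosts:
    codivergence: float  # c
    duplication: float   # d
    host_switch: float   # w
    loss: float          # ℓ

    def satisfies_biological_rule(self):
        # c < d, ℓ, w: codivergence is strictly the cheapest event.
        return self.codivergence < min(self.duplication, self.loss, self.host_switch)

    def satisfies_pragmatic_rule(self):
        # 0 <= c, d, w, ℓ: no negative costs.
        return min(self.codivergence, self.duplication, self.host_switch, self.loss) >= 0


def total_cost(costs, counts):
    """Assumed additive cost: sum over event types of (count x cost)."""
    return (costs.codivergence * counts["codivergence"]
            + costs.duplication * counts["duplication"]
            + costs.host_switch * counts["host_switch"]
            + costs.loss * counts["loss"])


costs = EventCosts(codivergence=0, duplication=1, host_switch=1, loss=2)
counts = {"codivergence": 3, "duplication": 1, "host_switch": 1, "loss": 2}
print(costs.satisfies_biological_rule(), costs.satisfies_pragmatic_rule())  # True True
print(total_cost(costs, counts))  # 6
```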

  14. Event costs
      Jane 1 & 2 use dynamic programming to minimise total cost, with event costs prescribed.

      Event          Symbol   Jane cost   TreeMap cost
      Cospeciation   c        0           0
      Duplication    d        1           1
      Host switch    w        1           1
      Loss/Sorting   ℓ        2           1

      TreeMap puts default costs on events but does not normally use them: it finds a Pareto front of solutions, and has worst-case exponential running time. Jane 1 uses an O(n^7) algorithm to find minimal cost reconstructions; Jane 2 has an algorithm down to O(n^3). The penalty in O(n^7) → O(n^3) is a loss of approximately 0.1% in performance, which is definitely acceptable.
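To see why the choice of costs matters, the sketch below scores two made-up reconstructions under the Jane and TreeMap default costs listed on this slide. The event counts for reconstructions A and B are invented for illustration, and the additive scoring is an assumption; neither program's actual algorithm is reproduced here.

```python
# Default costs as tabulated on the slide.
JANE = {"codivergence": 0, "duplication": 1, "host_switch": 1, "loss": 2}
TREEMAP = {"codivergence": 0, "duplication": 1, "host_switch": 1, "loss": 1}


def score(counts, costs):
    """Assumed additive score: sum of (event count x event cost)."""
    return sum(costs[event] * n for event, n in counts.items())


def cheapest(candidates, costs):
    """Name of the candidate reconstruction with the lowest score."""
    return min(candidates, key=lambda name: score(candidates[name], costs))


# Hypothetical reconstructions of the same tanglegram.
candidates = {
    "A": {"codivergence": 4, "duplication": 0, "host_switch": 0, "loss": 4},
    "B": {"codivergence": 2, "duplication": 1, "host_switch": 3, "loss": 1},
}

print(cheapest(candidates, JANE))     # B  (A costs 8, B costs 6)
print(cheapest(candidates, TREEMAP))  # A  (A costs 4, B costs 5)
```

The preferred reconstruction flips when only the loss cost changes, which is one motivation for reporting a whole Pareto front rather than a single optimum.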

  15. Part II: Everything Is Against Us

  16. Myzled by the implickit paradidgem
      Fahrenholz's Rule [5], that "parasite phylogeny mirrors host phylogeny," has misled us. The classic cophylogeny example of gophers and lice (left) is wonderful, but in fact such cases are rare: more and more studies are showing lack of evidence for codivergence, despite similarity.
      [Figure: tanglegram of gophers and lice with their associations; p < 0.01; much apparent congruence & codivergence.]

  17. This is more common
      Many studies look for congruence with inappropriate tools.
      [Figure: host and parasite trees with their associations.]
      Program crashes: ∴ problem with program

  18. Empirical evidence of complexity
      The number of feasible maps increases rapidly for even modest numbers of taxa. The number of maps in the Pareto front, those which could be optimal for some scheme of event costs, also increases quickly.
      [Figure: # maps (log scale, 1 to 10000) against # taxa (2 to 7), with series Feasible and POpt; from Charleston 2003 [3].]
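A Pareto front can be extracted from a set of candidate maps by discarding every map that is dominated by another. The sketch below assumes each map is summarised by its counts of (duplications, host switches, losses), all to be minimised; that summary and the function names are illustrative choices, not TreeMap's internal representation.

```python
def dominates(a, b):
    """a dominates b if a is no worse in every count and strictly better in one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(vectors):
    """Return the non-dominated vectors, sorted, with duplicates removed."""
    unique = sorted(set(vectors))
    return [v for v in unique
            if not any(dominates(u, v) for u in unique if u != v)]


# Hypothetical (duplications, host switches, losses) counts for five maps.
maps = [(0, 0, 4), (1, 2, 1), (1, 3, 1), (2, 2, 2), (0, 1, 3)]
print(pareto_front(maps))  # [(0, 0, 4), (0, 1, 3), (1, 2, 1)]
```

Under non-negative event costs a dominated map can never be cheaper than the map that dominates it, so restricting attention to the front loses no optimal solution.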

  19. Empirical evidence of complexity
      The number of feasible maps is also highly correlated with the degree of incongruence.
      [Figure: number of feasible maps (log scale, 1 to 10000) against degree of fit (min. NCEs, 0 to 14), one series for each of n = 2 to n = 7; from Charleston 2003 [3].]

  20. Cophylogeny mapping is NP-complete
      We begin with the Generalized Cophylogeny Reconstruction Problem (Gcrp). This is a 6-tuple (H = (V_H, E_H), P = (V_P, E_P), t_H, t_P, ϕ, κ) where H is the host network, P the parasite network, t_H and t_P are timing functions for H and P that map each vertex to a set of permitted times, ϕ is defined as before, and κ is a 4-tuple cost vector κ = (c, d, w, ℓ) for codivergence, duplication, host switch and loss respectively. The objective is to find a mapping Φ : P ↦ H that extends ϕ, can be constructed using the usual events with respect to the timing functions, and is of minimum total cost.
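Written out, the objective might look as follows. This assumes the total cost is additive over events, which the slide implies but does not state; the counts n_c, n_d, n_w and n_ℓ (numbers of codivergences, duplications, host switches and losses implied by Φ) are notation introduced here for illustration.

```latex
\[
\Phi^{*} \;=\; \operatorname*{arg\,min}_{\Phi \,:\, \Phi|_{\mathrm{tips}(P)} = \phi}
\Bigl( c\, n_c(\Phi) + d\, n_d(\Phi) + w\, n_w(\Phi) + \ell\, n_\ell(\Phi) \Bigr),
\]
% the minimum is taken over mappings that respect the timing functions t_H and t_P.
```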

  21. Gcrp
      Theorem. Gcrp is solvable in polynomial time for the set of instances (H = (V_H, E_H), P = (V_P, E_P), t_H, t_P, ϕ, κ) such that (i) P is a tree and (ii) for all v ∈ V_H, |t_H(v)| = 1. (Proof is by construction of a polynomial-time algorithm for this case using a dynamic program. See Libeskind-Hadas & Charleston [7] for details.)

  22. Gcrdp
      We first define the Generalized Cophylogeny Reconstruction Decision Problem (Gcrdp) as follows.
      Instance: (H = (V_H, E_H), P = (V_P, E_P), t_H, t_P, ϕ, κ) and a cost K.
      Question: Does there exist a reconstruction whose cost is K or less?
      Theorem. The decision problem associated with Gcrp is NP-complete for the set of instances (H = (V_H, E_H), P = (V_P, E_P), t_H, t_P, ϕ, κ) such that (i) P is a tree and (ii) for all v ∈ V_H, |t_H(v)| ≤ 2. (Proof is by reduction from 3-Sat: see Libeskind-Hadas & Charleston [7] for details.)
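The conditions of the two theorems depend only on whether P is a tree and on the sizes of the sets t_H(v), so an instance can be sorted into the matching class with a few lines. The sketch below is illustrative only: the function name, the return strings and the dictionary encoding of t_H are assumptions.

```python
def classify_instance(parasite_is_tree, host_times):
    """Report which theorem's conditions an instance meets.

    host_times: host vertex -> set of permitted times, i.e. t_H (assumed encoding).
    """
    sizes = [len(times) for times in host_times.values()]
    if parasite_is_tree and all(size == 1 for size in sizes):
        # Conditions of the polynomial-time theorem: |t_H(v)| = 1 for all v.
        return "polynomial-time case"
    if parasite_is_tree and all(size <= 2 for size in sizes):
        # Conditions of the NP-completeness theorem: |t_H(v)| <= 2 for all v.
        return "within NP-complete class"
    return "not covered by either theorem"


print(classify_instance(True, {"hR": {0}, "hA": {1}}))     # polynomial-time case
print(classify_instance(True, {"hR": {0}, "hA": {1, 2}}))  # within NP-complete class
print(classify_instance(False, {"hR": {0}, "hA": {1}}))    # not covered by either theorem
```

The equality test comes first because an instance with every |t_H(v)| = 1 also satisfies |t_H(v)| ≤ 2.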
