SLIDE 1

Transmission tree reconstruction by augmentation of internal phylogeny nodes

Matthew Hall

Li Ka Shing Institute for Health Information and Discovery, University of Oxford

February 2017

Matthew Hall (Oxford) Transmission tree reconstruction February 2017 1 / 29

SLIDE 2

The relationship of the phylogeny to the transmission tree

Let T be a time-tree (rooted, with branch lengths in units of time) and let V be its node set, of size n. Suppose the isolates at the tips of T come from a set H of hosts. Initial assumptions:

• Complete sampling of the epidemic since the TMRCA
• No superinfection or reinfection
• Transmission is a complete bottleneck

SLIDE 3

The relationship of the phylogeny to the transmission tree

The transmission tree N (a DAG whose nodes are the members of H, depicting which host infected which other) can be represented by a map d : V → H taking each node to a host (each tip to the host it was sampled from). N is visualised by collapsing, for each h ∈ H, the nodes in the preimage of h under d to a single node.

[Figure: example tree with tips labelled by host: J, A, B, G, F, H, C, I, E, D]
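As a minimal sketch of the collapsing step, assuming the phylogeny is stored as a child-to-parent dictionary and d as a node-to-host dictionary (both representations, and the function name, are illustrative assumptions), the edges of N are exactly the branches whose two endpoints map to different hosts:

```python
def transmission_edges(parent, d):
    """Return the (infector, infectee) host pairs of N implied by the map d.

    parent: dict mapping each node of T to its parent (None for the root).
    d: dict mapping each node of T to a host.
    Collapsing each preimage of a host to one node leaves only the branches
    whose endpoints map to different hosts; each such branch p -> c becomes
    an edge d(p) -> d(c).
    """
    edges = set()
    for child, par in parent.items():
        if par is not None and d[par] != d[child]:
            edges.add((d[par], d[child]))
    return edges


# Toy tree ((a, b)x, c)r, with tips a, b, c sampled from hosts A, B, C.
parent = {"a": "x", "b": "x", "x": "r", "c": "r", "r": None}
d = {"a": "A", "b": "B", "c": "C", "x": "A", "r": "C"}
print(sorted(transmission_edges(parent, d)))  # [('A', 'B'), ('C', 'A')]
```

In this toy example C infected A and A infected B; reassigning the root to A would instead make A the infector of both B and C.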

SLIDE 5

The simplest version

Assume that the phylogeny and transmission tree coincide, so that internal nodes are transmission events. This implies no within-host diversity and allows at most one tip per host. If v is an internal node with children $v_{C_1}$ and $v_{C_2}$, then either $d(v) = d(v_{C_1})$ or $d(v) = d(v_{C_2})$. Since each of the m − 1 internal nodes of a tree with m tips makes one such binary choice, there are trivially $2^{m-1}$ transmission trees for a fixed T.
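The count can be checked by brute force. This sketch assumes a binary tree stored as a hypothetical dictionary from each internal node to its two children, and enumerates every d obeying the "take one child's host" rule:

```python
from itertools import product


def simple_maps(children, tip_host, node):
    """Enumerate every map d on the subtree below `node` in which each
    internal node takes the host of one of its two children.

    children: dict internal node -> (left child, right child).
    tip_host: dict tip -> host it was sampled from (one tip per host).
    """
    if node not in children:  # a tip: its host is fixed by sampling
        return [{node: tip_host[node]}]
    left, right = children[node]
    maps = []
    for d_left, d_right in product(simple_maps(children, tip_host, left),
                                   simple_maps(children, tip_host, right)):
        # The binary choice: inherit the host of the left or the right child.
        for host in (d_left[left], d_right[right]):
            maps.append({**d_left, **d_right, node: host})
    return maps


# Caterpillar tree (((a, b)x, c)y, e)r: m = 4 tips, so 2^(m-1) = 8 maps.
children = {"r": ("y", "e"), "y": ("x", "c"), "x": ("a", "b")}
tip_host = {"a": "A", "b": "B", "c": "C", "e": "E"}
maps = simple_maps(children, tip_host, "r")
print(len(maps))  # 8
```

A balanced tree with the same four tips also gives 8, consistent with the count not depending on the topology in this simplest version.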

SLIDE 6

Within-host diversity

If within-host diversity is allowed, internal nodes are instead coalescences of two lineages within a host, and for every host h the subgraph of T induced by the preimage d⁻¹(h) must be connected. An extra set of parameters q represents the infection times.

Question: how many transmission trees are there for a fixed T? (It depends on the topology.) With one tip per host? With ≥ 1 tip per host? (Sometimes 0.)
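For toy trees the question can be answered by exhaustive search. The sketch below checks only the connectivity rule stated above (infection times q are ignored), and the tree format and function names are illustrative assumptions:

```python
from itertools import product


def is_connected(nodes, edges):
    """True if `nodes` induce a connected subgraph of the tree with `edges`."""
    adj = {v: [] for v in nodes}
    for p, c in edges:
        if p in adj and c in adj:
            adj[p].append(c)
            adj[c].append(p)
    start = next(iter(nodes))
    seen, stack = {start}, [start]
    while stack:
        for w in adj[stack.pop()]:
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return seen == set(nodes)


def count_connected_maps(children, tip_host):
    """Brute-force count of maps d whose preimage of every host is connected.

    Exponential in the number of internal nodes, so for toy trees only.
    """
    internal = list(children)
    hosts = sorted(set(tip_host.values()))
    edges = [(p, c) for p, cs in children.items() for c in cs]
    count = 0
    for choice in product(hosts, repeat=len(internal)):
        d = {**tip_host, **dict(zip(internal, choice))}
        if all(is_connected({v for v in d if d[v] == h}, edges) for h in hosts):
            count += 1
    return count


# One tip per host, tree ((a, b)x, c)r.
print(count_connected_maps({"r": ("x", "c"), "x": ("a", "b")},
                           {"a": "A", "b": "B", "c": "C"}))  # 5
# Two tips per host, interleaved: ((a1, b1)x, (a2, b2)y)r admits no valid d.
print(count_connected_maps({"r": ("x", "y"), "x": ("a1", "b1"), "y": ("a2", "b2")},
                           {"a1": "A", "b1": "B", "a2": "A", "b2": "B"}))  # 0
```

Under the connectivity rule alone, the first tree has one more map than the 2^(m−1) = 4 of the simplest version, because the lineages from A and B may also coalesce inside C. The second tree shows the "sometimes 0" case: connecting A's two tips uses every internal node, leaving B's tips disconnected.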

SLIDE 7

Simultaneous MCMC reconstruction of phylogeny and transmission tree

In either case we get an (injective but not surjective) map z from the set of possible maps d to the space of transmission trees. Thus an MCMC method that samples from the posterior distribution of phylogenies with internal node augmentation obeying either set of rules simultaneously samples from the posterior distribution of transmission trees. This gives not only a method for reconstructing N, but also a population model (tree prior) for reconstructing T that is more realistic for an outbreak than the standard unstructured coalescent models.

SLIDE 8

Decomposition

Let S be the sequence data and φ the various model parameters.

Without within-host diversity:

$$p(T, d, \phi \mid S) = \frac{p(S \mid T)\, p(T, d \mid \phi)\, p(\phi)}{p(S)}$$

where p(S | T) is the standard phylogenetic likelihood and p(T, d | φ) the probability of observing the augmented tree under a transmission model.

With within-host diversity:

$$p(T, d, q, \phi \mid S) = \frac{p(S \mid T)\, p(T \mid N, q, \phi)\, p(N, q \mid \phi)\, p(\phi)}{p(S)}$$

where N is the transmission tree determined by d, p(N, q | φ) is the probability of the transmission tree and its timings under the transmission model, as above, and p(T | N, q, φ) is the probability of the within-host mini-phylogenies under a coalescent process.

SLIDE 9

MCMC implementation

Hall et al. (2015) implemented simultaneous reconstruction of both trees in BEAST, with MCMC proposals that respect the rules of node augmentation. Several other approaches exist (e.g. Didelot et al., 2014; Morelli et al., 2012; Ypma et al., 2013; Klinkenberg et al., 2017), and recent work addresses the incomplete sampling problem (Didelot et al., 2016; Lau et al., 2016).

[Figure: Exchange, Subtree slide and Wilson-Balding proposal moves (panels i to iv, with 50%/50% labels)]

SLIDE 10

[Figure: phylogeny with 43617 tips]

SLIDE 11

The BEEHIVE study

• NGS short-read sequence data acquired from samples taken from European (and one African) HIV cohort studies
• Some cohorts go back to the early epidemic in the 1980s
• Current data from 3138 individuals
• Epidemiology: age, gender, date of first positive test, countries of origin and infection, risk group, ART dates, etc.
• Sequences from one time point only (with a few exceptions)
• Rather than making a consensus sequence from each host’s reads, we want to use everything.

SLIDE 12

Phyloscanner: phylogenetic analysis of NGS pathogen data

[Figure: sequencing, then mapping to references]

Idea: align all short reads from all hosts to a reference genome and slide a window across the genome, building a phylogeny for the reads overlapping each window.
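The windowing idea can be sketched as follows. The read representation (id, start, end in reference coordinates), the window width and step, and the function names are illustrative assumptions; a read is taken to belong to any window it overlaps by at least one base, which is a literal reading of "overlapping" (a stricter rule, such as requiring a read to span the whole window, would be an equally plausible choice).

```python
def sliding_windows(genome_length, width, step):
    """Return half-open (start, end) windows across a reference genome."""
    windows = []
    start = 0
    while start + width <= genome_length:
        windows.append((start, start + width))
        start += step
    return windows


def reads_per_window(reads, windows):
    """Map each window to the ids of the reads overlapping it.

    reads: list of (read_id, start, end) in reference coordinates, end exclusive.
    """
    return {
        (w_start, w_end): [rid for rid, s, e in reads if s < w_end and e > w_start]
        for w_start, w_end in windows
    }


# Hypothetical mapped reads from two hosts on a 400-base reference.
reads = [("host1_r1", 0, 150), ("host1_r2", 120, 270), ("host2_r1", 230, 380)]
windows = sliding_windows(400, width=200, step=100)
for window, ids in reads_per_window(reads, windows).items():
    print(window, ids)  # each window's reads would then get their own tree
```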

SLIDE 13

Phyloscanner: phylogenetic analysis of NGS pathogen data

• Identical reads from a single host are merged, but the duplicate counts are kept as tip traits
• We use RAxML for reconstruction
• Tips are not associated with each other across different windows, but hosts are
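Merging identical reads within a host while keeping their counts is a simple tally. In this sketch the input layout and the tip-label format are hypothetical; note that identical sequences from different hosts remain separate tips, since each tip must keep its host.

```python
from collections import Counter


def merge_identical_reads(reads_by_host):
    """Collapse identical reads within each host in one window.

    reads_by_host: dict host -> list of read sequences.
    Returns a list of (tip_label, host, sequence, count) tuples.
    """
    tips = []
    for host, reads in reads_by_host.items():
        counts = Counter(reads)
        # most_common() orders by count; ties keep first-seen order.
        for rank, (seq, n) in enumerate(counts.most_common(), start=1):
            tips.append((f"{host}_read{rank}_count_{n}", host, seq, n))
    return tips


tips = merge_identical_reads({"host1": ["ACGT", "ACGT", "ACGA"], "host2": ["ACGT"]})
for tip in tips:
    print(tip)
```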

SLIDE 14

The topological signal of transmission

Once we have many tips from each host, transmission has a topological signal. Direct transmission is suggested when the clade from the infectee is not monophyletic (Romero-Severson et al., 2016), but in general the topology shows us only the direction of transmission. This starts to look like a parsimony problem.
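A simplified check for the general case, in which the infectee's tips form a single clade nested among the infector's paraphyletic tips so that only direction is suggested, could look like this. It is a sketch under stated assumptions (dictionary tree format, hypothetical function names, and no handling of unsampled hosts), not the method used in the talk:

```python
def tips_below(children, node):
    """Set of tips in the subtree rooted at node."""
    if node not in children:
        return {node}
    return set().union(*(tips_below(children, c) for c in children[node]))


def mrca(children, root, tips):
    """Deepest node whose subtree contains all of `tips`."""
    node = root
    while True:
        deeper = [c for c in children.get(node, ()) if tips <= tips_below(children, c)]
        if not deeper:
            return node
        node = deeper[0]


def host_tips(tip_host, host):
    return {t for t, h in tip_host.items() if h == host}


def is_monophyletic(children, root, tip_host, host):
    tips = host_tips(tip_host, host)
    return tips_below(children, mrca(children, root, tips)) == tips


def suggested_direction(children, root, tip_host, a, b):
    """Return (a, b) if b's tips form a clade nested within a's paraphyletic tips.

    Checks: b is monophyletic, a is not, and the subtree below the MRCA of
    a's tips contains all of b's tips and no tips from any other host.
    """
    a_tips, b_tips = host_tips(tip_host, a), host_tips(tip_host, b)
    below_a = tips_below(children, mrca(children, root, a_tips))
    if (is_monophyletic(children, root, tip_host, b)
            and not is_monophyletic(children, root, tip_host, a)
            and b_tips <= below_a
            and below_a <= a_tips | b_tips):
        return (a, b)
    return None


# Tree ((a1, (b1, b2)x)y, a2)r: B's clade sits inside A's paraphyletic tips.
children = {"r": ("y", "a2"), "y": ("a1", "x"), "x": ("b1", "b2")}
tip_host = {"a1": "A", "a2": "A", "b1": "B", "b2": "B"}
print(suggested_direction(children, "r", tip_host, "A", "B"))  # ('A', 'B')
print(suggested_direction(children, "r", tip_host, "B", "A"))  # None
```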

SLIDE 17

Challenges reconstructing transmission from this data

[Figure: phylogeny with 43617 tips]

Datasets:

• Enormous size
• Contamination present
• Coverage is uneven

Epidemiology:

• Sampling is incomplete
• Multiple infections present
• Bottleneck at transmission may be wide (IDUs)

SLIDE 18

Transmission tree reconstruction using parsimony

For a fixed tree, we aim to:

• Reconstruct hosts, from those represented in the tips, to the internal nodes of the tree
• But also allow reconstruction to “a host outside the dataset”, as required by incomplete sampling
• Minimise the number of infection events amongst hosts in the dataset. . .
• . . . except, penalise reconstructions which suggest an unreasonable amount of genetic diversity stemming from a single infection event
• Identify multiple infections and contaminations

SLIDE 19

Transmission tree reconstruction using parsimony

Suppose T has nodes V and we are reconstructing characters from the set of states S. The cost function c(p, q; i, j) determines the cost of transitioning from state i to state j along the branch from p to q (in the direction away from the root). We take a node- and edge-dependent c on a known tree: costs for transitions vary depending on the states involved and the branch on which they occur. The lowest-cost reconstruction is found with the Sankoff algorithm.
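A compact sketch of the Sankoff algorithm with a cost c(p, q; i, j) that may depend on the branch as well as on the states. The dictionary tree format, the function name and the optional `root_state` argument (which mimics forcing a fixed state onto the root) are assumptions of this sketch; ties in the traceback go to the earliest state in `states`.

```python
import math


def sankoff(children, root, states, tip_state, cost, root_state=None):
    """Minimum-cost ancestral reconstruction with a branch-dependent cost.

    children: dict internal node -> tuple of children.
    tip_state: dict tip -> observed state.
    cost(p, q, i, j): cost of state i at parent p and state j at child q.
    root_state: if given, the root is forced into this state.
    Returns (minimum total cost, dict node -> state).
    """
    score = {}  # score[v][s]: min cost of the subtree below v, given v is in s

    def up(v):  # post-order pass
        if v not in children:
            score[v] = {s: 0.0 if s == tip_state[v] else math.inf for s in states}
            return
        for q in children[v]:
            up(q)
        score[v] = {
            i: sum(min(cost(v, q, i, j) + score[q][j] for j in states)
                   for q in children[v])
            for i in states
        }

    up(root)
    candidates = [root_state] if root_state is not None else states
    best = min(candidates, key=lambda s: score[root][s])
    assignment = {root: best}

    def down(v):  # pre-order traceback of the optimal states
        for q in children.get(v, ()):
            i = assignment[v]
            assignment[q] = min(states, key=lambda j: cost(v, q, i, j) + score[q][j])
            down(q)

    down(root)
    return score[root][best], assignment


def unit_cost(p, q, i, j):
    """Uniform cost of 1 per change of state, ignoring the branch."""
    return 0.0 if i == j else 1.0


children = {"r": ("x", "c"), "x": ("a", "b")}
tips = {"a": "A", "b": "B", "c": "B"}
total, assignment = sankoff(children, "r", ["A", "B"], tips, unit_cost)
print(total, assignment["x"])  # 1.0 B
```

Any node- and edge-dependent cost with the same (p, q, i, j) signature can replace `unit_cost`.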

SLIDE 20

Transmission tree reconstruction using parsimony

If reads are taken from n hosts h_1, …, h_n making up the study population, we use the h_i as states, along with an “unsampled” state u. Assume that the root of the tree was in the unsampled state (using an outgroup if required). Tips from outside the study population (the outgroup, other reference sequences, contaminants) are assigned u as a state. We are interested in minimising the cost of infections of members of the study population, but not of hosts outside that population, so c(p, q; h, u) = 0 for all p, q, h. Reconstruction starts by putting a u on the root. (Otherwise it will always be cheap to reconstruct some h_i to the root, plausible or not.)

SLIDE 21

Transmission tree reconstruction using parsimony

Let l(p, i) be the sum of the branch lengths in the subtree rooted at p, pruned so that only tips from i remain, or ∞ if there are no such tips. Then we take c(p, q; i, j) = 1 + k·l(p, j), with k ∈ ℝ⁺ a tunable parameter. This penalises reconstructions with unreasonable amounts of within-host diversity (see figure).

[Figure: two reconstructions of states h and u on a tree with nodes r, p, q, s and branch lengths l1, l2]

Left: C = 1 + k(l1 + l2). Right: C = 2.
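A sketch of l(p, i) and of the cost function, following the formula as written (l evaluated at p). Two readings are assumptions: the pruned subtree keeps every branch on a path from p down to a tip of the host (so p stays its root), and a branch with no change of state (i = j) costs nothing, as is usual in parsimony. The tree format and function names are hypothetical.

```python
import math


def pruned_length(children, length, tip_host, p, host):
    """l(p, host): branch length of the subtree rooted at p, pruned to host's tips.

    length: dict node -> length of the branch above that node.
    Returns math.inf if no tip below p is from `host`.
    """
    def walk(v):
        # (whether v's subtree has a tip from host, kept length below v)
        if v not in children:
            return tip_host.get(v) == host, 0.0
        found, total = False, 0.0
        for q in children[v]:
            q_found, q_total = walk(q)
            if q_found:
                found = True
                total += length[q] + q_total
        return found, total

    found, total = walk(p)
    return total if found else math.inf


def make_cost(children, length, tip_host, k, unsampled="u"):
    """Build c(p, q; i, j) = 1 + k * l(p, j), with free transitions into u."""
    def cost(p, q, i, j):
        if i == j:
            return 0.0  # assumed: no change of state, no infection
        if j == unsampled:
            return 0.0  # infecting a host outside the study population is free
        return 1.0 + k * pruned_length(children, length, tip_host, p, j)
    return cost


# Tree ((h1, h2)p, s)r: h1 and h2 sampled from host h, s an outgroup tip (u).
children = {"r": ("p", "s"), "p": ("h1", "h2")}
length = {"p": 0.125, "s": 1.0, "h1": 0.25, "h2": 0.5}
tip_host = {"h1": "h", "h2": "h", "s": "u"}
cost = make_cost(children, length, tip_host, k=2.0)
print(pruned_length(children, length, tip_host, "p", "h"))  # 0.75
print(cost("r", "p", "u", "h"))  # 1 + 2 * 0.875 = 2.75
```

The returned `cost` has the (p, q, i, j) signature expected by a Sankoff routine that accepts an arbitrary cost function.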

SLIDE 22

Transmission tree reconstruction using parsimony

  • c(p, q; h, u) = c(p, q, h, u) for all p, q, h. Above trees have equal C.

Ties can be broken always towards h (suited to dense sampling), always towards u (conservative), or based on branch lengths.

SLIDE 23

Parsimony for detection of contaminants

• Small numbers of contaminant reads, where virus from the wrong host has been sequenced, are frequent
• The parsimony reconstruction does double duty to detect these
• Find “multiple introductions” where the tips in one split have very low read counts, and ignore those tips
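Once the reconstruction has divided each host's nodes into connected subgraphs ("splits"), the low-read-count rule can be sketched as below. The input layout, the function name and the 1% threshold are hypothetical placeholders, not values from the talk.

```python
def contaminant_tips(splits, min_fraction=0.01):
    """Flag tips in low-read splits of hosts with multiple introductions.

    splits: dict host -> list of splits; each split is a list of
            (tip, read_count) pairs forming one connected subgraph of that
            host's reconstructed nodes.
    min_fraction: hypothetical threshold on a split's share of the host's reads.
    Returns the set of tips to ignore.
    """
    ignored = set()
    for host, host_splits in splits.items():
        if len(host_splits) < 2:
            continue  # a single introduction: nothing to compare against
        totals = [sum(n for _, n in split) for split in host_splits]
        grand_total = sum(totals)
        for split, total in zip(host_splits, totals):
            if total / grand_total < min_fraction:
                ignored.update(tip for tip, _ in split)
    return ignored


splits = {
    "host1": [[("t1", 950), ("t2", 400)], [("t3", 3)]],  # t3: 3 of 1353 reads
    "host2": [[("t4", 120)]],
}
print(contaminant_tips(splits))  # {'t3'}
```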

SLIDE 24

Without the assumptions of a single infection event per host and complete sampling, the “transmission tree” has multiple nodes per host and unsampled nodes.

[Figure: augmented phylogeny and the corresponding “transmission tree”, with host nodes A, B1, B2 and an unsampled node U]
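One way to build such a "transmission tree" is to collapse each maximal connected same-state subgraph of the augmented phylogeny into its own node, numbering repeat introductions into the same host (B1, B2, ...). This is a sketch; the child-to-parent input, the "U" label for the unsampled state and the naming scheme are assumptions.

```python
from collections import defaultdict


def split_graph(parent, d):
    """Collapse maximal connected same-host subgraphs of an augmented tree.

    parent: dict node -> parent (None for the root); d: dict node -> host or "U".
    Returns (label, edges): label maps each node to a split name such as "B2",
    and edges lists (ancestral split, descendant split) pairs.
    """
    children = defaultdict(list)
    root = None
    for v, p in parent.items():
        if p is None:
            root = v
        else:
            children[p].append(v)
    counter = defaultdict(int)
    label, edges = {}, []
    stack = [root]
    while stack:  # pre-order: a parent is labelled before its children
        v = stack.pop()
        p = parent[v]
        if p is not None and d[p] == d[v]:
            label[v] = label[p]  # same host as the parent: same split
        else:
            counter[d[v]] += 1  # a new introduction into d[v]
            label[v] = f"{d[v]}{counter[d[v]]}"
            if p is not None:
                edges.append((label[p], label[v]))
        stack.extend(children[v])
    return label, edges


# Host B is introduced twice: once from an unsampled host, once from A.
parent = {"r": None, "x": "r", "y": "r", "a1": "x", "b1": "x", "b2": "y", "u1": "y"}
d = {"r": "U", "x": "A", "y": "U", "a1": "A", "b1": "B", "b2": "B", "u1": "U"}
label, edges = split_graph(parent, d)
print(sorted(set(label.values())))  # ['A1', 'B1', 'B2', 'U1']
print(sorted(edges))  # [('A1', 'B2'), ('U1', 'A1'), ('U1', 'B1')]
```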

SLIDE 25

Example

Known HIV transmission chain (Lemey et al., 2005; Vrancken et al., 2014). Reconstruction on RAxML trees for the env and pol genes:

Host  True  env  pol
A     B?    U    U
B     A?    U    A
C     B     B    B
D     C     D    D
E     C     C    C
F     A     U    A
G     F     U    F
H     B     B    B
I     B     B    B
K     E     E    E
L     C     C    C

[Figure: the known transmission chain among hosts I, E, F, B, A, G, L, D, C, K, H]

SLIDE 26

Full phyloscanner results

Full output from phyloscanner is a separate phylogeny for the reads in each genome window. We would like to use these in a manner similar to bootstrapping, to indicate support for topological relationships. The phylogenies are not trivially comparable across windows, as the tips are not the same. The hosts are the same, so the transmission trees are more comparable, but:

• Some hosts are absent from some windows due to sequencing problems
• There are one or more nodes for unsampled regions
• There are potentially multiple nodes per host

Question: can we nonetheless give a summary or median transmission tree?

SLIDE 27

Classification of relationships

For the time being, we concentrate on classifying pairwise relationships between hosts in each window:

• Contiguity: is the subgraph induced by the nodes from the pair, with perhaps some unsampled nodes, connected?
• Descent: are all the nodes from one patient ancestral to those from the other?
• Mean patristic distance in the phylogeny

Then count relationships across windows.
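Two of these pieces are easy to sketch: the mean patristic distance between two hosts' tips in one window's tree, and the tally of per-window classifications for a pair. The tree format, function names and relationship labels such as "A->B" are hypothetical; windows where either host is absent are represented by None and skipped, and the contiguity and descent tests themselves are not implemented here.

```python
from collections import Counter


def mean_patristic(children, length, root, tips_a, tips_b):
    """Mean tip-to-tip distance between the tips of two hosts in one window.

    length: dict node -> length of the branch above that node.
    """
    parent = {q: v for v, qs in children.items() for q in qs}
    depth, stack = {root: 0.0}, [root]
    while stack:  # distance from the root to every node
        v = stack.pop()
        for q in children.get(v, ()):
            depth[q] = depth[v] + length[q]
            stack.append(q)

    def to_root(v):
        path = [v]
        while v in parent:
            v = parent[v]
            path.append(v)
        return path

    total = 0.0
    for x in tips_a:
        ancestors_x = set(to_root(x))
        for y in tips_b:
            m = next(v for v in to_root(y) if v in ancestors_x)  # MRCA of x, y
            total += depth[x] + depth[y] - 2 * depth[m]
    return total / (len(tips_a) * len(tips_b))


def summarise_pair(per_window):
    """Share of windows supporting each relationship for one host pair.

    per_window: one label per window, or None where either host is absent.
    """
    present = [c for c in per_window if c is not None]
    if not present:
        return {}
    return {rel: n / len(present) for rel, n in Counter(present).items()}


children = {"r": ("x", "c"), "x": ("a", "b")}
length = {"x": 0.25, "a": 0.25, "b": 0.5, "c": 1.0}
print(mean_patristic(children, length, "r", ["a"], ["b", "c"]))  # 1.125
print(summarise_pair(["A->B", "A->B", None, "contiguous", "A->B"]))
```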

SLIDE 28

Improved HIV cluster detection

By setting a threshold on patristic distance we can refine the procedures for identifying HIV clusters with likely directionality.

[Figure: example cluster involving BEE0221-1, BEE0239-1, BEE0220-1 and BEE0260-1]
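A threshold on patristic distance can be turned into clusters in several ways; the sketch below uses single-linkage clustering via union-find, joining two hosts whenever their distance is at or below the threshold. The clustering rule, host names, distances and threshold are all illustrative assumptions, and directionality is not handled here.

```python
from collections import defaultdict


def threshold_clusters(pair_distance, threshold):
    """Single-linkage clusters: hosts joined when a pair's distance <= threshold.

    pair_distance: dict (host_a, host_b) -> mean patristic distance.
    Returns the clusters with two or more hosts, as sorted lists.
    """
    root_of = {}

    def find(h):  # union-find with path halving
        root_of.setdefault(h, h)
        while root_of[h] != h:
            root_of[h] = root_of[root_of[h]]
            h = root_of[h]
        return h

    for (a, b), dist in pair_distance.items():
        ra, rb = find(a), find(b)
        if dist <= threshold and ra != rb:
            root_of[ra] = rb

    groups = defaultdict(set)
    for h in list(root_of):
        groups[find(h)].add(h)
    return sorted(sorted(g) for g in groups.values() if len(g) > 1)


distances = {("h1", "h2"): 0.010, ("h2", "h3"): 0.020,
             ("h3", "h4"): 0.200, ("h5", "h6"): 0.015}
print(threshold_clusters(distances, threshold=0.025))  # [['h1', 'h2', 'h3'], ['h5', 'h6']]
```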

SLIDE 29

Improved HIV cluster detection

[Figure: HIV clusters among BEEHIVE participants, nodes labelled by participant ID (e.g. BEE0795-1, BEE0254-1)]

SLIDE 30

Conclusions

The transmission tree can be viewed as an augmentation of the internal nodes of a phylogeny with host information, subject to various sets of rules. There has been considerable recent work on reconstructing it for smaller datasets, usually using MCMC. Phyloscanner allows reconstruction with large NGS datasets. Work remains to be done on a rigorous treatment of the output.

SLIDE 32

Acknowledgements

Oxford: Christophe Fraser, Chris Wymant
Imperial: Oliver Ratmann
Edinburgh: Andrew Rambaut, Mark Woolhouse
