Family-joining: A method for constructing generally labeled trees



SLIDE 1

Family-joining: A method for constructing generally labeled trees

Prabhav Kalaghatgi

Max Planck Institute for Informatics Saarbrücken

AREVIR, Cologne, April 29 2016

SLIDE 2

A phylogenetic tree is a model of evolutionary relationship

Chang et al. Mol Biol Evol; 2002

SLIDE 5

Assumptions of current phylogenetic methods

Leaf labeled trees

[Figure: tree with vertices O1 to O9 and L1 to L7. Legend: observed species, unobserved ancestors]

Generally labeled trees

[Figure: tree with vertices O1 to O9 and L1, L3, L4, L6, L7. Legend: observed species, unobserved ancestors, observed ancestor]

SLIDE 6

Relationship types: parent-child and siblings

$\Delta_{i,j} = \operatorname{Avg}_k \left( d_{jk} - d_{ik} - d_{ij} \right)$

[Figure panels: sibling, parent-child]

Select parent-child over sibling if
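As a minimal sketch of the statistic above, the following Python snippet evaluates Δ on a small hypothetical tree. The vertices a, b, c, d and the latent vertex h are assumptions made for illustration and do not come from the slides. On exactly additive distances, Δ is 0 when i lies on every path from j to the remaining vertices (parent-child) and negative when i and j hang off a shared parent (siblings). The decision threshold is not given in the text, so none is applied here.

```python
# Hypothetical toy tree (not from the slides):
#   a --1-- h --2-- b,   h --4-- c --1-- d
# a, b, c, d are observed; h is latent; c is an observed ancestor of d.
PAIRS = {("a", "b"): 3, ("a", "c"): 5, ("a", "d"): 6,
         ("b", "c"): 6, ("b", "d"): 7, ("c", "d"): 1}
VERTICES = ["a", "b", "c", "d"]


def dist(u, v):
    """Symmetric lookup in the pairwise distance table."""
    return PAIRS[(u, v)] if (u, v) in PAIRS else PAIRS[(v, u)]


def delta(i, j):
    """Average over all other vertices k of d_jk - d_ik - d_ij."""
    others = [k for k in VERTICES if k not in (i, j)]
    return sum(dist(j, k) - dist(i, k) - dist(i, j) for k in others) / len(others)


print(delta("c", "d"))  # parent-child: c lies between d and every other vertex
print(delta("a", "b"))  # siblings under the latent vertex h
```

This prints 0.0 for the parent-child pair and -2.0 for the sibling pair; the latter equals minus twice the branch length from a to h.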

SLIDE 7

Family-joining (FJ) method

[Figure: tree simulated for illustration, with vertices O1 to O9 and L1 to L3 and branch lengths 1 2 4 3 1 2 1 3 5 1 1]

Distances based on this tree

      O2  O3  O4  O5  O6  O7  O8  O9
O1     3   8   8   9  10   8  12   7
O2     .   9   9  10  11   9  13   8
O3     .   .   6   7   8   6  10   5
O4     .   .   .   1   6   4   8   3
O5     .   .   .   .   7   5   9   4
O6     .   .   .   .   .   4   8   3
O7     .   .   .   .   .   .   6   1
O8     .   .   .   .   .   .   .   5
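The relationship between the tree and its distances can be checked with a short sketch. The edge list below is a reconstruction, not something stated on the slide: the topology follows the family-joining walkthrough on the following slides, and each length was chosen so that the path lengths reproduce the distances shown. Treat the assignment of lengths to edges as an assumption.

```python
# Edge list reconstructed for illustration (an assumption, see lead-in).
EDGES = [("O1", "L1", 1), ("O2", "L1", 2), ("L1", "L2", 4), ("O3", "L2", 3),
         ("L2", "L3", 1), ("O4", "L3", 2), ("O4", "O5", 1), ("L3", "O9", 1),
         ("O6", "O9", 3), ("O7", "O9", 1), ("O8", "O9", 5)]
OBSERVED = ["O1", "O2", "O3", "O4", "O5", "O6", "O7", "O8", "O9"]


def distances_from(source):
    """Path length from source to every vertex of the tree."""
    adjacency = {}
    for u, v, w in EDGES:
        adjacency.setdefault(u, []).append((v, w))
        adjacency.setdefault(v, []).append((u, w))
    total = {source: 0}
    stack = [source]
    while stack:
        node = stack.pop()
        for neighbor, w in adjacency[node]:
            if neighbor not in total:
                total[neighbor] = total[node] + w
                stack.append(neighbor)
    return total


def upper_triangle():
    """Rows of the upper-triangular distance matrix over observed vertices."""
    rows = []
    for idx, u in enumerate(OBSERVED[:-1]):
        d = distances_from(u)
        rows.append([d[v] for v in OBSERVED[idx + 1:]])
    return rows


for name, row in zip(OBSERVED, upper_triangle()):
    print(name, row)
```

Distances computed this way are tree-additive by construction, which is the property the joining steps rely on.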

SLIDE 8

Family-joining (FJ) method

A. Unresolved tree topology

[Figure: unresolved tree on O1 to O9]

Tree-additive distances

      O2  O3  O4  O5  O6  O7  O8  O9
O1     3   8   8   9  10   8  12   7
O2     .   9   9  10  11   9  13   8
O3     .   .   6   7   8   6  10   5
O4     .   .   .   1   6   4   8   3
O5     .   .   .   .   7   5   9   4
O6     .   .   .   .   .   4   8   3
O7     .   .   .   .   .   .   6   1
O8     .   .   .   .   .   .   .   5

SLIDE 9

Family-joining (FJ) method

B. O1, O2: siblings with latent parent (L1)

[Figure: partially resolved tree on O1 to O9 with new latent vertex L1]

      O3  O4  O5  O6  O7  O8  O9
O3     .   6   7   8   6  10   5
O4     .   .   1   6   4   8   3
O5     .   .   .   7   5   9   4
O6     .   .   .   .   4   8   3
O7     .   .   .   .   .   6   1
O8     .   .   .   .   .   .   5
L1     7   7   8   9   7  11   6
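The L1 row in step B is consistent with the standard three-point update for additive distances, d(L, k) = (d(i, k) + d(j, k) - d(i, j)) / 2. The slides do not state the update rule, so the sketch below is an assumption that happens to reproduce the numbers shown; the function names are hypothetical.

```python
def join_siblings(d_i, d_j, d_ij):
    """Distances from the new latent parent of i and j to every other vertex.

    d_i and d_j map each remaining vertex k to its distance from i and from j.
    """
    return {k: (d_i[k] + d_j[k] - d_ij) / 2 for k in d_i}


def branch_to_parent(d_i, d_j, d_ij, k):
    """Length of the branch from i to the new parent, using reference vertex k."""
    return (d_ij + d_i[k] - d_j[k]) / 2


# Rows for O1 and O2 from the tree-additive distances in step A.
d_O1 = {"O3": 8, "O4": 8, "O5": 9, "O6": 10, "O7": 8, "O8": 12, "O9": 7}
d_O2 = {"O3": 9, "O4": 9, "O5": 10, "O6": 11, "O7": 9, "O8": 13, "O9": 8}

d_L1 = join_siblings(d_O1, d_O2, 3)
print(d_L1)
print(branch_to_parent(d_O1, d_O2, 3, "O3"))  # branch O1 to L1
print(branch_to_parent(d_O2, d_O1, 3, "O3"))  # branch O2 to L1
```

With exactly additive input, any reference vertex k gives the same branch length; with noisy distances an average over k would be a natural choice.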

SLIDE 10

Family-joining (FJ) method

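C. O4, O5: parent-child (O4 is the parent; O5 is dropped from the distance matrix)

[Figure: partially resolved tree with O5 attached below O4]

      O3  O4  O6  O7  O8  O9
O3     .   6   8   6  10   5
O4     .   .   6   4   8   3
O6     .   .   .   4   8   3
O7     .   .   .   .   6   1
O8     .   .   .   .   .   5
L1     7   7   9   7  11   6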

SLIDE 11

Family-joining (FJ) method

D. L1, O3: siblings with latent parent (L2)

[Figure: partially resolved tree with new latent vertex L2 joining L1 and O3]

      O4  O6  O7  O8  O9
O4     .   6   4   8   3
O6     .   .   4   8   3
O7     .   .   .   6   1
O8     .   .   .   .   5
L2     3   5   3   7   2

SLIDE 12

Family-joining (FJ) method

E. L2, O4: siblings with latent parent (L3)

[Figure: partially resolved tree with new latent vertex L3 joining L2 and O4]

      O6  O7  O8  O9
O6     .   4   8   3
O7     .   .   6   1
O8     .   .   .   5
L3     4   2   6   1

SLIDE 13

Family-joining (FJ) method

F. L3, O6: siblings with observed parent (O9)

[Figure: partially resolved tree with L3 and O6 attached to O9]

      O7  O8  O9
O7     .   6   1
O8     .   .   5
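One way to recognize siblings with an observed parent is an exact additivity check: an observed vertex p sits between i and j when d(i, p) + d(j, p) = d(i, j). The slides do not spell out the test that FJ uses, so the sketch below is an assumption, applied to the distances from step E. The helper names and the tolerance are hypothetical.

```python
# Distances among the vertices remaining after step E.
PAIRS = {("O6", "O7"): 4, ("O6", "O8"): 8, ("O6", "O9"): 3,
         ("O7", "O8"): 6, ("O7", "O9"): 1, ("O8", "O9"): 5,
         ("L3", "O6"): 4, ("L3", "O7"): 2, ("L3", "O8"): 6, ("L3", "O9"): 1}
VERTICES = ["O6", "O7", "O8", "O9", "L3"]


def dist(u, v):
    """Symmetric lookup in the pairwise distance table."""
    return PAIRS[(u, v)] if (u, v) in PAIRS else PAIRS[(v, u)]


def observed_parent(i, j, tol=1e-9):
    """Return an observed vertex p with d_ip + d_jp == d_ij, or None."""
    for p in VERTICES:
        if p in (i, j) or p.startswith("L"):  # latent vertices are skipped
            continue
        if abs(dist(i, p) + dist(j, p) - dist(i, j)) <= tol:
            return p
    return None


print(observed_parent("L3", "O6"))  # step F
print(observed_parent("O7", "O8"))  # step G
print(observed_parent("L3", "O9"))  # no observed vertex lies between these
```

On estimated distances the equality will not hold exactly, so a practical test would need a tolerance or a statistical threshold rather than the near-zero value used here.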

SLIDE 14

Family-joining (FJ) method

G. O7, O8: siblings with observed parent (O9)

[Figure: fully resolved tree on O1 to O9 and L1 to L3]

SLIDE 15

Family-joining (FJ) method

H. OLS branch length estimates

[Figure: fully resolved tree on O1 to O9 and L1 to L3, with branch lengths 1 2 4 3 1 2 1 3 5 1 1]
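Ordinary least squares on a fixed topology can be sketched as follows: each observed pair contributes one equation stating that its distance is the sum of the branch lengths on the path between the two vertices, and the normal equations are then solved. How the slides' estimates were computed is not stated, so this is a generic OLS sketch. The edge order and the helper names are assumptions; the topology is the one from steps A to G and the distances are those from step A.

```python
from fractions import Fraction

EDGES = [("O1", "L1"), ("O2", "L1"), ("L1", "L2"), ("O3", "L2"), ("L2", "L3"),
         ("O4", "L3"), ("O4", "O5"), ("L3", "O9"), ("O6", "O9"), ("O7", "O9"),
         ("O8", "O9")]
NAMES = ["O1", "O2", "O3", "O4", "O5", "O6", "O7", "O8", "O9"]
ROWS = [[3, 8, 8, 9, 10, 8, 12, 7], [9, 9, 10, 11, 9, 13, 8],
        [6, 7, 8, 6, 10, 5], [1, 6, 4, 8, 3], [7, 5, 9, 4],
        [4, 8, 3], [6, 1], [5]]


def path_edges(src, dst):
    """Indices of the edges on the unique tree path from src to dst."""
    adjacency = {}
    for idx, (u, v) in enumerate(EDGES):
        adjacency.setdefault(u, []).append((v, idx))
        adjacency.setdefault(v, []).append((u, idx))
    stack = [(src, None, [])]
    while stack:
        node, parent, used = stack.pop()
        if node == dst:
            return used
        for nxt, idx in adjacency[node]:
            if nxt != parent:
                stack.append((nxt, node, used + [idx]))
    raise ValueError("vertices are not connected")


def solve(matrix, rhs):
    """Gauss-Jordan elimination on exact fractions (square, nonsingular)."""
    n = len(matrix)
    aug = [[Fraction(x) for x in row] + [Fraction(b)]
           for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        aug[col] = [x / aug[col][col] for x in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                factor = aug[r][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [row[-1] for row in aug]


def ols_branch_lengths():
    """Solve the normal equations (A^T A) b = A^T d for the branch lengths b."""
    design, target = [], []
    for i, a in enumerate(NAMES):
        for offset, b in enumerate(NAMES[i + 1:]):
            on_path = set(path_edges(a, b))
            design.append([1 if e in on_path else 0 for e in range(len(EDGES))])
            target.append(ROWS[i][offset])
    n = len(EDGES)
    ata = [[sum(r[p] * r[q] for r in design) for q in range(n)] for p in range(n)]
    atd = [sum(r[p] * t for r, t in zip(design, target)) for p in range(n)]
    return solve(ata, atd)


for (u, v), length in zip(EDGES, ols_branch_lengths()):
    print(u, v, length)
```

Because the illustration's distances are exactly additive, the least-squares fit has zero residual here; with distances estimated from sequences the same system yields the best-fitting lengths in the squared-error sense.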

SLIDE 16

Related methods

  • Recursive grouping (RG; Choi et al. 2011 JMLR)
  • Chow-Liu recursive grouping (CLRG; Choi et al. 2011 JMLR)
  • Neighbor-joining with edge contraction (NJc; Choi et al. 2011 JMLR)
  • Sampled ancestors (SA; Gavryushkina et al. 2014 PLoS Comput Biol)

SLIDE 17

Simulated data

  • 160 taxa
  • Varying proportion of latent vertices
  • 1000 nt long sequences
  • GTR + Γ
  • 100 replicates
  • BIC for threshold selection

SLIDE 18

Robinson-Foulds distance

[Figure: normalized Robinson-Foulds distance (y-axis, 0.0 to 1.0) against fraction of latent vertices (x-axis; 0.5, 0.37, 0.25, 0.12). Methods compared: FJ, NJc, RG, CLRG, SA]

$\mathrm{RF} = 1 - \frac{|S \cap \hat{S}|}{|S \cup \hat{S}|}$
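The normalized distance above reduces to set operations once both trees are encoded as sets. The sketch below assumes S and Ŝ are the sets of splits (bipartitions of the labeled vertices) of the true and inferred trees; the slide does not define them, and the canonical encoding of a split and the five-taxon example are hypothetical.

```python
def canonical_split(side, taxa):
    """Encode a bipartition by the side that excludes the smallest label."""
    side = frozenset(side)
    return side if min(taxa) not in side else frozenset(taxa) - side


def normalized_rf(true_splits, inferred_splits):
    """1 - |S ∩ S_hat| / |S ∪ S_hat|; 0.0 when both sets are empty."""
    union = true_splits | inferred_splits
    if not union:
        return 0.0
    return 1 - len(true_splits & inferred_splits) / len(union)


TAXA = {"a", "b", "c", "d", "e"}
S = {canonical_split({"a", "b"}, TAXA), canonical_split({"d", "e"}, TAXA)}
S_hat = {canonical_split({"c", "d", "e"}, TAXA), canonical_split({"c", "d"}, TAXA)}

print(normalized_rf(S, S_hat))
```

Here the split ab|cde is shared (written from either side), while de|abc and cd|abe each appear in only one tree, so one of three distinct splits matches and the distance is 2/3.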

SLIDE 19

Precision and Recall

[Figure, two panels: precision and recall (y-axis, 0.0 to 1.0) against fraction of latent vertices (x-axis; 0.5, 0.37, 0.25, 0.12). Methods compared: FJ, NJc, RG, CLRG, SA]

$\mathrm{Precision} = \frac{|S \cap \hat{S}|}{|\hat{S}|}$

$\mathrm{Recall} = \frac{|S \cap \hat{S}|}{|S|}$
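Both quantities follow directly from the two definitions. In this sketch S and Ŝ are treated as plain Python sets of hashable split identifiers; the identifiers s1 to s5 are placeholders, not data from the talk.

```python
def precision_recall(true_splits, inferred_splits):
    """Return (precision, recall) of the inferred splits against the true ones."""
    shared = len(true_splits & inferred_splits)
    precision = shared / len(inferred_splits) if inferred_splits else 0.0
    recall = shared / len(true_splits) if true_splits else 0.0
    return precision, recall


S = {"s1", "s2", "s3", "s4"}   # splits of the true tree (placeholders)
S_hat = {"s1", "s2", "s5"}     # splits of the inferred tree (placeholders)

precision, recall = precision_recall(S, S_hat)
print(precision, recall)
```

Two of the three inferred splits are correct (precision 2/3) and two of the four true splits are recovered (recall 1/2), which shows how a conservative tree can score high on precision while missing part of the true structure.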

SLIDE 21

Validation using the Belgian HIV-1 C transmission chain data

[Figure: transmission chain among hosts A, B, C, D, E, F, G, H, I, K, L]

Vranken et al. 2014 PLoS Comput Biol; publicly available at LANL

  • 11 hosts
  • 181 env sequences
  • Sequences at multiple time points per host

SLIDE 22

Unrooted generally labeled tree

[Figure: unrooted generally labeled tree. Legend: host A, B, C, D, E, F, G, H, I, K, L; latent]

SLIDE 23

Inferring the location of the root

[Figure: tree with vertices marked by sampling year. Legend: sampling year, 1990 to 2006]

SLIDE 24

Rooted phylogenetic tree

[Figure: rooted phylogenetic tree. Scale: subs/site, 0.00 to 0.10. Legend: host A, B, C, D, E, F, G, H, I, K, L]

SLIDE 25

Ancestral state reconstruction

[Figure: rooted tree with reconstructed ancestral states. Scale: subs/site, 0.00 to 0.10. Legend: host A, B, C, D, E, F, G, H, I, K, L]

SLIDE 26

Compatibility with transmission events

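[Figure: rooted tree annotated with transmission events. Scale: subs/site, 0.00 to 0.10. Legend: host A, B, C, D, E, F, G, H, I, K, L]

Transmission events (host pairs): (A, B), (A, G), (A, F), (B, C), (B, H), (B, I), (C, D), (E, K), (C, L), (C, E)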

SLIDE 29

Summary and Outlook

  • FJ has 93% precision and 90% recall on simulated data
  • High precision implies that most branches are reliable
  • FJ tree is compatible with 9/10 transmission events
  • Improve reconstruction accuracy and speed
  • Reliability of migration history depends on
    – How complete the location data is
    – How well the population is sampled

SLIDE 31

Acknowledgements

  • Thomas Lengauer
  • Nico Pfeifer
  • Jochim Büch

SLIDE 32

Questions?
