
Introduction VETO methods with desirable properties STC preprocess

PhySIC IST: cleaning source trees to infer more informative supertrees.

Celine Scornavacca, Vincent Berry, Vincent Ranwez and Emmanuel J.P. Douzery

LIRMM, UMR CNRS 5506; ISEM, UMR CNRS 5554; University of Montpellier II

June 16, 2008

Celine Scornavacca PhySIC IST


Reconstruction of phylogenies

Input: (different) source datasets


Output: a large phylogeny


Reconstruction of phylogenies for multiple datasets

Two main approaches


Supermatrix approach: assembling datasets

Supertree approach: assembling trees


Interest of supertrees

Supertrees are useful for:

Combining heterogeneous data

Obtaining a phylogeny using several genes:
◮ avoids having to deal with too much missing data
◮ evolutionary models adapted for each gene sequence

Pointing out problematic areas of the phylogeny:
◮ agreement and disagreement among input trees
◮ measuring taxon overlap
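To make the missing-data point concrete, the sketch below builds a small supermatrix by concatenating gene alignments and padding each absent taxon with '?'. It assumes alignments are Python dicts mapping taxon names to aligned sequences; the taxa, sequences and function names are hypothetical illustrations, not part of PhySIC IST.

```python
from typing import Dict, List

Alignment = Dict[str, str]  # taxon name -> aligned sequence (equal lengths)


def build_supermatrix(alignments: List[Alignment]) -> Dict[str, str]:
    """Concatenate gene alignments, padding taxa absent from a gene with '?'."""
    taxa = sorted({taxon for aln in alignments for taxon in aln})
    rows: Dict[str, List[str]] = {taxon: [] for taxon in taxa}
    for aln in alignments:
        length = len(next(iter(aln.values())))  # alignment width of this gene
        for taxon in taxa:
            rows[taxon].append(aln.get(taxon, "?" * length))
    return {taxon: "".join(parts) for taxon, parts in rows.items()}


def missing_fraction(matrix: Dict[str, str]) -> float:
    """Proportion of supermatrix cells that are missing data ('?')."""
    cells = "".join(matrix.values())
    return cells.count("?") / len(cells)


gene1 = {"A": "ACGT", "B": "ACGA", "C": "ACTT"}
gene2 = {"A": "GGTA", "D": "GCTA"}
supermatrix = build_supermatrix([gene1, gene2])
print(supermatrix["D"])               # ????GCTA
print(missing_fraction(supermatrix))  # 0.375
```

With only two genes, 37.5% of the cells of this toy supermatrix are already missing; a supertree approach instead analyses each gene separately, with its own evolutionary model, and combines the resulting trees.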


Supertree methods

VOTE vs VETO methods


Supertree methods can be classified into two categories, depending on how they deal with incongruent data:

Vote methods resolve conflicts by choosing the resolution that maximizes their optimization criterion.

Worrying feature: this approach can propose clades that contradict all source trees.


Veto methods do not allow the resulting supertree to contain clades that a source tree would vote against. Conflicts are handled by:

◮ pruning some taxa;

OR

◮ proposing multifurcations.

Worrying feature: this approach can lead to unresolved supertrees.
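A minimal sketch of the veto test, assuming rooted trees are written as nested Python tuples of taxon names. We take "a source tree votes against a clade" to mean that, once the clade is restricted to the taxa of the source tree, it overlaps one of the source tree's clades without either containing the other (the usual cluster-incompatibility criterion). This reading and all names are our assumptions, for illustration only.

```python
from typing import FrozenSet, Set, Union

Tree = Union[str, tuple]  # a taxon name or a tuple of subtrees


def clusters(tree: Tree) -> Set[FrozenSet[str]]:
    """Leaf sets of all internal nodes of a rooted tree."""
    found: Set[FrozenSet[str]] = set()

    def walk(node: Tree) -> FrozenSet[str]:
        if isinstance(node, str):
            return frozenset([node])
        below = frozenset().union(*(walk(child) for child in node))
        found.add(below)
        return below

    walk(tree)
    return found


def votes_against(source: Tree, clade: FrozenSet[str]) -> bool:
    """True if, restricted to the source's taxa, `clade` is incompatible with
    a clade of `source` (they overlap but neither contains the other)."""
    source_clusters = clusters(source)
    taxa = frozenset().union(*source_clusters)
    restricted = clade & taxa
    for c in source_clusters:
        if c & restricted and not (c <= restricted or restricted <= c):
            return True
    return False


s1 = (("a", "b"), "c")
s2 = (("a", "c"), "b")
ab = frozenset({"a", "b"})
abc = frozenset({"a", "b", "c"})
print([votes_against(s, ab) for s in (s1, s2)])   # [False, True]
print([votes_against(s, abc) for s in (s1, s2)])  # [False, False]
```

Here s2 votes against the clade {a, b}, so a veto method must either leave a, b and c as a multifurcation or prune one of the conflicting taxa.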


PhySIC

A VETO method with desirable properties

The resulting supertree does not contain relationships contradicting the source trees (non-contradiction property, denoted by PC);
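One way to check the non-contradiction property on small examples: encode each rooted tree by its set of rooted triplets ab|c (a and b are closer to each other than to c) and verify that the supertree never resolves three taxa differently from a source tree that contains them. This is a minimal sketch assuming "relationships" means rooted triplets and trees are nested Python tuples; it is not the PhySIC implementation.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Iterable, Set, Tuple, Union

Tree = Union[str, tuple]              # a taxon name or a tuple of subtrees
Triplet = Tuple[FrozenSet[str], str]  # (frozenset({a, b}), c) encodes ab|c


def clusters(tree: Tree) -> Set[FrozenSet[str]]:
    """Leaf sets of all internal nodes of a rooted tree."""
    found: Set[FrozenSet[str]] = set()

    def walk(node: Tree) -> FrozenSet[str]:
        if isinstance(node, str):
            return frozenset([node])
        below = frozenset().union(*(walk(child) for child in node))
        found.add(below)
        return below

    walk(tree)
    return found


def triplets(tree: Tree) -> Set[Triplet]:
    """Rooted triplets ab|c displayed by the tree: some clade holds a and b but not c."""
    cls = clusters(tree)
    taxa = sorted(frozenset().union(*cls))
    result: Set[Triplet] = set()
    for x, y, z in combinations(taxa, 3):
        for pair, out in (((x, y), z), ((x, z), y), ((y, z), x)):
            if any(set(pair) <= c and out not in c for c in cls):
                result.add((frozenset(pair), out))
    return result


def satisfies_pc(supertree: Tree, sources: Iterable[Tree]) -> bool:
    """True if no source triplet is resolved differently in the supertree."""
    resolved: Dict[FrozenSet[str], str] = {
        pair | {out}: out for pair, out in triplets(supertree)
    }
    for src in sources:
        for pair, out in triplets(src):
            other = resolved.get(pair | {out})
            if other is not None and other != out:
                return False  # the supertree resolves these three taxa differently
    return True


s1 = (("a", "b"), "c")
s2 = (("a", "b"), "d")
print(satisfies_pc(((("a", "b"), "c"), "d"), [s1, s2]))  # True
print(satisfies_pc(((("a", "c"), "b"), "d"), [s1, s2]))  # False: ac|b vs ab|c
print(satisfies_pc(("a", "b", "c", "d"), [s1, s2]))      # True, but uninformative
```

The star supertree satisfies PC trivially but carries no information, which is the risk of unresolved supertrees mentioned for veto methods.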


The resulting supertree only contains relationships that are present in a source tree or collectively induced by several source trees (induction property, denoted by PI).
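With the same triplet encoding, PI can be sketched as: every triplet of the supertree belongs to a source tree or can be deduced from several of them. Here we assume deductions are made with the two classical dyadic inference rules for rooted triplets (ab|c and ac|d imply ab|d and bc|d; ab|c and cd|b imply ab|d and cd|a), applied until nothing new appears. This closure is our illustrative choice and may miss triplets that only larger combinations of source trees induce; it is not necessarily how PhySIC tests induction.

```python
from itertools import combinations
from typing import FrozenSet, Iterable, Set, Tuple, Union

Tree = Union[str, tuple]              # a taxon name or a tuple of subtrees
Triplet = Tuple[FrozenSet[str], str]  # (frozenset({a, b}), c) encodes ab|c


def clusters(tree: Tree) -> Set[FrozenSet[str]]:
    """Leaf sets of all internal nodes of a rooted tree."""
    found: Set[FrozenSet[str]] = set()

    def walk(node: Tree) -> FrozenSet[str]:
        if isinstance(node, str):
            return frozenset([node])
        below = frozenset().union(*(walk(child) for child in node))
        found.add(below)
        return below

    walk(tree)
    return found


def triplets(tree: Tree) -> Set[Triplet]:
    """Rooted triplets ab|c displayed by the tree."""
    cls = clusters(tree)
    result: Set[Triplet] = set()
    for x, y, z in combinations(sorted(frozenset().union(*cls)), 3):
        for pair, out in (((x, y), z), ((x, z), y), ((y, z), x)):
            if any(set(pair) <= c and out not in c for c in cls):
                result.add((frozenset(pair), out))
    return result


def dyadic_closure(trips: Set[Triplet]) -> Set[Triplet]:
    """Close a triplet set under the two dyadic inference rules."""
    closed = set(trips)
    while True:
        new: Set[Triplet] = set()
        for p1, o1 in closed:
            for p2, o2 in closed:
                # Both rules need o1 inside the second pair and four distinct taxa.
                if o1 not in p2 or len(p1 | p2 | {o1, o2}) != 4:
                    continue
                (x,) = p2 - {o1}  # taxon grouped with o1 in the second triplet
                if x in p1:  # ab|c + ac|d => ab|d + bc|d
                    (b,) = p1 - {x}
                    new |= {(p1, o2), (frozenset({b, o1}), o2)}
                elif o2 in p1:  # ab|c + cd|b => ab|d + cd|a
                    (a,) = p1 - {o2}
                    new |= {(p1, x), (p2, a)}
        if new <= closed:
            return closed
        closed |= new


def satisfies_pi(supertree: Tree, sources: Iterable[Tree]) -> bool:
    """True if every supertree triplet is in, or induced by, the source triplets."""
    source_trips: Set[Triplet] = set()
    for src in sources:
        source_trips |= triplets(src)
    return triplets(supertree) <= dyadic_closure(source_trips)


s1 = (("a", "b"), "c")
s2 = (("a", "c"), "d")
print(satisfies_pi(((("a", "b"), "c"), "d"), [s1, s2]))  # True: ab|d, bc|d are induced
print(satisfies_pi(((("a", "b"), "d"), "c"), [s1, s2]))  # False: ad|c is not
```

Neither source tree contains ab|d or bc|d, yet the first supertree satisfies PI because both triplets are collectively induced by s1 and s2.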


A VETO method with UNdesirable properties

But when the source trees contain numerous contradictions or overlap little, the supertrees built with PhySIC can be highly unresolved.


An improved version of PhySIC

To cope with this, we propose a second method that:


◮ maximizes the information contained in the produced supertree;
◮ returns a supertree T that still respects PC and PI by:
⋆ allowing multifurcations;

AND

⋆ pruning rogue taxa.


PhySIC IST

Outline of PhySIC IST

PhySIC IST (PHYlogenetic Signal with Induction and non-Contradiction Inserting a Subset of Taxa) is an algorithm that performs successive insertions of taxa into a backbone tree.


The order of the insertions has to be chosen carefully!


We order taxa by decreasing priority: the first taxa to be inserted are those present in as many source trees as possible and involved in as few contradictions as possible.

We build the backbone tree.
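A minimal sketch of this priority order, assuming each source tree is summarized by its taxon set and that a per-taxon contradiction count is already available (the slides do not say how contradictions are counted, so the counts below are hypothetical inputs). Taxa are sorted by decreasing number of source trees, then by increasing number of contradictions; the backbone size of 3 is arbitrary.

```python
from typing import Dict, List, Set


def count_presence(source_taxa: List[Set[str]]) -> Dict[str, int]:
    """Number of source trees in which each taxon appears."""
    presence: Dict[str, int] = {}
    for taxa in source_taxa:
        for taxon in taxa:
            presence[taxon] = presence.get(taxon, 0) + 1
    return presence


def insertion_order(presence: Dict[str, int], contradictions: Dict[str, int]) -> List[str]:
    """Taxa by decreasing priority: present in the most source trees first,
    then involved in the fewest contradictions; ties broken alphabetically."""
    return sorted(presence, key=lambda t: (-presence[t], contradictions.get(t, 0), t))


source_taxa = [{"a", "b", "c", "d"}, {"a", "b", "e"}, {"a", "c", "e", "f"}]
contradictions = {"a": 0, "b": 2, "c": 0, "d": 0, "e": 1, "f": 0}  # hypothetical
order = insertion_order(count_presence(source_taxa), contradictions)
backbone_taxa, to_insert = order[:3], order[3:]  # backbone size 3 is arbitrary
print(order)  # ['a', 'c', 'e', 'b', 'd', 'f']
```

Taxon b appears in as many source trees as c and e but is involved in more contradictions, so it is postponed until after the backbone is built.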


Supports

Within which region of the backbone tree can a taxon s be inserted without contradicting the source trees T1 and T2?
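A minimal sketch of how candidate insertion positions could be scored, assuming trees are nested Python tuples and source information is compared as rooted triplets containing the new taxon s. Every edge of the backbone (plus a new edge above the root) is tried; positions with no contradicted triplet form the region where s can be inserted without contradicting T1 and T2. Only grafting on edges is considered (not adding s as an extra child of an existing multifurcation), and the scoring is our assumption, not PhySIC IST's actual support measure.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Set, Tuple, Union

Tree = Union[str, tuple]              # a taxon name or a tuple of subtrees
Triplet = Tuple[FrozenSet[str], str]  # (frozenset({a, b}), c) encodes ab|c


def clusters(tree: Tree) -> Set[FrozenSet[str]]:
    """Leaf sets of all internal nodes of a rooted tree."""
    found: Set[FrozenSet[str]] = set()

    def walk(node: Tree) -> FrozenSet[str]:
        if isinstance(node, str):
            return frozenset([node])
        below = frozenset().union(*(walk(child) for child in node))
        found.add(below)
        return below

    walk(tree)
    return found


def triplets(tree: Tree) -> Set[Triplet]:
    """Rooted triplets ab|c displayed by the tree."""
    cls = clusters(tree)
    result: Set[Triplet] = set()
    for x, y, z in combinations(sorted(frozenset().union(*cls)), 3):
        for pair, out in (((x, y), z), ((x, z), y), ((y, z), x)):
            if any(set(pair) <= c and out not in c for c in cls):
                result.add((frozenset(pair), out))
    return result


def graftings(tree: Tree, taxon: str) -> List[Tree]:
    """All trees obtained by attaching `taxon` on an edge of `tree`,
    including a new edge above the current root."""
    results: List[Tree] = [(tree, taxon)]
    if not isinstance(tree, str):
        for i, child in enumerate(tree):
            for grafted in graftings(child, taxon):
                results.append(tree[:i] + (grafted,) + tree[i + 1:])
    return results


def score(candidate: Tree, taxon: str, sources: List[Tree]) -> Tuple[int, int]:
    """(supported, contradicted) source triplets among those containing `taxon`."""
    resolved: Dict[FrozenSet[str], str] = {
        p | {o}: o for p, o in triplets(candidate) if taxon in (p | {o})
    }
    supported = contradicted = 0
    for src in sources:
        for p, o in triplets(src):
            key = p | {o}
            if key in resolved:
                if resolved[key] == o:
                    supported += 1
                else:
                    contradicted += 1
    return supported, contradicted


backbone = (("a", "b"), "c")
t1 = (("a", "b"), ("s", "c"))
t2 = (("a", "b"), "s")
scores = {cand: score(cand, "s", [t1, t2]) for cand in graftings(backbone, "s")}
region = [cand for cand, (sup, con) in scores.items() if con == 0]
print(region)  # [(('a', 'b'), ('c', 's'))]
```

In this toy example exactly one position is contradiction-free, and it is also the best supported, which corresponds to the case "one best-supported position and all trees agree".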


Outline of PhySIC IST

One best-supported position (PI) and all trees agree (PC)


More than one best-supported position, and/or not all trees agree (PI and PC?)


CIC criterion


We need to evaluate the amount of information in a tree. We use a variant of the CIC criterion (Thorley, Wilkinson and Charleston 1998) that also takes missing taxa into account, defined as:

$$\mathrm{CIC}(T, n) = -\lg \frac{\text{number of permitted binary trees with } n \text{ taxa}}{\text{number of possible binary trees with } n \text{ taxa}}$$
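To make the formula concrete, here is a minimal sketch. It assumes that a binary tree on the n taxa is "permitted" by T when its restriction to the k taxa of T is a binary refinement of T; this is our reading of "takes into account missing taxa", not code from PhySIC IST. Under that assumption the counts have closed forms: there are (2n−3)!! rooted binary trees on n taxa, a node with d children can be resolved in (2d−3)!! ways, and each binary tree on the k taxa of T extends to (2n−3)!!/(2k−3)!! binary trees on n taxa. Trees are nested Python tuples and all names are hypothetical.

```python
import math
from typing import Union

Tree = Union[str, tuple]  # a taxon name or a tuple of subtrees


def double_factorial(m: int) -> int:
    """m!! = m * (m - 2) * ...; returns 1 for m <= 1 (so (-1)!! = 1)."""
    result = 1
    while m > 1:
        result *= m
        m -= 2
    return result


def n_binary(n: int) -> int:
    """Number of rooted binary trees on n labelled taxa: (2n - 3)!!."""
    return double_factorial(2 * n - 3)


def n_taxa(tree: Tree) -> int:
    return 1 if isinstance(tree, str) else sum(n_taxa(child) for child in tree)


def n_refinements(tree: Tree) -> int:
    """Binary resolutions of a rooted tree: product of (2d - 3)!! over
    internal nodes, where d is the node's number of children."""
    if isinstance(tree, str):
        return 1
    count = n_binary(len(tree))
    for child in tree:
        count *= n_refinements(child)
    return count


def cic(tree: Tree, n: int) -> float:
    """CIC(T, n) = -lg(permitted / possible), 'permitted' as assumed above."""
    k = n_taxa(tree)
    possible = n_binary(n)
    # Each binary tree on the k taxa extends to (2n-3)!!/(2k-3)!! trees on n taxa.
    permitted = n_refinements(tree) * possible // n_binary(k)
    return math.log2(possible) - math.log2(permitted)


resolved = ((("a", "b"), "c"), "d")
star = ("a", "b", "c", "d")
partial = (("a", "b"), "c")  # taxon d is missing
print(cic(star, 4))                # 0.0
print(round(cic(partial, 4), 3))   # 1.585
print(round(cic(resolved, 4), 3))  # 3.907
```

A star tree scores 0, while a fully resolved tree on all n taxa reaches the maximum value lg((2n−3)!!); a tree lacking some taxa or containing multifurcations falls in between.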


Large-scale simulations

Average CIC values

Figure: MRP, PhySIC and PhySIC IST; four panels (d = 25%, d = 50%, d = 75%, mixed d); y-axis from 0.55 to 1, x-axis from 10 to 50.


Average percentage of type I error

Figure: MRP, PhySIC and PhySIC IST; four panels (d = 25%, d = 50%, d = 75%, mixed d); y-axis from 0.5 to 3, x-axis from 10 to 50.


The improvement of PhySIC IST over PhySIC

The improvement of PhySIC IST over PhySIC results from three fundamental differences between them:

◮ the new version performs successive insertions of taxa on a backbone tree and, unlike PhySIC, is not based on a revised version of the Build algorithm;

◮ the two methods do not have the same optimization criterion:
⋆ PhySIC => number of triplets
⋆ PhySIC IST => CIC

◮ PhySIC IST can propose non-plenary supertrees (supertrees that do not contain all taxa of the source trees).


1 Introduction: combining data for phylogenetic inferences; vote methods; veto methods

2 VETO methods with desirable properties: PhySIC; PhySIC IST

3 STC preprocess


Limits of veto methods

As the amount of available information continues to increase, the number of conflicts between source trees increases as well.

Figure: informativeness (y-axis from 0.55 to 1) and inaccuracy (y-axis from 0.5 to 3) of MRP, PhySIC and PhySIC IST; x-axis from 10 to 50.


Vote vs veto methods?


IDEA: a flexible, liberal (voting) preprocessing of the input trees before applying a veto approach.


Source Tree Correction (STC) preprocess

We want to drop the statistically less supported alternative(s), if any exist.


After that, the STC preprocess modifies the source trees (PhySIC IST), forcing them not to contain the dropped resolutions. Each modified tree may either contain new multifurcations or lack some of its former taxa. A threshold α is chosen by the user.
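The slides do not specify the statistical test, so the following is only a hypothetical sketch. For one triple of taxa, it takes how many source trees support each of the three resolutions and drops any resolution whose count is significantly lower than that of the best-supported one, using a one-sided exact binomial test with p = 1/2 at the user-chosen threshold α. Counting resolutions from the source trees, and then modifying those trees (new multifurcations or removed taxa), is not shown; the function names are ours.

```python
from math import comb
from typing import Dict, List


def lower_tail(k: int, n: int) -> float:
    """P(X <= k) for X ~ Binomial(n, 1/2)."""
    return sum(comb(n, i) for i in range(k + 1)) / 2 ** n


def dropped_resolutions(counts: Dict[str, int], alpha: float) -> List[str]:
    """Resolutions of one taxon triple whose support is significantly lower
    than the best-supported resolution (hypothetical one-sided binomial test)."""
    best = max(counts.values())
    dropped = []
    for resolution, count in counts.items():
        n = count + best
        # Under H0 both resolutions are equally likely, so count ~ Binomial(n, 1/2).
        if count < best and lower_tail(count, n) < alpha:
            dropped.append(resolution)
    return dropped


counts = {"ab|c": 9, "ac|b": 1, "bc|a": 6}  # source trees supporting each resolution
print(dropped_resolutions(counts, alpha=0.05))  # ['ac|b']
```

Each alternative is tested separately against the best one, with no multiple-testing correction; applied over all triples of taxa, that choice would have to be revisited.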


Large-scale simulations (α = 0.05)

Average CIC values

Figure: MRP, PhySIC, PhySIC IST, STC + PhySIC and STC + PhySIC IST; four panels (d = 25%, d = 50%, d = 75%, mixed d); y-axis from 0.55 to 1, x-axis from 10 to 50.


Average percentage of type I error

Figure: MRP, PhySIC, PhySIC IST, STC + PhySIC and STC + PhySIC IST; four panels (d = 25%, d = 50%, d = 75%, mixed d); y-axis from 0.5 to 3, x-axis from 10 to 50.


Conclusions

PhySIC IST: new version of PhySIC

◮ more informative but still reliable supertrees

STC: a statistical preprocess of the source trees to detect and correct artifactual positions of taxa. This approach has the advantage of separating the liberal resolution of conflicts in the data from the assembly of the supertree.

◮ feedback on the source trees

Test STC + PhySIC IST on biological datasets


http://www.atgc-montpellier.fr/physic_ist/


Thanks

Olivier Gascuel and Vincent Lefort

Céline Brochier, Vincent Daubin and Frédéric Delsuc
