SLIDE 1

The Average Consensus Procedure:

Combination of Weighted Trees Containing Identical or Overlapping Sets of Taxa

François-Joseph Lapointe, Université de Montréal, Montreal, Canada
Guy Cucumel, Université du Québec à Montréal, Montreal, Canada

SLIDE 2

Consensus Trees

The Problem: Suppose we have multiple phylogenies on overlapping leaf sets. How can we combine them effectively?

*Figures modified to show the overlapping leaf sets

SLIDE 3

Criteria

Intuitively, we want the best possible tree, i.e. the one that is closest to every input tree in both topology and branch lengths. To formalize the problem, we first need to fix the criteria that matter:

  • Tree Topology
  • Branch Lengths
SLIDE 4

Notion of Tree Distance: Identical Leaf Set

Let’s suppose two trees are defined on the same leaf set. How “close” are they to one another? Let S = {1, 2, . . . , n} be the taxa, and T1 and T2 be two trees on S. Define

$$\Delta(T_1, T_2) := \sum_{i=1}^{n} \sum_{j=1}^{n} \big( d_1(i,j) - d_2(i,j) \big)^2 \tag{1}$$

where dℓ(i, j) is the distance from i to j in Tℓ.

Note. It is important to normalize the branch lengths for this to make sense. Assume all branch lengths are in [0, 1].
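A minimal sketch of Eq. (1) in Python (an illustration of this write-up, not code from the slides): each tree is represented only by its n × n matrix of pairwise leaf-to-leaf path lengths, which is all the definition needs.

```python
import numpy as np

def tree_distance(D1: np.ndarray, D2: np.ndarray) -> float:
    """Delta(T1, T2) from Eq. (1): squared difference of the two trees'
    path-length matrices, summed over all leaf pairs (i, j)."""
    assert D1.shape == D2.shape, "both trees must be on the same leaf set"
    return float(np.sum((D1 - D2) ** 2))
```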
SLIDE 5

Optimal Consensus Tree: Identical Leaf Set

What is the best tree? Given k trees T1, T2, . . . , Tk on S, we want the Tc that minimizes

$$\sum_{\ell=1}^{k} \Delta(T_c, T_\ell) = \sum_{\ell=1}^{k} \sum_{i=1}^{n} \sum_{j=1}^{n} \big( d_c(i,j) - d_\ell(i,j) \big)^2 \tag{2}$$

But how do we compute this?

Claim. Let d̄(i, j) be the average over all trees Tℓ of dℓ(i, j). Then to optimize $\sum_{\ell=1}^{k} \Delta(T_c, T_\ell)$ it suffices to optimize

$$\sum_{i=1}^{n} \sum_{j=1}^{n} \big( d_c(i,j) - \bar{d}(i,j) \big)^2$$
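A small sketch of the reduction the claim describes (illustrative code, not from the slides): fit a candidate matrix against the entrywise average d̄ rather than against all k input matrices.

```python
import numpy as np

def reduced_objective(Dc: np.ndarray, Ds: list[np.ndarray]) -> float:
    """The single least-squares objective the claim reduces Eq. (2) to:
    sum_{i,j} (d_c(i,j) - d_bar(i,j))^2, with d_bar the entrywise average."""
    D_bar = np.mean(Ds, axis=0)  # d_bar(i, j), averaged over the k trees
    return float(np.sum((Dc - D_bar) ** 2))
```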

SLIDE 6

$$\begin{aligned}
\sum_{\ell=1}^{k} \Delta(T_c, T_\ell)
&= \sum_{\ell=1}^{k} \sum_{i=1}^{n} \sum_{j=1}^{n} \big( d_c(i,j) - d_\ell(i,j) \big)^2 \\
&= \sum_{\ell=1}^{k} \sum_{i=1}^{n} \sum_{j=1}^{n} \big[ d_c(i,j)^2 - 2\,d_c(i,j)\,d_\ell(i,j) + d_\ell(i,j)^2 \big] \\
&= \sum_{i=1}^{n} \sum_{j=1}^{n} \Big[ k\,d_c(i,j)^2 - 2\,d_c(i,j) \sum_{\ell} d_\ell(i,j) + \sum_{\ell} d_\ell(i,j)^2 \Big] \\
&= k \sum_{i=1}^{n} \sum_{j=1}^{n} \Big[ d_c(i,j)^2 - 2\,d_c(i,j)\,\bar{d}(i,j) + \bar{d}(i,j)^2 - \bar{d}(i,j)^2 + \tfrac{1}{k} \sum_{\ell} d_\ell(i,j)^2 \Big] \\
&= k \sum_{i=1}^{n} \sum_{j=1}^{n} \big( d_c(i,j) - \bar{d}(i,j) \big)^2 + g(T_1, \dots, T_k)
\end{aligned}$$

where g(T1, . . . , Tk) collects the leftover terms, which do not depend on Tc. So optimizing $\sum_{\ell=1}^{k} \Delta(T_c, T_\ell)$ is the same as optimizing $\sum_{i=1}^{n} \sum_{j=1}^{n} \big( d_c(i,j) - \bar{d}(i,j) \big)^2$.
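The identity is easy to sanity-check numerically (a sketch using random symmetric matrices as stand-ins for tree distances; not from the slides): for any candidate matrix, the full objective and k times the reduced one should differ by the same constant.

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 6, 4

def sym(M: np.ndarray) -> np.ndarray:
    """Symmetrize with a zero diagonal so M resembles a distance matrix."""
    S = (M + M.T) / 2
    np.fill_diagonal(S, 0.0)
    return S

Ds = [sym(rng.random((n, n))) for _ in range(k)]   # stand-ins for d_l(i, j)
D_bar = np.mean(Ds, axis=0)                        # d_bar(i, j)

def full(Dc):     # sum_l Delta(T_c, T_l)
    return sum(np.sum((Dc - Dl) ** 2) for Dl in Ds)

def reduced(Dc):  # k * sum_{i,j} (d_c(i,j) - d_bar(i,j))^2
    return k * np.sum((Dc - D_bar) ** 2)

# The gap should be identical for every candidate D_c: it is the constant
# g(T_1, ..., T_k), which cannot affect which D_c is optimal.
gaps = [full(Dc) - reduced(Dc) for Dc in (sym(rng.random((n, n))) for _ in range(3))]
print(gaps)  # three (numerically) equal values
```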

SLIDE 7

Nonidentical Leaf Sets

So we just want the tree whose leaf distances are closest to d̄(i, j) in the least-squares metric. This can be found via a heuristic search, using e.g. PHYLIP.

What if there is missing data for d̄(i, j)? We take the weighted average

$$\bar{d}(i,j) = \frac{1}{N(i,j)} \sum_{\ell \,:\, i,j \in T_\ell} d_\ell(i,j)$$

where N(i, j) is the number of trees in which i and j appear together. What if N(i, j) = 0? We can fill in the missing entries by assuming additivity and using the four-point condition [Landry, Lapointe, Kirsch, ’96].
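A sketch of the weighted average with missing pairs (the NaN encoding is my assumption, not the authors’): index every matrix by the union of all taxa and mark a pair absent from a tree with NaN. Entries with N(i, j) = 0 are left as NaN for a later fill-in step such as the four-point estimation cited above, which is not implemented here.

```python
import numpy as np

def weighted_average_matrix(Ds: list[np.ndarray]) -> tuple[np.ndarray, np.ndarray]:
    """Entrywise average over only the trees containing both taxa.

    Each matrix in Ds is indexed by the union of all taxa, with NaN marking
    pairs (i, j) absent from that tree. Returns (d_bar, N), where N(i, j)
    counts the trees containing both i and j; d_bar(i, j) stays NaN
    wherever N(i, j) == 0.
    """
    stack = np.stack(Ds)                  # shape (k, n, n)
    N = (~np.isnan(stack)).sum(axis=0)    # N(i, j)
    sums = np.nansum(stack, axis=0)       # NaN entries contribute 0
    return np.where(N > 0, sums / np.maximum(N, 1), np.nan), N
```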

SLIDE 8

Analysis

The authors ran their method on the two trees from earlier to obtain this average consensus supertree.

The resulting tree is not too different from the input trees, and it correctly predicts a clade that had been conjectured by the authors who produced the original trees.

SLIDE 9

Questions?