Weighted Quartets Phylogenetics Yunan Luo E. Avni, R. Cohen, and S. - - PowerPoint PPT Presentation

slide-1
SLIDE 1

Weighted Quartets Phylogenetics

Yunan Luo

  • E. Avni, R. Cohen, and S. Snir. Weighted quartets phylogenetics. Systematic Biology, 2014. syu087
slide-2
SLIDE 2

Problem: quartet-based supertree

[Figure: Input: quartet trees on the leaf sets {A, B, C, D}, {A, B, D, E}, and {A, C, D, E}. Output: a single tree on A, B, C, D, E.]

Def: a set Q of quartets is compatible if there is a tree that induces each quartet in Q.

Goal: find the largest compatible subset of the given quartet set. This problem is NP-hard.
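
The definition of "induces" can be made concrete with a small check. The following is a minimal sketch, assuming a tree is represented by its non-trivial splits (each given as one side of the bipartition) and a quartet ab|cd by its two sister pairs. The function names, the example tree ((A,B),C,(D,E)), and the quartet topologies are hypothetical choices for illustration, not taken from the slides. The check only verifies that a given tree induces a quartet set; it does not solve the NP-hard maximization problem.

```python
from typing import FrozenSet, Iterable, Set, Tuple

Quartet = Tuple[FrozenSet[str], FrozenSet[str]]


def make_quartet(a: str, b: str, c: str, d: str) -> Quartet:
    """Quartet ab|cd: a and b are sisters, c and d are sisters."""
    return frozenset((a, b)), frozenset((c, d))


def induces(taxa: Set[str], splits: Iterable[Set[str]], q: Quartet) -> bool:
    """True if some split of the tree separates the two sister pairs of q."""
    left, right = q
    for side in splits:
        other = taxa - side
        if (left <= side and right <= other) or (left <= other and right <= side):
            return True
    return False


def all_induced(taxa: Set[str], splits: Iterable[Set[str]], quartets: Iterable[Quartet]) -> bool:
    """True if the tree induces every quartet, which certifies compatibility."""
    splits = list(splits)
    return all(induces(taxa, splits, q) for q in quartets)


# Hypothetical example: the tree ((A,B),C,(D,E)) has two non-trivial splits.
taxa = {"A", "B", "C", "D", "E"}
tree_splits = [{"A", "B"}, {"D", "E"}]
quartets = [
    make_quartet("A", "B", "C", "D"),
    make_quartet("A", "B", "D", "E"),
    make_quartet("A", "C", "D", "E"),
]
print(all_induced(taxa, tree_splits, quartets))
```

Adding a conflicting quartet such as AC|BD to the list makes the check fail for this tree, since no split separates {A, C} from {B, D}.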

slide-3
SLIDE 3

Outline

  • Background: Quartet MaxCut (QMC)
  • Weighted Quartet MaxCut (wQMC)
  • Results of wQMC
slide-4
SLIDE 4

Background: Quartet MaxCut (QMC)

Example: a cut in a graph on vertices A, B, C, D

cut C = ({A, B}, {C, D})

[Figure: graph on A, B, C, D with edge weights 2, 3, 5, 1]

weight of cut, w(C) = 3 + 1 = 4
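
The weight of a cut is the sum of the weights of the edges that cross it. The following is a minimal sketch of that computation. The slide text does not say which edge carries which weight, so the assignment below is an assumption chosen so that the crossing edges weigh 3 and 1, as in the example.

```python
def cut_weight(edges, side):
    """Sum the weights of edges with exactly one endpoint in `side`."""
    return sum(w for (u, v), w in edges.items() if (u in side) != (v in side))


# Hypothetical edge assignment: A-C and B-D cross the cut ({A, B}, {C, D}).
edges = {("A", "B"): 2, ("A", "C"): 3, ("C", "D"): 5, ("B", "D"): 1}
print(cut_weight(edges, {"A", "B"}))
```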

Snir, Sagi, and Satish Rao. "Quartet MaxCut: a fast algorithm for amalgamating quartet trees." Molecular phylogenetics and evolution 62.1 (2012): 1-8.

slide-5
SLIDE 5

Quartet MaxCut (QMC): a heuristic method

Given a set of quartets Q over a set of species (taxa) X, QMC builds a graph G(Q) = (V, E).

Nodes: V = X

Edges: for every quartet q in Q, add to G an edge for every pair of leaves in q.

  • bad edges: the edges that link sister leaves
  • good edges: the other four pairs
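
The graph construction can be sketched in a few lines. This is a minimal illustration, assuming a quartet ab|cd is stored as the pair of tuples `((a, b), (c, d))` and that parallel edges are kept as multiplicities. The function name and the representation are hypothetical, not the data structures of the QMC implementation.

```python
from collections import Counter


def build_quartet_graph(quartets):
    """Return (good, bad): edge multiplicities contributed by all quartets."""
    good, bad = Counter(), Counter()
    for (a, b), (c, d) in quartets:
        # The two sister pairs contribute the bad edges.
        bad[frozenset((a, b))] += 1
        bad[frozenset((c, d))] += 1
        # The four pairs across the internal edge contribute the good edges.
        for u in (a, b):
            for v in (c, d):
                good[frozenset((u, v))] += 1
    return good, bad


# Two quartets over taxa 1..4: 12|34 and 13|24.
good, bad = build_quartet_graph([((1, 2), (3, 4)), ((1, 3), (2, 4))])
print(good[frozenset((1, 4))], bad[frozenset((1, 2))])
```

The same pair of taxa can be a good edge for one quartet and a bad edge for another, which is why the two counters are kept separately.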

slide-6
SLIDE 6

Quartet graph

[Figure: the edges of individual quartets over taxa 1, 2, 3, 4 are put together into a single quartet graph.]

slide-7
SLIDE 7

Quartet MaxCut (QMC) algorithm

  • Find a cut C in the quartet graph that maximizes the ratio between the number of good and bad edges in C
  • The cut defines a split (U, X\U) over the taxa set X
  • Apply recursively on U and X\U, until the subset size is <= 4
  • Every split defines an edge in the constructed tree
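
The cut-selection step can be illustrated on a tiny input. The sketch below swaps in an exhaustive search over bipartitions, which is only feasible for a handful of taxa; it is not the heuristic QMC uses, and it covers a single cut rather than the full recursion. Restricting both sides to at least two taxa and breaking ties by the number of good edges are assumptions made for this illustration.

```python
from itertools import combinations


def crossing_counts(quartets, side):
    """Count good and bad quartet edges with exactly one endpoint in `side`."""
    good = bad = 0
    for (a, b), (c, d) in quartets:
        for u, v in ((a, b), (c, d)):  # sister pairs: bad edges
            bad += (u in side) != (v in side)
        for u in (a, b):  # pairs across the internal edge: good edges
            for v in (c, d):
                good += (u in side) != (v in side)
    return good, bad


def best_cut(taxa, quartets):
    """Brute-force the split with the largest good/bad ratio (tiny inputs only)."""
    taxa = sorted(taxa)
    best_key, best_side = None, None
    for size in range(2, len(taxa) // 2 + 1):
        for combo in combinations(taxa, size):
            side = frozenset(combo)
            good, bad = crossing_counts(quartets, side)
            if good == 0:
                continue  # a cut that crosses no good edge carries no signal
            ratio = good / bad if bad else float("inf")
            key = (ratio, good)
            if best_key is None or key > best_key:
                best_key, best_side = key, side
    return best_side


# All five quartets induced by the caterpillar tree ((1,2),3,(4,5)).
quartets = [
    ((1, 2), (3, 4)), ((1, 2), (3, 5)), ((1, 2), (4, 5)),
    ((1, 3), (4, 5)), ((2, 3), (4, 5)),
]
print(best_cut({1, 2, 3, 4, 5}, quartets))
```

On this input the two internal edges of the tree, {1, 2} and {4, 5}, tie for the best ratio (16 good edges against 2 bad edges), and either is a valid first split.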

slide-8
SLIDE 8

Outline

  • Background: Quartet MaxCut (QMC)
  • Weighted Quartet MaxCut (wQMC)
  • Results of wQMC
slide-9
SLIDE 9

Contribution of this paper

  • A weighted extension of QMC
  • A scheme for associating weights to quartets
  • A new measure of tree similarity
slide-10
SLIDE 10

A weighted extension of QMC

  • Recall QMC: find a cut C in the quartet graph that maximizes the ratio between the number of good and bad edges in C
  • Now, suppose we are given a set of quartets with associated weights
  • Question: what is the natural extension of QMC to handle weighted quartets?
  • Answer: find a cut C in the quartet graph that maximizes the ratio between the total weight of good and bad edges in C
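
The weighted objective replaces edge counts with sums of weights. The following is a minimal sketch, assuming every edge contributed by a quartet q carries the weight of q; the function name and the input representation (a list of `(quartet, weight)` pairs) are hypothetical.

```python
def weighted_cut_totals(weighted_quartets, side):
    """Return (good_weight, bad_weight) of the edges crossing the cut."""
    good = bad = 0.0
    for ((a, b), (c, d)), w in weighted_quartets:
        for u, v in ((a, b), (c, d)):  # sister pairs: bad edges
            if (u in side) != (v in side):
                bad += w
        for u in (a, b):  # pairs across the internal edge: good edges
            for v in (c, d):
                if (u in side) != (v in side):
                    good += w
    return good, bad


# Quartet 12|34 with weight 1.0 and the conflicting 13|24 with weight 0.1.
example = [(((1, 2), (3, 4)), 1.0), (((1, 3), (2, 4)), 0.1)]
good_w, bad_w = weighted_cut_totals(example, {1, 2})
print(good_w / bad_w)
```

With unit weights the cut ({1, 2}, {3, 4}) would score 6 good edges against 2 bad edges; with the weights above the low-confidence quartet barely lowers the ratio.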

slide-11
SLIDE 11

Prioritize between quartets

[Figure: four input quartets on the leaf sets (1 2 3 4), (1 3 4 5), (1 4 2 5), (1 3 2 5) with weights 1.0, 1.0, 0.1, 0.1, and two constructed trees with leaf orders (1 3 4 2 5) and (1 2 3 4 5).]

No tree satisfies them all simultaneously. Some optimization criterion is necessary.

Construction without weights: satisfies 3 quartets, sum of weights 1.2

Construction with weights: satisfies 2 quartets, sum of weights 2.0
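
The numbers on this slide can be reproduced with a short script. The topologies are not recoverable from the text alone, so the sketch below assumes the quartets are 12|34, 13|45, 14|25, and 13|25, and that the two constructed trees are the caterpillars ((1,3),4,(2,5)) and ((1,2),3,(4,5)). Under these assumptions the counts and weight sums match the slide.

```python
taxa = {1, 2, 3, 4, 5}

# Assumed topologies, paired with the weights shown on the slide.
weighted_quartets = [
    (((1, 2), (3, 4)), 1.0),
    (((1, 3), (4, 5)), 1.0),
    (((1, 4), (2, 5)), 0.1),
    (((1, 3), (2, 5)), 0.1),
]


def induces(splits, quartet):
    """True if some split of the tree separates the two sister pairs."""
    (a, b), (c, d) = quartet
    left, right = {a, b}, {c, d}
    for side in splits:
        other = taxa - side
        if (left <= side and right <= other) or (left <= other and right <= side):
            return True
    return False


def score(splits):
    """Return (number of satisfied quartets, sum of their weights)."""
    satisfied = [w for q, w in weighted_quartets if induces(splits, q)]
    return len(satisfied), sum(satisfied)


tree_without_weights = [{1, 3}, {2, 5}]  # assumed caterpillar ((1,3),4,(2,5))
tree_with_weights = [{1, 2}, {4, 5}]     # assumed caterpillar ((1,2),3,(4,5))

count_a, weight_a = score(tree_without_weights)
count_b, weight_b = score(tree_with_weights)
print(count_a, round(weight_a, 2), count_b, round(weight_b, 2))
```

The first tree satisfies more quartets but only one of the two high-confidence ones; the second satisfies fewer quartets with a larger total weight.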

slide-12
SLIDE 12

A scheme for associating weights

Let

$$d_1 = d_{ab} + d_{cd}, \qquad d_2 = d_{ac} + d_{bd}, \qquad d_3 = d_{ad} + d_{bc}$$

We assume that $d_1 \le d_2 \le d_3$. The weight function of the quartet q = ab|cd is defined as

$$w(q) = \frac{d_3 - d_1}{d_3 \, \exp(d_3 - d_2)}$$

Remarks:

  • Note that $d_3 - d_1$ is twice the length of the internal edge. The quartet weight increases as the internal edge gets longer and the split becomes more significant.
  • The weight becomes 0 if the quartet is unresolved, i.e., $d_3 - d_1 = 0$.
  • As $d_3 - d_2 \to 0$, the data are more reliable and the weight becomes larger.
  • In a tree, $d_3 - d_2 = 0$, so we have

$$w(q) = 1 - \frac{d_1}{d_3}$$
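
The weighting scheme can be sketched as a small function. This is a minimal illustration, assuming the weight is w(q) = (d3 - d1) / (d3 * exp(d3 - d2)) with the three pairwise sums sorted so that d1 <= d2 <= d3, which is the reading consistent with the remarks above. The function name, the argument order, and the returned topology label are hypothetical.

```python
import math


def quartet_weight(d_ab, d_cd, d_ac, d_bd, d_ad, d_bc):
    """Return (topology, weight) for four taxa given their pairwise distances."""
    sums = sorted([
        (d_ab + d_cd, "ab|cd"),
        (d_ac + d_bd, "ac|bd"),
        (d_ad + d_bc, "ad|bc"),
    ])
    (d1, topology), (d2, _), (d3, _) = sums
    if d3 == 0:
        return topology, 0.0  # all distances are zero: nothing to resolve
    # Assumed formula: (d3 - d1) / (d3 * exp(d3 - d2)).
    return topology, (d3 - d1) / (d3 * math.exp(d3 - d2))


# Tree-like distances: all five edges of the quartet ab|cd have length 1.
print(quartet_weight(2, 2, 3, 3, 3, 3))
# Star-like distances: the quartet is unresolved and gets weight 0.
print(quartet_weight(2, 2, 2, 2, 2, 2))
```

In the tree-like case d2 = d3, so the exponential factor is 1 and the weight reduces to 1 - d1/d3.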

slide-13
SLIDE 13

A new measure of tree similarity

Existing measure: Qfit measure (Estabrook 1985)

$$\mathrm{Qfit} = \frac{\#\,\text{shared quartets}}{\#\,\text{all possible quartets}}$$

New measure: wQfit measure (this paper)

For quartets:

$$\mathrm{wQfit}_q(q_1, q_2) = w(q_1)\, w(q_2)\, \omega_{q_1 q_2}$$

where

$$\omega_{q_1 q_2} = \begin{cases} 2 & \text{if } q_1 = q_2 \\ -1 & \text{if } q_1 \ne q_2 \end{cases}$$

For trees:

$$\mathrm{wQfit}(T_1, T_2) = 2\, \frac{\sum_{s} \mathrm{wQfit}_q(T_{1,s}, T_{2,s})}{\sum_{s} \mathrm{wQfit}_q(T_{1,s}, T_{1,s}) + \sum_{s} \mathrm{wQfit}_q(T_{2,s}, T_{2,s})}$$

where s is a subset of the input species X with |s| = 4, and $T_{1,s}$ is the quartet of tree $T_1$ induced by s.
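
Both measures can be sketched side by side. The code below assumes the quartet-level score is w(q1) * w(q2) times 2 for equal topologies and -1 otherwise, and that the tree-level score is twice the cross sum divided by the sum of the two self-scores; this is one reading of the slide's formulas, not a confirmed transcription. Each tree is represented as a hypothetical dictionary that maps a 4-taxon set to a `(topology, weight)` pair.

```python
def topo(a, b, c, d):
    """Canonical, order-independent representation of the topology ab|cd."""
    return frozenset((frozenset((a, b)), frozenset((c, d))))


def wqfit_q(q1, q2):
    """Quartet-level score: product of weights times 2 (agree) or -1 (disagree)."""
    (t1, w1), (t2, w2) = q1, q2
    return w1 * w2 * (2.0 if t1 == t2 else -1.0)


def wqfit(tree1, tree2):
    """Tree-level score over the 4-taxon sets present in both trees."""
    common = tree1.keys() & tree2.keys()
    cross = sum(wqfit_q(tree1[s], tree2[s]) for s in common)
    self1 = sum(wqfit_q(tree1[s], tree1[s]) for s in common)
    self2 = sum(wqfit_q(tree2[s], tree2[s]) for s in common)
    return 2.0 * cross / (self1 + self2)


def qfit(tree1, tree2):
    """Unweighted score: fraction of 4-taxon sets with the same topology."""
    common = tree1.keys() & tree2.keys()
    return sum(tree1[s][0] == tree2[s][0] for s in common) / len(common)


s1, s2 = frozenset((1, 2, 3, 4)), frozenset((1, 2, 3, 5))
tree_a = {s1: (topo(1, 2, 3, 4), 1.0), s2: (topo(1, 2, 3, 5), 0.5)}
tree_b = {s1: (topo(1, 2, 3, 4), 1.0), s2: (topo(1, 3, 2, 5), 0.5)}
print(qfit(tree_a, tree_b), wqfit(tree_a, tree_b), wqfit(tree_a, tree_a))
```

In this example the two trees disagree on one of two quartets, so Qfit is 0.5, while the weighted score is higher because the disagreement involves the low-weight quartet.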

slide-14
SLIDE 14

Properties of wQfit

$$\mathrm{wQfit}(T_1, T_2) = 2\, \frac{\sum_{s} \mathrm{wQfit}_q(T_{1,s}, T_{2,s})}{\sum_{s} \mathrm{wQfit}_q(T_{1,s}, T_{1,s}) + \sum_{s} \mathrm{wQfit}_q(T_{2,s}, T_{2,s})}$$

  • $T_1 = T_2$ if and only if wQfit(T1, T2) = 1
  • For any two trees T1 and T2 on the same input species X, |wQfit(T1, T2)| ≤ 1
  • Given a weighted tree T1, if T2 is obtained by assigning a random permutation of the input species X to the leaves of T1, then E[wQfit(T1, T2)] = 0
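
The upper bound in the second property follows from a short argument. The derivation below is a sketch under two assumptions: the quartet-level score is $w(q_1)\,w(q_2)$ times 2 for equal topologies and $-1$ otherwise, and the tree-level score divides twice the cross sum by the sum of the two self-scores. The shorthand $w_{1,s} = w(T_{1,s})$ and $w_{2,s} = w(T_{2,s})$ is introduced here for brevity, and the weights are taken to be non-negative.

```latex
% Per 4-taxon set s: the agreement factor is at most 2, and 2xy <= x^2 + y^2.
\mathrm{wQfit}_q(T_{1,s}, T_{2,s})
  \;\le\; 2\, w_{1,s}\, w_{2,s}
  \;\le\; w_{1,s}^2 + w_{2,s}^2
  \;=\; \tfrac{1}{2}\Bigl[\mathrm{wQfit}_q(T_{1,s}, T_{1,s})
        + \mathrm{wQfit}_q(T_{2,s}, T_{2,s})\Bigr]

% Summing over all s and multiplying by 2 bounds the numerator by the denominator:
2 \sum_{s} \mathrm{wQfit}_q(T_{1,s}, T_{2,s})
  \;\le\; \sum_{s} \mathrm{wQfit}_q(T_{1,s}, T_{1,s})
        + \sum_{s} \mathrm{wQfit}_q(T_{2,s}, T_{2,s})
  \quad\Longrightarrow\quad \mathrm{wQfit}(T_1, T_2) \le 1
```

Equality in the first step requires every quartet pair to agree in topology, and equality in the second requires equal weights, which matches the first property. The lower bound follows in the same way from the agreement factor being at least $-1$.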

slide-15
SLIDE 15

Outline

  • Background: Quartet MaxCut (QMC)
  • Weighted Quartet MaxCut (wQMC)
  • Results of wQMC
slide-16
SLIDE 16

Performance of wQMC

RF (Robinson and Foulds 1981): the number of splits that differ between two trees.

Rewire: randomly replace the topology of a quartet with one of its two incorrect topologies.

qrt-num-factor: for a taxa set of size n, the number of input quartets is nk, where k is called qrt-num-factor.
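
The two operations used in the experiments can be sketched as follows. This is a minimal illustration, assuming a tree is given by one side of each non-trivial split and RF is counted as the size of the symmetric difference of the two split sets, without normalization. The function names and representations are hypothetical.

```python
import random


def canonical_splits(taxa, sides):
    """Represent each split by the side that excludes the smallest taxon."""
    ref = min(taxa)
    return {frozenset(taxa - set(s)) if ref in s else frozenset(s) for s in sides}


def rf_distance(taxa, sides1, sides2):
    """Number of splits present in exactly one of the two trees."""
    return len(canonical_splits(taxa, sides1) ^ canonical_splits(taxa, sides2))


def rewire(quartet, rng):
    """Replace ab|cd by one of its two incorrect topologies, chosen at random."""
    (a, b), (c, d) = quartet
    return rng.choice([((a, c), (b, d)), ((a, d), (b, c))])


taxa = {1, 2, 3, 4, 5}
tree_1 = [{1, 2}, {4, 5}]  # caterpillar ((1,2),3,(4,5))
tree_2 = [{1, 3}, {2, 5}]  # caterpillar ((1,3),4,(2,5))
print(rf_distance(taxa, tree_1, tree_2))
print(rewire(((1, 2), (3, 4)), random.Random(0)))
```

Passing a seeded `random.Random` instance keeps the noise injection reproducible across runs.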

Observations: wQMC can reconstruct a tree that is highly similar to the original, even when receiving noisy input

slide-17
SLIDE 17

Comparison between Qfit and wQfit

Qfit: the fraction of quartets that are equal in both trees. It does not reflect confidence in the quality of quartets.

Example:

  • 30% quartets disagree with the constructed tree. Qfit score for this is 70%.
  • We expect this fraction to be composed mainly of unreliable quartets
  • Their total weight should be smaller, e.g., 10%.
  • We expect the wQfit score to reflect the low level of confidence in the wrong quartets, e.g., wQfit = 90%

Observations:

wQfit adds information to the score by distinguishing quartets according to their quality.

slide-18
SLIDE 18

Comparison between QMC and wQMC

Observations:

  • Weights reflect confidence in quartet data, allowing wQMC to prioritize correct quartets, especially for noisy data.
  • Low-weight quartets are more likely to exhibit a wrong topology.
slide-19
SLIDE 19

Thank you!