Weighted Quartets Phylogenetics
Yunan Luo
- E. Avni, R. Cohen, and S. Snir. Weighted quartets phylogenetics. Systematic Biology, 2014. syu087
Problem: quartet-based supertree
Input: quartet trees on four-taxon subsets, e.g. {A, B, C, D}, {A, B, D, E}, {A, C, D, E}
Output: a single tree on all taxa {A, B, C, D, E}
[Figure: input quartet trees and the output tree]
Def: a set Q of quartets is compatible if there is a tree that induces each quartet in Q.
Goal: find the largest compatible subset of the given quartet set.
This problem is NP-hard.
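Example: cut in a graph
[Figure: graph on vertices A, B, C, D with edge weights 2, 3, 5, 1]
cut C = ({A, B}, {C, D})
weight of cut, w(C) = 3 + 1 = 4 (the sum of the weights of the edges crossing the cut)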
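The cut weight in this example can be checked with a few lines of Python. This is only a sketch: the slide gives the four edge weights but the figure is lost, so the assignment of weights to particular edges below is an assumption chosen so that the crossing edges weigh 3 and 1.

```python
def cut_weight(edges, side):
    """Sum the weights of edges with exactly one endpoint in `side`.

    edges: dict mapping frozenset({u, v}) -> weight
    side:  set of vertices on one side of the cut
    """
    return sum(w for e, w in edges.items() if len(e & side) == 1)


# Hypothetical edge assignment (the original figure is not recoverable).
edges = {
    frozenset("AB"): 2,  # inside {A, B}: not cut
    frozenset("AC"): 3,  # crosses the cut
    frozenset("CD"): 5,  # inside {C, D}: not cut
    frozenset("BD"): 1,  # crosses the cut
}

print(cut_weight(edges, {"A", "B"}))
```

Only the edges A-C and B-D cross the cut ({A, B}, {C, D}), so the printed weight is 3 + 1.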
Snir, Sagi, and Satish Rao. "Quartet MaxCut: a fast algorithm for amalgamating quartet trees." Molecular phylogenetics and evolution 62.1 (2012): 1-8.
Given a set of species (taxa) X, QMC builds a graph G(Q) = (V, E).
Node: V = X
Edge: for every quartet q in Q, add to G an edge for every pair of leaves in q.
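The graph construction can be sketched as follows. The split into "good" and "bad" edges is an assumption taken from the Quartet MaxCut terminology: for a quartet ab|cd, the four edges between {a, b} and {c, d} are treated as good (a cut should separate them) and the two sibling edges a-b and c-d as bad. The function names and the quartet encoding are hypothetical.

```python
from collections import defaultdict


def build_quartet_graph(quartets):
    """Build edge multiplicities from quartets given as ((a, b), (c, d))."""
    good = defaultdict(int)  # edges between the two sides of a quartet
    bad = defaultdict(int)   # edges between siblings of a quartet
    for (a, b), (c, d) in quartets:
        bad[frozenset((a, b))] += 1
        bad[frozenset((c, d))] += 1
        for u in (a, b):
            for v in (c, d):
                good[frozenset((u, v))] += 1
    return good, bad


def crossing_counts(good, bad, side):
    """Count good and bad edges with exactly one endpoint in `side`."""
    def crossing(edges):
        return sum(n for e, n in edges.items() if len(e & side) == 1)
    return crossing(good), crossing(bad)


good, bad = build_quartet_graph([((1, 2), (3, 4))])
print(crossing_counts(good, bad, {1, 2}))  # cut that agrees with 12|34
print(crossing_counts(good, bad, {1, 3}))  # cut that conflicts with 12|34
```

A cut that agrees with the quartet crosses all four good edges and no bad edge, which is the situation a high good-to-bad ratio rewards.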
[Figure: the edges contributed by individual quartets on taxa 1, 2, 3, 4, put together into one graph]
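QMC algorithm (recursive):
- find a cut C that maximizes the ratio between the good and bad edges in C
- split the taxa set X according to the cut
- recurse until the subset size is <= 4
- merge the resulting subtrees (construction)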
Example: four weighted input quartets on taxa 1 to 5
Quartet   Weight
12|34     1.0
13|45     1.0
14|25     0.1
13|25     0.1
No tree satisfies them all simultaneously. Some optimization criterion is necessary.
Construction without weights: tree ((1,3),4,(2,5)) satisfies 3 quartets; sum of weights 1.2
Construction with weights: tree ((1,2),3,(4,5)) satisfies 2 quartets; sum of weights 2.0
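The two counts on this slide can be reproduced with a short check. This is a sketch under stated assumptions: the quartets are read as 12|34, 13|45, 14|25, 13|25 with weights 1.0, 1.0, 0.1, 0.1, and the two trees are read as the caterpillars ((1,3),4,(2,5)) and ((1,2),3,(4,5)). Each tree is encoded by one side of each of its internal splits, which is a representation chosen here for brevity.

```python
QUARTETS = [
    (((1, 2), (3, 4)), 1.0),
    (((1, 3), (4, 5)), 1.0),
    (((1, 4), (2, 5)), 0.1),
    (((1, 3), (2, 5)), 0.1),
]


def induces(split_sides, quartet):
    """A tree induces ab|cd iff some internal edge separates {a, b} from {c, d}."""
    (a, b), (c, d) = quartet
    for side in split_sides:
        if {a, b} <= side and not ({c, d} & side):
            return True
        if {c, d} <= side and not ({a, b} & side):
            return True
    return False


def score(split_sides, weighted_quartets):
    """Return (number of satisfied quartets, sum of their weights)."""
    satisfied = [w for q, w in weighted_quartets if induces(split_sides, q)]
    return len(satisfied), sum(satisfied)


TREE_WITHOUT_WEIGHTS = [{1, 3}, {2, 5}]  # ((1,3),4,(2,5))
TREE_WITH_WEIGHTS = [{1, 2}, {4, 5}]     # ((1,2),3,(4,5))

print(score(TREE_WITHOUT_WEIGHTS, QUARTETS))
print(score(TREE_WITH_WEIGHTS, QUARTETS))
```

The first tree satisfies more quartets, but the second satisfies the two high-weight ones, which is the point of the weighted construction.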
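Let
$d_1 = d_{ab} + d_{cd}$, $d_2 = d_{ac} + d_{bd}$, $d_3 = d_{ad} + d_{bc}$
where a, b, c, d are the four taxa and $d_{xy}$ is the distance between taxa x and y. We assume that $d_1 \le d_2 \le d_3$.
The weight function of quartet q = ab|cd is defined as
$$w(q) = \frac{d_3 - d_1}{d_3}\,\exp\bigl(-(d_3 - d_2)\bigr)$$
Remarks:
- w(q) increases as the internal edge is longer and the split is more significant
- when $d_2 = d_3$, the exponential factor equals 1 and $w(q) = 1 - \frac{d_1}{d_3}$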
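A minimal sketch of the weight computation, assuming the weight has the form w(q) = ((d3 - d1) / d3) * exp(-(d3 - d2)) with d1 <= d2 <= d3 the three sorted pairwise-distance sums. The function name, the distance-dictionary layout, and the requirement d3 > 0 are assumptions made for this illustration.

```python
import math


def quartet_weight(dist, a, b, c, d):
    """Return (topology, weight) for taxa a, b, c, d.

    dist maps frozenset({x, y}) -> distance; d3 must be positive.
    """
    def dd(x, y):
        return dist[frozenset((x, y))]

    sums = [
        (dd(a, b) + dd(c, d), ((a, b), (c, d))),
        (dd(a, c) + dd(b, d), ((a, c), (b, d))),
        (dd(a, d) + dd(b, c), ((a, d), (b, c))),
    ]
    sums.sort(key=lambda item: item[0])
    (d1, topology), (d2, _), (d3, _) = sums
    weight = (d3 - d1) / d3 * math.exp(-(d3 - d2))
    return topology, weight


# Tree metric: pendant edges of length 1, internal edge of length 2.
dist = {
    frozenset("ab"): 2, frozenset("cd"): 2,
    frozenset("ac"): 4, frozenset("ad"): 4,
    frozenset("bc"): 4, frozenset("bd"): 4,
}
topology, weight = quartet_weight(dist, "a", "b", "c", "d")
print(topology, weight)
```

Here d1 = 4 and d2 = d3 = 8, so the exponential factor is 1 and the weight reduces to 1 - d1/d3.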
Existing measure: Qfit measure (Estabrook 1985)
Qfit = (# shared quartets) / (# all possible quartets)
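New measure: wQfit measure (this paper)
For quartets:
$$\mathrm{wQfit}_q(q_1, q_2) = \omega_{q_1 q_2}\, w(q_1)\, w(q_2)$$
where
$$\omega_{q_1 q_2} = \begin{cases} 1 & \text{if } q_1 = q_2 \\ -\tfrac{1}{2} & \text{if } q_1 \ne q_2 \end{cases}$$
For trees: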
$$\mathrm{wQfit}_T(T_1, T_2) = 2\,\frac{\sum_s \mathrm{wQfit}_q(T_{1,s}, T_{2,s})}{\sum_s \mathrm{wQfit}_q(T_{1,s}, T_{1,s}) + \sum_s \mathrm{wQfit}_q(T_{2,s}, T_{2,s})}$$
where s is a subset of the input species X with |s| = 4, and $T_{1,s}$ is the quartet of tree T1 induced by s (similarly $T_{2,s}$ for T2).
Property: if T2 is obtained by randomly assigning the input species X to the leaves of T1, then E[wQfit(T1, T2)] = 0
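The wQfit measure can be sketched in Python under two assumptions that are read off the damaged slide rather than confirmed: the quartet-level score is w(q1) * w(q2) when the two topologies agree and -1/2 of that product when they differ (the choice that makes the expectation over three equally likely topologies zero), and the tree-level score is normalized as 2 * cross / (self1 + self2). All names and the data layout are hypothetical.

```python
def wqfit_quartet(topo1, w1, topo2, w2):
    """Quartet-level score; topologies must use one canonical encoding."""
    factor = 1.0 if topo1 == topo2 else -0.5
    return factor * w1 * w2


def wqfit_tree(tree1, tree2):
    """Tree-level score over the four-taxon subsets present in both trees.

    Each tree maps a subset label -> (topology, weight).
    Assumes at least one shared quartet with non-zero weight.
    """
    shared = tree1.keys() & tree2.keys()
    cross = sum(wqfit_quartet(*tree1[s], *tree2[s]) for s in shared)
    self1 = sum(wqfit_quartet(*tree1[s], *tree1[s]) for s in shared)
    self2 = sum(wqfit_quartet(*tree2[s], *tree2[s]) for s in shared)
    return 2.0 * cross / (self1 + self2)


t1 = {"1234": ("12|34", 1.0), "1235": ("12|35", 1.0), "1245": ("12|45", 1.0)}
t2 = {"1234": ("12|34", 1.0), "1235": ("12|35", 1.0), "1245": ("14|25", 1.0)}

print(wqfit_tree(t1, t1))  # identical trees
print(wqfit_tree(t1, t2))  # one of three quartets differs
```

With unit weights, one disagreeing quartet out of three gives a cross term of 1 + 1 - 0.5 = 1.5 against self terms of 3 each, so the disagreement is penalized rather than merely not counted.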
RF (Robinson and Foulds 1981): # different splits between two trees
Rewire: randomly replace the topology of a quartet with one of its two incorrect topologies
qrt-num-factor: for a taxa set of size n, the number of input quartets is n^k, where k is called qrt-num-factor.
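The rewire operation and the quartet count can be illustrated as below. This is a sketch, not the authors' simulation code: the helper names are hypothetical, and reading the quartet count as n raised to the power k is an assumption about the slide's lost superscript.

```python
import random


def three_topologies(a, b, c, d):
    """The three possible quartet topologies on taxa a, b, c, d."""
    return [((a, b), (c, d)), ((a, c), (b, d)), ((a, d), (b, c))]


def rewire(quartet, rng):
    """Replace a quartet by one of its two other topologies, chosen uniformly."""
    (a, b), (c, d) = quartet
    incorrect = three_topologies(a, b, c, d)[1:]  # index 0 is the input topology
    return rng.choice(incorrect)


def num_input_quartets(n, k):
    """Number of input quartets for n taxa, assuming the count is n ** k."""
    return round(n ** k)


rng = random.Random(0)  # fixed seed so the run is repeatable
noisy = rewire(((1, 2), (3, 4)), rng)
print(noisy, num_input_quartets(100, 2))
```

Passing an explicit `random.Random` instance keeps the noise reproducible across runs, which helps when comparing reconstruction quality at different rewire rates.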
Observations: wQMC can reconstruct a tree that is highly similar to the original, even when receiving noisy input
Qfit: fraction of quartets that are equal in both trees. Does not reflect confidence
Example:
wQfit=90%
Observations:
wQfit augments information to the score by segregating quartets according to quality.
Observations:
correct quartets, esp. for noisy data.