
Weighted Quartets Phylogenetics (Yunan Luo)



  1. Weighted Quartets Phylogenetics. Yunan Luo. E. Avni, R. Cohen, and S. Snir. Weighted quartets phylogenetics. Systematic Biology, 2014. syu087

  2. Problem: quartet-based supertree
     [Figure: Input (quartet trees over taxa A, B, C, D, E) and Output (a single tree on A, B, C, D, E).]
     Def: a set Q of quartets is compatible if there is a tree that induces each quartet in Q.
     Goal: find the largest compatible subset of the given quartet set. This problem is NP-hard.
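
The notion of a tree inducing a quartet can be made concrete with a small check. The sketch below is illustrative only: it assumes a tree is given as one side of each of its internal splits, and the names `induces` and `satisfied_by` are hypothetical. It tests a quartet set against one given tree; it does not search for a tree, which is the hard part of the problem.

```python
def induces(splits, quartet):
    """Return True if some split of the tree separates {a, b} from {c, d}."""
    (a, b), (c, d) = quartet
    for side in splits:
        membership = [x in side for x in (a, b, c, d)]
        # ab|cd is induced when a, b lie on one side and c, d on the other.
        if membership in ([True, True, False, False],
                          [False, False, True, True]):
            return True
    return False

def satisfied_by(splits, quartets):
    """Return True if the given tree induces every quartet in the set."""
    return all(induces(splits, q) for q in quartets)

# Hypothetical example tree ((A,B),E,(C,D)), described by its internal splits.
tree_splits = [frozenset("AB"), frozenset("CD")]
quartets = [(("A", "B"), ("C", "D")),
            (("A", "B"), ("C", "E")),
            (("A", "E"), ("C", "D"))]

print(satisfied_by(tree_splits, quartets))
print(induces(tree_splits, (("A", "C"), ("B", "D"))))
```

The second call uses a quartet that conflicts with the tree, so adding it to the set would make the set unsatisfiable by this particular tree.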

  3. Outline
     • Background: Quartet MaxCut (QMC)
     • Weighted Quartet MaxCut (wQMC)
     • Results of wQMC

  4. Background: Quartet MaxCut (QMC)
     Example: cut in a graph
     [Figure: a weighted graph on nodes A, B, C, D with edge weights 3, 5, 2, and 1.]
     Cut C = ({A, B}, {C, D}); weight of the cut, w(C) = 3 + 1 = 4.
     Snir, Sagi, and Satish Rao. "Quartet MaxCut: a fast algorithm for amalgamating quartet trees." Molecular Phylogenetics and Evolution 62.1 (2012): 1-8.
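
The cut weight in this example can be reproduced in a few lines. This is a minimal sketch: the slide text gives only the four edge weights and the result, so the assignment of weights to specific edges below is an assumption chosen to be consistent with w(C) = 3 + 1.

```python
def cut_weight(edges, side):
    """Sum the weights of edges with exactly one endpoint in `side`."""
    return sum(w for (u, v), w in edges.items() if (u in side) != (v in side))

# Assumed edge placement (the figure itself is not recoverable from the text).
edges = {("A", "C"): 3, ("A", "B"): 5, ("C", "D"): 2, ("B", "D"): 1}

print(cut_weight(edges, {"A", "B"}))  # 3 + 1 = 4
```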

  5. Quartet MaxCut (QMC): a heuristic method (Snir and Rao 2012)
     Given a set of species (taxa) X and a quartet set Q, QMC builds a graph G(Q) = (V, E).
     Nodes: V = X
     Edges: for every quartet q in Q, add to G an edge for every pair of leaves in q.
     - bad edges: the two edges that link sister leaves
     - good edges: the other four pairs
     [Figure: a quartet on taxa 1, 2, 3, 4 and its edges.]
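
The graph construction described on this slide can be sketched as follows. The representation is an assumption made for illustration: a quartet ab|cd is written as `((a, b), (c, d))`, and good and bad edges are kept in two separate counters so that the same pair of taxa can be good for one quartet and bad for another.

```python
from collections import Counter
from itertools import combinations

def quartet_graph(quartets):
    """Return (good, bad) edge multiplicities of the quartet graph."""
    good, bad = Counter(), Counter()
    for (a, b), (c, d) in quartets:
        sisters = {frozenset((a, b)), frozenset((c, d))}
        for u, v in combinations((a, b, c, d), 2):
            edge = frozenset((u, v))
            if edge in sisters:
                bad[edge] += 1    # edge between sister leaves
            else:
                good[edge] += 1   # one of the four remaining pairs
    return good, bad

good, bad = quartet_graph([((1, 2), (3, 4)), ((1, 3), (2, 4))])
print(sorted(tuple(sorted(e)) for e in bad))
```

With the two conflicting quartets above, the pair (1, 2) is a bad edge for the first quartet and a good edge for the second, which is why multiplicities are tracked per role.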

  6. Quartet graph (Snir and Rao 2012)
     [Figure: the graphs of individual quartets over taxa 1, 2, 3, 4 are put together into one quartet graph.]

  7. Quartet MaxCut (QMC) algorithm (Snir and Rao 2012)
     • Find a cut C in the quartet graph that maximizes the ratio between the good and bad edges in C
     • The cut defines a split (U, X\U) over the taxa set X
     • Apply recursively on U and X\U, until the subset size is <= 4
     • Every split defines an edge in the constructed tree
     [Figure: a quartet graph on taxa 1, 2, 3, 4 with a cut.]
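
The split-and-recurse loop can be illustrated with a toy version. This is a sketch under stated assumptions, not the QMC implementation: exhaustive search over bipartitions stands in for the heuristic cut search, both sides of a cut are required to hold at least two taxa, and on recursion the quartets are simply restricted to those lying entirely inside a part. All function names are hypothetical.

```python
from itertools import combinations

def cut_counts(quartets, side):
    """Count good and bad quartet-graph edges crossing the cut (side, rest)."""
    good = bad = 0
    for (a, b), (c, d) in quartets:
        for u, v in combinations((a, b, c, d), 2):
            if (u in side) != (v in side):
                if {u, v} in ({a, b}, {c, d}):
                    bad += 1   # a sister pair is separated by the cut
                else:
                    good += 1
    return good, bad

def best_cut(taxa, quartets):
    """Exhaustive stand-in for the heuristic: maximize good/bad over cuts."""
    taxa = sorted(taxa)
    first, rest = taxa[0], taxa[1:]
    best_side, best_key = None, None
    for k in range(1, len(rest) - 1):        # at least two taxa per side
        for extra in combinations(rest, k):
            side = frozenset((first,) + extra)
            good, bad = cut_counts(quartets, side)
            ratio = good / bad if bad else (float("inf") if good else 0.0)
            key = (ratio, good)              # break ties by more good edges
            if best_key is None or key > best_key:
                best_side, best_key = side, key
    return best_side

def qmc_splits(taxa, quartets):
    """Collect splits by recursing until a part has at most four taxa."""
    taxa = frozenset(taxa)
    if len(taxa) <= 4 or not quartets:
        return []
    side = best_cut(taxa, quartets)
    other = taxa - side
    splits = [(side, other)]
    for part in (side, other):
        inner = [q for q in quartets if set(q[0]) | set(q[1]) <= part]
        splits += qmc_splits(part, inner)
    return splits

quartets = [((1, 2), (3, 4)), ((1, 2), (3, 5)), ((1, 2), (4, 5)),
            ((1, 3), (4, 5)), ((2, 3), (4, 5))]
print(qmc_splits({1, 2, 3, 4, 5}, quartets))
```

Exhaustive search is exponential in the number of taxa, so it is only usable on tiny inputs like this one.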

  8. Outline
     • Background: Quartet MaxCut (QMC)
     • Weighted Quartet MaxCut (wQMC)
     • Results of wQMC

  9. Contribution of this paper
     • A weighted extension of QMC
     • A scheme for associating weights to quartets
     • A new measure of tree similarity

  10. A weighted extension of QMC
      • Recall QMC: find a cut C in the quartet graph that maximizes the ratio between the number of good and bad edges in C
      • Now, suppose we are given a set of quartets with associated weights
      • Question: what is the natural extension of QMC to handle weighted quartets?
      • Answer: find a cut C in the quartet graph that maximizes the ratio between the total weight of good and bad edges in C
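
The weighted objective differs from the unweighted one only in what is summed across the cut. The sketch below is illustrative and assumes each input item is a pair `(quartet, weight)`, with every edge contributed by a quartet carrying that quartet's weight; the function name and the example weights are hypothetical.

```python
from itertools import combinations

def weighted_cut_totals(weighted_quartets, side):
    """Return total weight of good and of bad edges crossing the cut."""
    good = bad = 0.0
    for ((a, b), (c, d)), w in weighted_quartets:
        for u, v in combinations((a, b, c, d), 2):
            if (u in side) != (v in side):
                if {u, v} in ({a, b}, {c, d}):
                    bad += w
                else:
                    good += w
    return good, bad

# Two conflicting quartets: a reliable one (1.0) and an unreliable one (0.1).
weighted = [(((1, 2), (3, 4)), 1.0), (((1, 3), (2, 4)), 0.1)]

for side in ({1, 2}, {1, 3}):
    good, bad = weighted_cut_totals(weighted, side)
    print(sorted(side), good / bad)
```

With unit weights the two cuts would tie at a ratio of 3, since each satisfies one quartet and violates the other; with the weights above, the cut that agrees with the heavier quartet scores higher.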

  11. Prioritize between quartets
      [Figure: four quartets over taxa 1 to 5 with weights 0.1, 0.1, 1.0, and 1.0.]
      No tree satisfies them all simultaneously, so some optimization criterion is necessary.
      Construction without weights: satisfies 3 quartets, sum of weights 1.2
      Construction with weights: satisfies 2 quartets, sum of weights 2.0

  12. A scheme for associating weights
      [Figure: a quartet tree on leaves a, b, c, d.]
      Let $d_1 = d_{ab} + d_{cd}$, $d_2 = d_{ac} + d_{bd}$, $d_3 = d_{ad} + d_{bc}$, and assume that $d_1 \le d_2 \le d_3$.
      The weight function of quartet $q = ab|cd$ is defined as
      $$w(q) = \frac{d_3 - d_1}{\exp(d_3 - d_2)\, d_3}$$
      Remarks:
      • $d_3 - d_1$ is twice the length of the internal edge. The quartet weight increases as the internal edge gets longer and the split becomes more significant.
      • The weight becomes 0 if the quartet is unresolved, i.e., $d_3 - d_1 = 0$.
      • As $d_3 - d_2 \to 0$, the data are more reliable and the weight becomes larger.
      • In a tree, $d_3 - d_2 = 0$, so $w(q) = 1 - \frac{d_1}{d_3}$.
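
The weighting scheme can be tried on concrete distances. The sketch below assumes the weight is (d3 - d1) / (exp(d3 - d2) * d3), which is consistent with the remarks on this slide (weight 0 when d3 = d1, and 1 - d1/d3 when d3 = d2). The function name and the choice to return the topology alongside the weight are illustrative assumptions.

```python
import math

def quartet_weight(d_ab, d_cd, d_ac, d_bd, d_ad, d_bc):
    """Return (topology, weight) from the six pairwise distances of a, b, c, d."""
    sums = sorted([
        (d_ab + d_cd, "ab|cd"),
        (d_ac + d_bd, "ac|bd"),
        (d_ad + d_bc, "ad|bc"),
    ])
    (d1, topology), (d2, _), (d3, _) = sums
    if d3 == 0:
        return topology, 0.0          # degenerate input: all distances zero
    weight = (d3 - d1) / (math.exp(d3 - d2) * d3)
    return topology, weight

# Additive distances from a tree ab|cd with all five edges of length 1:
# sisters are 2 apart, non-sisters are 3 apart.
print(quartet_weight(d_ab=2, d_cd=2, d_ac=3, d_bd=3, d_ad=3, d_bc=3))

# Star tree (internal edge of length 0): the quartet is unresolved.
print(quartet_weight(2, 2, 2, 2, 2, 2))
```

In the first call d1 = 4 and d2 = d3 = 6, so d3 - d1 = 2 is twice the internal edge length and the weight equals 1 - 4/6.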

  13. A new measure of tree similarity
      Existing measure: Qfit measure (Estabrook 1985)
      $$\mathrm{Qfit} = \frac{\#\,\text{shared quartets}}{\#\,\text{all possible quartets}}$$
      New measure: wQfit measure (this paper)
      For quartets:
      $$\mathrm{wQfit}(q_1, q_2) = \begin{cases} 2\, w(q_1)\, w(q_2) & q_1 = q_2 \\ -\, w(q_1)\, w(q_2) & q_1 \ne q_2 \end{cases}$$
      For trees:
      $$\mathrm{wQfit}(T_1, T_2) = \frac{2 \sum_s \mathrm{wQfit}(q^T_{1,s}, q^T_{2,s})}{\sum_s \mathrm{wQfit}(q^T_{1,s}, q^T_{1,s}) + \sum_s \mathrm{wQfit}(q^T_{2,s}, q^T_{2,s})}$$
      where s is a subset of the input species X with |s| = 4, and $q^T_{1,s}$ is the quartet of tree $T_1$ induced by s.
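
The unweighted Qfit measure can be computed directly from two trees. The sketch below is illustrative: trees are assumed to be given as one side of each internal split, unresolved quartets are assumed not to count as shared, and the function names are hypothetical.

```python
from itertools import combinations

def induced_quartet(splits, four):
    """Return the topology a tree induces on four taxa, or None if unresolved."""
    for side in splits:
        inside = frozenset(x for x in four if x in side)
        if len(inside) == 2:                      # the split separates 2 | 2
            return frozenset((inside, frozenset(four) - inside))
    return None

def qfit(splits1, splits2, taxa):
    """Fraction of four-taxon subsets on which both trees induce the same quartet."""
    fours = list(combinations(sorted(taxa), 4))
    shared = 0
    for four in fours:
        q1 = induced_quartet(splits1, four)
        q2 = induced_quartet(splits2, four)
        if q1 is not None and q1 == q2:
            shared += 1
    return shared / len(fours)

# Hypothetical trees ((A,B),C,(D,E)) and ((A,C),B,(D,E)).
t1 = [frozenset("AB"), frozenset("DE")]
t2 = [frozenset("AC"), frozenset("DE")]
print(qfit(t1, t2, "ABCDE"))
```

The two trees agree on the three subsets that contain both D and E and disagree on {A, B, C, D} and {A, B, C, E}, so three of the five quartets are shared.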

  14. Properties of wQfit
      $$\mathrm{wQfit}(T_1, T_2) = \frac{2 \sum_s \mathrm{wQfit}(q^T_{1,s}, q^T_{2,s})}{\sum_s \mathrm{wQfit}(q^T_{1,s}, q^T_{1,s}) + \sum_s \mathrm{wQfit}(q^T_{2,s}, q^T_{2,s})}$$
      • $T_1 = T_2$ if and only if $\mathrm{wQfit}(T_1, T_2) = 1$
      • For any two trees $T_1$ and $T_2$ on the same input species X, $|\mathrm{wQfit}(T_1, T_2)| \le 1$
      • Given a weighted tree $T_1$, let $T_2$ be obtained by assigning a random permutation of the input species X to the leaves of $T_1$; then $E[\mathrm{wQfit}(T_1, T_2)] = 0$
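
The first two properties can be checked numerically on toy data. The sketch below rests on assumptions that should be verified against the paper: the quartet-level score is taken to be 2·w(q1)·w(q2) when the topologies agree and -w(q1)·w(q2) otherwise, and the tree-level score is taken to be twice the cross sum divided by the sum of the two self sums. Each tree is represented, hypothetically, as a dictionary from a four-taxon label to a `(topology, weight)` pair.

```python
def wqfit_quartets(q1, q2):
    """Quartet-level score under the assumed 2 / -1 coefficients."""
    (top1, w1), (top2, w2) = q1, q2
    coefficient = 2.0 if top1 == top2 else -1.0
    return coefficient * w1 * w2

def wqfit_trees(quartets1, quartets2):
    """Tree-level score under the assumed normalization 2*cross / (self1 + self2)."""
    shared = quartets1.keys() & quartets2.keys()
    cross = sum(wqfit_quartets(quartets1[s], quartets2[s]) for s in shared)
    self1 = sum(wqfit_quartets(quartets1[s], quartets1[s]) for s in shared)
    self2 = sum(wqfit_quartets(quartets2[s], quartets2[s]) for s in shared)
    return 2.0 * cross / (self1 + self2)

# Hypothetical weighted quartets of two trees on the same four-taxon subsets.
tree1 = {"ABCD": ("AB|CD", 0.5), "ABCE": ("AB|CE", 0.2)}
tree2 = {"ABCD": ("AB|CD", 0.5), "ABCE": ("AC|BE", 0.2)}

print(wqfit_trees(tree1, tree1))
print(wqfit_trees(tree1, tree2))
```

The disagreement is on the lightweight quartet, so the score for the second pair stays fairly close to 1 even though half of the quartets differ.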

  15. Outline
      • Background: Quartet MaxCut (QMC)
      • Weighted Quartet MaxCut (wQMC)
      • Results of wQMC

  16. Performance of wQMC
      RF (Robinson and Foulds 1981): the number of splits that differ between two trees
      Rewire: randomly replace the topology of a quartet with one of its two incorrect topologies
      qrt-num-factor: for a taxa set of size n, the number of input quartets is $n^k$, where k is called the qrt-num-factor
      Observations: wQMC can reconstruct a tree that is highly similar to the original, even when receiving noisy input
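
The RF count and the rewiring noise model are both easy to sketch. The code below is illustrative only: trees are assumed to be given as one side of each internal split, RF is taken as the size of the symmetric difference of the two split sets (some conventions halve this), and the function names are hypothetical.

```python
import random

def canonical(side, taxa):
    """Represent a split by the side that omits a fixed reference taxon."""
    side = frozenset(side)
    reference = min(taxa)
    return frozenset(taxa) - side if reference in side else side

def rf_distance(splits1, splits2, taxa):
    """Count the splits present in exactly one of the two trees."""
    s1 = {canonical(s, taxa) for s in splits1}
    s2 = {canonical(s, taxa) for s in splits2}
    return len(s1 ^ s2)

def rewire(quartet, rng=random):
    """Replace ab|cd by one of its two incorrect topologies, chosen at random."""
    (a, b), (c, d) = quartet
    return rng.choice([((a, c), (b, d)), ((a, d), (b, c))])

taxa = set("ABCDE")
t1 = [frozenset("AB"), frozenset("DE")]   # ((A,B),C,(D,E))
t2 = [frozenset("AC"), frozenset("DE")]   # ((A,C),B,(D,E))
print(rf_distance(t1, t2, taxa))          # the splits AB|CDE and AC|BDE differ

print(rewire((("A", "B"), ("C", "D")), random.Random(0)))
```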

  17. Comparison between Qfit and wQfit
      Qfit: the fraction of quartets that are equal in both trees. It does not reflect confidence in the quality of the quartets.
      Example:
      • 30% of the quartets disagree with the constructed tree, so the Qfit score is 70%.
      • We expect this fraction to be composed mainly of unreliable quartets.
      • Their total weight should be smaller, e.g., 10%.
      • We expect the wQfit score to reflect the low level of confidence in the wrong quartets, e.g., wQfit = 90%.
      Observations: wQfit adds information to the score by segregating quartets according to quality.

  18. Comparison between QMC and wQMC
      Observations:
      • Weights reflect confidence in quartet data, allowing wQMC to prioritize correct quartets, especially for noisy data.
      • Lightweight quartets are more prone to exhibit a wrong topology.

  19. Thank you!
