
Weighted Quartets Phylogenetics: Eliran Avni, Reuven Cohen, Sagi Snir (presentation by Ashu Gupta)



  1. Weighted Quartets Phylogenetics • Eliran Avni, Reuven Cohen, Sagi Snir • Presentation by Ashu Gupta

  2. Motivation • Analyzing large datasets is computationally difficult • Solution? Divide and conquer • Step 1: Construct a set of subtrees (quartets) with accurate phylogenetic methods • Step 2: Amalgamate the subtrees into a unified tree with a supertree method

  3. Motivation (cont.) Maximum Quartet Consistency (MQC) • Input: a set of quartets Q • Output: a tree T* that maximizes the number of quartets in Q satisfied by T* • NP-hard • Good heuristics are needed to solve this problem
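A minimal Python sketch of the MQC objective: count how many quartets in Q a candidate tree satisfies. It assumes trees are encoded as nested tuples of taxon names (an illustrative choice, not the authors' format) and relies on the fact that ab|cd is satisfied iff some edge of the tree separates {a, b} from {c, d}. All function names are hypothetical.

```python
from typing import Iterable, List, Tuple, Union

Tree = Union[str, tuple]                           # a taxon name or a tuple of subtrees
Quartet = Tuple[Tuple[str, str], Tuple[str, str]]  # ((a, b), (c, d)) encodes ab|cd


def collect_clades(tree: Tree, out: List[frozenset]) -> frozenset:
    """Append the leaf set of every subtree to `out` and return the leaf set of `tree`.

    Each leaf set is one side of an edge split of the (unrooted) tree.
    """
    if isinstance(tree, str):
        leaves = frozenset([tree])
    else:
        leaves = frozenset().union(*(collect_clades(child, out) for child in tree))
    out.append(leaves)
    return leaves


def satisfied(clades: List[frozenset], quartet: Quartet) -> bool:
    """ab|cd holds iff some clade contains exactly {a, b} or exactly {c, d} of the four taxa."""
    (a, b), (c, d) = quartet
    four = frozenset((a, b, c, d))
    return any((clade & four) in ({a, b}, {c, d}) for clade in clades)


def mqc_score(tree: Tree, quartets: Iterable[Quartet]) -> int:
    """Number of quartets in Q satisfied by `tree`, the quantity MQC maximizes."""
    clades: List[frozenset] = []
    collect_clades(tree, clades)
    return sum(satisfied(clades, q) for q in quartets)


Q = [(("1", "2"), ("3", "4")), (("1", "3"), ("4", "5"))]
print(mqc_score(((("1", "2"), "5"), ("3", "4")), Q))  # 1
print(mqc_score(((("1", "2"), "3"), ("4", "5")), Q))  # 2
```

Scoring a tree is cheap; the hardness of MQC lies in searching over all trees, which is why the slides turn to heuristics.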

  4. Quartet Max Cut (QMC) • Input: a set of quartets Q over a taxa set X • Output: a tree T* (an approximate solution to MQC) • A divide-and-conquer amalgamation technique: • Partitions the taxa set into parts, based on some optimization criterion • Operates on the subproblems induced by each part • Merges the sub-solutions into a complete solution • Each partition represents a bipartition in the final tree • Robust: does not need all quartets

  5. • At each recursion step, the taxa set X is partitioned into two parts P = (Y, X\Y) • ab|cd ∈ Q is unaffected by a partition P if all of {a, b, c, d} are in one part of P • ab|cd is satisfied by P if some part contains precisely a and b (of the four taxa), or some part contains precisely c and d • ab|cd is violated by P if one part contains a,c or a,d or b,c or b,d and the other part contains the other two • Otherwise, some part contains only one of {a, b, c, d}; in this case ab|cd is deferred
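The four cases can be checked mechanically by counting how many of the quartet's taxa fall in Y. In this hedged Python sketch a bipartition is given by the set Y (every other taxon is in X\Y); the function name `classify` and the tuple encoding ((a, b), (c, d)) for ab|cd are illustrative assumptions.

```python
from typing import AbstractSet, Tuple

Quartet = Tuple[Tuple[str, str], Tuple[str, str]]  # ((a, b), (c, d)) encodes ab|cd


def classify(quartet: Quartet, y: AbstractSet[str]) -> str:
    """Status of ab|cd under the partition P = (Y, X minus Y)."""
    (a, b), (c, d) = quartet
    in_y = sum(t in y for t in (a, b, c, d))  # how many of the four taxa lie in Y
    if in_y in (0, 4):
        return "unaffected"  # all four taxa fall in one part
    if in_y in (1, 3):
        return "deferred"    # one part holds exactly one of the four taxa
    # 2:2 split: satisfied iff a and b are on the same side (then c, d share the other)
    return "satisfied" if (a in y) == (b in y) else "violated"


Y = {"1", "2", "5"}
print(classify((("1", "2"), ("3", "4")), Y))  # satisfied
print(classify((("1", "3"), ("4", "5")), Y))  # violated
print(classify((("1", "2"), ("3", "5")), Y))  # deferred
```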

  6. Quartet Max Cut (cont.) • At every step of the algorithm, some quartets are satisfied, some are violated, and the rest continue to the next steps (i.e., they are either deferred or unaffected) • Greedy approach • A plausible strategy is to maximize the ratio between satisfied and violated quartets at every step • No theoretical guarantees

  7. Quartet Max Cut (cont.) • Given the set of quartets Q over a taxa set X, we build a graph G_Q = (X, E), with E as follows: • For every ab|cd ∈ Q, we add the 6 edges between its four taxa to E • The "crossing" edges ac, ad, bc, bd are good edges • The edges ab, cd are bad edges
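A short Python sketch of building G_Q. It assumes, as a modeling choice of this sketch, that when several quartets contribute the same taxon pair the edge is kept with a multiplicity, stored in a `collections.Counter` keyed by unordered pairs; `build_quartet_graph` is a hypothetical name.

```python
from collections import Counter
from typing import Iterable, Tuple

Quartet = Tuple[Tuple[str, str], Tuple[str, str]]  # ((a, b), (c, d)) encodes ab|cd


def build_quartet_graph(quartets: Iterable[Quartet]) -> Tuple[Counter, Counter]:
    """Return the good and bad edges of G_Q as multisets keyed by unordered taxon pairs."""
    good, bad = Counter(), Counter()
    for (a, b), (c, d) in quartets:
        for u, v in ((a, c), (a, d), (b, c), (b, d)):  # the 4 crossing (good) edges
            good[frozenset((u, v))] += 1
        bad[frozenset((a, b))] += 1                      # the 2 bad edges
        bad[frozenset((c, d))] += 1
    return good, bad


good, bad = build_quartet_graph([(("1", "2"), ("3", "4")), (("1", "3"), ("4", "5"))])
print(good[frozenset(("1", "4"))])  # 2: edge 14 is good in both quartets
print(bad[frozenset(("3", "4"))])   # 1: bad for 12|34 (it is also good for 13|45)
```

Note that the same pair can be good for one quartet and bad for another, so good and bad edges are kept in separate multisets rather than netted out.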

  8. [Figure: an example quartet set Q and its graph G_Q, with bad edges and good edges marked]

  9. Quartet Max Cut (cont.) • A cut in G_Q corresponds to a partition of the taxa set into two parts. Given a cut C = (Y, X\Y) in the graph: • A satisfied quartet contributes 4 good edges to the cut • A deferred quartet contributes 2 good edges and 1 bad edge • A violated quartet contributes 2 good edges and 2 bad edges • An unaffected quartet contributes no edges • We want to find the cut $C^* = \arg\max_{C} \left( |\text{good edges}| - \alpha \, |\text{bad edges}| \right)$ • $\alpha$ is chosen dynamically so that $C^*$ maximizes $\rho(C^*) = \frac{|\text{good edges}|}{|\text{bad edges}|}$
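To make the objective concrete, here is a brute-force Python sketch for tiny inputs. It enumerates every cut, picks the maximizer of |good| - α|bad| for each α in a small grid, and keeps the α whose best cut has the highest ratio ρ. Enumeration is exponential in |X|, so a practical implementation would need a max-cut heuristic instead; the α grid and all names are assumptions of this sketch.

```python
from collections import Counter
from itertools import combinations
from typing import Iterable, List, Tuple

Quartet = Tuple[Tuple[str, str], Tuple[str, str]]  # ((a, b), (c, d)) encodes ab|cd


def edge_multisets(quartets: Iterable[Quartet]) -> Tuple[Counter, Counter]:
    """Good (crossing) and bad (same-side) edges of G_Q with multiplicities."""
    good, bad = Counter(), Counter()
    for (a, b), (c, d) in quartets:
        for u, v in ((a, c), (a, d), (b, c), (b, d)):
            good[frozenset((u, v))] += 1
        bad[frozenset((a, b))] += 1
        bad[frozenset((c, d))] += 1
    return good, bad


def in_cut(edges: Counter, y: frozenset) -> int:
    """Total multiplicity of edges with exactly one endpoint in Y."""
    return sum(m for e, m in edges.items() if len(e & y) == 1)


def best_cut(taxa: Iterable[str], quartets: List[Quartet], alphas: List[float]):
    """Return (rho, alpha, Y) for the alpha whose optimal cut has the largest good/bad ratio."""
    good, bad = edge_multisets(quartets)
    first, *rest = sorted(taxa)
    cuts = []
    # Fixing `first` inside Y lists each nontrivial bipartition exactly once.
    for r in range(len(rest)):
        for extra in combinations(rest, r):
            y = frozenset((first, *extra))
            cuts.append((y, in_cut(good, y), in_cut(bad, y)))
    best = None
    for alpha in alphas:
        y, g, b = max(cuts, key=lambda cut: cut[1] - alpha * cut[2])
        rho = g / b if b else float("inf")
        if best is None or rho > best[0]:
            best = (rho, alpha, y)
    return best


Q = [(("1", "2"), ("3", "4")), (("1", "3"), ("4", "5"))]
rho, alpha, y = best_cut("12345", Q, alphas=[0.5, 1.0, 2.0])
print(rho, sorted(y))  # 6.0 ['1', '2']
```

Here the winning cut {1,2} | {3,4,5} satisfies 12|34 and defers 13|45 (6 good edges, 1 bad edge), rather than violating it.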

  10. Q = {12|34, 13|45} • [Figure: G_Q] • The cut {1,2,5} | {3,4} satisfies 12|34 but violates 13|45
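The edge counts behind this example can be verified with a few lines of Python. This is a sketch with a hypothetical helper, `contribution`, that counts which of a quartet's 6 edges cross the cut {1,2,5} | {3,4}.

```python
from typing import AbstractSet, Tuple

Quartet = Tuple[Tuple[str, str], Tuple[str, str]]  # ((a, b), (c, d)) encodes ab|cd


def contribution(quartet: Quartet, y: AbstractSet[str]) -> Tuple[int, int]:
    """(good, bad) edges of one quartet that cross the cut (Y, X minus Y)."""
    (a, b), (c, d) = quartet

    def crosses(u: str, v: str) -> bool:
        return (u in y) != (v in y)

    good = sum(crosses(u, v) for u, v in ((a, c), (a, d), (b, c), (b, d)))
    bad = sum(crosses(u, v) for u, v in ((a, b), (c, d)))
    return good, bad


Y = {"1", "2", "5"}
print(contribution((("1", "2"), ("3", "4")), Y))  # (4, 0): satisfied
print(contribution((("1", "3"), ("4", "5")), Y))  # (2, 2): violated
```

Together the cut has 6 good and 2 bad edges, matching the per-case contributions listed for a satisfied and a violated quartet.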

  11. Quartet Max Cut (cont.) • Problem: every quartet has the same weight • What if we have confidence values for each quartet (from prior knowledge, or confidence based on average branch length)? • Possible solution: consider only high-confidence quartets • Drawback: loss of information • A better amalgamation technique is needed

  12. [Figure: two candidate trees; one satisfies the first 2 quartets, the other satisfies the last 3 quartets] Which one is better?

  13. Weighted Quartet Max Cut (wQMC) • Intuition: add the confidence of each quartet as a weight in the graph • Build the graph G_Q as in QMC • For each edge in G_Q, the weight of the edge is the weight of its mother quartet • We want to find the cut $C^* = \arg\max_{C} \left( w(\text{good edges}) - \alpha \, w(\text{bad edges}) \right)$, where w(·) is the total weight of the edges in the cut
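A minimal sketch of the weighted objective, assuming non-negative float weights and that parallel edges from different quartets simply add their weights (all names hypothetical). With a heavy quartet 12|34 (weight 0.9) and a light quartet 13|45 (weight 0.2), the cut that satisfies the heavy quartet scores higher than the one that satisfies the light quartet.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

Quartet = Tuple[Tuple[str, str], Tuple[str, str]]  # ((a, b), (c, d)) encodes ab|cd
EdgeWeights = Dict[frozenset, float]


def build_weighted_graph(
    weighted_quartets: Iterable[Tuple[Quartet, float]]
) -> Tuple[EdgeWeights, EdgeWeights]:
    """Every edge inherits the weight of its mother quartet; parallel edges add up."""
    good: EdgeWeights = defaultdict(float)
    bad: EdgeWeights = defaultdict(float)
    for ((a, b), (c, d)), w in weighted_quartets:
        for u, v in ((a, c), (a, d), (b, c), (b, d)):
            good[frozenset((u, v))] += w
        bad[frozenset((a, b))] += w
        bad[frozenset((c, d))] += w
    return good, bad


def weighted_objective(good: EdgeWeights, bad: EdgeWeights, y: frozenset, alpha: float) -> float:
    """w(good edges in the cut) - alpha * w(bad edges in the cut) for the cut (Y, X minus Y)."""
    w_good = sum(w for e, w in good.items() if len(e & y) == 1)
    w_bad = sum(w for e, w in bad.items() if len(e & y) == 1)
    return w_good - alpha * w_bad


good, bad = build_weighted_graph([
    ((("1", "2"), ("3", "4")), 0.9),
    ((("1", "3"), ("4", "5")), 0.2),
])
print(weighted_objective(good, bad, frozenset({"1", "2", "5"}), alpha=1.0))  # about 3.6
print(weighted_objective(good, bad, frozenset({"1", "3"}), alpha=1.0))       # about 0.8
```

With unit weights both cuts would have 6 good and 2 bad edges and tie; the weights break the tie in favor of the more confident quartet.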

  14. Definitions • Weight of a quartet given a model tree: $w_q = \frac{d_h - d_l}{e^{(d_h - d_m)} \cdot d_h}$, where $d_l \le d_m \le d_h$ (low, medium, high) are the three pairwise distance sums over the quartet's taxa • Qfit • A similarity measure between two trees based on the quartets common to both compared trees • WQfit • A novel similarity measure defined by the authors • Takes into account both shared quartets and their weights when computing similarity
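A hedged Python reading of the quartet-weight formula. It assumes that the three sums are d(a,b)+d(c,d), d(a,c)+d(b,d), d(a,d)+d(b,c) taken from the model tree's path lengths, and that the quartet's topology is the one with the smallest sum (the four-point condition). The function `quartet_weight` and the callable `dist` are hypothetical.

```python
import math
from typing import Callable, Tuple

Quartet = Tuple[Tuple[str, str], Tuple[str, str]]  # ((a, b), (c, d)) encodes ab|cd


def quartet_weight(dist: Callable[[str, str], float],
                   a: str, b: str, c: str, d: str) -> Tuple[Quartet, float]:
    """Pick the topology with the smallest pairwise sum and weight it as on the slide.

    w_q = (d_h - d_l) / (e^(d_h - d_m) * d_h), with d_l <= d_m <= d_h the three sums.
    """
    sums = sorted(
        [
            (dist(a, b) + dist(c, d), ((a, b), (c, d))),
            (dist(a, c) + dist(b, d), ((a, c), (b, d))),
            (dist(a, d) + dist(b, c), ((a, d), (b, c))),
        ],
        key=lambda item: item[0],
    )
    (d_l, topology), (d_m, _), (d_h, _) = sums
    return topology, (d_h - d_l) / (math.exp(d_h - d_m) * d_h)


# Model tree ab|cd: every pendant edge has length 1 and the internal edge has length 2,
# so d(a, b) = d(c, d) = 2 and every cross pair is 1 + 2 + 1 = 4 apart.
D = {frozenset(p): 2.0 for p in [("a", "b"), ("c", "d")]}
D.update({frozenset(p): 4.0 for p in [("a", "c"), ("a", "d"), ("b", "c"), ("b", "d")]})


def model_dist(u: str, v: str) -> float:
    return D[frozenset((u, v))]


print(quartet_weight(model_dist, "a", "b", "c", "d"))  # ((('a', 'b'), ('c', 'd')), 0.5)
```

On a perfectly additive tree d_m = d_h, so the exponential factor is 1 and the weight reduces to the relative gap (d_h - d_l) / d_h; noise that separates d_m from d_h lowers the weight.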

  15. Simulation • Number of quartets used: $\#qrt = n^k$, where n is the number of taxa and k = qrt-num-factor • Rewiring: • Choose a quartet at random, based on its confidence: low-confidence quartets have a higher probability of being selected • Randomly change the topology of the chosen quartet • Ratio of rewiring = (weight of rewired quartets) / (total weight)
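A sketch of the simulation's rewiring step, under stated assumptions. The slide only says low-confidence quartets are more likely to be chosen, so the selection probability proportional to 1 / (1 + w) is a hypothetical choice; rewiring stops once the rewired weight reaches the target share of the total weight. The names `num_quartets` and `rewire` are illustrative.

```python
import random
from typing import List, Tuple

Quartet = Tuple[Tuple[str, str], Tuple[str, str]]  # ((a, b), (c, d)) encodes ab|cd
WeightedQuartet = Tuple[Quartet, float]


def num_quartets(n_taxa: int, k: float) -> int:
    """#qrt = n^k, where k is the qrt-num-factor."""
    return round(n_taxa ** k)


def rewire(quartets: List[WeightedQuartet], target_ratio: float,
           rng: random.Random) -> Tuple[List[WeightedQuartet], float]:
    """Rewire quartets until rewired weight / total weight reaches target_ratio.

    Assumption: a quartet with weight w >= 0 is drawn with probability
    proportional to 1 / (1 + w), so low-confidence quartets are favoured.
    """
    out = list(quartets)
    total = sum(w for _, w in out)
    pending = list(range(len(out)))  # indices not yet rewired
    rewired = 0.0
    while pending and rewired / total < target_ratio:
        i = rng.choices(pending, weights=[1.0 / (1.0 + out[j][1]) for j in pending])[0]
        ((a, b), (c, d)), w = out[i]
        # Replace ab|cd by one of its two alternative topologies, chosen at random.
        out[i] = (rng.choice([((a, c), (b, d)), ((a, d), (b, c))]), w)
        rewired += w
        pending.remove(i)
    return out, rewired / total


rng = random.Random(0)
original = [((("1", "2"), ("3", "4")), 0.9), ((("1", "3"), ("4", "5")), 0.2),
            ((("2", "3"), ("4", "5")), 0.5)]
rewired_q, ratio = rewire(original, target_ratio=0.3, rng=rng)
print(num_quartets(10, 2), ratio)
```

Each quartet is rewired at most once, so the achieved ratio can overshoot the target by up to one quartet's weight share.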

  16. Results

  17. Results (cont.)

  18. Results (cont.)

  19. Results (cont.)

  20. Results (cont.) • Cyanobacterial dataset (horizontal gene transfer, HGT, is evident) • Compared wQMC to the Embedded Quartets method • Embedded Quartets method (Zhaxybayeva et al., 2006): • Construct a tree for every gene • Extract the induced quartets from every gene tree • Compute an ML score for each quartet • Remove low-confidence quartets • Run MRP to get a supertree • The wQMC tree matched the tree from the Embedded Quartets method • 1128 genes, 214,729 quartets

  21. Questions?

  22. Thank You
