Weighted Quartets Phylogenetics: Eliran Avni, Reuven Cohen, Sagi Snir (presentation by Ashu Gupta)


slide-1
SLIDE 1

Weighted Quartets Phylogenetics

Eliran Avni, Reuven Cohen, Sagi Snir

Presentation by Ashu Gupta

slide-2
SLIDE 2

Motivation

  • Computationally Difficult to analyze large datasets
  • Solution?
  • Divide and Conquer
  • Step 1: Construct a set of subtrees (quartets) by accurate phylogenetic methods
  • Step 2: Amalgamate the subtrees into a unified tree by a supertree method

slide-3
SLIDE 3

Motivation (cont.)

Maximum Quartet Consistency

  • Input: A set of quartets Q
  • Output: Tree T* such that the number of quartets in Q that are satisfied by T* is maximized

  • NP-Hard
  • Need good heuristics to solve this problem
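The MQC objective above can be made concrete with a small scoring function. This is a minimal sketch, assuming a tree is given as the list of its internal bipartitions (one side of each split) and a quartet ab|cd is encoded as `((a, b), (c, d))`; the names `split_satisfies` and `count_satisfied` are illustrative and do not come from the slides.

```python
from typing import FrozenSet, Iterable, Tuple

# ((a, b), (c, d)) encodes the quartet ab|cd (an encoding assumed for this sketch)
Quartet = Tuple[Tuple[str, str], Tuple[str, str]]


def split_satisfies(side: FrozenSet[str], taxa: FrozenSet[str], q: Quartet) -> bool:
    """True if the split (side, taxa - side) separates {a, b} from {c, d}."""
    (a, b), (c, d) = q
    other = taxa - side
    return ({a, b} <= side and {c, d} <= other) or (
        {c, d} <= side and {a, b} <= other
    )


def count_satisfied(
    splits: Iterable[FrozenSet[str]], taxa: FrozenSet[str], quartets: Iterable[Quartet]
) -> int:
    """Number of quartets separated by at least one split of the tree."""
    splits = list(splits)
    return sum(any(split_satisfies(s, taxa, q) for s in splits) for q in quartets)


# Example tree on five taxa with internal splits {1,2}|{3,4,5} and {4,5}|{1,2,3}.
taxa = frozenset("12345")
tree_splits = [frozenset("12"), frozenset("45")]
quartets = [(("1", "2"), ("3", "4")), (("1", "3"), ("4", "5")), (("1", "4"), ("2", "3"))]
n_satisfied = count_satisfied(tree_splits, taxa, quartets)
```

Scoring a given tree is cheap; the hard part of MQC is searching over trees, which is why heuristics are needed.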
slide-4
SLIDE 4

Quartet Max Cut

  • Input: A set of quartets Q, set of taxa X
  • Output: Tree T* (approximate solution to MQC)

  • Divide and Conquer Amalgamation Technique
  • Operates on the taxa set by partitioning it into parts, based on some optimization criterion
  • Operate on the sub problems induced by each part
  • Merges the sub-solutions into a complete solution.
  • Each Partition represents a bipartition in the final tree
  • Robust, doesn’t need all quartets
slide-5
SLIDE 5
  • At each recursion step, the taxon set X is partitioned into two parts P = (Y, X\Y)
  • A quartet ab|cd ∈ Q is unaffected by a partition P if all of {a,b,c,d} are in one part of P.
  • ab|cd is satisfied by P if some part contains precisely a and b, or some part contains precisely c and d.
  • ab|cd is violated by P if some part contains a,c or a,d or b,c or b,d and the other part contains the other two.
  • Otherwise, some part contains only one of {a,b,c,d}. In this case ab|cd is deferred.

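[Figure: four example partitions for ab|cd: {a,b,c,d} in one part (unaffected); {a,b} | {c,d} (satisfied); {a,d} | {b,c} (violated); {a,b,c} | {d} (deferred)]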
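The four cases above can be checked mechanically. The following is a minimal sketch, assuming a quartet ab|cd is encoded as `((a, b), (c, d))` and a partition is represented by one of its two parts as a set; the function name `classify` is illustrative only.

```python
from typing import AbstractSet, Tuple

# ((a, b), (c, d)) encodes the quartet ab|cd (an encoding assumed for this sketch)
Quartet = Tuple[Tuple[str, str], Tuple[str, str]]


def classify(q: Quartet, part: AbstractSet[str]) -> str:
    """Status of quartet q under the partition (part, everything else)."""
    (a, b), (c, d) = q
    inside = [t in part for t in (a, b, c, d)]
    k = sum(inside)  # how many of the four taxa fall in `part`
    if k in (0, 4):
        return "unaffected"  # all four taxa are in one part
    if k in (1, 3):
        return "deferred"  # one taxon is alone on its side
    # Exactly two taxa on each side: satisfied only if a and b stay together.
    return "satisfied" if inside[0] == inside[1] else "violated"


q = (("a", "b"), ("c", "d"))
statuses = {
    "all together": classify(q, {"a", "b", "c", "d"}),
    "ab | cd": classify(q, {"a", "b"}),
    "ad | bc": classify(q, {"a", "d"}),
    "abc | d": classify(q, {"a", "b", "c"}),
}
```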

slide-6
SLIDE 6

Quartet Max Cut (cont.)

  • At every step of the algorithm, some quartets are satisfied, some violated, and some continue to the next steps (i.e. either deferred or unaffected).
  • Greedy approach
  • A plausible strategy is to maximize the ratio between satisfied and violated quartets at every step.

  • No Theoretical Guarantees!!
slide-7
SLIDE 7

Quartet Max Cut (cont.)

  • Given the set of quartets Q over a taxa set X, we build a graph GQ = (X, E) with E as follows:
  • For every ab|cd ∈ Q we add the 6 edges between a, b, c, d to E.
  • The “crossing” edges ac, ad, bc, bd are good edges.
  • The edges ab, cd are bad edges.
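The graph construction above can be sketched as follows. This is an illustrative sketch, assuming quartets are encoded as `((a, b), (c, d))` and that the graph is kept as two multisets of undirected edges (an edge contributed by several quartets is counted once per quartet); `build_graph` is a hypothetical name.

```python
from collections import Counter
from typing import Iterable, Tuple

# ((a, b), (c, d)) encodes the quartet ab|cd (an encoding assumed for this sketch)
Quartet = Tuple[Tuple[str, str], Tuple[str, str]]


def build_graph(quartets: Iterable[Quartet]) -> Tuple[Counter, Counter]:
    """Return (good, bad) edge multisets; each edge is a frozenset {u, v}."""
    good: Counter = Counter()
    bad: Counter = Counter()
    for (a, b), (c, d) in quartets:
        # The four crossing edges are good.
        for u, v in ((a, c), (a, d), (b, c), (b, d)):
            good[frozenset((u, v))] += 1
        # The two edges inside each sibling pair are bad.
        for u, v in ((a, b), (c, d)):
            bad[frozenset((u, v))] += 1
    return good, bad


good, bad = build_graph([(("1", "2"), ("3", "4")), (("1", "3"), ("4", "5"))])
```

Note that the same pair of taxa can be a good edge for one quartet and a bad edge for another, as with the pair {1, 3} here.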
slide-8
SLIDE 8

[Figure: a quartet set Q and its graph GQ, with bad edges and good edges drawn differently]

slide-9
SLIDE 9

Quartet Max Cut (cont.)

  • A cut in GQ corresponds to a partition of the taxa set into two parts. Given a cut C = (Y, X\Y) in the graph:
  • A satisfied quartet contributes 4 good edges to the cut
  • A deferred quartet contributes 2 good edges and 1 bad edge
  • A violated quartet contributes 2 good edges and 2 bad edges
  • We want to find a cut C* maximizing

$$C^* = \arg\max_C \left( |\text{good edges}| - \alpha \, |\text{bad edges}| \right)$$

  • α is dynamically chosen such that C* maximizes the ratio

$$\rho(C^*) = \frac{|\text{good edges}|}{|\text{bad edges}|}$$
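The edge counts and the cut objective above can be reproduced on toy inputs. This is a sketch only: it enumerates every bipartition by brute force, which is feasible for a handful of taxa and is not the heuristic QMC actually uses, and it takes the scaling factor on bad edges as a fixed, hypothetical parameter `alpha` rather than choosing it dynamically. Quartets are encoded as `((a, b), (c, d))`.

```python
from itertools import combinations
from typing import AbstractSet, FrozenSet, Iterable, List, Tuple

# ((a, b), (c, d)) encodes the quartet ab|cd (an encoding assumed for this sketch)
Quartet = Tuple[Tuple[str, str], Tuple[str, str]]


def cut_counts(quartets: Iterable[Quartet], part: AbstractSet[str]) -> Tuple[int, int]:
    """Count (good, bad) edges crossing the cut (part, everything else)."""
    good = bad = 0
    for (a, b), (c, d) in quartets:
        good += sum((u in part) != (v in part) for u, v in ((a, c), (a, d), (b, c), (b, d)))
        bad += sum((u in part) != (v in part) for u, v in ((a, b), (c, d)))
    return good, bad


def best_cut(
    quartets: List[Quartet], taxa: Iterable[str], alpha: float = 1.0
) -> Tuple[float, FrozenSet[str]]:
    """Brute-force argmax of good - alpha * bad over all non-trivial cuts."""
    ordered = sorted(taxa)
    first, rest = ordered[0], ordered[1:]
    best = None
    # Pin the first taxon to one side so each cut is visited once;
    # r < len(rest) keeps the other side non-empty.
    for r in range(len(rest)):
        for extra in combinations(rest, r):
            part = frozenset((first, *extra))
            good, bad = cut_counts(quartets, part)
            score = good - alpha * bad
            if best is None or score > best[0]:
                best = (score, part)
    return best


q = [(("a", "b"), ("c", "d"))]
satisfied = cut_counts(q, {"a", "b"})  # 4 good, 0 bad
deferred = cut_counts(q, {"a"})        # 2 good, 1 bad
violated = cut_counts(q, {"a", "c"})   # 2 good, 2 bad

example = [(("1", "2"), ("3", "4")), (("1", "3"), ("4", "5"))]
best_score, best_part = best_cut(example, "12345", alpha=1.0)
```

With `alpha = 1.0` the two example quartets cannot both be satisfied by one cut, so the best achievable score satisfies one quartet and defers the other.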

slide-10
SLIDE 10

Q = { 12|34, 13|45 }

[Figure: the graph GQ built from Q]

The cut {125}, {34} satisfies 12|34 but violates 13|45.

slide-11
SLIDE 11

Quartet Max Cut (cont.)

  • Problems?
  • Each quartet has the same weight
  • What if we have confidence values for each quartet (prior knowledge, confidence based on avg. branch length)?
  • Possible solution?
  • Consider only quartets having high confidence
  • Loss of information: BAD
  • Need a better amalgamation technique
slide-12
SLIDE 12

[Figure: two candidate trees; one satisfies the last 3 quartets, the other satisfies the first 2 quartets]

Which one is better?

slide-13
SLIDE 13

Weighted Quartet Max Cut

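  • Intuition: add the confidence of each quartet as weights in the graph
  • Build the graph GQ as in QMC
  • For each edge in GQ: weight of edge = weight of the quartet that contributed it (its "mother quartet")
  • We want to find a cut C* maximizing

$$C^* = \arg\max_C \left( \text{weight of good edges} - \alpha \cdot \text{weight of bad edges} \right)$$

(α is the same scaling factor on bad edges as in QMC)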
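The weighted objective above differs from QMC only in that every edge carries the weight of its quartet. A minimal sketch, assuming quartets are encoded as `((a, b), (c, d))`, weights are given in a dict, and the scaling factor on bad edges is a fixed hypothetical parameter `alpha`:

```python
from typing import AbstractSet, Dict, Tuple

# ((a, b), (c, d)) encodes the quartet ab|cd (an encoding assumed for this sketch)
Quartet = Tuple[Tuple[str, str], Tuple[str, str]]


def weighted_cut_score(
    weights: Dict[Quartet, float], part: AbstractSet[str], alpha: float = 1.0
) -> float:
    """Weight of good edges in the cut minus alpha times weight of bad edges."""
    good = bad = 0.0
    for ((a, b), (c, d)), w in weights.items():
        # Every edge inherits the weight w of the quartet that contributed it.
        good += w * sum((u in part) != (v in part) for u, v in ((a, c), (a, d), (b, c), (b, d)))
        bad += w * sum((u in part) != (v in part) for u, v in ((a, b), (c, d)))
    return good - alpha * bad


# Two conflicting quartets on the same taxa, with different confidence.
weights = {
    (("1", "2"), ("3", "4")): 0.9,
    (("1", "3"), ("2", "4")): 0.1,
}
score_12 = weighted_cut_score(weights, {"1", "2"})  # satisfies the heavy quartet
score_13 = weighted_cut_score(weights, {"1", "3"})  # satisfies the light quartet
```

Without weights the two cuts would tie (each satisfies one quartet and violates the other); with weights the cut that agrees with the high-confidence quartet scores higher.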

slide-14
SLIDE 14

Definitions

  • Weight of a quartet q given a model tree

$$w_q = \frac{d_h - d_l}{e^{(d_h - d_m)} \cdot d_h}$$

where $d_l$, $d_m$, $d_h$ represent the three pairwise sums

  • Qfit
  • Similarity measure between two trees based on quartets common to compared trees
  • WQfit
  • Novel similarity measure defined by the authors
  • Takes into account both shared quartets and their weights to calculate similarity
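The quartet weight defined at the top of this slide can be evaluated numerically. This sketch reads the formula as w_q = (d_h - d_l) / (e^(d_h - d_m) * d_h) and assumes that the subscripts l, m, h denote the smallest, middle, and largest of the three pairwise sums; that ordering is an interpretation, not something the slide states. The name `quartet_weight` and the zero-distance guard are illustrative.

```python
import math
from typing import Sequence


def quartet_weight(pairwise_sums: Sequence[float]) -> float:
    """Weight (d_h - d_l) / (exp(d_h - d_m) * d_h) of a quartet.

    Assumption: d_l <= d_m <= d_h are the three pairwise sums in sorted order.
    """
    d_l, d_m, d_h = sorted(pairwise_sums)
    if d_h == 0:
        return 0.0  # guard added for this sketch: all distances are zero
    return (d_h - d_l) / (math.exp(d_h - d_m) * d_h)


# Two largest sums equal: the exponential factor is 1 and the weight is 1/3.
w_resolved = quartet_weight([2.0, 3.0, 3.0])
# All three sums equal: the numerator vanishes and the weight is 0.
w_star = quartet_weight([3.0, 3.0, 3.0])
```

Under this reading the weight grows with the gap between the smallest sum and the largest, and shrinks when the two largest sums disagree.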
slide-15
SLIDE 15

Simulation

  • Number of quartets used: #qrt = n^k, where k = qrt-num-factor

  • Rewire
  • Choose a quartet randomly based on its confidence
  • Low confidence means a high probability of selection
  • Randomly change the topology of the chosen quartet
  • Ratio of rewire = weight of rewired quartets / total weight
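The rewiring procedure above can be sketched as a loop that keeps rewiring until a target ratio is reached. The slides do not give the exact selection distribution, so this sketch assumes selection probability proportional to 1/weight (low confidence, high probability) and strictly positive weights; `rewire`, `alternative_topologies`, and `target_ratio` are hypothetical names. Quartets are encoded as `((a, b), (c, d))`.

```python
import random
from typing import Dict, List, Tuple

# ((a, b), (c, d)) encodes the quartet ab|cd (an encoding assumed for this sketch)
Quartet = Tuple[Tuple[str, str], Tuple[str, str]]


def alternative_topologies(q: Quartet) -> List[Quartet]:
    """The two other topologies on the same four taxa."""
    (a, b), (c, d) = q
    return [((a, c), (b, d)), ((a, d), (b, c))]


def rewire(
    weights: Dict[Quartet, float], target_ratio: float, rng: random.Random
) -> Tuple[Dict[Quartet, Quartet], float]:
    """Rewire quartets until rewired weight / total weight >= target_ratio.

    Returns a map original quartet -> rewired quartet and the achieved ratio.
    """
    total = sum(weights.values())
    remaining = dict(weights)
    rewired: Dict[Quartet, Quartet] = {}
    rewired_weight = 0.0
    while remaining and rewired_weight / total < target_ratio:
        candidates = list(remaining)
        # Assumption: selection probability proportional to 1 / weight.
        inverse = [1.0 / remaining[q] for q in candidates]
        chosen = rng.choices(candidates, weights=inverse, k=1)[0]
        rewired[chosen] = rng.choice(alternative_topologies(chosen))
        rewired_weight += remaining.pop(chosen)
    return rewired, rewired_weight / total


weights = {
    (("1", "2"), ("3", "4")): 0.2,
    (("1", "3"), ("4", "5")): 0.5,
    (("2", "3"), ("4", "5")): 0.3,
}
rewired, ratio = rewire(weights, target_ratio=0.3, rng=random.Random(0))
```

Each quartet is rewired at most once, so the achieved ratio can overshoot the target by at most the weight of the last quartet chosen.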
slide-16
SLIDE 16

Results

slide-17
SLIDE 17

Results (cont.)

slide-18
SLIDE 18

Results (cont.)

slide-19
SLIDE 19

Results (cont.)

slide-20
SLIDE 20

Results (cont.)

  • Cyanobacterial dataset (HGT is evident)
  • Compared wQMC to the embedded quartets method
  • Embedded Quartets Method (Zhaxybayeva et al., 2006)
  • Construct a tree for every gene
  • Get the induced quartets from every gene tree
  • Get an ML score for each quartet
  • Remove low-confidence quartets
  • Run MRP to get a supertree
  • wQMC tree matched the Embedded Quartets method

1128 genes, 214,729 quartets

slide-21
SLIDE 21

Questions?

slide-22
SLIDE 22

Thank You