
slide-1
SLIDE 1

Complexity Insights of the Minimum Duplication Problem

Guillaume Blin Paola Bonizzoni Riccardo Dondi Romeo Rizzi Florian Sikora

Université Paris-Est Marne-la-Vallée, LIGM - UMR CNRS 8049, France
DISCo, Università degli Studi di Milano-Bicocca, Milano, Italy
DSLCSC, Università degli Studi di Bergamo, Bergamo, Italy
DIMI, Università di Udine, Udine, Italy
Lehrstuhl für Bioinformatik, Friedrich-Schiller-Universität Jena, Germany

January 2012

Guillaume Blin Complexity Insights of the Minimum Duplication Problem

slide-2
SLIDE 2

Minimum Duplication Problem

◮ Problem in phylogenetics and comparative genomics, related to two types of trees: gene trees and species trees

◮ Evolutionary history of genomes

◮ results from a series of evolutionary events producing new species from a common ancestor (speciation)

◮ represented as a species tree

slide-3
SLIDE 3

Minimum Duplication Problem

◮ Other evolutionary events, such as gene duplication, loss, and lateral transfer, leading to new species

◮ Focus on duplication: a genomic event that causes a gene inside a genome to be copied, each copy then evolving independently

◮ Considering a specific gene family, its evolution with regard to extant species is given as a gene tree

slide-4
SLIDE 4

Trees reconciliation

◮ Gene and species trees may present incompatibilities

◮ A challenging problem is to reconcile them by means of hypothetical gene duplications

slide-7
SLIDE 7

Trees reconciliation

◮ Parsimony principle: find the minimum number of gene duplications

◮ Inferred by lowest common ancestor (LCA) mapping
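The LCA mapping mentioned above can be illustrated with a minimal sketch. It assumes binary trees encoded as nested tuples whose leaves are species names, and it uses the usual reconciliation rule that a gene-tree node is a duplication when it maps to the same species-tree node as one of its children. The encoding and the function names `clusters` and `count_duplications` are illustrative assumptions, not taken from the slides.

```python
def clusters(tree):
    """Return the leaf sets (clusters) of every node of a nested-tuple tree.

    The last entry of the returned list is the cluster of `tree` itself.
    """
    if isinstance(tree, str):
        return [frozenset([tree])]
    out = []
    leaves = frozenset()
    for child in tree:
        sub = clusters(child)
        out.extend(sub)
        leaves |= sub[-1]  # the child's own cluster is stored last
    out.append(leaves)
    return out


def count_duplications(gene_tree, species_tree):
    """Count duplications of gene_tree reconciled with species_tree."""
    species_clusters = clusters(species_tree)

    def lca(leafset):
        # Smallest species cluster containing the set = its LCA node.
        return min((c for c in species_clusters if leafset <= c), key=len)

    dups = 0

    def visit(node):
        nonlocal dups
        if isinstance(node, str):
            return frozenset([node])
        child_sets = [visit(child) for child in node]
        here = frozenset().union(*child_sets)
        mapped = lca(here)
        # Duplication: the node and one of its children map to the same place.
        if any(lca(cs) == mapped for cs in child_sets):
            dups += 1
        return here

    visit(gene_tree)
    return dups


species = (("a", "b"), "c")
print(count_duplications((("a", "c"), "b"), species))
```

In this hypothetical example the gene tree groups a with c, which conflicts with the species tree, so one duplication has to be hypothesized at the root.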

slide-8
SLIDE 8

Minimum Duplication Problem

Definition
Input: a set of gene trees
Output: a species tree that induces a minimum number of gene duplications

Known Hardness Results

◮ Relation with Minimum Triplets Consistency: NP-hard, W[2]-hard,

◮ inapproximable within factor O(log n), even for a forest of an unbounded number of uniquely leaf-labelled gene trees with three leaves

◮ ⇒ We will prove that it is APX-hard even when the input consists of 5 uniquely leaf-labelled gene trees with an unbounded number of leaves (technical proof not presented here)

slide-9
SLIDE 9

Minimum Duplication Problem

Definition
Input: a set of gene trees
Output: a species tree that induces a minimum number of gene duplications

Known Results On The Bright Side

◮ Different heuristics have been proposed

◮ Among them, Chauve et al. proposed to consider a related problem which, applied recursively, yields a natural greedy heuristic: MINIMUM BIPARTITE DUPLICATION PROBLEM

slide-11
SLIDE 11

Minimum Bipartite Duplication Problem

Definition
Input: a set of gene trees
Output: a bipartition (Λ1, Λ2) of the species inducing a minimum number of gene duplications

This corresponds to finding the duplications preceding the first speciation (pre-duplications)

Known Results On The Bright Side

◮ 2-approximable

◮ ⇒ We show that the problem is solvable in randomized polynomial time for an unbounded number of gene trees of bounded depth
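To make the objective concrete, here is a small sketch that scores a bipartition and finds the best one by exhaustive search. It rests on an assumption the slides do not spell out: a gene-tree node counts as a pre-duplication when at least one of its children has leaves in both Λ1 and Λ2 (the LCA rule applied to the first speciation only). Gene trees are nested tuples with species names at the leaves; `pre_duplications` and `best_bipartition` are hypothetical names, and the search is exponential, so it is only meant for tiny inputs.

```python
from itertools import combinations


def pre_duplications(gene_tree, part1, part2):
    """Count nodes having a child whose leaves meet both sides (assumed rule)."""
    count = 0

    def visit(node):
        nonlocal count
        if isinstance(node, str):
            return frozenset([node])
        child_sets = [visit(child) for child in node]
        if any((cs & part1) and (cs & part2) for cs in child_sets):
            count += 1
        return frozenset().union(*child_sets)

    visit(gene_tree)
    return count


def leaf_species(tree):
    """Set of species labelling the leaves of a nested-tuple tree."""
    if isinstance(tree, str):
        return {tree}
    return set().union(*(leaf_species(child) for child in tree))


def best_bipartition(gene_trees):
    """Exhaustive search over bipartitions; returns (cost, part1, part2)."""
    species = sorted(set().union(*(leaf_species(t) for t in gene_trees)))
    first, rest = species[0], species[1:]
    best = None
    # Fixing `first` in part1 avoids enumerating each bipartition twice;
    # r < len(rest) keeps part2 non-empty.
    for r in range(len(rest)):
        for extra in combinations(rest, r):
            part1 = frozenset((first,) + extra)
            part2 = frozenset(species) - part1
            cost = sum(pre_duplications(t, part1, part2) for t in gene_trees)
            if best is None or cost < best[0]:
                best = (cost, part1, part2)
    return best


trees = [(("a", "b"), ("c", "d")), (("a", "c"), ("b", "d"))]
print(best_bipartition(trees))
```

The two example trees disagree on how the four species group, so no bipartition avoids every pre-duplication under this counting rule.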

slide-12
SLIDE 12

Randomized Algorithm

◮ Definition: an algorithm allowed to make random decisions as it processes the input

◮ We will prove that our algorithm reaches a high probability of success within a polynomial overall running time

◮ Based on the following correspondence: MBD ≡ Min Cut in Colored Hypergraph ≡ Min Cut in Colored Graph

slide-16
SLIDE 16

Min Cut in Colored Graph

◮ Randomized algorithm using a colored contraction algorithm inspired by a folklore algorithm¹:

  • Choose a color at random and contract all edges of this color

  • Repeat until only two super-vertices remain

  • Each step performs mul(c) contractions, so |V| decreases by up to mul(c), where mul(c) is the number of edges of the chosen color c

¹ J. Kleinberg and E. Tardos
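The contraction steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: edges are `(u, v, color)` triples, the color is drawn uniformly among colors that still join two different super-vertices, and a run that collapses to a single super-vertex is treated as failed. The names `contraction_run` and `best_colored_cut` are hypothetical.

```python
import random


def find(parent, v):
    """Union-find lookup with path halving."""
    while parent[v] != v:
        parent[v] = parent[parent[v]]
        v = parent[v]
    return v


def contraction_run(vertices, edges, rng):
    """One run: return the set of colors crossing the final cut, or None."""
    parent = {v: v for v in vertices}
    components = len(parent)
    while components > 2:
        # Colors that still join two different super-vertices.
        live = sorted({c for u, v, c in edges
                       if find(parent, u) != find(parent, v)})
        if not live:
            return None  # the graph is disconnected
        chosen = rng.choice(live)  # assumption: uniform choice
        for u, v, c in edges:
            if c == chosen:
                ru, rv = find(parent, u), find(parent, v)
                if ru != rv:
                    parent[ru] = rv  # contract this edge
                    components -= 1
    if components != 2:
        return None  # the last color merged everything: failed run
    return {c for u, v, c in edges if find(parent, u) != find(parent, v)}


def best_colored_cut(vertices, edges, runs, seed=0):
    """Repeat independent runs and keep the cut with the fewest colors."""
    rng = random.Random(seed)
    best = None
    for _ in range(runs):
        cut = contraction_run(vertices, edges, rng)
        if cut is not None and (best is None or len(cut) < len(best)):
            best = cut
    return best


vertices = [0, 1, 2, 3]
edges = [(0, 1, "r"), (2, 3, "g"), (1, 2, "b"), (0, 3, "b")]
print(best_colored_cut(vertices, edges, runs=200))
```

A union-find structure stands in for explicit contraction here: merging two roots plays the role of fusing two super-vertices, and edges inside a super-vertex are simply ignored.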

slide-21
SLIDE 21

Min Cut in Colored Graph

◮ Simple randomized algorithm, but what about performance analysis? ⇒ It returns opt with probability $\ge (|V|^{2k})^{-1}$, where $k = \max_{c \in C} mul(c)$

◮ Let OPT = number of colors in an optimal cut set

◮ Rk1: $\forall v \in V,\ d(v) \ge OPT$, otherwise $(\{v\}, V \setminus \{v\})$ would be a better solution

◮ Rk2: $\frac{OPT \cdot |V|}{2} \le |E|$, since $\frac{\sum_{v \in V} d(v)}{2} \le |E|$

◮ Rk3: $|E| \le k \cdot |C|$, since each color cannot be used on more than $k$ edges in $E$

◮ ⇒ $OPT \cdot |V| \le 2 \cdot |E| \le 2k \cdot |C|$
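The three remarks can be checked numerically on a tiny instance. The sketch below assumes that d(v) denotes the number of distinct colors on the edges leaving v (so that it equals the cost of the cut isolating v) and computes OPT by brute force over all bipartitions. The helper names and the sample graph are illustrative assumptions.

```python
from itertools import combinations


def cut_colors(edges, side):
    """Colors of the edges with exactly one endpoint in `side`."""
    return {c for u, v, c in edges if (u in side) != (v in side)}


def brute_force_opt(vertices, edges):
    """Minimum number of colors over all cuts (exponential, tiny inputs only)."""
    vs = list(vertices)
    best = None
    for r in range(1, len(vs)):
        for side in combinations(vs, r):
            cost = len(cut_colors(edges, set(side)))
            best = cost if best is None else min(best, cost)
    return best


def check_remarks(vertices, edges):
    """Evaluate Rk1, Rk2, Rk3 and the combined chain on one instance."""
    opt = brute_force_opt(vertices, edges)
    colors = {c for _, _, c in edges}
    # k = largest number of edges sharing one color.
    k = max(sum(1 for _, _, c in edges if c == col) for col in colors)
    # Assumed meaning of d(v): number of colors leaving v.
    d = {v: len(cut_colors(edges, {v})) for v in vertices}
    n, m = len(vertices), len(edges)
    return {
        "Rk1": all(d[v] >= opt for v in vertices),
        "Rk2": opt * n / 2 <= m,
        "Rk3": m <= k * len(colors),
        "chain": opt * n <= 2 * m <= 2 * k * len(colors),
    }


vertices = [0, 1, 2, 3]
edges = [(0, 1, "r"), (2, 3, "g"), (1, 2, "b"), (0, 3, "b")]
print(brute_force_opt(vertices, edges), check_remarks(vertices, edges))
```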

slide-26
SLIDE 26

Min Cut in Colored Graph

◮ The probability $\Pr[F_j]$ of failing at the $j$-th contraction, given that we are left with $C'$ colors and $|V'| = |V| - i$ vertices

◮ = the probability of choosing a color among the OPT ones

◮ $\Pr[F_j] \le \frac{OPT}{|C'|} \le \frac{2k \cdot |C'|}{|V'| \cdot |C'|} = \frac{2k}{|V'|}$, since $OPT \cdot |V| \le 2 \cdot |E| \le 2k \cdot |C|$ (applied to the current contracted graph)

◮ $\Pr[\text{Success}] \ge \prod_{j=0} (1 - \Pr[F_j]) \ge \prod_{j=0} \left(1 - \frac{2k}{|V| - i}\right) = \prod_{j=0} \frac{|V| - i - 2k}{|V| - i} \ge \frac{1}{|V|^{2k}}$

◮ A single run of the algorithm fails to find the optimal with probability at most $1 - (|V|^{2k})^{-1}$

◮ Running the algorithm $|V|^{2k} \ln |V|$ times leads to no success with probability at most $\frac{1}{|V|}$; the number of runs is polynomial only for a bounded $k$

◮ ⇒ MBD is randomized polynomial when the gene trees are of bounded depth
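The repetition argument can be checked numerically. The sketch below takes the single-run success bound $1/|V|^{2k}$ from the slides as given, computes the number of runs $|V|^{2k} \ln |V|$ (rounded up), and evaluates the resulting upper bound on the probability that every run fails. The function name `runs_needed` is a hypothetical helper.

```python
import math


def runs_needed(n_vertices, k):
    """Return (number of runs, upper bound on the probability all runs fail)."""
    single_success = 1.0 / n_vertices ** (2 * k)  # bound from the analysis
    runs = math.ceil(n_vertices ** (2 * k) * math.log(n_vertices))
    # (1 - p)^runs, computed through log1p for numerical stability.
    all_fail = math.exp(runs * math.log1p(-single_success))
    return runs, all_fail


for n, k in [(10, 1), (10, 2), (50, 1)]:
    runs, fail = runs_needed(n, k)
    print(n, k, runs, fail, fail <= 1 / n)
```

The bound follows from $(1 - 1/N)^{N \ln |V|} \le e^{-\ln |V|} = 1/|V|$ with $N = |V|^{2k}$. The run count grows as $|V|^{2k}$, which is why the overall running time stays polynomial only when k is bounded.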

slide-38
SLIDE 38

Open problems

◮ What is the complexity of Minimum Colored Cut?

◮ What is the complexity of MBD when the gene trees have unbounded depth?
