Optimally Solving Hard Combinatorial Problems in Computational Biology

SLIDE 1

Colorful Components Graph Orientation

Optimally Solving Hard Combinatorial Problems in Computational Biology

Falk Hüffner

Institut für Softwaretechnik und Theoretische Informatik, TU Berlin

7 October 2013

Falk Hüffner (TU Berlin) Optimally Solving Hard Combinatorial Problems in Computational Biology 1/24

SLIDE 7

Multiple Sequence Alignment

T1 A2 C3 G4 T5 A6
T1 A2 G3 T4 A5
T1 A2 C3 G4 T5 G6 A7

[Figure: alignment graph over the numbered sequence positions]

Idea

Use alignment graph constructed by local alignment to reconstruct global alignment.

SLIDE 10

Colorful Components

Part of a Multiple Sequence Alignment pipeline suggested by Corel, Pitschi & Morgenstern (Bioinformatics 2010).

COLORFUL COMPONENTS

Instance: An undirected graph G = (V, E) and a coloring of the vertices χ : V → {1, …, c}.
Task: Delete a minimum number of edges such that all connected components are colorful, that is, they do not contain two vertices of the same color.

Other application: Orthologs in multiple genomes. From the set of all pairwise homologies, find disjoint orthology sets of genes. [Zheng, Swenson, Lyons & Sankoff, WABI ’11]
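
The colorfulness condition from the problem definition can be checked directly. The following is a minimal Python sketch, not taken from the talk: the function names and the toy instance (a path a-b-c-d with colors 1, 2, 3, 1) are assumptions made for illustration.

```python
from collections import defaultdict


def connected_components(vertices, edges):
    """Return the connected components of an undirected graph as sets."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    seen, comps = set(), []
    for start in vertices:
        if start in seen:
            continue
        comp, stack = {start}, [start]
        seen.add(start)
        while stack:
            u = stack.pop()
            for w in adj[u]:
                if w not in seen:
                    seen.add(w)
                    comp.add(w)
                    stack.append(w)
        comps.append(comp)
    return comps


def is_colorful(vertices, edges, color):
    """True if no connected component contains two vertices of one color."""
    for comp in connected_components(vertices, edges):
        colors = [color[v] for v in comp]
        if len(colors) != len(set(colors)):
            return False
    return True


# Hypothetical toy instance: a and d share color 1 and are connected.
vertices = ["a", "b", "c", "d"]
edges = [("a", "b"), ("b", "c"), ("c", "d")]
color = {"a": 1, "b": 2, "c": 3, "d": 1}
```

On this toy instance the full path is not colorful, while deleting the edge (c, d) leaves the components {a, b, c} and {d}, which are both colorful.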

SLIDE 13

Complexity of Colorful Components

COLORFUL COMPONENTS with two colors can be solved in $O(\sqrt{n} \cdot m)$ time by matching techniques.
COLORFUL COMPONENTS is NP-hard already with three colors.
COLORFUL COMPONENTS can be approximated within a factor of $4 \ln(c + 1)$.
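
One way to see the two-color case: a colorful component then has at most two vertices, so the kept edges form a matching among the edges that join differently colored vertices. The sketch below is an illustration under that reading, not the talk's implementation. It uses simple augmenting paths rather than a faster matching algorithm, so it does not reach the stated time bound. It assumes a simple graph and colors 1 and 2; the function name is hypothetical.

```python
def min_deletions_two_colors(edges, color):
    """Minimum edge deletions for COLORFUL COMPONENTS with colors {1, 2}."""
    # Only edges joining a color-1 vertex to a color-2 vertex can be kept.
    adj = {}
    for u, v in edges:
        if color[u] == color[v]:
            continue  # such an edge must always be deleted
        if color[u] == 2:
            u, v = v, u  # store edges as (color-1 vertex, color-2 vertex)
        adj.setdefault(u, []).append(v)

    match = {}  # color-2 vertex -> color-1 vertex it is matched to

    def augment(u, visited):
        """Try to match u, re-matching earlier vertices if necessary."""
        for v in adj.get(u, []):
            if v in visited:
                continue
            visited.add(v)
            if v not in match or augment(match[v], visited):
                match[v] = u
                return True
        return False

    matching_size = sum(augment(u, set()) for u in adj)
    # Every edge outside the maximum matching is deleted.
    return len(edges) - matching_size
```

For the path a-b-c-d with alternating colors 1, 2, 1, 2, a maximum matching keeps the two outer edges, so one deletion suffices.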

SLIDE 14

Exact solutions

Want to solve COLORFUL COMPONENTS exactly:
Can interpret solutions within the model;
Can differentiate between weaknesses of the model and weaknesses of the algorithm;
Can judge the quality of heuristics;
Time-limited exact algorithms often give good heuristics.

SLIDE 16

Fixed-parameter algorithms

Idea

Find an algorithm that gives optimal solutions and thus has exponential running time, but restrict the combinatorial explosion to a parameter.

Definition

A problem is called fixed-parameter tractable with respect to a parameter k if an instance of size n can be solved in $f(k) \cdot n^{O(1)}$ time for an arbitrary function f.

SLIDE 19

Fixed-parameter algorithm

Observation

COLORFUL COMPONENTS can be seen as the problem of destroying by edge deletions all bad paths, that is, simple paths between equally colored vertices.

Observation

Unless the graph is already colorful, we can always find a bad path with at most c edges, where c is the number of colors.

Theorem

COLORFUL COMPONENTS can be solved in $O(c^k \cdot m)$ time, where k is the number of edge deletions.
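
The two observations suggest a search tree: find a shortest bad path, branch on which of its at most c edges to delete, and recurse with budget k − 1. The following Python sketch illustrates this idea under stated assumptions. The function names are hypothetical, and the code rebuilds the adjacency lists at every node, so it is slower than the bound in the theorem.

```python
from collections import deque


def shortest_bad_path(vertices, edges, color):
    """Edges of a shortest path between two equally colored vertices,
    or None if all connected components are colorful."""
    adj = {v: [] for v in vertices}
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    best = None
    for s in vertices:
        parent = {s: None}
        queue = deque([s])
        while queue:
            u = queue.popleft()
            if u != s and color[u] == color[s]:
                path = []
                while parent[u] is not None:
                    path.append((parent[u], u))
                    u = parent[u]
                if best is None or len(path) < len(best):
                    best = path
                break  # BFS order: the first hit is the closest one from s
            for w in adj[u]:
                if w not in parent:
                    parent[w] = u
                    queue.append(w)
    return best


def solve(vertices, edges, color, k):
    """A set of at most k edges whose deletion makes all components
    colorful, or None if no such set exists."""
    path = shortest_bad_path(vertices, edges, color)
    if path is None:
        return set()
    if k == 0:
        return None
    for u, v in path:  # at least one edge of every bad path must go
        rest = [e for e in edges if set(e) != {u, v}]
        sub = solve(vertices, rest, color, k - 1)
        if sub is not None:
            return sub | {(u, v)}
    return None


def min_deletions(vertices, edges, color):
    """Increase the budget k until the search tree finds a solution."""
    k = 0
    while True:
        solution = solve(vertices, edges, color, k)
        if solution is not None:
            return solution
        k += 1
```

Trying k = 0, 1, 2, … keeps the search tree as small as the optimum allows, since the tree depth never exceeds the number of deletions actually needed.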

SLIDE 21

Improved fixed-parameter algorithm

Theorem

COLORFUL COMPONENTS can be solved in $O((c - 1)^k \cdot m)$ time, where k is the number of edge deletions.

Proof.

If there is a degree-3 or higher vertex v, find a bad path with at most (c − 1) edges by BFS from v. Otherwise, the instance is easy.

SLIDE 23

Limits of fixed-parameter algorithms

Question

How much further can we improve this algorithm?

Theorem

COLORFUL COMPONENTS with three colors cannot be solved in $2^{o(k)} \cdot n^{O(1)}$ time unless the Exponential Time Hypothesis is false.

SLIDE 24

Data reduction

Data reduction

Let V′ ⊆ V be a colorful subgraph. If the cut between V′ and V \ V′ is at most as large as the connectivity of V′, then merge V′ into a single vertex. (Splitting V′ would then cost at least as many deletions as separating it from the rest of the graph.)

SLIDE 25

Kernelizations

In classical (one-dimensional) complexity analysis, nothing can be proven about the power of data reduction.
In parameterized complexity, we have the concept of a problem kernel: a data reduction rule that creates an instance whose size depends only on the parameter k, and no longer on the original input size n.

SLIDE 26

Data

We generated one COLORFUL COMPONENTS instance for each multiple alignment instance from the BAliBASE 3.0 benchmark. We restricted the experiments to the 135 instances that have at most 10 colors.

SLIDE 27

Data reduction: Largest connected component

            original             after data reduction
            n     m     c        n     m     c
average     504   921   6.2      354   607   5.3
median      149   232   6        42    58    5

SLIDE 28

Branching algorithms: running time

Number of instances by running time:

             < 1 s    1 s to 10 min    > 10 min
branching    70       9                56

SLIDE 30

Sequence alignment quality

DIALIGN with several methods for solving the COLORFUL COMPONENTS subproblem:

                      TC score
min-cut heuristic     53.6 %
exact algorithm       56.6 %

DIALIGN with the min-cut heuristic is about 10 percentage points worse than current state-of-the-art multiple alignment methods. Hence, an improvement of 3 percentage points is a sizable step towards closing the gap between DIALIGN and these methods.

SLIDE 32

Integer Linear Programming

An Integer Linear Program (ILP) minimizes a linear function under linear constraints and integrality constraints.
Commercial solvers like CPLEX and Gurobi profit from decades of engineering and can often solve real-world instances surprisingly fast.

SLIDE 35

ILP for Colorful Components

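Idea: binary variable $e_{uv}$ that is 1 if u and v are in the same component

maximize $\sum_{\{u,v\} \in E} w_{uv} e_{uv}$

where

$w_{uv} = \begin{cases} -\infty & \text{if } \chi(u) = \chi(v), \\ 1 & \text{if } \{u, v\} \in E, \\ 0 & \text{otherwise,} \end{cases}$

subject to $e_{uv} + e_{vw} - e_{uw} \le 1$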
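
To make the model concrete, the sketch below enumerates all 0/1 assignments of the pair variables on a tiny instance instead of calling an ILP solver. It is an illustration only. It assumes that the constraint is imposed for every ordered triple of distinct vertices, and it treats the weight −∞ as a hard constraint. The function name is hypothetical.

```python
from itertools import combinations, permutations, product


def min_deletions_by_enumeration(vertices, edges, color):
    """Brute-force the clique-partitioning model on a tiny instance."""
    pairs = [frozenset(p) for p in combinations(vertices, 2)]
    edge_set = {frozenset(e) for e in edges}
    best_kept = 0  # the all-zero assignment is always feasible
    for bits in product([0, 1], repeat=len(pairs)):
        e = dict(zip(pairs, bits))
        # Weight -infinity: equally colored vertices never share a component.
        if any(e[p] and len({color[v] for v in p}) == 1 for p in pairs):
            continue
        # Transitivity: u~v and v~w together force u~w.
        if any(e[frozenset((u, v))] + e[frozenset((v, w))]
               - e[frozenset((u, w))] > 1
               for u, v, w in permutations(vertices, 3)):
            continue
        # Objective: number of original edges inside components.
        best_kept = max(best_kept, sum(e[p] for p in edge_set))
    return len(edges) - best_kept
```

The transitivity constraints make every feasible assignment an equivalence relation on the vertices, so each one corresponds to a partition into components. The enumeration is exponential in the number of vertex pairs and is only meant for instances with a handful of vertices.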

SLIDE 36

Wikipedia interlanguage links

30 most popular languages
11,977,500 vertices, 46,695,719 edges
2,698,241 connected components, of which 2,472,481 are already colorful
largest connected component has 1,828 vertices and 14,403 edges
solved optimally by data reduction + ILP in about 80 minutes
618,660 edges deleted, 434,849 inserted.

SLIDE 37

Random graph model

[Figure: instances solved (%, 0 to 100) against time (s, axis ticks from 10^-2 to 10^2) for Implicit Hitting Set, Hitting Set row generation, Clique Partitioning ILP, Clique Partitioning without cuts, and Branching]

SLIDE 39

Graph orientation

Current technologies like two-hybrid screening can find protein interactions, but cannot decide their direction. We can try to reconstruct the directions from gene knockout experiments.

GRAPH ORIENTATION

Instance: An undirected graph G = (V, E) and a set P ⊆ V × V of source–target pairs.
Task: Find an orientation of each edge in E such that for a maximum number of (s, t) ∈ P there is a directed path from s to t.
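
For very small instances, the task can be spelled out by trying every orientation. The following Python sketch is a brute-force illustration of the problem statement, not an algorithm from the talk, and the function names are hypothetical. It is exponential in the number of edges.

```python
from itertools import product


def has_directed_path(arcs, s, t):
    """Depth-first search for a directed s-t path in the arc list."""
    out = {}
    for u, v in arcs:
        out.setdefault(u, []).append(v)
    seen, stack = {s}, [s]
    while stack:
        u = stack.pop()
        if u == t:
            return True
        for w in out.get(u, []):
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return False


def best_orientation(edges, pairs):
    """Try all 2^|E| orientations; return (satisfied pairs, arcs)."""
    best_count, best_arcs = -1, None
    for flips in product([False, True], repeat=len(edges)):
        # A flipped edge {u, v} becomes the arc v -> u, otherwise u -> v.
        arcs = [(v, u) if flip else (u, v)
                for (u, v), flip in zip(edges, flips)]
        count = sum(has_directed_path(arcs, s, t) for s, t in pairs)
        if count > best_count:
            best_count, best_arcs = count, arcs
    return best_count, best_arcs
```

On the path a-b-c, the pairs (a, c) and (c, a) need opposite orientations of both edges, so at most one of them can be satisfied.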

SLIDE 42

Graph orientation

Data reduction

Join the vertices of a cycle into a single vertex.

Observation

We can assume without loss of generality that the input is a tree.

Theorem (Medvedovsky, Bafna, Zwick & Sharan, WABI ’08)

TREE ORIENTATION is NP-hard, even if the tree has diameter two or maximum degree three.

Theorem

TREE ORIENTATION can be reduced to VERTEX COVER on the conflict graph.
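
The reduction can be sketched as follows. The slide does not define the conflict graph, so this sketch assumes the natural definition: one vertex per pair, and an edge between two pairs whose tree paths traverse some tree edge in opposite directions. Under that assumption, a minimum vertex cover of the conflict graph is a smallest set of pairs to give up. All function names are hypothetical, and the vertex cover step is plain brute force.

```python
from itertools import combinations


def tree_path_arcs(adj, s, t):
    """Directed arcs of the unique s-t path in a tree (adjacency dict)."""
    parent = {s: None}
    stack = [s]
    while stack:
        u = stack.pop()
        for w in adj[u]:
            if w not in parent:
                parent[w] = u
                stack.append(w)
    arcs = set()
    while parent[t] is not None:
        arcs.add((parent[t], t))  # arc points away from s, towards t
        t = parent[t]
    return arcs


def conflict_graph(tree_edges, pairs):
    """Index pairs (i, j) of source-target pairs that cannot both be
    satisfied because they need some tree edge in opposite directions."""
    adj = {}
    for u, v in tree_edges:
        adj.setdefault(u, []).append(v)
        adj.setdefault(v, []).append(u)
    arcs = [tree_path_arcs(adj, s, t) for s, t in pairs]
    conflicts = []
    for i, j in combinations(range(len(pairs)), 2):
        if any((v, u) in arcs[j] for (u, v) in arcs[i]):
            conflicts.append((i, j))
    return conflicts


def min_vertex_cover_size(num_vertices, conflicts):
    """Smallest number of pairs to drop so that no conflict remains."""
    for size in range(num_vertices + 1):
        for cover in combinations(range(num_vertices), size):
            chosen = set(cover)
            if all(i in chosen or j in chosen for i, j in conflicts):
                return size
```

On the path a-b-c-d with pairs (a, c), (d, b) and (a, b), only the first two conflict (they use the edge {b, c} in opposite directions), so one pair is dropped and two are satisfied.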

SLIDE 44

Parameters

p: number of pairs: $2^p \cdot n^{O(1)}$
k: number of unsatisfied pairs: $1.38^k \cdot n^{O(1)}$
m_v: max. number of paths over a tree vertex: $2^{m_v} \cdot n^{O(1)}$
q_v: max. number of cross paths over a tree vertex: $2^{q_v} \cdot n^{O(1)}$
m_e: max. number of paths over an edge: NP-hard for $m_e \ge 3$

n      p       k      m_v     q_v    m_e
799    2014    17     2014    3      59
796    2443    46     2443    35     275
638    2311    68     2310    151    208
441    787     75     785     88     45
299    477     110    411     75     165
192    167     32     161     24     86
114    27      2      26      2      21

SLIDE 45

Experiments

Data reduction for VERTEX COVER

Take the neighbor of a degree-1 vertex into the cover.

Running time:
Branching    0.13 s
ILP          0.02 s
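
The degree-1 rule can be applied exhaustively before any exact method runs. The sketch below is a minimal illustration, assuming a simple graph without self-loops. The function name and return format are hypothetical and not taken from the talk's implementation.

```python
def degree_one_reduction(edges):
    """Apply the degree-1 rule until no degree-1 vertex remains.

    Returns (cover, remaining): the vertices forced into the cover and
    the edges of the reduced graph as frozensets.
    """
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    cover = set()
    leaves = [v for v in adj if len(adj[v]) == 1]
    while leaves:
        leaf = leaves.pop()
        if leaf not in adj or len(adj[leaf]) != 1:
            continue  # removed or changed degree since it was queued
        (neighbor,) = adj[leaf]
        cover.add(neighbor)
        # Delete the neighbor together with all edges it covers.
        for w in adj.pop(neighbor):
            adj[w].discard(neighbor)
            if len(adj[w]) == 1:
                leaves.append(w)
    remaining = {frozenset((u, v)) for u in adj for v in adj[u]}
    return cover, remaining
```

On the path a-b-c-d-e the rule alone already finds the cover {b, d}, while a triangle has no degree-1 vertex and is left unchanged for the branching or ILP step.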

SLIDE 46

Conclusions

Fixed-parameter algorithms can often solve hard problems optimally and have useful worst-case bounds;
Data reduction can substantially speed up algorithms for hard problems and should always be used, whether using exact or heuristic approaches;
Integer Linear Programming often yields a simple way to solve hard problems fast.
