Polynomial-Time Approximation Algorithms for Weighted LCS Problem

Marek Cygan (1), Marcin Kubica (1), Jakub Radoszewski (1), Wojciech Rytter (1, 2) and Tomasz Waleń (1)
(1) University of Warsaw, Poland
(2) Copernicus University, Toruń, Poland
CPM 2011, 2011-06-29

Definitions

Definition of a weighted sequence. A weighted sequence $X = x_1 x_2 \ldots x_n$ of length $|X| = n$ over an alphabet $\Sigma = \{\sigma_1, \sigma_2, \ldots, \sigma_K\}$ is a sequence of sets of pairs of the form
$$x_i = \{(\sigma_j, p^{(X)}_i(\sigma_j)) : j = 1, 2, \ldots, K\}.$$
Here $p^{(X)}_i(\sigma_j)$ is the occurrence probability of the character $\sigma_j$ at position $i$; these values are non-negative and sum to 1 for each $i$. $WS(\Sigma)$ is the set of all weighted sequences over the alphabet $\Sigma$. We assume that $|\Sigma| = O(1)$.

Definitions

Example. A weighted sequence $X = x_1 x_2 x_3 x_4$ over the alphabet $\Sigma = \{a, b, c\}$:

          x_1   x_2   x_3   x_4
p_i(a)    1/3   1     0     1/2
p_i(b)    1/3   0     1/2   1/4
p_i(c)    1/3   0     1/2   1/4
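A minimal sketch of how such a weighted sequence might be stored and validated, assuming one dictionary of probabilities per position. The names `X` and `is_weighted_sequence` are illustrative and do not come from the slides; exact fractions are used so that the "sums to 1" check is not affected by rounding.

```python
from fractions import Fraction as F

# The example sequence X = x1 x2 x3 x4 over {a, b, c}; one dict per position.
X = [
    {"a": F(1, 3), "b": F(1, 3), "c": F(1, 3)},
    {"a": F(1), "b": F(0), "c": F(0)},
    {"a": F(0), "b": F(1, 2), "c": F(1, 2)},
    {"a": F(1, 2), "b": F(1, 4), "c": F(1, 4)},
]

def is_weighted_sequence(seq):
    """Check that every position holds non-negative probabilities summing to 1."""
    return all(
        all(p >= 0 for p in pos.values()) and sum(pos.values()) == 1
        for pos in seq
    )
```

With this layout, `X[i - 1][c]` plays the role of $p^{(X)}_i(c)$ for a 1-based position $i$.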

Background

Weighted sequences are also referred to in the literature as p-weighted sequences or Position Weighted Matrices (PWM) [Amir et al. 2010, Thompson et al. 1994]. The notion of a weighted sequence was introduced as a tool for motif discovery and local alignment, and is extensively used in computational molecular biology. Multiple algorithmic results related to combinatorics of weighted sequences, i.e., repetitions, regularities and pattern matching, have already been presented.

Definitions

Definition (Occurrence of a subsequence $s$ in a weighted sequence $X$). Let $|s| = d$ and $\pi = (i_1, i_2, \ldots, i_d)$ with $1 \le i_1 < i_2 < \ldots < i_d \le |X|$. Then
$$P_X(\pi, s) = \prod_{k=1}^{d} p^{(X)}_{i_k}(s_k),$$
$$\mathrm{SUBS}(X, \alpha) = \{\, s \in \Sigma^* : \exists\, \pi \in \mathrm{Seq}^{|X|}_{|s|} \;\; P_X(\pi, s) \ge \alpha \,\},$$
where $\mathrm{Seq}^{|X|}_{d}$ is the set of increasing sequences of $d$ positions in $X$. In other words, $\mathrm{SUBS}(X, \alpha)$ is the set of deterministic strings which match a subsequence of $X$ with probability at least $\alpha$.
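The two definitions can be sketched in a few lines of Python. This is an illustration under assumptions, not code from the talk: the function names are hypothetical, positions are 1-based as in the slides, and membership in SUBS is decided by a simple dynamic program that maximizes $P_X(\pi, s)$ over all increasing $\pi$.

```python
from fractions import Fraction as F

# Example sequence over {a, b, c} from the Definitions slide.
X = [
    {"a": F(1, 3), "b": F(1, 3), "c": F(1, 3)},
    {"a": F(1), "b": F(0), "c": F(0)},
    {"a": F(0), "b": F(1, 2), "c": F(1, 2)},
    {"a": F(1, 2), "b": F(1, 4), "c": F(1, 4)},
]

def occurrence_probability(seq, pi, s):
    """P_X(pi, s): product of p_{i_k}(s_k) over the 1-based increasing positions pi."""
    assert len(pi) == len(s) and all(x < y for x, y in zip(pi, pi[1:]))
    prob = F(1)
    for i, ch in zip(pi, s):
        prob *= seq[i - 1][ch]
    return prob

def max_occurrence_probability(seq, s):
    """Maximum of P_X(pi, s) over all increasing position sequences pi."""
    # best[k] = best probability of matching s[:k] within the positions seen so far
    best = [F(1)] + [F(0)] * len(s)
    for pos in seq:
        for k in range(len(s), 0, -1):  # backwards, so each position is used at most once
            best[k] = max(best[k], best[k - 1] * pos[s[k - 1]])
    return best[len(s)]

def in_subs(seq, s, alpha):
    """Membership test for SUBS(X, alpha)."""
    return max_occurrence_probability(seq, s) >= alpha
```

For instance, `occurrence_probability(X, (2, 3), "ab")` multiplies $p_2(a) = 1$ by $p_3(b) = 1/2$.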

Problems

$\alpha$-LCWS problem
Input: Two weighted sequences $X, Y \in WS(\Sigma)$ and a cut-off probability $\alpha$.
Output: The longest string $s \in \Sigma^*$ such that
$$\exists\, \pi \in \mathrm{Seq}^{|X|}_{|s|},\ \pi' \in \mathrm{Seq}^{|Y|}_{|s|} \;\; P_X(\pi, s) \cdot P_Y(\pi', s) \ge \alpha.$$
Equivalently, $s$ is the longest string in $\mathrm{SUBS}(X, \alpha_1) \cap \mathrm{SUBS}(Y, \alpha_2)$ for some $\alpha_1 \cdot \alpha_2 \ge \alpha$.

$(\alpha_1, \alpha_2)$-LCWS2 problem
Input: Two weighted sequences $X, Y$ and two cut-off probabilities $\alpha_1, \alpha_2$.
Output: The longest string $s \in \mathrm{SUBS}(X, \alpha_1) \cap \mathrm{SUBS}(Y, \alpha_2)$.

Example: $\alpha$-LCWS problem

$(s, \pi, \pi')$ is the solution to the $\alpha$-LCWS problem for $\alpha = 0.23$.

X     1     2     3     4     5
a     0.9   0.2   1.0   0.3   0.9
b     0.1   0.8   0.0   0.7   0.1

Y     1     2     3     4     5
a     0.9   0.5   0.1   0.2   0.8
b     0.1   0.5   0.9   0.8   0.2

$s = abba$, $\pi = (1, 2, 4, 5)$, $\pi' = (1, 3, 4, 5)$
$P_X(\pi, s) = 0.9 \cdot 0.8 \cdot 0.7 \cdot 0.9 = 0.4536$
$P_Y(\pi', s) = 0.9 \cdot 0.9 \cdot 0.8 \cdot 0.8 = 0.5184$
$P_X(\pi, s) \cdot P_Y(\pi', s) = 0.23514624$
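The example is small enough to check by exhaustive search. The sketch below is not the algorithm from the talk; it simply tries every string over {a, b}, longest first. It assumes the tables read a = (0.9, 0.2, 1.0, 0.3, 0.9) for X and a = (0.9, 0.5, 0.1, 0.2, 0.8) for Y, with b taking the remaining probability at each position. All function names are illustrative.

```python
from fractions import Fraction as F
from itertools import product

def seq(a_probs):
    """Build a weighted sequence over {a, b} from the a-row; b gets the rest."""
    return [{"a": F(p), "b": 1 - F(p)} for p in a_probs]

# Assumed reading of the example tables (a-rows only; b = 1 - a).
X = seq(["0.9", "0.2", "1.0", "0.3", "0.9"])
Y = seq(["0.9", "0.5", "0.1", "0.2", "0.8"])

def max_prob(ws, s):
    """Maximum over increasing position sequences pi of P(pi, s)."""
    best = [F(1)] + [F(0)] * len(s)
    for pos in ws:
        for k in range(len(s), 0, -1):
            best[k] = max(best[k], best[k - 1] * pos[s[k - 1]])
    return best[-1]

def brute_force_lcws(X, Y, alpha, alphabet="ab"):
    """Longest s with max P_X * max P_Y >= alpha, by exhaustive search."""
    for d in range(min(len(X), len(Y)), 0, -1):
        for letters in product(alphabet, repeat=d):
            s = "".join(letters)
            if max_prob(X, s) * max_prob(Y, s) >= alpha:
                return s
    return ""

best = brute_force_lcws(X, Y, F("0.23"))
```

The search is exponential in the string length, so it is only useful as a sanity check on toy inputs such as this one.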

Example: $(\alpha_1, \alpha_2)$-LCWS2 problem

Solution for $(\alpha_1, \alpha_2)$-LCWS2 for $\alpha_1 = 0.7$, $\alpha_2 = 0.6$.

X     1     2     3     4     5
a     0.9   0.2   1.0   0.3   0.9
b     0.1   0.8   0.0   0.7   0.1

Y     1     2     3     4     5
a     0.9   0.5   0.1   0.2   0.8
b     0.1   0.5   0.9   0.8   0.2

$s = aba$, $\pi = (1, 2, 3)$, $\pi' = (1, 3, 5)$
$P_X(\pi, s) = 0.9 \cdot 0.8 \cdot 1.0 = 0.72$
$P_Y(\pi', s) = 0.9 \cdot 0.9 \cdot 0.8 = 0.648$
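A short worked check of why no longer string qualifies, assuming the table for X reads a = (0.9, 0.2, 1.0, 0.3, 0.9) and b = (0.1, 0.8, 0.0, 0.7, 0.1). A string of length 4 uses four of the five positions of X, and the largest probability available at positions 1 to 5 is 0.9, 0.8, 1.0, 0.7, 0.9 respectively. Dropping the smallest of these bounds the best possible product:

```latex
\max_{|s| = 4,\ \pi} P_X(\pi, s) \;\le\; 0.9 \cdot 0.8 \cdot 1.0 \cdot 0.9 \;=\; 0.648 \;<\; 0.7 = \alpha_1
```

A string of length 5 would also include the factor 0.7, so its product is smaller still. Hence no string of length 4 or more lies in $\mathrm{SUBS}(X, \alpha_1)$, and a string of length 3 such as $s = aba$ is optimal here.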

Results summary

Previous results for $\alpha$-LCWS [Amir et al. 2010]: The $\alpha$-LCWS problem can be solved in $O(n^3)$ time and $O(n^2)$ space. If we are only interested in the length of the output, the problem can be solved in $O(Ln^2)$ time, where $L$ is the length of the solution.

NP-hardness for the integer version of $(\alpha_1, \alpha_2)$-LCWS2
  Previous work: unbounded alphabet
  Our results:   $|\Sigma| = 2$

Approximation results for $(\alpha_1, \alpha_2)$-LCWS2
  Previous work: $1/|\Sigma|$
  Our results:   0.5 ($O(n^5)$ time, $O(n^2)$ space); PTAS ($O(n^5)$ space)

$(\alpha_1, \alpha_2)$-LCWS2 and $\alpha$-LCWS2 problems

Definition ($\alpha$-LCWS2 problem)
Input: Two weighted sequences $X, Y \in WS(\Sigma)$ and a cut-off probability $\alpha$.
Output: The longest string $s \in \mathrm{SUBS}(X, \alpha) \cap \mathrm{SUBS}(Y, \alpha)$.

The following lemma shows that the $(\alpha_1, \alpha_2)$-LCWS2 and $\alpha$-LCWS2 problems are equivalent.

Lemma. The $(\alpha_1, \alpha_2)$-LCWS2 problem can be reduced in linear time to the $\alpha$-LCWS2 problem (with $\alpha = \min(\alpha_1, \alpha_2)$).

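$(\alpha_1, \alpha_2)$-LCWS2 and $\alpha$-LCWS2 problems

Proof. Rescale the probabilities and add a special symbol $\#$ that absorbs the remaining mass, so that the new probabilities at each position still sum to 1. Let $\alpha_1 < \alpha_2$ and $\gamma = \log_{\alpha_2} \alpha_1$, so that $\alpha_2^{\gamma} = \alpha_1$. Define
$$p^{(X')}_i(\sigma_j) = p^{(X)}_i(\sigma_j), \qquad p^{(X')}_i(\#) = 0,$$
$$p^{(Y')}_i(\sigma_j) = p^{(Y)}_i(\sigma_j)^{\gamma}, \qquad p^{(Y')}_i(\#) = 1 - \sum_{j=1}^{K} p^{(Y')}_i(\sigma_j).$$
Then $P_{Y'}(\pi', s) = P_Y(\pi', s)^{\gamma}$, so $P_Y(\pi', s) \ge \alpha_2$ holds exactly when $P_{Y'}(\pi', s) \ge \alpha_1$.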
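A minimal sketch of this rescaling, assuming $0 < \alpha_1 < \alpha_2 < 1$ and weighted sequences stored as lists of per-position dictionaries. The function name and data layout are illustrative, and floating-point arithmetic is used for simplicity.

```python
import math

def reduce_to_single_cutoff(X, Y, alpha1, alpha2):
    """Sketch of the reduction for 0 < alpha1 < alpha2 < 1.

    Returns (X', Y', alpha) with alpha = alpha1, over the alphabet extended by '#'.
    """
    gamma = math.log(alpha1, alpha2)  # alpha2 ** gamma == alpha1, hence gamma > 1
    # X keeps its probabilities; the new symbol '#' never occurs in X'.
    X_new = [dict(pos, **{"#": 0.0}) for pos in X]
    Y_new = []
    for pos in Y:
        scaled = {ch: p ** gamma for ch, p in pos.items()}
        # '#' absorbs the lost mass so the position still sums to 1;
        # max() guards against a tiny negative value caused by rounding.
        scaled["#"] = max(0.0, 1.0 - sum(scaled.values()))
        Y_new.append(scaled)
    return X_new, Y_new, alpha1
```

Because raising to the power $\gamma$ is monotone on $[0, 1]$, a product of rescaled probabilities in $Y'$ reaches $\alpha_1$ exactly when the original product in $Y$ reaches $\alpha_2$. When $\alpha_1 > \alpha_2$ the roles of $X$ and $Y$ would be swapped, and when they are equal no rescaling is needed.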

NP-hardness

Definition. An I-weighted sequence $X$ over the alphabet $\Sigma = \{\sigma_1, \sigma_2, \ldots, \sigma_K\}$ is a sequence of sets of pairs of the form
$$x_i = \{(\sigma_j, w^{(X)}_i(\sigma_j)) : j = 1, 2, \ldots, K\}, \quad \text{where } w^{(X)}_i(\sigma_j) \in \mathbb{Z}_+.$$

Definition. For an I-weighted sequence $X$ and $s \in \Sigma^d$, define
$$W_X(\pi, s) = \sum_{k=1}^{d} w^{(X)}_{i_k}(s_k) \quad \text{for } \pi = (i_1, \ldots, i_d) \in \mathrm{Seq}^{|X|}_{d}.$$
For an I-weighted sequence $X$ and $\alpha \in \mathbb{Z}_+$, denote
$$\mathrm{SUBS}(X, \alpha) = \{\, s \in \Sigma^* : \exists\, \pi \in \mathrm{Seq}^{|X|}_{|s|} \;\; W_X(\pi, s) \le \alpha \,\}.$$

NP-hardness

Definition ($\alpha$-LCIWS2 problem)
Input: Two I-weighted sequences $X, Y$ and a cut-off value $\alpha \in \mathbb{Z}_+$.
Output: The longest string $s \in \mathrm{SUBS}(X, \alpha) \cap \mathrm{SUBS}(Y, \alpha)$.

Definition (Partition problem)
Input: A finite set $S \subseteq \mathbb{Z}_+$.
Binary output: Is there a subset $S' \subseteq S$ such that $\sum S' = \sum (S \setminus S')$?

NP-hardness

Theorem. The LCIWS2 problem over a binary alphabet is NP-hard.

Proof. For an instance $S = \{q_1, q_2, \ldots, q_n\}$ of the Partition problem, we construct I-weighted sequences $X = x_1 x_2 \ldots x_n$ and $Y = y_1 y_2 \ldots y_n$ over the alphabet $\Sigma = \{a, b\}$ with the following weights:
$$w^{(X)}_i(a) = q_i + c, \qquad w^{(X)}_i(b) = c,$$
$$w^{(Y)}_i(a) = c, \qquad w^{(Y)}_i(b) = q_i + c.$$
Here $c$ is an arbitrary positive integer. Finally, let
$$\alpha = \frac{1}{2} \sum S + nc.$$
The Partition problem for an instance $S$ has a positive answer iff the length of the solution to $\alpha$-LCIWS2 for $X$ and $Y$ is $n$.
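The construction can be checked on tiny instances with a brute-force sketch. This is an illustration only: the function names are hypothetical, `S` is passed as a list, and the comparison is done against $2\alpha$ so that everything stays integral even when $\sum S$ is odd. A string of length $n$ uses every position, so its weight in each sequence is just the sum of the chosen letter weights.

```python
from itertools import product

def partition_to_lciws2(S, c=1):
    """Build the I-weighted sequences X, Y and the doubled cut-off 2 * alpha."""
    X = [{"a": q + c, "b": c} for q in S]
    Y = [{"a": c, "b": q + c} for q in S]
    two_alpha = sum(S) + 2 * len(S) * c  # 2 * (sum(S) / 2 + n * c)
    return X, Y, two_alpha

def has_full_length_solution(S, c=1):
    """True iff some s of length n has W_X(s) <= alpha and W_Y(s) <= alpha."""
    X, Y, two_alpha = partition_to_lciws2(S, c)
    for s in product("ab", repeat=len(S)):
        w_x = sum(pos[ch] for pos, ch in zip(X, s))
        w_y = sum(pos[ch] for pos, ch in zip(Y, s))
        if 2 * w_x <= two_alpha and 2 * w_y <= two_alpha:
            return True
    return False
```

Positions labelled `a` correspond to the elements placed in $S'$: the weight in $X$ is $\sum S' + nc$ and the weight in $Y$ is $\sum (S \setminus S') + nc$, so both stay within $\alpha$ only when the two sides are equal.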

Approximation results

Theorem (Amir et al. 2010). The $\alpha$-LCWS problem can be solved in $O(n^3)$ time and $O(n^2)$ space. If we are only interested in the length of the output, the problem can be solved in $O(Ln^2)$ time, where $L$ is the length of the solution.

Theorem. We can compute a solution to the $\alpha$-LCWS2 problem for $X, Y \in WS(\Sigma)$ of length at least $\lfloor \mathrm{OPT}(X, Y, \alpha) / 2 \rfloor$ in $O(n^3)$ time and $O(n^2)$ space.

Proof idea. Solve $\alpha^2$-LCWS in $O(n^3)$ time, and then extract a solution for $\alpha$-LCWS2 of size $\lfloor \mathrm{OPT}(X, Y, \alpha) / 2 \rfloor$.

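Approximation results

Proof sketch. Let $(s, \pi, \pi')$ be the solution of $\alpha^2$-LCWS, with $d = |s|$, $\pi = (i_1, \ldots, i_d)$ and $\pi' = (i'_1, \ldots, i'_d)$:
$$P_X(\pi, s) \cdot P_Y(\pi', s) \ge \alpha^2. \qquad (1)$$
We can split this solution into two parts. Let $g = \lfloor d/2 \rfloor$. This gives the partial probabilities
$$A = \prod_{j=1}^{g} p^{(X)}_{i_j}(s_j), \qquad B = \prod_{j=1}^{g} p^{(Y)}_{i'_j}(s_j),$$
$$C = \prod_{j=g+1}^{d} p^{(X)}_{i_j}(s_j), \qquad D = \prod_{j=g+1}^{d} p^{(Y)}_{i'_j}(s_j).$$
Observe that at most one of $A, B, C, D$ can be smaller than $\alpha$: each of them is at most 1 and $A \cdot B \cdot C \cdot D \ge \alpha^2$ by (1), so two values below $\alpha$ would force the product below $\alpha^2$. Hence either $(A, B)$ or $(C, D)$ forms a solution in which both probabilities are at least $\alpha$.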
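The splitting step can be sketched as follows, assuming $g = \lfloor d/2 \rfloor$, 1-based positions, and weighted sequences stored as lists of per-position dictionaries. The function name and argument layout are illustrative and not taken from the talk.

```python
import math

def split_solution(X, Y, s, pi, pi2, alpha):
    """Return a half (s', pi', pi2') whose X- and Y-probabilities are both >= alpha.

    Assumes P_X(pi, s) * P_Y(pi2, s) >= alpha**2; positions are 1-based.
    """
    g = len(s) // 2

    def prob(ws, positions, letters):
        return math.prod(ws[i - 1][ch] for i, ch in zip(positions, letters))

    A, B = prob(X, pi[:g], s[:g]), prob(Y, pi2[:g], s[:g])
    C, D = prob(X, pi[g:], s[g:]), prob(Y, pi2[g:], s[g:])
    # Try the second half first: it has ceil(d/2) letters, so it is never shorter.
    if C >= alpha and D >= alpha:
        return s[g:], pi[g:], pi2[g:]
    if A >= alpha and B >= alpha:
        return s[:g], pi[:g], pi2[:g]
    raise ValueError("precondition P_X * P_Y >= alpha**2 does not hold")
```

Either branch returns a string of length at least $\lfloor d/2 \rfloor$, which is where the bound $\lfloor \mathrm{OPT}(X, Y, \alpha)/2 \rfloor$ comes from: an optimal solution of $\alpha$-LCWS2 is also feasible for $\alpha^2$-LCWS, so $d \ge \mathrm{OPT}(X, Y, \alpha)$.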

Approximation results

Theorem. There exists a $(1/2)$-approximation algorithm for the $\alpha$-LCWS2 problem which runs in $O(n^5)$ time and $O(n^2)$ space.

Proof. This is essentially a consequence of the previous theorem. To obtain the exact approximation ratio, we have to deal with the odd $n$ case (this causes an $O(n^2)$ increase in the time complexity).
