The Closest Substring problem with small distances


  1. The Closest Substring problem with small distances
     Dániel Marx, dmarx@informatik.hu-berlin.de
     June 10, 2005
     The Closest Substring problem with small distances – p.1/28

  2. The Closest String problem
     CLOSEST STRING
     Input: Strings $s_1, \dots, s_k$ of length $L$.
     Solution: A string $s$ of length $L$ (center string).
     Minimize: $\max_{i=1}^{k} d(s, s_i)$.
     $d(w_1, w_2)$: the number of positions where $w_1$ and $w_2$ differ (Hamming distance).
     Applications: computational biology (e.g., finding common ancestors).
     The problem is NP-hard even with a binary alphabet [Frances and Litman, 1997].
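
The objective on this slide can be written down directly. The following is a minimal sketch, not taken from the talk; the function names `hamming` and `radius` and the sample strings are illustrative assumptions.

```python
def hamming(w1: str, w2: str) -> int:
    """Number of positions where two equal-length strings differ."""
    if len(w1) != len(w2):
        raise ValueError("Hamming distance needs equal-length strings")
    return sum(a != b for a, b in zip(w1, w2))


def radius(center: str, strings: list[str]) -> int:
    """CLOSEST STRING objective: the largest distance from the center."""
    return max(hamming(center, s) for s in strings)


# Hypothetical input: three binary strings of length L = 4.
strings = ["0011", "0101", "1001"]
print(radius("0001", strings))  # 1: each input differs from 0001 in one position
print(radius("0011", strings))  # 2
```

Evaluating a candidate center is cheap; the hardness lies in choosing the center among all $|\Sigma|^L$ strings.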

  3. The Closest Substring problem
     CLOSEST SUBSTRING
     Input: Strings $s_1, \dots, s_k$, an integer $L$.
     Solution:
     - a string $s$ of length $L$ (center string),
     - a length-$L$ substring $s'_i$ of $s_i$ for every $i$.
     Minimize: $\max_{i=1}^{k} d(s, s'_i)$.
     Remark: For a given $s$, it is easy to find the best $s'_i$ for every $i$.
     Applications: finding common patterns, drug design.
     The problem is NP-hard even with a binary alphabet (CLOSEST STRING is the special case $|s_i| = L$).
     CLOSEST SUBSTRING admits a PTAS [Li, Ma, & Wang, 2002]: for every $\epsilon > 0$ there is an $n^{O(1/\epsilon^4)}$ algorithm that produces a $(1+\epsilon)$-approximation.
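
The remark that the best $s'_i$ is easy to find for a given $s$ amounts to sliding a window over each input string. A minimal sketch, assuming every input string has length at least $L$; the names `best_substring` and `closest_substring_cost` are illustrative and not from the talk.

```python
def hamming(w1: str, w2: str) -> int:
    """Number of positions where two equal-length strings differ."""
    return sum(a != b for a, b in zip(w1, w2))


def best_substring(center: str, s: str) -> tuple[int, str]:
    """Return (distance, window) for the length-L window of s closest to center."""
    L = len(center)
    if len(s) < L:
        raise ValueError("input string is shorter than the center")
    windows = (s[j:j + L] for j in range(len(s) - L + 1))
    # Ties are broken by the lexicographically smallest window.
    return min((hamming(center, w), w) for w in windows)


def closest_substring_cost(center: str, strings: list[str]) -> int:
    """CLOSEST SUBSTRING objective for a fixed center string."""
    return max(best_substring(center, s)[0] for s in strings)


print(best_substring("101", "0010100"))                  # (0, '101')
print(closest_substring_cost("101", ["0010100", "1111"]))  # 1
```

Each call scans at most $|s_i| - L + 1$ windows, so a fixed center is evaluated in polynomial time.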

  4. Parameterized Complexity
     Goal: restrict the exponential growth of the running time to one parameter of the input.
     Definition: A problem is fixed-parameter tractable (FPT) with parameter $k$ if there is an algorithm with running time $f(k) \cdot n^c$, where $c$ is a fixed constant not depending on $k$.
     Definition: A problem is fixed-parameter tractable (FPT) with parameters $k_1$ and $k_2$ if there is an algorithm with running time $f(k_1, k_2) \cdot n^c$, where $c$ is a fixed constant not depending on $k_1$ and $k_2$.

  6. Parameterized intractability
     We expect that MAXIMUM INDEPENDENT SET is not fixed-parameter tractable; no $n^{o(k)}$ algorithm is known.
     W[1]-complete ≈ “as hard as MAXIMUM INDEPENDENT SET”
     Parameterized reductions: $L_1$ is reducible to $L_2$ if there is a function $f$ that transforms $(x, k)$ to $(x', k')$ such that
     - $(x, k) \in L_1$ if and only if $(x', k') \in L_2$,
     - $f$ can be computed in $g(k) \cdot |x|^c$ time for some function $g$,
     - $k'$ depends only on $k$.
     If $L_1$ is reducible to $L_2$, and $L_2$ is in FPT, then $L_1$ is in FPT as well.
     Most NP-completeness proofs are not good for parameterized reductions.

  7. Parameterized Closest Substring
     CLOSEST SUBSTRING
     Input: Strings $s_1, \dots, s_k$ over $\Sigma$, integers $L$ and $d$.
     Find:
     - a string $s$ of length $L$ (center string),
     - a length-$L$ substring $s'_i$ of $s_i$ for every $i$
     such that $d(s, s'_i) \le d$ for every $i$.
     Possible parameters: $k$, $L$, $d$, $|\Sigma|$
     - $k$: might be small
     - $d$: might be small
     - $L$: usually large
     - $|\Sigma|$: usually a small constant

  8. Closest Substring: Results

     | parameter | $\lvert\Sigma\rvert$ is constant | $\lvert\Sigma\rvert$ is parameter | $\lvert\Sigma\rvert$ is unbounded |
     |-----------|----------|----------|-----------|
     | $d$       | ?        | ?        | W[1]-hard |
     | $k$       | W[1]-hard | W[1]-hard | W[1]-hard |
     | $d,k$     | ?        | ?        | W[1]-hard |
     | $L$       | FPT      | FPT      | W[1]-hard |
     | $d,k,L$   | FPT      | FPT      | W[1]-hard |

     (Hardness results by [Fellows, Gramm, Niedermeier 2002].)

  9. Closest Substring: Results

     | parameter | $\lvert\Sigma\rvert$ is constant | $\lvert\Sigma\rvert$ is parameter | $\lvert\Sigma\rvert$ is unbounded |
     |-----------|-----------|-----------|-----------|
     | $d$       | W[1]-hard | W[1]-hard | W[1]-hard |
     | $k$       | W[1]-hard | W[1]-hard | W[1]-hard |
     | $d,k$     | W[1]-hard | W[1]-hard | W[1]-hard |
     | $L$       | FPT       | FPT       | W[1]-hard |
     | $d,k,L$   | FPT       | FPT       | W[1]-hard |

     (Hardness results by [Fellows, Gramm, Niedermeier 2002].)
     Theorem: [D.M.] CLOSEST SUBSTRING is W[1]-hard with parameters $k$ and $d$, even if $|\Sigma| = 2$.
     (In the rest of the talk, $\Sigma$ is always $\{0, 1\}$.)
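
The FPT entries in the row for $L$ can be made plausible by exhaustive search over all $|\Sigma|^L$ candidate centers, which costs roughly $|\Sigma|^L$ times a polynomial in the input size. The sketch below illustrates that standard argument and is not an algorithm from the talk; the function names and the default alphabet are assumptions.

```python
from itertools import product


def hamming(w1: str, w2: str) -> int:
    """Number of positions where two equal-length strings differ."""
    return sum(a != b for a, b in zip(w1, w2))


def min_dist_to_substring(center: str, s: str) -> int:
    """Smallest Hamming distance from center to any length-L window of s."""
    L = len(center)
    return min(hamming(center, s[j:j + L]) for j in range(len(s) - L + 1))


def closest_substring_bruteforce(strings, L, d, alphabet="01"):
    """Try every string in alphabet^L as a center; return the first valid one."""
    if any(len(s) < L for s in strings):
        return None  # some input has no length-L substring at all
    for chars in product(alphabet, repeat=L):
        center = "".join(chars)
        if all(min_dist_to_substring(center, s) <= d for s in strings):
            return center
    return None


print(closest_substring_bruteforce(["00110", "11100", "01101"], L=3, d=0))  # 110
print(closest_substring_bruteforce(["000", "111"], L=3, d=1))               # None
```

The exponential factor depends only on $L$ and $|\Sigma|$, which is why this approach stops helping once $|\Sigma|$ is unbounded or $L$ is large.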

  11. Hardness of Closest Substring
      Theorem: [D.M.] CLOSEST SUBSTRING is W[1]-hard with parameters $k$ and $d$.
      Proof by parameterized reduction from MAXIMUM INDEPENDENT SET:
      MAXIMUM INDEPENDENT SET instance $(G, t)$ ⇒ CLOSEST SUBSTRING instance with $k = 2^{2^{O(t)}}$ and $d = 2^{O(t)}$.
      Corollary: No $f(k, d) \cdot n^c$ algorithm for CLOSEST SUBSTRING unless FPT=W[1].
      Corollary: No $f(k, d) \cdot n^{o(\log d)}$ or $f(k, d) \cdot n^{o(\log\log k)}$ algorithm for CLOSEST SUBSTRING unless MAXIMUM INDEPENDENT SET has an $f(t) \cdot n^{o(t)}$ algorithm.
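
The second corollary follows from how the parameters grow under the reduction. The derivation below is a sketch under two assumptions that the slide does not spell out: the produced instance has size $N \le g(t) \cdot n^{c}$ for some function $g$ and constant $c$, and $f'$ collects all factors that depend only on $t$.

```latex
\begin{align*}
d = 2^{O(t)} &\;\Longrightarrow\; \log d = O(t), \\
k = 2^{2^{O(t)}} &\;\Longrightarrow\; \log\log k = O(t), \\
f(k,d)\cdot N^{o(\log d)}
  &\le f\bigl(2^{2^{O(t)}},\, 2^{O(t)}\bigr)\cdot \bigl(g(t)\, n^{c}\bigr)^{o(t)}
   = f'(t)\cdot n^{o(t)} .
\end{align*}
```

The same computation with $\log\log k$ in place of $\log d$ handles the bound $f(k,d) \cdot n^{o(\log\log k)}$.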

  13. Hardness of Closest Substring
      Corollary: No $f(k, d) \cdot n^{o(\log d)}$ or $f(k, d) \cdot n^{o(\log\log k)}$ algorithm for CLOSEST SUBSTRING unless MAXIMUM INDEPENDENT SET has an $f(t) \cdot n^{o(t)}$ algorithm.
      MAXIMUM INDEPENDENT SET has an $f(t) \cdot n^{o(t)}$ algorithm
      ⇓
      $n$-variable 3-SAT can be solved in $2^{o(n)}$ time
      ⇕
      FPT=M[1]
      The lower bound on the exponent of $n$ is best possible:
      Theorem: [D.M.] CLOSEST SUBSTRING can be solved in $f_1(d, k) \cdot n^{O(\log d)}$ time.
      Theorem: [D.M.] CLOSEST SUBSTRING can be solved in $f_2(d, k) \cdot n^{O(\log\log k)}$ time.

  14. Relation to approximability
      PTAS: algorithm that produces a $(1+\epsilon)$-approximation in time $n^{f(\epsilon)}$.
      EPTAS (efficient PTAS): a PTAS with running time $f(\epsilon) \cdot n^{O(1)}$.
      Observation: if $\epsilon = \frac{1}{d+1}$, then a $(1+\epsilon)$-approximation algorithm can correctly decide whether the optimum is $d$ or $d+1$ ⇒ if an optimization problem has an EPTAS, then it is FPT.
      Corollary: CLOSEST SUBSTRING has no EPTAS, unless FPT=W[1].
      Corollary: CLOSEST SUBSTRING has no $f(\epsilon) \cdot n^{o(\log 1/\epsilon)}$ time PTAS, unless FPT=M[1].
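
The observation can be checked with one line of arithmetic. This is a sketch assuming an integer-valued minimization objective, as is the case for the maximum Hamming distance; $\mathrm{OPT}$ denotes the optimum and $\mathrm{ALG}$ the value returned by the approximation algorithm, both of which are notation introduced here.

```latex
\begin{align*}
\mathrm{OPT} \le d
  &\;\Longrightarrow\;
  \mathrm{ALG} \le \Bigl(1 + \tfrac{1}{d+1}\Bigr)\, d
  = d + \tfrac{d}{d+1} < d + 1
  \;\Longrightarrow\; \mathrm{ALG} \le d, \\
\mathrm{OPT} \ge d + 1
  &\;\Longrightarrow\; \mathrm{ALG} \ge \mathrm{OPT} \ge d + 1 .
\end{align*}
```

Comparing $\mathrm{ALG}$ with $d$ therefore decides the parameterized problem, and an EPTAS run with $\epsilon = \frac{1}{d+1}$ takes $f\bigl(\tfrac{1}{d+1}\bigr) \cdot n^{O(1)}$ time, which is a function of $d$ times a polynomial in $n$.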

  15. What’s next?
      - $f_1(d, k) \cdot n^{O(\log d)}$ time algorithm
      - Some results on hypergraphs
      - $f_2(d, k) \cdot n^{O(\log\log k)}$ time algorithm
      - Sketch of the completeness proof
      - Conclusions
      - Lunch

  17. The first algorithm
      Definition: A solution is a minimal solution if $\sum_{i=1}^{k} d(s, s'_i)$ is as small as possible (and $d(s, s'_i) \le d$ for every $i$).
      Definition: A set $G$ of length-$L$ strings generates a length-$L$ string $s$ if whenever the strings in $G$ agree at the $i$-th position, then $s$ has the same character at this position.
      Example: $G_1$ generates $s$ but $G_2$ does not.

              G_1              G_2
          1 1 0 1 0 1      1 1 0 1 1 1
          0 1 0 1 1 1      0 1 0 1 1 1
          1 1 0 0 1 1      1 1 0 0 1 1

      s   1 1 0 1 0 1      1 1 0 1 0 1
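
The definition of "generates" can be checked column by column. A minimal sketch; the function name `generates` is illustrative, and the strings are the slide's example as read from the flattened layout, which is an assumption about how the rows were originally arranged.

```python
def generates(G: list[str], s: str) -> bool:
    """True if s matches the common character wherever all strings in G agree."""
    for i, c in enumerate(s):
        column = {g[i] for g in G}  # characters that occur at position i in G
        if len(column) == 1 and c not in column:
            return False            # G agrees here, but s has another character
    return True


G1 = ["110101", "010111", "110011"]
G2 = ["110111", "010111", "110011"]
s = "110101"

print(generates(G1, s))  # True
print(generates(G2, s))  # False: all of G2 has 1 at the fifth position, s has 0
```

Positions where the strings of $G$ disagree place no constraint on $s$, which is why a generating set leaves only those positions to be guessed.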

  19. First algorithm
      Let $\mathcal{S}$ be the set of all length-$L$ substrings of $s_1, \dots, s_k$. Clearly, $|\mathcal{S}| \le n$.
      Lemma: If $s$ is the center string of a minimal solution, then $\mathcal{S}$ has a subset $G$ of size $O(\log d)$ that generates $s$, and the strings in $G$ agree in all but at most $O(d \log d)$ positions.
      Algorithm:
      - Construct the set $\mathcal{S}$.
      - Consider every subset $G \subseteq \mathcal{S}$ of size $O(\log d)$.
      - If the strings in $G$ disagree in at most $O(d \log d)$ positions, then try every center string generated by $G$.
      Running time: $|\Sigma|^{O(d \log d)} \cdot n^{O(\log d)}$.
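
The three steps of the algorithm can be sketched as follows. The constants hidden in $O(\log d)$ and $O(d \log d)$ are not given on the slide, so this sketch takes them as explicit arguments `max_subset` and `max_disagree`; those names, the helper functions, and the default binary alphabet are assumptions made for illustration.

```python
from itertools import combinations, product


def hamming(w1: str, w2: str) -> int:
    return sum(a != b for a, b in zip(w1, w2))


def is_valid_center(center: str, strings: list[str], d: int) -> bool:
    """Every input string must contain a window within distance d of center."""
    L = len(center)
    return all(
        any(hamming(center, s[j:j + L]) <= d for j in range(len(s) - L + 1))
        for s in strings
    )


def first_algorithm(strings, L, d, max_subset, max_disagree, alphabet="01"):
    # Step 1: the set S of all length-L substrings of the input strings.
    S = sorted({s[j:j + L] for s in strings for j in range(len(s) - L + 1)})
    # Step 2: every subset G of S with at most max_subset strings.
    for size in range(1, max_subset + 1):
        for G in combinations(S, size):
            # Positions where the strings of G disagree are unconstrained.
            free = [i for i in range(L) if len({g[i] for g in G}) > 1]
            if len(free) > max_disagree:
                continue
            # Step 3: try every center generated by G.
            base = list(G[0])  # agreeing positions are fixed by G
            for fill in product(alphabet, repeat=len(free)):
                for i, c in zip(free, fill):
                    base[i] = c
                center = "".join(base)
                if is_valid_center(center, strings, d):
                    return center
    return None


print(first_algorithm(["00110", "11100", "01101"], 3, 0, 2, 2))  # 110
print(first_algorithm(["000", "111"], 3, 2, 2, 3))               # 001
```

With `max_subset` in $O(\log d)$ there are $n^{O(\log d)}$ subsets, and with `max_disagree` in $O(d \log d)$ each subset yields at most $|\Sigma|^{O(d \log d)}$ candidate centers, matching the running time stated on the slide.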

  20. Proof of the lemma
      Lemma: If $s$ is the center string of a minimal solution, then $\mathcal{S}$ has a subset $G$ of size $O(\log d)$ that generates $s$, and the strings in $G$ agree in all but at most $O(d \log d)$ positions.
      Proof: Let $(s, s'_1, \dots, s'_k)$ be a minimal solution. We show that $\{s'_1, \dots, s'_k\}$ has a subset of size $O(\log d)$ that generates $s$.
      The bad positions of a set of strings are the positions where they agree, but $s$ is different.
      Clearly, $\{s'_1\}$ has at most $d$ bad positions.
      We show that if a set of strings has $p$ bad positions, then we can decrease the number of bad positions to at most $p/2$ by adding a string $s'_i$ ⇒ no bad position remains after adding $O(\log d)$ strings.
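
The counting behind the last step can be spelled out. This is a sketch that takes the halving claim as given; $p_j$ denotes the number of bad positions after $j$ strings have been added to $\{s'_1\}$, which is notation introduced here.

```latex
\begin{align*}
p_0 \le d, \qquad p_{j+1} \le \tfrac{p_j}{2}
  &\;\Longrightarrow\; p_j \le \frac{d}{2^{j}}, \\
j = \lfloor \log_2 d \rfloor + 1
  &\;\Longrightarrow\; p_j < 1 \;\Longrightarrow\; p_j = 0, \\
|G| \le \lfloor \log_2 d \rfloor + 2 &= O(\log d), \\
\#\{\text{positions where the strings of } G \text{ disagree}\}
  &\le \sum_{s'_i \in G} d(s, s'_i) \le |G| \cdot d = O(d \log d).
\end{align*}
```

The last line uses the fact that a position where the strings of $G$ disagree must be a position where at least one of them differs from $s$.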
