The complexity of string partitioning: Anne Condon, Ján Maňuch, Chris Thachuk



SLIDE 1

Motivation The collision-free string partition problems Complexity proofs Summary

The complexity of string partitioning

Anne Condon¹, Ján Maňuch¹,², Chris Thachuk¹

¹ Department of Computer Science, University of British Columbia

² Department of Mathematics, Simon Fraser University

CPM 2012

Condon, Maňuch and Thachuk: The complexity of string partitioning

SLIDE 2

Outline

1. Motivation
2. The collision-free string partition problems
  • Problem definition
  • Results summary
3. Complexity proofs
  • Complexity of EF-MSP (partition size K = 2)
  • Complexity of EF-SP (partition size K = 2)
  • Complexity of EF-MSP (alphabet size L = 2)
  • Complexity of EF-SP (alphabet size L = 2)
4. Summary


SLIDE 5

Synthesizing Novel Genes

design goal: synthesize a long duplex of DNA/RNA

(some) challenges:

  • only short fragments can be synthesized reliably
  • fragments must cover the duplex
  • fragments must self-assemble correctly

5'-ACCGATAGAATTTTTCTCCTCGATCTTTTTTCCATGTAGTCCTCGGTCGATTTGCGAGAAGATTTTAGAGGCTGCTTCGGT-3'
3'-TGGCTATCTTAAAAAGAGGAGCTAGAAAAAAGGTACATCAGGAGCCAGCTAAACGCTCTTCTAAAATCTCCGACGAAGCCA-5'

[Figure: the duplex covered by short fragments on both strands, two of them labelled O_i and O_j. Fragments shown: TGGCTATCTT, AATTTTTCTCC, TTTTTTCCATGT, ATCAGGAGCCAG, CGCTCTTCTAAA, TTTTAGAGGCTG, CGACGAAGCCA, TCGATTTGCG, GGAGCTAGAAAAA.]


SLIDE 7

Collision Example

Example: fragments c and g are:

  • identical
  • on the same strand

[Figure: a duplex (5'→3' over 3'→5') covered by eight fragments labelled a to h: CATTAATCGCA, CGTTATGGTCCCT, GGGATCGATTCGTT, CCTGAATCGAGCAA, GCAAACGTCAAAGGG, TTACGCGTAAGTAA, CGTTATGGTCCCT, GGGAAACTGCAAACG. Fragments c and g are both CGTTATGGTCCCT. A second panel redraws the fragments in the groups a, g, b, h and d, f, e, c.]


SLIDE 11

K-partition

Definition (K-partition of a string w). A K-partition of a string $w$ is a sequence $P = p_1, p_2, \ldots, p_\ell$, for some $\ell$, such that $w = p_1 p_2 \cdots p_\ell$ and $|p_i| \le K$ for each $i = 1, \ldots, \ell$. We say that $p_1, \ldots, p_\ell$ are selected and that, for any $i \le j$, the string $p_i p_{i+1} \cdots p_j$ is super-selected.

Definition (K-partition of a set of strings W). A K-partition of a set of strings $W$ is a set of K-partitions of all strings in $W$.

Example: [Figure: the string mississippi split into six parts $p_1, \ldots, p_6$, followed by strings over {a, c, g, t} with the letters a c g g g a t c c t a g c g g a c a g g g c t a; the part boundaries are not recoverable.]
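The two definitions can be checked mechanically. The following is a minimal sketch, assuming parts are non-empty Python strings; the function names `is_k_partition` and `super_selected` are illustrative and do not come from the talk, and the six-part split of mississippi is one possible choice, not necessarily the one drawn on the slide.

```python
def is_k_partition(parts, w, k):
    """True if `parts` concatenates to `w` and every part has length 1..k.

    Assumption: parts are non-empty (the slide only states |p_i| <= K).
    """
    return "".join(parts) == w and all(1 <= len(p) <= k for p in parts)


def super_selected(parts):
    """All strings p_i p_{i+1} ... p_j with i <= j."""
    n = len(parts)
    return {"".join(parts[i:j + 1]) for i in range(n) for j in range(i, n)}


# One possible 2-partition of "mississippi" into six parts.
example = ["mi", "ss", "is", "si", "pp", "i"]
print(is_k_partition(example, "mississippi", 2))
print("ssis" in super_selected(example))
```

Every selected part is also super-selected (take i = j), which is why the second function starts its inner range at `i`.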


SLIDE 14

Problem definition

X-Free String Partition (X-SP) Problem

Instance: a finite alphabet $\Sigma$, a positive integer $K$, and a string $w$ from $\Sigma^*$.
Question: Is there an X-free K-partition $P$ of $w$?

X-Free Multiple String Partition (X-MSP) Problem

Instance: a finite alphabet $\Sigma$, a positive integer $K$, and a set of strings $W$ from $\Sigma^*$.
Question: Is there an X-free K-partition $P$ of $W$?
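For very small inputs both decision problems can be answered by exhaustive search. The sketch below covers only the equality case (X = E) and takes "equality-free" to mean that no selected part occurs twice; the names `k_partitions`, `ef_sp` and `ef_msp` are illustrative. The search is exponential in the input length, so it is a way to experiment with the definitions, not a practical algorithm.

```python
from itertools import product


def k_partitions(w, k):
    """Yield every K-partition of w as a list of non-empty parts."""
    if not w:
        yield []
        return
    for n in range(1, min(k, len(w)) + 1):
        for rest in k_partitions(w[n:], k):
            yield [w[:n]] + rest


def equality_free(parts):
    """True if no part occurs twice."""
    return len(set(parts)) == len(parts)


def ef_sp(w, k):
    """Return one equality-free K-partition of w, or None if there is none."""
    return next((p for p in k_partitions(w, k) if equality_free(p)), None)


def ef_msp(strings, k):
    """Return one equality-free K-partition of a collection of strings, or None."""
    choices = [list(k_partitions(w, k)) for w in strings]
    for combo in product(*choices):
        if equality_free([p for parts in combo for p in parts]):
            return list(combo)
    return None


print(ef_sp("aaa", 2))
print(ef_sp("aaaa", 2))
print(ef_msp(["aaa", "a"], 2))
```

In `ef_msp` the parts of all strings are pooled before the check, because a collision may involve parts taken from two different strings of W.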


SLIDE 17

Types of collisions

Types of collisions X we consider:

  • equality (E)
  • factor (F)
  • prefix (P)

[Figure: each collision type is illustrated on two partitions of the string mississippi; the highlighted parts are not recoverable.]
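The three collision types can be expressed as pairwise tests on the selected parts. The definitions used below are an assumption read off the names, since the slide only illustrates them: an equality collision is two equal parts, a factor collision is one part occurring as a substring of another, and a prefix collision is one part being a prefix of another. The function name `has_collision` and both example partitions are illustrative.

```python
from itertools import combinations

# Assumed pairwise collision tests, keyed by the slide's letter for each type.
COLLISION_TESTS = {
    "E": lambda a, b: a == b,
    "F": lambda a, b: a in b or b in a,
    "P": lambda a, b: a.startswith(b) or b.startswith(a),
}


def has_collision(parts, kind):
    """True if some pair of selected parts collides under the given type."""
    test = COLLISION_TESTS[kind]
    return any(test(a, b) for a, b in combinations(parts, 2))


two_parts = ["mi", "ss", "is", "si", "pp", "i"]   # a 2-partition of mississippi
three_parts = ["mis", "sis", "sip", "pi"]          # a 3-partition of mississippi

for kind in "EFP":
    print(kind, has_collision(two_parts, kind), has_collision(three_parts, kind))
```

Under these assumed definitions every equality collision is also a prefix collision, and every prefix collision is also a factor collision, so factor-freeness is the most restrictive of the three.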

SLIDE 18

Results

Results are given for each collision type X, equality (E), factor (F) and prefix (P), under two restrictions: constant partition size K and constant alphabet size L.

Note. If both K and L are constant, the parts of an X-free K-partition are pairwise distinct strings of length at most K, so there are at most $L + L^2 + \cdots + L^K$ of them, and any string longer than $K(L + L^2 + \cdots + L^K)$ has no X-free K-partition. Since the number of strings up to this length is constant, the problem can be solved in constant time. Similarly, the problem is easily solvable for the unary alphabet (L = 1) or for partition size K = 1.
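The counting behind the note can be made concrete. This is a small sketch, assuming that an X-free partition never repeats a part (which holds for the equality case by definition); the function names are illustrative. `max_parts` counts the distinct non-empty strings of length at most K, which caps the number of parts, and `max_length` is the total length obtained when each of those strings is used exactly once.

```python
def max_parts(L, K):
    """Number of distinct non-empty strings of length at most K over L letters."""
    return sum(L ** i for i in range(1, K + 1))


def max_length(L, K):
    """Total length when every such string is used exactly once as a part."""
    return sum(i * L ** i for i in range(1, K + 1))


print(max_parts(2, 2), max_length(2, 2))
```

For L = 2 and K = 2 the six candidate parts are 0, 1, 00, 01, 10 and 11; concatenating all of them in any order gives a string of length 10 that has an equality-free 2-partition.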

SLIDE 19

Results

Previous results

X            | constant partition size K | constant alphabet size L
equality (E) | NP-c for K = 2            | NP-c for L = 4
factor (F)   |                           |
prefix (P)   |                           |

SLIDE 20

Results

New results

X            | constant partition size K | constant alphabet size L
equality (E) | NP-c for K = 2            | NP-c for L = 2
factor (F)   | NP-c for K = 3            | NP-c for L = 2
prefix (P)   | NP-c for K = 2            | NP-c for L = 2


SLIDE 23

Chain of reductions

[Figure: reduction diagram over the problems 3SAT(3), EF-MSP(K=2), EF-MSP(L=2), EF-SP(L=2), EF-SP(K=2), FF-MSP(K=3), FF-MSP(L=2), FF-SP(L=2), FF-SP(K=3), PF-MSP(K=2), PF-SP(K=2), PF-MSP(L=2), PF-SP(L=2); the arrows are not recoverable.]

3SAT(3) Problem

Instance: a formula φ in conjunctive normal form, with a set C of clauses over a set X of variables, such that:

  • every clause contains 2 or 3 literals,
  • each variable occurs in exactly 3 clauses, once negated and twice positive.

Question: Is φ satisfiable?
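The syntactic restrictions of 3SAT(3) are easy to test. The sketch below assumes a DIMACS-style encoding that is not part of the talk: a formula is a list of clauses, each clause a list of non-zero integers, with `v` for a positive literal of variable v and `-v` for its negation. The function name `is_3sat3` and the example formula `phi` are illustrative.

```python
from collections import Counter


def is_3sat3(clauses):
    """Check the 3SAT(3) restrictions on a CNF formula given as integer lists."""
    if any(len(c) not in (2, 3) for c in clauses):
        return False
    positive = Counter(l for c in clauses for l in c if l > 0)
    negative = Counter(-l for c in clauses for l in c if l < 0)
    for v in set(positive) | set(negative):
        # Twice positive and once negated ...
        if positive[v] != 2 or negative[v] != 1:
            return False
        # ... and spread over exactly three different clauses.
        if sum(1 for c in clauses if v in c or -v in c) != 3:
            return False
    return True


# (x1 or x2 or not x3) and (x1 or not x2 or x3) and (not x1 or x2 or x3)
phi = [[1, 2, -3], [1, -2, 3], [-1, 2, 3]]
print(is_3sat3(phi))
```

The second check inside the loop rejects formulas in which a variable occurs twice within one clause, which would satisfy the literal counts but not the "exactly 3 clauses" condition.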

SLIDE 24

Complexity of EF-MSP (K = 2)

Theorem. The Equality-Free Multiple String Partition Problem (EF-MSP) for K = 2 is NP-complete.

[Figure: part of the reduction diagram, over 3SAT(3), EF-MSP(K=2), EF-MSP(L=2), EF-SP(L=2), EF-SP(K=2).]


SLIDE 27

Reduction Overview

For an arbitrary 3SAT(3) instance φ, with clause set $C$ over variable set $X$, construct an EF-MSP (K = 2) instance as follows:

1. Set $\Sigma = \{\hat{x}_i ;\ x_i \in X\} \cup \{\hat{c}_i^j ;\ c_i \in C \wedge j \le |c_i|\} \cup \{\boxminus, \boxplus\}$:
  • $\hat{x}_i$: a symbol for each variable
  • $\hat{c}_i^j$: a symbol for each literal
  • ⊟, ⊞: forbidden symbols
2. Set of strings $W = \mathcal{C} \cup \mathcal{E} \cup \mathcal{F}$:
  • $\mathcal{C}$ encodes the clauses of φ
  • $\mathcal{E}$ ensures that the selected literals in $\mathcal{C}$ are consistent
  • $\mathcal{F} = \{\boxminus, \boxplus\}$ ensures that forbidden substrings are not selected in $\mathcal{C} \cup \mathcal{E}$


SLIDE 30

Constructing $\mathcal{C}$

1. For each clause $c_i \in C$, create the $i$th clause string:
  • if $|c_i| = 2$: $\hat{c}_i^1 \boxminus \hat{c}_i^2$
  • if $|c_i| = 3$: $\hat{c}_i^1 \boxminus \hat{c}_i^2 \boxminus \hat{c}_i^3$
2. Since ⊟ is selected in $\mathcal{F}$, the only equality-free 2-partitions of clause strings are of the following form, in which every ⊟ shares a part with an adjacent literal symbol:
  • $\hat{c}_i^1 \mid \boxminus\hat{c}_i^2$
  • $\hat{c}_i^1 \mid \boxminus\hat{c}_i^2 \mid \boxminus\hat{c}_i^3$

Lemma. If no string from $\mathcal{F}$ is selected, exactly one literal symbol is selected for each clause string in any equality-free 2-partition of $\mathcal{C}$.
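The lemma can be confirmed for a single clause string by listing its 2-partitions. This is a minimal sketch: symbols are modelled as Python strings with hypothetical ASCII names (`c1`, `c2`, `c3` for the literal symbols), parts are tuples of symbols, and "⊟ is selected in F" is modelled by rejecting any partition that selects ⊟ on its own.

```python
def two_partitions(symbols):
    """Yield every split of `symbols` into consecutive parts of length 1 or 2."""
    if not symbols:
        yield []
        return
    for n in (1, 2):
        if n <= len(symbols):
            for rest in two_partitions(symbols[n:]):
                yield [tuple(symbols[:n])] + rest


def clause_partitions(clause_string, forbidden=("⊟",)):
    """Equality-free 2-partitions that never select a forbidden symbol alone."""
    banned = {(f,) for f in forbidden}
    return [p for p in two_partitions(clause_string)
            if len(set(p)) == len(p) and not banned & set(p)]


def selected_literals(partition):
    """Literal symbols that form a part on their own."""
    return [part[0] for part in partition if len(part) == 1]


for clause in (["c1", "⊟", "c2"], ["c1", "⊟", "c2", "⊟", "c3"]):
    for p in clause_partitions(clause):
        print(p, "selects", selected_literals(p))
```

Each surviving partition selects exactly one literal symbol, and every literal of the clause is selected by one of them, which is what lets a partition of the clause string stand for the choice of a satisfied literal.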

SLIDE 31

Constructing $\mathcal{E}$

1. For each variable $x_v \in X$, create the $v$th enforcer string

$$\hat{c}_i^p \,\boxplus\, \hat{c}_k^r \,\hat{x}_v\, \hat{c}_k^r \,\hat{x}_v\, \hat{c}_k^r \,\boxplus\, \hat{c}_j^q$$

where $\hat{c}_i^p$ and $\hat{c}_j^q$ are the two positive occurrences of $x_v$ and $\hat{c}_k^r$ is the negated occurrence.

Lemma. Given that no string from $\mathcal{F}$ is selected, any equality-free 2-partition of $\mathcal{C} \cup \mathcal{E}$ must be “consistent”.
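The clause strings, enforcer strings and forbidden strings can be generated directly from a formula. The sketch below assumes the same integer encoding of clauses as is common in SAT tooling (`v` positive, `-v` negated) and invents ASCII symbol names such as `c1^2` and `x3`; neither the encoding nor the name `build_instance` comes from the talk. The input is assumed to be a valid 3SAT(3) formula, otherwise the unpacking of the two positive and one negated occurrence fails.

```python
def build_instance(clauses):
    """Return (C, E, F): clause strings, enforcer strings, forbidden strings.

    Every string is a list of symbol names. Assumes each variable occurs
    twice positively and once negated.
    """
    occurrences = {}          # (variable, is_positive) -> literal symbols
    clause_strings = []
    for i, clause in enumerate(clauses, start=1):
        symbols = []
        for j, literal in enumerate(clause, start=1):
            symbol = f"c{i}^{j}"
            occurrences.setdefault((abs(literal), literal > 0), []).append(symbol)
            symbols.append(symbol)
        string = [symbols[0]]
        for symbol in symbols[1:]:
            string += ["⊟", symbol]          # literal symbols separated by ⊟
        clause_strings.append(string)

    enforcer_strings = []
    for v in sorted({abs(l) for c in clauses for l in c}):
        p, q = occurrences[(v, True)]        # the two positive occurrences
        (r,) = occurrences[(v, False)]       # the negated occurrence
        x = f"x{v}"
        enforcer_strings.append([p, "⊞", r, x, r, x, r, "⊞", q])

    return clause_strings, enforcer_strings, [["⊟"], ["⊞"]]


phi = [[1, 2, -3], [1, -2, 3], [-1, 2, 3]]
C, E, F = build_instance(phi)
print(C[0])
print(E[0])
```

A clause string has at most 5 symbols and an enforcer string has exactly 9, so the construction grows linearly with the formula.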


SLIDE 35

Proof. Among all possible 2-partitions of the enforcer string (there are 9), there are two cases:

(i) $\hat{c}_k^r$ is selected: then $\hat{c}_k^r$ cannot be selected in $\mathcal{C}$ (⇒ all selected literals of $x_v$ in $\mathcal{C}$ are positive).

(ii) $\hat{c}_i^p$ and $\hat{c}_j^q$ are selected: neither can be selected in $\mathcal{C}$ (⇒ all selected literals of $x_v$ in $\mathcal{C}$ are negated).

[Figure: for each case the enforcer string $\hat{c}_i^p \boxplus \hat{c}_k^r \hat{x}_v \hat{c}_k^r \hat{x}_v \hat{c}_k^r \boxplus \hat{c}_j^q$ is shown with an example partition; the part boundaries are not recoverable.]

There are equality-free 2-partitions of the enforcer string for the 4 consistent assignments to the 3 literals of variable $x_v$.

The reduction runs in polynomial time and space: the total length of the strings in $\mathcal{C} \cup \mathcal{E} \cup \mathcal{F}$ is at most $5|C| + 9|X| + 2$.
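The case analysis in the proof can be reproduced by brute force over one enforcer string. This is a sketch under stated assumptions: the symbols get hypothetical ASCII names (`cp`, `cq` for the positive occurrences, `cr` for the negated one, `xv` for the variable), parts are tuples of symbols, and "no string from F is selected" is modelled by rejecting partitions that select ⊞ on its own.

```python
def two_partitions(symbols):
    """Yield every split of `symbols` into consecutive parts of length 1 or 2."""
    if not symbols:
        yield []
        return
    for n in (1, 2):
        if n <= len(symbols):
            for rest in two_partitions(symbols[n:]):
                yield [tuple(symbols[:n])] + rest


ENFORCER = ["cp", "⊞", "cr", "xv", "cr", "xv", "cr", "⊞", "cq"]
LITERALS = {"cp", "cr", "cq"}


def valid_enforcer_partitions(enforcer=ENFORCER):
    """Equality-free 2-partitions in which ⊞ is never a part on its own."""
    return [p for p in two_partitions(enforcer)
            if len(set(p)) == len(p) and ("⊞",) not in p]


def selected_literals(partition):
    """The literal symbols that the partition selects as single-symbol parts."""
    return frozenset(part[0] for part in partition
                     if len(part) == 1 and part[0] in LITERALS)


partitions = valid_enforcer_partitions()
patterns = {selected_literals(p) for p in partitions}
print(len(partitions), "valid partitions")
for pattern in sorted(sorted(s) for s in patterns):
    print("selected in the enforcer:", pattern)
```

The enumeration yields the 9 partitions mentioned in the proof and four selection patterns. A literal symbol selected in the enforcer is unavailable in the clause strings, so each pattern blocks either the negated occurrence or both positive occurrences, which is the consistency the lemma asks for.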

SLIDE 36

Complexity of EF-SP (K = 2)

Theorem. The Equality-Free String Partition Problem (EF-SP) for K = 2 is NP-complete.

[Figure: part of the reduction diagram, over 3SAT(3), EF-MSP(K=2), EF-MSP(L=2), EF-SP(L=2), EF-SP(K=2).]


SLIDE 42

Reduction

EF-MSP instance $I$:

  • alphabet $\Sigma$
  • $K = 2$
  • $W = \{w_1, \ldots, w_m\}$

EF-SP instance $\bar{I}$:

  • alphabet $\bar{\Sigma} = \Sigma \cup \{\boxdot, d_1, \ldots, d_{m-1}\}$
  • $\bar{K} = K = 2$
  • $\bar{w} = \boxdot\boxdot\boxdot\boxdot\boxminus\, w_1 d_1 \boxdot\boxdot d_1 w_2 d_2 \boxdot\boxdot d_2 \cdots d_{m-1} \boxdot\boxdot d_{m-1} w_m$

The following substrings have to be selected in a valid 2-partition of $\bar{w}$: [highlighted on the slide within $\bar{w}$; the highlighting is not recoverable].

All strings $w_1, w_2, \ldots, w_m$ are super-selected.

A valid 2-partition of $\bar{w}$ contains a valid 2-partition of $W$, and a valid 2-partition of $W$ can be extended to a valid 2-partition of $\bar{w}$.

SLIDE 43

Complexity of EF-MSP (L = 2)

Theorem. The Equality-Free Multiple String Partition Problem (EF-MSP) with binary alphabet (L = 2) is NP-complete.

[Figure: part of the reduction diagram, over 3SAT(3), EF-MSP(K=2), EF-MSP(L=2), EF-SP(L=2), EF-SP(K=2).]


SLIDE 48

Reduction

EF-MSP instance $I$:

  • alphabet $\Sigma = \{a_1, \ldots, a_L\}$
  • $K = 2$
  • $W = \{w_1, \ldots, w_m\}$

EF-MSP instance $\bar{I}$:

  • alphabet $\bar{\Sigma} = \{0, 1\}$
  • $\bar{K} = 2\delta$, where $\delta \ge \log_2 L$
  • encode each letter $a_i$ in $\Sigma$ with a binary string $c_i \in C$ of length $\delta$
  • let $h : \Sigma^* \to \bar{\Sigma}^*$ be the homomorphism defined by $h(a_i) = c_i$ for every $i = 1, \ldots, L$
  • $\bar{W} = \{h(w_1), \ldots, h(w_m)\} \cup \{\mathrm{prefix}_i(c) ;\ c \in C,\ i = 1, \ldots, \delta - 1\} \cup \{\mathrm{prefix}_i(cd) ;\ c, d \in C,\ i = \delta + 1, \ldots, 2\delta - 1\}$


SLIDE 52

Reduction Example

Example EF-MSP instance $I$:

  • alphabet $\Sigma = \{a, b, c, d, e\}$
  • $K = 2$
  • $W = \{a, bc, de, ea\}$

Example EF-MSP instance $\bar{I}$:

  • alphabet $\bar{\Sigma} = \{0, 1\}$
  • $\delta = 3$, $\bar{K} = 6$
  • $C = \{000, 001, 010, 011, 100\}$
  • $\bar{W} = \{000, 001010, 011100, 100000, 0, 1, 00, 01, 10, 0000, 0001, 00000, 00001, 00010, \ldots, 1000, 1001, 10000, 10001, 10010\}$
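The example instance can be regenerated from the definition of the reduction. The sketch below makes two assumptions that the slides do not state: δ is taken as the smallest admissible value, ⌈log₂ L⌉, and the i-th letter (counting from zero, in the order given) is encoded as the δ-bit binary representation of i, which happens to reproduce the code C of the example. The function name `binary_instance` is illustrative.

```python
from math import ceil, log2


def binary_instance(alphabet, W):
    """Return (K_bar, W_bar) for the binary-alphabet EF-MSP instance."""
    L = len(alphabet)
    delta = max(1, ceil(log2(L)))
    # Assumed encoding: letter number i -> delta-bit binary representation of i.
    code = {a: format(i, f"0{delta}b") for i, a in enumerate(alphabet)}

    def h(w):
        """The homomorphism h, applied letter by letter."""
        return "".join(code[a] for a in w)

    C = list(code.values())
    # Proper prefixes of single codewords (lengths 1 .. delta-1).
    short = {c[:i] for c in C for i in range(1, delta)}
    # Prefixes of codeword pairs strictly between one and two codewords long.
    long_ = {(c + d)[:i] for c in C for d in C
             for i in range(delta + 1, 2 * delta)}
    return 2 * delta, {h(w) for w in W} | short | long_


K_bar, W_bar = binary_instance("abcde", ["a", "bc", "de", "ea"])
print(K_bar)
print(sorted(W_bar, key=lambda s: (len(s), s)))
```

The added prefix strings occupy every length other than δ and 2δ, so, as the proof on the following slides argues, the encoded strings can only be cut at codeword boundaries.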


slide-56
SLIDE 56

Complexity of EF-MSP (L = 2)

Theorem. The Equality-free Multiple String Partition Problem (EF-MSP) with a binary alphabet ($L = 2$) is NP-complete.

Proof. It can be shown by induction that each string in
$\{\mathrm{prefix}_i(c) : c \in C,\ i = 1, \dots, \delta - 1\} \cup \{\mathrm{prefix}_i(cd) : c, d \in C,\ i = \delta + 1, \dots, 2\delta - 1\}$
has to be selected as a whole. It then follows that each string in $\{h(w_1), \dots, h(w_m)\}$ has to be partitioned into substrings of length $\delta$ or $2\delta$. Hence, a valid $2\delta$-partition of $\bar{W}$ can be translated to a valid 2-partition of $W$, and vice versa. The total length of the new strings is $\|\bar{W}\| = \delta\|W\| + (3L^2 + L)(\delta - 1)\delta/2$, where $L$ in this formula is the size of the original alphabet $\Sigma$.
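As a rough check of the length formula, the sketch below sums the lengths of the helper prefixes named in the proof. It assumes that $|C| = L$ (one codeword per symbol of the original alphabet) and that each prefix is counted once per codeword or codeword pair that generates it, so for a duplicate-free set $\bar{W}$ the closed form would act as an upper bound.

```latex
\begin{align*}
\sum_{c \in C} \sum_{i=1}^{\delta-1} i &= L \cdot \frac{(\delta-1)\delta}{2}, \\
\sum_{c,d \in C} \sum_{i=\delta+1}^{2\delta-1} i &= L^2 \cdot \frac{3(\delta-1)\delta}{2}, \\
\|\bar{W}\| &= \delta\|W\| + (3L^2 + L)\,\frac{(\delta-1)\delta}{2}.
\end{align*}
```

For the reduction example with $L = 5$, $\delta = 3$ and $\|W\| = 7$, this gives $3 \cdot 7 + 80 \cdot 3 = 261$.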

slide-57
SLIDE 57

Complexity of EF-SP (L = 2)

Theorem. The Equality-free String Partition Problem (EF-SP) with a binary alphabet ($L = 2$) is NP-complete.

[Diagram: reductions among 3SAT(3), EF-MSP(K=2), EF-MSP(L=2), EF-SP(L=2), and EF-SP(K=2)]

slide-61
SLIDE 61

Reduction overview

EF-MSP instance $I$
  alphabet $\Sigma = \{0, 1\}$
  $K = 2\delta$
  $W = \{w_1, \dots, w_m\}$, where each $w_i$ starts with 0

EF-SP instance $\bar{I}$
  alphabet $\bar{\Sigma} = \{0, 1\}$
  $\bar{K} = K = 2\delta$
  $\bar{w} = w_1 d_1 w_2 d_2 \dots d_{m-1} w_m$, where the delimiter strings $d_1, \dots, d_{m-1}$ are binary strings designed so that they are super-selected in any valid $K$-partition of $\bar{w}$.

  • Note. The length of $\bar{w}$ is $\|W\| + mK^2$.

slide-63
SLIDE 63

Delimiter strings design

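$d_1 = 0\,(1)^{K-1}(1)^{K(K-1)/2}(1)^{K-1}\,0$
$d_j = \mathrm{chain}(\mathrm{bin}(j))$ for $j > 1$
$\mathrm{bin}(j)$ = binary representation of $j$ without the leading 1

For $s = s'10^i$:
$\mathrm{chain}(s) = \mathrm{pad}^R_{K-|s|}(s)\ \mathrm{pad}_{K-|s|}(s'1)\ \mathrm{pad}^R_{K-|s|}(s'1)\ \mathrm{pad}_{K-|s|}(s'10)\ \mathrm{pad}^R_{K-|s|}(s'10)\ \dots\ \mathrm{pad}_{K-|s|}(s'1(0)^{i-1})\ \mathrm{pad}^R_{K-|s|}(s'1(0)^{i-1})\ \mathrm{pad}_{K-|s|}(s)$

$\mathrm{pad}_i(s) = 1^{i-1}0s$

Example
$d_1 = 0\,(1)^{K-1}(1)^{K(K-1)/2}(1)^{K-1}\,0$
$d_2 = \mathrm{chain}(0) = 00\,(1)^{K-2}(1)^{K-2}\,00\,(1)^{K-2}(1)^{K-2}\,00$
$d_3 = \mathrm{chain}(1) = 10\,(1)^{K-2}(1)^{K-2}\,01$
$d_4 = \mathrm{chain}(00) = 000\,(1)^{K-3}(1)^{K-3}\,00\,(1)^{K-3}(1)^{K-3}\,0000\,(1)^{K-3}(1)^{K-3}\,000$
$d_5 = \mathrm{chain}(01) = 100\,(1)^{K-3}(1)^{K-3}\,001$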
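The delimiter definitions above can be sketched in Python and checked against the listed examples. This is a minimal sketch under two assumptions inferred from the examples rather than stated on the slide: the superscript R is read as string reversal, and when s consists only of zeros (its only 1 was the stripped leading bit) the term s'1 is treated as the empty string. The function names are hypothetical.

```python
def pad(i: int, s: str) -> str:
    """pad_i(s) = 1^(i-1) 0 s."""
    return "1" * (i - 1) + "0" + s


def pad_r(i: int, s: str) -> str:
    """pad^R_i(s): assumed here to be the reversal of pad_i(s)."""
    return pad(i, s)[::-1]


def bin_without_leading_one(j: int) -> str:
    """Binary representation of j with its leading 1 removed."""
    return format(j, "b")[1:]


def chain(s: str, K: int) -> str:
    """chain(s) for s = s'10^i, concatenating blocks in the slide's order."""
    n = K - len(s)
    # t runs through s'1, s'10, ..., s'1(0)^(i-1).
    # Assumption: if s is all zeros, t starts as the empty string.
    t = s.rstrip("0")
    blocks = [pad_r(n, s)]
    while t != s:
        blocks += [pad(n, t), pad_r(n, t)]
        t += "0"
    blocks.append(pad(n, s))
    return "".join(blocks)


def delimiter(j: int, K: int) -> str:
    """d_1 as transcribed from the slide; d_j = chain(bin(j)) for j > 1."""
    if j == 1:
        return "0" + "1" * (K - 1) + "1" * (K * (K - 1) // 2) + "1" * (K - 1) + "0"
    return chain(bin_without_leading_one(j), K)


if __name__ == "__main__":
    K = 6  # K = 2 * delta with delta = 3, as in the earlier reduction example
    for j in range(1, 6):
        print(f"d_{j} =", delimiter(j, K))
```

With these two readings, the generated strings for j = 2, ..., 5 agree with the expansions of d_2, ..., d_5 given in the example for K = 6.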

slide-64
SLIDE 64

Results

X-Free (Multiple) String Partition Problem

X             | constant partition size K | constant alphabet size L
equality (E)  | NP-c for K = 2            | NP-c for L = 2
factor (F)    | NP-c for K = 3            | NP-c for L = 2
prefix (P)    | NP-c for K = 2            | NP-c for L = 2

slide-65
SLIDE 65

Open Questions

  • Consider different types of collisions; for example, the selected strings have to form a code.
  • Complexity of collision-free string partitioning for string duplexes.
  • Complexity when considering collisions with inexact matching.

slide-66
SLIDE 66

Questions?
