String Matching with Involutions Florin Manea Challenges in - - PowerPoint PPT Presentation

SLIDE 1

String Matching with Involutions

Florin Manea
Challenges in Combinatorics on Words – April 2013
Fields Institute, Toronto

Open Problem String Matching with Involutions 1

SLIDE 6

String matching

Given two words T (text) and P (pattern), find all occurrences of P in T.

P = acgttgcacg
T = atatatataacgttgcacgttgcacgaaaaaaacgttgcacgaataatacgttgcacg
    acacacacaacgttgcacgaaaaaaagcaaggtcgaataatacgttgcacgtttttt

Solution: O(|T| + |P|) time, e.g., with the Knuth-Morris-Pratt algorithm.
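As an illustration of the linear-time bound, here is a minimal Python sketch of the textbook Knuth-Morris-Pratt search. The function names `prefix_function` and `kmp_search` are chosen for this example and do not come from the slides; the text used in the usage lines is only the first line of T shown above.

```python
def prefix_function(p):
    """pi[i] is the length of the longest proper border of p[:i + 1]."""
    pi = [0] * len(p)
    k = 0
    for i in range(1, len(p)):
        # Fall back along shorter borders until p[i] can extend one.
        while k > 0 and p[i] != p[k]:
            k = pi[k - 1]
        if p[i] == p[k]:
            k += 1
        pi[i] = k
    return pi


def kmp_search(t, p):
    """Return the 0-based start positions of all occurrences of p in t."""
    if not p:
        return list(range(len(t) + 1))
    pi = prefix_function(p)
    hits = []
    k = 0  # number of pattern letters currently matched
    for i, ch in enumerate(t):
        while k > 0 and ch != p[k]:
            k = pi[k - 1]
        if ch == p[k]:
            k += 1
        if k == len(p):
            hits.append(i - len(p) + 1)
            k = pi[k - 1]  # keep going, so overlapping occurrences are found
    return hits


P = "acgttgcacg"
T = "atatatataacgttgcacgttgcacgaaaaaaacgttgcacgaataatacgttgcacg"
print(kmp_search(T, P))
```

Each text letter is read once and the fallback loop is amortized against earlier matches, which is where the O(|T| + |P|) total comes from.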

SLIDE 12

String matching with involutions

Antimorphic involution f : V* → V*: f-mirroring.
[f(w) = f(w[n]) f(w[n−1]) ⋯ f(w[1]) for a word w of length n, and f² = Id.]

Given T, P, and an antimorphic involution f : V* → V*, find all factors P′ of T obtained from P by non-overlapping f-mirrorings.

P = acgttgcacg
f: f(a) = a, f(c) = c, f(g) = g, f(t) = t
T = atatatataacgttgcacgttgcacgaaaaaaacgttgcacgaataatacgttgcacg
    acacacacaacgttgcacgaaaaaagcatacgtcgaataatacgacgttcgtttttt

P = acgttgcacg
f: f(a) = t, f(c) = g, f(g) = c, f(t) = a
T = atatatataacgttgcacgtcgcacgaaaaaaacgttgcacgaataatacgttgcacg
    acacacacaacgttgcacgaaaaaacgttagcaacgaataatacgtgcaacgtttttt
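To make the problem statement concrete, the following Python sketch builds an antimorphic involution from a letter map and checks, window by window, whether a factor of T can be obtained from P by non-overlapping f-mirrorings. It is a naive dynamic-programming baseline written for illustration only, not the algorithm from the talk or from the cited papers. The names `make_involution`, `is_mirrored_variant`, and `find_mirrored_occurrences` are hypothetical, and the sketch assumes that factors of any length, including single letters, may be mirrored.

```python
def make_involution(letter_map):
    """Extend a letter map with f(f(a)) = a to an antimorphic involution on words."""
    for a, b in letter_map.items():
        if letter_map.get(b) != a:
            raise ValueError("letter map is not an involution")

    def f(w):
        # Antimorphism: f(w) = f(w[n]) f(w[n-1]) ... f(w[1]).
        return "".join(letter_map[c] for c in reversed(w))

    return f


def is_mirrored_variant(p, s, f):
    """True if s arises from p by applying f to pairwise non-overlapping factors."""
    if len(p) != len(s):
        return False
    m = len(p)
    ok = [False] * (m + 1)  # ok[i]: p[:i] can be turned into s[:i]
    ok[0] = True
    for i in range(1, m + 1):
        # Either the last letter is left unchanged ...
        if ok[i - 1] and p[i - 1] == s[i - 1]:
            ok[i] = True
            continue
        # ... or some mirrored factor p[j:i] ends at position i.
        ok[i] = any(ok[j] and f(p[j:i]) == s[j:i] for j in range(i))
    return ok[m]


def find_mirrored_occurrences(t, p, f):
    """Start positions of all windows of t that are mirrored variants of p (brute force)."""
    m = len(p)
    return [i for i in range(len(t) - m + 1)
            if is_mirrored_variant(p, t[i:i + m], f)]


mirror = make_involution({"a": "a", "c": "c", "g": "g", "t": "t"})
watson_crick = make_involution({"a": "t", "t": "a", "c": "g", "g": "c"})
print(mirror("acg"), watson_crick("acg"))
```

With the identity letter map, f is plain reversal; with the second map it is the reverse complement on {a, c, g, t}. The check per window is far from the bounds discussed in the talk, so this is useful only as a reference for testing faster methods on small inputs.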

SLIDE 16

Why string matching with involutions?

Approximate string matching: find all the factors of T obtained from P by a series of simple operations (e.g., edit operations).

Bio-inspired operations: affect the pattern on a larger scale, e.g., mirroring of factors, translocations, etc.

  • [Cantone, Cristofaro, Faro, Giaquinta, Grabowski, 2009-2011]: string matching with rotations and translocations,
  • [Czeizler, Czeizler, Kari, Seki, 2008-2011]: combinatorics on words for repetitions with involutions: x f(x) x x f(x) ...,
  • [Gawrychowski, Manea, Müller, Mercaş, Nowotka, 2012-2013]: algorithmics and combinatorics on words for general pseudo-repetitions.

SLIDE 19

Known results

|T| = n, |P| = m.

Mirroring: O(nm) time in the worst case, O(m²) space complexity [Cantone et al., CPM 2011].

Translocations are allowed: O(nm²) time in the worst case, O(m) space, O(n) average time (subject to some artificial restriction) [Grabowski et al., Inf. Proc. Lett. 2011].

Open problem: linear average time, with O(nm) or better time in the worst case and O(m²) or better space complexity [Cantone et al., CPM 2011].

SLIDE 27

(our) Latest Results:

Antimorphic involutions: generalized mirroring.

Novel (simpler) strategy: greedy (but with complex data structures) vs. dynamic programming.

O(nm) worst-case time complexity, O(m) space complexity.

O(n) average time (subject to some simple restrictions on the input alphabet, depending on the involution).

Online algorithm.

Open problems: better complexities (for what kind of alphabets?), handling translocations as well, simpler solutions.
