slide-1
SLIDE 1

Linear-Time Algorithm for Morphic Imprimitivity Testing

Tomasz Kociumaka (1), Jakub Radoszewski (1), Wojciech Rytter (1,2), Tomasz Waleń (3,1)

(1) Faculty of Mathematics, Informatics and Mechanics, University of Warsaw

{kociumaka,jrad,rytter,walen}@mimuw.edu.pl

(2) Faculty of Mathematics and Computer Science, Nicolaus Copernicus University, Toruń

(3) International Institute of Molecular and Cell Biology in Warsaw

LATA 2013, 2013–04–05

1/24

slide-2
SLIDE 2

Outline

  • 1. Problem definition
  • 2. Short introduction to existing solutions
  • 3. Description of the new linear time solution

2/24

slide-3
SLIDE 3

Problem definition

Morphic Imprimitivity Testing

For an input word w ∈ Σ^n, is there a non-trivial morphism h such that h(w) = w?

Non-trivial means that h is not the identity function. The word w is non-primitive (morphically imprimitive) if such a morphism exists; otherwise it is primitive.

3/24

slide-4
SLIDE 4

Problem definition

Previous results

◮ it can be solved in O((|Σ| + log n) · n) time (S. Holub 2009),
◮ slightly improved to O(|Σ| · n) time (S. Holub, V. Matocha, arXiv 2012).

3/24

slide-5
SLIDE 5

Example

Simple case

Let w = abaacaca

4/24

slide-6
SLIDE 6

Example

Letter b appears only once in w = abaacaca, so we can take:

h(a) = ε (empty word)
h(b) = abaacaca
h(c) = ε

4/24

slide-7
SLIDE 7

Example

More complicated case

Let w = aacabaaaacaacabaa

4/24

slide-8
SLIDE 8

Example

Splitting the word as w = aac abaa aac aac abaa, we can take:

h(a) = ε
h(b) = abaa
h(c) = aac
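The two examples can be checked mechanically. The sketch below is only an illustration, not the algorithm of the talk: `apply_morphism` and `is_morphically_imprimitive` are hypothetical helper names, the search is an exponential-time backtracking over candidate images, and "non-trivial" is taken to mean that h differs from the identity on at least one letter occurring in w.

```python
def apply_morphism(h, w):
    """Apply the morphism h (a dict letter -> image) to the word w."""
    return "".join(h[x] for x in w)


def is_morphically_imprimitive(w):
    """Exhaustive search for a non-identity morphism h with h(w) == w."""
    n = len(w)

    def extend(k, p, h):
        # The first k letters of w are mapped; their images cover w[:p].
        if k == n:
            return p == n and any(h[x] != x for x in h)
        x = w[k]
        if x in h:
            img = h[x]
            return w.startswith(img, p) and extend(k + 1, p + len(img), h)
        # Try every factor of w starting at p (including the empty word).
        for length in range(n - p + 1):
            h[x] = w[p:p + length]
            if extend(k + 1, p + length, h):
                return True
        del h[x]
        return False

    return extend(0, 0, {})


print(apply_morphism({"a": "", "b": "abaa", "c": "aac"}, "aacabaaaacaacabaa"))
print(is_morphically_imprimitive("abaacaca"))
print(is_morphically_imprimitive("abba"))
```

Both example words are reported as imprimitive, while a word such as abba, in which no such morphism exists, is reported as primitive.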

4/24

slide-9
SLIDE 9

Problem applications

Closely connected to several topics in formal language theory and combinatorics on words:

◮ fixed points of morphisms,
◮ pattern languages,
◮ ambiguity of morphisms.

5/24

slide-10
SLIDE 10

Problem applications

Reviewer’s opinion

Although I cannot think of any actual applications, I find this question to be very natural

5/24

slide-11
SLIDE 11

How to solve it? - Intuition

Theorem

For a word w, if there exists a non-trivial morphism h such that h(w) = w, then there exists a non-trivial morphism h′ such that:

◮ h′(w) = w,
◮ for all immortal letters x ∈ E: h′(x) = l_x x r_x (e.g. h′(b) = abaa),
◮ for all mortal letters x ∉ E: h′(x) = ε.

6/24

slide-12
SLIDE 12

How to solve it? - Intuition

Example:

w = cdac dbc dbc cdac
h(a) = cdac, h(b) = dbc, h(c) = ε, h(d) = ε
h(w) = h(a) h(b) h(b) h(a) = cdac dbc dbc cdac = w

a, b – immortal letters; c, d – mortal letters.
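The normal form from the theorem can be illustrated on this example. The sketch assumes the example word is w = cdacdbcdbccdac (the only interleaving of the slide's letters for which h(w) = w holds); `apply_morphism` and `split_image` are hypothetical helper names, and "immortal" is read here simply as "has a non-empty image under h".

```python
def apply_morphism(h, w):
    """Apply the morphism h (a dict letter -> image) to the word w."""
    return "".join(h[x] for x in w)


w = "cdacdbcdbccdac"
h = {"a": "cdac", "b": "dbc", "c": "", "d": ""}

# Assumption: in this normal form the immortal letters are exactly
# those with a non-empty image; all other letters map to the empty word.
E = {x for x, image in h.items() if image}


def split_image(x):
    """Split h(x) as (l_x, x, r_x) around the occurrence of x in h(x)."""
    left, _, right = h[x].partition(x)
    return left, x, right


print(apply_morphism(h, w) == w)
print(sorted(E))
print(split_image("a"), split_image("b"))
```

Each immortal letter sits inside its own image, surrounded only by mortal letters, which is the shape h′(x) = l_x x r_x required by the theorem.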

6/24

slide-13
SLIDE 13

Holub’s algorithm

The algorithm maintains three sets:

◮ E – set of candidates for immortal letters,
◮ L and R – sets of interpositions (positions between consecutive letters, from 0 to n).

7/24

slide-14
SLIDE 14

Holub’s algorithm

Algorithm:

◮ start with empty sets E = L = R = ∅,
◮ apply rules (a)–(e), in any order, until a fixed point is reached.

7/24

slide-15
SLIDE 15

Holub’s algorithm

From the triple (E, L, R) the actual morphism can be obtained:

◮ if E ≠ Σ, then the morphism is non-trivial,
◮ from L and R we can deduce how to divide the input word to obtain the morphism.

7/24

slide-16
SLIDE 16

Holub’s rule (a) – initialization of the algorithm

L := L ∪ {0, n}, R := R ∪ {0, n}

Example:

[Figure: the word w = ccaabaacaaacaabaacac with positions 1 to 20; both ends of the word (interpositions 0 and 20) are marked L, R.]

8/24

slide-17
SLIDE 17

Holub’s rule (b) – initialization of immortal letters

if w[i] ∈ E then L := L ∪ {i − 1} and R := R ∪ {i}

Example:

[Figure: the word w = ccaabaacaaacaabaacac with an occurrence i of the letter b highlighted; L is marked just before it and R just after it.]

9/24

slide-18
SLIDE 18

Holub’s rule (c) – neighborhood marking

The neighborhood n_x of a letter x is the maximal factor that surrounds each occurrence of x in w.

if w[i..j] = n_x for some x ∈ E then R := R ∪ {i − 1} and L := L ∪ {j}

Example:

[Figure: the word w = ccaabaacaaacaabaacac with both occurrences of the neighborhood n_b highlighted; the boundaries i and j of one occurrence carry the markers added by the rule.]

10/24

slide-19
SLIDE 19

Holub’s rule (d) – copying rules

if w[i..j] = w[i′..j′] = n_a for some a ∈ E and i − 1 ≤ k ≤ j then:

◮ if k ∈ L then L := L ∪ {i′ + (k − i)},
◮ if k ∈ R then R := R ∪ {i′ + (k − i)}.

Example:

[Figure: the word w = ccaabaacaaacaabaacac with two occurrences w[i..j] and w[i′..j′] of the neighborhood n_b; the L and R markers inside one occurrence are copied to the corresponding positions of the other.]

11/24

slide-20
SLIDE 20

Holub’s rule (d) – copying rules

Problem

This rule is hard to implement efficiently!

11/24

slide-21
SLIDE 21

Holub’s rule (e) – new immortal letters

if i < j, i ∈ L, j ∈ R then add α(w[(i + 1)..j]) to E, where α(w[(i + 1)..j]) is the letter c of w[(i + 1)..j] that has the smallest number of occurrences in the word w.

Example:

[Figure: the word w = ccaabaacaaacaabaacac with i ∈ L and j ∈ R marked; the letter b is selected between them.]
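The first steps on the running example can be traced with a few set operations. This is a naive sketch of rules (a), (e) and (b) only, applied once each in that order; the function name `alpha` and the tie-breaking (leftmost letter wins) are assumptions, and positions are 1-indexed as on the slides.

```python
from collections import Counter


def alpha(w, i, j):
    """Letter of w[(i+1)..j] (1-indexed) with the fewest occurrences in w."""
    count = Counter(w)
    return min(w[i:j], key=lambda x: count[x])  # ties: leftmost letter


w = "ccaabaacaaacaabaacac"
n = len(w)
E, L, R = set(), set(), set()

# Rule (a): mark both ends of the word.
L |= {0, n}
R |= {0, n}

# Rule (e) with i = 0 in L and j = n in R: the rarest letter becomes immortal.
E.add(alpha(w, 0, n))

# Rule (b): mark the interpositions around every occurrence of a letter of E.
for i in range(1, n + 1):
    if w[i - 1] in E:
        L.add(i - 1)
        R.add(i)

print(sorted(E), sorted(L), sorted(R))
```

On this word the letter b (two occurrences) is chosen, so L gains the positions 4 and 14 and R gains 5 and 15.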

12/24

slide-22
SLIDE 22

Holub’s algorithm summary

Theorem

Extending a correct triple (E, L, R) using any of the rules (a)-(e) leads to a correct triple. In particular, if any sequence of actions corresponding to (a)-(e) leads to E = Σ then w is morphically primitive.

13/24

slide-23
SLIDE 23

Holub’s algorithm summary

It is quite surprising that such a simple set of rules solves the problem.

13/24

slide-24
SLIDE 24

Holub’s algorithm summary

◮ a simple implementation requires O(n^2) time,
◮ this time complexity can be slightly improved using some preprocessing and data structures,
◮ unfortunately, obtaining linear time seems to be a difficult task:
    ◮ the non-determinism in the choice of rules is problematic,
    ◮ rule (d) is the main bottleneck (it operates globally on the word).

14/24

slide-25
SLIDE 25

What have we done? Outline

◮ a modified set of rules (a), (b’)–(e’) that is equivalent to Holub’s rules but easier to implement,
◮ a strict ordering of rule applications,
◮ new data structures to speed up processing.

15/24

slide-26
SLIDE 26

What have we done? Outline

Result

As a consequence, we obtain an algorithm running in O(n) time.

15/24

slide-27
SLIDE 27

New neighborhood definitions

We introduce new definitions of neighborhood that capture the essential local neighborhood of letters and word positions.

[Figure: a position i among several R markers and occurrences e, e1, e2 of a letter, annotated with the spans l_e, r_e, left(i), right(i), γ_left(i), γ_right(i), γ_left(e) and γ_right(e).]

16/24

slide-28
SLIDE 28

New neighborhood definitions

l_e – the length of the longest common suffix of all prefixes of w ending with e, minus 1.

r_e – the length of the longest common prefix of all suffixes of w starting with e, minus 1.

16/24

slide-29
SLIDE 29

New neighborhood definitions

left(i) = min(l_{w[i]}, i − pred_E(i) − 1)

right(i) = min(r_{w[i]}, succ_E(i) − i − 1)

16/24

slide-30
SLIDE 30

New neighborhood definitions

γ_left(i) = i − pred_R(i) − 1

γ_right(i) = pred_R(i + right(i) + 1) − i

16/24

slide-31
SLIDE 31

New neighborhood definitions

γ_left(e) = min{γ_left(i) : i ∈ Occ(e)}

γ_right(e) = max{γ_right(i) : i ∈ Occ(e)}

16/24

slide-32
SLIDE 32

New rule (b’)

Old: if w[i] ∈ E then L := L ∪ {i − 1} and R := R ∪ {i}

New: if w[i] ∈ E then R := R ∪ {i}

Example:

[Figure: the word w = ccaabaacaaacaabaacac with an occurrence i of the letter b highlighted; only the marker R is added.]

17/24

slide-33
SLIDE 33

New rule (c’)

Old: if w[i..j] = n_x for some x ∈ E then R := R ∪ {i − 1} and L := L ∪ {j}

New: if w[i] ∈ E then R := R ∪ {i − 1 − left(i)} and L := L ∪ {i + right(i)}

Example:

[Figure: the word w = ccaabaacaaacaabaacac with an occurrence i of the letter b; the spans left(i) and right(i) extend from i to the new R and L markers.]
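A naive computation makes the quantities behind rule (c’) concrete. The sketch below is an assumption-laden illustration: it computes l_e and r_e by direct comparison rather than with suffix arrays, it assumes that pred_E(i) and succ_E(i) denote the nearest positions before and after i that hold a letter of E (with sentinels 0 and n + 1), and all function names are hypothetical.

```python
def letter_contexts(w):
    """Return dicts l, r with the common left/right context length per letter."""
    occ = {}
    for i, x in enumerate(w, start=1):          # 1-indexed positions
        occ.setdefault(x, []).append(i)
    n = len(w)
    l, r = {}, {}
    for e, pos in occ.items():
        k = 0   # extend left while all occurrences see the same letter
        while pos[0] - k - 1 >= 1 and len({w[p - k - 2] for p in pos}) == 1:
            k += 1
        l[e] = k
        k = 0   # extend right in the same way
        while pos[-1] + k + 1 <= n and len({w[p + k] for p in pos}) == 1:
            k += 1
        r[e] = k
    return l, r


def pred_E(w, E, i):
    """Nearest position before i holding a letter of E, else 0 (assumed)."""
    return max((p for p in range(1, i) if w[p - 1] in E), default=0)


def succ_E(w, E, i):
    """Nearest position after i holding a letter of E, else n + 1 (assumed)."""
    n = len(w)
    return min((p for p in range(i + 1, n + 1) if w[p - 1] in E), default=n + 1)


def rule_c_prime(w, E):
    """Positions added to L and R by one pass of rule (c')."""
    l, r = letter_contexts(w)
    L, R = set(), set()
    for i in range(1, len(w) + 1):
        x = w[i - 1]
        if x in E:
            left = min(l[x], i - pred_E(w, E, i) - 1)
            right = min(r[x], succ_E(w, E, i) - i - 1)
            R.add(i - 1 - left)
            L.add(i + right)
    return L, R


w = "ccaabaacaaacaabaacac"
print(letter_contexts(w)[0]["b"], letter_contexts(w)[1]["b"])
print(rule_c_prime(w, {"b"}))
```

With E = {b} the two occurrences of b share the context caa on the left and aaca on the right, so the markers land on the boundaries of the factors w[2..9] and w[12..19].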

18/24

slide-34
SLIDE 34

New rule (d’)

Old: if w[i..j] = w[i′..j′] = n_a for some a ∈ E and i − 1 ≤ k ≤ j then:

◮ if k ∈ L then L := L ∪ {i′ + (k − i)},
◮ if k ∈ R then R := R ∪ {i′ + (k − i)}.

New: if w[i] ∈ E then R := R ∪ {i − 1 − γ_left(w[i]), i + γ_right(w[i])}

Example:

[Figure: the word w = ccaabaacaaacaabaacac with both occurrences of the letter b; around each one, the spans γ_left(b) and γ_right(b) lead to the R markers added by the rule.]

19/24

slide-35
SLIDE 35

New rule (e’)

Old: if i < j, i ∈ L, j ∈ R then add α(w[(i + 1)..j]) to E

New: if i < j, succ_R(i) = j, pred_L(j) = i and {w[k] : i + 1 ≤ k ≤ j} ∩ E = ∅ then add α(w[(i + 1)..j]) to E

Here α(w[(i + 1)..j]) is the letter c of w[(i + 1)..j] that has the smallest number of occurrences in the word w.

Example:

[Figure: the word w = ccaabaacaaacaabaacac with i ∈ L and j ∈ R marked; the letter b is selected between them.]

20/24

slide-36
SLIDE 36

New rules correctness

Theorem

Extending a correct triple (E, L, R) using any of the rules (a), (b’)–(e’) leads to a correct triple. In particular, if any sequence of actions corresponding to (a), (b’)–(e’) leads to E = Σ then w is morphically primitive.

21/24

slide-37
SLIDE 37

New rules correctness

Proof outline

We can show that the new rules simulate the essential behavior of Holub’s algorithm.

21/24

slide-38
SLIDE 38

Efficient implementation

Unfortunately, that is not all; we still have to deal with:

22/24

slide-39
SLIDE 39

Efficient implementation

Non-determinism:

◮ This is resolved with event queues that control the order in which the rules are applied. In particular, we must take care to apply a rule only when it adds new elements to E, L or R.

22/24

slide-40
SLIDE 40

Efficient implementation

Data structures:

◮ To answer α(i, j) queries in O(1) time we use a Range Minimum Query (RMQ) data structure,
◮ To compute the neighborhoods efficiently we use suffix arrays combined with the Longest Common Prefix (LCP) table.
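The α queries reduce to range minima over the array that stores, for each position, how often its letter occurs in w. The sketch below uses a sparse table, which answers queries in O(1) time but needs O(n log n) preprocessing, so it is a stand-in for whichever linear-preprocessing RMQ structure the authors use. The class name `AlphaQuery` is hypothetical, and `alpha(i, j)` here takes a 1-indexed inclusive range, so the slide's α(w[(i + 1)..j]) corresponds to `alpha(i + 1, j)`.

```python
from collections import Counter


class AlphaQuery:
    """Sparse table over letter frequencies for least-frequent-letter queries."""

    def __init__(self, w):
        self.w = w
        count = Counter(w)
        self.key = [count[x] for x in w]        # frequency of each position's letter
        n = len(w)
        self.table = [list(range(n))]           # level 0: blocks of length 1
        level = 1
        while (1 << level) <= n:
            prev = self.table[-1]
            half = 1 << (level - 1)
            row = []
            for i in range(n - (1 << level) + 1):
                a, b = prev[i], prev[i + half]
                row.append(a if self.key[a] <= self.key[b] else b)
            self.table.append(row)
            level += 1

    def alpha(self, i, j):
        """Least frequent letter among w[i..j], 1-indexed and inclusive."""
        lo, hi = i - 1, j - 1
        k = (hi - lo + 1).bit_length() - 1      # largest power of two fitting the range
        a = self.table[k][lo]
        b = self.table[k][hi - (1 << k) + 1]
        return self.w[a if self.key[a] <= self.key[b] else b]


q = AlphaQuery("ccaabaacaaacaabaacac")
print(q.alpha(1, 20), q.alpha(6, 11))
```

Two overlapping blocks of length 2^k cover any query range, which is why a single comparison per query suffices.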

22/24

slide-41
SLIDE 41

Summary

◮ we presented a linear-time algorithm for deciding whether a word is morphically imprimitive,
◮ we started from the original quadratic algorithm by Holub and transformed it by reducing the set of rules it uses,
◮ finally, we proposed several efficient data structures that enable a linear-time implementation.

23/24

slide-42
SLIDE 42

Thank you for your attention!

24/24