Efficient Data Structures for the Factor Periodicity Problem

Tomasz Kociumaka, Jakub Radoszewski, Wojciech Rytter, Tomasz Waleń. University of Warsaw, Poland. SPIRE 2012, Cartagena, October 23, 2012.


slide-1
SLIDE 1

Efficient Data Structures for the Factor Periodicity Problem

Tomasz Kociumaka Jakub Radoszewski Wojciech Rytter Tomasz Waleń

University of Warsaw, Poland

SPIRE 2012 Cartagena, October 23, 2012

Tomasz Kociumaka Efficient Data Structures for the Factor Periodicity Problem 1/15

slide-9
SLIDE 9

Factor Periodicity Problem

[Figure: the letters of the word w, with the factor w[11..22] marked between positions 11 and 22 and repeated below it.]

Periods of w[11..22] are 5, 10, 11 and 12.

Notation: Per(w[11..22]) = {5, 10, 11, 12}, per(w[11..22]) = 5.
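
The notation Per(v) and per(v) can be illustrated with a brute-force Python sketch. The function name `periods` and the sample word are assumptions made for this illustration; the letters of the slide's word w are not recoverable from the transcript, so a hypothetical word with the same set of periods is used instead.

```python
def periods(v):
    """Return all periods of v in increasing order (brute force, quadratic time)."""
    m = len(v)
    # p is a period of v when v[i] == v[i + p] for every valid index i
    return [p for p in range(1, m + 1)
            if all(v[i] == v[i + p] for i in range(m - p))]


v = "aababaababaa"      # hypothetical word of length 12, not the slide's w[11..22]
print(periods(v))       # Per(v)
print(periods(v)[0])    # per(v), the shortest period
```

The data structures discussed later avoid this quadratic scan; the sketch only fixes the meaning of the notation.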

slide-10
SLIDE 10

Arithmetic sets

A word of length m might have Θ(m) periods, e.g. $a^m$.

Definition. A set $A = \{a, a + d, a + 2d, \ldots, a + kd\} \subseteq \mathbb{Z}$ is called arithmetic. The integer d is called the difference of A.

Observe that an arithmetic set can be represented by three integers: a, d and k.

Fact. Let v be a word of length m. Then Per(v) is a union of at most log m disjoint arithmetic sets.

For example, Per(w[11..22]) = {5} ∪ {10, 11, 12} = {5, 10} ∪ {11, 12}.
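
The three-integer representation (a, d, k) can be sketched in Python as follows. The function `to_arithmetic_sets` is a hypothetical helper that splits a sorted list greedily; it is not the decomposition used in the paper and does not by itself guarantee the log m bound from the Fact.

```python
from typing import List, Tuple


def to_arithmetic_sets(values: List[int]) -> List[Tuple[int, int, int]]:
    """Greedily split a sorted list of distinct integers into triples (a, d, k),
    each standing for the set {a, a + d, ..., a + k*d}."""
    result = []
    i = 0
    while i < len(values):
        a = values[i]
        if i + 1 == len(values):
            result.append((a, 1, 0))   # singleton; its difference is arbitrary
            break
        d = values[i + 1] - a
        j = i + 1
        # extend the current run while the gap stays equal to d
        while j + 1 < len(values) and values[j + 1] - values[j] == d:
            j += 1
        result.append((a, d, j - i))
        i = j + 1
    return result


print(to_arithmetic_sets([5, 10, 11, 12]))  # one of the two decompositions on the slide
```

On the slide's example the greedy split returns {5, 10} ∪ {11, 12}, the second of the two decompositions shown.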

slide-13
SLIDE 13

Formal problem statement

Problem (Period Queries). Design a data structure that, for a fixed word w of length n, answers the following queries: given integers i, j (1 ≤ i ≤ j ≤ n), compute Per(w[i..j]) represented as a union of O(log |w|) arithmetic sets.

Definition. We say that p is a (1 + δ)-period of v if p is a period of v and |v| ≥ (1 + δ)p.

Problem ((1 + δ)-Period Queries). Fix a real number δ > 0. Design a data structure that, for a fixed word w of length n, answers the following queries: given integers i, j (1 ≤ i ≤ j ≤ n), compute all (1 + δ)-periods of w[i..j] represented as a union of O(1) arithmetic sets.
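
The definition of a (1 + δ)-period amounts to a length filter on the set of periods. A minimal brute-force sketch, with hypothetical function names and a hypothetical sample word:

```python
def periods(v):
    """All periods of v in increasing order (brute force)."""
    m = len(v)
    return [p for p in range(1, m + 1)
            if all(v[i] == v[i + p] for i in range(m - p))]


def one_plus_delta_periods(v, delta):
    """Periods p of v that satisfy |v| >= (1 + delta) * p."""
    return [p for p in periods(v) if len(v) >= (1 + delta) * p]


v = "aababaababaa"                        # hypothetical word with periods 5, 10, 11, 12
print(one_plus_delta_periods(v, 1))       # delta = 1 keeps periods of length at most |v| / 2
print(one_plus_delta_periods(v, 0.125))
```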

slide-14
SLIDE 14

Related work

To the best of our knowledge, there has been no previous research on the general case of Period Queries. Even for computing the maximal period, only straightforward solutions are known:

  • memorize all answers: $O(n^2)$ space, O(1) query time;
  • compute the answer from scratch for each query: no extra space, O(n) query time.

Efficient data structures exist for primitivity testing (generalized by (1 + δ)-Period Queries with δ = 1):

  • Karhumäki, Lifshits & Rytter, CPM 2007: O(n log n) space, O(1) query time;
  • Crochemore et al., SPIRE 2010: $O(n \log^{\varepsilon} n)$ space, O(log n) query time.

slide-16
SLIDE 16

Our results

Several results based on a common idea but different tools.

| Space | All periods (query time) | (1 + δ)-periods (query time) |
| $O(n)$ | $O(\log^{1+\varepsilon} n)$ | $O(\log^{\varepsilon} n)$ |
| $O(n \log\log n)$ | $O(\log n (\log\log n)^2)$ | $O((\log\log n)^2)$ |
| $O(n \log^{\varepsilon} n)$ | $O(\log n \log\log n)$ | $O(\log\log n)$ |
| $O(n \log n)$ | $O(\log n)$ | $O(1)$ |

Standard assumptions on the model of computation: integer alphabet, i.e. $\Sigma \subseteq \{0, 1, \ldots, n^{O(1)}\}$; word RAM model with machine word size w = Ω(log n); randomization.

slide-21
SLIDE 21

Our approach

Let Borders(v) = {|u| : u is a border of v}.

Fact. Per(v) = |v| ⊖ Borders(v) = {|v| − b : b ∈ Borders(v)}.

We compute $\mathrm{Borders}(v) \cap \{2^k, \ldots, 2^{k+1} - 1\}$ separately for each $k \in \{0, \ldots, \lceil \log |v| \rceil\}$.

[Figure: the word v; its prefix is split into parts of lengths $2^k$ and $2^k - 1$, its suffix into parts of lengths $2^k - 1$ and $2^k$; labels: prefix, suffix, border.]
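
The relation between borders and periods can be checked with the classical prefix function (the failure function of the Knuth-Morris-Pratt algorithm). This is a standard linear-time technique for a single word, used here only as an illustration; it is not the data structure of the talk. The function names and the sample word are hypothetical, and the empty border (length 0) is included so that |v| itself appears as a period.

```python
def borders(v):
    """Lengths of all proper borders of v, including the empty one, in decreasing order."""
    m = len(v)
    if m == 0:
        return []
    pi = [0] * m                      # pi[i] = longest proper border of v[:i + 1]
    for i in range(1, m):
        b = pi[i - 1]
        while b > 0 and v[i] != v[b]:
            b = pi[b - 1]
        if v[i] == v[b]:
            b += 1
        pi[i] = b
    result = []
    b = pi[m - 1]
    while b > 0:                      # follow the chain of ever shorter borders
        result.append(b)
        b = pi[b - 1]
    result.append(0)                  # the empty border
    return result


def periods_from_borders(v):
    """Per(v) obtained as {|v| - b : b in Borders(v)}."""
    return [len(v) - b for b in borders(v)]


v = "aababaababaa"                    # hypothetical word
print(borders(v))
print(periods_from_borders(v))
```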

slide-25
SLIDE 25

Close occurrences

Let Occ(u, v) be the set of positions of v where an occurrence of u starts. Arithmetic sets naturally appear as the Occ sets.

Fact. Let |v| ≤ 2|u|. Then Occ(u, v) is arithmetic. Moreover, if |Occ(u, v)| ≥ 3, then its difference is equal to per(u).

The case |Occ(u, v)| ≤ 2 is trivial. Assume |Occ(u, v)| ≥ 2.

[Figure: occurrences of u in v; label: period of u.]
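
A small brute-force check of this fact, written as a Python sketch. The helpers `occ` and `is_arithmetic` are hypothetical names, positions are 1-based as in the slides, and the example words are chosen for illustration only.

```python
def occ(u, v):
    """1-based starting positions of the occurrences of u in v, in increasing order."""
    return [i + 1 for i in range(len(v) - len(u) + 1) if v.startswith(u, i)]


def is_arithmetic(xs):
    """True when consecutive elements of the sorted list xs have equal gaps."""
    return all(xs[t + 1] - xs[t] == xs[1] - xs[0] for t in range(len(xs) - 1))


u, v = "abab", "abababab"             # |v| = 8 <= 2 * |u|
positions = occ(u, v)
print(positions, is_arithmetic(positions))
```

In this example u has three occurrences in v and the common gap between them is 2, which equals per(u) for u = "abab".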

slide-30
SLIDE 30

A formula for border lengths

[Figure: the word v; its prefix is split into p (length $2^k$) and p′ (length $2^k - 1$), its suffix into s′ (length $2^k - 1$) and s (length $2^k$); an offset ℓ is marked.]

P = Occ(p, s′s), S = Occ(s, pp′).

Fact. Let $0 \le \ell < 2^k$. Then the word v has a border of length $2^k + \ell$ if and only if ℓ + 1 ∈ S and |s′s| − ℓ ∈ P. Consequently, $\mathrm{Borders}(v) \cap \{2^k, \ldots, 2^{k+1} - 1\}$ is arithmetic.
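
The characterization of borders through P and S can be checked directly on small words. The following Python sketch is only an illustration: the helper names `occ` and `borders_in_range` are hypothetical, and P and S are computed by brute force rather than by a data structure. Positions are 1-based starting positions; under this convention a border of length $2^k + \ell$ starts at position $|s's| - (2^k + \ell) + 1$ of s′s, and that is the position the sketch looks up in P. This indexing is an assumption of the sketch and may differ from the one used on the slide. Words shorter than $2^{k+1} - 1$ are handled by truncating pp′ and s′s to v.

```python
def occ(u, v):
    """Set of 1-based starting positions of the occurrences of u in v."""
    return {i + 1 for i in range(len(v) - len(u) + 1) if v.startswith(u, i)}


def borders_in_range(v, k):
    """Proper border lengths b of v with 2^k <= b < 2^(k+1), via the sets P and S."""
    m, half = len(v), 1 << k
    if half > m:
        return []
    L = min(m, 2 * half - 1)          # |pp'| = |s's|, truncated for short words
    p, s = v[:half], v[m - half:]
    S = occ(s, v[:L])                 # occurrences of s in pp'
    P = occ(p, v[m - L:])             # occurrences of p in s's
    result = []
    for l in range(L - half + 1):
        b = half + l
        # s must end the prefix of length b; p must start the suffix of length b
        if b < m and (l + 1) in S and (L - b + 1) in P:
            result.append(b)
    return result


v = "aababaababaa"                    # hypothetical word with borders 1, 2 and 7
for k in range(4):
    print(k, borders_in_range(v, k))
```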

slide-35
SLIDE 35

Intersecting arithmetic sets

Intersecting two arithmetic sets is, in general, performed using the extended Euclid's algorithm, so it takes O(log n) time. But if the sets share a common difference, constant time is sufficient.

Lemma. If |P| ≥ 3 and |S| ≥ 3, then per(p) = per(s). Consequently, P and S are arithmetic sets with a common difference.

[Figure: the word v with p, p′, s′, s and the sets P and S; annotations: "≥ 2per(s)", "≥ 2per(p)", "of period both per(p) and per(s)".]
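
The constant-time case can be sketched in Python as follows. The function name and the triple layout (a, d, k) for {a, a + d, ..., a + k·d} are assumptions of this illustration; the general case with different differences, which needs the extended Euclid's algorithm, is not covered.

```python
def intersect_same_difference(a1, k1, a2, k2, d):
    """Intersect {a1, a1 + d, ..., a1 + k1*d} with {a2, a2 + d, ..., a2 + k2*d}.

    Returns a triple (a, d, k) for the intersection, or None when it is empty.
    Uses a constant number of arithmetic operations.
    """
    if d <= 0:
        raise ValueError("the common difference must be positive")
    if (a1 - a2) % d != 0:
        return None                    # different residues modulo d: disjoint sets
    lo = max(a1, a2)
    hi = min(a1 + k1 * d, a2 + k2 * d)
    if lo > hi:
        return None
    return (lo, d, (hi - lo) // d)


# {1, 4, 7, 10} and {4, 7, 10, 13, 16} share the difference 3
print(intersect_same_difference(1, 3, 4, 4, 3))
```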

slide-36
SLIDE 36

Summary of the combinatorial part

Problem (Occurrence Queries). Design a data structure that for a word w can answer the following queries: given a basic factor u (a factor whose length is a power of two) and a factor v of w such that |v| ≤ 2|u| (both represented by one of their occurrences), compute the arithmetic set Occ(u, v).

Theorem. Assume there is a data structure answering the Occurrence Queries in O(f(n)) time. Then this data structure can answer Period Queries in O(f(n) log n) time and (1 + δ)-Period Queries in O(f(n)) time.

slide-42
SLIDE 42

Range Predecessor/Successor Queries

Problem (Range Predecessor/Successor Queries). Design a data structure that for a word w can answer the following queries: given a factor u of w (represented by an occurrence in w) and i ∈ {1, …, n}, find

  • PRED(u, i): the last occurrence of u ending at a position ≤ i,
  • SUCC(u, i): the first occurrence of u starting at a position ≥ i.

The Occurrence Queries can be reduced to three Range Predecessor/Successor Queries, where u is a basic factor of w.

[Figure: the word w with the factor v between positions i and j; labels: i′, SUCC(u, i), SUCC(u, i′ + 1), PRED(u, j).]
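
The reduction to three queries can be sketched in Python under a simplifying assumption: the sorted list `starts` of all 1-based occurrences of u in w is already available, and PRED and SUCC are answered by binary search on it. The function names are hypothetical, and the final list is expanded only for readability; a real structure would return the triple (first, difference, count).

```python
from bisect import bisect_left, bisect_right


def succ(starts, i):
    """First occurrence of u starting at a position >= i, or None."""
    pos = bisect_left(starts, i)
    return starts[pos] if pos < len(starts) else None


def pred(starts, ulen, j):
    """Last occurrence of u ending at a position <= j, or None."""
    pos = bisect_right(starts, j - ulen + 1)
    return starts[pos - 1] if pos > 0 else None


def occurrences_in_factor(starts, ulen, i, j):
    """Occ(u, w[i..j]) as absolute positions in w, assuming j - i + 1 <= 2 * ulen."""
    first = succ(starts, i)                      # query 1
    if first is None or first + ulen - 1 > j:
        return []
    last = pred(starts, ulen, j)                 # query 2
    if last == first:
        return [first]
    second = succ(starts, first + 1)             # query 3 gives the difference
    return list(range(first, last + 1, second - first))


starts = [1, 3, 5]                               # occurrences of "abab" in "abababab"
print(occurrences_in_factor(starts, 4, 2, 8))
```

Because |v| ≤ 2|u| makes Occ(u, v) arithmetic, the first, second and last occurrences determine the whole set.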

slide-47
SLIDE 47

Simple solution

Recall that the Dictionary of Basic Factors is a data structure which assigns integer identifiers to all basic factors of w:

$\mathrm{DBF}_k[i] = \mathrm{DBF}_k[j] \iff w[i..i + 2^k - 1] = w[j..j + 2^k - 1]$.

For each k and identifier id we store the set $A_{k,id} = \{i : \mathrm{DBF}_k[i] = id\}$.

If $u = w[i..i + 2^k - 1]$ and $\mathrm{DBF}_k[i] = id$, then $A_{k,id} = \mathrm{Occ}(u, w)$.

A single binary search in $A_{k,id}$ suffices to answer PRED(u, i) and SUCC(u, i).

Corollary. There exists a (simple) data structure of O(n log n) size that answers the Occurrence Queries in O(log n) time, and consequently the Period Queries in $O(\log^2 n)$ time.
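
A Dictionary of Basic Factors can be built by the classical doubling technique: the identifier of a factor of length $2^{k+1}$ is derived from the pair of identifiers of its two halves. The Python sketch below is one possible construction, not necessarily the one assumed in the talk; the names `build_dbf` and `occurrence_lists` are hypothetical, array indices are 0-based while the stored positions are 1-based, and sorting is used for simplicity rather than for optimal running time.

```python
def build_dbf(w):
    """dbf[k][i] is the identifier of w[i : i + 2**k] (0-based i)."""
    n = len(w)
    ranks = {c: r for r, c in enumerate(sorted(set(w)))}
    level = [ranks[c] for c in w]
    dbf = [level]
    k = 0
    while (2 << k) <= n:                       # while 2^(k+1) <= n
        half = 1 << k
        # a factor of length 2^(k+1) is identified by the ids of its two halves
        pairs = [(level[i], level[i + half]) for i in range(n - 2 * half + 1)]
        ids = {pr: r for r, pr in enumerate(sorted(set(pairs)))}
        level = [ids[pr] for pr in pairs]
        dbf.append(level)
        k += 1
    return dbf


def occurrence_lists(dbf):
    """For each k, map an identifier to the sorted 1-based positions A[k][id]."""
    lists = []
    for level in dbf:
        by_id = {}
        for i, ident in enumerate(level):
            by_id.setdefault(ident, []).append(i + 1)
        lists.append(by_id)
    return lists


w = "abababab"
dbf = build_dbf(w)
A = occurrence_lists(dbf)
print(A[2][dbf[2][0]])                         # occurrences of w[1..4] in w
```

Each list is produced in increasing order, so PRED and SUCC can be answered on it with one binary search, as stated above.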

slide-50
SLIDE 50

O(1) query time solution

Fix $2^k \le n$ and consider a basic factor u, $|u| = 2^k$. Split w into fragments of length $2^{k+1}$ with overlaps of size $2^k$.

[Figure: the word w covered by overlapping fragments of length $2^{k+1}$; label: arithmetic sets.]

Each occurrence of u lies within a fragment. The occurrences in a single fragment form an arithmetic set.

slide-53
SLIDE 53

O(1) query time solution

Imagine a (large) array indexed by identifiers and fragments.

[Figure: a two-dimensional array with one row per basic factor (u, u′, u′′, …) and one column per fragment $[0, 2^{k+1}]$, $[2^k, 3 \cdot 2^k]$, $[2^{k+1}, 2^{k+2}]$, …, $[m \cdot 2^k, n]$.]

This array has $\Theta(n^2 / 2^k)$ cells.

All factors of a fixed length have ≤ n occurrences in total, so there are ≤ n non-empty cells and perfect hashing can be used. This gives O(n log n) space in total for all values of k.

slide-58
SLIDE 58

O(1) query time solution

Answering queries:

[Figure: the word w with two consecutive fragments of length $2^{k+1}$, the factor v and an occurrence of u.]

  • v lies within at most two consecutive fragments,
  • get the occurrences of u from the hash table,
  • crop and merge these arithmetic sets to obtain the result.

Corollary. There exists a data structure of O(n log n) size that answers the Occurrence Queries in O(1) time.
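
The table lookup can be sketched in Python under several simplifying assumptions, all of them departures from the slides made for brevity: a plain `dict` stands in for perfect hashing, the factor itself is used as the key instead of its DBF identifier, each occurrence is filed under the block of length $2^k$ containing its start instead of under overlapping fragments, and each cell holds an explicit list rather than an arithmetic triple. Positions are 0-based and the function names are hypothetical.

```python
from collections import defaultdict


def build_fragment_table(w, k):
    """Map (factor, t) to the sorted starts x of that length-2^k factor with x // 2^k == t."""
    half = 1 << k
    table = defaultdict(list)
    for x in range(len(w) - half + 1):
        table[(w[x:x + half], x // half)].append(x)
    return table


def occurrences(table, w, k, x_u, i, j):
    """Starts of u = w[x_u : x_u + 2^k] inside v = w[i : j + 1], assuming |v| <= 2 * |u|."""
    half = 1 << k
    u = w[x_u:x_u + half]
    t = i // half
    # every occurrence inside v starts in block t or block t + 1
    hits = table.get((u, t), []) + table.get((u, t + 1), [])
    # crop to the occurrences that fit entirely inside v
    return [x for x in hits if i <= x and x + half - 1 <= j]


w = "abababab"
table = build_fragment_table(w, 2)
print(occurrences(table, w, 2, 0, 1, 7))
```

By the fact on close occurrences, the starts stored in one cell form an arithmetic set, so each cell could be kept as three integers and the crop-and-merge step done with a constant number of operations.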

slide-59
SLIDE 59

Space-efficient solution

Theorem (Nekrich, Navarro; 2012). There exist data structures that, given the locus of u in the suffix tree of w, answer the Range Predecessor/Successor Queries and satisfy the following space and time bounds:

| Space | Query time |
| $O(n)$ | $O(\log^{\varepsilon} n)$ |
| $O(n \log\log n)$ | $O((\log\log n)^2)$ |
| $O(n \log^{\varepsilon} n)$ | $O(\log\log n)$ |

Theorem (Weighted LA; Kopelowitz, Lewenstein; 2007). There exists a data structure of size O(n) which, given an interval [i..j], finds the locus of w[i..j] in the suffix tree of w in O(log log n) time.

slide-60
SLIDE 60

Summary

Theorem (this paper). There exist data structures that satisfy the following bounds for space, Period Queries query time and (1 + δ)-Period Queries query time:

| Space | Period Queries | (1 + δ)-Period Queries |
| $O(n)$ | $O(\log^{1+\varepsilon} n)$ | $O(\log^{\varepsilon} n)$ |
| $O(n \log\log n)$ | $O(\log n (\log\log n)^2)$ | $O((\log\log n)^2)$ |
| $O(n \log^{\varepsilon} n)$ | $O(\log n \log\log n)$ | $O(\log\log n)$ |
| $O(n \log n)$ | $O(\log n)$ | $O(1)$ |

slide-63
SLIDE 63

Further research

Currently in progress:

| Space | Period Queries | 2-Period Queries |
| $O(n)$ | $O(\log^{1+\varepsilon} n)$ | $O(\log^{\varepsilon} n)$ |
| $O(n \log\log n)$ | $O(\log n \log\log n)$ | $O(\log\log n)$ |
| $O(n \log n)$ | $O(\log n)$ | $O(1)$ |
| $O(n)$ | - | $O(1)$ |

Open problems:

  • Can the O(n log n) preprocessing time be improved while keeping o(n)-time queries?
  • Can the maximum period be found faster than in O(log n) time with $o(n^2)$ space?

slide-64
SLIDE 64

Thank you

Thank you for your attention!
