SLIDE 1

Efficient Counting of Square Substrings in a Tree

Tomasz Kociumaka, Jakub Pachocki, Jakub Radoszewski, Wojciech Rytter, Tomasz Waleń

University of Warsaw

ISAAC 2012 Taipei, December 19, 2012

Jakub Pachocki Efficient Counting of Square Substrings in a Tree 1/15

SLIDE 2

Square in a string, square in a tree

[Figures: an example square, shown first in a string and then on a path of an edge-labeled tree; the letter layout is not recoverable from the extraction.]

SLIDE 19

Number of squares in a tree

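We consider unrooted, unoriented trees with edges labeled by single letters. A substring of such a tree is the value of a simple path, that is, the concatenation of the edge labels along the path.

[Figure: example tree T; edge labels as extracted: c a b a a b a b c b b. The tree shape is not recoverable from the extraction.]

Squares in T: aa, abaaba, bb, bcbc, cbcb.

There are 5 distinct squares, i.e. sq(T) = 5.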
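The definition of sq(T) can be checked by brute force on small inputs. The sketch below is only an illustration, not the algorithm of the talk: it walks every simple path from every node and collects the values that are squares. The function name `distinct_squares_in_tree`, the edge-list format, and the example tree are assumptions made for this example (the tree is hypothetical, not the tree T from the figure).

```python
from collections import defaultdict


def distinct_squares_in_tree(edges):
    """Return the set of distinct squares among the values of all simple paths.

    edges: list of (u, v, letter) describing an undirected edge-labeled tree.
    Brute force: every simple path is walked from each of its endpoints.
    """
    adj = defaultdict(list)
    for u, v, letter in edges:
        adj[u].append((v, letter))
        adj[v].append((u, letter))

    squares = set()

    def walk(node, parent, value):
        half = len(value) // 2
        if value and len(value) % 2 == 0 and value[:half] == value[half:]:
            squares.add(value)
        for nxt, letter in adj[node]:
            if nxt != parent:  # in a tree this keeps the path simple
                walk(nxt, node, value + letter)

    for start in list(adj):
        walk(start, None, "")
    return squares


# Hypothetical tree: path 0-1-2-3-4 labeled a, b, a, b plus an edge 1-5 labeled b.
example = [(0, 1, "a"), (1, 2, "b"), (2, 3, "a"), (3, 4, "b"), (1, 5, "b")]
print(sorted(distinct_squares_in_tree(example)))  # ['abab', 'baba', 'bb']
```

Because the tree is unoriented, each path is read in both directions, which is why both abab and baba appear.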

SLIDE 21

Previous findings

Theorem (Fraenkel & Simpson, 1998)
A word of length n contains at most 2n distinct squares.

Theorem (Gusfield & Stoye, 2004)
The number of distinct squares in a string can be computed in O(n) time.

Theorem (Crochemore et al., 2012)
A tree with n nodes contains O(n^{4/3}) distinct squares. This bound is asymptotically tight.
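For strings, the quantity bounded by Fraenkel and Simpson is easy to compute naively. The following sketch is a cubic-time brute force, not the linear-time method of Gusfield and Stoye. It enumerates distinct squares and can be used to sanity-check the 2n bound on small inputs. The function name `distinct_squares` is an assumption made for this example.

```python
def distinct_squares(s):
    """Return the set of distinct squares (strings of the form xx) occurring in s."""
    n = len(s)
    found = set()
    for i in range(n):
        for half in range(1, (n - i) // 2 + 1):
            if s[i:i + half] == s[i + half:i + 2 * half]:
                found.add(s[i:i + 2 * half])
    return found


print(sorted(distinct_squares("aabaab")))  # ['aa', 'aabaab']
```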

SLIDE 22

Our result

Theorem (this paper)
It is possible to compute an O(n log n)-sized representation of all the squares in a tree in O(n log^2 n) time. The representation allows counting distinct squares in the tree.

In this presentation, we assume that the trees are fully deterministic, that is, no two adjacent edges have the same label.

SLIDE 23

Counting squares in strings

Main-Lorentz algorithm. We split the string at the center, r:

[Figure: a string split at r, showing val(v) for a position v, with SUF[v] marked as a suffix of val(v) and PREF[v] marked as a prefix of val(v).]

We need to efficiently compute: SUF, PREF.
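The divide step can be illustrated on strings as follows. This is a minimal sketch under stated assumptions, not the procedure from the slides: it finds every square that crosses a split point m by extending matches forward and backward from a position aligned with the split. The extensions are computed naively here, whereas an efficient implementation would look them up in precomputed tables (the role played by SUF and PREF in the talk). The names `lcp`, `lcs` and `crossing_squares` are hypothetical.

```python
def lcp(s, i, j):
    """Length of the longest common prefix of s[i:] and s[j:]."""
    k = 0
    while i + k < len(s) and j + k < len(s) and s[i + k] == s[j + k]:
        k += 1
    return k


def lcs(s, i, j):
    """Length of the longest common suffix of s[:i] and s[:j]."""
    k = 0
    while i - k > 0 and j - k > 0 and s[i - k - 1] == s[j - k - 1]:
        k += 1
    return k


def crossing_squares(s, m):
    """List (start, period) for every square s[start:start + 2*period]
    that crosses the split point m, i.e. start < m < start + 2*period."""
    n = len(s)
    result = []
    for p in range(1, n // 2 + 1):
        # The split lies either in the first half of the square (anchor q = m)
        # or strictly inside the second half (anchor q = m - p).
        for q, max_back in ((m, p), (m - p, p - 1)):
            if q < 0 or q + p > n:
                continue
            forward = lcp(s, q, q + p)   # s[q + t] == s[q + p + t] for t < forward
            backward = lcs(s, q, q + p)  # matches immediately before q and q + p
            lo = max(1, p - forward)
            hi = min(backward, max_back)
            for a in range(lo, hi + 1):  # the square starts a positions before q
                result.append((q - a, p))
    return result


print(crossing_squares("abab", 1))  # [(0, 2)]
```

For a fixed period p the valid starts form an interval, so a whole range of squares can be reported as one (lo, hi) pair instead of being listed one by one.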

SLIDE 24

Packages

Definition
A package is a substring s with an integer interval [l, r] describing cyclic shifts of s.

We obtain O(n log n) (possibly intersecting) packages of squares. To remove duplicates efficiently, we group the strings with respect to cyclic equivalence. It is enough to know maxRot(s) for each s.

We need to efficiently compute: SUF, PREF, maxRot.
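Grouping by cyclic equivalence can be sketched with a canonical representative per class. The code below assumes that maxRot(s) denotes the lexicographically maximal rotation of s and computes it naively in quadratic time, whereas the talk uses an incremental algorithm. The names `max_rot` and `group_by_rotation` are assumptions made for this example.

```python
from collections import defaultdict


def max_rot(s):
    """Lexicographically maximal rotation of s (naive, quadratic time)."""
    if not s:
        return s
    return max(s[i:] + s[:i] for i in range(len(s)))


def group_by_rotation(strings):
    """Group strings that are cyclic shifts of one another."""
    groups = defaultdict(list)
    for s in strings:
        groups[max_rot(s)].append(s)
    return dict(groups)


squares = ["aa", "abaaba", "bb", "bcbc", "cbcb"]
print(group_by_rotation(squares))
# {'aa': ['aa'], 'baabaa': ['abaaba'], 'bb': ['bb'], 'cbcb': ['bcbc', 'cbcb']}
```

Two strings are cyclically equivalent exactly when their maximal rotations coincide, so bcbc and cbcb land in the same group.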

SLIDE 25

Main-Lorentz algorithm for trees (1)

Similar approach. Rather than at the center, split at a centroid r.

[Figure: centroid r with the subtrees T1, T2, T3, T4, ..., Tk attached to it.]

|Ti| ≤ |T| / 2
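A centroid satisfying the size condition above can be found with one traversal that computes subtree sizes. This is a standard technique sketched under assumptions: the adjacency-dictionary input format and the name `find_centroid` are chosen for this example only.

```python
def find_centroid(adj):
    """Return a node whose removal leaves components of size <= n / 2.

    adj: dict mapping each node to the list of its neighbours (undirected tree).
    """
    n = len(adj)
    root = next(iter(adj))
    parent = {root: None}
    order = [root]
    for node in order:  # BFS; the list grows while it is being scanned
        for nxt in adj[node]:
            if nxt != parent[node]:
                parent[nxt] = node
                order.append(nxt)

    size = {node: 1 for node in adj}
    for node in reversed(order):  # children are processed before parents
        if parent[node] is not None:
            size[parent[node]] += size[node]

    for node in order:
        largest = n - size[node]  # the component on the parent's side
        for nxt in adj[node]:
            if nxt != parent[node]:
                largest = max(largest, size[nxt])
        if 2 * largest <= n:
            return node


path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
print(find_centroid(path))  # 2
```

Recursing on the components T1, ..., Tk gives a recursion of logarithmic depth, since each component has at most half of the nodes.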

SLIDE 26

Main-Lorentz algorithm for trees (2)

[Figures: the path from the centroid r to a node v, annotated with PREF[v] in one picture and with SUF[v] in the other.]

We need to efficiently compute SUF, PREF, and maxRot generalized to trees.

SLIDE 29

Computation of SUF

Theorem (Shibuya, 1999)
The suffix tree ST of a labeled tree T can be computed in O(n) time.

[Figures: a labeled tree T rooted at r, its suffix tree ST, and the union T ∪ ST; the edge-label layout is not recoverable from the extraction.]

For example, SUF[acb] = (cb)^R.

SLIDE 30

Computation of PREF (1)

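PREF[v] = x if and only if:

[Figure: a tree rooted at r with nodes v and x and edges labeled c, illustrating the condition; the layout is not recoverable from the extraction.]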

We extend Imre Simon’s automata to trees. The pair (x, c) is an essential transition if val(x)c has a nonempty border.

SLIDE 31

Computation of PREF (2)

Lemma
In a deterministic tree of size n there are at most 2n − 1 essential transitions.

Proof idea: there are at most n − 1 essential transitions (x, c) in which an edge labeled c leaves node x. Every other essential transition fixes some value of PREF.

By enumerating all essential transitions, we can compute all values of PREF in linear time.
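The definition of an essential transition can be explored in the string special case. The sketch below rests on an assumption about how the tree notions specialize: a string s of length m is viewed as a path with n = m + 1 nodes, node x is identified with the prefix of s of length x, and val(x) is that prefix. It lists the pairs (x, c) for which val(x)c has a nonempty border by brute force. The names `has_nonempty_border` and `essential_transitions` are hypothetical.

```python
def has_nonempty_border(w):
    """True if some nonempty proper prefix of w is also a suffix of w."""
    return any(w[:k] == w[-k:] for k in range(1, len(w)))


def essential_transitions(s):
    """List the pairs (x, c), with x a prefix length of s and c a letter of s,
    such that s[:x] + c has a nonempty border (brute force)."""
    letters = sorted(set(s))  # a letter absent from s cannot end a border
    return [(x, c)
            for x in range(len(s) + 1)
            for c in letters
            if has_nonempty_border(s[:x] + c)]


s = "abaab"
transitions = essential_transitions(s)
print(len(transitions))  # 7
```

For this string the count is 7, which is within the bound 2n − 1 = 11 for n = 6 nodes.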

SLIDE 32

Computation of maxRots

We devise a general incremental algorithm for finding maximal rotations. It can be used to find maxRot of the path from r to v for all v.

[Figure: a string s with a suffix t, followed by a string x, with maxSuf(sx) marked.]

Call t a nonredundant suffix of s iff tx is the maximum suffix of sx, for some string x.

maxRot(s) always starts at a nonredundant suffix.

SLIDE 33

Finding nonredundant suffixes

Every nonredundant suffix of s is a border of maxSuf(s).

Lemma
If t, r are borders of maxSuf(s) such that |r| < |t| ≤ 2|r| then r is a redundant suffix of s.

Therefore, every string s has O(log |s|) nonredundant suffixes. We maintain a logarithmically-sized superset of them and update it in O(log |s|) time upon letter addition.

SLIDE 34

Thank you for your attention!
