Circular Words in Finite and Infinite Sequences: Theory and Applications

SLIDE 1

Circular Words in Finite and Infinite Sequences: Theory and Applications

Marinella Sciortino

University of Palermo, Italy

AutoMathA 2015 Leipzig, May 6 – 9, 2015

  • M. Sciortino

Circular Words in Finite and Infinite Sequences: Theory and Applications

SLIDE 5

Central topic of the talk

A circular word is an equivalence class under conjugation of a finite word.

Some preliminaries: let Σ be a finite alphabet. Two finite words u, v ∈ Σ∗ are conjugate if there exist words w1, w2 such that u = w1w2 and v = w2w1. Example: the words ababba and babbaa are conjugate (take w1 = a and w2 = babba). The conjugacy relation (denoted by ∼) is an equivalence relation over Σ∗, whose classes are called conjugacy classes.

A word is primitive if it is not a power u^m of some word u with m > 1. If Σ is a totally ordered alphabet, a primitive word w is a Lyndon word if it is the smallest word (w.r.t. the lexicographic order) in its conjugacy class.

We denote by (w) the circular word corresponding to all the conjugates of the word w. We say that the circular word (w) is primitive if the words in its conjugacy class are primitive (all conjugates of a primitive word are primitive, so it suffices that w is).
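The definitions above can be checked mechanically. The following Python sketch is ours, not from the talk: the helper names `conjugates`, `is_primitive`, `is_lyndon` and `necklace` are hypothetical, letters are compared with Python's built-in string order, and the least conjugate is used as a canonical representative of (w). The primitivity test relies on the standard fact that w is primitive exactly when w occurs in ww only at positions 0 and |w|.

```python
def conjugates(w: str) -> list[str]:
    """All rotations w2w1 of w = w1w2 (with repetitions if w is not primitive)."""
    return [w[i:] + w[:i] for i in range(len(w))]


def is_primitive(w: str) -> bool:
    """w is primitive iff it occurs in ww only as a prefix and as a suffix."""
    return len(w) > 0 and (w + w).find(w, 1) == len(w)


def is_lyndon(w: str) -> bool:
    """A primitive word that is the lexicographically smallest of its conjugates."""
    return is_primitive(w) and w == min(conjugates(w))


def necklace(w: str) -> str:
    """Canonical representative of the circular word (w): its least conjugate."""
    return min(conjugates(w))


print("babbaa" in conjugates("ababba"))          # True
print(is_primitive("abab"), is_lyndon("aabab"))  # False True
print(necklace("babbaa"))                        # aababb
```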

SLIDE 6

Circular words or necklaces

A circular word is also called a necklace and is represented on a circle (read clockwise).

Figure: The six primitive necklaces of length 5 on the alphabet {a,b}: (aaaab), (aaabb), (aabab), (aabbb), (ababb), (abbbb).

SLIDE 8

Necklaces can represent real structures in several contexts

◮ (Single or multiple) circular structure of the DNA of viruses, bacteria, eukaryotic cells, and archaea
◮ Figures in computational geometry
◮ Circular structures in astronomical data
◮ ...

SLIDE 9

Goals of the talk

Necklaces from several points of view:

◮ as a tool to characterize finite words;
◮ to measure the complexity of infinite words;
◮ to construct indexing structures for circular matching of a pattern in a text.

SLIDE 10

How many primitive necklaces?

Let τ(n,k) be the number of primitive necklaces of length n over a k-letter alphabet.

Proposition (Witt's Formula) The number of primitive necklaces of length n on k letters is

$$\tau(n,k) = \frac{1}{n}\sum_{d \mid n} \mu(n/d)\, k^{d},$$

where µ is the Möbius function, defined by µ(1) = 1 and, for n > 1,

$$\mu(n) = \begin{cases} (-1)^{i} & \text{if } n \text{ is the product of } i \text{ distinct prime numbers,}\\ 0 & \text{otherwise.} \end{cases}$$

Example Let Σ = {a,b}. For n = 1, 2, 3, ..., the number of primitive necklaces of length n over the alphabet Σ is: τ(n,2) = 2, 1, 2, 3, 6, 9, 18, 30, 56, 99, 186, 335, 630, 1161, 2182, 4080, ...
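Witt's formula is easy to check numerically. In this Python sketch (our own code, hypothetical names), `mobius` uses trial division, and `count_lyndon_brute_force` enumerates all words and keeps those strictly smaller than all their proper rotations, i.e. the Lyndon representatives of the primitive necklaces.

```python
from itertools import product


def mobius(n: int) -> int:
    """Möbius function computed by trial division."""
    result, m, p = 1, n, 2
    while p * p <= m:
        if m % p == 0:
            m //= p
            if m % p == 0:        # p^2 divides n
                return 0
            result = -result
        p += 1
    if m > 1:                     # one prime factor left over
        result = -result
    return result


def tau(n: int, k: int) -> int:
    """Witt's formula: number of primitive necklaces of length n on k letters."""
    total = sum(mobius(n // d) * k ** d for d in range(1, n + 1) if n % d == 0)
    return total // n


def count_lyndon_brute_force(n: int, alphabet: str = "ab") -> int:
    """Count words strictly smaller than every proper rotation (Lyndon words)."""
    count = 0
    for letters in product(alphabet, repeat=n):
        w = "".join(letters)
        if all(w < w[i:] + w[:i] for i in range(1, n)):
            count += 1
    return count


print([tau(n, 2) for n in range(1, 17)])
# [2, 1, 2, 3, 6, 9, 18, 30, 56, 99, 186, 335, 630, 1161, 2182, 4080]
```

For instance, `count_lyndon_brute_force(5)` returns 6, matching the six primitive necklaces of length 5 shown earlier.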

SLIDE 11

How many necklaces?

Let ν(n,k) be the number of necklaces of length n over a k-letter alphabet.

Proposition The number of necklaces of length n on k letters is

$$\nu(n,k) = \frac{1}{n}\sum_{d \mid n} \varphi(n/d)\, k^{d},$$

where φ is Euler's totient function.

Example Let Σ = {a,b}. For n = 1, 2, 3, ..., the number of necklaces of length n over the alphabet Σ is: ν(n,2) = 2, 3, 4, 6, 8, 14, 20, 36, 60, 108, 188, 352, 632, 1182, 2192, 4116, ...
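The same kind of check works for all necklaces, primitive or not. In this Python sketch (hypothetical names), `phi` is a naive totient, and the brute force counts conjugacy classes by collecting the least rotation of every word.

```python
from itertools import product
from math import gcd


def phi(n: int) -> int:
    """Euler's totient function (naive count of 1 <= i <= n coprime to n)."""
    return sum(1 for i in range(1, n + 1) if gcd(i, n) == 1)


def nu(n: int, k: int) -> int:
    """Number of necklaces (primitive or not) of length n on k letters."""
    total = sum(phi(n // d) * k ** d for d in range(1, n + 1) if n % d == 0)
    return total // n


def count_necklaces_brute_force(n: int, alphabet: str = "ab") -> int:
    """Count conjugacy classes by collecting each word's least rotation."""
    classes = set()
    for letters in product(alphabet, repeat=n):
        w = "".join(letters)
        classes.add(min(w[i:] + w[:i] for i in range(n)))
    return len(classes)


print([nu(n, 2) for n in range(1, 17)])
# [2, 3, 4, 6, 8, 14, 20, 36, 60, 108, 188, 352, 632, 1182, 2192, 4116]
```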

SLIDE 13

Some combinatorial properties of necklaces

A finite word v ∈ Σ∗ is a factor of a necklace (w) if v occurs in some conjugate of w. A finite word u ∈ Σ∗ is a special factor of (w) if both ux and uy are factors of (w), with x, y ∈ Σ and x ≠ y.

A necklace (w) is called balanced if for all factors u, v of (w) with |u| = |v|, and for each a ∈ Σ, one has ||u|_a − |v|_a| ≤ 1, where |u|_a denotes the number of occurrences of the letter a in u. Example: (baab) is not balanced (it has the factors aa and bb), while (abaab) is balanced. Balancedness is defined analogously for finite and infinite words.

Proposition (Borel and Reutenauer, 2006) Let w be a finite word of length n ≥ 2. The following statements are equivalent:

1. (w) is primitive;
2. for k = 0, ..., n − 1, the necklace (w) has at least k + 1 factors of length k;
3. (w) has n factors of length n − 1.
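A small Python sketch of the notions just defined (hypothetical helper names, not from the talk). The factors of (w) of length k ≤ |w| are exactly the length-k windows of ww starting at positions 0, ..., |w| − 1. The final loop brute-forces the proposition on all binary words of length 2 to 10.

```python
from itertools import product


def circular_factors(w: str, k: int) -> set[str]:
    """Distinct factors of length k (0 <= k <= |w|) of the necklace (w)."""
    ww = w + w
    return {ww[i:i + k] for i in range(len(w))}


def is_balanced(w: str) -> bool:
    """||u|_a - |v|_a| <= 1 for all factors u, v of (w) with |u| = |v|, all letters a."""
    for k in range(1, len(w) + 1):
        factors = circular_factors(w, k)
        for a in set(w):
            counts = [f.count(a) for f in factors]
            if max(counts) - min(counts) > 1:
                return False
    return True


def is_primitive(w: str) -> bool:
    """w is primitive iff w occurs in ww only as a prefix and as a suffix."""
    return (w + w).find(w, 1) == len(w)


print(is_balanced("baab"), is_balanced("abaab"))  # False True

# Brute-force check of the proposition on all binary words of length 2..10.
for n in range(2, 11):
    for letters in product("ab", repeat=n):
        w = "".join(letters)
        primitive = is_primitive(w)
        assert primitive == (len(circular_factors(w, n - 1)) == n)
        assert primitive == all(len(circular_factors(w, k)) >= k + 1 for k in range(n))
```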

SLIDE 15

Sturmian Necklaces

A finite word w on a binary alphabet is called a Christoffel word if it is obtained by discretizing a segment in the lattice ℕ × ℕ. Given a pair of coprime integers p and q and the segment from the point (0,0) to the point (p,q), the (lower) Christoffel word is obtained by considering the lattice path lying just below the segment and coding each horizontal step by a and each vertical step by b. Such words are conjugates of standard Sturmian words (used to construct infinite Sturmian words).

[Figure: lattice paths below a segment on a grid labelled 5 and 8, with horizontal steps coded by a and vertical steps by b.]

A necklace is Sturmian if some word in its conjugacy class is a Christoffel word. For instance, (baaba) is a Sturmian necklace, since its conjugate aabab is the Christoffel word of the segment from (0,0) to (3,2).
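The slide describes Christoffel words geometrically. A common arithmetic reformulation, which is an assumption of this sketch and not stated on the slide, is that step i of the lower path is vertical exactly when ⌊iq/(p+q)⌋ increases. The Python sketch below uses it, with the hypothetical function names `christoffel` and `is_conjugate`, and checks the (baaba) example.

```python
from math import gcd


def christoffel(p: int, q: int) -> str:
    """Lower Christoffel word from (0,0) to (p,q): a = horizontal step, b = vertical step.

    Step i (1 <= i <= p + q) is vertical exactly when floor(i*q/(p+q)) increases.
    """
    if gcd(p, q) != 1:
        raise ValueError("p and q must be coprime")
    n = p + q
    return "".join("b" if (i * q) // n > ((i - 1) * q) // n else "a"
                   for i in range(1, n + 1))


def is_conjugate(u: str, v: str) -> bool:
    """u and v are conjugate iff they have the same length and u occurs in vv."""
    return len(u) == len(v) and u in v + v


print(christoffel(3, 2))                          # aabab
print(is_conjugate("baaba", christoffel(3, 2)))   # True: (baaba) is Sturmian
print(christoffel(8, 5))                          # aabaababaabab
```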
SLIDE 16

Combinatorial Properties of Sturmian Necklaces

Proposition (Jenkinson and Zamboni, 2004; Borel and Reutenauer, 2006) Let w be a word of length n ≥ 2 over a binary alphabet. The following statements are equivalent:

1. (w) is a Sturmian necklace;
2. for k = 0, ..., n − 1, the necklace (w) has exactly k + 1 factors of length k;
3. (w) has n − 1 factors of length n − 2 and w is primitive;
4. w is primitive and (w) is balanced.

Example Let us consider the Sturmian necklace (abaababaabaab). One can verify the properties.
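The properties can be verified on the example by brute force. This is our own Python sketch with hypothetical helper names. Because (abab) is balanced but has only two factors of length 3, balancedness is paired with primitivity here. The loop then checks, on all binary words of length 2 to 10, that having exactly k + 1 factors of each length k coincides with being primitive and balanced.

```python
from itertools import product


def circular_factors(w: str, k: int) -> set[str]:
    """Distinct factors of length k (0 <= k <= |w|) of the necklace (w)."""
    ww = w + w
    return {ww[i:i + k] for i in range(len(w))}


def is_balanced(w: str) -> bool:
    for k in range(1, len(w) + 1):
        factors = circular_factors(w, k)
        for a in set(w):
            counts = [f.count(a) for f in factors]
            if max(counts) - min(counts) > 1:
                return False
    return True


def is_primitive(w: str) -> bool:
    return (w + w).find(w, 1) == len(w)


def has_sturmian_complexity(w: str) -> bool:
    """Exactly k + 1 factors of length k for k = 0, ..., |w| - 1."""
    return all(len(circular_factors(w, k)) == k + 1 for k in range(len(w)))


w = "abaababaabaab"
print(has_sturmian_complexity(w), is_balanced(w))  # True True
print(len(circular_factors(w, len(w) - 2)))         # 12, i.e. n - 1

# Cross-check statements 2 and 4 on all binary words of length 2..10.
for n in range(2, 11):
    for letters in product("ab", repeat=n):
        u = "".join(letters)
        assert has_sturmian_complexity(u) == (is_primitive(u) and is_balanced(u))
```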

SLIDE 17

Combinatorial Properties of Sturmian Necklaces

Proposition (Castiglione, Restivo, S., 2009) Let n = |w|.

1. The necklace (w) is Sturmian if and only if, for each k = 0, ..., n − 2, there exists a unique special factor of (w) of length k.
2. If v is a Christoffel word and v^R its reverse, then (v) = (v^R).
3. If (w) is a Sturmian necklace with |w|_a > |w|_b, then either w = a or there exists an integer p > 0 such that (w) is a concatenation of the blocks ba^p and ba^{p+1} (analogously if |w|_b > |w|_a, by exchanging a and b).

Example Let us consider the Sturmian necklace (abaababaabaab). One can verify the properties.
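The first two properties can be checked on the example in the same spirit. In this Python sketch (hypothetical helper names), a factor u of length k is special when at least two distinct letters follow it among the factors of length k + 1. The word aabaababaabab used for the reversal check is the lower Christoffel word of the segment from (0,0) to (8,5), of which (abaababaabaab) is a conjugate.

```python
def circular_factors(w: str, k: int) -> set[str]:
    """Distinct factors of length k (0 <= k <= |w|) of the necklace (w)."""
    ww = w + w
    return {ww[i:i + k] for i in range(len(w))}


def special_factors(w: str, k: int) -> set[str]:
    """Factors u of length k of (w) such that ux and uy are factors for letters x != y."""
    right_letters: dict[str, set[str]] = {}
    for f in circular_factors(w, k + 1):
        right_letters.setdefault(f[:-1], set()).add(f[-1])
    return {u for u, letters in right_letters.items() if len(letters) > 1}


w = "abaababaabaab"
print([sorted(special_factors(w, k)) for k in range(4)])
# [[''], ['a'], ['ba'], ['aba']]
print(all(len(special_factors(w, k)) == 1 for k in range(len(w) - 1)))  # True
print(len(special_factors("aabb", 1)))  # 2: (aabb) is not Sturmian

v = "aabaababaabab"          # lower Christoffel word from (0,0) to (8,5)
print(v[::-1] in v + v)      # True: (v) = (v^R)
```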

SLIDE 19

Necklaces and Finite Words

Let Σ be a finite alphabet and let M be the family of multisets of primitive necklaces (circular words) over Σ∗.

Theorem (Gessel and Reutenauer, 1993) There exists a bijection between Σ∗ and M.

Example Let Σ = {a,b,c}. Then ccbbbcacaaabba ⟺ {(abac), (bca), (cbab), (cba)}.

SLIDE 24

From an algorithmic point of view

The Gessel and Reutenauer bijection can be realized by the Extended Burrows-Wheeler Transform (EBWT) [Mantaci, Restivo, Rosone and S., 2005]:

1. Sort all the conjugates of the words in S by the ω-order relation: u ≺_ω v ⟺ u^ω <_lex v^ω, where u^ω = uuuu··· and v^ω = vvvv···;
2. Consider the list of the sorted conjugates and take the word L obtained by concatenating the last letter of each of them;
3. Take the set I containing the positions, in the sorted list, of the words of S themselves.

Example: S = {abac, bca, cbab, cba}. The 14 sorted conjugates, with a prefix of their infinite powers and their last letter (→ marks the words of S):

→ 1 abac (abacab···) c
  2 abc (abcabc···) c
  3 abcb (abcbab···) b
  4 acab (acabac···) b
  5 acb (acbacb···) b
  6 babc (babcba···) c
  7 baca (bacaba···) a
  8 bac (bacbac···) c
→ 9 bca (bcabca···) a
  10 bcba (bcbabc···) a
  11 caba (cabaca···) a
  12 cab (cabcab···) b
→ 13 cbab (cbabcb···) b
→ 14 cba (cbacba···) a

Output: EBWT(S) = L = ccbbbcacaaabba and I = {1, 9, 13, 14}.
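A naive Python sketch of the three steps above. This is our own code with quadratic-time sorting, not the algorithm of the cited paper. It relies on the standard fact that u^ω <_lex v^ω if and only if uv <_lex vu, so conjugates can be compared as finite strings. The function names `omega_cmp` and `ebwt` are hypothetical.

```python
from functools import cmp_to_key


def omega_cmp(u: str, v: str) -> int:
    """Compare u^ω and v^ω lexicographically, using u^ω < v^ω iff uv < vu."""
    uv, vu = u + v, v + u
    return (uv > vu) - (uv < vu)


def ebwt(words: list[str]) -> tuple[str, list[int]]:
    """Naive EBWT of a multiset of primitive words.

    Returns L (last letters of the omega-sorted conjugates) and the 1-based
    positions I of the original words in the sorted list.
    """
    rows = []  # (conjugate, True if it is the word of S itself)
    for w in words:
        for i in range(len(w)):
            rows.append((w[i:] + w[:i], i == 0))
    rows.sort(key=cmp_to_key(lambda x, y: omega_cmp(x[0], y[0])))
    L = "".join(conj[-1] for conj, _ in rows)
    I = [pos + 1 for pos, (_, is_word) in enumerate(rows) if is_word]
    return L, I


L, I = ebwt(["abac", "bca", "cbab", "cba"])
print(L, I)  # ccbbbcacaaabba [1, 9, 13, 14]
```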

SLIDE 29

Properties and Reversibility

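Example: L = ccbbbcacaaabba and I = {1, 9, 13, 14}. Let F = aaaaabbbbbcccc be the word obtained by sorting the letters of L, i.e. the first letters of the sorted conjugates 1 abac, 2 abc, 3 abcb, 4 acab, 5 acb, 6 babc, 7 baca, 8 bac, 9 bca, 10 bcba, 11 caba, 12 cab, 13 cbab, 14 cba.

◮ The last character of each word w_j of S is L[I_j];
◮ For each character z, the i-th occurrence of z in L corresponds to the i-th occurrence of z in F;
◮ In any row i ∉ I, the character F[i] follows L[i] in a word of S.

Hence the map π sending each position of L to the position of the same occurrence in F is the permutation

$$\pi = \begin{pmatrix} 1 & 2 & 3 & 4 & 5 & 6 & 7 & 8 & 9 & 10 & 11 & 12 & 13 & 14 \\ 11 & 12 & 6 & 7 & 8 & 13 & 1 & 14 & 2 & 3 & 4 & 9 & 10 & 5 \end{pmatrix} = (11\ 4\ 7\ 1)(9\ 2\ 12)(13\ 10\ 3\ 6)(14\ 5\ 8).$$

Each cycle of π contains exactly one position of I, and reading L along the cycle from that position spells the corresponding word backwards (e.g. from 1: L[1], L[11], L[4], L[7] = c, a, b, a gives abac). So we can recover each word of the multiset S = {abac, bca, cbab, cba}.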
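These properties give an inversion procedure. In this Python sketch (hypothetical names; 0-based internally, 1-based positions in I as on the slide), `lf_mapping` computes π from the second property. Each word of S is then read backwards by starting at a row of I, repeatedly taking L[r], and moving to π(r).

```python
def lf_mapping(L: str) -> list[int]:
    """0-based pi: the i-th occurrence of a letter in L goes to its i-th occurrence in F."""
    F = sorted(L)
    first: dict[str, int] = {}
    for pos, ch in enumerate(F):
        first.setdefault(ch, pos)
    seen: dict[str, int] = {}
    pi = []
    for ch in L:
        pi.append(first[ch] + seen.get(ch, 0))
        seen[ch] = seen.get(ch, 0) + 1
    return pi


def inverse_ebwt(L: str, I: list[int]) -> list[str]:
    """Recover the words of S from L and the 1-based positions I."""
    pi = lf_mapping(L)
    words = []
    for j in I:
        letters, r = [], j - 1
        while True:
            letters.append(L[r])  # L[r] cyclically precedes the first letter of row r
            r = pi[r]
            if r == j - 1:
                break
        words.append("".join(reversed(letters)))
    return words


print([p + 1 for p in lf_mapping("ccbbbcacaaabba")])
# [11, 12, 6, 7, 8, 13, 1, 14, 2, 3, 4, 9, 10, 5]
print(inverse_ebwt("ccbbbcacaaabba", [1, 9, 13, 14]))
# ['abac', 'bca', 'cbab', 'cba']
```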

SLIDE 33

Multipurpose EBWT

If we ignore the set of indices I, then EBWT : M → Σ∗ (where M is the family of multisets of primitive necklaces over Σ∗) is the Gessel-Reutenauer bijection.

◮ EBWT has been used as a tool to investigate combinatorial properties of finite words through the multisets of necklaces that are their inverse images via EBWT.
Mantaci, Restivo and S., 2003. Higgins, 2012. Perrin and Restivo, 2015.
◮ EBWT as combinatorial preprocessing to compress large-scale DNA sequence collections.
Cox, Bauer, Jakobi, and Rosone, 2012. Janin, Rosone, and Cox, 2014.
◮ EBWT as a combinatorial tool to compare necklaces and, more generally, biological sequences.
Mantaci, Restivo, Rosone and S., 2008. Yang, Chang, Zhang, and Wang, 2010. Yang, Zhang, and Wang, 2010.
SLIDE 34

Sorting of the conjugates

Sorted conjugates of S = {abac, bca, cbab, cba}: 1 abac, 2 abc, 3 abcb, 4 acab, 5 acb, 6 babc, 7 baca, 8 bac, 9 bca, 10 bcba, 11 caba, 12 cab, 13 cbab, 14 cba.

Sorting the conjugates of the words of the multiset according to the ω-order is the bottleneck of the algorithm.

◮ Mantaci, Restivo, Rosone and S., 2007: a periodicity theorem is used to reduce the number of comparisons.
◮ Hon, Ku, Lu, Shah and Thankachan, 2011: an O(n log n) algorithm, where n denotes the total length of the words in S.
◮ Linear-time algorithm? Open question!
◮ Gessel, Restivo and Reutenauer, 2012: another bijection, based on a different order, but not computationally simpler.
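One standard way to compare two conjugates in the ω-order without building infinite words uses the theorem of Fine and Wilf: if u^ω and v^ω agree on their first |u| + |v| − gcd(|u|,|v|) letters, they are equal. The Python sketch below compares at most that many letters and sorts the conjugates of the running example. Its names are hypothetical, and it is not necessarily the periodicity argument used by Mantaci, Restivo, Rosone and S.

```python
from functools import cmp_to_key
from itertools import product
from math import gcd


def omega_compare(u: str, v: str) -> int:
    """-1, 0 or 1 as u^ω <, =, > v^ω, inspecting at most |u|+|v|-gcd(|u|,|v|) letters.

    Fine and Wilf: if u^ω and v^ω agree on that many letters, they are equal.
    """
    bound = len(u) + len(v) - gcd(len(u), len(v))
    for i in range(bound):
        x, y = u[i % len(u)], v[i % len(v)]
        if x != y:
            return -1 if x < y else 1
    return 0


conjugates = ["abac", "baca", "acab", "caba", "bca", "cab", "abc",
              "cbab", "babc", "abcb", "bcba", "cba", "bac", "acb"]
print(" ".join(sorted(conjugates, key=cmp_to_key(omega_compare))))
# abac abc abcb acab acb babc baca bac bca bcba caba cab cbab cba

# Cross-check against the characterization u^ω < v^ω iff uv < vu.
words = ["".join(t) for n in range(1, 6) for t in product("ab", repeat=n)]
assert all(omega_compare(u, v) == (u + v > v + u) - (u + v < v + u)
           for u in words for v in words)
```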

SLIDE 37

Necklaces producing clustered images via EBWT

A multiset of necklaces W over an ordered alphabet Σ = {a_1, a_2, ..., a_k} with a_1 < a_2 < ... < a_k has a simple EBWT if EBWT(W) is of the form

$$a_k^{n_k}\, a_{k-1}^{n_{k-1}} \cdots a_1^{n_1}$$

for some positive integers n_1, n_2, ..., n_k.

Example If W = {(acbcbcadad)}, then EBWT(W) = ddcccbbaaa.

Theorem (Mantaci, Restivo and S., 2003) If Σ is a binary alphabet, EBWT(W) = b^p a^q (with gcd(p,q) = k) if and only if W consists of a Sturmian necklace with multiplicity k.

Example If W = {(abaab), (abaab)}, then EBWT(W) = bbbbaaaaaa.

For alphabets with more than two letters, this result does not hold [Restivo and Rosone, 2009].
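The examples can be reproduced with a naive EBWT. This Python sketch (hypothetical names, no index set) compares conjugates using the standard fact that u^ω < v^ω iff uv < vu. It also shows a binary necklace, (aabb), whose image is not simple, and the Christoffel word aabaababaabab, whose image is b^5 a^8 as the theorem predicts.

```python
from functools import cmp_to_key


def omega_cmp(u: str, v: str) -> int:
    """Compare u^ω and v^ω lexicographically via u^ω < v^ω iff uv < vu."""
    return (u + v > v + u) - (u + v < v + u)


def ebwt_last_column(words: list[str]) -> str:
    """The word L of the EBWT (no index set): last letters of the omega-sorted conjugates."""
    conjugates = [w[i:] + w[:i] for w in words for i in range(len(w))]
    conjugates.sort(key=cmp_to_key(omega_cmp))
    return "".join(c[-1] for c in conjugates)


def is_simple(L: str) -> bool:
    """True if L = a_k^{n_k} ... a_1^{n_1}, i.e. its letters appear in non-increasing order."""
    return list(L) == sorted(L, reverse=True)


for W in (["acbcbcadad"], ["abaab", "abaab"], ["aabb"], ["aabaababaabab"]):
    L = ebwt_last_column(W)
    print(W, L, is_simple(L))
# L is ddcccbbaaa, bbbbaaaaaa, baba and bbbbbaaaaaaaa respectively
```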

SLIDE 39

Necklaces producing clustered images via EBWT

Open Question To give a combinatorial characterization of the multisets W of primitive necklaces such that EBWT(W) is of the form

$$a_{\sigma(1)}^{n_1}\, a_{\sigma(2)}^{n_2} \cdots a_{\sigma(k)}^{n_k},$$

for some permutation σ ≠ id and some positive integers n_1, n_2, ..., n_k.

Some partial results when W contains just one necklace:

◮ A single necklace produces a clustering effect if and only if it occurs in some discrete interval exchange transformation [Ferenczi and Zamboni, 2013].
◮ In the case of a ternary alphabet Σ = {a_1, a_2, a_3}, EBWT(u) = a_3^{n_3} a_2^{n_2} a_1^{n_1} if and only if (n_1, n_2, n_3) is a triple of integers satisfying both gcd(n_1, n_2, n_3) = 1 and gcd(n_1 + n_2, n_2 + n_3) = 1 [Simpson and Puglisi, 2008; Pak and Redlich, 2008].
◮ This result cannot be extended to larger alphabets.

Example (cccbbaaaaaaaaaaaaa) ⟺ {(acaa), (acaa), (acaa), (aba), (aba)}.
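The ternary criterion can be tested by counting the necklaces in EBWT^{-1}(L). The words of the multiset correspond to the cycles of the permutation π that maps the i-th occurrence of a letter in L to its i-th occurrence in F, as in the reversibility slides. So L = a_3^{n_3} a_2^{n_2} a_1^{n_1} comes from a single necklace exactly when π is one cycle. This Python sketch uses hypothetical names, with the letters a < b < c standing for a_1 < a_2 < a_3. It compares the single-cycle test with the gcd conditions for all n_i ≤ 6 and counts the five necklaces of the example, where gcd(n_1 + n_2, n_2 + n_3) = gcd(15, 5) = 5.

```python
from math import gcd


def lf_cycles(L: str) -> int:
    """Number of cycles of pi, i.e. number of necklaces in the multiset EBWT^{-1}(L)."""
    F = sorted(L)
    first: dict[str, int] = {}
    for pos, ch in enumerate(F):
        first.setdefault(ch, pos)
    seen: dict[str, int] = {}
    pi = []
    for ch in L:
        pi.append(first[ch] + seen.get(ch, 0))   # i-th ch in L -> i-th ch in F
        seen[ch] = seen.get(ch, 0) + 1
    visited = [False] * len(L)
    cycles = 0
    for start in range(len(L)):
        if not visited[start]:
            cycles += 1
            r = start
            while not visited[r]:
                visited[r] = True
                r = pi[r]
    return cycles


def single_necklace(n1: int, n2: int, n3: int) -> bool:
    """Is c^n3 b^n2 a^n1 the EBWT of a single necklace?"""
    return lf_cycles("c" * n3 + "b" * n2 + "a" * n1) == 1


def gcd_condition(n1: int, n2: int, n3: int) -> bool:
    return gcd(gcd(n1, n2), n3) == 1 and gcd(n1 + n2, n2 + n3) == 1


assert all(single_necklace(x, y, z) == gcd_condition(x, y, z)
           for x in range(1, 7) for y in range(1, 7) for z in range(1, 7))
print(lf_cycles("ccc" + "bb" + "a" * 13))  # 5 necklaces: (acaa) x 3 and (aba) x 2
```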

SLIDE 41

Other extremal cases of EBWT images

Let Σ be an alphabet of cardinality k, and let Γ be the set of all k! words obtained as products of the k distinct letters of Σ (each letter used exactly once). For instance, for Σ = {a,b,c}, Γ = {abc, acb, bac, bca, cab, cba}.

Theorem (Higgins, 2012) Let W be a multiset of necklaces. Then EBWT(W) ∈ Γ^{k^{n−1}} if and only if W is a de Bruijn set of span n.

A multiset W = {s_1, s_2, ..., s_m} of necklaces is a de Bruijn set of span n over an alphabet Σ if |s_1| + |s_2| + ... + |s_m| = |Σ|^n and every word u ∈ Σ^n is a prefix of some power of some word in a necklace of W.

Example Let Σ = {a,b} with a < b. Then Γ = {ab, ba}. Let n = 4, and consider the word v = baabbabaabababba ∈ Γ^8. Then EBWT^{−1}(v) = {(baaaaba), (baabbbbab)} is a de Bruijn set of span 4.
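The example can be verified directly. This Python sketch (hypothetical names) inverts the EBWT without an index set, reading one word per cycle of the permutation π described in the reversibility slides. It then checks the two conditions that define a de Bruijn set of span n.

```python
def inverse_ebwt_all(L: str) -> list[str]:
    """EBWT^{-1}(L) without indices: one word (a conjugate) per cycle of pi."""
    n = len(L)
    F = sorted(L)
    first: dict[str, int] = {}
    for pos, ch in enumerate(F):
        first.setdefault(ch, pos)
    seen: dict[str, int] = {}
    pi = []
    for ch in L:
        pi.append(first[ch] + seen.get(ch, 0))   # i-th ch in L -> i-th ch in F
        seen[ch] = seen.get(ch, 0) + 1
    psi = [0] * n                                 # inverse of pi: row of the next rotation
    for i, j in enumerate(pi):
        psi[j] = i
    visited = [False] * n
    words = []
    for start in range(n):
        if not visited[start]:
            letters, r = [], start
            while not visited[r]:
                visited[r] = True
                letters.append(F[r])              # first letter of the conjugate in row r
                r = psi[r]
            words.append("".join(letters))
    return words


def is_de_bruijn_set(words: list[str], alphabet: str, n: int) -> bool:
    """Total length |alphabet|^n, and every length-n word is a prefix of a power of a conjugate."""
    if sum(len(w) for w in words) != len(alphabet) ** n:
        return False
    covered = set()
    for w in words:
        power = w * (n // len(w) + 2)
        covered.update(power[i:i + n] for i in range(len(w)))
    return len(covered) == len(alphabet) ** n


v = "baabbabaabababba"
words = inverse_ebwt_all(v)
print(words)                               # ['aaaabab', 'aabbbbabb']
print(is_de_bruijn_set(words, "ab", 4))    # True
```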

SLIDE 43

EBWT and de Bruijn Word

Let us denote by α the element a_1a_2···a_k ∈ Γ. Consider the special case where EBWT(W) is a power of α.

Theorem (Perrin and Restivo, 2015) Let v = α^{k^{n−1}} and let S = EBWT^{−1}(v). Then S is the set of necklaces of the Lyndon words of length dividing n.

Example Let Σ = {a,b} with a < b, and consider the word α^{2^3} = (ab)^8. Let S = EBWT^{−1}((ab)^8); then S = {(a), (aaab), (aabb), (ab), (abbb), (b)}, which is the set of necklaces of the Lyndon words of length dividing 4. Note that if we concatenate these Lyndon words in increasing lexicographic order, we obtain the word

a.aaab.aabb.ab.abbb.b

which is the lexicographically first de Bruijn word of order 4. This follows from the well-known theorem of Fredricksen and Maiorana (1978).
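The Lyndon words of length dividing n can be generated in lexicographic order with Duval's variant of the Fredricksen-Kessler-Maiorana algorithm, a standard algorithm that the talk does not describe. This Python sketch (hypothetical names) builds the de Bruijn word above. It also checks the Perrin-Restivo example by inverting (ab)^8, reading one necklace per cycle of π.

```python
def lyndon_words_dividing(n: int, alphabet: str) -> list[str]:
    """Lyndon words of length dividing n, in lexicographic order (Duval's FKM variant)."""
    k = len(alphabet)
    result = []
    w = [-1]
    while w:
        w[-1] += 1                     # next candidate: increment the last letter
        m = len(w)
        if n % m == 0:                 # w is a Lyndon word; keep it if |w| divides n
            result.append("".join(alphabet[i] for i in w))
        while len(w) < n:              # extend w periodically up to length n
            w.append(w[len(w) - m])
        while w and w[-1] == k - 1:    # drop trailing maximal letters
            w.pop()
    return result


def inverse_ebwt_all(L: str) -> list[str]:
    """EBWT^{-1}(L) without indices: one word per cycle of pi."""
    n, F = len(L), sorted(L)
    first: dict[str, int] = {}
    for pos, ch in enumerate(F):
        first.setdefault(ch, pos)
    seen: dict[str, int] = {}
    psi = [0] * n
    for i, ch in enumerate(L):
        psi[first[ch] + seen.get(ch, 0)] = i   # psi is the inverse of pi
        seen[ch] = seen.get(ch, 0) + 1
    visited, words = [False] * n, []
    for start in range(n):
        letters, r = [], start
        while not visited[r]:
            visited[r] = True
            letters.append(F[r])
            r = psi[r]
        if letters:
            words.append("".join(letters))
    return words


def necklace(w: str) -> str:
    return min(w[i:] + w[:i] for i in range(len(w)))


lyndon = lyndon_words_dividing(4, "ab")
de_bruijn = "".join(lyndon)
print(lyndon)      # ['a', 'aaab', 'aabb', 'ab', 'abbb', 'b']
print(de_bruijn)   # aaaabaabbababbbb
print(sorted(necklace(w) for w in inverse_ebwt_all("ab" * 8)) == sorted(lyndon))  # True
```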

SLIDE 45

Comparing Necklaces

The EBWT is used to define an alignment-free method for comparing necklaces or sequences. For instance, let S = {u = ababccb, v = ababccc}. Then EBWT(S) = bcbbcaaaacccbb.

Sorted conjugates | EBWT | ρ(u,v)
ababccb | b | 1
ababccc | c | 1
abccbab | b |
abcccab | b |
bababcc | c | 1
babccba | a |
babccca | a |
bccbaba | a |
bcccaba | a |
cababcc | c | 1
cbababc | c |
ccababc | c |
ccbabab | b |
cccabab | b |

$$\rho(u,v) = \sum_{i=1}^{k} |c_i(u) - c_i(v)| = 4$$
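EBWT-based comparisons look at how the conjugates of u and of v interleave in the sorted list. The quantities c_i in ρ are not defined in the text, so this Python sketch (hypothetical names) does not reproduce ρ. It only computes the EBWT of {u, v} together with the origin (u or v) of each sorted conjugate, comparing conjugates by the standard fact that u^ω < v^ω iff uv < vu.

```python
from functools import cmp_to_key


def omega_cmp(x: str, y: str) -> int:
    """Compare x^ω and y^ω lexicographically via x^ω < y^ω iff xy < yx."""
    return (x + y > y + x) - (x + y < y + x)


def ebwt_with_origin(u: str, v: str) -> tuple[str, str]:
    """EBWT of {u, v} and, for each sorted conjugate, whether it comes from u or v."""
    rows = [(w[i:] + w[:i], tag) for w, tag in ((u, "u"), (v, "v")) for i in range(len(w))]
    rows.sort(key=cmp_to_key(lambda r, s: omega_cmp(r[0], s[0])))
    L = "".join(conj[-1] for conj, _ in rows)
    origin = "".join(tag for _, tag in rows)
    return L, origin


L, origin = ebwt_with_origin("ababccb", "ababccc")
print(L)       # bcbbcaaaacccbb
print(origin)  # uvuvuuvuvvuvuv
```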

SLIDE 46

Comparing Necklaces

Application to the whole mitochondrial genome phylogeny of mammals.

SLIDE 47

Comparing Necklaces

Open question Does there exist an EBWT-based similarity measure that approximates the block edit distance between two words or necklaces? Given two words u and v, the block edit distance is the minimum number of block edit operations (block copying, deletion and relocation) needed to transform u into v. Computing the block edit distance is an NP-complete problem.

SLIDE 48

Necklaces to investigate infinite words

Necklaces have recently been used to define a new complexity measure for infinite words:

◮ an extension of the Morse-Hedlund Theorem;
◮ a particular behaviour for Sturmian words.

SLIDE 52

Periodic and aperiodic words

Given an infinite word ω = ω_1ω_2··· over a finite alphabet Σ, we say that:

◮ ω is (purely) periodic if there exists a positive integer p such that ω_{i+p} = ω_i for all indices i, i.e. ω = uuu··· with |u| = p;
◮ ω is ultimately periodic if ω_{i+p} = ω_i for all sufficiently large i, i.e. ω = vuuu···;
◮ ω is aperiodic if it is not ultimately periodic.

[Figure: a purely periodic word ω drawn as consecutive blocks u of length p.]

SLIDE 55

Look at an infinite word with sliding finite windows

Fact(ω) denotes the set of factors of ω, i.e., all finite words that occur within ω. It can be used to describe the complexity of ω. The Parikh vector of a word u ∈ Σ∗ (denoted by PV(u)) is the vector whose i-th component is the number of occurrences in u of the i-th letter of the alphabet Σ. Example: the Parikh vector of ababb is (2,3). A necklace (u) occurs in a word ω if some conjugate of u appears in ω as a factor.
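A minimal Python sketch of these notions (hypothetical names). An infinite word cannot be stored, so `necklace_occurs` only inspects a given finite prefix of ω. An occurrence found there is an occurrence in ω, but not finding one proves nothing about ω.

```python
def parikh_vector(u: str, alphabet: str) -> tuple[int, ...]:
    """i-th component = number of occurrences in u of the i-th letter of the alphabet."""
    return tuple(u.count(letter) for letter in alphabet)


def necklace_occurs(u: str, omega_prefix: str) -> bool:
    """True if some conjugate of u occurs as a factor of the given finite prefix."""
    return any((u[i:] + u[:i]) in omega_prefix for i in range(len(u)))


print(parikh_vector("ababb", "ab"))     # (2, 3)
print(necklace_occurs("baa", "abab"))   # True: the conjugate aba occurs
print(necklace_occurs("bb", "abab"))    # False
```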

SLIDE 62

A classical measure of complexity: the factor complexity

Definition The factor complexity of a word ω is the function pω(n) = |Fact(ω)∩An|, i.e., the function that counts the number of distinct factors of length n of ω, for every n ≥ 0. Example (Maximal Factor Complexity) An example of word achieving maximal factor complexity over an alphabet of size k > 1 is the word that can be obtained by concatenating the k-ary expansions of non-negative integers. For example, if k = 2, the word is 011011 .

  • M. Sciortino

Circular Words in Finite and Infinite Sequences: Theory and Applications

slide-63
SLIDE 63

A classical measure of complexity: the factor complexity

Definition The factor complexity of a word ω is the function pω(n) = |Fact(ω) ∩ Aⁿ|, i.e., the function that counts the number of distinct factors of length n of ω, for every n ≥ 0.

Example (Maximal Factor Complexity) An example of a word achieving maximal factor complexity over an alphabet of size k > 1, i.e., pω(n) = kⁿ for every n ≥ 0, is the word obtained by concatenating the k-ary expansions of the non-negative integers. For example, if k = 2, the word is 0110111001011101111000···.
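A quick numerical check, assuming a finite prefix is long enough to exhibit all short factors: the sketch below (hypothetical helpers `expansion`, `concatenated_expansions` and `factor_complexity`) builds the example word for k = 2 and k = 3 and counts its length-n factors. It only illustrates pω(n) = kⁿ for small n; it proves nothing about the infinite word.

```python
# Illustrative sketch; helper names are hypothetical.
def expansion(i: int, k: int) -> str:
    """k-ary expansion of the non-negative integer i (2 <= k <= 10)."""
    if i == 0:
        return "0"
    digits = []
    while i:
        i, r = divmod(i, k)
        digits.append(str(r))
    return "".join(reversed(digits))


def concatenated_expansions(count: int, k: int) -> str:
    """Concatenation of the k-ary expansions of 0, 1, ..., count - 1."""
    return "".join(expansion(i, k) for i in range(count))


def factor_complexity(w: str, n: int) -> int:
    """Number of distinct length-n factors of the finite word w."""
    return len({w[i:i + n] for i in range(len(w) - n + 1)})


w2 = concatenated_expansions(1 << 10, 2)  # integers 0 .. 1023 in base 2
print(w2[:22])                            # 0110111001011101111000
print([factor_complexity(w2, n) for n in range(1, 7)])  # [2, 4, 8, 16, 32, 64]

w3 = concatenated_expansions(3 ** 6, 3)   # integers 0 .. 728 in base 3
print([factor_complexity(w3, n) for n in range(1, 4)])  # [3, 9, 27]
```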


slide-68
SLIDE 68

Morse and Hedlund Theorem

Factor complexity allows one to describe the exact borderline between periodicity and aperiodicity of words.

Theorem (Morse and Hedlund, 1938) A word ω is ultimately periodic if and only if its factor complexity pω(n) is bounded or, equivalently, pω(n) ≤ n for some n ≥ 1.

The Morse–Hedlund theorem is a fundamental tool in the study of discrete systems, and generalizations to higher dimensions have been studied.

Example Let us consider the periodic word ω = aabaabaabaabaab···. Its complexity function is:
pω(0) = 1, as Fact(ω) ∩ A⁰ = {ε};
pω(1) = 2, as Fact(ω) ∩ A¹ = {a, b};
pω(n) = 3 for n ≥ 2, as Fact(ω) ∩ Aⁿ contains exactly one word beginning with aa, one word beginning with ab, and one word beginning with ba.
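The example can be reproduced on a finite prefix of ω = (aab)^ω. In this sketch (hypothetical helper `factor_complexity`), the last line looks for the first n ≥ 1 with pω(n) ≤ n, the condition in the Morse–Hedlund theorem.

```python
# Illustrative sketch; the helper name is hypothetical.
def factor_complexity(w: str, n: int) -> int:
    """p(n) computed on the finite word w: number of distinct length-n factors."""
    return len({w[i:i + n] for i in range(len(w) - n + 1)})


prefix = "aab" * 100  # finite prefix standing in for the periodic word (aab)^ω
values = [factor_complexity(prefix, n) for n in range(11)]
print(values)  # [1, 2, 3, 3, 3, 3, 3, 3, 3, 3, 3]

# Morse-Hedlund criterion: first n >= 1 with p(n) <= n
print(next(n for n in range(1, 11) if values[n] <= n))  # 3
```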


slide-71
SLIDE 71

Other complexity measures

Two finite words u, v are abelian equivalent (denoted u ≈ v) if they have the same Parikh vector. Example: the words aababba and babaaba are abelian equivalent, with Parikh vector (4,3). ≈ is an equivalence relation over A∗.

Definition The abelian complexity of a word ω is the function aω(n) = |{PV(u) | u ∈ Fact(ω) ∩ Aⁿ}|, i.e., the function that counts the number of ≈-equivalence classes of factors of length n of ω, for every n ≥ 0.

Theorem (Coven and Hedlund, 1973) There exist aperiodic words with bounded abelian complexity.
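A minimal sketch of abelian equivalence and abelian complexity over {a, b} (helper names are hypothetical). It also uses a prefix of the Fibonacci word discussed later in the talk, generated here by the morphism a ↦ ab, b ↦ a, as an aperiodic word whose abelian complexity stays equal to 2 on the lengths tested, in line with the Coven and Hedlund result.

```python
# Illustrative sketch; helper names are hypothetical.
def parikh_vector(u: str, alphabet: str = "ab") -> tuple[int, ...]:
    return tuple(u.count(letter) for letter in alphabet)


def abelian_equivalent(u: str, v: str) -> bool:
    """Same multiset of letters, i.e., same Parikh vector over any alphabet."""
    return sorted(u) == sorted(v)


def abelian_complexity(w: str, n: int, alphabet: str = "ab") -> int:
    """a(n) on a finite word: number of Parikh vectors among its length-n factors."""
    return len({parikh_vector(w[i:i + n], alphabet) for i in range(len(w) - n + 1)})


def fibonacci_prefix(min_length: int) -> str:
    """Prefix of the Fibonacci word, iterating a -> ab, b -> a from 'a'."""
    w = "a"
    while len(w) < min_length:
        w = "".join("ab" if c == "a" else "a" for c in w)
    return w


print(parikh_vector("aababba"), abelian_equivalent("aababba", "babaaba"))  # (4, 3) True
fib = fibonacci_prefix(1000)
print([abelian_complexity(fib, n) for n in range(1, 11)])  # [2, 2, 2, 2, 2, 2, 2, 2, 2, 2]
```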


slide-73
SLIDE 73

The cyclic complexity

The idea is to count the distinct necklaces occurring in an infinite word.

Definition The cyclic complexity of a word ω is the function cω(n) = |{(u) | u ∈ Fact(ω) ∩ Σⁿ}|, i.e., the function that counts the number of distinct conjugacy classes of factors of length n of ω, for every n ≥ 0.
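One way to count necklaces, assumed here for simplicity, is to map each factor to its lexicographically least conjugate, which serves as a canonical representative of its conjugacy class. The sketch uses a naive quadratic rotation scan; a linear-time least-rotation algorithm such as Booth's could replace `canonical_rotation` (a hypothetical name, like `cyclic_complexity`).

```python
# Illustrative sketch; helper names are hypothetical.
def canonical_rotation(u: str) -> str:
    """Lexicographically least conjugate of u, a representative of its necklace.
    For a primitive word this is its Lyndon conjugate."""
    return min((u[i:] + u[:i] for i in range(len(u))), default=u)


def cyclic_complexity(w: str, n: int) -> int:
    """c(n) on a finite word: number of conjugacy classes among its length-n factors."""
    facts = {w[i:i + n] for i in range(len(w) - n + 1)}
    return len({canonical_rotation(u) for u in facts})


prefix = "aab" * 100
print(canonical_rotation("baa"))     # aab
print(cyclic_complexity(prefix, 3))  # 1: aab, aba and baa form a single necklace
```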


slide-76
SLIDE 76

Some Properties of the Cyclic Complexity

Remark Given a word ω, aω(n) ≤ cω(n) ≤ pω(n) for every n, since conjugate words have the same Parikh vector.

Proposition A word has maximal cyclic complexity if and only if it has maximal factor complexity.
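The chain aω(n) ≤ cω(n) ≤ pω(n) can be checked numerically on finite prefixes. In this sketch (hypothetical helper `complexities`), sorting the letters of a factor encodes its Parikh vector and its least rotation encodes its conjugacy class; the printed values are for a prefix of the Fibonacci word, generated by the morphism a ↦ ab, b ↦ a.

```python
# Illustrative sketch; helper names are hypothetical.
def complexities(w: str, n: int) -> tuple[int, int, int]:
    """(a(n), c(n), p(n)) computed from the length-n factors of the finite word w."""
    facts = {w[i:i + n] for i in range(len(w) - n + 1)}
    # sorted letters encode the Parikh vector (multiset of letters)
    abelian = {"".join(sorted(u)) for u in facts}
    # least rotation encodes the conjugacy class
    cyclic = {min((u[i:] + u[:i] for i in range(len(u))), default=u) for u in facts}
    return len(abelian), len(cyclic), len(facts)


def fibonacci_prefix(min_length: int) -> str:
    w = "a"
    while len(w) < min_length:
        w = "".join("ab" if c == "a" else "a" for c in w)
    return w


fib = fibonacci_prefix(1000)
print([complexities(fib, n) for n in range(1, 6)])
# [(2, 2, 2), (2, 2, 3), (2, 2, 4), (2, 3, 5), (2, 2, 6)]
for word in ("aab" * 100, fib):
    assert all(a <= c <= p for a, c, p in (complexities(word, n) for n in range(21)))
```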


slide-77
SLIDE 77

Example

Let us consider the periodic word ω = aabaabaabaabaab···. In the figure, the functions cω and pω are depicted.

[Figure: the functions cω(n) (curve c) and pω(n) (curve p) plotted for n = 1 to 10.]

Note that cω(3) = 1, since aab, aba and baa are conjugate to one another.
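The data behind the figure can be recomputed on a finite prefix of ω, using the same least-rotation convention for necklaces (helper names are hypothetical).

```python
# Illustrative sketch; helper names are hypothetical.
def factors(w: str, n: int) -> set[str]:
    return {w[i:i + n] for i in range(len(w) - n + 1)}


def canonical_rotation(u: str) -> str:
    return min((u[i:] + u[:i] for i in range(len(u))), default=u)


prefix = "aab" * 100  # finite prefix of (aab)^ω
ns = range(1, 11)
c_values = [len({canonical_rotation(u) for u in factors(prefix, n)}) for n in ns]
p_values = [len(factors(prefix, n)) for n in ns]
print(c_values)  # [2, 2, 1, 2, 2, 1, 2, 2, 1, 2]
print(p_values)  # [2, 3, 3, 3, 3, 3, 3, 3, 3, 3]
```

On this range, cω(n) = 1 exactly when n is a multiple of 3, since all factors of such a length are rotations of a power of aab.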

slide-78
SLIDE 78

Cyclic complexity distinguishes periodicity and aperiodicity

Extension of Morse–Hedlund Theorem: Theorem (Cassaigne, Fici, S. and Zamboni, 2014) A word ω is ultimately periodic if and only if it has bounded cyclic complexity.


slide-81
SLIDE 81

Aperiodic words with minimal factor complexity

Factor complexity provides a characterization of an important class of binary words, the so-called Sturmian words.

(Consequence of the Morse–Hedlund Theorem) If a word ω is aperiodic, then pω(n) ≥ n + 1 for every n ≥ 1.

Definition A word ω is Sturmian if and only if pω(n) = n + 1 for all n ≥ 1.

Sturmian words have very well-known combinatorial properties, for example:

Proposition A word x is Sturmian if and only if it is balanced and aperiodic. [A binary word is balanced if, for any two of its factors u, v of the same length, the numbers of occurrences of the letter a in u and in v differ by at most 1.]
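A finite-prefix sanity check of both characterizations on the Fibonacci word (generated by the morphism a ↦ ab, b ↦ a). The helpers are hypothetical, and a check on a prefix can only illustrate, not prove, that a word is Sturmian.

```python
# Illustrative sketch; helper names are hypothetical.
def fibonacci_prefix(min_length: int) -> str:
    """Prefix of the Fibonacci word, iterating a -> ab, b -> a from 'a'."""
    w = "a"
    while len(w) < min_length:
        w = "".join("ab" if c == "a" else "a" for c in w)
    return w


def factors(w: str, n: int) -> set[str]:
    return {w[i:i + n] for i in range(len(w) - n + 1)}


def is_balanced(w: str, max_length: int) -> bool:
    """Balance test restricted to the factors of length 1..max_length of w."""
    for n in range(1, max_length + 1):
        a_counts = {u.count("a") for u in factors(w, n)}
        if a_counts and max(a_counts) - min(a_counts) > 1:
            return False
    return True


fib = fibonacci_prefix(2000)
print(all(len(factors(fib, n)) == n + 1 for n in range(1, 31)))  # True: p(n) = n + 1
print(is_balanced(fib, 30))    # True
print(is_balanced("aabb", 2))  # False: aa and bb differ by two a's
```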


slide-83
SLIDE 83

Sturmian words: a geometrical construction

A Sturmian word can be defined by considering the intersections of a half-line having an irrational slope α, for instance the straight line y = αx, with a square lattice.

[Figure: the half-line y = αx with α ≈ 0.618034 crossing a square grid; each crossing is labelled a or b.]

Write b (resp. a) for every intersection with a horizontal (resp. vertical) line. The infinite sequence so obtained is a Sturmian word of slope α.

The word in the figure is the Fibonacci word, obtained with the slope α = φ − 1, where φ = (1 + √5)/2 is the golden ratio.
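The geometric construction can be simulated directly. The sketch below (hypothetical `cutting_sequence`) assumes the half-line starts at the origin and that α > 0 is irrational, so it never passes through a lattice point for x > 0. It uses floating-point arithmetic, which is adequate for the few hundred crossings computed here, and compares the result with the Fibonacci word generated by the morphism a ↦ ab, b ↦ a.

```python
import math


# Illustrative sketch; helper names are hypothetical.
def cutting_sequence(alpha: float, vertical_lines: int) -> str:
    """Letters produced by the half-line y = alpha * x (x > 0) crossing the unit grid.

    'a' marks a crossing of a vertical line x = i, 'b' a crossing of a horizontal
    line y = j. Assumes alpha > 0 is irrational.
    """
    letters = []
    for i in range(1, vertical_lines + 1):
        # horizontal lines y = j crossed for x strictly between i - 1 and i
        letters.append("b" * (math.floor(i * alpha) - math.floor((i - 1) * alpha)))
        letters.append("a")  # then the vertical line x = i
    return "".join(letters)


def fibonacci_prefix(min_length: int) -> str:
    w = "a"
    while len(w) < min_length:
        w = "".join("ab" if c == "a" else "a" for c in w)
    return w


phi = (1 + math.sqrt(5)) / 2
s = cutting_sequence(phi - 1, 300)
print(s[:18])                                  # abaababaabaababaab
print(s == fibonacci_prefix(len(s))[:len(s)])  # True
```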


slide-84
SLIDE 84

Slope and factors of Sturmian words

An important property of Sturmian words is that their factors depend only on their slope. Proposition (Morse and Hedlund, 1938) Let x, y be two Sturmian words. Then Fact(x) = Fact(y) if and only if x and y have the same slope.
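To illustrate the proposition on finite prefixes, the sketch below generates Sturmian words as mechanical words. This parametrization is an assumption of the sketch, not the construction used in the talk: the slope is given as the frequency θ of the letter b (for the cutting sequence of y = αx this frequency is α/(1 + α)), and ρ is an intercept. Two words with the same θ but different ρ share their short factors, while a different θ changes them. Helper names are hypothetical.

```python
import math


# Illustrative sketch; helper names are hypothetical.
def mechanical_word(theta: float, rho: float, length: int) -> str:
    """Lower mechanical word s_n = floor((n + 1) theta + rho) - floor(n theta + rho),
    written with 1 -> 'b' and 0 -> 'a'. For irrational theta it is Sturmian."""
    return "".join(
        "b" if math.floor((n + 1) * theta + rho) - math.floor(n * theta + rho) else "a"
        for n in range(length)
    )


def factors(w: str, n: int) -> set[str]:
    return {w[i:i + n] for i in range(len(w) - n + 1)}


theta = (3 - math.sqrt(5)) / 2                    # about 0.382, frequency of b in the Fibonacci word
x = mechanical_word(theta, 0.0, 3000)
y = mechanical_word(theta, 0.3, 3000)             # same slope, different intercept
z = mechanical_word(math.sqrt(2) - 1, 0.0, 3000)  # a different slope

print(all(factors(x, n) == factors(y, n) for n in range(1, 16)))  # True
print(factors(x, 10) == factors(z, 10))                            # False
```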


slide-86
SLIDE 86

Sturmian words with different slopes

Remark By definition, factor complexity cannot distinguish Sturmian words with different sets of factors (all of them have n + 1 factors of length n). Question What about cyclic complexity?


slide-88
SLIDE 88

Cyclic complexity distinguishes between Sturmian words with different languages

Theorem (Cassaigne, Fici, S. and Zamboni, 2014) Let x be a Sturmian word. If a word y has the same cyclic complexity as x then, up to renaming letters, y is a Sturmian word having the same slope as x.

Remark That is, not only do two Sturmian words with different languages of factors have different cyclic complexities, but the only words having the same cyclic complexity as a Sturmian word x are the Sturmian words with the same slope as x.
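Under the same mechanical-word assumption (slope given as the frequency of b), the sketch below compares two Sturmian words with different slopes: their factor complexities coincide (n + 1), while their cyclic complexities, computed with the least-rotation representative, do not. Helper names are hypothetical, and finite prefixes only illustrate the theorem.

```python
import math


# Illustrative sketch; helper names are hypothetical.
def mechanical_word(theta: float, rho: float, length: int) -> str:
    """Lower mechanical word of slope theta and intercept rho, with 1 -> 'b', 0 -> 'a'."""
    return "".join(
        "b" if math.floor((n + 1) * theta + rho) - math.floor(n * theta + rho) else "a"
        for n in range(length)
    )


def factors(w: str, n: int) -> set[str]:
    return {w[i:i + n] for i in range(len(w) - n + 1)}


def canonical_rotation(u: str) -> str:
    return min((u[i:] + u[:i] for i in range(len(u))), default=u)


def factor_complexity(w: str, n: int) -> int:
    return len(factors(w, n))


def cyclic_complexity(w: str, n: int) -> int:
    return len({canonical_rotation(u) for u in factors(w, n)})


w1 = mechanical_word((3 - math.sqrt(5)) / 2, 0.0, 3000)  # slope of the Fibonacci word
w2 = mechanical_word(math.sqrt(2) - 1, 0.0, 3000)       # another irrational slope
ns = range(1, 16)
print([factor_complexity(w1, n) for n in ns] == [factor_complexity(w2, n) for n in ns])  # True
print([cyclic_complexity(w1, n) for n in ns])
print([cyclic_complexity(w2, n) for n in ns])
```

The two cyclic-complexity lists printed last differ, while the factor complexities agree.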


slide-90
SLIDE 90

Cyclic complexity and Sturmian words

The cyclic complexity of a Sturmian word is unbounded, but it takes the value 2 for infinitely many n.

Proposition Let x be a Sturmian word. Then cx(n) = 2 if and only if n = 1 or there exists a bispecial factor of x of length n − 2. [A factor u of a binary word x is bispecial if ua, ub, au and bu are all factors of x.]

Example The cyclic complexity of the Fibonacci word F = abaababaabaababaab··· is depicted in the figure.

[Figure: the function c_F(n) plotted for n = 1 to 22.]
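The plotted values can be recomputed from a prefix of F generated by the morphism a ↦ ab, b ↦ a (helper names are hypothetical). The last line lists the n for which c_F(n) = 2; by the proposition these are n = 1 and the values n = |u| + 2 for the bispecial factors u of F, which here turn out to be Fibonacci numbers.

```python
# Illustrative sketch; helper names are hypothetical.
def fibonacci_prefix(min_length: int) -> str:
    w = "a"
    while len(w) < min_length:
        w = "".join("ab" if c == "a" else "a" for c in w)
    return w


def factors(w: str, n: int) -> set[str]:
    return {w[i:i + n] for i in range(len(w) - n + 1)}


def canonical_rotation(u: str) -> str:
    return min((u[i:] + u[:i] for i in range(len(u))), default=u)


F = fibonacci_prefix(2000)
c_F = {n: len({canonical_rotation(u) for u in factors(F, n)}) for n in range(1, 23)}
print(c_F)
print([n for n, value in c_F.items() if value == 2])  # [1, 2, 3, 5, 8, 13, 21]
```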


slide-95
SLIDE 95

Further works on Cyclic Complexity

Aperiodic words
The value 2 for the cyclic complexity is the minimum possible for aperiodic words. Sturmian words have minimal cyclic complexity, but there exist non-Sturmian aperiodic words with minimal cyclic complexity. Characterizing the aperiodic words with minimal cyclic complexity is still an open problem.

Cyclic complexity and languages
The cyclic complexity can be naturally extended to any factorial language. It is invariant under several operations on languages, e.g., isomorphism and reverse image.

If x and y have the same cyclic complexity, what can be said about their languages of factors? There exist two periodic words with the same cyclic complexity whose languages of factors are neither isomorphic nor related by reverse image. Even two aperiodic words can have the same cyclic complexity but different languages of factors. Which additional hypotheses (linear complexity, for instance) are needed?
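For a factorial language given as Fact(w) of a finite word w, the invariance can be checked directly. In this sketch (hypothetical helpers), "reverse image" is assumed to mean the language of reversed words, and the isomorphism is the letter exchange a ↔ b; both maps send conjugacy classes bijectively to conjugacy classes, which is why the counts agree.

```python
# Illustrative sketch; helper names are hypothetical.
def factorial_closure(w: str) -> set[str]:
    """Fact(w) for a finite word w: all its factors, including the empty word."""
    return {w[i:j] for i in range(len(w) + 1) for j in range(i, len(w) + 1)}


def canonical_rotation(u: str) -> str:
    return min((u[i:] + u[:i] for i in range(len(u))), default=u)


def cyclic_complexity_of_language(language: set[str], n: int) -> int:
    """Number of conjugacy classes among the words of length n in the language."""
    return len({canonical_rotation(u) for u in language if len(u) == n})


def fibonacci_prefix(min_length: int) -> str:
    w = "a"
    while len(w) < min_length:
        w = "".join("ab" if c == "a" else "a" for c in w)
    return w


language = factorial_closure(fibonacci_prefix(100)[:100])
swap = str.maketrans("ab", "ba")
reversed_language = {u[::-1] for u in language}         # reverse image (assumed meaning)
renamed_language = {u.translate(swap) for u in language}  # isomorphism a <-> b

for n in range(16):
    c = cyclic_complexity_of_language(language, n)
    assert c == cyclic_complexity_of_language(reversed_language, n)
    assert c == cyclic_complexity_of_language(renamed_language, n)
print("cyclic complexity unchanged for n = 0 to 15")
```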


slide-96
SLIDE 96

Necklaces and Pattern Matching Problems

Two different (but related) problems: Circular Pattern Matching and Circular Dictionary Matching.


slide-97
SLIDE 97

Circular Pattern Matching Problem

Definition Let Σ be a finite alphabet of size σ. Given a text T ∈ Σⁿ, find all the occurrences in T of all conjugates of a pattern P ∈ Σᵐ.

O(n + m) - Gusfield, 1997.
O(n log σ) - Lin and Adjeroh, 2012.
Sublinear - Chen, Huang and Lee, 2012.
O(n log_σ m / m) on average - Fredriksson and Grabowski, 2009.

Approximate version of the problem, where k mismatches are allowed:
O(knm²) - Lin and Adjeroh, 2012.
O((k + log_σ m) n/m) on average - Fredriksson and Navarro, 2004.
O((k + log_σ m) n/m) on average, with reduced processing time and space requirements - Barton, Iliopoulos and Pissis, 2015.
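For reference, a naive baseline (not any of the algorithms cited above) follows from a standard observation: a word of length m is a conjugate of P exactly when it is a factor of PP. The sketch below (hypothetical function names) applies this to every window of T, and a brute-force k-mismatch variant compares each window with every rotation of P.

```python
# Illustrative sketch; function names are hypothetical.
def circular_occurrences(text: str, pattern: str) -> list[int]:
    """Positions i such that text[i:i + m] is a conjugate of pattern (naive baseline).

    A word of length m is a conjugate of pattern exactly when it is a factor of
    pattern + pattern, so each window is tested against the doubled pattern.
    """
    m = len(pattern)
    doubled = pattern + pattern
    return [i for i in range(len(text) - m + 1) if text[i:i + m] in doubled]


def circular_occurrences_k_mismatches(text: str, pattern: str, k: int) -> list[int]:
    """Positions whose window has Hamming distance <= k from some conjugate of pattern."""
    m = len(pattern)
    rotations = {pattern[r:] + pattern[:r] for r in range(m)} or {pattern}
    return [
        i for i in range(len(text) - m + 1)
        if any(sum(x != y for x, y in zip(text[i:i + m], rot)) <= k for rot in rotations)
    ]


print(circular_occurrences("abaababaab", "aab"))                  # [0, 1, 2, 3, 5, 6, 7]
print(circular_occurrences_k_mismatches("abaababaab", "aab", 1))  # [0, 1, 2, 3, 4, 5, 6, 7]
```

The window at position 4, bab, is not a conjugate of aab, but it is within one mismatch of aab itself.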


slide-98
SLIDE 98

Circular Dictionary Matching Problem

Definition Let Σ be a finite alphabet of size σ. Given a multiset D of patterns, i.e., strings over Σ, of total length n, find the occurrences of all conjugates of the patterns in a text T.

O((|T| + occ) log^(1+ε) n), with constraints on the length of the patterns - Hon, Lu, Shah and Thankachan, 2011.

The EBWT and the circular suffix tree are connected. This bound is not optimal!
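A naive baseline for the dictionary version, again not the cited indexed solution: group the patterns by length and by least rotation, then look up the least rotation of every window of T. Function names are hypothetical; repeated patterns in the multiset D keep separate indices, and the cost is roughly quadratic in the pattern length for each window.

```python
from collections import defaultdict


# Illustrative sketch; function names are hypothetical.
def canonical_rotation(u: str) -> str:
    return min((u[i:] + u[:i] for i in range(len(u))), default=u)


def circular_dictionary_matching(text: str, patterns: list[str]) -> list[tuple[int, int]]:
    """Sorted pairs (position, pattern index) such that the window of text starting
    at position, with the pattern's length, is a conjugate of that pattern."""
    by_length = defaultdict(lambda: defaultdict(list))
    for index, p in enumerate(patterns):
        by_length[len(p)][canonical_rotation(p)].append(index)
    matches = []
    for length, table in by_length.items():
        for pos in range(len(text) - length + 1):
            for index in table.get(canonical_rotation(text[pos:pos + length]), []):
                matches.append((pos, index))
    return sorted(matches)


print(circular_dictionary_matching("abaab", ["ab", "aab"]))
# [(0, 0), (0, 1), (1, 0), (1, 1), (2, 1), (3, 0)]
```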


slide-99
SLIDE 99

Thank you
