slide-1
SLIDE 1

The Expected Number of Repetitions in Random Words

Arseny M. Shur

Ural Federal University, Ekaterinburg, Russia

Shanghai, April 25, 2015

  • A. M. Shur (Ural Federal U)

Expected Number of Repetitions Shanghai, April 25, 2015 1 / 19

slide-3
SLIDE 3

Combinatorics on Words

A discipline that studies properties of sequences of symbols

Born: 1906

  • A. Thue. Über unendliche Zeichenreihen. Norske vid. Selsk. Skr. Mat. Nat. Kl. 7, 1–22 (1906)

Named: 1983

  • M. Lothaire. Combinatorics on Words. Vol. 17 of Encyclopedia of Mathematics and Its Applications (1983)

Sources:

  • Algebra (terms)
  • Symbolic dynamics (trajectories)
  • Grammars and rewriting systems
  • Algorithms for sequential data
  • Biological strings
  • . . .

slide-6
SLIDE 6

Palindromes and Squares

A palindrome is a word which is equal to its reversal.

[Figure: an example palindrome spelled out letter by letter.]

Palindromes are among the simplest and most common repetitions in words, along with squares, which are words consisting of two equal parts.

[Figure: an example square spelled out letter by letter.]

Palindromes are in some sense counterparts of squares:

  • in a sequence of states of some finite-state machine, a square indicates repeated behaviour, while a palindrome shows that the machine reversed back to front;
  • among the basic data structures, palindromes correspond to stacks, while squares correspond to queues;
  • as a consequence, the language of all palindromes is context-free, while the language of all squares is not.
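The two definitions can be illustrated with a minimal Python sketch. The function names `is_palindrome` and `is_square` and the sample words are chosen here for illustration and are not taken from the talk.

```python
def is_palindrome(word: str) -> bool:
    """A word is a palindrome if it is equal to its reversal."""
    return word == word[::-1]


def is_square(word: str) -> bool:
    """A word is a square if it is non-empty and consists of two equal parts."""
    half, remainder = divmod(len(word), 2)
    return len(word) > 0 and remainder == 0 and word[:half] == word[half:]


if __name__ == "__main__":
    # Illustrative words only: two palindromes and two squares.
    for w in ["abba", "aba", "abab", "cuscus"]:
        print(w, is_palindrome(w), is_square(w))
```

The empty word is treated as a palindrome but not as a square here; that convention is an assumption of this sketch.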

slide-8
SLIDE 8

Counting Factors

We consider finite words over finite (k-letter) alphabets; we write w = w[1..n] for a word of length n; words of the form w[i..j] are factors of w. Normally, n is assumed large, and k is fixed.

There are many results on the possible number of distinct palindromic factors and square factors in a word:

  • max number of palindromes is n (Droubay, Pirillo, 2001)
  • max number of squares is between n − O(√n) and 2n − O(log n) (Ilie, 2007)
  • min number of palindromes is k for k ≥ 3 and 8 for k = 2, n ≥ 9
  • min number of squares is 0 for k ≥ 3 (Thue, 1912) and 3 for k = 2 (Fraenkel, Simpson, 1995)
  • any number of palindromes between min and max is available
  • for k ≥ 4, a word can contain k palindromes and 0 squares simultaneously
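For very short words, the extreme values quoted above can be explored by exhaustive search. The following is a brute-force sketch, feasible only for small n; the helper names `distinct_palindromes`, `distinct_squares` and `extremes` are hypothetical and introduced here for illustration.

```python
from itertools import product


def distinct_palindromes(word: str) -> set:
    """All distinct non-empty palindromic factors w[i..j] of the word."""
    n = len(word)
    return {word[i:j] for i in range(n) for j in range(i + 1, n + 1)
            if word[i:j] == word[i:j][::-1]}


def distinct_squares(word: str) -> set:
    """All distinct non-empty square factors (two equal adjacent halves)."""
    n = len(word)
    return {word[i:i + 2 * h] for h in range(1, n // 2 + 1)
            for i in range(n - 2 * h + 1)
            if word[i:i + h] == word[i + h:i + 2 * h]}


def extremes(n: int, alphabet: str, factors):
    """Minimum and maximum of len(factors(word)) over all words of length n."""
    values = [len(factors("".join(t))) for t in product(alphabet, repeat=n)]
    return min(values), max(values)


if __name__ == "__main__":
    # Exhaustive search over all 2^9 binary words of length 9.
    print("palindromes (min, max):", extremes(9, "ab", distinct_palindromes))
    print("squares     (min, max):", extremes(9, "ab", distinct_squares))
```

The search enumerates all k^n words, so it is only meant as a sanity check of the bounds for small parameters.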

slide-11
SLIDE 11

Problems and Simple Answers

Problems

Find the expected number of

  • distinct palindromes
  • distinct squares

occurring as factors in a random k-ary word.

Theorem

The expected number of distinct palindromic factors in a random word of length n over a fixed nontrivial alphabet is Θ(√n).

As a by-product of the technique used, we also get

Theorem

The expected number of distinct square factors in a random word of length n over a fixed nontrivial alphabet is Θ(√n).
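The Θ(√n) behaviour for palindromes can be probed empirically. Below is a small Monte Carlo sketch, assuming uniformly random words over an alphabet of at most 26 lowercase letters; the function names, the number of trials and the seed are illustrative choices, not part of the talk.

```python
import math
import random


def count_palindromic_factors(word: str) -> int:
    """Count distinct palindromic factors by expanding around every centre."""
    n = len(word)
    found = set()
    for centre in range(2 * n - 1):
        lo = centre // 2
        hi = lo + centre % 2      # lo == hi: odd length; hi == lo + 1: even length
        while lo >= 0 and hi < n and word[lo] == word[hi]:
            found.add(word[lo:hi + 1])
            lo -= 1
            hi += 1
    return len(found)


def estimate_ratio(n: int, k: int, trials: int, seed: int = 0) -> float:
    """Monte Carlo estimate of E(n, k) / sqrt(n) for uniformly random words."""
    rng = random.Random(seed)
    alphabet = [chr(ord("a") + i) for i in range(k)]   # assumes k <= 26
    total = 0
    for _ in range(trials):
        word = "".join(rng.choice(alphabet) for _ in range(n))
        total += count_palindromic_factors(word)
    return total / trials / math.sqrt(n)


if __name__ == "__main__":
    for n in (1000, 4000, 16000):
        print(n, estimate_ratio(n, k=2, trials=5))
```

If the theorem holds, the printed ratios should stay within a bounded range as n grows rather than drift to zero or infinity.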

slide-13
SLIDE 13

Some Explanations

Let k (the alphabet size) be fixed; E(n) is the expectation studied. The expected number $E_m(n)$ of distinct palindromic factors of length m in a random word of length n is not greater than

  ⋆ the total number of k-ary palindromes of length m (blue graph);
  ⋆ the expected number of occurrences of palindromic factors of length m in a random word of length n (red graph).

[Figure: two plots against m. Length $2m$: the graphs $k^m$ and $(n-2m+1)/k^m$, crossing at $m = p_e$. Length $2m+1$: the graphs $k^{m+1}$ and $(n-2m)/k^m$, crossing at $m = p_o$.]

slide-17
SLIDE 17

Some Explanations (Ctd)

[Figure: the two pairs of graphs for lengths $2m$ and $2m+1$, crossing at $m = p_e$ and $m = p_o$ respectively.]

  • $E(n) = \sum_m E_m(n)$ is bounded by the total area under the graphs;
  • since all graphs are those of exponential functions, the area under each pair of graphs equals the height of the highest point up to a constant multiple;
  • thus, E(n) = O(√n);
  • some additional considerations show that the upper bound is sharp up to a constant multiple, implying E(n) = Θ(√n).
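The "area" argument can be spelled out for even lengths. The derivation below is a sketch under the assumption that $p_e$ is the real number with $k^{p_e} = \sqrt{n}$, that is, the crossing point of the two graphs when the correction $-2m+1$ is ignored; the odd-length case is analogous, with peak $\sqrt{kn}$.

```latex
\begin{align*}
\sum_{m \ge 1} \min\!\Big(k^m,\ \frac{n-2m+1}{k^m}\Big)
  &\le \sum_{m=1}^{\lfloor p_e \rfloor} k^m
     \;+\; \sum_{m > \lfloor p_e \rfloor} \frac{n}{k^m} \\
  &< \frac{k}{k-1}\, k^{\lfloor p_e \rfloor}
     \;+\; \frac{k}{k-1} \cdot \frac{n}{k^{\lfloor p_e \rfloor + 1}} \\
  &\le \frac{k}{k-1}\, k^{p_e} \;+\; \frac{k}{k-1} \cdot \frac{n}{k^{p_e}}
   \;=\; \frac{2k}{k-1}\,\sqrt{n}.
\end{align*}
```

Both geometric sums are dominated by their largest term, which is why the total is a constant multiple of the peak height.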

slide-21
SLIDE 21

Dependence on k

Refinement of the obtained result: consider E(n, k) instead of E(n) and find the dependence of the constant in the Θ(√n) expression on k.

  • intuition: more letters – more luck needed to get a palindrome;
  • broken by the picture: the peak on the right graph is ≈ √(kn);
  • is E(n, k) = Θ(√(kn))? Not so easy.

slide-27
SLIDE 27

Dependence on k (Ctd)

[Figure: the two pairs of graphs for lengths $2m$ and $2m+1$, crossing at $m = p_e$ and $m = p_o$ respectively.]

  • If $p_o$ is an integer [half-integer], we get $\sqrt{kn}$ [$2\sqrt{n}$]
  • similar for $p_e$, but the values are $\sqrt{k}$ times less
  • note that $p_e \approx p_o + 1/2$ (in fact, $p_e = p_o + 1/2$!)

◮ the upper bound oscillates between the orders of $\sqrt{n}$ and $\sqrt{kn}$

◮ what’s next?
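The oscillation of the upper bound can be observed numerically. The sketch below sums, over all lengths, the smaller of the two quantities plotted on the graphs (number of palindromes versus expected number of occurrences) and divides by √n. The function names and the cut-off `1e-9` for negligible terms are assumptions of this illustration.

```python
import math


def upper_bound(n: int, k: int) -> float:
    """Sum over palindrome lengths of min(#palindromes, expected #occurrences)."""
    total = 0.0
    m = 1
    while 2 * m <= n:                          # even lengths 2m
        occurrences = (n - 2 * m + 1) / k ** m
        if occurrences < 1e-9:                 # remaining terms are negligible
            break
        total += min(k ** m, occurrences)
        m += 1
    m = 0
    while 2 * m + 1 <= n:                      # odd lengths 2m + 1
        occurrences = (n - 2 * m) / k ** m
        if occurrences < 1e-9:
            break
        total += min(k ** (m + 1), occurrences)
        m += 1
    return total


def ratio(n: int, k: int) -> float:
    """Upper bound divided by sqrt(n)."""
    return upper_bound(n, k) / math.sqrt(n)


if __name__ == "__main__":
    k = 10
    for j in range(4, 11):
        # n = k^j: p_e is an integer for even j, p_o is an integer for odd j.
        print(j, round(ratio(k ** j, k), 3))
```

For a fixed k, the printed ratio should alternate between a lower level (even j) and a higher level (odd j) instead of converging.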

slide-30
SLIDE 30

Balls and Bins

To get more intuition, suppose (even if this is not true) that for a random k-ary word of length n all events of type “to contain a given palindrome of length m” are independent and equiprobable.

Balls: palindromic factors of length m of our random word

  $w[i_1..j_1]$, $w[i_2..j_2]$, $w[i_3..j_3]$, $w[i_4..j_4]$, $w[i_5..j_5]$, · · ·, $w[i_s..j_s]$

Bins: distinct palindromes of length m

  aaaaa, aabaa, ababa, · · ·, bbbbb

Folklore Proposition

For N bins and CN balls, the expected number of empty bins is $\approx N e^{-C}$.
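The folklore proposition is easy to check by simulation. The sketch below assumes that each ball lands in a uniformly random bin independently of the others; the function names and parameters are illustrative.

```python
import math
import random


def exact_expected_empty(bins: int, balls: int) -> float:
    """Each bin stays empty with probability (1 - 1/bins)**balls."""
    return bins * (1 - 1 / bins) ** balls


def simulated_empty(bins: int, balls: int, trials: int, seed: int = 0) -> float:
    """Average number of empty bins over several independent experiments."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        hit = {rng.randrange(bins) for _ in range(balls)}
        total += bins - len(hit)
    return total / trials


if __name__ == "__main__":
    N, C = 1000, 2
    print("simulated :", simulated_empty(N, C * N, trials=20))
    print("exact     :", exact_expected_empty(N, C * N))
    print("N * e^-C  :", N * math.exp(-C))
```

The number of non-empty bins, N minus the number of empty ones, plays the role of the number of distinct palindromes of length m in this model.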

slide-34
SLIDE 34

Testing The Model

Conjecture

The function E(n, k) oscillates between its maximums

\[ E(n, k) = \left(1 - \frac{1}{e} + \frac{4}{k-1} - \frac{k+1}{2(k^3-1)} - O\!\left(\frac{1}{k e^k}\right)\right)\sqrt{kn} + O\!\left(\frac{\sqrt{k}\,\log n}{\sqrt{n}}\right) \]

for integer $p_o$ and minimums

\[ E(n, k) = \left(3 - \frac{1}{e} + \frac{4}{k-1} - \frac{k^2+1}{2(k^3-1)} - O\!\left(\frac{1}{e^k}\right)\right)\sqrt{n} + O\!\left(\frac{\sqrt{k}\,\log n}{\sqrt{n}}\right) \]

for integer $p_e$.

A large amount of experimental data for $C(k) = E(n, k)/\sqrt{n}$ for different k and $n \sim 10^6$ to $10^8$ was obtained by M. Rubinchik, with the use of a novel data structure (palindromic tree).

His data

  • supports the conjecture in general
  • cannot definitely tell whether the coefficients are correct

One of the problems: for small alphabets, the suggested maximums and minimums are almost indistinguishable.
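For readers unfamiliar with the palindromic tree, here is a compact textbook-style sketch of the data structure, used only to count distinct palindromic factors. It is not the code used for the experiments mentioned above; the function name `palindromic_tree_count` and the array layout are choices made for this illustration.

```python
def palindromic_tree_count(s: str) -> int:
    """Number of distinct non-empty palindromic factors of s."""
    # Node 0: imaginary root (length -1); node 1: empty palindrome (length 0).
    length = [-1, 0]
    link = [0, 0]            # suffix links: longest proper palindromic suffix
    edges = [{}, {}]         # edges[v][c] = node of the palindrome c + v + c
    last = 1                 # longest palindromic suffix of the prefix read so far
    for i, c in enumerate(s):
        # Find the longest palindromic suffix X such that c X c ends at i.
        cur = last
        while True:
            j = i - length[cur] - 1
            if j >= 0 and s[j] == c:
                break
            cur = link[cur]
        if c in edges[cur]:              # c X c has already been seen
            last = edges[cur][c]
            continue
        new = len(length)
        length.append(length[cur] + 2)
        edges.append({})
        if length[new] == 1:             # a single letter links to the empty word
            link.append(1)
        else:
            p = link[cur]
            while True:
                j = i - length[p] - 1
                if j >= 0 and s[j] == c:
                    break
                p = link[p]
            link.append(edges[p][c])
        edges[cur][c] = new
        last = new
    return len(length) - 2               # exclude the two roots


if __name__ == "__main__":
    print(palindromic_tree_count("aabaabaa"))
```

Each letter creates at most one new node, so the structure is built in time linear in the length of the word (up to the cost of dictionary lookups), which is what makes experiments with long words practical.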

slide-38
SLIDE 38

Analysis

Bad news: our assumption was totally wrong, because the events “to contain a given palindrome of length m” are dependent and have different probabilities. For example, aaa · · · aaa is less probable than baa · · · aab, and each of them “suppresses” the other.

Why were the predictions with balls and bins good?

Good news: the probabilities for all words of length m are quite close (any word of length m is more probable as a factor than any word of length m+1); moreover, most of the palindromes have almost the same (or even exactly the same) probability. There is also a way to avoid considering dependencies.

Approach: the theory of factor avoidance.

slide-40
SLIDE 40

Forget about dependence

A word u avoids a word w if w is not a factor of u.

Lemma

\[ E(n, k, m) = \sum_{|w| = m,\ w \in P} \left(1 - \frac{A_w(n)}{k^n}\right), \]

where

  • $E(n, k, m)$ is the expected number of distinct palindromes of length m in a random word of length n
  • $P$ is the set of all palindromes
  • $A_w(n)$ is the number of words of length n avoiding the word w (over a fixed k-letter alphabet)

Since $E(n, k) = \sum_{m=1}^{n/2} E(n, k, m)$, all we need is good asymptotics for $A_w(n)$.
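The lemma can be verified exactly for small parameters. The sketch below counts the words avoiding w with a dynamic programme over a pattern-matching automaton (one possible way to obtain the number of avoiding words, chosen here for simplicity) and compares the resulting sum with a direct enumeration of all words. All function names are hypothetical.

```python
from fractions import Fraction
from itertools import product


def palindromes(m: int, alphabet: str):
    """Generate every palindrome of length m >= 1 over the alphabet once."""
    for half in product(alphabet, repeat=(m + 1) // 2):
        left = "".join(half)
        yield left + left[::-1][m % 2:]


def count_avoiding(w: str, n: int, alphabet: str) -> int:
    """Number of words of length n over the alphabet that avoid w."""
    m = len(w)

    def step(state: int, c: str) -> int:
        # Longest prefix of w that is a suffix of w[:state] + c.
        t = w[:state] + c
        for j in range(min(m, len(t)), -1, -1):
            if t.endswith(w[:j]):
                return j

    counts = [1] + [0] * (m - 1)       # counts[s]: words ending in state s
    for _ in range(n):
        new = [0] * m
        for s, cnt in enumerate(counts):
            if cnt:
                for c in alphabet:
                    t = step(s, c)
                    if t < m:          # state m would mean that w occurred
                        new[t] += cnt
        counts = new
    return sum(counts)


def expected_by_lemma(n: int, alphabet: str, m: int) -> Fraction:
    k = len(alphabet)
    return sum((1 - Fraction(count_avoiding(w, n, alphabet), k ** n)
                for w in palindromes(m, alphabet)), Fraction(0))


def expected_by_enumeration(n: int, alphabet: str, m: int) -> Fraction:
    k = len(alphabet)
    total = 0
    for letters in product(alphabet, repeat=n):
        u = "".join(letters)
        factors = {u[i:i + m] for i in range(n - m + 1)}
        total += sum(1 for f in factors if f == f[::-1])
    return Fraction(total, k ** n)


if __name__ == "__main__":
    print(expected_by_lemma(8, "ab", 3), expected_by_enumeration(8, "ab", 3))
```

The two values agree because the lemma is just linearity of expectation: each palindrome contributes the probability that a random word does not avoid it, and no independence is needed.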

slide-41
SLIDE 41

Number of words avoiding a single factor

  • A word u is a border of a word w if u is both a prefix and a suffix of w (including the case u = w).
  • With a word w of length m we associate its border array, which is a word $\hat{w}[1..m]$ over {0, 1} such that $\hat{w}[i] = 1$ if and only if w has a border of length $m - i + 1$.
  • The border array can be interpreted as the array of coefficients of a real-valued border polynomial $f_w(x)$ such that $\hat{w}[i]$ is the coefficient of $x^{m-i}$. Since $\hat{w}[1] = 1$, this polynomial has degree $m - 1$.

Example

The word w = aabaabaa has non-empty borders w, aabaa, aa, and a.

  • Its border array $\hat{w}$ equals 10010011.
  • Its border polynomial is $f_w(x) = x^7 + x^4 + x + 1$.
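The border array and the border polynomial are straightforward to compute. The following sketch follows the definitions literally (quadratic time, which is enough for illustration); the function names are introduced here and are not from the talk.

```python
def border_array(w: str) -> str:
    """Binary word whose i-th letter (1-based) is 1 iff w has a border of length m-i+1."""
    m = len(w)
    # With 0-based i, the candidate border has length m - i.
    return "".join("1" if w[:m - i] == w[i:] else "0" for i in range(m))


def border_polynomial(w: str, x):
    """Value f_w(x), using the border array as coefficients (Horner's rule)."""
    value = 0
    for bit in border_array(w):
        value = value * x + int(bit)
    return value


if __name__ == "__main__":
    w = "aabaabaa"
    print(border_array(w))            # the example from the slide
    print(border_polynomial(w, 2))    # the border array read as a binary number
```

Evaluating the polynomial at x = 2 reads the border array as an integer written in binary, which gives a convenient way to compare border arrays.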

slide-44
SLIDE 44

Number of words avoiding a single factor (2)

Recall that $f_w$ is the border polynomial of w; below it is evaluated at $x = k$.

Theorem (Guibas, Odlyzko, 1978, 1981)

1) The number $A_w(n)$ of words of length n avoiding a given word w of length m > 3 is

\[ A_w(n) = C_w \theta_w^n + O(1.7^n), \]

where

\[ \theta_w = k - \frac{1}{f_w(k)} - \frac{f_w'(k)}{f_w^3(k)} - O\!\left(\frac{m^2}{k^{3m}}\right), \qquad C_w = \frac{1}{1 - (k - \theta_w)^2 f_w'(\theta_w)}. \]

2) The condition $f_u(k) < f_w(k)$ implies $A_u(n) \le A_w(n)$ for all $n \ge 0$ and, in particular, $\theta_u \le \theta_w$.

  • $f_u(k) < f_w(k)$ iff $\hat{u} < \hat{w}$ as integers written in binary
  • W.h.p., if w is a palindrome, then $\hat{w} = 10 \cdots 0z$, where $|z| = O(\log |w|)$
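The growth rate in part 1) can be checked numerically for a concrete word. The sketch below rests on two assumptions that are not spelled out on the slide: that the counts satisfy the linear recurrence read off from the standard generating function c(z) / (z^m + (1 − kz) c(z)), where c is the autocorrelation polynomial of w, and that θ_w is the root of (x − k) f_w(x) + 1 = 0 closest to k. Function names are illustrative.

```python
def border_coefficients(w: str) -> list:
    """Coefficients of f_w from degree m-1 down to 0 (the border array)."""
    m = len(w)
    return [1 if w[:m - i] == w[i:] else 0 for i in range(m)]


def avoiding_counts(w: str, k: int, n_max: int) -> list:
    """A_w(0), ..., A_w(n_max) via the recurrence from the generating function."""
    m = len(w)
    c = border_coefficients(w)        # c[j] = 1 iff w has a border of length m - j
    a = []
    for n in range(n_max + 1):
        value = c[n] if n < m else 0
        for j in range(m):
            if c[j]:
                if n - 1 - j >= 0:
                    value += k * a[n - 1 - j]
                if j >= 1 and n - j >= 0:
                    value -= a[n - j]
        if n - m >= 0:
            value -= a[n - m]
        a.append(value)
    return a


def theta_root(w: str, k: int, iterations: int = 100) -> float:
    """Fixed-point iteration x = k - 1/f_w(x), started at x = k."""
    coeffs = border_coefficients(w)
    x = float(k)
    for _ in range(iterations):
        fx = 0.0
        for bit in coeffs:
            fx = fx * x + bit
        x = k - 1 / fx
    return x


def theta_approx(w: str, k: int) -> float:
    """Two-term approximation k - 1/f_w(k) - f_w'(k)/f_w(k)^3."""
    coeffs = border_coefficients(w)
    m = len(w)
    f = sum(bit * k ** (m - 1 - i) for i, bit in enumerate(coeffs))
    df = sum(bit * (m - 1 - i) * k ** (m - 2 - i)
             for i, bit in enumerate(coeffs) if m - 1 - i >= 1)
    return k - 1 / f - df / f ** 3


if __name__ == "__main__":
    w, k = "aabaabaa", 2
    a = avoiding_counts(w, k, 400)
    print("ratio A(400)/A(399):", a[400] / a[399])
    print("root of the equation:", theta_root(w, k))
    print("two-term approximation:", theta_approx(w, k))
```

The three printed numbers should agree to several decimal places, the last one up to the error term of the theorem.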

slide-46
SLIDE 46

Estimating E(n, k)

Since almost all palindromes of length m have “almost equal” border polynomials, we can derive the following formula:

\[ E(n, k, m) = \begin{cases} k^{\varepsilon}\left(1 - e^{-1/k^{2\varepsilon}}\right)\sqrt{n} + O\!\left(\dfrac{\log n}{\sqrt{n}}\right), & m \text{ is even}, \\[2ex] k^{\varepsilon}\left(1 - e^{-1/k^{2\varepsilon}}\right)\sqrt{kn} + O\!\left(\dfrac{\log n}{\sqrt{n}}\right), & m \text{ is odd}, \end{cases} \]

where $m = 2(p_e + \varepsilon) = 2(p_o + \varepsilon) + 1$.

The function $g(x) = x(1 - e^{-1/x^2})$, appearing as the coefficient (for $x = k^{\varepsilon}$), has a tricky behaviour. In particular,

  • $g(1) = 1 - 1/e \approx 0.6321$, but this is not the maximum value!
  • $\max_{x>0} g(x) \approx 0.6382$ is reached at $x \approx 0.8921$
  • thus, the coefficients suggested by the balls-and-bins approach were slightly incorrect
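The location of the maximum of g can be reproduced with a few lines of Python. The sketch below uses a plain ternary search, assuming g is unimodal on the chosen interval [0.1, 5]; the search interval and the helper name `argmax_ternary` are illustrative choices.

```python
import math


def g(x: float) -> float:
    """The coefficient function g(x) = x * (1 - exp(-1 / x^2))."""
    return x * (1 - math.exp(-1 / x ** 2))


def argmax_ternary(func, lo: float, hi: float, iterations: int = 200) -> float:
    """Maximiser of a unimodal function on [lo, hi] by ternary search."""
    for _ in range(iterations):
        a = lo + (hi - lo) / 3
        b = hi - (hi - lo) / 3
        if func(a) < func(b):
            lo = a
        else:
            hi = b
    return (lo + hi) / 2


if __name__ == "__main__":
    x_star = argmax_ternary(g, 0.1, 5.0)
    print("argmax:", x_star)
    print("max   :", g(x_star))
    print("g(1)  :", g(1.0))
```

The maximiser satisfies $e^{t} = 1 + 2t$ with $t = 1/x^2$, which has no closed form; this is why the constant is given only numerically.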

slide-48
SLIDE 48

The final result

Theorem

Let k ≥ 2.

(1) The expected palindromic richness $E(n, k)$ of a random k-ary word of length n is $\Theta(\sqrt{n})$ as $n \to \infty$ with k fixed.

(2) The ratio $E(n, k)/\sqrt{n}$ has no limit as $n \to \infty$ with k fixed.

(3) The function $\underline{C}(k) = \liminf_{n \to \infty} E(n, k)/\sqrt{n}$ is $\Theta(1)$ as $k \to \infty$.

(4) The function $\overline{C}(k) = \limsup_{n \to \infty} E(n, k)/\sqrt{n}$ is $\Theta(\sqrt{k})$ as $k \to \infty$.

(5) $\lim_{k \to \infty} \underline{C}(k) = 3 - 1/e$.

(6) $\lim_{k \to \infty} \overline{C}(k)/\sqrt{k} = \chi$, where $\chi \approx 0.6382$ is the maximum of the function $f(x) = x(1 - e^{-1/x^2})$ in the interval $(0, \infty)$.

Some particular values:

  • $\underline{C}(2) \approx 6.17315$, $\overline{C}(2) = 6.17368$
  • $\underline{C}(3) \approx 4.40121$, $\overline{C}(3) = 4.41410$
  • $\underline{C}(10) \approx 3.02693$, $\overline{C}(10) = 3.41133$
  • $\underline{C}(50) \approx 2.70152$, $\overline{C}(50) = 5.09183$

slide-49
SLIDE 49

Squares

Forget about squares. They are much like even-length palindromes. The only slight difference is in borders, but still, almost all squares have “almost equal” border polynomials. The rest is easy, because there are no analogs of odd-length palindromes.

slide-50
SLIDE 50

Thank you for your attention!
