Tighter Bounds for the Sum of Irreducible LCP Values



slide-1
SLIDE 1

Tighter Bounds for the Sum of Irreducible LCP Values

Juha Kärkkäinen (1), Dominik Kempa (1), Marcin Piątkowski (2)

(1) University of Helsinki
(2) Nicolaus Copernicus University

slide-2
SLIDE 2

Outline

1. Cyclic words
2. Irreducible LCP values
3. Upper bound for the sum of irreducible values
4. Lower bound for the sum of irreducible values

  • J. Kärkkäinen, D. Kempa, M. Piątkowski

Tighter Bounds for the Sum of Irreducible LCP Values

slide-3
SLIDE 3

Cyclic suffixes

W = {{v1, v2, v3}} = {{aab, aab, ab}}

        suf(W)
1, 0    a a b a a b ···
1, 1    a b a a b a ···
1, 2    b a a b a a ···
2, 0    a a b a a b ···
2, 1    a b a a b a ···
2, 2    b a a b a a ···
3, 0    a b a b a b ···
3, 1    b a b a b a ···

[Figure: a cyclic word drawn as a ring of letters a0, a1, a2, a3, ..., an]


slide-4
SLIDE 4

Cyclic suffix array

W = {{v1, v2, v3}} = {{aab, aab, ab}}

SA_W    suf(W)
1, 0    a a b a a b ···
2, 0    a a b a a b ···
1, 1    a b a a b a ···
2, 1    a b a a b a ···
3, 0    a b a b a b ···
1, 2    b a a b a a ···
2, 2    b a a b a a ···
3, 1    b a b a b a ···

[Figure: a cyclic word drawn as a ring of letters a0, a1, a2, a3, ..., an]
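The ordering in the table can be reproduced with a short sketch. The function name `cyclic_suffix_array` and the tie-breaking by (word, position) are assumptions made for illustration, not something the slides specify. The sketch relies on the fact that two infinite periodic words with periods p and q are equal whenever they agree on their first p + q characters, so comparing finite prefixes of length twice the longest word is enough.

```python
def cyclic_suffix_array(words):
    """Sort all cyclic suffixes (i, j) of a multiset of words.

    i is the 1-based word index and j the 0-based start position,
    matching the labels used in the table. Hypothetical helper.
    """
    # Prefixes of this length decide every comparison of infinite suffixes.
    cutoff = 2 * max(len(w) for w in words)
    entries = []
    for i, w in enumerate(words, start=1):
        for j in range(len(w)):
            rotation = w[j:] + w[:j]
            prefix = (rotation * (cutoff // len(w) + 1))[:cutoff]
            entries.append((prefix, (i, j)))
    # Equal infinite suffixes are ordered by (i, j); this is an assumption.
    entries.sort()
    return [pos for _, pos in entries]


sa = cyclic_suffix_array(["aab", "aab", "ab"])
print(sa)
```

For the example multiset this lists the positions in the same order as the SA_W column.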


slide-6
SLIDE 6

Longest common prefix and distinguishing prefix arrays

W = {{v1, v2, v3}} = {{aab, aab, ab}}

SA_W    suf(W)             LCP_W   DP_W
1, 0    a a b a a b ···    −       −
2, 0    a a b a a b ···    ∞       ∞
1, 1    a b a a b a ···    1       2
2, 1    a b a a b a ···    ∞       ∞
3, 0    a b a b a b ···    3       4
1, 2    b a a b a a ···    0       1
2, 2    b a a b a a ···    ∞       ∞
3, 1    b a b a b a ···    2       3


slide-7
SLIDE 7

Burrows-Wheeler transform

W = {{v1, v2, v3}} = {{aab, aab, ab}}

SA_W    suf(W)             LCP_W   DP_W   BWT(W)
1, 0    a a b a a b ···    −       −      b
2, 0    a a b a a b ···    ∞       ∞      b
1, 1    a b a a b a ···    1       2      a
2, 1    a b a a b a ···    ∞       ∞      a
3, 0    a b a b a b ···    3       4      b
1, 2    b a a b a a ···    0       1      a
2, 2    b a a b a a ···    ∞       ∞      a
3, 1    b a b a b a ···    2       3      a
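As a rough illustration of how the three columns relate, the sketch below recomputes them for the example. The helper name `cyclic_arrays` is hypothetical, and it assumes DP = LCP + 1 for finite entries, which is what the values in the table suggest. The BWT symbol of a suffix starting at position j of a word is taken to be the letter that cyclically precedes it.

```python
def cyclic_arrays(words):
    """Return (LCP, DP, BWT) for the sorted cyclic suffixes of `words`.

    Hypothetical helper; the first LCP/DP entry is None (shown as "−").
    """
    cutoff = 2 * max(len(w) for w in words)
    rows = []
    for i, w in enumerate(words, start=1):
        for j in range(len(w)):
            rotation = w[j:] + w[:j]
            prefix = (rotation * (cutoff // len(w) + 1))[:cutoff]
            # w[j - 1] cyclically precedes w[j]; for j = 0 it is the last letter.
            rows.append((prefix, (i, j), w[j - 1]))
    rows.sort()

    inf = float("inf")
    lcp, dp = [None], [None]
    for (prev, _, _), (cur, _, _) in zip(rows, rows[1:]):
        if prev == cur:
            # Identical infinite suffixes never diverge.
            lcp.append(inf)
            dp.append(inf)
        else:
            k = 0
            while prev[k] == cur[k]:
                k += 1
            lcp.append(k)
            dp.append(k + 1)  # assumption: DP is one more than LCP
    bwt = "".join(symbol for _, _, symbol in rows)
    return lcp, dp, bwt


lcp, dp, bwt = cyclic_arrays(["aab", "aab", "ab"])
print(lcp, dp, bwt)
```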


slide-8
SLIDE 8

Cyclic equivalence

Cyclic equivalence: Two multisets of words V and W are cyclically equivalent if suf(V) = suf(W).

Example multiset
W = {{aab, aab, ab}}

aabaabaabaab ···
abaabaabaaba ···
baabaabaabaa ···
aabaabaabaab ···
abaabaabaaba ···
baabaabaabaa ···
abababababab ···
babababababa ···

Equivalence class
{{aab, aab, ab}}    {{aab, aab, ba}}
{{aba, aab, ab}}    {{aba, aab, ba}}
{{baa, aab, ab}}    {{baa, aab, ba}}
{{aab, aba, ab}}    {{aab, aba, ba}}
...                 ...
{{aabaab, ab}}      {{aabaab, ba}}
{{abaaba, ab}}      {{abaaba, ba}}
{{baabaa, ab}}      {{baabaa, ba}}


slide-11
SLIDE 11

Cyclic equivalence

Lemma
Let W = {{w_i}}_{i=1}^{s} be a multiset of cyclic words. Then:

1. There exists a set of cyclic words V = {v_i}_{i=1}^{t} such that suf(W) = suf(V).
2. There exists a multiset of primitive cyclic words U = {{u_i}}_{i=1}^{p} such that suf(W) = suf(U).

Remark
If two multisets of words V and W are cyclically equivalent, then LCP_V = LCP_W, DP_V = DP_W and BWT_V = BWT_W.

Theorem (Mantaci, Restivo, Rosone, Sciortino – 2007)
The mapping from a word v to the cyclic equivalence class of IBWT(v) is a bijection.
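To make the inverse transform concrete, here is a minimal sketch of one common way to compute a representative of IBWT(v): stably sort the letters of v, follow the resulting permutation, and read one word off each cycle. The function name `ibwt` is an assumption, and the slides do not say which construction the authors use; the output is only one member of the cyclic equivalence class.

```python
def ibwt(v):
    """Return one multiset of words (as a list) whose BWT is `v`.

    Sketch of the cycle-decomposition construction; hypothetical helper.
    """
    n = len(v)
    # Stable sort: order[i] is the position in v of the letter that ends up
    # as the i-th letter of the sorted (first) column.
    order = sorted(range(n), key=lambda p: v[p])
    seen = [False] * n
    words = []
    for start in range(n):
        if seen[start]:
            continue
        letters, i = [], start
        while not seen[i]:
            seen[i] = True
            letters.append(v[order[i]])  # i-th letter of the sorted column
            i = order[i]                 # row holding the rest of the suffix
        words.append("".join(letters))
    return words


print(ibwt("bbaabaaa"))
print(ibwt("abababab"))
```

Applied to the BWT of the running example, bbaabaaa, this gives back aab, aab and ab.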


slide-12
SLIDE 12

Irreducible LCP and DP values

Irreducible values
A value LCP_W[i] (respectively DP_W[i]) is irreducible if BWT_W[i] ≠ BWT_W[i − 1].

SA_W    suf(W)             LCP_W   DP_W   BWT(W)
1, 0    a a b a a b ···    −       −      b
2, 0    a a b a a b ···    ∞       ∞      b
1, 1    a b a a b a ···    1       2      a
2, 1    a b a a b a ···    ∞       ∞      a
3, 0    a b a b a b ···    3       4      b
1, 2    b a a b a a ···    0       1      a
2, 2    b a a b a a ···    ∞       ∞      a
3, 1    b a b a b a ···    2       3      a

Here the irreducible rows are those labelled 1, 1 and 3, 0 and 1, 2, where the BWT symbol differs from the one in the row above.


slide-13
SLIDE 13

Sum of irreducible LCP values

Irreducible values
W = {{w_i}}_{i=1}^{s} and Σ_{i=1}^{s} |w_i| = n

Σilcp(W): sum of irreducible LCP values
Σidp(W): sum of irreducible DP values (distinguishing prefix lengths)

Theorem (Kärkkäinen, Manzini, Puglisi – 2009)
Σilcp(W) = O(n log n)
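For the running example W = {{aab, aab, ab}}, both sums can be computed directly from the arrays in the earlier table. The sketch below hard-codes those arrays, takes position i to be irreducible when BWT[i] differs from BWT[i − 1], and uses LCP = 0 for the row 1, 2 (its DP value 1 minus one). The helper name `irreducible_sum` is hypothetical.

```python
INF = float("inf")

# Arrays for W = {{aab, aab, ab}}, in suffix-array order.
bwt = "bbaabaaa"
lcp = [None, INF, 1, INF, 3, 0, INF, 2]
dp = [None, INF, 2, INF, 4, 1, INF, 3]


def irreducible_sum(values, bwt):
    """Sum values[i] over positions where the BWT symbol changes."""
    return sum(values[i] for i in range(1, len(bwt)) if bwt[i] != bwt[i - 1])


runs = 1 + sum(bwt[i] != bwt[i - 1] for i in range(1, len(bwt)))
sum_ilcp = irreducible_sum(lcp, bwt)
sum_idp = irreducible_sum(dp, bwt)
print(runs, sum_ilcp, sum_idp)
```

With r = 4 runs the two sums differ by exactly r − 1, since each irreducible DP value is one more than the corresponding LCP value.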


slide-14
SLIDE 14

New upper bounds for the sum of irreducible lcp values

Theorem
For any multiset W of words of total length n > 0, we have
Σilcp(W) ≤ Σidp(W) ≤ n⌈lg n⌉ − 2^{⌈lg n⌉} + 1

Theorem
For any multiset W of words of total length n > 0 such that BWT(W) has r runs, we have
Σilcp(W) + r − 1 = Σidp(W) ≤ n⌈lg r⌉ − 2^{⌈lg r⌉} + 1


slide-16
SLIDE 16

Reverse suffixes

W = {{a, aab, abb, b}}

(1)  aaaaaa ···
(2)  baabaa ···
(3)  abaaba ···
(4)  bbabba ···
(5)  aabaab ···
(6)  babbab ···
(7)  abbabb ···
(8)  bbbbbb ···

STree(W)
[Figure: complete binary tree of depth 3 with edges labelled a and b; leaves from left to right: (1) (5) (3) (7) (2) (6) (4) (8)]

Dispersal pair with respect to a total order <
A pair of leaves (u, v) of an ordered tree is called a dispersal pair if u < v and the subtree rooted at their nearest common ancestor contains no leaf w such that u < w < v.
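The definition can be turned into a small counting routine. In this sketch a tree is a nested Python list whose leaves are integers giving their rank in the total order; that encoding and the name `count_dispersal_pairs` are assumptions for illustration. At each internal vertex, the pairs whose nearest common ancestor is that vertex are exactly the leaves that are consecutive in sorted order within its subtree and lie in different child subtrees.

```python
def count_dispersal_pairs(tree):
    """Count dispersal pairs in a tree given as nested lists of leaf ranks."""
    total = 0

    def visit(node):
        nonlocal total
        if isinstance(node, int):
            return [node]
        labelled = []  # (leaf rank, index of the child subtree containing it)
        for child_index, child in enumerate(node):
            labelled += [(leaf, child_index) for leaf in visit(child)]
        labelled.sort()
        # Consecutive leaves from different children have this node as their
        # nearest common ancestor and nothing between them in this subtree.
        total += sum(a[1] != b[1] for a, b in zip(labelled, labelled[1:]))
        return [leaf for leaf, _ in labelled]

    visit(tree)
    return total


# Leaves of the example tree from left to right: (1) (5) (3) (7) (2) (6) (4) (8).
stree = [[[1, 5], [3, 7]], [[2, 6], [4, 8]]]
print(count_dispersal_pairs(stree))
```

For the example tree the root contributes 7 pairs, its two children 3 each, and the four lowest internal vertices 1 each.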


slide-17
SLIDE 17

Reverse suffixes

W = {{a, aab, abb, b}}, with the reverse suffixes (1) to (8) and STree(W) as on the previous slide.

Lemma
D(STree(W), <_W) = Σidp(W)

D(T, <): the number of dispersal pairs in T with respect to the order <


slide-18
SLIDE 18

n log n upper bound

d(1) = 0
d(n) = max_{i ∈ [1..⌊n/2⌋]} d(n, i)    when n > 1
d(n, k) = d(k) + d(n − k) + min{2k, n − 1}    where n > 0 and k ∈ [0..⌊n/2⌋]

Lemma
d(n) = max{D(T, <)}, where the maximum is taken over any rooted tree T with n leaves and any total order < on its leaves.


slide-19
SLIDE 19

n log n upper bound

Lemma
d(n) = n⌈lg n⌉ − 2^{⌈lg n⌉} + 1

Theorem
For any multiset W of words of total length n > 0, we have
Σilcp(W) ≤ Σidp(W) ≤ d(n) = n⌈lg n⌉ − 2^{⌈lg n⌉} + 1
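The closed form can be checked numerically against the recurrence d(1) = 0, d(n) = max over k in [1..⌊n/2⌋] of d(k) + d(n − k) + min{2k, n − 1}. The sketch below is only a sanity check for small n, not a proof; the function names are chosen for illustration.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def d(n):
    """Evaluate the recurrence for d(n) by memoised recursion."""
    if n == 1:
        return 0
    return max(
        d(k) + d(n - k) + min(2 * k, n - 1)
        for k in range(1, n // 2 + 1)
    )


def closed_form(n):
    """n * ceil(lg n) - 2 ** ceil(lg n) + 1, using integer arithmetic only."""
    c = (n - 1).bit_length()  # equals ceil(lg n) for every n >= 1
    return n * c - 2 ** c + 1


for n in (1, 2, 3, 4, 8):
    print(n, d(n), closed_form(n))
```

For n = 8 both give 17, which is also the number of dispersal pairs in the complete binary tree example with 8 leaves.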


slide-20
SLIDE 20

n log r upper bound

Lemma
If BWT(W) has r runs, then |D_u(STree(W), <_W)| < r for every vertex u in STree(W).


slide-21
SLIDE 21

n log r upper bound

d_r(1) = 0
d_r(n) = max_{i ∈ [1..⌊n/2⌋]} d_r(n, i)    when n > 1
d_r(n, k) = d_r(k) + d_r(n − k) + min{2k, n − 1, r − 1}    where r > 0, n > 0 and k ∈ [0..⌊n/2⌋]

Lemma
d_r(n) = max{D(T, <)}, where the maximum is taken over any rooted tree T with n leaves and any total order < on its leaves such that |D_u(T, <)| < r for every vertex u ∈ T.


slide-22
SLIDE 22

n log r upper bound

Lemma
For any 2 ≤ r ≤ n we have d_r(n) ≤ n⌈lg r⌉ − 2^{⌈lg r⌉} + 1.

Theorem
For any multiset W of words of total length n > 0 such that BWT(W) has r runs, we have
Σilcp(W) + r − 1 = Σidp(W) ≤ d_r(n) ≤ n⌈lg r⌉ − 2^{⌈lg r⌉} + 1
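The run-bounded recurrence d_r(1) = 0, d_r(n) = max over k in [1..⌊n/2⌋] of d_r(k) + d_r(n − k) + min{2k, n − 1, r − 1} can be evaluated the same way and compared with the bound in the lemma. As before this is a small numeric check under the stated recurrence, with hypothetical function names.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def d_r(n, r):
    """Evaluate the recurrence for d_r(n) by memoised recursion."""
    if n == 1:
        return 0
    return max(
        d_r(k, r) + d_r(n - k, r) + min(2 * k, n - 1, r - 1)
        for k in range(1, n // 2 + 1)
    )


def run_bound(n, r):
    """n * ceil(lg r) - 2 ** ceil(lg r) + 1, using integer arithmetic only."""
    c = (r - 1).bit_length()  # equals ceil(lg r) for every r >= 1
    return n * c - 2 ** c + 1


for n, r in ((8, 2), (8, 4), (8, 8), (12, 5)):
    print(n, r, d_r(n, r), run_bound(n, r))
```

With r = 2 every merge step contributes 1, so d_2(n) = n − 1, which the bound meets exactly.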


slide-24
SLIDE 24

n log n lower bound for Σidp

De Bruijn set
A set of words W = {w_i}_{i=1}^{t} over the alphabet A is a de Bruijn set of order k if Σ_{i=1}^{t} |w_i| = 2^k and every word v ∈ A^k is a prefix of exactly one word in suf(W).

W = {a, aab, abb, b}        Σ_{i=1}^{4} |w_i| = 2^3

W_{1,1} = a a a a a a ···
W_{2,1} = a a b a a b ···
W_{2,2} = a b a a b a ···
W_{3,1} = a b b a b b ···
W_{2,3} = b a a b a a ···
W_{3,3} = b a b b a b ···
W_{3,2} = b b a b b a ···
W_{4,1} = b b b b b b ···

Lemma (Higgins – 2012)
For k ≥ 1 and any u ∈ U_k = {ab, ba}^{2^{k−1}}, W = IBWT(u) is a de Bruijn set of order k.
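The de Bruijn property is easy to test mechanically for a given set of words. The following sketch checks the two conditions of the definition; the function name `is_de_bruijn_set` and the `alphabet` parameter are assumptions, and for the binary alphabet used on the slide the length condition is exactly 2^k.

```python
from itertools import product


def is_de_bruijn_set(words, k, alphabet="ab"):
    """Check both conditions of the de Bruijn set definition for order k."""
    if sum(len(w) for w in words) != len(alphabet) ** k:
        return False
    prefixes = []
    for w in words:
        for j in range(len(w)):
            rotation = w[j:] + w[:j]
            # Length-k prefix of the infinite cyclic suffix starting at j.
            prefixes.append((rotation * (k // len(w) + 1))[:k])
    expected = sorted("".join(p) for p in product(alphabet, repeat=k))
    # Every word of length k must occur exactly once among the prefixes.
    return sorted(prefixes) == expected


print(is_de_bruijn_set(["a", "aab", "abb", "b"], 3))
print(is_de_bruijn_set(["aab", "aab", "ab"], 3))
```

The first call uses the set from the slide; the second uses the earlier multiset {{aab, aab, ab}}, which has the right total length but repeats prefixes.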


slide-25
SLIDE 25

n log n lower bound for Σidp

W_k = IBWT((ab)^{2^{k−1}})

[Figure: complete binary tree with edges labelled a and b]

W_3 = {a, aab, abb, b}

Theorem
Σilcp(W_k) = Σ_{i=1}^{k−1} i·2^i = k·2^k − 2^{k+1} + 2
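The closed form of the sum in the theorem follows from a standard telescoping step, written out here for completeness (S_m is a name introduced only for this derivation):

```latex
\begin{align*}
S_m &= \sum_{i=1}^{m} i\,2^{i}, \qquad
2S_m = \sum_{i=1}^{m} i\,2^{i+1} = \sum_{i=2}^{m+1} (i-1)\,2^{i},\\
S_m &= 2S_m - S_m = m\,2^{m+1} - \sum_{i=1}^{m} 2^{i}
     = m\,2^{m+1} - \bigl(2^{m+1} - 2\bigr) = (m-1)\,2^{m+1} + 2,\\
S_{k-1} &= (k-2)\,2^{k} + 2 = k\,2^{k} - 2^{k+1} + 2 .
\end{align*}
```

With n = 2^k this reads n lg n − 2n + 2; for k = 3 it gives 1·2 + 2·4 = 10 = 24 − 16 + 2.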


slide-26
SLIDE 26

n log n lower bound for Σidp

Theorem
For any k ≥ 1, there exists a word w of length n = 2^k such that Σidp(w) = n log n − O(n).



slide-30
SLIDE 30

n log r lower bound for Σidp

W_{k,j} = IBWT(u · a^{j·2^{k−1}}), where u = (ab)^{2^{k−1}}

[Figure: block diagram of the construction, showing runs a···a and b···b, the word u, and block lengths 2^{k−1}, j·2^{k−1} and 2^{k−1}]

W_{k,j} = ⋃_{i=1}^{j+1} S_{i,k,j}

S_{i,k,j} = a^i b a^j {a, ba^j}^{k−1}    if i ≤ j
S_{i,k,j} = a^{j+1} {a, ba^j}^{k−1}      if i > j

Lemma
For j ≥ 1 we have Σilcp(W_{k,j}) = (j + 2)k2^{k−1} − 2^{k+1} + j + 1.


slide-31
SLIDE 31

n log r lower bound for Σidp

Theorem
For any r = 2^k + 1, k ≥ 1, and n ≥ r such that 2^{k−1} | n, there exists a word w of length n such that BWT(w) contains r − o(r) runs and Σidp(w) = n log r − O(n).


slide-32
SLIDE 32

Concluding remarks

  • New upper bound for Σilcp and Σidp
  • New upper bound for Σilcp and Σidp related to the number of runs in the BWT
  • Lower bounds for Σilcp and Σidp matching the upper bounds
  • Lower bounds for Σilcp and Σidp in the single-word case


slide-33
SLIDE 33

THANK YOU
KIITOS
DZIĘKUJĘ
GRAZIE