Sparse Suffix Tree Construction in Small Space Philip Bille, Inge Li - - PowerPoint PPT Presentation



slide-1
SLIDE 1

Philip Bille, Inge Li Gørtz, Hjalte Wedel Vildhøj (Technical University of Denmark), Johannes Fischer (Karlsruhe Institute of Technology), Tsvi Kopelowitz (Weizmann Institute of Science), Benjamin Sach (University of Warwick)

Sparse Suffix Tree Construction in Small Space

slide-2
SLIDE 2

The sparse suffix array (SSA)

T = bananas (length n)

slide-3
SLIDE 3

The sparse suffix array (SSA)

T = bananas (length n)
Suffixes: 1 bananas, 2 ananas, 3 nanas, 4 anas, 5 nas, 6 as, 7 s

slide-6
SLIDE 6

The sparse suffix array (SSA)

T = bananas (length n)
Suffixes: 1 bananas, 2 ananas, 3 nanas, 4 anas, 5 nas, 6 as, 7 s
Sort the suffixes lexicographically

slide-8
SLIDE 8

The sparse suffix array (SSA)

T = bananas (length n)
Suffixes: 1 bananas, 2 ananas, 3 nanas, 4 anas, 5 nas, 6 as, 7 s
Sort the suffixes lexicographically
Suffix Array: 2 4 6 1 3 5 7
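
As a minimal sketch of what "sort the suffixes lexicographically" means, the suffix array of the example text can be computed naively as below. The function name `suffix_array` is a hypothetical helper and positions are 1-based, as on the slide. This is an illustration only and is much slower than specialised constructions.

```python
def suffix_array(text: str) -> list[int]:
    """Return the 1-based start positions of all suffixes, in sorted order."""
    # Naive illustration: compare the suffixes as plain strings.
    return sorted(range(1, len(text) + 1), key=lambda i: text[i - 1:])


print(suffix_array("bananas"))  # [2, 4, 6, 1, 3, 5, 7]
```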

slide-9
SLIDE 9

The sparse suffix array (SSA)

T = bananas (length n)
Suffixes: 1 bananas, 2 ananas, 3 nanas, 4 anas, 5 nas, 6 as, 7 s
Sort the suffixes lexicographically
Suffix Array: 2 4 6 1 3 5 7

  • Can be built in O(n) time and O(n) extra space

slide-10
SLIDE 10

The sparse suffix array (SSA)

T = bananas (length n)
Suffixes: 1 bananas, 2 ananas, 3 nanas, 4 anas, 5 nas, 6 as, 7 s
Sort the suffixes lexicographically
Suffix Array: 2 4 6 1 3 5 7

  • Can be built in O(n) time and O(n) extra space
  • What if we only care about a few of the suffixes?

slide-16
SLIDE 16

The sparse suffix array (SSA)

T = bananas (length n)
Suffixes: 1 bananas, 2 ananas, 3 nanas, 4 anas, 5 nas, 6 as, 7 s
Suffix Array: 2 4 6 1 3 5 7

  • Can be built in O(n) time and O(n) extra space
  • What if we only care about a few of the suffixes?

Sparse Suffix Array (b = 4 suffixes): 2 4 6 5

O(b)
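
A minimal sketch of the sparse variant: only the b chosen suffixes are sorted. The helper name `sparse_suffix_array` is hypothetical, positions are 1-based, and the chosen positions 2, 4, 5, 6 are read off the slide's example. This naive version copies suffixes while sorting, so it does not meet the small-space bounds the talk is about.

```python
def sparse_suffix_array(text: str, positions: list[int]) -> list[int]:
    """Sort only the requested suffixes (1-based start positions)."""
    # Naive illustration: slicing copies each suffix, so the extra space
    # used here is not O(b).
    return sorted(positions, key=lambda i: text[i - 1:])


print(sparse_suffix_array("bananas", [2, 4, 5, 6]))  # [2, 4, 6, 5]
```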

slide-17
SLIDE 17

The sparse suffix array (SSA)

T = bananas (length n)
Suffixes: 1 bananas, 2 ananas, 3 nanas, 4 anas, 5 nas, 6 as, 7 s
Suffix Array: 2 4 6 1 3 5 7

  • Can be built in O(n) time and O(n) extra space
  • What if we only care about a few of the suffixes?

Sparse Suffix Array (b = 4 suffixes): 2 4 6 5

O(b)

The sparse text indexing problem has been open since the 1960s . . . with first, partial results from 1996 onwards

slide-19
SLIDE 19

The sparse suffix array (SSA)

T = bananas (length n)
Suffixes: 1 bananas, 2 ananas, 3 nanas, 4 anas, 5 nas, 6 as, 7 s
Suffix Array: 2 4 6 1 3 5 7
Sparse Suffix Array (b = 4 suffixes): 2 4 6 5

  • O((n + b²) log² b) time with high probability (Las-Vegas)
  • O(n log² b) time (Monte-Carlo)
  • both in O(b) extra space

O(b)

slide-20
SLIDE 20

The sparse suffix tree (SST)

T = bananas (length n)

[Figure: the sparse suffix tree of the chosen suffixes of T]

O(b)

  • O((n + b²) log² b) time with high probability (Las-Vegas)
  • O(n log² b) time (Monte-Carlo)
  • both in O(b) space

Conversion between SSA and SST is simple and takes O(n log b) time

slide-21
SLIDE 21

LCPs - a fundamental tool for string algorithms

[Figure: a text T of length n with two positions i and j marked]

For any (i, j), the longest common prefix is the largest ℓ such that T[i . . . i + ℓ − 1] = T[j . . . j + ℓ − 1]. It’s the furthest you can go before hitting a mismatch.
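
The definition can be sketched directly as a character-by-character scan. The function name `lcp` and the 1-based positions are assumptions made for illustration. This is the naive method, not the fingerprint-based one the talk builds up to.

```python
def lcp(text: str, i: int, j: int) -> int:
    """Longest common prefix of the suffixes at 1-based positions i and j."""
    a, b = i - 1, j - 1  # convert to Python's 0-based indices
    length = 0
    while (a + length < len(text) and b + length < len(text)
           and text[a + length] == text[b + length]):
        length += 1
    return length


print(lcp("bananas", 2, 4))  # 3: "ananas" and "anas" share "ana"
```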

slide-25
SLIDE 25

LCPs - a fundamental tool for string algorithms

[Figure: a text T of length n with two positions i and j marked]

For any (i, j), the longest common prefix is the largest ℓ such that T[i . . . i + ℓ − 1] = T[j . . . j + ℓ − 1]. It’s the furthest you can go before hitting a mismatch.

  • LCP data structures are typically based on the suffix array or suffix tree.
slide-26
SLIDE 26

LCPs - a fundamental tool for string algorithms

[Figure: a text T of length n with two positions i and j marked]

For any (i, j), the longest common prefix is the largest ℓ such that T[i . . . i + ℓ − 1] = T[j . . . j + ℓ − 1]. It’s the furthest you can go before hitting a mismatch.

  • LCP data structures are typically based on the suffix array or suffix tree.
  • We do the opposite: we use batched LCP queries to construct the sparse suffix array

slide-27
SLIDE 27

LCPs - a fundamental tool for string algorithms

[Figure: a text T of length n with two positions i and j marked]

For any (i, j), the longest common prefix is the largest ℓ such that T[i . . . i + ℓ − 1] = T[j . . . j + ℓ − 1]. It’s the furthest you can go before hitting a mismatch.

  • LCP data structures are typically based on the suffix array or suffix tree.
  • We do the opposite: we use batched LCP queries to construct the sparse suffix array
  • These LCP queries will be answered using Karp-Rabin fingerprints to ensure that the space remains small

slide-28
SLIDE 28

Karp-Rabin fingerprints of strings

[Figure: an example string S]

φ(S) = Σ_{k=0}^{|S|−1} S[k] · r^k mod p

Here p = Θ(n⁴) is a prime and 1 ≤ r < p is a random integer. With high probability, S₁ = S₂ iff φ(S₁) = φ(S₂).

slide-29
SLIDE 29

Karp-Rabin fingerprints of strings

[Figure: an example string S]

φ(S) = Σ_{k=0}^{|S|−1} S[k] · r^k mod p

Here p = Θ(n⁴) is a prime and 1 ≤ r < p is a random integer. With high probability, S₁ = S₂ iff φ(S₁) = φ(S₂).

Observe that φ(S) fits in an O(log n) bit word.

slide-30
SLIDE 30

Karp-Rabin fingerprints of strings

[Figure: an example string S]

φ(S) = Σ_{k=0}^{|S|−1} S[k] · r^k mod p

Here p = Θ(n⁴) is a prime and 1 ≤ r < p is a random integer. With high probability, S₁ = S₂ iff φ(S₁) = φ(S₂).

Observe that φ(S) fits in an O(log n) bit word.

Given φ(S[0, ℓ]) and φ(S[0, r]) we can compute φ(S[ℓ + 1, r]) in O(1) time (here r is an index into S, not the random integer above).
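
A minimal sketch of these fingerprints follows. Everything here is an assumption for illustration: the names `prefix_fingerprints` and `substring_fingerprint`, the use of `ord` for character values, half-open Python slices `s[lo:hi]` instead of the slide's inclusive ranges, and the fixed Mersenne prime 2⁶¹ − 1 standing in for a prime p = Θ(n⁴). The modular inverse via `pow(r, -lo, p)` needs Python 3.8 or later and costs O(log p) rather than O(1).

```python
import random

P = (1 << 61) - 1  # a Mersenne prime, standing in for the slide's p


def prefix_fingerprints(s: str, r: int, p: int = P) -> list[int]:
    """prefix[m] is the fingerprint of s[0:m]: sum of ord(s[k]) * r**k mod p."""
    prefix = [0]
    power = 1  # r**k mod p for the current k
    for ch in s:
        prefix.append((prefix[-1] + ord(ch) * power) % p)
        power = power * r % p
    return prefix


def substring_fingerprint(prefix: list[int], lo: int, hi: int,
                          r: int, p: int = P) -> int:
    """Fingerprint of s[lo:hi], derived from two prefix fingerprints."""
    # Subtract the shorter prefix, then divide out r**lo (modular inverse).
    return (prefix[hi] - prefix[lo]) * pow(r, -lo, p) % p


r = random.randrange(1, P)
pre = prefix_fingerprints("ababababcbcc", r)
# Both substrings are "abab", so their fingerprints must agree.
print(substring_fingerprint(pre, 0, 4, r) == substring_fingerprint(pre, 2, 6, r))
```

Equal substrings always produce equal fingerprints; the "with high probability" part of the claim concerns unequal substrings, which collide only for unlucky choices of r.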

slide-31
SLIDE 31

Simple, Monte-Carlo batched LCP queries

Input: a string T of length n and b pairs (i, j)
Output: for each pair (i, j), the largest ℓ s.t. T[i . . . i + ℓ − 1] = T[j . . . j + ℓ − 1]

slide-32
SLIDE 32

Simple, Monte-Carlo batched LCP queries

Input: a string T of length n and b pairs (i, j)
Output: for each pair (i, j), the largest ℓ s.t. T[i . . . i + ℓ − 1] = T[j . . . j + ℓ − 1]

[Figure: the text T with the two positions of each of the pairs 1 to 4 marked]

slide-35
SLIDE 35

Simple, Monte-Carlo batched LCP queries

Input: a string T of length n and b pairs (i, j)
Output: for each pair (i, j), the largest ℓ s.t. T[i . . . i + ℓ − 1] = T[j . . . j + ℓ − 1]

[Figure: the text T with the two positions of each of the pairs 1 to 4 marked]

  • We find the largest ℓ for each pair by binary search (in parallel); comparisons are performed using fingerprints

slide-36
SLIDE 36

Simple, Monte-Carlo batched LCP queries

Input: a string T of length n and b pairs (i, j)
Output: for each pair (i, j), the largest ℓ s.t. T[i . . . i + ℓ − 1] = T[j . . . j + ℓ − 1]

  • We find the largest ℓ for each pair by binary search (in parallel); comparisons are performed using fingerprints

[Figure: the text T with the positions 1, 2, 3, 4, . . . , b marked]

slide-38
SLIDE 38

Simple, Monte-Carlo batched LCP queries

Input: a string T of length n and b pairs (i, j)
Output: for each pair (i, j), the largest ℓ s.t. T[i . . . i + ℓ − 1] = T[j . . . j + ℓ − 1]

  • We find the largest ℓ for each pair by binary search (in parallel); comparisons are performed using fingerprints

[Figure: the text T with the positions 1, 2, 3, 4, . . . , b marked and a prefix fingerprint highlighted]

slide-47
SLIDE 47

Simple, Monte-Carlo batched LCP queries

Input: a string T of length n and b pairs (i, j)
Output: for each pair (i, j), the largest ℓ s.t. T[i . . . i + ℓ − 1] = T[j . . . j + ℓ − 1]

  • We find the largest ℓ for each pair by binary search (in parallel); comparisons are performed using fingerprints

[Figure: the text T with the positions 1, 2, 3, 4, . . . , b marked and a prefix fingerprint highlighted]

  • In each pass we store (at most) 4b prefix fingerprints
slide-48
SLIDE 48

Simple, Monte-Carlo batched LCP queries

Input: a string T of length n and b pairs (i, j)
Output: for each pair (i, j), the largest ℓ s.t. T[i . . . i + ℓ − 1] = T[j . . . j + ℓ − 1]

  • We find the largest ℓ for each pair by binary search (in parallel); comparisons are performed using fingerprints

[Figure: the text T with the positions 1, 2, 3, 4, . . . , b marked and a prefix fingerprint highlighted]

  • In each pass we store (at most) 4b prefix fingerprints

This takes O(n log b) time and O(b) space, and is correct whp.
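
The pass structure can be sketched as follows, assuming 1-based query positions and a fixed Mersenne prime in place of the slide's p. The name `batched_lcp` is hypothetical. Each round scans T once and keeps only the prefix fingerprints at the (at most) 4b positions the round needs. This simplified version binary-searches over lengths up to n, so it uses O(log n) rounds and does not reproduce the O(n log b) bound stated on the slide.

```python
P = (1 << 61) - 1  # a Mersenne prime, standing in for the slide's p


def batched_lcp(text: str, pairs: list[tuple[int, int]], r: int,
                p: int = P) -> list[int]:
    """Monte-Carlo LCP of each pair of 1-based positions, via fingerprints."""
    n = len(text)
    zero_based = [(i - 1, j - 1) for i, j in pairs]
    lo = [0] * len(pairs)                        # length known to match
    hi = [n - max(i, j) for i, j in zero_based]  # longest possible match
    while any(l < h for l, h in zip(lo, hi)):
        mid = [(l + h + 1) // 2 for l, h in zip(lo, hi)]
        # Positions whose prefix fingerprint this pass needs (4 per pair).
        wanted = set()
        for (i, j), m in zip(zero_based, mid):
            wanted.update((i, i + m, j, j + m))
        # One left-to-right scan, storing only the wanted fingerprints.
        stored, fp, power = {}, 0, 1
        for pos in range(n + 1):
            if pos in wanted:
                stored[pos] = fp
            if pos < n:
                fp = (fp + ord(text[pos]) * power) % p
                power = power * r % p
        for k, ((i, j), m) in enumerate(zip(zero_based, mid)):
            fi = (stored[i + m] - stored[i]) * pow(r, -i, p) % p
            fj = (stored[j + m] - stored[j]) * pow(r, -j, p) % p
            if fi == fj:
                lo[k] = m        # prefixes of length m (probably) match
            else:
                hi[k] = m - 1    # a mismatch lies within the first m characters
    return lo


print(batched_lcp("bananas", [(2, 4), (3, 5), (1, 2), (4, 6)], r=123456789))
```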

slide-49
SLIDE 49

Building the sparse suffix array using batched LCPs

T = bananas (length n)

slide-50
SLIDE 50

Building the sparse suffix array using batched LCPs

T = bananas (length n)
Suffixes: 1 bananas, 2 ananas, 3 nanas, 4 anas, 5 nas, 6 as, 7 s

slide-52
SLIDE 52

Building the sparse suffix array using batched LCPs

T = bananas (length n)
Suffixes: 1 bananas, 2 ananas, 3 nanas, 4 anas, 5 nas, 6 as, 7 s
The LCP of two suffixes gives us their order

slide-55
SLIDE 55

Building the sparse suffix array using batched LCPs

T = bananas (length n)
Suffixes: 1 bananas, 2 ananas, 3 nanas, 4 anas, 5 nas, 6 as, 7 s
The LCP of two suffixes gives us their order: suffix 2 < suffix 4 because n < s
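
A short sketch of why the LCP settles the order: once the LCP length is known, only the first mismatching characters need comparing. The helper `suffix_less` and its 1-based positions are assumptions for illustration.

```python
def suffix_less(text: str, i: int, j: int, lcp_len: int) -> bool:
    """True if suffix i sorts before suffix j (1-based), given their LCP length."""
    a, b = i - 1 + lcp_len, j - 1 + lcp_len  # 0-based index of first mismatch
    if a == len(text):   # suffix i ran out: it is a prefix of suffix j
        return b < len(text)
    if b == len(text):   # suffix j ran out first
        return False
    return text[a] < text[b]


print(suffix_less("bananas", 2, 4, 3))  # True, because "n" < "s"
```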

slide-56
SLIDE 56

Building the sparse suffix array using batched LCPs

T = bananas (length n)
Suffixes: 1 bananas, 2 ananas, 3 nanas, 4 anas, 5 nas, 6 as, 7 s
The LCP of two suffixes gives us their order: suffix 2 < suffix 4 because n < s

  • We perform randomised quicksort on the b suffixes using batched LCPs for suffix comparisons

slide-57
SLIDE 57

Building the sparse suffix array using batched LCPs

T = bananas (length n)
Suffixes: 1 bananas, 2 ananas, 3 nanas, 4 anas, 5 nas, 6 as, 7 s
The LCP of two suffixes gives us their order

  • We perform randomised quicksort on the b suffixes using batched LCPs for suffix comparisons
  • Pick a random pivot and compare each other suffix to it
  • This partitions the suffixes in O(n log b) time and O(b) space
slide-59
SLIDE 59

Building the sparse suffix array using batched LCPs

T = bananas (length n)
Suffixes: 1 bananas, 2 ananas, 3 nanas, 4 anas, 5 nas, 6 as, 7 s
The LCP of two suffixes gives us their order

  • We perform randomised quicksort on the b suffixes using batched LCPs for suffix comparisons
  • Pick a random pivot and compare each other suffix to it
  • This partitions the suffixes in O(n log b) time and O(b) space
  • Recurse on each partition (the batch still contains b LCPs)
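
The quicksort can be sketched as below. To keep the example short, a naive character-by-character `lcp` stands in for the batched fingerprint queries, so the sketch shows the partitioning logic rather than the stated time and space bounds. The names `lcp` and `sort_suffixes` are hypothetical, and the positions are assumed to be distinct and 1-based.

```python
import random


def lcp(text: str, a: int, b: int) -> int:
    """Naive LCP of text[a:] and text[b:] (0-based); stands in for a batched query."""
    length = 0
    while max(a, b) + length < len(text) and text[a + length] == text[b + length]:
        length += 1
    return length


def sort_suffixes(text: str, positions: list[int]) -> list[int]:
    """Randomised quicksort of distinct 1-based suffix positions."""
    if len(positions) <= 1:
        return list(positions)
    pivot = random.choice(positions)
    smaller, larger = [], []
    for i in positions:  # one "batch": every other suffix against the pivot
        if i == pivot:
            continue
        length = lcp(text, i - 1, pivot - 1)
        a, b = i - 1 + length, pivot - 1 + length
        # Suffix i is smaller if it ends first or has the smaller mismatching character.
        if a == len(text) or (b < len(text) and text[a] < text[b]):
            smaller.append(i)
        else:
            larger.append(i)
    return sort_suffixes(text, smaller) + [pivot] + sort_suffixes(text, larger)


print(sort_suffixes("bananas", [2, 4, 5, 6]))  # [2, 4, 6, 5]
```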
slide-61
SLIDE 61

Building the sparse suffix array using batched LCPs

T = bananas (length n)
Suffixes: 1 bananas, 2 ananas, 3 nanas, 4 anas, 5 nas, 6 as, 7 s
The LCP of two suffixes gives us their order

  • We perform randomised quicksort on the b suffixes using batched LCPs for suffix comparisons
  • The depth of the recursion is O(log b) whp. so. . .

The total time is O(n log² b) and the space is O(b)

slide-63
SLIDE 63

Building the sparse suffix array using batched LCPs

T = bananas (length n)
Suffixes: 1 bananas, 2 ananas, 3 nanas, 4 anas, 5 nas, 6 as, 7 s

  • We perform randomised quicksort on the b suffixes using batched LCPs for suffix comparisons
  • The depth of the recursion is O(log b) whp. so. . .

The total time is O(n log² b) and the space is O(b)

This algorithm is Monte-Carlo and Las-Vegas. It can be made Monte-Carlo only by aborting the quicksort early

slide-64
SLIDE 64

The sparse suffix array (SSA)

T = bananas (length n)
Suffixes: 1 bananas, 2 ananas, 3 nanas, 4 anas, 5 nas, 6 as, 7 s
Suffix Array: 2 4 6 1 3 5 7
Sparse Suffix Array (b = 4 suffixes): 2 4 6 5

  • O((n + b²) log² b) time with high probability (Las-Vegas)
  • O(n log² b) time (Monte-Carlo)
  • both in O(b) space

slide-65
SLIDE 65

Verifying the sparse suffix array

T = bananas (length n)
Suffix Array: 2 4 6 1 3 5 7
How can we tell if this suffix array is correct?

slide-66
SLIDE 66

Verifying the sparse suffix array

T = bananas (length n)
Suffix Array: 2 4 6 1 3 5 7
How can we tell if this suffix array is correct?
Check that 2 < 4, 4 < 6, 6 < 1, 1 < 3, . . . (comparing the suffixes that start at those positions)

slide-68
SLIDE 68

Verifying the sparse suffix array

T = bananas (length n)
Suffix Array: 2 4 6 1 3 5 7
How can we tell if this suffix array is correct?
Check that 2 < 4, 4 < 6, 6 < 1, 1 < 3, . . . (comparing the suffixes that start at those positions)
We could check 2 < 4 using an LCP query if we verified it
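
The check on adjacent entries can be sketched as follows. For brevity the sketch compares suffixes directly as Python strings instead of using verified LCP queries, so it illustrates what is being checked, not how the talk checks it in small space. The name `is_sorted` is hypothetical and positions are 1-based.

```python
def is_sorted(text: str, order: list[int]) -> bool:
    """Check that each suffix in `order` (1-based) precedes the next one."""
    return all(text[a - 1:] < text[b - 1:] for a, b in zip(order, order[1:]))


print(is_sorted("bananas", [2, 4, 6, 1, 3, 5, 7]))  # True
print(is_sorted("bananas", [2, 6, 4, 5]))           # False: "as" is not before "anas"
```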

slide-70
SLIDE 70

A first example

[Figure: three LCP queries, numbered 1, 2 and 3, drawn on the text T]

slide-75
SLIDE 75

A first example

[Figure: three LCP queries, yellow (1), blue (2) and green (3), drawn on the text T]

If yellow (1) and blue (2) match then the right half of green (3) matches

slide-76
SLIDE 76

A first example

[Figure: three LCP queries, yellow (1), blue (2) and green (3), drawn on the text T]

If yellow (1) and blue (2) match then the right half of green (3) matches

This is a lock-stepped cycle

slide-78
SLIDE 78

A second example

[Figure: three LCP queries, numbered 1, 2 and 3, drawn on the text T]

slide-79
SLIDE 79

A second example

[Figure: three LCP queries, yellow (1), blue (2) and green (3), drawn on the text T]

If yellow (1), blue (2) and green (3) match then 3/4 of green (3) is periodic

slide-80
SLIDE 80

A second example

[Figure: three LCP queries, yellow (1), blue (2) and green (3), drawn on the text T]

If yellow (1), blue (2) and green (3) match then 3/4 of green (3) is periodic

This is an unlocked cycle

slide-82
SLIDE 82

A second example

[Figure: three LCP queries, numbered 1, 2 and 3, drawn on the text T]

These tricks only work when the offsets are small
slide-83
SLIDE 83

The overall idea

[Figure: the queries 1, 2 and 3 on the text T, and the graph built from them]

  • We build a graph which encodes the structure of the queries
slide-85
SLIDE 85

The overall idea

T

1 2 3

  • We build a graph which encodes the structure of the queries

close

slide-87
SLIDE 87

The overall idea

T

1 2 3

  • We build a graph which encodes the structure of the queries

far

slide-92
SLIDE 92

The overall idea

T

1 2 3

  • We build a graph which encodes the structure of the queries
  • We can apply one of the two tricks

to any short cycle

slide-93
SLIDE 93

The overall idea

T

1 2 3

  • We build a graph which encodes the structure of the queries
  • We can apply one of the two tricks

to any short cycle (length at most 2 log b + 1)

slide-94
SLIDE 94

The overall idea

T

1 2 3

  • We build a graph which encodes the structure of the queries
  • We can apply one of the two tricks

to any short cycle (length at most 2 log b + 1)

  • This breaks the cycle (because we delete an edge)
slide-95
SLIDE 95

The overall idea

T

1 2 3

  • We build a graph which encodes the structure of the queries
  • We can apply one of the two tricks

to any short cycle (length at most 2 log b + 1)

  • This breaks the cycle (because we delete an edge)

Fact If every node has degree at least three there is a short cycle
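
One standard way to look for a cycle is a breadth-first search that stops at the first edge leading back into the part of the graph already explored. The sketch below is an assumption for illustration, not the procedure from the talk. The name `find_cycle` and the adjacency-list input are hypothetical, and the graph is assumed to be simple and undirected.

```python
from collections import deque


def find_cycle(adj: dict, start):
    """BFS from `start`; return the vertices of the first cycle closed, or None."""
    parent = {start: None}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in parent:
                parent[v] = u
                queue.append(v)
            elif v != parent[u]:
                # (u, v) is a non-tree edge: climb from both ends to the
                # lowest common ancestor in the BFS tree.
                path_u = [u]
                while parent[path_u[-1]] is not None:
                    path_u.append(parent[path_u[-1]])
                on_path_u = set(path_u)
                path_v = [v]
                while path_v[-1] not in on_path_u:
                    path_v.append(parent[path_v[-1]])
                meet = path_u.index(path_v[-1])
                # u ... ancestor ... v; the edge (v, u) closes the cycle.
                return path_u[:meet + 1] + path_v[-2::-1]
    return None


# Every node of the complete graph on 4 vertices has degree three.
k4 = {0: [1, 2, 3], 1: [0, 2, 3], 2: [0, 1, 3], 3: [0, 1, 2]}
print(find_cycle(k4, 0))  # [1, 0, 2]
```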

slide-96
SLIDE 96

The overall idea

T

1 2 3

  • We build a graph which encodes the structure of the queries
  • Finding a short cycle in the graph takes O(b) time
  • This gives the additive O(b2 log b) term

Fact If every node has degree at least three there is a short cycle

  • All other steps take O(n log b) time over all rounds

(and use O(b) space)

slide-97
SLIDE 97

Summary

T = bananas (length n)
Suffixes: 1 bananas, 2 ananas, 3 nanas, 4 anas, 5 nas, 6 as, 7 s
Suffix Array: 2 4 6 1 3 5 7
Sparse Suffix Array (b = 4 suffixes): 2 4 6 5

  • O((n + b²) log² b) time with high probability (Las-Vegas)
  • O(n log² b) time (Monte-Carlo)
  • both in O(b) space