

slide-1
SLIDE 1

Time-Space Trade-Offs for Longest Common Extensions

Philip Bille¹, Inge Li Gørtz¹, Benjamin Sach², and Hjalte Wedel Vildhøj¹

¹ Technical University of Denmark, DTU Informatics, {phbi,ilg,hwvi}@imm.dtu.dk
² University of Warwick, Department of Computer Science, sach@dcs.warwick.ac.uk

CPM 2012, Helsinki, July 4, 2012

1 / 56

slide-4
SLIDE 4

The Longest Common Extension Problem

Definition

Problem: Preprocess a string T of length n to support LCE queries:

◮ LCE(i, j) = the length of the longest common prefix of the suffixes starting at positions i and j in T.

Example

T = b a n a n a s
    1 2 3 4 5 6 7

LCE(2, 4) = 3

T[2 . . . 7] = a n a n a s
T[4 . . . 7] = a n a s

4 / 56

slide-6
SLIDE 6

The Longest Common Extension Problem

Definition

Problem: Preprocess a string T of length n to support LCE queries:

◮ LCE(i, j) = the length of the longest common prefix of the suffixes starting at positions i and j in T.

Example

T = b a n a n a s
    1 2 3 4 5 6 7

LCE(2, 5) = 0

T[2 . . . 7] = a n a n a s
T[5 . . . 7] = n a s

◮ We assume that the input is given in read-only memory and is not included in the space complexity.
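The definition can be turned into code almost literally. The following is a minimal sketch, assuming 1-indexed positions as on the slides; the function name `lce_naive` is chosen here for illustration and does not come from the talk.

```python
def lce_naive(text: str, i: int, j: int) -> int:
    """Return LCE(i, j) for 1-indexed positions i and j in text."""
    n = len(text)
    k = 0
    # Advance while both suffixes still have a character and the characters agree.
    while i + k <= n and j + k <= n and text[i + k - 1] == text[j + k - 1]:
        k += 1
    return k


print(lce_naive("bananas", 2, 4))  # 3
print(lce_naive("bananas", 2, 5))  # 0
```

The scan reads T directly and keeps only a counter, so it needs no extra space beyond the read-only input.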

6 / 56

slide-15
SLIDE 15

Two Simple Solutions

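#1: Store nothing

T = b a n a n a s
    1 2 3 4 5 6 7

[Figure: pointers i and j scan T one character at a time until the first mismatch.]

LCE(i, j) = 3
Time: O(n) Space: O(1)

#2: Store the suffix tree

[Figure: suffix tree of T = bananas with leaves labeled 1 to 7; NCA(2, 4) marks the nearest common ancestor of leaves 2 and 4.]

LCE(i, j) = |NCA(i, j)| = 3
Time: O(1) Space: O(n)

Trade-off?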
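Solution #2 can be imitated without building a suffix tree. The sketch below swaps in a suffix array with an LCP array, where the minimum LCP value between the ranks of two suffixes plays the role of the string depth |NCA(i, j)|. This is a stand-in, not the structure from the slides: the construction is deliberately naive, and the range minimum is taken by a linear scan, whereas a constant-time range-minimum structure would be needed to match the O(1) query bound. All function names are hypothetical.

```python
def build_lce_index(text: str):
    """Build the inverse suffix array and the LCP array (naive construction)."""
    n = len(text)
    sa = sorted(range(n), key=lambda s: text[s:])  # 0-indexed suffix starts
    rank = [0] * n
    for r, s in enumerate(sa):
        rank[s] = r

    def common(a: int, b: int) -> int:
        k = 0
        while a + k < n and b + k < n and text[a + k] == text[b + k]:
            k += 1
        return k

    # lcp[r] = longest common prefix of the suffixes at sa[r - 1] and sa[r]; lcp[0] is unused.
    lcp = [0] + [common(sa[r - 1], sa[r]) for r in range(1, n)]
    return rank, lcp


def lce_from_index(text: str, rank, lcp, i: int, j: int) -> int:
    """LCE(i, j) for 1-indexed i, j as a range minimum over the LCP array."""
    if i == j:
        return len(text) - i + 1
    lo, hi = sorted((rank[i - 1], rank[j - 1]))
    return min(lcp[lo + 1 : hi + 1])


rank, lcp = build_lce_index("bananas")
print(lce_from_index("bananas", rank, lcp, 2, 4))  # 3
```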

15 / 56

slide-17
SLIDE 17

Our Results

Trade-off? (ordered from less space to faster)

Store nothing:      Time: O(n)                    Space: O(1)
Randomized:         Time: O(τ log(LCE(i, j)/τ))   Space: O(n/τ)
Deterministic:      Time: O(τ)                    Space: O(n/√τ)
Store suffix tree:  Time: O(1)                    Space: O(n)

Trade-off parameter τ, 1 ≤ τ ≤ n

17 / 56

slide-27
SLIDE 27

A Deterministic Solution

Idea: Store a subset of the n suffixes in a compacted trie.

T = d b c a a b c a b c a a b c a c

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16

[Figure: the positions of T that are selected by the difference cover are marked D.]

Difference Covers

A difference cover modulo τ is a set of integers D ⊆ {0, 1, . . . , τ − 1} such that for any distance d ∈ {0, 1, . . . , τ − 1}, D contains two elements separated by distance d modulo τ.

Ex: The set D = {1, 2, 4} is a difference cover modulo 5.

d      0      1      2      3      4
i, j   1, 1   2, 1   1, 4   4, 1   1, 2
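The difference cover property is easy to check mechanically. The sketch below, with a hypothetical helper `is_difference_cover`, tests the definition by brute force and confirms that each pair (i, j) listed in the example satisfies i − j ≡ d (mod 5), reading the five pairs as witnesses for d = 0, 1, 2, 3, 4 in order.

```python
def is_difference_cover(cover, tau: int) -> bool:
    """True if every d in {0, ..., tau - 1} equals (a - b) mod tau for some a, b in cover."""
    residues = {(a - b) % tau for a in cover for b in cover}
    return residues == set(range(tau))


# Witness pairs from the example, keyed by the distance d they realize.
example_pairs = {0: (1, 1), 1: (2, 1), 2: (1, 4), 3: (4, 1), 4: (1, 2)}

print(is_difference_cover({1, 2, 4}, 5))                               # True
print(all((i - j) % 5 == d for d, (i, j) in example_pairs.items()))    # True
```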

27 / 56

slide-28
SLIDE 28

A Deterministic Solution

Idea: Store a subset of the n suffixes in a compacted trie.

T = d b c a a b c a b c a a b c a c

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16

[Figure: the positions of T that are selected by the difference cover are marked D.]

Lemma (Colbourn and Ling [1])

For any τ, a difference cover modulo τ of size at most √(1.5τ) + 6 can be computed in O(√τ) time.

Analysis

Time: O(τ)
Space: O(#stored suffixes) = O((n/τ) · |D|) = O(n/√τ)

[1] C. J. Colbourn and A. C. Ling. Quorums from difference covers. Inf. Process. Lett. 75(1-2):9–12, 2000.
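One way to read the O(τ) query bound: for any i and j there is a shift δ < τ such that both i + δ and j + δ fall on sampled positions, so at most δ characters are compared directly before the stored suffixes take over. The sketch below illustrates this under stated assumptions. Sampled positions are taken to be those whose residue modulo τ lies in D, and the lookup in the compacted trie is replaced by a plain scan (`lce_scan`), purely as a stand-in. All names are hypothetical.

```python
def find_shift(i: int, j: int, cover, tau: int) -> int:
    """Smallest delta in [0, tau) with (i + delta) % tau and (j + delta) % tau both in cover."""
    for delta in range(tau):
        if (i + delta) % tau in cover and (j + delta) % tau in cover:
            return delta
    raise ValueError("cover is not a difference cover modulo tau")


def lce_scan(text: str, i: int, j: int, limit=None) -> int:
    """Character-by-character LCE for 1-indexed i, j, stopping after `limit` matches if given."""
    n = len(text)
    k = 0
    while (limit is None or k < limit) and i + k <= n and j + k <= n \
            and text[i + k - 1] == text[j + k - 1]:
        k += 1
    return k


def lce_with_cover(text: str, i: int, j: int, cover, tau: int) -> int:
    delta = find_shift(i, j, cover, tau)
    k = lce_scan(text, i, j, limit=delta)   # fewer than tau direct comparisons
    if k < delta:
        return k                            # mismatch or end of text before the sampled positions
    # Both i + delta and j + delta are sampled. The slides answer this part from the
    # compacted trie of stored suffixes; a plain scan stands in for that lookup here.
    return delta + lce_scan(text, i + delta, j + delta)


T = "dbcaabcabcaabcac"
print(lce_with_cover(T, 2, 6, {1, 2, 4}, 5))
```

A shift always exists because D contains a and b with a − b ≡ i − j (mod τ), and δ ≡ a − i (mod τ) then maps i to residue a and j to residue b.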

28 / 56

slide-32
SLIDE 32

A Randomized Solution (Monte Carlo)

Rabin-Karp Fingerprints

Let p be a sufficiently large prime and choose b ∈ Z_p uniformly at random.

φ(S) = Σ_{k=1}^{|S|} S[k] · b^k mod p

T = d b c a a b c a b c a a b c a c
  = 3 1 2 0 0 1 2 0 1 2 0 0 1 2 0 2

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16

φ(T[2 . . . 7]) = 1·b^1 + 2·b^2 + 0·b^3 + 0·b^4 + 1·b^5 + 2·b^6 mod p

Crucial property: With high probability φ is collision-free on substrings of T, i.e., φ(S1) = φ(S2) iff S1 = S2.

Also important: φ(T[i . . . j + 1]) can be computed from φ(T[i . . . j]) in O(1) time.
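The fingerprint and its O(1) extension step can be sketched as follows. The prime 2^61 − 1 and the character codes a = 0, b = 1, c = 2, d = 3 are assumptions made for this example (the codes match the encoding of T shown on the slide); the function names are hypothetical.

```python
import random

P = (1 << 61) - 1                      # a Mersenne prime, assumed "sufficiently large"
CODE = {"a": 0, "b": 1, "c": 2, "d": 3}


def fingerprint(s: str, b: int, p: int = P) -> int:
    """phi(S) = sum over k = 1..|S| of S[k] * b^k, reduced mod p."""
    total, power = 0, 1
    for ch in s:
        power = power * b % p          # power is now b^k for the current character
        total = (total + CODE[ch] * power) % p
    return total


def extend(phi: int, power: int, ch: str, b: int, p: int = P):
    """From phi(T[i..j]) and b^(j-i+1), return phi(T[i..j+1]) and b^(j-i+2) in O(1)."""
    power = power * b % p
    return (phi + CODE[ch] * power) % p, power


b = random.randrange(1, P)             # random base; 0 is skipped since it maps every string to 0
T = "dbcaabcabcaabcac"
print(fingerprint(T[1:7], b))          # phi(T[2 . . . 7]) in the slide's 1-indexed notation
```

The extension step needs the current power of b alongside the fingerprint, which is why `extend` carries both values.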

32 / 56

slide-35
SLIDE 35

A Randomized Solution (Monte Carlo)

How to answer a query

Idea: Store fingerprints of suffixes starting at every τ’th position in T.

[Figure: T divided into blocks of τ chars, with a substring S and stored suffix fingerprints φ marked.]

Observation: If S is block aligned we can compute φ(S) in O(1) time. Otherwise, the time needed is O(τ).
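The block-aligned case follows from the definition φ(S) = Σ S[k]·b^k mod p: if H[q] denotes the fingerprint of the suffix starting at position qτ + 1, then φ(T[qτ + 1 . . . rτ]) = H[q] − b^((r−q)τ) · H[r] mod p. The sketch below assumes this layout of sampled positions (1, τ + 1, 2τ + 1, ...), which the slides do not spell out, and uses Python's `pow` for the power of b; a strict O(1) bound would additionally need the required powers to be tabulated. Names are hypothetical.

```python
P = (1 << 61) - 1
CODE = {"a": 0, "b": 1, "c": 2, "d": 3}


def direct_fingerprint(codes, b: int, p: int = P) -> int:
    """phi of a list of character codes, computed from the definition."""
    total, power = 0, 1
    for c in codes:
        power = power * b % p
        total = (total + c * power) % p
    return total


def suffix_fingerprints(codes, tau: int, b: int, p: int = P):
    """H[q] = phi(T[q*tau + 1 .. n]), filled in one right-to-left pass."""
    n = len(codes)
    blocks = -(-n // tau)                   # ceil(n / tau)
    H = [0] * (blocks + 1)                  # the last entry covers the empty suffix
    h = 0
    for pos in range(n, 0, -1):             # 1-indexed position
        h = (h + codes[pos - 1]) * b % p    # phi(c + S) = b * (c + phi(S))
        if (pos - 1) % tau == 0:
            H[(pos - 1) // tau] = h
    return H


def aligned_fingerprint(H, q: int, r: int, tau: int, b: int, p: int = P) -> int:
    """phi(T[q*tau + 1 .. r*tau]) for block indices q <= r, from two stored values."""
    return (H[q] - pow(b, (r - q) * tau, p) * H[r]) % p


codes = [CODE[ch] for ch in "dbcaabcabcaabcac"]
H = suffix_fingerprints(codes, tau=4, b=7)
print(aligned_fingerprint(H, 1, 3, tau=4, b=7) == direct_fingerprint(codes[4:12], 7))  # True
```

Only ⌈n/τ⌉ + 1 values are stored, in line with the O(n/τ) space bound.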

35 / 56

slide-46
SLIDE 46

A Randomized Solution (Monte Carlo)

How to answer a query

Idea: Store fingerprints of suffixes starting at every τ’th position in T.

[Figure: T divided into blocks of τ chars, with query positions i and j; successive fingerprint comparisons are marked • and ×.]

Analysis

Time: Only O(log(LCE/τ)) fingerprint comparisons, each taking time O(τ). Hence query time O(τ log(LCE/τ)).

Space: O(n/τ).

46 / 56

slide-53
SLIDE 53

A Randomized Solution (Las Vegas)

Question: Can we verify that φ is collision free during preprocessing?

Challenge: Doing this quickly while using O(n/τ) space.

Observation: Whenever we compare two fingerprints, we can ensure that one of them is of the form T[jτ . . . jτ + τ·2^ℓ − 1] for some ℓ, j . . .

. . . this cuts down the number of fingerprints we need to check!

General idea: For each ℓ ≥ 0 in increasing order, check that for all i, j,

φ(T[i . . . i + τ·2^ℓ − 1]) = φ(T[jτ . . . jτ + τ·2^ℓ − 1]) iff T[i . . . i + τ·2^ℓ − 1] = T[jτ . . . jτ + τ·2^ℓ − 1]

[Figure: two copies of T with the compared substrings marked.]

φ(T[i . . . i + τ·2^ℓ − 1]) = φ(T[jτ . . . jτ + τ·2^ℓ − 1])

φ(T[i . . . i + τ·2^(ℓ−1) − 1]) =? φ(T[jτ . . . jτ + τ·2^(ℓ−1) − 1])

53 / 56

slide-54
SLIDE 54

A Randomized Solution (Las Vegas)

Question: Can we verify that φ is collision free during preprocessing?

Challenge: Doing this quickly while using O(n/τ) space.

Observation: Whenever we compare two fingerprints, we can ensure that one of them is of the form T[jτ . . . jτ + τ·2^ℓ − 1] for some ℓ, j . . .

. . . this cuts down the number of fingerprints we need to check!

General idea: For each ℓ ≥ 0 in increasing order, check that for all i, j,

φ(T[i . . . i + τ·2^ℓ − 1]) = φ(T[jτ . . . jτ + τ·2^ℓ − 1]) iff T[i . . . i + τ·2^ℓ − 1] = T[jτ . . . jτ + τ·2^ℓ − 1]

[Figure: two copies of T with the compared substrings marked.]

φ(T[i . . . i + τ·2^ℓ − 1]) = φ(T[jτ . . . jτ + τ·2^ℓ − 1])

φ(T[i + τ·2^(ℓ−1) . . . i + τ·2^ℓ − 1]) =? φ(T[jτ + τ·2^(ℓ−1) . . . jτ + τ·2^ℓ − 1])

54 / 56

slide-55
SLIDE 55

Conclusions

We gave three time-space trade-offs for LCE on a single string:

◮ A deterministic solution

    ◮ O(τ) query time
    ◮ O(n/√τ) space (even during preprocessing)
    ◮ O(n²/√τ) preprocessing time

◮ A Monte-Carlo solution

    ◮ O(τ log(LCE(i, j)/τ)) query time (correct with high prob.)
    ◮ O(n/τ) space (even during preprocessing)
    ◮ O(n) preprocessing time

◮ A Las-Vegas solution

    ◮ O(τ log(LCE(i, j)/τ)) query time (correct with certainty)
    ◮ O(n/τ) space (even during preprocessing)
    ◮ O(n log n) preprocessing time with high prob.

55 / 56

slide-56
SLIDE 56

Conclusions

We gave three time-space trade-offs for LCE on two strings:

◮ A deterministic solution

    ◮ O(τ) query time
    ◮ O(n/τ + m/√τ) space (even during preprocessing)
    ◮ O(nm/√τ) preprocessing time

◮ A Monte-Carlo solution

    ◮ O(τ log(LCE(i, j)/τ)) query time (correct with high prob.)
    ◮ O((n + m)/τ) space (even during preprocessing)
    ◮ O(n) preprocessing time

◮ A Las-Vegas solution

    ◮ O(τ log(LCE(i, j)/τ)) query time (correct with certainty)
    ◮ O((n + m)/τ) space (even during preprocessing)
    ◮ O(n log n) preprocessing time with high prob.

56 / 56