 
Time-Space Trade-Offs for Longest Common Extensions

Philip Bille¹, Inge Li Gørtz¹, Benjamin Sach², and Hjalte Wedel Vildhøj¹
¹ Technical University of Denmark, DTU Informatics, {phbi,ilg,hwvi}@imm.dtu.dk
² University of Warwick, Department of Computer Science, sach@dcs.warwick.ac.uk
CPM 2012, Helsinki, July 4, 2012
The Longest Common Extension Problem

Definition
Problem: Preprocess a string T of length n to support LCE queries:
◮ LCE(i, j) = the length of the longest common prefix of the suffixes starting at positions i and j in T.

Example: T = bananas (positions 1 to 7).
◮ LCE(2, 4) = 3: the suffixes ananas and anas share the prefix ana.
◮ LCE(2, 5) = 0: the suffixes ananas and nas already differ in their first character.

◮ We assume that the input is given in read-only memory and is not included in the space complexity.
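Computing LCE(i, j) straight from the definition is a single scan, and it is also the "store nothing" approach. A minimal Python sketch, assuming 1-based positions as in the example; the function name `lce_naive` is hypothetical:

```python
def lce_naive(t: str, i: int, j: int) -> int:
    """LCE(i, j) for 1-based positions i and j, by direct comparison.

    O(1) extra space; O(n) time in the worst case.
    """
    a, b = i - 1, j - 1  # convert to 0-based indices
    length = 0
    while (a + length < len(t) and b + length < len(t)
           and t[a + length] == t[b + length]):
        length += 1
    return length


if __name__ == "__main__":
    print(lce_naive("bananas", 2, 4))  # 3  (common prefix "ana")
    print(lce_naive("bananas", 2, 5))  # 0  ('a' vs 'n')
```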
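Two Simple Solutions

#1: Store nothing. Compare T[i] with T[j], then T[i+1] with T[j+1], and so on until the first mismatch. For T = bananas, i = 2, j = 4, this gives LCE(i, j) = 3.
Time: O(n)   Space: O(1)

#2: Store the suffix tree. LCE(i, j) = |NCA(i, j)|, the string depth of the nearest common ancestor of the leaves for suffixes i and j, found with a constant-time NCA data structure. For T = bananas, NCA(2, 4) is the node with path label ana, so LCE(2, 4) = 3.
[Figure: suffix tree of T = bananas, with root edges labelled bananas, na, a, and s, and leaves 1 to 7.]
Time: O(1)   Space: O(n)

Trade-off?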
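Writing out a suffix tree with an NCA structure is long, so the sketch below swaps in a different but equivalent-in-spirit technique (not the method on the slide): a suffix array, its LCP array (Kasai's algorithm), and a sparse table for range minima. LCE(i, j) is then the minimum LCP value between the ranks of suffixes i and j. The class name `SuffixArrayLCE` is hypothetical, the suffix sort is naive, and the sparse table uses O(n log n) space rather than O(n).

```python
from typing import List


class SuffixArrayLCE:
    """O(1)-time LCE queries (1-based positions) after preprocessing.

    Stand-in for "suffix tree + NCA": suffix array, LCP array (Kasai),
    and a sparse table for range-minimum queries.
    """

    def __init__(self, t: str) -> None:
        n = len(t)
        self.n = n
        self.sa: List[int] = sorted(range(n), key=lambda k: t[k:])  # naive sort
        self.rank: List[int] = [0] * n
        for r, s in enumerate(self.sa):
            self.rank[s] = r
        # lcp[r] = LCP of the suffixes ranked r - 1 and r (lcp[0] = 0).
        self.lcp: List[int] = [0] * n
        h = 0
        for s in range(n):
            r = self.rank[s]
            if r == 0:
                h = 0
                continue
            prev = self.sa[r - 1]
            while s + h < n and prev + h < n and t[s + h] == t[prev + h]:
                h += 1
            self.lcp[r] = h
            if h > 0:
                h -= 1
        # table[k][x] = min(lcp[x : x + 2**k]).
        self.table: List[List[int]] = [self.lcp[:]]
        k = 1
        while (1 << k) <= n:
            row, half = self.table[-1], 1 << (k - 1)
            self.table.append([min(row[x], row[x + half])
                               for x in range(n - (1 << k) + 1)])
            k += 1

    def lce(self, i: int, j: int) -> int:
        a, b = i - 1, j - 1
        if a == b:
            return self.n - a
        lo, hi = sorted((self.rank[a], self.rank[b]))
        lo += 1  # LCE = min(lcp[lo .. hi]) over ranks after the smaller one
        k = (hi - lo + 1).bit_length() - 1
        return min(self.table[k][lo], self.table[k][hi - (1 << k) + 1])


if __name__ == "__main__":
    idx = SuffixArrayLCE("bananas")
    print(idx.lce(2, 4))  # 3
    print(idx.lce(2, 5))  # 0
```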
Our Results

Trade-off parameter τ, 1 ≤ τ ≤ n:
◮ Store nothing: Time $O(n)$, Space $O(1)$.
◮ Randomized: Time $O\left(\tau \log \frac{\mathrm{LCE}(i,j)}{\tau}\right)$, Space $O\left(\frac{n}{\tau}\right)$.
◮ Deterministic: Time $O(\tau)$, Space $O\left(\frac{n}{\sqrt{\tau}}\right)$.
◮ Store suffix tree: Time $O(1)$, Space $O(n)$.
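Plugging a few values of τ into the deterministic bounds (a worked example, not taken from the slides) shows how they move between the two simple solutions:

```latex
\begin{align*}
\tau = 1:&\quad \text{time } O(1), && \text{space } O(n/\sqrt{1}) = O(n)\\
\tau = \log^2 n:&\quad \text{time } O(\log^2 n), && \text{space } O(n/\log n)\\
\tau = n:&\quad \text{time } O(n), && \text{space } O(n/\sqrt{n}) = O(\sqrt{n})
\end{align*}
```

At τ = n the deterministic space is O(√n) rather than O(1); the randomized space bound O(n/τ) does reach O(1) there.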
A Deterministic Solution

Idea: Store a subset of the n suffixes in a compacted trie.
T = dbcaabcabcaabcac (positions 1 to 16)

Difference Covers
A difference cover modulo τ is a set of integers D ⊆ {0, 1, ..., τ − 1} such that for any distance d ∈ {0, 1, ..., τ − 1}, D contains two elements separated by distance d modulo τ, i.e., there are i, j ∈ D with i − j ≡ d (mod τ).
Ex: The set D = {1, 2, 4} is a difference cover modulo 5:

  d       0       1       2       3       4
  i, j    1, 1    2, 1    1, 4    4, 1    1, 2
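The definition can be checked by brute force over all pairs of D. A minimal sketch, assuming D is given as a Python iterable of integers; the function name `cover_witnesses` is hypothetical. It returns one pair (i, j) with i − j ≡ d (mod τ) for every d, or None if some distance is missing; the pairs it picks may differ from those on the slide, since any witness will do.

```python
from typing import Dict, Iterable, Optional, Tuple


def cover_witnesses(d_set: Iterable[int], tau: int) -> Optional[Dict[int, Tuple[int, int]]]:
    """Map each distance d in {0, ..., tau-1} to some (i, j) in D x D with
    (i - j) % tau == d; return None if some distance has no such pair."""
    elems = sorted(set(d_set))
    witnesses: Dict[int, Tuple[int, int]] = {}
    for i in elems:
        for j in elems:
            # Keep the first pair found for each residue of i - j.
            witnesses.setdefault((i - j) % tau, (i, j))
    if len(witnesses) < tau:
        return None  # some distance modulo tau is not covered
    return dict(sorted(witnesses.items()))


if __name__ == "__main__":
    print(cover_witnesses({1, 2, 4}, 5))
    # {0: (1, 1), 1: (2, 1), 2: (1, 4), 3: (2, 4), 4: (1, 2)}
    print(cover_witnesses({1, 2}, 5))  # None: distances 2 and 3 are missing
```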
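The stored suffixes are those starting at positions i with i mod τ ∈ D. For any query positions i and j there is then a δ ∈ {0, ..., τ − 1} such that i + δ and j + δ are both stored: pick a, b ∈ D with a − b ≡ i − j (mod τ) and let δ ≡ a − i (mod τ). A query compares at most δ < τ characters directly; if no mismatch is found, LCE(i, j) = δ + LCE(i + δ, j + δ), and the second term is an NCA query in the trie.

Lemma (Colbourn and Ling¹): For any τ, a difference cover modulo τ of size at most $\sqrt{1.5\tau} + 6$ can be computed in $O(\sqrt{\tau})$ time.

Analysis
Time: $O(\tau)$
Space: $O(\#\text{stored suffixes}) = O\left(\frac{n}{\tau}|D|\right) = O\left(\frac{n}{\sqrt{\tau}}\right)$

¹ C. J. Colbourn and A. C. Ling. Quorums from difference covers. Inf. Process. Lett. 75(1-2):9–12, 2000.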
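A sketch of the sampled-suffix query in Python: store suffixes whose start lies in D modulo τ, find δ < τ with both i + δ and j + δ stored, compare the first δ characters directly, then add the LCE of the two stored suffixes. Several parts are assumptions rather than the authors' construction, and all names are hypothetical: `simple_difference_cover` builds a cover of size at most 2⌈√τ⌉ instead of using the Colbourn and Ling lemma, and a sorted list of sampled suffixes with adjacent LCP values replaces the compacted trie, so the final range minimum is a linear scan rather than an O(1) NCA query.

```python
import math
from typing import List, Set


def simple_difference_cover(tau: int) -> List[int]:
    """A difference cover modulo tau with at most 2 * ceil(sqrt(tau)) elements.

    NOT the Colbourn-Ling construction: with k * k >= tau, take
    {0, ..., k-1} together with k, 2k, ..., k*k reduced modulo tau.
    """
    k = math.isqrt(tau - 1) + 1  # smallest k with k * k >= tau
    small = {x % tau for x in range(k)}
    multiples = {(m * k) % tau for m in range(1, k + 1)}
    return sorted(small | multiples)


def lcp_direct(t: str, a: int, b: int) -> int:
    """Length of the common prefix of t[a:] and t[b:] by character comparison."""
    k = 0
    while a + k < len(t) and b + k < len(t) and t[a + k] == t[b + k]:
        k += 1
    return k


class SampledLCE:
    """Difference-cover sampling sketch; queries use 1-based positions."""

    def __init__(self, t: str, tau: int) -> None:
        self.t, self.tau = t, tau
        self.cover: Set[int] = set(simple_difference_cover(tau))
        # Keep only suffixes whose 0-based start p has p % tau in the cover.
        sampled = [p for p in range(len(t)) if p % tau in self.cover]
        self.order = sorted(sampled, key=lambda p: t[p:])  # stand-in for the trie
        self.rank = {p: r for r, p in enumerate(self.order)}
        # adj[r] = LCP of the sampled suffixes ranked r - 1 and r.
        self.adj = [0] + [lcp_direct(t, self.order[r - 1], self.order[r])
                          for r in range(1, len(self.order))]

    def lce(self, i: int, j: int) -> int:
        a, b, n = i - 1, j - 1, len(self.t)
        # Find delta < tau such that a + delta and b + delta are both sampled.
        for delta in range(self.tau):
            if (a + delta) % self.tau in self.cover and (b + delta) % self.tau in self.cover:
                break
        else:
            raise ValueError("cover is not a difference cover")
        # Compare at most delta < tau characters directly.
        for k in range(delta):
            if a + k >= n or b + k >= n or self.t[a + k] != self.t[b + k]:
                return k
        x, y = a + delta, b + delta
        if x >= n or y >= n:
            return delta  # one suffix ran out of characters
        if x == y:
            return delta + n - x
        lo, hi = sorted((self.rank[x], self.rank[y]))
        return delta + min(self.adj[lo + 1:hi + 1])  # linear-scan range minimum


if __name__ == "__main__":
    idx = SampledLCE("dbcaabcabcaabcac", 9)
    print(sorted(idx.cover))  # [0, 1, 2, 3, 6]
    print(idx.lce(2, 9))  # 7
```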
A Randomized Solution (Monte Carlo)

Rabin-Karp Fingerprints
Let p be a sufficiently large prime and choose b ∈ $\mathbb{Z}_p$ uniformly at random. The fingerprint of a string S is
$\varphi(S) = \sum_{k=1}^{|S|} S[k]\, b^{k} \bmod p$.

Example: T = dbcaabcabcaabcac, with the letters a, b, c, d encoded as 0, 1, 2, 3, reads 3 1 2 0 0 1 2 0 1 2 0 0 1 2 0 2. Then
$\varphi(T[2..7]) = 1 \cdot b^{1} + 2 \cdot b^{2} + 0 \cdot b^{3} + 0 \cdot b^{4} + 1 \cdot b^{5} + 2 \cdot b^{6} \bmod p$.

Crucial property: With high probability φ is collision-free on substrings of T, i.e., $\varphi(S_1) = \varphi(S_2)$ iff $S_1 = S_2$.
Also important: $\varphi(T[i..j+1])$ can be computed from $\varphi(T[i..j])$ in O(1) time, since $\varphi(T[i..j+1]) = \varphi(T[i..j]) + T[j+1]\, b^{\,j-i+2} \bmod p$ and $b^{\,j-i+1}$ can be kept alongside $\varphi(T[i..j])$.
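A small sketch of φ and of the constant-time extension step, assuming the slide's encoding (a, b, c, d as 0, 1, 2, 3) and the Mersenne prime 2^61 − 1 as a stand-in for "sufficiently large p". The names `fingerprint`, `extend`, `P`, and `CODE` are hypothetical. Extending S by one character adds its code times b^(|S|+1), so carrying b^|S| alongside φ(S) makes each step O(1).

```python
import random
from typing import Tuple

P = (1 << 61) - 1  # Mersenne prime 2^61 - 1, assumed "sufficiently large"
CODE = {"a": 0, "b": 1, "c": 2, "d": 3}  # letter encoding used on the slide


def fingerprint(s: str, b: int, p: int = P) -> int:
    """phi(S) = sum_{k=1}^{|S|} S[k] * b^k mod p, directly from the definition."""
    return sum(CODE[ch] * pow(b, k, p) for k, ch in enumerate(s, start=1)) % p


def extend(phi: int, b_pow: int, ch: str, b: int, p: int = P) -> Tuple[int, int]:
    """From phi(S) and b^|S|, return phi(S + ch) and b^(|S|+1) in O(1) time."""
    b_pow = b_pow * b % p
    return (phi + CODE[ch] * b_pow) % p, b_pow


if __name__ == "__main__":
    T = "dbcaabcabcaabcac"
    b = random.randrange(P)  # b chosen uniformly at random from Z_p
    phi, b_pow = 0, 1  # phi(empty string) = 0 and b^0 = 1
    for ch in T[1:7]:  # T[2..7] = "bcaabc" in 1-based notation
        phi, b_pow = extend(phi, b_pow, ch, b)
    print(phi == fingerprint(T[1:7], b))   # True
    print(phi == fingerprint(T[8:14], b))  # True: T[9..14] is also "bcaabc"
```

Equal substrings always get equal fingerprints; the randomness of b only matters for ruling out collisions between different substrings.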