String indexing in the Word RAM model, part 3. Paweł Gawrychowski - PowerPoint PPT Presentation

  1. String indexing in the Word RAM model, part 3. Paweł Gawrychowski, University of Wrocław & Max-Planck-Institut für Informatik. Paweł Gawrychowski, String indexing in the Word RAM model III, 1 / 30

  2. We want to reduce the space usage. The goal is to construct a structure of size $\left(1 + \frac{1}{\epsilon}\right)n + O\left(\frac{n}{\log\log n}\right)$ bits that answers any $\mathrm{lookup}(i)$ query in $O(\log^{\epsilon} n)$ time, for any $\epsilon \in (0, 1]$. Idea: we had $\ell = \log\log n$ levels of recursion. Now we try to simulate jumping $\epsilon\ell$ levels at once, so that we only have to store $\frac{1}{\epsilon}$ levels.
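
As a quick check of the parameters, the Python sketch below (the helper `stored_levels` is hypothetical) lists which recursion levels survive when we jump $\epsilon\ell$ levels at a time. It assumes $\log n$ is a power of two and that $\epsilon\ell$ is an integer dividing $\ell$, so every quantity is exact. The last stored level, $\ell$, keeps $SA_\ell$ explicitly; the $\frac{1}{\epsilon}$ levels before it keep $\Phi_k$.

```python
from fractions import Fraction


def stored_levels(log_n: int, eps: Fraction) -> tuple[list[int], int]:
    """Return the stored recursion levels and the jump factor 2**(eps*ell).

    Assumes log_n is a power of two (so ell = log log n is an integer)
    and that eps * ell is a positive integer dividing ell.
    """
    assert log_n > 1 and log_n & (log_n - 1) == 0, "log n must be a power of two"
    ell = log_n.bit_length() - 1            # ell = log2(log2 n)
    jump = eps * ell                        # number of levels skipped at once
    assert jump.denominator == 1 and jump >= 1 and ell % jump == 0
    jump = int(jump)
    levels = list(range(0, ell + 1, jump))  # 0, eps*ell, 2*eps*ell, ..., ell
    return levels, 2 ** jump                # a walk at one level takes < 2**jump steps


levels, factor = stored_levels(16, Fraction(1, 2))   # n = 2**16, so ell = 4
print(levels, factor)                                # [0, 2, 4] 4
```

Here the jump factor $2^{\epsilon\ell} = 4$ equals $(\log n)^{1/2}$, matching the $O(\log^{\epsilon} n)$ bound on the walk length per stored level.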

  3. The first step is to replace $\Psi_k$ with $\Phi_k$, where $\Phi_k(i) = j$ if $SA_k[j] = SA_k[i] + 1$, and $\Phi_k(i) = 1$ if $SA_k[i] = n_k$. So, for every $i$, we store the position of the successor of $SA_k[i]$. Now if we store all $\Phi_k(i)$ in a list, then computing $\Phi_k(i)$ amounts to taking the $i$-th element of the list, and the vectors $B_k$ are no longer necessary.
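
A minimal Python sketch of the definition, computing $\Phi_k$ from $SA_k$ through the inverse permutation. The function name `phi` is hypothetical, $\Phi_k$ is kept as a plain Python list rather than in a compressed encoding, and positions and values are 1-based as on the slides.

```python
def phi(sa_k: list[int]) -> list[int]:
    """Phi_k as a list: entry i-1 holds Phi_k(i) (1-based positions and values).

    Phi_k(i) = j where SA_k[j] = SA_k[i] + 1, and Phi_k(i) = 1 when SA_k[i] = n_k.
    """
    n_k = len(sa_k)
    inverse = [0] * (n_k + 1)               # inverse[v] = j such that SA_k[j] = v
    for j, v in enumerate(sa_k, start=1):
        inverse[v] = j
    return [inverse[v + 1] if v < n_k else 1 for v in sa_k]


sa_k = [5, 3, 1, 4, 2]                      # any permutation of 1..n_k works here
print(phi(sa_k))                            # [1, 4, 5, 1, 2]
```

Starting from the index that holds value 1 (here $i = 3$) and repeatedly applying $\Phi_k$ visits the values $1, 2, 3, \ldots$ of $SA_k$ in order, which is the kind of walk used to answer lookups.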

  4. Lemma. $\Phi_0$ can be stored in $n + O\left(\frac{n}{\log\log n}\right)$ bits, so that accessing any entry takes $O(1)$ time. Lemma. For $k > 0$, $\Phi_k$ can be stored in $n\left(1 + \frac{1}{2^{k-1}}\right) + O\left(\frac{n}{2^{k}\log\log n}\right)$ bits, so that accessing any entry takes $O(1)$ time.

  5. Now to determine $SA[i] = SA_0[i]$, we use $\Phi_0$ to walk along indices $i, i', i'', \ldots$ such that $SA_0[i] + 1 = SA_0[i']$, $SA_0[i'] + 1 = SA_0[i'']$, ..., until we reach an index that is also stored at the next stored level, $SA_{\epsilon\ell}$. But how do we detect this? Succinct dictionary: a bit vector $B[1..n]$ in which only $n'$ entries are ones can be stored in $O\left(\log\binom{n}{n'}\right)$ bits, so that a lookup and a rank take $O(1)$ time. So we store all $i$ such that $SA_0[i]$ is divisible by $2^{\epsilon\ell}$ in such a succinct dictionary. If the walk takes $t$ steps and ends at index $i^{*}$, then $SA_0[i] = 2^{\epsilon\ell}\cdot SA_{\epsilon\ell}[\mathrm{rank}(i^{*})] - t$. The length of such a walk is at most $2^{\epsilon\ell} = O(\log^{\epsilon} n)$, since $2^{\ell} = \log n$.
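
The walk can be sketched in Python as follows. This illustrates the lookup logic only, not the space-efficient structure: $\Phi$ is a plain list, the succinct dictionary is replaced by a plain bit list with a prefix-sum rank array, and the helper names (`suffix_array`, `build`, `lookup`) are hypothetical. It assumes, as in the earlier recursion, that each stored level keeps the entries of the previous one divisible by $2^{\epsilon\ell}$, divided by $2^{\epsilon\ell}$ and in the same order, and that $n$ is divisible by the product of all jump factors; the parameter `step` plays the role of $2^{\epsilon\ell}$.

```python
def suffix_array(text: str) -> list[int]:
    """1-based suffix array by naive sorting; fine for tiny examples only."""
    return sorted(range(1, len(text) + 1), key=lambda p: text[p - 1:])


def build(sa: list[int], step: int, jumps: int):
    """Per stored level: Phi, marks and rank; the final level keeps SA explicitly."""
    assert len(sa) % step ** jumps == 0, "n must be divisible by step**jumps"
    levels = []
    for _ in range(jumps):
        n_k = len(sa)
        inverse = [0] * (n_k + 1)            # inverse[v] = index j with sa[j] = v
        for j, v in enumerate(sa, start=1):
            inverse[v] = j
        phi = [inverse[v + 1] if v < n_k else 1 for v in sa]
        marked = [v % step == 0 for v in sa]  # stand-in for the succinct dictionary
        rank = [0]                           # rank[j] = number of marked in 1..j
        for m in marked:
            rank.append(rank[-1] + m)
        levels.append((phi, marked, rank))
        sa = [v // step for v in sa if v % step == 0]  # next stored level
    return levels, sa, step


def lookup(structure, i: int) -> int:
    """Return SA[i] (1-based) by walking with Phi to a marked index, level by level."""
    levels, last_sa, step = structure
    walks = []
    for phi, marked, rank in levels:
        t = 0
        while not marked[i - 1]:             # at most step - 1 iterations
            i = phi[i - 1]
            t += 1
        walks.append(t)
        i = rank[i]                          # position of this entry one stored level down
    value = last_sa[i - 1]                   # the last level is stored explicitly
    for t in reversed(walks):
        value = value * step - t             # SA_k[i] = step * SA_next[rank(i*)] - t
    return value


text = "abaababaabbabaab"                    # n = 16
sa = suffix_array(text)
structure = build(sa, step=4, jumps=2)       # keeps levels 0, 2 and 4
assert all(lookup(structure, i) == sa[i - 1] for i in range(1, len(sa) + 1))
```

Each level's walk takes fewer than `step` iterations, so with $\frac{1}{\epsilon}$ stored levels and `step` $= 2^{\epsilon\ell}$ the total work is $O(\log^{\epsilon} n)$ for constant $\epsilon$.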

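  6. But what is the space bound? It is $\frac{n}{2^{\ell}}\log n + n + O\left(\frac{n}{\log\log n}\right) + \sum_{k = i\epsilon\ell,\ 0 < i < \epsilon^{-1}} n\left(1 + \frac{1}{2^{k-1}} + O\left(\frac{1}{2^{k}\log\log n}\right)\right)$ plus the space taken by the succinct dictionaries, which is $O\left(\frac{n\,\epsilon\ell}{2^{\epsilon\ell}}\right) = O\left(\frac{n\log\log n}{\log^{\epsilon} n}\right)$, so we get the claimed space complexity: the first term equals $n$ because $2^{\ell} = \log n$, the sum has $\epsilon^{-1} - 1$ terms of $n + O\left(\frac{n}{\log^{\epsilon} n}\right)$ each, and all remaining terms are $O\left(\frac{n}{\log\log n}\right)$, giving $\left(1 + \frac{1}{\epsilon}\right)n + O\left(\frac{n}{\log\log n}\right)$ bits. The space taken by the dictionaries is bounded as follows: for $k = 0$ it is $O\left(\log\binom{n}{n/2^{\epsilon\ell}}\right)$, and generally at the $k$-th super level we need $O\left(\log\binom{n/2^{k\epsilon\ell}}{n/2^{(k+1)\epsilon\ell}}\right)$, which is $O\left(\frac{n\,\epsilon\ell}{2^{(k+1)\epsilon\ell}}\right)$ since $\log\binom{N}{M} \le M\log\frac{eN}{M}$; summing this geometric series over $k$ gives the total above.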
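
To see the leading constant emerge, the Python sketch below evaluates only the explicit terms of the bound, in bits per text symbol, dropping every $O(\cdot)$ term and the dictionaries (their hidden constants are not given on the slide). The helper name `leading_bits_per_symbol` is hypothetical; it assumes $\frac{1}{\epsilon}$ and $\epsilon\ell$ are integers and uses exact fractions.

```python
from fractions import Fraction


def leading_bits_per_symbol(ell: int, eps: Fraction) -> Fraction:
    """Explicit terms of the space bound divided by n; all O(.) terms are dropped."""
    inv = 1 / eps
    assert inv.denominator == 1 and (eps * ell).denominator == 1 and eps * ell >= 1
    jump = int(eps * ell)
    total = Fraction(1)            # SA_ell: (n / 2**ell) * log n = n bits, as 2**ell = log n
    total += 1                     # Phi_0: n bits
    for i in range(1, int(inv)):   # k = i * eps * ell for 0 < i < 1/eps
        k = i * jump
        total += 1 + Fraction(1, 2 ** (k - 1))   # Phi_k: n * (1 + 1 / 2**(k-1)) bits
    return total


for ell in (4, 8, 16):
    print(ell, leading_bits_per_symbol(ell, Fraction(1, 2)))
# 4 7/2
# 8 25/8
# 16 385/128
```

The values approach $1 + \frac{1}{\epsilon} = 3$ as $\ell$ grows, because the extra $\frac{n}{2^{k-1}}$ terms shrink like $\frac{1}{\log^{\epsilon} n}$.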

  7. Succinct dictionaries. Pagh 2001: a static dictionary storing a subset of $[1, U]$ of size $n$ can be stored in $B + O(\log\log U) + o(n)$ bits of space, so that a membership query can be answered in $O(1)$ time, where $B = \left\lceil\log_2\binom{U}{n}\right\rceil$. We can also add $O(1)$-time rank queries, but it requires a little bit of work. We will see (a small fragment of) a much weaker result. Brodnik and Munro 1999: a static dictionary storing a subset of $[1, U]$ of size $n$ can be stored in $O(B)$ bits of space, so that a membership query can be answered in $O(1)$ time.
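
The information-theoretic bound $B$ can be computed exactly for concrete $U$ and $n$. The Python sketch below (hypothetical helper names) compares it with two naive encodings: a bitmap of $U$ bits and a plain list of the $n$ elements with $\lceil\log_2 U\rceil$ bits each.

```python
import math


def info_bound_bits(universe: int, size: int) -> int:
    """B = ceil(log2 C(U, n)), computed exactly with integers."""
    count = math.comb(universe, size)
    return (count - 1).bit_length()          # equals ceil(log2 count) for count >= 1


def compare_encodings(universe: int, size: int) -> dict[str, int]:
    """Bits used by the bound B versus two straightforward encodings."""
    per_element = (universe - 1).bit_length()  # ceil(log2 U) bits per listed element
    return {
        "B": info_bound_bits(universe, size),
        "sorted list": size * per_element,
        "bitmap": universe,
    }


print(compare_encodings(2 ** 20, 1000))
```

For sparse sets $B$ is far below the bitmap and below the plain list; the dictionaries above aim to stay within $B$ (up to lower-order terms, or a constant factor) while still answering membership in $O(1)$ time.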

  8. We allow $O(B) = O\left(n\log\frac{U}{n}\right)$ bits of space. We can clearly encode the whole set in such space, but the question is whether we can answer a membership query efficiently! Let $r = \frac{U}{n}$. We consider four cases. Very sparse, $r \in [U^{\epsilon}, \infty)$: then $\log\frac{U}{n} \ge \epsilon\log U$, so we have $O(n\log U)$ space and can explicitly list the elements, using some form of perfect hashing. Moderately sparse, $r \in [\log^{\lambda} U, U^{\epsilon}]$: see the next slide. Moderately dense, $r \in \left[\frac{1}{\alpha}, \log^{\lambda} U\right]$: complicated! Dense, $r \in \left[2, \frac{1}{\alpha}\right]$: then $n\log\frac{U}{n} \ge n \ge \alpha U$, so we can use $O(U)$ bits of space and store a bitmap.
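
The case split can be sketched as a small Python dispatcher together with the two easy representations. The thresholds use the constants $\epsilon$, $\lambda$, $\alpha$ from the slide, but the default values below are arbitrary assumptions and the function names are hypothetical. A Python `frozenset` stands in for the perfect hash table; it is an ordinary hash set, not perfect hashing. The two middle cases are not implemented.

```python
import math


def classify(universe: int, size: int, eps: float = 0.5,
             lam: float = 2.0, alpha: float = 0.25) -> str:
    """Pick the case by density r = U / n (assumes n <= U / 2, i.e. r >= 2)."""
    r = universe / size
    if r >= universe ** eps:
        return "very sparse"
    if r >= math.log2(universe) ** lam:
        return "moderately sparse"
    if r >= 1 / alpha:
        return "moderately dense"
    return "dense"


def build_easy_case(elements: list[int], case: str):
    """Membership test for the two easy cases; elements are drawn from [1, U]."""
    if case == "dense":
        bitmap = 0
        for x in elements:
            bitmap |= 1 << (x - 1)            # bit x-1 is set iff x is in the set
        return lambda x: ((bitmap >> (x - 1)) & 1) == 1
    if case == "very sparse":
        table = frozenset(elements)           # stand-in for a perfect hash table
        return lambda x: x in table
    raise NotImplementedError(case)           # the two middle cases are harder


print(classify(2 ** 20, 100), classify(2 ** 20, 2 ** 19))   # very sparse dense
```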
