SLIDE 1

String indexing in the Word RAM model, part 3

Paweł Gawrychowski

University of Wrocław & Max-Planck-Institut für Informatik

Paweł Gawrychowski String indexing in the Word RAM model III 1 / 30

SLIDE 2

We want to reduce the space usage. The goal is to construct a structure of size (1 + 1/ε)n + O(n / log log n) bits that answers any lookup(i) in O(log^ε n) time, for any ε ∈ (0, 1].

Idea

We had ℓ = log log n levels of recursion. Now we will try to simulate jumping εℓ levels at once, so that we only have to store 1/ε levels.


SLIDE 4

The first step is to replace Ψ_k with Φ_k:

Φ_k(i) = j such that SA_k[j] = SA_k[i] + 1, and Φ_k(i) = 1 if SA_k[i] = n_k.

So, for every i, we store the position of the successor of SA_k[i]. Now if we store all Φ_k(i) in a list, then computing Φ_k(i) is simply taking the ith element of the list, and the vectors B_k are no longer necessary.
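The definition can be prototyped directly. This is a hypothetical 0-indexed sketch (the slide uses 1-based indices), and the wrap-around case SA_k[i] = n_k is mapped here to the position of suffix 0, a cyclic convention chosen for the sketch:

```python
def phi_from_sa(sa):
    # inv[v] = position of value v in sa
    inv = {v: j for j, v in enumerate(sa)}
    n = len(sa)
    # phi[i] = position j with sa[j] = sa[i] + 1; the last suffix wraps to
    # the position of suffix 0 (cyclic convention, for this sketch only)
    return [inv[(sa[i] + 1) % n] for i in range(n)]

# Suffix array of "banana$" (0-indexed, $ smallest)
sa = [6, 5, 3, 1, 0, 4, 2]
phi = phi_from_sa(sa)

# Following phi from the position of suffix 0 visits the positions of
# suffixes 0, 1, 2, ..., so sa is recoverable without storing it explicitly.
i, order = sa.index(0), []
for _ in sa:
    order.append(i)
    i = phi[i]
```

Here `order[t]` is the suffix-array position of the suffix starting at text index t, so inverting `order` reproduces `sa`.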


SLIDE 6

Lemma

Φ_0 can be stored in n + O(n / log log n) bits, so that accessing any entry takes O(1) time.

Lemma

For k > 0, Φ_k can be stored in n(1 + 1/2^(k−1)) + O(n / (2^k log log n)) bits, so that accessing any entry takes O(1) time.


SLIDE 8

Now, to determine SA[i] = SA_0[i], we use Φ_0 to walk along indices i, i′, i′′, ... such that SA_0[i] + 1 = SA_0[i′], SA_0[i′] + 1 = SA_0[i′′], ..., until we reach an index stored in SA_1. But how do we detect this?

Succinct dictionary

A bit vector B[1..n] in which only n′ entries are ones can be stored in O(log (n choose n′)) bits, so that a lookup and a rank query take O(1) time.

So we store, in such a succinct dictionary, all i such that SA_0[i] is divisible by 2^(εℓ). The length of the walk is then at most 2^(εℓ) = O(log^ε n).
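A toy version of this lookup, hedged: a plain Python set-membership test stands in for the succinct dictionary, an explicit dict stands in for the sampled values stored at the next level, indices are 0-based, and the wrap-around uses the cyclic convention (the sampling step 4 stands in for 2^(εℓ)):

```python
def lookup(i, phi, sampled, n):
    # Walk phi until reaching a position whose SA value is sampled; each step
    # increases the underlying SA value by 1, so subtract the number of steps.
    steps = 0
    while i not in sampled:          # membership test: the succinct dictionary
        i = phi[i]
        steps += 1
    return (sampled[i] - steps) % n  # sampled[i]: value kept at the next level

# Toy data: suffix array of "banana$" and its phi, with every SA value
# divisible by 4 sampled.
sa = [6, 5, 3, 1, 0, 4, 2]
inv = {v: j for j, v in enumerate(sa)}
phi = [inv[(v + 1) % len(sa)] for v in sa]
sampled = {j: v for j, v in enumerate(sa) if v % 4 == 0}
```

Every `lookup(i, ...)` recovers `sa[i]` while storing only `phi` and the sampled entries.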


SLIDE 11

But what is the space bound? It is

n log n / 2^ℓ + n + O(n / log log n) + Σ_{k=iεℓ, 0<i<ε⁻¹} [ n(1 + 1/2^(k−1)) + O(n / (2^k log log n)) ],

plus the space taken by the succinct dictionaries, which is O(n_εℓ · ℓ) = O(n log log n / log^ε n), so we get the claimed space complexity. The space taken by the dictionaries is bounded as follows:

1. for k = 0, we need O(log (n choose n_εℓ)) bits,

2. generally, at the kth super level we need O(log (n_kεℓ choose n_(k+1)εℓ)) bits, which is O(n_kεℓ · εℓ).


SLIDE 16

Succinct dictionaries

Pagh 2001

A static dictionary storing a subset of [1, U] of size n can be stored in B + O(log log U) + o(n) bits of space, where B = ⌈log₂ (U choose n)⌉, so that a membership query can be answered in O(1) time.

We can also add O(1)-time rank queries, but that requires a little extra work. We will see a small fragment of a much weaker result:

Brodnik and Munro 1999

A static dictionary storing a subset of [1, U] of size n can be stored in O(B) bits of space, so that a membership query can be answered in O(1) time.
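The bound B is easy to evaluate numerically (a small sketch; the values U = 1024 and n = 32 are arbitrary examples, not from the slide):

```python
import math

# Information-theoretic minimum for storing an n-subset of a universe of
# size U: B = ceil(log2 (U choose n)) bits.
def info_bound(U, n):
    return math.ceil(math.log2(math.comb(U, n)))

# For U = 1024 and n = 32 this is a bit above the leading term
# n * log2(U / n) = 160, and far below the 1024 bits of a plain bitmap.
b = info_bound(1024, 32)
```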


SLIDE 18

We allow O(B) = O(n log(U/n)) bits of space. We can clearly encode the whole set in this much space; the question is whether we can also answer a membership query efficiently! Let r = U/n. We consider four cases:

very sparse: r ∈ [U^ε, ∞]; then we have O(n log U) bits of space, so we can explicitly list the elements and use some form of perfect hashing.
moderately sparse: r ∈ [log^λ U, U^ε]; see the next slide.
moderately dense: r ∈ [1/α, log^λ U]; complicated!
dense: r ∈ [2, 1/α]; then we can use O(U) bits of space, so we store a bitmap.
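The case split can be written down directly. The constants ε, λ, α are left unspecified on the slide; the defaults below are arbitrary stand-ins:

```python
import math

def representation(U, n, eps=0.5, lam=2.0, alpha=0.25):
    # Choose the representation by the sparsity ratio r = U/n, following the
    # four cases of the slide (eps, lam, alpha are hypothetical constants).
    r = U / n
    if r >= U ** eps:
        return "very sparse"        # explicit list + perfect hashing
    if r >= math.log2(U) ** lam:
        return "moderately sparse"  # the bucket scheme of the next slide
    if r >= 1 / alpha:
        return "moderately dense"   # the complicated case
    return "dense"                  # plain bitmap

choice = representation(2 ** 20, 4)
```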


SLIDE 20

Lemma

If n ≤ U / log^λ U, then we can store the set in O(B) bits of space, so that a membership query takes O(1) time.

We split the universe into p = n / log U buckets. We store pointers to all buckets, which takes p log U bits of space. In each bucket, we store the elements using log(U/p) bits per element. Additionally, we keep a perfect hashing function, separately for every bucket.

Fiat et al. 1988

A perfect hashing function for n elements taken from [1, U] can be implemented in O(log n + log log U + 1) additional bits of space.

It all adds up to B + B log log U / log r + o(B) bits of space, which is OK as long as r is at least log^λ U.


SLIDE 25

So, we have seen suffix arrays (and compressed suffix arrays). The annoying thing about suffix arrays is that we pay an additional penalty of log n (or even more) for every query. Is this necessary?

NO!

We can use suffix trees.


SLIDE 28

Suffix tree ST(w[1..n])

We append a special terminating character $ to our word w[1..n], and then arrange all suffixes of w[1..n]$ in a compacted trie. Take "banana". The suffixes are $, a$, na$, ana$, nana$, anana$, banana$.
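The slide's example can be reproduced in two lines ($ compares smallest in ASCII, matching its role as the terminating character):

```python
# The suffixes of "banana$" from the slide, in sorted (lexicographic) order.
s = "banana" + "$"
suffixes = sorted(s[i:] for i in range(len(s)))
```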

[Figure: the suffix tree of banana$, a compacted trie with edge labels a, na, $, banana$, etc.]

SLIDE 31

Why?

The resulting structure represents all subwords of w[1..n]. Each such subword corresponds to an explicit or implicit node of the suffix tree.


SLIDE 34

So, a suffix tree allows us to index the input word.

Text indexing

Given a word w[1..n], construct a small structure allowing us to answer queries of the form "where does p[1..m] occur in w[1..n]?".

We keep only the explicit nodes; there are O(n) of them. The labels of the edges are not kept explicitly; we just remember where they occur in w[1..n]. The total size of the structure is O(n), and a query can be answered in O(m + occ) time.
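A naive stand-in for such an index, hedged: this sketch sorts all suffixes explicitly, so it takes quadratic space and O(m log n) query time rather than the suffix tree's O(n) space and O(m + occ) time, but it answers the same queries:

```python
import bisect

def build_index(w):
    # Sort all suffixes of w$ together with their starting positions.
    s = w + "$"
    return sorted((s[i:], i) for i in range(len(s)))

def occurrences(index, p):
    # All suffixes with prefix p form a contiguous range in sorted order.
    lo = bisect.bisect_left(index, (p, -1))
    hi = bisect.bisect_left(index, (p + "\x7f", -1))
    return sorted(i for _, i in index[lo:hi])

idx = build_index("banana")
```

For example, `occurrences(idx, "ana")` reports the starting positions 1 and 3.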


SLIDE 38

We consider a fundamental data structure question: how to represent a tree?

(Compacted) trie

A trie is simply a tree with edges labeled by single characters. A compacted trie is created by replacing maximal chains of unary vertices with single edges labeled by (possibly long) words.

Navigation queries

Given a pattern p, we want to traverse the edges of a compacted trie to find the node corresponding to p. If there is no such node, we would like to compute the longest prefix of p for which the corresponding node does exist.

SLIDE 39

Consider p = wewpxcwrehyzrt and the following compacted trie.

[Figure: a compacted trie with long edge labels such as wewpxc and hyugfecvbx; the search for p descends along the edge labeled wewpxc and continues from there.]


SLIDE 43

Splitting an edge

Given an edge, we want to split it into two parts by (possibly) creating a middle node, and then adding a new edge outgoing from this middle node.

[Figure: an edge labeled abrakadabra being split into two parts.]

Notice that this covers adding a new edge outgoing from an existing node.

slide-45
SLIDE 45

Static case

Given a compacted trie, can we quickly construct a small structure which allows us to execute navigation queries efficiently?

Dynamic case

Can we maintain a compacted trie so that:

1

the resulting structure is small,

2

we can execute navigation queries efficiently,

3

we can split any edge efficiently? There are clearly three parameters: the number of nodes in the compacted trie n, the size of the alphabet σ, and the length of the pattern m. We aim to achieve good bounds in terms of those n, σ, m.

Paweł Gawrychowski String indexing in the Word RAM model III 17 / 30

SLIDE 46

So, what would be your first idea?

Hashing

For each node, store a hash table mapping characters to the corresponding outgoing edges. Randomized!

Table

Or, for each node, store a table of size σ mapping characters to the corresponding outgoing edges. Space usage is O(nσ)!

BST

Or, for each node, store a binary search tree mapping characters to the corresponding outgoing edges. A navigation query takes O(m log σ) time!
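The hash-table variant is what a dictionary-based trie gives for free. A sketch (non-compacted, one character per edge); `navigate` returns the length of the longest matching prefix, as the navigation queries require:

```python
class TrieNode:
    def __init__(self):
        self.children = {}   # character -> child node (the hash-table option)

def insert(root, word):
    v = root
    for c in word:
        v = v.children.setdefault(c, TrieNode())
    return v

def navigate(root, p):
    # Follow p as far as possible; return the length of the longest prefix
    # of p that corresponds to a node of the trie.
    v, matched = root, 0
    for c in p:
        if c not in v.children:
            break
        v = v.children[c]
        matched += 1
    return matched

root = TrieNode()
for w in ("banana", "band", "bra"):
    insert(root, w)
```

Each step is O(1) expected time, but the structure is randomized, which is exactly the drawback the next slide rules out.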

SLIDE 47

To make life interesting, the rules of the game are as follows:

1. the solution must be deterministic,
2. the space usage must be linear in n, irrespective of σ,
3. the bound on the update time must be worst-case.

Then it seems that navigation queries must necessarily take O(m · f(σ)) time for some function f of σ, for instance f(σ) = log σ, or something better if we use a more sophisticated predecessor structure. Surprisingly, this is not true.

Suffix trays of Cole, Kopelowitz, and Lewenstein ICALP'06

There exists a deterministic linear-size structure supporting navigation in O(m + log σ) time, which can be constructed in linear time.

SLIDE 48

What about the updates?

Suffix trists of Cole, Kopelowitz, and Lewenstein ICALP'06

There exists a deterministic linear-size structure supporting navigation in O(m + log σ) time and splitting edges in O(log σ) time.

The above bound assumes that we are given a pointer to the edge that should be split. The most natural setting in which we could use the structure is maintaining a suffix tree for a text updated by prepending a letter. In that case it is rather easy to locate the relevant edge in amortized O(1) time, but getting a sublinear worst-case bound is not trivial!

SLIDE 49

Suffix tree oracle of Amir, Kopelowitz, Lewenstein, and Lewenstein SPIRE'05

There exists a suffix tree oracle which locates the edge in O(log n) time.

Suffix tree oracle of Breslauer and Italiano SPIRE'11

If σ = O(1), there exists a suffix tree oracle which locates the edge in O(log log n) time.

SLIDE 50

The natural question is whether the O(m + log σ) and O(log σ) bounds are the best possible. The answer is... no, they are not.

Andersson and Thorup SODA'01

There exists a deterministic linear-size structure supporting navigation in O(m + √(log n / log log n)) time and splitting edges in O(√(log n / log log n)) time.

Are these bounds the best possible?

Under some assumptions, yes. More specifically, they are the best possible if σ is unbounded in terms of n and we are interested in a stronger version of the navigation queries, which actually gives us the predecessor of the string we are searching for.


SLIDE 53

But it seems reasonable to consider the scenario where σ is non-constant, yet (significantly) smaller than n. Hence we get the following question: what are the best possible time bounds in terms of σ?

Gawrychowski and Fischer, wait till the next slide

There exists a static deterministic linear-size structure supporting navigation in O(m + log log σ) time, which can be constructed in linear time.

Gawrychowski and Fischer

There exists a deterministic linear-size structure supporting navigation in O(m + log² log σ / log log log σ) time and splitting edges in O(log² log σ / log log log σ) time.

SLIDE 54

To construct a static deterministic linear-size structure, we could simply try to find a perfect hashing function storing all pairs (node, character). It is well known that such functions can be found in polynomial time, but we need linear time.

Ružić ICALP'08

A static linear-size constant-access dictionary on a set of k keys can be deterministically constructed in O(k log² log k) time.

Hence we immediately get a static deterministic structure which can be constructed in close-to-linear time. Can we do better?

SLIDE 55

We store the edges outgoing from v in a few different ways, depending on the size of the subtree rooted at v.

Heavy nodes

A node is heavy if its subtree contains at least s = Θ(log² log σ) leaves, and light otherwise. Furthermore, a heavy node is branching if it has more than one heavy child.
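The classification is easy to prototype (a sketch on a hypothetical toy tree given as child lists; s = 2 stands in for Θ(log² log σ)):

```python
def classify(children, root, s):
    # leaves[v] = number of leaves in the subtree rooted at v
    leaves = {}
    def count(v):
        kids = children.get(v, [])
        leaves[v] = 1 if not kids else sum(count(c) for c in kids)
        return leaves[v]
    count(root)
    heavy = {v for v, c in leaves.items() if c >= s}   # at least s leaves
    branching = {v for v in heavy                       # > 1 heavy child
                 if sum(1 for c in children.get(v, []) if c in heavy) > 1}
    return heavy, branching

# Toy tree: 0 -> {1, 2}, 1 -> {3, 4}, 2 -> {5, 6}; nodes 3..6 are leaves.
children = {0: [1, 2], 1: [3, 4], 2: [5, 6]}
heavy, branching = classify(children, 0, s=2)
```

With s = 2, nodes 0, 1, 2 are heavy and only the root is branching.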

SLIDE 56

[Figure: a tree with nodes classified as heavy or light; heavy nodes are further marked as branching or nonbranching; a leaf, a node v, and its parent pv are labeled.]

SLIDE 57

We classify the edges outgoing from heavy nodes into three types, and deal with each type separately:

1. from (any) heavy node to a light node: we store all such edges in a predecessor structure. By combining the perfect hashing result with the classical x-fast tries of Willard, there exists a linear-size predecessor structure with O(log log σ) query time, which can be constructed in linear time.

2. from a nonbranching heavy node to (any) heavy node: there is at most one such edge per node, so it can be stored separately.

3. from a branching heavy node to (any) heavy node: the total number of such edges is just n/s, hence we can afford the super-linear construction time. More precisely, we compute a perfect hashing function for each such node separately in O(k log² log k) = O(k log² log σ) = O(ks) time, which takes O((n/s) · s) = O(n) time in total.

SLIDE 61

Observe that any navigation query traverses an edge of type (1) at most once, hence we pay the O(log log σ) just once (so far). But what happens when we reach a light node?

Each light node contains at most s leaves. We can execute a binary search over those leaves using the suffix array trick: in each step we achieve at least one of the following:

1. halve the current interval,
2. consume one character from the pattern.

Hence in O(m + log s) time we can locate the predecessor of the pattern among all leaves, and the search actually computes the longest prefix of the pattern which is a prefix of a string corresponding to some leaf.

SLIDE 62

The total time complexity of a query is O(m + log log σ + log s) = O(m + log log σ), and the total construction time is linear.

SLIDE 63

Questions?