Statistical Encoding of Succinct Data Structures


  1. Statistical Encoding of Succinct Data Structures. Rodrigo González and Gonzalo Navarro, Department of Computer Science, Universidad de Chile. Combinatorial Pattern Matching, 2006.

  2. Outline
     1. Background: Motivation; The k-th order empirical entropy; Statistical encoding
     2. Entropy-bound succinct data structure: Idea; Data structures; Decoding algorithm; Space requirement; Supporting appends
     3. Application to full-text indexing: Succinct full-text self-indexes; The Burrows-Wheeler Transform; The wavelet tree


  6. Motivation: previous work. Sadakane and Grossi [SODA'06] introduced a scheme to represent any sequence $S$ of $n$ symbols over an alphabet of size $\sigma$ using $nH_k(S) + O\left(\frac{n}{\log_\sigma n}\left((k+1)\log\sigma + \log\log n\right)\right)$ bits of space. The representation permits extracting any substring of $\Theta(\log_\sigma n)$ symbols in constant time, and thus it completely replaces $S$ under the RAM model. Consequently, any succinct structure that uses $o(n\log\sigma)$ bits on top of $S$ can be converted into a compressed structure using $nH_k(S) + o(n\log\sigma)$ bits overall, for any $k = o(\log_\sigma n)$.
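The step from the redundancy term to $o(n\log\sigma)$ can be checked by bounding its two parts separately. The following is a short worked derivation, not taken from the slides; it uses $\log_\sigma n = \log n / \log\sigma$ and assumes that $\log_\sigma n$ grows with $n$, so that $k = o(\log_\sigma n)$ also gives $k + 1 = o(\log_\sigma n)$.

```latex
\begin{align*}
\frac{n}{\log_\sigma n}\,(k+1)\log\sigma
  &= n\log\sigma \cdot \frac{k+1}{\log_\sigma n}
   = o(n\log\sigma)
  && \text{since } k+1 = o(\log_\sigma n), \\
\frac{n}{\log_\sigma n}\,\log\log n
  &= n\log\sigma \cdot \frac{\log\log n}{\log n}
   = o(n\log\sigma)
  && \text{since } \log_\sigma n = \frac{\log n}{\log\sigma}.
\end{align*}
```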


  9. Motivation: our work. We extend the previous work by obtaining slightly better space complexity with the same time complexity, using a simpler scheme based on statistical encoding. We show that the scheme supports appending symbols in constant amortized time. We prove some results on the applicability of the scheme to full-text self-indexing.


  12. Example: a simple rank structure. Definition: $\mathrm{rank}_1(S, i)$ = number of ones in $S[1 \ldots i]$.


  18. Example: a simple rank structure. [Figure: bitmap with its rank directory; not recovered.] $\mathrm{rank}_1(S, 14) = 5 + 1 + 1 = 7$.
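A sum of three terms like the one above is what a two-level rank directory produces: a count stored per superblock, a count stored per block relative to its superblock, and the ones counted inside the final block. The sketch below illustrates that idea only. The bitmap, the block size of 4, and the superblock size of 8 are hypothetical choices (the slide's figure is not preserved), and the in-block count is a plain scan where a real succinct structure would use a lookup table or popcount.

```python
class RankDirectory:
    """Two-level rank directory over a list of bits (illustrative sketch)."""

    def __init__(self, bits, block=4, blocks_per_super=2):
        self.bits = list(bits)
        self.block = block                       # hypothetical block size
        self.super = block * blocks_per_super    # hypothetical superblock size
        self.super_counts = []  # ones before each superblock (absolute)
        self.block_counts = []  # ones from superblock start to each block
        total = 0
        rel = 0
        for start in range(0, len(self.bits), block):
            if start % self.super == 0:
                self.super_counts.append(total)
                rel = 0
            self.block_counts.append(rel)
            ones = sum(self.bits[start:start + block])
            total += ones
            rel += ones

    def rank1(self, i):
        """Number of ones in S[1..i], with 1-based inclusive positions."""
        if i <= 0:
            return 0
        i = min(i, len(self.bits))
        b = (i - 1) // self.block   # block containing position i
        s = (i - 1) // self.super   # superblock containing position i
        in_block = sum(self.bits[b * self.block:i])  # scan inside the block
        return self.super_counts[s] + self.block_counts[b] + in_block


# Hypothetical bitmap chosen so that rank1(14) decomposes as 5 + 1 + 1.
bits = [1, 0, 1, 1, 0, 1, 1, 0,  0, 1, 0, 0,  1, 0, 1, 1]
directory = RankDirectory(bits)
print(directory.rank1(14))  # prints 7
```

With these sizes, position 14 lies in the second superblock (5 ones precede it), after one full block of that superblock (1 one), and the scan inside its own block finds 1 more.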


  20. The k-th order empirical entropy. Definition: The empirical entropy is defined for any string $S$ and can be used to measure the performance of compression algorithms without any assumption on the input. The k-th order empirical entropy captures the dependence of symbols upon their context. For $k \ge 0$, $nH_k(S)$ provides a lower bound to the output of any compressor that considers a context of size $k$ to encode every symbol of $S$.

      $$H_k(S) = \frac{1}{n} \sum_{w \in \Sigma^k} |w_S| \, H_0(w_S). \qquad (1)$$
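Equation (1) can be evaluated directly on small strings. The sketch below is a minimal illustration and makes one assumption the slide does not spell out: $w_S$ is taken to be the sequence of symbols that follow each occurrence of the context $w$ in $S$. The function names `h0` and `hk` are introduced here for illustration.

```python
from collections import Counter, defaultdict
from math import log2


def h0(seq):
    """Zero-order empirical entropy of seq, in bits per symbol."""
    n = len(seq)
    if n == 0:
        return 0.0
    return sum((c / n) * log2(n / c) for c in Counter(seq).values())


def hk(s, k):
    """k-th order empirical entropy of s, following Equation (1).

    Assumption: w_S collects the symbols that follow each occurrence
    of the length-k context w in s.
    """
    n = len(s)
    if n == 0:
        return 0.0
    followers = defaultdict(list)
    for i in range(n - k):
        followers[s[i:i + k]].append(s[i + k])
    # Contexts that never occur contribute nothing to the sum.
    return sum(len(w_s) * h0(w_s) for w_s in followers.values()) / n


print(hk("abababab", 0))  # prints 1.0
print(hk("abababab", 1))  # prints 0.0
```

For the string `abababab`, each symbol is unpredictable without context (one bit per symbol at order 0) but fully determined by its predecessor, so the order-1 entropy drops to zero.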
