Statistical Encoding of Succinct Data Structures

Rodrigo González and Gonzalo Navarro
Department of Computer Science, Universidad de Chile
Combinatorial Pattern Matching, 2006

Outline

1 Background
  Motivation
  The k-th order empirical entropy
  Statistical encoding
2 Entropy-bound succinct data structure
  Idea
  Data structures
  Decoding Algorithm
  Space requirement
  Supporting appends
3 Application to full-text indexing
  Succinct full-text self-indexes
  The Burrows-Wheeler Transform
  The wavelet tree

Motivation: Previous work

In recent work, Sadakane and Grossi [SODA'06] introduced a scheme to represent any sequence S using $nH_k(S) + O\left(\frac{n}{\log_\sigma n}\,((k+1)\log\sigma + \log\log n)\right)$ bits of space.

The representation permits us to extract any substring of size $\Theta(\log_\sigma n)$ in constant time, and thus it completely replaces S under the RAM model.

This permits converting any succinct structure using $o(n\log\sigma)$ bits of space on top of S into a compressed structure using $nH_k(S) + o(n\log\sigma)$ bits overall, for any $k = o(\log_\sigma n)$.

Motivation: Our work

We extend previous work by obtaining slightly better space complexity and the same time complexity, using a simpler scheme based on statistical encoding.

We show that the scheme supports appending symbols in constant amortized time.

We prove some results on the applicability of the scheme to full-text self-indexing.

Example: a simple rank structure

Definition: $\mathrm{rank}_1(S, i)$ = number of ones in $S[1 \ldots i]$.

Example: $\mathrm{rank}_1(S, 14) = 5 + 1 + 1$.

The k-th order empirical entropy

Definition

The empirical entropy is defined for any string S and can be used to measure the performance of compression algorithms without any assumption on the input.

The k-th order empirical entropy captures the dependence of symbols upon their context. For k ≥ 0, $nH_k(S)$ provides a lower bound to the output of any compressor that considers a context of size k to encode every symbol of S.

$$H_k(S) = \frac{1}{n} \sum_{w \in \Sigma^k} |w_S| \, H_0(w_S). \tag{1}$$
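Equation (1) can be evaluated directly on a small string. The sketch below assumes the standard reading of the notation, which the extracted slide text does not spell out: $w_S$ is the string formed by the symbols that follow each occurrence of the context $w$ in S, and $H_0$ is the zero-order empirical entropy in bits per symbol. The function names `h0` and `hk` are hypothetical, and S is taken to be a Python `str`.

```python
from collections import Counter, defaultdict
from math import log2


def h0(s):
    """Zero-order empirical entropy of s, in bits per symbol."""
    n = len(s)
    if n == 0:
        return 0.0
    return sum((c / n) * log2(n / c) for c in Counter(s).values())


def hk(s, k):
    """k-th order empirical entropy of the string s, following Eq. (1)."""
    n = len(s)
    if n == 0:
        return 0.0
    if k == 0:
        return h0(s)
    # followers[w] collects w_S: the symbols that follow each occurrence of context w.
    followers = defaultdict(list)
    for i in range(k, n):
        followers[s[i - k:i]].append(s[i])
    # Contexts that never occur in s contribute nothing to the sum.
    return sum(len(ws) * h0(ws) for ws in followers.values()) / n


print(h0("aabb"))          # two equiprobable symbols
print(hk("abababab", 1))   # each symbol is fully determined by its predecessor
```

The second call illustrates why higher-order entropy can be far below $H_0$: in "abababab" every context has a single possible follower, so each $H_0(w_S)$ term is zero.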
