Smallest grammar by recompression
Artur Jeż
Max Planck Institute for Informatics
17.06.2013

Grammar-based compression
Represent $w$ as a CFG generating it.
Advantages
– it is usually small (at most quadratic vs. LZ)
– compression is fast
– the compression can be exponential on good data
– extracts hierarchical structure
– it is easy to work on
– related to LZW and LZ
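To illustrate the "exponential on good data" point, here is a minimal sketch with a hypothetical helper `doubling_slp` (not from the talk). It builds rules $X_0 \to a$, $X_i \to X_{i-1} X_{i-1}$ and computes the length of the derived word without expanding it. The encoding of rules as a dictionary is an assumption made for this example only.

```python
def doubling_slp(k):
    """Rules X_0 -> a and X_i -> X_{i-1} X_{i-1} for i = 1..k (illustrative encoding)."""
    rules = {0: "a"}
    for i in range(1, k + 1):
        rules[i] = (i - 1, i - 1)
    return rules


def expanded_length(rules):
    """Length of the word derived from each nonterminal, computed bottom-up."""
    length = {}
    for i in sorted(rules):
        rhs = rules[i]
        length[i] = 1 if isinstance(rhs, str) else length[rhs[0]] + length[rhs[1]]
    return length


rules = doubling_slp(20)
lengths = expanded_length(rules)
print(len(rules), lengths[20])  # prints "21 1048576": 21 rules derive a word of length 2**20
```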

Smallest grammar
Problem
Given $w$, return the smallest CFG $G_w$ such that $L(G_w) = \{w\}$.
With a constant-factor ($O(1)$) increase in size, this is an SLP.
Definition (SLP: Straight Line Programme)
CFG with
– ordered nonterminals $X_1, X_2, \ldots$
– Chomsky normal form
– for $X_i \to X_j X_k$ we have $j, k < i$
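The SLP definition can be made concrete with a small sketch. The encoding below is an assumption for illustration: rules are stored in a list, indices start at 0 rather than 1, and each right-hand side is either a single terminal letter or a pair of earlier indices. The names `is_slp` and `expand` are hypothetical.

```python
def is_slp(rules):
    """Check the ordering condition: X_i -> X_j X_k requires j, k < i."""
    for i, rhs in enumerate(rules):
        if isinstance(rhs, str):
            if len(rhs) != 1:          # a terminal rule derives exactly one letter
                return False
        else:
            j, k = rhs
            if not (0 <= j < i and 0 <= k < i):
                return False
    return True


def expand(rules):
    """Word derived from the last nonterminal, built bottom-up in rule order."""
    words = []
    for rhs in rules:
        if isinstance(rhs, str):
            words.append(rhs)
        else:
            words.append(words[rhs[0]] + words[rhs[1]])
    return words[-1]


rules = ["a", "b", (0, 1), (2, 2), (3, 0)]   # X_2 -> ab, X_3 -> abab, X_4 -> ababa
print(is_slp(rules), expand(rules))          # prints "True ababa"
```

Because every rule refers only to earlier nonterminals, a single left-to-right pass suffices and no recursion is needed.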

What is known
Best approximation ratio $O(\log(n/g))$, where $g$ is the size of the optimal grammar.
Rytter
– represent $w$ as LZ, size $\ell \le g$
– translation of LZ into SLP, size $O(\ell \log(n/\ell)) \le O(g \log(n/g))$
– the intermediate grammar is balanced (AVL-type condition)
Charikar et al.
– similar to Rytter
– different balance criterion (length of word)
Sakamoto
– local replacement rules (plus a global partition): pairs and blocks
– analysis vs. LZ
Linear time.

This talk
Very simple linear-time algorithm, $O(\log(n/g))$ approximation.
analysis in the recompression framework, vs. SLP
– very robust
– good: easier to show better approximation?
– bad: might in fact be larger
not balanced
– good: easier to show approximation?
– bad: worse for further processing
height $O(\log n)$, when $a^\ell$ has height 1
Algorithm similar to Sakamoto, different analysis.

Example
Input: a a a b a b c a b a b b a b c b a
Block compression of $a$ ($a_3 \to a^3$): $a_3$ b a b c a b a b b a b c b a
Block compression of $b$ ($b_2 \to b^2$): $a_3$ b a b c a b a $b_2$ a b c b a
Pair compression of $ab$ ($d \to ab$): $a_3$ b d c d a $b_2$ d c b a
Pair compression of $ba$ ($e \to ba$): $a_3$ b d c d a $b_2$ d c e
Rules: $a_3 \to a^3$, $b_2 \to b^2$, $d \to ab$, $e \to ba$
Intuition
– Phases: in each phase, compress only the pairs and blocks present at the beginning of the phase.
– Treat nonterminals as letters.
– To speed up, we perform several pair compressions simultaneously (partition $\Sigma$ into $\Sigma_\ell$ and $\Sigma_r$, compress pairs from $\Sigma_\ell \Sigma_r$).
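The example can be replayed mechanically. The sketch below is an illustration, not the talk's implementation. The helpers `compress_blocks` and `compress_pair` are hypothetical names, symbols are plain strings, and the block nonterminal for $a^3$ is simply named `a3`.

```python
from itertools import groupby


def compress_blocks(text, letter, rules):
    """Replace every maximal block letter^n (n > 1) by one fresh symbol."""
    out = []
    for sym, run in groupby(text):
        n = len(list(run))
        if sym == letter and n > 1:
            name = f"{letter}{n}"          # e.g. "a3" stands for a_3
            rules[name] = [letter] * n
            out.append(name)
        else:
            out.extend([sym] * n)
    return out


def compress_pair(text, left, right, name, rules):
    """Replace every occurrence of the pair (left, right) by the symbol `name`."""
    out, i = [], 0
    while i < len(text):
        if i + 1 < len(text) and text[i] == left and text[i + 1] == right:
            rules[name] = [left, right]
            out.append(name)
            i += 2
        else:
            out.append(text[i])
            i += 1
    return out


rules = {}
t = list("aaababcababbabcba")
t = compress_blocks(t, "a", rules)
t = compress_blocks(t, "b", rules)
t = compress_pair(t, "a", "b", "d", rules)
t = compress_pair(t, "b", "a", "e", rules)
print(" ".join(t))  # prints "a3 b d c d a b2 d c e"
```

Here the two pairs are compressed one after the other, as on the slide. The partition into $\Sigma_\ell$ and $\Sigma_r$ is what lets several pairs be handled in one pass.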

Algorithm
while $|T| > 1$ do
    $L \gets$ list of letters in $T$
    for each $a \in L$ do  ⊲ Blocks compression
        compress maximal blocks of $a$  ⊲ $O(|T|)$
    $P \gets$ list of pairs
    find partition of $\Sigma$ into $\Sigma_\ell$ and $\Sigma_r$
        ⊲ Try to maximize the occurrences from $\Sigma_\ell \Sigma_r$ in $T$.
    for $ab \in P \cap \Sigma_\ell \Sigma_r$ do  ⊲ These pairs do not overlap
        compress pair $ab$  ⊲ Pair compression
return the constructed grammar
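The loop above can be sketched in Python under several assumptions that are not from the talk. Hashing (`Counter`, `dict`) stands in for RadixSort, so the sketch is only expected linear time per phase. The partition is chosen by a simple greedy rule that is one possible derandomisation. A block $a^n$ gets a rule with $n$ symbols on the right-hand side rather than a cheaper representation. The input must be non-empty, and all function names are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import count, groupby


def choose_partition(text):
    """Greedy split of the letters into (left, right) covering >= 1/4 of the pairs."""
    pairs = Counter(zip(text, text[1:]))      # adjacent letters differ after block compression
    weight = defaultdict(Counter)             # undirected pair counts between letters
    for (a, b), c in pairs.items():
        weight[a][b] += c
        weight[b][a] += c
    left, right = set(), set()
    for a in weight:                          # put a opposite to the heavier side
        to_left = sum(c for b, c in weight[a].items() if b in left)
        to_right = sum(c for b, c in weight[a].items() if b in right)
        (right if to_left >= to_right else left).add(a)
    covered = sum(c for (a, b), c in pairs.items() if a in left and b in right)
    reverse = sum(c for (a, b), c in pairs.items() if a in right and b in left)
    return (left, right) if covered >= reverse else (right, left)


def recompress(word):
    """Return (start symbol, rules) of a grammar deriving the non-empty `word`."""
    text, rules, names, fresh = list(word), {}, {}, count()

    def nonterminal(rhs):                     # equal right-hand sides share one rule
        if rhs not in names:
            names[rhs] = ("X", next(fresh))
            rules[names[rhs]] = rhs
        return names[rhs]

    while len(text) > 1:
        blocks = []                           # block compression
        for sym, run in groupby(text):
            n = len(list(run))
            blocks.append(nonterminal((sym,) * n) if n > 1 else sym)
        text = blocks
        left, right = choose_partition(text)
        out, i = [], 0                        # pair compression; chosen pairs cannot overlap
        while i < len(text):
            if i + 1 < len(text) and text[i] in left and text[i + 1] in right:
                out.append(nonterminal((text[i], text[i + 1])))
                i += 2
            else:
                out.append(text[i])
                i += 1
        text = out
    return text[0], rules


def expand(symbol, rules):
    """Word derived from `symbol`; terminals are one-character strings."""
    if symbol not in rules:
        return symbol
    return "".join(expand(s, rules) for s in rules[symbol])


start, rules = recompress("aaababcababbabcba")
print(expand(start, rules))  # prints "aaababcababbabcba"
```

The greedy rule cuts at least half of the pair appearances between the two sides, and the better of the two orientations keeps at least half of those, so every phase with $|T| > 1$ compresses at least one pair and the loop terminates.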

Partition
1/4 appearances covered
There is a partition $\Sigma_\ell$, $\Sigma_r$ such that at least 1/4 of the appearances of pairs in $T$ are covered, i.e. come from $\Sigma_\ell \Sigma_r$.
– After block compression $aa$ does not appear.
– Random partition: in expectation 1/4 of the pairs are covered.
– Derandomise (expected value).
– We need the number of appearances of each pair $ab$: RadixSort, $O(|T|)$.
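The expected-value claim can be checked directly. As a sketch, assume that "random partition" means each letter is put into $\Sigma_\ell$ or $\Sigma_r$ independently with probability 1/2, and write $\#(ab)$ for the number of appearances of $ab$ in $T$ (notation introduced here, not on the slide):

```latex
\[
\mathbb{E}\bigl[\text{covered appearances}\bigr]
  = \sum_{a \ne b} \#(ab) \cdot \Pr[a \in \Sigma_\ell \wedge b \in \Sigma_r]
  = \sum_{a \ne b} \#(ab) \cdot \tfrac12 \cdot \tfrac12
  = \tfrac14 \sum_{a \ne b} \#(ab).
\]
```

After block compression no pair $aa$ appears, so the sum ranges over all pair appearances in $T$. Some partition must do at least as well as the expectation, which gives the 1/4 bound.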

Size reduction
Size drop
Consider the pairs of consecutive letters $ab$ in $T$. For at least 1/4 of them, one of the letters is compressed in a phase:
– if $a = b$: it is compressed
– if $a \ne b$: 1/4 of those pairs are in $\Sigma_\ell \Sigma_r$. When we consider $ab$ we replace it, unless one of its letters was already replaced.
Length drops by a constant factor.
Towards running time
It is enough to show that one round runs in $O(|T|)$.
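Why a linear bound per round suffices can be spelled out as a geometric series. As a sketch, let $T_i$ be the text at the start of round $i$, with $|T_0| = n$, and assume $|T_{i+1}| \le c\,|T_i|$ for some constant $c < 1$ and a cost of at most $d\,|T_i|$ per round. The constants $c$ and $d$ are placeholders and are not specified on the slide.

```latex
\[
\sum_{i \ge 0} d\,|T_i|
  \;\le\; d\,n \sum_{i \ge 0} c^{\,i}
  \;=\; \frac{d\,n}{1 - c}
  \;=\; O(n).
\]
```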

Running time
Partition: $O(|T|)$ time.
Block compression: by RadixSort, $O(|T|)$ time.
Pair compression: by RadixSort, $O(|T|)$ time.
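One way RadixSort can be used here is to group equal pairs so that each group can be replaced in a single pass. The sketch below assumes the letters of $T$ are integers in `range(universe)` with `universe` in $O(|T|)$. It sorts the pair occurrences with two stable bucket passes. The function names are hypothetical.

```python
def bucket_pass(items, key, universe):
    """Stable bucket sort of `items` by an integer key in range(universe)."""
    buckets = [[] for _ in range(universe)]
    for item in items:
        buckets[key(item)].append(item)
    return [item for bucket in buckets for item in bucket]


def group_pairs(text, universe):
    """Return [((a, b), positions)] with the pairs in sorted order, in O(|T| + universe) time."""
    occurrences = [(text[i], text[i + 1], i) for i in range(len(text) - 1)]
    occurrences = bucket_pass(occurrences, lambda o: o[1], universe)  # second letter first
    occurrences = bucket_pass(occurrences, lambda o: o[0], universe)  # stable pass on first letter
    groups = []
    for a, b, i in occurrences:
        if groups and groups[-1][0] == (a, b):
            groups[-1][1].append(i)
        else:
            groups.append(((a, b), [i]))
    return groups


# a b c a b a encoded as 0 1 2 0 1 0
print(group_pairs([0, 1, 2, 0, 1, 0], 3))
# prints [((0, 1), [0, 3]), ((1, 0), [4]), ((1, 2), [1]), ((2, 0), [2])]
```

The same grouping yields the number of appearances of each pair, which is what the partition step needs.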

Number of nonterminals
Representation cost
– when $c$ replaces $ab$ we add the rule $c \to ab$, representation cost 1
– when $a^{\ell_1}, a^{\ell_2}, \ldots, a^{\ell_k}$ are replaced with $a_{\ell_1}, a_{\ell_2}, \ldots, a_{\ell_k}$ ($\ell_1 < \ell_2 < \cdots < \ell_k$):
