Analysis of Lempel-Ziv 78 for Markov sources
SLIDE 1

Analysis of Lempel-Ziv 78 for Markov sources

Ph. Jacquet, W. Szpankowski, Inria – Purdue University

The material is made available under the CC BY-NC-ND 4.0 license: https://creativecommons.org/licenses/by-nc-nd/4.0/legalcode

SLIDE 2

Lempel-Ziv algorithm

  • Among the ten most heavily used algorithms in daily computing
  – Unix compress, GIF, PDF, etc.

SLIDE 3

Huge literature in the IT and algorithms communities

  • D. Aldous and P. Shields, A Diffusion Limit for a Class of Random-Growing Binary Trees, Probab. Th. Rel. Fields, 1988.
  • N. Merhav, Universal Coding with Minimum Probability of Codeword Length Overflow, IEEE Trans. Information Theory, 1991.
  • P. Jacquet and W. Szpankowski, Asymptotic Behavior of the Lempel-Ziv Parsing Scheme and Digital Search Trees, Theoretical Computer Science, 1995.
  • W. Schachinger, On the Variance of a Class of Inductive Valuations of Data Structures for Digital Search, Theoretical Computer Science, 1995.
  • N. Merhav and J. Ziv, On the Amount of Statistical Side Information Required for Lossy Data Compression, IEEE Trans. Information Theory, 1997.
  • R. Neininger and L. Rüschendorf, A General Limit Theorem for Recursive Algorithms and Combinatorial Structures, The Annals of Applied Probability, 2004.
  • J. Fayolle and M. D. Ward, Analysis of the Average Depth in a Suffix Tree under a Markov Model, DMTCS, 2005.
  • K. Leckey, R. Neininger and W. Szpankowski, Towards More Realistic Probabilistic Models for Data Structures: The External Path Length in Tries under the Markov Model, SODA, 2013.

SLIDE 4

LZ compression process

  • A text is fragmented into phrases (not grammatical ones).
  • Each phrase is replaced by a short code (#+symbol): the index of a previous phrase plus one extra symbol.

SLIDE 5

Phrase breaking process

  • The next phrase is the longest copy of a previously seen phrase, plus one extra symbol.
  • The code of the new phrase is the index of the copied phrase plus the extra symbol (#+symbol).
  • Final code sequence (example): 0+a 1+b 1+a 2+a
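To make the breaking rule concrete, here is a minimal Python sketch of LZ78 parsing (our illustration; the function name and sample text are not from the slides). Each emitted code is the index of the longest previously seen phrase plus one extra symbol, matching the 0+a 1+b 1+a 2+a example above.

```python
def lz78_parse(text: str):
    """Return the (index, symbol) codes of an LZ78 parse of `text`."""
    dictionary = {"": 0}          # phrase -> index; index 0 is the empty phrase
    codes, phrase = [], ""
    for symbol in text:
        if phrase + symbol in dictionary:
            phrase += symbol      # keep extending the current match
        else:
            codes.append((dictionary[phrase], symbol))
            dictionary[phrase + symbol] = len(dictionary)
            phrase = ""
    if phrase:                    # trailing run that repeats a known phrase
        codes.append((dictionary[phrase[:-1]], phrase[-1]))
    return codes

print(lz78_parse("aabaaaba"))     # [(0, 'a'), (1, 'b'), (1, 'a'), (2, 'a')]
```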

SLIDE 6

Breaking process via Digital Search Trees

  • Build the DST of the current phrases.
  • Use the path traced by the remaining text to find the next phrase.
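The same parse driven by an explicit digital search tree, as this slide describes, might look like the following sketch (ours; a binary alphabet is assumed): the remaining text walks down the tree until it falls off, and a node for the new phrase is inserted at that point.

```python
class DSTNode:
    def __init__(self):
        self.children = {}                    # symbol -> DSTNode

def next_phrase(root: DSTNode, text: str, start: int) -> int:
    """Walk `text` down the DST from `start` until falling off the tree;
    insert a node there and return the end position of the new phrase."""
    node, i = root, start
    while i < len(text) and text[i] in node.children:
        node = node.children[text[i]]         # longest previously seen phrase
        i += 1
    if i < len(text):
        node.children[text[i]] = DSTNode()    # new phrase = old phrase + 1 symbol
        i += 1
    return i

root, pos, phrases = DSTNode(), 0, []
text = "aabaaaba"
while pos < len(text):
    end = next_phrase(root, text, pos)
    phrases.append(text[pos:end])
    pos = end
print(phrases)                                # ['a', 'ab', 'aa', 'aba']
```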

SLIDE 7

Two models

  • The DST “m” model:
  – m independent infinite strings inserted in a DST
  – $L_m$: the path length
  • The LZ “n” model:
  – A text of length n broken into LZ phrases
  – $M_n$: the number of phrases

SLIDE 8

Equivalence of DST m and LZ n models

  • When the text source is memoryless, the two models are equivalent.
  – Backward independence: the current DST and the rest of the text are independent.

Jacquet, P., & Szpankowski, W. (1995). Asymptotic behavior of the Lempel-Ziv parsing scheme and digital search trees. Theoretical Computer Science, 144(1-2), 161-197.

SLIDE 9

The m model with a memoryless source

  • The infinite text comes from a memoryless source.
  • Tractable because the phrases are independent.
  – P. Jacquet, W. Szpankowski, Asymptotic Behavior of the Lempel-Ziv Parsing Scheme and Digital Search Trees, Theoretical Computer Science, 1995.
  • For m phrases, the length $L_m$ of covered text:
  – tends to a normal distribution as $m \to \infty$
  – Mean: $E[L_m] = \ell(m) = \frac{m}{h}\big(\log m + \gamma(m)\big)$ with $\gamma(m) = O(1)$
  – Variance: $\mathrm{Var}[L_m] = m\, w(m)$ with $w(m) = O(\log m)$

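A quick Monte Carlo sanity check of the mean (our sketch; the correction terms γ(m) and w(m) are ignored, so only the leading order is compared): insert m independent Bernoulli(p) strings into a DST, with the first string stored at the root, and compare the path length with (m/h) log m.

```python
import math, random

def dst_path_length(m: int, p: float) -> int:
    """Insert m independent Bernoulli(p) strings into a digital search tree
    (first string at the root); return the internal path length L_m."""
    root, total = {}, 0
    for _ in range(1, m):            # strings 2..m walk down from the root
        node, depth = root, 0
        while True:
            s = 'a' if random.random() < p else 'b'
            depth += 1
            if s not in node:
                node[s] = {}         # empty slot found: store the string here
                break
            node = node[s]
        total += depth
    return total

p, m = 0.3, 5000
h = -p * math.log(p) - (1 - p) * math.log(1 - p)    # entropy in nats
avg = sum(dst_path_length(m, p) for _ in range(20)) / 20
print(avg, m / h * math.log(m))                     # same leading order
```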
SLIDE 10

The probability generating function and the nonlinear differential equation

  • Let $Q(z,u) = \sum_{m,k} P(L_m = k)\, u^k \frac{z^m}{m!}$.
  • It satisfies $\frac{\partial}{\partial z} Q(z,u) = Q(p u z,\, u)\; Q(q u z,\, u)$, with $p, q$ the symbol probabilities.
  • Central limit theorem: $P\left( \frac{L_m - E[L_m]}{\sqrt{\mathrm{Var}(L_m)}} \in [x,\, x + dx) \right) \to \frac{1}{\sqrt{2\pi}}\, e^{-x^2/2}\, dx$
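Extracting means from the functional equation: comparing coefficients of $z^m/m!$ gives $Q_{m+1}(u) = u^m \sum_j \binom{m}{j} p^j q^{m-j}\, Q_j(u)\, Q_{m-j}(u)$ for $Q_m(u) = E[u^{L_m}]$, and differentiating at $u = 1$ yields an exact recurrence for $\mu_m = E[L_m]$. A short sketch (ours) computes it and compares with the leading asymptotic:

```python
import math
from math import comb

p = 0.3
q = 1 - p
h = -p * math.log(p) - q * math.log(q)    # entropy in nats

# mu_{m+1} = m + sum_j C(m,j) p^j q^(m-j) (mu_j + mu_{m-j}),  mu_0 = mu_1 = 0
M = 400
mu = [0.0] * (M + 1)
for m in range(M):
    mu[m + 1] = m + sum(comb(m, j) * p**j * q**(m - j) * (mu[j] + mu[m - j])
                        for j in range(m + 1))

print(mu[M], M / h * math.log(M))         # exact mean vs (m/h) log m
```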

SLIDE 11

From phrase to text compression

  • Number of phrases $M_n$
  – Renewal duality: $P(M_n > m) = P(L_m < n)$
  • $M_n$ is asymptotically normal:
  – Mean: $E[M_n] = \ell^{-1}(n) + O(n^{\varepsilon})$ for any $\varepsilon > 1/2$, hence $E[M_n] \sim \frac{n h}{\log n}$
  – Variance: $\mathrm{Var}[M_n] \sim \frac{\ell^{-1}(n)\, w(\ell^{-1}(n))}{\big(\ell'(\ell^{-1}(n))\big)^2} = O\!\left(\frac{n}{\log^2 n}\right)$
  • Compression rate: $C_n = (\log n + \log A)\, \frac{M_n}{n}$, with $A$ the alphabet size
  • Average redundancy: $E[C_n] - h \sim h\, \frac{\log A - \gamma(\ell^{-1}(n))}{\log n} = O\!\left(\frac{1}{\log n}\right)$
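An empirical illustration of the compression rate (our sketch; the parameters are arbitrary, and we charge each pointer its actual cost $\log M_n$, asymptotically equivalent to $\log n$): parse a long memoryless text and form $C_n = M_n(\log M_n + \log A)/n$, which should land slightly above the entropy h, with an excess of order $1/\log n$.

```python
import math, random

def lz78_num_phrases(text: str) -> int:
    """Count the phrases in an LZ78 parse of `text`."""
    dictionary, phrase, count = {"": 0}, "", 0
    for s in text:
        if phrase + s in dictionary:
            phrase += s
        else:
            dictionary[phrase + s] = len(dictionary)
            count += 1
            phrase = ""
    return count + (1 if phrase else 0)

p, n, A = 0.3, 200_000, 2
text = "".join('a' if random.random() < p else 'b' for _ in range(n))
M = lz78_num_phrases(text)
C = M * (math.log(M) + math.log(A)) / n              # nats per source symbol
h = -p * math.log(p) - (1 - p) * math.log(1 - p)
print(C, h)                                          # C slightly above h
```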

SLIDE 12

DST m model and LZ n model no longer equivalent for Markovian text

  • Markovian generation induces dependencies both time-forward and time-backward.

  Example: bbaababbaababaaababbaababababbabbaababbbbaabaababbaaaabbabbbabbbbba (correlation)

SLIDE 13

Our results on LZ compression performance for a Markovian text

  • The number of phrases: for all $\varepsilon > 1/2$, $E[M_n] = \ell^{-1}(n) + O(n^{\varepsilon})$ with $\ell(m) \sim \frac{m \log m}{h}$, and $\mathrm{Var}[M_n] = O(n^{2\varepsilon})$.
  • The distribution of the first symbol of a phrase is determined and does NOT converge to the stationary distribution of the Markov chain.
  • The redundancy satisfies $E[C_n] - h = O\!\left(\frac{1}{\log n}\right)$.

SLIDE 14

The main difficulty

  • The DST m model and the LZ n model are no longer equivalent.
  • We need another way to connect the two models.
SLIDE 15

How far can we go with the m model

  • m Markovian sources
  • Classic Markovian source:
  – One must track the initial symbol
  – The path length is asymptotically normal
  • Jacquet, P., Szpankowski, W., & Tang, J. (2001). Average profile of the Lempel-Ziv parsing scheme for a Markovian source. Algorithmica, 31(3), 318-360.

  $P^a_{m,n} = P(L_m = n \mid \text{all strings start with } a) = P(L^a_m = n)$

  $\frac{\partial}{\partial z} Q^a(z,u) = Q^a(p_{aa} u z,\, u)\; Q^b(p_{ab} u z,\, u)$, with $p_{cd}$ the transition probabilities

  $E[L^a_m] = \frac{m}{h}\big(\log m + \gamma_a(m)\big), \qquad \mathrm{Var}[L^a_m] = m\, w_a(m) = O(m \log m)$
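A simulation sketch of this Markov m model (ours; the transition matrix is an arbitrary example): insert m strings, all starting with 'a', into a DST and compare the path length with (m/h) log m, where h is now the entropy rate of the chain.

```python
import math, random

P = {'a': {'a': 0.7, 'b': 0.3}, 'b': {'a': 0.4, 'b': 0.6}}   # example chain

def dst_path_length_markov(m: int, first: str) -> int:
    """Insert m Markov strings, all starting with `first`, into a DST
    (first string at the root); return the path length L_m^first."""
    root, total = {}, 0
    for _ in range(1, m):
        node, depth, s = root, 0, first
        while True:
            depth += 1
            if s not in node:
                node[s] = {}
                break
            node = node[s]
            s = 'a' if random.random() < P[s]['a'] else 'b'
        total += depth
    return total

pi_a = P['b']['a'] / (P['a']['b'] + P['b']['a'])     # stationary P(a)
h = -sum((pi_a if x == 'a' else 1 - pi_a) * P[x][y] * math.log(P[x][y])
         for x in 'ab' for y in 'ab')                # entropy rate in nats
m = 20_000
print(dst_path_length_markov(m, 'a'), m / h * math.log(m))   # same leading order
```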

SLIDE 16

m model basic results

  • Asymptotically indifferent to the first symbol:
  – $\gamma_a(m) = \gamma(m) + O(m^{-\delta})$
  – $\gamma(m) = \bar{\gamma} + P_1(\log m)$, with $P_1(\cdot)$ periodic, when the transition matrix is rational; otherwise $\gamma(m) = \bar{\gamma}$
SLIDE 17

Extended m model with tail symbol

  • The tail symbol is the next symbol after the insertion point in the DST.
  – It would be the first symbol of the next phrase in the n model.
  – $T_m$: the number of tail symbols equal to “a”

  $P^c_{m,k,n} = P\big(T_m = k,\ L_m = n \mid \text{all strings start with } c\big), \qquad c \in \{a, b\}$

  $Q^c(z,u,v) = \sum_{m,k,n} P^c_{m,k,n}\, u^n v^k \frac{z^m}{m!}$

  $\frac{\partial}{\partial z} Q^c(z,u,v) = \big(p_{ca} v + p_{cb}\big)\, Q^a(p_{ca} u z,\, u, v)\; Q^b(p_{cb} u z,\, u, v)$

SLIDE 18

Extended m model analytical results

  • Refining the techniques of the previous m models (limited to a binary alphabet):
  – $(L^c_m, T^c_m)$ is asymptotically (jointly) normal
  – $E[T^c_m] = m\, \tau_c(m)$ with $\tau_c(m) = \tau(m) + O(m^{-\delta})$
  – $\tau(m) = \bar{\tau} + P_1(\log m)$, with $P_1(\cdot)$ periodic, when the transition matrix is rational; otherwise $\tau(m) = \bar{\tau}$
  • Notice: the asymptotic tail-symbol distribution is NOT the Markov stationary distribution.
  – $\mathrm{Cov}\big(L^c_m, T^c_m\big) = O(m \log m)$

SLIDE 19

The remaining very hard nut to crack

  • Coming back to the n model.
  – Remember: the DST m model and the LZ n model are NOT equivalent for Markov sources.

SLIDE 20

What would happen if the m and n models were equivalent for Markov?

  • LZ n model: let $\mathcal{Q}_{m,n} = P(\text{the } m \text{ first phrases have total length } n)$.
  • With memoryless sources we have $\mathcal{Q}_{m,n} = P_{m,n}$, because the m and n models are equivalent.
  • For a Markov source, a convolution over the initial and tail symbols?

  $\mathcal{Q}_{m,n} = \sum_{m_1,\,k,\,n_1} P^a_{m_1,k,n_1}\; P^b_{m-m_1,\ m_1-k,\ n-n_1}$

  • But this is wrong!

SLIDE 21

What fails in the transition from DST to LZ?

  • Carving phrases in the text
  • Arranging the phrases in a DST

  Example: b b b a b a b a a b b b a a a a b b b, with tail-symbol sequence $\sigma = (a, b, a, b, b, b)$, splitting into $\sigma^a = (a, b, b)$ and $\sigma^b = (a, b, b)$.

SLIDE 22

Enumerating permutations in the n and m models

  • Let $\sigma$ be a sequence of m symbols.
  – $\sigma$ is the sequence of tail symbols in the text (n model).
  – $\sigma^c$ is the tail-symbol sequence in the DST c-subtree (m model).
  – We have $\mathcal{Q}_{m,n} = \sum_{|\sigma| = m} \mathcal{Q}_{\sigma,n}$ and $P^a_{m,k,n} = \sum_{|\sigma| = m,\ |\sigma|_a = k} P^a_{\sigma,n}$, where

  $\mathcal{Q}_{\sigma,n} = P(\text{the } m \text{ first tail symbols follow } \sigma \text{ and cover length } n)$

  $P^c_{\sigma,n} = P(\text{the DST tail symbols follow } \sigma \text{ and the path length is } n \mid \text{all strings start with } c)$

  – But we will see that the m–n convolution does not hold; in other words:

  $\mathcal{Q}_{m,n} \ne \sum_{\sigma^a,\,\sigma^b:\ |\sigma^a| + |\sigma^b| = m}\ \sum_{n_1} P^a_{\sigma^a,\,n_1}\; P^b_{\sigma^b,\ n-n_1} = \sum_{m_1,\,k,\,n_1} P^a_{m_1,k,n_1}\; P^b_{m-m_1,\ m_1-k,\ n-n_1}$

SLIDE 23

The lost permutations

  • The following case is not feasible

b a a b a a b b a a b a

SLIDE 24

From the DST model to the LZ model: an upper-bound convolution

  • Including the lost permutations (restricted to cyclic permutations):

  $\mathcal{Q}_{m,n} \le \sum_{\sigma^a,\,\sigma^b:\ |\sigma^a|+|\sigma^b| = m}\ \sum_{n_1} P^a_{\sigma^a,\,n_1}\; P^b_{\sigma^b,\ n-n_1} = \sum_{m_1,\,k,\,n_1} P^a_{m_1,k,n_1}\; P^b_{m-m_1,\ m_1-k,\ n-n_1}$

  • Extending to all permutations:

  $\mathcal{Q}_{m,n} \le \sum_{m_1,\,k,\,n_1} P^a_{m_1,k,n_1} \Big( P^b_{m-m_1,\ m_1-k,\ n-n_1} + P^b_{m-m_1,\ m_1-k-1,\ n-n_1} + P^b_{m-m_1,\ m_1-k+1,\ n-n_1} \Big)$

  – To simplify, just remember:

  $\mathcal{Q}_{m,n} \le 3 \sum_{m_1,\,k,\,n_1} P^a_{m_1,k,n_1}\; P^b_{m-m_1,\ m_1-k,\ n-n_1}$

SLIDE 25

From DST to LZ: a useful inequality (handwaving)

  • If the $P^a_{m,k,n}$ were exactly Gaussian, then $\sum_{m_1,k,n_1} P^a_{m_1,k,n_1}\, P^b_{m-m_1,\,m_1-k,\,n-n_1}$ would be a Gaussian convolution; thus, for some $C > 0$:

  $\mathcal{Q}_{m,n} \le \frac{3}{\sqrt{2\pi C\, m \log m}}\, \exp\left( - \frac{\Big( n - \frac{m}{h}\log m - \frac{m}{h}\gamma(m) - m\big(\frac{\bar{\tau}}{h} - 1\big) \Big)^2}{2 C\, m \log m} \right)$

  – The distribution would be sub-Gaussian, and the claimed results would hold.

SLIDE 26

From DST to LZ: a useful inequality (exact analysis)

  • For all $\delta > 1/2$ there exist $B, C > 0$ such that (in the irrational case):

  $\mathcal{Q}_{m,n} \le B\, m^{1+\delta} \exp\left( - \frac{\Big| n - \frac{m}{h}\log m - \frac{m}{h}\gamma(m) - m\big(\frac{\bar{\tau}}{h} - 1\big) \Big|}{C\, m^{\delta}} \right)$

  – The bound is two-sided exponential.
  – Here $h(y) = -y \log y - (1-y)\log(1-y)$.

SLIDE 27

Simulation m model versus n model
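A sketch of such a comparison (ours, using an arbitrary binary Markov chain): the n model parses one Markov text of length n into M_n phrases; the m model then inserts M_n independent Markov strings into a fresh DST and reports the total covered length, which fluctuates around n but, for a Markov source, does not follow the same distribution.

```python
import random

P = {'a': {'a': 0.7, 'b': 0.3}, 'b': {'a': 0.4, 'b': 0.6}}   # example chain

def markov_text(n: int, first: str = 'a') -> str:
    out, s = [], first
    for _ in range(n):
        out.append(s)
        s = 'a' if random.random() < P[s]['a'] else 'b'
    return "".join(out)

def n_model_phrases(text: str) -> int:
    """LZ n model: number of phrases M_n in the LZ78 parse of `text`."""
    root, pos, count = {}, 0, 0
    while pos < len(text):
        node = root
        while pos < len(text) and text[pos] in node:
            node = node[text[pos]]
            pos += 1
        if pos < len(text):
            node[text[pos]] = {}
            pos += 1
        count += 1
    return count

def m_model_covered_length(m: int, first: str = 'a') -> int:
    """DST m model: total text length covered by m independent strings."""
    root, total = {}, 0
    for _ in range(m):
        node, s, consumed = root, first, 0
        while True:
            consumed += 1
            if s not in node:
                node[s] = {}
                break
            node = node[s]
            s = 'a' if random.random() < P[s]['a'] else 'b'
        total += consumed
    return total

n = 100_000
M = n_model_phrases(markov_text(n))
print(M, m_model_covered_length(M), n)    # m-model length fluctuates around n
```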

SLIDE 28

Further work

  • The results will hold for finite alphabets larger than binary.
  – But this will produce a notational storm.
  – Infinite alphabet?
  • The results will hold for Markov sources with finite memory (order higher than 1).
  – But tail words may then overlap several phrases.
  – Infinite memory?