SLIDE 1
6.975 Week 5: Universal Compression via Grammar-Based Codes
Presenter: Emin Martinian
SLIDE 2 Grammar Based Compression
- Initial data may contain complex relationships.
- Transform data to a “basis” with independent components.
- Use simple, memoryless compression on these components.
SLIDE 3
Example:
Suppose we want to compress x = c c c c a b a b c c c a b a b c c c a b.
Let A1 → a b: x = c c c c A1 A1 c c c A1 A1 c c c A1
Let A2 → c c c: x = A2 c A1 A1 A2 A1 A1 A2 A1
Let A3 → A1 A1 A2: x = A2 c A3 A3 A1
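The three substitution steps above can be replayed mechanically. This is a minimal sketch, assuming symbols are stored as strings in a Python list and each step replaces non-overlapping occurrences of a chosen pattern, scanning left to right. `substitute` is a hypothetical helper, and the patterns are picked by hand to match the example; this is not Kieffer and Yang's reduction procedure.

```python
def substitute(seq, pattern, var):
    """Replace non-overlapping occurrences of `pattern` in `seq`, scanning left to right."""
    out, i, k = [], 0, len(pattern)
    while i < len(seq):
        if seq[i:i + k] == pattern:
            out.append(var)   # matched: emit the new variable and skip the pattern
            i += k
        else:
            out.append(seq[i])
            i += 1
    return out


x = list("ccccababcccababcccab")
steps = [("A1", ["a", "b"]), ("A2", ["c", "c", "c"]), ("A3", ["A1", "A1", "A2"])]

rules, seq = {}, x
for var, pattern in steps:
    rules[var] = pattern
    seq = substitute(seq, pattern, var)
    print(f"{var} -> {' '.join(pattern)}:  x = {' '.join(seq)}")
rules["A0"] = seq  # whatever remains becomes the start rule
```

After the last step `rules` holds exactly the grammar used on the following slides.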
SLIDE 4
Gx vs. Lempel-Ziv Transform:
Gx parses x as ccc, c, ababccc, ababccc, ab:
A0 → A2cA3A3A1
A1 → ab
A2 → ccc
A3 → A1A1A2
Lempel-Ziv parses x as c, cc, ca, b, a, bc, cca, ba, bcc, cab:
A0 → A1A2A3A4A5A6A7A8A9A10
A1 → c
A2 → A1c
A3 → A1a
A4 → b
A5 → a
A6 → A4c
A7 → A2a
A8 → A4a
A9 → A6c
A10 → A3b
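The Lempel-Ziv column is an incremental (LZ78-style) parsing: each phrase is the shortest prefix of the remaining input that has not yet appeared as a phrase, so it is an earlier phrase plus one new symbol. A sketch, assuming that reading of the slide; the name `lz78_grammar` and the handling of a repeated final phrase are illustrative choices.

```python
def lz78_grammar(x):
    """Parse x incrementally; return (phrases, rules) with rules A_i -> A_j a or A_i -> a."""
    phrases, index, current = [], {}, ""
    rules = {"A0": []}
    for ch in x:
        current += ch
        if current in index:
            continue                      # keep extending a phrase seen before
        var = f"A{len(index) + 1}"
        prefix, last = current[:-1], current[-1]
        # the prefix is always an earlier phrase (or empty), so it already has a variable
        rules[var] = ([index[prefix]] if prefix else []) + [last]
        index[current] = var
        phrases.append(current)
        rules["A0"].append(var)
        current = ""
    if current:                           # leftover tail equal to an earlier phrase
        phrases.append(current)
        rules["A0"].append(index[current])
    return phrases, rules


x = "ccccababcccababcccab"
phrases, rules = lz78_grammar(x)
print(", ".join(phrases))
for var, rhs in rules.items():
    print(var, "->", "".join(rhs))
```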
SLIDE 5
Context Free Grammar: G = (V, T, P, S)
V = {A0, A1, A2, A3}
T = {a, b, c}
P = {A0 → A2cA3A3A1, A1 → ab, A2 → ccc, A3 → A1A1A2}
S = A0
L(G) = all terminal strings derivable from S using the rules in P.
Grammar transform: x → Gx, where L(Gx) = {x}.
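A minimal sketch of this grammar, assuming variables and terminals are stored as strings and P as a dict from each variable to its right member (so each variable has exactly one rule). `expand` is a hypothetical helper that computes the full expansion of a symbol; because the derivation from S is deterministic, L(G) contains only the single string it returns.

```python
V = {"A0", "A1", "A2", "A3"}
T = {"a", "b", "c"}
P = {
    "A0": ["A2", "c", "A3", "A3", "A1"],
    "A1": ["a", "b"],
    "A2": ["c", "c", "c"],
    "A3": ["A1", "A1", "A2"],
}
S = "A0"


def expand(symbol, P):
    """Replace variables by their right members until only terminals remain."""
    if symbol not in P:          # terminals expand to themselves
        return symbol
    return "".join(expand(s, P) for s in P[symbol])


x = expand(S, P)
print(x)  # ccccababcccababcccab
```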
SLIDE 6 Advantages of Grammar Based Codes
- Better matching of source correlations
- Optimization for complexity, causality, side information, error resilience, etc.
- Universal lossless compression
SLIDE 7 Asymptotically Compact Grammars
A grammar transform x → Gx is called asymptotically compact if it satisfies
- ∀x, Gx ∈ G∗(A)
- $\lim_{n \to \infty} \max_{x \in A^n} \frac{|G_x|}{|x|} = 0$
Theorem 7: asymptotically compact grammar transforms yield universal lossless codes.
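For a single x the ratio |Gx|/|x| is easy to compute, taking |G| to be the total length of all right members (the quantity that slide 11 encodes in |G| bits). A sketch for the grammar of slides 3 and 5; asymptotic compactness is a statement about the worst case over all x of length n as n grows, which one example cannot show.

```python
P = {
    "A0": ["A2", "c", "A3", "A3", "A1"],
    "A1": ["a", "b"],
    "A2": ["c", "c", "c"],
    "A3": ["A1", "A1", "A2"],
}


def expand(symbol):
    """Full expansion of a symbol under P."""
    return symbol if symbol not in P else "".join(expand(s) for s in P[symbol])


def grammar_size(P):
    """|G|: total number of symbols on the right-hand sides of all rules."""
    return sum(len(rhs) for rhs in P.values())


x = expand("A0")
size = grammar_size(P)
print(size, len(x), size / len(x))  # 13 20 0.65
```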
SLIDE 8 Requirements for the set G∗(A)
- 1. ∀A ∈ V(G), exactly one rule in P(G) has left member A.
- 2. The empty string is not the right member of any rule.
- 3. L(G) is non-empty
- 4. G has no useless symbols.
- 5. Canonical variable naming.
- 6. $f_G^\infty(A) \neq f_G^\infty(B)$ for $A \neq B$.
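A sketch of a membership test for G∗(A), assuming the dict representation used for the example grammar (one right member per variable, so requirement 1 holds by construction) and A0 as the start symbol. Requirement 5 (canonical naming) is not checked here, and `in_G_star` is a hypothetical name. The recursive expansion is fine for small examples but can be slow for large grammars.

```python
def expand(sym, P, path=()):
    """Full expansion f_G^inf(sym); a cycle means no terminal string is ever derived."""
    if sym not in P:
        return sym
    if sym in path:
        raise ValueError(f"cycle through {sym}")
    return "".join(expand(s, P, path + (sym,)) for s in P[sym])


def reachable(P, start):
    """Variables reachable from the start symbol."""
    seen, todo = set(), [start]
    while todo:
        v = todo.pop()
        if v not in seen:
            seen.add(v)
            todo.extend(s for s in P[v] if s in P)
    return seen


def in_G_star(P, start="A0"):
    if any(len(rhs) == 0 for rhs in P.values()):    # 2. no empty right member
        return False
    try:
        expansions = {v: expand(v, P) for v in P}   # 3. acyclic, so L(G) is non-empty
    except ValueError:
        return False
    if reachable(P, start) != set(P):               # 4. no variable is unused
        return False
    return len(set(expansions.values())) == len(P)  # 6. distinct expansions


P = {
    "A0": ["A2", "c", "A3", "A3", "A1"],
    "A1": ["a", "b"],
    "A2": ["c", "c", "c"],
    "A3": ["A1", "A1", "A2"],
}
print(in_G_star(P))  # True
```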
SLIDE 9 Irreducible Grammar Transforms:
A grammar, G, is called irreducible if
- 1. G ∈ G∗(A)
- 2. ∀A ∈ V(G) \ {A0}, A appears at least twice in the right members of P(G).
- 3. No pair (Y1, Y2) ∈ (V(G) ∪ T(G))² exists such that Y1Y2 appears more than once, at non-overlapping positions, as a substring of the right members of P(G).
Kieffer and Yang present rules to reduce any grammar to one satisfying these conditions.
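Conditions 2 and 3 can be checked directly on the rules. A sketch, assuming the same dict representation and that membership in G∗(A) (condition 1) is verified separately. Pairs are counted greedily at non-overlapping positions, so the single rule A2 → ccc contributes only one usable occurrence of cc. Function names are hypothetical.

```python
from collections import Counter


def pair_counts(P):
    """Occurrences of each adjacent pair Y1Y2 in the right members, at non-overlapping positions."""
    counts = Counter()
    for rhs in P.values():
        free_from = {}                    # pair -> first index where it may be counted again
        for i in range(len(rhs) - 1):
            pair = (rhs[i], rhs[i + 1])
            if i >= free_from.get(pair, 0):
                counts[pair] += 1
                free_from[pair] = i + 2
    return counts


def is_irreducible(P, start="A0"):
    """Conditions 2 and 3 only; condition 1 is assumed checked separately."""
    uses = Counter(s for rhs in P.values() for s in rhs if s in P)
    if any(uses[v] < 2 for v in P if v != start):          # 2. each non-start variable used twice
        return False
    return all(c <= 1 for c in pair_counts(P).values())    # 3. no repeated pair


P = {
    "A0": ["A2", "c", "A3", "A3", "A1"],
    "A1": ["a", "b"],
    "A2": ["c", "c", "c"],
    "A3": ["A1", "A1", "A2"],
}
# grammar after only the first substitution of slide 3
after_step_1 = {
    "A0": list("cccc") + ["A1", "A1"] + list("ccc") + ["A1", "A1"] + list("ccc") + ["A1"],
    "A1": ["a", "b"],
}
print(is_irreducible(P), is_irreducible(after_step_1))  # True False
```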
SLIDE 10 Encoding G = ( V, T, P, S )
- Canonical V(G) is described by |V(G)|, which requires |V(G)| bits in unary encoding.
- T(G) ⊆ A is described by |A| bits, one indicator bit per symbol of A.
- S = A0 under canonical naming, so it requires 0 bits.
Total = |V(G)| + |A| + 0 bits
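The header fields above can be written out explicitly. A sketch, assuming |V(G)| ≥ 1 is sent as |V(G)| − 1 ones followed by a terminating zero, and T(G) as one indicator bit per symbol of A in a fixed order known to the decoder; both conventions and the function names are illustrative.

```python
def encode_header(num_vars, terminals, alphabet):
    v_bits = "1" * (num_vars - 1) + "0"                                   # unary: |V(G)| bits
    t_bits = "".join("1" if a in terminals else "0" for a in alphabet)    # |A| bits
    return v_bits + t_bits                                                # S = A0 costs nothing


def decode_header(bits, alphabet):
    num_vars = bits.index("0") + 1
    flags = bits[num_vars:num_vars + len(alphabet)]
    return num_vars, {a for a, f in zip(alphabet, flags) if f == "1"}


bits = encode_header(4, {"a", "b", "c"}, ["a", "b", "c"])
print(bits, len(bits))  # 1110111 7
```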
SLIDE 11 Encoding G = ( V, T, P, S )
To encode P we must describe $f_G(A_0), f_G(A_1), \ldots, f_G(A_{|V(G)|-1})$,
or, equivalently, we must describe the lengths $|f_G(A_0)|, |f_G(A_1)|, \ldots, |f_G(A_{|V(G)|-1})|$, using |G| bits in unary encoding, and
$\rho_G \triangleq f_G(A_0) f_G(A_1) \cdots f_G(A_{|V(G)|-1})$.
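Splitting P into a length list and the concatenation ρG is lossless. A sketch for the example grammar, assuming each length ℓ ≥ 1 is sent in unary as ℓ − 1 zeros and a terminating one, so the lengths cost exactly |G| bits; `split` and `unsplit` are hypothetical names.

```python
P = {
    "A0": ["A2", "c", "A3", "A3", "A1"],
    "A1": ["a", "b"],
    "A2": ["c", "c", "c"],
    "A3": ["A1", "A1", "A2"],
}
ORDER = ["A0", "A1", "A2", "A3"]


def split(P, order):
    """Return (unary-coded right-member lengths, rho_G)."""
    length_bits = "".join("0" * (len(P[v]) - 1) + "1" for v in order)
    rho = [s for v in order for s in P[v]]
    return length_bits, rho


def unsplit(length_bits, rho, order):
    """Inverse of split: read the lengths back and cut rho_G into right members."""
    lengths, run = [], 0
    for b in length_bits:
        run += 1
        if b == "1":
            lengths.append(run)
            run = 0
    rules, pos = {}, 0
    for v, n in zip(order, lengths):
        rules[v] = rho[pos:pos + n]
        pos += n
    return rules


length_bits, rho = split(P, ORDER)
print(length_bits)      # 0000101001001
print(" ".join(rho))    # A2 c A3 A3 A1 a b c c c A1 A1 A2
```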
SLIDE 12 Encoding G = ( V, T, P, S )
Instead of encoding ρG directly, define ωG ≜ ρG with the first occurrence of each variable removed. Encode ρG by
- Indicating removed entries (|G| bits)
- Sending the frequencies of the symbols of V(G) ∪ T(G) occurring in ωG, using unary encoding (|G| bits)
- Using frequencies to entropy code ωG (⌈H∗(ωG)⌉ bits)
Total ≤ |A| + 4|G| + ⌈H∗(ωG)⌉
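For the example grammar, ωG, the removal flags, and the total can be computed directly. A sketch, assuming H∗(·) is the unnormalized empirical entropy, i.e., the code length in bits that a sequence's own symbol frequencies assign to it (the minimizing q in the definition of H∗ on slide 15). Helper names are hypothetical.

```python
import math
from collections import Counter

P = {
    "A0": ["A2", "c", "A3", "A3", "A1"],
    "A1": ["a", "b"],
    "A2": ["c", "c", "c"],
    "A3": ["A1", "A1", "A2"],
}
ORDER = ["A0", "A1", "A2", "A3"]
ALPHABET = ["a", "b", "c"]


def omega(rho, variables):
    """Drop the first occurrence of each variable; also return the 'removed' flags."""
    seen, kept, flags = set(), [], []
    for s in rho:
        first = s in variables and s not in seen
        if first:
            seen.add(s)
        flags.append("1" if first else "0")
        if not first:
            kept.append(s)
    return kept, "".join(flags)


def empirical_entropy_bits(seq):
    """H*(seq) = -sum_i log2(n(seq_i) / |seq|)."""
    n = len(seq)
    return -sum(c * math.log2(c / n) for c in Counter(seq).values())


rho = [s for v in ORDER for s in P[v]]
w, removed = omega(rho, set(P))
H = empirical_entropy_bits(w)
G_size = len(rho)                                   # |G| = 13
total = len(ALPHABET) + 4 * G_size + math.ceil(H)   # |A| + 4|G| + ceil(H*(omega_G))
print(" ".join(w))        # c A3 a b c c c A1 A1 A2
print(removed)            # 1010100000000
print(round(H, 3), total) # 23.219 79
```

For a string this short the bound (79 bits) is well above the 20 · log2 3 ≈ 31.7 bits of a plain fixed-length encoding; the guarantee is asymptotic.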
SLIDE 13 Bounding ⌈H∗(ωG)⌉ for G ∈ G∗(A)
- There exists a sequence σ = σ1σ2 . . . σt ∼ ωG (a rearrangement of ωG, with the same symbol frequencies) such that $f_G^\infty(\sigma) = x$.
- Let π be the parsing $\pi = (f_G^\infty(\sigma_1), f_G^\infty(\sigma_2), \ldots, f_G^\infty(\sigma_t))$ of x, i.e., the phrases concatenate to x.
- If |α| > 1 for every rule (A → α) ∈ P, then $f_G^\infty(\cdot)$ is a one-to-one map between the σi and the πi, so H∗(ωG) = H∗(σ) = H∗(π). In any case, H∗(ωG) ≤ H∗(π) + |G|.
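One way to build a suitable σ is to start from f_G(A0) and expand exactly one occurrence of every other variable. By construction, σ then contains every symbol of ρG except one occurrence of each variable, which is exactly the multiset of ωG. A sketch of that construction (our illustration, not necessarily the one used in the proof), with hypothetical helper names; at each step the first unexpanded variable found scanning left to right is expanded.

```python
import math
from collections import Counter

P = {
    "A0": ["A2", "c", "A3", "A3", "A1"],
    "A1": ["a", "b"],
    "A2": ["c", "c", "c"],
    "A3": ["A1", "A1", "A2"],
}
ORDER = ["A0", "A1", "A2", "A3"]


def expand(sym):
    """f_G^inf(sym)."""
    return sym if sym not in P else "".join(expand(s) for s in P[sym])


def build_sigma(P, start="A0"):
    s = list(P[start])
    pending = set(P) - {start}
    while pending:
        # some pending variable always occurs in s when G has no useless symbols
        i = next(i for i, sym in enumerate(s) if sym in pending)
        var = s[i]
        s[i:i + 1] = P[var]          # expand this one occurrence
        pending.discard(var)
    return s


def omega(P, order):
    seen, out = set(), []
    for sym in (s for v in order for s in P[v]):
        if sym in P and sym not in seen:
            seen.add(sym)            # first occurrence of a variable is dropped
        else:
            out.append(sym)
    return out


def H_star(seq):
    n = len(seq)
    return -sum(c * math.log2(c / n) for c in Counter(seq).values())


sigma = build_sigma(P)
pi = [expand(sym) for sym in sigma]
print(" ".join(sigma))   # c c c c a b A1 A2 A3 A1
print(pi)
print(round(H_star(omega(P, ORDER)), 3), round(H_star(sigma), 3), round(H_star(pi), 3))  # 23.219 23.219 23.219
```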
SLIDE 14 Bounding ⌈H∗(π)⌉
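Consider a finite-state source µ with k states, and define
$\tau(y) \triangleq \max_{s_0} \prod_{i=1}^{m} p(s_i, y_i \mid s_{i-1})$, where m = |y|.
We design τ(y) to overestimate the probability of y. To get a valid probability mass function, we normalize by $Q_k k^{-1} |y|^{-2}$ to obtain $p^*(y) = Q_k k^{-1} |y|^{-2} \tau(y)$, with $Q_k \ge 1/2$.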
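A numerical sanity check of the normalization, on a hypothetical two-state source over {a, b} with made-up probabilities. We assume the source is unifilar (the next state is determined by the current state and the emitted symbol), so the product in τ(y) follows a single state path from s0, and we let µ start from a uniformly chosen state. The sum of τ over all y of each length should lie between 1 and k, which is what makes the sum of |y|^-2 τ(y)/k at most π²/6 and hence Q_k ≥ 6/π² > 1/2.

```python
import itertools
import math

K = 2                                     # number of states (hypothetical example)
EMIT = {0: {"a": 0.9, "b": 0.1},          # p(y | s): made-up emission probabilities
        1: {"a": 0.2, "b": 0.8}}
NEXT = {(s, y): (0 if y == "a" else 1) for s in EMIT for y in "ab"}  # unifilar next state


def path_prob(y, s0):
    """prod_i p(s_i, y_i | s_{i-1}) along the unique state path starting at s0."""
    p, s = 1.0, s0
    for sym in y:
        p *= EMIT[s][sym]
        s = NEXT[(s, sym)]
    return p


def tau(y):
    return max(path_prob(y, s0) for s0 in range(K))


def mu(y):
    return sum(path_prob(y, s0) for s0 in range(K)) / K   # uniform initial state


totals, partial = {}, 0.0
for m in range(1, 11):
    words = ["".join(w) for w in itertools.product("ab", repeat=m)]
    totals[m] = sum(tau(y) for y in words)     # should lie in [1, K]
    partial += totals[m] / (K * m * m)         # running sum of |y|^-2 tau(y) / k
    print(m, round(totals[m], 4))
print(round(partial, 4), "<=", round(math.pi ** 2 / 6, 4))
```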
SLIDE 15 Bounding ⌈H∗(π)⌉ (Continued)
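Combining
$H^*(\pi) = \min_q \sum_{i=1}^{t} -\log q(\pi_i) \le \sum_{i=1}^{t} -\log p^*(\pi_i)$
with
$\mu(x) \le \tau(x) \le \prod_{i=1}^{t} \tau(\pi_i) = \prod_{i=1}^{t} \frac{k|\pi_i|^2}{Q_k} p^*(\pi_i) \le \prod_{i=1}^{t} 2k|\pi_i|^2 p^*(\pi_i)$
gives (logarithms base 2)
$H^*(\pi) \le -\log \mu(x) + t(1 + \log k) + 2 \sum_{i=1}^{t} \log |\pi_i|$.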
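The final inequality can be spot-checked numerically. A sketch, assuming a made-up two-state unifilar source over {a, b, c} with a uniform initial state (so µ(x) ≤ τ(x) holds), logarithms in base 2, and the parsing of the running example x = ccccababcccababcccab into t = 10 phrases (c, c, c, c, a, b, ab, ccc, ababccc, ab), which is one valid parsing of x.

```python
import math
from collections import Counter

K = 2
EMIT = {0: {"a": 0.5, "b": 0.3, "c": 0.2},   # made-up p(y | s)
        1: {"a": 0.1, "b": 0.2, "c": 0.7}}


def next_state(s, y):
    return 1 if y == "c" else 0              # unifilar: state records whether the last symbol was c


def mu(x):
    """Probability of x, averaging over a uniformly chosen initial state."""
    total = 0.0
    for s0 in range(K):
        p, s = 1.0, s0
        for y in x:
            p *= EMIT[s][y]
            s = next_state(s, y)
        total += p
    return total / K


def H_star(phrases):
    n = len(phrases)
    return -sum(c * math.log2(c / n) for c in Counter(phrases).values())


x = "ccccababcccababcccab"
pi = ["c", "c", "c", "c", "a", "b", "ab", "ccc", "ababccc", "ab"]
assert "".join(pi) == x
t = len(pi)
bound = -math.log2(mu(x)) + t * (1 + math.log2(K)) + 2 * sum(math.log2(len(p)) for p in pi)
print(round(H_star(pi), 3), round(bound, 3))
```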
SLIDE 16 Summary For Encoding Gx:
- Code length ≤ −log µ(x) + |A| + 5|Gx| + $O\left(|G_x| \log \frac{|x|}{|G_x|}\right)$.
- Many parsings have H∗(π) near −log µ(x) + $O\left(|x|\,\nu\left(\frac{|G_x|}{|x|}\right)\right)$.
- Obtaining universal codes requires choosing a parsing/grammar that makes |Gx|/|x| small.
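One way to see |Gx|/|x| shrink is to build the Lempel-Ziv grammar of slide 4 for longer and longer inputs. A sketch, assuming i.i.d. uniform bits from a fixed pseudo-random seed and an LZ78 grammar in which each phrase becomes a rule A_i → A_j a (or A_i → a) and A0 lists the phrases. The LZ78 grammar is only a convenient example here, not the irreducible grammars the slides analyze; `lz78_grammar_size` is a hypothetical name.

```python
import random


def lz78_grammar_size(x):
    """|G| for the LZ78 grammar: t symbols in A0 plus 1 or 2 symbols per phrase rule."""
    seen, current, t, size = set(), "", 0, 0
    for ch in x:
        current += ch
        if current not in seen:
            seen.add(current)
            t += 1
            size += 2 if len(current) > 1 else 1   # rule A_i -> A_j a, or A_i -> a
            current = ""
    if current:              # repeated tail: one more symbol in A0, no new rule
        t += 1
    return size + t          # plus |f_G(A0)| = number of phrases


rng = random.Random(0)
ratios = []
for n in (1_000, 10_000, 100_000):
    x = "".join(rng.choice("ab") for _ in range(n))
    ratios.append(lz78_grammar_size(x) / n)
    print(n, round(ratios[-1], 3))
```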
SLIDE 17 Bounding |Gx|/|x| for Gx ∈ G∗(A)
- Consider “worst case” Gx ∈ G∗(A) which maximizes |Gx|.
- For Gx ∈ G∗(A), distinct variables must have distinct expansions.
- So at most $|A|^l$ rules can expand to strings of length l.
- In the worst case, all rules of length l are created before any of length l + 1.
SLIDE 18 Bounding |Gx|/|x| for Gx ∈ G∗(A)
Exhausting all rules of expansion length ≤ l requires
$|x| \ge \sum_{j=1}^{l} j|A|^j \ge l|A|^l$.
For Gx ∈ G∗(A), rules are like $A_i \to A_{i'}\alpha$ (i.e., each right member has length 2), so
$|G_x| \le \sum_{j=0}^{l} 2|A|^j = \frac{2(|A|^{l+1} - 1)}{|A| - 1}$.
Therefore
$\frac{|G_x|}{|x|} \le \frac{2(|A|^{l+1} - 1)}{|A| - 1} \cdot \frac{1}{l|A|^l} \le \frac{2|A|}{(|A| - 1)\,l} \to 0$.
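The counting argument can be checked numerically by evaluating the ratio of the two sums directly, without any intermediate inequality. A sketch: `worst_G` is the bound Σ_{j=0}^{l} 2|A|^j on |Gx|, and `min_x` is the length Σ_{j=1}^{l} j|A|^j needed to use up every expansion of length ≤ l (both hypothetical names).

```python
def worst_G(r, l):
    """Upper bound on |Gx| once every expansion of length <= l has a rule."""
    return sum(2 * r ** j for j in range(l + 1))


def min_x(r, l):
    """Length of x needed to use up all expansions of length <= l."""
    return sum(j * r ** j for j in range(1, l + 1))


for r in (2, 3, 4):                # r = |A|
    row = [worst_G(r, l) / min_x(r, l) for l in (1, 5, 10, 20, 40)]
    print(r, [round(v, 4) for v in row])
```

The ratio decays roughly like 1/l, so |Gx|/|x| → 0 as x grows.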
SLIDE 19 Encoding Summary:
- Grammar encoding takes ≤ |A| + 4|G| + ⌈H∗(ωG)⌉ bits.
- H∗(ωG) ≈ H∗(π) ≈ −log µ(x) + |x| ν(|Gx|/|x|), where |Gx|/|x| → 0.
SLIDE 20 Conclusions
- Grammar based codes provide a framework to build universal codes.
- Many different parsings, π = (π1, π2, . . . , πt), yield H∗(π) = O(−log µ(x) + t).
- Irreducible grammars yield π with t ≤ |Gx| and |Gx|/|x| → 0, and also allow efficient encoding of π.
SLIDE 21 Further Thoughts...
Can grammar ideas be used in
- universal lossy compression?
- universal prediction/estimation?
- error correction coding?