6.975 Week 5: Universal Compression Via Grammar Based Codes - PowerPoint PPT Presentation



SLIDE 1

6.975 Week 5 Universal Compression Via Grammar Based Codes Presenter: Emin Martinian

SLIDE 2

Grammar Based Compression

  • Initial data may contain complex relationships.
  • Transform data to a “basis” with independent components.
  • Use simple, memoryless compression on these components.
SLIDE 3

Example:

Suppose we want to compress x = c c c c a b a b c c c a b a b c c c a b.

  • A1 → a b gives x = c c c c A1 A1 c c c A1 A1 c c c A1
  • A2 → c c c gives x = A2 c A1 A1 A2 A1 A1 A2 A1
  • A3 → A1 A1 A2 gives x = A2 c A3 A3 A1
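The substitution steps above can be sketched in a few lines of Python (an illustrative sketch, not the lecture's algorithm; the helper `replace` and the token names are ours):

```python
def replace(seq, pattern, symbol):
    """Replace every non-overlapping occurrence of `pattern` in `seq`
    (a list of tokens) with the single token `symbol`, left to right."""
    out, i = [], 0
    while i < len(seq):
        if seq[i:i + len(pattern)] == pattern:
            out.append(symbol)
            i += len(pattern)
        else:
            out.append(seq[i])
            i += 1
    return out

x = list("ccccababcccababcccab")
step1 = replace(x, list("ab"), "A1")              # A1 -> a b
step2 = replace(step1, list("ccc"), "A2")         # A2 -> c c c
step3 = replace(step2, ["A1", "A1", "A2"], "A3")  # A3 -> A1 A1 A2
```

After the three substitutions, `step3` is the compressed sequence A2 c A3 A3 A1 from the slide.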

SLIDE 4

Gx vs. Lempel-Ziv Transform:

The grammar transform Gx parses x = ccc, c, ababccc, ababccc, ab:

  • A0 → A2 c A3 A3 A1
  • A1 → a b
  • A2 → c c c
  • A3 → A1 A1 A2

The Lempel-Ziv transform parses x = c, cc, ca, b, a, bc, cca, ba, bcc, cab:

  • A0 → A1 A2 A3 A4 A5 A6 A7 A8 A9 A10
  • A1 → c, A2 → A1 c, A3 → A1 a, A4 → b, A5 → a
  • A6 → A4 c, A7 → A2 a, A8 → A4 a, A9 → A6 c, A10 → A3 b
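The Lempel-Ziv side of this comparison is the standard LZ78 incremental parse (longest previously seen phrase plus one fresh symbol). A minimal sketch, assuming that rule:

```python
def lz78_parse(s):
    """LZ78-style incremental parsing: each phrase is the shortest prefix of
    the remaining input that has not been seen before.  (The final phrase may
    duplicate an earlier one if the input ends mid-extension.)"""
    phrases, seen, i = [], set(), 0
    while i < len(s):
        j = i + 1
        # extend while the current candidate phrase was already produced
        while j < len(s) and s[i:j] in seen:
            j += 1
        phrases.append(s[i:j])
        seen.add(s[i:j])
        i = j
    return phrases
```

Running it on the example string reproduces the parsing c, cc, ca, b, a, bc, cca, ba, bcc, cab.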

SLIDE 5

Context Free Grammar: G = (V, T, P, S)

V = {A0, A1, A2, A3}
T = {a, b, c}
P = {A0 → A2 c A3 A3 A1, A1 → a b, A2 → c c c, A3 → A1 A1 A2}
S = A0

L(G) = all strings derivable from G. Grammar transform: x → Gx where L(Gx) = {x}.
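Because every variable in Gx has exactly one production and the grammar is acyclic, the expansion of each variable is deterministic, which is why L(Gx) = {x}. A minimal sketch of the expansion map (our helper names):

```python
# Production rules of the example grammar Gx; terminals are the symbols
# that have no rule of their own.
P = {
    "A0": ["A2", "c", "A3", "A3", "A1"],
    "A1": ["a", "b"],
    "A2": ["c", "c", "c"],
    "A3": ["A1", "A1", "A2"],
}

def expand(symbol):
    """Derive the unique string a symbol expands to under P."""
    if symbol not in P:  # terminal symbol
        return symbol
    return "".join(expand(s) for s in P[symbol])
```

`expand("A0")` recovers the original string x, confirming L(Gx) = {x}.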

SLIDE 6

Advantages of Grammar Based Codes

  • Better matching of source correlations
  • Optimization for complexity, causality, side information, error resilience, etc.
  • Universal lossless compression
SLIDE 7

Asymptotically Compact Grammars

Asymptotically compact grammars are defined as grammars which satisfy

  • ∀x, Gx ∈ G∗(A)
  • lim_{n→∞} max_{x∈A^n} |Gx|/|x| = 0

Theorem 7: asymptotically compact grammars yield universal compression.

SLIDE 8

Requirements for the set G∗(A)

  • 1. ∀A ∈ V(G), one rule in P(G) has left member A.
  • 2. The empty string is not the right member of any rule.
  • 3. L(G) is non-empty.
  • 4. G has no useless symbols.
  • 5. Canonical variable naming.
  • 6. f^∞_G(A) ≠ f^∞_G(B) for A ≠ B.

SLIDE 9

Irreducible Grammar Transforms:

A grammar, G, is called irreducible if

  • 1. G ∈ G∗(A)
  • 2. ∀A ∈ V(G) \ {A0}, A appears at least twice in the right members of P(G).
  • 3. No pair (Y1, Y2) ∈ V(G) ∪ T(G) exists where Y1 Y2 appears more than once as a substring of the right members of P(G).

Kieffer and Yang present rules to reduce any grammar to one satisfying these conditions.
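Conditions 2 and 3 are easy to check mechanically. A sketch with two hypothetical toy grammars (not from the slides); for simplicity it counts overlapping pair occurrences as repeats:

```python
from collections import Counter

def is_irreducible(P, start="A0"):
    """Check irreducibility conditions 2 and 3 on the right members of P.
    Simplification: overlapping occurrences of a pair count as repeats."""
    rights = list(P.values())
    # Condition 2: every variable other than the start appears >= 2 times
    # in the right members.
    uses = Counter(sym for rhs in rights for sym in rhs)
    if any(uses[v] < 2 for v in P if v != start):
        return False
    # Condition 3: no pair Y1 Y2 appears more than once as a substring.
    pairs = Counter((rhs[i], rhs[i + 1])
                    for rhs in rights for i in range(len(rhs) - 1))
    return all(c <= 1 for c in pairs.values())

# Hypothetical examples for illustration:
G_ok = {"A0": ["A1", "c", "A1"], "A1": ["a", "b"]}        # A1 reused, no repeated pair
G_bad = {"A0": ["A1", "c", "A1", "c"], "A1": ["a", "b"]}  # pair A1 c repeats
```

Here `is_irreducible(G_ok)` passes while `is_irreducible(G_bad)` fails on the repeated pair A1 c.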

SLIDE 10

Encoding G = ( V, T, P, S )

  • Canonical V(G) is described by |V(G)| and requires |V(G)| bits in unary encoding.
  • T ∈ P(A) is described by |A| bits in one-hot encoding.
  • S = A0 in canonical encoding and requires 0 bits.

Total = |V(G)| + |A| + 0

SLIDE 11

Encoding G = ( V, T, P, S )

To encode P we must describe fG(A0), fG(A1), . . . , fG(A|V(G)|−1), or equivalently we must describe

  • |A0|, |A1|, . . . , |A|V(G)|−1| using |G| bits in unary encoding, and
  • ρG = fG(A0) fG(A1) · · · fG(A|V(G)|−1).

SLIDE 12

Encoding G = ( V, T, P, S )

Instead of encoding ρG directly, define ωG = ρG with the first occurrence of each variable removed. Encode ρG by

  • Indicating the removed entries (|G| bits)
  • Sending the frequencies of V(G) ∪ T(G) occurring in ωG using unary encoding (|G| bits)
  • Using the frequencies to entropy code ωG (⌈H∗(ωG)⌉ bits)

Total ≤ |A| + 4|G| + ⌈H∗(ωG)⌉
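For the example grammar from slide 5, ωG and the entropy term can be computed directly (a sketch, assuming H∗ denotes the unnormalized empirical entropy of the sequence):

```python
from collections import Counter
from math import ceil, log2

# Example grammar, rules listed in canonical order A0, A1, A2, A3.
P = {
    "A0": ["A2", "c", "A3", "A3", "A1"],
    "A1": ["a", "b"],
    "A2": ["c", "c", "c"],
    "A3": ["A1", "A1", "A2"],
}

# rho_G: the right members concatenated in canonical order.
rho = [sym for rhs in P.values() for sym in rhs]

# omega_G: rho_G with the first occurrence of each variable removed
# (those positions are signalled separately with |G| indicator bits).
seen, omega = set(), []
for sym in rho:
    if sym in P and sym not in seen:
        seen.add(sym)
        continue
    omega.append(sym)

# Empirical entropy of omega_G in bits: sum over symbols of n_y * log2(n / n_y).
counts = Counter(omega)
n = len(omega)
h_star = sum(c * log2(n / c) for c in counts.values())
```

For this grammar ωG has 10 symbols and ⌈H∗(ωG)⌉ comes to 24 bits.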

SLIDE 13

Bounding ⌈H∗(ωG)⌉ for G ∈ G∗(A)

  • There exists a σ = σ1 σ2 . . . σt ∼ ωG with f^∞_G(σ) = x.
  • Let π be the parsing π = (f^∞_G(σ1), f^∞_G(σ2), . . . , f^∞_G(σt)) = x.
  • If ∀(A → α) ∈ P, |α| > 1, then f^∞_G(·) is a one-to-one map between σi and πi, so H∗(ωG) = H∗(σ) = H∗(π).

In any case, H∗(ωG) ≤ H∗(π) + |G|.

SLIDE 14

Bounding ⌈H∗(π)⌉

Consider a kth-order finite state source, µ, and define

τ(y) = max_{s0, s1, s2, . . . , sm} ∏_{i=1}^m p(si, yi | si−1).

We design τ(y) to overestimate the probability of y. To get a valid pdf, we normalize by Qk k^−1 |y|^−2 to obtain

p∗(y) = Qk k^−1 |y|^−2 τ(y), with Qk ≥ 1/2.

SLIDE 15

Bounding ⌈H∗(π)⌉ (Continued)

Combining

H∗(π) = min_q ∑_{i=1}^t − log q(πi) ≤ ∑_{i=1}^t − log p∗(πi)

with

µ(x) ≤ τ(x) ≤ ∏_{i=1}^t τ(πi) ≤ ∏_{i=1}^t p∗(πi) · ∏_{i=1}^t 2k|πi|^2  (using Qk ≥ 1/2)

yields

H∗(π) ≤ − log µ(x) + t(1 + log k) + 2 ∑_{i=1}^t log |πi|.

SLIDE 16

Summary For Encoding Gx:

  • Code length ≤ − log µ(x) + |A| + 5|Gx| + O( (|Gx|/|x|) log(|x|/|Gx|) ).
  • Many parsings have H∗(π) near − log µ(x) + O( ν(|Gx|/|x|) ).
  • Obtaining universal codes requires choosing a parsing/grammar to make |Gx|/|x| small.

SLIDE 17

Bounding |Gx|/|x| for Gx ∈ G∗(A)

  • Consider “worst case” Gx ∈ G∗(A) which maximizes |Gx|.
  • But for Gx ∈ G∗(A), rule expansions must be unique.
  • So there are at most |A|l rules expanding to length l.
  • Create all rules of length l before any of length l + 1.
SLIDE 18

Bounding |Gx|/|x| for Gx ∈ G∗(A)

Exhausting all rules of length ≤ l requires

|x| ≥ ∑_{j=1}^l j|A|^j ≥ l|A|^{l+1} / (|A| − 1)^2.

For Gx ∈ G∗(A), rules are like Ai → Ai′ α (i.e., |Ai| = 2), so

|Gx| ≤ ∑_{j=1}^l 2|A|^j ≤ 2(|A|^{l+1} − 1) / (|A| − 1).

Therefore

|Gx|/|x| ≤ [2(|A|^{l+1} − 1)/(|A| − 1)] · [(|A| − 1)^2/(l|A|^{l+1})] ≤ 2(|A| − 1)/l → 0.

SLIDE 19

Encoding Summary:

  • Grammar encoding takes ≤ |A| + 4|G| + ⌈H∗(ωG)⌉ bits.
  • H∗(ωG) ≈ H∗(π) ≈ − log µ(x) + ν(|Gx|/|x|).
  • For Gx ∈ G∗(A), |Gx|/|x| → 0.

SLIDE 20

Conclusions

  • Grammar based codes provide a framework to build universal codes.
  • Many different parsings, π = (π1, π2, . . . , πt), yield H∗(π) = O(− log µ(x) + t).
  • Irreducible grammars yield π with t ≤ |Gx| and |Gx|/|x| → 0, and also allow efficient encoding of π.

SLIDE 21

Further Thoughts...

Can grammar ideas be used in

  • universal lossy compression?
  • universal prediction/estimation?
  • error correction coding?