

SLIDE 1

Applications of Random Coding and Algebraic Coding Theories to Universal Lossless Source Coding Performance Bounds

Gil I. Shamir
Department of Electrical & Computer Engineering, University of Utah, Salt Lake City, UT 84112, U.S.A.

DIMACS 2003 Workshop on Algebraic Coding Theory and Information Theory
DIMACS Center, Rutgers University, Piscataway, NJ, December 15-18, 2003

SLIDE 2

Overview

Research Problem

  • Average-case universal lossless compression
  • Performance lower bounds on redundancy (the best possible performance of any scheme for a specific model)

Research Approach

  • Use Redundancy-Capacity Theorems to obtain bounds
  • Lower bound the relevant capacity for given source model

Models Discussed

  • parametric sources with a finite number of parameters
  • i.i.d. sources with large alphabets
  • patterns induced by i.i.d. sources
  • piecewise stationary sources
  • piecewise stationary sources with slowly varying statistics
  • switching sources
SLIDE 3

Universal Coding and Redundancy

Problem Layout

  • A sequence xn of length n, governed by Pθ,
  • θ unknown in a known class Λ,
  • a uniquely decipherable code L(·) may depend on Λ but must be independent of θ.
  • Unknown parameters cost redundancy.

Average Redundancy

  • For a code L(·) for n-sequences drawn by source θ,

R_n(L, θ) = (1/n) · E_θ L(X^n) − H_θ(X^n)

  • E_θ - mean w.r.t. θ,
  • H_θ - per-symbol entropy.
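As a small numeric illustration (not on the slides): for a Bernoulli source coded with ideal codeword lengths −log₂ Q(x^n) matched to a wrong parameter q, the per-symbol average redundancy above reduces to the divergence D(p ‖ q). A sketch in Python:

```python
import math

def redundancy_bernoulli(p, q):
    """Per-symbol average redundancy (bits) when X^n ~ Bernoulli(p) is coded
    with ideal codeword lengths matched to Bernoulli(q):
    (1/n) E_p[L(X^n)] - H_p, which equals the KL divergence D(p || q)."""
    avg_len = -(p * math.log2(q) + (1 - p) * math.log2(1 - q))  # (1/n) E_p L
    entropy = -(p * math.log2(p) + (1 - p) * math.log2(1 - p))  # H_p
    return avg_len - entropy
```

For q = p the redundancy vanishes; it grows with the mismatch, which is exactly the cost the unknown parameter inflicts.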

Average Universality Measure of a Class Λ

  • Maximin R_n^-(Λ) and minimax R_n^+(Λ) average redundancies - best code for some worst average (over x^n) case. [Davisson, 1973]
  • Average redundancy for most sources [Rissanen, 1984] (the strongest sense).
SLIDE 4

Redundancy-Capacity Theorem

Weak Version [Implied from Davisson, 1973, Gallager, 1976]

Let n → ∞. Let ϕ be a set of M points θ in the class Λ_k that are distinguishable by x^n. Then the minimax and maximin redundancies satisfy

R_n^+(Λ_k) = R_n^-(Λ_k) ≥ (1 − ε) · (log M)/n.

Strong Random Coding Version [Merhav & Feder, 1995, 1996]

Let n → ∞. Define a distribution over Λ_k, and partition most of the class (a subset Λ_ε) into disjoint countable sets ϕ, where the marginal probability of each θ ∈ ϕ is equal, and there are M_ϕ ≥ M sources in ϕ, distinguishable by x^n. Then

R_n(L, θ) ≥ (1 − ε) · (log M)/n

for every code L(·) and almost every θ ∈ Λ_k.

Distinguishability

θ and θ′ are distinguishable if x^n generated by θ appears to have been generated by θ′ with probability that goes to 0, and vice versa.

SLIDE 5

Use of Redundancy-Capacity Theorem

Weak Version for Λk

  1. Demonstrate how to find ϕ.
  2. Lower bound M.
  3. Prove that all θ ∈ ϕ are distinguishable by x^n.

Strong Version for Λk

  1. Demonstrate how to define most of the class, Λ_ε.
  2. Show that Λ_ε is indeed most of the class.
  3. Show how to partition Λ_ε such that every source in Λ_ε is in exactly one ϕ, and sources in ϕ are uniformly distributed with the uniform prior on Λ_k. Lower bound M.
  4. Prove that for every valid ϕ, all θ ∈ ϕ are distinguishable by x^n.

Compound Classes

If Λ = ∪_k Λ_k, the redundancy for θ ∈ Λ_k consists of intra-class redundancy within Λ_k, and inter-class redundancy for distinguishing Λ_k within Λ.

SLIDE 6

Redundancy Capacity - Demo

[Figure: the class Λ_k with subset Λ_ε, partitioned into sets ϕ_1, ϕ_2, ϕ_3 containing M_ϕ = 13, 10, and 12 points, respectively; M = 10.]

  • The volume of Λ_k outside Λ_ε is assumed negligible.
  • Any θ is contained in a unique ϕ and has probability equal to that of any other θ′ ∈ ϕ.
  • In every ϕ, all points are distinguishable by x^n.

By the theorem, for every code and almost every θ ∈ Λ_k,

R_n(L, θ) ≥ (1 − ε) · (log 10)/n.

SLIDE 7

Finite k-dimensional Parametric Sources

  • ϕ is determined by an initial shift u of a grid (one ϕ is sufficient for maximin).
  • θ ∈ ϕ are distinguishable if ϕ is a grid with spacing n^{−0.5(1−ε)}.

R_n(L, θ) ≥ (1 − ε) · (k/2) · (log n)/n

for every code L(·) and almost every θ ∈ Λ. [Rissanen, 1984]

[Figure: uniform grids with spacing n^{−0.5(1−ε)}; initial shifts u_1, u_2, u_3 define the sets ϕ_1, ϕ_2, ϕ_3.]
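The bound can be sanity-checked numerically. Under the simplifying assumption (mine, not the slides') of a k-dimensional unit-cube parameter space, a grid with per-axis spacing n^{−0.5(1−ε)} has M points satisfying (log M)/n ≈ (1 − ε)(k/2)(log n)/n:

```python
import math

def rissanen_bound_check(n, k, eps):
    """Count the points of a uniform grid with spacing n^{-0.5(1-eps)} in a
    k-dimensional unit cube, and compare (log M)/n against the
    (1 - eps) * (k/2) * (log n)/n lower bound."""
    spacing = n ** (-0.5 * (1 - eps))
    points_per_dim = int(1 / spacing)   # grid points along one axis
    M = points_per_dim ** k             # distinguishable grid sources
    lhs = math.log2(M) / n              # redundancy implied by log M / n
    rhs = (1 - eps) * (k / 2) * math.log2(n) / n
    return lhs, rhs
```

For large n the two sides agree to within rounding of the per-axis point count.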

SLIDE 8

Distinguishability

Setting and Proof in the most-sources sense

  • Choose a random grid ϕ (as in random coding).
  • Generate x^n by a given θ ∈ ϕ.
  • Let θ̂ be the maximum-likelihood estimator of θ from x^n.
  • Let θ̂_g be the grid point whose components are nearest θ̂.
  • Prove that P_e = Pr{θ̂_g ≠ θ | θ} → 0 as n → ∞.

Use the union bound on the components of θ:

P_e ≤ Σ_{i=1}^{k} Pr{θ̂_{g,i} ≠ θ_i} ≤ Σ_{i=1}^{k} n · 2^{−n · min_{x^n ∈ A_i} D(P_{θ̂_i} ‖ P_{θ_i})} ≤ 2^{(log k) + (log n) − c·n^{ε/2}} → 0.

  • A_i - the event that θ̂_{g,i} ≠ θ_i.
  • D(P_{θ̂_i} ‖ P_{θ_i}) ≥ c/n^{1−ε} for θ̂ ∈ A_i, where c is a constant.
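The estimate-and-snap argument can be illustrated by simulation. This is a simplified sketch, not from the slides: a single Bernoulli parameter placed on one grid, a Monte Carlo error count, and an illustrative choice of ε:

```python
import random

def grid_error_prob(n, eps=0.4, trials=1000, seed=0):
    """Monte Carlo sketch: draw x^n from a Bernoulli source whose parameter
    sits on a grid with spacing n^{-0.5(1-eps)}, estimate it by maximum
    likelihood (the empirical frequency), snap the estimate to the nearest
    grid point, and count how often the snapped point misses the truth."""
    rng = random.Random(seed)
    spacing = n ** (-0.5 * (1 - eps))
    theta = round(0.3 / spacing) * spacing   # true parameter, on the grid
    errors = 0
    for _ in range(trials):
        ml = sum(rng.random() < theta for _ in range(n)) / n  # ML estimate
        theta_g = round(ml / spacing) * spacing               # nearest grid point
        errors += abs(theta_g - theta) > 1e-12
    return errors / trials
```

Because the grid spacing shrinks more slowly than the ML estimation error, the snapped estimate identifies the true grid point with probability approaching 1 as n grows.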

SLIDE 9

I.I.D. Sources - Large Alphabet k - Minimax

[Shamir, 2003]

Problems with Large k

  • The volume of Λ_k is 1/(k − 1)! (decreases with k), because Σ_{i=1}^{k−1} θ_i ≤ 1.
  • Too large a grid spacing, n^{−0.5(1−ε)}, results in a loose bound.
  • Too small a spacing, (nk)^{−0.5(1−ε)}, results in lack of distinguishability in the grids.

Solution

  • Build non-uniform grids.
  • Spacing near the point a/n proportional to √a / n^{1−ε/2}.
  • Number of grid points preceding a/n proportional to √a / n^{ε/2}.

Drawback

  • This structure violates the requirements of the strong version, and thus is only good for minimax/maximin redundancies.

SLIDE 10

Minimax/Maximin Redundancy - I.I.D. Large k

  • ϕ is the grid below,
  • θ ∈ ϕ are distinguishable by the above definition (proved as in the finite parametric case),
  • bounding the number of points in the grid results in

R_n^+(Λ_k) = R_n^-(Λ_k) ≥ (1 − ε) · ((k − 1)/(2n)) · log(n/k).

[Figure: non-uniform grid with spacing √a · n^{−(1−ε/2)} near the point a/n.]

SLIDE 11

Most Sources - I.I.D. Large k

Key Realizations

  • The non-uniform grid above is not useful here.
  • All sources outside a (k − 1)-dimensional sphere of radius r = n^{−0.5(1−ε)} around θ are distinguishable from θ by x^n.

Method

  • Pack as many spheres as possible of radius r and volume V_{k−1}(r) into the (k − 1)-dimensional space Λ_k of volume 1/(k − 1)!.
  • Place θ ∈ ϕ at the centers of the spheres (the whole grid is shifted for random selection).
  • Factor in a packing density of 2^{−(k−1)}, reducing the number of points:

M ≥ 1 / [(k − 1)! · V_{k−1}(r) · 2^{(k−1)}].

Result

R_n(L, θ) ≥ (1 − ε) · ((k − 1)/(2n)) · log(n/k)

for every code L(·) and almost every θ ∈ Λ_k. [Shamir, 2003]

Note: the second-order term is lower than that of the minimax/maximin bound.
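The packing count M can be evaluated numerically using the standard formula for the volume of a d-dimensional ball, V_d(r) = π^{d/2} r^d / Γ(d/2 + 1). The following Python sketch works in the log domain (to avoid overflow for large k); it is an illustration under these formulas, not code from the talk:

```python
import math

def sphere_packing_M(n, k, eps):
    """Natural log of the sphere-packing lower bound on M: pack
    (k-1)-dimensional balls of radius r = n^{-0.5(1-eps)} into the simplex
    of volume 1/(k-1)!, with packing density 2^{-(k-1)}.  Uses the ball
    volume V_d(r) = pi^{d/2} r^d / Gamma(d/2 + 1)."""
    d = k - 1
    log_r = -0.5 * (1 - eps) * math.log(n)
    log_ball = (d / 2) * math.log(math.pi) + d * log_r - math.lgamma(d / 2 + 1)
    log_simplex = -math.lgamma(k)   # log(1/(k-1)!)
    return log_simplex - log_ball - d * math.log(2)
```

The leading term of the returned log M is (k − 1)/2 · log n, matching the first-order behavior of the bound above.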

SLIDE 12

Patterns Induced by I.I.D. Sources

Motivation

  • Classical compression considers known small alphabets.
  • Sometimes alphabet is unknown and possibly large.
  • Coding cost of unknown alphabet is inevitable.

Approach

  • Use the inevitable cost to improve compression.
  • Code sequence patterns in a second stage.

Patterns

  • Indices assigned to original sequence letters in order of first occurrence.
  • Example: the strings x^n = ‘lossless’, ‘sellsoll’, ‘12331433’, ‘76887288’ all have the same pattern Ψ(x^n) = ‘12331433’.

  • Individual-sequence redundancy studied in [Aberg et al., 1997; Orlitsky et al., 2002-].
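The pattern map Ψ is straightforward to implement; the sketch below (not from the slides) reproduces the example strings:

```python
def pattern(s):
    """Pattern of a sequence: each symbol is replaced by the rank of its
    first occurrence (1 for the first distinct symbol seen, and so on)."""
    first_seen = {}
    out = []
    for symbol in s:
        if symbol not in first_seen:
            first_seen[symbol] = len(first_seen) + 1
        out.append(str(first_seen[symbol]))
    return "".join(out)
```

All four example strings map to the same pattern, which is exactly why the alphabet identities can be coded separately from the pattern.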

SLIDE 13

I.I.D. Induced Patterns - Derivation

  • Any θ′ which is a permutation of θ appears to be the same source.

Example: typical sequences with similar patterns (k = 3; the two free parameters are shown):

θ = {0.1, 0.2}: x^n = 1223333333, Ψ(x^n) = 1223333333
θ′ = {0.7, 0.2}: x^n = 3221111111, Ψ(x^n) = 1223333333

  • There are at most k! such permutations.

[Figure: the original parameter space and the remaining space after merging permuted sources that induce the same types, shown for k = 2 and k = 3.]

Note: for k = 3 this is true for any combination of 2 out of 3 letters.

SLIDE 14

Pattern Redundancy Bounds

  • The grid (in both the maximin and most-sources cases) yields a reduced number of points:

M_Ψ ≥ M_i.i.d. / k!

  • For k ≥ n^{1/3}, too many permutations are eliminated more than once, but the worst-case smaller k can be assumed.
  • More sequences contribute to a correct decision in the grid, which allows distinguishability.

Bounds [Shamir, 2003]

  • Average minimax lower bound:

R_n^+[Ψ(Λ_k)] ≥ ((k−1)/(2n)) · log(n^{1−ε}/k^3) + ((k−1)/(2n)) · log(πe^3/2) − O((log k)/n),   for k ≤ (πn^{1−ε}/2)^{1/3},

R_n^+[Ψ(Λ_k)] ≥ (π/2)^{1/3} · (1.5 log e) · n^{−(2+ε)/3} − O((log n)/n),   for k > (πn^{1−ε}/2)^{1/3}.

  • Average most-sources lower bound:

R_n[L, Ψ(θ)] ≥ ((k−1)/(2n)) · log(n^{1−ε}/k^3) − ((k−1)/(2n)) · log(8π/e^3) − O((log k)/n),   for k ≤ (1/2) · (n^{1−ε}/π)^{1/3},

R_n[L, Ψ(θ)] ≥ ((1.5 log e)/(2π^{1/3})) · n^{−(2+ε)/3} − O((log n)/n),   for k > (1/2) · (n^{1−ε}/π)^{1/3}.
SLIDE 15

Piecewise Stationary Sources - PSS’s

Definition of a PSS: ψ = (θ, t) ∈ Λ_q ⊂ Λ

  • PSS - emits data divided into independent stationary segments separated by abrupt changes in statistics.
  • Λ - the nth-order class of PSS's (contains all possible combinations of the k-dimensional parameters for n-sequences).
  • Λ_q - all PSS's in Λ with q segments.
  • θ = {θ_1, θ_2, . . . , θ_q} - segmental parameters.
  • t = {t_1, t_2, . . . , t_{q−1}} - transition path (TP).

Redundancy bound [Shamir, 2000]

R_n(L, ψ) ≥ (1 − ε) · [kq/2 + q − 1] · log(n/q)/n

for every L(·) and almost every ψ ∈ Λ_q, for every q, in the minimax/maximin senses.

SLIDE 16

Bound Derivation - PSS’s

Finite Number of Segments q

  1. Λ_ε contains all ψ for which all segments are long (longer than n^{1−ε/2}) and all transitions are large.
  2. Λ_ε is most of the class for fixed q.
  3. Partition Λ_ε into sets as follows:
     • Parse the n-tuple into phrases of length l = n^{1−ε}.
     • For all ψ ∈ ϕ and ∀i, t_i is a point in the same phrase in a grid with spacing l^ε.
     • θ_i is a point in a grid as defined for stationary sources.
     • ∀ψ ∈ ϕ, the t_i and θ_i must come from grids with identical initial shifts.
  4. Distinguish among ψ ∈ ϕ as follows:
     • Use phrases entirely inside segments to estimate the θ_i.
     • Given θ̂, estimate the transitions from the respective grids.

By the definition of the grids, the bound for finite q [Merhav, 1993] results.

SLIDE 17

[Figure: a PSS with q = 3. Each of θ_1, θ_2, θ_3 is one grid point (grids with spacing n^{−0.5(1−ε)}), and each of t_1, t_2 is one grid point inside a phrase. For ψ ∈ ϕ_1, the parameters θ_1, θ_2, θ_3, t_1, t_2 are only red points; for ψ ∈ ϕ_2, only blue points; for ψ ∈ ϕ_3, only green points. The set ϕ_i contains all combinations with one point from each of the five grids.]
SLIDE 18

General Bound Derivation - PSS’s

Large q

  1. The Λ_ε defined above is no longer most of the class.
  2. For very large q, the probability of error in at least one of the source parameters significantly increases the overall error probability.

Solutions to Asymptotic Problems

  1. Λ_ε contains sources for which most segments are long and most transitions are large.
  2. Reduce the sets ϕ to improve distinguishability for very large q.

Two Different Cases

  • q ≪ n/q - almost the same as the fixed-q case (modified according to solution 1 above).
  • q ≫ n/q - requires additional algebraic coding techniques for distinguishability.

SLIDE 19

General Bound Derivation - PSS’s, Cont.

Second Case: q ≫ n/q

  • Too many parameters.
  • Error in estimating one results in error in estimating ψ.

Solution - Reduce ϕ by Linear Block Codes:

  • Let η > 0 be arbitrarily small,
  • q′ - the number of ‘free’ segmental parameters,
  • c′ - the number of ‘free’ transition times.
  • (1 − η)q′ segmental parameters and (1 − η)c′ transitions are chosen from the grids.
  • The remaining parameters are parity checks.
  • The grids’ resolutions are chosen to yield Galois fields.
  • Each grid point is assigned an element of the proper Galois field.
  • The codes are designed to correct up to αηq′ errors (such codes exist by the Gilbert-Varshamov bound).

This guarantees distinguishability even for q ≫ n/q, resulting in the same asymptotic bound (with a larger ε).
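The parity-check idea can be illustrated at toy scale. This is a simplified sketch, not the construction in the talk: a single parity-check symbol over GF(5) stands in for a Gilbert-Varshamov code. Constraining one coordinate to be a parity check shrinks the candidate set by a factor of the field size while forcing any two candidates to differ in at least two coordinates:

```python
from itertools import product

def parity_constrained_set(p=5, free=2):
    """Toy sketch of reducing a candidate set with parity checks: 'free'
    coordinates over GF(p) are chosen freely and one extra coordinate is
    their sum mod p.  The set shrinks from p^(free+1) to p^free, and any
    two members differ in at least two coordinates (minimum distance 2)."""
    words = []
    for free_syms in product(range(p), repeat=free):
        parity = sum(free_syms) % p   # single parity-check symbol
        words.append(free_syms + (parity,))
    return words
```

In the actual derivation the same trade appears at scale: fewer candidates in each ϕ, but enough code distance to keep all of them distinguishable.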

SLIDE 20

Additional Source Classes

PSS’s with Slowly Linearly Varying Statistics [Shamir, 2001]

  • q segments, transition duration (n/q)^α:

R_n(L, ψ) ≥ (1 − ε) · [kq/2 + (q − 1)(1 − α/2)] · log(n/q)/n.

  • If the durations are unknown,

R_n(L, ψ) ≥ (1 − ε) · [kq/2 + q − 1] · log(n/q)/n.

A hierarchical version of redundancy-capacity for the compound class must be used. The cost above PSS's is insignificant.

Switching Sources - s states [Shamir, 2001]

If s ≤ (n/q)^{0.5k(1−ε)}, then for every code L(·) and almost all sources

R_n(L, ψ) ≥ [(1 − ε)/n] · [(ks/2) · log(n/s) + (q − 1) · log(n/q) + (q − s) · log s].

  • Otherwise,

R_n(L, ψ) ≥ [(1 − ε)/n] · [kq/2 + q − 1] · log(n/q).
SLIDE 21

Summary and Conclusions

  1. The redundancy-capacity theorem is very useful for deriving lower bounds on
     • minimax/maximin redundancy in universal coding,
     • redundancy for most sources in universal coding.
  2. Lower bounds on redundancy in both cases were obtained for
     • parametric sources with a finite number of parameters,
     • i.i.d. sources with large alphabets,
     • patterns induced by i.i.d. sources,
     • piecewise stationary sources,
     • piecewise stationary sources with slowly varying statistics,
     • switching sources.
  3. Different techniques from coding theory were used:
     • random coding,
     • sphere packing,
     • algebraic code distance bounds.