Applications of Random Coding and Algebraic Coding Theories to Universal Lossless Source Coding Performance Bounds

Gil I. Shamir
Department of Electrical & Computer Engineering
Overview
Research Problem
- Average Case Universal Lossless Compression
- Performance lower bounds on redundancy (the best possible performance of any
scheme for a specific model)
Research Approach
- Use Redundancy-Capacity Theorems to obtain bounds
- Lower bound the relevant capacity for given source model
Models Discussed
- parametric sources with a finite number of parameters
- i.i.d. sources with large alphabets
- patterns induced by i.i.d. sources
- piecewise stationary sources
- piecewise stationary sources with slowly varying statistics
- switching sources
Universal Coding and Redundancy
Problem Layout
- A sequence xⁿ of length n, governed by Pθ,
- θ unknown, in a known class Λ,
- a uniquely decipherable code L(·) that may depend on Λ but is independent of θ,
- unknown parameters cost redundancy.
Average Redundancy
- For a code L(·) for n-sequences drawn by source θ:

Rn(L, θ) ≜ (1/n) Eθ[L(Xⁿ)] − Hθ(Xⁿ)

- Eθ - mean w.r.t. θ,
- Hθ - per-symbol entropy.
Average Universality Measure of a Class Λ
- Maximin Rn⁻(Λ) and minimax Rn⁺(Λ) average redundancies - the best code for the
worst average (over xⁿ) case. [Davisson, 1973]
- Average redundancy for most sources [Rissanen, 1984] (strongest sense).
Redundancy-Capacity Theorem
Weak Version [Implied by Davisson, 1973; Gallager, 1976]
Let n → ∞. Let ϕ be a set of M points θ in the class Λk that are distinguishable by xⁿ. Then the minimax and maximin redundancies satisfy

Rn⁺(Λk) = Rn⁻(Λk) ≥ (1 − ε) (log M)/n
Strong Random Coding Version [Merhav & Feder, 1995, 1996]
Let n → ∞. Define a distribution over Λk, and partition most of the class, Λε, into disjoint countable sets ϕ, where the marginal of each θ ∈ ϕ is equal, and there are Mϕ ≥ M sources in ϕ, distinguishable by xⁿ. Then

Rn(L, θ) ≥ (1 − ε) (log M)/n

for every code L(·) and almost every θ ∈ Λk.
Distinguishability
θ and θ′ are distinguishable if xⁿ generated by θ appears to have been generated by θ′ with probability that goes to 0, and vice versa.
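The distinguishability condition can be illustrated numerically (my sketch, not from the slides): for two Bernoulli sources, a Stein's-lemma-style bound says the probability that xⁿ drawn from θ is attributed to θ′ decays roughly as 2^(−nD(θ‖θ′)), so it vanishes whenever the KL divergence is bounded away from zero.

```python
import math

def kl_bernoulli(p: float, q: float) -> float:
    """KL divergence D(p || q) in bits between Bernoulli(p) and Bernoulli(q)."""
    return p * math.log2(p / q) + (1 - p) * math.log2((1 - p) / (1 - q))

# Upper bound on the probability that x^n from theta is (mis)attributed
# to theta': roughly 2^(-n * D), which vanishes as n grows.
theta, theta_prime = 0.3, 0.5
d = kl_bernoulli(theta, theta_prime)
for n in (10, 100, 1000):
    print(n, 2.0 ** (-n * d))   # decays toward 0 as n grows
```

The specific parameter values 0.3 and 0.5 are illustrative only.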
Use of Redundancy-Capacity Theorem
Weak Version for Λk
- 1. Demonstrate how to find ϕ.
- 2. Lower bound M.
- 3. Prove that all θ ∈ ϕ are distinguishable by xⁿ.
Strong Version for Λk
- 1. Demonstrate how to define most of the class Λε.
- 2. Show that Λε is most of the class.
- 3. Show how to partition Λε such that every source in Λε is in exactly one ϕ,
and sources in ϕ are uniformly distributed with the uniform prior on Λk. Lower bound M.
- 4. Prove that for every valid ϕ, all θ ∈ ϕ are distinguishable by xⁿ.
Compound Classes
If Λ = ∪k Λk, the redundancy for θ ∈ Λk consists of intra-class redundancy within Λk, and
inter-class redundancy distinguishing Λk from Λ.
Redundancy Capacity - Demo
[Figure: class Λk, with Λε covering most of it; Λε is partitioned into sets ϕ1, ϕ2, ϕ3 with Mϕ = 13, 10, and 12 points, respectively; M = 10.]
- The volume of Λk outside Λε assumed negligible.
- Any θ is contained in a unique ϕ and is equiprobable with the other θ′ ∈ ϕ.
- In every ϕ, all points are distinguishable by xⁿ.
By the theorem, for every code and almost every θ ∈ Λk, Rn(L, θ) ≥ (1 − ε) (log 10)/n.
Finite k-dimensional Parametric Sources
- ϕ determined by initial shift u in a grid (one ϕ sufficient for maximin)
- θ ∈ ϕ distinguishable if ϕ is a grid with spacing n^(−0.5(1−ε))

Rn(L, θ) ≥ (1 − ε) (k/2) · (log n)/n

for every code L(·) and almost every θ ∈ Λ. [Rissanen, 1984]

[Figure: uniform grids with spacing n^(−0.5(1−ε)); initial shifts u1, u2, u3 define the sets ϕ1, ϕ2, ϕ3.]
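As a sanity check (my sketch, not part of the slides): a uniform grid with spacing n^(−0.5(1−ε)) in each of k coordinates of the unit cube has about n^(0.5k(1−ε)) points, and plugging this M into the theorem gives a rate of the same order as the (1 − ε)(k/2)(log n)/n bound.

```python
import math

def grid_bound(n: int, k: int, eps: float):
    """Bound (1-eps) * log2(M) / n from a uniform grid over [0,1]^k."""
    spacing = n ** (-0.5 * (1 - eps))
    points_per_axis = int(1.0 / spacing)       # grid points per coordinate
    m = points_per_axis ** k                   # M: distinguishable sources
    bound = (1 - eps) * math.log2(m) / n
    rissanen = (1 - eps) * (k / 2) * math.log2(n) / n   # target rate
    return bound, rissanen

b, r = grid_bound(n=10**6, k=3, eps=0.1)
print(b, r)   # the grid bound is of the same order as (k/2) log n / n
```

The values n = 10⁶, k = 3, ε = 0.1 are illustrative; at finite n the grid bound sits slightly below the asymptotic rate because of the extra (1 − ε) factors.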
Distinguishability
Setting and Proof in the most-sources sense
- Choose a random grid ϕ (as in random coding).
- Generate xⁿ by a given θ ∈ ϕ.
- Let θ̂ be the maximum-likelihood estimator of θ from xⁿ.
- Let θ̂g be the grid point whose components are nearest θ̂.
- Prove that Pe = Pr{θ̂g ≠ θ | θ} → 0 as n → ∞.

Use the union bound on the components of θ:

Pe ≤ Σᵢ₌₁ᵏ Pr{θ̂gi ≠ θi} ≤ Σᵢ₌₁ᵏ n · 2^(−n · min_{xⁿ∈Ai} D(Pθ̂i ‖ Pθi)) ≤ 2^((log k)+(log n)−c·n^(ε/2)) → 0.

- Ai - the event that θ̂gi ≠ θi.
- D(Pθ̂i ‖ Pθi) ≥ c/n^(1−ε) for θ̂ ∈ Ai, where c is a constant.
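The snap-to-grid step above can be sketched as follows (an illustration of mine, with the grid spacing from the slide): when the ML estimate lands within half a grid spacing of the true θ, the nearest grid point is exactly θ, so a decoding error requires a deviation of at least half the spacing.

```python
def nearest_grid_point(x: float, spacing: float, shift: float = 0.0) -> float:
    """Snap an estimate to the nearest point of a grid with given spacing and shift."""
    return shift + round((x - shift) / spacing) * spacing

n, eps = 10**6, 0.1
spacing = n ** (-0.5 * (1 - eps))    # n^(-0.5(1-eps)), as on the slide

theta = 100 * spacing                # a true grid point
theta_hat = theta + 0.4 * spacing    # ML estimate off by less than spacing/2
print(nearest_grid_point(theta_hat, spacing) == theta)   # True: decoded correctly
```
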
I.I.D. Sources - Large Alphabet k - Minimax
[Shamir, 2003]
Problems with Large k
- Volume of Λk is 1/(k − 1)! (decreases rapidly with k), because Σᵢ₌₁^(k−1) θi ≤ 1.
- Too large a spacing in the grid, n^(−0.5(1−ε)), results in a loose bound.
- Too small a spacing, (nk)^(−0.5(1−ε)), results in lack of distinguishability in the grids.
Solution
- Build non-uniform grids.
- Spacing near a/n proportional to √a · n^(−(1−ε/2)).
- Number of grid points preceding a/n proportional to √a · n^(−ε/2).
Drawback
- This structure violates the requirements of the strong version, and thus is only
good for minimax/maximin redundancies.
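The non-uniform grid can be sketched as follows (my reconstruction: each point is obtained from the previous one by adding the local spacing, with constants chosen arbitrarily). The spacing grows like √a, so the grid is dense near 0 and sparse near 1.

```python
import math

def nonuniform_grid(n: int, eps: float):
    """Grid over (0, 1] whose spacing near theta = a/n grows like sqrt(a)."""
    points, a = [], 1.0
    while a / n <= 1.0:
        points.append(a / n)
        a += math.sqrt(a) * n ** (eps / 2)   # step in 'a' units: sqrt(a) * n^(eps/2)
        # (equivalently, spacing in theta units is sqrt(a) * n^(-(1-eps/2)))
    return points

grid = nonuniform_grid(n=10**6, eps=0.1)
print(len(grid))                                 # on the order of n^((1-eps)/2)
print(grid[1] - grid[0] < grid[-1] - grid[-2])   # True: dense near 0, sparse near 1
```
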
Minimax/Maximin Redundancy - I.I.D. Large k
- ϕ is grid below,
- θ ∈ ϕ distinguishable by the above definition (proved as in the finite parametric case),
- bounding number of points in grid results in
Rn⁺(Λk) = Rn⁻(Λk) ≥ (1 − ε) ((k − 1)/(2n)) · log(n/k)

[Figure: non-uniform grid over the parameter simplex; spacing near the point a/n is a^0.5 · n^(−(1−ε/2)).]
Most Sources - I.I.D. Large k
Key Realizations
- Non-uniform grid above is not useful here.
- All sources outside a (k − 1)-dimensional sphere with radius r = n^(−0.5(1−ε))
around θ are distinguishable from θ by xⁿ.
Method
- Pack as many spheres as possible, each with radius r and volume V(k−1)(r), into the
(k − 1)-dimensional space Λk of volume 1/(k − 1)!.
- Place θ ∈ ϕ at centers of the spheres (whole grid shifted for random selection).
- Factor in the packing density 2^(−(k−1)) to reduce the number of points.
M ≥ 1 / ((k − 1)! · V(k−1)(r) · 2^(k−1)).
Result
Rn(L, θ) ≥ (1 − ε) ((k − 1)/(2n)) · log(n/k) for every code L(·) and almost every θ ∈ Λk. [Shamir, 2003]
Note: the second-order term is lower than that of the minimax/maximin bound.
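The sphere-packing count can be evaluated numerically (a sketch of mine with illustrative values of n, k, ε; the Γ-function formula for the volume of a d-dimensional ball is standard):

```python
import math

def packing_count(n: int, k: int, eps: float) -> float:
    """M >= 1 / ((k-1)! * V_{k-1}(r) * 2^(k-1)), with r = n^(-0.5(1-eps))."""
    r = n ** (-0.5 * (1 - eps))
    d = k - 1                                                  # simplex dimension
    v = math.pi ** (d / 2) * r ** d / math.gamma(d / 2 + 1)    # volume of a d-ball
    return 1.0 / (math.factorial(d) * v * 2 ** d)

m = packing_count(n=10**6, k=10, eps=0.1)
print(m)   # number of packed (hence distinguishable) sources
```

At these finite values the count is already astronomically large, which is what drives the (k − 1)/(2n) · log(n/k) rate asymptotically.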
Patterns Induced by I.I.D. Sources
Motivation
- Classical compression considers known small alphabets.
- Sometimes alphabet is unknown and possibly large.
- Coding cost of unknown alphabet is inevitable.
Approach
- Use the inevitable cost to improve compression.
- Code sequence patterns in a second stage.
Patterns
- Indices assigned to original sequence letters in order of first occurrence.
- Example: The strings: xn = ‘lossless’, ‘sellsoll’, ‘12331433’, ‘76887288’ all
have the same pattern Ψ (xn) = ‘12331433’.
- Individual sequence redundancy studied in [Aberg, et al., 1997, Orlitsky et al.,
2002-].
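The pattern map Ψ can be written directly from its definition (indices assigned in order of first occurrence); this small helper is mine, not from the slides:

```python
def pattern(x) -> str:
    """Return the pattern of a sequence: index symbols by order of first occurrence."""
    index = {}
    out = []
    for symbol in x:
        if symbol not in index:
            index[symbol] = str(len(index) + 1)   # next unused index
        out.append(index[symbol])
    return ''.join(out)

# All four strings from the slide share the same pattern.
for s in ('lossless', 'sellsoll', '12331433', '76887288'):
    print(s, pattern(s))   # every line ends with 12331433
```
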
I.I.D. Induced Patterns - Derivation
- Any θ′ which is a permutation of θ appears to be the same source.
Example: typical sequences - similar patterns
θ = {0.1, 0.2}: xⁿ = 1223333333, Ψ(xⁿ) = 1223333333
θ′ = {0.7, 0.2}: xⁿ = 3221111111, Ψ(xⁿ) = 1223333333
- There are at most k! such permutations.
[Figure: original parameter space vs. the remaining space after merging permuted sources, for k = 2 and k = 3; parameters of the same type collapse to one representative.]
Note: for k = 3 this is true for any combination of 2 out of 3 letters.
Pattern Redundancy Bounds
- The grid (in both the maximin and most-sources cases) reduces to MΨ ≥ M_i.i.d./k!
- For k ≥ n^(1/3), too many permutations are eliminated more than once, but the worst
smaller k can be assumed.
- More sequences contribute to the correct decision in the grid, allowing
distinguishability.
Bounds [Shamir, 2003]
- Average minimax lower bound:

Rn⁺[Ψ(Λk)] ≥
((k−1)/(2n)) log(n^(1−ε)/k³) + ((k−1)/(2n)) log(πe³/2) − O((log k)/n), for k ≤ (πn^(1−ε)/2)^(1/3),
(π/2)^(1/3) · (1.5 log e) · n^(−(2+ε)/3) − O((log n)/n), for k > (πn^(1−ε)/2)^(1/3).

- Average most-sources lower bound:

Rn[L, Ψ(θ)] ≥
((k−1)/(2n)) log(n^(1−ε)/k³) − ((k−1)/(2n)) log(8πe³) − O((log k)/n), for k ≤ (1/2)·(n^(1−ε)/π)^(1/3),
((1.5 log e)/(2π^(1/3))) · n^(−(2+ε)/3) − O((log n)/n), for k > (1/2)·(n^(1−ε)/π)^(1/3).
Piecewise Stationary Sources - PSS’s
Definition of PSS: ψ ≜ (θ, t) ∈ Λq ⊂ Λ
- PSS - emits data divided into independent stationary segments separated by
abrupt changes in statistics
- Λ - nth-order class of PSS's (contains all possible combinations of the
k-dimensional parameters for n-sequences)
- Λq - all PSS's in Λ with q segments
- θ ≜ {θ1, θ2, . . . , θq} - segmental parameters
- t ≜ {t1, t2, . . . , tq−1} - transition path (TP)
Redundancy bound [Shamir, 2000]
Rn(L, ψ) ≥ (1 − ε) (kq/2 + q − 1) · log(n/q)/n

for every L(·), for almost every ψ ∈ Λq, for every q, in the minimax/maximin senses.
Bound Derivation - PSS’s
Finite Number of Segments q
- 1. Λε contains all ψ for which all segments are long (longer than n^(1−ε/2)) and all
transitions are large.
- 2. Λε is most of the class for fixed q.
- 3. Partition Λε into sets as follows:
- Parse the n-tuple into phrases of length l = n^(1−ε).
- For all ψ ∈ ϕ, ∀i, ti is a point in the same phrase, on a grid with spacing l^ε.
- θi is a point in a grid as defined for stationary sources.
- ∀ψ ∈ ϕ, ti and θi must be from grids with identical initial shifts.
- 4. Distinguish among ψ ∈ ϕ as follows:
- Use phrases entirely inside segments to estimate θi.
- Given θ̂, estimate the transitions from the respective grids.
By definition of the grids, the bound for finite q [Merhav, 1993] results.
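Step 3 of the partition can be sketched as follows (my toy parameters; the phrase length l = n^(1−ε) and the transition-grid spacing l^ε follow the slide):

```python
def transition_grid(n: int, eps: float, shift: int = 0):
    """Phrase boundaries and candidate transition points inside each phrase."""
    l = int(n ** (1 - eps))          # phrase length l = n^(1-eps)
    spacing = max(1, int(l ** eps))  # grid spacing l^eps inside a phrase
    phrases = [(start, min(start + l, n)) for start in range(0, n, l)]
    # candidate transition instants: one grid per phrase, common initial shift
    grids = [list(range(start + shift, end, spacing)) for start, end in phrases]
    return phrases, grids

phrases, grids = transition_grid(n=10_000, eps=0.5)
print(len(phrases), len(grids[0]))   # 100 phrases of length 100, 10 candidates each
```

Here n = 10 000 and ε = 0.5 are chosen only to keep the numbers small and readable.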
[Figure: PSS with q = 3. Each ψ ∈ ϕi combines θ1, θ2, θ3 and t1, t2, one point from each of the five grids; ϕ1, ϕ2, ϕ3 use the red, blue, and green grid points, respectively. The θ grids have spacing n^(−0.5(1−ε)); each transition grid lies within a single phrase.]
General Bound Derivation - PSS’s
Large q
- 1. Λε as defined above is not most of the class.
- 2. For very large q, probability of error in at least one of the source parameters
significantly increases the overall error probability.
Solutions to Asymptotic Problems
- 1. Λε contains sources for which most segments are long and most transitions
are large.
- 2. Reduce sets ϕ to improve distinguishability for very large q.
Two Different Cases
- q ≪ n/q - almost similar to fixed q (modified according to modification 1
above).
- q ≫ n/q - requires additional algebraic coding techniques for
distinguishability.
General Bound Derivation - PSS’s, Cont.
Second Case: q ≫ n/q
- Too many parameters.
- Error in estimating one results in error in estimating ψ.
Solution - Reduce ϕ by Linear Block Codes:
- Let η > 0 be arbitrarily small,
- q′ - number of ‘free’ segmental parameters,
- c′ - number of ‘free’ transition times.
- (1 − η) q′ segmental parameters and (1 − η) c′ transitions chosen from grids.
- Remaining parameters are parity checks.
- Grids’ resolutions chosen to yield Galois Fields.
- Each grid point is assigned an element in the proper Galois Field.
- Codes designed to correct up to αηq′ errors (such codes exist by the Gilbert-Varshamov bound).
Guarantees distinguishability even for q ≫ n/q, resulting in the same asymptotic bound (ε is now larger).
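A minimal sketch of the parity-check idea (a toy single-parity code over the prime field GF(p), not the Gilbert-Varshamov codes the slide invokes): grid indices become field elements, the last symbol is a parity check, and a single estimation error is detected because the parity no longer holds.

```python
P = 101   # prime field size; in the construction the grid resolution fixes the field

def encode(free_symbols):
    """Append one parity symbol so the codeword sums to 0 mod P."""
    parity = (-sum(free_symbols)) % P
    return free_symbols + [parity]

def parity_ok(codeword) -> bool:
    return sum(codeword) % P == 0

word = encode([17, 42, 5, 99])       # 'free' grid indices + parity check
print(parity_ok(word))               # True: valid codeword
word[2] = (word[2] + 1) % P          # a single estimation error...
print(parity_ok(word))               # False: ...is detected
```

Real codes with minimum distance above 2αηq′ correct (not merely detect) the errors, which is what the distinguishability argument needs.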
Additional Source Classes
PSS’s with Slowly Linearly Varying Statistics [Shamir, 2001]
- q segments, transition duration of (n/q)α:
Rn (L, ψ) ≥ (1 − ε)
kq
2 + (q − 1)
- 1 − α
2
log(n/q)
n
- If durations unknown,
Rn (L, ψ) ≥ (1 − ε)
1
2kq + q − 1
log(n/q)
n Hierarchical version of redundancy-capacity for compound class must be used. Insignificant cost above PSS’s.
Switching Sources - s states [Shamir, 2001]
- If s ≤ (n/q)^(0.5k(1−ε)), then for every code L(·) and almost all sources

Rn(L, ψ) ≥ ((1 − ε)/n) · ((ks/2) log(n/s) + (q − 1) log(n/q) + (q − s) log s)

- Otherwise,

Rn(L, ψ) ≥ ((1 − ε)/n) · (kq/2 + q − 1) · log(n/q)
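For concreteness, the two switching-source expressions can be evaluated at illustrative values (my choice of n, k, q, s, ε; logs taken base 2):

```python
import math

def switching_bound(n, k, q, s, eps):
    """(1-eps)/n * [ks/2 log(n/s) + (q-1) log(n/q) + (q-s) log s], log base 2."""
    return (1 - eps) / n * (
        k * s / 2 * math.log2(n / s)
        + (q - 1) * math.log2(n / q)
        + (q - s) * math.log2(s)
    )

def pss_bound(n, k, q, eps):
    """(1-eps)/n * (kq/2 + q - 1) * log2(n/q) - the fallback expression."""
    return (1 - eps) / n * (k * q / 2 + q - 1) * math.log2(n / q)

print(switching_bound(n=10**5, k=2, q=50, s=5, eps=0.1))
print(pss_bound(n=10**5, k=2, q=50, eps=0.1))
```

With few states shared among many segments, the switching bound sits below the PSS expression, reflecting the smaller number of free segmental parameters.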
Summary and Conclusions
- 1. The redundancy-capacity theorem is very useful to derive lower bounds on
- minimax/maximin redundancy in universal coding,
- redundancy for most sources in universal coding.
- 2. Lower bounds on redundancy in both cases were obtained for
- parametric sources with a finite number of parameters,
- i.i.d. sources with large alphabets,
- patterns induced by i.i.d. sources,
- piecewise stationary sources,
- piecewise stationary sources with slowly varying statistics,
- switching sources.
- 3. Different techniques from coding theory were used:
- random coding,
- sphere packing,
- algebraic code distance bounds.