Compression: Prefix Codes
Greg Plaxton
Theory in Programming Practice, Spring 2004
Department of Computer Science, University of Texas at Austin
(Binary, Static) Code
- Maps each symbol a of a given finite alphabet A to a codeword w(a) in {0, 1}∗ (i.e., a binary codeword)
– The mapping is static, i.e., a is always encoded as w(a), regardless of the surrounding context
– So the mapping determines the encoder
– But decoding can be problematic (why?)
Theory in Programming Practice, Plaxton, Spring 2004
Uniquely Decodable Code
- A code is uniquely decodable if the associated encoder maps distinct input strings to distinct encoded strings
– Necessary and sufficient for lossless decoding
– Example of a code that is uniquely decodable? One that is not?
- Let ℓ(a) denote the length of w(a)
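To make the examples concrete, here is a small brute-force check (an illustrative sketch, not part of the lecture; the function name and the two example codes are made up). It searches for two distinct symbol strings with the same encoding; finding a collision proves a code is not uniquely decodable, while finding none up to max_len is evidence, not a proof.

```python
from itertools import product

def has_collision(code, max_len=6):
    # Search all symbol strings of length 1..max_len for two distinct
    # messages that encode to the same bit string.  A collision proves
    # the code is NOT uniquely decodable; none found is merely evidence.
    seen = {}
    for n in range(1, max_len + 1):
        for msg in product(code, repeat=n):
            enc = "".join(code[s] for s in msg)
            if enc in seen and seen[enc] != msg:
                return True
            seen[enc] = msg
    return False

ambiguous = {"a": "0", "b": "1", "c": "01"}      # "01" decodes as c or ab
suffix_code = {"a": "0", "b": "01", "c": "011"}  # reversed codewords are prefix-free
print(has_collision(ambiguous))    # True
print(has_collision(suffix_code))  # False
```

The second code is uniquely decodable because its reversed codewords form a prefix code (decode right to left), even though it is not itself a prefix code.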
Optimal Code
- Suppose we are given a frequency f(a) for each symbol a in A
– Let p(a) denote f(a) / Σ_{b∈A} f(b)
– Note that p(a) may be viewed as a probability
- We define the weight of a code as Σ_{a∈A} p(a) · ℓ(a)
- A code is optimal (for a given alphabet and associated probability distribution) if it has minimum weight over all uniquely decodable codes
– Remark: Keep in mind that we are only talking about optimality with respect to the set of binary static codes; we will revisit this issue later
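As a quick sketch (the function name and the example are mine, not from the slides), the weight can be computed directly from raw frequencies:

```python
def weight(freqs, lengths):
    # weight = Σ_{a∈A} p(a)·ℓ(a), where p(a) = f(a) / Σ_{b∈A} f(b)
    total = sum(freqs.values())
    return sum(f / total * lengths[a] for a, f in freqs.items())

# three-symbol alphabet with frequencies 2, 1, 1 and the prefix
# code a→0, b→10, c→11 (codeword lengths 1, 2, 2)
print(weight({"a": 2, "b": 1, "c": 1}, {"a": 1, "b": 2, "c": 2}))  # 1.5
```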
An Entropy-Based Lower Bound on Code Weight
- Let H denote the entropy of the probability distribution associated with alphabet A, i.e., H = −Σ_{a∈A} p(a) log p(a)
- Theorem: The weight of any uniquely decodable code for A is at least H
- Hint: Use the two inequalities given on the next slide and the fact that the logarithm function is concave over the positive reals
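A numerical illustration of the bound (a sketch with a made-up dyadic distribution, for which the bound happens to be tight):

```python
from math import log2

def entropy(p):
    # H = −Σ_{a∈A} p(a)·log p(a), base-2 log; a term with p(a) = 0
    # contributes 0
    return -sum(q * log2(q) for q in p.values() if q > 0)

p = {"a": 0.5, "b": 0.25, "c": 0.25}
lengths = {"a": 1, "b": 2, "c": 2}      # prefix code 0, 10, 11
w = sum(p[a] * lengths[a] for a in p)   # weight of the code
print(entropy(p), w)  # both 1.5: the weight meets the lower bound H
```

Equality holds here because every p(a) is a power of 1/2; for general distributions the weight is strictly above H.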
Two Inequalities
- McMillan: Any uniquely decodable code satisfies Σ_{a∈A} 2^(−ℓ(a)) ≤ 1
- Jensen: If λ1, . . . , λn are nonnegative reals summing to 1 and f is a concave function over an interval containing the reals x1, . . . , xn, then Σ_i λi · f(xi) ≤ f(Σ_i λi · xi)
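A quick numerical sanity check of Jensen's inequality with the concave function log2 (the weights and points below are arbitrary choices for illustration):

```python
from math import log2

lam = [0.5, 0.3, 0.2]    # nonnegative reals summing to 1
x = [1.0, 4.0, 8.0]      # points in the positive reals
lhs = sum(l * log2(xi) for l, xi in zip(lam, x))   # Σ_i λi·f(xi)
rhs = log2(sum(l * xi for l, xi in zip(lam, x)))   # f(Σ_i λi·xi)
print(lhs <= rhs)  # True, since log2 is concave
```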
Prefix Code
- A prefix code is a code in which no codeword is the prefix of another
– Uniquely decodable
– Easy to decode
- Exercise: Give an example of a code that is uniquely decodable but is not a prefix code
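Checking prefix-freeness is a one-liner (a sketch; the function name is mine). Note that it rejects the suffix code {0, 01, 011}, whose reversed codewords are prefix-free and which is therefore uniquely decodable but not prefix, answering the exercise:

```python
def is_prefix_code(words):
    # True iff no codeword is a proper prefix of another codeword
    return not any(u != v and v.startswith(u) for u in words for v in words)

print(is_prefix_code({"0", "10", "11"}))   # True
print(is_prefix_code({"0", "01", "011"}))  # False: 0 is a prefix of 01
```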
Kraft-McMillan Inequality
- Kraft: For any sequence of integers ℓ1, . . . , ℓ|A| such that Σ_{1≤i≤|A|} 2^(−ℓi) ≤ 1, there is a prefix code for A with codeword lengths ℓ1, . . . , ℓ|A|
- Since every uniquely decodable code satisfies McMillan’s inequality, we can restrict our attention to prefix codes in searching for an optimal code
- McMillan’s inequality and the above result are often stated together (in two parts) and referred to as the Kraft-McMillan inequality
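The Kraft direction is constructive. One standard construction (a sketch of the canonical-code assignment; not necessarily the proof given in lecture) sorts the lengths and assigns codewords from a running counter, padding with zeros whenever the length grows:

```python
def prefix_code_from_lengths(lengths):
    # Given lengths with Σ 2^(−ℓi) ≤ 1, return bit strings of those
    # lengths forming a prefix code.  The Kraft inequality is exactly
    # what guarantees the counter never outgrows its bit width.
    assert sum(2.0 ** -l for l in lengths) <= 1, "Kraft inequality violated"
    order = sorted(range(len(lengths)), key=lambda i: lengths[i])
    code = [None] * len(lengths)
    c, prev = 0, lengths[order[0]]
    for i in order:
        c <<= lengths[i] - prev        # pad with zeros to the new length
        code[i] = format(c, "0{}b".format(lengths[i]))
        c, prev = c + 1, lengths[i]
    return code

print(prefix_code_from_lengths([3, 3, 2, 1]))  # ['110', '111', '10', '0']
```

Each new codeword is the previous counter value plus one, shifted left; incrementing before shifting ensures no earlier codeword is a prefix of a later one.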
An Entropy-Based Upper Bound on the Weight of an Optimal Code
- Theorem: There is an optimal (prefix) code for A with weight less than H + 1
- Hint: First use the Kraft-McMillan inequality to establish the existence of a prefix code for A where ℓ(a) = ⌈log(1/p(a))⌉ for all a in A
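Numerically, the choice ℓ(a) = ⌈log(1/p(a))⌉ satisfies Kraft, since 2^(−ℓ(a)) ≤ p(a), and its weight lands in [H, H + 1). A sketch with an arbitrary example distribution:

```python
from math import ceil, log2

p = {"a": 0.5, "b": 0.3, "c": 0.2}                   # example distribution
lengths = {a: ceil(log2(1 / q)) for a, q in p.items()}
# Kraft holds: Σ 2^(−ℓ(a)) ≤ Σ p(a) = 1, so a prefix code with these
# lengths exists by the previous slide
H = -sum(q * log2(q) for q in p.values())            # entropy
w = sum(p[a] * lengths[a] for a in p)                # weight of the code
print(H <= w < H + 1)  # True
```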
Summary and Discussion of Entropy-Based Bounds
- The weight of an optimal prefix code lies in the interval [H, H + 1)
- If H is high, then an optimal prefix code is guaranteed to achieve close to the best possible compression ratio achievable with any coding technique (static or not)
– Here we are appealing to Shannon’s entropy bound
- If H is close to zero, then an optimal prefix code might achieve a compression ratio that is dramatically worse than the best possible
– Example?
– Other compression techniques may be applied in such situations in order to achieve near-optimal performance (e.g., arithmetic coding or run-length coding)
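For the "Example?" prompt: a heavily skewed binary source makes the gap concrete (a sketch; the 0.99 skew is an arbitrary choice). Every nonempty codeword has length at least 1, so any static code spends at least one bit per symbol, while the entropy is far below one bit:

```python
from math import log2

p = 0.99                                    # Pr[symbol 0]; made-up skew
H = -(p * log2(p) + (1 - p) * log2(1 - p))  # ≈ 0.0808 bits/symbol
# an optimal static code still spends ≥ 1 bit per symbol, roughly
# 1/H ≈ 12x more than the entropy bound allows
print(round(H, 4), round(1 / H, 1))
```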
Computing an Optimal Prefix Code
- Huffman’s algorithm will be presented in the next lecture