

SLIDE 1

Huffman Coding

Eric Dubois

School of Electrical Engineering and Computer Science University of Ottawa

September 2012


SLIDE 2

The optimal prefix code problem

Given a finite alphabet with a given set of probabilities, we want to find a prefix code with the shortest average codeword length. To simplify notation, denote pi = P(ai) and ℓi = ℓ(ai), for i = 1, . . . , M. Without loss of generality, we arrange the symbols in the alphabet so that p1 ≥ p2 ≥ · · · ≥ pM.

Problem: Find a set of positive integers ℓ1, ℓ2, . . . , ℓM such that

ℓ = ∑_{i=1}^{M} pi ℓi

is minimized, subject to the Kraft inequality constraint

∑_{i=1}^{M} 2^{−ℓi} ≤ 1.

The solution may not be unique.
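A minimal Python check of both the objective and the constraint (the lengths used here are the Huffman lengths derived later, on slide 4, quoted as an assumption at this point):

```python
# Evaluate the objective (average codeword length) and the Kraft
# constraint for a candidate set of codeword lengths.
p = [0.25, 0.20, 0.20, 0.18, 0.09, 0.05, 0.02, 0.01]
lengths = [2, 2, 2, 3, 4, 5, 6, 6]  # Huffman lengths found later (slide 4)

kraft = sum(2 ** -l for l in lengths)                 # must be <= 1
avg_len = sum(pi * li for pi, li in zip(p, lengths))

print(f"Kraft sum = {kraft}")             # 1.0: the constraint is tight
print(f"average length = {avg_len:.2f}")  # 2.63
```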


SLIDE 3

Preview of the Huffman algorithm

The Huffman algorithm was originally devised by David Huffman, apparently as part of a course assignment at MIT, and published in 1952. Consider the following example: M = 8, {pi} = {0.25, 0.2, 0.2, 0.18, 0.09, 0.05, 0.02, 0.01}. The Huffman procedure constructs the prefix code starting with the last bits of the least probable symbols (a runnable sketch follows this list):

1. List the probabilities in decreasing order in a column on the left.
2. Assign the final bits of the last two codewords.
3. Add the two probabilities to replace the previous two.
4. Select the two lowest probabilities in the reduced list, and assign two bits.
5. Continue until two symbols remain.
6. Read the codewords from right to left.
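A compact sketch of this procedure in Python, using a priority queue (heapq) in place of the slides' relisting of probability columns. Depending on how ties are broken, the codewords may differ from those on the next slide, but the resulting code is equally optimal (same average length):

```python
import heapq
from itertools import count

def huffman_code(probs):
    """Return a dict mapping symbol index -> codeword string."""
    tiebreak = count()  # keeps heap entries comparable when probabilities tie
    # Each heap entry: (probability, tiebreak, {symbol: partial codeword}).
    heap = [(p, next(tiebreak), {i: ""}) for i, p in enumerate(probs)]
    heapq.heapify(heap)
    while len(heap) > 1:
        p0, _, group0 = heapq.heappop(heap)  # the two lowest probabilities
        p1, _, group1 = heapq.heappop(heap)
        # Assign the next bit; prepending builds codewords right to left.
        merged = {s: "0" + c for s, c in group0.items()}
        merged.update({s: "1" + c for s, c in group1.items()})
        heapq.heappush(heap, (p0 + p1, next(tiebreak), merged))
    return heap[0][2]

probs = [0.25, 0.2, 0.2, 0.18, 0.09, 0.05, 0.02, 0.01]
for i, cw in sorted(huffman_code(probs).items()):
    print(f"a{i+1}: p = {probs[i]:.2f}, codeword = {cw}")
```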


SLIDE 4

Huffman coding example

[Figure: Huffman merge diagram. The two lowest probabilities are repeatedly merged and assigned the bits 0 and 1; the intermediate sums are 0.03, 0.08, 0.17, 0.35, 0.4, 0.6, and finally 1.0.]

Symbol  p_i   Codeword
a1      0.25  10
a2      0.20  00
a3      0.20  01
a4      0.18  110
a5      0.09  1110
a6      0.05  11110
a7      0.02  111110
a8      0.01  111111
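A quick check (a sketch, not part of the original deck) that the codewords read off the diagram form a prefix code and give the average length quoted on the next slide:

```python
p = [0.25, 0.20, 0.20, 0.18, 0.09, 0.05, 0.02, 0.01]
codewords = ["10", "00", "01", "110", "1110", "11110", "111110", "111111"]

# Prefix-free: no codeword may be a prefix of another.
prefix_free = not any(
    i != j and b.startswith(a)
    for i, a in enumerate(codewords)
    for j, b in enumerate(codewords)
)
avg_len = sum(pi * len(c) for pi, c in zip(p, codewords))
print(prefix_free)       # True
print(f"{avg_len:.2f}")  # 2.63
```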

SLIDE 5

Huffman coding example (2)

H1 = −∑_{i=1}^{8} pi log2(pi) = 2.5821

ℓHuff = ∑_{i=1}^{8} pi ℓi = 2.63

ℓShann = 3.04        ℓfixed = 3
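All four quantities are easy to reproduce. A short sketch, assuming the standard Shannon-code lengths ℓi = ⌈−log2 pi⌉ (consistent with the spreadsheet on the next slide):

```python
import math

p = [0.25, 0.20, 0.20, 0.18, 0.09, 0.05, 0.02, 0.01]
huffman_lengths = [2, 2, 2, 3, 4, 5, 6, 6]

h1 = -sum(pi * math.log2(pi) for pi in p)
l_huff = sum(pi * li for pi, li in zip(p, huffman_lengths))
shannon_lengths = [math.ceil(-math.log2(pi)) for pi in p]
l_shann = sum(pi * li for pi, li in zip(p, shannon_lengths))
l_fixed = math.ceil(math.log2(len(p)))  # fixed-length code for 8 symbols

print(f"H1      = {h1:.4f}")       # 2.5821
print(f"l_Huff  = {l_huff:.2f}")   # 2.63
print(f"l_Shann = {l_shann:.2f}")  # 3.04
print(f"l_fixed = {l_fixed}")      # 3
```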


SLIDE 6

Huffman coding example – spreadsheet

                                     Shannon code               Huffman code
p_i    −log2(p_i)  −p_i·log2(p_i)    l_i  p_i·l_i  2^(−l_i)     l_i  p_i·l_i  2^(−l_i)
0.25   2.0000      0.5000            2    0.5      0.25         2    0.5      0.25
0.20   2.3219      0.4644            3    0.6      0.125        2    0.4      0.25
0.20   2.3219      0.4644            3    0.6      0.125        2    0.4      0.25
0.18   2.4739      0.4453            3    0.54     0.125        3    0.54     0.125
0.09   3.4739      0.3127            4    0.36     0.0625       4    0.36     0.0625
0.05   4.3219      0.2161            5    0.25     0.03125      5    0.25     0.03125
0.02   5.6439      0.1129            6    0.12     0.015625     6    0.12     0.015625
0.01   6.6439      0.0664            7    0.07     0.0078125    6    0.06     0.015625

Sums: p_i = 1.00, entropy H_1 = 2.5821; Shannon: l_Shann = 3.0400, Kraft = 0.7422; Huffman: l_Huff = 2.6300, Kraft = 1.0000.


SLIDE 7

Huffman coding example – binary tree

[Figure: binary code tree for the example, with leaves labeled c(a1), . . . , c(a8); following the branch labels from the root to a leaf spells out that symbol's codeword.]

SLIDE 8

Theorem

For any admissible set of probabilities, there exists an optimal prefix code satisfying the following properties:

1. If pj > pk, then ℓj ≤ ℓk, so that ℓ1 ≤ ℓ2 ≤ · · · ≤ ℓM.
2. The two longest codewords have the same length: ℓM−1 = ℓM.
3. The two longest codewords differ only in their last bit, and correspond to the two source symbols of lowest probability.

Note that not all optimal codes need satisfy these properties, but at least one does.


SLIDE 9

Proof (1)

Let C be an optimal code with codeword lengths ℓ1, . . . , ℓM, and suppose that, contrary to the theorem statement, pj > pk but ℓj > ℓk. Let C′ be a new code with ℓ′j = ℓk, ℓ′k = ℓj, and ℓ′i = ℓi for i ≠ j, k. Then

ℓ(C′) − ℓ(C) = ∑_{i=1}^{M} pi ℓ′i − ∑_{i=1}^{M} pi ℓi = pj ℓk + pk ℓj − pj ℓj − pk ℓk = (pj − pk)(ℓk − ℓj) < 0,

since the first factor is positive and the second negative. This contradicts the assumption that C is an optimal code. Thus ℓj ≤ ℓk.
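A numeric illustration of the exchange (the probabilities and lengths here are hypothetical, chosen only to violate property 1):

```python
p_j, p_k = 0.4, 0.1   # p_j > p_k ...
l_j, l_k = 3, 2       # ... but l_j > l_k, contrary to property 1

before = p_j * l_j + p_k * l_k   # contribution to l(C):  1.4
after  = p_j * l_k + p_k * l_j   # contribution to l(C'): 1.1
print(after - before)            # (p_j - p_k)(l_k - l_j) = -0.3 < 0
```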


SLIDE 10

Theorem

For any admissible set of probabilities, there exists an optimal prefix code satisfying the following properties:

1. If pj > pk, then ℓj ≤ ℓk, so that ℓ1 ≤ ℓ2 ≤ · · · ≤ ℓM.
2. The two longest codewords have the same length: ℓM−1 = ℓM.
3. The two longest codewords differ only in their last bit, and correspond to the two source symbols of lowest probability.


SLIDE 11

Proof (2)

Suppose that ℓM > ℓM−1. Then no other codeword has length ℓM. Remove the last bit of c(aM): any prefix violation involving the shortened codeword would already have been a prefix violation in C, so the new code is still a prefix code, but its average codeword length is lower, namely ℓ(C) − pM. Again, this contradicts the assumption that C is an optimal code, so ℓM−1 = ℓM.


SLIDE 12

Theorem

For any admissible set of probabilities, there exists an optimal prefix code satisfying the following properties:

1. If pj > pk, then ℓj ≤ ℓk, so that ℓ1 ≤ ℓ2 ≤ · · · ≤ ℓM.
2. The two longest codewords have the same length: ℓM−1 = ℓM.
3. The two longest codewords differ only in their last bit, and correspond to the two source symbols of lowest probability.


SLIDE 13

Proof (3)

For every codeword of the maximum length ℓM, there must be another codeword that differs from it only in the last bit; otherwise, we could remove the last bit as in (2) and reduce the average codeword length. If the codeword that differs from c(aM) in the last bit is not c(aM−1) but rather c(aj) for some other j, then ℓj = ℓM−1 = ℓM, so we can exchange the codewords for aM−1 and aj without changing the average codeword length, and the code remains optimal. The Huffman algorithm is a recursive procedure that finds a code satisfying the properties of the theorem.


SLIDE 14

Optimal code tree

[Figure: Huffman code scenario — optimal code tree in which the two longest codewords cM(aM−1) and cM(aM) occupy sibling leaves sharing a common parent node.]


SLIDE 15

Recursive Algorithm

Assume that we have an optimal code CM for the alphabet A = {a1, . . . , aM} with probabilities P(ai), satisfying the properties of the theorem. Form the reduced alphabet A′ = {a′1, . . . , a′M−1} with probabilities

P(a′i) = P(ai), i = 1, . . . , M − 2, and P(a′M−1) = P(aM−1) + P(aM).

Suppose that we have a prefix code CM−1 for the reduced alphabet satisfying

cM(ai) = cM−1(a′i), i = 1, . . . , M − 2,
cM(aM−1) = cM−1(a′M−1) ∗ 0 and cM(aM) = cM−1(a′M−1) ∗ 1,

where ∗ denotes concatenation. Then ℓi = ℓ′i for i = 1, . . . , M − 2, and ℓM−1 = ℓM = ℓ′M−1 + 1, so

ℓ(CM) = ∑_{i=1}^{M} P(ai) ℓi
      = ∑_{i=1}^{M−2} P(a′i) ℓ′i + (P(aM−1) + P(aM))(ℓ′M−1 + 1)
      = ∑_{i=1}^{M−1} P(a′i) ℓ′i + P(aM−1) + P(aM)
      = ℓ(CM−1) + P(aM−1) + P(aM)
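The recursion says that each merge adds exactly the merged probability to the average length, so ℓ(CM) is the sum of the probabilities produced by all M − 1 merges. A sketch checking this identity on the running example:

```python
import heapq

def huffman_average_length(probs):
    """Average Huffman codeword length via the sum-of-merges identity."""
    heap = list(probs)
    heapq.heapify(heap)
    total = 0.0
    while len(heap) > 1:
        merged = heapq.heappop(heap) + heapq.heappop(heap)
        total += merged            # each merge contributes its probability
        heapq.heappush(heap, merged)
    return total

print(huffman_average_length([0.25, 0.2, 0.2, 0.18, 0.09, 0.05, 0.02, 0.01]))
# ~2.63: the merge sums 0.03 + 0.08 + 0.17 + 0.35 + 0.4 + 0.6 + 1.0,
# matching l_Huff from slide 5 and the diagram on slide 4.
```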


SLIDE 16

Reduced code tree

[Figure: Huffman code scenario (2) — reduced code tree in which the sibling leaves cM(aM−1) and cM(aM) are merged into the single leaf cM−1(a′M−1).]


SLIDE 17

Recursive Algorithm (2)

Conclusion: CM is an optimal code for {A, P(ai)} if and only if CM−1 is an optimal code for {A′, P(a′i)}.

Similarly, we can obtain CM−2 from CM−1, and so on, until we reach C2 for an alphabet with two symbols, where the only possible code has codewords 0 and 1. This yields the Huffman procedure illustrated by the earlier example. Note that H1 ≤ ℓHuff ≤ ℓShann < H1 + 1.
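A sanity check of this chain of inequalities on random distributions (a sketch, not from the original deck; it computes ℓHuff via the sum-of-merges identity from slide 15):

```python
import heapq
import math
import random

def huffman_avg(probs):
    # Average Huffman length = sum of all merged probabilities (slide 15).
    heap = list(probs)
    heapq.heapify(heap)
    total = 0.0
    while len(heap) > 1:
        merged = heapq.heappop(heap) + heapq.heappop(heap)
        total += merged
        heapq.heappush(heap, merged)
    return total

random.seed(1)
eps = 1e-9  # tolerance for floating-point comparisons
for _ in range(1000):
    w = [random.random() for _ in range(8)]
    p = [x / sum(w) for x in w]
    h1 = -sum(pi * math.log2(pi) for pi in p)
    l_huff = huffman_avg(p)
    l_shann = sum(pi * math.ceil(-math.log2(pi)) for pi in p)
    assert h1 <= l_huff + eps          # entropy lower-bounds Huffman
    assert l_huff <= l_shann + eps     # Huffman is optimal, Shannon is not
    assert l_shann < h1 + 1            # Shannon stays within 1 bit of H1
print("H1 <= l_Huff <= l_Shann < H1 + 1 held on all trials")
```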
