
1

Huffman Codes

2

Coding

! Representing characters from some input alphabet Σ using another alphabet σ.

– Example: input characters from the Latin alphabet, output strings of binary digits.

! Fixed-length binary character code: for an input alphabet of size n, represent each character as a binary string of ⌈lg n⌉ bits.

– Example: 8-bit ASCII code.

! A more space-efficient representation can be obtained using variable-length coding.

– Example: Huffman codes.
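The ⌈lg n⌉ bound can be checked with a short sketch (the class and method names here are illustrative, not from the slides):

```java
// Bits needed by a fixed-length binary code for an alphabet of size n:
// the smallest number of bits b with 2^b >= n, i.e. ceil(lg n).
public class FixedLengthCode {
    static int bitsNeeded(int n) {
        int bits = 0;
        while ((1 << bits) < n) bits++;  // grow until 2^bits covers the alphabet
        return bits;
    }

    public static void main(String[] args) {
        System.out.println(bitsNeeded(6));   // 3 bits for a 6-character alphabet {a..f}
        System.out.println(bitsNeeded(256)); // 8 bits, as in 8-bit ASCII
    }
}
```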

3

Example

! A file of 100,000 characters takes 300,000 bits with the fixed-length code, but only 224,000 bits with the variable-length code.

! The variable-length code is a prefix code: no binary string is a prefix of another.

                         a     b     c     d     e     f
  Frequency (%)         45    13    12    16     9     5
  Fixed-length code    000   001   010   011   100   101
  Variable-length code   0   101   100   111  1101  1100
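The 300,000 and 224,000 figures follow directly from the table: multiply each character's count by its codeword length and sum. A small sketch (frequencies and codes taken from the table; the class name is illustrative):

```java
import java.util.Map;

// Check the bit counts quoted above for a 100,000-character file.
public class CodeCost {
    // total bits = sum over characters of count(c) * |codeword(c)|
    static long totalBits(Map<Character, Integer> countThousands, Map<Character, String> code) {
        long bits = 0;
        for (Map.Entry<Character, Integer> e : countThousands.entrySet())
            bits += 1000L * e.getValue() * code.get(e.getKey()).length();
        return bits;
    }

    // character counts (in thousands) and both codes from the table
    static final Map<Character, Integer> FREQ =
        Map.of('a', 45, 'b', 13, 'c', 12, 'd', 16, 'e', 9, 'f', 5);
    static final Map<Character, String> FIXED =
        Map.of('a', "000", 'b', "001", 'c', "010", 'd', "011", 'e', "100", 'f', "101");
    static final Map<Character, String> VAR =
        Map.of('a', "0", 'b', "101", 'c', "100", 'd', "111", 'e', "1101", 'f', "1100");

    public static void main(String[] args) {
        System.out.println(totalBits(FREQ, FIXED)); // 300000
        System.out.println(totalBits(FREQ, VAR));   // 224000
    }
}
```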

4

Prefix Codes

! A binary prefix code can be represented as a binary tree: each path from the root to a leaf is the binary codeword of a character.

! Optimality condition: minimum weighted sum of leaf depths in the tree,

    min_T  Σ_{c ∈ C} f(c) · d_T(c)

where f(c) is the frequency of character c and d_T(c) is the depth of c's leaf in tree T.

[Figure: coding tree with leaves a, b, c, d, e, f; edges labeled 0/1]


5

Optimality

! Note: an optimal code must correspond to a full tree (every internal node has exactly two children): a one-child internal node could be removed, shortening some codeword.

! Fact: an optimal coding tree for an alphabet of n characters has (n-1) internal nodes.

[Figure: a non-full coding tree for a–f costs more than the full tree obtained by removing its one-child internal node]

6

Huffman Code construction

class CodingTree {
    private float frequency;
    private char letter;
    private CodingTree left, right;
}

Input: a set of n pairs (character, frequency)
Init: create n coding trees (one per character)

PriorityQ Q = new PriorityQ(treeSet);   // min-priority queue keyed on frequency
CodingTree node;
for (int j = 1; j < n; j++) {           // n-1 merges
    node = new CodingTree();
    node.left = Q.deleteMin();          // the two lowest-frequency trees
    node.right = Q.deleteMin();
    node.frequency = node.left.frequency + node.right.frequency;
    Q.insert(node);
}
return Q.deleteMin();                   // root of the Huffman coding tree
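The construction above can be made runnable with the standard `java.util.PriorityQueue` in place of the slides' `PriorityQ` class (a sketch; the `Node`, `build`, and `cost` names are my own):

```java
import java.util.PriorityQueue;

public class Huffman {
    static class Node {
        int frequency;
        char letter;
        Node left, right;
        Node(int f, char c) { frequency = f; letter = c; }   // leaf
        Node(Node l, Node r) {                               // internal: merged subtree
            frequency = l.frequency + r.frequency;
            left = l; right = r;
        }
    }

    static Node build(char[] letters, int[] freqs) {
        PriorityQueue<Node> q = new PriorityQueue<>((a, b) -> a.frequency - b.frequency);
        for (int i = 0; i < letters.length; i++)
            q.add(new Node(freqs[i], letters[i]));           // one leaf per character
        while (q.size() > 1)
            q.add(new Node(q.poll(), q.poll()));             // n-1 merges of the two minima
        return q.poll();                                     // root of the Huffman tree
    }

    // weighted sum of leaf depths: sum over characters of f(c) * d(c)
    static int cost(Node t, int depth) {
        if (t.left == null) return t.frequency * depth;      // leaf
        return cost(t.left, depth + 1) + cost(t.right, depth + 1);
    }

    public static void main(String[] args) {
        Node root = build(new char[]{'a','b','c','d','e','f'},
                          new int[]{45, 13, 12, 16, 9, 5});
        System.out.println(cost(root, 0)); // 224 for the slides' frequencies
    }
}
```

Any tie-breaking order in the queue yields the same optimal cost, though possibly a different tree.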

7

Example

Priority-queue contents after each merge (merged subtrees shown with their total frequency):

    5,f  9,e  12,c  13,b  16,d  45,a          initial queue
    12,c  13,b  14(f,e)  16,d  45,a           merge f + e
    14(f,e)  16,d  25(c,b)  45,a              merge c + b
    25(c,b)  30(16,d + 14(f,e))  45,a         merge 14 + d

8

Example (continued)

Continuing the merges:

    45,a  55(25(c,b) + 30(d,(f,e)))           merge 25 + 30
    100                                       merge 45 + 55: root of the Huffman tree


9

Huffman Decoding

! Starting at the root of the coding tree, read input bits.

! After reading 0, go left.

! After reading 1, go right.

! If a leaf node has been reached, output the character stored in the leaf, and return to the root of the tree.
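The steps above can be sketched as follows (the `Node` and `decode` names are my own; leaves store a character, internal nodes have two children):

```java
public class Decode {
    static class Node {
        char letter;
        Node left, right;
        Node(char c) { letter = c; }                   // leaf
        Node(Node l, Node r) { left = l; right = r; }  // internal node
    }

    static String decode(Node root, String bits) {
        StringBuilder out = new StringBuilder();
        Node cur = root;
        for (char bit : bits.toCharArray()) {
            cur = (bit == '0') ? cur.left : cur.right; // 0 -> left, 1 -> right
            if (cur.left == null) {                    // reached a leaf
                out.append(cur.letter);
                cur = root;                            // return to the root
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // tiny prefix code: a = 0, b = 10, c = 11
        Node root = new Node(new Node('a'), new Node(new Node('b'), new Node('c')));
        System.out.println(decode(root, "010011")); // prints "abac"
    }
}
```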

10

Optimality Proof (1)

! Lemma 1 (17.2): Let C be an alphabet and x, y two characters in C with lowest frequencies. Then there exists an optimal prefix code tree in which x and y are sibling leaves.

[Figure: exchanging x and y with the two deepest sibling leaves b and c does not increase the tree's cost]

11

Optimality Proof (2)

! Lemma 2 (17.3): Let T be an optimal prefix code tree for alphabet C. Consider any two sibling characters x and y in C and let z be their parent in T. Then, considering z as a character with frequency f[z] = f[x] + f[y], the tree T’ = T – {x, y} represents an optimal prefix code for the alphabet C’ = C – {x, y} ∪ {z}.

12

Optimality Proof (3)

! Theorem: Huffman’s algorithm produces an optimal prefix code.

! Proof: By induction on the size of the alphabet C, using Lemmas 1 and 2.