  1. Compression: Huffman's Algorithm
Greg Plaxton
Theory in Programming Practice, Spring 2004
Department of Computer Science, University of Texas at Austin

  2. Tree Representation of a Prefix Code
• Consider the infinite binary tree T in which each left-going edge is labeled with a 0 and each right-going edge is labeled with a 1
• We define the label of any node x in T as the concatenation of the edge labels on the unique downward path from the root to x
  – There is a one-to-one correspondence between the nodes of T and {0, 1}*
• A prefix code may be represented as the (unique, finite) subtree of T that contains the root and has leaf labels equal to the set of codewords
  – Such a representation is possible because no two codewords lie on the same downward path from the root
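
A minimal sketch of this correspondence (not from the deck; all names are illustrative): insert each codeword into a binary trie and flag any two codewords that lie on the same downward path.

```python
class TrieNode:
    """A node of the code tree; children[bit] follows the edge labeled 0 or 1."""
    def __init__(self):
        self.children = {}        # bit ('0' or '1') -> TrieNode
        self.is_codeword = False

def build_code_tree(codewords):
    """Build the finite subtree of T whose leaves are the codewords.

    Raises ValueError if one codeword is a prefix of another,
    i.e., if the given code is not a prefix code.
    """
    root = TrieNode()
    for w in codewords:
        node = root
        for bit in w:
            if node.is_codeword:               # an ancestor is already a codeword
                raise ValueError("not a prefix code")
            node = node.children.setdefault(bit, TrieNode())
        if node.is_codeword or node.children:  # duplicate, or proper prefix of another
            raise ValueError("not a prefix code")
        node.is_codeword = True
    return root

build_code_tree(["0", "10", "11"])   # a valid prefix code
# build_code_tree(["0", "01"])       # would raise: 0 is a prefix of 01
```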

  3. Convention Concerning Zero-Frequency Symbols
• Let A denote the input alphabet
• For each a in A, let f(a) denote the frequency of a
  – Remark: Scaling the frequencies by any positive quantity does not change the problem of computing an optimal code
  – We will adopt the convention that a symbol with frequency zero should not be assigned a codeword in an optimal code
  – For the remainder of this lecture, assume that A has been preprocessed to remove any zero-frequency symbols, so f(a) > 0 for all a in A
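
This preprocessing convention is a one-liner in practice (a sketch; `freq` is an assumed symbol-to-frequency mapping):

```python
freq = {"a": 5, "b": 0, "c": 3}                       # hypothetical input frequencies
freq = {sym: f for sym, f in freq.items() if f > 0}   # drop zero-frequency symbols
```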

  4. Full Binary Tree
• A (finite) binary tree is full if every internal node has exactly two children
• Theorem: Any optimal prefix code corresponds to a full binary tree
• Proof: The weight of a prefix code corresponding to a nonfull binary tree strictly decreases when we contract the lone descendant edge of any single-child node
• Theorem: Any optimal prefix code tightly satisfies the McMillan inequality (i.e., the inequality is an equality)
• Hint: Use induction on the number of leaves; for the induction step, remove two sibling leaves at the deepest level
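
For reference, the inequality in question is the standard Kraft–McMillan bound over the set of codewords C:

```latex
\sum_{w \in C} 2^{-|w|} \;\le\; 1,
\qquad\text{with equality for any optimal prefix code, by the theorem above.}
```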

  5. Huffman's Algorithm
• The algorithm manipulates a forest of full binary trees
  – Invariant: The total number of leaves is n = |A|, and each leaf is labeled by a distinct symbol in A
  – Initially, there are n single-node trees
  – The algorithm repeatedly merges pairs of trees
  – The algorithm terminates when there is one tree, the output tree
• To merge trees T1 and T2, we introduce a new node with left subtree T1 and right subtree T2
  – Remark: It doesn't matter which tree becomes the left or right child of the merged tree
• It remains only to specify which pair of trees to merge in each iteration
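
A minimal sketch of the forest nodes and the merge step (names are illustrative, not from the deck):

```python
class Tree:
    """A full-binary-tree node; a leaf carries a symbol, an internal node carries None."""
    def __init__(self, freq, symbol=None, left=None, right=None):
        self.freq, self.symbol = freq, symbol
        self.left, self.right = left, right

def merge(t1, t2):
    """Introduce a new root with subtrees t1 and t2; its frequency is their sum."""
    return Tree(t1.freq + t2.freq, left=t1, right=t2)
```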

  6. Huffman's Algorithm: Merging Rule
• Let us define the frequency of a tree T in the forest as the sum of the frequencies of the symbols associated with (the leaves of) T
• We merge the two lowest-frequency trees, breaking ties arbitrarily
• Remark: Due to the arbitrary tie-breaking, Huffman's algorithm can produce different codes; in fact, it can produce codes with different distributions of codeword lengths
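
Putting the merging rule together with the Tree and merge sketch above gives a direct (quadratic-time) sketch of the iterative algorithm; an O(n log n) priority-queue version appears with slide 11.

```python
def huffman(freq):
    """freq: symbol -> positive frequency. Returns the root of the code tree."""
    forest = [Tree(f, symbol=s) for s, f in freq.items()]
    while len(forest) > 1:
        forest.sort(key=lambda t: t.freq)   # two lowest-frequency trees first
        t1, t2 = forest.pop(0), forest.pop(0)
        forest.append(merge(t1, t2))        # ties broken arbitrarily by sort order
    return forest[0]

def codewords(tree, prefix=""):
    """Read codewords off the tree: 0 for left-going edges, 1 for right-going edges."""
    if tree.symbol is not None:
        return {tree.symbol: prefix or "0"}  # degenerate single-symbol alphabet
    out = {}
    out.update(codewords(tree.left, prefix + "0"))
    out.update(codewords(tree.right, prefix + "1"))
    return out

print(codewords(huffman({"a": 5, "b": 2, "c": 1, "d": 1})))
# one possible output: {'b': '00', 'c': '010', 'd': '011', 'a': '1'}
# (ties and left/right choices may produce a different, equally optimal code)
```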

  7. Huffman's Algorithm: Recursive Formulation
• Let a and b be the two lowest-frequency symbols in A, breaking ties arbitrarily
• Let A′ = (A \ {a, b}) ∪ {c}, where c is a new symbol with f(c) = f(a) + f(b)
• Recursively compute the tree representation of an optimal prefix code for A′
• Modify this tree by adding two children labeled a and b to the leaf labeled c (which ceases to be a leaf)
  – The resulting tree corresponds to an optimal prefix code for A
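
A direct transcription of the recursive formulation into illustrative Python (a fresh object stands in for the new symbol c):

```python
def huffman_recursive(freq):
    """freq: symbol -> positive frequency. Returns {symbol: codeword}."""
    if len(freq) <= 2:
        return {s: str(i) for i, s in enumerate(freq)}  # base: "0" (and "1")
    a, b = sorted(freq, key=freq.get)[:2]   # two lowest-frequency symbols
    c = object()                            # a fresh symbol c, not in A
    reduced = {s: f for s, f in freq.items() if s not in (a, b)}
    reduced[c] = freq[a] + freq[b]          # f(c) = f(a) + f(b)
    code = huffman_recursive(reduced)       # optimal code for A'
    w = code.pop(c)                         # the leaf labeled c becomes internal,
    code[a], code[b] = w + "0", w + "1"     # gaining children labeled a and b
    return code

print(huffman_recursive({"a": 5, "b": 2, "c": 1, "d": 1}))
# e.g. {'a': '0', 'b': '10', 'c': '110', 'd': '111'} (ties may break differently)
```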

  8. Huffman's Algorithm: Proof of Correctness
• It is not hard to convince yourself that the iterative and recursive versions of Huffman's algorithm are equivalent
• Our proof is given in terms of the recursive version

  9. Huffman's Algorithm: Proof of Correctness
• Let X denote the set of tree representations of all prefix codes for A in which the symbols a and b are associated with sibling leaves
  – Lemma: The set X contains the tree representation of an optimal prefix code
• Let Y denote the set of tree representations of all prefix codes for A′
  – Observation: There is a natural one-to-one correspondence between the sets X and Y; for any tree T in Y, let h(T) denote the corresponding tree in X
  – Lemma: For any tree T in Y, the weight of h(T) is equal to the weight of T plus f(a) + f(b)
  – It follows that any minimum-weight tree T in Y corresponds to a minimum-weight tree h(T) in X
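
The second lemma is arithmetic on leaf depths. Writing W(T) for the weight (the sum over symbols of frequency times leaf depth): if the leaf labeled c sits at depth d in T, then in h(T) the leaves a and b sit at depth d + 1, so

```latex
W(h(T)) \;=\; W(T) - f(c)\,d + \bigl(f(a) + f(b)\bigr)(d + 1)
        \;=\; W(T) + f(a) + f(b),
```

using f(c) = f(a) + f(b).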

  10. Remarks on Optimal Codes
• While there is always an optimal prefix code, not every optimal code is a prefix code
  – Example involving the alphabet {a, b, c} with equal frequencies?
• Two optimal prefix codes (for the same input) can have a different distribution of codeword lengths
  – Example involving the alphabet {a, b, c, d}?
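
Possible answers to these two exercises (my examples, not from the deck): for {a, b, c} with equal frequencies, the code {0, 01, 11} is uniquely decodable but not prefix-free, and its lengths (1, 2, 2) match those of the optimal prefix code {0, 10, 11}; for {a, b, c, d} with frequencies (1, 1, 2, 2), Huffman's algorithm can produce length distributions (2, 2, 2, 2) or (3, 3, 2, 1) depending on tie-breaking, both of weight

```latex
2(1 + 1 + 2 + 2) \;=\; 3\cdot 1 + 3\cdot 1 + 2\cdot 2 + 1\cdot 2 \;=\; 12.
```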

  11. Implementation of Huffman's Algorithm
• Use a priority queue data structure (e.g., a binary heap) that supports the operations insert and delete-min in O(log n) time
• Initially, we insert each of the n symbols into the priority queue
• At each iteration (or level of recursion) we remove two entries from the priority queue and insert one
• The total cost of the 2n − 1 insert (resp., delete-min) operations is O(n log n)
• The total cost of all other operations is O(1) per iteration (or level of recursion), hence O(n) overall
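
A sketch of this implementation using Python's heapq module (a binary heap); an insertion counter breaks frequency ties so that trees never need to be compared directly, and trees are represented as nested tuples:

```python
import heapq

def huffman_heap(freq):
    """freq: symbol -> positive frequency. Returns {symbol: codeword}.

    2n - 1 pushes and 2n - 2 pops at O(log n) each: O(n log n) overall.
    """
    heap = []
    for i, (sym, f) in enumerate(freq.items()):   # n initial inserts
        heapq.heappush(heap, (f, i, sym))         # leaves are bare symbols
    counter = len(freq)
    while len(heap) > 1:
        f1, _, t1 = heapq.heappop(heap)           # two delete-mins ...
        f2, _, t2 = heapq.heappop(heap)
        heapq.heappush(heap, (f1 + f2, counter, (t1, t2)))  # ... one insert
        counter += 1
    _, _, root = heap[0]
    code = {}
    def walk(node, prefix):
        if isinstance(node, tuple):               # internal node: recurse
            walk(node[0], prefix + "0")
            walk(node[1], prefix + "1")
        else:
            code[node] = prefix or "0"            # leaf: record its codeword
    walk(root, "")
    return code

print(huffman_heap({"a": 5, "b": 2, "c": 1, "d": 1}))
```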

  12. Linear-Time Implementation
• Assume that the "leaves" (i.e., symbols) are provided to us in a queue sorted in nondecreasing order of frequency
• As the algorithm progresses, we can find the lowest-frequency unprocessed leaf in O(1) time
• But what about the "nonleaf" nodes, one of which is created at each iteration?
  – When a nonleaf is created, append it to a second queue of nonleaves (initially empty)
  – Invariant: The nonleaf queue is sorted in nondecreasing order of frequency
  – To implement delete-min, we simply compare the frequencies of the nodes at the heads of the leaf and nonleaf queues, dequeueing whichever is lower
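
A sketch of the two-queue method, assuming the leaves arrive already sorted by nondecreasing frequency (illustrative names; trees are nested tuples as in the previous sketch):

```python
from collections import deque

def huffman_sorted(pairs):
    """pairs: list of (frequency, symbol), sorted by nondecreasing frequency.

    Runs in O(n) time given the sorted input.
    """
    leaves = deque(pairs)     # queue of unprocessed leaves
    internal = deque()        # queue of nonleaf trees, kept sorted by the invariant

    def pop_min():
        # Compare the heads of the two queues; dequeue whichever is lower.
        if not internal or (leaves and leaves[0][0] <= internal[0][0]):
            return leaves.popleft()
        return internal.popleft()

    while len(leaves) + len(internal) > 1:
        f1, t1 = pop_min()
        f2, t2 = pop_min()
        # Merged frequencies are nondecreasing over iterations, so appending
        # preserves the sorted-order invariant on the nonleaf queue.
        internal.append((f1 + f2, (t1, t2)))
    return pop_min()[1]       # the root of the code tree

root = huffman_sorted([(1, "c"), (1, "d"), (2, "b"), (5, "a")])
```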

  13. Optimal Prefix Codes: Summary
• For an i.i.d. source, the average number of bits used to encode each symbol is within one of the entropy lower bound
  – Except in cases where the entropy is extremely close to zero, this implies near-optimal compression
  – But we need to know the symbol probabilities ahead of time in order to encode the data in a single pass
• For more general Markov sources, the average codeword length of an optimal prefix code can exceed the entropy bound by an arbitrarily high factor, even if the entropy of the source is high
  – Example: It is easy to construct a Markov source in which the symbol probabilities fluctuate from time to time; in such cases we need to use a more adaptive method in order to approach the entropy bound
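
The i.i.d. bound referred to here is the standard one: with symbol probabilities p(a) and entropy H, the expected codeword length L of an optimal prefix code satisfies

```latex
H \;\le\; L \;<\; H + 1,
\qquad\text{where } H \;=\; -\sum_{a \in A} p(a)\,\log_2 p(a).
```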
