 
27/02/2011 Applied Algorithmics - week7

Huffman Coding

• David A. Huffman (1951)
• Huffman coding uses the frequencies of symbols in a string to build a variable-rate prefix code
  • Each symbol is mapped to a binary string
  • More frequent symbols have shorter codes
  • No code is a prefix of another
• Example:
  [Figure: code tree with edges labelled 0 and 1, giving A = 0, B = 100, C = 101, D = 11]

Variable Rate Codes

• Example:
  1) A → 00 ; B → 01 ; C → 10 ; D → 11
  2) A → 0 ; B → 100 ; C → 101 ; D → 11
• No code is a prefix of another
• Two different encodings of AABDDCAA:
  • 0000011111100000 (16 bits), using code 1
  • 00100111110100 (14 bits), using code 2

Cost of Huffman Trees

• Let $A = \{a_1, a_2, \ldots, a_m\}$ be the alphabet in which each symbol $a_i$ has probability $p_i$
• We can define the cost of the Huffman tree HT as
    $C(HT) = \sum_{i=1}^{m} p_i \cdot r_i$,
  where $r_i$ is the length of the path from the root to $a_i$
• The cost C(HT) is the expected length (in bits) of a code word represented by the tree HT. The value of C(HT) is called the bit rate of the code.

Cost of Huffman Trees - example

• Example: let $a_1 = A$, $p_1 = 1/2$; $a_2 = B$, $p_2 = 1/8$; $a_3 = C$, $p_3 = 1/8$; $a_4 = D$, $p_4 = 1/4$,
  where $r_1 = 1$, $r_2 = 3$, $r_3 = 3$, and $r_4 = 2$
• $C(HT) = 1 \cdot 1/2 + 3 \cdot 1/8 + 3 \cdot 1/8 + 2 \cdot 1/4 = 1.75$
  [Figure: tree HT with leaf A at depth 1, D at depth 2, and B and C at depth 3]
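The encodings and the cost figure above can be checked with a few lines of Python. This is a minimal sketch, not part of the original slides: the function names `encode`, `is_prefix_free` and `bit_rate` are illustrative, and the path lengths $r_i$ are taken to be the codeword lengths of code 2, which match the values $r_1 = 1$, $r_2 = 3$, $r_3 = 3$, $r_4 = 2$ used in the example.

```python
from fractions import Fraction

# The two codes from the "Variable Rate Codes" example.
FIXED_CODE = {"A": "00", "B": "01", "C": "10", "D": "11"}
VARIABLE_CODE = {"A": "0", "B": "100", "C": "101", "D": "11"}

# Probabilities from the "Cost of Huffman Trees" example.
PROBABILITIES = {
    "A": Fraction(1, 2), "B": Fraction(1, 8),
    "C": Fraction(1, 8), "D": Fraction(1, 4),
}


def encode(text, code):
    """Concatenate the codeword of every symbol in text."""
    return "".join(code[symbol] for symbol in text)


def is_prefix_free(code):
    """Return True if no codeword is a prefix of a different codeword."""
    words = list(code.values())
    return not any(u != v and v.startswith(u) for u in words for v in words)


def bit_rate(code, probabilities):
    """Expected codeword length: sum of p_i * r_i with r_i = len(codeword)."""
    return sum(p * len(code[symbol]) for symbol, p in probabilities.items())


print(encode("AABDDCAA", FIXED_CODE))                  # 0000011111100000
print(encode("AABDDCAA", VARIABLE_CODE))               # 00100111110100
print(is_prefix_free(VARIABLE_CODE))                   # True
print(float(bit_rate(VARIABLE_CODE, PROBABILITIES)))   # 1.75
```

Exact fractions are used so that the cost comes out as exactly 7/4 rather than a rounded float.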
Huffman Tree Property

• Input: probabilities $p_1, p_2, \ldots, p_m$ for symbols $a_1, a_2, \ldots, a_m$ from alphabet A
• Output: a tree that minimizes the average number of bits (bit rate) needed to code a symbol from A
• I.e., the goal is to minimize the function
    $C(HT) = \sum_{i=1}^{m} p_i \cdot r_i$,
  where $r_i$ is the length of the path from the root to leaf $a_i$.
  Such a tree is called a Huffman tree or Huffman code for alphabet A.

Construction of Huffman Trees

• Form a (tree) node for each symbol $a_i$ with weight $p_i$
• Insert all nodes into a priority queue PQ (e.g., a heap) ordered by node probability
• while (the priority queue has more than one node)
  • min1 ← remove-min(PQ); min2 ← remove-min(PQ);
  • create a new (tree) node T;
  • T.weight ← min1.weight + min2.weight;
  • T.left ← min1; T.right ← min2;
  • insert(PQ, T)
• return (last node in PQ)

• Example: P(A) = 0.4, P(B) = 0.1, P(C) = 0.3, P(D) = 0.1, P(E) = 0.1
  • Initial queue: E (0.1), D (0.1), B (0.1), C (0.3), A (0.4)
  • After the first merge: [D, E] (0.2), B (0.1), C (0.3), A (0.4)
Construction of Huffman Trees

[Figures: successive states of the priority queue, with tree edges labelled 0 (left) and 1 (right)]

• Queue: [D, E] (0.2), B (0.1), C (0.3), A (0.4)
• Merge B with [D, E]: [B, [D, E]] (0.3), C (0.3), A (0.4)
• Merge [B, [D, E]] with C: A (0.4), [[B, [D, E]], C] (0.6)
• Merge A with the 0.6 subtree: a single tree of weight 1
• Resulting code:
    A = 0
    B = 100
    C = 11
    D = 1010
    E = 1011
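The construction can be sketched in Python with the standard `heapq` module as the priority queue. This is an illustrative sketch under stated assumptions, not the slides' own implementation: the function name `huffman_code`, the tuple representation of tree nodes, and the insertion-order tie-breaker are all choices made here. Because several nodes share the same weight, a different tie-breaking rule can produce a different tree from the one drawn above. The codeword lengths, and therefore the bit rate, stay the same for this example.

```python
import heapq
from fractions import Fraction
from itertools import count


def huffman_code(probabilities):
    """Build a Huffman code for a dict {symbol: probability}.

    Assumes symbols are not tuples, since tuples mark internal nodes.
    """
    tie_breaker = count()  # unique counter so heap entries never compare trees
    # Heap entries are (weight, counter, tree); a tree is either a symbol
    # (leaf) or a (left, right) tuple (internal node).
    heap = [(p, next(tie_breaker), s) for s, p in probabilities.items()]
    heapq.heapify(heap)

    # Repeatedly merge the two lightest nodes until one tree remains.
    while len(heap) > 1:
        w1, _, min1 = heapq.heappop(heap)
        w2, _, min2 = heapq.heappop(heap)
        heapq.heappush(heap, (w1 + w2, next(tie_breaker), (min1, min2)))

    _, _, root = heap[0]
    code = {}

    def assign(tree, prefix):
        """Walk the tree, appending 0 for left edges and 1 for right edges."""
        if isinstance(tree, tuple):
            assign(tree[0], prefix + "0")
            assign(tree[1], prefix + "1")
        else:
            code[tree] = prefix or "0"  # a one-symbol alphabet still needs a bit

    assign(root, "")
    return code


# Probabilities from the worked example, as exact fractions.
EXAMPLE = {
    "A": Fraction(4, 10), "B": Fraction(1, 10), "C": Fraction(3, 10),
    "D": Fraction(1, 10), "E": Fraction(1, 10),
}
code = huffman_code(EXAMPLE)
rate = sum(p * len(code[s]) for s, p in EXAMPLE.items())
print(code)
print(float(rate))  # 2.1
```

With this tie-breaker the three symbols of probability 0.1 may swap codewords relative to the slides, which is expected: Huffman codes are not unique, only their cost is.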
Huffman Codes

• Theorem: For any source S the Huffman code can be computed efficiently in time O(n·log n), where n is the size of the source S.
  Proof: The time complexity of the Huffman coding algorithm is dominated by the use of the priority queue: it performs at most O(n) insert and remove-min operations, each costing O(log n) on a heap.
• One can also prove that Huffman coding creates the most efficient set of prefix codes for a given text
• It is also one of the most efficient entropy coders

Basics of Information Theory

• The entropy of an information source (string) S built over alphabet $A = \{a_1, a_2, \ldots, a_m\}$ is defined as
    $H(S) = \sum_{i=1}^{m} p_i \cdot \log_2(1/p_i)$,
  where $p_i$ is the probability that symbol $a_i$ occurs in S
• $\log_2(1/p_i)$ indicates the amount of information contained in $a_i$, i.e., the number of bits needed to code $a_i$
• For example, in an image with a uniform distribution of gray-level intensity, i.e. all $p_i = 1/256$, the number of bits needed to encode each gray level is 8. The entropy of this image is 8.

Huffman Code vs. Entropy

• P(A) = 0.4, P(B) = 0.1, P(C) = 0.3, P(D) = 0.1, P(E) = 0.1
• Entropy:
    $0.4 \cdot \log_2(10/4) + 0.1 \cdot \log_2(10) + 0.3 \cdot \log_2(10/3) + 0.1 \cdot \log_2(10) + 0.1 \cdot \log_2(10) \approx 2.05$ bits per symbol
• Huffman code:
    $0.4 \cdot 1 + 0.1 \cdot 3 + 0.3 \cdot 2 + 0.1 \cdot 4 + 0.1 \cdot 4 = 2.10$ bits per symbol
• Not bad, not bad at all.

27/02/2011 Applied Algorithmics - week8

Error detection and correction

• Hamming codes:
  • Codewords in Hamming (error detecting and error correcting) codes consist of m data bits and r redundant bits.
  • The Hamming distance between two strings is the number of bit positions in which the two bit patterns differ (similar to pattern matching with k mismatches).
  • The Hamming distance of the code is determined by the two codewords whose Hamming distance is the smallest.
  • Error detection involves determining whether codewords in the received message match legal codewords closely enough.
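The entropy and bit-rate figures from the "Huffman Code vs. Entropy" example can be reproduced numerically. The sketch below is illustrative only: the function name `entropy` is a choice made here, and the codeword lengths are read off the code A = 0, B = 100, C = 11, D = 1010, E = 1011 built earlier.

```python
import math


def entropy(probabilities):
    """Entropy in bits: sum of p * log2(1/p) over symbols with p > 0."""
    return sum(p * math.log2(1 / p) for p in probabilities if p > 0)


PROBABILITIES = {"A": 0.4, "B": 0.1, "C": 0.3, "D": 0.1, "E": 0.1}
# Codeword lengths of A = 0, B = 100, C = 11, D = 1010, E = 1011.
CODE_LENGTHS = {"A": 1, "B": 3, "C": 2, "D": 4, "E": 4}

h = entropy(PROBABILITIES.values())
huffman_rate = sum(PROBABILITIES[s] * CODE_LENGTHS[s] for s in PROBABILITIES)

print(round(h, 2))             # 2.05
print(round(huffman_rate, 2))  # 2.1

# The gray-level example: 256 equally likely intensities.
print(entropy([1 / 256] * 256))
```

The gap between the two numbers, about 0.05 bits per symbol, is the overhead of the Huffman code over the entropy for this distribution.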
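The two distance definitions above translate directly into code. This is a minimal sketch: the function names `hamming_distance` and `code_distance` are illustrative, and the four 5-bit codewords are a hypothetical example chosen here, not a code taken from the slides.

```python
from itertools import combinations


def hamming_distance(u, v):
    """Number of positions in which two equal-length bit strings differ."""
    if len(u) != len(v):
        raise ValueError("bit strings must have the same length")
    return sum(a != b for a, b in zip(u, v))


def code_distance(codewords):
    """Distance of a code: the smallest distance between any two codewords."""
    return min(hamming_distance(u, v) for u, v in combinations(codewords, 2))


# Hypothetical code with four codewords, used only for illustration.
CODEWORDS = ["00000", "01011", "10101", "11110"]

print(hamming_distance("00000", "01011"))  # 3
print(code_distance(CODEWORDS))            # 3
```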
Error detection and correction

• To reliably detect d single-bit errors, the code must have distance d+1.
• To reliably correct d single-bit errors, the code must have distance 2d+1.
• In general, the price of the redundant bits is too high (!!) to do error correction for all network messages
• Thus the safety and integrity of network communication are based on error-detecting codes and extra transmissions in case any errors are detected

[Figure: (a) a code with poor distance properties; (b) a code with good distance properties. x = codewords, o = non-codewords; the code distance is marked between codewords.]

Error-Detection System using Check Bits

[Figure: block diagram of the system]
• Sender: calculate check bits from the information bits
• Channel: carries the information bits and the check bits
• Receiver: recalculate check bits from the received information bits and compare them with the received check bits
• Information accepted if check bits match

Cyclic Redundancy Checking (CRC)

• Cyclic redundancy check (CRC) is a popular technique for detecting data transmission errors.
• Transmitted messages are divided into blocks of predetermined length, and each block is divided by a fixed divisor.
• The remainder of this division is appended to the message and sent with it.
• When the message is received, the computer recalculates the remainder and compares it to the transmitted remainder.
• If the numbers do not match, an error is detected.
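The send-and-check procedure described above can be sketched as follows. The slides do not fix the arithmetic or the divisor; this sketch assumes the usual CRC convention of modulo-2 (XOR) long division on bit strings, and the divisor `1011`, the message, and the function names `crc_remainder` and `crc_check` are illustrative choices made here. The divisor is assumed to start with a 1 and to be at least two bits long.

```python
def crc_remainder(message_bits, divisor_bits):
    """Remainder of the message, padded with zeros, under modulo-2 division."""
    r = len(divisor_bits) - 1              # number of check bits
    bits = [int(b) for b in message_bits] + [0] * r
    divisor = [int(b) for b in divisor_bits]
    for i in range(len(message_bits)):
        if bits[i] == 1:                   # XOR the divisor in at position i
            for j, d in enumerate(divisor):
                bits[i + j] ^= d
    return "".join(str(b) for b in bits[-r:])


def crc_check(received_bits, divisor_bits):
    """Recalculate the remainder and compare it with the received one."""
    r = len(divisor_bits) - 1
    message, remainder = received_bits[:-r], received_bits[-r:]
    return crc_remainder(message, divisor_bits) == remainder


DIVISOR = "1011"                           # illustrative divisor
MESSAGE = "11010011101100"                 # illustrative message

remainder = crc_remainder(MESSAGE, DIVISOR)
sent = MESSAGE + remainder
print(remainder)                           # 100
print(crc_check(sent, DIVISOR))            # True

# Flip one bit in transit: the recalculated remainder no longer matches.
corrupted = sent[:5] + ("1" if sent[5] == "0" else "0") + sent[6:]
print(crc_check(corrupted, DIVISOR))       # False
```

Production systems use standardized, much longer divisors and table-driven or hardware implementations; the bit-string version is only meant to make the division step visible.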