

  1. Information Transmission, Chapter 5: Source coding. OVE EDFORS, ELECTRICAL AND INFORMATION TECHNOLOGY

  2. Learning outcomes
  ● After this lecture the student should
    – understand the basics of source coding,
    – know what a prefix-free source code is,
    – know how to calculate average codeword length,
    – understand the limits on source coding,
    – understand the concept of universal source coding, and
    – be able to perform encoding and decoding according to the Lempel-Ziv-Welch algorithm

  3. Where are we in the BIG PICTURE? Source coding. Lecture relates to pages 179-189 in the textbook.

  4. What did Shannon promise? • We can represent a source sequence from X, of length n, uniquely by, on average, nH(X) bits.
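Shannon's promise can be made concrete with a small calculation. The distribution below is a hypothetical example (not from the slides), chosen so the entropy comes out exactly:

```python
from math import log2

# Hypothetical source distribution (not from the slides): four symbols.
p = {"a": 0.5, "b": 0.25, "c": 0.125, "d": 0.125}

# Entropy H(X) = -sum p log2 p, in bits per source symbol.
H = -sum(q * log2(q) for q in p.values())

n = 1000  # length of the source sequence
print(H)      # 1.75 bits/symbol
print(n * H)  # 1750.0: bits sufficient, on average, for a length-1000 sequence
```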

  5. Prefix-free source code. We say that a sequence of length l is a prefix of another sequence if the first l symbols of the latter sequence are identical to the first sequence; in particular, a sequence is a prefix of itself. We then require that no codeword is the prefix of another codeword, and call such a code a prefix-free source code. The sequence 10011 has the prefixes 1, 10, 100, 1001, and 10011. The source code with codewords {00, 01, 1} is prefix-free, but {00, 10, 1} is not, since 1 is a prefix of 10.
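The prefix-free condition is easy to check mechanically; a minimal sketch, applied to the slide's two examples:

```python
def is_prefix_free(codewords):
    """Return True if no codeword is a prefix of another codeword."""
    for a in codewords:
        for b in codewords:
            if a != b and b.startswith(a):
                return False
    return True

print(is_prefix_free(["00", "01", "1"]))  # True
print(is_prefix_free(["00", "10", "1"]))  # False: 1 is a prefix of 10
```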

  6. Example. Consider the source code [table shown on the slide]. What is the average codeword length?
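The slide's code table is not reproduced in this transcript, so the table below is a hypothetical stand-in; the calculation itself is the one the slide asks for:

```python
# Hypothetical prefix-free code (the slide's actual table is not shown here):
# symbol -> (codeword, probability)
code = {
    "a": ("0",   0.5),
    "b": ("10",  0.25),
    "c": ("110", 0.125),
    "d": ("111", 0.125),
}

# Average codeword length L = sum over symbols of p(u) * len(codeword(u)).
L = sum(p * len(w) for w, p in code.values())
print(L)  # 1.75 binary digits per source symbol
```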

  7. Huffman code construction. An optimal way to find a prefix-free variable-length code was discovered by David Huffman in 1951, when he worked on a term paper as a student. Procedure:
  1. Let the source symbols be nodes with their respective probabilities.
  2. Combine the two least probable remaining nodes into a new node and calculate its probability.
  3. If more than one node is left, go back to step 2.
  4. Label each branch of the constructed tree either 0 or 1.
  5. Traversing the tree from the root to each symbol gives the codewords.
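The five steps above can be sketched with a priority queue; the probabilities are a hypothetical example, and which branch gets 0 or 1 is an arbitrary labelling choice:

```python
import heapq
from itertools import count

def huffman(probs):
    """Build a Huffman code: repeatedly merge the two least probable nodes."""
    tiebreak = count()  # avoids comparing subtrees when probabilities are equal
    heap = [(p, next(tiebreak), sym) for sym, p in probs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:                     # step 3: repeat until one node left
        p1, _, left = heapq.heappop(heap)    # least probable node
        p2, _, right = heapq.heappop(heap)   # second least probable node
        heapq.heappush(heap, (p1 + p2, next(tiebreak), (left, right)))
    # Steps 4-5: label branches 0/1 and read codewords from root to leaves.
    codes = {}
    def walk(node, word):
        if isinstance(node, tuple):
            walk(node[0], word + "0")
            walk(node[1], word + "1")
        else:
            codes[node] = word or "0"
    walk(heap[0][2], "")
    return codes

codes = huffman({"a": 0.5, "b": 0.25, "c": 0.125, "d": 0.125})
print(codes)  # {'a': '0', 'b': '10', 'c': '110', 'd': '111'}
```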

  8. Path length lemma. In a rooted tree with probabilities, the average depth of the leaves is equal to the sum of the probabilities of the inner (non-leaf) nodes, including the root.
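The lemma can be checked numerically on a small hypothetical tree: four leaves a, b, c, d at depths 1, 2, 3, 3, whose inner nodes are the root, the subtree over {b, c, d}, and the subtree over {c, d}:

```python
# Hypothetical rooted tree with probabilities:
# leaf -> (depth, probability)
leaves = {"a": (1, 0.5), "b": (2, 0.25), "c": (3, 0.125), "d": (3, 0.125)}

# Average depth of the leaves.
avg_depth = sum(depth * p for depth, p in leaves.values())

# Probabilities of the inner nodes: root, subtree {b,c,d}, subtree {c,d}.
inner_node_probs = [1.0, 0.5, 0.25]

print(avg_depth, sum(inner_node_probs))  # both are 1.75, as the lemma says
```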

  9. Example. What are the average word length and the uncertainty/entropy of the source? [Tree and calculations shown on the slide.] The average word length is about 2% away from what is possible.

  10. Reaching the limit. If we encode consecutive source symbols pairwise, that is, use the Huffman code for pairs of source symbols, we obtain an average codeword length per single source symbol that is closer to the uncertainty of the source, H(U) = 2.35.
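The effect of pairwise encoding can be demonstrated on a hypothetical binary source (the slide's source with H(U) = 2.35 is not reproduced in this transcript). Note that the average Huffman codeword length equals the sum of the probabilities of the merged nodes, by the path length lemma:

```python
import heapq
from itertools import product

def huffman_avg_len(probs):
    """Average Huffman codeword length, via the path length lemma:
    it equals the sum of the probabilities of all merged (inner) nodes."""
    heap = list(probs)
    heapq.heapify(heap)
    total = 0.0
    while len(heap) > 1:
        merged = heapq.heappop(heap) + heapq.heappop(heap)
        total += merged
        heapq.heappush(heap, merged)
    return total

# Hypothetical source: P(A) = 0.7, P(B) = 0.3, entropy about 0.881 bits/symbol.
p = {"A": 0.7, "B": 0.3}
single = huffman_avg_len(p.values())              # 1.0 bit per symbol
pairs = [pa * pb for pa, pb in product(p.values(), repeat=2)]
per_symbol = huffman_avg_len(pairs) / 2           # 0.905 bits per symbol
print(single, per_symbol)  # pairwise coding moves closer to the entropy
```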

  11. A universal source coding algorithm. The LZW algorithm is due to Ziv, Lempel, and Welch and belongs to the class of so-called universal source-coding algorithms, which means that we do not need to know the source statistics. The algorithm is easy to implement, and for long sequences it approaches the uncertainty of the source; it is asymptotically optimal.

  12. Basic procedure:
  1. Initialize the dictionary.
  2. Find the longest string W in the dictionary that matches the current input.
  3. Emit the dictionary index for W to the output and remove W from the input.
  4. Add W followed by the next symbol in the input to the dictionary.
  5. Go to step 2.
  Suppose we want to compress the sentence: DO_NOT_TROUBLE_TROUBLE_UNTIL_TROUBLE_TROUBLES_YOU!
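The steps above can be sketched as follows. One assumption to flag: this sketch initializes the dictionary with the distinct symbols of the message in sorted order, whereas the slide's table may use a different initial dictionary (for example, the full ASCII alphabet), so the emitted indices need not match the slide:

```python
def lzw_encode(text):
    """LZW: greedily match the longest dictionary string W, emit its index,
    then add W followed by the next input symbol to the dictionary."""
    # Step 1: initialize the dictionary (here: the message's own alphabet).
    dictionary = {ch: i for i, ch in enumerate(sorted(set(text)))}
    out = []
    w = ""
    for ch in text:
        if w + ch in dictionary:                  # step 2: extend the match
            w += ch
        else:
            out.append(dictionary[w])             # step 3: emit index of W
            dictionary[w + ch] = len(dictionary)  # step 4: add W + next symbol
            w = ch
    out.append(dictionary[w])                     # flush the final match
    return out, dictionary

indices, d = lzw_encode("DO_NOT_TROUBLE_TROUBLE_UNTIL_TROUBLE_TROUBLES_YOU!")
print(indices)  # fewer indices than input symbols: repeats are exploited
```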

  13. DO_NOT_TROUBLE_TROUBLE_UNTIL_TROUBLE_TROUBLES_YOU! [Step-by-step encoding table shown on the slide.]
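The learning outcomes also ask for decoding. A decoder sketch follows, assuming the dictionary is initialized with the alphabet in a known order (a hypothetical convention; the slide's table may assign indices differently). The one subtle case is an emitted index that is not in the dictionary yet: it can only refer to the entry being built in this very step, which must be the previous string plus its own first symbol:

```python
def lzw_decode(indices, alphabet):
    """Rebuild the LZW dictionary while decoding the index sequence."""
    dictionary = {i: ch for i, ch in enumerate(alphabet)}
    prev = dictionary[indices[0]]
    out = [prev]
    for k in indices[1:]:
        if k in dictionary:
            entry = dictionary[k]
        else:                        # index defined by this very step
            entry = prev + prev[0]
        dictionary[len(dictionary)] = prev + entry[0]
        out.append(entry)
        prev = entry
    return "".join(out)

# [0, 1, 2, 4] is the LZW encoding of "ABABABA" over the alphabet A=0, B=1;
# index 4 exercises the not-yet-in-dictionary special case.
print(lzw_decode([0, 1, 2, 4], "AB"))  # ABABABA
```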

  14. Evaluation. Without compression we need as many as 50 * 8 = 400 binary digits to represent the sentence as a string of 50 ASCII symbols. If we sum the number of binary digits needed for the 38 steps shown in the table, we get only 271 binary digits. A highly optimized version of the LZW algorithm we have described is widely used in practice to compress computer files in all major operating systems.

  15. Summary
  ● Most natural types of data can be compressed
  ● The uncertainty, or entropy, of the source determines how much we can compress data without losing anything
  ● Prefix-free variable-length source codes can be uniquely decoded
  ● Huffman's code construction is optimal for a given set of symbols
  ● Grouping symbols can reduce average codeword length, but not further down than what is given by the uncertainty/entropy
  ● Universal source coding builds a coding table on the fly, without knowing the source probabilities in advance
  ● Lempel-Ziv-Welch is the most well-known universal source coder, applied in all modern operating systems for compressing files
