

  1. Information Transmission, Chapter 5: Source coding. OVE EDFORS, ELECTRICAL AND INFORMATION TECHNOLOGY

  2. Learning outcomes
  ● After this lecture the student should
    – understand the basics of source coding,
    – know what a prefix-free source code is,
    – know how to calculate average codeword length,
    – understand the limits on source coding,
    – understand the concept of universal source coding, and
    – be able to perform encoding and decoding according to the Lempel-Ziv-Welch algorithm

  3. Where are we in the BIG PICTURE? Source coding. Lecture relates to pages 179-189 in the textbook.

  4. What did Shannon promise? • We can represent a source sequence from X, of length n, uniquely by, on average, nH(X) bits.
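Shannon's promise can be made concrete with a small calculation. The distribution below is a hypothetical example (not from the slides), chosen so the entropy comes out exactly:

```python
from math import log2

# Hypothetical source distribution (not from the slides): four symbols.
p = {"a": 0.5, "b": 0.25, "c": 0.125, "d": 0.125}

# Entropy H(X) = -sum p log2 p, in bits per source symbol.
H = -sum(q * log2(q) for q in p.values())

n = 1000  # length of the source sequence
print(H)      # 1.75 bits/symbol
print(n * H)  # 1750.0: bits sufficient, on average, for a length-1000 sequence
```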

  5. Prefix-free source code. We say that a sequence of length l is a prefix of another sequence if the first l symbols of the latter sequence are identical to the first sequence; in particular, a sequence is a prefix of itself. We then require that no codeword is the prefix of another codeword, and call such a code a prefix-free source code. The sequence 10011 has the prefixes 1, 10, 100, 1001, and 10011. The source code with codewords {00, 01, 1} is prefix-free, but {00, 10, 1} is not, since 1 is a prefix of 10.
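The prefix-free condition is easy to check mechanically; a minimal sketch, applied to the slide's two examples:

```python
def is_prefix_free(codewords):
    """Return True if no codeword is a prefix of another codeword."""
    for a in codewords:
        for b in codewords:
            if a != b and b.startswith(a):
                return False
    return True

print(is_prefix_free(["00", "01", "1"]))  # True
print(is_prefix_free(["00", "10", "1"]))  # False: 1 is a prefix of 10
```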

  6. Example. Consider the source code [table shown on the slide]. What is the average codeword length?
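The slide's code table is not reproduced in this transcript, so the table below is a hypothetical stand-in; the calculation itself is the one the slide asks for:

```python
# Hypothetical prefix-free code (the slide's actual table is not shown here):
# symbol -> (codeword, probability)
code = {
    "a": ("0",   0.5),
    "b": ("10",  0.25),
    "c": ("110", 0.125),
    "d": ("111", 0.125),
}

# Average codeword length L = sum over symbols of p(u) * len(codeword(u)).
L = sum(p * len(w) for w, p in code.values())
print(L)  # 1.75 binary digits per source symbol
```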

  7. Huffman code construction. An optimal way to find a prefix-free variable-length code was discovered by David Huffman in 1951, when he worked on a term paper as a student. Procedure:
  1. Let the source symbols be nodes with their respective probabilities.
  2. Combine the two least probable remaining nodes into a new node and calculate its probability.
  3. If more than one node is left, go back to step 2.
  4. Label each branch of the constructed tree either 0 or 1.
  5. Traversing the tree from the root to each symbol gives the codewords.
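The five steps above can be sketched with a priority queue; the probabilities are a hypothetical example, and which branch gets 0 or 1 is an arbitrary labelling choice:

```python
import heapq
from itertools import count

def huffman(probs):
    """Build a Huffman code: repeatedly merge the two least probable nodes."""
    tiebreak = count()  # avoids comparing subtrees when probabilities are equal
    heap = [(p, next(tiebreak), sym) for sym, p in probs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:                     # step 3: repeat until one node left
        p1, _, left = heapq.heappop(heap)    # least probable node
        p2, _, right = heapq.heappop(heap)   # second least probable node
        heapq.heappush(heap, (p1 + p2, next(tiebreak), (left, right)))
    # Steps 4-5: label branches 0/1 and read codewords from root to leaves.
    codes = {}
    def walk(node, word):
        if isinstance(node, tuple):
            walk(node[0], word + "0")
            walk(node[1], word + "1")
        else:
            codes[node] = word or "0"
    walk(heap[0][2], "")
    return codes

codes = huffman({"a": 0.5, "b": 0.25, "c": 0.125, "d": 0.125})
print(codes)  # {'a': '0', 'b': '10', 'c': '110', 'd': '111'}
```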

  8. Path length lemma. In a rooted tree with probabilities, the average depth of the leaves is equal to the sum of the probabilities of the inner (non-leaf) nodes, including the root.
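The lemma can be checked numerically on a small hypothetical tree: four leaves a, b, c, d at depths 1, 2, 3, 3, whose inner nodes are the root, the subtree over {b, c, d}, and the subtree over {c, d}:

```python
# Hypothetical rooted tree with probabilities:
# leaf -> (depth, probability)
leaves = {"a": (1, 0.5), "b": (2, 0.25), "c": (3, 0.125), "d": (3, 0.125)}

# Average depth of the leaves.
avg_depth = sum(depth * p for depth, p in leaves.values())

# Probabilities of the inner nodes: root, subtree {b,c,d}, subtree {c,d}.
inner_node_probs = [1.0, 0.5, 0.25]

print(avg_depth, sum(inner_node_probs))  # both are 1.75, as the lemma says
```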

  9. Example. What are the average word length and the uncertainty/entropy of the source? [Tree and calculations shown on the slide.] The average word length is about 2% away from what is possible.

  10. Reaching the limit. If we encode consecutive source symbols pairwise, that is, use the Huffman code for pairs of source symbols, we obtain an average codeword length per single source symbol that is closer to the uncertainty of the source, H(U) = 2.35.
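The effect of pairwise encoding can be demonstrated on a hypothetical binary source (the slide's source with H(U) = 2.35 is not reproduced in this transcript). Note that the average Huffman codeword length equals the sum of the probabilities of the merged nodes, by the path length lemma:

```python
import heapq
from itertools import product

def huffman_avg_len(probs):
    """Average Huffman codeword length, via the path length lemma:
    it equals the sum of the probabilities of all merged (inner) nodes."""
    heap = list(probs)
    heapq.heapify(heap)
    total = 0.0
    while len(heap) > 1:
        merged = heapq.heappop(heap) + heapq.heappop(heap)
        total += merged
        heapq.heappush(heap, merged)
    return total

# Hypothetical source: P(A) = 0.7, P(B) = 0.3, entropy about 0.881 bits/symbol.
p = {"A": 0.7, "B": 0.3}
single = huffman_avg_len(p.values())              # 1.0 bit per symbol
pairs = [pa * pb for pa, pb in product(p.values(), repeat=2)]
per_symbol = huffman_avg_len(pairs) / 2           # 0.905 bits per symbol
print(single, per_symbol)  # pairwise coding moves closer to the entropy
```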

  11. A universal source coding algorithm. The LZW algorithm is due to Ziv, Lempel, and Welch and belongs to the class of so-called universal source-coding algorithms, which means that we do not need to know the source statistics. The algorithm is easy to implement, and for long sequences it approaches the uncertainty of the source; it is asymptotically optimal.

  12. Basic procedure:
  1. Initialize the dictionary.
  2. Find the longest string W in the dictionary that matches the current input.
  3. Emit the dictionary index for W to the output and remove W from the input.
  4. Add W followed by the next symbol in the input to the dictionary.
  5. Go to step 2.
  Suppose we want to compress the sentence: DO_NOT_TROUBLE_TROUBLE_UNTIL_TROUBLE_TROUBLES_YOU!
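The steps above can be sketched as follows. One assumption to flag: this sketch initializes the dictionary with the distinct symbols of the message in sorted order, whereas the slide's table may use a different initial dictionary (for example, the full ASCII alphabet), so the emitted indices need not match the slide:

```python
def lzw_encode(text):
    """LZW: greedily match the longest dictionary string W, emit its index,
    then add W followed by the next input symbol to the dictionary."""
    # Step 1: initialize the dictionary (here: the message's own alphabet).
    dictionary = {ch: i for i, ch in enumerate(sorted(set(text)))}
    out = []
    w = ""
    for ch in text:
        if w + ch in dictionary:                  # step 2: extend the match
            w += ch
        else:
            out.append(dictionary[w])             # step 3: emit index of W
            dictionary[w + ch] = len(dictionary)  # step 4: add W + next symbol
            w = ch
    out.append(dictionary[w])                     # flush the final match
    return out, dictionary

indices, d = lzw_encode("DO_NOT_TROUBLE_TROUBLE_UNTIL_TROUBLE_TROUBLES_YOU!")
print(indices)  # fewer indices than input symbols: repeats are exploited
```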

  13. DO_NOT_TROUBLE_TROUBLE_UNTIL_TROUBLE_TROUBLES_YOU! [Step-by-step encoding table shown on the slide.]
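The learning outcomes also ask for decoding. A decoder sketch follows, assuming the dictionary is initialized with the alphabet in a known order (a hypothetical convention; the slide's table may assign indices differently). The one subtle case is an emitted index that is not in the dictionary yet: it can only refer to the entry being built in this very step, which must be the previous string plus its own first symbol:

```python
def lzw_decode(indices, alphabet):
    """Rebuild the LZW dictionary while decoding the index sequence."""
    dictionary = {i: ch for i, ch in enumerate(alphabet)}
    prev = dictionary[indices[0]]
    out = [prev]
    for k in indices[1:]:
        if k in dictionary:
            entry = dictionary[k]
        else:                        # index defined by this very step
            entry = prev + prev[0]
        dictionary[len(dictionary)] = prev + entry[0]
        out.append(entry)
        prev = entry
    return "".join(out)

# [0, 1, 2, 4] is the LZW encoding of "ABABABA" over the alphabet A=0, B=1;
# index 4 exercises the not-yet-in-dictionary special case.
print(lzw_decode([0, 1, 2, 4], "AB"))  # ABABABA
```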

  14. Evaluation. Without compression we need as many as 50 * 8 = 400 binary digits to represent the sentence as a string of 50 ASCII symbols. If we sum the number of binary digits needed for the 38 steps shown in the table, we get only 271 binary digits. A highly optimized version of the LZW algorithm we have described is widely used in practice to compress computer files in all major operating systems.

  15. Summary
  ● Most natural types of data can be compressed
  ● The uncertainty, or entropy, of the source determines how much we can compress data without losing anything
  ● Prefix-free variable-length source codes can be uniquely decoded
  ● Huffman's code construction is optimal for a given set of symbols
  ● Grouping symbols can reduce average codeword length, but not further down than what is given by the uncertainty/entropy
  ● Universal source coding builds a coding table on the fly, without knowing the source probabilities in advance
  ● Lempel-Ziv-Welch is the most well-known universal source coder, applied in all modern operating systems for compressing files
