
Huffman Trees

Greedy Algorithm for Data Compression

Tyler Moore

CS 2123, The University of Tulsa

Some slides created by or adapted from Dr. Kevin Wayne. For more information see https://www.cs.princeton.edu/courses/archive/fall12/cos226/lectures.php


Data compression

Compression reduces the size of a file:

  • To save space when storing it.
  • To save time when transmitting it.
  • Most files have lots of redundancy.

Who needs compression?

  • Moore's law: # transistors on a chip doubles every 18–24 months.
  • Parkinson's law: data expands to fill the space available.
  • Text, images, sound, video, …

The basic concepts are ancient (1950s); the best technology was developed recently.

“Every day, we create 2.5 quintillion bytes of data—so much that 90% of the data in the world today has been created in the last two years alone.” — IBM report on big data (2011)

Applications

Generic file compression.

  • Files: GZIP, BZIP, 7z.
  • Archivers: PKZIP.
  • File systems: NTFS, HFS+, ZFS.

Multimedia.

  • Images: GIF, JPEG.
  • Sound: MP3.
  • Video: MPEG, DivX™, HDTV.

Communication.

  • ITU-T T4 Group 3 Fax.
  • V.42bis modem.
  • Skype.

Databases.

  • Google, Facebook, ....

Lossless compression and expansion

  • Message. Binary data B we want to compress.
  • Compress. Generates a "compressed" representation C(B).
  • Expand. Reconstructs the original bitstream B.

Compression ratio. Bits in C(B) / bits in B.

  • Ex. 50–75% or better compression ratio for natural language.

[Figure: basic model for data compression. Original bitstream B (0110110101...) → Compress → compressed version C(B) (1101011111..., which uses fewer bits, you hope) → Expand → original bitstream B (0110110101...).]


Rdenudcany in Enlgsih lnagugae

  • Q. How mcuh rdenudcany is in the Enlgsih lnagugae?
  • A. Quite a bit.

“... randomising letters in the middle of words [has] little or no effect on the ability of skilled readers to understand the text. This is easy to denmtrasote. In a pubiltacion of New Scnieitst you could ramdinose all the letetrs, keipeng the first two and last two the same, and reibadailty would hadrly be aftcfeed. My ansaylis did not come to much beucase the thoery at the time was for shape and senqeuce retigcionon. Saberi's work sugsegts we may have some pofrweul palrlael prsooscers at work. The resaon for this is suerly that idnetiyfing coentnt by paarllel prseocsing speeds up regnicoiton. We only need the first and last two letetrs to spot chganes in meniang.” — Graham Rawlinson

Variable-length codes

Use a different number of bits to encode different chars.

  • Ex. Morse code: • • • − − − • • •
  • Issue. Ambiguity: is • • • − − − • • • SOS? V7? IAMIE? EEWNI?

In practice. Use a medium gap to separate codewords.

  • Q. How do we avoid ambiguity? (E.g., the codeword for S is a prefix of the codeword for V.)
  • A. Ensure that no codeword is a prefix of another.

Ex 1. Fixed-length code.
Ex 2. Append a special stop char to each codeword.
Ex 3. General prefix-free code.
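The prefix-free property is easy to test in code. The sketch below (plain Python, not from the slides; the helper name is my own) checks the 29-bit codeword table used later in the deck, and a Morse-style pair where one codeword is a prefix of another:

```python
def is_prefix_free(codes):
    """True if no codeword is a prefix of another.

    Sorting puts any codeword immediately before its extensions,
    so only adjacent pairs need checking.
    """
    cs = sorted(codes.values())
    return all(not cs[i + 1].startswith(cs[i]) for i in range(len(cs) - 1))

# The 29-bit prefix-free code for "ABRACADABRA!" from these slides:
code = {'A': '11', 'B': '00', 'C': '010', 'D': '100', 'R': '011', '!': '101'}
print(is_prefix_free(code))                       # True
print(is_prefix_free({'S': '000', 'V': '0001'}))  # False: S's codeword prefixes V's
```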

Prefix-free codes: trie representation

  • Q. How to represent the prefix-free code?
  • A. A binary trie!

  • Chars in leaves.
  • Codeword is the path from root to leaf.

[Figure: two prefix-free codes for "ABRACADABRA!", each shown as a codeword table, a trie representation, and the compressed bitstring.

Code 1 (key: value): ! = 101, A = 0, B = 1111, C = 110, D = 100, R = 1110.
Compressed bitstring: 011111110011001000111111100101 (30 bits).

Code 2 (key: value): ! = 101, A = 11, B = 00, C = 010, D = 100, R = 011.
Compressed bitstring: 11000111101011100110001111101 (29 bits).]
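To make the 30-bit vs. 29-bit comparison concrete, here is a small encoding check (my own plain-Python sketch) using the two codeword tables for "ABRACADABRA!":

```python
# Two prefix-free codes for the same message, from the slides.
code1 = {'A': '0', 'B': '1111', 'C': '110', 'D': '100', 'R': '1110', '!': '101'}
code2 = {'A': '11', 'B': '00', 'C': '010', 'D': '100', 'R': '011', '!': '101'}

def encode(msg, code):
    """Concatenate the codeword for each character of the message."""
    return ''.join(code[ch] for ch in msg)

msg = 'ABRACADABRA!'
b1 = encode(msg, code1)
b2 = encode(msg, code2)
print(len(b1), len(b2))  # 30 29
```

Different prefix-free codes for the same message can cost a different number of bits; finding the cheapest one is exactly the problem Huffman's algorithm solves below.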


Prefix-free codes: compression and expansion

Compression.

  • Method 1: start at the leaf; follow the path up to the root; print the bits in reverse.
  • Method 2: create a symbol table (ST) of key-value pairs.

Expansion.

  • Start at the root.
  • Go left if the bit is 0; go right if 1.
  • If a leaf node, print the char and return to the root.

[Figure: the same two prefix-free codes for "ABRACADABRA!" as above (29-bit and 30-bit encodings), each with its codeword table, trie representation, and compressed bitstring.]
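The expansion steps can be sketched directly against a nested-list trie (left child at index 0, right child at index 1, the representation these slides use later). The sketch and this particular trie layout for the 29-bit code are my own reconstruction:

```python
def expand(trie, bits):
    """Decode a bitstring by walking the trie: 0 goes left, 1 goes right;
    at a leaf, emit the char and restart from the root."""
    out, node = [], trie
    for bit in bits:
        node = node[0] if bit == '0' else node[1]
        if isinstance(node, str):   # reached a leaf
            out.append(node)
            node = trie             # return to the root
    return ''.join(out)

# Trie for the 29-bit code: A=11, B=00, C=010, D=100, R=011, !=101
trie = [['B', ['C', 'R']], [['D', '!'], 'A']]
print(expand(trie, '11000111101011100110001111101'))  # ABRACADABRA!
```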

Average weighted code length

Definition

Given a set of symbols s ∈ S and corresponding frequencies f_s, where Σ_{s∈S} f_s = 1, the average weighted code length using a binary trie is

    Σ_{s∈S} f_s · Depth(s).

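Applying the definition to the 29-bit code for "ABRACADABRA!" (a quick sketch in plain Python; the frequencies come from the 12-character message itself):

```python
# Frequencies of each symbol in "ABRACADABRA!" (they sum to 1).
freq = {'A': 5/12, 'B': 2/12, 'C': 1/12, 'D': 1/12, 'R': 2/12, '!': 1/12}

# Depth of each symbol's leaf in the 29-bit code's trie = its codeword length.
depth = {'A': 2, 'B': 2, 'C': 3, 'D': 3, 'R': 3, '!': 3}

avg = sum(freq[s] * depth[s] for s in freq)
print(avg)  # 29/12 ≈ 2.4167 bits per symbol, i.e. 29 bits for the 12 symbols
```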


Shannon-Fano codes

  • Q. How to find best prefix-free code?

Shannon-Fano algorithm:

  • Partition the symbols S into two subsets S0 and S1 of (roughly) equal frequency.
  • Codewords for symbols in S0 start with 0; for symbols in S1, with 1.
  • Recur in S0 and S1.

Problem 1. How to divide up the symbols?
Problem 2. Not optimal!

S0 = codewords starting with 0:
char freq encoding
A    5    0...
C    1    0...

S1 = codewords starting with 1:
char freq encoding
B    2    1...
D    1    1...
R    2    1...
!    1    1...

Shannon-Fano Codes Exercise
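A minimal sketch of the algorithm (my own Python, assuming one common greedy answer to Problem 1: sort by frequency and cut where the running total first reaches half; other splitting rules, such as the slide's {A, C} vs. {B, D, R, !} partition, are equally valid):

```python
def shannon_fano(freqs):
    """freqs: dict symbol -> frequency. Returns dict symbol -> codeword."""
    codes = {s: '' for s in freqs}

    def split(group):
        if len(group) <= 1:
            return
        total = sum(freqs[s] for s in group)
        acc, cut = 0, len(group) - 1   # fallback keeps both halves nonempty
        for i in range(1, len(group)):
            acc += freqs[group[i - 1]]
            if acc >= total / 2:       # greedy cut at (roughly) half the weight
                cut = i
                break
        for s in group[:cut]:          # first subset's codewords gain a 0 ...
            codes[s] += '0'
        for s in group[cut:]:          # ... second subset's gain a 1
            codes[s] += '1'
        split(group[:cut])             # recur in S0 and S1
        split(group[cut:])

    split(sorted(freqs, key=freqs.get, reverse=True))
    return codes

codes = shannon_fano({'A': 5, 'B': 2, 'R': 2, 'C': 1, 'D': 1, '!': 1})
```

With this splitting rule the "ABRACADABRA!" frequencies cost 29 bits in total, while a Huffman code for the same frequencies needs only 28, which is Problem 2 in action.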



Procedure for Creating Huffman Tree and Codes (RLW)

1. Initialize each symbol into a one-node tree with weight corresponding to the probability of the symbol's occurrence.
2. REPEAT
   a. Select the two trees with the smallest weights (break ties randomly).
   b. Combine the two trees into one tree whose root weight is the sum of the weights of the two trees.
   UNTIL one tree remains.
3. Assign 0 to each left edge and 1 to each right edge in the tree.
4. The Huffman code for a symbol is the binary value read along the path from root to leaf.
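The four steps can be sketched directly as follows (my own plain-Python rendering of the procedure; the heapq-based version on a later slide is the efficient variant):

```python
def huffman_codes(freqs):
    """Build a Huffman tree per the procedure; return symbol -> codeword."""
    # Step 1: one-node trees, carried as (weight, tree) pairs.
    trees = [(w, s) for s, w in freqs.items()]
    # Step 2: repeatedly combine the two smallest-weight trees.
    while len(trees) > 1:
        trees.sort(key=lambda t: t[0])           # smallest weights first
        (wa, a), (wb, b) = trees[0], trees[1]
        trees = [(wa + wb, [a, b])] + trees[2:]  # combined tree replaces them
    # Steps 3-4: 0 on left edges, 1 on right; codeword = root-to-leaf path.
    codes = {}
    def walk(node, path):
        if isinstance(node, str):
            codes[node] = path
        else:
            walk(node[0], path + '0')
            walk(node[1], path + '1')
    walk(trees[0][1], '')
    return codes

codes = huffman_codes({'A': 5, 'B': 2, 'R': 2, 'C': 1, 'D': 1, '!': 1})
```

Individual codeword lengths depend on how ties are broken, but every Huffman tree for a given frequency set has the same (optimal) total weighted length.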


Huffman Code Examples


Huffman Coding: Implementation

We will use nested lists to represent the Huffman tree structure in Python


Huffman Coding: Implementation

To efficiently implement Huffman codes, we must use a priority queue:

  • Place the initial trees onto the queue with their associated frequencies.
  • Add merged trees back onto the priority queue with updated frequencies.



Huffman Coding: Implementing the Tree

from heapq import heapify, heappush, heappop

def huffman(seq, frq):
    trees = list(zip(frq, seq))
    heapify(trees)                          # A min-heap based on freq
    while len(trees) > 1:                   # Until all are combined
        fa, a = heappop(trees)              # Get the two smallest trees
        fb, b = heappop(trees)
        heappush(trees, (fa + fb, [a, b]))  # Combine and re-add
    return trees[0][-1]
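One caveat (my observation, not from the slides): when two heap entries tie on frequency, heapq falls back to comparing the second tuple elements, and in Python 3 comparing a str leaf with a list subtree raises a TypeError. A common fix is an insertion counter as tiebreaker:

```python
from heapq import heapify, heappush, heappop
from itertools import count

def huffman_safe(seq, frq):
    """huffman() with a counter so tied frequencies never compare trees."""
    num = count()
    trees = [(f, next(num), s) for f, s in zip(frq, seq)]
    heapify(trees)                          # min-heap on (freq, counter)
    while len(trees) > 1:
        fa, _, a = heappop(trees)           # get the two smallest trees
        fb, _, b = heappop(trees)
        heappush(trees, (fa + fb, next(num), [a, b]))
    return trees[0][-1]

tree = huffman_safe('abcdefghi', [4, 5, 6, 9, 11, 12, 15, 16, 20])
```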


Huffman Coding: Implementing the Codes

def codes(tree, prefix=""):
    if len(tree) == 1:
        return [tree, prefix]
    return codes(tree[0], prefix + "0") + \
           codes(tree[1], prefix + "1")

def codesd(tr):
    cmap = {}
    def codesh(tree, prefix=""):
        if len(tree) == 1:
            cmap[tree] = prefix
        else:
            codesh(tree[0], prefix + "0")
            codesh(tree[1], prefix + "1")
    codesh(tr, "")
    return cmap


Huffman Coding: Implementation

seq = "abcdefghi"
frq = [4, 5, 6, 9, 11, 12, 15, 16, 20]
htree = huffman(seq, frq)
print htree
print codes(htree)
ch = codesd(htree)
print ch
text = "abbafabgee"
print text, "encodes to:"
print "".join([ch[x] for x in text])

"""
[['i', [['a', 'b'], 'e']], [['f', 'g'], [['c', 'd'], 'h']]]
['i', '00', 'a', '0100', 'b', '0101', 'e', '011', 'f', '100', 'g', '101',
 'c', '1100', 'd', '1101', 'h', '111']
{'a': '0100', 'c': '1100', 'b': '0101', 'e': '011', 'd': '1101',
 'g': '101', 'f': '100', 'i': '00', 'h': '111'}
abbafabgee encodes to:
010001010101010010001000101101011011
"""
