



  1. DL - 2004 Compression – Beeri/Feitelson

Compression
• Introduction
• Information theory
• Text compression
• IL compression

General:
Compression methods depend on data characteristics ⇒ there is no universal (best) method.
Requirements:
• text, ELs: lossless
• images: may be lossy
• efficiency: how many bits per byte of data? (often given as a percentage)
• coding should be fast, decoding superfast

Compression vs. communications:
[Figure: communication model: source, line with noise, destination]
Minor difference:
• Communication is always on-line
• Compression is on/off line (off-line: complete file given)

A general model for statistics-based compression:
[Figure: compression model: file, coder, decoder, with a Model attached to the coder and to the decoder]
• Same model must be used at both sides
• Model is (often) stored in compressed file; its size affects compression efficiency

Appetizer: Huffman coding
Assume:
• Source alphabet: $s_1, \ldots, s_q$
• symbol probabilities: $p_1, \ldots, p_q$ ($p_i > 0$, $\sum_i p_i = 1$)
• coding alphabet: binary, {0,1}
(standard) binary coding:
• Uniquely decodable
• Model = table
• efficiency: $\lceil \log q \rceil$ bits/symbol (no/little compression)
Can do better if symbol frequencies are known:
• frequent symbol: short code
• rare symbol: long code
Minimizes the average

Huffman's Algorithm (eager construction of code tree):
• Allocate a node for each symbol, weight = symbol probability
• Enter nodes into priority queue Q (small weights first)
• While |Q| > 1 {
    – Remove two first nodes (smallest weights)
    – Create new node, make it their parent, assign it the sum of their weights
    – Enter new node into Q
  }
Return: single node in Q (root of tree)
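The priority-queue loop of Huffman's algorithm can be sketched in Python as follows. This is a minimal illustration, not code from the slides: the function name `huffman_code`, the use of `heapq` as the priority queue, the insertion-order tie-breaking, and the convention that the first removed node becomes the 0-child are all assumptions made for the example. Symbols are assumed to be strings.

```python
import heapq
from fractions import Fraction
from itertools import count


def huffman_code(probs):
    """Return {symbol: codeword} for a dict {symbol: probability}.

    Assumption: symbols are strings, so a tuple always denotes an
    internal node of the form (left_child, right_child).
    """
    tie = count()  # unique counter, so the heap never compares nodes
    # Allocate a node for each symbol; weight = symbol probability.
    heap = [(p, next(tie), sym) for sym, p in probs.items()]
    heapq.heapify(heap)  # small weights first
    while len(heap) > 1:
        # Remove the two nodes with the smallest weights.
        w1, _, n1 = heapq.heappop(heap)
        w2, _, n2 = heapq.heappop(heap)
        # New parent node whose weight is the sum of the two weights.
        heapq.heappush(heap, (w1 + w2, next(tie), (n1, n2)))
    _, _, root = heap[0]

    codes = {}

    def walk(node, path):
        if isinstance(node, tuple):      # internal node
            walk(node[0], path + "0")
            walk(node[1], path + "1")
        else:                            # leaf: path from root is the code
            codes[node] = path or "0"    # single-symbol alphabet edge case

    walk(root, "")
    return codes


example = {"a": Fraction(1, 2), "b": Fraction(1, 4),
           "c": Fraction(1, 8), "d": Fraction(1, 8)}
print(huffman_code(example))
```

With a different tie-breaking rule or child order the code words can differ, but the code-word lengths for this distribution stay 1, 2, 3, 3.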

  2. Example: a: 1/2, b: 1/4, c, d: 1/8
[Figure: step-by-step construction of the Huffman tree from the queue Q; leaves 1/2, 1/4, 1/8, 1/8, internal nodes 1/4 and 1/2, root 1]

How are the trees used?
• Coding: for each symbol s, output binary path from root to leaf(s)
• Decoding: read incoming stream of bits, follow path from root of tree. When leaf(s) reached, output s, and return to root.
• Common model (stored on both sides): the tree

Expected cost bits/symbol:
• Binary: $\lceil \log q \rceil$
• Huffman: $\sum_i p_i l_i$ ($l_i$ = length of path from root to leaf($s_i$))
In example:
• binary: 2
• Huffman: 1/2 × 1 + 1/4 × 2 + 1/8 × 3 + 1/8 × 3 = 1.75
Q: what would be the tree and cost for: 5/12, 1/3, 1/6, 1/12 ?

A note on Huffman trees:
The algorithm is non-deterministic:
• In each step, either node can be the left child of new parent
  If two children of a node are exchanged, result is also a Huffman tree
  Closure under rotation w.r.t. nodes
• Consider 0.4, 0.2, 0.2, 0.1, 0.1
  after 1st step, 2 out of 3 nodes are selected
⇒ There are many Huffman trees for a given probability distribution

Concepts:
• variable length code: (e.g. Huffman)
• uniquely decodable code: each legal code sequence is generated by a unique source sequence
• instantaneous/prefix code: end of code of each symbol can be recognized
Examples:
• 0, 010, 01, 10
• 10, 00, 11, 110
• 0, 10, 110, 111 (Huffman of example) (comma code)
• 0, 01, 011, 111 (inverted comma code)

A prefix code = binary tree
Every binary tree with q leaves is a prefix code for q symbols, lengths of code words = lengths of paths
Kraft inequality:
Exists a q-leaf tree with path lengths $l_1, \ldots, l_q$ iff $\sum_i 2^{-l_i} \le 1$
$\sum_i 2^{-l_i} = 1$ iff tree is complete
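Coding, decoding and the expected cost for the example can be sketched as below. The names `encode`, `decode` and `expected_cost` are hypothetical, and the decoder is a simplification: instead of walking an explicit tree it accumulates bits until they match a code word, which for a prefix code is equivalent to reaching a leaf and returning to the root.

```python
from fractions import Fraction

# Huffman code and probabilities of the example (a: 1/2, b: 1/4, c, d: 1/8).
CODE = {"a": "0", "b": "10", "c": "110", "d": "111"}
PROBS = {"a": Fraction(1, 2), "b": Fraction(1, 4),
         "c": Fraction(1, 8), "d": Fraction(1, 8)}


def encode(text, code):
    """Coding: output the root-to-leaf path of every symbol."""
    return "".join(code[s] for s in text)


def decode(bits, code):
    """Decoding: follow the bits until a leaf is reached, output its
    symbol, then start again from the root (empty path)."""
    leaf_of = {word: sym for sym, word in code.items()}
    out, path = [], ""
    for b in bits:
        path += b
        if path in leaf_of:          # reached a leaf
            out.append(leaf_of[path])
            path = ""                # return to root
    if path:
        raise ValueError("bit stream ends in the middle of a code word")
    return "".join(out)


def expected_cost(probs, code):
    """Expected cost in bits/symbol: sum of p_i * l_i."""
    return sum(p * len(code[s]) for s, p in probs.items())


bits = encode("abacd", CODE)
print(bits, decode(bits, CODE), expected_cost(PROBS, CODE))
```

The cost comes out as the fraction 7/4, matching the 1.75 bits/symbol computed for the example, against 2 bits/symbol for the standard binary code.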

  3. Proof:
⇒: assume exists a tree T
• Take T' to be the full tree of depth $l = \max_i(l_i)$ (full: all paths same length)
• The number of its leaves: $2^l$
• A leaf of T, at distance $l_i$ from root, has $2^{l - l_i}$ leaves of T' under it
• Sum on all leaves of T: $\sum_i 2^{l - l_i} \le 2^l \Rightarrow \sum_i 2^{-l_i} \le 1$

If T is not complete (complete: every node has 0/2 children):
• it has a node with a single child ⇒ can be "shortened"
• new tree still satisfies $\sum_i 2^{-l_i} \le 1$
• hence given tree must satisfy $\sum_i 2^{-l_i} < 1$
⇒ Only complete trees have equality
Comment: In general a prefix code that is not a complete tree is dominated by a tree with smaller cost
From now: trees are complete

⇐: Assume $\sum_i 2^{-l_i} \le 1$
Lemma: if $l = \max_i(l_i)$ then $\exists\, k \ne j$ s.t. $l_k = l_j = l$
Replace these two by their sum (hence q-1 lengths) and use induction
Assume $\sum_i 2^{-l_i} = 1$: must the tree be complete?

McMillan Theorem:
exists a uniquely decodable code with lengths $l_1, \ldots, l_q$ iff $\sum_i 2^{-l_i} \le 1$
Corollary: when there is a uniquely decodable code, there is also a prefix code (same cost)
⇒ No need to think about the first class
[Figure: the class of uniquely decodable codes contains the class of prefix codes]

On optimality of Huffman:
Cost of a tree/code T: $L(T) = \sum_i p_i l_i$
Claim: if a tree T does not satisfy
(*) $p_1 \ge \ldots \ge p_q \Rightarrow l_1 \le \ldots \le l_q$
then it is dominated by a tree with smaller cost
Claim: for any T, $L(T_{Huff}) \le L(T)$
Proof: can assume T satisfies (*)
Use induction:
q = 2: both trees have lengths 1,1
q > 2:
• In Huffman tree, there are two maximal paths that end in sibling nodes
• In T, the paths for last two symbols are longest (by (*)) but their ends may not be siblings
• But, T is complete, hence the leaf with $l_q$ has a sibling with same length; exchange with the leaf corresponding to $l_{q-1}$
• Now, in both trees, these two longest paths can be replaced by their parents ⇒ Case of q-1 (induction hypothesis)
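The "if" direction of the Kraft inequality can be illustrated with a small sketch that checks the sum and then builds a prefix code for any lengths that satisfy it. Note the swap in technique: the slides argue by induction on the two longest lengths, whereas this sketch uses the standard canonical-code construction (assign consecutive binary values in order of increasing length). The names `kraft_sum`, `prefix_code_from_lengths` and `is_prefix_free` are hypothetical, and lengths are assumed to be at least 1.

```python
from fractions import Fraction


def kraft_sum(lengths):
    """Exact value of the Kraft sum: sum over i of 2^(-l_i)."""
    return sum(Fraction(1, 2 ** l) for l in lengths)


def prefix_code_from_lengths(lengths):
    """Build code words with the given lengths (each assumed >= 1).

    Words are returned in order of increasing length, not in input order.
    Raises ValueError if the Kraft inequality is violated.
    """
    if kraft_sum(lengths) > 1:
        raise ValueError("Kraft inequality violated: no prefix code exists")
    words, value, prev = [], 0, 0
    for l in sorted(lengths):
        value <<= l - prev                      # extend to the new length
        words.append(format(value, "b").zfill(l))
        value += 1                              # next unused word of length l
        prev = l
    return words


def is_prefix_free(words):
    """True if no word is a prefix of a different word."""
    return not any(a != b and b.startswith(a) for a in words for b in words)


print(kraft_sum([1, 2, 3, 3]), prefix_code_from_lengths([1, 2, 3, 3]))
print(kraft_sum([2, 2, 2]), prefix_code_from_lengths([2, 2, 2]))
```

For the lengths 1, 2, 3, 3 the sum is exactly 1 and the construction yields a complete tree; for 2, 2, 2 the sum is 3/4, and the resulting code leaves the word 11 unused, so the tree is not complete.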

  4. Summary:
• Huffman trees are optimal hence satisfy (*)
• Any two Huffman trees have equal costs
• Huffman trees have min cost among all trees (codes)
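The claim that any two Huffman trees have equal costs can be checked on the distribution 0.4, 0.2, 0.2, 0.1, 0.1 from the note on Huffman trees. The sketch below is only an illustration under stated assumptions: the function `huffman_lengths` and its `merged_first` flag are hypothetical, and the flag merely picks one of two tie-breaking rules (merged nodes before or after leaves of equal weight).

```python
import heapq
from fractions import Fraction
from itertools import count


def huffman_lengths(weights, merged_first):
    """Code-word lengths produced by Huffman's algorithm.

    merged_first: if True, a merged node is removed before a leaf of
    equal weight; if False, after it. Both choices give Huffman trees.
    """
    lengths = [0] * len(weights)
    tie = count()
    merged_rank = -1 if merged_first else 1   # leaves have rank 0
    # Heap entries: (weight, rank, counter, indices of leaves below node).
    heap = [(w, 0, next(tie), [i]) for i, w in enumerate(weights)]
    heapq.heapify(heap)
    while len(heap) > 1:
        w1, _, _, leaves1 = heapq.heappop(heap)
        w2, _, _, leaves2 = heapq.heappop(heap)
        for i in leaves1 + leaves2:
            lengths[i] += 1                   # one level deeper under parent
        heapq.heappush(heap, (w1 + w2, merged_rank, next(tie),
                              leaves1 + leaves2))
    return lengths


def cost(weights, lengths):
    """Cost of a code: sum of p_i * l_i."""
    return sum(w * l for w, l in zip(weights, lengths))


probs = [Fraction(4, 10), Fraction(2, 10), Fraction(2, 10),
         Fraction(1, 10), Fraction(1, 10)]
for flag in (False, True):
    ls = huffman_lengths(probs, merged_first=flag)
    print(ls, cost(probs, ls))
```

The two tie-breaking rules give different length profiles (2, 2, 2, 3, 3 versus 1, 3, 2, 4, 4), yet both cost 11/5 = 2.2 bits/symbol.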
