DL - 2004 Compression – Beeri/Feitelson 1

Compression

  • Introduction
  • Information theory
  • Text compression
  • IL compression

DL - 2004 Compression – Beeri/Feitelson 2

General: compression methods depend on the characteristics of the data; there is no universal "best" method.

Requirements:

  • text, IL's: lossless
  • images: may be lossy
  • efficiency: how many bits per byte of data? (often expressed as a percentage, e.g. 2 bits per byte = 25%)
  • coding should be fast, decoding super-fast

DL - 2004 Compression – Beeri/Feitelson 3

Compression vs. communications: a minor difference:

Communication is always on-line; compression is on-line or off-line (off-line: the complete file is given in advance).

[Diagram: a file at the source, sent over a line to the destination.]
DL - 2004 Compression – Beeri/Feitelson 4

A general model for statistics-based compression:

The same model must be used on both sides. The model is (often) stored in the compressed file; its size affects the compression efficiency.

[Diagram: model + coder on the sending side, model + decoder on the receiving side.]

DL - 2004 Compression – Beeri/Feitelson 5

Appetizer: Huffman coding

(Standard) binary coding:

  • Uniquely decodable
  • Model = table
  • Efficiency: $\lceil \log q \rceil$ bits/symbol (no/little compression)

Can do better if the symbol frequencies are known: frequent symbol – short code; rare symbol – long code. This minimizes the average code length.

Source alphabet: $s_1, \ldots, s_q$; coding alphabet: binary, $\{0, 1\}$.

DL - 2004 Compression – Beeri/Feitelson 6

Assume: symbol probabilities $p_1, \ldots, p_q$ ($p_i > 0$, $\sum_i p_i = 1$).

Huffman's Algorithm (greedy construction of the code tree):

  • Allocate a node for each symbol, with weight = symbol probability
  • Enter the nodes into a priority queue Q (smallest weights first)
  • While |Q| > 1:
    – Remove the two first nodes (smallest weights)
    – Create a new node, make it their parent, and assign it the sum of their weights
    – Enter the new node into Q
  • Return: the single node left in Q (the root of the tree)
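The pseudocode above maps directly onto a binary heap. A minimal Python sketch of the construction, assuming the probabilities arrive as a dict; the names Node and huffman_tree are illustrative, not from the slides:

import heapq
import itertools

class Node:
    def __init__(self, weight, symbol=None, left=None, right=None):
        self.weight, self.symbol = weight, symbol
        self.left, self.right = left, right

def huffman_tree(probs):
    """Build a Huffman code tree from {symbol: probability}."""
    tie = itertools.count()  # tie-breaker so the heap never compares Nodes
    q = [(p, next(tie), Node(p, s)) for s, p in probs.items()]
    heapq.heapify(q)                    # priority queue, smallest weight first
    while len(q) > 1:
        w1, _, n1 = heapq.heappop(q)    # remove the two lightest nodes
        w2, _, n2 = heapq.heappop(q)
        parent = Node(w1 + w2, left=n1, right=n2)  # weight = sum of children
        heapq.heappush(q, (parent.weight, next(tie), parent))
    return q[0][2]                      # the single remaining node: the root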


DL - 2004 Compression – Beeri/Feitelson 7

Example: a: 1/2, b: 1/4, c: 1/8, d: 1/8

Q: {1/8, 1/8, 1/4, 1/2} → {1/4, 1/4, 1/2} → {1/2, 1/2} → {1}: first c and d merge into a 1/4 node, which merges with b into a 1/2 node, which merges with a into the root.

[Diagram: the resulting code tree, with internal-node weights 1/4, 1/2, 1.]
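Extracting the code words from the tree built by the sketch above (left edge = 0, right edge = 1; the exact 0/1 assignment depends on tie-breaking):

def codes(node, prefix=""):
    # Collect the code word of every leaf; left edge = 0, right edge = 1.
    if node.symbol is not None:
        return {node.symbol: prefix}
    return {**codes(node.left, prefix + "0"),
            **codes(node.right, prefix + "1")}

root = huffman_tree({"a": 1/2, "b": 1/4, "c": 1/8, "d": 1/8})
print(codes(root))  # e.g. {'a': '0', 'b': '10', 'c': '110', 'd': '111'}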

DL - 2004 Compression – Beeri/Feitelson 8

Coding: for each symbol s, output the binary path from the root to leaf(s).

Decoding: read the incoming stream of bits, following the path from the root of the tree. When leaf(s) is reached, output s and return to the root.

Common model (stored on both sides): the tree.
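A sketch of this decoding loop, reusing the Node/root names assumed above; encoding is just a table lookup once the codes dict is built:

def decode(root, bits):
    out, node = [], root
    for b in bits:
        node = node.left if b == "0" else node.right
        if node.symbol is not None:   # leaf(s) reached:
            out.append(node.symbol)   # output s...
            node = root               # ...and return to the root
    return "".join(out)

# Assuming the code a=0, b=10, c=110, d=111:
print(decode(root, "011010"))  # -> "acb"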

DL - 2004 Compression – Beeri/Feitelson 9

Expected cost (bits/symbol):

Binary: $\lceil \log q \rceil$

Huffman: $\sum_i p_i l_i$ ($l_i$ = length of the path from the root to leaf($s_i$))

In the example: binary: 2; Huffman: $1/2 \times 1 + 1/4 \times 2 + 1/8 \times 3 + 1/8 \times 3 = 1.75$.

Q: what would be the tree and cost for 5/12, 1/3, 1/6, 1/12?
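A two-line check of these numbers (expected_cost is an illustrative helper, not from the slides); the question at the end of the slide can be checked the same way once the tree is built:

def expected_cost(probs, lengths):
    # sum of p_i * l_i over all symbols
    return sum(p * l for p, l in zip(probs, lengths))

print(expected_cost([1/2, 1/4, 1/8, 1/8], [1, 2, 3, 3]))  # 1.75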

DL - 2004 Compression – Beeri/Feitelson 10

A note on Huffman trees:

The algorithm is non-deterministic:

  • In each step, either node can become the left child of the new parent; if the two children of a node are exchanged, the result is also a Huffman tree (closure under rotation of sibling nodes)
  • Consider 0.4, 0.2, 0.2, 0.1, 0.1: after the 1st step there are three nodes of weight 0.2, and any 2 out of the 3 can be selected

There are many Huffman trees for a given probability distribution.
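A worked check of that example: merging the two original 0.2-nodes in the second step yields lengths (2, 2, 2, 3, 3), while merging one of them with the new 0.1+0.1 node can yield (1, 3, 2, 4, 4). Both trees cost the same:

$L_1 = 0.4 \cdot 2 + 0.2 \cdot 2 + 0.2 \cdot 2 + 0.1 \cdot 3 + 0.1 \cdot 3 = 2.2$
$L_2 = 0.4 \cdot 1 + 0.2 \cdot 3 + 0.2 \cdot 2 + 0.1 \cdot 4 + 0.1 \cdot 4 = 2.2$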

DL - 2004 Compression – Beeri/Feitelson 11

Concepts:

  • variable-length code (e.g. Huffman)
  • uniquely decodable code: each legal code sequence is generated by a unique source sequence
  • instantaneous (immediate) / prefix code: the end of the code of each symbol can be recognized as soon as it is read

Examples:

  • 0, 010, 01, 10
  • 10, 00, 11, 110
  • 0, 10, 110, 111 (the Huffman code of the example; a comma code)
  • 0, 01, 011, 111 (inverted comma code)
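For instance, the first example code is not uniquely decodable: the string 010 parses as the single code word 010, as 0 followed by 10, or as 01 followed by 0. In the comma code 0, 10, 110, 111, a 0 terminates each code word (the longest, 111, is recognized by its length), which makes it instantaneous.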

DL - 2004 Compression – Beeri/Feitelson 12

A prefix code = a binary tree. Every binary tree with q leaves is a prefix code for q symbols; the lengths of the code words = the lengths of the paths.

Kraft inequality: there exists a q-leaf tree with path lengths $l_1, \ldots, l_q$ iff $\sum_i 2^{-l_i} \le 1$; equality holds iff the tree is complete.
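A quick check of the Kraft sums for the four example codes of the previous slide (an illustrative script; Fraction avoids float round-off):

from fractions import Fraction

def kraft_sum(words):
    return sum(Fraction(1, 2 ** len(w)) for w in words)

for code in (["0", "010", "01", "10"],   # sum 9/8 > 1: no tree with these lengths
             ["10", "00", "11", "110"],  # sum 7/8 <= 1
             ["0", "10", "110", "111"],  # sum 1: complete tree
             ["0", "01", "011", "111"]): # sum 1 as well
    print(code, kraft_sum(code))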

DL - 2004 Compression – Beeri/Feitelson 13

Proof (⇒): assume a tree T exists. Take T' to be the full tree of depth $l = \max_i(l_i)$ (full: all paths have the same length). The number of its leaves is $2^l$. A leaf of T at distance $l_i$ from the root has $2^{l - l_i}$ leaves of T' under it. Summing over all leaves of T:

$\sum_i 2^{l - l_i} \le 2^l \;\Rightarrow\; \sum_i 2^{-l_i} \le 1$
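Instantiating the proof with the example lengths (1, 2, 3, 3): $l = 3$, T' has $2^3 = 8$ leaves, and the leaves of T claim $4 + 2 + 1 + 1 = 8 \le 8$ of them, matching $\sum_i 2^{-l_i} = 1$.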

DL - 2004 Compression – Beeri/Feitelson 14

If T is not complete (complete: every node has 0 or 2 children), it has a node with a single child and can be "shortened"; the new tree still satisfies $\sum_i 2^{-l_i} \le 1$, hence the given tree must satisfy $\sum_i 2^{-l_i} < 1$. Only complete trees give equality.

Comment: in general, a prefix code that is not a complete tree is dominated by a tree with smaller cost. From now on: trees are complete.

DL - 2004 Compression – Beeri/Feitelson 15

(⇐): Assume $\sum_i 2^{-l_i} \le 1$.

Lemma: if $l_j = \max_i(l_i)$, then there exists $k \ne j$ s.t. $l_k = l_j$.

Replace these two lengths by their sum (hence q−1 lengths) and use induction.

Assume $\sum_i 2^{-l_i} = 1$: must the tree be complete?
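One way to see the lemma when equality holds (a sketch of an argument the slide leaves implicit): if $l_j = \max_i(l_i)$ were attained by j alone, multiplying $\sum_i 2^{-l_i} = 1$ by $2^{l_j}$ would give $\sum_i 2^{l_j - l_i} = 2^{l_j}$, where the j-th term is 1 and every other term is even (since $l_i < l_j$), making the left side odd and the right side even, a contradiction.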

DL - 2004 Compression – Beeri/Feitelson 16

McMillan's Theorem: there exists a uniquely decodable code with lengths $l_1, \ldots, l_q$ iff $\sum_i 2^{-l_i} \le 1$.

Corollary: when there is a uniquely decodable code, there is also a prefix code with the same lengths (same cost), so there is no need to consider the first, larger class separately: uniquely decodable ⊇ prefix.

DL - 2004 Compression – Beeri/Feitelson 17

On the optimality of Huffman:

Cost of a tree/code T: $L(T) = \sum_i p_i l_i$.

Claim: if a tree T does not satisfy

(*) $p_1 \ge \ldots \ge p_q \;\Rightarrow\; l_1 \le \ldots \le l_q$

then it is dominated by a tree with smaller cost (see the exchange computation below).

Claim: for any T, $L(T_{Huff}) \le L(T)$.

Proof: we can assume T satisfies (*). Use induction. q = 2: both trees have lengths 1, 1.
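The computation behind the first claim (a standard exchange step, filled in here): if $p_i > p_j$ but $l_i > l_j$, swapping the two leaves changes the cost by

$L' - L = (p_i l_j + p_j l_i) - (p_i l_i + p_j l_j) = (p_i - p_j)(l_j - l_i) < 0$,

so any tree violating (*) is dominated by a strictly cheaper one.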

DL - 2004 Compression – Beeri/Feitelson 18

q > 2: in the Huffman tree, there are two maximal paths that end in sibling nodes. In T, the paths for the last two symbols are longest (by (*)), but their ends may not be siblings. However, T is complete, hence the leaf with length $l_q$ has a sibling of the same length; exchange that sibling with the leaf corresponding to $l_{q-1}$. Now, in both trees, these two longest paths can be replaced by their parents, reducing to the case of q−1 (induction hypothesis).


DL - 2004 Compression – Beeri/Feitelson 19

Summary:

  • Huffman trees are optimal, hence satisfy (*)
  • Any two Huffman trees have equal costs
  • Huffman trees have minimum cost among all trees (codes)