Formal Modeling in Cognitive Science 1 Coding Theorems Lecture 28: - PowerPoint PPT Presentation

Coding Theorems Coding Theorems Huffman Coding Huffman Coding Formal Modeling in Cognitive Science 1 Coding Theorems Lecture 28: Kraft Inequality; Source Coding Theorem; Kraft Inequality Huffman Coding Shannon Information Source Coding Theorem Frank Keller School of Informatics 2 Huffman Coding University of Edinburgh keller@inf.ed.ac.uk March 13, 2006 Frank Keller Formal Modeling in Cognitive Science 1 Frank Keller Formal Modeling in Cognitive Science 2 Kraft Inequality Kraft Inequality Coding Theorems Coding Theorems Shannon Information Shannon Information Huffman Coding Huffman Coding Source Coding Theorem Source Coding Theorem Kraft Inequality Kraft Inequality Problem: construct an instantaneous code of minimum expected We can illustrate the Kraft Inequality using a coding tree. Start length for a given random variable. The following inequality holds: with a tree that contains all three-bit codes: Theorem: Kraft Inequality For an instantaneous code C for a random variable X , the code ✟ ❍ ✟✟✟✟✟ ❍ ❍ word lengths l ( x ) must satisfy the inequality: ❍ ❍ ❍ 0 1 2 − l ( x ) ≤ 1 � ✟ ❍ ✟ ❍ ✟✟ ❍ ✟✟ ❍ ❍ ❍ x ∈ X 00 01 10 11 ✟ ❍ ❍ ✟ ❍ ❍ ✟ ❍ ❍ ✟ ❍ ❍ Conversely, if the code word lengths satisfy this inequality, then ✟ ✟ ✟ ✟ 000 001 010 011 100 101 110 111 there exists an instantaneous code with these word lengths. Frank Keller Formal Modeling in Cognitive Science 3 Frank Keller Formal Modeling in Cognitive Science 4

Kraft Inequality Kraft Inequality Coding Theorems Coding Theorems Shannon Information Shannon Information Huffman Coding Huffman Coding Source Coding Theorem Source Coding Theorem Kraft Inequality Kraft Inequality Now if we decide to use the code word 10: For each code word, prune all the branches below it (as they ✟✟ ✟ ❍ ❍ ❍ violate the prefix condition). For example, if we decide to use the 0 1 code word 0, we get the following tree: ✟✟ ❍ ❍ 10 11 ✟ ❍ ✟✟✟ ❍ ❍ ✟ ❍ ❍ ❍ ✟ 0 1 110 111 ✟ ❍ ✟✟ ❍ ❍ The remaining leaves constitute a prefix code. Kraft inequality: 10 11 ✟ ❍ ❍ ✟ ❍ ❍ 2 − l ( x ) = 2 − 1 + 2 − 2 + 2 − 3 + 2 − 3 = 1 2 + 1 4 + 1 8 + 1 ✟ ✟ � 8 = 1 100 101 110 111 x ∈ X Frank Keller Formal Modeling in Cognitive Science 5 Frank Keller Formal Modeling in Cognitive Science 6 Kraft Inequality Kraft Inequality Coding Theorems Coding Theorems Shannon Information Shannon Information Huffman Coding Huffman Coding Source Coding Theorem Source Coding Theorem Shannon Information Shannon Information Example The Kraft inequality tells us that an instantaneous code exists. But we are interested in finding the optimal code, i.e., one that Consider the following random variable with the optimal code minimized the expected code length L ( C ) . lengths given by the Shannon information: Theorem: Shannon Information a b c d x 1 1 1 1 f ( x ) The expected length L ( C ) of a code C for the random variable X 2 4 8 8 l ( x ) 1 2 3 3 with distribution f ( x ) is minimal if the code word lengths l ( x ) are given by: The expected code length L ( C ) for the optimal code is: l ( x ) = − log f ( x ) � � This quantity is called the Shannon information. L ( C ) = f ( x ) l ( x ) = − f ( x ) log f ( x ) = 1 . 75 x ∈ X x ∈ X Shannon information is pointwise entropy. (See mutual information Note that this is the same as the entropy of X , H ( X ). and pointwise mutual information.) Frank Keller Formal Modeling in Cognitive Science 7 Frank Keller Formal Modeling in Cognitive Science 8

Kraft Inequality Kraft Inequality Coding Theorems Coding Theorems Shannon Information Shannon Information Huffman Coding Huffman Coding Source Coding Theorem Source Coding Theorem Lower Bound on Expected Length Upper Bound on Expected Length Of course we are more interested in finding an upper bound, i.e., a code that has a maximum expected length: This observation about the relation between the entropy and the expected length of the optimal code can be generalized: Theorem: Source Coding Theorem Let C a code with optimal code lengths, i.e, l ( x ) = − log f ( x ) for Theorem: Lower Bound on Expected Length the random variable X with distribution f ( x ). Then the expected Let C be an instantaneous code for the random variable X . Then length L ( C ) is bounded by: the expected code length L ( C ) is bounded by: H ( X ) ≤ L ( C ) < H ( X ) + 1 L ( C ) ≥ H ( X ) Why is the upper bound H ( X ) + 1 and not H ( X )? Because sometimes the Shannon information gives us fractional lengths; we have to round up. Frank Keller Formal Modeling in Cognitive Science 9 Frank Keller Formal Modeling in Cognitive Science 10 Kraft Inequality Kraft Inequality Coding Theorems Coding Theorems Shannon Information Shannon Information Huffman Coding Huffman Coding Source Coding Theorem Source Coding Theorem Source Coding Theorem Source Coding Theorem Example Consider the following random variable with the optimal code Example lengths given by the Shannon information: Now consider the following code that tries to the code words on the optimal code lengths as closely as possible: x a b c d e f ( x ) 0.25 0.25 0.2 0.15 0.15 x a b c d e l ( x ) 2.0 2.0 2.3 2.7 2.7 C ( x ) 00 10 11 010 011 l ( x ) 2 2 2 3 3 The entropy of this random variable is H ( X ) = 2 . 2855. The source coding theorem tells us: The expected code length for this code is therefore L ( C ) = 2 . 30. This is very close to the optimal code length of H ( X ) = 2 . 2855. 2 . 2855 ≤ L ( C ) < 3 . 2855 where L ( C ) is the code length of the optimal code. Frank Keller Formal Modeling in Cognitive Science 11 Frank Keller Formal Modeling in Cognitive Science 12

Coding Theorems Coding Theorems Huffman Coding Huffman Coding Huffman Coding Huffman Coding 1 Find the two symbols with the smallest probability and The source coding theorem tells us the properties of the optimal combine them into a new symbol and add their probabilities. code, but not how to find it. A number of algorithms exists for this. 2 Repeat step (2) until there is only one symbol left with a Here, we consider Huffman coding , an algorithm that constructs a probability of 1. code with the following properties: 3 Draw all the symbols in the form of a tree which branches instantaneous (prefix code); every time two symbols are combined. optimal (shortest expected length code). 4 Label all the left branches of the tree with a 0 and all the right branches with a 1. The expected code length of the Huffman code is bounded by 5 The code for a symbol is the sequence of 0s and 1s that lead H ( X ) + 1. to it on the tree, starting from the root (with probability 1). Frank Keller Formal Modeling in Cognitive Science 13 Frank Keller Formal Modeling in Cognitive Science 14 Coding Theorems Coding Theorems Huffman Coding Huffman Coding Huffman Coding Huffman Coding Example Example Assume we want to encode the set of all vowels, and we have the following probability distribution: u 0.07 i 0.09 0 1 x a e i o u ui 0.16 a 0.12 f ( x ) 0.12 0.42 0.09 0.30 0.07 0 1 − log f ( x ) 3.06 1.25 3.47 1.74 3.84 uia 0.28 o 0.30 The Huffman code for this distribution is: 0 1 x a e i o u uiao 0.58 e 0.42 0 1 C ( x ) 001 1 0001 01 0000 l ( x ) 3 1 4 2 4 uiaoe 1.0 Generate this code by drawing the Huffman coding tree. Frank Keller Formal Modeling in Cognitive Science 15 Frank Keller Formal Modeling in Cognitive Science 16

Coding Theorems Huffman Coding Summary The optimal length of a code word is given by its Shannon information: − log f ( x ); source coding theorem: the expected length of the optimal code is bounded by entropy: H ( X ) ≤ L ( C ) < H ( X ) + 1. Huffman Coding is an algorithm for finding an optimal instantaneous code for a given random variable. Frank Keller Formal Modeling in Cognitive Science 17

Formal Modeling in Cognitive Science 1 Coding Theorems Lecture 28: - PowerPoint PPT Presentation

Coding Theorems Coding Theorems Huffman Coding Huffman Coding Formal Modeling in Cognitive Science 1 Coding Theorems Lecture 28: Kraft Inequality; Source Coding Theorem; Kraft Inequality Huffman Coding Shannon Information Source Coding

Cognitive Modeling Symbolic School Lecture 2: Approaches Symbolic Models 2 Symbolic

Formal Definition of a Finite Automaton Formal Definition of a Finite Automaton p.1/23 Why a

Cognitive Interviewing Debbie Collins What is cognitive interviewing? Cognitive interviewing

Formal Methods and Cryptography Lecture 25 Formal Methods Formal Methods Logical foundations

Formal Methods and Cryptography Lecture 24 1 Formal Methods 2 Formal Methods Logical

Cognitive Computing Venkat N Gudivada East Carolina University Greenville, North Carolina USA

Cognitive Load Effects Robert Whelan E19.2174 Cognitive Science & Educational Technology I

Formal Verification of RISC-V cores with riscv-formal Clifford Wolf CTO, Symbiotic EDA

1 PSYC530 wk01 handouts Cognitive Engineering Cognitive Science Applied to Human Factors

Modeling of proteins and complexes High resolution Low resolution Modeling of domains Modeling

Virtual Reality Modeling Virtual Reality Modeling from http://www.okino.com/ Modeling Modeling

Formal Modeling in Cognitive Science Lecture 20: Joint, Marginal, and Conditional Distributions

Formal Modeling in Cognitive Science Lecture 17: Sample Spaces, Events, Probabilities Steve

Formal Modeling in Cognitive Science Lecture 18: Conditional Probability; Bayes Theorem Steve

Formal Modeling in Cognitive Science Lecture 19: Application of Bayes Theorem; Discrete Random

Formal Modeling in Cognitive Science Lecture 25: Entropy, Joint Entropy, Conditional Entropy 1

Greedy Algorithms ms Jeevani Goone*llake University of

Math236 Discrete Maths with Applications P. Ittmann UKZN, Pietermaritzburg Semester 1, 2012

MA/CSSE 473 Day 31 Student questions Data Compression Minimal Spanning Tree Intro More

! The variable-length code is a prefix min f ( c ) d ( c ) code : no binary string is

Chapter 11: Text Indexing and Matching There were 5 Exabytes of information created between the

Compression, Information 2. binary code : assigns a string of 0 s and 1 s to each and Entropy

Catch Me If You Can: A Practical Framework to Evade Censorship in Information-Centric Networks

CS4102 Algorithms Fall 2018 Warm up Decode the line below into English (hint: use Google or

Formal Modeling in Cognitive Science 1 Coding Theorems Lecture 28: - PowerPoint PPT Presentation

Coding Theorems Coding Theorems Huffman Coding Huffman Coding Formal Modeling in Cognitive Science 1 Coding Theorems Lecture 28: Kraft Inequality; Source Coding Theorem; Kraft Inequality Huffman Coding Shannon Information Source Coding

Cognitive Modeling Symbolic School Lecture 2: Approaches Symbolic Models 2 Symbolic

Formal Definition of a Finite Automaton Formal Definition of a Finite Automaton p.1/23 Why a

Cognitive Interviewing Debbie Collins What is cognitive interviewing? Cognitive interviewing

Formal Methods and Cryptography Lecture 25 Formal Methods Formal Methods Logical foundations

Formal Methods and Cryptography Lecture 24 1 Formal Methods 2 Formal Methods Logical

Cognitive Computing Venkat N Gudivada East Carolina University Greenville, North Carolina USA

Cognitive Load Effects Robert Whelan E19.2174 Cognitive Science &amp; Educational Technology I

Formal Verification of RISC-V cores with riscv-formal Clifford Wolf CTO, Symbiotic EDA

1 PSYC530 wk01 handouts Cognitive Engineering Cognitive Science Applied to Human Factors

Modeling of proteins and complexes High resolution Low resolution Modeling of domains Modeling

Virtual Reality Modeling Virtual Reality Modeling from http://www.okino.com/ Modeling Modeling

Formal Modeling in Cognitive Science Lecture 20: Joint, Marginal, and Conditional Distributions

Formal Modeling in Cognitive Science Lecture 17: Sample Spaces, Events, Probabilities Steve

Formal Modeling in Cognitive Science Lecture 18: Conditional Probability; Bayes Theorem Steve

Formal Modeling in Cognitive Science Lecture 19: Application of Bayes Theorem; Discrete Random

Formal Modeling in Cognitive Science Lecture 25: Entropy, Joint Entropy, Conditional Entropy 1

Greedy Algorithms ms Jeevani Goone*llake University of

Math236 Discrete Maths with Applications P. Ittmann UKZN, Pietermaritzburg Semester 1, 2012

MA/CSSE 473 Day 31 Student questions Data Compression Minimal Spanning Tree Intro More

! The variable-length code is a prefix min f ( c ) d ( c ) code : no binary string is

Chapter 11: Text Indexing and Matching There were 5 Exabytes of information created between the

Compression, Information 2. binary code : assigns a string of 0 s and 1 s to each and Entropy

Catch Me If You Can: A Practical Framework to Evade Censorship in Information-Centric Networks

CS4102 Algorithms Fall 2018 Warm up Decode the line below into English (hint: use Google or

Cognitive Load Effects Robert Whelan E19.2174 Cognitive Science & Educational Technology I