

Slide 1

Coding and Information Theory

Chris Williams

School of Informatics, University of Edinburgh

November 2007

Slide 2

Overview

What is information theory?
Entropy
Coding
Rate-distortion theory
Mutual information
Channel capacity

Reading: Bishop §1.6

Slide 3

Information Theory

Shannon (1948). Information theory is concerned with:

Source coding: reducing redundancy by modelling the structure in the data.

Channel coding: how to deal with "noisy" transmission.

The key idea is prediction. In source coding, redundancy means predictability of the rest of the data given part of it; in channel coding, we predict what we want given what we have received.

Slide 4

Information Theory Textbooks

Elements of Information Theory. T. M. Cover and J. A. Thomas. Wiley, 1991. [comprehensive]

Coding and Information Theory. R. W. Hamming. Prentice-Hall, 1980. [introductory]

Information Theory, Inference and Learning Algorithms. D. J. C. MacKay. CUP, 2003. Available online (viewing only): http://www.inference.phy.cam.ac.uk/mackay/itila

Slide 5

Entropy

A discrete random variable X takes on values from an alphabet 𝒳, and has probability mass function P(x) = P(X = x) for x ∈ 𝒳. The entropy H(X) of X is defined as

H(X) = −∑_{x ∈ 𝒳} P(x) log P(x)

Convention: for P(x) = 0, we take 0 × log 1/0 ≡ 0.

The entropy measures the information content or "uncertainty" of X.

Units: log₂ ⇒ bits; logₑ ⇒ nats.
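As a quick check of the definition, here is a minimal Python sketch (the `entropy` helper is our own illustration, not from the slides):

```python
import math

def entropy(p, base=2):
    """Shannon entropy H(X) = -sum_x P(x) log P(x), with 0 log 0 := 0."""
    return -sum(pi * math.log(pi, base) for pi in p if pi > 0)

# A fair coin carries 1 bit of uncertainty; a biased coin carries less.
print(entropy([0.5, 0.5]))   # -> 1.0
print(entropy([0.9, 0.1]))   # -> ~0.469
```

Note how the `pi > 0` filter implements the 0 × log 1/0 ≡ 0 convention.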

Slide 6

Joint entropy, conditional entropy

H(X, Y) = −∑_{x,y} P(x, y) log P(x, y)

H(Y|X) = ∑_x P(x) H(Y|X = x)
       = −∑_x P(x) ∑_y P(y|x) log P(y|x)
       = −E_{P(x,y)}[log P(y|x)]

Chain rule: H(X, Y) = H(X) + H(Y|X).

If X, Y are independent, H(X, Y) = H(X) + H(Y).
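The chain rule can be verified numerically on a small joint table (a sketch with a hypothetical distribution of our own choosing):

```python
import math

def H(probs):
    """Entropy in bits of a list of probabilities."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Hypothetical joint distribution P(x, y) over two binary variables.
P = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.2, (1, 1): 0.3}

Px = [0.5, 0.5]                      # marginal P(x)
Hxy = H(list(P.values()))            # joint entropy H(X, Y)
# Conditional entropy H(Y|X) = sum_x P(x) H(Y | X = x)
Hy_given_x = sum(
    px * H([P[(x, y)] / px for y in (0, 1)])
    for x, px in enumerate(Px)
)
assert abs(Hxy - (H(Px) + Hy_given_x)) < 1e-12   # chain rule holds
```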

Slide 7

Coding theory

A coding scheme C assigns a code C(x) to every symbol x; C(x) has length ℓ(x). The expected code length L(C) of the code is

L(C) = ∑_{x ∈ 𝒳} P(x) ℓ(x)

Theorem 1 (noiseless coding theorem): the expected length L(C) of any instantaneous code for X is bounded below by H(X), i.e. L(C) ≥ H(X).

Theorem 2: there exists an instantaneous code such that H(X) ≤ L(C) < H(X) + 1.

Slide 8

Practical coding methods

How can we come close to the lower bound?

Huffman coding: achieves H(X) ≤ L(C) < H(X) + 1. Use blocking to reduce the extra bit to an arbitrarily small amount.

Arithmetic coding.
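A sketch of the Huffman construction, checking the bounds from the previous slide (the `huffman_lengths` helper is our own; it tracks code lengths rather than building explicit codewords):

```python
import heapq
from math import log2

def huffman_lengths(probs):
    """Code lengths from the standard Huffman merge procedure."""
    heap = [(p, [i]) for i, p in enumerate(probs)]
    lengths = [0] * len(probs)
    heapq.heapify(heap)
    while len(heap) > 1:
        p1, s1 = heapq.heappop(heap)   # merge the two least probable
        p2, s2 = heapq.heappop(heap)
        for i in s1 + s2:
            lengths[i] += 1            # every merged symbol gains one bit
        heapq.heappush(heap, (p1 + p2, s1 + s2))
    return lengths

p = [0.5, 0.25, 0.125, 0.125]          # dyadic, so L(C) meets H(X) exactly
L = sum(pi * li for pi, li in zip(p, huffman_lengths(p)))
H = -sum(pi * log2(pi) for pi in p)
assert H <= L < H + 1                  # noiseless coding bounds
```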

Slide 9

Coding with the wrong probabilities

Say we use the wrong probabilities q_i to construct a code. Then

L(C_q) = −∑_i p_i log q_i

But ∑_i p_i log(p_i/q_i) > 0 if q_i ≠ p_i, so L(C_q) − H(X) > 0, i.e. using the wrong probabilities increases the minimum attainable average code length.
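A small numerical illustration (the distributions are hypothetical, chosen by us): the cross-entropy under the wrong q exceeds the entropy, and the gap is exactly the KL divergence.

```python
from math import log2

p = [0.7, 0.2, 0.1]        # true distribution
q = [1/3, 1/3, 1/3]        # wrong (uniform) coding distribution

H  = -sum(pi * log2(pi) for pi in p)                # entropy H(X)
Lq = -sum(pi * log2(qi) for pi, qi in zip(p, q))    # cross-entropy
KL = sum(pi * log2(pi / qi) for pi, qi in zip(p, q))

assert Lq > H                       # wrong probabilities cost extra bits
assert abs((Lq - H) - KL) < 1e-12   # the excess is the KL divergence
```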

Slide 10

Coding real data

So far we have discussed coding sequences of iid random variables. But, for example, the pixels in an image are not iid RVs. So what do we do?

Consider an image having N pixels, each of which can take one of k grey-level values, as a single RV taking on k^N values. We would then need to estimate probabilities for all k^N different images in order to code a particular image properly, which is rather difficult for large k and N. One solution is to chop images into blocks, e.g. 8 × 8 pixels, and code each block separately.

Slide 11
Predictive encoding: try to predict the current pixel value given nearby context. Successful prediction reduces uncertainty:

H(X1, X2) = H(X1) + H(X2|X1)

Slide 12

Rate-distortion theory

What happens if we can't afford enough bits to code all of the symbols exactly? We must be prepared for lossy compression, where two different symbols are assigned the same code. In order to minimize the errors this causes, we need a distortion function d(x_i, x_j) which measures how much error is caused when symbol x_i codes for x_j.


The k-means algorithm is a method of choosing code book vectors so as to minimize the expected distortion for d(x_i, x_j) = (x_i − x_j)².
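A sketch of k-means (Lloyd's algorithm) as a scalar quantizer; the data, initial code book, and iteration count are our own illustrative choices. Each update step can only lower the expected squared distortion.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=1000)               # data to quantize (1-D for clarity)
codebook = np.array([-2.0, 0.0, 2.0])   # initial code book vectors

def distortion(x, cb):
    # Expected squared distortion under nearest-codeword assignment.
    return np.mean(np.min((x[:, None] - cb[None, :]) ** 2, axis=1))

before = distortion(x, codebook)
for _ in range(20):                     # Lloyd / k-means iterations
    assign = np.argmin(np.abs(x[:, None] - codebook[None, :]), axis=1)
    for k in range(len(codebook)):      # move each codeword to its cell mean
        if np.any(assign == k):
            codebook[k] = x[assign == k].mean()
after = distortion(x, codebook)
assert after <= before                  # distortion never increases
```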

Slide 13

Source coding

Patterns that we observe have a lot of structure; e.g. visual scenes that we care about don't look like "snow" on the TV. This gives rise to redundancy, i.e. observing part of a scene will help us predict other parts. This redundancy can be exploited to code the data efficiently: lossless compression.

Slide 14

Q: Why is coding so important?

A: Because of the lossless coding theorem: the best probabilistic model of the data will have the shortest code.

Source coding gives us a way of comparing and evaluating different models of data, and of searching for good ones. Usually we will build models with hidden variables, giving a new representation of the data.

Slide 15

Mutual information

I(X; Y) = KL(P(x, y) ‖ P(x)P(y)) ≥ 0
        = ∑_{x,y} P(x, y) log [P(x, y) / (P(x)P(y))]
        = I(Y; X)
        = ∑_{x,y} P(x, y) log [P(x|y) / P(x)]
        = H(X) − H(X|Y)
        = H(X) + H(Y) − H(X, Y)

Mutual information is a measure of the amount of information that one RV contains about another: it is the reduction in uncertainty of one RV due to knowledge of the other. The mutual information is zero if X and Y are independent.

Slide 16

Mutual Information

Example 1: (2 × 2 joint probability table over smoker/non-smoker and lung cancer/no lung cancer; only the entries 1/3 and 2/3 survive from the original figure.)

Slide 17
Example 2: (cell layout reconstructed from the garbled figure)

              lung cancer   no lung cancer
smoker            1/9            2/9
non-smoker        2/9            4/9

Here each entry factorizes as P(x, y) = P(x)P(y), with marginals (1/3, 2/3) for both variables, so X and Y are independent and I(X; Y) = 0.
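Computing the mutual information of Example 2's table directly (a minimal sketch using the formula from the previous slide):

```python
from math import log2

# Example 2's joint table: P(x, y) = P(x) P(y), so I(X; Y) should be 0.
P = [[1/9, 2/9],
     [2/9, 4/9]]

px = [sum(row) for row in P]            # marginal P(x)
py = [sum(col) for col in zip(*P)]      # marginal P(y)
I = sum(
    P[i][j] * log2(P[i][j] / (px[i] * py[j]))
    for i in range(2) for j in range(2) if P[i][j] > 0
)
assert abs(I) < 1e-9    # independent variables: zero mutual information
```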

Slide 18

Continuous variables

For continuous variables,

I(Y1; Y2) = ∫∫ P(y1, y2) log [P(y1, y2) / (P(y1)P(y2))] dy1 dy2

For a bivariate Gaussian with correlation coefficient ρ this evaluates to I(Y1; Y2) = −½ log(1 − ρ²).
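The Gaussian result can be checked from the differential entropies, using I = H(Y1) + H(Y2) − H(Y1, Y2) and the Gaussian entropy formula ½ log 2πeσ² (a sketch for unit-variance variables with our own choice of ρ):

```python
import numpy as np

rho = 0.8
# Differential entropies (in nats) of a unit-variance bivariate Gaussian:
h1 = 0.5 * np.log(2 * np.pi * np.e)                 # H(Y1)
h2 = 0.5 * np.log(2 * np.pi * np.e)                 # H(Y2)
det = 1 - rho ** 2                                  # |Sigma| for unit variances
h12 = 0.5 * np.log((2 * np.pi * np.e) ** 2 * det)   # joint H(Y1, Y2)

I = h1 + h2 - h12
assert np.isclose(I, -0.5 * np.log(1 - rho ** 2))   # matches the closed form
```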

Slide 19

PCA and mutual information

Linsker, 1988, Principle of maximum information preservation

Consider a random variable Y = aᵀX + ε, with aᵀa = 1. How do we maximize I(Y; X)?

I(Y; X) = H(Y) − H(Y|X)

But H(Y|X) is just the entropy of the noise term ε. If X has a joint multivariate Gaussian distribution then Y will have a Gaussian distribution. The (differential) entropy of a Gaussian N(µ, σ²) is ½ log 2πeσ². Hence we maximize information preservation by choosing a to give Y maximum variance, subject to the constraint aᵀa = 1.
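The maximum-variance direction is the leading eigenvector of the data covariance, i.e. the first principal component. A sketch with hypothetical Gaussian data (the covariance matrix and sample size are our own choices):

```python
import numpy as np

rng = np.random.default_rng(1)
# Hypothetical 2-D Gaussian data with one elongated direction.
X = rng.multivariate_normal([0, 0], [[3.0, 1.0], [1.0, 1.0]], size=5000)

C = np.cov(X.T)
eigvals, eigvecs = np.linalg.eigh(C)
a = eigvecs[:, -1]                  # unit eigenvector of largest eigenvalue

var_a = np.var(X @ a)               # variance of Y = a^T X
for _ in range(100):                # no random unit direction does better
    r = rng.normal(size=2)
    r /= np.linalg.norm(r)
    assert np.var(X @ r) <= var_a + 1e-9
```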

Slide 20

Channel capacity

The channel capacity of a discrete memoryless channel is defined as

C = max_{p(x)} I(X; Y)

Noisy channel coding theorem (informal statement): error-free communication above the channel capacity is impossible; communication at bit rates below C is possible with arbitrarily small error.
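As a concrete instance of the definition, a sketch computing the capacity of a binary symmetric channel by maximizing I(X; Y) over input distributions (the grid search and `mutual_information` helper are our own; for the BSC with flip probability ε the known answer is C = 1 − H₂(ε)):

```python
import numpy as np

def mutual_information(px, eps):
    """I(X; Y) in bits for a binary symmetric channel, input P(X=0) = px."""
    P = np.array([[px * (1 - eps),        px * eps],
                  [(1 - px) * eps,  (1 - px) * (1 - eps)]])   # joint P(x, y)
    mx, my = P.sum(axis=1), P.sum(axis=0)
    mask = P > 0
    return np.sum(P[mask] * np.log2(P[mask] / np.outer(mx, my)[mask]))

eps = 0.1
# Maximize over the input distribution by grid search (maximum is at px = 0.5).
grid = np.linspace(0.001, 0.999, 999)
C = max(mutual_information(p, eps) for p in grid)

H2 = -eps * np.log2(eps) - (1 - eps) * np.log2(1 - eps)
assert abs(C - (1 - H2)) < 1e-4       # BSC capacity: C = 1 - H2(eps)
```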
