

SLIDE 1

Entropy and Conditional Entropy; Mutual Information and Kullback–Leibler Divergence

Lecture 2: Measures of Information

I-Hsiang Wang

Department of Electrical Engineering, National Taiwan University
ihwang@ntu.edu.tw

September 16, 2015

SLIDE 2

How to measure information?

Before answering this, we should first ask:

What is information?

SLIDE 3

In our daily lives, information is often obtained by learning something that we did not know before. Examples: the result of a ball game, the score of an exam, the weather, … In other words, one gets some information by learning something about which he or she was previously uncertain. Shannon: "Information is the resolution of uncertainty."

SLIDE 4

Motivating Example

Let us take the following example. Suppose there is a professional basketball (NBA) final and a tennis tournament (the French Open, at the quarterfinal stage) happening right now. D is an enthusiastic sports fan. He is interested in who will win the NBA final and who will win the Men's singles. However, due to his work, he cannot access any news for 10 days. How much information can he get after 10 days, when he learns the two pieces of news (the two messages)?

For the NBA final, D will learn that one of the two teams eventually wins the final (message B).
For the French Open, D will learn that one of the eight remaining players eventually wins the title (message T).

SLIDE 5

Observations

1 The amount of information is related to the number of possible outcomes: message B is a result of two possible outcomes, while message T is a result of eight possible outcomes.

2 The amount of information obtained in learning the two messages should be additive, while the number of possible outcomes is multiplicative. Let f(·) be a function that measures the amount of information:

f( (# of possible outcomes of B) × (# of possible outcomes of T) )
= (amount of info. from learning B) + (amount of info. from learning T)
= f(# of possible outcomes of B) + f(# of possible outcomes of T).

What function produces additive outputs from multiplicative inputs? The logarithmic function.
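This uniqueness of the logarithm can be stated compactly; a compile-ready sketch (the symbols N_B and N_T are our notation, not from the slides):

```latex
\documentclass{article}
\usepackage{amsmath}
\begin{document}
% N_B, N_T: the numbers of possible outcomes of messages B and T (our notation).
% Additivity over independent messages forces a logarithm:
\[
  f(N_B \cdot N_T) = f(N_B) + f(N_T) \quad \forall\, N_B, N_T \ge 1
  \;\Longrightarrow\; f(N) = c \log N \quad (c > 0),
\]
% by Cauchy's multiplicative functional equation (assuming $f$ is increasing).
% Taking $c = 1$ with base-2 logarithms measures information in bits.
\end{document}
```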

SLIDE 6

Logarithm as the Information Measure

Initial guess of the measure of information: log(# of possible outcomes). However, this measure does not take the likeliness into account: if some outcome occurs with very high probability, the amount of information of that outcome should be very little.

For example, suppose D knows that the Spurs were leading the Heat 3:1.

The probability that the Heat win the final: 1/2 → 1/8. The Heat win the final (w.p. 1/8): it is as if only 1 out of 8 trials generates this outcome ⇒ the amount of information = log 8 = 3 bits.

The probability that the Spurs win the final: 1/2 → 7/8. The Spurs win the final (w.p. 7/8): it is as if only 1 out of 8/7 trials generates this outcome ⇒ the amount of information = log(8/7) = 3 − log 7 bits.
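These self-information values are easy to check numerically; a minimal sketch in Python (the helper name surprisal is our own choice):

```python
import math

def surprisal(p: float) -> float:
    """Self-information log2(1/p), in bits, of an outcome with probability p."""
    return math.log2(1.0 / p)

print(surprisal(1 / 8))  # Heat win the final:  3.0 bits
print(surprisal(7 / 8))  # Spurs win the final: 3 - log2(7) ≈ 0.193 bits
```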

SLIDE 7

Information and Uncertainty

From the motivation, we collect the following intuitions:

1 The amount of information is related to the # of possible outcomes.
2 The measure of information should be additive.
3 The measure of information should take the likeliness into account.
4 The measure of information is actually measuring the amount of uncertainty of an unknown outcome.

Hence, a plausible measure of information of a realization x drawn from a random outcome X is f(x) := log(1 / P{X = x}). Correspondingly, the measure of information of a random outcome X is the averaged value of f(x): E_X[f(X)].

Notation: in this lecture, the logarithms are of base 2 if not specified.

SLIDE 8

1 Entropy and Conditional Entropy
  Definition of Entropy and Conditional Entropy
  Properties of Entropy and Conditional Entropy

2 Mutual Information and Kullback–Leibler Divergence

SLIDE 9

Entropy: Measure of Uncertainty of a Random Variable

log(1 / P{X = x}) ⇝ measure of information/uncertainty of an outcome x.

If the outcome has small probability, it contains higher uncertainty. However, on average, it happens rarely. Hence, to measure the uncertainty of a random variable, we should take the expectation of the self-information over all possible realizations:

Definition 1 (Entropy)
The entropy of a random variable X is defined by
H(X) := E_X[ log(1/p(X)) ] = Σ_{x∈X} p(x) log(1/p(x)).

Note: Entropy can be understood as the (average) amount of information when one learns the actual outcome/realization of r.v. X.
Note: By convention we set 0 log(1/0) = 0, since lim_{t→0} t log t = 0.
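Definition 1 translates directly into code; a minimal sketch in Python (the helper name entropy and the list-based p.m.f. are our choices, not from the slides):

```python
import math

def entropy(pmf) -> float:
    """H(X) = sum over x of p(x) * log2(1/p(x)), with the 0 * log(1/0) = 0 convention."""
    return sum(p * math.log2(1.0 / p) for p in pmf if p > 0)

print(entropy([0.5, 0.5]))            # a fair coin flip: 1.0 bit
print(entropy([1/6, 1/3, 1/3, 1/6]))  # 1/3 + log2(3) ≈ 1.918 (cf. Example 2 below)
```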

SLIDE 10

Example 1 (Binary entropy function)
Let X ∼ Ber(p) be a Bernoulli random variable, that is, X ∈ {0, 1}, pX(1) = 1 − pX(0) = p. Then the entropy of X is called the binary entropy function Hb(p) (note: we follow the convention that 0 log(1/0) = 0):

Hb(p) := H(X) = −p log p − (1 − p) log(1 − p), p ∈ [0, 1].

[Figure: plot of Hb(p) over p ∈ [0, 1]; the curve is concave, equals 0 at p = 0 and p = 1, and peaks at Hb(1/2) = 1.]

Exercise 1
1 Analytically check that max_{p∈[0,1]} Hb(p) = 1 and arg max_{p∈[0,1]} Hb(p) = 1/2.
2 Analytically prove that Hb(p) is concave in p.
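A numeric sanity check of the plotted curve, as a sketch (binary_entropy is our own helper, not from the slides):

```python
import math

def binary_entropy(p: float) -> float:
    """Hb(p) = -p*log2(p) - (1-p)*log2(1-p), with the 0 * log(1/0) = 0 convention."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

for p in (0.0, 0.1, 0.3, 0.5, 0.7, 1.0):
    print(f"Hb({p}) = {binary_entropy(p):.4f}")  # symmetric in p, peaking at p = 0.5
```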

SLIDE 11

Example 2
Consider a random variable X ∈ {0, 1, 2, 3} with p.m.f. defined as follows:

x    :  0    1    2    3
p(x) : 1/6  1/3  1/3  1/6

Compute H(X) and H(Y), where Y := X mod 2.
sol: H(X) = 2 × (1/6) × log 6 + 2 × (1/3) × log 3 = 1/3 + log 3.
H(Y) = 2 × (1/2) × log 2 = 1.

SLIDE 12

Operational Meaning of Entropy

Besides the intuitive motivation, entropy has operational meanings. Below we take a slight deviation and look at a mathematical problem.

Problem: Consider a sequence of discrete r.v.'s X^n := (X1, X2, . . . , Xn), where Xi ∈ X, with Xi i.i.d. ∼ pX for all i = 1, 2, . . . , n, and |X| < ∞. For a given ϵ ∈ (0, 1), we say A ⊆ X^n is an ϵ-high-probability set iff P{X^n ∈ A} ≥ 1 − ϵ. We would like to find the asymptotic scaling of the smallest cardinality of ϵ-high-probability sets as n → ∞. Let s(n, ϵ) be that smallest cardinality.

SLIDE 13

Theorem 1 (Cardinality of High-Probability Sets)
lim_{n→∞} (1/n) log s(n, ϵ) = H(X), ∀ ϵ ∈ (0, 1).

pf: Application of the Law of Large Numbers. See HW1.

Implications: H(X) is the minimum possible compression ratio. With the theorem, if one would like to describe a random length-n X-sequence with a missed probability at most ϵ, he/she only needs k ≈ nH(X) bits when n is large. Why? Because the theorem guarantees that, for any prescribed missed probability,

(minimum # of bits required) / n → H(X) as n → ∞.

This is the saving (compression) due to the statistical structure of random source sequences, as Shannon pointed out in his 1948 paper.
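Theorem 1 can be watched in action for a Bernoulli source: the smallest ϵ-high-probability set is obtained by greedily collecting the most probable sequences until their total probability reaches 1 − ϵ, and (1/n) log2 of its size approaches H(X). A brute-force sketch (all names are ours, not from the slides; it enumerates type classes rather than individual sequences, so it stays cheap even for large n):

```python
import math
from math import comb

def smallest_set_exponent(n: int, p: float, eps: float) -> float:
    """(1/n) * log2 s(n, eps) for an i.i.d. Bernoulli(p) source: greedily take
    the most probable length-n sequences until total probability >= 1 - eps."""
    # All sequences with k ones share the probability p^k * (1-p)^(n-k),
    # and there are C(n, k) of them; sort these type classes by that probability.
    classes = sorted(
        ((p**k * (1 - p) ** (n - k), comb(n, k)) for k in range(n + 1)),
        reverse=True,
    )
    total, size = 0.0, 0
    for seq_prob, count in classes:
        need = math.ceil((1 - eps - total) / seq_prob)  # sequences still missing
        take = min(count, need)
        size += take
        total += take * seq_prob
        if total >= 1 - eps:
            break
    return math.log2(size) / n

p, eps = 0.11, 0.1
h = -p * math.log2(p) - (1 - p) * math.log2(1 - p)  # H(X) ≈ 0.4999 bits
for n in (50, 200, 800):
    print(n, round(smallest_set_exponent(n, p, eps), 4), "vs H(X) =", round(h, 4))
```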

SLIDE 14

Entropy: Definition

Initially we defined entropy for a single random variable. It is straightforward to extend the definition to a sequence of random variables, or, a random vector. In some literature, the entropy of a random vector is also called the joint entropy of the component random variables.

Definition 2 (Entropy)
The entropy of a d-dimensional random vector X := [X1 · · · Xd]^T is defined by the expectation of the self-information:
H(X) := E_X[ log(1/p(X)) ] = Σ_{x∈X1×···×Xd} p(x) log(1/p(x)) = H(X1, . . . , Xd).

SLIDE 15

Example 3
Consider two random variables X1, X2 ∈ {0, 1} with joint p.m.f.

(x1, x2)  : (0,0)  (0,1)  (1,0)  (1,1)
p(x1, x2) :  1/6    1/3    1/3    1/6

Compute H(X1), H(X2), and H(X1, X2).
sol: H(X1, X2) = 2 × (1/6) × log 6 + 2 × (1/3) × log 3 = 1/3 + log 3.
H(X1) = 2 × (1/3 + 1/6) × log(1 / (1/3 + 1/6)) = 1 = H(X2).

Compared to Example 2, it can be seen that the value of the entropy only depends on the distribution of the random variable/vector, not on the actual values it may take.

SLIDE 16

Conditional Entropy

For two r.v.'s with conditional p.m.f. pX|Y(x|y), we are able to define "the entropy of X given Y = y" according to pX|Y(·|y):

H(X|Y = y) := Σ_{x∈X} pX|Y(x|y) log(1 / pX|Y(x|y)).

This can be understood as the amount of uncertainty of X when we know that a potentially correlated Y takes the value y. Averaging over Y, we obtain the amount of uncertainty of X given Y:

Definition 3 (Conditional Entropy)
The conditional entropy of X given Y is defined by
H(X|Y) := Σ_{y∈Y} p(y) H(X|Y = y) = Σ_{x∈X, y∈Y} p(x, y) log(1 / pX|Y(x|y)).
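Definition 3 in code, as a minimal sketch (the dict-based joint p.m.f. and the helper name are our choices); note that p(x,y) log(1/p(x|y)) = p(x,y) log(p(y)/p(x,y)):

```python
import math

def conditional_entropy(joint) -> float:
    """H(X|Y) = sum over (x, y) of p(x,y) * log2( p(y) / p(x,y) ),
    where `joint` maps (x, y) -> p(x, y)."""
    p_y = {}
    for (x, y), p in joint.items():
        p_y[y] = p_y.get(y, 0.0) + p  # marginal p(y)
    return sum(p * math.log2(p_y[y] / p) for (x, y), p in joint.items() if p > 0)

joint = {(0, 0): 1/6, (0, 1): 1/3, (1, 0): 1/3, (1, 1): 1/6}
print(conditional_entropy(joint))  # log2(3) - 2/3 ≈ 0.918 (cf. Example 4 below)
```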

SLIDE 17

Example 4
Consider two random variables X1, X2 ∈ {0, 1} with joint p.m.f.

(x1, x2)  : (0,0)  (0,1)  (1,0)  (1,1)
p(x1, x2) :  1/6    1/3    1/3    1/6

Compute H(X1|X2 = 0), H(X1|X2 = 1), H(X1|X2), and H(X2|X1).
sol: The conditional p.m.f.'s are

(x1, x2)  : (0,0)  (0,1)  (1,0)  (1,1)
p(x1|x2)  :  1/3    2/3    2/3    1/3
p(x2|x1)  :  1/3    2/3    2/3    1/3

H(X1|X2 = 0) = (1/3) log 3 + (2/3) log(3/2) = Hb(1/3),
H(X1|X2 = 1) = (2/3) log(3/2) + (1/3) log 3 = Hb(1/3),
H(X1|X2) = 2 × (1/6) × log 3 + 2 × (1/3) × log(3/2) = Hb(1/3) = log 3 − 2/3 = H(X2|X1).

SLIDE 18

Properties of Entropy

Theorem 2 (Properties of (Joint) Entropy)
1 H(X) ≥ 0, with equality iff X is deterministic.
2 H(X) ≤ log |X|, with equality iff X is uniformly distributed over X.
For a d-dimensional random vector X:
3 H(X) ≥ 0, with equality iff X is deterministic.
4 H(X) ≤ Σ_{i=1}^{d} log |Xi|, with equality iff X is uniformly distributed over X1 × · · · × Xd.

Interpretation (quite natural):
Amount of uncertainty in X = 0 ⇔ X is deterministic.
Amount of uncertainty in X is maximized ⇔ X is equally likely to take every value in its alphabet X.

SLIDE 19

Lemma 1 (Jensen's Inequality)
Let f : R → R be a strictly concave function, and X be a real-valued r.v. Then E[f(X)] ≤ f(E[X]), with equality iff X is deterministic.

We shall use the above lemma to prove that H(X) ≤ log |X|, with equality iff X is uniformly distributed over X.

pf: Let the support of X, supp X, denote the subset of X where X takes non-zero probability. Define a new r.v. U := 1/p(X). Note that E[U] = Σ_{x∈supp X} p(x) · (1/p(x)) = |supp X|. Hence,

H(X) = E[log U] ≤ log(E[U]) = log |supp X| ≤ log |X|,

where the first inequality is Jensen's. It holds with equality ⇔ U is deterministic ⇔ p(x) is the same for all x ∈ supp X. The second inequality holds with equality ⇔ supp X = X.

SLIDE 20

Chain Rule

Theorem 3 (Chain Rule)
H(X, Y) = H(Y) + H(X|Y) = H(X) + H(Y|X).

Interpretation: Amount of uncertainty of (X, Y) = amount of uncertainty of Y + amount of uncertainty of X after knowing Y.

pf: By definition,
H(X, Y) = Σ_{x∈X, y∈Y} p(x, y) log(1 / p(x, y)) = Σ_{x∈X, y∈Y} p(x, y) log(1 / (p(y) p(x|y)))
= Σ_{x∈X, y∈Y} p(x, y) log(1 / p(y)) + Σ_{x∈X, y∈Y} p(x, y) log(1 / p(x|y))
= H(Y) + H(X|Y).
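A numeric check of the chain rule on the joint p.m.f. of Examples 3 and 4, as a standalone sketch (not part of the slides):

```python
import math

joint = {(0, 0): 1/6, (0, 1): 1/3, (1, 0): 1/3, (1, 1): 1/6}

H_XY = sum(p * math.log2(1 / p) for p in joint.values())
p_y = {}
for (x, y), p in joint.items():
    p_y[y] = p_y.get(y, 0.0) + p  # marginal p(y)
H_Y = sum(p * math.log2(1 / p) for p in p_y.values())
H_X_given_Y = sum(p * math.log2(p_y[y] / p) for (x, y), p in joint.items())

print(H_XY)               # 1/3 + log2(3) ≈ 1.918
print(H_Y + H_X_given_Y)  # same value: the chain rule holds
```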

SLIDE 21

Conditioning Reduces Entropy

Theorem 4 (Conditioning Reduces Entropy)
H(X|Y) ≤ H(X), with equality iff X is independent of Y.

Interpretation: The more one learns, the less the uncertainty. If and only if what you have learned is independent of your target does the amount of uncertainty of your target remain the same.

Exercise 2
While it is always true that H(X|Y) ≤ H(X), for a particular y ∈ Y both of the following are possible: H(X|Y = y) < H(X), or H(X|Y = y) > H(X). Please construct examples for the two cases respectively.

SLIDE 22

pf: By definition and Jensen's inequality, we have

H(X|Y) − H(X) = Σ_{x∈X, y∈Y} p(x, y) log(p(x) / p(x|y)) = Σ_{x∈X, y∈Y} p(x, y) log(p(x) p(y) / p(x, y))
≤ log( Σ_{x∈X, y∈Y} p(x, y) · p(x) p(y) / p(x, y) ) = log( Σ_{x∈X, y∈Y} p(x) p(y) ) = log 1 = 0.

Exercise 3
For any jointly distributed (X, Y), show that H(X|Y) ≥ 0, with equality iff X is a deterministic function of Y.

SLIDE 23

Example 5
Consider two random variables X1, X2 ∈ {0, 1} with joint p.m.f.

(x1, x2)  : (0,0)  (0,1)  (1,0)  (1,1)
p(x1, x2) :  1/6    1/3    1/3    1/6

In the previous examples, we have H(X1, X2) = log 3 + 1/3, H(X1) = H(X2) = 1, and H(X1|X2) = H(X2|X1) = log 3 − 2/3. It is straightforward to check that the chain rule holds. Besides, it can easily be seen that conditioning reduces entropy.

SLIDE 24

Generalization

Theorem 5 (Chain Rule)
The chain rule can be generalized to more than two r.v.'s:
H(X1, . . . , Xn) = Σ_{i=1}^{n} H(Xi | X1, . . . , Xi−1).
The proof is left as an exercise.

Theorem 6 (Conditioning Reduces Entropy)
"Conditioning reduces entropy" can be generalized to more than two r.v.'s:
H(X|Y, Z) ≤ H(X|Y).
The proof is left as an exercise.

SLIDE 25

Upper Bound on Joint Entropy

Exercise 4 (Joint Entropy ≤ Sum of Marginal Entropies)
Use the chain rule of entropy and the fact that conditioning reduces entropy to prove that
H(X1, . . . , Xn) ≤ Σ_{i=1}^{n} H(Xi).

SLIDE 26

1 Entropy and Conditional Entropy
  Definition of Entropy and Conditional Entropy
  Properties of Entropy and Conditional Entropy

2 Mutual Information and Kullback–Leibler Divergence

SLIDE 27

Conditioning Reduces Entropy Revisited

Entropy quantifies the amount of uncertainty of a r.v., say, X. Conditional entropy quantifies the amount of uncertainty of a r.v. X given another r.v., say, Y.

[Figure: a bar of height H(X) shrinks to H(X|Y) after learning Y; the gap is I(X; Y).]

Question: How much information does Y tell about X?
Ans: The amount of information about X that one obtains by learning Y is the difference between H(X) and H(X|Y).

SLIDE 28

Mutual Information

Definition 4 (Mutual Information)
For a pair of jointly distributed r.v.'s (X, Y), the mutual information between them is defined as
I(X; Y) := H(X) − H(X|Y).

What channel coding does is to infer some information about the channel input X from the channel output Y.

[Figure: a channel pY|X(y|x) mapping X to Y; learning Y lowers the uncertainty about X from H(X) to H(X|Y), and the gap is I(X; Y).]
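Definition 4 in code, as a sketch (the dict-based joint p.m.f. and the helper name are ours); it uses the equivalent form I(X; Y) = Σ p(x,y) log( p(x,y) / (p(x) p(y)) ), which the KL-divergence slides below make explicit:

```python
import math

def mutual_information(joint) -> float:
    """I(X;Y) = sum over (x, y) of p(x,y) * log2( p(x,y) / (p(x) p(y)) ),
    where `joint` maps (x, y) -> p(x, y)."""
    p_x, p_y = {}, {}
    for (x, y), p in joint.items():
        p_x[x] = p_x.get(x, 0.0) + p
        p_y[y] = p_y.get(y, 0.0) + p
    return sum(p * math.log2(p / (p_x[x] * p_y[y]))
               for (x, y), p in joint.items() if p > 0)

joint = {(0, 0): 1/6, (0, 1): 1/3, (1, 0): 1/3, (1, 1): 1/6}
print(mutual_information(joint))  # 5/3 - log2(3) ≈ 0.082 (cf. Example 6 below)
```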

SLIDE 29

Properties of Mutual Information

Theorem 7 (An Identity)
I(X; Y) = H(X) − H(X|Y) = H(Y) − H(Y|X) = H(X) + H(Y) − H(X, Y).

pf: By the chain rule: H(X|Y) = H(X, Y) − H(Y).

[Figure: Venn diagram of H(X, Y); the circles H(X) and H(Y) overlap in I(X; Y), with H(X|Y) and H(Y|X) as the non-overlapping parts.]

Note: Mutual information is symmetric, that is, I(X; Y) = I(Y; X).
Note: The mutual information between X and itself equals its entropy: I(X; X) = H(X), since H(X|X) = 0. Hence, entropy is also called "self-information" in some of the literature.

SLIDE 30

Mutual Information Measures the Level of Dependency

Theorem 8 (Extremal Values of Mutual Information)
1 I(X; Y) ≥ 0, with equality iff X, Y are independent.
2 I(X; Y) ≤ H(X), with equality iff X is a deterministic function of Y.

pf: The first is due to the fact that conditioning reduces entropy. The second is due to H(X|Y) ≥ 0.

Interpretation: the mutual information between X and Y, I(X; Y), can also be viewed as a measure of the dependency between X and Y:
If X is determined by Y (highly dependent), I(X; Y) is maximized.
If X is independent of Y (no dependency), I(X; Y) = 0.
This interpretation will become clearer when we introduce the notion of Kullback–Leibler divergence.

SLIDE 31

Example 6
Consider two random variables X1, X2 ∈ {0, 1} with joint p.m.f.

(x1, x2)  : (0,0)  (0,1)  (1,0)  (1,1)
p(x1, x2) :  1/6    1/3    1/3    1/6

Compute I(X1; X2).
sol: From the previous examples, we have H(X1, X2) = log 3 + 1/3, H(X1) = H(X2) = 1, and H(X1|X2) = H(X2|X1) = log 3 − 2/3. Hence,
I(X1; X2) = H(X1) − H(X1|X2) = 5/3 − log 3.

SLIDE 32

Conditional Mutual Information

Definition 5 (Conditional Mutual Information)
For a tuple of jointly distributed r.v.'s (X, Y, Z), the mutual information between X and Y given Z is
I(X; Y|Z) := H(X|Z) − H(X|Y, Z).

Similar to the previous identity (Theorem 7), we have
I(X; Y|Z) = H(X|Z) − H(X|Y, Z) = H(Y|Z) − H(Y|X, Z) = H(X|Z) + H(Y|Z) − H(X, Y|Z).

Similar to Theorem 8, we have
1 I(X; Y|Z) ≥ 0, with equality iff X, Y are independent given Z, that is, X − Z − Y forms a Markov chain.
2 I(X; Y|Z) ≤ H(X|Z), with equality iff X is a deterministic function of Y and Z.

SLIDE 33

Chain Rule for Mutual Information

Theorem 9 (Chain Rule for Mutual Information)
I(X; Y1, . . . , Yn) = Σ_{i=1}^{n} I(X; Yi | Y1, . . . , Yi−1).

pf: The chain rule for mutual information can be proved from the definition and the chain rule for entropy.

Exercise 5
Show that I(X; Z) ≤ I(X; Y, Z) and I(X; Y|Z) ≤ I(X; Y, Z).

SLIDE 34

Data Processing Inequality

Theorem 10 (Data Processing Inequality)
For a Markov chain X − Y − Z, that is, p(x, y, z) = p(x) p(y|x) p(z|y), we have I(X; Y) ≥ I(X; Z).

Interpretation: The Markov chain X − Y − Z implies that the information about X that Z can provide is contained in Y. Hence, the amount of information about X that can be inferred from Z ≤ the amount of information about X that can be inferred from Y.

pf: Since X − Y − Z, we have I(X; Z|Y) = 0. Hence,
I(X; Y, Z) = I(X; Y) + I(X; Z|Y) = I(X; Y)   (∵ I(X; Z|Y) = 0),
I(X; Y, Z) = I(X; Z) + I(X; Y|Z)   (chain rule),
⇒ I(X; Y) = I(X; Z) + I(X; Y|Z) ≥ I(X; Z).

SLIDE 35

Data Processing Inequality: Applications

Markov chains are common in communication systems. For example, in channel coding (without feedback), the message W, the channel input X^N := X[1 : N], the channel output Y^N := Y[1 : N], and the decoded message Ŵ form a Markov chain W − X^N − Y^N − Ŵ.

[Figure: W → Encoder → X[1 : N] → Noisy Channel pY|X → Y[1 : N] → Decoder → Ŵ.]

The data processing inequality turns out to be crucial in obtaining impossibility results in information theory.

Exercise 6 (Functions of R.V.)
For Z := g(Y) a deterministic function of Y, show that H(Y) ≥ H(Z) and I(X; Y) ≥ I(X; Z).

Exercise 7
Show that X1 − X2 − X3 − X4 ⇒ I(X1; X4) ≤ I(X2; X3).

SLIDE 36

Example 7
Consider two random variables X1, X2 ∈ {0, 1} with the same joint p.m.f. as that in Example 6. Let X3 := X2 ⊕ Z, where Z ∼ Ber(p) and Z is independent of (X1, X2).
1 Compute I(X1; X3) and I(X1; X2|X3).
2 Show that X1 − X2 − X3 forms a Markov chain.
3 Verify the data processing inequality I(X1; X2) ≥ I(X1; X3).

sol:

(x1, x2, x3)  : (0,0,0)     (0,0,1)  (0,1,0)  (0,1,1)
p(x1, x2, x3) : (1/6)(1−p)  (1/6)p   (1/3)p   (1/3)(1−p)

(x1, x2, x3)  : (1,0,0)     (1,0,1)  (1,1,0)  (1,1,1)
p(x1, x2, x3) : (1/3)(1−p)  (1/3)p   (1/6)p   (1/6)(1−p)

Then it is straightforward to compute the mutual informations and verify the Markov chain X1 − X2 − X3.
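The data-processing check in part 3 is easy to do numerically for a concrete p; a standalone sketch (helper names ours, not from the slides), building p(x1, x3) from the table above:

```python
import math

def mi(joint):
    """I(A;B) from a dict mapping (a, b) -> p(a, b)."""
    pa, pb = {}, {}
    for (a, b), p in joint.items():
        pa[a] = pa.get(a, 0.0) + p
        pb[b] = pb.get(b, 0.0) + p
    return sum(p * math.log2(p / (pa[a] * pb[b]))
               for (a, b), p in joint.items() if p > 0)

p = 0.1  # any p in (0, 1) exhibits the same inequality
p12 = {(0, 0): 1/6, (0, 1): 1/3, (1, 0): 1/3, (1, 1): 1/6}
p13 = {}
for (x1, x2), q in p12.items():           # X3 = X2 xor Z, Z ~ Ber(p) independent
    for z, pz in ((0, 1 - p), (1, p)):
        p13[(x1, x2 ^ z)] = p13.get((x1, x2 ^ z), 0.0) + q * pz

print(mi(p12), mi(p13))  # I(X1;X2) >= I(X1;X3), as the DPI predicts
```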

SLIDE 37

Conditioning Reduces Mutual Information?

Does conditioning reduce mutual information? Does conditioning reduce the dependency between two r.v.'s? The answer is: sometimes yes, and sometimes no.

Example 8 (Conditioning Increases Mutual Information)
Let X and Y be i.i.d. Ber(1/2) random variables, and Z := X ⊕ Y. Evaluate I(X; Y|Z) and show that I(X; Y|Z) > I(X; Y).
sol: I(X; Y|Z) = H(X|Z) − H(X|Y, Z) = H(X|Z) − H(X|Y, X ⊕ Y) = H(X|Z) − H(X|Y, X) = H(X|Z) = H(X) = 1
(note that X and Z are independent).
On the other hand, I(X; Y) = 0. Hence, 1 = I(X; Y|Z) > I(X; Y) = 0.

Corollary 1 (Conditioning Decreases Mutual Information)
For a Markov chain X − Y − Z, we have I(X; Y) ≥ I(X; Y|Z).

SLIDE 38

Measuring the Distance between Probability Distributions

Recall that mutual information I(X; Y) measures the dependency between two r.v.'s X and Y. Reason:

I(X; Y) = E[ log(1 / (pX(X) pY(Y))) − log(1 / pX,Y(X, Y)) ],

where the first term inside E[·] measures the uncertainty of (X, Y) when they are independent (with distribution pX · pY), while the second term measures the actual uncertainty of (X, Y) (with distribution pX,Y). In other words, it measures how far the independent distribution pX · pY is from the actual distribution pX,Y, in terms of uncertainty.

Kullback–Leibler divergence is a generalization of this concept: it measures how far a distribution q is from the actual distribution p.

Note: Since distributions over a finite alphabet are finite-dimensional vectors, geometrically Lp norms also naturally measure how far q is from p. Example (L1-norm): Σ_x |q(x) − p(x)|.

SLIDE 39

Kullback–Leibler Divergence

Definition 6 (Kullback–Leibler Divergence (Relative Entropy))
Let p(·) and q(·) be two p.m.f.'s of a random variable X. The relative entropy between p and q is
D(p||q) := E_p[ log(p(X) / q(X)) ]
(the subscript "p" denotes that the expectation is taken over the distribution p).

Note: In taking the above expectation, we follow the conventions that 0 log(0/q) = 0 for any 0 ≤ q ≤ 1, and p log(p/0) = ∞ for any 0 < p ≤ 1.
Note: Hence, it is easy to see that if the support of q is strictly contained in the support of p, then D(p||q) = ∞.
Note: I(X; Y) = D(pX,Y || pX · pY).
Note: KL divergence is NOT symmetric: D(p||q) ≠ D(q||p) in general.
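Definition 6 as a sketch in code (the list-based p.m.f.'s and the helper name are our choices), with the two conventions made explicit:

```python
import math

def kl_divergence(p, q) -> float:
    """D(p||q) = sum over x of p(x) * log2( p(x) / q(x) )."""
    d = 0.0
    for px, qx in zip(p, q):
        if px == 0:
            continue            # convention: 0 * log(0/q) = 0
        if qx == 0:
            return math.inf     # convention: p * log(p/0) = infinity
        d += px * math.log2(px / qx)
    return d

p = [1/2, 1/4, 1/4]
q = [1/3, 1/3, 1/3]
print(kl_divergence(p, q), kl_divergence(q, p))  # ≈ 0.085 vs ≈ 0.082: not symmetric
```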

SLIDE 40

Properties of KL Divergence

Theorem 11
D(p||q) ≥ 0, with equality iff p(x) = q(x) for all x ∈ X.
pf: Proved by Jensen's inequality, similarly to the previous proofs.

Note: Although it is tempting to think of KL divergence as a distance function, in fact it is not, because (1) it is asymmetric, and (2) it does not satisfy the triangle inequality.

Exercise 8
Show that the uniform distribution attains maximal entropy, by using D(p||u) ≥ 0, where u denotes the uniform distribution.

SLIDE 41

Summary

SLIDE 42

Entropy H(X) := E_pX[ log(1/pX(X)) ] measures the amount of uncertainty in X.
Conditional entropy H(X|Y) := E_pX,Y[ log(1/pX|Y(X|Y)) ] measures the amount of uncertainty in X given Y.
Mutual information I(X; Y) := H(X) − H(X|Y) = H(Y) − H(Y|X) measures the amount of information about X (resp. Y) that Y (resp. X) can provide.
Conditioning reduces entropy: H(X|Y, Z) ≤ H(X|Y).
Chain rule: I(X; Y, Z) = I(X; Y) + I(X; Z|Y).
Data processing inequality: X − Y − Z ⇒ I(X; Y) ≥ I(X; Z).
0 ≤ H(X) ≤ log |X|: maximized if X is uniformly distributed, minimized if X is deterministic.
0 ≤ I(X; Y) ≤ min{H(X), H(Y)}: maximized if X or Y is determined by the other, minimized if X and Y are independent.
