Chapter 2
Entropy, Relative Entropy, and Mutual Information
Peng-Hua Wang
Graduate Institute of Communication Engineering National Taipei University
Chapter Outline
2.1 Entropy
2.2 …
Definition (Entropy) The entropy H(X) of a discrete random variable X is defined by
H(X) = −∑_{x∈𝒳} p(x) log p(x).
■ Let X be a discrete random variable with alphabet 𝒳 and probability mass function p(x) = Pr{X = x}, x ∈ 𝒳.
■ If the base of the logarithm is 2, i.e., log₂ p(x), the entropy is expressed in bits.
■ If the base is e, i.e., ln p(x), the entropy is expressed in nats.
■ If the base is b, we denote the entropy as H_b(X).
■ We adopt the convention 0 log 0 = 0, justified by lim_{t→0⁺} t log t = 0.
■ H(X) = E[log(1/p(X))] = −E[log p(X)].
■ H(X) may not exist: for a countably infinite alphabet, the defining sum may diverge. (A small numerical sketch follows.)
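As a quick numerical companion (added here, not part of the original slides), a minimal Python sketch of the definition; the function name entropy and the example pmfs are illustrative choices:

    import math

    def entropy(pmf, base=2):
        # H(X) = -sum_x p(x) log p(x); zero-probability terms are dropped,
        # following the convention 0 log 0 = 0.
        return -sum(p * math.log(p, base) for p in pmf if p > 0)

    print(entropy([0.25, 0.25, 0.25, 0.25]))  # uniform on 4 symbols: 2.0 bits
    print(entropy([1.0]))                     # a certain outcome: 0.0 bits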
■ The amount of information (code length) required on the average to describe the random variable.
■ The minimum expected number of binary ("Yes/No") questions required to determine the value of X.
■ The amount of "information" provided by an observation of the random variable.
  ◆ If an event is less probable, we receive more information when it occurs.
  ◆ A certain event provides no information.
■ The "uncertainty" about a random variable.
■ The "randomness" of a random variable.
■ For a binary random variable X with Pr{X = 1} = p, the entropy is H(p) = −p log₂ p − (1 − p) log₂(1 − p).
■ H(p) is a concave function of the distribution.
■ H(p) = 0 if p = 0 or p = 1.
■ H(p) attains its maximum value of 1 bit at p = 1/2 (see the numerical check below).
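A quick numerical check of these properties (an added sketch, not from the original slides; binary_entropy is an illustrative name):

    import math

    def binary_entropy(p):
        # H(p) = -p log2 p - (1-p) log2(1-p), with 0 log 0 = 0.
        if p in (0.0, 1.0):
            return 0.0
        return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

    for p in (0.0, 0.1, 0.25, 0.5, 0.75, 1.0):
        print(p, round(binary_entropy(p), 4))
    # H(p) is 0 at the endpoints and peaks at 1 bit when p = 0.5.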
Example. Let X take four values with probabilities 1/2, 1/4, 1/8, and 1/8. Then
H(X) = −(1/2) log₂(1/2) − (1/4) log₂(1/4) − (1/8) log₂(1/8) − (1/8) log₂(1/8) = 7/4 bits.
■ We wish to determine the value of X with the minimum number of "Yes/No" questions. An efficient strategy asks about the most probable value first.
■ The minimum expected number of binary questions required to determine X lies between H(X) and H(X) + 1 (verified in the sketch below).
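The arithmetic can be verified directly (an added sketch; the three-question strategy shown is one natural choice, not necessarily the slides' own):

    import math

    p = [1/2, 1/4, 1/8, 1/8]
    H = -sum(q * math.log2(q) for q in p)
    print(H)  # 1.75 bits = 7/4

    # Asking "Is X the first value?", then the second, then the third
    # takes 1, 2, 3, 3 questions for the four outcomes, respectively.
    E = 1*(1/2) + 2*(1/4) + 3*(1/8) + 3*(1/8)
    print(E)  # 1.75, matching H(X) exactly for this dyadic distribution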
Definition (Joint entropy) The joint entropy H(X, Y) of a pair of discrete random variables (X, Y) with joint distribution p(x, y) is
H(X, Y) = −∑_{x∈𝒳} ∑_{y∈𝒴} p(x, y) log p(x, y).
Definition (Conditional entropy) If (X, Y) ~ p(x, y), the conditional entropy H(Y|X) is
H(Y|X) = ∑_{x∈𝒳} p(x) H(Y|X = x)
= −∑_{x∈𝒳} p(x) ∑_{y∈𝒴} p(y|x) log p(y|x)
= −∑_{x∈𝒳} ∑_{y∈𝒴} p(x, y) log p(y|x).
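Both definitions, together with the identity noted below, can be checked on a small joint table (an added sketch; the joint pmf values are arbitrary):

    import math

    # Hypothetical joint pmf p(x, y) on a 2x2 alphabet (arbitrary values).
    joint = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.2, (1, 1): 0.3}

    def H(dist):
        # entropy in bits of a pmf given as a dict of probabilities
        return -sum(p * math.log2(p) for p in dist.values() if p > 0)

    px = {x: sum(p for (a, b), p in joint.items() if a == x) for x in (0, 1)}
    py = {y: sum(p for (a, b), p in joint.items() if b == y) for y in (0, 1)}

    # H(Y|X) = -sum_{x,y} p(x,y) log2 p(y|x), with p(y|x) = p(x,y)/p(x)
    Hy_x = -sum(p * math.log2(p / px[x]) for (x, y), p in joint.items() if p > 0)
    Hx_y = -sum(p * math.log2(p / py[y]) for (x, y), p in joint.items() if p > 0)

    print(H(joint), H(px) + Hy_x)      # chain rule: H(X,Y) = H(X) + H(Y|X)
    print(H(px) - Hx_y, H(py) - Hy_x)  # equal: H(X) - H(X|Y) = H(Y) - H(Y|X)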
■ From the chain rule, H(X, Y) = H(X) + H(Y|X) = H(Y) + H(X|Y); hence H(X) − H(X|Y) = H(Y) − H(Y|X).
Definition (Relative entropy) The relative entropy between two probability mass functions p(x) and q(x) is
D(p||q) = ∑_{x∈𝒳} p(x) log (p(x)/q(x)).
■ D(p||q) is also called the Kullback–Leibler distance (or divergence).
■ We adopt the conventions 0 log (0/0) = 0 and p log (p/0) = ∞ (see the sketch below).
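A direct transcription of the definition into Python (an added sketch; kl_divergence and the example distributions are illustrative):

    import math

    def kl_divergence(p, q):
        # D(p||q) = sum_x p(x) log2(p(x)/q(x)); 0 log(0/q) = 0,
        # and p log(p/0) is infinite when p > 0.
        total = 0.0
        for pi, qi in zip(p, q):
            if pi == 0:
                continue
            if qi == 0:
                return math.inf
            total += pi * math.log2(pi / qi)
        return total

    p = [1/2, 1/4, 1/4]
    q = [1/3, 1/3, 1/3]
    print(kl_divergence(p, q), kl_divergence(q, p))  # note: not symmetric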
■ D(p||q) is a measure of the distance between two distributions. It is not a true metric: it is not symmetric and does not satisfy the triangle inequality.
■ D(p||q) is a measure of the inefficiency of assuming that the distribution is q when the true distribution is p.
■ If we know the true distribution p(x), we could construct a code with average description length H(p). If, instead, we used the code for a distribution q(x), we would need H(p) + D(p||q) bits on the average to describe the random variable, as the following decomposition (and the sketch after it) shows:
∑_{x∈𝒳} p(x) log (1/q(x)) = ∑_{x∈𝒳} p(x) log (p(x)/q(x)) − ∑_{x∈𝒳} p(x) log p(x) = D(p||q) + H(p).
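Numerically, the average description length under the mismatched code comes out to exactly H(p) + D(p||q) (an added sketch, idealizing the codeword length for symbol x as log₂(1/q(x))):

    import math

    p = [1/2, 1/4, 1/8, 1/8]   # true distribution
    q = [1/4, 1/4, 1/4, 1/4]   # assumed distribution

    H_p = -sum(pi * math.log2(pi) for pi in p)
    D_pq = sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q))
    cross = sum(pi * math.log2(1 / qi) for pi, qi in zip(p, q))

    print(cross, H_p + D_pq)  # both 2.0 bits: H(p) + D(p||q)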
Definition (Mutual information) Let (X, Y) ~ p(x, y). The mutual information I(X; Y) is
I(X; Y) = ∑_{x∈𝒳} ∑_{y∈𝒴} p(x, y) log ( p(x, y) / (p(x)p(y)) ).
■ The mutual information I(X; Y) is the relative entropy between the joint distribution p(x, y) and the product distribution p(x)p(y): I(X; Y) = D( p(x, y) || p(x)p(y) ) (verified numerically below).
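Computed on the same small joint table as the earlier sketch (an added example; the values are arbitrary):

    import math

    joint = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.2, (1, 1): 0.3}
    px = {x: sum(p for (a, b), p in joint.items() if a == x) for x in (0, 1)}
    py = {y: sum(p for (a, b), p in joint.items() if b == y) for y in (0, 1)}

    # I(X;Y) = sum_{x,y} p(x,y) log2( p(x,y) / (p(x) p(y)) )
    I = sum(p * math.log2(p / (px[x] * py[y]))
            for (x, y), p in joint.items() if p > 0)
    print(I)  # = D( p(x,y) || p(x)p(y) ), nonnegative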
Theorem (Chain rule for entropy) Let X₁, X₂, …, X_n be drawn according to p(x₁, …, x_n) = ∏_{i=1}^{n} p(x_i | x_{i−1}, …, x₁). Then
H(X₁, X₂, …, X_n) = ∑_{i=1}^{n} H(X_i | X_{i−1}, …, X₁).
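A numerical check of the telescoping sum (an added sketch; the three-variable joint pmf is an arbitrary construction, and each conditional entropy is obtained as a difference of joint entropies):

    import math
    from itertools import product

    # Hypothetical joint pmf over three binary variables, built from
    # arbitrary conditionals: p(x1, x2, x3) = p(x1) p(x2|x1) p(x3|x2).
    pmf = {}
    for x1, x2, x3 in product((0, 1), repeat=3):
        p = 0.5
        p *= 0.8 if x2 == x1 else 0.2
        p *= 0.7 if x3 == x2 else 0.3
        pmf[(x1, x2, x3)] = p

    def H(dist):
        return -sum(p * math.log2(p) for p in dist.values() if p > 0)

    def marg(keep):
        # marginal pmf of the coordinates listed in `keep`
        out = {}
        for xs, p in pmf.items():
            k = tuple(xs[i] for i in keep)
            out[k] = out.get(k, 0.0) + p
        return out

    # H(Xi | X_{i-1}, ..., X1) = H(X1, ..., Xi) - H(X1, ..., X_{i-1})
    lhs = H(pmf)
    rhs = (H(marg([0]))
           + H(marg([0, 1])) - H(marg([0]))
           + H(pmf) - H(marg([0, 1])))
    print(abs(lhs - rhs) < 1e-12)  # True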
Definition (Convex function) A function f(x) is convex over an interval (a, b) if for every x₁, x₂ ∈ (a, b) and 0 ≤ λ ≤ 1,
f(λx₁ + (1 − λ)x₂) ≤ λf(x₁) + (1 − λ)f(x₂).
■ A function f is said to be strictly convex if ≤ is replaced by strict inequality whenever 0 < λ < 1 and x₁ ≠ x₂.
■ A function is convex if it always lies below any chord.
■ A function f is concave if −f is convex.
■ For twice-differentiable f: f is convex ⇔ f″ ≥ 0, and f″ > 0 implies f is strictly convex.
■ The defining inequality extends to convex combinations of n values. For example, for n = 3 let α′_i = α_i/(1 − α₁) for i = 2, 3; then
f(α₁x₁ + α₂x₂ + α₃x₃) = f(α₁x₁ + (1 − α₁)(α′₂x₂ + α′₃x₃))
≤ α₁ f(x₁) + (1 − α₁) f(α′₂x₂ + α′₃x₃)
≤ α₁ f(x₁) + α₂ f(x₂) + α₃ f(x₃)
(a numerical spot-check follows).
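A numerical spot-check of the n-point inequality for the strictly convex function f(x) = x² (an added sketch with arbitrary weights and points):

    def f(x):
        # a strictly convex function
        return x * x

    alpha = [0.2, 0.3, 0.5]   # nonnegative weights summing to 1
    xs = [-1.0, 0.5, 2.0]     # arbitrary points

    lhs = f(sum(a * x for a, x in zip(alpha, xs)))
    rhs = sum(a * f(x) for a, x in zip(alpha, xs))
    print(lhs, rhs, lhs <= rhs)  # 0.9025 2.275 True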
Theorem (Information inequality) D(p||q) ≥ 0, with equality if and only if p(x) = q(x) for all x.
Proof. Since log is concave, Jensen's inequality gives
−D(p||q) = ∑_{x∈𝒳} p(x) log (q(x)/p(x)) ≤ log ∑_{x∈𝒳} p(x) (q(x)/p(x)) = log ∑_{x∈𝒳} q(x) ≤ log 1 = 0. ∎
Theorem (Independence bound on entropy) H(X₁, X₂, …, X_n) ≤ ∑_{i=1}^{n} H(X_i), with equality if and only if the X_i are independent.
Theorem (Log sum inequality) For nonnegative numbers a₁, …, a_n and b₁, …, b_n,
∑_{i=1}^{n} a_i log (a_i/b_i) ≥ ( ∑_{i=1}^{n} a_i ) log ( ∑_{i=1}^{n} a_i / ∑_{i=1}^{n} b_i ),
with equality if and only if a_i/b_i is constant.
Proof. Since log is concave, Jensen's inequality with weights α_i ≥ 0, ∑_{i=1}^{n} α_i = 1, gives ∑_{i=1}^{n} α_i log t_i ≤ log ( ∑_{i=1}^{n} α_i t_i ). Let A = ∑_{i=1}^{n} a_i and choose α_i = a_i/A, t_i = b_i/a_i. Then
∑_{i=1}^{n} (a_i/A) log (b_i/a_i) ≤ log ( ∑_{i=1}^{n} (a_i/A)(b_i/a_i) ) = log ( ∑_{i=1}^{n} b_i / A ).
Multiplying both sides by A and reversing the sign,
∑_{i=1}^{n} a_i log (a_i/b_i) ≥ A log ( A / ∑_{i=1}^{n} b_i ) = ( ∑_{i=1}^{n} a_i ) log ( ∑_{i=1}^{n} a_i / ∑_{i=1}^{n} b_i ). ∎
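A numerical check with arbitrary positive values (an added sketch):

    import math

    a = [1.0, 2.0, 3.0]   # arbitrary positive values
    b = [2.0, 1.0, 4.0]

    lhs = sum(ai * math.log2(ai / bi) for ai, bi in zip(a, b))
    A, B = sum(a), sum(b)
    rhs = A * math.log2(A / B)
    print(lhs, rhs, lhs >= rhs)  # True; equality holds iff ai/bi is constant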
■ X → Y → Z forms a Markov chain if and only if X and Z are conditionally independent given Y, i.e., p(x, z|y) = p(x|y) p(z|y).
■ X → Y → Z implies Z → Y → X, so we can write X ↔ Y ↔ Z.
■ If Z = f(Y), then X → Y → Z forms a Markov chain (checked numerically below).
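A direct check of the last remark (an added sketch; the joint pmf p(x, y) and the function f are arbitrary examples):

    # Hypothetical joint pmf p(x, y) and a deterministic map z = f(y).
    pxy = {(0, 0): 0.15, (0, 1): 0.2, (0, 2): 0.1,
           (1, 0): 0.05, (1, 1): 0.3, (1, 2): 0.2}
    f = {0: 'a', 1: 'a', 2: 'b'}

    # Joint p(x, y, z): all mass sits on z = f(y).
    pxyz = {(x, y, f[y]): p for (x, y), p in pxy.items()}

    # Marginals needed for the factorization check.
    py, pzy = {}, {}
    for (x, y, z), p in pxyz.items():
        py[y] = py.get(y, 0.0) + p
        pzy[(z, y)] = pzy.get((z, y), 0.0) + p

    # Verify p(x, z | y) = p(x | y) p(z | y) wherever p(x, y, z) > 0.
    ok = all(
        abs(p / py[y] - (pxy[(x, y)] / py[y]) * (pzy[(z, y)] / py[y])) < 1e-12
        for (x, y, z), p in pxyz.items()
    )
    print(ok)  # True: X -> Y -> Z is a Markov chain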