SLIDE 1 Information Theory
Don Fallis
SLIDE 2
Information in the Wild
SLIDE 3
Intentional Information Transfer
SLIDE 4
Data Storage
SLIDE 5
Measuring Information
SLIDE 6
Surprise!
SLIDE 7 Inversely Related to Probability
- The lower the probability of event A,
the more information you get by learning A.
- The higher the probability of event A,
the less information you get by learning A.
- So, 1/p(A) is a plausible measure of
the information you get by learning A.
SLIDE 8 Measuring Information
[Figures: a fair coin (faces 1-2), a four-sided die (faces 1-4), and an eight-sided die (faces 1-8)]
- S(HEADS) = 1/p(HEADS) = 1/0.5 = 2
- S(‘1’) = 1/p(‘1’) = 1/0.25 = 4
- S(‘2’) = 1/p(‘2’) = 1/0.125 = 8
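A minimal Python sketch of this inverse-probability measure (the function name `surprise` is mine, for illustration):

```python
# Surprise as inverse probability: S(A) = 1 / p(A)
def surprise(p):
    return 1 / p

print(surprise(1/2))    # fair coin lands HEADS -> 2.0
print(surprise(1/4))    # four-sided die shows '1' -> 4.0
print(surprise(1/8))    # eight-sided die shows '2' -> 8.0
```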
SLIDE 9 Measuring Information
[Figure: a fair coin paired with a four-sided die; the eight combined outcomes correspond to the faces of an eight-sided die]
- The combined outcome has surprise 8, but the individual surprises do not add up: 2 + 4 ≠ 8
- Taking logarithms restores additivity: log2(2) + log2(4) = 1 + 2 = 3 = log2(8)
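A quick numerical check of this additivity point, assuming nothing beyond the math module:

```python
import math

# 1/p multiplies across independent events (2 * 4 = 8),
# but an information measure should add. Logs fix this:
print(math.log2(2) + math.log2(4))  # 1.0 + 2.0 = 3.0
print(math.log2(8))                 # 3.0, so log2 makes surprise additive
```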
SLIDE 10
Binary Search
SLIDE 11 Surprise
- Surprise of a Fair Coin coming up Heads
- S(FC = HEADS) = log2( 1/(1/2) ) = log2(2) = 1 bit
- Surprise of LLR being at the Left shrub at first time step
- S(X1 = LEFT) = log2( 1/(1/3) ) = log2(3) = 1.58 bits
- Surprise of a Fire Alarm going off
- S(FA = ALARM) = log2( 1/(1/100) ) = log2(100) = 6.644 bits
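A sketch of the surprise-in-bits calculation for these three examples (the function name is illustrative):

```python
import math

# Surprise in bits: S(A) = log2(1 / p(A))
def surprise_bits(p):
    return math.log2(1 / p)

print(surprise_bits(1/2))    # fair coin HEADS: 1.0 bit
print(surprise_bits(1/3))    # robot at the Left shrub: ~1.585 bits
print(surprise_bits(1/100))  # fire alarm going off: ~6.644 bits
```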
SLIDE 12
Bits versus Binary Digits
SLIDE 13 Entropy
- Entropy is Average Surprise
- Note that this is another example of an expected value.
- Entropy of a Fair Coin
- H(FC) = 1/2*log2(2) + 1/2*log2(2)
- H(FC) = 1/2*1 + 1/2*1 = 1
- Entropy of Robot Location at first time step
- H(X1) = 1/3*log2(3) + 1/3*log2(3) + 1/3*log2(3)
- H(X1) = 1/3*1.58 + 1/3*1.58 + 1/3*1.58 = 1.58
- Entropy of a Fire Alarm
- H(FA) = 0.01*log2(100) + 0.99*log2(1/0.99)
- H(FA) = 0.01*6.644 + 0.99*0.014 = 0.081
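A sketch of entropy as average surprise, reproducing the three examples above (the helper name is mine; zero-probability outcomes are skipped, per the usual 0*log convention):

```python
import math

# Entropy = average surprise: H = sum over outcomes of p * log2(1/p).
def entropy(dist):
    return sum(p * math.log2(1 / p) for p in dist if p > 0)

print(entropy([1/2, 1/2]))       # fair coin: 1.0 bit
print(entropy([1/3, 1/3, 1/3]))  # robot location: ~1.585 bits
print(entropy([0.01, 0.99]))     # fire alarm: ~0.081 bits
```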
SLIDE 14
Uniform Maximizes Entropy
SLIDE 15
Amount of Information Transmitted
SLIDE 16
Noise
SLIDE 17
Information Channel
SLIDE 18
Binary Symmetric Channel
SLIDE 19 Probabilistic Graphical Model
- Prior table Χ_S for the sender:
    S0: q    S1: 1-q
- Channel table Ω_SR (row: bit sent; column: bit received):
            R0     R1
    S0     1-p      p
    S1      p      1-p
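A sketch of drawing samples from this channel; the function name and default parameter values are illustrative, not from the slides:

```python
import random

def bsc_sample(q=0.5, p=0.1):
    # q: prior probability the source sends 0 (the Chi_S table)
    # p: probability the channel flips the bit (the Omega_SR table)
    sent = 0 if random.random() < q else 1
    flipped = random.random() < p
    received = (1 - sent) if flipped else sent
    return sent, received

print([bsc_sample() for _ in range(5)])
```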
SLIDE 20
Mutual Information
SLIDE 21
Worst-Case Scenario (Independent)
SLIDE 22 Best-Case Scenario (Perfectly Correlated)
SLIDE 23
Everything In Between
- MI(Y&Z) = H(Y) + H(Z) – H(Y&Z)
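A sketch computing mutual information from a joint probability table with exactly this formula (helper names are mine); the two extreme cases from the previous slides check out:

```python
import math

def entropy(dist):
    return sum(p * math.log2(1 / p) for p in dist if p > 0)

# MI(Y&Z) = H(Y) + H(Z) - H(Y&Z), all read off the joint table.
def mutual_information(joint):
    h_y  = entropy([sum(row) for row in joint])        # row marginal
    h_z  = entropy([sum(col) for col in zip(*joint)])  # column marginal
    h_yz = entropy([p for row in joint for p in row])  # joint entropy
    return h_y + h_z - h_yz

print(mutual_information([[1/4, 1/4], [1/4, 1/4]]))  # independent: 0.0
print(mutual_information([[1/2, 0], [0, 1/2]]))      # perfectly correlated: 1.0
```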
SLIDE 24 Measuring Mutual Information
- Mutual Information is Expected Reduction in Uncertainty
- Note that this is another example of an expected value.
- Suppose that you see a Yellow flash …
- Your credences shift from (1/3, 1/3, 1/3) to (1/2, 1/2, 0)
- The entropy of your credences shifts from 1.58 to 1
- So, there is a reduction in entropy of 0.58
- Suppose that you see a White flash …
- Your credences shift from (1/3, 1/3, 1/3) to (0, 0, 1)
- The entropy of your credences shifts from 1.58 to 0
- So, there is a reduction in entropy of 1.58
- Take a Weighted Average …
- The probability of a Yellow flash is 2/3
- The probability of a White flash is 1/3
- So, the expected reduction in entropy is 2/3*0.58 + 1/3*1.58 = 0.92
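This weighted average can be checked directly; the sketch below reuses an entropy helper like the one sketched after slide 13:

```python
import math

def entropy(dist):
    return sum(p * math.log2(1 / p) for p in dist if p > 0)

prior = entropy([1/3, 1/3, 1/3])       # ~1.585 bits before any flash

after_yellow = entropy([1/2, 1/2, 0])  # 1.0 bit;  p(YELLOW) = 2/3
after_white  = entropy([0, 0, 1])      # 0.0 bits; p(WHITE)  = 1/3

expected_reduction = (2/3 * (prior - after_yellow)
                      + 1/3 * (prior - after_white))
print(expected_reduction)              # ~0.918, i.e. the 0.92 above
```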
SLIDE 25 Firefly Entropy
  H↓ E→     YELLOW   WHITE   total H
  GOOD        1/3      0       1/3
  BAD         1/3      0       1/3
  UGLY         0      1/3      1/3
  total E     2/3     1/3
- H(H) = 1/3*log2(3) + 1/3*log2(3) + 1/3*log2(3)
- H(H) = 1/3*1.58 + 1/3*1.58 + 1/3*1.58 = 1.58
- H(E) = 2/3*log2(1.5) + 1/3*log2(3)
- H(E) = 2/3*0.58 + 1/3*1.58 = 0.92
- p(h & e), p(h), and p(e)
SLIDE 26 More Firefly Entropy
  H↓ E→     YELLOW   WHITE   total H
  GOOD        1/3      0       1/3
  BAD         1/3      0       1/3
  UGLY         0      1/3      1/3
  total E     2/3     1/3
- H(H&E) = 1/3*log2(3) + 0*log2(1/0) + 1/3*log2(3)
  + 0*log2(1/0) + 0*log2(1/0) + 1/3*log2(3)
- Taking 0*log2(1/0) = 0 by convention:
- H(H&E) = 1/3*1.58 + 1/3*1.58 + 1/3*1.58 = 1.58
SLIDE 27 Firefly Mutual Information
  H↓ E→     YELLOW   WHITE   total H
  GOOD        1/3      0       1/3
  BAD         1/3      0       1/3
  UGLY         0      1/3      1/3
  total E     2/3     1/3
- MI(H&E) = H(H) + H(E) – H(H&E)
- MI(H&E) = 1.58 + 0.92 – 1.58 = 0.92
- p(h & e), p(h), and p(e)
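Feeding the firefly joint table to the mutual_information sketch from slide 23 reproduces this number:

```python
# Assumes the entropy/mutual_information helpers sketched after slide 23.
firefly = [
    [1/3, 0],   # GOOD flashes YELLOW
    [1/3, 0],   # BAD flashes YELLOW
    [0, 1/3],   # UGLY flashes WHITE
]
print(mutual_information(firefly))  # ~0.918, matching MI(H&E) = 0.92
```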
SLIDE 28 Robot Localization #1
  X1↓ X2→    left   middle   right   total X1
  left       1/12    1/4      0        1/3
  middle      0      1/12     1/4      1/3
  right       0       0       1/3      1/3
  total X2   1/12    1/3      7/12
- p(x1 & x2), p(x1), and p(x2)
- H(X1) = 1/3*log2(3) + 1/3*log2(3) + 1/3*log2(3)
- H(X1) = 1/3*1.58 + 1/3*1.58 + 1/3*1.58 = 1.58
- H(X2) = 1/12*log2(12) + 1/3*log2(3) + 7/12*log2(1.71)
- H(X2) = 1/12*3.58 + 1/3*1.58 + 7/12*0.78 = 1.28
SLIDE 29 Robot Localization #1
  X1↓ X2→    left   middle   right   total X1
  left       1/12    1/4      0        1/3
  middle      0      1/12     1/4      1/3
  right       0       0       1/3      1/3
  total X2   1/12    1/3      7/12
- p(x1 & x2), p(x1), and p(x2)
- H(X1&X2) = 1/12*log2(12) + 1/4*log2(4) +
1/12*log2(12) + 1/4*log2(4) + 1/3*log2(3)
- H(X1&X2) = 1/12*3.58 + 1/4*2 + 1/12*3.58
+ 1/4*2 + 1/3*1.58 = 2.13
SLIDE 30 Robot Localization #1
  X1↓ X2→    left   middle   right   total X1
  left       1/12    1/4      0        1/3
  middle      0      1/12     1/4      1/3
  right       0       0       1/3      1/3
  total X2   1/12    1/3      7/12
- p(x1 & x2), p(x1), and p(x2)
- MI(X1&X2) = H(X1) + H(X2) – H(X1&X2)
- MI(X1&X2) = 1.58 + 1.28 – 2.13 = 0.74
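The same sketch applied to the X1/X2 joint table:

```python
# Assumes the entropy/mutual_information helpers sketched after slide 23.
robot = [
    [1/12, 1/4, 0],   # X1 = left
    [0, 1/12, 1/4],   # X1 = middle
    [0, 0, 1/3],      # X1 = right
]
print(mutual_information(robot))  # ~0.740, matching MI(X1&X2) = 0.74
```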
SLIDE 31 Robot Localization #1
- p(x1 & o1), p(x1), and p(o1)
- H(X1) = 1/3*log2(3) + 1/3*log2(3) + 1/3*log2(3)
- H(X1) = 1/3*1.58 + 1/3*1.58 + 1/3*1.58 = 1.58
- H(O1) = 2/3*log2(1.5) + 1/3*log2(3)
- H(O1) = 2/3*0.58 + 1/3*1.58 = 0.92
  X1↓ O1→    hot    cold   total X1
  left       1/3     0       1/3
  middle      0     1/3      1/3
  right      1/3     0       1/3
  total O1   2/3    1/3
SLIDE 32 Robot Localization #1
- H(X1&O1) = 1/3*log2(3) + 0*log2(1/0) + 0*log2(1/0) +
  1/3*log2(3) + 1/3*log2(3) + 0*log2(1/0)
- Taking 0*log2(1/0) = 0 by convention:
- H(X1&O1) = 1/3*1.58 + 1/3*1.58 + 1/3*1.58 = 1.58
- p(x1 & o1), p(x1), and p(o1)
  X1↓ O1→    hot    cold   total X1
  left       1/3     0       1/3
  middle      0     1/3      1/3
  right      1/3     0       1/3
  total O1   2/3    1/3
SLIDE 33 Robot Localization #1
  X1↓ O1→    hot    cold   total X1
  left       1/3     0       1/3
  middle      0     1/3      1/3
  right      1/3     0       1/3
  total O1   2/3    1/3
- MI(X1&O1) = H(X1) + H(O1) – H(X1&O1)
- MI(X1&O1) = 1.58 + 0.92 – 1.58 = 0.92
- p(x1 & o1), p(x1), and p(o1)
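And applied to the X1/O1 joint table, confirming that the sensor reading carries 0.92 bits about the robot's location:

```python
# Assumes the entropy/mutual_information helpers sketched after slide 23.
sensor = [
    [1/3, 0],   # X1 = left,   O1 = hot
    [0, 1/3],   # X1 = middle, O1 = cold
    [1/3, 0],   # X1 = right,  O1 = hot
]
print(mutual_information(sensor))  # ~0.918, matching MI(X1&O1) = 0.92
```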