Information Theory and Statistical Inference



  1. Information Theory and Statistical Inference. Samuel Cheng, School of ECE, University of Oklahoma. August 23, 2018.

  2. Lecture 2: Introduction to probabilistic inference. Inference setup: o is the (observed) evidence, θ is the parameter, x is the prediction.
Maximum Likelihood (ML): $\hat{x} = \arg\max_x p(x \mid \hat{\theta})$, where $\hat{\theta} = \arg\max_\theta p(o \mid \theta)$.
Maximum A Posteriori (MAP): $\hat{x} = \arg\max_x p(x \mid \hat{\theta})$, where $\hat{\theta} = \arg\max_\theta p(\theta \mid o)$.
Bayesian: $\hat{x} = \sum_x x \underbrace{\sum_\theta p(x \mid \theta)\, p(\theta \mid o)}_{p(x \mid o)}$, where $p(\theta \mid o) = \dfrac{p(o \mid \theta)\, p(\theta)}{p(o)} \propto p(o \mid \theta) \underbrace{p(\theta)}_{\text{prior}}$.
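The three estimators above differ only in how they treat the parameter θ: ML and MAP commit to a single point estimate, while the Bayesian prediction averages over the posterior. A minimal sketch of the three recipes for a finite parameter set; the function names (likelihood, prior, pred_model) are illustrative, not from the slides:

```python
# Sketch (names are illustrative, not from the slides): ML, MAP, and Bayesian
# prediction for a finite set of parameter values and discrete predictions x.

def ml_estimate(thetas, likelihood):
    # theta maximizing p(o | theta); the observation o is baked into `likelihood`
    return max(thetas, key=likelihood)

def map_estimate(thetas, likelihood, prior):
    # theta maximizing p(theta | o), proportional to p(o | theta) p(theta)
    return max(thetas, key=lambda th: likelihood(th) * prior(th))

def bayes_predictive(thetas, likelihood, prior, pred_model, xs):
    # p(x | o) = sum_theta p(x | theta) p(theta | o), for each candidate x
    unnorm = {th: likelihood(th) * prior(th) for th in thetas}
    z = sum(unnorm.values())
    posterior = {th: w / z for th, w in unnorm.items()}
    return {x: sum(pred_model(x, th) * posterior[th] for th in thetas) for x in xs}
```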

  3. Coin Flip. Three coins with P(H|C1) = 0.1, P(H|C2) = 0.5, P(H|C3) = 0.9. Which coin will I use? P(C1) = 1/3, P(C2) = 1/3, P(C3) = 1/3. Prior: the probability of a hypothesis before we make any observations. (Slide credit for the coin-flip example slides that follow: University of Washington CSE473.)

  4. Coin Flip. P(H|C1) = 0.1, P(H|C2) = 0.5, P(H|C3) = 0.9. Which coin will I use? P(C1) = 1/3, P(C2) = 1/3, P(C3) = 1/3. Uniform prior: all hypotheses are equally likely before we make any observations.

  5. Experiment 1: Heads. Which coin did I use? P(C1|H) = ?, P(C2|H) = ?, P(C3|H) = ?
$P(C_1 \mid H) = \dfrac{P(H \mid C_1)\, P(C_1)}{P(H)}$, where $P(H) = \sum_{i=1}^{3} P(H \mid C_i)\, P(C_i)$.
Setup: P(H|C1) = 0.1, P(H|C2) = 0.5, P(H|C3) = 0.9; P(C1) = P(C2) = P(C3) = 1/3.

  6. Experiment 1: Heads. Which coin did I use? P(C1|H) = 0.066, P(C2|H) = 0.333, P(C3|H) = 0.6. Posterior: the probability of a hypothesis given the data. Setup as before: P(H|C1) = 0.1, P(H|C2) = 0.5, P(H|C3) = 0.9; uniform prior.
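These posteriors follow directly from Bayes' rule on slide 5. A short sketch that reproduces them (the variable names are mine, not from the slides):

```python
# Sketch: posterior over the three coins after observing one Heads, uniform prior.
p_heads = {"C1": 0.1, "C2": 0.5, "C3": 0.9}   # P(H | coin), from the slide
prior   = {"C1": 1/3, "C2": 1/3, "C3": 1/3}   # P(coin), uniform

p_h = sum(p_heads[c] * prior[c] for c in prior)              # P(H) = 0.5
posterior = {c: p_heads[c] * prior[c] / p_h for c in prior}
print(posterior)  # C1 ~ 0.067, C2 ~ 0.333, C3 ~ 0.600, matching the slide up to rounding
```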

  7. Experiment 2: Tails. Which coin did I use? P(C1|HT) = ?, P(C2|HT) = ?, P(C3|HT) = ?
$P(C_1 \mid HT) = \alpha\, P(HT \mid C_1)\, P(C_1) = \alpha\, P(H \mid C_1)\, P(T \mid C_1)\, P(C_1)$, where $\alpha = 1/P(HT)$ is the normalizing constant and the two flips are independent given the coin.
Setup: P(H|C1) = 0.1, P(H|C2) = 0.5, P(H|C3) = 0.9; P(C1) = P(C2) = P(C3) = 1/3.

  8. Experiment 2: Tails. Which coin did I use? P(C1|HT) = 0.21, P(C2|HT) = 0.58, P(C3|HT) = 0.21.
$P(C_1 \mid HT) = \alpha\, P(HT \mid C_1)\, P(C_1) = \alpha\, P(H \mid C_1)\, P(T \mid C_1)\, P(C_1)$.
Setup: P(H|C1) = 0.1, P(H|C2) = 0.5, P(H|C3) = 0.9; P(C1) = P(C2) = P(C3) = 1/3.

  9. Experiment 2: Tails (recap). P(C1|HT) = 0.21, P(C2|HT) = 0.58, P(C3|HT) = 0.21. Setup as before: P(H|C1) = 0.1, P(H|C2) = 0.5, P(H|C3) = 0.9; uniform prior.
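Because the flips are independent given the coin, updating after H and then after T gives the same posterior as the joint update on slide 7. A sketch of the sequential version (variable names are mine):

```python
# Sketch: sequential Bayes updates after H then T with the uniform prior.
# Equivalent to the joint update alpha * P(H|C) * P(T|C) * P(C) on the slide.
p_heads = {"C1": 0.1, "C2": 0.5, "C3": 0.9}
belief  = {"C1": 1/3, "C2": 1/3, "C3": 1/3}

for flip in ["H", "T"]:
    lik = {c: (p_heads[c] if flip == "H" else 1 - p_heads[c]) for c in belief}
    unnorm = {c: lik[c] * belief[c] for c in belief}
    z = sum(unnorm.values())                     # 1 / alpha
    belief = {c: unnorm[c] / z for c in belief}

print(belief)  # C1 ~ 0.21, C2 ~ 0.58, C3 ~ 0.21
```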

  10. Your Estimate? What is the probability of heads after the two experiments? Most likely coin: C2, so the best estimate for P(H) is P(H|C2) = 0.5. Setup: P(H|C1) = 0.1, P(H|C2) = 0.5, P(H|C3) = 0.9; P(C1) = P(C2) = P(C3) = 1/3.

  11. Your Estimate? Maximum Likelihood Estimate: the hypothesis that best fits the observed data, assuming a uniform prior. Most likely coin: C2, so the best estimate for P(H) is P(H|C2) = 0.5 (with P(C2) = 1/3).
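Under a uniform prior, picking the coin with the highest posterior is the same as picking the coin with the highest likelihood of the data HT. A sketch of that ML pick (names are mine):

```python
# Sketch: ML estimate = coin with the highest likelihood of the data HT
# (equivalently, the highest posterior under the uniform prior).
p_heads = {"C1": 0.1, "C2": 0.5, "C3": 0.9}
lik_ht = {c: p_heads[c] * (1 - p_heads[c]) for c in p_heads}   # P(H|C) * P(T|C)
ml_coin = max(lik_ht, key=lik_ht.get)
print(ml_coin, p_heads[ml_coin])  # C2 0.5 -> estimated P(H) = 0.5
```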

  12. Using Prior Knowledge.
• Should we always use a uniform prior?
• Background knowledge:
  • Heads => you go first in Abalone against the TA.
  • TAs are nice people.
  • => The TA is more likely to use a coin biased in your favor.
Coins: P(H|C1) = 0.1, P(H|C2) = 0.5, P(H|C3) = 0.9.

  13. Using Prior Knowledge. We can encode this in the prior: P(C1) = 0.05, P(C2) = 0.25, P(C3) = 0.70. Coins: P(H|C1) = 0.1, P(H|C2) = 0.5, P(H|C3) = 0.9.

  14. Experiment 1: Heads. Which coin did I use? P(C1|H) = ?, P(C2|H) = ?, P(C3|H) = ?
$P(C_1 \mid H) = \alpha\, P(H \mid C_1)\, P(C_1)$.
Setup: P(H|C1) = 0.1, P(H|C2) = 0.5, P(H|C3) = 0.9; P(C1) = 0.05, P(C2) = 0.25, P(C3) = 0.70.

  15. Experiment 1: Heads. Which coin did I use? P(C1|H) = 0.006, P(C2|H) = 0.165, P(C3|H) = 0.829. For comparison, the posterior under the uniform prior (the ML setting) after Experiment 1 was P(C1|H) = 0.066, P(C2|H) = 0.333, P(C3|H) = 0.600. Setup: P(H|C1) = 0.1, P(H|C2) = 0.5, P(H|C3) = 0.9; P(C1) = 0.05, P(C2) = 0.25, P(C3) = 0.70.
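The informative prior pulls the posterior strongly toward C3 after a single Heads. A sketch reproducing these numbers (names are mine):

```python
# Sketch: same Bayes update as before, but with the informative prior.
p_heads = {"C1": 0.1, "C2": 0.5, "C3": 0.9}
prior   = {"C1": 0.05, "C2": 0.25, "C3": 0.70}

unnorm = {c: p_heads[c] * prior[c] for c in prior}
z = sum(unnorm.values())                         # P(H) = 0.76
posterior = {c: unnorm[c] / z for c in prior}
print(posterior)  # C1 ~ 0.007, C2 ~ 0.164, C3 ~ 0.829, matching the slide up to rounding
```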

  16. Experiment 2: Tails. Which coin did I use? P(C1|HT) = ?, P(C2|HT) = ?, P(C3|HT) = ?
$P(C_1 \mid HT) = \alpha\, P(HT \mid C_1)\, P(C_1) = \alpha\, P(H \mid C_1)\, P(T \mid C_1)\, P(C_1)$.
Setup: P(H|C1) = 0.1, P(H|C2) = 0.5, P(H|C3) = 0.9; P(C1) = 0.05, P(C2) = 0.25, P(C3) = 0.70.

  17. Experiment 2: Tails. Which coin did I use? P(C1|HT) = 0.035, P(C2|HT) = 0.481, P(C3|HT) = 0.485.
$P(C_1 \mid HT) = \alpha\, P(HT \mid C_1)\, P(C_1) = \alpha\, P(H \mid C_1)\, P(T \mid C_1)\, P(C_1)$.
Setup: P(H|C1) = 0.1, P(H|C2) = 0.5, P(H|C3) = 0.9; P(C1) = 0.05, P(C2) = 0.25, P(C3) = 0.70.

  18. Experiment 2: Tails (recap). P(C1|HT) = 0.035, P(C2|HT) = 0.481, P(C3|HT) = 0.485. Setup as before: P(H|C1) = 0.1, P(H|C2) = 0.5, P(H|C3) = 0.9; P(C1) = 0.05, P(C2) = 0.25, P(C3) = 0.70.
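A sketch reproducing the HT posterior under the informative prior (names are mine):

```python
# Sketch: posterior after H then T under the informative prior, using the joint
# likelihood P(H|C) * P(T|C) from the slide.
p_heads = {"C1": 0.1, "C2": 0.5, "C3": 0.9}
prior   = {"C1": 0.05, "C2": 0.25, "C3": 0.70}

unnorm = {c: p_heads[c] * (1 - p_heads[c]) * prior[c] for c in prior}
z = sum(unnorm.values())                       # P(HT) = 0.13
posterior = {c: unnorm[c] / z for c in prior}
print(posterior)  # C1 ~ 0.035, C2 ~ 0.481, C3 ~ 0.485
```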

  19. Your Estimate? What is the probability of heads after the two experiments? Most likely coin: C3, so the best estimate for P(H) is P(H|C3) = 0.9. Setup: P(H|C1) = 0.1, P(H|C2) = 0.5, P(H|C3) = 0.9; P(C1) = 0.05, P(C2) = 0.25, P(C3) = 0.70.

  20. Your Estimate? Maximum A Posteriori (MAP) Estimate: the hypothesis that best fits the observed data under a non-uniform prior, i.e., the one with the highest posterior probability. Most likely coin: C3, so the best estimate for P(H) is P(H|C3) = 0.9 (with P(C3) = 0.70).
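The MAP pick is just the argmax of the posterior computed above. A sketch (names are mine):

```python
# Sketch: MAP estimate = coin with the highest posterior P(coin | HT) under the
# informative prior; the prediction then uses that single coin.
posterior = {"C1": 0.035, "C2": 0.481, "C3": 0.485}  # from the previous sketch
p_heads   = {"C1": 0.1, "C2": 0.5, "C3": 0.9}
map_coin = max(posterior, key=posterior.get)
print(map_coin, p_heads[map_coin])  # C3 0.9 -> estimated P(H) = 0.9
```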

  21. Did We Do The Right Thing? P(C1|HT) = 0.035, P(C2|HT) = 0.481, P(C3|HT) = 0.485. Coins: P(H|C1) = 0.1, P(H|C2) = 0.5, P(H|C3) = 0.9.

  22. Did We Do The Right Thing? P(C1|HT) = 0.035, P(C2|HT) = 0.481, P(C3|HT) = 0.485. C2 and C3 are almost equally likely. Coins: P(H|C1) = 0.1, P(H|C2) = 0.5, P(H|C3) = 0.9.
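Since C2 and C3 end up nearly tied, committing to a single coin (as ML and MAP do) throws away real uncertainty. The remaining slides are not included in this excerpt, but the Bayesian recipe from slide 2 would average the prediction over the posterior; a sketch of that computation (the result is my own arithmetic from the slide's posterior, not a number from the deck):

```python
# Sketch: Bayesian prediction averages P(H | coin) over the posterior P(coin | HT)
# instead of committing to one coin. This continuation is not shown in the
# excerpted slides.
posterior = {"C1": 0.035, "C2": 0.481, "C3": 0.485}
p_heads   = {"C1": 0.1, "C2": 0.5, "C3": 0.9}
p_h_given_ht = sum(p_heads[c] * posterior[c] for c in posterior)
print(p_h_given_ht)  # ~ 0.68, between the ML answer (0.5) and the MAP answer (0.9)
```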
