Information Theory and Statistical Inference Samuel Cheng School of - PowerPoint PPT Presentation

Information Theory and Statistical Inference Samuel Cheng School of ECE University of Oklahoma August 23, 2018 S. Cheng (OU-ECE) Information Theory and Statistical Inference August 23, 2018 1 / 45

Lecture 2 Introduction to probabilistic inference Inference o : (Observed) evidence, θ : Parameter, x : prediction Maximum Likelihood (ML) x = arg max x p ( x | ˆ θ ) , ˆ ˆ θ = arg max θ p ( o | θ ) Maximum A Posteriori (MAP) x = arg max x p ( x | ˆ θ ) , ˆ ˆ θ = arg max θ p ( θ | o ) Bayesian � x = � ˆ x x p ( x | θ ) p ( θ | o ) θ � �� p ( x | o ) where p ( θ | o ) = p ( o | θ ) p ( θ ) ∝ p ( o | θ ) p ( θ ) p ( o ) �� prior S. Cheng (OU-ECE) Information Theory and Statistical Inference August 23, 2018 2 / 45

Lecture 2 Introduction to probabilistic inference Coin Flip C 1 C 2 C 3 P(H|C 1 ) = 0.1 P(H|C 3 ) = 0.9 P(H|C 2 ) = 0.5 Which coin will I use? P(C 1 ) = 1/3 P(C 2 ) = 1/3 P(C 3 ) = 1/3 Prior: Probability of a hypothesis before we make any observations (Slide credit: University of Washington CSE473) S. Cheng (OU-ECE) Information Theory and Statistical Inference August 23, 2018 3 / 45

Lecture 2 Introduction to probabilistic inference Coin Flip C 1 C 2 C 3 P(H|C 1 ) = 0.1 P(H|C 3 ) = 0.9 P(H|C 2 ) = 0.5 Which coin will I use? P(C 1 ) = 1/3 P(C 2 ) = 1/3 P(C 3 ) = 1/3 Uniform Prior: All hypothesis are equally likely before we make any observations (Slide credit: University of Washington CSE473) S. Cheng (OU-ECE) Information Theory and Statistical Inference August 23, 2018 4 / 45

Lecture 2 Introduction to probabilistic inference Experiment 1: Heads Which coin did I use? P(C 1 |H) = ? P(C 2 |H) = ? P(C 3 |H) = ? 3 P ( C 1 | H ) = P ( H | C 1 ) P ( C 1 ) � P ( H ) = P ( H | C i ) P ( C i ) P ( H ) i =1 C 2 C 3 C 1 P(H|C 1 ) = 0.1 P(H|C 2 ) = 0.5 P(H|C 3 ) = 0.9 P(C 1 ) = 1/3 P(C 2 ) = 1/3 P(C 3 ) = 1/3 (Slide credit: University of Washington CSE473) S. Cheng (OU-ECE) Information Theory and Statistical Inference August 23, 2018 5 / 45

Lecture 2 Introduction to probabilistic inference Experiment 1: Heads Which coin did I use? P(C 1 |H) = 0.066 P(C 2 |H) = 0.333 P(C 3 |H) = 0.6 Posterior: Probability of a hypothesis given data C 2 C 3 C 1 P(H|C 1 ) = 0.1 P(H|C 2 ) = 0.5 P(H|C 3 ) = 0.9 P(C 1 ) = 1/3 P(C 2 ) = 1/3 P(C 3 ) = 1/3 (Slide credit: University of Washington CSE473) S. Cheng (OU-ECE) Information Theory and Statistical Inference August 23, 2018 6 / 45

Lecture 2 Introduction to probabilistic inference Experiment 2: Tails Which coin did I use? P(C 1 |HT) = 0.21 P(C 2 |HT) = 0.58 P(C 3 |HT) = 0.21 C 1 C 2 C 3 P(H|C 1 ) = 0.1 P(H|C 3 ) = 0.9 P(H|C 2 ) = 0.5 P(C 1 ) = 1/3 P(C 2 ) = 1/3 P(C 3 ) = 1/3 (Slide credit: University of Washington CSE473) S. Cheng (OU-ECE) Information Theory and Statistical Inference August 23, 2018 9 / 45

Lecture 2 Introduction to probabilistic inference Your Estimate? What is the probability of heads after two experiments? Most likely coin: Best estimate for P(H) C 2 P(H|C 2 ) = 0.5 C 2 C 3 C 1 P(H|C 1 ) = 0.1 P(H|C 2 ) = 0.5 P(H|C 3 ) = 0.9 P(C 1 ) = 1/3 P(C 2 ) = 1/3 P(C 3 ) = 1/3 (Slide credit: University of Washington CSE473) S. Cheng (OU-ECE) Information Theory and Statistical Inference August 23, 2018 10 / 45

Lecture 2 Introduction to probabilistic inference Your Estimate? Maximum Likelihood Estimate: The best hypothesis that fits observed data assuming uniform prior Most likely coin: Best estimate for P(H) C 2 P(H|C 2 ) = 0.5 C 2 P(H|C 2 ) = 0.5 P(C 2 ) = 1/3 (Slide credit: University of Washington CSE473) S. Cheng (OU-ECE) Information Theory and Statistical Inference August 23, 2018 11 / 45

Lecture 2 Introduction to probabilistic inference Using Prior Knowledge • Should we always use Uniform Prior? • Background knowledge: • Heads => you go first in Abalone against TA • TAs are nice people • => TA is more likely to use a coin biased in your favor C 1 C 2 C 3 P(H|C 1 ) = 0.1 P(H|C 2 ) = 0.5 P(H|C 3 ) = 0.9 (Slide credit: University of Washington CSE473) S. Cheng (OU-ECE) Information Theory and Statistical Inference August 23, 2018 12 / 45

Lecture 2 Introduction to probabilistic inference Using Prior Knowledge We can encode it in the prior: P(C 1 ) = 0.05 P(C 2 ) = 0.25 P(C 3 ) = 0.70 C 1 C 2 C 3 P(H|C 1 ) = 0.1 P(H|C 2 ) = 0.5 P(H|C 3 ) = 0.9 (Slide credit: University of Washington CSE473) S. Cheng (OU-ECE) Information Theory and Statistical Inference August 23, 2018 13 / 45

Lecture 2 Introduction to probabilistic inference Experiment 1: Heads Which coin did I use? P(C 1 |H) = ? P(C 2 |H) = ? P(C 3 |H) = ? P ( C 1 | H ) = α P ( H | C 1 ) P ( C 1 ) C 2 C 3 C 1 P(H|C 1 ) = 0.1 P(H|C 2 ) = 0.5 P(H|C 3 ) = 0.9 P(C 1 ) = 0.05 P(C 2 ) = 0.25 P(C 3 ) = 0.70 (Slide credit: University of Washington CSE473) S. Cheng (OU-ECE) Information Theory and Statistical Inference August 23, 2018 14 / 45

Lecture 2 Introduction to probabilistic inference Experiment 1: Heads Which coin did I use? P(C 1 |H) = 0.006 P(C 2 |H) = 0.165 P(C 3 |H) = 0.829 ML posterior after Exp 1: P(C 1 |H) = 0.066 P(C 2 |H) = 0.333 P(C 3 |H) = 0.600 C 1 C 2 C 3 P(H|C 1 ) = 0.1 P(H|C 2 ) = 0.5 P(H|C 3 ) = 0.9 P(C 1 ) = 0.05 P(C 2 ) = 0.25 P(C 3 ) = 0.70 (Slide credit: University of Washington CSE473) S. Cheng (OU-ECE) Information Theory and Statistical Inference August 23, 2018 15 / 45

Lecture 2 Introduction to probabilistic inference Experiment 2: Tails Which coin did I use? P(C 1 |HT) = 0.035 P(C 2 |HT) = 0.481 P(C 3 |HT) = 0.485 P ( C 1 | HT ) = α P ( HT | C 1 ) P ( C 1 ) = α P ( H | C 1 ) P ( T | C 1 ) P ( C 1 ) C 2 C 3 C 1 P(H|C 1 ) = 0.1 P(H|C 2 ) = 0.5 P(H|C 3 ) = 0.9 P(C 1 ) = 0.05 P(C 2 ) = 0.25 P(C 3 ) = 0.70 (Slide credit: University of Washington CSE473) S. Cheng (OU-ECE) Information Theory and Statistical Inference August 23, 2018 17 / 45

Lecture 2 Introduction to probabilistic inference Experiment 2: Tails Which coin did I use? P(C 1 |HT) = 0.035 P(C 2 |HT) = 0.481 P(C 3 |HT) = 0.485 C 1 C 2 C 3 P(H|C 1 ) = 0.1 P(H|C 3 ) = 0.9 P(H|C 2 ) = 0.5 P(C 1 ) = 0.05 P(C 2 ) = 0.25 P(C 3 ) = 0.70 (Slide credit: University of Washington CSE473) S. Cheng (OU-ECE) Information Theory and Statistical Inference August 23, 2018 18 / 45

Lecture 2 Introduction to probabilistic inference Your Estimate? What is the probability of heads after two experiments? Most likely coin: Best estimate for P(H) C 3 P(H|C 3 ) = 0.9 C 1 C 2 C 3 P(H|C 1 ) = 0.1 P(H|C 2 ) = 0.5 P(H|C 3 ) = 0.9 P(C 1 ) = 0.05 P(C 2 ) = 0.25 P(C 3 ) = 0.70 (Slide credit: University of Washington CSE473) S. Cheng (OU-ECE) Information Theory and Statistical Inference August 23, 2018 19 / 45

Lecture 2 Introduction to probabilistic inference Your Estimate? Maximum A Posteriori (MAP) Estimate: The best hypothesis that fits observed data assuming a non-uniform prior Most likely coin: Best estimate for P(H) C 3 P(H|C 3 ) = 0.9 C 3 P(H|C 3 ) = 0.9 P(C 3 ) = 0.70 (Slide credit: University of Washington CSE473) S. Cheng (OU-ECE) Information Theory and Statistical Inference August 23, 2018 20 / 45

Lecture 2 Introduction to probabilistic inference Did We Do The Right Thing? P(C 1 |HT) = 0.035 P(C 2 |HT) = 0.481 P(C 3 |HT) = 0.485 C 1 C 2 C 3 P(H|C 1 ) = 0.1 P(H|C 3 ) = 0.9 P(H|C 2 ) = 0.5 (Slide credit: University of Washington CSE473) S. Cheng (OU-ECE) Information Theory and Statistical Inference August 23, 2018 21 / 45

Lecture 2 Introduction to probabilistic inference Did We Do The Right Thing? P(C 1 |HT) = 0.035 P(C 2 |HT) = 0.481 P(C 3 |HT) = 0.485 C 2 and C 3 are almost equally likely C 1 C 2 C 3 P(H|C 1 ) = 0.1 P(H|C 3 ) = 0.9 P(H|C 2 ) = 0.5 (Slide credit: University of Washington CSE473) S. Cheng (OU-ECE) Information Theory and Statistical Inference August 23, 2018 22 / 45

Information Theory and Statistical Inference Samuel Cheng School of - PowerPoint PPT Presentation

Information Theory and Statistical Inference Samuel Cheng School of ECE University of Oklahoma August 23, 2018 S. Cheng (OU-ECE) Information Theory and Statistical Inference August 23, 2018 1 / 45 Lecture 2 Introduction to probabilistic

Chapter 1: Probability Theory (a recap) STK4011/9011: Statistical Inference Theory Johan Pensar

STAT 401A - Statistical Methods for Research Workers Statistical Inference Jarad Niemi (Dr. J)

Foundations for Inference I Dajiang Liu @PHS525 Feb-09-2016 Statistical Inference

UQ, STAT2201, 2017, Lecture 6 Unit 6 Statistical Inference Ideas. 1 Statistical Inference is

Statistical Natural Language Processing Statistical models: learning, inference, estimation,

Chapter 2: Transformations and Expectations (a recap) STK4011/9011: Statistical Inference Theory

Chapter 3: Common Families of Distributions STK4011/9011: Statistical Inference Theory Johan

Chapter 4: Multiple Random Variables STK4011/9011: Statistical Inference Theory Johan Pensar

Inference in Bayesian networks Chapter 14.45 Chapter 14.45 1 Outline Exact inference

Chapter 8: Hypothesis Testing STK4011/9011: Statistical Inference Theory Johan Pensar

Statistical Statistical Statistical Model Statistical Model Model Checking Model Checking

Lifted Inference in Statistical Relational Models Guy Van den Broeck BUDA Invited Tutorial June

Statistical Inference https://people.bath.ac.uk/masss/APTS/apts.html Simon Shaw University of

COMP90051 Statistical Machine Learning Semester 2, 2017 Lecturer: Trevor Cohn 23. PGM

Soft Inference and Posterior Marginals September 19, 2013 Soft vs. Hard Inference Hard

Post-Selection Inference Todd Kuffner Washington University in St. Louis PhyStat 2016

Introduction to Gaussian Processes Iain Murray School of Informatics, University of Edinburgh

Voyaging around nacre with the x-ray shuttle from biomineralisation to prosthetics via mollusc

Event Calendar SHIMA Daio,

First International Workshop on Learning over Multiple Contexts LMCE 2014 Nancy, 19 September

Teaching VDM & Teaching Formal Methods Ana Paiva apaiva@fe.up.pt www.fe.up.pt/~apaiva OVT

Infinitely many corks with shadow complexity one . . . . . Hironobu Naoe (Tohoku University)

MLSS 06 - Canberra Elements Hierarchical Basis Sparse Grids Sparse Grids Combination

Wentworth Lawyers COMMERCIAL LAWYERS D IRECTORS D UTIES : L ESSONS FROM THE PAST TO GUIDE

Information Theory and Statistical Inference Samuel Cheng School of - PowerPoint PPT Presentation

Information Theory and Statistical Inference Samuel Cheng School of ECE University of Oklahoma August 23, 2018 S. Cheng (OU-ECE) Information Theory and Statistical Inference August 23, 2018 1 / 45 Lecture 2 Introduction to probabilistic

Chapter 1: Probability Theory (a recap) STK4011/9011: Statistical Inference Theory Johan Pensar

STAT 401A - Statistical Methods for Research Workers Statistical Inference Jarad Niemi (Dr. J)

Foundations for Inference I Dajiang Liu @PHS525 Feb-09-2016 Statistical Inference

UQ, STAT2201, 2017, Lecture 6 Unit 6 Statistical Inference Ideas. 1 Statistical Inference is

Statistical Natural Language Processing Statistical models: learning, inference, estimation,

Chapter 2: Transformations and Expectations (a recap) STK4011/9011: Statistical Inference Theory

Chapter 3: Common Families of Distributions STK4011/9011: Statistical Inference Theory Johan

Chapter 4: Multiple Random Variables STK4011/9011: Statistical Inference Theory Johan Pensar

Inference in Bayesian networks Chapter 14.45 Chapter 14.45 1 Outline Exact inference

Chapter 8: Hypothesis Testing STK4011/9011: Statistical Inference Theory Johan Pensar

Statistical Statistical Statistical Model Statistical Model Model Checking Model Checking

Lifted Inference in Statistical Relational Models Guy Van den Broeck BUDA Invited Tutorial June

Statistical Inference https://people.bath.ac.uk/masss/APTS/apts.html Simon Shaw University of

COMP90051 Statistical Machine Learning Semester 2, 2017 Lecturer: Trevor Cohn 23. PGM

Soft Inference and Posterior Marginals September 19, 2013 Soft vs. Hard Inference Hard

Post-Selection Inference Todd Kuffner Washington University in St. Louis PhyStat 2016

Introduction to Gaussian Processes Iain Murray School of Informatics, University of Edinburgh

Voyaging around nacre with the x-ray shuttle from biomineralisation to prosthetics via mollusc

Event Calendar SHIMA Daio,

First International Workshop on Learning over Multiple Contexts LMCE 2014 Nancy, 19 September

Teaching VDM &amp; Teaching Formal Methods Ana Paiva apaiva@fe.up.pt www.fe.up.pt/~apaiva OVT

Infinitely many corks with shadow complexity one . . . . . Hironobu Naoe (Tohoku University)

MLSS 06 - Canberra Elements Hierarchical Basis Sparse Grids Sparse Grids Combination

Wentworth Lawyers COMMERCIAL LAWYERS D IRECTORS D UTIES : L ESSONS FROM THE PAST TO GUIDE

Teaching VDM & Teaching Formal Methods Ana Paiva apaiva@fe.up.pt www.fe.up.pt/~apaiva OVT