Where do the probabilities come from?

Probabilities come from:
◮ Experts
◮ Data

© D. Poole and A. Mackworth 2019, Artificial Intelligence, Lecture 10.1

Learning probabilities: the simplest case

Observe tosses of a thumbtack (outcomes: Tails, Heads):
◮ $n_0$ instances of Heads = false
◮ $n_1$ instances of Heads = true
What should we use as $P(\mathit{heads})$?

Empirical frequency: $P(\mathit{heads}) = \frac{n_1}{n_0 + n_1}$
Laplace smoothing [1812]: $P(\mathit{heads}) = \frac{n_1 + 1}{n_0 + n_1 + 2}$
Informed priors: $P(\mathit{heads}) = \frac{n_1 + c_1}{n_0 + n_1 + c_0 + c_1}$ for some informed pseudo-counts $c_0, c_1 > 0$.
◮ $c_0 = 1$, $c_1 = 1$ expresses ignorance (uniform prior).

Pseudo-counts convey prior knowledge. Consider: “how much more would I believe α if I had seen one example with α true than if I had seen no examples with α true?”
Empirical frequency overfits to the data.
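The three estimators differ only in the pseudo-counts they add, so one function covers all of them. The following is a minimal sketch; the function name `estimate_heads` and the example counts are illustrative assumptions, not part of the slides.

```python
def estimate_heads(n0, n1, c0=0.0, c1=0.0):
    """Estimate P(heads) as (n1 + c1) / (n0 + n1 + c0 + c1).

    n0, n1: observed counts of tails and heads.
    c0, c1: pseudo-counts (0 and 0 give the empirical frequency,
            1 and 1 give Laplace smoothing).
    """
    total = n0 + n1 + c0 + c1
    if total == 0:
        raise ValueError("no observations and no pseudo-counts")
    return (n1 + c1) / total


# Hypothetical data: two tosses, both tails.
n0, n1 = 2, 0
empirical = estimate_heads(n0, n1)              # 0 / 2
laplace = estimate_heads(n0, n1, c0=1, c1=1)    # 1 / 4
informed = estimate_heads(n0, n1, c0=5, c1=5)   # 5 / 12
print(empirical, laplace, informed)
```

With only two tosses, the empirical frequency already claims that heads is impossible, while both smoothed estimates stay strictly between 0 and 1.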

Example of Overfitting

We have a web site where people rate restaurants with 1 to 5 stars.
We want to report the most liked restaurant(s): the one predicted to have the best future ratings.
How can we determine the most liked restaurant?
Are the restaurants with the highest average rating the most liked restaurants?
Which restaurants have the highest average rating?
Which restaurants have a rating of 5?
◮ Only restaurants with few ratings have an average rating of 5.
Solution: add some “average” ratings for each restaurant!
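One way to read "add some average ratings" is to treat them as pseudo-ratings that are averaged in with the real ones. The sketch below assumes a prior mean of 3 stars and a weight of 5 pseudo-ratings; both values, the function name `smoothed_average`, and the restaurant data are made up for illustration.

```python
def smoothed_average(ratings, prior_mean=3.0, prior_count=5):
    """Average the real ratings together with prior_count pseudo-ratings,
    each equal to prior_mean (both are assumed values)."""
    return (sum(ratings) + prior_mean * prior_count) / (len(ratings) + prior_count)


# Hypothetical data: one restaurant with a single 5-star rating,
# another with 30 ratings averaging 4.7 stars.
restaurants = {
    "one_rating": [5],
    "many_ratings": [5, 5, 4, 5, 5, 4, 5, 5, 5, 4] * 3,
}

raw = {name: sum(r) / len(r) for name, r in restaurants.items()}
smoothed = {name: smoothed_average(r) for name, r in restaurants.items()}

best_raw = max(raw, key=raw.get)
best_smoothed = max(smoothed, key=smoothed.get)
print(best_raw, best_smoothed)
```

The raw average favours the restaurant with a single rating; once the pseudo-ratings are included, the restaurant with many high ratings comes out on top.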

Bayesian Learning

[Figure: Probability of Heads with children Toss 1, Toss 2, ..., Toss 11]
aispace: http://artint.info/code/aispace/beta.xml

Probability of Heads is a random variable representing the probability of heads.
Its range is {0.0, 0.1, 0.2, ..., 0.9, 1.0} or the interval [0, 1].
$P(\text{Toss}\#n = \text{Heads} \mid \text{Probability of Heads} = v) = v$
Toss#i is independent of Toss#j (for $i \neq j$) given Probability of Heads.
The tosses are i.i.d. (independent and identically distributed).
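For the discrete range {0.0, 0.1, ..., 1.0}, the posterior over Probability of Heads can be computed by direct enumeration, since each toss contributes a factor of v (heads) or 1 - v (tails). The sketch below assumes a uniform prior over the grid; the function names and the example tosses are illustrative, not taken from the slides or from the aispace tool.

```python
def posterior_over_grid(tosses, grid=None, prior=None):
    """Posterior over candidate values v of Probability of Heads.

    tosses: list of booleans (True = heads), assumed i.i.d. given v.
    grid:   candidate values of v (default 0.0, 0.1, ..., 1.0).
    prior:  prior probability of each grid value (default uniform).
    """
    if grid is None:
        grid = [i / 10 for i in range(11)]
    if prior is None:
        prior = [1 / len(grid)] * len(grid)
    unnormalized = []
    for v, p in zip(grid, prior):
        likelihood = 1.0
        for heads in tosses:
            # P(Toss = Heads | Probability of Heads = v) = v
            likelihood *= v if heads else (1 - v)
        unnormalized.append(p * likelihood)
    z = sum(unnormalized)
    if z == 0:
        raise ValueError("observations have zero probability under the prior")
    return grid, [u / z for u in unnormalized]


def predict_heads(grid, posterior):
    """Probability that the next toss is heads: the posterior mean of v."""
    return sum(v * p for v, p in zip(grid, posterior))


# Hypothetical observations: heads, heads, tails.
grid, posterior = posterior_over_grid([True, True, False])
most_probable = grid[posterior.index(max(posterior))]
print(most_probable, predict_heads(grid, posterior))
```

After one tails and at least one heads, the values 0.0 and 1.0 receive zero posterior probability, because each of them rules out one of the observed outcomes.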

Naive Bayes Classifier: User’s request for help

[Figure: H with children "able", "absent", "add", ..., "zoom"]

H is the help page the user is interested in.
We observe the words in the query.
What probabilities are required?
What counts are required?
◮ the number of times each help page $h_i$ is the best one
◮ the number of times word $w_j$ is used when $h_i$ is the help page
When can the counts be updated?
◮ When the correct page is found.
What prior counts should be used? Can they be zero?
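The two kinds of counts listed above are enough to estimate P(h_i) and P(w_j | h_i) with pseudo-counts. The following is a minimal sketch, assuming each vocabulary word is a Boolean feature (present in the query or not) and a single shared pseudo-count; the class name `HelpPageClassifier`, its methods, and the example pages and words are hypothetical.

```python
import math


class HelpPageClassifier:
    """Naive Bayes over Boolean word features, learned from counts."""

    def __init__(self, pages, vocabulary, prior_count=1.0):
        self.pages = list(pages)
        self.vocabulary = list(vocabulary)
        self.prior_count = prior_count  # pseudo-count added to every count
        # Number of times each page was the correct one.
        self.page_counts = {h: 0 for h in self.pages}
        # Number of times each word appeared in a query for each page.
        self.word_counts = {h: {w: 0 for w in self.vocabulary} for h in self.pages}

    def update(self, query_words, correct_page):
        """Update the counts once the correct page is found."""
        self.page_counts[correct_page] += 1
        present = set(query_words)
        for w in self.vocabulary:
            if w in present:
                self.word_counts[correct_page][w] += 1

    def log_score(self, query_words, page):
        """log P(page) + sum over words of log P(word value | page)."""
        c = self.prior_count
        n_h = self.page_counts[page]
        total = sum(self.page_counts.values())
        score = math.log((n_h + c) / (total + c * len(self.pages)))
        present = set(query_words)
        for w in self.vocabulary:
            p_w = (self.word_counts[page][w] + c) / (n_h + 2 * c)
            score += math.log(p_w if w in present else 1 - p_w)
        return score

    def best_page(self, query_words):
        return max(self.pages, key=lambda h: self.log_score(query_words, h))


# Hypothetical help pages, vocabulary, and training queries.
clf = HelpPageClassifier(["printing", "zooming"], ["add", "printer", "zoom"])
clf.update(["add", "printer"], "printing")
clf.update(["zoom"], "zooming")
print(clf.best_page(["printer"]), clf.best_page(["zoom"]))
```

In this sketch a pseudo-count of zero would make unseen word and page combinations have probability 0 (and an unseen page would cause a division by zero), which is one reason to prefer strictly positive prior counts.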
