

SLIDE 1

Where do the probabilities come from?

Probabilities come from:

◮ Experts
◮ Data

© D. Poole and A. Mackworth 2019
Artificial Intelligence, Lecture 10.1

SLIDE 6

Learning probabilities — the simplest case

Observe tosses of a thumbtack:
$n_0$ instances of Heads = false
$n_1$ instances of Heads = true
What should we use as $P(heads)$?

[Figure: thumbtack outcomes labeled Tails and Heads]

Empirical frequency: $P(heads) = \frac{n_1}{n_0 + n_1}$

Laplace smoothing [1812]: $P(heads) = \frac{n_1 + 1}{n_0 + n_1 + 2}$

Informed priors: $P(heads) = \frac{n_1 + c_1}{n_0 + n_1 + c_0 + c_1}$ for some informed pseudo-counts $c_0, c_1 > 0$.

$c_0 = 1$, $c_1 = 1$ expresses ignorance (uniform prior).

Pseudo-counts convey prior knowledge. Consider: “how much more would I believe α if I had seen one example with α true than if I had seen no examples with α true?”

Empirical frequency overfits to the data.
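The estimators on this slide can be compared with a minimal Python sketch. The function names `empirical` and `pseudo_count_estimate` are illustrative assumptions, not taken from the slides. Laplace smoothing is the special case c0 = c1 = 1 of the pseudo-count formula.

```python
def empirical(n0: int, n1: int) -> float:
    """Empirical frequency n1 / (n0 + n1); undefined with no observations."""
    if n0 + n1 == 0:
        raise ValueError("empirical frequency is undefined with no observations")
    return n1 / (n0 + n1)


def pseudo_count_estimate(n0: int, n1: int, c0: float = 1.0, c1: float = 1.0) -> float:
    """Estimate (n1 + c1) / (n0 + n1 + c0 + c1) with pseudo-counts c0, c1 > 0.

    The defaults c0 = c1 = 1 give Laplace smoothing.
    """
    if c0 <= 0 or c1 <= 0:
        raise ValueError("pseudo-counts must be positive")
    return (n1 + c1) / (n0 + n1 + c0 + c1)


# Three tosses, all heads: the empirical frequency jumps to certainty,
# while the smoothed estimate stays strictly between 0 and 1.
print(empirical(0, 3))               # 1.0
print(pseudo_count_estimate(0, 3))   # 0.8
print(pseudo_count_estimate(0, 0))   # 0.5, defined even before any data
```

Larger pseudo-counts make the estimate move more slowly away from the prior as data arrives, which is one way to read the question about how much a single example should change a belief.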


SLIDE 12

Example of Overfitting

We have a web site where people rate restaurants with 1 to 5 stars. We want to report the most liked restaurant(s): the one predicted to have the best future ratings.

How can we determine the most liked restaurant?
Are the restaurants with the highest average rating the most liked restaurants?
Which restaurants have the highest average rating?
Which restaurants have a rating of 5?

◮ Only restaurants with few ratings have an average rating of 5.

Solution: add some “average” ratings for each restaurant!
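A minimal Python sketch of this idea follows. The restaurant names and ratings are made up for illustration, and the choice of five pseudo-ratings of 3.0 stars (`prior_count`, `prior_mean`) is an assumption; the slides do not say how many "average" ratings to add or what their value should be.

```python
def smoothed_average(ratings, prior_mean=3.0, prior_count=5):
    """Average after adding `prior_count` pseudo-ratings of value `prior_mean`."""
    return (sum(ratings) + prior_mean * prior_count) / (len(ratings) + prior_count)


# Made-up ratings for two hypothetical restaurants.
ratings = {
    "new_place": [5],
    "established": [5, 5, 4, 5, 5, 4, 5, 5, 5, 4],
}

raw = {name: sum(r) / len(r) for name, r in ratings.items()}
smoothed = {name: smoothed_average(r) for name, r in ratings.items()}

best_raw = max(raw, key=raw.get)                 # "new_place": average 5.0 from one rating
best_smoothed = max(smoothed, key=smoothed.get)  # "established"
print(best_raw, best_smoothed)
```

With a single 5-star rating the raw average puts the new restaurant on top, while the pseudo-ratings pull it toward the prior mean until more evidence arrives.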


SLIDE 14

Bayesian Learning

[Figure: belief network with nodes Probability of Heads, Toss 1, Toss 2, …, Toss 11]

aispace: http://artint.info/code/aispace/beta.xml

Probability of Heads is a random variable representing the probability of heads.
Its range is {0.0, 0.1, 0.2, ..., 0.9, 1.0} or the interval [0, 1].
P(Toss#n=Heads | Probability of Heads=v) = v
Toss#i is independent of Toss#j (for i ≠ j) given Probability of Heads.
The tosses are i.i.d.: independent and identically distributed.
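A minimal Python sketch of this model over the discrete range {0.0, 0.1, ..., 1.0} follows. It assumes a uniform prior over the candidate values, which the slide does not specify, and the names `posterior` and `predict_heads` are illustrative. Because the tosses are i.i.d. given Probability of Heads, the likelihood of n1 heads and n0 tails at value v is v^n1 (1 - v)^n0.

```python
def posterior(prior, n0, n1):
    """Posterior over candidate values v after observing n1 heads and n0 tails."""
    unnormalized = {v: p * (v ** n1) * ((1 - v) ** n0) for v, p in prior.items()}
    total = sum(unnormalized.values())
    if total == 0:
        raise ValueError("the observations have zero probability under this prior")
    return {v: u / total for v, u in unnormalized.items()}


def predict_heads(dist):
    """P(next toss = Heads), averaging v over the distribution on v."""
    return sum(v * p for v, p in dist.items())


grid = [i / 10 for i in range(11)]           # 0.0, 0.1, ..., 1.0
uniform = {v: 1 / len(grid) for v in grid}   # assumed prior: all values equally likely

post = posterior(uniform, n0=1, n1=3)
print(predict_heads(uniform))   # prediction before any tosses
print(predict_heads(post))      # prediction after 3 heads and 1 tail
```

After seeing both a head and a tail, the values 0.0 and 1.0 get posterior probability zero, since each would make one of the observations impossible.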


SLIDE 21

Naive Bayes Classifier: User’s request for help

[Figure: belief network with node H and word nodes "able", "absent", "add", ..., "zoom"]

H is the help page the user is interested in.
We observe the words in the query.

What probabilities are required?
What counts are required?
◮ the number of times each help page $h_i$ is the best one
◮ the number of times word $w_j$ is used when $h_i$ is the help page
When can the counts be updated?
◮ When the correct page is found.
What prior counts should be used? Can they be zero?
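The counts described on this slide can be sketched as a small naive Bayes classifier in Python. Everything named here is hypothetical: the class `HelpPageClassifier`, the page names, the word "print", and the default pseudo-counts of 1 are assumptions for illustration. Each vocabulary word is treated as a Boolean feature (present in the query or not), and words outside the vocabulary are simply ignored.

```python
import math
from collections import defaultdict


class HelpPageClassifier:
    """Naive Bayes over help pages with one Boolean feature per vocabulary word."""

    def __init__(self, pages, vocabulary, page_prior=1.0, word_prior=1.0):
        self.pages = list(pages)
        self.vocabulary = set(vocabulary)
        self.page_prior = page_prior        # pseudo-count for each help page
        self.word_prior = word_prior        # pseudo-count for word present / absent
        self.page_count = defaultdict(int)  # times page h was the correct one
        self.word_count = defaultdict(int)  # times word w was in the query when h was correct

    def update(self, query_words, correct_page):
        """Update the counts once the correct page is known."""
        self.page_count[correct_page] += 1
        for word in set(query_words) & self.vocabulary:
            self.word_count[(correct_page, word)] += 1

    def log_score(self, query_words, page):
        """Unnormalized log P(page | query) under the naive Bayes assumption."""
        total = sum(self.page_count.values())
        n_page = self.page_count[page]
        score = math.log((n_page + self.page_prior)
                         / (total + self.page_prior * len(self.pages)))
        present = set(query_words)
        for word in self.vocabulary:
            p_word = ((self.word_count[(page, word)] + self.word_prior)
                      / (n_page + 2 * self.word_prior))
            score += math.log(p_word if word in present else 1.0 - p_word)
        return score

    def classify(self, query_words):
        """Return the help page with the highest score for this query."""
        return max(self.pages, key=lambda page: self.log_score(query_words, page))


clf = HelpPageClassifier(pages=["printing", "zooming"],
                         vocabulary=["able", "absent", "add", "print", "zoom"])
clf.update(["zoom", "add"], "zooming")
clf.update(["zoom"], "zooming")
clf.update(["print", "add"], "printing")
print(clf.classify(["zoom"]))    # zooming
print(clf.classify(["print"]))   # printing
```

With prior counts of zero, a word never yet seen with a page would get probability zero for that page, and the page could never be selected for a query containing that word. Positive pseudo-counts avoid this.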


SLIDE 30

Issues

If you were designing such a system, many issues arise, such as:

◮ What if the most likely page isn’t the correct page?
◮ What if the user can’t find the correct page?
◮ What if the user mistakenly thinks they have the correct page?
◮ Can some pages never be found?
◮ What about common words?
◮ What about words that affect other words, e.g. “not”?
◮ What about new words?
◮ What do we do with new help pages?
◮ How can we transfer the language model to a new help system?
