SLIDE 1

Whatʼs the probability that the next candy is lime?

1

CS440/ECE448: Intro AI

  • What is P(di+1 | d1, …, di) = P(X | D)?

We donʼt know which bag of candy we got, so we have to assume it could be any one of them:

P(X | D) = ∑i P(X | D, hi) P(hi | D) = ∑i P(X | hi) P(hi | D)

(The second step holds because, given the bag hi, the next candy X is independent of the previously observed data D.)

SLIDE 2

Lecture 19
 Learning graphical models

  • Prof. Julia Hockenmaier

juliahmr@illinois.edu

  • http://cs.illinois.edu/fa11/cs440
  • CS440/ECE448: Intro to Artificial Intelligence
SLIDE 3

The Burglary example


Network structure: Burglary → Alarm ← Earthquake; Alarm → JohnCalls, Alarm → MaryCalls.

P(B):          B=t 0.001    B=f 0.999
P(E):          E=t 0.002    E=f 0.998

P(A | B, E):
  B=t, E=t:    A=t 0.95     A=f 0.05
  B=t, E=f:    A=t 0.94     A=f 0.06
  B=f, E=t:    A=t 0.29     A=f 0.71
  B=f, E=f:    A=t 0.001    A=f 0.999

P(J | A):
  A=t:         J=t 0.9      J=f 0.1
  A=f:         J=t 0.05     J=f 0.95

P(M | A):
  A=t:         M=t 0.7      M=f 0.3
  A=f:         M=t 0.01     M=f 0.99

What is the probability of a burglary if John and Mary call?
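
As a sanity check, here is a minimal Python sketch (my own illustration, not from the slides; the function name joint is mine) that answers this query by enumerating the full joint distribution defined by the CPTs above:

    import itertools

    # CPTs from the slide (True = the event occurs).
    P_B = {True: 0.001, False: 0.999}
    P_E = {True: 0.002, False: 0.998}
    P_A = {(True, True): 0.95, (True, False): 0.94,
           (False, True): 0.29, (False, False): 0.001}   # P(A=t | B, E)
    P_J = {True: 0.90, False: 0.05}                      # P(J=t | A)
    P_M = {True: 0.70, False: 0.01}                      # P(M=t | A)

    def joint(b, e, a, j, m):
        """P(B=b, E=e, A=a, J=j, M=m) via the chain rule of the network."""
        pa = P_A[(b, e)] if a else 1 - P_A[(b, e)]
        pj = P_J[a] if j else 1 - P_J[a]
        pm = P_M[a] if m else 1 - P_M[a]
        return P_B[b] * P_E[e] * pa * pj * pm

    # P(B=t | J=t, M=t): sum out E and A, then normalize.
    num = sum(joint(True, e, a, True, True)
              for e, a in itertools.product([True, False], repeat=2))
    den = sum(joint(b, e, a, True, True)
              for b, e, a in itertools.product([True, False], repeat=3))
    print(num / den)   # ≈ 0.284

Enumeration is exponential in the number of variables, but for five Boolean variables it is instant.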

SLIDE 4

Learning Bayes Nets

SLIDE 5

How do we know the parameters of a Bayes Net?

  • We want to estimate the parameters based on data D.
  • Data = instantiations of some or all random variables in the Bayes Net.
  • The data are our evidence.
SLIDE 6

Surprise Candy

There are two flavors of Surprise Candy: cherry and lime. Both have the same wrapper.

  • There are five different types of bags (which all look the same) that Surprise Candy is sold in:
    – h1: 100% cherry
    – h2: 75% cherry + 25% lime
    – h3: 50% cherry + 50% lime
    – h4: 25% cherry + 75% lime
    – h5: 100% lime

SLIDE 7

Surprise Candy

You just bought a bag of Surprise Candy. Which kind of bag did you get?

  • There are five different hypotheses: h1-h5
  • You start eating your candy. This is your data: D1 = cherry, D2 = lime, …, DN = ….

  • What is the most likely hypothesis given your data (evidence)?


SLIDE 8

Conditional probability refresher


P(X | Y) = P(X, Y) / P(Y)
P(X | Y) P(Y) = P(X, Y)
P(X | Y) P(Y) = P(Y | X) P(X)

SLIDE 9

Bayes Rule

  • P(cause): prior probability of cause
  • P(cause | effect): posterior probability of cause
  • P(effect | cause): likelihood of effect

  • Posterior ∝ likelihood × prior


P(cause | effect) ∝ P(effect | cause) P(cause)
P(cause | effect) = P(effect | cause) P(cause) / P(effect)

SLIDE 10

Bayes Rule

  • P(h): prior probability of hypothesis
  • P(h | D): posterior probability of hypothesis
  • P(D | h): likelihood of data, given hypothesis

  • Posterior ∝ likelihood × prior


P(h | D) ∝ P(D | h) P(h)
P(h | D) = P(D | h) P(h) / P(D)

SLIDE 11

Bayes Rule

  • P(h): prior probability of hypothesis
  • P(h | D): posterior probability of hypothesis
  • P(D | h): likelihood of data, given hypothesis


argmaxh P(h | D) = argmaxh P(D | h) P(h) / P(D)
                 = argmaxh P(D | h) P(h)

(P(D) does not depend on h, so it can be dropped from the argmax.)

SLIDE 12

Bayesian learning

Use Bayes rule to calculate the probability of each hypothesis given the data.

  • How do we know the prior and the likelihood?


P(h | D) = P(D | h) P(h) / P(D)

SLIDE 13

The prior P(h)

Sometimes we know P(h) in advance.

– Surprise Candy: (P(h1), …, P(h5)) = (0.1, 0.2, 0.4, 0.2, 0.1)

  • Sometimes we have to make an assumption (e.g. a uniform prior, when we donʼt know anything).


SLIDE 14

The likelihood P(D|h)

We typically assume that each observation di is drawn "i.i.d.": independently, from the same (identical) distribution.

  • Therefore:

P(D | h) = ∏i=1…N P(di | h)
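
For the candy bags, this product is easy to compute directly. Below is a small Python sketch (my own illustration, not from the lecture; the dictionary theta, holding P(cherry) under each hypothesis, and the function likelihood are names I chose):

    # P(cherry) under each of the five bag hypotheses.
    theta = {'h1': 1.00, 'h2': 0.75, 'h3': 0.50, 'h4': 0.25, 'h5': 0.00}

    def likelihood(data, h):
        """P(D | h) for an i.i.d. sequence of 'cherry'/'lime' observations."""
        p = 1.0
        for d in data:
            p *= theta[h] if d == 'cherry' else 1 - theta[h]
        return p

    print(likelihood(['cherry', 'lime', 'lime'], 'h2'))   # 0.75 * 0.25 * 0.25 ≈ 0.047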

SLIDE 15

The posterior P(h|D)

Assume weʼve seen 10 lime candies:


[Figure: posterior probability of each hypothesis, P(h1 | d) … P(h5 | d), vs. the number of observations in d (1-10 lime candies)]
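
Continuing the sketch above (theta and likelihood as defined earlier, and the prior from the earlier slide), the plotted curves can be reproduced like this:

    prior = {'h1': 0.1, 'h2': 0.2, 'h3': 0.4, 'h4': 0.2, 'h5': 0.1}

    def posterior(data):
        """P(h | D) ∝ P(D | h) P(h), normalized over the five hypotheses."""
        unnorm = {h: likelihood(data, h) * prior[h] for h in prior}
        z = sum(unnorm.values())
        return {h: p / z for h, p in unnorm.items()}

    for n in range(1, 11):            # posterior after 1..10 lime candies
        print(n, posterior(['lime'] * n))

After 10 limes the posterior mass has shifted almost entirely to h5; h1 is ruled out after the very first lime.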

SLIDE 16

Whatʼs the probability that the next candy is lime?

In the limit of infinitely many observations, this prediction agrees with the prediction of the true hypothesis.


[Figure: probability that the next candy is lime vs. the number of observations in d (1-10)]

SLIDE 17

Bayes optimal prediction

We donʼt know which hypothesis is true, so we marginalize them out:

  • P(X | D) = ∑i P(X | hi) P(hi | D)

This is guaranteed to converge to the true hypothesis.
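
In the running sketch (posterior and theta as above; p_next_lime is my name), the Bayes-optimal prediction is a posterior-weighted average:

    def p_next_lime(data):
        """P(next = lime | D) = Σi P(lime | hi) P(hi | D)."""
        post = posterior(data)
        return sum((1 - theta[h]) * post[h] for h in post)

    print(p_next_lime(['lime'] * 10))   # ≈ 0.97, approaching 1.0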


SLIDE 18

Maximum a-posteriori (MAP)

We assume the hypothesis with the maximum posterior probability, hMAP = argmaxh P(h | D), is true:

  • P(X | D) = P(X | hMAP)
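
In the running sketch, MAP prediction just replaces the weighted sum with the single most probable hypothesis:

    def p_next_lime_map(data):
        """P(X | D) ≈ P(X | hMAP), where hMAP = argmaxh P(h | D)."""
        post = posterior(data)
        h_map = max(post, key=post.get)
        return 1 - theta[h_map]

After 10 limes this returns 1.0, since hMAP = h5.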


SLIDE 19

Maximum likelihood (ML)

We assume a uniform prior P(h). We then choose the hypothesis that assigns the highest likelihood to the data:

  • hML = argmaxh P(D | h)
  • P(X | D) = P(X | hML)

This is commonly used in machine learning.
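
The corresponding step in the running sketch (likelihood and theta as before) ignores the prior entirely:

    def p_next_lime_ml(data):
        """P(X | D) ≈ P(X | hML), where hML = argmaxh P(D | h)."""
        h_ml = max(theta, key=lambda h: likelihood(data, h))
        return 1 - theta[h_ml]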


SLIDE 20

Surprise candy again

Now the manufacturer has been bought up by another company.

  • Now we donʼt know the lime-cherry proportions θ (= P(cherry)) anymore.

  • Can we estimate θ from data?


[Bayes net with a single node, flavor: P(cherry) = θ, P(lime) = 1 − θ]

SLIDE 21

Maximum likelihood learning

Given data D, we want to find the parameters that maximize P(D | θ).

  • We have a data set with N candies: c candies are cherry, l = (N − c) candies are lime.


SLIDE 22

Maximum likelihood learning

Out of N candies, c are cherry, (N-c) lime.

  • The likelihood of our data set:

P(D | θ) = ∏j=1…N P(dj | θ) = θ^c (1 − θ)^l

SLIDE 23

Log likelihood

Itʼs actually easier to work with the log-likelihood:


L(D | θ) = log P(D | θ) = ∑j=1…N log P(dj | θ) = c log θ + l log(1 − θ)

SLIDE 24

Maximizing Log-likelihood


dL(D | θ)/dθ = c/θ − l/(1 − θ) = 0
⟹ θ = c/(c + l) = c/N
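
A quick numerical check of this closed-form solution, as a sketch with made-up counts (c = 30 cherry out of N = 100 is my example, not from the slides):

    import math

    c, l = 30, 70                  # 30 cherry, 70 lime, N = 100
    N = c + l

    def log_lik(t):
        """c log θ + l log(1 − θ), the log-likelihood above."""
        return c * math.log(t) + l * math.log(1 - t)

    # A grid search over θ recovers the analytic maximizer c/N = 0.3.
    best = max((i / 1000 for i in range(1, 1000)), key=log_lik)
    print(best, c / N)             # 0.3 0.3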

SLIDE 25

Maximum likelihood estimation

We can simply count how many cherry candies we see.

  • This is also called the relative frequency estimate.
  • It is appropriate when we have complete data (i.e. we know the flavor of each candy).


SLIDE 26

Todayʼs reading

Chapter 13.5, Chapter 20.1 and 20.2.1
