
Statistical Learning (part II)

July 5, 2005, CS 486/686, University of Waterloo

CS486/686 Lecture Slides (c) 2005 P. Poupart


Outline

• Learning from incomplete data
  – EM algorithm
• Reading: R&N Ch 20.3


Incomplete data

• So far…
  – Values of all attributes are known
  – Learning is relatively easy
• But many real-world problems have hidden variables (a.k.a. latent variables)
  – Incomplete data
  – Values of some attributes missing


Unsupervised Learning

• Incomplete data → unsupervised learning
• Examples:
  – Categorisation of stars by astronomers
  – Categorisation of species by anthropologists
  – Market segmentation for marketing
  – Pattern identification for fraud detection
  – Research in general!


Maximum Likelihood Learning

• ML learning of Bayes net parameters:
  – θV=true,pa(V)=v = Pr(V=true | pa(V)=v)
  – θV=true,pa(V)=v = #[V=true,pa(V)=v] / (#[V=true,pa(V)=v] + #[V=false,pa(V)=v])
  – Assumes all attributes have values…

• What if the values of some attributes are missing?
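As a concrete illustration of this counting estimate, here is a minimal Python sketch (mine, not from the slides); the record format (a list of attribute-to-value dicts) and the names ml_cpt_entry, Burglary, and Alarm are assumptions for the example:

    def ml_cpt_entry(records, child, parents):
        # theta_{child=true, pa(child)=v}: relative frequency of child=True
        # among the records whose parent attributes match `parents`.
        # Assumes complete data: every record has a value for every attribute.
        match = [r for r in records if all(r[p] == v for p, v in parents.items())]
        return sum(1 for r in match if r[child]) / len(match)

    # Example: Pr(Alarm=true | Burglary=true) from four complete records.
    data = [
        {"Burglary": True, "Alarm": True},
        {"Burglary": True, "Alarm": True},
        {"Burglary": True, "Alarm": False},
        {"Burglary": False, "Alarm": False},
    ]
    print(ml_cpt_entry(data, "Alarm", {"Burglary": True}))  # 2/3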


“Naive” solutions for incomplete data

• Solution #1: Ignore records with missing values
  – But what if all records are missing values? (i.e., when a variable is hidden, none of the records have a value for that variable)
• Solution #2: Ignore hidden variables
  – Model may become significantly more complex!


Heart disease example

• (a) simpler (i.e., fewer CPT parameters)
• (b) complex (i.e., lots of CPT parameters)

[Figure: two networks over Smoking, Diet, Exercise and Symptom 1–3. (a) keeps a hidden HeartDisease variable between the habits and the symptoms; CPT sizes 2, 2, 2, 54, 6, 6, 6 (78 parameters in total). (b) removes HeartDisease and connects the symptoms directly; CPT sizes 2, 2, 2, 54, 162, 486 (708 parameters in total).]


“Direct” maximum likelihood

• Solution #3: maximize the likelihood directly
  – Let Z be hidden and E observable
  – hML = argmaxh P(e|h)
        = argmaxh ΣZ P(e,Z|h)
        = argmaxh ΣZ Πi CPT(Vi)
        = argmaxh log ΣZ Πi CPT(Vi)
  – Problem: can't push the log past the sum to linearize the product


Expectation-Maximization (EM)

• Solution #4: EM algorithm
  – Intuition: if we knew the missing values, computing hML would be trivial
• Guess hML
• Iterate (a generic sketch of this loop follows below):
  – Expectation: based on hML, compute the expectation of the missing values
  – Maximization: based on the expected missing values, compute a new estimate of hML
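As a structural sketch of the iteration just described (not from the slides), with e_step and m_step as hypothetical placeholders for the model-specific computations:

    def em(h0, e_step, m_step, iterations=100):
        # Generic EM loop: alternate the two phases starting from a guess h0.
        h = h0
        for _ in range(iterations):
            expected = e_step(h)   # E: expected values of the missing data, given h
            h = m_step(expected)   # M: maximum-likelihood re-estimate from completed data
        return h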


Expectation-Maximization (EM)

• More formally:
  – Approximate maximum likelihood
  – Iteratively compute:
    hi+1 = argmaxh ΣZ P(Z|hi,e) log P(e,Z|h)
    (the sum over Z is the expectation phase; the argmax is the maximization phase)
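The same update in LaTeX, with the two phases labelled as on the slide:

    h^{(i+1)} \;=\;
      \underbrace{\operatorname*{argmax}_{h}}_{\text{maximization}}\;
      \underbrace{\sum_{Z} P(Z \mid h^{(i)}, e)}_{\text{expectation}}
      \,\log P(e, Z \mid h)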


Expectation-Maximization (EM)

• Derivation
  – log P(e|h) = log [P(e,Z|h) / P(Z|e,h)]
               = log P(e,Z|h) − log P(Z|e,h)
  – Taking the expectation of both sides with respect to P(Z|e,h) (the left side does not depend on Z):
    log P(e|h) = ΣZ P(Z|e,h) log P(e,Z|h) − ΣZ P(Z|e,h) log P(Z|e,h)
               ≥ ΣZ P(Z|e,h) log P(e,Z|h)
    since −ΣZ P(Z|e,h) log P(Z|e,h) is an entropy and therefore non-negative

• EM finds a local maximum of ΣZ P(Z|e,h) log P(e,Z|h), which is a lower bound on log P(e|h)
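The derivation in LaTeX form, making the dropped entropy term explicit:

    \begin{aligned}
    \log P(e \mid h)
      &= \sum_{Z} P(Z \mid e, h) \log P(e, Z \mid h)
       - \sum_{Z} P(Z \mid e, h) \log P(Z \mid e, h) \\
      &\ge \sum_{Z} P(Z \mid e, h) \log P(e, Z \mid h)
    \end{aligned}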


Expectation-Maximization (EM)

• The log inside the sum lets us linearize the product:
  – hi+1 = argmaxh ΣZ P(Z|hi,e) log P(e,Z|h)
         = argmaxh ΣZ P(Z|hi,e) log Πj CPTj
         = argmaxh ΣZ P(Z|hi,e) Σj log CPTj
• Monotonic improvement of the likelihood:
  – P(e|hi+1) ≥ P(e|hi)


Candy Example

• Suppose you buy two bags of candies of unknown type (e.g., flavour ratios)
• You plan to eat sufficiently many candies from each bag to learn its type
• Ignoring your plan, your roommate mixes both bags…
• How can you learn the type of each bag despite the mixing?


Candy Example

• “Bag” variable is hidden


Unsupervised Clustering

• “Class” variable is hidden
• Naïve Bayes model

[Figure: naïve Bayes models (a) and (b) with a hidden class variable (Bag) as the parent of the observable attributes Wrapper, Flavor, and Holes, annotated with the CPT entries P(Bag=1) and P(F=cherry | B) for B = 1, 2.]


Candy Example

• Unknown parameters:
  – θi = P(Bag=i)
  – θFi = P(Flavour=cherry | Bag=i)
  – θWi = P(Wrapper=red | Bag=i)
  – θHi = P(Hole=yes | Bag=i)
• When eating a candy:
  – F, W and H are observable
  – B is hidden


Candy Example

• Let the true parameters be:
  – θ = 0.5, θF1 = θW1 = θH1 = 0.8, θF2 = θW2 = θH2 = 0.3
• After eating 1000 candies:

                   W=red          W=green
                  H=1    H=0     H=1    H=0
      F=cherry    273    93      104    90
      F=lime      79     100     94     167


Candy Example

• EM algorithm
• Guess h0:
  – θ = 0.6, θF1 = θW1 = θH1 = 0.6, θF2 = θW2 = θH2 = 0.4
• Alternate:
  – Expectation: expected # of candies in each bag
  – Maximization: new parameter estimates


Candy Example

• Expectation: expected # of candies in each bag
  – #[Bag=i] = Σj P(B=i | fj, wj, hj)
  – Compute P(B=i | fj, wj, hj) by variable elimination (or any other inference algorithm)
• Example:
  – #[Bag=1] = 612
  – #[Bag=2] = 388
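For this tiny network the posterior over the single hidden variable can also be computed by direct enumeration; a minimal sketch under the guess h0 above (the 0.835 value is my own arithmetic, not from the slides):

    # P(B=1 | F=cherry, W=red, H=yes) under h0, by enumerating Bag's two values.
    like1 = 0.6 * 0.6 * 0.6 * 0.6   # P(B=1) * P(cherry|B=1) * P(red|B=1) * P(holes|B=1)
    like2 = 0.4 * 0.4 * 0.4 * 0.4   # P(B=2) * P(cherry|B=2) * P(red|B=2) * P(holes|B=2)
    print(like1 / (like1 + like2))  # ≈ 0.835; summing such posteriors over all
                                    # 1000 candies yields #[Bag=1] ≈ 612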


Candy Example

• Maximization: relative frequency of each bag
  – θ1 = 612/1000 = 0.612
  – θ2 = 388/1000 = 0.388


Candy Example

• Expectation: expected # of cherry candies in each bag
  – #[B=i, F=cherry] = Σj P(B=i | fj=cherry, wj, hj)
  – Compute P(B=i | fj=cherry, wj, hj) by variable elimination (or any other inference algorithm)
• Maximization:
  – θF1 = #[B=1,F=cherry] / #[B=1] = 0.668
  – θF2 = #[B=2,F=cherry] / #[B=2] = 0.389
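Putting the E and M steps together for the candy model: a self-contained sketch (my own code, not the course's), with variable elimination replaced by direct enumeration over the hidden Bag variable; one iteration from h0 reproduces the numbers above:

    # Observed counts from the table above; (f, w, h) = 1 means cherry/red/holes.
    counts = {
        (1, 1, 1): 273, (1, 1, 0): 93, (1, 0, 1): 104, (1, 0, 0): 90,   # cherry
        (0, 1, 1): 79, (0, 1, 0): 100, (0, 0, 1): 94, (0, 0, 0): 167,   # lime
    }

    def em_step(theta, tF, tW, tH):
        # One EM iteration. tF[i] = P(F=cherry | Bag=i+1); similarly tW, tH.
        n_bag = [0.0, 0.0]                         # expected #[Bag=i]
        n_f, n_w, n_h = [0.0, 0.0], [0.0, 0.0], [0.0, 0.0]
        for (f, w, h), c in counts.items():
            prior = (theta, 1.0 - theta)
            like = [prior[i]
                    * (tF[i] if f else 1 - tF[i])
                    * (tW[i] if w else 1 - tW[i])
                    * (tH[i] if h else 1 - tH[i]) for i in (0, 1)]
            for i in (0, 1):
                p = c * like[i] / (like[0] + like[1])   # expected count in bag i+1
                n_bag[i] += p
                n_f[i] += p * f; n_w[i] += p * w; n_h[i] += p * h
        # Maximization: relative frequencies computed from the expected counts.
        theta = n_bag[0] / sum(n_bag)
        tF = [n_f[i] / n_bag[i] for i in (0, 1)]
        tW = [n_w[i] / n_bag[i] for i in (0, 1)]
        tH = [n_h[i] / n_bag[i] for i in (0, 1)]
        return theta, tF, tW, tH, n_bag

    theta, tF, tW, tH = 0.6, [0.6, 0.4], [0.6, 0.4], [0.6, 0.4]   # guess h0
    theta, tF, tW, tH, n_bag = em_step(theta, tF, tW, tH)
    print(n_bag)   # ≈ [612, 388]      (expected candies per bag)
    print(theta)   # ≈ 0.612           (new estimate of P(Bag=1))
    print(tF)      # ≈ [0.668, 0.389]  (new estimates of θF1, θF2)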


Candy Example

[Figure: log-likelihood of the observed data as a function of EM iteration number (x-axis up to about 120 iterations); the curve increases monotonically toward convergence, per the monotonic-improvement property above.]


Bayesian networks

• EM algorithm for general Bayes nets
• Expectation:
  – #[Vi=vij, Pa(Vi)=paik] = expected frequency
• Maximization:
  – θvij,paik = #[Vi=vij, Pa(Vi)=paik] / #[Pa(Vi)=paik]
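A sketch of this maximization step (mine, not from the slides), assuming the expectation step has already produced expected frequencies keyed by (variable, value, parent configuration); that data layout is a hypothetical choice for the example:

    from collections import defaultdict

    def m_step(expected_counts):
        # theta_{vij, paik} = #[Vi=vij, Pa(Vi)=paik] / #[Pa(Vi)=paik].
        # expected_counts maps (variable, value, parent_config) -> expected frequency.
        parent_totals = defaultdict(float)
        for (var, _, pa), n in expected_counts.items():
            parent_totals[(var, pa)] += n              # #[Pa(Vi)=paik]
        return {key: n / parent_totals[(key[0], key[2])]
                for key, n in expected_counts.items()}

    # Example: one binary variable with an empty parent configuration.
    print(m_step({("V", True, ()): 612.4, ("V", False, ()): 387.6}))
    # {('V', True, ()): 0.6124, ('V', False, ()): 0.3876}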


Next Class

• Neural networks
• Reading: Russell and Norvig Sect. 20.5