Bayesian Learning
[Read Ch. 6] [Suggested exercises]

Lecture slides for textbook Machine Learning, T. Mitchell, McGraw Hill.


slide-1
SLIDE 1 Bayesian Learning [Read Ch. 6] [Suggested exercises]
  • Bayes Theorem
  • MAP, ML hypotheses
  • MAP learners
  • Minimum description length principle
  • Bayes optimal classifier
  • Naive Bayes learner
  • Example: Learning over text data
  • Bayesian belief networks
  • Expectation Maximization algorithm
slide-2
SLIDE 2 Two Roles for Bayesian Methods

Provides practical learning algorithms:
  • Naive Bayes learning
  • Bayesian belief network learning
  • Combine prior knowledge (prior probabilities) with observed data
  • Requires prior probabilities

Provides useful conceptual framework:
  • Provides "gold standard" for evaluating other learning algorithms
  • Additional insight into Occam's razor
slide-3
SLIDE 3 Bayes Theorem

P(h|D) = P(D|h) P(h) / P(D)

  • P(h) = prior probability of hypothesis h
  • P(D) = prior probability of training data D
  • P(h|D) = probability of h given D
  • P(D|h) = probability of D given h
slide-4
SLIDE 4 Choosing Hypotheses

P(h|D) = P(D|h) P(h) / P(D)

Generally we want the most probable hypothesis given the training data: the Maximum a posteriori hypothesis h_MAP:

h_MAP = argmax_{h ∈ H} P(h|D)
      = argmax_{h ∈ H} P(D|h) P(h) / P(D)
      = argmax_{h ∈ H} P(D|h) P(h)

If we assume P(h_i) = P(h_j) for all i, j, then we can simplify further and choose the Maximum likelihood (ML) hypothesis:

h_ML = argmax_{h_i ∈ H} P(D|h_i)
slide-5
SLIDE 5 Bayes Theorem: Does the patient have cancer or not?

A patient takes a lab test and the result comes back positive. The test returns a correct positive result in only 98% of the cases in which the disease is actually present, and a correct negative result in only 97% of the cases in which the disease is not present. Furthermore, .008 of the entire population have this cancer.

  P(cancer) = .008          P(¬cancer) = .992
  P(+|cancer) = .98         P(−|cancer) = .02
  P(+|¬cancer) = .03        P(−|¬cancer) = .97
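A quick check of the posterior these numbers imply, as a minimal sketch (the variable names are mine, not from the slides):

```python
# Posterior for the cancer-test example via Bayes theorem.
p_cancer = 0.008            # prior P(cancer)
p_pos_given_cancer = 0.98   # P(+|cancer)
p_pos_given_healthy = 0.03  # P(+|¬cancer)

# P(+) by the theorem of total probability (see the next slide)
p_pos = (p_pos_given_cancer * p_cancer
         + p_pos_given_healthy * (1 - p_cancer))

# Bayes theorem: P(cancer|+) = P(+|cancer) P(cancer) / P(+)
print(p_pos_given_cancer * p_cancer / p_pos)  # ≈ 0.21, so h_MAP = ¬cancer
```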
slide-6
SLIDE 6 Basic Formulas for Probabilities

  • Product rule: probability P(A ∧ B) of a conjunction of two events A and B:
    P(A ∧ B) = P(A|B) P(B) = P(B|A) P(A)
  • Sum rule: probability of a disjunction of two events A and B:
    P(A ∨ B) = P(A) + P(B) − P(A ∧ B)
  • Theorem of total probability: if events A_1, ..., A_n are mutually exclusive with Σ_{i=1}^n P(A_i) = 1, then
    P(B) = Σ_{i=1}^n P(B|A_i) P(A_i)
slide-7
SLIDE 7 Brute Force MAP Hypothesis Learner

1. For each hypothesis h in H, calculate the posterior probability
   P(h|D) = P(D|h) P(h) / P(D)
2. Output the hypothesis h_MAP with the highest posterior probability:
   h_MAP = argmax_{h ∈ H} P(h|D)
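A minimal sketch of this learner for a finite hypothesis space; the coin-bias hypothesis space and the data below are illustrative, not from the slides:

```python
def brute_force_map(hypotheses, prior, likelihood, data):
    # argmax_h P(D|h) P(h); P(D) is constant across h, so it is dropped
    return max(hypotheses, key=lambda h: likelihood(data, h) * prior(h))

# Example: H = possible coin biases, uniform prior, data = flips (1 = heads)
def likelihood(data, p):
    out = 1.0
    for x in data:
        out *= p if x == 1 else (1 - p)
    return out

hyps = [0.3, 0.5, 0.9]
print(brute_force_map(hyps, lambda h: 1 / len(hyps), likelihood,
                      [1, 1, 0, 1, 1]))  # 0.9
```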
slide-8
SLIDE 8 Relation to Concept Learning

Consider our usual concept learning task:
  • instance space X, hypothesis space H, training examples D
  • consider the FindS learning algorithm (outputs the most specific hypothesis from the version space VS_{H,D})

What would Bayes rule produce as the MAP hypothesis?
Does FindS output a MAP hypothesis?
slide-9
SLIDE 9 Relation to Concept Learning

Assume a fixed set of instances ⟨x_1, ..., x_m⟩.
Assume D is the set of classifications D = ⟨c(x_1), ..., c(x_m)⟩.
Choose P(D|h):
slide-10
SLIDE 10 Relation to Concept Learning

Assume a fixed set of instances ⟨x_1, ..., x_m⟩.
Assume D is the set of classifications D = ⟨c(x_1), ..., c(x_m)⟩.

Choose P(D|h):
  • P(D|h) = 1 if h is consistent with D
  • P(D|h) = 0 otherwise

Choose P(h) to be the uniform distribution:
  • P(h) = 1/|H| for all h in H

Then:
  P(h|D) = 1/|VS_{H,D}| if h is consistent with D, 0 otherwise
slide-11
SLIDE 11 Evolution of Posterior Probabilities

[Figure: three panels over the hypothesis space, showing (a) the prior P(h), (b) P(h|D1), and (c) P(h|D1, D2) — the posterior concentrates on fewer hypotheses as data arrives.]
slide-12
SLIDE 12 Characterizing Learning Algorithms by Equivalent MAP Learners

[Figure: an inductive system (the Candidate Elimination Algorithm, taking training examples D and hypothesis space H and producing output hypotheses) shown as equivalent to a Bayesian inference system (a brute force MAP learner over the same inputs) whose prior assumptions are made explicit: P(h) uniform, P(D|h) = 1 if consistent, = 0 if inconsistent.]
slide-13
SLIDE 13 Learning a Real-Valued Function

[Figure: noisy training points y scattered around a target function f(x), with the maximum likelihood hypothesis h_ML and the error e.]

Consider any real-valued target function f.
Training examples ⟨x_i, d_i⟩, where d_i is a noisy training value:
  • d_i = f(x_i) + e_i
  • e_i is a random variable (noise) drawn independently for each x_i according to some Gaussian distribution with mean 0

Then the maximum likelihood hypothesis h_ML is the one that minimizes the sum of squared errors:

h_ML = argmin_{h ∈ H} Σ_{i=1}^m (d_i − h(x_i))²
slide-14
SLIDE 14 Learning a Real-Valued Function

h_ML = argmax_{h ∈ H} p(D|h)
     = argmax_{h ∈ H} ∏_{i=1}^m p(d_i|h)
     = argmax_{h ∈ H} ∏_{i=1}^m (1/√(2πσ²)) e^{−(1/2)((d_i − h(x_i))/σ)²}

Maximize the natural log of this instead:

h_ML = argmax_{h ∈ H} Σ_{i=1}^m [ ln(1/√(2πσ²)) − (1/2)((d_i − h(x_i))/σ)² ]
     = argmax_{h ∈ H} Σ_{i=1}^m −(1/2)((d_i − h(x_i))/σ)²
     = argmax_{h ∈ H} Σ_{i=1}^m −(d_i − h(x_i))²
     = argmin_{h ∈ H} Σ_{i=1}^m (d_i − h(x_i))²
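The derivation says that under Gaussian noise, least-squares fitting is maximum likelihood estimation. A minimal sketch with synthetic data (the linear hypothesis class and noise level are illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0, 1, 50)
f = lambda t: 2.0 * t + 1.0                # true target function
d = f(x) + rng.normal(0.0, 0.1, x.size)    # d_i = f(x_i) + e_i, Gaussian e_i

# Least squares fit of h(x) = w1*x + w0, i.e. argmin_h Σ_i (d_i − h(x_i))²
A = np.stack([x, np.ones_like(x)], axis=1)
(w1, w0), *_ = np.linalg.lstsq(A, d, rcond=None)
print(w1, w0)  # ML estimates, close to the true 2.0 and 1.0
```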
slide-15
SLIDE 15 Learning to Predict Probabilities

Consider predicting survival probability from patient data.
Training examples ⟨x_i, d_i⟩, where d_i is 1 or 0.
Want to train a neural network to output a probability given x_i (not a 0 or 1).
In this case one can show

h_ML = argmax_{h ∈ H} Σ_{i=1}^m [ d_i ln h(x_i) + (1 − d_i) ln(1 − h(x_i)) ]

Weight update rule for a sigmoid unit:

w_jk ← w_jk + Δw_jk,  where  Δw_jk = η Σ_{i=1}^m (d_i − h(x_i)) x_ijk
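A minimal sketch of that update rule for a single sigmoid unit; the data, the learning rate η, and the epoch count are illustrative assumptions:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 3))                            # instances x_i
d = (X @ np.array([1.0, -2.0, 0.5]) > 0).astype(float)   # targets d_i ∈ {0, 1}

w = np.zeros(3)
eta = 0.1
for _ in range(200):
    h = sigmoid(X @ w)         # unit output, read as an estimate of P(d=1|x)
    w += eta * X.T @ (d - h)   # Δw_jk = η Σ_i (d_i − h(x_i)) x_ijk
print(w)
```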
slide-16
SLIDE 16 Minimum Description Length Principle

Occam's razor: prefer the shortest hypothesis.
MDL: prefer the hypothesis h that minimizes

h_MDL = argmin_{h ∈ H} L_C1(h) + L_C2(D|h)

where L_C(x) is the description length of x under encoding C.

Example: H = decision trees, D = training data labels
  • L_C1(h) is # bits to describe tree h
  • L_C2(D|h) is # bits to describe D given h
  • Note L_C2(D|h) = 0 if the examples are classified perfectly by h; need only describe the exceptions
  • Hence h_MDL trades off tree size for training errors
slide-17
SLIDE 17 Minimum Description Length Principle

h_MAP = argmax_{h ∈ H} P(D|h) P(h)
      = argmax_{h ∈ H} [ log2 P(D|h) + log2 P(h) ]
      = argmin_{h ∈ H} [ −log2 P(D|h) − log2 P(h) ]

Interesting fact from information theory: the optimal (shortest expected coding length) code for an event with probability p uses −log2 p bits.

So interpret:
  • −log2 P(h) is the length of h under the optimal code
  • −log2 P(D|h) is the length of D given h under the optimal code

→ prefer the hypothesis that minimizes length(h) + length(misclassifications)
slide-18
SLIDE 18 Most Probable Classification of New Instances

So far we've sought the most probable hypothesis given the data D (i.e., h_MAP).
Given a new instance x, what is its most probable classification?
  • h_MAP(x) is not the most probable classification!

Consider three possible hypotheses:
  P(h1|D) = .4,  P(h2|D) = .3,  P(h3|D) = .3
Given a new instance x:
  h1(x) = +,  h2(x) = −,  h3(x) = −
What's the most probable classification of x?
slide-19
SLIDE 19 Bayes Optimal Classifier

Bayes optimal classification:

argmax_{v_j ∈ V} Σ_{h_i ∈ H} P(v_j|h_i) P(h_i|D)

Example:
  P(h1|D) = .4,  P(−|h1) = 0,  P(+|h1) = 1
  P(h2|D) = .3,  P(−|h2) = 1,  P(+|h2) = 0
  P(h3|D) = .3,  P(−|h3) = 1,  P(+|h3) = 0

therefore
  Σ_{h_i ∈ H} P(+|h_i) P(h_i|D) = .4
  Σ_{h_i ∈ H} P(−|h_i) P(h_i|D) = .6

and
  argmax_{v_j ∈ V} Σ_{h_i ∈ H} P(v_j|h_i) P(h_i|D) = −
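The weighted vote above as a few lines of code (a sketch; the dictionaries just encode the numbers on this slide):

```python
posteriors = {"h1": 0.4, "h2": 0.3, "h3": 0.3}       # P(h_i|D)
p_class = {"h1": {"+": 1.0, "-": 0.0},               # P(v_j|h_i)
           "h2": {"+": 0.0, "-": 1.0},
           "h3": {"+": 0.0, "-": 1.0}}

# argmax_v Σ_i P(v|h_i) P(h_i|D)
score = {v: sum(p_class[h][v] * posteriors[h] for h in posteriors)
         for v in ("+", "-")}
print(max(score, key=score.get), score)  # '-' with scores {'+': 0.4, '-': 0.6}
```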
slide-20
SLIDE 20 Gibbs Classifier

The Bayes optimal classifier provides the best result, but can be expensive if there are many hypotheses.

Gibbs algorithm:
1. Choose one hypothesis at random, according to P(h|D)
2. Use this to classify the new instance

Surprising fact: assume the target concepts are drawn at random from H according to the priors on H. Then:

E[error_Gibbs] ≤ 2 E[error_BayesOptimal]

Suppose a correct, uniform prior distribution over H. Then:
  • Pick any hypothesis from VS with uniform probability
  • Its expected error is no worse than twice Bayes optimal
slide-21
SLIDE 21 Naive Bayes Classifier

Along with decision trees, neural networks, and nearest neighbor, one of the most practical learning methods.

When to use:
  • Moderate or large training set available
  • Attributes that describe instances are conditionally independent given classification

Successful applications:
  • Diagnosis
  • Classifying text documents
slide-22
SLIDE 22 Naive Bayes Classifier

Assume a target function f: X → V, where each instance x is described by attributes ⟨a_1, a_2, ..., a_n⟩.

The most probable value of f(x) is:

v_MAP = argmax_{v_j ∈ V} P(v_j|a_1, a_2, ..., a_n)
      = argmax_{v_j ∈ V} P(a_1, a_2, ..., a_n|v_j) P(v_j) / P(a_1, a_2, ..., a_n)
      = argmax_{v_j ∈ V} P(a_1, a_2, ..., a_n|v_j) P(v_j)

Naive Bayes assumption:

P(a_1, a_2, ..., a_n|v_j) = ∏_i P(a_i|v_j)

which gives the Naive Bayes classifier:

v_NB = argmax_{v_j ∈ V} P(v_j) ∏_i P(a_i|v_j)
slide-23
SLIDE 23 Naive Bayes Algorithm

Naive_Bayes_Learn(examples):
  For each target value v_j
    P̂(v_j) ← estimate P(v_j)
    For each attribute value a_i of each attribute a
      P̂(a_i|v_j) ← estimate P(a_i|v_j)

Classify_New_Instance(x):

v_NB = argmax_{v_j ∈ V} P̂(v_j) ∏_{a_i ∈ x} P̂(a_i|v_j)
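A compact sketch of those two procedures for discrete attributes (plain frequency estimates, no smoothing; the smoothing fix appears on the m-estimate slide below):

```python
from collections import Counter, defaultdict

def naive_bayes_learn(examples):
    # examples: list of (attribute_tuple, target_value) pairs
    class_counts = Counter(v for _, v in examples)
    attr_counts = defaultdict(Counter)
    for attrs, v in examples:
        for i, a in enumerate(attrs):
            attr_counts[v][(i, a)] += 1
    p_v = {v: c / len(examples) for v, c in class_counts.items()}
    p_a = {v: {k: c / class_counts[v] for k, c in cnt.items()}
           for v, cnt in attr_counts.items()}
    return p_v, p_a

def classify_new_instance(x, p_v, p_a):
    def score(v):  # P̂(v_j) Π_i P̂(a_i|v_j)
        s = p_v[v]
        for i, a in enumerate(x):
            s *= p_a[v].get((i, a), 0.0)
        return s
    return max(p_v, key=score)
```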
slide-24
SLIDE 24 Naive Bayes Example

Consider PlayTennis again, and the new instance

⟨Outlook = sunny, Temp = cool, Humid = high, Wind = strong⟩

We want to compute

v_NB = argmax_{v_j ∈ V} P(v_j) ∏_i P(a_i|v_j)

P(y) P(sun|y) P(cool|y) P(high|y) P(strong|y) = .005
P(n) P(sun|n) P(cool|n) P(high|n) P(strong|n) = .021

→ v_NB = n
slide-25
SLIDE 25 Naive Bayes: Subtleties

1. The conditional independence assumption is often violated:

P(a_1, a_2, ..., a_n|v_j) = ∏_i P(a_i|v_j)

  • ...but it works surprisingly well anyway. Note that we don't need the estimated posteriors P̂(v_j|x) to be correct; we need only that

argmax_{v_j ∈ V} P̂(v_j) ∏_i P̂(a_i|v_j) = argmax_{v_j ∈ V} P(v_j) P(a_1, ..., a_n|v_j)

  • see [Domingos & Pazzani, 1996] for analysis
  • Naive Bayes posteriors are often unrealistically close to 1 or 0
slide-26
SLIDE 26 Naive Bayes: Subtleties

2. What if none of the training instances with target value v_j have attribute value a_i? Then

P̂(a_i|v_j) = 0, and P̂(v_j) ∏_i P̂(a_i|v_j) = 0

The typical solution is a Bayesian estimate for P̂(a_i|v_j):

P̂(a_i|v_j) ← (n_c + m p) / (n + m)

where
  • n is the number of training examples for which v = v_j
  • n_c is the number of examples for which v = v_j and a = a_i
  • p is a prior estimate for P̂(a_i|v_j)
  • m is the weight given to the prior (i.e., the number of "virtual" examples)
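The m-estimate as a one-liner; the numbers in the usage line are illustrative, not from the slides:

```python
def m_estimate(n_c, n, p, m):
    """Smoothed estimate of P(a_i|v_j) = (n_c + m*p) / (n + m)."""
    return (n_c + m * p) / (n + m)

# An attribute value never seen among 5 examples of a class, with 2
# possible values (so prior p = 1/2), m = 2 virtual examples:
print(m_estimate(n_c=0, n=5, p=0.5, m=2))  # ≈ 0.143 rather than 0
```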
slide-27
SLIDE 27 Learning to Classify Text

Why?
  • Learn which news articles are of interest
  • Learn to classify web pages by topic

Naive Bayes is among the most effective algorithms.
What attributes shall we use to represent text documents?
slide-28
SLIDE 28 Learning to Classify Text

Target concept Interesting?: Document → {+, −}

1. Represent each document by a vector of words
  • one attribute per word position in the document
2. Learning: use training examples to estimate
  • P(+), P(−), P(doc|+), P(doc|−)

Naive Bayes conditional independence assumption:

P(doc|v_j) = ∏_{i=1}^{length(doc)} P(a_i = w_k|v_j)

where P(a_i = w_k|v_j) is the probability that the word in position i is w_k, given v_j.

One more assumption:

P(a_i = w_k|v_j) = P(a_m = w_k|v_j), ∀ i, m
slide-29
SLIDE 29 Learn_naive_Bayes_text(Examples, V)

1. Collect all words and other tokens that occur in Examples
  • Vocabulary ← all distinct words and other tokens in Examples

2. Calculate the required P(v_j) and P(w_k|v_j) probability terms
  • For each target value v_j in V do
    – docs_j ← subset of Examples for which the target value is v_j
    – P(v_j) ← |docs_j| / |Examples|
    – Text_j ← a single document created by concatenating all members of docs_j
    – n ← total number of words in Text_j (counting duplicate words multiple times)
    – for each word w_k in Vocabulary
      * n_k ← number of times word w_k occurs in Text_j
      * P(w_k|v_j) ← (n_k + 1) / (n + |Vocabulary|)
slide-30
SLIDE 30 Classify_naive_Bayes_text(Doc)

  • positions ← all word positions in Doc that contain tokens found in Vocabulary
  • Return v_NB, where

v_NB = argmax_{v_j ∈ V} P(v_j) ∏_{i ∈ positions} P(a_i|v_j)
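A runnable sketch of the two procedures above; working in log space to avoid underflow on long documents is my addition, not part of the slides:

```python
import math
from collections import Counter

def learn_naive_bayes_text(examples):
    # examples: list of (token_list, target_value) pairs
    vocabulary = {w for doc, _ in examples for w in doc}
    p_v, p_w = {}, {}
    for v in {t for _, t in examples}:
        docs_j = [doc for doc, t in examples if t == v]
        p_v[v] = len(docs_j) / len(examples)
        counts = Counter(w for doc in docs_j for w in doc)
        n = sum(counts.values())
        p_w[v] = {w: (counts[w] + 1) / (n + len(vocabulary))
                  for w in vocabulary}              # (n_k + 1) / (n + |V|)
    return vocabulary, p_v, p_w

def classify_naive_bayes_text(doc, vocabulary, p_v, p_w):
    positions = [w for w in doc if w in vocabulary]
    def log_score(v):
        return math.log(p_v[v]) + sum(math.log(p_w[v][w]) for w in positions)
    return max(p_v, key=log_score)
```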
slide-31
SLIDE 31 Twenty NewsGroups

Given 1000 training documents from each group, learn to classify new documents according to which newsgroup they came from:

comp.graphics             misc.forsale
comp.os.ms-windows.misc   rec.autos
comp.sys.ibm.pc.hardware  rec.motorcycles
comp.sys.mac.hardware     rec.sport.baseball
comp.windows.x            rec.sport.hockey
alt.atheism               sci.space
soc.religion.christian    sci.crypt
talk.religion.misc        sci.electronics
talk.politics.mideast     sci.med
talk.politics.misc        talk.politics.guns

Naive Bayes: 89% classification accuracy
slide-32
SLIDE 32 Article from rec.sport.hockey

Path: cantaloupe.srv.cs.cmu.edu!das-news.harvard.e...
From: xxx@yyy.zzz.edu (John Doe)
Subject: Re: This year's biggest and worst (opinion)...
Date: ... Apr ... GMT

I can only comment on the Kings, but the most obvious candidate for pleasant surprise is Alex Zhitnik. He came highly touted as a defensive defenseman, but he's clearly much more than that. Great skater and hard shot (though wish he were more accurate). In fact, he pretty much allowed the Kings to trade away that huge defensive liability Paul Coffey. Kelly Hrudey is only the biggest disappointment if you thought he was any good to begin with. But at best he's only a mediocre goaltender. A better choice would be Tomas Sandstrom, though not through any fault of his own, but because some thugs in Toronto decided ...
slide-33
SLIDE 33 Learning Curve for 20 Newsgroups

[Figure: accuracy vs. training set size (1/3 withheld for test) on 20News, with curves for Bayes, TFIDF, and PRTFIDF; accuracy axis 10–100%, training set size axis 100 to 10000.]
slide-34
SLIDE 34 Bayesian Belief Networks

Interesting because:
  • The Naive Bayes assumption of conditional independence is too restrictive
  • But inference is intractable without some such assumptions
  • Bayesian belief networks describe conditional independence among subsets of variables
  • This allows combining prior knowledge about (in)dependencies among variables with observed training data

(also called Bayes Nets)
slide-35
SLIDE 35 Conditional Independence

Definition: X is conditionally independent of Y given Z if the probability distribution governing X is independent of the value of Y given the value of Z; that is, if

(∀ x_i, y_j, z_k) P(X = x_i|Y = y_j, Z = z_k) = P(X = x_i|Z = z_k)

More compactly, we write P(X|Y, Z) = P(X|Z).

Example: Thunder is conditionally independent of Rain, given Lightning:

P(Thunder|Rain, Lightning) = P(Thunder|Lightning)

Naive Bayes uses conditional independence to justify:

P(X, Y|Z) = P(X|Y, Z) P(Y|Z)
          = P(X|Z) P(Y|Z)
slide-36
SLIDE 36 Bayesian Belief Network

[Figure: directed acyclic graph over Storm, BusTourGroup, Lightning, Campfire, Thunder, ForestFire, with the conditional probability table for Campfire:]

         S,B    S,¬B   ¬S,B   ¬S,¬B
  C      0.4    0.1    0.8    0.2
  ¬C     0.6    0.9    0.2    0.8

The network represents a set of conditional independence assertions:
  • Each node is asserted to be conditionally independent of its nondescendants, given its immediate predecessors
  • Directed acyclic graph
slide-37
SLIDE 37 Bayesian Belief Network

[Figure: the same Storm/BusTourGroup/Lightning/Campfire/Thunder/ForestFire network and Campfire CPT as on the previous slide.]

The network represents the joint probability distribution over all variables:
  • e.g., P(Storm, BusTourGroup, ..., ForestFire)
  • in general,

P(y_1, ..., y_n) = ∏_{i=1}^n P(y_i|Parents(Y_i))

    where Parents(Y_i) denotes the immediate predecessors of Y_i in the graph
  • so the joint distribution is fully defined by the graph, plus the P(y_i|Parents(Y_i))
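A tiny sketch of evaluating that factored joint on a fragment of the network: the Campfire CPT is the one from the slide, while the root priors for Storm and BusTourGroup are made-up placeholders just so the product can be computed:

```python
# P(Campfire = T | Storm, BusTourGroup), from the CPT on the slide
P_CAMPFIRE = {(True, True): 0.4, (True, False): 0.1,
              (False, True): 0.8, (False, False): 0.2}

def p_campfire(c, s, b):
    p = P_CAMPFIRE[(s, b)]
    return p if c else 1.0 - p

def joint(s, b, c, p_storm=0.1, p_bus=0.5):  # root priors: assumed values
    # P(S, B, C) = P(S) P(B) P(C|S, B) — the product over this fragment
    return ((p_storm if s else 1 - p_storm)
            * (p_bus if b else 1 - p_bus)
            * p_campfire(c, s, b))

print(joint(s=True, b=True, c=True))  # 0.1 * 0.5 * 0.4 = 0.02
```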
slide-38
SLIDE 38 Inference in Bayesian Networks

How can one infer the (probabilities of) values of one or more network variables, given observed values of others?
  • The Bayes net contains all the information needed for this inference
  • If only one variable has an unknown value, it is easy to infer it
  • In the general case, the problem is NP-hard

In practice, we can succeed in many cases:
  • Exact inference methods work well for some network structures
  • Monte Carlo methods "simulate" the network randomly to calculate approximate solutions
slide-39
SLIDE 39 Learning of Bayesian Networks

Several variants of this learning task:
  • Network structure might be known or unknown
  • Training examples might provide values of all network variables, or just some

If structure is known and we observe all variables:
  • Then it's as easy as training a Naive Bayes classifier
slide-40
SLIDE 40 Learning Bayes Nets

Suppose structure is known and variables are partially observable.

E.g., observe ForestFire, Storm, BusTourGroup, Thunder, but not Lightning, Campfire...
  • Similar to training a neural network with hidden units
  • In fact, we can learn the network's conditional probability tables using gradient ascent
  • Converge to the network h that (locally) maximizes P(D|h)
slide-41
SLIDE 41 Gradient Ascent for Bayes Nets

Let w_ijk denote one entry in the conditional probability table for variable Y_i in the network:

w_ijk = P(Y_i = y_ij | Parents(Y_i) = the list u_ik of values)

E.g., if Y_i = Campfire, then u_ik might be ⟨Storm = T, BusTourGroup = F⟩.

Perform gradient ascent by repeatedly:
1. updating all w_ijk using the training data D:

w_ijk ← w_ijk + η Σ_{d ∈ D} P_h(y_ij, u_ik|d) / w_ijk

2. then renormalizing the w_ijk to assure

Σ_j w_ijk = 1 and 0 ≤ w_ijk ≤ 1
slide-42
SLIDE 42 More on Learning Bayes Nets

The EM algorithm can also be used. Repeatedly:
1. Calculate probabilities of unobserved variables, assuming h
2. Calculate new w_ijk to maximize E[ln P(D|h)], where D now includes both the observed variables and the calculated probabilities of unobserved variables

When structure is unknown:
  • Algorithms use greedy search to add/subtract edges and nodes
  • Active research topic
slide-43
SLIDE 43 Summary: Bayesian Belief Networks

  • Combine prior knowledge with observed data
  • Impact of prior knowledge (when correct!) is to lower the sample complexity
  • Active research area:
    – Extend from boolean to real-valued variables
    – Parameterized distributions instead of tables
    – Extend to first-order instead of propositional systems
    – More effective inference methods
slide-44
SLIDE 44 Expectation Maximization (EM)

When to use:
  • Data is only partially observable
  • Unsupervised clustering (target value unobservable)
  • Supervised learning (some instance attributes unobservable)

Some uses:
  • Train Bayesian Belief Networks
  • Unsupervised clustering (AUTOCLASS)
  • Learning Hidden Markov Models
slide-45
SLIDE 45 Generating Data from a Mixture of k Gaussians

[Figure: p(x) as a mixture of overlapping Gaussian bumps along x.]

Each instance x is generated by:
1. Choosing one of the k Gaussians with uniform probability
2. Generating an instance at random according to that Gaussian
slide-46
SLIDE 46 EM for Estimating k Means

Given:
  • Instances from X generated by a mixture of k Gaussian distributions
  • Unknown means ⟨μ_1, ..., μ_k⟩ of the k Gaussians
  • Don't know which instance x_i was generated by which Gaussian

Determine:
  • Maximum likelihood estimates of ⟨μ_1, ..., μ_k⟩

Think of the full description of each instance as y_i = ⟨x_i, z_i1, z_i2⟩, where
  • z_ij is 1 if x_i was generated by the jth Gaussian
  • x_i is observable
  • z_ij is unobservable
slide-47
SLIDE 47 EM for Estimating k Means

EM Algorithm: pick a random initial h = ⟨μ_1, μ_2⟩, then iterate:

E step: Calculate the expected value E[z_ij] of each hidden variable z_ij, assuming the current hypothesis h = ⟨μ_1, μ_2⟩ holds:

E[z_ij] = p(x = x_i|μ = μ_j) / Σ_{n=1}^2 p(x = x_i|μ = μ_n)
        = e^{−(x_i − μ_j)²/(2σ²)} / Σ_{n=1}^2 e^{−(x_i − μ_n)²/(2σ²)}

M step: Calculate a new maximum likelihood hypothesis h′ = ⟨μ′_1, μ′_2⟩, assuming the value taken on by each hidden variable z_ij is its expected value E[z_ij] calculated above. Replace h = ⟨μ_1, μ_2⟩ by h′ = ⟨μ′_1, μ′_2⟩, where

μ_j ← Σ_{i=1}^m E[z_ij] x_i / Σ_{i=1}^m E[z_ij]
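A runnable sketch of this two-Gaussian EM loop (σ is taken as known; the synthetic data, the true means, and the iteration count are illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(2)
sigma = 1.0
x = np.concatenate([rng.normal(0.0, sigma, 200),
                    rng.normal(5.0, sigma, 200)])   # true means 0 and 5

mu = np.array([1.0, 2.0])                           # initial h = <mu_1, mu_2>
for _ in range(50):
    # E step: E[z_ij] ∝ exp(−(x_i − μ_j)² / (2σ²)), normalized over j
    w = np.exp(-(x[:, None] - mu[None, :]) ** 2 / (2 * sigma ** 2))
    w /= w.sum(axis=1, keepdims=True)
    # M step: μ_j ← Σ_i E[z_ij] x_i / Σ_i E[z_ij]
    mu = (w * x[:, None]).sum(axis=0) / w.sum(axis=0)
print(mu)  # close to [0.0, 5.0]
```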
slide-48
SLIDE 48 EM Algorithm

Converges to a local maximum likelihood h, and provides estimates of the hidden variables z_ij.

In fact, it finds a local maximum of E[ln P(Y|h)]:
  • Y is the complete (observable plus unobservable variables) data
  • the expected value is taken over the possible values of the unobserved variables in Y
slide-49
SLIDE 49 General EM Problem

Given:
  • Observed data X = {x_1, ..., x_m}
  • Unobserved data Z = {z_1, ..., z_m}
  • Parameterized probability distribution P(Y|h), where
    – Y = {y_1, ..., y_m} is the full data, y_i = x_i ∪ z_i
    – h are the parameters

Determine:
  • h that (locally) maximizes E[ln P(Y|h)]

Many uses:
  • Train Bayesian belief networks
  • Unsupervised clustering (e.g., k means)
  • Hidden Markov Models
slide-50
SLIDE 50 General EM Method

Define a likelihood function Q(h′|h), which calculates Y = X ∪ Z using the observed X and the current parameters h to estimate Z:

Q(h′|h) ← E[ln P(Y|h′) | h, X]

EM Algorithm:

Estimation (E) step: Calculate Q(h′|h) using the current hypothesis h and the observed data X to estimate the probability distribution over Y:

Q(h′|h) ← E[ln P(Y|h′) | h, X]

Maximization (M) step: Replace hypothesis h by the hypothesis h′ that maximizes this Q function:

h ← argmax_{h′} Q(h′|h)