Markov Decision Processes
Philipp Koehn 7 April 2020
Philipp Koehn Artificial Intelligence: Markov Decision Processes 7 April 2020
Outline
Hidden Markov models
Inference: filtering, smoothing, best sequence
Dynamic Bayesian networks
e.g., BloodSugarₜ, StomachContentsₜ, etc.
e.g., MeasuredBloodSugarₜ, PulseRateₜ, FoodEatenₜ
Second-order Markov process: P(Xt∣X0∶t−1) ≃ P(Xt∣Xt−2,Xt−1)
sensor model P(Et∣Xt) fixed for all t
Filtering: the belief state, input to the decision process of a rational agent
Smoothing: a better estimate of past states, essential for learning
Most likely explanation: e.g., speech recognition, decoding with a noisy channel
P(Xt+1 ∣ e1∶t+1) = P(Xt+1 ∣ e1∶t, et+1)
  = α P(et+1 ∣ Xt+1, e1∶t) P(Xt+1 ∣ e1∶t)                        (Bayes rule)
  ≃ α P(et+1 ∣ Xt+1) P(Xt+1 ∣ e1∶t)                              (sensor Markov assumption)
  = α P(et+1 ∣ Xt+1) ∑xt P(Xt+1 ∣ xt, e1∶t) P(xt ∣ e1∶t)         (multiplying out)
  ≃ α P(et+1 ∣ Xt+1) ∑xt P(Xt+1 ∣ xt) P(xt ∣ e1∶t)               (first-order Markov assumption)
P(Xt+1 ∣ e1∶t+1) ≃ α P(et+1 ∣ Xt+1) ∑xt P(Xt+1 ∣ xt) P(xt ∣ e1∶t)
with P(et+1 ∣ Xt+1) the emission probability, P(Xt+1 ∣ xt) the transition probability, and P(xt ∣ e1∶t) the recursive call
Time and space constant (independent of t)
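A minimal sketch of this recursive update in Python, using the textbook umbrella world; the two states and the transition/sensor numbers are assumptions for illustration, not given on this slide:

```python
# Filtering (forward) update: new belief = alpha * emission * predicted belief.
# Umbrella-world numbers (assumed): P(rain_t+1 | rain_t) = 0.7,
# P(umbrella | rain) = 0.9, P(umbrella | sun) = 0.2.

STATES = ('rain', 'sun')
T = {('rain', 'rain'): 0.7, ('rain', 'sun'): 0.3,   # P(X_t+1 | x_t)
     ('sun', 'rain'): 0.3, ('sun', 'sun'): 0.7}
E = {('rain', True): 0.9, ('rain', False): 0.1,     # P(e_t | x_t)
     ('sun', True): 0.2, ('sun', False): 0.8}

def forward_step(f, evidence):
    """One step: emission * sum over x_t of (transition * recursive call)."""
    new_f = {}
    for x1 in STATES:
        predicted = sum(T[(x0, x1)] * f[x0] for x0 in STATES)   # transition
        new_f[x1] = E[(x1, evidence)] * predicted               # emission
    alpha = 1.0 / sum(new_f.values())                           # normalization
    return {x: alpha * p for x, p in new_f.items()}

belief = {'rain': 0.5, 'sun': 0.5}        # uniform prior
for umbrella in (True, True):             # two days of umbrella sightings
    belief = forward_step(belief, umbrella)
```

Each step only touches the previous message, which is why time and space per step are constant in t.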
(Figure: filtering example unrolled over time, with emission and transition steps labeled)
⇒ what is the state probability P(Xk ∣ e1∶t) for k < t, i.e., including future evidence?
P(Xk ∣ e1∶t) = P(Xk ∣ e1∶k, ek+1∶t)
  = α P(Xk ∣ e1∶k) P(ek+1∶t ∣ Xk, e1∶k)
  ≃ α P(Xk ∣ e1∶k) P(ek+1∶t ∣ Xk)
  = α f1∶k bk+1∶t

The backward message is computed recursively:

P(ek+1∶t ∣ Xk) = ∑xk+1 P(ek+1∶t ∣ Xk, xk+1) P(xk+1 ∣ Xk)
  ≃ ∑xk+1 P(ek+1∶t ∣ xk+1) P(xk+1 ∣ Xk)
  = ∑xk+1 P(ek+1 ∣ xk+1) P(ek+2∶t ∣ xk+1) P(xk+1 ∣ Xk)
Forward–backward algorithm: cache forward messages along the way
Time linear in t (polytree inference), space O(t∣f∣)
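The forward–backward pass can be sketched as follows; the two-state umbrella-world numbers are assumptions for illustration:

```python
# Forward-backward smoothing: cache the forward messages f_1:k on the way
# up, then combine each with a backward message b_k+1:t swept from the end.
# Umbrella-world numbers are assumptions for illustration.

STATES = ('rain', 'sun')
T = {('rain', 'rain'): 0.7, ('rain', 'sun'): 0.3,   # P(X_t+1 | x_t)
     ('sun', 'rain'): 0.3, ('sun', 'sun'): 0.7}
E = {('rain', True): 0.9, ('rain', False): 0.1,     # P(e_t | x_t)
     ('sun', True): 0.2, ('sun', False): 0.8}

def forward_step(f, e):
    new_f = {x1: E[(x1, e)] * sum(T[(x0, x1)] * f[x0] for x0 in STATES)
             for x1 in STATES}
    alpha = 1.0 / sum(new_f.values())
    return {x: alpha * p for x, p in new_f.items()}

def backward_step(b, e):
    """b_k+1:t(x_k) = sum over x_k+1 of P(e_k+1|x_k+1) b_k+2:t(x_k+1) P(x_k+1|x_k)."""
    return {x0: sum(E[(x1, e)] * b[x1] * T[(x0, x1)] for x1 in STATES)
            for x0 in STATES}

def smooth(prior, evidence):
    fs, f = [], dict(prior)
    for e in evidence:                    # forward pass, caching messages
        f = forward_step(f, e)
        fs.append(f)
    b = {x: 1.0 for x in STATES}          # b_t+1:t is all ones
    smoothed = [None] * len(evidence)
    for k in range(len(evidence) - 1, -1, -1):
        unnorm = {x: fs[k][x] * b[x] for x in STATES}   # f_1:k * b_k+1:t
        alpha = 1.0 / sum(unnorm.values())
        smoothed[k] = {x: alpha * p for x, p in unnorm.items()}
        b = backward_step(b, evidence[k])
    return smoothed

smoothed = smooth({'rain': 0.5, 'sun': 0.5}, [True, True])
```

Caching the forward messages is what makes the whole smoothing pass linear in t.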
The most likely path to each state Xt+1 is the most likely path to some xt plus one more step:

max x1...xt P(x1, ..., xt, Xt+1 ∣ e1∶t+1)
  = P(et+1 ∣ Xt+1) max xt ( P(Xt+1 ∣ xt) max x1...xt−1 P(x1, ..., xt−1, xt ∣ e1∶t) )

Identify the message m1∶t = max x1...xt−1 P(x1, ..., xt−1, Xt ∣ e1∶t),
i.e., m1∶t(i) gives the probability of the most likely path to state i.

Update rule: m1∶t+1 = P(et+1 ∣ Xt+1) max xt ( P(Xt+1 ∣ xt) m1∶t )

Also requires back-pointers for the backward pass to retrieve the best sequence:
bXt+1,t+1 = argmax xt ( P(Xt+1 ∣ xt) m1∶t )
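This update rule with back-pointers is the Viterbi algorithm; a sketch, again with assumed umbrella-world numbers:

```python
# Viterbi sketch: m_1:t+1 = P(e_t+1|X_t+1) * max over x_t of P(X_t+1|x_t) m_1:t,
# with back-pointers to recover the most likely state sequence.
# Umbrella-world numbers are assumptions for illustration.

STATES = ('rain', 'sun')
T = {('rain', 'rain'): 0.7, ('rain', 'sun'): 0.3,
     ('sun', 'rain'): 0.3, ('sun', 'sun'): 0.7}
E = {('rain', True): 0.9, ('rain', False): 0.1,
     ('sun', True): 0.2, ('sun', False): 0.8}

def viterbi(prior, evidence):
    # m holds the probability of the most likely path ending in each state
    m = {x: prior[x] * E[(x, evidence[0])] for x in STATES}
    backptrs = []
    for e in evidence[1:]:
        new_m, bp = {}, {}
        for x1 in STATES:
            best = max(STATES, key=lambda x0: T[(x0, x1)] * m[x0])
            bp[x1] = best                                  # back-pointer
            new_m[x1] = E[(x1, e)] * T[(best, x1)] * m[best]
        backptrs.append(bp)
        m = new_m
    # backward pass: follow back-pointers from the best final state
    state = max(STATES, key=lambda x: m[x])
    path = [state]
    for bp in reversed(backptrs):
        path.append(bp[path[-1]])
    path.reverse()
    return path

path = viterbi({'rain': 0.5, 'sun': 0.5}, [True, True, False, True, True])
```

Note the max replaces the sum of the filtering update; everything else is the same recursion.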
Domain of Xt is {1,...,S}
e.g., transition matrix T = ( 0.7 0.3 ; 0.3 0.7 )
e.g., with U1 = true, O1 = ( 0.9 0.1 ; 0.8 0.2 )
f1∶t+1 = α Ot+1 T⊺ f1∶t
bk+1∶t = T Ok+1 bk+2∶t
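The matrix updates can be sketched with plain Python lists; here Ot is taken to be diagonal with the sensor probabilities P(et ∣ Xt = i) on the diagonal, and the umbrella-world numbers are assumptions:

```python
# Matrix form: f_1:t+1 = alpha * O_t+1 * T^T * f_1:t,  b_k+1:t = T O_k+1 b_k+2:t
# T[i][j] = P(X_t+1 = j | X_t = i); O_t is diagonal with P(e_t | X_t = i).
# Umbrella-world numbers are assumptions for illustration.

T = [[0.7, 0.3],
     [0.3, 0.7]]
O_umbrella = [[0.9, 0.0],     # state 1 = rain, state 2 = sun
              [0.0, 0.2]]

def matvec(M, v):
    return [sum(M[i][j] * v[j] for j in range(len(v))) for i in range(len(M))]

def transpose(M):
    return [list(row) for row in zip(*M)]

def forward(f, O):
    unnorm = matvec(O, matvec(transpose(T), f))
    alpha = 1.0 / sum(unnorm)
    return [alpha * x for x in unnorm]

def backward(b, O):
    # backward message needs no normalization
    return matvec(T, matvec(O, b))

f = forward([0.5, 0.5], O_umbrella)   # one filtering step from a uniform prior
```

With the updates in matrix form, one step of filtering or smoothing is just two matrix-vector products.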
e.g., 20 Boolean state variables, three parents each: the DBN has 20 × 2³ = 160 parameters, the HMM has 2²⁰ × 2²⁰ ≈ 10¹²
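The count works out as follows (a quick check, assuming all variables are Boolean):

```python
# DBN: each of n Boolean variables has a CPT with 2^k rows (k Boolean parents).
# Equivalent HMM: one transition matrix over the 2^n joint states.
n, k = 20, 3
dbn_params = n * 2**k         # 20 * 8 = 160
hmm_params = 2**n * 2**n      # 2^20 * 2^20 = 2^40, about 10^12
```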
It’s not easy to wreck a nice beach
I.e., choose Words to maximize P(Words∣signal)
P(Words ∣ signal) = α P(signal ∣ Words) P(Words)
i.e., decomposes into acoustic model + language model
configuration of articulators (lips, teeth, tongue, vocal cords, air flow)
⇒ acoustic model = pronunciation model + phone model
[iy] beat     [b] bet      [p] pet
[ih] bit      [ch] Chet    [r] rat
[ey] bait     [d] debt     [s] set
[ao] bought   [hh] hat     [th] thick
[ow] boat     [hv] high    [dh] that
[er] Bert     [l] let      [w] wet
[ix] roses    [ng] sing    [en] button
⋮             ⋮            ⋮
e.g., “ceiling” is [s iy l ih ng] / [s iy l ix ng] / [s iy l en]
processed into overlapping 30ms frames, each described by features
– an integer in [0...255] (using vector quantization); or
– the parameters of a mixture of Gaussians
E.g., [t] has silent Onset, explosive Mid, hissing End ⇒ P(features∣phone,phase)
– a phone sounds different depending on the phones to its left and right; e.g., [t] in “star” is written [t(s,aa)] (different from “tar”!)
– articulators move continuously and cannot switch instantaneously between positions; e.g., [t] in “eighth” has tongue against front teeth
P([towmeytow] ∣ “tomato”) = P([towmaatow] ∣ “tomato”) = 0.1
P([tahmeytow] ∣ “tomato”) = P([tahmaatow] ∣ “tomato”) = 0.4
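As a data structure, the pronunciation model is simply a distribution over phone sequences per word; a sketch using the variant probabilities above:

```python
# Pronunciation model: P(phone sequence | word).
# Variant probabilities are the ones given on the slide.
pronunciation = {
    'tomato': {
        ('t', 'ow', 'm', 'ey', 't', 'ow'): 0.1,
        ('t', 'ow', 'm', 'aa', 't', 'ow'): 0.1,
        ('t', 'ah', 'm', 'ey', 't', 'ow'): 0.4,
        ('t', 'ah', 'm', 'aa', 't', 'ow'): 0.4,
    },
}
total = sum(pronunciation['tomato'].values())   # each distribution sums to 1
```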
P(word ∣ e1∶t) = α P(e1∶t ∣ word) P(word)
P(e1∶t ∣ word) can be computed recursively: define ℓ1∶t = P(Xt, e1∶t) and use the recursive update ℓ1∶t+1 = FORWARD(ℓ1∶t, et+1); then P(e1∶t ∣ word) = ∑xt ℓ1∶t(xt)
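The unnormalized forward recursion for ℓ can be sketched as follows; the two-state left-to-right word HMM and all its numbers are made up for illustration:

```python
# P(e_1:t | word): run the forward update without normalization, then sum
# over the final state. The toy two-state word HMM below is an assumption.

STATES = ('s0', 's1')
T = {('s0', 's0'): 0.6, ('s0', 's1'): 0.4,   # P(X_t+1 | x_t)
     ('s1', 's0'): 0.0, ('s1', 's1'): 1.0}   # left-to-right model
E = {('s0', 'a'): 0.7, ('s0', 'b'): 0.3,     # P(e_t | x_t)
     ('s1', 'a'): 0.2, ('s1', 'b'): 0.8}

def forward_unnormalized(ell, e):
    """ell_1:t+1(x') = P(e | x') * sum over x of P(x' | x) ell_1:t(x); no alpha."""
    return {x1: E[(x1, e)] * sum(T[(x0, x1)] * ell[x0] for x0 in STATES)
            for x1 in STATES}

ell = {'s0': 1.0, 's1': 0.0}      # word HMM starts in its first state
for e in ['a', 'b', 'b']:
    ell = forward_unnormalized(ell, e)
likelihood = sum(ell.values())    # P(e_1:t | word), then weight by P(word)
```

Skipping the normalization constant is what keeps ℓ equal to the joint P(Xt, e1∶t) rather than the conditional.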
– adjacent words highly correlated
– sequence of most likely words ≠ most likely sequence of words
– segmentation: there are few gaps in speech
– cross-word coarticulation, e.g., “next thing”
– mismatch between speaker in training and test
– noise
– crosstalk
– bad microphone position
P(w1 ⋯ wn) = ∏ᵢ₌₁ⁿ P(wi ∣ w1 ⋯ wi−1)
Bigram approximation: P(wi ∣ w1 ⋯ wi−1) ≈ P(wi ∣ wi−1)
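A bigram model is just a lookup table of conditional probabilities, usually handled in log space; the table entries here are assumptions for illustration:

```python
# Bigram language model: P(w_1..w_n) ~= product over i of P(w_i | w_i-1),
# computed as a sum of logs. The tiny probability table is an assumption.
import math

bigram = {('<s>', 'recognize'): 0.01,
          ('recognize', 'speech'): 0.3,
          ('speech', '</s>'): 0.2}

def sentence_logprob(words):
    """Sum log P(w_i | w_i-1), padding with sentence-boundary markers."""
    padded = ['<s>'] + words + ['</s>']
    return sum(math.log(bigram[(w0, w1)]) for w0, w1 in zip(padded, padded[1:]))

lp = sentence_logprob(['recognize', 'speech'])
```

The negated log probabilities −log P(wi ∣ wi−1) then serve directly as step costs in a best-path search.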
the word we’re in + the phone in that word + the phone state in that phone
each word sequence is the sum over many state sequences
where “step cost” is −log P(wi∣wi−1)
– transition model P(Xt ∣ Xt−1)
– sensor model P(Et ∣ Xt)
all done recursively with constant cost per time step
hidden Markov models: a key tool for speech recognition