DECODING, CONTINUED

ACTIVITY
Build a translation model that we'll use later today.
Instructions:
- Subject is mt-class
- The body has six lines
- There is one, one-word translation per line
ADMINISTRATIVE
THE STORY SO FAR...
[Diagram: training data (parallel text) → learner → model → decoder, with example sentences: the Chinese 联合国 安全 理事会 的 五个 常任 理事 国都 ("All five permanent members of the UN Security Council ...") and the English "However, the sky remained clear under the strong north wind."]
SCHEDULE
Decoding:
- in conception
- in practice: dynamic programming, pruning
DECODING
Given a Chinese sentence and an English translation, how do we assign a probability to them?

他们还缺乏国际比赛的经验。 → They still lack experience in international competitions
P(C → E) = high
DECODING
Given a Chinese sentence and an English translation, how do we assign a probability to them?

他们还缺乏国际比赛的经验。 → This is not a good translation of the above sentence.
P(C → E) = low
MODEL
P(e | f) ∝ P(f | e) P(e)

[Diagram: the noisy-channel model. Speech recognition: English words pass through a noisy channel; machine translation: English words pass through a noisy channel and come out as French words.]
MODEL TRANSFORMS
P(e | f) ∝ P(f | e) P(e) ∝ P(f | e)^λ1 P(e)^λ2
WEIGHTS
Just as we trust people’s claims differently, we will want to learn how much to trust different models.

[Bar chart: credibility (scale 25-100) of the claim "I can do a backflip off this pommel horse", compared for your brother and for Paul Hamm.]
MODEL TRANSFORMS
0.0001 × 0.0001 × 0.0001 = 0.000000000001
log(0.0001) + log(0.0001) + log(0.0001) = -12
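This is exactly why decoders work in log space: a product of many small probabilities underflows floating point, while the corresponding sum of logs stays in a comfortable range. A minimal Python sketch of both computations (base-10 logs):

    import math

    p = 0.0001
    print(p * p * p)          # ~1e-12, the product above
    print(3 * math.log10(p))  # -12.0, the same quantity in log space

    # With many factors the product underflows to 0.0,
    # but the log-space sum is exact and well-behaved:
    probs = [p] * 100
    print(math.prod(probs))                   # 0.0 (true value 1e-400 underflows)
    print(sum(math.log10(q) for q in probs))  # -400.0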
P(e | f) ∝ P(f | e) P(e) ∝ P(f | e)^λ1 P(e)^λ2
Taking logs: λ1 log P(f | e) + λ2 log P(e)
MODEL TRANSFORMS
P(e | f) ∝ P(f | e) P(e)
        ∝ P(f | e)^λ1 P(e)^λ2
Taking logs: λ1 log P(f | e) + λ2 log P(e)
           = λ1 φ1(f, e) + λ2 φ2(f, e)
           = Σ_i λi φi(f, e)
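As a concrete illustration, the weighted feature sum can be computed as below. This is only a sketch: log_ptm, log_plm, and the weight values are invented placeholders, not anything defined in these slides.

    # A sketch of the log-linear score Σ_i λi φi(f, e).
    def log_ptm(f, e):
        return -1.3    # placeholder standing in for log P(f | e)

    def log_plm(f, e):
        return -0.7    # placeholder standing in for log P(e)

    features = [log_ptm, log_plm]    # φ1, φ2
    weights = [0.7, 0.3]             # λ1, λ2 (assumed values)

    def score(f, e):
        return sum(lam * phi(f, e) for lam, phi in zip(weights, features))

    print(score("tengo hambre", "I am hungry"))   # 0.7·(-1.3) + 0.3·(-0.7) ≈ -1.12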
MODEL
search: how do we find it?
model: what is a good translation?

e*, a* = argmax_{e,a} Pr(e, a | c) = argmax_{e,a} Σ_i λi φi(e, a, c)
(λi: weight; φi: feature function)

A better “fundamental equation” for MT.
DECODING
How do we find the sentence that the model likes best?
STACK DECODING
Start with the empty hypothesis; repeatedly extend hypotheses word by word, placing each new hypothesis on the stack.
FACTORING MODELS
We factor the model by word; this defines the space we search.
[Diagram: hypothesis + (tengo → am) = new hypothesis]
FACTORING MODELS
Translation options:
Yo → I
tengo → am
tengo → have
hambre → hungry
hambre → hunger
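To make the search space concrete, here is a small sketch that stores the options above in a table and enumerates every monotone, word-by-word path through the graph (the data structure and variable names are illustrative):

    from itertools import product

    # The translation options above, as a lookup table.
    options = {
        "Yo":     ["I"],
        "tengo":  ["am", "have"],
        "hambre": ["hungry", "hunger"],
    }

    source = ["Yo", "tengo", "hambre"]
    # Every monotone path through the graph is a candidate translation.
    for path in product(*(options[w] for w in source)):
        print(" ".join(path))
    # -> I am hungry / I am hunger / I have hungry / I have hunger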
FACTORING MODELS
We search through this graph, and we accumulate the score as we go.
[Diagram: hypothesis + (tengo → am) = new hypothesis]
FACTORING MODELS
Translation model: words are translated independently.
hypothesis.score += PTM(am | tengo)
[Diagram: hypothesis + (tengo → am) = new hypothesis]
FACTORING MODELS
Language model: bigram models depend only on the previous word.
hypothesis.score += PLM(am | I)
[Diagram: hypothesis + (tengo → am) = new hypothesis]
DYNAMIC PROGRAMMING
Note: <s> is implicit in the graph structure.
[Diagram: hypothesis + (tengo → am) = new hypothesis]
score += PTM(am | tengo) + PLM(am | I)
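Putting the two updates together, extending a hypothesis might look like this sketch, where PTM and PLM are assumed to be dictionaries of log probabilities (which is why the scores add):

    from dataclasses import dataclass

    @dataclass(frozen=True)
    class Hypothesis:
        words: tuple    # English words produced so far
        score: float    # accumulated log score

    def extend(hyp, src_word, tgt_word, PTM, PLM):
        # hypothesis + word = new hypothesis, applying both score updates
        prev = hyp.words[-1] if hyp.words else "<s>"    # <s> is implicit
        new_score = hyp.score + PTM[(tgt_word, src_word)] + PLM[(tgt_word, prev)]
        return Hypothesis(hyp.words + (tgt_word,), new_score)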
DYNAMIC PROGRAMMING
[Diagram: many different paths arrive at the same hypothesis.]
The score of the new hypothesis is the maximum over all ways of computing it.
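In code, recombination is a max over hypotheses that share a DP state. A sketch, reusing the Hypothesis class from the previous example; with a bigram LM the state is just the last English word:

    def recombine(hypotheses):
        best = {}
        for hyp in hypotheses:
            state = hyp.words[-1:]    # bigram LM: only the last word matters
            if state not in best or hyp.score > best[state].score:
                best[state] = hyp
        return list(best.values())

    hyps = [Hypothesis(("I", "am"), -2.1), Hypothesis(("I", "am"), -3.4)]
    print(recombine(hyps))            # keeps only the -2.1 hypothesis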
STACK DECODING (WITH DP)
Start from the empty hypothesis, as before, extending hypotheses word by word. Add each new hypothesis to its stack IF either (1) no equivalent hypothesis exists, or (2) it scores better than the existing equivalent hypothesis (which it then replaces).
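The whole loop fits in a short sketch: a monotone stack decoder (no reordering) with DP recombination. The options table and the toy log probabilities below are invented for illustration:

    def stack_decode(source, options, PTM, PLM):
        # stacks[i] maps a DP state (the last English word) to the best
        # (translation, log score) covering the first i source words.
        stacks = [{} for _ in range(len(source) + 1)]
        stacks[0]["<s>"] = ((), 0.0)                   # the empty hypothesis
        for i, src in enumerate(source):
            for prev, (words, score) in stacks[i].items():
                for tgt in options[src]:
                    new = score + PTM[(tgt, src)] + PLM[(tgt, prev)]
                    # add IF (1) no equivalent hypothesis exists, or
                    #        (2) this one scores better (replace it)
                    if tgt not in stacks[i + 1] or new > stacks[i + 1][tgt][1]:
                        stacks[i + 1][tgt] = (words + (tgt,), new)
        return max(stacks[-1].values(), key=lambda h: h[1])

    options = {"Yo": ["I"], "tengo": ["am", "have"], "hambre": ["hungry", "hunger"]}
    PTM = {("I", "Yo"): -0.1, ("am", "tengo"): -0.7, ("have", "tengo"): -0.9,
           ("hungry", "hambre"): -0.5, ("hunger", "hambre"): -1.2}
    PLM = {("I", "<s>"): -0.2, ("am", "I"): -0.3, ("have", "I"): -0.4,
           ("hungry", "am"): -0.2, ("hunger", "am"): -1.5,
           ("hungry", "have"): -2.0, ("hunger", "have"): -0.8}
    print(stack_decode(["Yo", "tengo", "hambre"], options, PTM, PLM))
    # -> (('I', 'am', 'hungry'), -2.0)

Keying each stack by the DP state is what implements conditions (1) and (2): an equivalent hypothesis is one with the same key, and the dictionary entry is overwritten only when the new score is better.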
MORE GENERALLY
state: the information that determines a hypothesis’s future score, e.g. the last English word (for the language model) and the set of source words we haven’t translated.
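With reordering allowed, the state must also record coverage. A sketch of a hashable state object (the names are illustrative):

    from collections import namedtuple

    # A hypothesis's DP state: the source positions already covered (so we
    # know what remains to translate) plus the last English word for a
    # bigram LM. Hypotheses with equal states can be recombined.
    State = namedtuple("State", ["covered", "last_word"])

    s1 = State(covered=frozenset({0, 2}), last_word="am")
    s2 = State(covered=frozenset({0, 2}), last_word="am")
    assert s1 == s2    # equivalent: keep only the higher-scoring hypothesis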
OLD GRAPH (BEFORE DP)
PRUNING
Even with dynamic programming, there are still too many hypotheses.
Prune: limit the number of hypotheses per stack, and discard hypotheses beyond some distance from the most probable item in the stack.
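Both pruning styles in one sketch, assuming hypothesis objects with a .score attribute like the Hypothesis class earlier; the default limits are invented:

    def prune(stack, stack_size=100, threshold=5.0):
        """Keep at most stack_size hypotheses (histogram pruning), and none
        more than `threshold` below the best log score (threshold pruning)."""
        if not stack:
            return []
        hyps = sorted(stack, key=lambda h: h.score, reverse=True)
        best = hyps[0].score
        return [h for h in hyps[:stack_size] if best - h.score <= threshold]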
STACK DECODING (WITH PRUNING)
As before, but after adding each new hypothesis (possibly replacing an old one), prune the stack.
PITFALLS
Pruning can discard good hypotheses before they have a chance to compete, causing search errors.
ACTIVITY
http://cs.jhu.edu/~post/mt-class/stack-decoder/
Instructions (10 minutes): In groups or alone, find the highest-scoring translation under our model with different stack-size and reordering settings. Are there any search or model errors?
IMPORTANT CONCEPTS
NOT DISCUSSED (BUT IMPORTANT)