Machine Translation
Berlin Chen 2003
References:
1. Natural Language Understanding, chapter 13
2. W. A. Gale and K. W. Church, A Program for Aligning Sentences in Bilingual Corpora, Computational Linguistics, 1993
Levels of translation between English and French, from shallowest to deepest:
- English text (word string) ↔ French text (word string): word-for-word
- English (syntactic parse) ↔ French (syntactic parse): syntactic transfer
- English (semantic representation) ↔ French (semantic representation): semantic transfer
- Interlingua (knowledge representation): knowledge-based translation
1950
N V Adv
with less literal translation
Figure: example sentence alignment containing, in order, a 2:2 alignment, a 1:1 alignment, a 1:1 alignment, and a 2:1 alignment.
Source sentences: $s_1, s_2, s_3, s_4, \ldots, s_I$
Target sentences: $t_1, t_2, t_3, t_4, \ldots, t_J$
Most cases are 1:1 alignments.
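An alignment $A$ is a sequence of beads $B_1, B_2, B_3, \ldots, B_K$ linking the source sentences $S = s_1 s_2 \cdots s_I$ to the target sentences $T = t_1 t_2 \cdots t_J$. A bead is one of the types {1:1, 1:0, 0:1, 2:1, 1:2, 2:2, …}.

$$\hat{A} = \arg\max_{A} P(A \mid S, T) \approx \arg\max_{A} \prod_{k=1}^{K} P(B_k)$$

The product form assumes probability independence between beads.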
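A minimal sketch of the independence assumption: the score of an alignment is the product of its bead probabilities, computed here as a sum of logs. The table `BEAD_PROB` and the function `alignment_log_prob` are hypothetical names, and the probabilities depend only on bead type, which is a simplification chosen for illustration.

```python
import math

# Hypothetical bead-type probabilities, for illustration only.
BEAD_PROB = {
    (1, 1): 0.89,
    (1, 0): 0.005, (0, 1): 0.005,
    (2, 1): 0.045, (1, 2): 0.045,
    (2, 2): 0.01,
}

def alignment_log_prob(beads):
    """Return log prod_k P(B_k), i.e. the sum of log P(B_k) over all beads."""
    return sum(math.log(BEAD_PROB[bead]) for bead in beads)

# Two candidate alignments of 4 source sentences with 3 target sentences.
candidate_a = [(2, 1), (1, 1), (1, 1)]
candidate_b = [(1, 1), (1, 1), (1, 0), (1, 1)]

best = max([candidate_a, candidate_b], key=alignment_log_prob)
print(best)
```

Working in log space avoids numerical underflow when an alignment contains many beads.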
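Cost of aligning source text of length $l_1$ with target text of length $l_2$:

$$\text{cost}(l_1, l_2) = -\log P(\alpha\ \text{align} \mid \delta(l_1, l_2, \mu, s^2))$$

$$\delta(l_1, l_2, \mu, s^2) = \frac{l_2 - l_1 \mu}{\sqrt{l_1 s^2}}$$

- The squared length difference of two paragraphs is modeled with a normal distribution; $s^2$ is its variance parameter.
- Ratio of text lengths in the two languages: $\mu = L_2 / L_1$
- Bayes' Law: $P(\alpha\ \text{align} \mid \delta) \propto P(\alpha\ \text{align}) \, P(\delta \mid \alpha\ \text{align})$, where $P(\delta \mid \alpha\ \text{align})$ is the probability distribution of $\delta$.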
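The length-based cost can be sketched in a few lines. This is an illustration under stated assumptions, not the original program: the defaults `mu=1.0` and `s2=6.8` are taken as assumed parameter values, the bead prior is passed in by the caller, and $P(\delta \mid \alpha\ \text{align})$ is approximated by the two-tailed standard normal probability $2(1 - \Phi(|\delta|))$.

```python
import math

def delta(l1, l2, mu=1.0, s2=6.8):
    """Normalized length difference (l2 - l1*mu) / sqrt(l1*s2); needs l1 > 0."""
    return (l2 - l1 * mu) / math.sqrt(l1 * s2)

def cost(l1, l2, prior, mu=1.0, s2=6.8):
    """-log( P(alpha align) * P(delta | alpha align) ), using Bayes' law."""
    d = abs(delta(l1, l2, mu, s2))
    # Two-tailed probability of a standard normal: 2 * (1 - Phi(|d|)) = erfc(|d| / sqrt(2)).
    p_delta = math.erfc(d / math.sqrt(2.0))
    p_delta = max(p_delta, 1e-300)  # guard against log(0) for extreme differences
    return -math.log(prior * p_delta)

# Similar lengths are cheap to align; very different lengths are expensive.
print(cost(100, 100, prior=0.89))
print(cost(100, 140, prior=0.89))
```

The cost grows as the two lengths diverge, so the dynamic program prefers beads whose source and target sides have comparable length.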
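Figure: source sentences $s_{i-2}, s_{i-1}, s_i$ and target sentences $t_{j-2}, t_{j-1}, t_j$.

Dynamic programming recurrence for the minimum alignment cost $D(i, j)$ of the first $i$ source sentences and the first $j$ target sentences:

$$
D(i, j) = \min
\begin{cases}
D(i, j-1) + \text{cost}(0{:}1\ \text{align}\ \emptyset, t_j) \\
D(i-1, j) + \text{cost}(1{:}0\ \text{align}\ s_i, \emptyset) \\
D(i-1, j-1) + \text{cost}(1{:}1\ \text{align}\ s_i, t_j) \\
D(i-1, j-2) + \text{cost}(1{:}2\ \text{align}\ s_i, t_{j-1}, t_j) \\
D(i-2, j-1) + \text{cost}(2{:}1\ \text{align}\ s_{i-1}, s_i, t_j) \\
D(i-2, j-2) + \text{cost}(2{:}2\ \text{align}\ s_{i-1}, s_i, t_{j-1}, t_j)
\end{cases}
$$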
Or P(α align)
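The recurrence for $D(i, j)$ can be sketched as a small dynamic program over sentence lengths. Everything named here (`BEADS`, `bead_cost`, `align`) is hypothetical, the bead priors are illustrative values, and the length normalization uses the mean of the two lengths so that 1:0 and 0:1 beads are well defined, which is an assumption of this sketch.

```python
import math

# Illustrative bead-type priors (assumed values).
BEADS = {(1, 1): 0.89, (1, 0): 0.00495, (0, 1): 0.00495,
         (2, 1): 0.0445, (1, 2): 0.0445, (2, 2): 0.011}

def bead_cost(src_lens, tgt_lens, bead, s2=6.8):
    """-log(prior * two-tailed normal probability of the length difference)."""
    l1, l2 = sum(src_lens), sum(tgt_lens)
    mean = (l1 + l2) / 2.0
    d = abs(l2 - l1) / math.sqrt(mean * s2)
    p = max(math.erfc(d / math.sqrt(2.0)), 1e-300)
    return -math.log(BEADS[bead] * p)

def align(src_lens, tgt_lens):
    """Return (minimum cost, list of beads) for two lists of sentence lengths."""
    I, J = len(src_lens), len(tgt_lens)
    INF = float("inf")
    D = [[INF] * (J + 1) for _ in range(I + 1)]
    back = [[None] * (J + 1) for _ in range(I + 1)]
    D[0][0] = 0.0
    for i in range(I + 1):
        for j in range(J + 1):
            for di, dj in BEADS:
                if i < di or j < dj or D[i - di][j - dj] == INF:
                    continue
                c = D[i - di][j - dj] + bead_cost(
                    src_lens[i - di:i], tgt_lens[j - dj:j], (di, dj))
                if c < D[i][j]:
                    D[i][j], back[i][j] = c, (di, dj)
    # Follow back-pointers from (I, J) to (0, 0) to recover the beads.
    beads, i, j = [], I, J
    while (i, j) != (0, 0):
        di, dj = back[i][j]
        beads.append((di, dj))
        i, j = i - di, j - dj
    return D[I][J], beads[::-1]

total, beads = align([20, 30, 100, 50], [52, 98, 49])
print(beads)
```

The table has $(I+1)(J+1)$ cells and each cell considers six bead types, so the running time is proportional to $I \cdot J$.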
L1 alignment 1: cost(align(s1, s2, t1)) + cost(align(s3, t2)) + cost(align(s4, t3))
L1 alignment 2: cost(align(s1, t1)) + cost(align(s2, t2)) + cost(align(s3, Ø)) + cost(align(s4, t3))
Iterations
The translation model scores an alignment $A$ as a product over its beads:

$$\arg\max_{A} \prod_{k=1}^{K} P(B_k)$$
Figure: matched n-grams between the source text and the target text.
Figure: matched word pairs between the source text and the target text.
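Noisy channel model (e: English, f: French):
- Language Model: $P(e)$
- Translation Model: $P(f \mid e)$
- Decoder: $\hat{e} = \arg\max_{e} P(e \mid f)$

In an alignment, French word $f_j$ is linked to English word $e_{a_j}$ (likewise $f_k$ to $e_{a_k}$). The translation model sums over all possible alignments:

$$P(f \mid e) = \frac{1}{Z} \sum_{a_1=0}^{l} \cdots \sum_{a_m=0}^{l} \prod_{j=1}^{m} P(f_j \mid e_{a_j})$$

where $|e| = l$, $|f| = m$, $1/Z$ is the normalization constant, the sums range over all possible alignments, and $P(f_j \mid e_{a_j})$ is the translation probability.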
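A brute-force sketch of the sum over all possible alignments, under stated assumptions: position 0 of the English sentence stands for an empty word (represented as `None`), the table `t` of translation probabilities holds made-up values, and `translation_prob` is a hypothetical name. Enumerating every alignment takes $(l+1)^m$ steps, so this is only practical for tiny sentences.

```python
import itertools

def translation_prob(f, e, t, Z=1.0):
    """(1/Z) * sum over alignments a of prod_j t[(f_j, e_{a_j})]."""
    e_ext = [None] + list(e)  # index 0 is the empty English word
    total = 0.0
    # Each alignment assigns one English position a_j in 0..l to every French word.
    for a in itertools.product(range(len(e_ext)), repeat=len(f)):
        p = 1.0
        for j, a_j in enumerate(a):
            p *= t.get((f[j], e_ext[a_j]), 0.0)
        total += p
    return total / Z

# Made-up translation probabilities, keyed as (french_word, english_word).
t = {
    ("la", None): 0.1, ("la", "the"): 0.7, ("la", "house"): 0.1,
    ("maison", None): 0.1, ("maison", "the"): 0.05, ("maison", "house"): 0.8,
}
print(translation_prob(["la", "maison"], ["the", "house"], t))
```

Because each French word picks its English position independently, the nested sums factor into a product of per-word sums, which is what makes the model tractable in practice.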
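$$z_{w_f, w_e} = \sum_{(e, f)\ \text{s.t.}\ w_e \in e,\ w_f \in f} P(w_f \mid w_e)$$

The sum runs over the number of times that $w_e$ occurred in the English sentences while $w_f$ occurred in the corresponding French sentences.

$$P(w_f \mid w_e) = \frac{z_{w_f, w_e}}{\sum_{v} z_{v, w_e}}$$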
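The two steps, accumulating a weight for each word pair over co-occurring sentence pairs and then normalizing over French words, can be sketched as an iterative loop. The function name `train`, the uniform initialization, and the toy corpus are assumptions made for illustration.

```python
from collections import defaultdict

def train(corpus, iterations=5):
    """corpus: list of (english_tokens, french_tokens). Returns {(w_f, w_e): P(w_f | w_e)}."""
    f_vocab = {w_f for _, f in corpus for w_f in f}
    # Assumed initialization: uniform over the French vocabulary.
    p = {(w_f, w_e): 1.0 / len(f_vocab)
         for e, f in corpus for w_e in set(e) for w_f in set(f)}
    for _ in range(iterations):
        # Accumulate z[(w_f, w_e)] over sentence pairs containing both words.
        z = defaultdict(float)
        for e, f in corpus:
            for w_e in set(e):
                for w_f in set(f):
                    z[(w_f, w_e)] += p[(w_f, w_e)]
        # Normalize over all French words v paired with the same English word.
        totals = defaultdict(float)
        for (w_f, w_e), value in z.items():
            totals[w_e] += value
        p = {(w_f, w_e): value / totals[w_e] for (w_f, w_e), value in z.items()}
    return p

corpus = [
    (["the", "house"], ["la", "maison"]),
    (["the", "book"], ["le", "livre"]),
    (["a", "house"], ["une", "maison"]),
]
probs = train(corpus)
print(probs[("maison", "house")])
```

In this toy corpus "house" co-occurs with "maison" twice and with "la" and "une" once each, so repeated iterations concentrate probability on "maison".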