Kyoto-U: Syntactical EBMT System for NTCIR-7 Patent Translation Task
Kyoto University
Toshiaki Nakazawa, Sadao Kurohashi
Overview of Kyoto-U System
Translation Examples
J: 図書館で新聞を読む
E: I read a newspaper in the library
J: 政治の本が売れ残っている
E: A book in politics was left on the shelf
・・・・・
Translation Examples (aligned dependency trees)
- 図書館 で [library in] ⇔ in the library
- 新聞 を [newspaper ACC] ⇔ a newspaper
- 読む [read] ⇔ I read
- 政治 の [politics in] ⇔ in politics
- 本 が [book NOM] ⇔ a book
- 売れ残って いる [left unsold] ⇔ was left on the shelf
・・・・・
Input: 図書館で政治の本を読む。
Matched fragments of the translation examples:
- 図書館 で [library in] ⇔ in the library
- 本 を 読む [book ACC, read] ⇔ I read a book
- 政治 の [politics in] ⇔ in politics
Output: I read a book in politics in the library
Alignment
J: 交差点で、突然あの車が飛び出して来たのです。
E: The car came at me from the side at the intersection.
- 1. Transformation into dependency structure
  J: JUMAN/KNP, E: Charniak's nlparser → Dependency tree
- 2. Detection of word(s) correspondences
Finding Correspondences
- Bilingual dictionaries (500K entries)
- Substring co-occurrence (Cromieres 2006): $\frac{count(j, e)}{count(j) \cdot count(e)} > \theta$
- Numeral normalization
  二百十六万 → 2,160,000 ← 2.16 million
- Transliteration (Katakana words, NEs)
  ローズワイン → rosuwain ⇔ rose wine (similarity: 0.78)
  新宿 → shinjuku ⇔ shinjuku (similarity: 1.0)
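The numeral normalization step can be illustrated with a minimal sketch. It assumes that both sides are mapped to a plain integer before comparison; the function names `kanji_to_int` and `english_to_int` and the small unit tables are hypothetical and are not taken from the Kyoto-U system.

```python
from decimal import Decimal

# Hypothetical lookup tables for this sketch only.
DIGITS = {"〇": 0, "一": 1, "二": 2, "三": 3, "四": 4,
          "五": 5, "六": 6, "七": 7, "八": 8, "九": 9}
SMALL_UNITS = {"十": 10, "百": 100, "千": 1000}
LARGE_UNITS = {"万": 10**4, "億": 10**8, "兆": 10**12}
ENGLISH_SCALES = {"thousand": 10**3, "million": 10**6, "billion": 10**9}


def kanji_to_int(text: str) -> int:
    """Convert a kanji numeral such as 二百十六万 to an integer."""
    total = 0   # sum of completed 万/億/兆 groups
    group = 0   # value accumulated below the next large unit
    digit = 0   # digit waiting for its unit
    for ch in text:
        if ch in DIGITS:
            digit = DIGITS[ch]
        elif ch in SMALL_UNITS:
            # A bare 十/百/千 counts as one of that unit.
            group += (digit or 1) * SMALL_UNITS[ch]
            digit = 0
        elif ch in LARGE_UNITS:
            total += (group + digit) * LARGE_UNITS[ch]
            group = digit = 0
        else:
            raise ValueError(f"unexpected character: {ch!r}")
    return total + group + digit


def english_to_int(text: str) -> int:
    """Convert '2,160,000' or '2.16 million' to an integer."""
    parts = text.replace(",", "").split()
    value = Decimal(parts[0])  # Decimal avoids float rounding
    if len(parts) == 2:
        value *= ENGLISH_SCALES[parts[1].lower()]
    return int(value)


# Both sides of the slide's example normalize to the same value.
same_number = kanji_to_int("二百十六万") == english_to_int("2.16 million")
```

Normalizing to one canonical value lets the two surface forms be matched even though no dictionary entry links them directly.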
- 3. Disambiguation of correspondences
- 4. Handling of remaining phrases
Extension to leaf-nodes
- 5. Registration to translation example database
Alignment Ambiguities
J tree nodes: 日本 で [in Japan], 保険 [insurance], 会社 に 対して [to the company], 保険 [insurance], 請求 の [of claim], 申し立て が [file], 可能です よ [be able to]
E tree nodes: you, will have to file, insurance, an claim, insurance, with the office, in Japan
(保険 and "insurance" each occur twice, so their correspondences are ambiguous.)
Alignment: Consistency
\[
\operatorname*{arg\,max}_{alignment} \; \frac{\sum_{i=1}^{n} \sum_{j=i+1}^{n} cs\bigl(d_J(a_i, a_j),\, d_E(a_i, a_j)\bigr)}{n(n-1)/2}
\]
- For each pair of candidates a_i and a_j, calculate the J-side distance d_J and the E-side distance d_E
- Give a consistency score to the pair based on d_J and d_E
- Calculate consistency scores for all the pairs in a possible set of alignment candidates
[Figure: candidate pairs that are near on both sides versus pairs that are far apart]
Baseline
Distance of Each Branch: 1
Consistency Score:
\[
cs(d_J, d_E) = \frac{1}{d_J} + \frac{1}{d_E}
\]
Example: 1/1 + 1/2 = 1.5
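The baseline score can be sketched in code. This is a minimal illustration under stated assumptions: each tree is a dictionary mapping a node to its parent, every branch has distance 1, each candidate is a (J node, E node) pair, and no two candidates share a node, so no distance is zero. The function names and the toy trees are hypothetical.

```python
from itertools import combinations


def path_to_root(node, parent):
    """Return the list of nodes from `node` up to the root."""
    path = [node]
    while parent[node] is not None:
        node = parent[node]
        path.append(node)
    return path


def tree_distance(a, b, parent):
    """Number of branches on the path between a and b (each branch counts 1)."""
    steps_from_a = {n: i for i, n in enumerate(path_to_root(a, parent))}
    for steps_from_b, n in enumerate(path_to_root(b, parent)):
        if n in steps_from_a:  # first common ancestor
            return steps_from_a[n] + steps_from_b
    raise ValueError("nodes are not in the same tree")


def baseline_cs(d_j, d_e):
    """Baseline consistency score: 1/d_J + 1/d_E."""
    return 1.0 / d_j + 1.0 / d_e


def alignment_score(candidates, parent_j, parent_e):
    """Average consistency score over all n(n-1)/2 candidate pairs."""
    pairs = list(combinations(candidates, 2))
    if not pairs:
        return 0.0
    total = sum(
        baseline_cs(tree_distance(a[0], b[0], parent_j),
                    tree_distance(a[1], b[1], parent_e))
        for a, b in pairs
    )
    return total / len(pairs)


# Toy trees (hypothetical): node -> parent, root has parent None.
parent_j = {"読む": None, "新聞 を": "読む", "図書館 で": "読む"}
parent_e = {"read": None, "a newspaper": "read", "in the library": "read"}
candidates = [("読む", "read"),
              ("新聞 を", "a newspaper"),
              ("図書館 で", "in the library")]
score = alignment_score(candidates, parent_j, parent_e)
```

Choosing the best alignment then amounts to evaluating `alignment_score` for each possible set of candidates and keeping the highest-scoring set; the sketch does not cover how those sets are enumerated.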
Consistency Score
- The frequency of each distance pair in gold-standard alignment data (Mainichi newspaper, 40K sentence pairs) [Uchimoto04]
[Figure: frequency (log) plotted against distance on the J side and distance on the E side]
Distance based on Dependency Type
[Figure: the dependency trees of the alignment ambiguity example, with each branch labeled by its dependency type and assigned a distance of 1, 2, or 3. J-side types: デ格 (de-case), 文節内 (within a bunsetsu), 連用 (predicate-modifying), ノ格 (no-case), ガ格 (ga-case). E-side types: NP, NN, PP.]
Example of Alignment Improvement
[Figure: alignment by the proposed model compared with word-base alignment]
Translation
Selection of Translation Examples
- Score for an example
  - 1. Size of an example
  - 2. Similarity of neighboring nodes [Sato 91]
  - 3. Translation probability
- Beam search from the root of the input
[Figure: the input tree 図書館 で 政治 の 本 を 読む matched against the translation example 図書館 で 新聞 を 読む ⇔ I read a newspaper in the library]
Score: 2 × w_size + 0.7 × w_sim + 3.2 × w_trans
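The weighted score and the pruning step of a beam search can be sketched as follows. This is only an illustration: the `Candidate` fields, the default weights of 1.0, the beam width, and the feature values are assumptions, not the system's actual settings.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """A translation example matched against part of the input (hypothetical)."""
    name: str
    size: float    # size of the matched example
    sim: float     # similarity of neighboring nodes
    trans: float   # translation probability feature


def example_score(c, w_size=1.0, w_sim=1.0, w_trans=1.0):
    """Weighted sum of the three features named on the slide."""
    return c.size * w_size + c.sim * w_sim + c.trans * w_trans


def prune_to_beam(candidates, beam_width, **weights):
    """Keep only the `beam_width` best-scoring candidates."""
    ranked = sorted(candidates,
                    key=lambda c: example_score(c, **weights),
                    reverse=True)
    return ranked[:beam_width]


# Illustrative feature values only.
candidates = [
    Candidate("example A", size=2, sim=0.7, trans=3.2),
    Candidate("example B", size=1, sim=0.9, trans=1.0),
    Candidate("example C", size=3, sim=0.1, trans=0.5),
]
beam = prune_to_beam(candidates, beam_width=2)
```

In a full beam search this pruning would be applied at each node while walking the input tree from its root; the traversal itself is left out of the sketch.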
Combination of TMs
Input: 記録領域での変形形状と,記録特性の関係を調べた。
[Figure: the input dependency tree, the matched translation examples, and the output dependency tree. Matched examples include 状況 を 調べた 。 ⇔ the situation was examined, 記録 特性 の 関係 を 調べた 。 ⇔ the relationship between recording characteristics was examined, and 変形 パターン を ⇔ the deformation pattern.]
Output: The relationship between deformation shape in the recording region and recording characteristics was examined.
Evaluation Results and Discussion
Intrinsic J-E Evaluation Result
BLEU                Adequacy            Fluency             Average
27.20 NTT           3.81 tsbmt          4.02 Japio          3.88 tsbmt
27.14 moses         3.71 Japio          3.94 tsbmt          3.86 Japio
27.14 MIT           3.15 MIT            3.66 MIT            3.40 MIT
25.48 NAIST-NTT     2.96 NTT            3.65 NTT            3.30 NTT
24.79 NICT-ATR      2.85 Kyoto-U        3.55 moses          3.18 moses
24.49 KLE           2.81 moses          3.44 tori           3.10 Kyoto-U
23.10 tsbmt         2.66 NAIST-NTT      3.43 NAIST-NTT      3.04 NAIST-NTT
22.29 tori          2.59 KLE            3.35 Kyoto-U        3.01 tori
21.57 Kyoto-U       2.58 tori           3.28 HIT2           2.94 KLE
19.93 mibel         2.47 NICT-ATR       3.28 KLE            2.86 HIT2
19.48 HIT2          2.44 HIT2           3.09 mibel          2.78 NICT-ATR
19.46 Japio         2.38 mibel          3.08 NICT-ATR       2.74 mibel
15.90 TH            1.87 TH             2.42 FDU-MCandWI    2.13 TH
 9.55 FDU-MCandWI   1.75 FDU-MCandWI    2.39 TH             2.08 FDU-MCandWI
 1.41 NTNU          1.08 NTNU           1.04 NTNU           1.06 NTNU
Intrinsic E-J Evaluation Result
BLEU                Adequacy            Fluency             Average
30.58 moses         3.53 tsbmt          3.69 moses          3.60 tsbmt
29.15 NICT-ATR      2.90 moses          3.67 tsbmt          3.30 moses
28.07 NTT           2.74 NTT            3.54 NTT            3.14 NTT
22.65 Kyoto-U       2.59 NICT-ATR       3.20 NICT-ATR       2.89 NICT-ATR
17.46 tsbmt         2.42 Kyoto-U        2.54 Kyoto-U        2.48 Kyoto-U
Critical Defect in EJ Translation
- Not caring whether a child node is a pre-child or post-child
  – Resulting target structure goes wrong
- After resolving this defect, BLEU score in EJ translation rose to 24.02 from 22.65
Kyoto-U after the fix: BLEU 24.02, Adequacy ?, Fluency ?, Average ?
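Why the pre-child/post-child distinction matters can be shown with a small sketch. The slides do not describe the actual data structures or the fix, so everything here is an assumption: a hypothetical `Node` type that stores children before and after the head separately, and a `signature` function that compares trees with or without that distinction.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Dependency node with children split by side of the head (hypothetical)."""
    word: str
    pre: list = field(default_factory=list)    # children preceding the head
    post: list = field(default_factory=list)   # children following the head


def signature(node, keep_side):
    """Hashable description of a tree, with or without the side distinction."""
    pre = [signature(c, keep_side) for c in node.pre]
    post = [signature(c, keep_side) for c in node.post]
    if keep_side:
        return (node.word, tuple(pre), tuple(post))
    # Ignoring the side: children become an unordered collection.
    return (node.word, tuple(sorted(pre + post)))


# Two English trees with opposite meanings.
dogs_chase_cats = Node("chase", pre=[Node("dogs")], post=[Node("cats")])
cats_chase_dogs = Node("chase", pre=[Node("cats")], post=[Node("dogs")])

confused = (signature(dogs_chase_cats, keep_side=False)
            == signature(cats_chase_dogs, keep_side=False))
distinct = (signature(dogs_chase_cats, keep_side=True)
            != signature(cats_chase_dogs, keep_side=True))
```

When the side is ignored, the two trees become indistinguishable, which is one way an English source tree could be matched to the wrong example and yield a wrong target structure.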
Conclusion
- Kyoto-U Fully Syntactic EBMT system:
  - 1. Alignment: Consistency
  - 2. Alignment: Extension
  - 3. Translation: Discontinuous example
  - 4. Translation: Easy combination
- By using syntactic information, we could achieve reasonably high quality translation
- For patent translation, we may need some