Kyoto-U: Syntactical EBMT System for NTCIR-7 Patent Translation Task



slide-1
SLIDE 1

Kyoto-U: Syntactical EBMT System for NTCIR-7 Patent Translation Task

Kyoto University
Toshiaki Nakazawa, Sadao Kurohashi

slide-2
SLIDE 2

Overview of Kyoto-U System

Translation Examples

J: 図書館で新聞を読む
E: I read a newspaper in the library

J: 政治の本が売れ残っている
E: A book in politics was left on the shelf

・・・・・

slide-3
SLIDE 3

Overview of Kyoto-U System

Translation Examples (aligned dependency trees)

図書館 で [library in] ↔ in the library
新聞 を [newspaper ACC] ↔ a newspaper
読む [read] ↔ I read

政治 の [politics in] ↔ in politics
本 が [book NOM] ↔ a book
売れ残って いる [left unsold] ↔ was left on the shelf

・・・・・

slide-4
SLIDE 4

Overview of Kyoto-U System

Input: 図書館で政治の本を読む。

Input dependency tree:
図書館 で [library in]
政治 の [politics in]
本 を [book ACC]
読む [read]

Translation Examples
図書館 で 新聞 を 読む ↔ I read a newspaper in the library
政治 の 本 が 売れ残って いる ↔ a book in politics was left on the shelf

Output: I read a book in politics in the library

・・・・・

slide-5
SLIDE 5

Alignment

slide-6
SLIDE 6

Alignment

J: 交差点で、突然あの車が飛び出して来たのです。
E: The car came at me from the side at the intersection.

slide-7
SLIDE 7

Alignment

J nodes: 交差 点 で 、 | 突然 | あの 車 が | 飛び出して 来た のです
E nodes: the car came | at me | from the side | at the intersection

  • 1. Transformation into dependency structure

J: JUMAN/KNP, E: Charniak's nlparser → Dependency tree

slide-8
SLIDE 8

Alignment

J nodes: 交差 点 で 、 | 突然 | あの 車 が | 飛び出して 来た のです
E nodes: the car came | at me | from the side | at the intersection

  • 1. Transformation into dependency structure
  • 2. Detection of word(s) correspondences
slide-9
SLIDE 9

Finding Correspondences

  • Bilingual dictionaries (500K entries)
  • Substring co-occurrence (Cromieres 2006)

$$\frac{\mathrm{count}(j, e)}{\mathrm{count}(j) \cdot \mathrm{count}(e)} > \theta$$

  • Numeral normalization

二百十六万 → 2,160,000 ← 2.16 million

  • Transliteration (Katakana words, NEs)

ローズワイン → rosuwain ⇔ rose wine (similarity: 0.78)
新宿 → shinjuku ⇔ shinjuku (similarity: 1.0)
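The numeral normalization step can be illustrated with a minimal sketch, assuming that both sides are mapped to a plain integer before comparison. The function names `kanji_to_int` and `english_to_int` are hypothetical and are not taken from the Kyoto-U system; only simple kanji numerals and "number + scale word" English forms are handled.

```python
import re
from decimal import Decimal

DIGITS = {"〇": 0, "一": 1, "二": 2, "三": 3, "四": 4,
          "五": 5, "六": 6, "七": 7, "八": 8, "九": 9}
SMALL_UNITS = {"十": 10, "百": 100, "千": 1000}
LARGE_UNITS = {"万": 10**4, "億": 10**8, "兆": 10**12}
ENGLISH_SCALES = {"thousand": 10**3, "million": 10**6, "billion": 10**9}


def kanji_to_int(text: str) -> int:
    """Convert a kanji numeral such as 二百十六万 to an integer."""
    total = 0    # sum of completed 万/億/兆 groups
    section = 0  # value inside the current group (below 10,000)
    digit = 0    # digit waiting for its unit
    for ch in text:
        if ch in DIGITS:
            digit = DIGITS[ch]
        elif ch in SMALL_UNITS:
            # A bare unit such as 十 means 1 x 10.
            section += (digit or 1) * SMALL_UNITS[ch]
            digit = 0
        elif ch in LARGE_UNITS:
            total += (section + digit) * LARGE_UNITS[ch]
            section = digit = 0
        else:
            raise ValueError(f"unsupported character: {ch!r}")
    return total + section + digit


def english_to_int(text: str) -> int:
    """Convert forms such as '2,160,000' or '2.16 million' to an integer."""
    m = re.fullmatch(r"\s*([\d,]*\.?\d+)\s*(thousand|million|billion)?\s*",
                     text.lower())
    if m is None:
        raise ValueError(f"unsupported numeral: {text!r}")
    number = Decimal(m.group(1).replace(",", ""))  # Decimal avoids float error
    scale = ENGLISH_SCALES.get(m.group(2), 1)      # no scale word means x 1
    return int(number * scale)


# Both sides of the slide's example normalize to the same value.
print(kanji_to_int("二百十六万"), english_to_int("2.16 million"))
```

Two expressions are then treated as corresponding when their normalized integers are equal.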

slide-10
SLIDE 10

Alignment

J nodes: 交差 点 で 、 | 突然 | あの 車 が | 飛び出して 来た のです
E nodes: the car came | at me | from the side | at the intersection

  • 1. Transformation into dependency structure
  • 2. Detection of word(s) correspondences
  • 3. Disambiguation of correspondences
slide-11
SLIDE 11

Alignment

J nodes: 交差 点 で 、 | 突然 | あの 車 が | 飛び出して 来た のです
E nodes: the car came | at me | from the side | at the intersection

  • 1. Transformation into dependency structure
  • 2. Detection of word(s) correspondences
  • 3. Disambiguation of correspondences
  • 4. Handling of remaining phrases

Extension to leaf-nodes

slide-12
SLIDE 12

Alignment

J nodes: 交差 点 で 、 | 突然 | あの 車 が | 飛び出して 来た のです
E nodes: the car came | at me | from the side | at the intersection

  • 1. Transformation into dependency structure
  • 2. Detection of word(s) correspondences
  • 3. Disambiguation of correspondences
  • 4. Handling of remaining phrases
  • 5. Registration to translation example database
slide-13
SLIDE 13

Alignment Ambiguities

J nodes (with glosses):
日本 で [in Japan]
保険 [insurance]
会社 に 対して [to the company]
保険 [insurance]
請求 の [of claim]
申し立て が [file]
可能です よ [be able to]

E nodes:
you
will have to file
insurance
an claim
with the office
in Japan

保険 [insurance] appears twice on the J side, so its correspondence to "insurance" on the E side is ambiguous.

slide-14
SLIDE 14

Alignment: Consistency

[Figure: candidate correspondences labeled Near and Far]

slide-15
SLIDE 15

$$\underset{alignment}{\arg\max} \; \frac{\sum_{i=1}^{n} \sum_{j=i+1}^{n} cs\big(d_J(a_i, a_j),\, d_E(a_i, a_j)\big)}{n(n-1)/2}$$

  • For each pair of candidates a_i and a_j, calculate the J-side distance d_J and the E-side distance d_E
  • Give a consistency score to the pair based on d_J and d_E
  • Calculate consistency scores for all the pairs in a possible set of alignment candidates

slide-16
SLIDE 16

Baseline

Distance of Each Branch: 1

Consistency Score:

$$cs(d_J, d_E) = \frac{1}{d_J} + \frac{1}{d_E}$$

Example: 1/1 + 1/2 = 1.5
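The baseline score can be sketched as follows, assuming each dependency tree is given as a child-to-parent map and that alignment candidates link distinct nodes (so no distance is zero). The names `tree_distance`, `cs` and `alignment_score` are hypothetical; this illustrates the formula, not the system's implementation.

```python
from itertools import combinations


def path_to_root(node, parent):
    """Return the list of nodes from `node` up to the root."""
    path = [node]
    while parent[node] is not None:
        node = parent[node]
        path.append(node)
    return path


def tree_distance(a, b, parent):
    """Number of branches between a and b; every branch counts as 1."""
    steps_from_a = {n: i for i, n in enumerate(path_to_root(a, parent))}
    for steps_from_b, n in enumerate(path_to_root(b, parent)):
        if n in steps_from_a:  # first shared ancestor on b's path
            return steps_from_a[n] + steps_from_b
    raise ValueError("nodes are not in the same tree")


def cs(d_j, d_e):
    """Baseline consistency score: 1/d_J + 1/d_E."""
    return 1.0 / d_j + 1.0 / d_e


def alignment_score(candidates, parent_j, parent_e):
    """Average cs over all pairs in a set of (j_node, e_node) candidates."""
    n = len(candidates)
    if n < 2:
        return 0.0
    total = sum(
        cs(tree_distance(ji, jj, parent_j), tree_distance(ei, ej, parent_e))
        for (ji, ei), (jj, ej) in combinations(candidates, 2)
    )
    return total / (n * (n - 1) / 2)


# Toy trees: on the J side b depends on a; on the E side x and y are siblings.
parent_j = {"root": None, "a": "root", "b": "a"}
parent_e = {"R": None, "x": "R", "y": "R"}
print(alignment_score([("a", "x"), ("b", "y")], parent_j, parent_e))  # 1.5
```

In the toy case d_J = 1 and d_E = 2, which reproduces the slide's example value 1/1 + 1/2 = 1.5. Choosing the best alignment would mean evaluating `alignment_score` for each possible candidate set and keeping the maximum.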

slide-17
SLIDE 17

Consistency Score

  • The frequency of each distance pair in gold-standard alignment data (Mainichi newspaper, 40K sentence pairs) [Uchimoto04]

[Figure: Frequency (log) plotted against Dist of J-Side and Dist of E-Side]

slide-18
SLIDE 18

Distance based on Dependency Type

[Figure: the J and E dependency trees from the "Alignment Ambiguities" slide, with each branch labeled by its dependency type and a distance of 1, 2 or 3. J-side types: デ格 (de-case), 文節内 (within a bunsetsu), 連用 (predicate-modifying), ノ格 (no-case), ガ格 (ga-case). E-side types: NP, NN, PP.]

slide-19
SLIDE 19

Distance based on Dependency Type

[Figure: build step of the same labeled dependency trees. J-side types: デ格, 文節内, 連用, ノ格, ガ格. E-side types: NP, NN, PP. Branch distances: 1 to 3.]

slide-20
SLIDE 20

Distance based on Dependency Type

[Figure: build step of the same labeled dependency trees. J-side types: デ格, 文節内, 連用, ノ格, ガ格. E-side types: NP, NN, PP. Branch distances: 1 to 3.]
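A distance that depends on the dependency type can be sketched by summing a per-type weight along the tree path instead of counting every branch as 1. The weight table and the small tree below are illustrative assumptions only: the slide shows distances of 1 to 3 per branch, but the exact type-to-distance mapping and tree shape are not taken from the system.

```python
# Hypothetical branch distances per J-side dependency type (assumed values).
BRANCH_DISTANCE = {"文節内": 1, "ノ格": 2, "デ格": 3, "ガ格": 3, "連用": 3}


def weighted_distance(a, b, parent, edge_type, weights=BRANCH_DISTANCE):
    """Sum of branch weights on the path between nodes a and b.

    parent[n] is the head of n (None for the root); edge_type[n] is the
    dependency type of the branch from n to its head.
    """
    def costs_to_ancestors(node):
        costs, cost = {node: 0}, 0
        while parent[node] is not None:
            cost += weights[edge_type[node]]
            node = parent[node]
            costs[node] = cost
        return costs

    costs_a = costs_to_ancestors(a)
    costs_b = costs_to_ancestors(b)
    common = set(costs_a) & set(costs_b)
    if not common:
        raise ValueError("nodes are not in the same tree")
    # With positive weights the lowest common ancestor gives the minimum.
    return min(costs_a[n] + costs_b[n] for n in common)


# Assumed fragment: 保険 -> 請求 の -> 申し立て が -> 可能です
parent = {"可能です": None, "申し立て が": "可能です",
          "請求 の": "申し立て が", "保険": "請求 の"}
edge_type = {"申し立て が": "ガ格", "請求 の": "ノ格", "保険": "文節内"}
print(weighted_distance("保険", "申し立て が", parent, edge_type))
```

Under these assumed weights the distance from 保険 to 申し立て が is 1 + 2 = 3, whereas the baseline with unit branches would give 2.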

slide-21
SLIDE 21

Example of Alignment Improvement

[Figure: Proposed model vs. Word-base alignment]

slide-22
SLIDE 22

Translation

slide-23
SLIDE 23

Translation

Input: 図書館で政治の本を読む。

Input dependency tree:
図書館 で [library in]
政治 の [politics in]
本 を [book ACC]
読む [read]

Translation Examples
図書館 で 新聞 を 読む ↔ I read a newspaper in the library
政治 の 本 が 売れ残って いる ↔ a book in politics was left on the shelf

Output: I read a book in politics in the library

・・・・・

slide-24
SLIDE 24

Selection of Translation Examples

  • Score for an example [Sato 91]
  • 1. Size of an example
  • 2. Similarity of neighboring nodes
  • 3. Translation probability
  • Beam search from the root of the input

slide-25
SLIDE 25

Input: 図書館 で [library in] | 政治 の [politics in] | 本 を [book ACC] | 読む [read]

Translation example: 図書館 で 新聞 を 読む ↔ I read a newspaper in the library

Score = 2 × w_size + 0.7 × w_sim + 2/3 × w_trans
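The example score is a weighted sum of the three criteria, and a beam search keeps only the best-scoring candidates at each step. The sketch below shows the scoring and the pruning step only, not the full search from the root of the input. The `Candidate` class, the function names and the default weights of 1.0 are hypothetical; the sample values (size 2, similarity 0.7, translation probability 2/3) are taken to be those of the slide's worked example.

```python
import heapq
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    name: str
    size: int          # number of input nodes covered by the example
    similarity: float  # similarity of neighboring nodes
    trans_prob: float  # translation probability


def example_score(c, w_size=1.0, w_sim=1.0, w_trans=1.0):
    """Weighted sum: size * w_size + similarity * w_sim + trans_prob * w_trans."""
    return c.size * w_size + c.similarity * w_sim + c.trans_prob * w_trans


def keep_beam(candidates, beam_width, **weights):
    """Pruning step of a beam search: keep the top `beam_width` candidates."""
    return heapq.nlargest(beam_width, candidates,
                          key=lambda c: example_score(c, **weights))


slide_example = Candidate("図書館 で ... 読む", size=2, similarity=0.7, trans_prob=2 / 3)
other = Candidate("hypothetical smaller example", size=1, similarity=0.9, trans_prob=0.5)
print(keep_beam([other, slide_example], beam_width=1))
```

With equal weights the larger example wins because the size term dominates; tuning `w_size`, `w_sim` and `w_trans` changes that trade-off.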

slide-26
SLIDE 26

Combination of TMs

Input: 図書館で政治の本を読む。

Input dependency tree:
図書館 で [library in]
政治 の [politics in]
本 を [book ACC]
読む [read]

Translation Examples
図書館 で 新聞 を 読む ↔ I read a newspaper in the library
政治 の 本 が 売れ残って いる ↔ a book in politics was left on the shelf

・・・・・

slide-27
SLIDE 27

Input: 記録領域での変形形状と,記録特性の関係を調べた。

[Figure: the input dependency tree, the translation examples matched against it, and the output dependency tree]

Output: The relationship between deformation shape in the recording region and recording characteristics was examined.

slide-28
SLIDE 28

Evaluation Results and Discussion

slide-29
SLIDE 29

Intrinsic J-E Evaluation Result

BLEU                Adequacy          Fluency           Average
27.20 NTT           3.81 tsbmt        4.02 Japio        3.88 tsbmt
27.14 moses         3.71 Japio        3.94 tsbmt        3.86 Japio
27.14 MIT           3.15 MIT          3.66 MIT          3.40 MIT
25.48 NAIST-NTT     2.96 NTT          3.65 NTT          3.30 NTT
24.79 NICT-ATR      2.85 Kyoto-U      3.55 moses        3.18 moses
24.49 KLE           2.81 moses        3.44 tori         3.10 Kyoto-U
23.10 tsbmt         2.66 NAIST-NTT    3.43 NAIST-NTT    3.04 NAIST-NTT
22.29 tori          2.59 KLE          3.35 Kyoto-U      3.01 tori
21.57 Kyoto-U       2.58 tori         3.28 HIT2         2.94 KLE
19.93 mibel         2.47 NICT-ATR     3.28 KLE          2.86 HIT2
19.48 HIT2          2.44 HIT2         3.09 mibel        2.78 NICT-ATR
19.46 Japio         2.38 mibel        3.08 NICT-ATR     2.74 mibel
15.90 TH            1.87 TH           2.42 FDU-MCandWI  2.13 TH
9.55 FDU-MCandWI    1.75 FDU-MCandWI  2.39 TH           2.08 FDU-MCandWI
1.41 NTNU           1.08 NTNU         1.04 NTNU         1.06 NTNU

slide-30
SLIDE 30

Intrinsic E-J Evaluation Result

BLEU                Adequacy          Fluency           Average
30.58 moses         3.53 tsbmt        3.69 moses        3.60 tsbmt
29.15 NICT-ATR      2.90 moses        3.67 tsbmt        3.30 moses
28.07 NTT           2.74 NTT          3.54 NTT          3.14 NTT
22.65 Kyoto-U       2.59 NICT-ATR     3.20 NICT-ATR     2.89 NICT-ATR
17.46 tsbmt         2.42 Kyoto-U      2.54 Kyoto-U      2.48 Kyoto-U

slide-31
SLIDE 31

Critical Defect in EJ Translation

  • Not caring whether a child node is a pre-child or post-child

– Resulting target structure goes wrong

  • After resolving this defect, BLEU score in EJ translation rose to 24.02 from 22.65

BLEU                Adequacy          Fluency           Average
30.58 moses         3.53 tsbmt        3.69 moses        3.60 tsbmt
29.15 NICT-ATR      2.90 moses        3.67 tsbmt        3.30 moses
28.07 NTT           2.74 NTT          3.54 NTT          3.14 NTT
22.65 Kyoto-U       2.59 NICT-ATR     3.20 NICT-ATR     2.89 NICT-ATR
17.46 tsbmt         2.42 Kyoto-U      2.54 Kyoto-U      2.48 Kyoto-U

Kyoto-U after the fix: BLEU 24.02; Adequacy, Fluency, Average: ?
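Why the pre-child / post-child distinction matters can be shown with a small sketch of linearizing a dependency tree into a word sequence. The `Node` class and both functions are hypothetical, and an English toy tree is used for readability. The slides do not say how the defect manifested in the system, so the second function, which emits every child before its head, is only one possible illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    word: str
    pre: list["Node"] = field(default_factory=list)   # children placed before the head
    post: list["Node"] = field(default_factory=list)  # children placed after the head


def linearize(node):
    """Emit pre-children, then the head, then post-children."""
    words = []
    for child in node.pre:
        words.extend(linearize(child))
    words.append(node.word)
    for child in node.post:
        words.extend(linearize(child))
    return words


def linearize_ignoring_side(node):
    """Illustrative defect: every child is emitted before its head."""
    words = []
    for child in node.pre + node.post:
        words.extend(linearize_ignoring_side(child))
    words.append(node.word)
    return words


tree = Node("read", pre=[Node("I")], post=[Node("book", pre=[Node("a")])])
print(" ".join(linearize(tree)))                # I read a book
print(" ".join(linearize_ignoring_side(tree)))  # I a book read
```

Keeping the side of each child is enough to restore the intended order in this toy case.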

slide-32
SLIDE 32

Conclusion

  • Kyoto-U Fully Syntactic EBMT system:
  • 1. Alignment: Consistency
  • 2. Alignment: Extension
  • 3. Translation: Discontinuous example
  • 4. Translation: Easy combination
  • By using syntactic information, we could achieve reasonably high quality translation
  • For patent translation, we may need some pre-processing to handle special expressions which cause parsing errors