Statistical Machine Translation

  1. Statistical Machine Translation
     Graham Neubig, Nara Institute of Science and Technology (NAIST), 10/23/2012

  2. Machine Translation
     ● Automatically translate between languages
       Source: 太郎が花子を訪問した。
       Target: Taro visited Hanako.
     ● Real products/services are being created!
       e.g. the NAIST travel conversation translation system (@AHC Lab)

  3. How does machine translation work?
     Today I will give a lecture on machine translation .

  4. How does machine translation work?
     ● Divide the sentence into translatable patterns, reorder them, and
       combine the translations (a minimal code sketch follows below):
       Divide:  Today | I will give | a lecture on | machine translation | .
       Reorder: Today | machine translation | a lecture on | I will give | .
       Combine: 今日は、 | 機械翻訳 | の講義 | を行います | 。
       Result:  今日は、機械翻訳の講義を行います。
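
To make the divide/reorder/combine idea concrete, here is a minimal sketch. The tiny phrase table and the hand-specified reordering are hypothetical stand-ins for the models a real system learns from data.

```python
# Toy sketch of the divide / reorder / combine pipeline from the slide.
# The phrase table and the fixed reordering are illustrative only.
phrase_table = {
    "Today": "今日は、",
    "machine translation": "機械翻訳",
    "a lecture on": "の講義",
    "I will give": "を行います",
    ".": "。",
}

# 1) Divide the sentence into translatable patterns
source = ["Today", "I will give", "a lecture on", "machine translation", "."]

# 2) Reorder the patterns into Japanese word order (hand-specified here)
order = [0, 3, 2, 1, 4]
reordered = [source[i] for i in order]

# 3) Combine: translate each pattern and concatenate
print("".join(phrase_table[p] for p in reordered))
# => 今日は、機械翻訳の講義を行います。
```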

  5. Problem
     ● There are millions of possible translations!
       花子 が 太郎 に 会った
       → Hanako met Taro
       → Hanako met to Taro
       → Hanako ran in to Taro
       → Taro met Hanako
       → The Hanako met the Taro
     ● How do we tell which is better?

  6. Statistical Machine Translation
     ● Translation model:
       P("今日" | "today") = high
       P("今日 は 、" | "today") = medium
       P("昨日" | "today") = low
     ● Reordering model:
       P(鶏 を 食べる → "eats chicken") = high
       P(鶏 が 食べる → "chicken eats") = high
       P(鶏 が 食べる → "eats chicken") = low
     ● Language model:
       P("Taro met Hanako") = high
       P("the Taro met the Hanako") = low
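
A sketch of how the three model scores might be combined to compare candidate translations. Every probability below is a made-up illustration; real systems combine the scores log-linearly with weights tuned on held-out data.

```python
import math

# Toy sketch: pick the candidate whose combined model score is best.
def score(tm, rm, lm):
    # Summing log-probabilities avoids underflow when multiplying many terms
    return math.log(tm) + math.log(rm) + math.log(lm)

candidates = {
    "Hanako met Taro":         score(0.8, 0.9, 0.7),
    "the Hanako met the Taro": score(0.8, 0.9, 0.001),  # disfluent: low LM score
    "Taro met Hanako":         score(0.2, 0.3, 0.7),    # reversed: low TM/RM scores
}
print(max(candidates, key=candidates.get))  # => Hanako met Taro
```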

  7. Creating a Machine Translation System
     ● Learn patterns from documents:
       translation model, reordering model, language model
     ● Example parallel documents:
       太郎が花子を訪問した。 ↔ Taro visited Hanako.
       花子にプレゼントを渡した。 ↔ He gave Hanako a present.
     ● Document sources: United Nations text (English/French/Chinese/Arabic, ...);
       Yomiuri Shimbun and Wikipedia text (Japanese/English)

  8. How Do We Learn Patterns?
     ● For example, we go to an Italian restaurant with a Japanese menu:
       チーズムース → Mousse di formaggi
       タリアテッレ 4種のチーズソース → Tagliatelle al 4 formaggi
       本日の鮮魚 → Pesce del giorno
       鮮魚のソテー お米とグリーンピース添え → Filetto di pesce su “Risi e Bisi”
       ドルチェとチーズ → Dolce e Formaggi
     ● Try to find the patterns!

  9. How Do We Learn Patterns?
     ● (Same menu as the previous slide.) The repeated fragments reveal the
       patterns, e.g. チーズ ↔ formaggi and 鮮魚 ↔ pesce.

  10. Steps in Training a Phrase-based SMT System
      ● Collecting Data
      ● Tokenization
      ● Language Modeling
      ● Alignment
      ● Phrase Extraction/Scoring
      ● Reordering Models
      ● Decoding
      ● Evaluation
      ● Tuning

  11. Collecting Data
      ● Sentence-parallel data (used in: translation model/reordering model)
        これはペンです。 ↔ This is a pen.
        昨日は友達と食べた。 ↔ I ate with my friend yesterday.
        象は鼻が長い。 ↔ Elephants' trunks are long.
      ● Monolingual data in the target language (used in: language model)
        This is a pen.
        I ate with my friend yesterday.
        Elephants' trunks are long.

  12. Good Data is
      ● Big! [Figure: translation accuracy vs. LM data size in millions of words; Brants 2007]
      ● Clean
      ● In the same domain as the test data

  13. Collecting Data
      ● High-quality parallel data comes from:
        ● Government organizations
        ● Newspapers
        ● Patents
      ● Crawl the web
      ● Merge several data sources

  14. Finding Data on the Web
      ● Find bilingual pages [Resnik 03]
        [Image: Mainichi Shimbun]

  15. Finding Data on the Web
      ● Finding bilingual pages [Resnik 03]
      ● Sentence alignment [Moore 02]

  16. Question 1:
      ● Write down three candidate sources of parallel data for English-Japanese,
        or some other language pair you are familiar with.
      ● They should all be of different genres.

  17. Tokenization
      ● Example: divide Japanese into words
        太郎が花子を訪問した。 → 太郎 が 花子 を 訪問 した 。
      ● Example: make English lowercase and split off punctuation
        Taro visited Hanako. → taro visited hanako .
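
A minimal sketch of the English side of this step. Real pipelines use a dedicated tokenizer, and Japanese needs a morphological analyzer such as MeCab or KyTea, since Japanese text is written without spaces between words.

```python
import re

# Sketch: lowercase English text and split punctuation into its own tokens.
def tokenize_en(sentence):
    sentence = sentence.lower()
    # Put spaces around punctuation so it becomes separate tokens
    sentence = re.sub(r"([.,!?])", r" \1 ", sentence)
    return sentence.split()

print(tokenize_en("Taro visited Hanako."))
# => ['taro', 'visited', 'hanako', '.']
```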

  18. Tokenization is Important!
      ● Just right: can translate properly
        太郎 が → taro ○    太郎 を → taro ○
      ● Too long: cannot translate units that are not in the training data
        太郎が → taro ○ (in the data)    太郎を → ☓ (not in the data)
      ● Too short: may mistranslate
        太 郎 が → fat ro ☓    太 郎 を → fat ro ☓

  19. Language Modeling
      ● Assign a probability to each sentence
        E1: Taro visited Hanako           → P(E1)
        E2: the Taro visited the Hanako   → P(E2)
        E3: Taro visited the bibliography → P(E3)
      ● More fluent sentences get higher probability:
        P(E1) > P(E2)    P(E1) > P(E3)

  20. n-gram Models
      ● We want the probability P(W = "Taro visited Hanako")
      ● An n-gram model calculates it one word at a time,
        conditioning each word on the n-1 previous words
      ● e.g. a 2-gram model:
        P(w1 = "Taro")
        * P(w2 = "visited" | w1 = "Taro")
        * P(w3 = "Hanako" | w2 = "visited")
        * P(w4 = "</s>" | w3 = "Hanako")
        (</s> is the sentence-ending symbol)
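
A sketch of this factorization, assuming the probability tables are already estimated (the numbers below are made up):

```python
# Sketch of the slide's 2-gram factorization: the first word gets a
# unigram probability, each later word conditions on the previous word,
# and the sentence-ending symbol </s> is predicted too.
unigram = {"taro": 0.1}
bigram = {("taro", "visited"): 0.5,
          ("visited", "hanako"): 0.3,
          ("hanako", "</s>"): 0.4}

def sentence_prob(words):
    p = unigram[words[0]]
    for prev, cur in zip(words, words[1:] + ["</s>"]):
        p *= bigram[(prev, cur)]
    return p

print(sentence_prob(["taro", "visited", "hanako"]))  # 0.1*0.5*0.3*0.4 = 0.006
```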

  21. Calculating n-gram Models
      ● n-gram probabilities are estimated from data by relative frequency:
        P(w_i | w_{i-n+1} ... w_{i-1}) = c(w_{i-n+1} ... w_i) / c(w_{i-n+1} ... w_{i-1})
      ● Example corpus:
        i live in osaka . </s>
        i am a graduate student . </s>
        my school is in nara . </s>
      ● With n = 2:
        P(osaka | in) = c(in osaka) / c(in) = 1/2 = 0.5
        P(nara | in)  = c(in nara) / c(in)  = 1/2 = 0.5
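
The same estimation written out on the slide's toy corpus:

```python
from collections import Counter

# Estimate 2-gram probabilities by counting, exactly as in the formula
# P(w_i | w_{i-1}) = c(w_{i-1} w_i) / c(w_{i-1}).
corpus = ["i live in osaka . </s>",
          "i am a graduate student . </s>",
          "my school is in nara . </s>"]

unigrams, bigrams = Counter(), Counter()
for line in corpus:
    words = line.split()
    unigrams.update(words)
    bigrams.update(zip(words, words[1:]))

def p(w, prev):
    return bigrams[(prev, w)] / unigrams[prev]

print(p("osaka", "in"))  # c(in osaka)/c(in) = 1/2 = 0.5
print(p("nara", "in"))   # c(in nara)/c(in)  = 1/2 = 0.5
```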

  22. Question 2:
      ● Calculate the 2-gram probabilities of the n-grams on the worksheet.

  23. Alignment
      ● Find which words correspond to each other:
        太郎 が 花子 を 訪問 した 。
        taro visited hanako .
      ● Done automatically with probabilistic methods:
        P(花子 | hanako) = 0.99
        P(太郎 | taro) = 0.97
        P(visited | 訪問) = 0.46
        P(visited | した) = 0.04
        P(花子 | taro) = 0.0001

  24. IBM/HMM Models
      ● One-to-many alignment models, run in each direction:
        ホテル の 受付 → the hotel front desk
        the hotel front desk → ホテル の 受付
      ● IBM Model 1: no structure ("bag of words")
      ● IBM Models 2-5, HMM: add more structure
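
A minimal sketch of how IBM Model 1 can be trained with the EM algorithm. The two toy sentence pairs are illustrative; a real system trains on millions of pairs with a tool such as GIZA++.

```python
from collections import defaultdict

# IBM Model 1 sketch: every source word may align to any target word
# ("bag of words"), and EM re-estimates the word translation table t(f|e).
pairs = [("ホテル の 受付".split(), "the hotel front desk".split()),
         ("ホテル".split(), "the hotel".split())]

t = defaultdict(lambda: 0.25)  # t(f|e), initialized uniformly

for _ in range(20):                    # EM iterations
    count = defaultdict(float)         # expected counts c(f, e)
    total = defaultdict(float)         # expected counts c(e)
    for f_sent, e_sent in pairs:
        for f in f_sent:
            z = sum(t[(f, e)] for e in e_sent)   # normalizer for this f
            for e in e_sent:
                p = t[(f, e)] / z      # E-step: posterior that e generated f
                count[(f, e)] += p
                total[e] += p
    for (f, e), c in count.items():    # M-step: re-estimate t(f|e)
        t[(f, e)] = c / total[e]

print(round(t[("ホテル", "hotel")], 3))  # climbs toward 1.0 as EM converges
```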

  25. Combining One-to-Many Alignments
      ● Combine the two one-directional alignments (ホテル の 受付 → the hotel
        front desk, and the reverse) into a single alignment
      ● Several different heuristics exist; a sketch follows below
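
One simple version of such a heuristic, sketched with illustrative alignment links: take the intersection for precision or the union for recall; grow-diag-final-style heuristics fall in between.

```python
# Sketch of symmetrization: combine the two one-directional alignments.
# Each alignment is a set of (Japanese word, English word) links; the
# links are illustrative guesses for ホテル の 受付 / the hotel front desk.
e_given_f = {("ホテル", "the"), ("ホテル", "hotel"),
             ("受付", "front"), ("受付", "desk")}
f_given_e = {("ホテル", "hotel"), ("の", "hotel"), ("受付", "desk")}

intersection = e_given_f & f_given_e   # high precision, fewer links
union = e_given_f | f_given_e          # high recall, more links

# Heuristics such as grow-diag-final start from the intersection and
# selectively add neighboring links from the union.
print(sorted(intersection))
print(sorted(union))
```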

  26. Phrase Extraction
      ● Use alignments to find phrase pairs:
        ホテル の → hotel
        ホテル の → the hotel
        受付 → front desk
        ホテルの受付 → hotel front desk
        ホテルの受付 → the hotel front desk

  27. Phrase Extraction Criterion
      ● A phrase pair must have:
        1) at least one alignment inside the phrase
        2) no alignments outside the phrase in the same row/column
      ● e.g. ホテル の → hotel is OK: the unaligned "の" adds no alignments
        outside the box (see the sketch below)
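
A sketch of this criterion as code, on the ホテル の 受付 example with the alignment links implied by the earlier slides. The full algorithm also extends pairs over unaligned English boundary words (which adds the "the hotel" variants); that loop is omitted for brevity.

```python
ja = "ホテル の 受付".split()          # indices 0, 1, 2
en = "the hotel front desk".split()   # indices 0, 1, 2, 3
links = {(0, 1), (2, 2), (2, 3)}      # ホテル-hotel, 受付-front, 受付-desk

def extract_phrases(ja, en, links):
    pairs = []
    for js in range(len(ja)):
        for je in range(js, len(ja)):
            inside = [e for (j, e) in links if js <= j <= je]
            if not inside:            # rule 1: at least one link inside
                continue
            lo, hi = min(inside), max(inside)
            # rule 2: no link may tie the English span to Japanese words
            # outside the Japanese span
            if any(lo <= e <= hi and not js <= j <= je for (j, e) in links):
                continue
            pairs.append((" ".join(ja[js:je + 1]), " ".join(en[lo:hi + 1])))
    return pairs

for jp, ep in extract_phrases(ja, en, links):
    print(jp, "→", ep)   # includes ホテル の → hotel, 受付 → front desk, ...
```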

  28. Question 3:
      ● Given the alignments on the worksheet, which phrases will be extracted
        by the machine translation system?

  29. Phrase Scoring
      ● Calculate 5 standard features
      ● Phrase translation probabilities:
        P(f|e) = c(f, e) / c(e)    P(e|f) = c(f, e) / c(f)
        e.g. c(ホテル の, the hotel) / c(the hotel)
      ● Lexical translation probabilities:
        - use word-based translation probabilities (IBM Model 1)
        - helps with sparsity
        P(f|e) = Π_{f' in f} (1/|e|) Σ_{e' in e} P(f'|e')
        e.g. (P(ホテル|the) + P(ホテル|hotel))/2 * (P(の|the) + P(の|hotel))/2
      ● Phrase penalty: 1 for each phrase
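
A sketch of the first two features, estimated by relative frequency from hypothetical phrase-pair counts:

```python
from collections import Counter

# Made-up counts of how often each phrase pair was extracted.
pair_count = Counter({("ホテル の", "the hotel"): 7,
                      ("ホテル の", "hotel"): 3,
                      ("旅館 の", "the hotel"): 1})
f_count, e_count = Counter(), Counter()
for (f, e), c in pair_count.items():
    f_count[f] += c
    e_count[e] += c

def p_f_given_e(f, e):
    return pair_count[(f, e)] / e_count[e]   # c(f, e) / c(e)

def p_e_given_f(f, e):
    return pair_count[(f, e)] / f_count[f]   # c(f, e) / c(f)

print(p_f_given_e("ホテル の", "the hotel"))  # 7/8
print(p_e_given_f("ホテル の", "the hotel"))  # 7/10
```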

  30. Lexicalized Reordering
      ● Model the probability of monotone, swap, and discontinuous orderings
        e.g. for 細い 男 が 太郎 を 訪問 した → the thin man visited Taro:
        細い → the thin: high monotone probability
        太郎 を → Taro: high swap probability
      ● Conditioning on input/output, left/right, or both
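
A sketch of how the orientation of one phrase relative to the previously translated phrase can be classified; the span indices below are illustrative.

```python
# Classify a phrase pair's orientation on the source side relative to the
# previously translated phrase: monotone, swap, or discontinuous.
def orientation(prev_span, cur_span):
    """Spans are (start, end) source-word indices, inclusive."""
    if cur_span[0] == prev_span[1] + 1:
        return "monotone"        # continues directly to the right
    if cur_span[1] == prev_span[0] - 1:
        return "swap"            # jumps to directly before the previous phrase
    return "discontinuous"

print(orientation((0, 0), (1, 1)))   # monotone, e.g. 細い then 男
print(orientation((3, 4), (1, 2)))   # swap: current phrase sits just before
```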
