Statistical Machine Translation (PowerPoint PPT Presentation)


SLIDE 1

Statistical Machine Translation

Graham Neubig Nara Institute of Science and Technology (NAIST)

10/23/2012

SLIDE 2

Machine Translation

  • Automatically translate between languages

Source: 太郎が花子を訪問した。
Target: Taro visited Hanako.

  • Real products/services being created!

NAIST Travel Conversation Translation System (@AHC Lab)

SLIDE 3

How does machine translation work?

Today I will give a lecture on machine translation .

SLIDE 4

How does machine translation work?

  • Divide the sentence into translatable patterns, reorder, and combine

Today I will give a lecture on machine translation .

Today → 今日は、
I will give → を行います
a lecture on → の講義
machine translation → 機械翻訳
. → 。

今日は、機械翻訳の講義を行います。

SLIDE 5

Problem

  • There are millions of possible translations!

花子 が 太郎 に 会った →
Hanako met Taro
Hanako met to Taro
Hanako ran in to Taro
Taro met Hanako
The Hanako met the Taro

  • How do we tell which is better?
SLIDE 6

Statistical Machine Translation

  • Translation model:

P(“今日” | “today”) = high
P(“今日 は 、” | “today”) = medium
P(“昨日” | “today”) = low

  • Reordering Model:

P(鶏 を 食べる → eats chicken) = high
P(鶏 が 食べる → chicken eats) = high
P(鶏 が 食べる → eats chicken) = low

  • Language Model:

P(“Taro met Hanako”) = high
P(“the Taro met the Hanako”) = low

SLIDE 7

Creating a Machine Translation System

  • Learn patterns from documents

太郎が花子を訪問した。 → Taro visited Hanako.
花子にプレゼントを渡した。 → He gave Hanako a present.
...

Documents → Models (Translation Model, Reordering Model, Language Model)
e.g. United Nations Text (English/French/Chinese/Arabic ...); Yomiuri Shimbun, Wikipedia Text (Japanese/English)

SLIDE 8

How Do We Learn Patterns?

  • For example, we go to an Italian restaurant with a Japanese menu
  • Try to find the patterns!

チーズムース → Mousse di formaggi
タリアテッレ 4種のチーズソース → Tagliatelle al 4 formaggi
本日の鮮魚 → Pesce del giorno
鮮魚のソテー お米とグリーンピース添え → Filetto di pesce su “Risi e Bisi”
ドルチェとチーズ → Dolce e Formaggi


SLIDE 10

Steps in Training a Phrase-based SMT System

  • Collecting Data
  • Tokenization
  • Language Modeling
  • Alignment
  • Phrase Extraction/Scoring
  • Reordering Models
  • Decoding
  • Evaluation
  • Tuning
SLIDE 11

Collecting Data

  • Sentence parallel data
  • Used in: Translation model/Reordering model
  • Monolingual data (in the target language)
  • Used in: Language model

Parallel:
これはペンです。 → This is a pen.
昨日は友達と食べた。 → I ate with my friend yesterday.
象は鼻が長い。 → Elephants' trunks are long.

Monolingual (target side): This is a pen. / I ate with my friend yesterday. / Elephants' trunks are long.

SLIDE 12

Good Data is

  • Big!
  • Clean
  • In the same domain as test data

[Figure: translation accuracy rises with LM data size (million words) [Brants 2007]]

SLIDE 13

Collecting Data

  • High quality parallel data from:
  • Government organizations
  • Newspapers
  • Patents
  • Crawl the web
  • Merge several data sources
SLIDE 14

Finding Data on the Web

  • Find bilingual pages [Resnik 03]

[Image: Mainichi Shimbun]

SLIDE 15

Finding Data on the Web

  • Finding bilingual pages [Resnik 03]
  • Sentence alignment [Moore 02]
SLIDE 16

Question 1:

  • Write down three candidates for sources of parallel data in English-Japanese, or some other language pair you are familiar with.

  • They should all be of different genres.
SLIDE 17

Tokenization

  • Example: Divide Japanese into words

太郎が花子を訪問した。 → 太郎 が 花子 を 訪問 した 。

  • Example: Make English lowercase, split punctuation

Taro visited Hanako. → taro visited hanako .
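The English side of this step can be sketched in a few lines of Python (a minimal illustration of lowercasing and punctuation splitting; `tokenize_en` is a hypothetical helper, not part of any toolkit):

```python
import re

def tokenize_en(sentence):
    """Lowercase and split punctuation into separate tokens."""
    sentence = sentence.lower()
    # surround punctuation with spaces so split() separates it
    sentence = re.sub(r"([.,!?;:])", r" \1 ", sentence)
    return sentence.split()

print(tokenize_en("Taro visited Hanako."))  # ['taro', 'visited', 'hanako', '.']
```

Real pipelines use more careful tokenizers (handling contractions, numbers, etc.), but the idea is the same.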

SLIDE 18

Tokenization is Important!

  • Too Long: Cannot translate if not in training data

太郎が → taro ○ (in data)
太郎を → ☓ (not in data)

  • Just Right: Can translate properly

太郎 が → taro ○
太郎 を → taro ○

  • Too Short: May mistranslate

太 郎 が → fat ro ☓
太 郎 を → fat ro ☓

SLIDE 19

Language Modeling

  • Assign a probability to each sentence
  • More fluent sentences get higher probability

E1: Taro visited Hanako
E2: the Taro visited the Hanako
E3: Taro visited the bibliography

LM: P(E1) > P(E2), P(E1) > P(E3)

SLIDE 20

n-gram Models

  • We want the probability of a sentence W
  • An n-gram model calculates it one word at a time
  • Each word is conditioned on the n-1 previous words

e.g. 2-gram model

P(W = “Taro visited Hanako”) =
P(w1=“Taro”)
* P(w2=“visited” | w1=“Taro”)
* P(w3=“Hanako” | w2=“visited”)
* P(w4=“</s>” | w3=“Hanako”)

NOTE: </s> is the sentence-ending symbol
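The product above can be computed directly once the 2-gram probabilities are known. A minimal sketch, using made-up probability values purely for illustration:

```python
# hypothetical probability values, for illustration only
p_unigram = {"Taro": 0.02}
p_bigram = {
    ("Taro", "visited"): 0.10,
    ("visited", "Hanako"): 0.05,
    ("Hanako", "</s>"): 0.40,
}

def p_sentence(words):
    """2-gram model: P(w1) * P(w2|w1) * ... * P(</s>|w_last)."""
    prob = p_unigram[words[0]]
    for prev, w in zip(words, words[1:] + ["</s>"]):
        prob *= p_bigram[(prev, w)]
    return prob

print(p_sentence(["Taro", "visited", "Hanako"]))  # 0.02*0.1*0.05*0.4 ≈ 4e-05
```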

SLIDE 21

Calculating n-gram Models

  • n-gram models are estimated from data:

P(w_i | w_{i-n+1} ... w_{i-1}) = c(w_{i-n+1} ... w_i) / c(w_{i-n+1} ... w_{i-1})

e.g. n=2:

i live in osaka . </s>
i am a graduate student . </s>
my school is in nara . </s>

P(nara | in) = c(in nara)/c(in) = 1/2 = 0.5
P(osaka | in) = c(in osaka)/c(in) = 1/2 = 0.5
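The count-based estimate can be reproduced on the three training sentences above. A sketch:

```python
from collections import Counter

# the three training sentences from the slide, with </s> appended
corpus = [
    "i live in osaka . </s>",
    "i am a graduate student . </s>",
    "my school is in nara . </s>",
]

unigram, bigram = Counter(), Counter()
for sent in corpus:
    words = sent.split()
    unigram.update(words)                 # c(w)
    bigram.update(zip(words, words[1:]))  # c(w_prev w)

def prob(w, prev):
    """Maximum-likelihood estimate P(w | prev) = c(prev w) / c(prev)."""
    return bigram[(prev, w)] / unigram[prev]

print(prob("nara", "in"))   # c(in nara)/c(in) = 1/2 = 0.5
print(prob("osaka", "in"))  # c(in osaka)/c(in) = 1/2 = 0.5
```

In practice n-gram models also need smoothing for unseen n-grams; this sketch is the unsmoothed estimate from the slide.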

SLIDE 22

Question 2:

  • Calculate the 2-gram probabilities of the n-grams on the worksheet.

SLIDE 23

Alignment

  • Find which words correspond to each other
  • Done automatically with probabilistic methods

太郎 が 花子 を 訪問 した 。 taro visited hanako .

P(花子 | hanako) = 0.99
P(太郎 | taro) = 0.97
P(visited | 訪問) = 0.46
P(visited | した) = 0.04
P(花子 | taro) = 0.0001


SLIDE 24

IBM/HMM Models

  • One-to-many alignment model
  • IBM Model 1: No structure (“bag of words”)
  • IBM Models 2-5, HMM: Add more structure

[Figure: two one-to-many alignments between ホテル の 受付 and “the hotel front desk”, one for each direction]
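The slides say these alignment probabilities are learned automatically; a minimal EM sketch in the spirit of IBM Model 1 follows (a hypothetical two-sentence toy corpus and a simplified training loop, without NULL words or the distortion models of IBM Models 2-5):

```python
from collections import defaultdict

# hypothetical toy corpus of tokenized (Japanese, English) sentence pairs
corpus = [
    ("太郎 が 訪問 した".split(), "taro visited".split()),
    ("花子 が 訪問 した".split(), "hanako visited".split()),
]

f_vocab = {f for fs, _ in corpus for f in fs}
# t[f][e] approximates P(f | e); start uniform
t = defaultdict(lambda: defaultdict(lambda: 1.0 / len(f_vocab)))

for _ in range(20):
    # E-step: collect expected counts over all word alignments
    count = defaultdict(float)
    total = defaultdict(float)
    for fs, es in corpus:
        for f in fs:
            z = sum(t[f][e] for e in es)
            for e in es:
                frac = t[f][e] / z
                count[(f, e)] += frac
                total[e] += frac
    # M-step: re-estimate t from the expected counts
    for (f, e), c in count.items():
        t[f][e] = c / total[e]

# "taro" only occurs in the sentence containing 太郎,
# so EM concentrates probability on P(太郎|taro)
print(t["太郎"]["taro"], t["太郎"]["visited"])
```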

SLIDE 25

Combining One-to-Many Alignments

  • Several different heuristics

[Figure: the two directional one-to-many alignments are combined into a single alignment between “the hotel front desk” and ホテル の 受付]

SLIDE 26

Phrase Extraction

  • Use alignments to find phrase pairs

[Alignment matrix: ホテル の 受付 (rows) vs. “the hotel front desk” (columns)]

ホテル の → hotel
ホテル の → the hotel
受付 → front desk
ホテルの受付 → hotel front desk
ホテルの受付 → the hotel front desk

SLIDE 27

Phrase Extraction Criterion

  • Must have:
  • 1) at least one alignment inside the phrase
  • 2) no alignments outside the phrase in the same row/column

[Example matrix: extracting “ホテル の → the hotel” is OK because the の row has no alignments outside the phrase]
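This criterion can be turned into a short checker: a box of source and target positions is a valid phrase pair if it contains at least one alignment link and no link crosses its border. A sketch (function and variable names are illustrative, and the length limit is an assumption):

```python
def extract_phrases(f_len, e_len, links, max_len=4):
    """Enumerate phrase pairs consistent with the word alignment:
    at least one link inside the box, and no link that is inside
    its rows/columns on one side but outside on the other."""
    pairs = []
    for f1 in range(f_len):
        for f2 in range(f1, min(f_len, f1 + max_len)):
            for e1 in range(e_len):
                for e2 in range(e1, min(e_len, e1 + max_len)):
                    inside = [(f, e) for f, e in links
                              if f1 <= f <= f2 and e1 <= e <= e2]
                    crossing = [(f, e) for f, e in links
                                if (f1 <= f <= f2) != (e1 <= e <= e2)]
                    if inside and not crossing:
                        pairs.append(((f1, f2), (e1, e2)))
    return pairs

# ホテル(0) の(1) 受付(2)  vs  the(0) hotel(1) front(2) desk(3)
links = [(0, 1), (2, 2), (2, 3)]
pairs = extract_phrases(3, 4, links)
print(((0, 1), (0, 1)) in pairs)  # ホテル の → the hotel
print(((2, 2), (2, 3)) in pairs)  # 受付 → front desk
```

Unaligned words like の can freely extend a phrase, which is why both “hotel” and “the hotel” pair with “ホテル の”.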

SLIDE 28

Question 3:

  • Given the alignments on the worksheet, which phrases will be extracted by the machine translation system?

SLIDE 29

Phrase Scoring

  • Calculate 5 standard features
  • Phrase Translation Probabilities:

P(f|e) = c(f,e)/c(e)
P(e|f) = c(f,e)/c(f)
e.g. c(ホテル の, the hotel) / c(the hotel)

  • Lexical Translation Probabilities

– Use word-based translation probabilities (IBM Model 1) – Helps with sparsity

P(f|e) = Π_i (1/|e|) Σ_j P(f_i|e_j)

e.g. (P(ホテル|the) + P(ホテル|hotel))/2 * (P(の|the) + P(の|hotel))/2

  • Phrase penalty: 1 for each phrase
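The two phrase translation probabilities can be computed directly from counts of extracted phrase pairs. A sketch on hypothetical counts:

```python
from collections import Counter

# hypothetical extracted phrase pairs, one entry per corpus occurrence
extracted = [
    ("ホテル の", "the hotel"),
    ("ホテル の", "the hotel"),
    ("ホテル の", "hotel"),
    ("受付", "front desk"),
]

pair_count = Counter(extracted)
f_count = Counter(f for f, e in extracted)
e_count = Counter(e for f, e in extracted)

def p_f_given_e(f, e):
    """P(f|e) = c(f,e)/c(e)"""
    return pair_count[(f, e)] / e_count[e]

def p_e_given_f(f, e):
    """P(e|f) = c(f,e)/c(f)"""
    return pair_count[(f, e)] / f_count[f]

print(p_f_given_e("ホテル の", "the hotel"))  # 2/2 = 1.0
print(p_e_given_f("ホテル の", "the hotel"))  # 2/3
```

Because rare phrases get unreliable counts, the lexical probabilities above are used alongside these as a smoother signal.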
SLIDE 30

Lexicalized Reordering

  • Probability of monotone, swap, discontinuous

細い → the thin: high monotone probability
太郎 を → Taro: high swap probability

  • Conditioning on input/output, left/right, or both

[Figure: alignment of “the thin man visited Taro” with 細い 男 が 太郎 を 訪問 した, with each transition labeled mono/disc./swap]

SLIDE 31

Decoding

  • Given the models, find the best answer (or n-best)
  • Exact search is NP-hard! [Knight 99]
  • Decoding uses beam search to find an approximate solution [Koehn 03]

Input: 太郎が花子を訪問した → Decoder (with the models) → n-best:

Taro visited Hanako 4.5
the Taro visited the Hanako 3.2
Taro met Hanako 2.4
Hanako visited Taro -2.9

SLIDE 32

Phrase-Based Decoding

  • Build translation from left to right
  • Remember which words were already translated
  • Choose translation with highest score

en: he visited the white house → ja:
en: he visited the white house → ja: 彼 は
en: he visited the white house → ja: 彼 は ホワイト ハウス を
en: he visited the white house → ja: 彼 は ホワイト ハウス を 訪問 した

SLIDE 33

Question 4:

  • How would a phrase-based machine translation system generate the translation on the worksheet?

SLIDE 34

Evaluation

  • We built a machine translation system; now we need to know:

  • How good is our system?
  • Is system A better than system B?
  • What are the problems with our system?
SLIDE 35

Human Evaluation

Source: 太郎が花子を訪問した
A: Taro visited Hanako
B: the Taro visited the Hanako
C: Hanako visited Taro

  • Adequacy: Is the meaning correct? (A ○, B ○, C ☓)
  • Fluency: Is the sentence natural? (A ○, B ☓, C ○)
  • Pairwise: Is X a better translation than Y? (A better than B, C; B better than C)

SLIDE 36

Automatic Evaluation

  • How well does the translation match a reference?
  • (or multiple references: more than one correct translation)
  • BLEU: n-gram precision, brevity penalty [Papineni 02]
  • Also METEOR (matches synonyms), TER (number of edits), RIBES (reordering)

System: the Taro visited the Hanako
Reference: Taro visited Hanako

1-gram precision: 3/5; 2-gram precision: 1/4
Brevity penalty: min(1, |System|/|Reference|) = min(1, 5/3) = 1.0
BLEU-2 = (3/5 * 1/4)^(1/2) * 1.0 = 0.387
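The BLEU-2 computation above can be sketched as follows (using the slide's simplified brevity penalty min(1, |System|/|Reference|), not the exponential penalty of the original BLEU definition):

```python
import math
from collections import Counter

def ngrams(tokens, n):
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def bleu2(system, reference):
    """Geometric mean of clipped 1- and 2-gram precision, times a
    simplified brevity penalty as shown on the slide."""
    sys_t, ref_t = system.split(), reference.split()
    precisions = []
    for n in (1, 2):
        sys_c = Counter(ngrams(sys_t, n))
        ref_c = Counter(ngrams(ref_t, n))
        # clip each n-gram count by how often it appears in the reference
        clipped = sum(min(c, ref_c[g]) for g, c in sys_c.items())
        precisions.append(clipped / max(1, len(ngrams(sys_t, n))))
    bp = min(1.0, len(sys_t) / len(ref_t))
    return bp * math.sqrt(precisions[0] * precisions[1])

print(round(bleu2("the Taro visited the Hanako", "Taro visited Hanako"), 3))  # 0.387
```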

SLIDE 37

Tuning

  • The decoder combines scores from the translation, reordering, and language models
  • If we add weights to each model, we can get better answers
  • Tuning finds these weights, e.g.: wLM=0.2, wTM=0.3, wRM=0.5

Without weights (LM, TM, RM, total):

○ Taro visited Hanako: 4, 3, 1 → 8
☓ the Taro visited the Hanako: 5, 4, 1 → 10
☓ Hanako visited Taro: 2, 3, 2 → 7

Best Score: ☓

With weights 0.2/0.3/0.5:

○ Taro visited Hanako: 0.2*4 + 0.3*3 + 0.5*1 = 2.2
☓ the Taro visited the Hanako: 0.2*5 + 0.3*4 + 0.5*1 = 2.7
☓ Hanako visited Taro: 0.2*2 + 0.3*3 + 0.5*2 = 2.3

Best Score: ○
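Reading the model scores as costs, where a lower weighted sum is better (the only reading consistent with the numbers as extracted), weighted selection can be sketched as:

```python
# (LM, TM, RM) model scores for each hypothesis, read off the slide;
# treated here as costs, so the minimum weighted sum wins
hyps = {
    "Taro visited Hanako":         (4, 3, 1),   # the correct translation
    "the Taro visited the Hanako": (5, 4, 1),
    "Hanako visited Taro":         (2, 3, 2),
}

def best(weights):
    """Pick the hypothesis with the lowest weighted score."""
    return min(hyps, key=lambda h: sum(w * s for w, s in zip(weights, hyps[h])))

print(best((1, 1, 1)))        # equal weights: totals 8, 10, 7
print(best((0.2, 0.3, 0.5)))  # tuned weights: 2.2, 2.7, 2.3
```

With equal weights the wrong hypothesis gets the best total; the tuned weights make the correct one win, which is exactly what tuning searches for.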

SLIDE 38

Tuning Methods

  • Minimum error rate training: MERT [Och 03]
  • Others: MIRA [Watanabe 07] (online update), PRO (ranking) [Hopkins 11]

[Figure: tuning loop. The dev source 太郎が花子を訪問した is decoded with the current weights and model into an n-best list (the Taro visited the Hanako / Hanako visited Taro / Taro visited Hanako / ...); comparing against the dev reference “Taro visited Hanako”, we find better weights and decode again]

SLIDE 39

Question 5:

  • Given the list of hypotheses on the worksheet, find weights that maximize the BLEU score.

SLIDE 40

Assignment

SLIDE 41

Assignment (choose one):

  • Paraphrasing Sentences:
  • a) Use Google Translate to find at least 10 sentences that are not translated properly, and guess why.
  • b) Create a strategy to paraphrase the sentences so they are easier to translate. Explain why this strategy works.
  • Manual Evaluation:
  • a) Using provided translation results, perform a manual Adequacy/Fluency evaluation; report the distribution of scores (1-5) and some examples of good/bad scores.
  • b) For 5-10 bad translations, discuss why translation failed.
  • Creating a Translation System:
  • a) Follow the steps on this page to make a machine translation system and measure the accuracy: http://www.statmt.org/moses/?n=Moses.Baseline
  • b) Find a setting of the system that changes the accuracy, and discuss its effect.

SLIDE 42

Assignment Submission

  • Use the additional materials on the web site if you choose the “Manual Evaluation” assignment.

  • Length: 500+ words plus figures/tables if necessary
  • Address: neubig@is.naist.jp
  • Deadline: 2012-10-30, 23:59