

  1. Natural Language Processing with Deep Learning CS224N/Ling284 Lecture 8: Machine Translation, Sequence-to-sequence and Attention Abigail See

  2. Announcements • We are taking attendance today • Sign in with the TAs outside the auditorium • No need to get up now – there will be plenty of time to sign in after the lecture ends • For attendance policy special cases, see Piazza post for clarification • Assignment 4 content covered today • Get started early! The model takes 4 hours to train! • Mid-quarter feedback survey: • Will be sent out sometime in the next few days (watch Piazza). • Complete it for 0.5% credit 2

  3. Overview Today we will: • Introduce a new task: Machine Translation • Introduce a new neural architecture: sequence-to-sequence • Introduce a new neural technique: attention (Machine Translation is a major use-case of sequence-to-sequence, which is improved by attention) 3

  4. Section 1: Pre-Neural Machine Translation 4

  5. Machine Translation Machine Translation (MT) is the task of translating a sentence x from one language (the source language) to a sentence y in another language (the target language). x: L'homme est né libre, et partout il est dans les fers y: Man is born free, but everywhere he is in chains - Rousseau 5

  6. 1950s: Early Machine Translation Machine Translation research began in the early 1950s. • Russian → English (motivated by the Cold War!) 1 minute video showing 1954 MT: https://youtu.be/K-HfpsHPmvw • Systems were mostly rule-based, using a bilingual dictionary to map Russian words to their English counterparts 6

  7. 1990s-2010s: Statistical Machine Translation • Core idea: Learn a probabilistic model from data • Suppose we’re translating French → English. • We want to find the best English sentence y, given French sentence x: argmax_y P(y|x) • Use Bayes Rule to break this down into two components to be learnt separately: argmax_y P(x|y) P(y) • Translation Model P(x|y): models how words and phrases should be translated (fidelity); learnt from parallel data. • Language Model P(y): models how to write good English (fluency); learnt from monolingual data. 7
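
To make the noisy-channel decomposition concrete, here is a minimal, hypothetical sketch (not from the lecture); `translation_logprob` and `language_model_logprob` are stand-ins for a translation model learnt from parallel data and a language model learnt from monolingual data:

```python
def score_candidate(x_french, y_english, translation_logprob, language_model_logprob):
    """Noisy-channel score: log P(x|y) + log P(y).

    translation_logprob(x, y)  -> log P(x | y)  (fidelity, from parallel data)
    language_model_logprob(y)  -> log P(y)      (fluency, from monolingual data)
    """
    return translation_logprob(x_french, y_english) + language_model_logprob(y_english)


def best_translation(x_french, candidates, translation_logprob, language_model_logprob):
    """argmax_y P(x|y) P(y) over a small, enumerable candidate set. Real SMT systems
    cannot enumerate all y and use heuristic search instead (see the decoding slides)."""
    return max(candidates,
               key=lambda y: score_candidate(x_french, y,
                                             translation_logprob, language_model_logprob))
```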

  8. 1990s-2010s: Statistical Machine Translation • Question: How to learn the translation model P(x|y)? • First, need a large amount of parallel data (e.g. pairs of human-translated French/English sentences) [Image: the Rosetta Stone, which carries the same text in Ancient Egyptian hieroglyphs, Demotic, and Ancient Greek] 8

  9. Learning alignment for SMT • Question: How to learn the translation model P(x|y) from the parallel corpus? • Break it down further: we actually want to consider P(x, a|y), where a is the alignment, i.e. the word-level correspondence between French sentence x and English sentence y 9

  10. What is alignment? Alignment is the correspondence between particular words in the translated sentence pair. • Note: Some words have no counterpart (a spurious word) [Figure: word alignment between the French sentence “Le Japon secoué par deux nouveaux séismes” and the English sentence “Japan shaken by two new quakes”; “Le” has no English counterpart] 10 Examples from: “The Mathematics of Statistical Machine Translation: Parameter Estimation”, Brown et al., 1993. http://www.aclweb.org/anthology/J93-2003

  11. Alignment is complex Alignment can be many-to-one [Figure: alignment between the French sentence “Le reste appartenait aux autochtones” and the English sentence “The balance was the territory of the aboriginal people”, showing many-to-one alignments] 11 Examples from: “The Mathematics of Statistical Machine Translation: Parameter Estimation”, Brown et al., 1993. http://www.aclweb.org/anthology/J93-2003

  12. Alignment is complex Alignment can be one-to-many [Figure: alignment between the French sentence “Le programme a été mis en application” and the English sentence “And the program has been implemented”; “And” is a zero-fertility word (not translated), while “implemented” aligns to “mis en application”, a one-to-many alignment. We call such a word a fertile word] 12 Examples from: “The Mathematics of Statistical Machine Translation: Parameter Estimation”, Brown et al., 1993. http://www.aclweb.org/anthology/J93-2003

  13. Alignment is complex Some words are very fertile! [Figure: alignment between the French sentence “il a m’entarté” and the English sentence “he hit me with a pie”; “entarté” aligns to “hit … with a pie” and has no single-word equivalent in English] 13

  14. Alignment is complex Alignment can be many-to-many (phrase-level) [Figure: alignment between the French sentence “Les pauvres sont démunis” and the English sentence “The poor don’t have any money”; “sont démunis” aligns to “don’t have any money”, a many-to-many (phrase) alignment] 14 Examples from: “The Mathematics of Statistical Machine Translation: Parameter Estimation”, Brown et al., 1993. http://www.aclweb.org/anthology/J93-2003

  15. Learning alignment for SMT • We learn P(x, a|y) as a combination of many factors, including: • Probability of particular words aligning (also depends on position in the sentence) • Probability of particular words having particular fertility (number of corresponding words) • etc. 15
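
As an illustration only (the lecture does not spell out a particular model here), an IBM Model 1-style factorization scores P(x, a|y) as a product of per-word translation probabilities; the uniform alignment prior and the `t_table` lookup below are assumptions of this sketch:

```python
import math

def alignment_logprob(x_words, y_words, a, t_table):
    """Log P(x, a | y) under an IBM Model 1-style factorization.

    x_words: source-language words (e.g. French)
    y_words: target-language words (e.g. English), with y_words[0] = "<NULL>"
             so spurious source words can align to nothing
    a:       alignment; a[j] is the index in y_words that x_words[j] aligns to
    t_table: dict mapping (source_word, target_word) -> t(source | target)
    """
    # Uniform prior over alignments: each source word independently picks one target slot
    logp = -len(x_words) * math.log(len(y_words))
    for j, x_j in enumerate(x_words):
        logp += math.log(t_table.get((x_j, y_words[a[j]]), 1e-12))
    return logp
```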

  16. Decoding for SMT • Question: How to compute the argmax in argmax_y P(x|y) P(y) (Translation Model × Language Model)? • We could enumerate every possible y and calculate the probability? → Too expensive! • Answer: Use a heuristic search algorithm to search for the best translation, discarding hypotheses that are too low-probability • This process is called decoding 16
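
A minimal, generic sketch of this kind of heuristic search (a simple beam search; the hypothesis representation and the helper functions `expand`, `score`, and `is_complete` are assumptions, not from the lecture):

```python
def beam_search(start, expand, score, is_complete, beam_size=5, max_steps=50):
    """Keep only the beam_size highest-scoring partial translations at each step,
    discarding hypotheses that are too low-probability.

    expand(h)      -> list of hypotheses extending h (e.g. by translating one more phrase)
    score(h)       -> log-probability of hypothesis h
    is_complete(h) -> True once h covers the whole source sentence
    """
    beam, completed = [start], []
    for _ in range(max_steps):
        candidates = []
        for h in beam:
            for h_new in expand(h):
                (completed if is_complete(h_new) else candidates).append(h_new)
        if not candidates:
            break
        beam = sorted(candidates, key=score, reverse=True)[:beam_size]
    return max(completed or beam, key=score)
```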

  17. Decoding for SMT Source: "Statistical Machine Translation", Chapter 6, Koehn, 2009. https://www.cambridge.org/core/books/statistical-machine-translation/94EADF9F680558E13BE759997553CDE5 17

  18. Decoding for SMT Source: "Statistical Machine Translation", Chapter 6, Koehn, 2009. https://www.cambridge.org/core/books/statistical-machine-translation/94EADF9F680558E13BE759997553CDE5 18

  19. 1990s-2010s: Statistical Machine Translation • SMT was a huge research field • The best systems were extremely complex • Hundreds of important details we haven’t mentioned here • Systems had many separately-designed subcomponents • Lots of feature engineering • Need to design features to capture particular language phenomena • Require compiling and maintaining extra resources • Like tables of equivalent phrases • Lots of human effort to maintain • Repeated effort for each language pair! 19

  20. Section 2: Neural Machine Translation 20

  21. 2014 (dramatic reenactment) 21

  22. 2014 (dramatic reenactment) 22

  23. What is Neural Machine Translation? • Neural Machine Translation (NMT) is a way to do Machine Translation with a single neural network • The neural network architecture is called sequence-to-sequence (aka seq2seq) and it involves two RNNs. 23

  24. Neural Machine Translation (NMT) The sequence-to-sequence model [Diagram: an Encoder RNN reads the source sentence (input) “il a m’entarté” and produces an encoding of the source sentence; this encoding provides the initial hidden state for the Decoder RNN. The Decoder RNN is a Language Model that generates the target sentence (output) “he hit me with a pie <END>”, conditioned on the encoding; starting from <START>, it takes an argmax over the vocabulary at each step. Note: the diagram shows test-time behavior: the decoder output is fed in as the next step’s input] 24
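
A minimal PyTorch sketch of this two-RNN architecture (an illustration, not the assignment’s model; the LSTM choice, dimensions, and batch-first layout are assumptions):

```python
import torch
import torch.nn as nn

class Seq2Seq(nn.Module):
    """Encoder RNN encodes the source; its final hidden state initializes the
    Decoder RNN, which acts as a conditional language model over the target."""

    def __init__(self, src_vocab, tgt_vocab, emb_dim=256, hidden_dim=512):
        super().__init__()
        self.src_emb = nn.Embedding(src_vocab, emb_dim)
        self.tgt_emb = nn.Embedding(tgt_vocab, emb_dim)
        self.encoder = nn.LSTM(emb_dim, hidden_dim, batch_first=True)
        self.decoder = nn.LSTM(emb_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, tgt_vocab)

    def forward(self, src_ids, tgt_in_ids):
        # Encoder RNN produces an encoding of the source sentence
        _, (h, c) = self.encoder(self.src_emb(src_ids))
        # Decoder RNN generates the target sentence, conditioned on that encoding
        dec_out, _ = self.decoder(self.tgt_emb(tgt_in_ids), (h, c))
        return self.out(dec_out)  # (batch, tgt_len, tgt_vocab) logits over next words
```

At test time one would feed <START>, take an argmax over the output logits, and feed the chosen word back in as the next decoder input, as in the diagram above.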

  25. Sequence-to-sequence is versatile! • Sequence-to-sequence is useful for more than just MT • Many NLP tasks can be phrased as sequence-to-sequence: • Summarization (long text → short text) • Dialogue (previous utterances → next utterance) • Parsing (input text → output parse as sequence) • Code generation (natural language → Python code) 25

  26. Neural Machine Translation (NMT) • The sequence-to-sequence model is an example of a Conditional Language Model . • Language Model because the decoder is predicting the next word of the target sentence y • Conditional because its predictions are also conditioned on the source sentence x • NMT directly calculates P(y|x) as a product of terms P(y_t | y_1, …, y_{t-1}, x): the probability of the next target word, given the target words so far and the source sentence x • Question : How to train an NMT system? • Answer : Get a big parallel corpus… 26
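
Written out (the standard chain-rule factorization, reconstructed here rather than copied verbatim from the slide):

```latex
P(y \mid x) = \prod_{t=1}^{T} P(y_t \mid y_1, \ldots, y_{t-1}, x)
```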

  27. Training a Neural Machine Translation system [Diagram: the encoder RNN reads the source sentence (from corpus) “il a m’entarté”; the decoder RNN, fed “<START> he hit me with a pie”, produces predicted probability distributions ŷ_1, …, ŷ_7 over the target sentence (from corpus) “he hit me with a pie <END>”.] The loss is J = (1/T) Σ_{t=1}^{T} J_t = J_1 + J_2 + … + J_7, where J_t is the negative log probability of the t-th true target word (e.g. J_1 = negative log prob of “he”, J_4 = negative log prob of “with”, J_7 = negative log prob of <END>). Seq2seq is optimized as a single system. Backpropagation operates “end-to-end”. 27
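
A minimal sketch of this loss with teacher forcing, written against the hypothetical `Seq2Seq` model sketched earlier (the padding convention and model interface are assumptions):

```python
import torch.nn.functional as F

def seq2seq_loss(model, src_ids, tgt_ids, pad_id=0):
    """Mean per-word negative log-probability J = (1/T) * sum_t J_t.

    tgt_ids holds "<START> w_1 ... w_T <END>"; the decoder is fed the gold prefix
    (teacher forcing) and is trained to predict the next gold word at every step.
    """
    tgt_in, tgt_gold = tgt_ids[:, :-1], tgt_ids[:, 1:]
    logits = model(src_ids, tgt_in)                      # (batch, T, vocab)
    return F.cross_entropy(logits.reshape(-1, logits.size(-1)),
                           tgt_gold.reshape(-1),
                           ignore_index=pad_id)          # averages -log P(w_t) over words
```

Backpropagating this single scalar through both RNNs is what makes seq2seq training end-to-end.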
