Sequence to Sequence Models for Machine Translation (CMSC 723 / LING 723 / INST 725)

  1. Sequence to Sequence Models for Machine Translation CMSC 723 / LING 723 / INST 725 Marine Carpuat Slides & figure credits: Graham Neubig

  2. Machine Translation • Translation system • Input: source sentence F • Output: target sentence E • 3 problems • Modeling • Translation can be viewed as a function P(E|F) • How to define P(.)? • Training/Learning • How to estimate the parameters of statistical machine translation systems from parallel corpora? • Search • How to solve the argmax efficiently?
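
As a compact summary of the three problems in equations (the parameter vector θ and the maximum-likelihood training objective are assumptions added here, not stated on the slide):

```latex
% Modeling: define a parameterized distribution over translations
P(E \mid F; \theta)
% Training/Learning: estimate \theta from a parallel corpus \{(F^{(n)}, E^{(n)})\}
\hat{\theta} = \arg\max_{\theta} \sum_{n} \log P\big(E^{(n)} \mid F^{(n)}; \theta\big)
% Search: find the highest-probability translation of a new source sentence
\hat{E} = \arg\max_{E} P\big(E \mid F; \hat{\theta}\big)
```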

  3. Introduction to Neural Machine Translation • Neural language models review • Sequence to sequence models for MT • Encoder-Decoder • Sampling and search (greedy vs beam search) • Practical tricks • Sequence to sequence models for other NLP tasks

  4. A feedforward neural 3-gram model
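
The slide content is a figure that the transcript does not reproduce. As a rough, hypothetical PyTorch sketch of what a feedforward neural 3-gram model computes (the class name, layer sizes, and tanh nonlinearity are illustrative assumptions): embed the two previous words, pass the concatenation through a hidden layer, and score every possible next word.

```python
import torch
import torch.nn as nn

class FeedforwardTrigramLM(nn.Module):
    def __init__(self, vocab_size, embed_dim=64, hidden_dim=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.hidden = nn.Linear(2 * embed_dim, hidden_dim)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, prev2, prev1):
        # prev2, prev1: LongTensors of word ids, shape (batch,)
        x = torch.cat([self.embed(prev2), self.embed(prev1)], dim=-1)
        h = torch.tanh(self.hidden(x))
        return self.out(h)  # unnormalized scores over the next word

# Example: next-word scores for a batch of two trigram contexts
model = FeedforwardTrigramLM(vocab_size=1000)
logits = model(torch.tensor([4, 7]), torch.tensor([5, 8]))
```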

  5. A recurrent language model

  6. A recurrent language model
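
Again the slide is a figure; a minimal, hypothetical PyTorch sketch of a recurrent language model (names and sizes are illustrative). Unlike the 3-gram model above, the recurrent hidden state carries the whole history, so each prediction conditions on all previous words rather than a fixed window.

```python
import torch
import torch.nn as nn

class RecurrentLM(nn.Module):
    def __init__(self, vocab_size, embed_dim=64, hidden_dim=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.rnn = nn.RNN(embed_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, words, state=None):
        # words: (batch, time) word ids; returns next-word scores per step
        h, state = self.rnn(self.embed(words), state)
        return self.out(h), state

model = RecurrentLM(vocab_size=1000)
logits, _ = model(torch.randint(0, 1000, (2, 6)))  # (2, 6, 1000)
```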

  7. Examples of RNN variants • LSTMs • Aim to address vanishing/exploding gradient issue • Stacked RNNs • …

  8. Training in practice: online

  9. Training in practice: batch

  10. Training in practice: minibatch • Compromise between online and batch • Computational advantages • Can leverage vector processing instructions in modern hardware • By processing multiple examples simultaneously
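
A minimal sketch of a minibatch training loop, assuming a generic PyTorch model and a list of (input, target) tensor pairs; the function name, batch size, and SGD learning rate are illustrative, not from the slides.

```python
import torch

def train_minibatch(model, examples, loss_fn, batch_size=32, lr=0.1):
    # Parameters are updated once per small group of examples, a compromise
    # between online (per example) and batch (per full dataset) training.
    optimizer = torch.optim.SGD(model.parameters(), lr=lr)
    for start in range(0, len(examples), batch_size):
        inputs, targets = zip(*examples[start:start + batch_size])
        # Stacking the minibatch lets vectorized hardware process all
        # examples in one pass.
        inputs = torch.stack(inputs)
        targets = torch.stack(targets)
        loss = loss_fn(model(inputs), targets)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```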

  11. Problem with minibatches: in language modeling, examples don’t have the same length • 3 tricks • Padding • Add </s> symbols to make all sentences in a minibatch the same length • Masking • Multiply the loss computed over padded symbols by zero, so padding doesn’t affect the gradient • + sort sentences by length, so sentences grouped into a minibatch need little padding
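
A minimal sketch of the padding and masking tricks, assuming sentences given as Python lists of word ids and a dedicated padding id (the slide reuses </s> as the pad symbol; a separate PAD id is used here to keep the mask simple):

```python
import torch
import torch.nn.functional as F

PAD = 0  # dedicated padding id (assumption; the slide pads with </s>)

def padded_batch(sentences, pad=PAD):
    # sentences: list of lists of word ids, ideally pre-sorted by length
    max_len = max(len(s) for s in sentences)
    return torch.tensor([s + [pad] * (max_len - len(s)) for s in sentences])

def masked_loss(logits, targets, pad=PAD):
    # logits: (batch, time, vocab); targets: (batch, time)
    loss = F.cross_entropy(
        logits.reshape(-1, logits.size(-1)),
        targets.reshape(-1),
        reduction="none",
    )
    mask = (targets.reshape(-1) != pad).float()  # zero on padded positions
    return (loss * mask).sum() / mask.sum()
```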

  12. Introduction to Neural Machine Translation • Neural language models review • Sequence to sequence models for MT • Encoder-Decoder • Sampling and search (greedy vs beam search) • Training tricks • Sequence to sequence models for other NLP tasks

  13. Encoder-decoder model

  14. Encoder-decoder model
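
The encoder-decoder figure is not reproduced in the transcript; a hypothetical PyTorch sketch of the idea (GRU cells and the layer sizes are illustrative choices): the encoder reads the source sentence F into a final hidden state, which initializes the decoder that predicts the target sentence E one word at a time.

```python
import torch
import torch.nn as nn

class EncoderDecoder(nn.Module):
    def __init__(self, src_vocab, tgt_vocab, embed_dim=64, hidden_dim=128):
        super().__init__()
        self.src_embed = nn.Embedding(src_vocab, embed_dim)
        self.tgt_embed = nn.Embedding(tgt_vocab, embed_dim)
        self.encoder = nn.GRU(embed_dim, hidden_dim, batch_first=True)
        self.decoder = nn.GRU(embed_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, tgt_vocab)

    def forward(self, src, tgt_in):
        # src: (batch, src_len); tgt_in: (batch, tgt_len) shifted targets
        _, state = self.encoder(self.src_embed(src))   # summary of F
        h, _ = self.decoder(self.tgt_embed(tgt_in), state)
        return self.out(h)  # scores for each target position
```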

  15. Generating Output • We have a model P(E|F); how can we generate translations from it? • 2 methods • Sampling: generate a random sentence according to the probability distribution • Argmax: generate the sentence with the highest probability

  16. Ancestral Sampling • Randomly generate words one by one, each drawn from the model’s distribution given the words generated so far • Stop once the end-of-sentence symbol is generated • Done!
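
A minimal sketch of ancestral sampling, assuming a hypothetical callable step_probs(prefix) that returns the model's next-word probability distribution given the words generated so far:

```python
import torch

def sample(step_probs, eos_id, max_len=100):
    """Draw words one at a time until the end-of-sentence id appears."""
    sentence = []
    for _ in range(max_len):
        probs = step_probs(sentence)                      # 1-D probabilities
        word = torch.multinomial(probs, num_samples=1).item()
        sentence.append(word)
        if word == eos_id:
            break
    return sentence
```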

  17. Greedy search • One by one, pick the single highest-probability word • Problems • Often generates easy words first • Often prefers multiple common words over rare words
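
Greedy search under the same assumed step_probs interface: keep only the single most probable word at each step instead of sampling.

```python
import torch

def greedy_search(step_probs, eos_id, max_len=100):
    """Pick the argmax word at every step until end-of-sentence."""
    sentence = []
    for _ in range(max_len):
        word = int(torch.argmax(step_probs(sentence)).item())
        sentence.append(word)
        if word == eos_id:
            break
    return sentence
```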

  18. Greedy Search Example

  19. Beam Search • Example with beam size b = 2 • We consider the top b hypotheses at each time step
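
A simplified beam-search sketch under the same assumed interface, except that step_probs now returns log-probabilities: at every time step only the b best partial hypotheses are kept, hypotheses that produce the end-of-sentence symbol are set aside as finished, and the highest-scoring hypothesis is returned.

```python
import torch

def beam_search(step_probs, eos_id, beam_size=2, max_len=100):
    """step_probs(prefix) -> 1-D tensor of next-word log-probabilities."""
    beams = [([], 0.0)]            # (prefix, cumulative log-probability)
    finished = []
    for _ in range(max_len):
        candidates = []
        for prefix, score in beams:
            logp = step_probs(prefix)
            top_logp, top_ids = torch.topk(logp, beam_size)
            for lp, wid in zip(top_logp.tolist(), top_ids.tolist()):
                candidates.append((prefix + [wid], score + lp))
        # Keep only the b best hypotheses at this time step.
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = []
        for prefix, score in candidates[:beam_size]:
            (finished if prefix[-1] == eos_id else beams).append((prefix, score))
        if not beams:
            break
    return max(finished + beams, key=lambda c: c[1])[0]
```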

  20. Introduction to Neural Machine Translation • Neural language models review • Sequence to sequence models for MT • Encoder-Decoder • Sampling and search (greedy vs beam search) • Practical tricks • Sequence to sequence models for other NLP tasks
