
Analysis and Visualization

Philipp Koehn 10 November 2020

Philipp Koehn Machine Translation: Analysis and Visualization 10 November 2020

analytical evaluation

Error Analysis

  • Manually inspect output of machine translation system
  • Identify errors and categorize them
  • Specific problems of neural machine translation

– dropped input / added output
– gibberish (the the the the)
– hallucinated output

Hallucinated Output

  • Examples of extreme translation failures

– Low-resource example
  input: Republican strategy to counter the re-election of Obama
  output: Un órgano de coordinación para el anuncio de libre determinación
  (“a coordination body for the announcement of self-determination”)

– Out-of-domain example
  input: Schaue um dich herum. (“Look around you.”)
  output: EMEA / MB / 049 / 01-EN-Final Work progamme for 2002

  • Neural MT goes off track

– turns into a generative language model
– ignores the input context

Linguistic Categories

“Error Analysis of Statistical Machine Translation Output” (Vilar et al., LREC 2006)

MQM

Bentivogli et al. (EMNLP 2016)

  • Manually corrected machine translation
  • Breakdown of word edits

– by part-of-speech tag
– by multi-word construction, e.g., AUX:V constructions such as “can eat”

  • Systems

– NMT: neural machine translation
– PBSY: phrase-based statistical
– HPB: hierarchical phrase-based statistical
– SPB: syntax-based statistical
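
This kind of breakdown is a simple tally over (edited word, POS tag) pairs. A minimal sketch with a made-up edit list standing in for the manually corrected output:

```python
from collections import Counter

# Hypothetical word edits from manually corrected MT output, each tagged
# with the part of speech of the edited word (illustrative data only).
edits = [
    ("können", "VERB"), ("Haus", "NOUN"), ("die", "DET"),
    ("essen", "VERB"), ("schnell", "ADV"), ("der", "DET"),
]

# Tally edits by part-of-speech tag
by_pos = Counter(pos for _, pos in edits)
total = sum(by_pos.values())
for pos, n in by_pos.most_common():
    print(f"{pos:5s} {n:3d} ({n / total:.0%})")
```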

targeted test sets

Challenge Set

  • Create challenging sentence pairs with specific problems
  • “A Challenge Set Approach to Evaluating Machine Translation”

(Isabelle et al., EMNLP 2017)

Challenge Set: Results

Contrastive Translation Pairs

  • Goal: find out how well certain translation problems are handled
  • Examples

– noun phrase agreement
– subject-verb agreement
– separable verb particle
– polarity (negative/positive)

  • Idea: forced decoding with contrastive translation pair

– positive example: correct translation
– negative example: translation with error

  • Check if positive example gets better score
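
The check boils down to scoring both members of each pair and counting how often the correct one wins. In this sketch, `score` is a toy stand-in for the log-probability a real NMT model would assign to the target under forced decoding:

```python
# Toy stand-in for forced-decoding log-probability from a real model.
# Here it simply penalizes the planted agreement error so the example runs.
def score(src: str, tgt: str) -> float:
    return -1.0 if "this interesting proposals" in tgt else 0.0

pairs = [
    # (source, correct translation, translation with inserted error)
    ("diese interessanten Vorschläge",
     "... these interesting proposals ...",
     "... this interesting proposals ..."),
]

# A pair counts as correct if the positive example gets the better score
correct = sum(score(src, good) > score(src, bad) for src, good, bad in pairs)
accuracy = correct / len(pairs)
print(f"contrastive accuracy: {accuracy:.1%}")
```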

Contrastive Translation Pairs

  • Noun phrase agreement

– good: ... these interesting proposals ...
– bad: ... this interesting proposals ...

  • Subject-verb agreement

– good: ... the idea to extend voting rights was ...
– bad: ... the idea to extend voting rights were ...

  • Separable verb prefix

– good: ... switch the light on ...
– bad: ... switch the light by ...

Sennrich (EACL 2017)

  • Compares neural machine translation systems for English–German
  • Varying word encoding

– byte pair encoding (BPE)
– character-based word embeddings (char)

  • Results

                     agreement                            polarity (negation)
system          noun phrase   subject-verb   verb particle   insertion   deletion
BPE-to-BPE         95.6           93.4           91.1           97.9       91.5
BPE-to-char        93.9           91.2           88.0           98.5       88.4
char-to-char       93.9           91.5           86.7           98.5       89.3
human              99.4           99.8           99.8           99.9       98.5

Synthetic Languages

  • Create artificial training examples to assess capability of systems
  • Example: bracketing language

– ( { } )
– ( { } { ( ) } )
– { ( { } ( { } ) ) ( { } ) }

  • Check ability to make correct predictions based on nesting depth, length, etc.
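
A generator for such a bracketing language, together with the stack check that supplies gold labels, might look like this (a sketch, not from the slides):

```python
import random

def bracket_string(max_depth: int, rng: random.Random) -> str:
    """Generate a well-nested string over the bracket pairs () and {}."""
    if max_depth == 0:
        return ""
    out = []
    for _ in range(rng.randint(1, 3)):           # 1-3 sibling groups
        left, right = rng.choice(["()", "{}"])
        out.append(left + bracket_string(max_depth - 1, rng) + right)
    return " ".join(out)

def is_balanced(s: str) -> bool:
    """Check nesting with a stack -- the gold label for a synthetic task."""
    stack, match = [], {")": "(", "}": "{"}
    for ch in s.replace(" ", ""):
        if ch in "({":
            stack.append(ch)
        elif not stack or stack.pop() != match[ch]:
            return False
    return not stack

rng = random.Random(0)
sample = bracket_string(3, rng)
print(sample, is_balanced(sample))
```

Varying `max_depth` and string length then lets one probe how far a trained model tracks nesting.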

visualization

Word Alignment

[Figure: soft alignment matrix between the English input “relations between Obama and Netanyahu have been strained for years .” and the German output “die Beziehungen zwischen Obama und Netanjahu sind seit Jahren angespannt .”, with attention weights shown as percentages]

Multi-Head Attention

Multi-Head Attention

“Many of the attention heads exhibit behaviour that seems related to the structure of the sentence.”

Word Embeddings

Word Sense Clusters

Input Context and Decoder State

[Figure: RNN decoder, showing how the output word prediction t_i is computed by a softmax from the embedded previous output word E y_i, the decoder state s_i, and the input context c_i]

  • Word predictions are informed by previous output (decoder state) and input
  • How much does each contribute?

Input Context vs. Decoder State

  • Input: Republican strategy to counter the re @-@ election of Obama
  • KL divergence between decoder predictions with and w/o input context
  • Input context matters more for content words
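
The comparison boils down to a KL divergence between two output distributions over the vocabulary; a minimal sketch with made-up distributions:

```python
import numpy as np

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete probability distributions."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    return float(np.sum(p * np.log((p + eps) / (q + eps))))

# Hypothetical word-prediction distributions over a 4-word vocabulary:
with_context    = np.array([0.70, 0.15, 0.10, 0.05])  # full model
without_context = np.array([0.30, 0.30, 0.25, 0.15])  # input context removed

# A large divergence indicates the input context strongly shaped the prediction.
print(kl_divergence(with_context, without_context))
```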

visualization tools

Interactive Exploration

  • Tools for inspecting behavior of models and algorithms
  • Helps to get insights
  • Examples

– “Interactive Visualization and Manipulation of Attention-based Neural Machine Translation” (Lee et al., EMNLP 2017)
– “SEQ2SEQ-VIS: A Visual Debugging Tool for Sequence-to-Sequence Models” (Strobelt et al., 2018)

Search Graph

Manipulating Search

  • Inspect attention weights
  • Change attention weights → check change in word prediction
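
A sketch of the manipulation step, assuming the usual attention-weighted context vector; the encoder states and weights below are made up:

```python
import numpy as np

rng = np.random.default_rng(0)
encoder_states = rng.normal(size=(5, 8))          # 5 input words, dimension 8
attention = np.array([0.1, 0.6, 0.1, 0.1, 0.1])   # original attention weights

def context(weights, states):
    return weights @ states                        # weighted sum of encoder states

# Manipulate: force attention onto input word 3, then renormalize
edited = attention.copy()
edited[3] = 5.0
edited /= edited.sum()

# A real tool would now rerun the prediction from the new context vector;
# here we just verify that the context vector changes.
delta = np.linalg.norm(context(edited, encoder_states) - context(attention, encoder_states))
print(delta)
```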

Predictions

  • E/D: encoder and decoder words
  • S3: attention weights
  • S4: top k predictions

Trajectory of Decoder States

Decoder State Neighborhoods

  • 2-D projections of decoder states
  • Database of decoder states in training data
  • Show neighborhood
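
Both steps (2-D projection, neighborhood lookup) can be sketched with plain NumPy; the decoder states here are random stand-ins for states recorded on training data:

```python
import numpy as np

rng = np.random.default_rng(0)
states = rng.normal(size=(100, 64))     # stand-in database of decoder states
query = states[0] + 0.01 * rng.normal(size=64)   # a new, nearby decoder state

# 2-D projection via PCA (top-2 principal components from the SVD)
centered = states - states.mean(axis=0)
_, _, vt = np.linalg.svd(centered, full_matrices=False)
points_2d = centered @ vt[:2].T         # coordinates for plotting

# Nearest neighbours of the query state in the full-dimensional space
dists = np.linalg.norm(states - query, axis=1)
neighbours = np.argsort(dists)[:5]
print(points_2d.shape, neighbours[0])
```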

Similar Decoder State

probing representations

What is in a Representation?

  • What is contained in an intermediate representation?

– word embedding
– encoder state
– decoder state

  • More specific questions

– does the model discover morphological properties?
– does the model disambiguate words?

Classifier Approach

  • Pose a hypothesis, e.g.,

Encoder states discover part-of-speech.

  • Formalize this as a classification problem

– given: encoder state for word dog
– label: singular noun (NN)

  • Train on representations generated by running inference

– translate sentences not seen during training
– record their encoder states
– look up their part-of-speech tags (by running a POS tagger or using labeled data)
→ training example (encoder state ; label)

  • Test on new sentences
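
A minimal version of such a probing classifier, with synthetic “encoder states” in place of real ones (the signal planted in dimension 0 is purely illustrative, so the probe has something to find):

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-ins for encoder states recorded while translating held-out sentences.
def fake_states(n, label):
    states = rng.normal(size=(n, 32))
    states[:, 0] += 3.0 if label == 1 else -3.0   # 1 = NN, 0 = VB (illustrative)
    return states

X_train = np.vstack([fake_states(200, 1), fake_states(200, 0)])
y_train = np.array([1] * 200 + [0] * 200)

# Linear probe trained by least squares: predict the POS label from the state
w, *_ = np.linalg.lstsq(X_train, y_train - 0.5, rcond=None)

# High accuracy on fresh states suggests the representation encodes the label
X_test = np.vstack([fake_states(50, 1), fake_states(50, 0)])
y_test = np.array([1] * 50 + [0] * 50)
accuracy = ((X_test @ w > 0) == (y_test == 1)).mean()
print(accuracy)
```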

“Evaluating Layers of Representation in Neural Machine Translation on Part-of-Speech and Semantic Tagging Tasks” (Belinkov et al., ACL 2017)

Shi et al. (EMNLP 2016)

  • LSTM sequence-to-sequence model without attention
  • Different tasks

– translate English into Russian, German
– copy English
– copy permuted English
– parse English into linearized parse structure

  • Predict

– constituent phrase (NP, VP, etc.)
– passive voice and tense

  • Findings

– much better quality when translating than majority class
– same quality for copying as majority class

Belinkov et al. (EMNLP 2017)

  • Attentional neural machine translation model
  • Predict

– part-of-speech tag
– semantic tag
  ∗ type of named entity
  ∗ semantic relationships
  ∗ discourse relationships

  • Findings

– compare prediction quality of different encoder layers
– mostly better performance at deeper layers
– little impact from target language

Belinkov et al. (ACL 2017)

  • Attentional neural machine translation model with character-based word embeddings

  • Predict for morphologically rich input languages

– part-of-speech tag
– morphological properties

  • Findings

– character-based representations much better for learning morphology
– word-based models are sufficient for learning structure of common words
– lower layers better at word structure, deeper layers better at word meaning
– target language matters for what information is learned
– neural decoder learns very little about word structure

relevance propagation

What Determined Output Decision?

  • What part of the network had the biggest impact on final decision?
  • For instance machine translation:

– prediction of a specific output word
– which of the input words mattered most?
– which of the previous output words mattered most?

Layer-Wise Relevance Propagation

  • Start with output prediction

i.e., high value for word in softmax

  • Compute backwards what contributed to this high value
  • First step

– consider values of the previous layer
– consider weights from the previous layer
– assign a relevance value to each node in the previous layer
– normalize so the values add up to one

  • Recurse until input layer is reached
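
One common formalization of this step is the epsilon rule: each node receives relevance in proportion to its contribution to the nodes it feeds. The sketch below redistributes relevance across a single linear layer (activations, weights, and relevance values are made up):

```python
import numpy as np

def lrp_step(activations, weights, relevance, eps=1e-9):
    """
    Redistribute relevance from one layer to the previous one:
    node i receives relevance in proportion to its contribution a_i * w_ij
    to every node j it feeds into (epsilon rule, one LRP variant).
    """
    z = activations[:, None] * weights            # contributions a_i * w_ij
    denom = z.sum(axis=0) + eps                   # total input to each node j
    prev = (z / denom) @ relevance                # back-distribute each R_j
    return prev / prev.sum()                      # normalize to sum to one

a = np.array([0.5, 1.0, 0.2])                     # previous-layer activations
W = np.array([[0.3, 0.1],                         # weights (3 inputs, 2 outputs)
              [0.2, 0.7],
              [0.5, 0.4]])
R = np.array([0.8, 0.2])                          # relevance at current layer
print(lrp_step(a, W, R))
```

Recursing this step layer by layer carries the relevance back to the input.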

Example: Chinese–English

“Visualizing and Understanding Neural Machine Translation” (Ding, Liu, Luan, and Sun, ACL 2017)

saliency

Saliency

  • Intuition

– if the decision changes a lot when a specific input value changes
  ⇓ more relevant
– if a change in the input value has no impact on the decision
  ⇓ not relevant

  • Mathematically

– relationship p(y0|x0) between an input value x0 and an output value y0
– assume this to be a linear relationship (which is approximately true locally)
– compute the slope by taking the derivative:
      saliency(x0, y0) = ∂p(y0|x0) / ∂x0
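
The derivative can be checked numerically on a toy softmax model; `W`, `x`, and the target word index `y` below are arbitrary stand-ins, not from any real system:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

# Toy "network": p(y | x) = softmax(W x); the saliency of input dimension k
# for output word y is the partial derivative of p(y | x) w.r.t. x_k.
rng = np.random.default_rng(0)
W = rng.normal(size=(4, 3))      # 4 output words, 3 input dimensions
x = rng.normal(size=3)
y = 2                            # output word whose decision we explain

def p_y(x):
    return softmax(W @ x)[y]

# Finite-difference approximation of the local slope (the saliency)
h = 1e-6
saliency = np.array([(p_y(x + h * np.eye(3)[k]) - p_y(x - h * np.eye(3)[k])) / (2 * h)
                     for k in range(3)])
print(saliency)
```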

Example: Word Alignment

  • Which input word had the most influence on an output word prediction?

⇒ Trace back to word embeddings

  • Note

– not interested in individual neurons
– combine saliency values in the embedding vector

Saliency

[Figure: word alignment panels comparing a human reference against attention- and saliency-based alignments]

What Do Interpretability Measures Reveal?

  • How do we know whether these methods are doing the right thing?

what a model should be doing = what a model is doing

  • Also: does the impact of an input word equal word alignment?

– bank is most responsible for producing the German translation Bank
– but credit or account may be crucial for word sense disambiguation
– other words may provide clues that the word is a noun (not a verb)

Explainable AI

  • Important question for users

Why did the network reach this decision?

  • Tracing back decisions to inputs

⇒ Causal explanation

identifying neurons

Visualizing Individual Cells

Karpathy et al. (2015): “Visualizing and Understanding Recurrent Networks”

Visualizing Individual Cells

Identifying Neurons

  • How are specific properties encoded?
  • Easiest case: in a single neuron
  • How do we find it?
  • Example: length of sequence

– given: encoder-decoder model without attention
– does the encoder record the length of the consumed sequence?
– does the decoder record the length of the generated sequence?

Correlation

  • Select a neuron
  • Compute correlation

– value of the neuron when processing the xth word
– the position x

  • Success if highly correlated neuron found
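
The correlation search itself is a few lines of NumPy; here a position signal is planted in neuron 5 of synthetic hidden states so the search has something to find:

```python
import numpy as np

rng = np.random.default_rng(0)
seq_len, hidden = 40, 16

# Stand-in hidden states: one neuron (index 5) secretly tracks position
states = rng.normal(size=(seq_len, hidden))
states[:, 5] = 0.1 * np.arange(seq_len) + 0.05 * rng.normal(size=seq_len)

# Pearson correlation between each neuron's activation and the position x
positions = np.arange(seq_len)
correlations = np.array([np.corrcoef(states[:, i], positions)[0, 1]
                         for i in range(hidden)])

best = int(np.argmax(np.abs(correlations)))
print(best, round(correlations[best], 3))
```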

Neurons Correlated with Length

“Why neural translations are the right length” (Shi, Knight, Yuret, EMNLP 2016)

questions?
