Machine Translation: Analysis and Visualization
Philipp Koehn
10 November 2020
– dropped input / added output
– gibberish (the the the the)
– hallucinated output
– low-resource example: Republican strategy to counter the re-election of Obama
– out-of-domain example: Schaue um dich herum. ("Look around you.") → EMEA / MB / 049 / 01-EN-Final Work programme for 2002
– turns into generative language model
– ignores input context
“Error Analysis of Statistical Machine Translation Output” (Vilar et al., LREC 2006)
– by part-of-speech tag
– multi-word construction, e.g., AUX:V constructions such as can eat
– NMT: neural machine translation
– PBSY: phrase-based statistical
– HPB: hierarchical phrase-based statistical
– SPB: syntax-based statistical
(Isabelle et al., EMNLP 2017)
– noun phrase agreement
– subject-verb agreement
– separable verb particle
– polarity (negative/positive)
– positive example: correct translation
– negative example: translation with error
– good: ... these interesting proposals ...
– bad: ... this interesting proposals ...

– good: ... the idea to extend voting rights was ...
– bad: ... the idea to extend voting rights were ...

– good: ... switch the light on ...
– bad: ... switch the light by ...
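Scoring such contrastive pairs can be sketched as follows: a model "passes" a test pair if it assigns a higher score to the correct translation than to the flawed one. The toy scorer below is purely illustrative; a real evaluation would use the model's log-probability for each sentence.

```python
# Contrastive evaluation sketch: accuracy = fraction of pairs where the
# model prefers the correct translation over the one with an error.
def contrastive_accuracy(pairs, score):
    """pairs: list of (good_translation, bad_translation) strings."""
    correct = sum(1 for good, bad in pairs if score(good) > score(bad))
    return correct / len(pairs)

# Toy stand-in scorer (hypothetical): penalizes a known agreement error.
def toy_score(sentence):
    return -sentence.count("this interesting proposals")

pairs = [("... these interesting proposals ...",
          "... this interesting proposals ...")]
print(contrastive_accuracy(pairs, toy_score))  # 1.0
```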
– byte pair encoding (BPE) – character-based word embeddings (char)
system         NP agreement   subj-verb agreement   verb particle   negation insertion   negation deletion
BPE-to-BPE         95.6             93.4                91.1              97.9                 91.5
BPE-to-char        93.9             91.2                88.0              98.5                 88.4
char-to-char       93.9             91.5                86.7              98.5                 89.3
human              99.4             99.8                99.8              99.9                 98.5
– ( { } ) – ( { } { ( ) } ) – { ( { } ( { } ) ) ( { } ) }
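These bracket sequences are balanced. Checking balancedness is a simple stack exercise; the sketch below makes the synthetic task concrete (a model that has learned the structure should distinguish balanced from unbalanced sequences).

```python
# Balanced-bracket check: push openers, pop and match on closers.
def is_balanced(s):
    closers = {')': '(', '}': '{'}
    stack = []
    for ch in s:
        if ch in '({':
            stack.append(ch)
        elif ch in closers:
            if not stack or stack.pop() != closers[ch]:
                return False
    return not stack  # any leftover opener means unbalanced

for example in ["( { } )", "( { } { ( ) } )", "{ ( { } ( { } ) ) ( { } ) }"]:
    print(example, is_balanced(example))  # all True
```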
[Figure: attention weights (in percent) between the English input "relations between Obama and Netanyahu have been strained for years ." and the German output "die Beziehungen zwischen Obama und Netanjahu sind seit Jahren angespannt ."]
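Such an attention matrix can be rendered as a simple text heatmap, one row per target word, one column per source word. The tokens and weights below are illustrative stand-ins, not the values from the figure.

```python
# Render attention weights as a text table: rows = target tokens,
# columns = source tokens, cells = attention weight as a percentage.
def format_attention(src, tgt, weights):
    lines = []
    for t, row in zip(tgt, weights):
        cells = " ".join(f"{w:4.0%}" for w in row)
        lines.append(f"{t:>12} {cells}")
    return "\n".join(lines)

src = ["relations", "between", "Obama"]          # illustrative tokens
tgt = ["die", "Beziehungen", "zwischen"]
weights = [[0.10, 0.05, 0.02],                    # illustrative weights
           [0.85, 0.05, 0.03],
           [0.05, 0.90, 0.02]]
print(format_attention(src, tgt, weights))
```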
“Many of the attention heads exhibit behaviour that seems related to the structure of the sentence.” (Vaswani et al., 2017)
[Figure: RNN decoder architecture — the decoder state sᵢ combines the input context cᵢ and the embedding of the previous output word (E yᵢ; initially <s>, then e.g. das); a softmax over the state yields the output word prediction tᵢ, from which the output word yᵢ is chosen.]
– “Interactive Visualization and Manipulation of Attention-based Neural Machine Translation” (Lee et al., EMNLP 2017) – “SEQ2SEQ-VIS : A Visual Debugging Tool for Sequence-to-Sequence Models” (Strobelt et al., 2018)
– word embedding
– encoder state
– decoder state
– does the model discover morphological properties?
– does the model disambiguate words?
Encoder states discover part-of-speech.
– given: encoder state for word dog
– label: singular noun (NN)
– translate sentences not seen during training
– record their encoder states
– look up their part-of-speech tags (running a POS tagger or using labeled data)
→ training example (encoder state ; label)
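The probing recipe above can be sketched with a simple stand-in classifier. The 2-dimensional "encoder states" and labels below are toy data; a real probe would use a logistic-regression classifier on the actual high-dimensional states. Here a nearest-centroid classifier stands in for it.

```python
# Probing sketch: if a simple classifier can predict a POS label from the
# encoder state alone, the state encodes that property.
def train_probe(examples):
    """examples: list of (state_vector, label). Returns label -> centroid."""
    sums, counts = {}, {}
    for vec, label in examples:
        if label not in sums:
            sums[label] = list(vec)
            counts[label] = 1
        else:
            sums[label] = [a + b for a, b in zip(sums[label], vec)]
            counts[label] += 1
    return {lab: [x / counts[lab] for x in s] for lab, s in sums.items()}

def predict(centroids, vec):
    """Assign the label whose centroid is closest to the state vector."""
    def dist(lab):
        return sum((a - b) ** 2 for a, b in zip(centroids[lab], vec))
    return min(centroids, key=dist)

# Toy training examples (hypothetical encoder states).
train = [([0.9, 0.1], "NN"), ([0.8, 0.2], "NN"), ([0.1, 0.9], "VB")]
probe = train_probe(train)
print(predict(probe, [0.85, 0.15]))  # NN
```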
“Evaluating Layers of Representation in Neural Machine Translation on Part-of-Speech and Semantic Tagging Tasks” (Belinkov et al., ACL 2017)
– translate English into Russian, German
– copy English
– copy permuted English
– parse English into linearized parse structure
– constituent phrase (NP, VP, etc.)
– passive voice and tense
– much better prediction quality than the majority-class baseline when translating
– copying yields no better quality than the majority-class baseline
– part-of-speech tag
– semantic tag
  ∗ type of named entity
  ∗ semantic relationships
  ∗ discourse relationships
– compare prediction quality of different encoder layers
– mostly better performance at deeper layers
– little impact from target language
embeddings
– part-of-speech tag
– morphological properties
– character-based representations much better for learning morphology
– word-based models are sufficient for learning structure of common words
– lower layers better at word structure, deeper layers better at word meaning
– target language matters for what information is learned
– neural decoder learns very little about word structure
– prediction of a specific output word
– which of the input words mattered most?
– which of the previous output words mattered most?
i.e., high value for word in softmax
– consider values of previous layer
– consider weights from previous layer
– assign relevance values for each node in previous layer
– normalize so they add up to one
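The back-propagation of relevance described above can be sketched for a single node: its relevance is redistributed over the previous layer in proportion to each input's contribution (value × weight), then normalized to sum to one. This is a minimal illustration of the idea, not a full layer-wise relevance propagation implementation.

```python
# One step of relevance redistribution: a node's relevance flows back to
# the previous layer in proportion to each input's contribution.
def propagate_relevance(relevance, values, weights):
    """relevance: scalar relevance of one node.
    values:  activations of the previous layer.
    weights: connection weights from those activations into the node."""
    contribs = [v * w for v, w in zip(values, weights)]
    total = sum(abs(c) for c in contribs) or 1.0   # avoid division by zero
    return [relevance * abs(c) / total for c in contribs]

# Toy values: the first input contributes most, the third not at all.
r = propagate_relevance(1.0, [0.5, 0.2, 0.3], [2.0, 1.0, 0.0])
print(r)  # relevance shares sum to 1.0
```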
“Visualizing and Understanding Neural Machine Translation” (Ding, Liu, Luan and Sun, ACL 2017)
– if a decision changes a lot when a specific input value changes
  ⇒ more relevant
– if a change in the input value has no impact on the decision
  ⇒ not relevant
– relationship p(y₀|x₀) between an input value x₀ and an output value y₀
– assume this to be a linear relationship (which is approximately true locally)
– compute slope by derivative: saliency(x, y) = ∂p(y|x) / ∂x
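A numerical sketch of this gradient-based saliency: the derivative is approximated by a finite difference (the slope of the locally linear relationship), using a toy stand-in for the model's output probability. In practice the gradient comes from back-propagation rather than finite differences.

```python
# Saliency via finite differences: perturb one input value and measure
# how much the output probability changes.
def saliency(p, x, i, eps=1e-5):
    """p: model output as a function of the input vector x.
    i: index of the input value to perturb."""
    x_hi = list(x); x_hi[i] += eps
    x_lo = list(x); x_lo[i] -= eps
    return (p(x_hi) - p(x_lo)) / (2 * eps)

# Toy "model": the output depends strongly on x[0], not at all on x[1].
p = lambda x: 0.9 * x[0] + 0.0 * x[1]
print(saliency(p, [1.0, 1.0], 0))  # ~0.9 (highly salient input)
print(saliency(p, [1.0, 1.0], 1))  # ~0.0 (irrelevant input)
```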
⇒ Trace back to word embeddings
– not interested in individual neurons
– combine saliency values within the embedding vector
[Figure: word alignment heatmaps compared — human reference vs. attention vs. saliency]
what a model should be doing vs. what a model is doing
– bank most responsible to produce German translation Bank
– credit or account may be crucial for word sense disambiguation
– other words may provide clues that word is a noun (not a verb)
Why did the network reach this decision?
⇒ Causal explanation
“Visualizing and Understanding Recurrent Networks” (Karpathy et al., 2015)
– given: encoder-decoder model without attention – does the encoder record the length of the consumed sequence? – does the decoder record the length of the generated sequence?
– value of neuron when processing xth word
– position x
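Plotting the neuron's value against the position amounts to checking their correlation. A minimal sketch with hypothetical activations: a neuron that counts sequence length shows (near-)perfect correlation with the position index.

```python
# Pearson correlation between a neuron's activations and the position index.
def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    vx = sum((a - mx) ** 2 for a in xs) ** 0.5
    vy = sum((b - my) ** 2 for b in ys) ** 0.5
    return cov / (vx * vy)

positions = [1, 2, 3, 4, 5]
neuron = [-0.1, -0.2, -0.3, -0.4, -0.5]   # hypothetical activations
print(pearson(positions, neuron))  # -1.0: a perfect (negative) length counter
```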
“Why neural translations are the right length” (Shi, Knight, Yuret, EMNLP 2016)