Structure and Interpretation of Neural Codes
Jacob Andreas

Translating Neuralese
Jacob Andreas, Anca Dragan and Dan Klein

Learning to Communicate
[Wagner et al. 03, Sukhbaatar et al. 16, Foerster et al. 16]
Neuralese
(example message vector: 1.0 2.3)

Translating neuralese
1.0 2.3 → "all clear"
autonomous systems [Lazaridou et al. 16]
Outline
Natural language & neuralese
Statistical machine translation
Semantic machine translation
Implementation details
Evaluation
A statistical MT problem
1.0 2.3 ↔ "all clear" [e.g. Koehn 10]
How do we induce a translation model?
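The statistical-MT framing above can be made concrete with a minimal sketch: treat co-occurring (message, utterance) pairs as parallel text and estimate a conditional phrase table by counting. The discrete message ids ("z1", "z2") are an illustrative assumption, standing in for clustered message vectors:

```python
from collections import Counter, defaultdict

def induce_translation_model(pairs):
    """Estimate p(utterance | message) from co-occurring
    (message, utterance) pairs, phrase-table style."""
    counts = defaultdict(Counter)
    for message, utterance in pairs:
        counts[message][utterance] += 1
    model = {}
    for message, utt_counts in counts.items():
        total = sum(utt_counts.values())
        model[message] = {u: c / total for u, c in utt_counts.items()}
    return model

pairs = [("z1", "all clear"), ("z1", "all clear"),
         ("z1", "go ahead"), ("z2", "stop")]
model = induce_translation_model(pairs)
print(model["z1"]["all clear"])  # 0.6666666666666666
```

As the next slides argue, co-occurrence alone is the weakness: it captures how often strings appear together, not what they mean.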
Strategy mismatch
ζ(s) = (1/Γ(s)) ∫₀^∞ x^(s−1) / (eˣ − 1) dx
Answers: "not sure", "dunno", "yes", "yes", "no", "yes"
Stat MT criterion doesn't capture meaning
moving (0,3)→(1,4) ↔ "In the intersection"
A "semantic MT" problem
"I'm going north"
The meaning of an utterance is given by its truth conditions [Davidson 67]
(loc (goal blue) north) ↔ "I'm going north"
The meaning of an utterance is given by the distribution over states in which it is uttered
(e.g. state probabilities 0.4, 0.2, 0.001) [Beltagy et al. 14; Frank et al. 09, A & Klein 16]
Representing meaning
The meaning of an utterance is given by the distribution over states in which it is uttered.
This distribution is well-defined even if the "utterance" is a vector rather than a sequence of tokens.
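A minimal sketch of this representation: estimate the meaning of each message as the empirical distribution over the states in which it was emitted. The hashable message ids are an assumption (raw message vectors would first need to be discretized), and the helper name `meaning` is illustrative:

```python
from collections import Counter, defaultdict

def meaning(rollouts):
    """Estimate the meaning of each message as the empirical
    distribution over world states in which it was emitted."""
    by_message = defaultdict(Counter)
    for state, message in rollouts:
        by_message[message][state] += 1
    return {m: {s: c / sum(sc.values()) for s, c in sc.items()}
            for m, sc in by_message.items()}

rollouts = [("intersection", "z1"), ("intersection", "z1"),
            ("goal", "z1"), ("goal", "z2")]
m = meaning(rollouts)
print(m["z1"]["intersection"])  # 0.6666666666666666
```

The same estimator applies whether the "utterance" is an English string or a neuralese vector, which is exactly what makes it usable as an interlingua.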
Translating with meaning
1.0 2.3 → "I'm going north"
Compare p(state | neuralese message) with p(state | candidate utterance).

Interlingua!
source text → meaning → target text
Translation criterion
Translate a message into the utterance whose meaning distribution is closest in KL divergence:
tr(z) = argminₑ KL( p(s | z) ‖ p(s | e) )
Computing representations
argminₑ KL( p(s | z) ‖ p(s | e) )

Computing representations: sparsity
p(s | z) and p(s | e) must be estimated from sampled rollouts.

Computing representations: smoothing
actions & messages: sampled from the agent policy, smoothed with a learned agent model
actions & messages: sampled from the human policy, smoothed with a learned human model
(example state distribution: 0.10, 0.05, 0.13, 0.08, 0.01, 0.22)

Computing KL
argminₑ KL( p(s | z) ‖ p(s | e) )

Computing KL: sampling
KL( p ‖ q ) ≈ (1/N) Σᵢ log( p(sᵢ) / q(sᵢ) ),  sᵢ ~ p
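The sampled KL estimate can be sketched in a few lines: draw states from p and average the log-ratio. Distributions are plain dicts here for illustration:

```python
import math
import random

def kl_by_sampling(p, q, n=10000, seed=0):
    """Monte Carlo estimate of KL(p || q): draw states s_i ~ p
    and average log p(s_i)/q(s_i)."""
    rng = random.Random(seed)
    states = list(p)
    weights = [p[s] for s in states]
    total = 0.0
    for _ in range(n):
        s = rng.choices(states, weights=weights)[0]
        total += math.log(p[s] / q[s])
    return total / n

p = {"a": 0.9, "b": 0.1}
q = {"a": 0.5, "b": 0.5}
print(kl_by_sampling(p, q))  # close to 0.9*ln(1.8) + 0.1*ln(0.2) ≈ 0.368
```

With a few thousand samples the estimate is typically within a few hundredths of the exact value for small state spaces like this.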
Finding translations
argminₑ KL( p(s | z) ‖ p(s | e) )

Finding translations: brute force
Score every candidate utterance and take the argmin:
going north: 0.5
crossing the intersection: 2.3
I'm done: 0.2
after you: 9.7
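The brute-force search above amounts to scoring a fixed candidate list and taking the argmin. A minimal sketch, where the belief distributions for the message and for each candidate are made-up illustrative values:

```python
import math

def discrete_kl(p, q):
    """Exact KL divergence between two distributions over the same states."""
    return sum(p[s] * math.log(p[s] / q[s]) for s in p)

def translate(message_belief, candidates):
    """Brute force: score every candidate utterance's belief
    distribution against the message's and take the argmin."""
    scores = {e: discrete_kl(message_belief, b) for e, b in candidates.items()}
    return min(scores, key=scores.get)

z = {"north": 0.9, "other": 0.1}  # belief induced by the neuralese message
candidates = {"going north": {"north": 0.8, "other": 0.2},
              "after you":   {"north": 0.2, "other": 0.8}}
print(translate(z, candidates))  # going north
```

This is tractable only because the candidate set is small; the candidates here come from a fixed inventory of human utterances.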
Referring expression games
1.0 2.3 → "with black face"

Evaluation: translator-in-the-loop
Experiment: color references
(accuracy, scale 0.50–1.00)
Neuralese → Neuralese: 0.83
Statistical MT: Neuralese → English* 0.72, English → Neuralese 0.70
Semantic MT: Neuralese → English* 0.86, English → Neuralese 0.73
Experiment: color references
magenta, hot, rose / magenta, hot, violet
pinkish, grey, dull
Experiment: image references
(accuracy, scale 50–95)
Neuralese → Neuralese: 77
Statistical MT: Neuralese → English* 57, English → Neuralese 55
Semantic MT: Neuralese → English* 75, English → Neuralese 60
large bird, black wings, black crown / small brown, light brown, dark brown
Experiment: driving game
(reward, scale 1.35–1.93)
Neuralese ↔ English*: Statistical MT 1.49, Semantic MT 1.54
(Neuralese → Neuralese shown for comparison)
How to translate
at goal / done / left to top / going in intersection / proceed / going / you first / following / going down

Conclusions so far
We can translate non-language-like things (e.g. RNN states) without logical forms, if we have access to world states.
Limitations
argminₑ KL( p(s | z) ‖ p(s | e) )
but what about compositionality?
Jacob Andreas and Dan Klein
"Flat" semantics
at goal / done / going in intersection / proceed / going / you first / following
Compositional semantics
message: "everything but the blue shapes" [FitzGerald et al. 2013]
lambda x: not(blue(x))
lambda x: or(orange(x), not(square(x)))
neuralese message: ???
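The logical forms on this slide can be executed directly against a scene. A minimal sketch, where the attribute-dict representation of objects is an illustrative assumption:

```python
# Each object in a scene is a dict of attributes.
scene = [{"color": "blue", "shape": "square"},
         {"color": "orange", "shape": "circle"},
         {"color": "blue", "shape": "circle"}]

blue = lambda x: x["color"] == "blue"
orange = lambda x: x["color"] == "orange"
square = lambda x: x["shape"] == "square"

# "everything but the blue shapes"
form = lambda x: not blue(x)
denotation = [obj for obj in scene if form(obj)]
print(denotation)  # [{'color': 'orange', 'shape': 'circle'}]

# or(orange(x), not(square(x)))
form2 = lambda x: orange(x) or not square(x)
```

A logical form's meaning is thus just the subset of the scene it picks out; the open question on the slide is whether neuralese messages denote in the same structured way.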
Model architecture
Computing meaning representations
"everything but squares" → message vector 0.5 -0.4 0.2 1.0
Meaning: a truth-value vector (✔/✘) over a fixed set of referents
lambda x: not(square(x))
Translation criterion
q(z, e): a neuralese message z and a logical form e count as translations of each other when they denote the same truth-value vector.
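This criterion can be sketched directly: compute each side's truth-value vector over a fixed evaluation set and compare. The referent set and the probed neuralese-side vector below are illustrative assumptions:

```python
def denotation_vector(holds, referents):
    """Meaning as a truth-value vector: whether the message or
    logical form holds of each referent in a fixed evaluation set."""
    return [holds(r) for r in referents]

referents = [{"shape": "square"}, {"shape": "circle"}, {"shape": "triangle"}]

# Logical form for "everything but squares": lambda x: not(square(x))
form_denotation = denotation_vector(lambda r: r["shape"] != "square", referents)

# Hypothetical truth-value vector probed from the neuralese listener.
neuralese_denotation = [False, True, True]

# q(z, e): translations iff the two truth-value vectors agree.
print(form_denotation == neuralese_denotation)  # True
```

Unlike the distribution-matching criterion from the first half of the talk, this one is exact-match over denotations, which makes the compositional structure of the messages testable.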
Experiments
"High-level" communicative behavior
"Low-level" message structure
Comparing strategies
"everything but squares" → 0.5 -0.4 0.2 1.0
Compare the truth-value vector denoted by the model's message with the human speaker's.
Theories of model behavior: random
Theories of model behavior: literal
109
0.00 0.50 1.00
50 63 27 Literal Human
Evaluation: high-level scene agreement
110
0.00 0.50 1.00
72 50 92 74 Random Literal Human
Evaluation: high-level object agreement
Collecting translation data
all the red shapes → λx.red(x)
blue objects → λx.blu(x)
everything but red → λx.¬red(x)
green squares → λx.grn(x)∧sqr(x)
not green squares → λx.¬(grn(x)∧sqr(x))

Extracting related pairs
λx.red(x): 0.1 -0.3 0.5 1.1
λx.¬red(x): 1.4 -0.3 -0.5 0.8
λx.grn(x)∧sqr(x): 0.2 -0.2 0.5 -0.1
λx.¬(grn(x)∧sqr(x)): 0.3 -1.3 -1.5 0.1
Learning compositional operators
Fit an operator f that minimizes the distance between f applied to the representation of e and the representation of ¬e, over the extracted pairs.
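One simple instantiation of such an operator (an assumption: the slide does not specify the model class) is a linear map fit by least squares to the paired message vectors from the previous slide:

```python
import numpy as np

# Paired message vectors for e (rows) and for "not e" (rows);
# values taken from the extracted-pairs table above.
e_vecs = np.array([[0.1, -0.3, 0.5, 1.1],
                   [0.2, -0.2, 0.5, -0.1]])
neg_vecs = np.array([[1.4, -0.3, -0.5, 0.8],
                     [0.3, -1.3, -1.5, 0.1]])

# Least-squares fit of a matrix N with e_vecs @ N ≈ neg_vecs;
# with two independent rows in four dimensions the fit is exact.
N, *_ = np.linalg.lstsq(e_vecs, neg_vecs, rcond=None)

predicted = e_vecs @ N  # apply the learned "negation" operator
```

With only two training pairs the system is underdetermined and `lstsq` returns the minimum-norm exact solution; the real evaluation on the next slide applies the learned operator to held-out vectors instead.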
Evaluating learned operators
λx.f(x): 0.2 -0.2 0.5 -0.1 → ???
Apply the learned operator to a held-out vector and test whether the output behaves like the negated form.
Evaluation: scene agreement for negation
(bar chart, scale 0.00–1.00; extracted values 50, 81, 12)

Visualizing negation (Input / Predicted / True)
all the toys that are not red — every thing that is red
green objects — all items that are not blue or green
Evaluation: scene agreement for disjunction
(bar chart, scale 0.00–1.00; extracted values 50, 54, 9)

Visualizing disjunction (Input / Predicted / True)
all of the red objects — the blue and red items
the blue objects — the blue and yellow items
all the yellow toys — all yellow or red items
Conclusions
We can interpret neural codes by grounding them in distributions over world states.
This exposes interpretable pragmatics & compositional structure.
It is a good general-purpose tool for interpreting deep representations.
Thank you!
http://github.com/jacobandreas/{neuralese,rnn-syn}