Structure and Interpretation of Neural Codes, Jacob Andreas (PowerPoint PPT Presentation)



SLIDE 1

Structure and Interpretation of Neural Codes

Jacob Andreas

SLIDE 2

Translating Neuralese

Jacob Andreas, Anca Dragan and Dan Klein

SLIDE 3

Learning to Communicate

[Wagner et al. 03, Sukhbaatar et al. 16, Foerster et al. 16]

SLIDE 4

Learning to Communicate

SLIDE 5

Neuralese

[1.0 2.3 0.3 0.4 1.2 1.1]
SLIDE 6

Translating neuralese

[1.0 2.3 0.3 0.4 1.2 1.1] → “all clear”

SLIDE 7

Translating neuralese

[1.0 2.3 0.3 0.4 1.2 1.1] → “all clear”

  • Interoperate with autonomous systems
  • Diagnose errors
  • Learn from solutions

[Lazaridou et al. 16]

SLIDE 8

Outline

  • Natural language & neuralese
  • Statistical machine translation
  • Semantic machine translation
  • Implementation details
  • Evaluation

SLIDE 14

A statistical MT problem

x* = argmax_x p(z | x) p(x)

z: [1.0 2.3 0.3 0.4 1.2 1.1]    x: “all clear”

[e.g. Koehn 10]

SLIDE 15

A statistical MT problem

How do we induce a translation model?

SLIDE 16

A statistical MT problem

x* = argmax_x p(z | x) p(x)

x* = argmax_x Σ_s p(z | s) p(x | s) p(s)

(With no parallel corpus, neuralese and English are aligned by marginalizing over the shared world states s in which each is produced.)

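A minimal sketch of this marginalized criterion: score each candidate English string x by how often it co-occurs with the neuralese message z in the same world state s. All states, strings, and probabilities below are toy stand-ins invented for illustration, not the talk's models.

```python
# Marginalized statistical MT criterion: x* = argmax_x sum_s p(z|s) p(x|s) p(s).
# Everything here is a toy stand-in for illustration.

STATES = ["clear", "blocked"]
p_state = {"clear": 0.7, "blocked": 0.3}

# p(neuralese message z | state s), for one fixed message z.
p_z_given_s = {"clear": 0.9, "blocked": 0.1}

# p(English string x | state s), for two candidate translations.
p_x_given_s = {
    "all clear": {"clear": 0.8, "blocked": 0.05},
    "stop":      {"clear": 0.1, "blocked": 0.9},
}

def score(x):
    """Marginal co-occurrence score of candidate x for message z."""
    return sum(p_z_given_s[s] * p_x_given_s[x][s] * p_state[s] for s in STATES)

best = max(p_x_given_s, key=score)
print(best)  # → all clear
```

The candidate that tends to be uttered in the same states as z wins, even though z and x never appear in a parallel corpus.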
SLIDE 17

Strategy mismatch

ζ(s) = (1/Γ(s)) ∫₀^∞ xˢ/(eˣ − 1) · dx/x

SLIDE 18

Strategy mismatch

“not sure”

ζ(s) = (1/Γ(s)) ∫₀^∞ xˢ/(eˣ − 1) · dx/x

SLIDE 21

Strategy mismatch

not sure / dunno / yes / yes / no / yes

SLIDE 22

Strategy mismatch

not sure → yes

Σ p(a, s | “not sure”) p(“not sure”)

SLIDE 23

Stat MT criterion doesn’t capture meaning

moving (0,3) → (1,4): “In the intersection”

SLIDE 24

Outline

  • Natural language & neuralese
  • Statistical machine translation
  • Semantic machine translation
  • Implementation details
  • Evaluation

SLIDE 25

A “semantic MT” problem

“I’m going north”

The meaning of an utterance is given by its truth conditions.

[Davidson 67]

SLIDE 26

A “semantic MT” problem

✔ ✔ ✘

“I’m going north”

The meaning of an utterance is given by its truth conditions.

[Davidson 67]

SLIDE 27

A “semantic MT” problem

(loc (goal blue) north)   “I’m going north”

The meaning of an utterance is given by its truth conditions.

SLIDE 28

A “semantic MT” problem

0.4 0.2 0.001

“I’m going north”

The meaning of an utterance is given by its truth conditions: the distribution over states in which it is uttered.

[Beltagy et al. 14]

SLIDE 29

A “semantic MT” problem

0.4 0.2 0.001

“I’m going north”

The meaning of an utterance is given by the distribution over states in which it is uttered, or equivalently, the belief it induces in listeners.

[Frank et al. 09, A & Klein 16]

SLIDE 31

Representing meaning

The meaning of an utterance is given by the distribution over states in which it is uttered, or equivalently, the belief it induces in listeners.

This distribution is well-defined even if the “utterance” is a vector rather than a sequence of tokens.

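A minimal sketch of this belief-based notion of meaning: given any speaker model p(x | s), the meaning of x is the posterior over states it induces. The states, utterances, and probabilities below are invented for illustration; nothing here depends on whether x is an English string or a neuralese vector.

```python
# Meaning as induced belief: beta(x)(s) = p(s | x) ∝ p(x | s) p(s).
# Toy speaker model and prior, invented for illustration.

p_state = {"north": 0.5, "south": 0.5}
p_utt_given_state = {
    "north": {"i'm going north": 0.9, "i'm going south": 0.1},
    "south": {"i'm going north": 0.2, "i'm going south": 0.8},
}

def belief(x):
    """Posterior over states induced by utterance x: the meaning beta(x)."""
    joint = {s: p_utt_given_state[s][x] * p_state[s] for s in p_state}
    z = sum(joint.values())
    return {s: v / z for s, v in joint.items()}

b = belief("i'm going north")
print(b)  # most mass on the "north" state
```

The same `belief` function would work for a vector-valued message, as long as we can score p(message | state) under the agent's policy.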
SLIDE 32

Translating with meaning

[1.0 2.3 0.3 0.4 1.2 1.1]
SLIDE 33

Translating with meaning

[1.0 2.3 0.3 0.4 1.2 1.1] → “In the intersection”

SLIDE 34

Translating with meaning

[1.0 2.3 0.3 0.4 1.2 1.1] → “I’m going north”

SLIDE 35

Translating with meaning

[1.0 2.3 0.3 0.4 1.2 1.1] → “I’m going north”

p(s | z)   p(s | x)

SLIDE 36

Translating with meaning

[1.0 2.3 0.3 0.4 1.2 1.1] → “I’m going north”

β(z)   β(x)

SLIDE 37

Interlingua!

source text → target text

β(z)   β(x)

SLIDE 38

Translation criterion

tr(z) = argmin_x KL(β(z) ‖ β(x))

SLIDE 42

Computing representations

tr(z) = argmin_x KL(β(z) ‖ β(x))

SLIDE 43

Computing representations: sparsity

tr(z) = argmin_x KL(β(z) ‖ β(x))

p(s | z)   p(s | x)

SLIDE 44

Computing representations: smoothing

tr(z) = argmin_x KL(β(z) ‖ β(x))

actions & messages → agent policy

SLIDE 45

Computing representations: smoothing

tr(z) = argmin_x KL(β(z) ‖ β(x))

actions & messages → agent policy → agent model

SLIDE 46

Computing representations: smoothing

tr(z) = argmin_x KL(β(z) ‖ β(x))

actions & messages → human

SLIDE 47

Computing representations: smoothing

tr(z) = argmin_x KL(β(z) ‖ β(x))

actions & messages → human policy → human model

SLIDE 48

Computing representations: smoothing

tr(z) = argmin_x KL(β(z) ‖ β(x))

0.10 0.05 0.13 0.08 0.01 0.22

SLIDE 49

Computing KL

tr(z) = argmin_x KL(β(z) ‖ β(x))

SLIDE 50

Computing KL

tr(z) = argmin_x KL(β(z) ‖ β(x))

KL(p ‖ q) = E_p[log p(s)/q(s)]

SLIDE 51

Computing KL: sampling

tr(z) = argmin_x KL(β(z) ‖ β(x))

KL(p ‖ q) = Σᵢ p(sᵢ) log (p(sᵢ) / q(sᵢ))

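A minimal sketch of this discrete KL computation, with toy distributions invented for illustration:

```python
# Discrete KL divergence: KL(p || q) = sum_i p(s_i) * log(p(s_i) / q(s_i)).
import math

def kl(p, q):
    """KL divergence between two discrete distributions over the same states."""
    return sum(p[s] * math.log(p[s] / q[s]) for s in p if p[s] > 0)

p = {"a": 0.7, "b": 0.3}
q = {"a": 0.5, "b": 0.5}

assert abs(kl(p, p)) < 1e-12   # KL of a distribution with itself is 0
print(kl(p, q))                # > 0 whenever p != q
```

In practice the state space may be too large to enumerate, so the sum is estimated from samples of states drawn while the agents act; the toy version above just enumerates.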
SLIDE 52

Finding translations

tr(z) = argmin_x KL(β(z) ‖ β(x))

SLIDE 53

Finding translations: brute force

tr(z) = argmin_x KL(β(z) ‖ β(x))

going north: 0.5   crossing the intersection: 2.3   I’m done: 0.2   after you: 9.7

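The brute-force search can be sketched directly: score every candidate English string by the divergence between its induced belief and the neuralese message's belief, then take the argmin. The candidate strings come from the slide, but the belief vectors below are invented stand-ins.

```python
# Brute-force translation: tr(z) = argmin over candidates x of KL(beta(z) || beta(x)).
# Beliefs are toy vectors over three world states, invented for illustration.
import math

def kl(p, q):
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

beta_z = [0.6, 0.3, 0.1]                         # belief induced by neuralese z
candidates = {                                   # beliefs induced by English strings
    "going north":               [0.5, 0.35, 0.15],
    "crossing the intersection": [0.2, 0.2, 0.6],
    "i'm done":                  [0.55, 0.35, 0.1],
    "after you":                 [0.05, 0.05, 0.9],
}

translation = min(candidates, key=lambda x: kl(beta_z, candidates[x]))
print(translation)  # "i'm done" has the lowest divergence here
```

This scales linearly in the number of candidate strings, which is why the candidate set has to be restricted to messages actually observed from human players.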
SLIDE 55

Finding translations

tr(z) = argmin_x KL(β(z) ‖ β(x))

SLIDE 56

Outline

  • Natural language & neuralese
  • Statistical machine translation
  • Semantic machine translation
  • Implementation details
  • Evaluation

SLIDE 57

Referring expression games

[1.0 2.3 0.3 0.4 1.2 1.1]   “orange bird with black face”

SLIDE 58

Evaluation: translator-in-the-loop

[1.0 2.3 0.3 0.4 1.2 1.1]   “orange bird with black face”

SLIDE 60

Experiment: color references

SLIDE 61

Experiment: color references

[Bar chart, scale 0.50–1.00: Neuralese → Neuralese; English → English* = 0.83]

SLIDE 62

Experiment: color references

[Bar chart, scale 0.50–1.00]
  English → English*: 0.83
  Statistical MT, Neuralese → English*: 0.72
  Statistical MT, English → Neuralese: 0.70

SLIDE 63

Experiment: color references

[Bar chart, scale 0.50–1.00]
  English → English*: 0.83
  Statistical MT, Neuralese → English*: 0.72
  Statistical MT, English → Neuralese: 0.70
  Semantic MT, Neuralese → English*: 0.86
  Semantic MT, English → Neuralese: 0.73

SLIDE 64

Experiment: color references

magenta, hot, rose / magenta, hot, violet
olive, puke, pea / pinkish, grey, dull


SLIDE 70

Experiment: image references

[Bar chart, scale 50–95]
  English → English*: 77
  Statistical MT, Neuralese → English*: 57
  Statistical MT, English → Neuralese: 55
  Semantic MT, Neuralese → English*: 75
  Semantic MT, English → Neuralese: 60

SLIDE 71

Experiment: image references

large bird, black wings, black crown / large bird, black wings, black crown
small brown, light brown, dark brown

SLIDE 72

Experiment: driving game

[Bar chart, scale 1.35–1.93: Neuralese → Neuralese; Neuralese ↔ English* with Statistical MT: 1.49; with Semantic MT: 1.54]

SLIDE 73

How to translate

at goal → done
left to top → going
in intersection → proceed
going → you first
following → going down

SLIDE 74

Conclusions so far

  • Classical notions of “meaning” apply even to non-language-like things (e.g. RNN states)
  • These meanings can be compactly represented without logical forms if we have access to world states
  • Communicating policies “say” interpretable things!

SLIDE 77

Limitations

tr(z) = argmin_x KL(β(z) ‖ β(x))

KL(p ‖ q) = Σᵢ p(sᵢ) log (p(sᵢ) / q(sᵢ))

SLIDE 78

but what about compositionality?

SLIDE 79

Analogs of linguistic structure in deep representations

Jacob Andreas and Dan Klein

SLIDE 80

“Flat” semantics

at goal → done
in intersection → proceed
going → you first
following → going

SLIDE 81

Compositional semantics

SLIDE 83

Compositional semantics

message → ✔ ✔ ✔

SLIDE 84

Compositional semantics

✔ ✔ ✔

“everything but the blue shapes” / “orange squares and non-squares”

[FitzGerald et al. 2013]

SLIDE 85

Compositional semantics

✔ ✔ ✔

lambda x: not(blue(x))   /   lambda x: or(orange(x), not(square(x)))

[FitzGerald et al. 2013]

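A small sketch of the denotation idea behind these logical forms: a predicate's meaning, relative to a scene, is the vector of truth values (the ✔/✘ row) it assigns to the objects. The scene, attribute names, and predicates below are made up for illustration.

```python
# Logical forms as predicates; their meaning in a scene is the set of objects
# they pick out. Scene and attributes are invented for illustration.

scene = [
    {"color": "blue",   "shape": "square"},
    {"color": "orange", "shape": "square"},
    {"color": "green",  "shape": "circle"},
]

not_blue            = lambda o: o["color"] != "blue"
orange_or_nonsquare = lambda o: o["color"] == "orange" or o["shape"] != "square"

def denotation(pred, scene):
    """Truth-condition vector: which objects the predicate is true of."""
    return [pred(o) for o in scene]

print(denotation(not_blue, scene))             # [False, True, True]
print(denotation(orange_or_nonsquare, scene))  # [False, True, True]
```

Note that in this particular scene the two different logical forms have the same denotation, mirroring the slide's point that distinct descriptions can pick out the same objects.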
SLIDE 86

Compositional semantics

✔ ✔ ✔

???

SLIDE 87

Model architecture

SLIDE 89

Model architecture

✔ ✔ ✔

SLIDE 90

Model architecture

✔ ✔ ✔

[1.0 2.3 0.3 0.4 1.2 1.1]
SLIDE 91

Computing meaning representations

[1.0 2.3 0.3 0.4 1.2 1.1]   “on the left”
SLIDE 92

Computing meaning representations

“everything but squares”   [0.1 1.3 0.5 -0.4 0.2 1.0]

✔ ✔ ✔ ✔ ✔ ✔ ✔ ✔ ✔ ✔ ✔ ✔ ✔ ✔

SLIDE 93

Computing meaning representations

“everything but squares”   [0.1 1.3 0.5 -0.4 0.2 1.0]

✔ ✔ ✔ ✔ ✔ ✔ ✔

SLIDE 97

Computing meaning representations

[0.1 1.3 0.5 -0.4 0.2 1.0]

✔ ✔ ✔ ✔ ✔ ✔ ✔ ✔ ✔ ✔ ✔ ✔ ✔ ✔

lambda x: not(square(x))

SLIDE 99

Translation criterion

q(z, x) = KL(β(z) ‖ β(x))

0.10 0.05 0.13 0.08 0.01 0.22

SLIDE 100

Translation criterion

q(z, x) = E[ β(z) = β(x) ]   (expected agreement of the induced truth-value vectors)

✔ ✔ ✔ ✔ ✔ ✔ ✔ ✔ ✔ ✔ ✔ ✔ ✔ ✔

SLIDE 101

Experiments

  • “High-level” communicative behavior
  • “Low-level” message structure

SLIDE 103

Comparing strategies

SLIDE 104

Comparing strategies

“everything but squares”   [0.1 1.3 0.5 -0.4 0.2 1.0]

SLIDE 105

Comparing strategies

“everything but squares” → ✔ ✔ ✔ ✔ ✔ ✔ ✔

[0.1 1.3 0.5 -0.4 0.2 1.0] → ✔ ✔ ✔ ✔ ✔ ✔ ✔

SLIDE 106

Comparing strategies

“everything but squares” → ✔ ✔ ✔ ✔ ✔ ✔ ✔

[0.1 1.3 0.5 -0.4 0.2 1.0] → ✔ ✔ ✔ ✔ ✔ ✔ ✔

= ?

SLIDE 107

Theories of model behavior: random

✔ ✔ ✔ ✔ ✔ ✔

[0.1 1.3 0.5 -0.4 0.2 1.0] → ✔ ✔ ✔ ✔ ✔ ✔ ✔

= ?

SLIDE 108

Theories of model behavior: literal

[0.1 1.3 0.5 -0.4 0.2 1.0] → ✔ ✔ ✔ ✔ ✔ ✔ ✔ ✔ ✔ ✔

= ?

SLIDE 109

Evaluation: high-level scene agreement

[Bar chart, scale 0.00–1.00; values 50, 63, 27; conditions: Literal, Human]

SLIDE 110

Evaluation: high-level object agreement

[Bar chart, scale 0.00–1.00; values 72, 50, 92, 74; conditions: Random, Literal, Human]

SLIDE 111

Experiments

  • “High-level” communicative behavior
  • “Low-level” message structure

SLIDE 112

Collecting translation data

all the red shapes / blue objects / everything but red / green squares / not green squares

SLIDE 113

Collecting translation data

λx.red(x) / λx.blu(x) / λx.¬red(x) / λx.grn(x)∧sqr(x) / λx.¬(grn(x)∧sqr(x))

SLIDE 114

Collecting translation data

λx.red(x)            [0.1 -0.3 0.5 1.1]
λx.blu(x)            [0.3 0.2 0.1 0.1]
λx.¬red(x)           [1.4 -0.3 -0.5 0.8]
λx.grn(x)∧sqr(x)     [0.2 -0.2 0.5 -0.1]
λx.¬(grn(x)∧sqr(x))  [0.3 -1.3 -1.5 0.1]

SLIDE 115

Extracting related pairs

λx.red(x)         [0.1 -0.3 0.5 1.1]    λx.¬red(x)           [1.4 -0.3 -0.5 0.8]
λx.grn(x)∧sqr(x)  [0.2 -0.2 0.5 -0.1]   λx.¬(grn(x)∧sqr(x))  [0.3 -1.3 -1.5 0.1]

SLIDE 117

Learning compositional operators

N = argmin_N Σ ‖ N z_x − z_¬x ‖²

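A minimal sketch of this least-squares formulation, on toy data: from paired message vectors (z_x, z_¬x), fit a linear "negation" operator N minimizing Σ ‖N z_x − z_¬x‖². The matrices and pairs below are invented; with exact pairs, solving the normal equations recovers N.

```python
# Fit a linear operator N by least squares from paired vectors.
# All data here is a toy stand-in, invented for illustration.

def transpose(M):
    return [list(r) for r in zip(*M)]

def matmul(A, B):
    Bt = transpose(B)
    return [[sum(a * b for a, b in zip(row, col)) for col in Bt] for row in A]

def solve2(A, b):
    """Solve a 2x2 linear system A x = b by Cramer's rule."""
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    return [(b[0] * A[1][1] - b[1] * A[0][1]) / det,
            (A[0][0] * b[1] - A[1][0] * b[0]) / det]

# Pretend the true negation operator is this 2x2 matrix.
true_N = [[0.0, 1.0], [-1.0, 0.5]]
Z = [[1.0, 0.0], [0.0, 1.0], [2.0, 3.0]]     # message vectors z_x (rows)
Z_neg = matmul(Z, transpose(true_N))         # row i is true_N applied to z_i

# Normal equations: for each output dim j, solve (Z^T Z) w_j = Z^T y_j,
# where w_j is row j of the fitted operator N_hat.
ZtZ = matmul(transpose(Z), Z)
N_hat = []
for j in range(2):
    y = [row[j] for row in Z_neg]
    Zty = [sum(Z[i][k] * y[i] for i in range(len(Z))) for k in range(2)]
    N_hat.append(solve2(ZtZ, Zty))

assert all(abs(N_hat[i][j] - true_N[i][j]) < 1e-9 for i in range(2) for j in range(2))
```

A held-out pair (z_x, z_¬x) can then test whether N_hat z_x lands near z_¬x, which is exactly the evaluation the following slides describe.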
SLIDE 118

Evaluating learned operators

λx.red(x)         [0.1 -0.3 0.5 1.1]    λx.¬red(x)           [1.4 -0.3 -0.5 0.8]
λx.grn(x)∧sqr(x)  [0.2 -0.2 0.5 -0.1]   λx.¬(grn(x)∧sqr(x))  [0.3 -1.3 -1.5 0.1]
λx.f(x)           [0.2 -0.2 0.5 -0.1]

SLIDE 119

Evaluating learned operators

λx.red(x)         [0.1 -0.3 0.5 1.1]    λx.¬red(x)           [1.4 -0.3 -0.5 0.8]
λx.grn(x)∧sqr(x)  [0.2 -0.2 0.5 -0.1]   λx.¬(grn(x)∧sqr(x))  [0.3 -1.3 -1.5 0.1]
λx.f(x)           [0.2 -0.2 0.5 -0.1]   →                    [0.2 0.4 -0.3 0.0]
SLIDE 120

Evaluating learned operators

λx.red(x)         [0.1 -0.3 0.5 1.1]    λx.¬red(x)           [1.4 -0.3 -0.5 0.8]
λx.grn(x)∧sqr(x)  [0.2 -0.2 0.5 -0.1]   λx.¬(grn(x)∧sqr(x))  [0.3 -1.3 -1.5 0.1]
λx.f(x)           [0.2 -0.2 0.5 -0.1]   → ???                [0.2 0.4 -0.3 0.0]
SLIDE 121

Evaluation: scene agreement for negation

[Bar chart, scale 0.00–1.00; values 50, 81, 12]

SLIDE 122

Visualizing negation

Input / Predicted / True:
all the toys that are not red / everything that is red
only the blue and green objects / all items that are not blue or green

SLIDE 123

Evaluation: scene agreement for disjunction

[Bar chart, scale 0.00–1.00; values 50, 54, 9]

SLIDE 124

Visualizing disjunction

Input / Predicted / True:
all of the red objects / the blue and red items / the blue objects
the blue and yellow items / all the yellow toys / all yellow or red items

SLIDE 125

Conclusions

  • We can translate between neuralese and natural language by grounding in distributions over world states
  • Under the right conditions, neuralese exhibits interpretable pragmatics & compositional structure
  • Not just communication games: language might be a good general-purpose tool for interpreting deep representations

SLIDE 128

Conclusions

not sure / dunno / yes / yes / no / yes

SLIDE 129

Thank you!

[1.0 2.3 0.3 0.4 1.2 1.1]

http://github.com/jacobandreas/{neuralese,rnn-syn}