Structure and Interpretation of Neural Codes, Jacob Andreas (PowerPoint PPT Presentation)



SLIDE 1

Structure and Interpretation of Neural Codes

Jacob Andreas

SLIDE 2

Translating Neuralese

Jacob Andreas, Anca Dragan and Dan Klein

SLIDE 3

Learning to Communicate

[Wagner et al. 03, Sukhbaatar et al. 16, Foerster et al. 16]

SLIDE 4

Learning to Communicate

SLIDE 5

Neuralese

[1.0 2.3 0.3 0.4 1.2 1.1]
SLIDE 6

Translating neuralese

[1.0 2.3 0.3 0.4 1.2 1.1] → “all clear”

SLIDE 7

Translating neuralese

[1.0 2.3 0.3 0.4 1.2 1.1] → “all clear”

  • Interoperate with autonomous systems
  • Diagnose errors
  • Learn from solutions

[Lazaridou et al. 16]

SLIDE 8

Outline

  • Natural language & neuralese
  • Statistical machine translation
  • Semantic machine translation
  • Implementation details
  • Evaluation

SLIDE 14

A statistical MT problem

x* = argmax_x p(z | x) p(x)

z: [1.0 2.3 0.3 0.4 1.2 1.1]    x: “all clear”

[e.g. Koehn 10]

SLIDE 15

A statistical MT problem

How do we induce a translation model?

SLIDE 16

A statistical MT problem

x* = argmax_x p(z | x) p(x)

x* = argmax_x Σ_s p(z | s) p(x | s) p(s)

(With no parallel corpus, neuralese and English are aligned by marginalizing over the shared world states s in which each is produced.)

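A minimal sketch of this marginalized criterion: score each candidate English string x by how often it co-occurs with the neuralese message z in the same world state s. All states, strings, and probabilities below are toy stand-ins invented for illustration, not the talk's models.

```python
# Marginalized statistical MT criterion: x* = argmax_x sum_s p(z|s) p(x|s) p(s).
# Everything here is a toy stand-in for illustration.

STATES = ["clear", "blocked"]
p_state = {"clear": 0.7, "blocked": 0.3}

# p(neuralese message z | state s), for one fixed message z.
p_z_given_s = {"clear": 0.9, "blocked": 0.1}

# p(English string x | state s), for two candidate translations.
p_x_given_s = {
    "all clear": {"clear": 0.8, "blocked": 0.05},
    "stop":      {"clear": 0.1, "blocked": 0.9},
}

def score(x):
    """Marginal co-occurrence score of candidate x for message z."""
    return sum(p_z_given_s[s] * p_x_given_s[x][s] * p_state[s] for s in STATES)

best = max(p_x_given_s, key=score)
print(best)  # → all clear
```

The candidate that tends to be uttered in the same states as z wins, even though z and x never appear in a parallel corpus.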
SLIDE 17

Strategy mismatch

ζ(s) = (1/Γ(s)) ∫₀^∞ xˢ/(eˣ − 1) · dx/x

SLIDE 18

Strategy mismatch

“not sure”

ζ(s) = (1/Γ(s)) ∫₀^∞ xˢ/(eˣ − 1) · dx/x

SLIDE 21

Strategy mismatch

not sure / dunno / yes / yes / no / yes

SLIDE 22

Strategy mismatch

not sure → yes

Σ p(a, s | “not sure”) p(“not sure”)

SLIDE 23

Stat MT criterion doesn’t capture meaning

moving (0,3) → (1,4): “In the intersection”

SLIDE 24

Outline

  • Natural language & neuralese
  • Statistical machine translation
  • Semantic machine translation
  • Implementation details
  • Evaluation

SLIDE 25

A “semantic MT” problem

“I’m going north”

The meaning of an utterance is given by its truth conditions.

[Davidson 67]

SLIDE 26

A “semantic MT” problem

✔ ✔ ✘

“I’m going north”

The meaning of an utterance is given by its truth conditions.

[Davidson 67]

SLIDE 27

A “semantic MT” problem

(loc (goal blue) north)   “I’m going north”

The meaning of an utterance is given by its truth conditions.

SLIDE 28

A “semantic MT” problem

0.4 0.2 0.001

“I’m going north”

The meaning of an utterance is given by its truth conditions: the distribution over states in which it is uttered.

[Beltagy et al. 14]

SLIDE 29

A “semantic MT” problem

0.4 0.2 0.001

“I’m going north”

The meaning of an utterance is given by the distribution over states in which it is uttered, or equivalently, the belief it induces in listeners.

[Frank et al. 09, A & Klein 16]

SLIDE 31

Representing meaning

The meaning of an utterance is given by the distribution over states in which it is uttered, or equivalently, the belief it induces in listeners.

This distribution is well-defined even if the “utterance” is a vector rather than a sequence of tokens.

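A minimal sketch of this belief-based notion of meaning: given any speaker model p(x | s), the meaning of x is the posterior over states it induces. The states, utterances, and probabilities below are invented for illustration; nothing here depends on whether x is an English string or a neuralese vector.

```python
# Meaning as induced belief: beta(x)(s) = p(s | x) ∝ p(x | s) p(s).
# Toy speaker model and prior, invented for illustration.

p_state = {"north": 0.5, "south": 0.5}
p_utt_given_state = {
    "north": {"i'm going north": 0.9, "i'm going south": 0.1},
    "south": {"i'm going north": 0.2, "i'm going south": 0.8},
}

def belief(x):
    """Posterior over states induced by utterance x: the meaning beta(x)."""
    joint = {s: p_utt_given_state[s][x] * p_state[s] for s in p_state}
    z = sum(joint.values())
    return {s: v / z for s, v in joint.items()}

b = belief("i'm going north")
print(b)  # most mass on the "north" state
```

The same `belief` function would work for a vector-valued message, as long as we can score p(message | state) under the agent's policy.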
SLIDE 32

Translating with meaning

[1.0 2.3 0.3 0.4 1.2 1.1]
SLIDE 33

Translating with meaning

[1.0 2.3 0.3 0.4 1.2 1.1] → “In the intersection”

SLIDE 34

Translating with meaning

[1.0 2.3 0.3 0.4 1.2 1.1] → “I’m going north”

SLIDE 35

Translating with meaning

[1.0 2.3 0.3 0.4 1.2 1.1] → “I’m going north”

p(s | z)   p(s | x)

SLIDE 36

Translating with meaning

[1.0 2.3 0.3 0.4 1.2 1.1] → “I’m going north”

β(z)   β(x)

SLIDE 37

Interlingua!

source text → target text

β(z)   β(x)

SLIDE 38

Translation criterion

tr(z) = argmin_x KL(β(z) ‖ β(x))

SLIDE 42

Computing representations

tr(z) = argmin_x KL(β(z) ‖ β(x))

SLIDE 43

Computing representations: sparsity

tr(z) = argmin_x KL(β(z) ‖ β(x))

p(s | z)   p(s | x)

SLIDE 44

Computing representations: smoothing

tr(z) = argmin_x KL(β(z) ‖ β(x))

actions & messages → agent policy

SLIDE 45

Computing representations: smoothing

tr(z) = argmin_x KL(β(z) ‖ β(x))

actions & messages → agent policy → agent model

SLIDE 46

Computing representations: smoothing

tr(z) = argmin_x KL(β(z) ‖ β(x))

actions & messages → human

SLIDE 47

Computing representations: smoothing

tr(z) = argmin_x KL(β(z) ‖ β(x))

actions & messages → human policy → human model

SLIDE 48

Computing representations: smoothing

tr(z) = argmin_x KL(β(z) ‖ β(x))

0.10 0.05 0.13 0.08 0.01 0.22

SLIDE 49

Computing KL

tr(z) = argmin_x KL(β(z) ‖ β(x))

SLIDE 50

Computing KL

tr(z) = argmin_x KL(β(z) ‖ β(x))

KL(p ‖ q) = E_p[log p(s)/q(s)]

SLIDE 51

Computing KL: sampling

tr(z) = argmin_x KL(β(z) ‖ β(x))

KL(p ‖ q) = Σᵢ p(sᵢ) log (p(sᵢ) / q(sᵢ))

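A minimal sketch of this discrete KL computation, with toy distributions invented for illustration:

```python
# Discrete KL divergence: KL(p || q) = sum_i p(s_i) * log(p(s_i) / q(s_i)).
import math

def kl(p, q):
    """KL divergence between two discrete distributions over the same states."""
    return sum(p[s] * math.log(p[s] / q[s]) for s in p if p[s] > 0)

p = {"a": 0.7, "b": 0.3}
q = {"a": 0.5, "b": 0.5}

assert abs(kl(p, p)) < 1e-12   # KL of a distribution with itself is 0
print(kl(p, q))                # > 0 whenever p != q
```

In practice the state space may be too large to enumerate, so the sum is estimated from samples of states drawn while the agents act; the toy version above just enumerates.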
SLIDE 52

Finding translations

tr(z) = argmin_x KL(β(z) ‖ β(x))

SLIDE 53

Finding translations: brute force

tr(z) = argmin_x KL(β(z) ‖ β(x))

going north: 0.5   crossing the intersection: 2.3   I’m done: 0.2   after you: 9.7

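The brute-force search can be sketched directly: score every candidate English string by the divergence between its induced belief and the neuralese message's belief, then take the argmin. The candidate strings come from the slide, but the belief vectors below are invented stand-ins.

```python
# Brute-force translation: tr(z) = argmin over candidates x of KL(beta(z) || beta(x)).
# Beliefs are toy vectors over three world states, invented for illustration.
import math

def kl(p, q):
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

beta_z = [0.6, 0.3, 0.1]                         # belief induced by neuralese z
candidates = {                                   # beliefs induced by English strings
    "going north":               [0.5, 0.35, 0.15],
    "crossing the intersection": [0.2, 0.2, 0.6],
    "i'm done":                  [0.55, 0.35, 0.1],
    "after you":                 [0.05, 0.05, 0.9],
}

translation = min(candidates, key=lambda x: kl(beta_z, candidates[x]))
print(translation)  # "i'm done" has the lowest divergence here
```

This scales linearly in the number of candidate strings, which is why the candidate set has to be restricted to messages actually observed from human players.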
SLIDE 55

Finding translations

tr(z) = argmin_x KL(β(z) ‖ β(x))

SLIDE 56

Outline

  • Natural language & neuralese
  • Statistical machine translation
  • Semantic machine translation
  • Implementation details
  • Evaluation

SLIDE 57

Referring expression games

[1.0 2.3 0.3 0.4 1.2 1.1]   “orange bird with black face”

SLIDE 58

Evaluation: translator-in-the-loop

[1.0 2.3 0.3 0.4 1.2 1.1]   “orange bird with black face”

SLIDE 60

Experiment: color references

SLIDE 61

Experiment: color references

[Bar chart, scale 0.50–1.00: Neuralese → Neuralese; English → English* = 0.83]

SLIDE 62

Experiment: color references

[Bar chart, scale 0.50–1.00]
  English → English*: 0.83
  Statistical MT, Neuralese → English*: 0.72
  Statistical MT, English → Neuralese: 0.70

SLIDE 63

Experiment: color references

[Bar chart, scale 0.50–1.00]
  English → English*: 0.83
  Statistical MT, Neuralese → English*: 0.72
  Statistical MT, English → Neuralese: 0.70
  Semantic MT, Neuralese → English*: 0.86
  Semantic MT, English → Neuralese: 0.73

SLIDE 64

Experiment: color references

magenta, hot, rose / magenta, hot, violet
olive, puke, pea / pinkish, grey, dull


SLIDE 70

Experiment: image references

[Bar chart, scale 50–95]
  English → English*: 77
  Statistical MT, Neuralese → English*: 57
  Statistical MT, English → Neuralese: 55
  Semantic MT, Neuralese → English*: 75
  Semantic MT, English → Neuralese: 60

SLIDE 71

Experiment: image references

large bird, black wings, black crown / large bird, black wings, black crown
small brown, light brown, dark brown

SLIDE 72

Experiment: driving game

[Bar chart, scale 1.35–1.93: Neuralese → Neuralese; Neuralese ↔ English* with Statistical MT: 1.49; with Semantic MT: 1.54]

SLIDE 73

How to translate

at goal → done
left to top → going
in intersection → proceed
going → you first
following → going down

SLIDE 74

Conclusions so far

  • Classical notions of “meaning” apply even to non-language-like things (e.g. RNN states)
  • These meanings can be compactly represented without logical forms if we have access to world states
  • Communicating policies “say” interpretable things!

SLIDE 77

Limitations

tr(z) = argmin_x KL(β(z) ‖ β(x))

KL(p ‖ q) = Σᵢ p(sᵢ) log (p(sᵢ) / q(sᵢ))

SLIDE 78

but what about compositionality?

SLIDE 79

Analogs of linguistic structure in deep representations

Jacob Andreas and Dan Klein

SLIDE 80

“Flat” semantics

at goal → done
in intersection → proceed
going → you first
following → going

SLIDE 81

Compositional semantics

SLIDE 83

Compositional semantics

message → ✔ ✔ ✔

SLIDE 84

Compositional semantics

✔ ✔ ✔

“everything but the blue shapes” / “orange squares and non-squares”

[FitzGerald et al. 2013]

SLIDE 85

Compositional semantics

✔ ✔ ✔

lambda x: not(blue(x))   /   lambda x: or(orange(x), not(square(x)))

[FitzGerald et al. 2013]

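A small sketch of the denotation idea behind these logical forms: a predicate's meaning, relative to a scene, is the vector of truth values (the ✔/✘ row) it assigns to the objects. The scene, attribute names, and predicates below are made up for illustration.

```python
# Logical forms as predicates; their meaning in a scene is the set of objects
# they pick out. Scene and attributes are invented for illustration.

scene = [
    {"color": "blue",   "shape": "square"},
    {"color": "orange", "shape": "square"},
    {"color": "green",  "shape": "circle"},
]

not_blue            = lambda o: o["color"] != "blue"
orange_or_nonsquare = lambda o: o["color"] == "orange" or o["shape"] != "square"

def denotation(pred, scene):
    """Truth-condition vector: which objects the predicate is true of."""
    return [pred(o) for o in scene]

print(denotation(not_blue, scene))             # [False, True, True]
print(denotation(orange_or_nonsquare, scene))  # [False, True, True]
```

Note that in this particular scene the two different logical forms have the same denotation, mirroring the slide's point that distinct descriptions can pick out the same objects.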
SLIDE 86

Compositional semantics

✔ ✔ ✔

???

SLIDE 87

Model architecture

SLIDE 89

Model architecture

✔ ✔ ✔

SLIDE 90

Model architecture

✔ ✔ ✔

[1.0 2.3 0.3 0.4 1.2 1.1]
SLIDE 91

Computing meaning representations

[1.0 2.3 0.3 0.4 1.2 1.1]   “on the left”
SLIDE 92

Computing meaning representations

“everything but squares”   [0.1 1.3 0.5 -0.4 0.2 1.0]

✔ ✔ ✔ ✔ ✔ ✔ ✔ ✔ ✔ ✔ ✔ ✔ ✔ ✔

SLIDE 93

Computing meaning representations

“everything but squares”   [0.1 1.3 0.5 -0.4 0.2 1.0]

✔ ✔ ✔ ✔ ✔ ✔ ✔

SLIDE 97

Computing meaning representations

[0.1 1.3 0.5 -0.4 0.2 1.0]

✔ ✔ ✔ ✔ ✔ ✔ ✔ ✔ ✔ ✔ ✔ ✔ ✔ ✔

lambda x: not(square(x))

SLIDE 99

Translation criterion

q(z, x) = KL(β(z) ‖ β(x))

0.10 0.05 0.13 0.08 0.01 0.22

SLIDE 100

Translation criterion

q(z, x) = E[ β(z) = β(x) ]   (expected agreement of the induced truth-value vectors)

✔ ✔ ✔ ✔ ✔ ✔ ✔ ✔ ✔ ✔ ✔ ✔ ✔ ✔

SLIDE 101

Experiments

  • “High-level” communicative behavior
  • “Low-level” message structure

SLIDE 103

Comparing strategies

SLIDE 104

Comparing strategies

“everything but squares”   [0.1 1.3 0.5 -0.4 0.2 1.0]

SLIDE 105

Comparing strategies

“everything but squares” → ✔ ✔ ✔ ✔ ✔ ✔ ✔

[0.1 1.3 0.5 -0.4 0.2 1.0] → ✔ ✔ ✔ ✔ ✔ ✔ ✔

SLIDE 106

Comparing strategies

“everything but squares” → ✔ ✔ ✔ ✔ ✔ ✔ ✔

[0.1 1.3 0.5 -0.4 0.2 1.0] → ✔ ✔ ✔ ✔ ✔ ✔ ✔

= ?

SLIDE 107

Theories of model behavior: random

✔ ✔ ✔ ✔ ✔ ✔

[0.1 1.3 0.5 -0.4 0.2 1.0] → ✔ ✔ ✔ ✔ ✔ ✔ ✔

= ?

SLIDE 108

Theories of model behavior: literal

[0.1 1.3 0.5 -0.4 0.2 1.0] → ✔ ✔ ✔ ✔ ✔ ✔ ✔ ✔ ✔ ✔

= ?

SLIDE 109

Evaluation: high-level scene agreement

[Bar chart, scale 0.00–1.00; values 50, 63, 27; conditions: Literal, Human]

SLIDE 110

Evaluation: high-level object agreement

[Bar chart, scale 0.00–1.00; values 72, 50, 92, 74; conditions: Random, Literal, Human]

SLIDE 111

Experiments

  • “High-level” communicative behavior
  • “Low-level” message structure

SLIDE 112

Collecting translation data

all the red shapes / blue objects / everything but red / green squares / not green squares

SLIDE 113

Collecting translation data

λx.red(x) / λx.blu(x) / λx.¬red(x) / λx.grn(x)∧sqr(x) / λx.¬(grn(x)∧sqr(x))

SLIDE 114

Collecting translation data

λx.red(x)            [0.1 -0.3 0.5 1.1]
λx.blu(x)            [0.3 0.2 0.1 0.1]
λx.¬red(x)           [1.4 -0.3 -0.5 0.8]
λx.grn(x)∧sqr(x)     [0.2 -0.2 0.5 -0.1]
λx.¬(grn(x)∧sqr(x))  [0.3 -1.3 -1.5 0.1]

SLIDE 115

Extracting related pairs

λx.red(x)         [0.1 -0.3 0.5 1.1]    λx.¬red(x)           [1.4 -0.3 -0.5 0.8]
λx.grn(x)∧sqr(x)  [0.2 -0.2 0.5 -0.1]   λx.¬(grn(x)∧sqr(x))  [0.3 -1.3 -1.5 0.1]

SLIDE 117

Learning compositional operators

N = argmin_N Σ ‖ N z_x − z_¬x ‖²

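A minimal sketch of this least-squares formulation, on toy data: from paired message vectors (z_x, z_¬x), fit a linear "negation" operator N minimizing Σ ‖N z_x − z_¬x‖². The matrices and pairs below are invented; with exact pairs, solving the normal equations recovers N.

```python
# Fit a linear operator N by least squares from paired vectors.
# All data here is a toy stand-in, invented for illustration.

def transpose(M):
    return [list(r) for r in zip(*M)]

def matmul(A, B):
    Bt = transpose(B)
    return [[sum(a * b for a, b in zip(row, col)) for col in Bt] for row in A]

def solve2(A, b):
    """Solve a 2x2 linear system A x = b by Cramer's rule."""
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    return [(b[0] * A[1][1] - b[1] * A[0][1]) / det,
            (A[0][0] * b[1] - A[1][0] * b[0]) / det]

# Pretend the true negation operator is this 2x2 matrix.
true_N = [[0.0, 1.0], [-1.0, 0.5]]
Z = [[1.0, 0.0], [0.0, 1.0], [2.0, 3.0]]     # message vectors z_x (rows)
Z_neg = matmul(Z, transpose(true_N))         # row i is true_N applied to z_i

# Normal equations: for each output dim j, solve (Z^T Z) w_j = Z^T y_j,
# where w_j is row j of the fitted operator N_hat.
ZtZ = matmul(transpose(Z), Z)
N_hat = []
for j in range(2):
    y = [row[j] for row in Z_neg]
    Zty = [sum(Z[i][k] * y[i] for i in range(len(Z))) for k in range(2)]
    N_hat.append(solve2(ZtZ, Zty))

assert all(abs(N_hat[i][j] - true_N[i][j]) < 1e-9 for i in range(2) for j in range(2))
```

A held-out pair (z_x, z_¬x) can then test whether N_hat z_x lands near z_¬x, which is exactly the evaluation the following slides describe.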
SLIDE 118

Evaluating learned operators

λx.red(x)         [0.1 -0.3 0.5 1.1]    λx.¬red(x)           [1.4 -0.3 -0.5 0.8]
λx.grn(x)∧sqr(x)  [0.2 -0.2 0.5 -0.1]   λx.¬(grn(x)∧sqr(x))  [0.3 -1.3 -1.5 0.1]
λx.f(x)           [0.2 -0.2 0.5 -0.1]

SLIDE 119

Evaluating learned operators

λx.red(x)         [0.1 -0.3 0.5 1.1]    λx.¬red(x)           [1.4 -0.3 -0.5 0.8]
λx.grn(x)∧sqr(x)  [0.2 -0.2 0.5 -0.1]   λx.¬(grn(x)∧sqr(x))  [0.3 -1.3 -1.5 0.1]
λx.f(x)           [0.2 -0.2 0.5 -0.1]   →                    [0.2 0.4 -0.3 0.0]
SLIDE 120

Evaluating learned operators

λx.red(x)         [0.1 -0.3 0.5 1.1]    λx.¬red(x)           [1.4 -0.3 -0.5 0.8]
λx.grn(x)∧sqr(x)  [0.2 -0.2 0.5 -0.1]   λx.¬(grn(x)∧sqr(x))  [0.3 -1.3 -1.5 0.1]
λx.f(x)           [0.2 -0.2 0.5 -0.1]   → ???                [0.2 0.4 -0.3 0.0]
SLIDE 121

Evaluation: scene agreement for negation

[Bar chart, scale 0.00–1.00; values 50, 81, 12]

SLIDE 122

Visualizing negation

Input / Predicted / True:
all the toys that are not red / everything that is red
only the blue and green objects / all items that are not blue or green

SLIDE 123

Evaluation: scene agreement for disjunction

[Bar chart, scale 0.00–1.00; values 50, 54, 9]

SLIDE 124

Visualizing disjunction

Input / Predicted / True:
all of the red objects / the blue and red items / the blue objects
the blue and yellow items / all the yellow toys / all yellow or red items

SLIDE 125

Conclusions

  • We can translate between neuralese and natural language by grounding in distributions over world states
  • Under the right conditions, neuralese exhibits interpretable pragmatics & compositional structure
  • Not just communication games: language might be a good general-purpose tool for interpreting deep representations

SLIDE 128

Conclusions

not sure / dunno / yes / yes / no / yes

SLIDE 129

Thank you!

[1.0 2.3 0.3 0.4 1.2 1.1]

http://github.com/jacobandreas/{neuralese,rnn-syn}