Translating Neuralese Jacob Andreas, Anca Dragan, and Dan Klein - - PowerPoint PPT Presentation


slide-1
SLIDE 1

Translating Neuralese

Jacob Andreas, Anca Dragan, and Dan Klein

slide-2
SLIDE 2

Learning to Communicate

[Wagner et al. 03, Sukhbaatar et al. 16, Foerster et al. 16]

2

slide-3
SLIDE 3

Learning to Communicate

3

slide-4
SLIDE 4

Neuralese

4

[1.0, 2.3, -0.3, 0.4, -1.2, 1.1]

slide-5
SLIDE 5

Translating neuralese

[1.0, 2.3, -0.3, 0.4, -1.2, 1.1]

all clear

5

slide-6
SLIDE 6
  • Interoperate with autonomous systems

  • Diagnose errors
  • Learn from solutions

Translating neuralese

[1.0, 2.3, -0.3, 0.4, -1.2, 1.1]

all clear

[Lazaridou et al. 16] 6

slide-7
SLIDE 7

Outline

  • Natural language & neuralese
  • Statistical machine translation
  • Semantic machine translation
  • Implementation details
  • Evaluation

7

slide-8
SLIDE 8

Outline

  • Natural language & neuralese
  • Statistical machine translation
  • Semantic machine translation
  • Implementation details
  • Evaluation

8

slide-9
SLIDE 9

Outline

  • Natural language & neuralese
  • Statistical machine translation
  • Semantic machine translation
  • Implementation details
  • Evaluation

9

slide-10
SLIDE 10

Outline

  • Natural language & neuralese
  • Statistical machine translation
  • Semantic machine translation
  • Implementation details
  • Evaluation

10

slide-11
SLIDE 11

Outline

  • Natural language & neuralese
  • Statistical machine translation
  • Semantic machine translation
  • Implementation details
  • Evaluation

11

slide-12
SLIDE 12

Outline

  • Natural language & neuralese
  • Statistical machine translation
  • Semantic machine translation
  • Implementation details
  • Evaluation

12

slide-13
SLIDE 13

A statistical MT problem

13

$\max_a p(z \mid a)\,p(a)$

(z: neuralese message; a: candidate English utterance)

[1.0, 2.3, -0.3, 0.4, -1.2, 1.1]

all clear

[e.g. Koehn 10]
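The noisy-channel objective above can be sketched in a few lines, assuming the channel model p(z | a) and the prior p(a) were already available as lookup tables. The names `noisy_channel_translate`, `p_z_given_a`, and `p_a` are hypothetical, the message is an opaque ID, and the probabilities are invented toy values; obtaining p(z | a) without parallel data is exactly the open question on the next slide.

```python
import math
from typing import Callable, Hashable, Iterable


def noisy_channel_translate(
    z: Hashable,
    candidates: Iterable[str],
    p_z_given_a: Callable[[Hashable, str], float],
    p_a: Callable[[str], float],
) -> str:
    """Return the candidate utterance a that maximizes p(z | a) * p(a).

    Both models are supplied by the caller. Scores are compared in log
    space, and a zero-probability factor gives a score of -inf.
    """
    def log_score(a: str) -> float:
        likelihood, prior = p_z_given_a(z, a), p_a(a)
        if likelihood <= 0.0 or prior <= 0.0:
            return -math.inf
        return math.log(likelihood) + math.log(prior)

    return max(candidates, key=log_score)


# Invented toy tables; "z1" is an opaque ID standing in for a neuralese vector.
channel = {("z1", "all clear"): 0.6, ("z1", "going north"): 0.1}  # p(z | a)
prior = {"all clear": 0.3, "going north": 0.7}                    # p(a)

best = noisy_channel_translate(
    "z1",
    ["all clear", "going north"],
    lambda z, a: channel.get((z, a), 0.0),
    lambda a: prior.get(a, 0.0),
)
print(best)  # all clear  (0.6 * 0.3 = 0.18 beats 0.1 * 0.7 = 0.07)
```

The prior matters: if p(z1 | going north) were 0.5 instead, 0.5 * 0.7 = 0.35 would beat 0.18 and the translation would flip.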

slide-14
SLIDE 14

A statistical MT problem

14

How do we induce a translation model?

slide-15
SLIDE 15

A statistical MT problem

15

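$\max_a p(z \mid a)\,p(a)$

$\max_a \sum_x p(z \mid x)\,p(x \mid a)\,p(a)$

(x: world state, marginalized out as a pivot between the neuralese message z and the English utterance a)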
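A minimal sketch of the pivot decomposition above, assuming the message and the utterance are conditionally independent given the world state. Its appeal is that p(z | x) and p(x | a) can each be estimated from (state, message) or (state, utterance) observations, with no parallel neuralese/English pairs. All function names and toy tables here are hypothetical.

```python
def pivot_likelihood(z, a, states, p_z_given_x, p_x_given_a):
    """p(z | a) = sum over states x of p(z | x) * p(x | a).

    Assumes the message z and the utterance a are conditionally
    independent given the world state x.
    """
    return sum(p_z_given_x(z, x) * p_x_given_a(x, a) for x in states)


def pivot_translate(z, candidates, states, p_z_given_x, p_x_given_a, p_a):
    """argmax over a of sum_x p(z | x) * p(x | a) * p(a)."""
    return max(
        candidates,
        key=lambda a: pivot_likelihood(z, a, states, p_z_given_x, p_x_given_a) * p_a(a),
    )


# Invented toy tables: two world states, one message, two utterances.
STATES = ["clear", "blocked"]
AGENT = {("z1", "clear"): 0.9, ("z1", "blocked"): 0.1}                 # p(z | x)
HUMAN = {("clear", "all clear"): 0.8, ("blocked", "all clear"): 0.2,
         ("clear", "car coming"): 0.1, ("blocked", "car coming"): 0.9}  # p(x | a)


def p_z_given_x(z, x):
    return AGENT[(z, x)]


def p_x_given_a(x, a):
    return HUMAN[(x, a)]


def uniform_prior(a):
    return 0.5


likelihood = pivot_likelihood("z1", "all clear", STATES, p_z_given_x, p_x_given_a)
print(round(likelihood, 3))  # 0.74
print(pivot_translate("z1", ["all clear", "car coming"], STATES,
                      p_z_given_x, p_x_given_a, uniform_prior))  # all clear
```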

slide-16
SLIDE 16

Strategy mismatch

16

$\zeta(s) = \frac{1}{\Gamma(s)} \int_0^\infty \frac{1}{e^x - 1}\, x^s \,\frac{dx}{x}$

slide-17
SLIDE 17

Strategy mismatch

17

not sure

$\zeta(s) = \frac{1}{\Gamma(s)} \int_0^\infty \frac{1}{e^x - 1}\, x^s \,\frac{dx}{x}$

slide-18
SLIDE 18

Strategy mismatch

18

not sure dunno

slide-19
SLIDE 19

Strategy mismatch

19

not sure dunno yes

slide-20
SLIDE 20

Strategy mismatch

20

not sure dunno yes yes no yes

slide-21
SLIDE 21

Strategy mismatch

21

not sure yes

Σ p( , | not sure ) p( not sure )

slide-22
SLIDE 22

Stat MT criterion doesn’t capture meaning

22

moving
 (0,3)→(1,4) In the intersection

slide-23
SLIDE 23

Outline

  • Natural language & neuralese
  • Statistical machine translation
  • Semantic machine translation
  • Implementation details
  • Evaluation

23

slide-24
SLIDE 24

A “semantic MT” problem

24

I’m going north

The meaning of an utterance is given by its truth conditions

[Davidson 67]

slide-25
SLIDE 25

A “semantic MT” problem

25

✔ ✔ ✘

I’m going north

The meaning of an utterance is given by its truth conditions

[Davidson 67]

slide-26
SLIDE 26

A “semantic MT” problem

26

(loc (goal blue) north) I’m going north

The meaning of an utterance is given by its truth conditions

slide-27
SLIDE 27

A “semantic MT” problem

27

0.4 0.2 0.001

I’m going north

The meaning of an utterance is given by the distribution over states in which it is uttered (rather than by its truth conditions)

[Beltagy et al. 14]

slide-28
SLIDE 28

A “semantic MT” problem

28

0.4 0.2 0.001

I’m going north

The meaning of an utterance is given by the distribution over states in which it is uttered (rather than by its truth conditions)

or equivalently, the belief it induces in listeners

[Frank et al. 09, A & Klein 16]

slide-29
SLIDE 29

Representing meaning

29

The meaning of an utterance is given by the distribution over states in which it is uttered (rather than by its truth conditions)

or equivalently, the belief it induces in listeners

slide-30
SLIDE 30

Representing meaning

30

The meaning of an utterance is given by the distribution over states in which it is uttered (rather than by its truth conditions)

or equivalently, the belief it induces in listeners

This distribution is well-defined even if the “utterance” is a vector rather than a sequence of tokens.
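To make the point concrete, here is a minimal sketch that estimates the belief β(m)(x) = p(x | m) by counting observed (state, message) pairs. The function name, the state labels, and the observations are invented for illustration. Because messages only need to be hashable, a neuralese vector stored as a tuple is treated exactly like an English string. Exact-match counting on real-valued vectors is only illustrative: continuous messages rarely repeat exactly, which is the sparsity issue raised later in the deck.

```python
from collections import Counter, defaultdict


def empirical_beliefs(observations):
    """Estimate beta(m)(x) = p(x | m) from (state, message) pairs.

    Returns {message: {state: probability}}. Any hashable message works,
    so a tuple of floats is handled like a string.
    """
    counts = defaultdict(Counter)
    for state, message in observations:
        counts[message][state] += 1
    beliefs = {}
    for message, state_counts in counts.items():
        total = sum(state_counts.values())
        beliefs[message] = {x: c / total for x, c in state_counts.items()}
    return beliefs


vector = (1.0, 2.3, -0.3, 0.4, -1.2, 1.1)
observations = [
    ("north", "I'm going north"), ("north", "I'm going north"),
    ("east", "I'm going north"), ("north", vector), ("west", vector),
]
beliefs = empirical_beliefs(observations)
print(beliefs[vector])  # {'north': 0.5, 'west': 0.5}
```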

slide-31
SLIDE 31

Translating with meaning

31

[1.0, 2.3, -0.3, 0.4, -1.2, 1.1]

slide-32
SLIDE 32

Translating with meaning

32

[1.0, 2.3, -0.3, 0.4, -1.2, 1.1]

In the intersection

slide-33
SLIDE 33

Translating with meaning

33

[1.0, 2.3, -0.3, 0.4, -1.2, 1.1]

I’m going north

slide-34
SLIDE 34

Translating with meaning

34

[1.0, 2.3, -0.3, 0.4, -1.2, 1.1]

I’m going north

$p(x \mid z)$   $p(x \mid a)$

(x: world state; z: neuralese message; a: English utterance)

slide-35
SLIDE 35

Translating with meaning

35

[1.0, 2.3, -0.3, 0.4, -1.2, 1.1]

I’m going north

$\beta(z)$   $\beta(a)$

slide-36
SLIDE 36

Interlingua!

36

source text target text

$\beta(z)$   $\beta(a)$

slide-37
SLIDE 37

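Translation criterion

$\arg\min_a \mathrm{KL}\big(\beta(z) \,\|\, \beta(a)\big)$

(β(·): belief over world states induced by a message; z: neuralese message; a: candidate English utterance)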

37

slide-38
SLIDE 38

Translation criterion

$\arg\min_a \mathrm{KL}\big(\beta(z) \,\|\, \beta(a)\big)$

38

slide-39
SLIDE 39

Translation criterion

$\arg\min_a \mathrm{KL}\big(\beta(z) \,\|\, \beta(a)\big)$

39

slide-40
SLIDE 40

Translation criterion

$\arg\min_a \mathrm{KL}\big(\beta(z) \,\|\, \beta(a)\big)$

40

slide-41
SLIDE 41

Computing representations

$\arg\min_a \mathrm{KL}\big(\beta(z) \,\|\, \beta(a)\big)$

41

slide-42
SLIDE 42

Computing representations: sparsity

$\arg\min_a \mathrm{KL}\big(\beta(z) \,\|\, \beta(a)\big)$

$\beta(z) = p(x \mid z)$   $\beta(a) = p(x \mid a)$

42
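One way to see why raw count estimates are a problem: if the estimated β(a) assigns zero probability to a state that β(z) supports, KL(β(z) || β(a)) is infinite, so the candidate is ruled out regardless of how well it matches elsewhere. A minimal sketch with invented toy beliefs; `kl_divergence` is a hypothetical helper, not a library call.

```python
import math


def kl_divergence(p, q):
    """KL(p || q) for discrete distributions stored as {state: prob} dicts."""
    total = 0.0
    for x, px in p.items():
        if px == 0.0:
            continue  # 0 * log(0 / q) is taken to be 0
        qx = q.get(x, 0.0)
        if qx == 0.0:
            return math.inf  # p puts mass where q puts none
        total += px * math.log(px / qx)
    return total


# Beliefs estimated from only a handful of observations (invented numbers):
beta_z = {"north": 0.5, "east": 0.5}  # message seen once in each state
beta_a = {"north": 1.0}               # utterance only ever seen going north
print(kl_divergence(beta_z, beta_a))  # inf
```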

slide-43
SLIDE 43

Computing representations: smoothing

$\arg\min_a \mathrm{KL}\big(\beta(z) \,\|\, \beta(a)\big)$

[Figure: actions & messages, agent policy]

43

slide-44
SLIDE 44

$\arg\min_a \mathrm{KL}\big(\beta(z) \,\|\, \beta(a)\big)$

[Figure: actions & messages, agent policy, agent model]

44

Computing representations: smoothing

slide-45
SLIDE 45

$\arg\min_a \mathrm{KL}\big(\beta(z) \,\|\, \beta(a)\big)$

[Figure: actions & messages, human]

45

Computing representations: smoothing

slide-46
SLIDE 46

$\arg\min_a \mathrm{KL}\big(\beta(z) \,\|\, \beta(a)\big)$

[Figure: actions & messages, human policy, human model]

46

Computing representations: smoothing

slide-47
SLIDE 47

$\arg\min_a \mathrm{KL}\big(\beta(z) \,\|\, \beta(a)\big)$

0.10 0.05 0.13 0.08 0.01 0.22

47

Computing representations: smoothing
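The smoothing slides replace raw counts with a learned model of each policy (agent model, human model). One plausible reading, sketched below under that assumption, is to treat the model as p(message | state) and obtain the belief by Bayes' rule, β(m)(x) ∝ p(m | x) p(x); this is not necessarily the authors' exact procedure. The function name and the toy model values are hypothetical. Because a smooth model gives every message some probability in every state, the resulting belief has full support and the KL stays finite.

```python
def belief_from_model(message, states, message_model, prior):
    """beta(message)(x) proportional to p(message | x) * p(x), over `states`.

    `message_model(message, x)` stands in for a learned model of a policy
    (hypothetical here); `prior(x)` is the state prior p(x).
    """
    weights = {x: message_model(message, x) * prior(x) for x in states}
    total = sum(weights.values())
    if total == 0.0:
        raise ValueError("message has zero probability in every state")
    return {x: w / total for x, w in weights.items()}


# Invented toy model of a human speaker: p("going north" | x) for each state.
toy_model = {"north": 0.6, "east": 0.1, "stopped": 0.05}
belief = belief_from_model(
    "going north",
    ["north", "east", "stopped"],
    lambda m, x: toy_model[x],  # this toy ignores m: it only models one message
    lambda x: 1 / 3,            # uniform prior over states
)
print({x: round(p, 3) for x, p in belief.items()})
# {'north': 0.8, 'east': 0.133, 'stopped': 0.067}
```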

slide-48
SLIDE 48

Computing KL

$\arg\min_a \mathrm{KL}\big(\beta(z) \,\|\, \beta(a)\big)$

48

slide-49
SLIDE 49

Computing KL

$\arg\min_a \mathrm{KL}\big(\beta(z) \,\|\, \beta(a)\big)$

$\mathrm{KL}(p \,\|\, q) = \mathbb{E}_{p}\left[\log \frac{p(x)}{q(x)}\right]$

49
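Written as an expectation under p, the KL can be estimated by drawing states from p and averaging the log ratio. The sketch below is that generic Monte Carlo estimator on invented toy distributions (`kl_monte_carlo` is a hypothetical name); it relies only on `random.Random.choices`, and it assumes q is positive wherever p is.

```python
import math
import random


def kl_monte_carlo(p, q, n_samples=10_000, seed=0):
    """Estimate KL(p || q) = E_{x ~ p}[log p(x) / q(x)] from samples of p.

    p and q are {state: prob} dicts; q must be positive wherever p is.
    """
    rng = random.Random(seed)
    states = list(p)
    draws = rng.choices(states, weights=[p[x] for x in states], k=n_samples)
    return sum(math.log(p[x] / q[x]) for x in draws) / n_samples


p = {"a": 0.5, "b": 0.5}
q = {"a": 0.25, "b": 0.75}
exact = sum(p[x] * math.log(p[x] / q[x]) for x in p)  # 0.5 * log(4/3), about 0.144
estimate = kl_monte_carlo(p, q)
print(round(exact, 3), round(estimate, 3))
```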

slide-50
SLIDE 50

Computing KL: sampling

$\arg\min_a \mathrm{KL}\big(\beta(z) \,\|\, \beta(a)\big)$

$\mathrm{KL}(p \,\|\, q) = \sum_i p(x_i) \log \frac{p(x_i)}{q(x_i)}$

50
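When the state space is too large to enumerate, the sum above can be restricted to sampled states x_i. The sketch below assumes both beliefs can be evaluated pointwise (for example from a smoothed model) and that a hypothetical `sample_state(rng)` draws one state; the slides do not say which proposal distribution or deduplication scheme the original work uses. Duplicate draws are counted once here, so the result equals the exact KL only when every state with p(x) > 0 is sampled.

```python
import math
import random


def kl_over_sampled_states(p, q, sample_state, n_samples, seed=0):
    """Evaluate sum_i p(x_i) * log(p(x_i) / q(x_i)) over distinct sampled x_i.

    p and q are callables returning each belief's probability of a state;
    sample_state(rng) draws one state.
    """
    rng = random.Random(seed)
    sampled = {sample_state(rng) for _ in range(n_samples)}
    total = 0.0
    for x in sampled:
        px = p(x)
        if px > 0.0:
            total += px * math.log(px / q(x))  # assumes q(x) > 0 (e.g. a smoothed model)
    return total


# Tiny invented example: four states, so 200 draws cover all of them.
p_table = [0.1, 0.2, 0.3, 0.4]
q_table = [0.25, 0.25, 0.25, 0.25]
approx = kl_over_sampled_states(
    lambda x: p_table[x], lambda x: q_table[x],
    lambda rng: rng.randrange(4), n_samples=200,
)
exact = sum(pi * math.log(pi / qi) for pi, qi in zip(p_table, q_table))
print(round(approx, 4), round(exact, 4))
```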

slide-51
SLIDE 51

Finding translations

$\arg\min_a \mathrm{KL}\big(\beta(z) \,\|\, \beta(a)\big)$

51

slide-52
SLIDE 52

Finding translations: brute force

$\arg\min_a \mathrm{KL}\big(\beta(z) \,\|\, \beta(a)\big)$

going north                  0.5
crossing the intersection    2.3
I’m done                     0.2
after you                    9.7

52

slide-53
SLIDE 53

$\arg\min_a \mathrm{KL}\big(\beta(z) \,\|\, \beta(a)\big)$

going north                  0.5
crossing the intersection    2.3
I’m done                     0.2
after you                    9.7

53

Finding translations: brute force
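The brute-force search amounts to scoring every candidate utterance and keeping the lowest KL. A minimal sketch, assuming the beliefs are already computed as dicts over a shared finite state space; the helper names, candidate set, and belief values are invented for illustration.

```python
import math


def kl_divergence(p, q):
    """KL(p || q); assumes q covers every state p supports, with q > 0 there."""
    return sum(px * math.log(px / q[x]) for x, px in p.items() if px > 0.0)


def brute_force_translate(belief_z, candidate_beliefs):
    """Score each candidate a by KL(beta(z) || beta(a)); return (best, scores)."""
    scores = {a: kl_divergence(belief_z, b) for a, b in candidate_beliefs.items()}
    return min(scores, key=scores.get), scores


belief_z = {"north": 0.7, "east": 0.2, "stopped": 0.1}
candidates = {
    "going north": {"north": 0.8, "east": 0.1, "stopped": 0.1},
    "I'm done": {"north": 0.05, "east": 0.05, "stopped": 0.9},
    "after you": {"north": 0.1, "east": 0.1, "stopped": 0.8},
}
best, scores = brute_force_translate(belief_z, candidates)
print(best)  # going north
```

Enumerating candidates is linear in the size of the candidate set, so this only works when a manageable list of utterances (for example, ones collected from human play) is available.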

slide-54
SLIDE 54

Finding translations

$\arg\min_a \mathrm{KL}\big(\beta(z) \,\|\, \beta(a)\big)$

54

slide-55
SLIDE 55

Outline

  • Natural language & neuralese
  • Statistical machine translation
  • Semantic machine translation
  • Implementation details
  • Evaluation

55

slide-56
SLIDE 56

Referring expression games

56

[1.0, 2.3, -0.3, 0.4, -1.2, 1.1]

orange bird with black face

slide-57
SLIDE 57

Evaluation: translator-in-the-loop

57

[1.0, 2.3, -0.3, 0.4, -1.2, 1.1]

orange bird with black face

slide-58
SLIDE 58

58

[1.0, 2.3, -0.3, 0.4, -1.2, 1.1]

orange bird with black face

Evaluation: translator-in-the-loop
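The translator-in-the-loop evaluation can be sketched as a reference game in which one agent speaks, the translator converts its message, and an agent of the other kind chooses a referent; the score is the fraction of games won. Everything below (the speaker, listener, game sampler, and the integer referents) is a hypothetical toy stand-in, not the setup used in the experiments.

```python
import random


def translator_in_the_loop_accuracy(speaker, translate, listener, sample_game,
                                    n_games=1000, seed=0):
    """Fraction of reference games won with a translator between two agents.

    speaker(target, distractor) -> message in the speaker's language
    translate(message)          -> message in the listener's language
    listener(message, options)  -> index of the option the listener picks
    """
    rng = random.Random(seed)
    wins = 0
    for _ in range(n_games):
        target, distractor = sample_game(rng)
        message = translate(speaker(target, distractor))
        options = [target, distractor]
        rng.shuffle(options)  # hide which slot holds the target
        wins += options[listener(message, options)] == target
    return wins / n_games


# Hypothetical toy agents: referents are the integers 0-9.
def sample_game(rng):
    target = rng.randrange(10)
    distractor = rng.choice([d for d in range(10) if d != target])
    return target, distractor


def speaker(target, distractor):
    return (float(target),)  # a one-dimensional stand-in for a neuralese vector


def listener(message, options):
    return options.index(message) if message in options else 0


perfect = translator_in_the_loop_accuracy(
    speaker, lambda m: int(m[0]), listener, sample_game)
useless = translator_in_the_loop_accuracy(
    speaker, lambda m: None, listener, sample_game)
print(perfect)  # 1.0
```

A faithful translator recovers the speaker's full accuracy, while an uninformative one drops the listener to chance (about 0.5 with two referents), which is the contrast this evaluation is designed to expose.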

slide-59
SLIDE 59

Experiment: color references

59

slide-60
SLIDE 60

Experiment: color references

[Bar chart, axis 0.50 to 1.00; bars: Neuralese → Neuralese, English → English*; value shown: 0.83]

60

slide-61
SLIDE 61

[Bar chart, axis 0.50 to 1.00; bars: Neuralese → Neuralese, English → English*, and Statistical MT (Neuralese → English*, English → Neuralese); values shown: 0.83, 0.72, 0.70]

61

Experiment: color references

slide-62
SLIDE 62

[Bar chart, axis 0.50 to 1.00; bars: Neuralese → Neuralese, English → English*, Statistical MT and Semantic MT (each Neuralese → English*, English → Neuralese); values shown: 0.72, 0.70, 0.86, 0.73, 0.83]

62

Experiment: color references

slide-63
SLIDE 63

magenta, hot, rose

magenta, hot, violet

olive, puke, pea

pinkish, grey, dull

63

Experiment: color references

slide-64
SLIDE 64

magenta, hot, rose

magenta, hot, violet

olive, puke, pea

pinkish, grey, dull

64

Experiment: color references

slide-65
SLIDE 65

magenta, hot, rose

magenta, hot, violet

olive, puke, pea

pinkish, grey, dull

65

Experiment: color references

slide-66
SLIDE 66

magenta, hot, rose

magenta, hot, violet

olive, puke, pea

pinkish, grey, dull

66

Experiment: color references

slide-67
SLIDE 67

magenta, hot, rose

magenta, hot, violet

olive, puke, pea

pinkish, grey, dull

67

Experiment: color references

slide-68
SLIDE 68

magenta, hot, rose

magenta, hot, violet

olive, puke, pea

pinkish, grey, dull

68

Experiment: color references

slide-69
SLIDE 69

Experiment: image references

[Bar chart, axis 50 to 95; bars: Neuralese → Neuralese, English → English*, Statistical MT and Semantic MT (each Neuralese → English*, English → Neuralese); values shown: 77, 72, 70, 86, 73]

69

slide-70
SLIDE 70

large bird, black wings, black crown

large bird, black wings, black crown

small brown, light brown, dark brown

70

Experiment: image references

slide-71
SLIDE 71

Experiment: driving game

[Bar chart; bars: Neuralese → Neuralese, and Statistical MT and Semantic MT (Neuralese ↔ English*); numbers shown: 1.35, 1.93, 1.49, 1.54]

71

slide-72
SLIDE 72
  • Classical notions of “meaning” apply even to non-language-like things (e.g. RNN states)

  • These meanings can be compactly represented without logical forms if we have access to world states

  • Communicating policies “say” interpretable things!

Conclusions

72

slide-73
SLIDE 73
  • Classical notions of “meaning” apply even to non-language-like things (e.g. RNN states)

  • These meanings can be compactly represented without logical forms if we have access to world states

  • Communicating policies “say” interpretable things!

Conclusions

73

slide-74
SLIDE 74
  • Classical notions of “meaning” apply even to non-language-like things (e.g. RNN states)

  • These meanings can be compactly represented without logical forms if we have access to world states

  • Communicating policies “say” interpretable things!

Conclusions

74

slide-75
SLIDE 75

What about compositionality?

Jacob Andreas and Dan Klein

75

slide-76
SLIDE 76

[1.0, 2.3, -0.3, 0.4, -1.2, 1.1]

Thank you!