Cooperative Learning of Disjoint Syntax and Semantics (presentation transcript)

SLIDE 1

Cooperative Learning of Disjoint Syntax and Semantics

Serhii Havrylov Germán Kruszewski Armand Joulin

SLIDES 2-4

Is using linguistic structures (e.g. syntactic trees) for sentence modelling useful?

Yes, it is! Let’s create more treebanks!

No! Annotations are expensive to make. Parse trees are just a linguists’ social construct. Just stack more layers and you will be fine!

SLIDES 5-10

Recursive neural network

[figure build: a recursive network composes word representations bottom-up along a parse tree; the predicted sentence label is “neutral”]
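
The operation behind these figures is a learned composition function that maps the representations of two children to that of their parent, applied bottom-up along the parse tree until a single sentence vector remains and is fed to a classifier (here predicting “neutral”). A minimal textbook form of the composition (the models in this deck use a Tree-LSTM cell instead) is:

    h_p = \tanh\left( W \begin{bmatrix} h_l \\ h_r \end{bmatrix} + b \right)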

SLIDES 11-19

Latent tree learning

  • RL-SPINN: Yogatama et al., 2016
  • Soft-CYK: Maillard et al., 2017
  • Gumbel Tree-LSTM: Choi et al., 2018

Recent work has shown that:

  • The induced trees do not resemble any semantic or syntactic formalism (Williams et al., 2018).
  • Parsing strategies are not consistent across random restarts (Williams et al., 2018).
  • These models fail to learn a simple context-free grammar (Nangia et al., 2018).

SLIDES 20-22

ListOps (Nangia & Bowman, 2018)

Example expressions:

  • [MIN 1 [MAX [MIN 9 [MAX 1 0 ] 2 9 [MED 8 4 3 ] ] [MIN 7 5 ] 6 9 3 ] ]
  • [MAX 1 4 0 9 ]
  • [MAX 7 1 [MAX 6 8 1 7 ] [MIN 2 6 ] 3 ]

Each expression evaluates to a single digit, e.g. [MAX 1 4 0 9 ] → 9.
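
To make the task concrete, here is a minimal Python evaluator for ListOps expressions. It is an illustrative sketch, not code from the paper; the operator set MIN/MAX/MED/SM follows Nangia & Bowman (2018).

    import statistics

    OPS = {
        "MIN": min,
        "MAX": max,
        "MED": lambda xs: int(statistics.median(xs)),  # truncate the even-length case
        "SM": lambda xs: sum(xs) % 10,                 # "sum modulo 10" operator
    }

    def evaluate(tokens):
        """Evaluate a tokenized ListOps expression with a stack."""
        stack = []
        for tok in tokens:
            if tok == "]":
                # Pop arguments back to the operator that opened this list.
                args = []
                while not callable(stack[-1]):
                    args.append(stack.pop())
                op = stack.pop()
                stack.append(op(args))
            elif tok.startswith("["):
                stack.append(OPS[tok[1:]])  # "[MIN" -> the MIN reducer
            else:
                stack.append(int(tok))
        return stack[0]

    print(evaluate("[MAX 1 4 0 9 ]".split()))  # -> 9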

SLIDES 23-35

Tree-LSTM parser (Choi et al., 2018)

[figure build: the parser repeatedly scores adjacent pairs of nodes and merges the selected pair, building the tree bottom-up one merge at a time]
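
The merge loop behind these figures can be sketched as follows. This is a hedged PyTorch sketch of an easy-first parser in the style of Choi et al. (2018), with sampled merge actions (as used for the RL training later in this deck) rather than Gumbel-softmax selection; the names EasyFirstParser, compose, and score_query are illustrative, and a real implementation would use a Tree-LSTM cell rather than a single linear layer.

    import torch
    import torch.nn as nn

    class EasyFirstParser(nn.Module):
        def __init__(self, dim):
            super().__init__()
            self.compose = nn.Linear(2 * dim, dim)   # stand-in for a Tree-LSTM cell
            self.score_query = nn.Linear(dim, 1)     # scores each candidate parent

        def forward(self, nodes):
            """nodes: list of at least two (dim,) leaf embeddings."""
            log_probs = []
            while len(nodes) > 1:
                # Candidate parent for every adjacent pair of nodes.
                parents = [torch.tanh(self.compose(torch.cat([l, r])))
                           for l, r in zip(nodes, nodes[1:])]
                scores = torch.stack([self.score_query(p).squeeze(-1) for p in parents])
                dist = torch.distributions.Categorical(logits=scores)
                k = dist.sample()                    # merge position: an RL action
                log_probs.append(dist.log_prob(k))
                k = int(k)
                nodes = nodes[:k] + [parents[k]] + nodes[k + 2:]
            # Root representation plus log-probability of the sampled merge sequence.
            return nodes[0], torch.stack(log_probs).sum()

    # usage: root, logp = EasyFirstParser(dim=64)([torch.randn(64) for _ in range(5)])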

SLIDE 36

Separation of syntax and semantics

[figure: the model is split into two modules, a Parser and a Compositional Function]

SLIDE 37

Parsing as an RL problem

[figure: the Parser acts as a policy over merge actions; the reward comes from the downstream performance of the Compositional Function]
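
Framed this way, the parser is a stochastic policy π_φ over merge actions, and the reward R is derived from how well the downstream model does on the resulting tree. Its gradient is estimated with the standard score-function (REINFORCE) identity:

    \nabla_\phi \, \mathbb{E}_{a \sim \pi_\phi}[R(a)] = \mathbb{E}_{a \sim \pi_\phi}\left[ R(a) \, \nabla_\phi \log \pi_\phi(a) \right]

The next slides address the two practical problems this estimator brings: its high variance and the coadaptation of the two modules.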

SLIDES 38-39

Optimization challenges

The size of the search space is the number of binary trees over n words, the Catalan number

    C_{n-1} = \frac{1}{n} \binom{2n-2}{n-1}

For a sentence with 20 words, there are 1,767,263,190 possible trees.
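
The count on the slide is easy to verify (plain Python, not from the paper’s code):

    from math import comb

    n = 20
    print(comb(2 * n - 2, n - 1) // n)  # Catalan number C_19 -> 1767263190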

SLIDES 40-41

Optimization challenges

Syntax and semantics have to be learnt simultaneously:

  • the model has to infer from examples that [MIN 0 1] = 0;
  • the parser faces a nonstationary environment (i.e. the same sequence of actions can receive different rewards while the compositional function is still changing).

SLIDES 42-43

Optimization challenges

Typically, the compositional function θ is learned faster than the parser φ. This fast coadaptation limits the exploration of the search space to parsing strategies similar to those found at the beginning of training.

SLIDE 44

Optimization challenges

  • High variance in the estimate of the parser’s gradient ∇φ has to be addressed.
  • The learning paces of the parser φ and the compositional function θ have to be levelled off.

SLIDES 45-52

Variance reduction

[figure build: the REINFORCE reward signal (“Is this a carrot?”); the baseline compares each new reward against the moving average of recent rewards]

A moving-average baseline works poorly when examples differ in difficulty:

  • [MIN 1 [MAX [MIN 9 [MIN 1 0 ] 2 [MED 8 4 3 ] ] [MAX 7 5 ] 6 9 ] ]   (hard)
  • [MAX 1 0 ]   (easy)

Instead, use the self-critical training (SCT) baseline of Rennie et al. (2017).
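
The idea can be sketched in a few lines: the baseline for a sampled parse is the reward obtained by greedily decoding the same input, so datapoints the model already solves greedily (the “easy” ones) get zero advantage and contribute no gradient. This is a hedged PyTorch-style sketch; parser.parse and reward are hypothetical placeholders, not the released API.

    import torch

    def sct_policy_loss(parser, example, reward):
        # Sampled parse: stochastic merge actions, with log-probability attached.
        sampled_tree, log_prob = parser.parse(example, sample=True)
        # Greedy parse: argmax merge actions; serves as the self-critical baseline.
        with torch.no_grad():
            greedy_tree, _ = parser.parse(example, sample=False)
        advantage = reward(sampled_tree) - reward(greedy_tree)
        # Minimising this loss follows the REINFORCE gradient with the SCT baseline.
        return -advantage * log_prob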
slide-53
SLIDE 53

Synchronizing syntax and semantics learning

Syntax Semantics

53

slide-54
SLIDE 54

Synchronizing syntax and semantics learning

54

slide-55
SLIDE 55

Synchronizing syntax and semantics learning

55

slide-56
SLIDE 56

Synchronizing syntax and semantics learning

56

Proximal Policy Optimization (PPO) of Schulman et al. (2017)
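
PPO controls the size of each parser update with the clipped surrogate objective of Schulman et al. (2017):

    L^{CLIP}(\phi) = \mathbb{E}_t\left[ \min\left( r_t(\phi)\,\hat{A}_t,\ \mathrm{clip}(r_t(\phi),\,1-\epsilon,\,1+\epsilon)\,\hat{A}_t \right) \right], \qquad r_t(\phi) = \frac{\pi_\phi(a_t \mid s_t)}{\pi_{\phi_{old}}(a_t \mid s_t)}

Each batch is reused for K parser updates under this objective, which is where the factor K in the complexity table on slide 67 comes from.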

SLIDE 57

Optimization challenges

  • High variance in the estimate of the parser’s gradient ∇φ is addressed by using the self-critical training (SCT) baseline of Rennie et al. (2017).
  • The learning paces of the parser φ and the compositional function θ are levelled off by controlling the parser’s updates with Proximal Policy Optimization (PPO) of Schulman et al. (2017).

SLIDES 58-62

ListOps results

[results figure omitted]

SLIDE 63

Extrapolation

SLIDES 64-65

Sentiment Analysis (SST-2)

SLIDE 66

Natural language inference (MultiNLI)

SLIDE 67

Time and space complexities

Method                                  Time complexity    Space complexity
RL-SPINN (Yogatama et al., 2016)        O(n·d²)            O(n·d²)
Soft-CYK (Maillard et al., 2017)        O(n³·d + n²·d²)    O(n³·d)
Gumbel Tree-LSTM (Choi et al., 2018)    O(n²·d + n·d²)     O(n²·d)
Ours                                    O(K·n·d²)          O(n·d²)

n – sentence length, d – Tree-LSTM dimensionality, K – number of updates in PPO

SLIDE 68

Conclusions

  • The separation between syntax and semantics allows coordinating the optimisation scheme of each module.
  • Self-critical training mitigates the credit assignment problem by distinguishing “hard” and “easy” datapoints.
  • The model can recover a simple context-free grammar of mathematical expressions.
  • The model performs competitively on several real natural language tasks.

Code: github.com/facebookresearch/latent-treelstm