

SLIDE 1

Encoding Sentences with Graph Convolutional Networks for Semantic Role Labeling

Diego Marcheggiani and Ivan Titov University of Amsterdam University of Edinburgh

EMNLP 2017 Copenhagen

SLIDE 2

Contributions

} Syntactic Graph Convolutional Networks
} State-of-the-art semantic role labeling model

} English and Chinese

Sequa makes and repairs jet engines.

SLIDE 3

Semantic Role Labeling

} Predicting the predicate-argument structure of a sentence

Sequa makes and repairs jet engines.

SLIDE 4

Semantic Role Labeling

} Predicting the predicate-argument structure of a sentence

} Discover and disambiguate predicates

Sequa makes and repairs jet engines.

make.01 repair.01 engine.01

SLIDE 5

Semantic Role Labeling

} Predicting the predicate-argument structure of a sentence

} Discover and disambiguate predicates
} Identify arguments and label them with their semantic roles

Sequa makes and repairs jet engines.

make.01 repair.01 engine.01 A0

SLIDE 6

Semantic Role Labeling

} Predicting the predicate-argument structure of a sentence

} Discover and disambiguate predicates
} Identify arguments and label them with their semantic roles

Sequa makes and repairs jet engines.

make.01 repair.01 engine.01 A0 A1

SLIDE 7

Semantic Role Labeling

} Predicting the predicate-argument structure of a sentence

} Discover and disambiguate predicates
} Identify arguments and label them with their semantic roles

Sequa makes and repairs jet engines.

make.01 repair.01 engine.01 A0 A1 A1 A0

SLIDE 8

Semantic Role Labeling

} Predicting the predicate-argument structure of a sentence

} Discover and disambiguate predicates
} Identify arguments and label them with their semantic roles

Sequa makes and repairs jet engines.

make.01 repair.01 engine.01 A0 A1 A1 A0 A1

SLIDE 9

Semantic Role Labeling

} Only the head of an argument is labeled
} Sequence labeling task for each predicate
} Focus on argument identification and labeling

Sequa makes and repairs jet engines.

make.01 repair.01 engine.01 A0 A1 A1 A0 A1

SLIDE 10

Related work

} SRL systems that use syntax with simple NN architectures

} [FitzGerald et al., 2015]
} [Roth and Lapata, 2016]

} Recent models ignore linguistic bias

} [Zhou and Xu, 2015]
} [He et al., 2017]
} [Marcheggiani et al., 2017]

SLIDE 11

Motivations

} Some semantic dependencies are mirrored in the syntactic graph

Sequa makes and repairs jet engines.

[Figure: dependency tree of the sentence (ROOT, SBJ, COORD, CONJ, OBJ, NMOD) with semantic arcs creator and creation.]

SLIDE 12

Motivations

} Some semantic dependencies are mirrored in the syntactic graph
} Not all of them – syntax-semantic interface is not trivial

Sequa makes and repairs jet engines.

[Figure: dependency tree of the sentence (ROOT, SBJ, COORD, CONJ, OBJ, NMOD) with semantic arcs creator, creation, repairer, and entity repaired.]

SLIDE 13

Encoding Sentences with Graph Convolutional Networks

} Graph Convolutional Networks (GCNs) [Kipf and Welling, 2017]
} Syntactic GCNs
} Semantic Role Labeling Model
} Experiments
} Conclusions

SLIDE 14

Graph Convolutional Networks (message passing)

Undirected graph

[Kipf and Welling, 2017]

SLIDE 15

Graph Convolutional Networks (message passing)

[Figure: undirected graph; update of the blue node.]

[Kipf and Welling, 2017]

SLIDE 16

Graph Convolutional Networks (message passing)

h_i = ReLU( W_0 h_i + Σ_{j ∈ N(i)} W_1 h_j )

(W_0: self-loop weights; W_1: weights for messages from the neighborhood N(i))

[Figure: undirected graph; update of the blue node.]

[Kipf and Welling, 2017]
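The update above can be sketched in a few lines of numpy. This is an illustrative toy (random weights, a three-node graph), not the authors' implementation:

```python
# One Kipf & Welling-style node update: h_i = ReLU(W0 h_i + sum_{j in N(i)} W1 h_j),
# with W0 acting on the self-loop and W1 on messages from neighbors.
import numpy as np

def gcn_node_update(h, neighbors, i, W0, W1):
    """Update node i from its own state (self-loop) and its neighbors' states."""
    msg = W0 @ h[i]                      # self-loop message
    for j in neighbors[i]:               # messages from the neighborhood N(i)
        msg += W1 @ h[j]
    return np.maximum(msg, 0.0)          # ReLU

rng = np.random.default_rng(0)
d = 4
h = rng.normal(size=(3, d))              # three nodes, d-dim states
neighbors = {0: [1, 2], 1: [0], 2: [0]}  # undirected toy graph
W0, W1 = rng.normal(size=(d, d)), rng.normal(size=(d, d))
h0_new = gcn_node_update(h, neighbors, 0, W0, W1)
```

Note that, unlike a convolution over a sequence, the same W1 is applied to every neighbor regardless of position.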

SLIDE 17

GCNs Pipeline

Input X = H(0) → hidden layers H(1), H(2), … → output Z = H(n)

X = H(0) is the initial feature representation of the nodes; Z = H(n) is a representation informed by each node's neighborhood.

[Kipf and Welling, 2017]

SLIDE 18

GCNs Pipeline

Input X = H(0) → hidden layers H(1), H(2), … → output Z = H(n)

X = H(0) is the initial feature representation of the nodes; Z = H(n) is a representation informed by each node's neighborhood.

Extend GCNs for syntactic dependency trees

[Kipf and Welling, 2017]

SLIDE 19

Encoding Sentences with Graph Convolutional Networks

} Graph Convolutional Networks (GCNs)
} Syntactic GCNs
} Semantic Role Labeling Model
} Experiments
} Conclusions

SLIDE 20

Example

[Figure: dependency tree of "Lane disputed those estimates" (SBJ, OBJ, NMOD).]

SLIDE 21

Example

[Figure: dependency tree of "Lane disputed those estimates" (SBJ, OBJ, NMOD); each word sends itself a self-loop message ×W(1)_self, and incoming messages are summed and passed through ReLU(Σ·).]

SLIDE 22

Example

[Figure: same tree; in addition to the self-loops, messages flow along each dependency arc with label-specific matrices ×W(1)_subj, ×W(1)_obj, ×W(1)_nmod.]
SLIDE 23

Example

[Figure: same tree; reverse-direction messages are added along each arc, with primed matrices ×W(1)_subj′, ×W(1)_obj′, ×W(1)_nmod′.]

SLIDE 24

Example

[Figure: same as the previous slide: self-loop messages plus messages in both directions along every arc.]

SLIDE 25

Example

[Figure: same tree; for each word, the self-loop message and the messages along and against every incident arc are summed and passed through ReLU(Σ·), giving the layer-1 representations.]

SLIDE 26

Example

[Figure: a second GCN layer, with matrices ×W(2)_self, ×W(2)_subj, ×W(2)_obj, ×W(2)_nmod and their primed reverse-direction counterparts, is stacked on top of the layer-1 outputs and applies the same message passing followed by ReLU(Σ·).]
Stacking GCNs widens the syntactic neighborhood

SLIDE 27

Syntactic GCNs

h_v^(k+1) = ReLU( Σ_{u ∈ N(v)} W^(k)_{L(u,v)} h_u^(k) + b^(k)_{L(u,v)} )

SLIDE 28

Syntactic GCNs

h_v^(k+1) = ReLU( Σ_{u ∈ N(v)} W^(k)_{L(u,v)} h_u^(k) + b^(k)_{L(u,v)} )

Syntactic neighborhood: the sum runs over the syntactic neighbors N(v)

SLIDE 29

Syntactic GCNs

h_v^(k+1) = ReLU( Σ_{u ∈ N(v)} W^(k)_{L(u,v)} h_u^(k) + b^(k)_{L(u,v)} )

Syntactic neighborhood: the sum runs over the syntactic neighbors N(v)

Message from u to v: W^(k)_{L(u,v)} h_u^(k) + b^(k)_{L(u,v)}

SLIDE 30

Syntactic GCNs

h_v^(k+1) = ReLU( Σ_{u ∈ N(v)} W^(k)_{L(u,v)} h_u^(k) + b^(k)_{L(u,v)} )

} Self-loop is included in N(v)
} Messages are direction- and label-specific

SLIDE 31

Syntactic GCNs

h_v^(k+1) = ReLU( Σ_{u ∈ N(v)} W^(k)_{L(u,v)} h_u^(k) + b^(k)_{L(u,v)} )

} Overparametrized: one matrix for each label-direction pair
} Factorization: W^(k)_{L(u,v)} = V^(k)_{dir(u,v)} – matrices depend only on the edge direction, while the biases b^(k)_{L(u,v)} remain label-specific
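A minimal numpy sketch of one syntactic GCN layer under this factorization, with one matrix per direction (self-loop, along the arc, against it) and label-specific biases. Function and parameter names are illustrative, not from the authors' code:

```python
# One syntactic GCN layer: W_{L(u,v)} = V_{dir(u,v)} (direction-specific
# matrices), with label-specific biases b_{(dir, label)}.
import numpy as np

def syntactic_gcn_layer(h, arcs, V, b):
    """h: (n, d) node states; arcs: list of (head, dep, label);
    V: dict dir -> (d, d) matrix; b: dict (dir, label) -> (d,) bias."""
    out = np.zeros_like(h)
    # self-loops: every node is in its own neighborhood N(v)
    for v in range(h.shape[0]):
        out[v] += V["self"] @ h[v] + b[("self", "self")]
    # one message along each dependency arc and one against it
    for head, dep, label in arcs:
        out[dep] += V["along"] @ h[head] + b[("along", label)]
        out[head] += V["opposite"] @ h[dep] + b[("opposite", label)]
    return np.maximum(out, 0.0)  # ReLU

rng = np.random.default_rng(0)
n, d = 4, 5                      # "Lane disputed those estimates"
h = rng.normal(size=(n, d))
arcs = [(1, 0, "SBJ"), (1, 3, "OBJ"), (3, 2, "NMOD")]
V = {k: rng.normal(size=(d, d)) * 0.1 for k in ("self", "along", "opposite")}
b = {("self", "self"): np.zeros(d)}
for lab in ("SBJ", "OBJ", "NMOD"):
    b[("along", lab)] = np.zeros(d)
    b[("opposite", lab)] = np.zeros(d)
h1 = syntactic_gcn_layer(h, arcs, V, b)
```

The direction-sharing keeps the parameter count independent of the label set, which matters when the dependency scheme has dozens of labels.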

SLIDE 32

Edge-wise Gates

} Not all edges are equally important

SLIDE 33

Edge-wise Gates

} Not all edges are equally important
} We should not blindly rely on predicted syntax

SLIDE 34

Edge-wise Gates

} Not all edges are equally important
} We should not blindly rely on predicted syntax
} Gates decide the “importance” of each message

[Figure: dependency tree of "Lane disputed those estimates"; each incoming message is scaled by a scalar gate g before the ReLU(Σ·) update.]

SLIDE 35

Edge-wise Gates

} Not all edges are equally important
} We should not blindly rely on predicted syntax
} Gates decide the “importance” of each message
} Gates depend on nodes and edges

[Figure: dependency tree of "Lane disputed those estimates"; each incoming message is scaled by a scalar gate g before the ReLU(Σ·) update.]
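A sketch of such an edge-wise gate: a scalar in (0, 1), computed from the source node's state with direction/label-specific gate parameters, multiplies the message before it enters the sum. Parameter names here are illustrative assumptions:

```python
# Edge-wise gate: g = sigmoid(w_gate . h_u + b_gate) scales the message W h_u,
# letting the model down-weight edges from a noisy predicted parse.
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gated_message(h_u, W, w_gate, b_gate):
    """Return the message W h_u scaled by a scalar gate from the source node."""
    g = sigmoid(w_gate @ h_u + b_gate)   # scalar gate in (0, 1)
    return g * (W @ h_u), g

rng = np.random.default_rng(0)
d = 4
h_u = rng.normal(size=d)
W = rng.normal(size=(d, d))
w_gate, b_gate = rng.normal(size=d), 0.0
msg, g = gated_message(h_u, W, w_gate, b_gate)
```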

SLIDE 36

Encoding Sentences with Graph Convolutional Networks

} Graph Convolutional Networks (GCNs)
} Syntactic GCNs
} Semantic Role Labeling Model
} Experiments
} Conclusions

SLIDE 37

Our Model

} Word representation
} Bidirectional LSTM encoder
} GCN Encoder
} Local role classifier

SLIDE 38

Word Representation

} Pretrained word embeddings
} Word embeddings
} POS tag embeddings
} Predicate lemma embeddings
} Predicate flag

[Figure: each word of "Lane disputed those estimates" is mapped to its word representation.]
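The word representation is the concatenation of the components listed above. A toy numpy sketch, with illustrative (not the paper's) dimensions:

```python
# Word representation = pretrained embedding ∘ trainable embedding ∘
# POS-tag embedding ∘ predicate-lemma embedding ∘ binary predicate flag.
import numpy as np

rng = np.random.default_rng(0)
def embed(dim):                      # stand-in for a lookup-table row
    return rng.normal(size=dim)

x = np.concatenate([
    embed(100),                      # pretrained word embedding (frozen)
    embed(100),                      # randomly initialized word embedding
    embed(16),                       # POS-tag embedding
    embed(100),                      # predicate-lemma embedding
    np.array([1.0]),                 # flag: is this word the predicate?
])
```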

SLIDE 39

BiLSTM Encoder

} Encode each word with its left and right context
} Stacked BiLSTM

[Figure: word representations of "Lane disputed those estimates" fed into a J-layer BiLSTM.]

SLIDE 40

GCNs Encoder

} Syntactic GCNs after BiLSTM encoder
} Add syntactic information
} Skip connections
} Longer dependencies are captured

[Figure: K GCN layers over the dependency tree of "Lane disputed those estimates" (nsubj, dobj, nmod) on top of the J-layer BiLSTM.]

SLIDE 41

Semantic Role Classifier

[Figure: BiLSTM + K GCN layers over "Lane disputed those estimates"; the predicate representation and the candidate-argument representation feed the classifier, which predicts A1.]

} Local log-linear classifier

p(r | t_i, t_p, l) ∝ exp( W_{l,r} (t_i ∘ t_p) )

(t_i: candidate-argument representation; t_p: predicate representation; l: predicate lemma; ∘: concatenation)
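The classifier above is a plain softmax over roles with lemma-specific weight vectors. A minimal sketch with illustrative names and dimensions:

```python
# Local role classifier: p(r | t_i, t_p, l) ∝ exp(W_{l,r} (t_i ∘ t_p)),
# where ∘ concatenates candidate-argument and predicate representations.
import numpy as np

def role_probs(t_i, t_p, W_l):
    """W_l: (num_roles, 2d) weight vectors, one row per role, for lemma l."""
    scores = W_l @ np.concatenate([t_i, t_p])   # one logit per role
    scores -= scores.max()                      # numerical stability
    e = np.exp(scores)
    return e / e.sum()                          # softmax over roles

rng = np.random.default_rng(0)
d, num_roles = 8, 5
t_i, t_p = rng.normal(size=d), rng.normal(size=d)
W_l = rng.normal(size=(num_roles, 2 * d))
p = role_probs(t_i, t_p, W_l)
```

Because the classifier is local, each candidate word is scored independently per predicate; there is no global decoding step.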

SLIDE 42

Encoding Sentences with Graph Convolutional Networks

} Graph Convolutional Networks (GCNs)
} Syntactic GCNs
} Semantic Role Labeling Model
} Experiments
} Conclusions

SLIDE 43

Experiments

} Data

} CoNLL-2009 dataset – English and Chinese
} F1 evaluation measure

} Model

} Hyperparameters tuned on English development set
} State-of-the-art predicate disambiguation models

SLIDE 44

Ablation Experiments (English Dev Set)

[Chart: F1, SRL w/o predicate disambiguation. Bi-LSTM (only): 82.7]

SLIDE 45

Ablation Experiments (English Dev Set)

[Chart: F1, SRL w/o predicate disambiguation. Bi-LSTM (only): 82.7; Bi-LSTMs + GCNs (K=1), no gates: 83.0]

SLIDE 46

Ablation Experiments (English Dev Set)

[Chart: F1, SRL w/o predicate disambiguation. Bi-LSTM (only): 82.7; Bi-LSTMs + GCNs (K=1), no gates: 83.0; Bi-LSTMs + GCNs (K=1): 83.3]

SLIDE 47

Ablation Experiments (English Dev Set)

[Chart: F1, SRL w/o predicate disambiguation. Bi-LSTM (only): 82.7; Bi-LSTMs + GCNs (K=1), no gates: 83.0; Bi-LSTMs + GCNs (K=1): 83.3; Bi-LSTMs + GCNs (K=2): 82.7]

SLIDE 48

English Test Set

[Chart: F1, SRL with predicate disambiguation. FitzGerald et al. (2015) (global): 87.3; Roth and Lapata (2016) (global): 87.7; Marcheggiani et al. (2017, CoNLL) (local): 87.7; Ours (Bi-LSTM + GCN) (local): 88.0]

SLIDE 49

English Out of Domain

[Chart: F1, SRL with predicate disambiguation. FitzGerald et al. (2015) (global): 75.2; Roth and Lapata (2016) (global): 76.1; Marcheggiani et al. (2017, CoNLL) (local): 77.7; Ours (Bi-LSTM + GCN) (local): 77.2]

SLIDE 50

English Test Set (Ensemble)

[Chart: F1, SRL with predicate disambiguation. FitzGerald et al. (2015) (ensemble): 87.7; Roth and Lapata (2016) (ensemble): 87.9; Ours (Bi-LSTM + GCN) (ensemble): 89.1]

SLIDE 51

English Test Set (Ensemble)

[Chart: F1, SRL with predicate disambiguation. FitzGerald et al. (2015) (ensemble): 87.7; Roth and Lapata (2016) (ensemble): 87.9; Ours (Bi-LSTM + GCN) (ensemble): 89.1]

Best-reported score on CoNLL-2009

SLIDE 52

Chinese Test Set

[Chart: F1, SRL with predicate disambiguation. Zhao et al. (2009) (global): 77.7; Björkelund et al. (2009) (global): 78.6; Roth and Lapata (2016) (global): 79.4; Ours (Bi-LSTM + GCN) (local): 82.5]

SLIDE 53

Long-range Dependencies (English Dev Set)

SLIDE 54

Conclusion

} Syntax-aware state-of-the-art model for dependency-based SRL

} English and Chinese

} GCNs for encoding syntactic structures into neural networks

} Semantics, coreference, discourse

SLIDE 55

Conclusion

} Funding:

} ERC StG BroadSem 678254
} NWO VIDI 639.022.518
} Amazon Web Services (AWS) grant

github.com/diegma/neural-dep-srl

} Syntax-aware state-of-the-art model for dependency-based SRL

} English and Chinese

} GCNs for encoding syntactic structures into neural networks

} Semantics, coreference, discourse