Encoding Sentences with Graph Convolutional Networks for Semantic Role Labeling
Diego Marcheggiani and Ivan Titov
University of Amsterdam, University of Edinburgh
EMNLP 2017, Copenhagen
Contributions
} Syntactic Graph Convolutional Networks
} State-of-the-art semantic role labeling model
  } English and Chinese
Semantic Role Labeling
} Predicting the predicate-argument structure of a sentence
  } Discover and disambiguate predicates
  } Identify arguments and label them with their semantic roles
[Figure: "Sequa makes and repairs jet engines." annotated with the predicate senses make.01, repair.01 and engine.01 and with role arcs labeled A0, A1, A1, A0, A1]
Semantic Role Labeling
} Only the head of an argument is labeled
} Sequence labeling task for each predicate
} Focus on argument identification and labeling
Related work
} SRL systems that use syntax with simple NN architectures
  } [FitzGerald et al., 2015]
  } [Roth and Lapata, 2016]
} Recent models ignore linguistic bias
  } [Zhou and Xu, 2014]
  } [He et al., 2017]
  } [Marcheggiani et al., 2017]
Motivations
} Some semantic dependencies are mirrored in the syntactic graph
} Not all of them: the syntax-semantics interface is not trivial
[Figure: "Sequa makes and repairs jet engines." with syntactic dependency arcs (SBJ, COORD, OBJ, CONJ, NMOD, ROOT) and semantic role arcs (creator, creation, repairer, entity repaired)]
Encoding Sentences with Graph Convolutional Networks
} Graph Convolutional Networks (GCNs) [Kipf and Welling, 2017]
} Syntactic GCNs
} Semantic Role Labeling Model
} Experiments
} Conclusions
Graph Convolutional Networks (message passing)
[Figure: undirected graph; update of the blue node] [Kipf and Welling, 2017]
$h_i = \mathrm{ReLU}\Big(W_0 h_i + \sum_{j \in N(i)} W_1 h_j\Big)$
} $W_0 h_i$: self loop
} $\sum_{j \in N(i)} W_1 h_j$: neighborhood
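The update of a single node (its own state through W0, plus the sum of its neighbors' states through W1, followed by ReLU) can be sketched in a few lines. This is a minimal illustration in plain Python lists, not the authors' implementation; the toy graph, the feature values and the weight matrices are assumptions chosen only to make the arithmetic easy to follow.

```python
def matvec(W, x):
    """Multiply matrix W (a list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def relu(x):
    """Element-wise ReLU."""
    return [max(0.0, xi) for xi in x]


def gcn_layer(H, neighbors, W0, W1):
    """One message-passing step: h_i = ReLU(W0 h_i + sum_{j in N(i)} W1 h_j)."""
    new_H = []
    for i, h_i in enumerate(H):
        total = matvec(W0, h_i)                  # self-loop term
        for j in neighbors[i]:                   # neighborhood term
            msg = matvec(W1, H[j])
            total = [t + m for t, m in zip(total, msg)]
        new_H.append(relu(total))
    return new_H


# Toy undirected path graph 0 - 1 - 2 with 2-dimensional node features
# (all values below are illustrative assumptions).
H0 = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
neighbors = {0: [1], 1: [0, 2], 2: [1]}
W0 = [[1.0, 0.0], [0.0, 1.0]]   # identity: keep the node's own state
W1 = [[0.5, 0.0], [0.0, 0.5]]   # halve every neighbor message
H1 = gcn_layer(H0, neighbors, W0, W1)
```

After one step, node 1 has absorbed half of the states of nodes 0 and 2, while nodes 0 and 2 have only seen node 1.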
GCNs Pipeline
Input → Hidden layer → Hidden layer → … → Output
$X = H^{(0)}$ → $H^{(1)}$ → $H^{(2)}$ → … → $Z = H^{(n)}$
} $X = H^{(0)}$: initial feature representation of nodes
} $Z = H^{(n)}$: representation informed by nodes' neighborhood
} Extend GCNs for syntactic dependency trees
[Kipf and Welling, 2017]
Example
[Figure: two stacked syntactic GCN layers over "Lane disputed those estimates" with dependency arcs SBJ, OBJ and NMOD. In layer k (k = 1, 2) every word sends itself a message through $W^{(k)}_{self}$; messages along the dependency arcs use $W^{(k)}_{subj}$, $W^{(k)}_{obj}$ and $W^{(k)}_{nmod}$; messages in the opposite direction use $W^{(k)}_{subj'}$, $W^{(k)}_{obj'}$ and $W^{(k)}_{nmod'}$; each word applies ReLU(Σ·) to the sum of its incoming messages.]
Stacking GCNs widens the syntactic neighborhood
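The effect of stacking can be checked by counting which words can influence a given word after k layers. The sketch below assumes the usual dependency analysis of the example sentence ("disputed" heads "Lane" and "estimates"; "estimates" heads "those"); the function name and the index-based edge encoding are illustrative choices, not taken from the slides.

```python
words = ["Lane", "disputed", "those", "estimates"]

# (head, dependent) index pairs; assumed analysis:
# disputed -SBJ-> Lane, disputed -OBJ-> estimates, estimates -NMOD-> those
edges = [(1, 0), (1, 3), (3, 2)]


def receptive_field(start, edges, k):
    """Words whose input can influence `start` after k GCN layers.

    Messages flow in both directions along every arc, and the
    self-loop keeps a word's own state in the set.
    """
    field = {start}
    for _ in range(k):
        widened = set(field)
        for head, dep in edges:
            if head in field:
                widened.add(dep)
            if dep in field:
                widened.add(head)
        field = widened
    return [words[i] for i in sorted(field)]


one_layer = receptive_field(0, edges, 1)
two_layers = receptive_field(0, edges, 2)
three_layers = receptive_field(0, edges, 3)
```

Under these assumptions, "Lane" sees only its head after one layer, also sees "estimates" after two, and needs a third layer to reach "those".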
Syntactic GCNs
$h_v^{(k+1)} = \mathrm{ReLU}\Big(\sum_{u \in N(v)} W^{(k)}_{L(u,v)} h_u^{(k)} + b^{(k)}_{L(u,v)}\Big)$
} $N(v)$: syntactic neighborhood; the self-loop is included in $N$
} $W^{(k)}_{L(u,v)} h_u^{(k)} + b^{(k)}_{L(u,v)}$: message; messages are direction and label specific
} Overparametrized: one matrix for each label-direction pair
} $W^{(k)}_{L(u,v)} = V^{(k)}_{\mathrm{dir}(u,v)}$
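A minimal sketch of one syntactic GCN layer in its fully parametrized form, with one weight matrix and one bias per label-direction pair. It is an illustration under stated assumptions, not the authors' code: edge types are encoded as strings, a trailing apostrophe marks a message sent against the arc, the `params` dictionary and all numeric values are hypothetical, and input and output dimensions are taken to be equal.

```python
def matvec(W, x):
    """Multiply matrix W (a list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def syntactic_gcn_layer(H, edges, params):
    """h_v = ReLU(sum over u in N(v) of W_{L(u,v)} h_u + b_{L(u,v)}).

    H      : list of node vectors
    edges  : (head, dependent, label) triples
    params : dict mapping an edge type to a (W, b) pair
    """
    # Collect (sender u, edge type L(u, v)) for every receiver v.
    incoming = {v: [(v, "self")] for v in range(len(H))}   # self-loop is in N(v)
    for head, dep, label in edges:
        incoming[dep].append((head, label))                # along the arc
        incoming[head].append((dep, label + "'"))          # against the arc
    new_H = []
    for v in range(len(H)):
        total = [0.0] * len(H[v])
        for u, edge_type in incoming[v]:
            W, b = params[edge_type]
            msg = matvec(W, H[u])
            total = [t + m + bi for t, m, bi in zip(total, msg, b)]
        new_H.append([max(0.0, t) for t in total])
    return new_H


def scaled_identity(c):
    """2x2 matrix c * I with a zero bias (illustrative parameters only)."""
    return ([[c, 0.0], [0.0, c]], [0.0, 0.0])


# "Lane disputed those estimates" with assumed arcs and toy 2-d states.
H0 = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 0.0]]
edges = [(1, 0, "sbj"), (1, 3, "obj"), (3, 2, "nmod")]
params = {"self": scaled_identity(1.0)}
for label in ("sbj", "obj", "nmod"):
    params[label] = scaled_identity(0.5)          # head -> dependent
    params[label + "'"] = scaled_identity(0.25)   # dependent -> head
H1 = syntactic_gcn_layer(H0, edges, params)
```

Sharing matrices across labels, as in the last equation above, would amount to looking up `W` by direction only ("along", "against", "self") while keeping the per-label lookup for `b`; that variant is not shown here.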
Edge-wise Gates
} Not all edges are equally important
} We should not blindly rely on predicted syntax
} Gates decide the "importance" of each message
} Gates depend on nodes and edges
[Figure: "Lane disputed those estimates" with arcs NMOD, SBJ and OBJ; every message passes through a gate g before the ReLU(Σ·) at each word]
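One way to realize such gates is a scalar in (0, 1) that multiplies each message before the sum. The sketch below assumes a sigmoid of a dot product between the sending node's state and a gate vector, plus an edge-specific bias; the slides only state that gates depend on nodes and edges, so this exact parametrization and all names are assumptions.

```python
import math


def gate(h_u, gate_vector, edge_bias):
    """Scalar gate in (0, 1): sigmoid(h_u . gate_vector + edge_bias)."""
    score = sum(a * c for a, c in zip(h_u, gate_vector)) + edge_bias
    return 1.0 / (1.0 + math.exp(-score))


def gated_sum(gated_messages):
    """ReLU of the sum of messages, each scaled by its gate.

    gated_messages: list of (gate value, message vector) pairs.
    """
    total = [0.0] * len(gated_messages[0][1])
    for g, message in gated_messages:
        total = [t + g * m for t, m in zip(total, message)]
    return [max(0.0, t) for t in total]


# A neutral gate (score 0) passes half of the message through.
neutral = gate([0.0, 0.0], [1.0, 1.0], 0.0)
# A strongly negative edge bias nearly switches the edge off.
closed = gate([0.0, 0.0], [1.0, 1.0], -20.0)
updated = gated_sum([(neutral, [2.0, -4.0]), (1.0, [1.0, 1.0])])
```

A gate close to zero lets the model down-weight a message arriving over a dependency arc it does not trust, which matches the motivation of not relying blindly on predicted syntax.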
Our Model
} Word representation
} Bidirectional LSTM encoder
} GCN Encoder
} Local role classifier
Word Representation
} Pretrained word embeddings
} Word embeddings
} POS tag embeddings
} Predicate lemma embeddings
} Predicate flag
[Figure: each word of "Lane disputed those estimates" mapped to its word representation]
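The components listed above can be combined into one input vector per word. The sketch assumes plain concatenation and a single 0/1 flag marking the predicate; the function name, the toy dimensions and the values are hypothetical.

```python
def word_representation(pretrained, trained, pos, predicate_lemma, is_predicate):
    """Concatenate the per-word feature vectors and append a 0/1 predicate flag."""
    flag = [1.0 if is_predicate else 0.0]
    return pretrained + trained + pos + predicate_lemma + flag


# Toy vectors for "disputed", treated as the predicate (values are made up).
x = word_representation(
    pretrained=[0.1, 0.2],
    trained=[0.3],
    pos=[0.4, 0.5],
    predicate_lemma=[0.6],
    is_predicate=True,
)
```

The resulting vector has the summed dimensionality of its parts plus one, and would be the input to the first BiLSTM layer.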
BiLSTM Encoder
} Encode each word with its left and right context
} Stacked BiLSTM
[Figure: "Lane disputed those estimates": word representation → J layers BiLSTM]
GCNs Encoder
} Syntactic GCNs after BiLSTM encoder
  } Add syntactic information
  } Skip connections
  } Longer dependencies are captured
[Figure: "Lane disputed those estimates": word representation → J layers BiLSTM → K layers GCN over the dependency arcs nsubj, dobj and nmod]
Semantic Role Classifier
} Local log-linear classifier
$p(r \mid t_i, t_p, l) \propto \exp\big(W_{l,r}\,(t_i \circ t_p)\big)$
[Figure: "Lane disputed those estimates": word representation → J layers BiLSTM → K layers GCN (arcs nsubj, dobj, nmod) → classifier, which takes the predicate representation and the candidate argument representation and outputs a role (A1 in the example)]
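The local classifier scores every role for one (candidate argument, predicate) pair and normalizes the exponentiated scores. The sketch below assumes that the candidate representation t_i and the predicate representation t_p are concatenated and that W is a lookup from (predicate lemma, role) to a weight vector; the role inventory, the lemma "dispute" and all numbers are illustrative assumptions.

```python
import math


def role_distribution(t_i, t_p, lemma, W, roles):
    """p(r | t_i, t_p, l) proportional to exp(W[l, r] . concat(t_i, t_p))."""
    features = t_i + t_p                                   # concatenation
    scores = [
        sum(w * f for w, f in zip(W[(lemma, role)], features))
        for role in roles
    ]
    top = max(scores)                                      # for numerical stability
    exps = [math.exp(s - top) for s in scores]
    norm = sum(exps)
    return {role: e / norm for role, e in zip(roles, exps)}


roles = ["A0", "A1", "NULL"]          # hypothetical role inventory
W = {
    ("dispute", "A0"): [1.0, 0.0, 0.0, 0.0],
    ("dispute", "A1"): [0.0, 1.0, 0.0, 0.0],
    ("dispute", "NULL"): [0.0, 0.0, 0.0, 0.0],
}
probs = role_distribution([0.0, 2.0], [1.0, 0.0], "dispute", W, roles)
best_role = max(probs, key=probs.get)
```

Because the decision for each word is made independently, running this once per word for a given predicate yields the per-predicate labeling described earlier.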
Experiments
} Data
  } CoNLL-2009 dataset: English and Chinese
  } F1 evaluation measure
} Model
  } Hyperparameters tuned on English development set
  } State-of-the-art predicate disambiguation models
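For reference, F1 is the harmonic mean of precision and recall. The helper below is a generic sketch computed from raw counts; it is not the official CoNLL-2009 scorer, and the counts in the example are made up.

```python
def f1_score(num_correct, num_predicted, num_gold):
    """Harmonic mean of precision and recall, computed from raw counts."""
    precision = num_correct / num_predicted if num_predicted else 0.0
    recall = num_correct / num_gold if num_gold else 0.0
    if precision + recall == 0.0:
        return 0.0
    return 2.0 * precision * recall / (precision + recall)


# 8 correct labels out of 10 predicted, with 16 labels in the gold standard.
score = f1_score(num_correct=8, num_predicted=10, num_gold=16)
```

Here precision is 0.8 and recall is 0.5, so the score equals 8/13.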
Ablation Experiments (English Dev set)
SRL w/o predicate disambiguation
Model                                F1
Bi-LSTM (only)                       82.7
Bi-LSTMs + GCNs (K=1), no gates      83.0
Bi-LSTMs + GCNs (K=1)                83.3
Bi-LSTMs + GCNs (K=2)                82.7
English Test Set
SRL with predicate disambiguation
Model                                        F1
FitzGerald et al. (2015) (global)            87.3
Roth and Lapata (2016) (global)              87.7
Marcheggiani et al. (2017, CoNLL) (local)    87.7
Ours (Bi-LSTM + GCN) (local)                 88.0
English Out of Domain
SRL with predicate disambiguation
Model                                        F1
FitzGerald et al. (2015) (global)            75.2
Roth and Lapata (2016) (global)              76.1
Marcheggiani et al. (2017, CoNLL) (local)    77.7
Ours (Bi-LSTM + GCN) (local)                 77.2
English Test Set (Ensemble)
SRL with predicate disambiguation
Model                                   F1
FitzGerald et al. (2015) (ensemble)     87.7
Roth and Lapata (2016) (ensemble)       87.9
Ours (Bi-LSTM + GCN) (ensemble)         89.1
Best-reported score on CoNLL 2009
Chinese Test Set
SRL with predicate disambiguation
Model                                F1
Zhao et al. (2009) (global)          77.7
Björkelund et al. (2009) (global)    78.6
Roth and Lapata (2016) (global)      79.4
Ours (Bi-LSTM + GCN) (local)         82.5
Long-range Dependencies (English Dev Set)
Conclusion
} Syntax-aware state-of-the-art model for dependency-based SRL
  } English and Chinese
} GCNs for encoding syntactic structures into NN
  } Semantics, coreference, discourse
Conclusion
} Funding:
  } ERC StG BroadSem 678254
  } NWO VIDI 639.022.518
  } Amazon Web Services (AWS) grant