

slide-1
SLIDE 1

Semantic Role Labeling Tutorial Part 2

Neural Methods for Semantic Role Labeling

Diego Marcheggiani, Michael Roth, Ivan Titov, Benjamin Van Durme

University of Amsterdam / University of Edinburgh

EMNLP 2017, Copenhagen

slide-2
SLIDE 2

Outline: the fall and rise of syntax in SRL

} Early SRL methods } Symbolic approaches + Neural networks (syntax-aware models) } Syntax-agnostic neural methods } Syntax-aware neural methods

slide-3
SLIDE 3

Disclaimer

} Recent papers which involve neural networks and SRL
} English language
} Skip predicate identification and disambiguation methods
} Focus on labeling of semantic roles
} PropBank [Palmer et al., 2005]

} CoNLL 2005 dataset (span-based SRL)
} CoNLL 2009 dataset (dependency-based SRL)

} F1 measure for role labeling and predicate disambiguation

slide-4
SLIDE 4

Outline: the fall and rise of syntax in SRL

} Early SRL methods } Symbolic approaches + Neural networks (syntax-aware models) } Syntax-agnostic neural methods } Syntax-aware neural methods

slide-5
SLIDE 5

General SRL Pipeline

} Given a predicate:

Sequa makes and repairs jet engines

repair.01

slide-6
SLIDE 6

General SRL Pipeline

} Given a predicate:

} Argument identification

Sequa makes and repairs jet engines

repair.01

slide-7
SLIDE 7

General SRL Pipeline

} Given a predicate:

} Argument identification } Role labeling

Sequa makes and repairs jet engines

repair.01 ARG 0 ARG 1 ARG 1 ARG 1

slide-8
SLIDE 8

General SRL Pipeline

} Given a predicate:

} Argument identification } Role labeling } Global and/or constrained inference

Sequa makes and repairs jet engines

repair.01 ARG 0 ARG 1

slide-9
SLIDE 9

Argument identification

} Hand-crafted rules on the full syntactic tree [Xue and Palmer, 2004]
} Binary classifier [Pradhan et al., 2005; Toutanova et al., 2008]
} Both [Punyakanok et al., 2008]

slide-10
SLIDE 10

Role labeling

} Labeling is performed using a classifier (SVM, logistic regression)
} For each argument we get a label distribution
} Argmax over roles results in a purely local assignment (see the sketch below)
} No guarantee the labeling is well formed

} overlapping arguments, duplicate core roles, etc.
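To make the local step concrete, here is a minimal NumPy sketch of purely local role labeling (the role inventory, candidate spans and scores are invented stand-ins for a real classifier): each candidate is labeled independently, so nothing prevents two spans from both receiving the same core role.

```python
import numpy as np

ROLES = ["ARG0", "ARG1", "ARG2", "ARGM-TMP", "NONE"]

def softmax(scores):
    e = np.exp(scores - scores.max())
    return e / e.sum()

# One score vector per candidate argument (random stand-ins for the outputs
# of an SVM / logistic regression / neural scorer).
rng = np.random.default_rng(0)
candidates = ["Sequa", "jet", "engines", "jet engines"]
scores = rng.normal(size=(len(candidates), len(ROLES)))

# Purely local decision: independent argmax per candidate.
for span, s in zip(candidates, scores):
    dist = softmax(s)                       # label distribution for this span
    print(span, ROLES[int(dist.argmax())])  # may yield duplicate / overlapping core roles
```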

slide-11
SLIDE 11

Inference

} Enforce linguistic and structural constraints (e.g., no overlaps, discontinuous arguments, reference arguments, …) — a minimal sketch of one such constraint follows below

} Viterbi decoding (k-best list with constraints) [Täckström et al., 2015]
} Dynamic programming [Täckström et al., 2015; Toutanova et al., 2008]
} Integer linear programming [Punyakanok et al., 2008]
} Re-ranking [Toutanova et al., 2008; Björkelund et al., 2009]
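As a hedged illustration of the simplest of these ideas, the sketch below greedily enforces a single constraint — each core role assigned at most once — on top of local scores; it is a toy stand-in for the dynamic-programming / ILP inference used in the papers above, with all scores invented.

```python
import numpy as np

ROLES = ["ARG0", "ARG1", "ARG2", "NONE"]
CORE = {"ARG0", "ARG1", "ARG2"}
candidates = ["Sequa", "jet", "engines", "jet engines"]

rng = np.random.default_rng(1)
scores = rng.normal(size=(len(candidates), len(ROLES)))  # stand-in local scores

# Greedy constrained assignment: take span/role pairs by decreasing score,
# never assigning the same core role twice.
assignment, used_core = {}, set()
pairs = sorted(((scores[i, r], i, r) for i in range(len(candidates))
                for r in range(len(ROLES))), reverse=True)
for score, i, r in pairs:
    role = ROLES[r]
    if candidates[i] in assignment:
        continue                      # this span already has a label
    if role in CORE and role in used_core:
        continue                      # constraint: duplicate core role not allowed
    assignment[candidates[i]] = role
    if role in CORE:
        used_core.add(role)

print(assignment)
```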

slide-12
SLIDE 12

Early symbolic models

} 3 steps pipeline } Massive feature engineering

} argument identification } role labeling } re-ranking

} Most of the features are syntactic [Gildea and Jurafsky, 2002]

slide-13
SLIDE 13

Outline: the fall and rise of syntax in SRL

} Early SRL framework

} Symbolic approaches + Neural networks (syntax-aware models)

} Syntax-agnostic neural methods } Syntax-Aware neural methods

slide-14
SLIDE 14

FitzGerald et al., 2015

} Rule-based argument identification

} as in [Xue and Palmer, 2004] but for dependency parsing

} Neural network for local role labeling
} Global structural inference based on dynamic programming

} [Täckström et al., 2015]

slide-15
SLIDE 15

FitzGerald et al., 2015: Architecture

[Figure: the candidate argument features are embedded (embedding layer), giving the feature embeddings e_s; a hidden layer sits on top.]

slide-16
SLIDE 16

FitzGerald et al., 2015: Architecture

[Figure: the hidden layer maps the embedded features e_s to the span representation v_s.]

slide-17
SLIDE 17

FitzGerald et al., 2015: Architecture

[Figure: on the predicate side, a predicate embedding e_f and a role embedding e_r are added to the picture.]

slide-18
SLIDE 18

FitzGerald et al., 2015: Architecture

[Figure: a nonlinear transform of the predicate embedding e_f and the role embedding e_r yields the predicate-specific role representation v_{f,r}.]

slide-19
SLIDE 19

FitzGerald et al., 2015: Architecture

[Figure: the compatibility score g_NN(s, r, θ) is the dot product of the span representation v_s and the predicate-specific role representation v_{f,r}.]
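The whole factorisation fits in a few lines. Below is a minimal PyTorch sketch of the scoring function shown in the figure — span features → e_s → v_s, predicate and role embeddings → v_{f,r}, dot product → g_NN(s, r, θ). Dimensions, feature inventories and the ReLU nonlinearity are assumptions for illustration, not the paper's exact hyperparameters.

```python
import torch
import torch.nn as nn

n_feats, n_preds, n_roles, dim = 1000, 500, 20, 64

feat_emb = nn.Embedding(n_feats, dim)      # embedding layer for span features
span_hidden = nn.Linear(dim, dim)          # hidden layer -> v_s
pred_emb = nn.Embedding(n_preds, dim)      # e_f
role_emb = nn.Embedding(n_roles, dim)      # e_r
role_transform = nn.Linear(2 * dim, dim)   # nonlinear transform -> v_{f,r}

def compatibility(span_feat_ids, pred_id):
    # v_s: embed the span's features, sum them, pass through the hidden layer
    e_s = feat_emb(span_feat_ids).sum(dim=0)
    v_s = torch.relu(span_hidden(e_s))
    # v_{f,r} for every role of this predicate
    e_f = pred_emb(pred_id).expand(n_roles, dim)
    e_r = role_emb(torch.arange(n_roles))
    v_fr = torch.relu(role_transform(torch.cat([e_f, e_r], dim=-1)))
    return v_fr @ v_s                        # g_NN(s, r): one dot product per role

scores = compatibility(torch.tensor([3, 17, 42]), torch.tensor(7))
probs = torch.softmax(scores, dim=-1)        # local distribution over roles
```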

slide-20
SLIDE 20

FitzGerald et al., 2015: Span-based SRL results (CoNLL 2005 test, F1)

Täckström et al. (2015) (global)     79.9
Toutanova et al. (2008) (global)     79.7
Surdeanu et al. (2007) (global)      77.2
FitzGerald et al. (2015) (global)    79.4

slide-21
SLIDE 21

FitzGerald et al., 2015: Span-based SRL results (CoNLL 2005 out of domain, F1)

Täckström et al. (2015) (global)     71.3
Toutanova et al. (2008) (global)     67.8
Surdeanu et al. (2007) (global)      67.7
FitzGerald et al. (2015) (global)    71.2

slide-22
SLIDE 22

FitzGerald et al., 2015: Dependency-based SRL results (CoNLL 2009 test, F1)

Lei et al. (2016) (local)            86.6
Björkelund et al. (2010) (global)    86.9
Täckström et al. (2015) (global)     87.3
FitzGerald et al. (2015) (global)    87.3

slide-23
SLIDE 23

FitzGerald et al., 2015: Dependency-based SRL results (CoNLL 2009 out of domain, F1)

Lei et al. (2016) (local)            75.6
Björkelund et al. (2010) (global)    75.7
Roth and Woodsend (2014) (global)    75.9
FitzGerald et al. (2015) (global)    75.2

slide-24
SLIDE 24

FitzGerald et al., 2015

} Predicate-role composition

} Predicate-specific role representation
} Learning distributed predicate representations across different formalisms
} State of the art on the FrameNet dataset

} Feature embeddings

} Use “simple” span features
} Let the network figure out how to compose them
} Reduced feature engineering

slide-25
SLIDE 25

Roth and Lapata, 2016

} Dependency-based SRL } Neural network with dependency path embeddings as local classifier

} Argument identification } Role labeling

} Global re-ranking of k-best local assignments

slide-26
SLIDE 26

Roth and Lapata, 2016: Dependency path embeddings

} Syntactic paths between predicates and arguments are an important feature
} As a symbolic feature, paths can be extremely sparse
} Creating a distributed representation can solve the problem
} Use an LSTM [Hochreiter and Schmidhuber, 1997] to encode paths

slide-27
SLIDE 27

Roth and Lapata, 2016: Example

Sequa makes and repairs jet engines.

[Figure: dependency tree of the sentence (ROOT, SBJ, OBJ, COORD, CONJ and NMOD edges) with the predicate repair.01 and its roles A0 and A1; the highlighted predicate–argument path is repairs –CONJ– and –COORD– makes –SBJ– Sequa.]

slide-28
SLIDE 28

Roth and Lapata, 2016: Dependency path embeddings example

[Figure: the items along the path — repairs, CONJ, and, COORD, makes, SBJ, Sequa — are passed through an embedding layer and an LSTM that runs over the dependency path.]
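A minimal PyTorch sketch of the idea, assuming the path is represented as the interleaved sequence of words and dependency relations shown above (vocabulary and sizes are invented): the items are embedded, run through an LSTM, and the final hidden state is used as the path embedding.

```python
import torch
import torch.nn as nn

vocab = {tok: i for i, tok in enumerate(
    ["repairs", "CONJ", "and", "COORD", "makes", "SBJ", "Sequa"])}
emb = nn.Embedding(len(vocab), 32)
lstm = nn.LSTM(input_size=32, hidden_size=64, batch_first=True)

# The lexicalised path from the predicate to the candidate argument.
path = ["repairs", "CONJ", "and", "COORD", "makes", "SBJ", "Sequa"]
ids = torch.tensor([[vocab[t] for t in path]])   # shape (1, path_len)

outputs, (h_n, c_n) = lstm(emb(ids))
path_embedding = h_n[-1, 0]    # final hidden state = distributed path representation
print(path_embedding.shape)    # torch.Size([64])
```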

slide-29
SLIDE 29

Roth and Lapata, 2016: Architecture

[Figure: lexical and syntactic features of the predicate and the candidate argument (words x^w, POS tags x^pos, dependency relations x^rel) are embedded, combined with the dependency path embedding in a nonlinear layer, and fed to a softmax layer that predicts the role.]

slide-30
SLIDE 30

Roth and Lapata, 2016: Dependency-based SRL results (CoNLL 2009 test, F1)

Lei et al. (2016) (local)            86.6
Björkelund et al. (2010) (global)    86.9
Täckström et al. (2015) (global)     87.3
FitzGerald et al. (2015) (global)    87.3
Roth and Lapata (2016) (global)      87.7

slide-31
SLIDE 31

Roth and Lapata, 2016: Dependency-based SRL results (CoNLL 2009 out of domain, F1)

Lei et al. (2016) (local)            75.6
Björkelund et al. (2010) (global)    75.7
Roth and Woodsend (2014) (global)    75.9
FitzGerald et al. (2015) (global)    75.2
Roth and Lapata (2016) (global)      76.1

slide-32
SLIDE 32

Roth and Lapata, 2016: Analysis

slide-33
SLIDE 33

Roth and Lapata, 2016

} Encode syntactic paths with LSTMs

} Overcome sparsity

} Combination of symbolic features and continuous syntactic paths

slide-34
SLIDE 34

Outline: the fall and rise of syntax in SRL

} Early SRL framework } Symbolic approaches + Neural networks

} Syntax-agnostic neural methods (the fall)

} Syntax-aware neural methods

slide-35
SLIDE 35

Syntax-agnostic neural methods

} SRL as a sequence labeling task

Sequa makes and repairs jet engines

repair.01 ARG 0 ARG 1

slide-36
SLIDE 36

Syntax-agnostic neural methods

} SRL as a sequence labeling task

} Argument identification and role labeling in one step

Sequa makes and repairs jet engines

repair.01 ARG 0 ARG 1

B-A0 O O O B-A1 I-A1
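Producing the tag sequence above from labelled spans is purely mechanical; a small plain-Python sketch (spans hard-coded for the example sentence):

```python
def spans_to_bio(n_tokens, spans):
    """spans: list of (start, end_inclusive, role) token indices."""
    tags = ["O"] * n_tokens
    for start, end, role in spans:
        tags[start] = "B-" + role
        for i in range(start + 1, end + 1):
            tags[i] = "I-" + role
    return tags

tokens = ["Sequa", "makes", "and", "repairs", "jet", "engines"]
# Arguments of the predicate repairs (repair.01): Sequa = A0, jet engines = A1
print(spans_to_bio(len(tokens), [(0, 0, "A0"), (4, 5, "A1")]))
# ['B-A0', 'O', 'O', 'O', 'B-A1', 'I-A1']
```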

slide-37
SLIDE 37

Syntax-agnostic neural methods

} General architecture

} Word encoding } Sentence encoding (via LSTM) } Decoding

} No use of any kind of treebank syntax (not trivial to encode it) } Differentiable end-to-end

} [Collobert et al., (2011)]

slide-38
SLIDE 38

Zhou and Xu, 2015: Word encoding

} Pretrained word embedding

Lane disputed those estimates

word representation

slide-39
SLIDE 39

Zhou and Xu, 2015: Word encoding

} Pretrained word embedding } Distance from the predicate

Lane disputed those estimates

word representation

slide-40
SLIDE 40

Zhou and Xu, 2015: Word encoding

} Pretrained word embedding } Distance from the predicate } Predicate context (for disambiguation)

Lane disputed those estimates

word representation

slide-41
SLIDE 41

Zhou and Xu, 2015: Word encoding

} Pretrained word embedding } Distance from the predicate } Predicate context (for disambiguation) } Predicate region mark

Lane disputed those estimates

word representation

slide-42
SLIDE 42

Zhou and Xu, 2015: Sentence encoding

} Bidirectional LSTM

} Forward (left context)

Lane disputed those estimates

word representation K layers BiLSTM

slide-43
SLIDE 43

Zhou and Xu, 2015: Sentence encoding

} Bidirectional LSTM

} Forward (left context) } Backward (right context)

Lane disputed those estimates

word representation K layers BiLSTM

slide-44
SLIDE 44

Zhou and Xu, 2015: Sentence encoding

} Bidirectional LSTM

} Forward (left context) } Backward (right context) } Snake BiLSTM

Lane disputed those estimates

word representation K layers BiLSTM
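A minimal PyTorch sketch of one reading of the “snake” BiLSTM: unidirectional LSTM layers are stacked, each layer consumes the previous layer's output, and the direction alternates from layer to layer. Sizes are invented and random vectors stand in for the word representations.

```python
import torch
import torch.nn as nn

class SnakeBiLSTM(nn.Module):
    """Stacked LSTMs with alternating directions: layer k reads the output of
    layer k-1, reversed in time on every other layer (a sketch of the idea)."""
    def __init__(self, input_dim, hidden_dim, n_layers=4):
        super().__init__()
        dims = [input_dim] + [hidden_dim] * (n_layers - 1)
        self.layers = nn.ModuleList(
            nn.LSTM(d, hidden_dim, batch_first=True) for d in dims)

    def forward(self, x):                      # x: (batch, seq_len, input_dim)
        for k, lstm in enumerate(self.layers):
            if k % 2 == 1:                     # odd layers run right-to-left
                x = torch.flip(x, dims=[1])
            x, _ = lstm(x)
            if k % 2 == 1:                     # restore the original order
                x = torch.flip(x, dims=[1])
        return x

words = torch.randn(1, 6, 100)                 # "Sequa makes and repairs jet engines"
states = SnakeBiLSTM(100, 128)(words)          # (1, 6, 128): one state per token
```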

slide-45
SLIDE 45

Zhou and Xu, 2015: Decoder

} Conditional Random Field

} [Lafferty et al., 2001] } Markov assumption between role labels

Lane disputed those estimates

word representation K layers BiLSTM CRF Classifier A1
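The CRF decoding step can be illustrated with a small NumPy Viterbi sketch: given per-token emission scores from the BiLSTM and a matrix of tag-transition scores (both random stand-ins here, with a toy tag set), it returns the highest-scoring tag sequence under the Markov assumption.

```python
import numpy as np

def viterbi(emissions, transitions):
    """emissions: (seq_len, n_tags); transitions[i, j]: score of tag i -> tag j."""
    seq_len, n_tags = emissions.shape
    score = emissions[0].copy()
    backptr = np.zeros((seq_len, n_tags), dtype=int)
    for t in range(1, seq_len):
        total = score[:, None] + transitions + emissions[t][None, :]
        backptr[t] = total.argmax(axis=0)      # best previous tag for each current tag
        score = total.max(axis=0)
    best = [int(score.argmax())]
    for t in range(seq_len - 1, 0, -1):        # follow the back-pointers
        best.append(int(backptr[t, best[-1]]))
    return best[::-1]

tags = ["B-A0", "I-A0", "B-A1", "I-A1", "O"]
rng = np.random.default_rng(0)
emissions = rng.normal(size=(6, len(tags)))             # toy BiLSTM outputs
transitions = rng.normal(size=(len(tags), len(tags)))   # toy learned transition scores
print([tags[i] for i in viterbi(emissions, transitions)])
```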

slide-46
SLIDE 46

Zhou and Xu, 2015: Results (CoNLL 2005 test, F1)

Täckström et al. (2015) (global)     79.9
Toutanova et al. (2008) (global)     79.7
Surdeanu et al. (2007) (global)      77.2
FitzGerald et al. (2015) (global)    79.4
Zhou and Xu (2015) (CRF)             82.8

slide-47
SLIDE 47

Zhou and Xu, 2015: Results (CoNLL 2005 out of domain, F1)

Täckström et al. (2015) (global)     71.3
Toutanova et al. (2008) (global)     67.8
Surdeanu et al. (2007) (global)      67.7
FitzGerald et al. (2015) (global)    71.2
Zhou and Xu (2015) (CRF)             69.4

slide-48
SLIDE 48

Zhou and Xu, 2015: Analysis

slide-49
SLIDE 49

Zhou and Xu, 2015

} No syntax } Minimal word representation } Sentence encoding with “Snake” BiLSTM

slide-50
SLIDE 50

He et al., 2017: Word encoding

} Pretrained word embedding } Predicate flag

Lane disputed those estimates

word representation

Lane disputed those estimates

word representation

slide-51
SLIDE 51

He et al., 2017: Sentence encoding

} “Snake” BiLSTM
} Highway connections [Srivastava et al., 2015]
} Recurrent dropout [Gal and Ghahramani, 2016]

Lane disputed those estimates

word representation

slide-52
SLIDE 52

He et al., 2017: Highway connections [Srivastava et al., 2015]

Lane disputed those estimates

word representation 4 layers highway BiLSTM

slide-53
SLIDE 53

He et al., 2017: Highway connections [Srivastava et al., 2015]

Transform gate:

r_{l,t} = σ(W_l [h_{l,t−1} ; h_{l−1,t}])

slide-54
SLIDE 54

He et al., 2017: Highway connections [Srivastava et al., 2015]

r_{l,t} = σ(W_l [h_{l,t−1} ; h_{l−1,t}])                       (transform gate)
h_{l,t} = r_{l,t} ⊙ h′_{l,t} + (1 − r_{l,t}) ⊙ V h_{l−1,t}     (gated hidden state)

(h′_{l,t}: current hidden state; h_{l−1,t}: previous layer's hidden state)
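A NumPy sketch of just these two gating equations (the LSTM cell that produces the current hidden state h′_{l,t} is omitted; all vectors and matrices are random stand-ins):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

dim = 8
rng = np.random.default_rng(0)
W_l = rng.normal(size=(dim, 2 * dim))   # gate parameters
V = rng.normal(size=(dim, dim))         # transform of the previous layer's state

h_prev_time = rng.normal(size=dim)      # h_{l,t-1}
h_prev_layer = rng.normal(size=dim)     # h_{l-1,t}
h_current = rng.normal(size=dim)        # h'_{l,t}: output of this layer's LSTM cell

r = sigmoid(W_l @ np.concatenate([h_prev_time, h_prev_layer]))   # transform gate
h_gated = r * h_current + (1 - r) * (V @ h_prev_layer)           # gated hidden state
```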

slide-55
SLIDE 55

He et al., 2017: Recurrent dropout [Gal and Ghahramani, 2016]

Lane disputed those estimates

word representation 4 layers highway BiLSTM

Gated hidden state h_{l,t}, random binary mask z_l (shared across time steps):

h̃_{l,t} = z_l ⊙ h_{l,t}

slide-56
SLIDE 56

He et al., 2017: Recurrent dropout [Gal and Ghahramani, 2016]

Gated hidden state h_{l,t}, random binary mask z_l (shared across time steps):

Lane disputed those estimates

word representation 4 layers highway BiLSTM

h̃_{l,t} = z_l ⊙ h_{l,t}
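The point of this form of dropout is that the binary mask z_l is sampled once per layer and sequence and then reused at every time step, rather than being resampled per step. A NumPy sketch with invented sizes (the 1/keep_prob rescaling is standard inverted dropout and is not written on the slide):

```python
import numpy as np

rng = np.random.default_rng(0)
seq_len, dim, keep_prob = 6, 8, 0.9

hidden_states = rng.normal(size=(seq_len, dim))   # h_{l,t} for one layer l

# One mask per layer and sequence, shared across all time steps.
z_l = rng.binomial(1, keep_prob, size=dim) / keep_prob

dropped = hidden_states * z_l   # broadcast: the same mask is applied at every t
```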

slide-57
SLIDE 57

He et al., 2017: Decoding

} A* decoding algorithm

} BIO constraint } Continuation constraint } Uniqueness core roles } Reference constraint } Syntactic constraint

Lane disputed those estimates

word representation K layers highway BiLSTM Constrained A* Decoding A1

Lane disputed those estimates

word representation K layers highway BiLSTM Constrained A* Decoding A1
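As one concrete example of these constraints, the sketch below implements only the BIO constraint as a transition check of the kind a constrained (A*, beam or greedy) decoder can use to prune invalid partial sequences; the search itself and the other constraints are not reproduced here.

```python
def bio_transition_ok(prev_tag, tag):
    """I-X may only follow B-X or I-X with the same role X (the BIO constraint)."""
    if not tag.startswith("I-"):
        return True
    role = tag[2:]
    return prev_tag in ("B-" + role, "I-" + role)

def sequence_ok(tags):
    return all(bio_transition_ok(p, t) for p, t in zip(["O"] + tags, tags))

print(sequence_ok(["B-A0", "O", "O", "O", "B-A1", "I-A1"]))  # True
print(sequence_ok(["B-A0", "I-A1", "O", "O", "O", "O"]))     # False: I-A1 after B-A0
```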

slide-58
SLIDE 58

He et al., 2017: Results (CoNLL 2005 test, F1)

Täckström et al. (2015) (global)     79.9
Toutanova et al. (2008) (global)     79.7
Surdeanu et al. (2007) (global)      77.2
FitzGerald et al. (2015) (global)    79.4
Zhou and Xu (2015) (CRF)             82.8
He et al. (2017) (global)            83.1

slide-59
SLIDE 59

He et al., 2017: Results (CoNLL 2005 out of domain, F1)

Täckström et al. (2015) (global)     71.3
Toutanova et al. (2008) (global)     67.8
Surdeanu et al. (2007) (global)      67.7
FitzGerald et al. (2015) (global)    71.2
Zhou and Xu (2015) (CRF)             69.4
He et al. (2017) (global)            72.1

slide-60
SLIDE 60

He et al., 2017: Analysis of syntactic constraints

slide-61
SLIDE 61

He et al., 2017

} No syntax } Super minimal word representation } Exploit at best the representational power of NN

} Highway networks } Recurrent dropout

slide-62
SLIDE 62

Marcheggiani et al., 2017

} Dependency-based SRL } Shallow syntactic information (POS tags) } Intuitions from syntactic dependency parsing } Local classifier

slide-63
SLIDE 63

Marcheggiani et al., 2017: Word encoding

} Pretrained word embedding } Randomly initialized embedding } Randomly initialized embedding of POS tags } Embeddings of the predicate lemmas } Predicate flag

Lane disputed those estimates

word representation 0 1 0 0

slide-64
SLIDE 64

Marcheggiani et al., 2017: Sentence encoding

} Standard (non-snake) BiLSTM

} Forward LSTM encodes the left context
} Backward LSTM encodes the right context
} Forward and backward states are concatenated

Lane disputed those estimates

word representation K layers BiLSTM 0 1 0 0

slide-65
SLIDE 65

Marcheggiani et al., 2017: Decoding

Concatenation of argument and predicate states [Kiperwasser and Goldberg, 2016]

Lane disputed those estimates

word representation K layers BiLSTM 0 1 0 0 A1 Local classifier

p(r | t_i, t_p, l) ∝ exp(W_{l,r} [t_i ; t_p])
slide-66
SLIDE 66

Marcheggiani et al., 2017: Decoding

Concatenation of argument and predicate states [Kiperwasser and Goldberg, 2016]; the role-specific weights are built from a predicate lemma embedding and a role embedding, as in FitzGerald et al. (2015)

Lane disputed those estimates

word representation K layers BiLSTM 0 1 0 0 A1 Local classifier

p(r | t_i, t_p, l) ∝ exp(W_{l,r} [t_i ; t_p])

W_{l,r} = ReLU(U [q_l ; q_r])        (q_l: predicate lemma embedding, q_r: role embedding)
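A hedged PyTorch sketch of the classifier in the two equations above: the BiLSTM states of the candidate word (t_i) and the predicate (t_p) are concatenated, and the role-scoring matrix W_{l,r} is itself computed from a predicate-lemma embedding q_l and a role embedding q_r. Sizes and vocabularies are invented.

```python
import torch
import torch.nn as nn

state_dim, lemma_dim, role_dim, n_lemmas, n_roles = 128, 32, 32, 1000, 21

lemma_emb = nn.Embedding(n_lemmas, lemma_dim)   # q_l
role_emb = nn.Embedding(n_roles, role_dim)      # q_r
U = nn.Linear(lemma_dim + role_dim, 2 * state_dim, bias=False)

def role_distribution(t_i, t_p, lemma_id):
    # W_{l,r} = ReLU(U [q_l ; q_r]): one row per role for this predicate lemma
    q_l = lemma_emb(lemma_id).expand(n_roles, lemma_dim)
    q_r = role_emb(torch.arange(n_roles))
    W_lr = torch.relu(U(torch.cat([q_l, q_r], dim=-1)))   # (n_roles, 2*state_dim)
    # p(r | t_i, t_p, l)  ∝  exp(W_{l,r} [t_i ; t_p])
    logits = W_lr @ torch.cat([t_i, t_p])
    return torch.softmax(logits, dim=-1)

t_i, t_p = torch.randn(state_dim), torch.randn(state_dim)  # BiLSTM states (stand-ins)
probs = role_distribution(t_i, t_p, torch.tensor(7))
```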

slide-67
SLIDE 67

Marcheggiani et al., 2017: Results (CoNLL 2009 test, F1)

Lei et al. (2016) (local)              86.6
Björkelund et al. (2010) (global)      86.9
Täckström et al. (2015) (global)       87.3
FitzGerald et al. (2015) (global)      87.3
Roth and Lapata (2016) (global)        87.7
Marcheggiani et al. (2017) (local)     87.7

slide-68
SLIDE 68

Marcheggiani et al., 2017: Results (CoNLL 2009 out of domain, F1)

Lei et al. (2016) (local)              75.6
Björkelund et al. (2010) (global)      75.7
Roth and Woodsend (2014) (global)      75.9
FitzGerald et al. (2015) (global)      75.2
Roth and Lapata (2016) (global)        76.1
Marcheggiani et al. (2017) (local)     77.7

slide-69
SLIDE 69

Marcheggiani et al., 2017: Ablation study (CoNLL 2009 development, F1)

Full model      86.6
w/o POS tags    85.9

slide-70
SLIDE 70

Marcheggiani et al., 2017

} A little bit of syntax (POS tags)
} More sophisticated word representation
} Fast local classifier conditioned on the predicate representation

slide-71
SLIDE 71

Outline: the fall and rise of syntax in SRL

} Early SRL framework } Symbolic approaches + Neural networks } Syntax-agnostic neural methods

} Syntax-aware neural methods (syntax strikes back!)

slide-72
SLIDE 72

Is syntax important for semantics?

} POS tags are beneficial [Marcheggiani et al., 2017]
} Gold syntax is beneficial (but hard to encode) [He et al., 2017]
} Encoding syntax with Graph Convolutional Networks

} [Marcheggiani and Titov, 2017]

slide-73
SLIDE 73

Marcheggiani and Titov, 2017

} Word encoding [Marcheggiani et al., 2017]
} Sentence encoding with BiLSTM [Marcheggiani et al., 2017]
} Syntax encoding with Graph Convolutional Networks (GCN)

} [Kipf and Welling, 2016]
} Each word is enriched with the representation of its syntactic neighborhood

} Local classifier [Marcheggiani et al., 2017]

slide-74
SLIDE 74

Marcheggiani and Titov, 2017: Syntactic GCN example

[Figure: the sentence “Lane disputed those estimates” with its dependency edges: SBJ (disputed → Lane), OBJ (disputed → estimates), NMOD (estimates → those).]

slide-75
SLIDE 75

Marcheggiani and Titov, 2017: Syntactic GCN example

[Figure: first GCN layer over the same sentence — each word representation is first transformed by the self-loop matrix ×W(1)_self, and ReLU(Σ·) is applied at every node.]

slide-76
SLIDE 76

Marcheggiani and Titov, 2017: Syntactic GCN example

[Figure: messages along the dependency edges are added — head-to-dependent transformations ×W(1)_subj, ×W(1)_obj and ×W(1)_nmod.]
slide-77
SLIDE 77

Marcheggiani and Titov, 2017: Syntactic GCN example

[Figure: messages in the opposite direction are added as well — dependent-to-head transformations ×W(1)_subj′, ×W(1)_obj′ and ×W(1)_nmod′.]

slide-78
SLIDE 78

Marcheggiani and Titov, 2017: Syntactic GCN example

[Figure: the complete first GCN layer — self loops plus label- and direction-specific transformations along every edge, followed by ReLU(Σ·) at each node.]

slide-79
SLIDE 79

Marcheggiani and Titov, 2017: Syntactic GCN example

[Figure: the same first GCN layer, drawn over the sentence “Lane disputed those estimates” with its NMOD, SBJ and OBJ edges.]

slide-80
SLIDE 80

Marcheggiani and Titov, 2017: Syntactic GCN example

Stacking GCNs widens the syntactic neighborhood

[Figure: a second GCN layer (matrices ×W(2)_self, ×W(2)_subj, ×W(2)_obj, ×W(2)_nmod and their reverse-direction counterparts) is stacked on top of the first, so each word also receives information from nodes two edges away.]

slide-81
SLIDE 81

Marcheggiani and Titov, 2017: Syntactic GCN

Sum over the syntactic neighborhood; each neighbor is transformed according to the label and direction of the edge:

h_v^(k+1) = ReLU( Σ_{u ∈ N(v)}  W^(k)_{L(u,v)} h_u^(k) + b^(k)_{L(u,v)} )
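A minimal PyTorch sketch of one such layer over a labelled dependency graph, under the simplifying assumption that edge gates (which the paper also uses) are omitted; the tiny example graph, label set and sizes are invented for illustration. Every neighbour's representation is transformed by a matrix chosen by the label and direction of the edge (plus a self loop), summed at the target node, and passed through a ReLU.

```python
import torch
import torch.nn as nn

dim, n_words = 64, 4
# "_inv" marks the dependent-to-head direction of an edge label.
labels = ["self", "subj", "obj", "nmod", "subj_inv", "obj_inv", "nmod_inv"]

W = nn.ModuleDict({l: nn.Linear(dim, dim) for l in labels})  # W_{L(u,v)}, b_{L(u,v)}

# Edges (u, v, label): node u sends a message to node v.  Example tree of
# "Lane disputed those estimates": disputed -SBJ-> Lane, disputed -OBJ-> estimates,
# estimates -NMOD-> those; messages flow in both directions, plus self loops.
edges = [(1, 0, "subj"), (0, 1, "subj_inv"),
         (1, 3, "obj"),  (3, 1, "obj_inv"),
         (3, 2, "nmod"), (2, 3, "nmod_inv")]
edges += [(v, v, "self") for v in range(n_words)]

def gcn_layer(h):                        # h: (n_words, dim) node representations
    out = torch.zeros_like(h)
    for u, v, label in edges:
        out[v] = out[v] + W[label](h[u])   # label- and direction-specific message
    return torch.relu(out)

h0 = torch.randn(n_words, dim)           # e.g., BiLSTM states of the four words
h1 = gcn_layer(h0)                       # stacking gcn_layer widens the neighborhood
```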

slide-82
SLIDE 82

Marcheggiani and Titov, 2017: Architecture

} Same architecture as [Marcheggiani et al., 2017]
} Syntactic GCN after the BiLSTM encoder

} Skip connections
} Longer dependencies are captured

Lane disputed those estimates

word representation J layers BiLSTM

dobj nmod nsubj

K layers GCN A1 Classifier

slide-83
SLIDE 83

Marcheggiani and Titov, 2017: Results (CoNLL 2009 test, F1)

Lei et al. (2016) (local)                86.6
Björkelund et al. (2010) (global)        86.9
Täckström et al. (2015) (global)         87.3
FitzGerald et al. (2015) (global)        87.3
Roth and Lapata (2016) (global)          87.7
Marcheggiani et al. (2017) (local)       87.7
Marcheggiani and Titov (2017) (local)    88.0

slide-84
SLIDE 84

Marcheggiani and Titov, 2017: Results (CoNLL 2009 out of domain, F1)

Lei et al. (2016) (local)                75.6
Björkelund et al. (2010) (global)        75.7
Roth and Woodsend (2014) (global)        75.9
FitzGerald et al. (2015) (global)        75.2
Roth and Lapata (2016) (global)          76.1
Marcheggiani et al. (2017) (local)       77.7
Marcheggiani and Titov (2017) (local)    77.2

slide-85
SLIDE 85

Marcheggiani and Titov, 2017: Analysis (CoNLL 2009 development, F1)

No syntax                    82.7
Syntactic GCN (predicted)    83.3
Syntactic GCN (gold)         86.4

slide-86
SLIDE 86

Marcheggiani and Titov, 2017

} Encoding structured prior linguistic knowledge in NN

} Syntax } Semantics } Coreference } Discourse

} Complement LSTM with skip connections for long dependencies

slide-87
SLIDE 87

Conclusion

} We can live without syntax (out of domain)

slide-88
SLIDE 88

Conclusion

} We can live without syntax (out of domain)
} But life with syntax is better

slide-89
SLIDE 89

Conclusion

} We can live without syntax (out of domain)
} But life with syntax is better

} and the better the syntax (parsers) the better our semantic role labeler

slide-90
SLIDE 90

Conclusion

} We can live without syntax (out of domain)
} But life with syntax is better

} and the better the syntax (parsers) the better our semantic role labeler

} What’s the (present) future?

slide-91
SLIDE 91

Conclusion

} We can live without syntax (out of domain)
} But life with syntax is better

} and the better the syntax (parsers) the better our semantic role labeler

} What’s the (present) future?

} Multi-task learning
} Swayamdipta et al. (2017): frame-semantic parsing + syntax
} Peng et al. (2017): multi-task on different semantic formalisms

slide-92
SLIDE 92

Conclusion

} We can live without syntax (out of domain)
} But life with syntax is better

} and the better the syntax (parsers) the better our semantic role labeler

} What’s the (present) future?

} Multi-task learning
} Swayamdipta et al. (2017): frame-semantic parsing + syntax
} Peng et al. (2017): multi-task on different semantic formalisms

} Neural networks work (I kid you not) …

slide-93
SLIDE 93

Conclusion

} We can live without syntax (out of domain)
} But life with syntax is better

} and the better the syntax (parsers) the better our semantic role labeler

} What’s the (present) future?

} Multi-task learning
} Swayamdipta et al. (2017): frame-semantic parsing + syntax
} Peng et al. (2017): multi-task on different semantic formalisms

} Neural networks work (I kid you not) …
} … but we do have (a lot of) linguistic prior knowledge…

slide-94
SLIDE 94

Conclusion

} We can live without (treebank) syntax (out of domain)
} But life with syntax is better

} and the better the syntax (parsers) the better our semantic role labeler

} What’s the (present) future?

} Multi-task learning
} Swayamdipta et al. (2017): frame-semantic parsing + syntax
} Peng et al. (2017): multi-task on different semantic formalisms

} Neural networks work (I kid you not) …
} … but we do have (a lot of) linguistic prior knowledge…
} … and it is time to use it again.

slide-95
SLIDE 95

References

} Martha Palmer, Daniel Gildea, and Paul Kingsbury. 2005. The Proposition Bank: An annotated corpus of semantic roles. Computational Linguistics, 31(1):71–106.

} Nianwen Xue and Martha Palmer. 2004. Calibrating features for semantic role labeling. In Proceedings of EMNLP.

} Sameer Pradhan, Kadri Hacioglu, Valerie Krugler, Wayne Ward, James H. Martin, and Daniel Jurafsky. 2005. Support vector learning for semantic argument classification. Machine Learning, 60(1-3):11–39.

} Kristina Toutanova, Aria Haghighi, and Christopher D. Manning. 2008. A global joint model for semantic role labeling. Computational Linguistics, 34(2):161–191.

} Vasin Punyakanok, Dan Roth, and Wen-tau Yih. 2008. The importance of syntactic parsing and inference in semantic role labeling. Computational Linguistics, 34(2):257–287.

slide-96
SLIDE 96

References

} Oscar Täckström, Kuzman Ganchev, and Dipanjan Das. 2015. Efficient inference and structured learning for semantic role labeling. Transactions of the Association for Computational Linguistics, 3:29–41.

} Anders Björkelund, Bernd Bohnet, Love Hafdell, and Pierre Nugues. 2010. A high-performance syntactic and semantic dependency parser. In Proceedings of COLING: Demonstrations.

} Anders Björkelund, Love Hafdell, and Pierre Nugues. 2009. Multilingual semantic role labeling. In Proceedings of the Thirteenth Conference on Computational Natural Language Learning.

} Daniel Gildea and Daniel Jurafsky. 2002. Automatic labeling of semantic roles. Computational Linguistics, 28(3):245–288.

} Nicholas FitzGerald, Oscar Täckström, Kuzman Ganchev, and Dipanjan Das. 2015. Semantic role labeling with neural network factors. In Proceedings of EMNLP.

slide-97
SLIDE 97

References

} Michael Roth and Mirella Lapata. 2016. Neural semantic role labeling with dependency path embeddings. In Proceedings of ACL.

} Tao Lei, Yuan Zhang, Lluís Màrquez, Alessandro Moschitti, and Regina Barzilay. 2015. High-order low-rank tensors for semantic role labeling. In Proceedings of NAACL.

} Mihai Surdeanu, Lluís Màrquez, Xavier Carreras, and Pere Comas. 2007. Combination strategies for semantic role labeling. Journal of Artificial Intelligence Research, 29:105–151.

} Ronan Collobert, Jason Weston, Léon Bottou, Michael Karlen, Koray Kavukcuoglu, and Pavel Kuksa. 2011. Natural language processing (almost) from scratch. The Journal of Machine Learning Research, 12:2493–2537.

} Michael Roth and Kristian Woodsend. 2014. Composition of word representations improves semantic role labelling. In Proceedings of EMNLP.

slide-98
SLIDE 98

References

} Jie Zhou and Wei Xu. 2015. End-to-end learning of semantic role labeling using recurrent neural networks. In Proceedings of ACL.

} John Lafferty, Andrew McCallum, and Fernando Pereira. 2001. Conditional random fields: Probabilistic models for segmenting and labeling sequence data. In Proceedings of ICML.

} Luheng He, Kenton Lee, Mike Lewis, and Luke Zettlemoyer. 2017. Deep Semantic Role Labeling: What Works and What’s Next. In Proceedings of ACL.

} Yarin Gal and Zoubin Ghahramani. 2016. A theoretically grounded application of dropout in recurrent neural networks. In Proceedings of NIPS.

} Rupesh K. Srivastava, Klaus Greff, and Jürgen Schmidhuber. 2015. Training very deep networks. In Proceedings of NIPS.

slide-99
SLIDE 99

References

} Diego Marcheggiani, Anton Frolov, and Ivan Titov. 2017. A simple and accurate syntax-agnostic neural model for dependency-based semantic role labeling. In Proceedings of CoNLL.

} Eliyahu Kiperwasser and Yoav Goldberg. 2016. Simple and accurate dependency parsing using bidirectional LSTM feature representations. Transactions of the Association for Computational Linguistics.

} Thomas N. Kipf and Max Welling. 2017. Semi-supervised classification with graph convolutional networks. In Proceedings of ICLR.

} Diego Marcheggiani and Ivan Titov. 2017. Encoding Sentences with Graph Convolutional Networks for Semantic Role Labeling. In Proceedings of EMNLP.

} Swabha Swayamdipta, Sam Thomson, Chris Dyer, and Noah A. Smith. 2017. Frame-Semantic Parsing with Softmax-Margin Segmental RNNs and a Syntactic Scaffold. arXiv preprint.
slide-100
SLIDE 100

References

} Hao Peng, Sam Thomson, and Noah A. Smith. 2017. Deep multitask learning for semantic dependency parsing. In Proceedings of ACL.