Improved Relation Extraction with Feature-Rich Compositional Embedding Models
September 21, 2015 EMNLP
Mo Yu* Matt Gormley* Mark Dredze
*Co-first authors
FCM or: How I Learned to Stop Worrying (about Deep Learning) and Love Features
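[Figure: the example "Egypt - born Proyas directed" annotated with POS tags (NNP : VBN NNP VBD), entity types (PER, LOC), a parse tree (S, NP, VP, ADJP), lemmas (egypt, proyas, direct), and the relation label born-in.]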
hand-crafted features
Sun et al., 2011; Zhou et al., 2005
– First word before M1
– Second word before M1
– Bag-of-words in M1
– Head word of M1
– Other word in between
– First word after M2
– Second word after M2
– Bag-of-words in M2
– Head word of M2
– Bigrams in between
– Words on dependency path
– Country name list
– Personal relative triggers
– Personal title list
– WordNet Tags
– Heads of chunks in between
– Path of phrase labels
– Combination of entity types
word embeddings
Mikolov et al., 2013
CBOW model in Mikolov et al. (2013): the input (context words) passes through a look-up table to an embedding, and a classifier predicts the missing word.
dog: 0.13 .26 …
cat: 0.11 .23 …
Similar words, similar embeddings. Unsupervised learning.
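The CBOW pipeline on this slide (look-up table, then classifier over the missing word) can be sketched in plain Python. This is a toy illustration under stated assumptions, not the word2vec implementation: the vocabulary, the output weights, and the choice to average the context embeddings are made up for the example, and the dog and cat vectors reuse only the two coordinates visible on the slide.

```python
import math

def softmax(scores):
    """Exponentiate and renormalize a list of scores."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [v / total for v in exps]

def cbow_predict(context, lookup, out_weights):
    """Average the context embeddings, then score every vocabulary word."""
    dim = len(next(iter(lookup.values())))
    avg = [sum(lookup[w][d] for w in context) / len(context) for d in range(dim)]
    vocab = list(out_weights)
    scores = [sum(out_weights[w][d] * avg[d] for d in range(dim)) for w in vocab]
    return dict(zip(vocab, softmax(scores)))

# Look-up table (input embeddings). "dog" and "cat" use the slide's values;
# the other rows are hypothetical.
lookup = {
    "the": [0.0, 0.1],
    "dog": [0.13, 0.26],
    "cat": [0.11, 0.23],
    "barked": [0.5, -0.2],
}
# Classifier weights, one row per candidate missing word (all hypothetical).
out_weights = {
    "the": [0.0, 0.0],
    "dog": [2.0, 0.0],
    "cat": [1.0, 0.0],
    "barked": [-1.0, 0.0],
}

probs = cbow_predict(["the", "barked"], lookup, out_weights)
print(max(probs, key=probs.get), probs)
```

In training, both tables would be learned from unlabeled text so that the true missing word receives high probability; here they are fixed by hand.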
string embeddings
Collobert & Weston, 2008 Socher, 2011
Convolutional Neural Networks (Collobert and Weston 2008): [Figure: a CNN with pooling applied to "The [movie] showed [wars]"]
Recursive Auto Encoder (Socher 2011): [Figure: an RAE applied to "The [movie] showed [wars]"]
tree embeddings
Socher et al., 2013 Hermann & Blunsom, 2013
[Figure: tree embedding of "The [movie] showed [wars]" over its parse (S, NP, VP), with matrices W_{NP,VP}, W_{DT,NN}, W_{V,NN}.]
word embedding features
Turian et al. 2010 Koo et al. 2008
Summary of approaches: hand-crafted features, word embeddings, string embeddings, tree embeddings, word embedding features
Cited work: Sun et al., 2011; Zhou et al., 2005; Mikolov et al., 2013; Collobert & Weston, 2008; Socher, 2011; Socher et al., 2013; Turian et al., 2010; Koo et al., 2008; Hermann et al., 2014; Hermann & Blunsom, 2013
[Figure: each word wi of the sentence carries a tag (nil, noun-, noun-person, verb-percep., verb-comm.) and a binary feature vector (f1 … f6) over features such as is-between(wi), head-of-M1(wi), head-of-M2(wi), before-M1(wi), before-M2(wi), …]
[Figure: the binary feature vector f5 of a single word, with entries for is-between(wi), head-of-M1(wi), head-of-M2(wi), before-M1(wi), before-M2(wi), …]
[Figure: the same feature vector f5, with each feature conjoined with the word identity: is-between(wi) & wi = “depicted”, head-of-M1(wi) & wi = “depicted”, head-of-M2(wi) & wi = “depicted”, before-M1(wi) & wi = “depicted”, before-M2(wi) & wi = “depicted”, …]
[Figure: outer product of the binary feature vector f5 (is-between(wi), head-of-M1(wi), head-of-M2(wi), before-M1(wi), before-M2(wi), …) with the word embedding e_depicted (.9 .1).]
[Figure: the resulting outer-product matrix of f5 and e_depicted: every row whose feature is active holds a copy of e_depicted (.9 .1), and the remaining rows are zero.]
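The outer product shown here can be sketched in a few lines of plain Python. This is only an illustration: the embedding (.9, .1) is the one on the slide, while the helper name `outer`, the five-feature ordering, and the choice of which features are active are assumptions made for the example.

```python
def outer(features, embedding):
    """Outer product of a binary feature vector with a word embedding."""
    return [[f * e for e in embedding] for f in features]

# Assumed feature order for illustration.
feature_names = ["is-between", "head-of-M1", "head-of-M2", "before-M1", "before-M2"]
# Hypothetical activations: the word lies between the mentions and before M2.
f5 = [1, 0, 0, 0, 1]
e_depicted = [0.9, 0.1]  # the two embedding values shown on the slide

h5 = outer(f5, e_depicted)
for name, row in zip(feature_names, h5):
    print(f"{name:12s} {row}")
```

Each active feature selects a copy of the embedding and each inactive feature yields a zero row, so the matrix has one embedding-sized block per feature rather than one weight per (feature, word) pair.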
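Our full model sums the per-word outer products over the sentence:
\sum_{i=1}^{n} f_i \otimes e_{w_i}
Then takes the dot product with a parameter tensor T_y.
And finally, exponentiates and renormalizes:
p(y|x) \propto \exp\left( T_y \cdot \sum_{i=1}^{n} f_i \otimes e_{w_i} \right)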
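The three steps (sum the outer products, take the dot product with a parameter tensor, exponentiate and renormalize) can be sketched as follows. This is a minimal plain-Python illustration, not the authors' code: the function names, the two labels, and every number in the toy inputs are hypothetical.

```python
import math

def outer(features, embedding):
    """Outer product of a binary feature vector with a word embedding."""
    return [[f * e for e in embedding] for f in features]

def fcm_probs(word_features, word_embeddings, tensors):
    """p(y|x) proportional to exp(T_y . sum_i f_i (x) e_{w_i})."""
    n_feat = len(word_features[0])
    dim = len(word_embeddings[0])
    # Step 1: sum the per-word outer products over the sentence.
    total = [[0.0] * dim for _ in range(n_feat)]
    for f, e in zip(word_features, word_embeddings):
        for r, row in enumerate(outer(f, e)):
            for c, v in enumerate(row):
                total[r][c] += v
    # Step 2: dot product with the parameter slice T_y of each label y.
    scores = {
        y: sum(T[r][c] * total[r][c] for r in range(n_feat) for c in range(dim))
        for y, T in tensors.items()
    }
    # Step 3: exponentiate and renormalize (max subtracted for stability).
    m = max(scores.values())
    exps = {y: math.exp(s - m) for y, s in scores.items()}
    z = sum(exps.values())
    return {y: v / z for y, v in exps.items()}

# Toy inputs: 2 words, 3 binary features, 2-dimensional embeddings.
features = [[1, 0, 1], [0, 1, 0]]
embeddings = [[0.9, 0.1], [0.2, 0.7]]
tensors = {
    "born-in": [[1.0, 0.0], [0.0, 1.0], [0.5, 0.0]],
    "no-relation": [[0.0, 0.0], [0.0, 0.0], [0.0, 0.0]],
}
probs = fcm_probs(features, embeddings, tensors)
print(probs)
```

The parameter tensor has one (features × embedding dimensions) slice per label, which is why the score is bilinear in the features and the embeddings.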
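[Figure: network view of the model, from the words w1 … wn through binary features (f) and embeddings (e) to per-word layers h1 … hn and their combination ex, which is scored with the parameter tensor to give p(y|x).]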
Log-linear model over hand-crafted features of the annotated sentence x (the example "Egypt - born Proyas directed", relation born-in):
p(y|x) \propto \exp(\Theta_y \cdot f(x))
where f(x) contains features such as:
– type of the left entity mention
– dependency path between mentions
– bag of words in right mention
– …
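A log-linear model of this form can be sketched as below. It is a hedged illustration only: the feature strings, the weights, and the function name `log_linear_probs` are hypothetical and are not taken from the talk.

```python
import math

def log_linear_probs(active_features, weights):
    """p(y|x) proportional to exp(Theta_y . f(x)) for binary features f(x)."""
    # With binary features the dot product reduces to summing the
    # weights of the features that fire on this input.
    scores = {
        y: sum(theta.get(feat, 0.0) for feat in active_features)
        for y, theta in weights.items()
    }
    m = max(scores.values())  # subtract the max for numerical stability
    exps = {y: math.exp(s - m) for y, s in scores.items()}
    z = sum(exps.values())
    return {y: v / z for y, v in exps.items()}

# Hypothetical active features for the example sentence.
x_features = ["left_type=LOC", "right_bow=proyas"]
# Hypothetical per-label weights Theta_y.
weights = {
    "born-in": {"left_type=LOC": 1.2, "right_bow=proyas": 0.3},
    "no-relation": {"left_type=LOC": 0.1},
}
probs = log_linear_probs(x_features, weights)
print(probs)
```

Unlike the embedding-based score, every feature string here has its own weight per label, so a feature involving a word never seen in training contributes nothing.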
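[Figure: the FCM network (words w1 … wn, binary features f, embeddings e, layers h1 … hn, parameter tensor T) shown together with the log-linear model p(y|x) \propto \exp(\Theta_y \cdot f(x)) over the same annotated sentence "Egypt - born Proyas directed" (relation born-in).]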
– Newswire (nw)
– Broadcast Conversation (bc)
– Broadcast News (bn)
– Telephone Speech (cts)
– Usenet Newsgroups (un)
– Weblogs (wl)
Dev: ½ of bc
Test: ½ of bc, cts, wl
(given entity mention)
(given entity boundaries)
Standard split from shared task
[Figure: bar chart of Micro F1 (axis from 45% to 65%) by test set (Broadcast Conversation, Conversational Telephone Speech, Weblogs) for three systems: Baseline, FCM, Baseline+FCM.]
Source                      Classifier                            F1
Socher et al. (2012)        RNN                                   74.8
Socher et al. (2012)        MVRNN                                 79.1
Hashimoto et al. (2015)     RelEmb                                81.8
Rink and Harabagiu (2010)   SVM                                   82.2
Xu et al. (2015)            SDP-LSTM                              82.4
Zeng et al. (2014)          CNN                                   82.7
Santos et al. (2015)        CR-CNN (log-loss)                     82.7
Liu et al. (2015)           DepNN                                 82.8
Hashimoto et al. (2015)     RelEmb (task-spec-emb)                82.8
Xu et al. (2015)            SDP-LSTM (full)                       83.7
Santos et al. (2015)        CR-CNN (ranking-loss)                 84.1
This work                   FCM (log-linear)                      81.4
This work                   FCM (log-bilinear)                    83.0
This work                   FCM (log-bilinear) (task-spec-emb)    83.7
Best in SemEval-2010 Shared Task: Rink and Harabagiu (2010), SVM, 82.2