Feature-Rich Compositional Embedding Models Mo Yu * Matt Gormley * - PowerPoint PPT Presentation

Improved Relation Extraction with Feature-Rich Compositional Embedding Models Mo Yu * Matt Gormley * Mark Dredze September 21, 2015 EMNLP *Co-first authors 1

FCM or: How I Learned to Stop Worrying (about Deep Learning) and Love Features Mo Yu * Matt Gormley * Mark Dredze September 21, 2015 EMNLP *Co-first authors 2

Handcrafted Features born-in LOC PER S p(y|x) ∝ exp( Θ y  f ( ) ) NP VP ADJP NP VP NNP : VBN NNP VBD egypt - born proyas direct Egypt - born Proyas directed 3

Where do features come from? First word before M1 Second word before M1 hand-crafted Feature Engineering Bag-of-words in M1 features Head word of M1 Other word in between First word after M2 Sun et al., 2011 Second word after M2 Bag-of-words in M2 Head word of M2 Bigrams in between Words on dependency path Country name list Personal relative triggers Personal title list Zhou et al., WordNet Tags 2005 Heads of chunks in between Path of phrase labels Combination of entity types Feature Learning 4

Where do features come from? Look-up table Classifier embeddin input missing word (context words) g hand-crafted Feature Engineering features unsupervised learning Sun et al., 2011 cat: 0.11 .23 … -.45 similar words, similar embeddings dog: 0.13 .26 … -.52 CBOW model in Mikolov et al. (2013) Zhou et al., 2005 word embeddings Mikolov et al., 2013 Feature Learning 5

Where do features come from? pooling hand-crafted Feature Engineering features The [movie] showed [wars] Sun et al., 2011 The [movie] showed [wars] Recursive Auto Encoder Convolutional Neural Networks (Socher 2011) (Collobert and Weston 2008) CNN RAE Zhou et al., string 2005 word embeddings embeddings Socher, 2011 Mikolov et al., Collobert & Weston, 2013 2008 Feature Learning 6

Where do features come from? hand-crafted S Feature Engineering features W NP,VP NP VP Sun et al., 2011 W DT,NN W V,NN tree embeddings Socher et al., 2013 Hermann & Blunsom, 2013 The [movie] showed [wars] Zhou et al., string 2005 word embeddings embeddings Socher, 2011 Mikolov et al., Collobert & Weston, 2013 2008 Feature Learning 7

Where do features come from? word embedding hand-crafted features Feature Engineering features Turian et al. 2010 Hermann et al. Koo et al. Sun et al., 2011 2014 2008 tree embeddings Socher et al., 2013 Hermann & Blunsom, 2013 Zhou et al., string 2005 word embeddings embeddings Socher, 2011 Mikolov et al., Collobert & Weston, 2013 2008 Feature Learning 8

Where do features come from? word embedding Our model hand-crafted features (FCM) Feature Engineering features Turian et al. 2010 Hermann et al. Koo et al. Sun et al., 2011 2014 2008 tree embeddings Socher et al., 2013 Hermann & Blunsom, 2013 Zhou et al., string 2005 word embeddings embeddings Socher, 2011 Mikolov et al., Collobert & Weston, 2013 2008 Feature Learning 9

Feature-rich Compositional Embedding Model (FCM) Goals for our Model: 1. Incorporate semantic/syntactic structural information 2. Incorporate word meaning 3. Bridge the gap between feature engineering and feature learning – but remain as simple as possible 10

Feature-rich Compositional Embedding Model (FCM) Per-word Features: f 1 f 2 f 3 f 4 f 5 f 6 0 1 1 0 0 1 on-path(w i ) 0 0 1 1 1 0 is-between(w i ) 0 1 0 0 0 0 head-of-M1(w i ) 0 0 0 0 0 1 head-of-M2(w i ) 1 0 0 0 0 before-M1(w i ) 0 0 0 0 1 0 before-M2(w i ) 0 … … … … … … … noun- verb- noun- noun- verb- nil other percep. other person comm. The [movie] M1 I watched depicted [hope] M2 11

Feature-rich Compositional Embedding Model (FCM) Per-word Features: f 5 1 on-path(w i ) 1 is-between(w i ) 0 head-of-M1(w i ) 0 head-of-M2(w i ) 0 before-M1(w i ) 1 before-M2(w i ) … … noun- verb- noun- noun- verb- nil other percep. other person comm. The [movie] M1 I watched depicted [hope] M2 12

Feature-rich Compositional Embedding Model (FCM) Per-word Features: (with conjunction) f 5 1 on-path(w i ) & w i = “depicted” 1 is-between(w i ) & w i = “depicted” 0 head-of-M1(w i ) & w i = “depicted” 0 head-of-M2(w i ) & w i = “depicted” 0 before-M1(w i ) & w i = “depicted” 1 before-M2(w i ) & w i = “depicted” … … noun- verb- noun- noun- verb- nil other percep. other person comm. The [movie] M1 I watched depicted [hope] M2 13

Feature-rich Compositional Embedding Model (FCM) Per-word Features: (with soft conjunction) f 5 1 on-path(w i ) Outer-product is-between(w i ) 1 head-of-M1(w i ) 0 head-of-M2(w i ) 0 before-M1(w i ) 0 before-M2(w i ) 1 … … -.3 .9 .1 -1 e depicted noun- verb- noun- noun- verb- nil other percep. other person comm. The [movie] M1 I watched depicted [hope] M2 14

Feature-rich Compositional Embedding Model (FCM) Per-word Features: (with soft conjunction) f 5 1 on-path(w i ) -.3 .9 .1 -1 is-between(w i ) 1 -.3 .9 .1 -1 head-of-M1(w i ) 0 -.3 .9 .1 -1 head-of-M2(w i ) 0 0 0 0 0 before-M1(w i ) 0 0 0 0 0 before-M2(w i ) 1 -.3 .9 .1 -1 … … … … … … -.3 .9 .1 -1 e depicted noun- verb- noun- noun- verb- nil other percep. other person comm. The [movie] M1 I watched depicted [hope] M2 15

Feature-rich Compositional Embedding Model (FCM) T y f i Σ n p(y|x) ∝ exp i=1 And finally, exponentiates and renormalizes Then takes the dot- e w i product with a parameter tensor Our full model sums over each word in the sentence 16

Features for FCM • Let M1 and M2 denote the left and right entity mentions • Our per-word Binary Features:  head of M1  head of M2  in-between M1 and M2  -2, -1, +1, or +2 of M1  -2, -1, +1, or +2 of M2  on dependency path between M1 and M2 • Optionally: Add the entity type of M1, M2, or both 17

FCM as a Neural Network • Embeddings are (optionally) treated as model parameters • A log-bilinear model • We initialize, then fine-tune the embeddings p(y|x)� ฀฀฀ � Parameter Τ � tensor ฀฀฀ � e x� ฀฀฀ � Σ � h 1� ฀฀฀ � h n� ฀฀฀ � ฀฀฀ � � f 1� f n� e � e � w n� w 1� Binary Embeddings features 18

Baseline Model born-in� LOC� PER� p(y|x) µ S exp( Θ y ฀ f ( ) NP VP ) ADJP� NP� VP� NNP� :� VBN� NNP� VBD� egypt� -� born� proyas� direct� Egypt� -� born� Proyas�directed� Y i,j • Multinomial logistic regression ( standard approach ) • Bring in all the usual binary NLP features (Sun et al., 2011) – type of the left entity mention – dependency path between mentions – bag of words in right mention – … 19

Hybrid Model: Baseline + FCM born-in� LOC� PER� p(y|x) µ S exp( Θ y ฀ f ( ) NP VP ) ADJP� NP� VP� NNP� :� VBN� NNP� VBD� egypt� -� born� proyas� direct� Egypt� -� born� Proyas�directed� Y i,j p(y|x)� ฀฀฀ � Τ � ฀฀฀ � e x� ฀฀฀ � Σ � h 1� ฀฀฀ � ฀฀฀ � h n� ฀฀฀ � � f 1� f n� e � e � w n� w 1� Product of Experts: 1 p(y|x) = p Baseline (y|x) p FCM (y|x) Z(x) 20

Experimental Setup ACE 2005 SemEval-2010 Task 8 • • Data: 6 domains Data: Web text – Newswire (nw) – Newswire (nw) – Broadcast Conversation (bc) – Broadcast Conversation (bc) – Broadcast News (bn) – Broadcast News (bn) – Telephone Speech (cts) – Telephone Speech (cts) – Usenet Newsgroups (un) – Usenet Newsgroups (un) – Weblogs (wl) – Weblogs (wl) • • Train: bn+nw ( ~3600 relations ) Train: Standard split Dev: ½ of bc Dev: from shared task Test: ½ of bc, cts, wl` Test: • • Metric: Micro F1 Metric: Macro F1 (given entity mention) (given entity boundaries) 21

ACE 2005 Results 65% Baseline FCM Baseline+FCM 60% Micro F1 55% 50% 45% Broadcast Conversation Conversational Telephone Weblogs Speech Test Set 22

SemEval-2010 Results Source Classifier F1 Socher et al. (2012) RNN 74.8 MVRNN 79.1 Socher et al. (2012) Hashimoto et al. (2015) RelEmb 81.8 SVM 82.2 Rink and Harabagiu (2010) Best in SemEval-2010 Shared Task Zeng et al. (2014) CNN 82.7 CR-CNN (log-loss) 82.7 Santos et al. (2015) DepNN 82.8 Liu et al. (2015) Hashimoto et al. (2015) RelEmb (task-spec-emb) 82.8 70 72 74 76 78 80 82 84 86

SemEval-2010 Results Source Classifier F1 Socher et al. (2012) RNN 74.8 MVRNN 79.1 Socher et al. (2012) Hashimoto et al. (2015) RelEmb 81.8 SVM 82.2 Rink and Harabagiu (2010) Best in SemEval-2010 Shared Task Zeng et al. (2014) CNN 82.7 CR-CNN (log-loss) 82.7 Santos et al. (2015) DepNN 82.8 Liu et al. (2015) Hashimoto et al. (2015) RelEmb (task-spec-emb) 82.8 FCM (log-linear) 81.4 FCM (log-bilinear) 83.0 70 72 74 76 78 80 82 84 86

SemEval-2010 Results Source Classifier F1 Socher et al. (2012) RNN 74.8 MVRNN 79.1 Socher et al. (2012) Hashimoto et al. (2015) RelEmb 81.8 SVM 82.2 Rink and Harabagiu (2010) Zeng et al. (2014) CNN 82.7 CR-CNN (log-loss) 82.7 Santos et al. (2015) DepNN 82.8 Liu et al. (2015) Hashimoto et al. (2015) RelEmb (task-spec-emb) 82.8 FCM (log-linear) 81.4 FCM (log-bilinear) 83.0 70 72 74 76 78 80 82 84 86

Feature-Rich Compositional Embedding Models Mo Yu * Matt Gormley * - PowerPoint PPT Presentation

Improved Relation Extraction with Feature-Rich Compositional Embedding Models Mo Yu * Matt Gormley * Mark Dredze September 21, 2015 EMNLP *Co-first authors 1 FCM or: How I Learned to Stop Worrying (about Deep Learning) and Love Features Mo

Decision Tree Prof. Seungchul Lee Industrial AI Lab. Feature Test Feature 1 Feature 2 Feature

A Distinctive Feature of A Distinctive Feature of A Distinctive Feature of A Distinctive Feature

Outline Reducing Dimensionality Feature Selection 1 Steven J Zeil Feature Extraction 2

Earth: The Feature Presentation - feature, landscape, topography Earth: The Feature Presentation

Reducing Dimensionality Steven J Zeil Old Dominion Univ. Fall 2010 1 Feature Selection

Unusual compositional dependence of the Unusual compositional dependence of the exciton reduced

Compositional Analysis of Compositional Analysis of Soluble Salts in Bresle Bresle Extraction

A Compositional Logic A Compositional Logic for Control Flow for Control Flow Gang Tan, Boston

Bruno Gavranovi c SYCO2 Compositional Deep Learning December 18, 2018 1 / 36 Compositional

THE GOOD Nutritional value of seafood: Rich source of vitamins Rich source of minerals Rich

modelling rich interaction sensor-based systems statusevent analysis rich set of

Fast and Adaptive Online Training of Feature-Rich Translation Models Spence Green Sida Wang

Fast and Adaptive Online Training of Feature-Rich Translation Models Spence Green Sida Wang

Feature Extraction 7-1 Ronald Peikert SciVis 2007 - Feature Extraction What are features?

Feature Structures, Unification Some grammatical phenomena Linguistic features Feature

Feature Point Feature-based approach: Detect and match feature Detec.on and Matching points

Measu easuri ring What Mat atters: : KPI PIs for Data Quality, Cost st, and Sp Speed

The case against specialized graph engines Jing Fan, Adalbert Gerald

How Much Self-Attention Do We Need? Trading Attention for Feed-Forward Layers Kazuki Irie *,

Combining Teaching and Research in Text-Mining from Social and Cultural Data Claire Brierley and

Interactive image segmentation with integrated use of the markers and the hierarchical watershed

The Neutrino Monologues Deborah Harris Fermi Na6onal Accelerator Laboratory dharris@fnal.gov

Toponymy and socio-religious organization of the Anaa atoll in pre-Christian times, Tuamotuan

Monetary Policy According to HANK Greg Kaplan Ben Moll Gianluca Violante Mannheim, May 16, 2017