SLIDE 1

context2vec: Learning Generic Context Embedding with Bidirectional LSTM

Oren Melamud, Jacob Goldberger, Ido Dagan

CoNLL, 2016

SLIDE 2

What context is

They robbed the _bank_ last night.

  • Target: bank
  • Sentential context: They robbed the [ ] last night.

SLIDE 3

What context representations are used for

  • Sentence completion

IBM [ ] this company for 100 million dollars.

  • Word sense disambiguation

They robbed the _bank_ last night.

  • Named entity recognition

I can’t find _April_.

  • More: supersense tagging, coreference resolution, ...

SLIDE 8

What we want from context representations

  • Information on the target slot/word
  • Contextual information ≠ sum of context words

Similar context words, different contextual information

  • IBM [ ] this company for 100 million dollars.
  • IBM bought this company for [ ] million dollars.

Different context words, similar contextual information

  • IBM [ ] this company for 100 million dollars.
  • I [ ] this necklace for my wife’s birthday.

  • Context representation ≠ sentence representation

SLIDE 13

Our work

  • Our goal
    • Sentential context representations
    • More value than sum of words
    • Unsupervised generic learning setting
  • Our model
    • context2vec = word2vec - CBOW + biLSTM
  • We show
    • context2vec >> average of word embeddings
    • context2vec ∼ state-of-the-art (more complex models)
  • Toolkit available for your NLP application

SLIDE 17

Background

SLIDE 18

Popular recent context representations

[Figure: diagrams of popular context representations, annotated “loses word order”, “Limited scope”, and “Variable-size”]

SLIDE 21

Supervised biLSTM with pre-trained word embeddings

  • Word order captured with biLSTM
  • Task-specific training
  • Supervision is limited in size
  • Pre-trained word embeddings carry valuable information from large corpora
  • Can we bring even more information?

[Figure: supervised biLSTM NER tagging architecture (Lample et al., 2016)]

SLIDE 24

Model

SLIDE 25

Baseline architecture: word2vec with CBOW

[Figure: word2vec CBOW on “John had [ submitted ] a paper”: the context word embeddings in the window are averaged into a context embedding and scored against the target word embedding for “submitted”]

  • Objective function:

S = ∑_{(t,c) ∈ PAIRS} [ log σ(c_avg · t) + ∑_{t′ ∈ NEGS(t,c)} log σ(−c_avg · t′) ]

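The objective above maps directly onto a few lines of code. Below is a minimal NumPy sketch of a single term of the sum (the embedding matrices, sizes, and word indices are toy values, not the paper's implementation): the context-window embeddings are averaged, then scored against the observed target and against a handful of sampled negative targets.

```python
# Toy NumPy sketch of the CBOW negative-sampling objective above.
# Embedding matrices, indices, and sizes are illustrative, not from the paper.
import numpy as np

rng = np.random.default_rng(0)
V, D = 1000, 100                                   # vocabulary size, embedding dimension
target_emb = rng.normal(scale=0.1, size=(V, D))    # target word embeddings (t)
context_emb = rng.normal(scale=0.1, size=(V, D))   # context word embeddings

def log_sigmoid(x):
    return -np.logaddexp(0.0, -x)                  # numerically stable log(sigmoid(x))

def cbow_term(target_id, window_ids, negative_ids):
    """log sigma(c_avg . t)  +  sum over negatives of log sigma(-c_avg . t')"""
    c_avg = context_emb[window_ids].mean(axis=0)   # averaged context embedding
    positive = log_sigmoid(c_avg @ target_emb[target_id])
    negatives = log_sigmoid(-(target_emb[negative_ids] @ c_avg)).sum()
    return positive + negatives

# One (target, context) pair with 5 sampled negative targets:
print(cbow_term(target_id=42, window_ids=[3, 17, 256, 99],
                negative_ids=rng.integers(0, V, size=5)))
```
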
SLIDE 26

context2vec = word2vec - CBOW + biLSTM

[Figure: side-by-side comparison on “John had [ submitted ] a paper”: word2vec CBOW averages the context-window word embeddings, while context2vec runs a bidirectional LSTM over the full sentential context and feeds its output to an MLP; both are trained against the target word embedding for “submitted”]

SLIDE 27

Learning architecture: context2vec

[Figure: context2vec learning architecture on “John had [ submitted ] a paper”: a left-to-right LSTM reads the words before the target slot, a right-to-left LSTM reads the words after it, and an MLP combines their outputs into a sentential context embedding that is trained against the target word embedding]

  • Objective function:

S = ∑_{(t,c) ∈ PAIRS} [ log σ(c_c2v · t) + ∑_{t′ ∈ NEGS(t,c)} log σ(−c_c2v · t′) ]

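To make the architecture concrete, here is an illustrative PyTorch sketch of a context2vec-style encoder; the layer sizes, names, and tiny example are assumptions, not the released toolkit. A left-to-right LSTM reads the words before the target slot, a right-to-left LSTM reads the words after it, and an MLP maps the concatenated states to a sentential context embedding that is scored against target embeddings under the same negative-sampling objective.

```python
# Illustrative PyTorch sketch of a context2vec-style context encoder.
# Dimensions, names, and the toy example are assumptions, not the released toolkit.
import torch
import torch.nn as nn

class Context2VecEncoder(nn.Module):
    def __init__(self, vocab_size, emb_dim=300, hidden_dim=600):
        super().__init__()
        self.ctx_emb = nn.Embedding(vocab_size, emb_dim)             # context word embeddings
        self.tgt_emb = nn.Embedding(vocab_size, emb_dim)             # target word embeddings
        self.l2r = nn.LSTM(emb_dim, hidden_dim, batch_first=True)    # reads the left context
        self.r2l = nn.LSTM(emb_dim, hidden_dim, batch_first=True)    # reads the right context, reversed
        self.mlp = nn.Sequential(
            nn.Linear(2 * hidden_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, emb_dim),
        )

    def context_vector(self, left_ids, right_ids):
        """Embed the sentential context around one target slot."""
        _, (h_l, _) = self.l2r(self.ctx_emb(left_ids))                      # left-to-right LSTM
        _, (h_r, _) = self.r2l(self.ctx_emb(torch.flip(right_ids, [1])))    # right-to-left LSTM
        return self.mlp(torch.cat([h_l[-1], h_r[-1]], dim=-1))              # sentential context embedding

    def scores(self, left_ids, right_ids, target_ids):
        """Dot products between the context vector and candidate target embeddings."""
        c = self.context_vector(left_ids, right_ids)                        # (batch, emb_dim)
        return (self.tgt_emb(target_ids) * c.unsqueeze(1)).sum(-1)          # (batch, n_candidates)

# "John had [ submitted ] a paper": left = "John had", right = "a paper" (toy ids)
enc = Context2VecEncoder(vocab_size=1000)
s = enc.scores(torch.tensor([[11, 12]]), torch.tensor([[13, 14]]), torch.tensor([[42, 7, 99]]))
print(s.shape)  # torch.Size([1, 3])
```
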
SLIDE 28

The context2vec embedding space

[Figure: target words (e.g. “bought”, “acquired”, “technology”) and sentential contexts (e.g. “IBM [ ] this company”, “I [ ] this necklace for my wife’s birthday”, “IBM bought this [ ] company”) embedded in a shared space; t2c denotes target-to-context similarity]

SLIDE 29

The context2vec embedding space

[Figure: the same shared embedding space; c2c denotes context-to-context similarity and t2t denotes target-to-target similarity]

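Because target words and sentential contexts live in the same space, all three similarity types reduce to cosine similarity. A small NumPy sketch with made-up vectors standing in for learned embeddings:

```python
# Toy NumPy sketch of the three similarity types in the shared space.
# The vectors stand in for learned context2vec embeddings and are made up.
import numpy as np

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

rng = np.random.default_rng(1)
t_bought   = rng.normal(size=300)   # target word embedding
t_acquired = rng.normal(size=300)   # target word embedding
c_ibm      = rng.normal(size=300)   # context vector for "IBM [ ] this company"
c_necklace = rng.normal(size=300)   # context vector for "I [ ] this necklace for my wife's birthday"

print("t2c:", cosine(t_bought, c_ibm))        # target vs. sentential context
print("c2c:", cosine(c_ibm, c_necklace))      # context vs. context
print("t2t:", cosine(t_bought, t_acquired))   # target vs. target
```
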
SLIDE 30

Evaluation & Results

SLIDE 31

Evaluation goals

  • Standalone evaluation of context2vec
  • Using simple cosine similarity measures


SLIDE 32

Tasks: Sentence completion

I have seen it on him, and could [ ] to it.

Candidates: write, migrate, climb, swear, contribute

  • Implementation: Shortest target-context cosine distance
  • Benchmark: Microsoft sentence completion challenge (Zweig and Burges, 2011)


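This task, like the lexical substitution task on the next slide, reduces to ranking candidate target words by their cosine similarity to the context vector of the blank. A toy NumPy sketch (the context vector and target embeddings below are made-up stand-ins):

```python
# Toy sketch: pick the completion whose target embedding is closest to the context vector.
# The embeddings and candidate list are illustrative, not trained values.
import numpy as np

rng = np.random.default_rng(2)
candidates = ["write", "migrate", "climb", "swear", "contribute"]
target_emb = {w: rng.normal(size=300) for w in candidates}  # stand-ins for learned target embeddings
context_vec = rng.normal(size=300)  # stand-in for c2v("I have seen it on him, and could [ ] to it.")

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

ranked = sorted(candidates, key=lambda w: cosine(context_vec, target_emb[w]), reverse=True)
print(ranked[0], ranked)  # best completion first
```
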
SLIDE 33

Tasks: Lexical substitution

Charlie is a _bright_ boy.

Substitute candidates: skilled, luminous, vivid, hopeful, smart

  • Implementation: Rank by target-context cosine distance
  • Benchmarks:
    • Lexical sample (McCarthy and Navigli, 2007)
    • All-words (Kremer et al., 2014)

SLIDE 34

Tasks: Supervised word sense disambiguation

TEST: This adds a wider perspective.

TRAIN:
  • They add (s2) a touch of humor.
  • The minister added (s4): the process remains fragile.

  • Implementation: Shortest context-context cosine distance (kNN)
  • Benchmark: Senseval-3 English lexical sample (Mihalcea et al., 2004)

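A minimal sketch of the kNN scheme above, assuming every labelled training sentence has already been encoded into a context vector; the vectors and sense labels below are toy stand-ins:

```python
# Toy sketch of supervised WSD by nearest-neighbour search over context vectors.
# Vectors and sense labels are illustrative stand-ins for real c2v encodings.
import numpy as np
from collections import Counter

rng = np.random.default_rng(3)
train_vecs = rng.normal(size=(6, 300))               # c2v vectors of labelled training contexts
train_senses = ["s2", "s2", "s2", "s4", "s4", "s1"]  # gold sense of "add" in each training sentence
test_vec = rng.normal(size=300)                      # c2v vector of "This adds a wider perspective."

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

k = 3
nearest = sorted(range(len(train_senses)),
                 key=lambda i: cosine(test_vec, train_vecs[i]), reverse=True)[:k]
predicted = Counter(train_senses[i] for i in nearest).most_common(1)[0][0]
print(predicted)  # majority sense among the k nearest training contexts
```
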
SLIDE 37

Results

Task                  c2v    Avg*
Sentence completion   65.1   49.7
LexSub (sample)       56.0   42.5
LexSub (all-words)    47.9   38.9
WSD                   72.8   61.4

* Avg baseline:

  • Based on standard Skip-gram word embeddings
  • Hyperparameters optimized: context size, word weights

SLIDE 38

Results

Task                  c2v    SOTA
Sentence completion   65.1   58.9*
LexSub (sample)       56.0   55.1
LexSub (all-words)    47.9   50.2
WSD                   72.8   74.1

  • * A better recent result is 69.2 (Tran et al., 2016)

Note: SOTA models are generally more complex than c2v

SLIDE 39

Conclusions

SLIDE 40

Summary

  • context2vec = word2vec - CBOW + biLSTM
  • context2vec >> average-of-word-embeddings
  • context2vec ∼ SOTA

Appealing alternative for generic context representation

SLIDE 44

Try context2vec yourself

  • The context2vec Python toolkit is available at:

https://github.com/orenmel/context2vec

  • Integrate it into your NLP application
    • With our pre-trained models
    • Or learn your own (choose corpus, dimensionality, etc.)
  • Potentially more effective than, or complementary to, pre-trained word embeddings

THANK YOU!

SLIDE 48

Backup slides

SLIDE 49

Qualitative example

Sentential context → closest target words:

  • This [ ] is due → item, fact-sheet, offer, pack, card
  • This [ ] is due not just to mere luck → offer, suggestion, announcement, item, prize
  • This [ ] is due not just to mere luck, but to outstanding work and dedication → award, prize, turnabout, offer, gift
  • [ ] is due not just to mere luck, but to outstanding work and dedication → it, success, this, victory, prize-money

SLIDE 50

Target-context example: bias towards rare words

Closest target words for the context “John was [ ] last year”:

  • α = 0.25: born, late, married, out, back
  • α = 0.50: born, back, married, released, elected
  • α = 0.75: born, interviewed, re-elected
  • α = 1.00: starstruck, goal-less, unwed

  • α is the negative sampling hyperparameter
  • Larger α values bias the closest target words towards rarer words


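Assuming α is the exponent of the smoothed unigram distribution used to draw negative samples, as in word2vec (the slide only calls it the negative sampling hyperparameter, so this is an assumption), the effect is easy to see: larger α draws frequent words as negatives more often, which pushes them away from context vectors and leaves rarer words as the closest targets.

```python
# Hedged sketch: assuming alpha is the exponent of the smoothed unigram
# distribution used to draw negative samples (as in word2vec); the counts
# below are made up for illustration.
import numpy as np

counts = np.array([50000, 5000, 500, 50, 5], dtype=float)  # word frequencies, common -> rare

def negative_sampling_dist(counts, alpha):
    """p(w) proportional to count(w) ** alpha."""
    p = counts ** alpha
    return p / p.sum()

for alpha in (0.25, 0.5, 0.75, 1.0):
    # Larger alpha -> frequent words drawn as negatives more often,
    # so they are pushed away from contexts and rarer words end up closer.
    print(alpha, np.round(negative_sampling_dist(counts, alpha), 3))
```
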
SLIDE 51

Context-context example

Query: Furthermore our work in Uganda and Romania [ adds ] a wider perspective.

context2vec closest:
  • Themes in art have a fascination, since they [ add ] a subject interest ...
  • Richard is joining us every month to pass on tips, and [ add ] a touch of humour too.

Average closest:
  • The foreign ministers said reforms in Poland and Hungary had made considerable progress but [ added ]: ...
  • Germany had announced the solution [ adding ] that it hoped Bonn in future ...

  • Sentences from Senseval-3 (shortened for readability).
