Jointly Learning Word and Phrase Embeddings Using Neural Networks and Implicit Tensor Factorization - PowerPoint PPT Presentation

SLIDE 1

Jointly Learning Word and Phrase Embeddings Using Neural Networks and Implicit Tensor Factorization

Kazuma Hashimoto, Tsuruoka Laboratory, University of Tokyo

19/06/2015 Talk@UCL Machine Reading Lab.

SLIDE 2: Self Introduction
  • Name

– Kazuma Hashimoto (橋本 和真 in Japanese)
– http://www.logos.t.u-tokyo.ac.jp/~hassy/

  • Affiliation

– Tsuruoka Laboratory, University of Tokyo

  • April 2015 – present: Ph.D. student
  • April 2013 – March 2015: Master’s student

– National Centre for Text Mining (NaCTeM)

  • Research Interest

– Word/phrase/document embeddings and their applications

SLIDE 3: Today’s Agenda
  • 1. Background

– Word and Phrase Embeddings

  • 2. Jointly Learning Word and Phrase Embeddings

– General Idea

  • 3. Our Methods Focusing on Transitive Verb Phrases

– Word Prediction (EMNLP 2014)
– Implicit Tensor Factorization (CVSC 2015)

  • 4. Experiments and Results
  • 5. Summary

SLIDE 4: Today’s Agenda
  • 1. Background

– Word and Phrase Embeddings

  • 2. Jointly Learning Word and Phrase Embeddings

– General Idea

  • 3. Our Methods Focusing on Transitive Verb Phrases

– Word Prediction (EMNLP 2014)
– Implicit Tensor Factorization (CVSC 2015)

  • 4. Experiments and Results
  • 5. Summary

SLIDE 5: Assigning Vectors to Words
  • Word: String → Index → Vector
  • Why vectors?

– Word similarities can be measured using distance metrics on the vectors (e.g., cosine similarity)

[Figure: embedding words in a vector space; related words (“animal”, “mouse”, “rat”; “disease”, “disorder”; “trigger”, “cause”) cluster together]
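As a minimal sketch (not from the slides), word similarity via the cosine of embedding vectors looks like this; the 4-dimensional toy vectors are made-up values for illustration only:

```python
import numpy as np

def cosine_similarity(u, v):
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

# Hypothetical toy embeddings, for illustration only.
embeddings = {
    "mouse": np.array([0.9, 0.8, 0.1, 0.0]),
    "rat":   np.array([0.8, 0.9, 0.2, 0.1]),
    "cause": np.array([0.1, 0.0, 0.9, 0.8]),
}

print(cosine_similarity(embeddings["mouse"], embeddings["rat"]))    # high: similar words
print(cosine_similarity(embeddings["mouse"], embeddings["cause"]))  # low: unrelated words
```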

SLIDE 6: Approaches to Word Representations
  • Two approaches using large corpora (see Baroni+ (2014) for a systematic comparison):

– Count-based approach

  • e.g., reducing the dimension of a word co-occurrence matrix using SVD

– Prediction-based approach

  • e.g., predicting words from their contexts using neural networks

  • We focus on the prediction-based approach

– Why?

SLIDE 7: Learning Word Embeddings
  • Prediction-based approaches usually

– parameterize the word embeddings
– learn them based on co-occurrence statistics

  • Embeddings of words appearing in similar contexts get close to each other

[Figure: the SkipGram model (Mikolov+, 2013) in word2vec; given the text data “… the prevalence of drunken driving and accidents caused by drinking …”, context words are predicted using the target word’s embedding]
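A minimal sketch of one SkipGram-with-negative-sampling update, in the spirit of word2vec; the toy vocabulary, dimensionality, and learning rate below are assumptions, not values from the talk:

```python
import numpy as np

rng = np.random.default_rng(0)
vocab = ["the", "prevalence", "of", "drunken", "driving",
         "and", "accidents", "caused", "by", "drinking"]
dim = 50
W_in = rng.normal(scale=0.1, size=(len(vocab), dim))   # target-word embeddings
W_out = rng.normal(scale=0.1, size=(len(vocab), dim))  # context-word embeddings

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sgns_update(target, context, negatives, lr=0.025):
    """Push the true context word's score up, sampled negatives' scores down."""
    t = W_in[target]
    grad_t = np.zeros_like(t)
    for c, label in [(context, 1.0)] + [(n, 0.0) for n in negatives]:
        g = sigmoid(t @ W_out[c]) - label  # d(logistic loss)/d(score)
        grad_t += g * W_out[c]
        W_out[c] -= lr * g * t
    W_in[target] -= lr * grad_t

# e.g., predict the context word "drunken" from the target word "driving".
sgns_update(vocab.index("driving"), vocab.index("drunken"),
            negatives=rng.integers(0, len(vocab), size=5))
```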

SLIDE 8: Task-Oriented Word Embeddings
  • Learning word embeddings for relation classification

– To appear at CoNLL 2015 (just advertising)

SLIDE 9: Beyond Word Embeddings
  • Treating phrases and sentences as well as words

– gaining much attention recently!

[Figure: embedding phrases in a vector space; “make payment” and “pay money” lie close together]

SLIDE 10: Approaches to Phrase Embeddings
  • Element-wise addition/multiplication (Lapata+, 2010)

– w(sentence) = Σ_j w(x_j)

  • Recursive autoencoders (Socher+, 2011; Hermann+, 2013)

– Using parse trees
– w(parent) = g(w(left child), w(right child))

  • Tensor/matrix-based methods

– w(adj noun) = M(adj) w(noun) (Baroni+, 2010)
– M(verb) = Σ_i w(subj_i) w(obj_i)ᵀ, summed over the verb’s subject-object pairs in a corpus (Grefenstette+, 2011)

  • M(subj, verb, obj) = (w(subj) w(obj)ᵀ) ⊙ M(verb)
  • w(subj, verb, obj) = w(subj) ⊙ (M(verb) w(obj)) (Kartsaklis+, 2012)
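A small sketch contrasting two of the composition methods above; the vectors, the verb matrix, and the dimensionality are random placeholders, not learned parameters:

```python
import numpy as np

rng = np.random.default_rng(1)
dim = 4  # tiny dimensionality, for illustration only

w_subj, w_obj = rng.normal(size=dim), rng.normal(size=dim)
M_verb = rng.normal(size=(dim, dim))  # one matrix per transitive verb

# Element-wise addition (Lapata+, 2010): w(phrase) = sum of word vectors
additive = w_subj + w_obj

# Copy-subject (Kartsaklis+, 2012): w(svo) = w(subj) ⊙ (M(verb) w(obj))
copy_subject = w_subj * (M_verb @ w_obj)
```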

SLIDE 11: Which Word Embeddings are the Best?
  • Co-occurrence matrix + SVD
  • C&W (Collobert+, 2011)
  • RNNLM (Mikolov+, 2013)
  • SkipGram/CBOW (Mikolov+, 2013)
  • vLBL/ivLBL (Mnih+, 2013)
  • Dependency-based SkipGram (Levy+, 2014)
  • GloVe (Pennington+, 2014)

Which word embeddings should we use for which composition methods?

→ Joint learning

SLIDE 12: Today’s Agenda
  • 1. Background

– Word and Phrase Embeddings

  • 2. Jointly Learning Word and Phrase Embeddings

– General Idea

  • 3. Our Methods Focusing on Transitive Verb Phrases

– Word Prediction (EMNLP 2014)
– Implicit Tensor Factorization (CVSC 2015)

  • 4. Experiments and Results
  • 5. Summary

SLIDE 13: Co-Occurrence Statistics of Phrases
  • Word co-occurrence statistics → word embeddings
  • How about phrase embeddings?

– Phrase co-occurrence statistics!

Example:

“The importer made payment in his own domestic currency”
“The businessman pays his monthly fee in yen”

Similar contexts → similar meanings?

SLIDE 14: How to Identify Phrase-Word Relations?
  • Using Predicate-Argument Structures (PAS)

– Enju parser (Miyao+, 2008)

  • Analyzes relations between phrases and words

[Figure: PAS analysis of “The importer made payment in his own domestic currency”; the verb and the preposition act as predicates, and the NPs/VP are their arguments]

SLIDE 15: Today’s Agenda
  • 1. Background

– Word and Phrase Embeddings

  • 2. Jointly Learning Word and Phrase Embeddings

– General Idea

  • 3. Our Methods Focusing on Transitive Verb Phrases

– Word Prediction (EMNLP 2014)
– Implicit Tensor Factorization (CVSC 2015)

  • 4. Experiments and Results
  • 5. Summary

SLIDE 16: Why Transitive Verb Phrases?
  • Meanings of transitive verbs are affected by their arguments (e.g., run, make)

→ A good target for testing composition models

[Figure: “make” changes meaning with its object: “make payment” ≈ “pay”, “make money” ≈ “earn”, “make use (of)” ≈ “use”]

SLIDE 17: Possible Application: Semantic Search
  • Embedding subject-verb-object tuples in a vector space

– Semantic similarities between SVO tuples can be used!

SLIDE 18: Training Data from Large Corpora
  • Focusing on the role of prepositional adjuncts

– Prepositional adjuncts complement the meanings of verb phrases → they should be useful

[Figure: large corpora (English Wikipedia, BNC, etc.) are parsed and the parses are simplified into predicate-argument training data]

How to model the relationships between predicates and arguments?

SLIDE 19: Today’s Agenda
  • 1. Background

– Word and Phrase Embeddings

  • 2. Jointly Learning Word and Phrase Embeddings

– General Idea

  • 3. Our Methods Focusing on Transitive Verb Phrases

– Word Prediction (EMNLP 2014)
– Implicit Tensor Factorization (CVSC 2015)

  • 4. Experiments and Results
  • 5. Summary

SLIDE 20: Word Prediction Model (like word2vec)
  • Predicting words in predicate-argument tuples

[Figure: for the tuple “[importer make payment] in currency”, the model predicts the word “currency” and penalizes a negative sample such as “furniture”]

q = tanh(h_arg1^prep ⊙ w_arg1 + h_pred^prep ⊙ w_pred)

Here w_pred and w_arg1 are the embeddings of the predicate and its argument, the h^prep vectors are role-specific weights conditioned on the preposition, and q is the feature vector for the word prediction. The cost function is a ranking loss, e.g., max(0, 1 - s(currency) + s(furniture)). This model is the PAS-CLBLM.
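A hedged sketch of the reconstructed scoring and cost above; the dot-product scorer s(·) over output word embeddings is an assumption, and all values are random placeholders rather than trained parameters:

```python
import numpy as np

rng = np.random.default_rng(2)
dim = 8

# Embeddings of the predicate and its first argument.
w_pred, w_arg1 = rng.normal(size=dim), rng.normal(size=dim)
# Role-specific weight vectors conditioned on the preposition "in".
h_pred_in, h_arg1_in = rng.normal(size=dim), rng.normal(size=dim)
# Output embeddings of candidate words (assumed dot-product scorer).
w_currency, w_furniture = rng.normal(size=dim), rng.normal(size=dim)

# q = tanh(h_arg1^prep ⊙ w_arg1 + h_pred^prep ⊙ w_pred)
q = np.tanh(h_arg1_in * w_arg1 + h_pred_in * w_pred)

def s(w_out):
    return q @ w_out  # plausibility of predicting this word

# Ranking cost: max(0, 1 - s(currency) + s(furniture))
cost = max(0.0, 1.0 - s(w_currency) + s(w_furniture))
```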

SLIDE 21: How to Compute SVO Embeddings?
  • Two methods:

– (a) assigning a vector to each SVO tuple
– (b) composing SVO embeddings

[Figure: (a) the whole tuple “[importer make payment]” has its own parameterized vector; (b) the tuple embedding is composed from the subj, verb, and obj vectors]
SLIDE 22: Today’s Agenda
  • 1. Background

– Word and Phrase Embeddings

  • 2. Jointly Learning Word and Phrase Embeddings

– General Idea

  • 3. Our Methods Focusing on Transitive Verb Phrases

– Word Prediction (EMNLP 2014)
– Implicit Tensor Factorization (CVSC 2015)

  • 4. Experiments and Results
  • 5. Summary

SLIDE 23: Weakness of PAS-CLBLM
  • Only element-wise vector operations

– Pros: fast training
– Cons: poor interaction between predicates and arguments

  • Interactions between predicates and arguments are important for transitive verbs

[Figure: “make payment” ≈ “pay”, “make money” ≈ “earn”, “make use (of)” ≈ “use”]

SLIDE 24: Focusing on Tensor-Based Approaches
  • Tensor/matrix-based approaches (noun: vector)

– Adjective: matrix (Baroni+, 2010)
– Transitive verb: matrix (Grefenstette+, 2011; Van de Cruys+, 2013)

[Figure: a third-order tensor of plausibility scores, e.g., PMI(importer, make, payment) = 0.31, is approximated with a verb matrix and given (pre-trained) subject and object vectors]

SLIDE 25: Implicit Tensor Factorization (1)
  • Parameterizing

– Predicate matrices
– Argument embeddings

[Figure: the same tensor picture, now over (predicate, argument 1, argument 2); nothing is given in advance: both the predicate matrices and the argument embeddings are learned]

SLIDE 26: Implicit Tensor Factorization (2)
  • Calculating plausibility scores

– Using predicate matrices & argument embeddings

[Figure: each tensor entry is a bilinear score computed from a predicate matrix and two argument embeddings]

T(i, j, k) = w(j)ᵀ M(i) w(k)

where M(i) is the matrix of predicate i, and w(j), w(k) are the embeddings of arguments j and k.
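A minimal sketch of the bilinear score above, with random placeholder parameters in place of learned ones:

```python
import numpy as np

rng = np.random.default_rng(3)
dim = 8

M_in = rng.normal(size=(dim, dim))  # matrix of the predicate "in"
w_svo = rng.normal(size=dim)        # embedding of argument 1: "[importer make payment]"
w_currency = rng.normal(size=dim)   # embedding of argument 2: "currency"

def plausibility(M_pred, w_arg1, w_arg2):
    """Tensor entry T(i, j, k) = w(j)^T M(i) w(k)."""
    return float(w_arg1 @ M_pred @ w_arg2)

score = plausibility(M_in, w_svo, w_currency)
```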

SLIDE 27: Implicit Tensor Factorization (3)
  • Learning model parameters

– Using a plausibility judgment task

  • Observed tuple: (i, j, k)
  • Collapsed tuple: (i’, j, k), (i, j’, k), (i, j, k’)

– Negative sampling (Mikolov+, 2013)

[Equation: the cost function, following negative sampling, discriminates observed tuples from collapsed ones]
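Since the cost function itself is lost in extraction, here is one common negative-sampling objective consistent with the slide, assuming a logistic loss over the observed tuple and its collapsed variants; the exact form in the paper may differ:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def neg_sampling_cost(score_observed, scores_collapsed):
    """Assumed logistic form: raise the plausibility of the observed tuple
    (i, j, k) and lower that of collapsed tuples (i', j, k), (i, j', k), (i, j, k')."""
    cost = -np.log(sigmoid(score_observed))
    for s_neg in scores_collapsed:
        cost -= np.log(sigmoid(-s_neg))
    return float(cost)
```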

SLIDE 28: Example
  • Discriminating between observed and collapsed ones

(i, j, k) = (in, importer make payment, currency)
(i’, j, k) = (on, importer make payment, currency)
(i, j’, k) = (in, child eat pizza, currency)
(i, j, k’) = (in, importer make payment, furniture)

SLIDE 29: How to Compute SVO Embeddings?
  • Two methods:

– (a) assigning a vector to each SVO tuple
– (b) composing SVO embeddings

[Figure: as before, (a) a parameterized vector per tuple versus (b) composed vectors; here the composition uses parameterized verb matrices and the copy-subject function (Kartsaklis+, 2012)]

SLIDE 30: Why the Copy-Subject Function?
  • The function is presented in Kartsaklis+ (2012)

– Using verb matrices as in Grefenstette+ (2011)

  • Our verb matrices are related to those of Grefenstette+ (2011)
  • The function can compute both (see the sketch below)

– verb-object phrase embeddings
– subject-verb-object phrase embeddings
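A sketch of both uses of the copy-subject function, with random placeholder parameters standing in for the learned verb matrix and word vectors:

```python
import numpy as np

rng = np.random.default_rng(4)
dim = 8
M_make = rng.normal(size=(dim, dim))  # learned matrix for the verb "make"
w_importer, w_payment = rng.normal(size=dim), rng.normal(size=dim)

# Verb-object phrase embedding: w(vo) = M(verb) w(obj)
w_make_payment = M_make @ w_payment

# Subject-verb-object phrase embedding (copy-subject):
# w(svo) = w(subj) ⊙ (M(verb) w(obj))
w_importer_make_payment = w_importer * w_make_payment
```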

SLIDE 31: Today’s Agenda
  • 1. Background

– Word and Phrase Embeddings

  • 2. Jointly Learning Word and Phrase Embeddings

– General Idea

  • 3. Our Methods Focusing on Transitive Verb Phrases

– Word Prediction (EMNLP 2014)
– Implicit Tensor Factorization (CVSC 2015)

  • 4. Experiments and Results
  • 5. Summary

SLIDE 32: Experimental Settings
  • Training corpus: English Wikipedia

– SVO data: 23.6 million instances
– SVO-preposition-noun data: 17.3 million instances

  • Parameter initialization: random values
  • Optimization: mini-batch AdaGrad (Duchi+, 2011; sketched below)
  • Embedding dimensionality

– PAS-CLBLM: 200
– Tensor method: 50

  • The number of model parameters in PAS-CLBLM is slightly larger than in the tensor method
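A minimal sketch of one AdaGrad step (Duchi+, 2011); the learning rate and epsilon values are assumptions, not the ones used in the experiments:

```python
import numpy as np

def adagrad_update(param, grad, sum_sq, lr=0.05, eps=1e-8):
    """Per-coordinate learning rates shrink with the accumulated squared gradients."""
    sum_sq += grad ** 2                              # running sum of squared gradients
    param -= lr * grad / (np.sqrt(sum_sq) + eps)     # scaled gradient step
    return param, sum_sq
```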

SLIDE 33: Examples of Learned SVO Embeddings
  • Case 1: assigning a vector to each SVO tuple

Adjuncts seem to be helpful in learning the meanings of verb phrases.
However, this approach omits the information about individual words!

SLIDE 34: Examples of Learned SVO Embeddings
  • Case 2: composing SVO embeddings

[Figure: composed SVO embeddings. Tensor (CVSC 2015): more flexible! PAS-CLBLM (EMNLP 2014): strongly enhances the head word]

SLIDE 35: Multiple Meanings in Verb Matrices
  • In the latest approach, the learned verb matrices capture multiple meanings

SLIDE 36: Verb Sense Disambiguation Task
  • Measuring semantic similarities of verb pairs taking the same subjects and objects (Grefenstette+, 2011)

– Evaluation: Spearman’s rank correlation between similarity scores and human ratings

Verb pair with subj&obj | Human rating
student write name / student spell name | 7
child show sign / child express sign | 6
system meet criterion / system visit criterion | 1
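A sketch of the evaluation; the model similarity scores are made-up numbers paired with the human ratings from the table above:

```python
from scipy.stats import spearmanr

# Hypothetical model similarities for the three verb pairs in the table.
model_scores = [0.81, 0.74, 0.12]
human_ratings = [7, 6, 1]

rho, _ = spearmanr(model_scores, human_ratings)
print(f"Spearman's rho = {rho:.3f}")
```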

SLIDE 37: Results
  • State-of-the-art results on the disambiguation task

– Prepositional adjuncts improve the results

  • How about other kinds of adjuncts?

Method | Spearman’s ρ
Tensor (only verb data) | 0.480
Tensor (verb and preposition data) | 0.614
PAS-CLBLM (this experiment) | 0.374
Milajevs+, 2014 | 0.456
Hashimoto+, 2014 | 0.422

Future work: improving real-world applications using the method

SLIDE 38: Today’s Agenda
  • 1. Background

– Word and Phrase Embeddings

  • 2. Jointly Learning Word and Phrase Embeddings

– General Idea

  • 3. Our Methods Focusing on Transitive Verb Phrases

– Word Prediction (EMNLP 2014)
– Implicit Tensor Factorization (CVSC 2015)

  • 4. Experiments and Results
  • 5. Summary

SLIDE 39: Summary
  • Word and phrase embeddings are jointly learned using large corpora parsed by syntactic parsers

– The tensor-based method is suitable for verb sense disambiguation
– Adjuncts are useful in learning verb phrases

  • Future directions:

– improving the embedding methods
– applying them to real-world NLP applications

  • What kind of information should be captured?
