Symmetric Pattern Based Word Embeddings for Improved Word Similarity Prediction



SLIDE 1

Symmetric Pattern Based Word Embeddings for Improved Word Similarity Prediction

Roy Schwartz+, Roi Reichart* and Ari Rappoport+

+The Hebrew University, *Technion IIT

CoNLL 2015


SLIDE 3

Symmetric Pattern Based Word Embeddings for Improved Word Similarity Prediction @ Schwartz et al.

Apples and


SLIDE 6

X and


SLIDE 8

Overview

  • The problem

– Word embeddings do not capture pure word similarity

  • The Solution

– Symmetric-pattern-based word embeddings
– First embeddings to support antonyms (e.g., good/bad) without using a dictionary

  • Results

– 5.5% improvement over six state-of-the-art models
– 10% improvement with a joint model
– 20% improvement on verbs


SLIDE 9

Word Similarity

  • Whether two words are semantically similar

– cats are similar to dogs

  • Definition is not entirely clear

– Synonyms (i.e., words that share the same meaning)
– Co-hyponyms (i.e., words that belong to the same category)

  • Human judgment evaluation

SLIDE 12

Vector Space Models

DS Hypothesis (Harris, 1954)

Examples taken from the ukWaC corpus (Baroni et al., 2009):

... tokens to date, friend lists and recent ...
... by my dear friend and companion, Fritz von ...
... even have a friend who never fails ...
... by my worthy friend Doctor Haygarth of ...
... and as a friend pointed out to ...
... partner, in-laws, relatives or friends speak a different ...
... petition to a friend Go to the ...
... otherwise, to a friend or family member ...
... images from my friend Rory though - ...
... great, and a friend as well as a colleague, who, ...


SLIDE 14

Vector Space Models

[Figure: words represented as vectors of context weights (0.5, 0.76, 0.12, 0.51, …); similarity measured by the angle Θ between vectors, e.g., between “friend” and “colleague”]

SLIDE 16

Similarity or Relatedness?

Hill et al., 2014


SLIDE 18

Similarity or Dissimilarity?


SLIDE 20

Current Vector Space Models do not Capture (pure) Word Similarity

SLIDE 21

Symmetric Patterns Contexts

Davidov and Rappoport, 2006

[Figure: a symmetric pattern relates a word pair X, Y in both orders, e.g., “bright and shiny” / “shiny and bright”]

SLIDE 23

Symmetric Patterns (SPs)

  • Words that co-occur in SPs tend to be semantically similar

– Widdows and Dorow, 2002
– Davidov and Rappoport, 2006
– Kozareva et al., 2008
– Feng et al., 2013
– Schwartz et al., 2014

Examples: John and Mike, bold and beautiful, neither here nor there, Paris or Rome

Non-examples: #neither cup nor coffee, #dog and leash, #car or wheel
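Extracting such SP contexts can be sketched with a few hand-written patterns. This is a simplification for illustration only: the paper acquires its pattern list automatically (Davidov and Rappoport, 2006) rather than hard-coding it.

```python
import re
from collections import Counter

# Three illustrative symmetric patterns ("X and Y", "X or Y",
# "neither X nor Y"); the real pattern set is learned from the corpus.
PATTERNS = [
    re.compile(r"\b(\w+) and (\w+)\b"),
    re.compile(r"\b(\w+) or (\w+)\b"),
    re.compile(r"\bneither (\w+) nor (\w+)\b"),
]

def sp_cooccurrences(corpus_lines):
    """Count (x, y) word pairs that appear together in a symmetric pattern."""
    counts = Counter()
    for line in corpus_lines:
        for pat in PATTERNS:
            for x, y in pat.findall(line.lower()):
                counts[(x, y)] += 1
    return counts

counts = sp_cooccurrences(["The sky was bright and shiny.",
                           "It was shiny and bright."])
```

Because the patterns are symmetric, both orders (“bright and shiny”, “shiny and bright”) contribute evidence that the two words are similar.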

SLIDE 26

SP-based Word Embeddings

Vsp(dog) = [ PPMI(dog,house), PPMI(dog,mouse), PPMI(dog,zebra), PPMI(dog,wine), PPMI(dog,cat), PPMI(dog,dolphin), PPMI(dog,bottle), PPMI(dog,pen), … ]

* Simple smoothing applied

→ similarity rather than relatedness
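A minimal sketch of building such a vector from SP pair counts. This is plain PPMI over symmetrized counts; the paper's “simple smoothing” is not reproduced here.

```python
import math
from collections import Counter

def ppmi_vector(word, pair_counts, vocab):
    """PPMI of `word` with each context word in `vocab`, computed from
    SP pair counts. Both orders are summed, since SPs are symmetric."""
    sym = Counter()
    for (x, y), c in pair_counts.items():
        sym[(x, y)] += c
        sym[(y, x)] += c
    total = sum(sym.values())
    marg = Counter()                       # marginal count of each word
    for (x, _), c in sym.items():
        marg[x] += c
    vec = []
    for ctx in vocab:
        joint = sym[(word, ctx)]
        if joint == 0 or marg[word] == 0 or marg[ctx] == 0:
            vec.append(0.0)                # PPMI clips non-events to zero
        else:
            pmi = math.log2(joint * total / (marg[word] * marg[ctx]))
            vec.append(max(pmi, 0.0))
    return vec

pair_counts = Counter({("bright", "shiny"): 2, ("big", "small"): 1})
v = ppmi_vector("bright", pair_counts, ["shiny", "big"])
```

Words that never share an SP context (here “bright”/“big”) get a zero entry, which is what makes the resulting space encode similarity rather than relatedness.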

SLIDE 28

Antonyms

  • Some SPs are indicative of antonymy (Lin et al., 2003)

– “either X or Y” (either big or small)
– “from X to Y” (from poverty to richness)

big / small

SLIDE 30

Word Embeddings that Identify Antonyms

ACL 2015 Papers

  • Revisiting Word Embedding for Contrasting Meaning (Chen et al.)
  • Learning Semantic Word Embeddings based on Ordinal Knowledge Constraints (Liu et al.)
  • AutoExtend: Extending Word Embeddings to Embeddings for Synsets and Lexemes (Rothe and Schütze, Best paper award)

First model to support antonyms without using a dictionary or a thesaurus!

SLIDE 32

SP-based Word Embeddings

Vsp(dog) = [ PPMI(dog,house), PPMI(dog,mouse), PPMI(dog,zebra), PPMI(dog,wine), PPMI(dog,cat), PPMI(dog,dolphin), PPMI(dog,bottle), PPMI(dog,pen), … ]

* Simple smoothing applied

→ similarity rather than relatedness
→ support for antonyms

SLIDE 34

Experiments

  • Embeddings are generated from an 8G-word corpus
  • Baselines: six state-of-the-art models
  • Word similarity task

– SimLex999 dataset (Hill et al., 2014)
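Word-similarity benchmarks like SimLex999 score a model by Spearman's ρ between its similarity scores and human ratings. A self-contained sketch of that metric (the rating/score values below are invented for illustration):

```python
import math

def spearman_rho(xs, ys):
    """Spearman rank correlation: Pearson correlation of the ranks.
    Ties receive their average rank."""
    def ranks(vals):
        order = sorted(range(len(vals)), key=lambda i: vals[i])
        r = [0.0] * len(vals)
        i = 0
        while i < len(order):
            j = i
            while j + 1 < len(order) and vals[order[j + 1]] == vals[order[i]]:
                j += 1                      # extend over a run of ties
            avg = (i + j) / 2 + 1           # average rank, 1-based
            for k in range(i, j + 1):
                r[order[k]] = avg
            i = j + 1
        return r

    rx, ry = ranks(xs), ranks(ys)
    n = len(xs)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / math.sqrt(vx * vy)

human_ratings = [9.5, 7.0, 1.2]   # invented human similarity judgments
model_scores = [0.9, 0.6, 0.1]    # invented model similarity scores
rho = spearman_rho(human_ratings, model_scores)
```

Because only ranks matter, a model is rewarded for ordering word pairs like humans do, not for matching the rating scale itself.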

SLIDE 35

Results

Model                                       Spearman’s ρ
GloVe (Pennington et al., 2014)             0.35
PPMI-Bag-of-words                           0.423
word2vec CBOW (Mikolov et al., 2013)        0.43
Dep (Levy and Goldberg, 2014)               0.436
NNSE (Murphy et al., 2012)                  0.455
word2vec skip-gram (Mikolov et al., 2013)   0.462
SP                                          0.517
Joint                                       0.563

Joint model:

f_joint(w_i, w_j) = λ · f_SP(w_i, w_j) + (1 − λ) · f_skip-gram(w_i, w_j)

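The joint model interpolates the two similarity functions, f_joint = λ·f_SP + (1 − λ)·f_skip-gram. A sketch, assuming cosine similarity for each component; λ = 0.5 is an illustrative default, not the paper's tuned value.

```python
import math

def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def joint_similarity(w_i, w_j, sp_vecs, sg_vecs, lam=0.5):
    """Weighted combination of SP-based and skip-gram similarity:
    lam * f_SP + (1 - lam) * f_skip-gram."""
    f_sp = cosine(sp_vecs[w_i], sp_vecs[w_j])
    f_sg = cosine(sg_vecs[w_i], sg_vecs[w_j])
    return lam * f_sp + (1 - lam) * f_sg

# Toy 2-d embeddings (invented): the two spaces disagree about the pair.
sp = {"good": [1.0, 0.0], "bad": [1.0, 0.0]}
sg = {"good": [1.0, 0.0], "bad": [0.0, 1.0]}
score = joint_similarity("good", "bad", sp, sg, lam=0.5)
```

Interpolating the two scores lets the joint model keep skip-gram's broad coverage while benefiting from the SP space's sharper notion of similarity.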
SLIDE 40

POS Analysis

Model                                       Verbs   Nouns   Adj.
GloVe (Pennington et al., 2014)             0.163   0.377   0.571
PPMI-Bag-of-words                           0.276   0.451   0.548
word2vec CBOW (Mikolov et al., 2013)        0.252   0.48    0.579
Dep (Levy and Goldberg, 2014)               0.376   0.449   0.54
NNSE (Murphy et al., 2012)                  0.318   0.487   0.594
word2vec skip-gram (Mikolov et al., 2013)   0.307   0.501   0.604
SP                                          0.578   0.497   0.663

SLIDE 45

More Results

  • List of SPs is acquired automatically (not manually defined)
  • Antonymy as Word Analogy
  • Wordsim353 experiments
  • And more…

SLIDE 47

Summary

  • Word embeddings based on symmetric patterns

– They capture similarity, not relatedness
– The first word embedding model to mark antonym pairs as dissimilar (without using a dictionary)

  • Experiments on SimLex999

– 5.5% improvement over six state-of-the-art models
– 10% improvement with a joint model
– 20% improvement on verbs

SLIDE 48

Future Work

  • Enhancing bag-of-words models with SPs
  • Does order count? Asymmetric symmetric patterns

SLIDE 49

Roy Schwartz (roys02@cs.huji.ac.il)
www.cs.huji.ac.il/~roys02/papers/sp_embeddings/sp_embeddings.html