Symmetric Pattern Based Word Embeddings for Improved Word Similarity Prediction



SLIDE 1

Symmetric Pattern Based Word Embeddings for Improved Word Similarity Prediction

Roy Schwartz+, Roi Reichart* and Ari Rappoport+

+The Hebrew University, *Technion IIT

CoNLL 2015


SLIDE 3

Symmetric Pattern Based Word Embeddings for Improved Word Similarity Prediction @ Schwartz et al.

Apples and


SLIDE 6

X and


SLIDE 8

Overview

  • The problem

– Word embeddings do not capture pure word similarity

  • The Solution

– Symmetric-pattern-based word embeddings
– First embeddings to support antonyms (e.g., good/bad) without using a dictionary

  • Results

– 5.5% improvement over six state-of-the-art models
– 10% improvement with a joint model
– 20% improvement on verbs


SLIDE 9

Word Similarity

  • Whether two words are semantically similar

– cats are similar to dogs

  • Definition is not entirely clear

– Synonyms (i.e., words that share the same meaning)
– Co-hyponyms (i.e., words that belong to the same category)

  • Human judgment evaluation

SLIDE 12

Vector Space Models

DS Hypothesis (Harris, 1954)

Examples taken from the ukWaC corpus (Baroni et al., 2009):

... tokens to date, friend lists and recent ...
... by my dear friend and companion, Fritz von ...
... even have a friend who never fails ...
... by my worthy friend Doctor Haygarth of ...
... and as a friend pointed out to ...
... partner, in-laws, relatives or friends speak a different ...
... petition to a friend Go to the ...
... otherwise, to a friend or family member ...
... images from my friend Rory though - ...
... great, and a friend as well as a colleague, who, ...


SLIDE 14

Vector Space Models

[Figure: words represented as vectors of context weights (0.5, 0.76, 0.12, 0.51, …); similarity measured by the angle Θ between vectors, e.g., between “friend” and “colleague”]

SLIDE 16

Similarity or Relatedness?

Hill et al., 2014


SLIDE 18

Similarity or Dissimilarity?


SLIDE 20

Current Vector Space Models do not Capture (pure) Word Similarity

SLIDE 21

Symmetric Patterns Contexts

Davidov and Rappoport, 2006

[Figure: a symmetric pattern relates a word pair X, Y in both orders, e.g., “bright and shiny” / “shiny and bright”]

SLIDE 23

Symmetric Patterns (SPs)

  • Words that co-occur in SPs tend to be semantically similar

– Widdows and Dorow, 2002
– Davidov and Rappoport, 2006
– Kozareva et al., 2008
– Feng et al., 2013
– Schwartz et al., 2014

Examples: John and Mike, bold and beautiful, neither here nor there, Paris or Rome

Non-examples: #neither cup nor coffee, #dog and leash, #car or wheel
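Extracting such SP contexts can be sketched with a few hand-written patterns. This is a simplification for illustration only: the paper acquires its pattern list automatically (Davidov and Rappoport, 2006) rather than hard-coding it.

```python
import re
from collections import Counter

# Three illustrative symmetric patterns ("X and Y", "X or Y",
# "neither X nor Y"); the real pattern set is learned from the corpus.
PATTERNS = [
    re.compile(r"\b(\w+) and (\w+)\b"),
    re.compile(r"\b(\w+) or (\w+)\b"),
    re.compile(r"\bneither (\w+) nor (\w+)\b"),
]

def sp_cooccurrences(corpus_lines):
    """Count (x, y) word pairs that appear together in a symmetric pattern."""
    counts = Counter()
    for line in corpus_lines:
        for pat in PATTERNS:
            for x, y in pat.findall(line.lower()):
                counts[(x, y)] += 1
    return counts

counts = sp_cooccurrences(["The sky was bright and shiny.",
                           "It was shiny and bright."])
```

Because the patterns are symmetric, both orders (“bright and shiny”, “shiny and bright”) contribute evidence that the two words are similar.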

SLIDE 26

SP-based Word Embeddings

Vsp(dog) = [ PPMI(dog,house), PPMI(dog,mouse), PPMI(dog,zebra), PPMI(dog,wine), PPMI(dog,cat), PPMI(dog,dolphin), PPMI(dog,bottle), PPMI(dog,pen), … ]

* Simple smoothing applied

→ similarity rather than relatedness
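A minimal sketch of building such a vector from SP pair counts. This is plain PPMI over symmetrized counts; the paper's “simple smoothing” is not reproduced here.

```python
import math
from collections import Counter

def ppmi_vector(word, pair_counts, vocab):
    """PPMI of `word` with each context word in `vocab`, computed from
    SP pair counts. Both orders are summed, since SPs are symmetric."""
    sym = Counter()
    for (x, y), c in pair_counts.items():
        sym[(x, y)] += c
        sym[(y, x)] += c
    total = sum(sym.values())
    marg = Counter()                       # marginal count of each word
    for (x, _), c in sym.items():
        marg[x] += c
    vec = []
    for ctx in vocab:
        joint = sym[(word, ctx)]
        if joint == 0 or marg[word] == 0 or marg[ctx] == 0:
            vec.append(0.0)                # PPMI clips non-events to zero
        else:
            pmi = math.log2(joint * total / (marg[word] * marg[ctx]))
            vec.append(max(pmi, 0.0))
    return vec

pair_counts = Counter({("bright", "shiny"): 2, ("big", "small"): 1})
v = ppmi_vector("bright", pair_counts, ["shiny", "big"])
```

Words that never share an SP context (here “bright”/“big”) get a zero entry, which is what makes the resulting space encode similarity rather than relatedness.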

SLIDE 28

Antonyms

  • Some SPs are indicative of antonymy (Lin et al., 2003)

– “either X or Y” (either big or small)
– “from X to Y” (from poverty to richness)

big / small

SLIDE 30

Word Embeddings that Identify Antonyms

ACL 2015 Papers

  • Revisiting Word Embedding for Contrasting Meaning (Chen et al.)
  • Learning Semantic Word Embeddings based on Ordinal Knowledge Constraints (Liu et al.)
  • AutoExtend: Extending Word Embeddings to Embeddings for Synsets and Lexemes (Rothe and Schütze, Best paper award)

First model to support antonyms without using a dictionary or a thesaurus!

SLIDE 32

SP-based Word Embeddings

Vsp(dog) = [ PPMI(dog,house), PPMI(dog,mouse), PPMI(dog,zebra), PPMI(dog,wine), PPMI(dog,cat), PPMI(dog,dolphin), PPMI(dog,bottle), PPMI(dog,pen), … ]

* Simple smoothing applied

→ similarity rather than relatedness
→ support for antonyms

SLIDE 34

Experiments

  • Embeddings are generated from an 8G-word corpus
  • Baselines: six state-of-the-art models
  • Word similarity task

– SimLex999 dataset (Hill et al., 2014)
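Word-similarity benchmarks like SimLex999 score a model by Spearman's ρ between its similarity scores and human ratings. A self-contained sketch of that metric (the rating/score values below are invented for illustration):

```python
import math

def spearman_rho(xs, ys):
    """Spearman rank correlation: Pearson correlation of the ranks.
    Ties receive their average rank."""
    def ranks(vals):
        order = sorted(range(len(vals)), key=lambda i: vals[i])
        r = [0.0] * len(vals)
        i = 0
        while i < len(order):
            j = i
            while j + 1 < len(order) and vals[order[j + 1]] == vals[order[i]]:
                j += 1                      # extend over a run of ties
            avg = (i + j) / 2 + 1           # average rank, 1-based
            for k in range(i, j + 1):
                r[order[k]] = avg
            i = j + 1
        return r

    rx, ry = ranks(xs), ranks(ys)
    n = len(xs)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / math.sqrt(vx * vy)

human_ratings = [9.5, 7.0, 1.2]   # invented human similarity judgments
model_scores = [0.9, 0.6, 0.1]    # invented model similarity scores
rho = spearman_rho(human_ratings, model_scores)
```

Because only ranks matter, a model is rewarded for ordering word pairs like humans do, not for matching the rating scale itself.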

SLIDE 35

Results

Model                                       Spearman’s ρ
GloVe (Pennington et al., 2014)             0.35
PPMI-Bag-of-words                           0.423
word2vec CBOW (Mikolov et al., 2013)        0.43
Dep (Levy and Goldberg, 2014)               0.436
NNSE (Murphy et al., 2012)                  0.455
word2vec skip-gram (Mikolov et al., 2013)   0.462
SP                                          0.517
Joint                                       0.563

Joint model:

f_joint(w_i, w_j) = λ · f_SP(w_i, w_j) + (1 − λ) · f_skip-gram(w_i, w_j)

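The joint model interpolates the two similarity functions, f_joint = λ·f_SP + (1 − λ)·f_skip-gram. A sketch, assuming cosine similarity for each component; λ = 0.5 is an illustrative default, not the paper's tuned value.

```python
import math

def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def joint_similarity(w_i, w_j, sp_vecs, sg_vecs, lam=0.5):
    """Weighted combination of SP-based and skip-gram similarity:
    lam * f_SP + (1 - lam) * f_skip-gram."""
    f_sp = cosine(sp_vecs[w_i], sp_vecs[w_j])
    f_sg = cosine(sg_vecs[w_i], sg_vecs[w_j])
    return lam * f_sp + (1 - lam) * f_sg

# Toy 2-d embeddings (invented): the two spaces disagree about the pair.
sp = {"good": [1.0, 0.0], "bad": [1.0, 0.0]}
sg = {"good": [1.0, 0.0], "bad": [0.0, 1.0]}
score = joint_similarity("good", "bad", sp, sg, lam=0.5)
```

Interpolating the two scores lets the joint model keep skip-gram's broad coverage while benefiting from the SP space's sharper notion of similarity.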
SLIDE 40

POS Analysis

Model                                       Verbs   Nouns   Adj.
GloVe (Pennington et al., 2014)             0.163   0.377   0.571
PPMI-Bag-of-words                           0.276   0.451   0.548
word2vec CBOW (Mikolov et al., 2013)        0.252   0.48    0.579
Dep (Levy and Goldberg, 2014)               0.376   0.449   0.54
NNSE (Murphy et al., 2012)                  0.318   0.487   0.594
word2vec skip-gram (Mikolov et al., 2013)   0.307   0.501   0.604
SP                                          0.578   0.497   0.663

SLIDE 45

More Results

  • List of SPs is acquired automatically (not manually defined)
  • Antonymy as Word Analogy
  • Wordsim353 experiments
  • And more…

SLIDE 47

Summary

  • Word embeddings based on symmetric patterns

– They capture similarity, not relatedness
– The first word embedding model to mark antonym pairs as dissimilar (without using a dictionary)

  • Experiments on SimLex999

– 5.5% improvement over six state-of-the-art models
– 10% improvement with a joint model
– 20% improvement on verbs

SLIDE 48

Future Work

  • Enhancing bag-of-words models with SPs
  • Does order count? Asymmetric symmetric patterns

SLIDE 49

Roy Schwartz (roys02@cs.huji.ac.il)
www.cs.huji.ac.il/~roys02/papers/sp_embeddings/sp_embeddings.html