Multi-Prototype Models of Word Meaning



slide-1
SLIDE 1

Multi-Prototype Models of Word Meaning

Joseph Reisinger and Raymond J. Mooney The University of Texas at Austin

slide-2
SLIDE 2
Vector Space Lexical Semantics

  • Represent “meaning” as a point in some high-dimensional space
  • Word relatedness correlates with some distance metric
  • Attributional: Almuhareb and Poesio (2004), Bullinaria and Levy (2007), Erk (2007), Griffiths et al. (2007), Landauer and Dumais (1997), Padó and Lapata (2007), Sahlgren (2006), Schütze (1997)
  • Relational: Moldovan (2006), Pantel and Pennacchiotti (2006), Turney (2006)
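As a rough illustration of "relatedness correlates with some distance metric", the sketch below computes cosine similarity between toy word vectors. The feature names and counts are invented for the example and do not come from the talk.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)

# Hypothetical co-occurrence counts over the features [dance, music, hit, ball].
disco = [5.0, 4.0, 0.0, 0.0]
club = [3.0, 2.0, 2.0, 1.0]
bat = [0.0, 0.0, 4.0, 5.0]

sim_club_disco = cosine(club, disco)
sim_club_bat = cosine(club, bat)
sim_disco_bat = cosine(disco, bat)
```

With these made-up counts, club shares features with both disco and bat, while disco and bat share none.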

slide-3
SLIDE 3

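[Figure: club and bat as points in the space Ω]

slide-4
SLIDE 4

[Figure: club and bat in Ω, with a distance d marked]

slide-5
SLIDE 5

[Figure: yellow, club, and bat in Ω]

slide-6
SLIDE 6

[Figure: yellow, club, and bat in Ω, with a distance d marked]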

slide-7
SLIDE 7

[Figure: club, disco, and bat in Ω]

  • Any inner product space; e.g. “dense” semantic spaces like LSA

Tversky and Gati (1982), Griffiths et al. (2007)
slide-12
SLIDE 12

[Figure: club, disco, and bat in Ω, annotated “violates the triangle inequality”]

  • Any inner product space; e.g. “dense” semantic spaces like LSA

Tversky and Gati (1982), Griffiths et al. (2007)
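One way to read the triangle-inequality remark: with a single point per word, a distance derived from the inner product must obey d(bat, disco) ≤ d(bat, club) + d(club, disco), so club cannot be close to both bat and disco while those two stay far apart. The sketch below checks this empirically for angular distance on random vectors. The choice of angular distance and the random data are assumptions made for the illustration.

```python
import math
import random

def angular_distance(u, v):
    """Angle between u and v in radians (a proper metric on directions)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    # Clamp to guard against rounding slightly outside [-1, 1].
    return math.acos(max(-1.0, min(1.0, dot / norm)))

def satisfies_triangle(a, b, c, eps=1e-9):
    """True if d(a, c) <= d(a, b) + d(b, c), up to floating-point slack."""
    return angular_distance(a, c) <= angular_distance(a, b) + angular_distance(b, c) + eps

rng = random.Random(0)
triples = [
    [[rng.gauss(0, 1) for _ in range(5)] for _ in range(3)]
    for _ in range(1000)
]
all_hold = all(satisfies_triangle(a, b, c) for a, b, c in triples)
```

Every sampled triple satisfies the inequality, which is the constraint that one vector per word cannot escape.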
slide-13
SLIDE 13

Using multiple prototypes

[Figure: club, disco, and bat in Ω]

  • Similar to unsupervised Word Sense Discovery, e.g. Pantel and Lin (2002), Schütze (1998), Yarowsky (1995)

slide-14
SLIDE 14

Using multiple prototypes

[Figure: club, disco, bat (instrument), and bat (animal) in Ω]

  • Similar to unsupervised Word Sense Discovery, e.g. Pantel and Lin (2002), Schütze (1998), Yarowsky (1995)

slide-15
SLIDE 15

Using multiple prototypes

[Figure: disco, bat (instrument), bat (animal), club (instrument), and club (location) in Ω]

  • Similar to unsupervised Word Sense Discovery, e.g. Pantel and Lin (2002), Schütze (1998), Yarowsky (1995)

slide-16
SLIDE 16

Using multiple prototypes

[Figure: disco, bat (instrument), club (instrument), club (location), and bat (animal) in Ω]

  • Similar to unsupervised Word Sense Discovery, e.g. Pantel and Lin (2002), Schütze (1998), Yarowsky (1995)

slide-17
SLIDE 17

Some practical benefits

  • “Meaning” is a mixture over prototypes, capturing polysemy and thematic variation.
  • Can exploit contextual information to refine word similarity computations:
      • e.g., is “the bat flew out of the cave” similar to “the girls left the club”?
  • “Senses” are thematic and very fine-grained
      • e.g., the hurricane sense of position
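A minimal sketch of how context could refine a similarity computation: pick the prototype of a word that best matches the vector of the surrounding sentence. The prototype labels, feature names, and numbers are hypothetical, and the talk does not specify this exact selection rule.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)

# Hypothetical prototypes for "bat" over the features [cave, fly, baseball, swing].
bat_prototypes = {
    "animal": [4.0, 5.0, 0.0, 0.0],
    "instrument": [0.0, 0.0, 5.0, 4.0],
}

# Hypothetical context vector for "the bat flew out of the cave".
context = [1.0, 1.0, 0.0, 0.0]

# Select the prototype whose direction is closest to the context.
best_label = max(bat_prototypes, key=lambda label: cosine(bat_prototypes[label], context))
```

The selected prototype can then stand in for the word when comparing it with another word in its own context.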
slide-18
SLIDE 18

Single Prototype ↔ Multi-Prototype ↔ Exemplar

Single prototype:

  • Find the centroid of the individual word occurrences
  • Conflates senses
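To make "conflates senses" concrete, the sketch below averages four invented occurrence vectors of bat, two per sense. The features and counts are assumptions for illustration only.

```python
def centroid(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]

# Hypothetical occurrence vectors over the features [cave, fly, baseball, swing].
bat_occurrences = [
    [3.0, 4.0, 0.0, 0.0],  # animal contexts
    [5.0, 2.0, 0.0, 0.0],
    [0.0, 0.0, 4.0, 3.0],  # instrument contexts
    [0.0, 0.0, 2.0, 5.0],
]

single_prototype = centroid(bat_occurrences)
```

The resulting vector has weight on all four features, so it sits between the two senses rather than representing either one.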
slide-19
SLIDE 19

Single Prototype ↔ Multi-Prototype ↔ Exemplar

[Figure: bat (animal), bat (instrument), club (instrument), disco (location), and club (location) in Ω]

Single prototype:

  • Find the centroid of the individual word occurrences
  • Conflates senses
slide-21
SLIDE 21

Single Prototype ↔ Multi-Prototype ↔ Exemplar

[Figure: bat (animal), bat (instrument), club (instrument), disco (location), and club (location) in Ω]

Multi-prototype:

  • Essentially just clustering word occurrences
  • Doesn’t find lexicographic senses; captures contextual variance directly.
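The clustering step can be sketched as follows. Plain k-means is swapped in here as the clustering algorithm because the slides do not name one; the occurrence vectors, function names, and parameters are hypothetical.

```python
import random

def centroid(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]

def sq_dist(u, v):
    """Squared Euclidean distance."""
    return sum((a - b) ** 2 for a, b in zip(u, v))

def kmeans_prototypes(occurrences, k, iterations=20, seed=0):
    """Cluster occurrence vectors and return the k cluster centroids as prototypes."""
    rng = random.Random(seed)
    prototypes = [list(v) for v in rng.sample(occurrences, k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for vec in occurrences:
            nearest = min(range(k), key=lambda i: sq_dist(vec, prototypes[i]))
            clusters[nearest].append(vec)
        # Keep the previous prototype if a cluster ends up empty.
        prototypes = [
            centroid(members) if members else prototypes[i]
            for i, members in enumerate(clusters)
        ]
    return prototypes

# Hypothetical occurrence vectors of "bat" over [cave, fly, baseball, swing].
bat_occurrences = [
    [3.0, 4.0, 0.0, 0.0],
    [5.0, 2.0, 0.0, 0.0],
    [0.0, 0.0, 4.0, 3.0],
    [0.0, 0.0, 2.0, 5.0],
]

bat_prototypes = kmeans_prototypes(bat_occurrences, k=2)
```

On this toy data the two prototypes separate the cave/fly contexts from the baseball/swing contexts, without any sense inventory being supplied.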

slide-22
SLIDE 22

Single Prototype ↔ Multi-Prototype ↔ Exemplar

[Figure: bat (instrument), bat (animal), club (instrument), disco (location), and club (location) in Ω]

Exemplar:

  • Just treat all occurrences as an ensemble representing meaning.
  • Compute similarity as the average of the K most similar pairs.
  • Heavily influenced by noise, but captures more structure

Erk (2007), Vandekerckhove et al. (2009)
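The exemplar similarity described above (average of the K most similar pairs) might look like the sketch below. Cosine is assumed as the pairwise similarity, and the function name is made up for the example.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)

def exemplar_similarity(occurrences_a, occurrences_b, k):
    """Average cosine similarity of the k most similar cross-word occurrence pairs."""
    sims = sorted(
        (cosine(a, b) for a in occurrences_a for b in occurrences_b),
        reverse=True,
    )
    top = sims[:k]
    return sum(top) / len(top)

# Tiny hypothetical example: two occurrences of one word, one of another.
word_a = [[1.0, 0.0], [0.0, 1.0]]
word_b = [[1.0, 0.0]]

top1 = exemplar_similarity(word_a, word_b, k=1)
top2 = exemplar_similarity(word_a, word_b, k=2)
```

Small K focuses on the best-matching occurrences, while larger K pulls in less related pairs, which is one way the noise mentioned above can enter.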

slide-23
SLIDE 23

Multi-Prototype Similarity Metrics

  • MaxSim: Maximum pairwise similarity between any two prototypes.
  • AvgSim: Average pairwise similarity over all prototypes.
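Both metrics can be sketched in a few lines. Cosine is assumed as the underlying pairwise similarity, and the prototype vectors are invented for illustration.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)

def max_sim(protos_a, protos_b):
    """Maximum similarity over all cross-word prototype pairs."""
    return max(cosine(a, b) for a in protos_a for b in protos_b)

def avg_sim(protos_a, protos_b):
    """Mean similarity over all cross-word prototype pairs."""
    sims = [cosine(a, b) for a in protos_a for b in protos_b]
    return sum(sims) / len(sims)

# Hypothetical prototypes over the features [cave, dance, baseball, swing].
bat = [[5.0, 0.0, 0.0, 0.0], [0.0, 0.0, 3.0, 4.0]]   # animal, instrument
club = [[0.0, 0.0, 3.0, 4.0], [0.0, 5.0, 0.0, 0.0]]  # instrument, location

bat_club_max = max_sim(bat, club)
bat_club_avg = avg_sim(bat, club)
```

In this toy case MaxSim is driven entirely by the shared instrument sense, while AvgSim is diluted by the three unrelated prototype pairs.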
slide-26
SLIDE 26
Feature Engineering / Weighting

  • Choosing an embedding vector space:
      • features (unigram, bigram, collocation, dependency, ...)
      • feature weighting (t-test, tf-idf, χ², MI, ...)
      • metric / inner product (cosine, Jaccard, KL, ...)
  • The multi-prototype method is essentially agnostic to these implementation details

Curran (2004)
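As one concrete instance of these choices, the sketch below builds unigram features with tf-idf weighting. It uses a common textbook variant, tf × log(N / df); the exact weighting used in the experiments is not given on the slide, and the sentences are made up.

```python
import math
from collections import Counter

def tfidf_vectors(contexts):
    """contexts: list of token lists. Returns (vocabulary, list of tf-idf vectors)."""
    vocab = sorted({tok for ctx in contexts for tok in ctx})
    n = len(contexts)
    # Document frequency: number of contexts in which each token appears.
    df = Counter(tok for ctx in contexts for tok in set(ctx))
    vectors = []
    for ctx in contexts:
        tf = Counter(ctx)
        vectors.append([tf[tok] * math.log(n / df[tok]) for tok in vocab])
    return vocab, vectors

# Hypothetical contexts in which the target word "bat" occurs.
contexts = [
    ["the", "bat", "flew", "out", "of", "the", "cave"],
    ["he", "swung", "the", "bat"],
]

vocab, vectors = tfidf_vectors(contexts)
```

Tokens that occur in every context (here "the" and "bat") get weight zero, so the vectors are dominated by the distinguishing features.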


slide-28
SLIDE 28
Experimental setup

  • Wikipedia as the base textual corpus (2.8M articles, 2B words)
  • Evaluation:
      1. WordSim-353 collection (353 word pairs with ~15 human similarity judgements each), Finkelstein et al. (2002); using Spearman’s rank correlation, Agirre et al. (2009)
      2. Predicting related words; human raters from Amazon Mechanical Turk
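For the first evaluation, model scores are compared with human ratings by rank correlation. The sketch below computes Spearman's ρ as the Pearson correlation of average ranks. The scores are hypothetical, not WordSim-353 data.

```python
import math

def average_ranks(values):
    """1-based ranks, with tied values sharing the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for idx in order[i:j + 1]:
            ranks[idx] = shared
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)

def spearman(x, y):
    """Spearman's rank correlation coefficient."""
    return pearson(average_ranks(x), average_ranks(y))

# Hypothetical human ratings and model similarities for four word pairs.
human = [7.5, 1.2, 9.0, 4.4]
model = [0.6, 0.1, 0.9, 0.7]
rho = spearman(human, model)
```

Because only ranks matter, the model's similarity scale does not need to match the human rating scale.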

slide-29
SLIDE 29

Results: WordSim-353 Correlation

[Figure: Spearman’s ρ (axis from 0.5 to 1) for single prototype, exemplar, multi-prototype (K=5, K=20, K=50, combined), ESA†, SVM*, and Oracle*]

†Gabrilovich and Markovitch (2007), * Agirre et al. (2009)

Combined approach: includes the prototypes from multiple clusterings (2, 3, 5, 10, 20, 50)

slide-30
SLIDE 30

Results: WordSim-353 Correlation

[Figure: Spearman’s ρ (axis ticks 0.2 to 0.8) versus # of prototypes]

Combined approach: includes the prototypes from multiple clusterings (2, 3, 5, 10, 20, 50)

slide-31
SLIDE 31

Predicting related words

slide-32
SLIDE 32

Predicting related words

top-word:

reservation
Which word is more related to reservation?
settlers / tribal

party
Which word is more related to party?
government / political

slide-33
SLIDE 33

Predicting related words

top-word:

reservation
Which word is more related to reservation?
settlers / tribal

party
Which word is more related to party?
government / political

top-set:

journal
Which set of words is more related to journal?
research, study, published / publication, paper, study

train
Which set of words is more related to train?
station, line, services / passenger, rail, freight

slide-34
SLIDE 34

Predicting related words

  • 79 raters, 7.6K comparisons

top-word:

reservation
Which word is more related to reservation?
settlers / tribal

party
Which word is more related to party?
government / political

top-set:

journal
Which set of words is more related to journal?
research, study, published / publication, paper, study

train
Which set of words is more related to train?
station, line, services / passenger, rail, freight


slide-36
SLIDE 36

Results: Non-contextual Prediction

[Figure: % multi-prototype favored versus # of prototypes, for homonymous and polysemous words]

  • homonymous: carrier, crane, cell, company, issue, interest, match, media, nature, party, practice, plant, racket, recess, reservation, rock, space, value
  • polysemous: cause, chance, journal, market, network, policy, power, production, series, trading, train

slide-37
SLIDE 37

Contextual Prediction

I have some reservation due to the high potential for violations.

Which word is more related to reservation as used in the sentence above?

tribal / thoughtful

When there is more variation in wage offers, the searcher may want to wait longer (that is, set a higher reservation wage) in hopes of receiving an exceptionally high wage offer.

Which word is more related to reservation as used in the sentence above?

tribal / minimum

slide-38
SLIDE 38

Contextual Prediction

I have some reservation due to the high potential for violations.

Which word is more related to reservation as used in the sentence above?

tribal / thoughtful

When there is more variation in wage offers, the searcher may want to wait longer (that is, set a higher reservation wage) in hopes of receiving an exceptionally high wage offer.

Which word is more related to reservation as used in the sentence above?

tribal / minimum

  • 127 raters, ~10K comparisons
slide-40
SLIDE 40

Results: Contextual Prediction

[Figure: % multi-prototype favored versus # of prototypes, for homonymous and polysemous words]

  • homonymous: carrier, crane, cell, company, issue, interest, match, media, nature, party, practice, plant, racket, recess, reservation, rock, space, value
  • polysemous: cause, chance, journal, market, network, policy, power, production, series, trading, train

slide-41
SLIDE 41
Conclusion

  • Represent word meaning as a collection of prototype vectors.
  • Outperforms single-prototype, but introduces more noise (like exemplar).
  • Trade-off for doing the clustering step.
  • Can we define better distance metrics? KL?
      • account for asymmetry?
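On the KL question: KL divergence is not symmetric, which is presumably why it comes up alongside asymmetry. The sketch below shows this on two toy distributions. How prototype vectors would be normalized into distributions is left open by the slide, so the inputs here are simply assumed to be valid probability vectors.

```python
import math

def kl_divergence(p, q):
    """KL(p || q) in nats; assumes q[i] > 0 wherever p[i] > 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Hypothetical feature distributions for two words.
p = [0.9, 0.1]
q = [0.5, 0.5]

forward = kl_divergence(p, q)   # KL(p || q)
backward = kl_divergence(q, p)  # KL(q || p)
```

The two directions give different values, so a KL-based measure can score "a resembles b" differently from "b resembles a".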

slide-42
SLIDE 42

Questions?

slide-43
SLIDE 43

Pruning

[Figure: panels of Spearman's ρ versus # of features retained (all, 10k, 5k, 2k, 1k, 500, 200, 100, 20, 10), comparing tf-idf, t-test, χ², and tf weightings for K=1, K=10, and K=50; one panel is labeled tf-idf cosine, K=1,10,50]