

SLIDE 1

The Role of Context Types and Dimensionality in Learning Word Embeddings

Oren Melamud, David McClosky, Siddharth Patwardhan, Mohit Bansal

NAACL, 2016

SLIDE 2

What’s a good word embedding for my task?

Useful in supervised tasks:

  • As pre-training initialization
  • With limited supervised data

Applied to various tasks:

  • Dependency Parsing
  • Named Entity Recognition
  • Co-reference Resolution
  • Sentiment Analysis
  • More...

so many choices...


SLIDE 5

Plethora of Word Embeddings

Easy to obtain

  • Off-the-shelf
  • Do-it-yourself toolkits

so many choices...

SLIDE 6

Plethora of Word Embeddings

Lots of choices to make

  1. Input
     • Context type (BOW-N, syntactic, ...)
     • Learning corpus
  2. Computational model
     • Model type (word2vec, GloVe, ...)
     • Hyperparameters
  3. Output
     • Dimensionality (is higher always better?)
  4. Post-processing
     • Ensembles, retrofitting, …

SLIDE 11

Our Focus

Choices we explore:

  1. Input
     • Context type (BOW-N, syntactic, substitute)
     • Learning corpus: Wikipedia + Gigaword + UMBC (web)
  2. Computational model
     • Model type: word2vec
  3. Output
     • Dimensionality (is higher always better?)
  4. Post-processing
     • Embedding combinations (concat, SVD, CCA)

Evaluated extensively on intrinsic and extrinsic tasks

SLIDE 12

Our Focus

Research questions:

  • Do intrinsic benchmarks predict extrinsic performance?
  • Tune context type and dimensionality per extrinsic task?
  • Can we benefit from combining different context types?

SLIDE 15

Additional Contribution

A new word2vec context type (substitute-based)

  • Based on n-gram language modeling

SLIDE 16

Outline

  • Context types and dimensionality
  • Combining context types
  • Conclusions

SLIDE 17

Context Types and Dimensionality

SLIDE 18

Common Context Types

The Italian chef baked the cake in the oven
(dependency arcs from ‘baked’: nsubj → chef, dobj → cake, prep_in → oven)

BOW-2 Contexts:
  t      c
  baked  Italian
  baked  chef
  baked  the
  baked  cake

Dependency Contexts:
  t      c
  baked  nsubj:chef
  baked  dobj:cake
  baked  prep_in:oven
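To make the two context types concrete, here is a minimal Python sketch (not the authors' preprocessing code) that extracts the slide's (target, context) pairs from a tokenized sentence and a list of dependency arcs; the arc labels mirror the example above:

```python
def bow_contexts(tokens, target_idx, window=2):
    """Bag-of-words contexts: the `window` tokens on each side of the target."""
    lo = max(0, target_idx - window)
    hi = min(len(tokens), target_idx + window + 1)
    return [(tokens[target_idx], tokens[i]) for i in range(lo, hi) if i != target_idx]

def dep_contexts(target, arcs):
    """Dependency contexts: label-decorated syntactic neighbours of the target."""
    return [(target, f"{label}:{child}") for head, label, child in arcs if head == target]

tokens = ["The", "Italian", "chef", "baked", "the", "cake", "in", "the", "oven"]
arcs = [("baked", "nsubj", "chef"), ("baked", "dobj", "cake"), ("baked", "prep_in", "oven")]

print(bow_contexts(tokens, tokens.index("baked")))
# [('baked', 'Italian'), ('baked', 'chef'), ('baked', 'the'), ('baked', 'cake')]
print(dep_contexts("baked", arcs))
# [('baked', 'nsubj:chef'), ('baked', 'dobj:cake'), ('baked', 'prep_in:oven')]
```

Note how BOW-2 picks up the uninformative ‘the’ while the dependency contexts keep only syntactically related words.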

SLIDE 19

Learning word2vec Skip-gram Embeddings

BOW-2 Contexts:
  t      c
  baked  Italian
  baked  chef
  baked  the
  baked  cake

Dependency Contexts:
  t      c
  baked  nsubj:chef
  baked  dobj:cake
  baked  prep_in:oven

Objective (skip-gram with negative sampling):

  ∑_{(t,c) ∈ PAIRS} [ log σ(v′_c · v_t) + ∑_{neg ∈ NEGS(t,c)} log σ(−v′_neg · v_t) ]
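The skip-gram objective can be evaluated directly on toy vectors. The NumPy sketch below is illustrative only: PAIRS and NEGS become plain Python containers, and the vectors are random rather than trained:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sgns_objective(pairs, negatives, v, v_ctx):
    """Sum over (t, c) pairs of log sigma(v'_c . v_t), plus the
    log sigma(-v'_neg . v_t) terms for each sampled negative context."""
    total = 0.0
    for t, c in pairs:
        total += np.log(sigmoid(v_ctx[c] @ v[t]))
        for neg in negatives[(t, c)]:
            total += np.log(sigmoid(-(v_ctx[neg] @ v[t])))
    return total

# Toy 3-dimensional vectors (random, not trained).
rng = np.random.default_rng(0)
v = {"baked": rng.normal(size=3)}
v_ctx = {w: rng.normal(size=3) for w in ["chef", "cake", "noise"]}
pairs = [("baked", "chef"), ("baked", "cake")]
negatives = {pair: ["noise"] for pair in pairs}
print(sgns_objective(pairs, negatives, v, v_ctx))
```

Training maximizes this quantity (it is always negative, since each log-sigmoid term is), pushing target vectors toward their observed contexts and away from sampled negatives.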

SLIDE 21

Substitute-based Contexts

Potential substitutes encode the context (Yuret, 2012)

The Italian chef _____ the cake in the oven

  0.50  put
  0.25  baked
  0.15  cooked
  0.10  forgot

Substitute Contexts:
  t      s       w_{t,s}
  baked  put     0.50
  baked  baked   0.25
  baked  cooked  0.15
  baked  forgot  0.10

SLIDE 23

word2vec with Substitute-based Contexts

Substitute Contexts:
  t      s       w_{t,s}
  baked  put     0.50
  baked  baked   0.25
  baked  cooked  0.15
  baked  forgot  0.10

Objective (weighted by substitute probability):

  ∑_{(t,s) ∈ PAIRS} w_{t,s} · [ log σ(v′_s · v_t) + ∑_{neg ∈ NEGS(t,s)} log σ(−v′_neg · v_t) ]
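As a sketch, the substitute-based objective differs from plain skip-gram only in the w_{t,s} factor scaling each pair's contribution. The toy values below mirror the slide's substitute distribution; the vectors are random, not trained:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def substitute_sgns_objective(weighted_pairs, negatives, v, v_ctx):
    """Weighted skip-gram: each (target, substitute) pair's positive and
    negative terms are scaled by the substitute weight w_{t,s}."""
    total = 0.0
    for (t, s), w in weighted_pairs.items():
        term = np.log(sigmoid(v_ctx[s] @ v[t]))
        for neg in negatives[(t, s)]:
            term += np.log(sigmoid(-(v_ctx[neg] @ v[t])))
        total += w * term
    return total

# Toy vectors; the weights come from the slide's substitute distribution.
rng = np.random.default_rng(1)
v = {"baked": rng.normal(size=3)}
v_ctx = {w: rng.normal(size=3) for w in ["put", "baked", "cooked", "forgot", "noise"]}
weighted_pairs = {("baked", "put"): 0.50, ("baked", "baked"): 0.25,
                  ("baked", "cooked"): 0.15, ("baked", "forgot"): 0.10}
negatives = {pair: ["noise"] for pair in weighted_pairs}
print(substitute_sgns_objective(weighted_pairs, negatives, v, v_ctx))
```

High-probability substitutes (here ‘put’ at 0.50) thus pull the target vector harder than unlikely ones (‘forgot’ at 0.10).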

SLIDE 24

‘Flavors’ of Similarity

Top-5 closest words to ‘playing’:

  W-10            DEP            SUB
  played          play           singing
  play            played         rehearsing
  plays           understudying  performing
  professionally  caddying       composing
  player          plays          running

  (topical)       (functional)

Small context windows also yield ‘functional’ similarity
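Nearest-neighbour lists like the table above come from ranking the vocabulary by cosine similarity to the query vector. A minimal sketch with made-up 2-dimensional vectors (real W-10/DEP/SUB embeddings would be trained):

```python
import numpy as np

def top_k_neighbors(word, emb, k=5):
    """Rank the rest of the vocabulary by cosine similarity to `word`."""
    q = emb[word] / np.linalg.norm(emb[word])
    sims = {w: float(vec @ q / np.linalg.norm(vec))
            for w, vec in emb.items() if w != word}
    return sorted(sims, key=sims.get, reverse=True)[:k]

# Illustrative vectors only, chosen so the related verb forms cluster.
emb = {"playing": np.array([1.0, 0.0]),
       "played":  np.array([0.9, 0.1]),
       "player":  np.array([0.7, 0.4]),
       "oven":    np.array([0.0, 1.0])}
print(top_k_neighbors("playing", emb, k=2))  # ['played', 'player']
```

Which words end up nearby is entirely determined by the context type used during training, which is the point of the table.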

SLIDE 25

Intrinsic Evaluations - Word Similarity

Topical (lion:zoo) vs. Functional (lion:cat)

* Similar results for SimLex-999

  • Context type matters
  • Higher dimensionality is generally better

SLIDE 26

Extrinsic Evaluations

Can we find similar patterns in extrinsic tasks?

SLIDE 27

Extrinsic Evaluations

  Task                      System                                       Benchmark
  Dependency Parsing        Stanford NN parser, Chen & Manning (2014)    PTB
  Named Entity Recognition  Turian et al. (2010)                         CoNLL-2003 shared task
  Co-reference Resolution   Durrett & Klein (2013),                      CoNLL-2012 shared task
                            full features + embeddings
  Sentiment Analysis        Average of embeddings with                   Sentence-level Sentiment
                            logistic regression                          Treebank, Socher et al. (2013)

  *Only dev-set experiments

SLIDE 28

Extrinsic Evaluations - Parsing

  • Preference for ‘functional’ embeddings
  • Best performance at d = 50 (due to limited supervision?)

SLIDE 29

Extrinsic Evaluations - Parsing

  • Similar context type preferences
  • But different dimensionality preferences

SLIDE 30

Extrinsic Evaluations - NER

  • Best performance at d = 50
  • No clear context type preference

SLIDE 31

Extrinsic Evaluations - Sentiment Analysis

  • No context type preference
  • Higher dimensionality is better

SLIDE 32

Extrinsic Evaluations - Coreference Resolution

  • Small performance differences (the system’s non-embedding features are already strong)

SLIDE 33

Extrinsic Evaluations - Summary

  • Correlation with intrinsic results
  • Dimensionality preferences
  • Context type preferences

SLIDE 36

Context Combinations

SLIDE 37

Embeddings Concatenation

Let the classifier choose the valuable information:

(figure: 2-dimensional ‘boy’/‘girl’/‘dog’ embeddings from two context types, concatenated into one 4-dimensional embedding per word)
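The combination itself is just a per-word concatenation. A NumPy sketch (the 3-word vocabulary and 300-dimensional W10/DEP embeddings are illustrative assumptions):

```python
import numpy as np

# Two embedding matrices over the same vocabulary, one per context type.
w10 = np.random.default_rng(2).normal(size=(3, 300))  # e.g. 'topical' W10 vectors
dep = np.random.default_rng(3).normal(size=(3, 300))  # e.g. 'functional' DEP vectors

# Stack them side by side: one 600-dimensional vector per word. A downstream
# classifier can then weight whichever dimensions help its task.
combined = np.concatenate([w10, dep], axis=1)
print(combined.shape)  # (3, 600)
```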

SLIDE 38

Concatenation

SLIDE 39

Concatenation

Concat helps when ‘regular’ increase in dimensionality is ‘exhausted’

SLIDE 41

Concatenation

‘Topical’+‘Functional’ concats worked best

  • W10 + SUB
  • W10 + W1
  • W10 + DEP

SLIDE 42

Compressed Combinations

  • Compression via SVD or CCA degrades performance
  • Better to let the task-specific classifier ‘choose’ the relevant information
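For contrast, here is a sketch of one such compressed combination: reducing the concatenated embeddings back to the original dimensionality with a truncated SVD (vocabulary size and dimensions are illustrative):

```python
import numpy as np

rng = np.random.default_rng(4)
combined = rng.normal(size=(1000, 600))   # vocab x concatenated dims

# Truncated SVD: keep the top-d singular directions of the combined matrix.
U, S, Vt = np.linalg.svd(combined, full_matrices=False)
d = 300
compressed = U[:, :d] * S[:d]             # rank-d embeddings, vocab x 300
print(compressed.shape)  # (1000, 300)
```

The compression step fixes one global notion of ‘important’ dimensions up front, which is exactly the choice the slide argues is better left to the task-specific classifier.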

SLIDE 43

Conclusions

SLIDE 44

Summary

  • Do intrinsic benchmarks predict extrinsic performance? NO
  • Tune context type and dimensionality per extrinsic task? YES
  • Can we benefit from combining different context types? MAYBE

Thank you and happy cooking!
