

SLIDE 1

The Role of Context Types and Dimensionality in Learning Word Embeddings

Oren Melamud, David McClosky, Siddharth Patwardhan, Mohit Bansal

NAACL, 2016

SLIDE 2

What’s a good word embedding for my task?

Useful in supervised tasks:

  • As pre-training initialization
  • With limited supervised data

Applied to various tasks:

  • Dependency Parsing
  • Named Entity Recognition
  • Co-reference Resolution
  • Sentiment Analysis
  • More...

so many choices...


SLIDE 5

Plethora of Word Embeddings

Easy to obtain

  • Off-the-shelf
  • Do-it-yourself toolkits

so many choices...

SLIDE 6

Plethora of Word Embeddings

Lots of choices to make

  1. Input
     • Context type (BOW-N, syntactic, ...)
     • Learning corpus
  2. Computational model
     • Model type (word2vec, GloVe, ...)
     • Hyperparameters
  3. Output
     • Dimensionality (is higher always better?)
  4. Post-processing
     • Ensembles, retrofitting, …

SLIDE 11

Our Focus

Choices we explore:

  1. Input
     • Context type (BOW-N, syntactic, substitute)
     • Learning corpus: Wikipedia + Gigaword + UMBC (web)
  2. Computational model
     • Model type: word2vec
  3. Output
     • Dimensionality (is higher always better?)
  4. Post-processing
     • Embedding combinations (concat, SVD, CCA)

Evaluated extensively on intrinsic and extrinsic tasks

SLIDE 12

Our Focus

Research questions:

  • Do intrinsic benchmarks predict extrinsic performance?
  • Tune context type and dimensionality per extrinsic task?
  • Can we benefit from combining different context types?

SLIDE 15

Additional Contribution

A new word2vec context type (substitute-based)

  • Based on n-gram language modeling

SLIDE 16

Outline

  • Context types and dimensionality
  • Combining context types
  • Conclusions

SLIDE 17

Context Types and Dimensionality

SLIDE 18

Common Context Types

The Italian chef baked the cake in the oven
(dependency arcs from ‘baked’: nsubj → chef, dobj → cake, prep_in → oven)

BOW-2 Contexts:
  t      c
  baked  Italian
  baked  chef
  baked  the
  baked  cake

Dependency Contexts:
  t      c
  baked  nsubj:chef
  baked  dobj:cake
  baked  prep_in:oven
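To make the two context types concrete, here is a minimal Python sketch (not the authors' preprocessing code) that extracts the slide's (target, context) pairs from a tokenized sentence and a list of dependency arcs; the arc labels mirror the example above:

```python
def bow_contexts(tokens, target_idx, window=2):
    """Bag-of-words contexts: the `window` tokens on each side of the target."""
    lo = max(0, target_idx - window)
    hi = min(len(tokens), target_idx + window + 1)
    return [(tokens[target_idx], tokens[i]) for i in range(lo, hi) if i != target_idx]

def dep_contexts(target, arcs):
    """Dependency contexts: label-decorated syntactic neighbours of the target."""
    return [(target, f"{label}:{child}") for head, label, child in arcs if head == target]

tokens = ["The", "Italian", "chef", "baked", "the", "cake", "in", "the", "oven"]
arcs = [("baked", "nsubj", "chef"), ("baked", "dobj", "cake"), ("baked", "prep_in", "oven")]

print(bow_contexts(tokens, tokens.index("baked")))
# [('baked', 'Italian'), ('baked', 'chef'), ('baked', 'the'), ('baked', 'cake')]
print(dep_contexts("baked", arcs))
# [('baked', 'nsubj:chef'), ('baked', 'dobj:cake'), ('baked', 'prep_in:oven')]
```

Note how BOW-2 picks up the uninformative ‘the’ while the dependency contexts keep only syntactically related words.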

SLIDE 19

Learning word2vec Skip-gram Embeddings

BOW-2 Contexts:
  t      c
  baked  Italian
  baked  chef
  baked  the
  baked  cake

Dependency Contexts:
  t      c
  baked  nsubj:chef
  baked  dobj:cake
  baked  prep_in:oven

Objective (skip-gram with negative sampling):

  ∑_{(t,c) ∈ PAIRS} [ log σ(v′_c · v_t) + ∑_{neg ∈ NEGS(t,c)} log σ(−v′_neg · v_t) ]
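The skip-gram objective can be evaluated directly on toy vectors. The NumPy sketch below is illustrative only: PAIRS and NEGS become plain Python containers, and the vectors are random rather than trained:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sgns_objective(pairs, negatives, v, v_ctx):
    """Sum over (t, c) pairs of log sigma(v'_c . v_t), plus the
    log sigma(-v'_neg . v_t) terms for each sampled negative context."""
    total = 0.0
    for t, c in pairs:
        total += np.log(sigmoid(v_ctx[c] @ v[t]))
        for neg in negatives[(t, c)]:
            total += np.log(sigmoid(-(v_ctx[neg] @ v[t])))
    return total

# Toy 3-dimensional vectors (random, not trained).
rng = np.random.default_rng(0)
v = {"baked": rng.normal(size=3)}
v_ctx = {w: rng.normal(size=3) for w in ["chef", "cake", "noise"]}
pairs = [("baked", "chef"), ("baked", "cake")]
negatives = {pair: ["noise"] for pair in pairs}
print(sgns_objective(pairs, negatives, v, v_ctx))
```

Training maximizes this quantity (it is always negative, since each log-sigmoid term is), pushing target vectors toward their observed contexts and away from sampled negatives.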

SLIDE 21

Substitute-based Contexts

Potential substitutes encode the context (Yuret, 2012)

The Italian chef _____ the cake in the oven

  0.50  put
  0.25  baked
  0.15  cooked
  0.10  forgot

Substitute Contexts:
  t      s       w_{t,s}
  baked  put     0.50
  baked  baked   0.25
  baked  cooked  0.15
  baked  forgot  0.10

SLIDE 23

word2vec with Substitute-based Contexts

Substitute Contexts:
  t      s       w_{t,s}
  baked  put     0.50
  baked  baked   0.25
  baked  cooked  0.15
  baked  forgot  0.10

Objective (weighted by substitute probability):

  ∑_{(t,s) ∈ PAIRS} w_{t,s} · [ log σ(v′_s · v_t) + ∑_{neg ∈ NEGS(t,s)} log σ(−v′_neg · v_t) ]
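As a sketch, the substitute-based objective differs from plain skip-gram only in the w_{t,s} factor scaling each pair's contribution. The toy values below mirror the slide's substitute distribution; the vectors are random, not trained:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def substitute_sgns_objective(weighted_pairs, negatives, v, v_ctx):
    """Weighted skip-gram: each (target, substitute) pair's positive and
    negative terms are scaled by the substitute weight w_{t,s}."""
    total = 0.0
    for (t, s), w in weighted_pairs.items():
        term = np.log(sigmoid(v_ctx[s] @ v[t]))
        for neg in negatives[(t, s)]:
            term += np.log(sigmoid(-(v_ctx[neg] @ v[t])))
        total += w * term
    return total

# Toy vectors; the weights come from the slide's substitute distribution.
rng = np.random.default_rng(1)
v = {"baked": rng.normal(size=3)}
v_ctx = {w: rng.normal(size=3) for w in ["put", "baked", "cooked", "forgot", "noise"]}
weighted_pairs = {("baked", "put"): 0.50, ("baked", "baked"): 0.25,
                  ("baked", "cooked"): 0.15, ("baked", "forgot"): 0.10}
negatives = {pair: ["noise"] for pair in weighted_pairs}
print(substitute_sgns_objective(weighted_pairs, negatives, v, v_ctx))
```

High-probability substitutes (here ‘put’ at 0.50) thus pull the target vector harder than unlikely ones (‘forgot’ at 0.10).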

SLIDE 24

‘Flavors’ of Similarity

Top-5 closest words to ‘playing’:

  W-10            DEP            SUB
  played          play           singing
  play            played         rehearsing
  plays           understudying  performing
  professionally  caddying       composing
  player          plays          running

  (topical)       (functional)

Small context windows also yield ‘functional’ similarity
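Nearest-neighbour lists like the table above come from ranking the vocabulary by cosine similarity to the query vector. A minimal sketch with made-up 2-dimensional vectors (real W-10/DEP/SUB embeddings would be trained):

```python
import numpy as np

def top_k_neighbors(word, emb, k=5):
    """Rank the rest of the vocabulary by cosine similarity to `word`."""
    q = emb[word] / np.linalg.norm(emb[word])
    sims = {w: float(vec @ q / np.linalg.norm(vec))
            for w, vec in emb.items() if w != word}
    return sorted(sims, key=sims.get, reverse=True)[:k]

# Illustrative vectors only, chosen so the related verb forms cluster.
emb = {"playing": np.array([1.0, 0.0]),
       "played":  np.array([0.9, 0.1]),
       "player":  np.array([0.7, 0.4]),
       "oven":    np.array([0.0, 1.0])}
print(top_k_neighbors("playing", emb, k=2))  # ['played', 'player']
```

Which words end up nearby is entirely determined by the context type used during training, which is the point of the table.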

SLIDE 25

Intrinsic Evaluations - Word Similarity

Topical (lion:zoo) vs. Functional (lion:cat)

* Similar results for SimLex-999

  • Context type matters
  • Higher dimensionality is generally better

SLIDE 26

Extrinsic Evaluations

Can we find similar patterns in extrinsic tasks?

SLIDE 27

Extrinsic Evaluations

  Task                      System                                       Benchmark
  Dependency Parsing        Stanford NN parser, Chen & Manning (2014)    PTB
  Named Entity Recognition  Turian et al. (2010)                         CoNLL-2003 shared task
  Co-reference Resolution   Durrett & Klein (2013),                      CoNLL-2012 shared task
                            full features + embeddings
  Sentiment Analysis        Average of embeddings with                   Sentence-level Sentiment
                            logistic regression                          Treebank, Socher et al. (2013)

  *Only dev-set experiments

SLIDE 28

Extrinsic Evaluations - Parsing

  • Preference for ‘functional’ embeddings
  • Best performance at d = 50 (due to limited supervision?)

SLIDE 29

Extrinsic Evaluations - Parsing

  • Similar context type preferences
  • But different dimensionality preferences

SLIDE 30

Extrinsic Evaluations - NER

  • Best performance at d = 50
  • No clear context type preference

SLIDE 31

Extrinsic Evaluations - Sentiment Analysis

  • No context type preference
  • Higher dimensionality is better

SLIDE 32

Extrinsic Evaluations - Coreference Resolution

  • Small performance differences (the system’s non-embedding features are already strong)

SLIDE 33

Extrinsic Evaluations - Summary

  • Correlation with intrinsic results
  • Dimensionality preferences
  • Context type preferences

SLIDE 36

Context Combinations

SLIDE 37

Embeddings Concatenation

Let the classifier choose the valuable information:

(figure: 2-dimensional ‘boy’/‘girl’/‘dog’ embeddings from two context types, concatenated into one 4-dimensional embedding per word)
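The combination itself is just a per-word concatenation. A NumPy sketch (the 3-word vocabulary and 300-dimensional W10/DEP embeddings are illustrative assumptions):

```python
import numpy as np

# Two embedding matrices over the same vocabulary, one per context type.
w10 = np.random.default_rng(2).normal(size=(3, 300))  # e.g. 'topical' W10 vectors
dep = np.random.default_rng(3).normal(size=(3, 300))  # e.g. 'functional' DEP vectors

# Stack them side by side: one 600-dimensional vector per word. A downstream
# classifier can then weight whichever dimensions help its task.
combined = np.concatenate([w10, dep], axis=1)
print(combined.shape)  # (3, 600)
```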

SLIDE 38

Concatenation

SLIDE 39

Concatenation

Concat helps when ‘regular’ increase in dimensionality is ‘exhausted’

SLIDE 41

Concatenation

‘Topical’+‘Functional’ concats worked best

  • W10 + SUB
  • W10 + W1
  • W10 + DEP

SLIDE 42

Compressed Combinations

  • Compression via SVD or CCA degrades performance
  • Better to let the task-specific classifier ‘choose’ the relevant information
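For contrast, here is a sketch of one such compressed combination: reducing the concatenated embeddings back to the original dimensionality with a truncated SVD (vocabulary size and dimensions are illustrative):

```python
import numpy as np

rng = np.random.default_rng(4)
combined = rng.normal(size=(1000, 600))   # vocab x concatenated dims

# Truncated SVD: keep the top-d singular directions of the combined matrix.
U, S, Vt = np.linalg.svd(combined, full_matrices=False)
d = 300
compressed = U[:, :d] * S[:d]             # rank-d embeddings, vocab x 300
print(compressed.shape)  # (1000, 300)
```

The compression step fixes one global notion of ‘important’ dimensions up front, which is exactly the choice the slide argues is better left to the task-specific classifier.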

SLIDE 43

Conclusions

SLIDE 44

Summary

  • Do intrinsic benchmarks predict extrinsic performance? NO
  • Tune context type and dimensionality per extrinsic task? YES
  • Can we benefit from combining different context types? MAYBE

Thank you and happy cooking!
