
SLIDE 1

Characterizing the impact of geometric properties of word embeddings on task performance

Brendan Whitaker, Denis Newman-Griffis, Aparajita Haldar, Hakan Ferhatosmanoglu, Eric Fosler-Lussier

Ohio State University; University of Warwick

June 4, 2019

Whitaker, Newman-Griffis, Haldar, et al. Characterizing Embedding Geometry June 4, 2019 1 / 16

SLIDE 2

Objective

Question

What geometric properties of an embedding space are important for performance on a given task?

  • Understand the utility of embeddings as input features.
  • Provide direction for future work in training and tuning embeddings.

SLIDE 4

Embedding space?

In NLP, the term embedding is often used to denote both a map and (an element of) its image.

Definition

We define an embedding space as a set of word vectors in $\mathbb{R}^d$.
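To make the map/image distinction concrete, here is a minimal NumPy sketch. The vocabulary, the dimension d = 4, and the random vectors are hypothetical placeholders; real embeddings would come from a trained model.

```python
import numpy as np

# Hypothetical toy vocabulary and dimension, for illustration only.
vocab = ["cat", "dog", "car"]
d = 4
rng = np.random.default_rng(0)

# The embedding as a *map*: each word is sent to a vector in R^d.
embedding_map = {word: rng.standard_normal(d) for word in vocab}

# The embedding *space* in the sense of the definition: the set of word
# vectors, stored as a |V| x d matrix whose row i is the vector for vocab[i].
E = np.stack([embedding_map[word] for word in vocab])
word_index = {word: i for i, word in enumerate(vocab)}

print(E.shape)  # (3, 4)
```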

SLIDE 5

Geometric properties?

We consider the following attributes of word embedding geometry:

  • position relative to the origin
  • distribution of feature values in $\mathbb{R}^d$
  • global pairwise distances
  • local pairwise distances
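One way to make these four attributes measurable is sketched below in NumPy. The specific summaries (norms and centroid for position, per-dimension standard deviation and range for the feature distribution, a full cosine-distance matrix for global distances, k-nearest-neighbour lists for local distances) and the choice k = 5 are illustrative assumptions, not the measures used in the talk.

```python
import numpy as np

def geometry_summary(E: np.ndarray, k: int = 5) -> dict:
    """Summarise the four geometric properties of a |V| x d embedding matrix.

    Assumes every row of E is non-zero (needed for cosine distance).
    """
    # Position relative to the origin: each vector's norm and the centroid.
    norms = np.linalg.norm(E, axis=1)
    centroid = E.mean(axis=0)

    # Distribution of feature values in R^d: per-dimension spread and range.
    feature_std = E.std(axis=0)
    feature_range = E.max(axis=0) - E.min(axis=0)

    # Global pairwise distances: the full |V| x |V| cosine-distance matrix.
    U = E / norms[:, None]
    cos_dist = 1.0 - U @ U.T

    # Local pairwise distances: each word's k nearest neighbours (self excluded).
    masked = cos_dist.copy()
    np.fill_diagonal(masked, np.inf)
    knn = np.argsort(masked, axis=1)[:, :k]

    return {
        "norms": norms,
        "centroid": centroid,
        "feature_std": feature_std,
        "feature_range": feature_range,
        "cos_dist": cos_dist,
        "knn": knn,
    }

rng = np.random.default_rng(0)
E = rng.standard_normal((50, 10))  # placeholder embeddings
summary = geometry_summary(E, k=5)
print(summary["knn"].shape)  # (50, 5)
```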

SLIDE 6

Our approach

Ablation Study

We transform the embedding space so that only a subset of the following properties is exposed to downstream models:

  • position relative to the origin
  • distribution of feature values in $\mathbb{R}^d$
  • global pairwise distances
  • local pairwise distances

SLIDE 7

Affine

  • pos. relative to the origin
  • distribution of features
  • global distances
  • local distances
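The results slides report that rotations, dilations, and reflections are innocuous for intrinsic tasks while displacing the origin is not. Because those intrinsic similarity tasks are scored via cosine distance, a quick NumPy check shows why: the first three maps leave every pairwise cosine similarity unchanged up to rounding error, while a translation does not. The random embeddings, the scale 3.0, and the offset 5.0 are arbitrary placeholders.

```python
import numpy as np

def cosine_sim(X):
    U = X / np.linalg.norm(X, axis=1, keepdims=True)
    return U @ U.T

rng = np.random.default_rng(0)
E = rng.standard_normal((100, 20))  # placeholder for real embeddings

# A random orthogonal matrix from a QR decomposition: a rotation, possibly
# composed with a reflection.
Q, _ = np.linalg.qr(rng.standard_normal((20, 20)))

affine_variants = {
    "rotation": E @ Q,
    "reflection": E * np.r_[-1.0, np.ones(19)],  # flip the first coordinate
    "dilation": 3.0 * E,                         # homothety centred at the origin
    "translation": E + 5.0,                      # displaces the origin
}

base = cosine_sim(E)
changes = {name: float(np.abs(cosine_sim(X) - base).max())
           for name, X in affine_variants.items()}
for name, change in changes.items():
    print(f"{name:12s} max change in cosine similarity: {change:.2e}")
```

Note that a translation is the only one of the four that moves the origin, which is why it alone alters cosine similarities.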

SLIDE 8

Cosine distance embedding (CDE)

Specs:

  • Activation function: ReLU
  • Epochs: 50
  • $d$ = embedding dimension (300)
  • $|V^*|$ = distance vector dimension ($10^4$ most frequent words)

  • pos. relative to the origin
  • distribution of features
  • global distances
  • local distances
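The slide lists only training specs, so this NumPy sketch rests on stated assumptions. Each word is re-encoded by its cosine distances to $|V^*|$ anchor words (the "most frequent words", assumed here to be the first rows of E). Because the results slides label CDE as distAE, the $|V^*|$-dimensional vector is assumed to be compressed to $d$ dimensions by a one-hidden-layer ReLU autoencoder trained with full-batch gradient descent on mean squared error. The architecture, learning rate, initialisation, and toy sizes ($|V^*| = 50$, $d = 8$ instead of $10^4$ and 300) are hypothetical.

```python
import numpy as np

def cosine_distance_vectors(E, anchor_idx):
    """Re-encode each word by its cosine distances to the anchor words (|V| x |V*|)."""
    U = E / np.linalg.norm(E, axis=1, keepdims=True)
    return 1.0 - U @ U[anchor_idx].T

def train_relu_autoencoder(X, d, epochs=50, lr=0.01, seed=0):
    """Compress X (n x m) to d dims with a one-hidden-layer ReLU autoencoder.

    Hypothetical setup: linear decoder, mean squared reconstruction error,
    full-batch gradient descent. Returns the d-dim codes and per-epoch losses.
    """
    rng = np.random.default_rng(seed)
    n, m = X.shape
    W1 = rng.standard_normal((m, d)) * np.sqrt(2.0 / m)
    b1 = np.zeros(d)
    W2 = rng.standard_normal((d, m)) * np.sqrt(1.0 / d)
    b2 = np.zeros(m)
    losses = []
    for _ in range(epochs):
        Z = X @ W1 + b1
        H = np.maximum(Z, 0.0)        # ReLU encoder
        R = H @ W2 + b2 - X           # reconstruction residual
        losses.append(float(np.mean(R ** 2)))
        # Gradients of the mean squared error, by backpropagation.
        dR = 2.0 * R / R.size
        dW2, db2 = H.T @ dR, dR.sum(axis=0)
        dZ = (dR @ W2.T) * (Z > 0)
        dW1, db1 = X.T @ dZ, dZ.sum(axis=0)
        W1 -= lr * dW1
        b1 -= lr * db1
        W2 -= lr * dW2
        b2 -= lr * db2
    codes = np.maximum(X @ W1 + b1, 0.0)
    return codes, losses

rng = np.random.default_rng(1)
E = rng.standard_normal((200, 30))             # placeholder embeddings
X = cosine_distance_vectors(E, np.arange(50))  # |V*| = 50 anchors
codes, losses = train_relu_autoencoder(X, d=8)
print(codes.shape)  # (200, 8)
```

The distance re-encoding discards absolute coordinates while keeping each word's distances to the anchors; the autoencoder only brings the result back to a fixed dimension.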

SLIDE 9

Nearest neighbor embedding (NNE)

  • pos. relative to the origin
  • distribution of features
  • global distances
  • local distances
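This slide gives no construction details, but the Discussion slide refers to node2vec random walks and a thresholded-NNE variant. Assuming NNE builds a nearest-neighbour graph over the embedding space and then learns node embeddings with node2vec, the NumPy sketch below covers only the graph and walk stages. Cosine similarity, k = 5, the threshold tau = 0.3, and uniform walks (node2vec with p = q = 1) are hypothetical choices; training skip-gram on the walks is omitted.

```python
import numpy as np

def cosine_sim(E):
    U = E / np.linalg.norm(E, axis=1, keepdims=True)
    return U @ U.T

def knn_graph(E, k):
    """Directed graph: each word points to its k most cosine-similar words."""
    S = cosine_sim(E)
    np.fill_diagonal(S, -np.inf)  # no self-loops
    return [list(np.argsort(-S[i])[:k]) for i in range(len(E))]

def threshold_graph(E, tau):
    """Thresholded variant: i -> j whenever cos(i, j) >= tau, so out-degree varies."""
    S = cosine_sim(E)
    np.fill_diagonal(S, -np.inf)
    return [list(np.flatnonzero(S[i] >= tau)) for i in range(len(E))]

def random_walks(adj, walk_length, walks_per_node, seed=0):
    """Uniform random walks; a walk stops early at a node with no out-edges."""
    rng = np.random.default_rng(seed)
    walks = []
    for _ in range(walks_per_node):
        for start in range(len(adj)):
            walk = [start]
            while len(walk) < walk_length and adj[walk[-1]]:
                walk.append(int(rng.choice(adj[walk[-1]])))
            walks.append(walk)
    return walks

rng = np.random.default_rng(0)
E = rng.standard_normal((100, 16))  # placeholder embeddings
adj = knn_graph(E, k=5)
walks = random_walks(adj, walk_length=10, walks_per_node=2)
degrees = [len(nbrs) for nbrs in threshold_graph(E, tau=0.3)]
print(min(degrees), max(degrees))
```

In the k-NN graph every word has the same out-degree, whereas the thresholded graph lets out-degree vary from word to word.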

SLIDE 10

Hierarchy of transformations

Transformations are ordered by the number of properties they ablate. We also include a random baseline of meaningless vectors. Arrow lengths in the diagram carry no meaning. Each transformation is applied independently to the original embeddings, not to the output of another transformation.
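The protocol on this slide (every transformation applied independently to the original embeddings, plus a random baseline) can be sketched as a loop in NumPy. The evaluation is a hypothetical stand-in for the talk's intrinsic and extrinsic tasks: it scores each transformed space by the average fraction of each word's 5 cosine nearest neighbours that it shares with the original space. The transformations listed are a small illustrative subset.

```python
import numpy as np

def knn_sets(E, k):
    """Each word's k nearest neighbours by cosine similarity, self excluded."""
    U = E / np.linalg.norm(E, axis=1, keepdims=True)
    S = U @ U.T
    np.fill_diagonal(S, -np.inf)
    return [set(np.argsort(-row)[:k].tolist()) for row in S]

def neighbour_overlap(E_ref, E_new, k=5):
    """Hypothetical proxy score: mean fraction of k-NN shared with the original."""
    ref, new = knn_sets(E_ref, k), knn_sets(E_new, k)
    return float(np.mean([len(a & b) / k for a, b in zip(ref, new)]))

rng = np.random.default_rng(0)
E = rng.standard_normal((200, 32))  # stand-in for a real embedding matrix
Q, _ = np.linalg.qr(rng.standard_normal((32, 32)))

# Every transformation receives the ORIGINAL matrix; none is chained.
transforms = {
    "rotation": lambda X: X @ Q,
    "translation": lambda X: X + 1.0,
    "random baseline": lambda X: rng.standard_normal(X.shape),
}
scores = {name: neighbour_overlap(E, f(E)) for name, f in transforms.items()}
print(scores)
```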

SLIDE 11

Embeddings and Tasks

Standard benchmark embeddings:

  • Word2Vec trained on Google News
  • GloVe trained on Common Crawl
  • FastText trained on WikiNews

Testing:

  • 10 standard intrinsic tasks
  • 5 extrinsic tasks (embeddings plugged into a downstream machine learning model)

SLIDE 12

Tasks

Intrinsic tasks

Word similarity and relatedness (via cosine distance): WordSim353, SimLex-999, RareWords, RG65, MEN, MTURK

Word categorization: AP, BLESS, Battig, ESSLLI

Extrinsic tasks

  • Relation classification on SemEval-2010 Task 8
  • Sentence-level sentiment polarity classification on MR movie reviews
  • Sentiment classification on IMDB reviews
  • Subjectivity/objectivity classification on Rotten Tomatoes snippets
  • Natural language inference on SNLI
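The slide says similarity and relatedness are scored via cosine distance. A common protocol for benchmarks of this kind, assumed here rather than stated on the slide, is to correlate the model's cosine similarity for each word pair with human ratings using Spearman's rank correlation. A NumPy sketch with placeholder embeddings and made-up ratings:

```python
import numpy as np

def average_ranks(x):
    """Ranks starting at 1, with tied values sharing their average rank."""
    x = np.asarray(x, dtype=float)
    order = np.argsort(x, kind="mergesort")
    sorted_x = x[order]
    ranks = np.empty(len(x))
    i = 0
    while i < len(x):
        j = i
        while j + 1 < len(x) and sorted_x[j + 1] == sorted_x[i]:
            j += 1
        ranks[order[i:j + 1]] = (i + j) / 2.0 + 1.0
        i = j + 1
    return ranks

def spearman(a, b):
    """Spearman's rho: Pearson correlation of the two rank vectors."""
    return float(np.corrcoef(average_ranks(a), average_ranks(b))[0, 1])

def similarity_score(E, word_index, pairs, human_scores):
    """Correlate cosine similarity of each word pair with human ratings."""
    U = E / np.linalg.norm(E, axis=1, keepdims=True)
    model = [float(U[word_index[w1]] @ U[word_index[w2]]) for w1, w2 in pairs]
    return spearman(model, human_scores)

vocab = ["a", "b", "c", "d", "e"]
word_index = {w: i for i, w in enumerate(vocab)}
E = np.random.default_rng(0).standard_normal((5, 8))  # placeholder embeddings
pairs = [("a", "b"), ("a", "c"), ("b", "d"), ("c", "e"), ("d", "e")]
human = [3.1, 7.4, 5.0, 1.2, 8.8]  # made-up ratings; a benchmark supplies real ones
print(similarity_score(E, word_index, pairs, human))
```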

SLIDE 13

Results - intrinsic tasks

We see the lowest performance in thresholded-NNE.

The largest drop in performance occurs at CDE (labeled distAE in the graph). Rotations, dilations, and reflections are innocuous. Displacing the origin has a nontrivial effect. NNE causes a significant drop in performance as well.

SLIDE 14

Results - extrinsic tasks

CDE still causes the largest drop. The NNE variants recover most of the losses and are on par with the affine transformations. Extrinsic tasks are more robust to translations, but not to homotheties.

SLIDE 15

Discussion

The drop due to CDE is likely associated with the importance of locality in embedding learning. With thresholded-NNE, the words with high out-degree are rare words, which introduces noise into node2vec's random walks.

SLIDE 16

Takeaways

We find that, in general, both intrinsic and extrinsic models rely heavily on local similarity rather than on global distance information. We also find that intrinsic models are more sensitive to absolute position than extrinsic ones. Methods for tuning and training embeddings should therefore focus on local geometric structure in $\mathbb{R}^d$.

SLIDE 17

Questions?

github.com/OSU-slatelab/geometric-embedding-properties
