

SLIDE 1

Understanding the Downstream Instability of Word Embeddings

Megan Leszczynski, Avner May, Jian Zhang, Sen Wu, Chris Aberger, Chris Ré Stanford University

SLIDE 2

Motivation

• Recommend new content
• Detect the latest spam
• Learn new words

Model freshness is necessary for user satisfaction in many products.


• Changing distribution of popular videos
• New spam techniques
• Out-of-vocabulary words

Why retrain?

SLIDE 3

Google retrains its Google Play app store models every day [1], and Facebook retrains search models every hour [2].


[1] Baylor et al. TFX: A TensorFlow-Based Production-Scale Machine Learning Platform. KDD, 2017. [2] Hazelwood et al. Applied Machine Learning at Facebook: A Datacenter Infrastructure Perspective. HPCA, 2018.

SLIDE 4

But model training can be unstable…

Diagram: training on Data 1 yields Predictions 1; retraining on Data 1 + ∆ yields Predictions 2.

Unnecessary prediction changes!


Prediction churn

[1] Cormier et al. Launch and Iterate: Reducing Prediction Churn. NeurIPS, 2016.

SLIDE 5

Challenges of Instability

1. Debugging
2. Model dependencies
3. Consistent user experience
4. Research reliability

SLIDE 6

Problem Setting: Embedding Server

Diagram: an embedding server trains embeddings (e.g., [0.1, 0.3, 0.5, …]) on changing data, refreshes them, and serves them to downstream tasks: named entity recognition (NER), question answering, sentiment analysis, and relation extraction.

Embeddings are shared among downstream tasks. How does the embedding instability propagate to these tasks?

SLIDE 7

Key takeaway: Stability–memory tension

With the right understanding, we can improve stability by over 30% in the same amount of memory.


SLIDE 8

Outline

Q: How do we define downstream instability? A: % prediction disagreement.
Q: What embedding hyperparameters impact downstream instability? A: Hyperparameters related to memory.
Q: How can we theoretically understand downstream instability? A: Using our eigenspace instability measure (EIS).
Q: How can we select embedding hyperparameters to minimize instability? A: Using the EIS (or k-NN) measures.


SLIDE 9

Definition: Downstream Instability

Diagram: training on Data 1 yields Emb 1 (X) and Predictions 1; retraining on Data 1 + ∆ yields Emb 2 (X̃) and Predictions 2.

Downstream instability = % prediction disagreement between models trained on a pair of embeddings.

Metrics like instability are important for modularity.
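To make the definition concrete, here is a minimal sketch (not the paper's released code) of how % prediction disagreement could be computed, assuming both downstream models have already produced label predictions on the same evaluation set; the example predictions are hypothetical:

```python
import numpy as np

def prediction_disagreement(preds_1, preds_2):
    """Percent of evaluation examples on which two models' predicted labels differ."""
    preds_1, preds_2 = np.asarray(preds_1), np.asarray(preds_2)
    assert preds_1.shape == preds_2.shape, "both models must predict on the same examples"
    return 100.0 * np.mean(preds_1 != preds_2)

# Hypothetical label predictions from two sentiment models, one trained on each embedding.
preds_model_1 = [1, 0, 1, 1, 0, 1]
preds_model_2 = [1, 0, 0, 1, 0, 1]
print(prediction_disagreement(preds_model_1, preds_model_2))  # ~16.7% disagreement
```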

SLIDE 10

Outline

Q: How do we define downstream instability? A: % prediction disagreement.
Q: What embedding hyperparameters impact downstream instability? A: Hyperparameters related to memory.
Q: How can we theoretically understand downstream instability? A: Using our eigenspace instability measure (EIS).
Q: How can we select embedding hyperparameters to minimize instability? A: Using the EIS (or k-NN) measures.


SLIDE 11

Hyperparameters that Impact Memory


[1] May et al. On the downstream performance of compressed word embeddings. NeurIPS, 2019.

Two hyperparameters control embedding memory:

• Dimension: # features / word
• Precision: # bits / feature

Precision is reduced with uniform quantization: values in the interval [-0.1, 0.1] are compressed from 32-bit to 1-bit, so a vector such as [0.04, 0.03, 0.08] becomes [0.1, 0.1, 0.1]. Both hyperparameters also affect downstream instability.
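As an illustration of the precision hyperparameter, here is a minimal sketch of uniform quantization to b bits over a fixed clipping interval; the interval [-0.1, 0.1] and the example values follow the figure, while the function itself is a simplified assumed implementation rather than the paper's exact compression code:

```python
import numpy as np

def uniform_quantize(emb, bits, lo=-0.1, hi=0.1):
    """Clip values to [lo, hi] and snap them to one of 2**bits evenly spaced levels."""
    levels = 2 ** bits
    clipped = np.clip(emb, lo, hi)
    step = (hi - lo) / (levels - 1)        # spacing between quantization levels
    idx = np.round((clipped - lo) / step)  # nearest level index in [0, levels - 1]
    return lo + idx * step

x = np.array([0.04, 0.03, 0.08])
print(uniform_quantize(x, bits=1))   # [0.1 0.1 0.1]: every value maps to -0.1 or 0.1
print(uniform_quantize(x, bits=32))  # effectively unchanged at full precision
```

Per-word memory is then dimension × precision, e.g., 25 features × 4 bits = 100 bits per word.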

SLIDE 12

Impact of Dimension

Plot: downstream instability vs. embedding dimension for sentiment analysis and NER.

SLIDE 13

Impact of Precision

Plot: downstream instability vs. embedding precision for sentiment analysis and NER.

SLIDE 14

Stability-Memory Tradeoff

Plot: downstream instability vs. embedding memory for sentiment analysis and NER, with an annotated instability gap of 11% across the memory range.

SLIDE 15

Outline

Q: How do we define downstream instability? A: % prediction disagreement.
Q: What embedding hyperparameters impact downstream instability? A: Hyperparameters related to memory.
Q: How can we theoretically understand downstream instability? A: Using our eigenspace instability measure (EIS).
Q: How can we select embedding hyperparameters to minimize instability? A: Using the EIS (or k-NN) measures.


SLIDE 16

Goal: Embedding distance measure

Diagram: training on Data 1 yields Emb 1 (X) and Predictions 1; retraining on Data 1 + ∆ yields Emb 2 (X̃) and Predictions 2. The goal is a measure Distance(Emb 1, Emb 2) that predicts the downstream instability between the two sets of predictions.

The measure must relate the distance between the embeddings to the downstream instability.

SLIDE 17

Eigenspace Instability Measure (EIS)

Key insight: The predictions of a linear regression model trained on an embedding X depend on the left singular vectors of X [1].

Singular Value Decomposition: Emb (X) = U S Vᵀ, where the columns of U are the left singular vectors.

[1] May et al. On the downstream performance of compressed word embeddings. NeurIPS, 2019.

SLIDE 18

Eigenspace Instability Measure (EIS)

• EIS measures the similarity of the left singular vectors of two embeddings: for embeddings X and X̃, EIS(X, X̃) = similarity(U, Ũ).
• Can be computed in O(nd²) time, where n is the size of the vocabulary and d is the dimension.
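The full EIS additionally weights directions (see the paper), so the snippet below is only a simplified, assumed sketch of the underlying idea: it extracts the left singular vectors of two embedding matrices and compares the subspaces they span using the eigenspace overlap score, one of the related measures cited later in this deck. The synthetic matrices are hypothetical stand-ins for embeddings trained on Data 1 and Data 1 + ∆.

```python
import numpy as np

def left_singular_vectors(emb):
    """U from the SVD emb = U S V^T; rows index vocabulary words, columns index features."""
    U, S, Vt = np.linalg.svd(emb, full_matrices=False)
    return U

def eigenspace_overlap(emb_1, emb_2):
    """||U1^T U2||_F^2 / d: 1.0 for identical subspaces, near 0 for unrelated ones."""
    U1, U2 = left_singular_vectors(emb_1), left_singular_vectors(emb_2)
    d = max(U1.shape[1], U2.shape[1])
    return np.linalg.norm(U1.T @ U2, ord="fro") ** 2 / d

# Hypothetical n x d embedding matrices; the second is a small perturbation of the first.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 25))
X_tilde = X + 0.01 * rng.normal(size=X.shape)
print(eigenspace_overlap(X, X_tilde))  # close to 1 for nearly identical embeddings
```

The SVD of an n × d matrix costs O(nd²), which is consistent with the stated running time.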


SLIDE 19

Eigenspace Instability Measure (EIS)

Theorem (informal): EIS is equal to the expected mean-squared difference between the predictions of the linear models trained on X and X̃. This gives a direct theoretical connection between the EIS measure and downstream instability.
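As an illustrative sanity check of this connection (a simulation under assumed synthetic data and plain least squares, not the formal statement, which takes an expectation over the labels), one can fit linear models on features from two embeddings and measure the mean-squared difference between their predictions:

```python
import numpy as np

rng = np.random.default_rng(1)
n, d = 500, 20
X = rng.normal(size=(n, d))                   # embedding 1 as downstream features
X_tilde = X + 0.05 * rng.normal(size=(n, d))  # embedding 2: a perturbed refresh
y = rng.normal(size=n)                        # synthetic labels for the downstream task

def least_squares_predictions(feats, y):
    """Fit w = argmin ||feats @ w - y||^2 and return the model's predictions."""
    w, *_ = np.linalg.lstsq(feats, y, rcond=None)
    return feats @ w

pred = least_squares_predictions(X, y)
pred_tilde = least_squares_predictions(X_tilde, y)
print(np.mean((pred - pred_tilde) ** 2))  # mean-squared prediction difference
```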


SLIDE 20

Outline

Q: How do we define downstream instability? A: % prediction disagreement.
Q: What embedding hyperparameters impact downstream instability? A: Hyperparameters related to memory.
Q: How can we theoretically understand downstream instability? A: Using our eigenspace instability measure (EIS).
Q: How can we select embedding hyperparameters to minimize instability? A: Using the EIS (or k-NN) measures.


SLIDE 21

Embedding measure for downstream instability?


• EIS measure
• k-NN measure [1,2,3]
• Semantic displacement (SD) [4]
• PIP loss [5]
• Eigenspace overlap (EO) [6]

[1] Hellrich & Hahn, COLING, 2016; [2] Antoniak & Mimno, TACL, 2018; [3] Wendlandt et al., NAACL-HLT, 2018; [4] Hamilton et al., ACL, 2016; [5] Yin & Shen, NeurIPS, 2018; [6] May et al., NeurIPS, 2019

SLIDE 22

Correlation with Downstream Instability

EIS and k-NN measures strongly correlate with downstream instability.

Bar chart: Spearman correlation with downstream instability for Eigenspace Instability (EIS), 1 - k-NN Measure, Semantic Displacement, PIP Loss, and 1 - Eigenspace Overlap.
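A minimal sketch of how such a correlation could be computed, assuming you have recorded one measure's value and the observed downstream instability for a set of embedding pairs; the numbers here are made up for illustration:

```python
from scipy.stats import spearmanr

# Hypothetical values of one measure and the observed % disagreement for five
# (Emb 1, Emb 2) pairs; real experiments use many more pairs.
measure_values = [0.12, 0.35, 0.20, 0.50, 0.41]
instability = [3.1, 7.8, 4.2, 9.5, 8.1]

rho, p_value = spearmanr(measure_values, instability)
print(f"Spearman correlation: {rho:.2f} (p={p_value:.3f})")
```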

SLIDE 23

Selection Task Setup

• Use an embedding distance measure to select hyperparameters for a fixed memory budget (see the sketch below); e.g., 100-dim 1-bit, 50-dim 2-bit, and 25-dim 4-bit embeddings all use the same memory.
• Record the difference in downstream instability (% disagreement) between the selected hyperparameters and the oracle (lowest-instability) choice.
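A sketch of the selection loop. The `train_embedding` and `distance` functions and the candidate (dimension, precision) grid are hypothetical placeholders standing in for the deck's setup; `distance` could be EIS or 1 - k-NN:

```python
def select_config(corpus_old, corpus_new, candidates, train_embedding, distance):
    """Among equal-memory (dimension, precision) configurations, pick the one whose
    embeddings change least (by `distance`) when the training corpus is refreshed.

    `train_embedding(corpus, dim=..., bits=...)` and `distance(emb_old, emb_new)`
    are caller-supplied (hypothetical) functions.
    """
    best_config, best_dist = None, float("inf")
    for dim, bits in candidates:  # e.g., [(100, 1), (50, 2), (25, 4)] at 100 bits/word
        emb_old = train_embedding(corpus_old, dim=dim, bits=bits)
        emb_new = train_embedding(corpus_new, dim=dim, bits=bits)
        d = distance(emb_old, emb_new)
        if d < best_dist:
            best_config, best_dist = (dim, bits), d
    return best_config
```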

SLIDE 24

Selection Task Results


EIS and k-NN measures outperform other measures as selection criteria.

Bar chart: absolute % difference to the oracle (Diff. to Oracle, Abs. %) on SST-2, MR, SUBJ, MPQA, and CoNLL-2003 for each selection criterion: EIS, 1 - k-NN, SD, PIP, and 1 - EO.

SLIDE 25

Our theoretically grounded measure improves stability by up to 34% over a full-precision baseline in the same amount of memory.


SLIDE 26

Stability-Memory Tension on Knowledge Graph (KG) Embeddings

Plot: downstream instability vs. embedding memory for knowledge graph embeddings.

SLIDE 27

Conclusion

• Exposed a stability-memory tradeoff for word embeddings.
• Proposed the EIS measure to understand downstream instability.
• Evaluated measures for hyperparameter selection to minimize instability.

Check out the paper for extended experiments with more embedding algorithms and downstream tasks!

Code: http://bit.ly/embstability
Comments or Questions: mleszczy@stanford.edu
