

SLIDE 1

Understanding the Downstream Instability of Word Embeddings

Megan Leszczynski, Avner May, Jian Zhang, Sen Wu, Chris Aberger, Chris Ré Stanford University

SLIDE 2

Motivation

• Recommend new content
• Detect the latest spam
• Learn new words

Model freshness is necessary for user satisfaction in many products.


• Changing distribution of popular videos
• New spam techniques
• Out-of-vocabulary words

Why retrain?

SLIDE 3

Google retrains its Google Play app store models every day [1], and Facebook retrains search models every hour [2].


[1] Baylor et al. TFX: A TensorFlow-Based Production-Scale Machine Learning Platform. KDD, 2017. [2] Hazelwood et al. Applied Machine Learning at Facebook: A Datacenter Infrastructure Perspective. HPCA, 2018.

SLIDE 4

But model training can be unstable…

Diagram: training on Data 1 yields Predictions 1; retraining on Data 1 + ∆ yields Predictions 2.

Unnecessary prediction changes!


Prediction churn

[1] Cormier et al. Launch and Iterate: Reducing Prediction Churn. NeurIPS, 2016.

SLIDE 5

Challenges of Instability

1. Debugging
2. Model dependencies
3. Consistent user experience
4. Research reliability

SLIDE 6

Problem Setting: Embedding Server

Diagram: an embedding server trains embeddings (e.g., [0.1, 0.3, 0.5, …]) on changing data, refreshes them, and serves them to downstream tasks: named entity recognition (NER), question answering, sentiment analysis, and relation extraction.

Embeddings are shared among downstream tasks. How does the embedding instability propagate to these tasks?

SLIDE 7

Key takeaway: Stability–memory tension

With the right understanding, we can improve stability by over 30% in the same amount of memory.


SLIDE 8

Outline

Q: How do we define downstream instability? A: % prediction disagreement.
Q: What embedding hyperparameters impact downstream instability? A: Hyperparameters related to memory.
Q: How can we theoretically understand downstream instability? A: Using our eigenspace instability measure (EIS).
Q: How can we select embedding hyperparameters to minimize instability? A: Using the EIS (or k-NN) measures.


SLIDE 9

Definition: Downstream Instability

Diagram: training on Data 1 yields Emb 1 (X) and Predictions 1; retraining on Data 1 + ∆ yields Emb 2 (X̃) and Predictions 2.

Downstream instability = % prediction disagreement between models trained on a pair of embeddings.

Metrics like instability are important for modularity.
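To make the definition concrete, here is a minimal sketch (not the paper's released code) of how % prediction disagreement could be computed, assuming both downstream models have already produced label predictions on the same evaluation set; the example predictions are hypothetical:

```python
import numpy as np

def prediction_disagreement(preds_1, preds_2):
    """Percent of evaluation examples on which two models' predicted labels differ."""
    preds_1, preds_2 = np.asarray(preds_1), np.asarray(preds_2)
    assert preds_1.shape == preds_2.shape, "both models must predict on the same examples"
    return 100.0 * np.mean(preds_1 != preds_2)

# Hypothetical label predictions from two sentiment models, one trained on each embedding.
preds_model_1 = [1, 0, 1, 1, 0, 1]
preds_model_2 = [1, 0, 0, 1, 0, 1]
print(prediction_disagreement(preds_model_1, preds_model_2))  # ~16.7% disagreement
```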

SLIDE 10

Outline

Q: How do we define downstream instability? A: % prediction disagreement.
Q: What embedding hyperparameters impact downstream instability? A: Hyperparameters related to memory.
Q: How can we theoretically understand downstream instability? A: Using our eigenspace instability measure (EIS).
Q: How can we select embedding hyperparameters to minimize instability? A: Using the EIS (or k-NN) measures.


SLIDE 11

Hyperparameters that Impact Memory


[1] May et al. On the downstream performance of compressed word embeddings. NeurIPS, 2019.

Two hyperparameters control embedding memory:

• Dimension: # features / word
• Precision: # bits / feature

Precision is reduced with uniform quantization: values in the interval [-0.1, 0.1] are compressed from 32-bit to 1-bit, so a vector such as [0.04, 0.03, 0.08] becomes [0.1, 0.1, 0.1]. Both hyperparameters also affect downstream instability.
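As an illustration of the precision hyperparameter, here is a minimal sketch of uniform quantization to b bits over a fixed clipping interval; the interval [-0.1, 0.1] and the example values follow the figure, while the function itself is a simplified assumed implementation rather than the paper's exact compression code:

```python
import numpy as np

def uniform_quantize(emb, bits, lo=-0.1, hi=0.1):
    """Clip values to [lo, hi] and snap them to one of 2**bits evenly spaced levels."""
    levels = 2 ** bits
    clipped = np.clip(emb, lo, hi)
    step = (hi - lo) / (levels - 1)        # spacing between quantization levels
    idx = np.round((clipped - lo) / step)  # nearest level index in [0, levels - 1]
    return lo + idx * step

x = np.array([0.04, 0.03, 0.08])
print(uniform_quantize(x, bits=1))   # [0.1 0.1 0.1]: every value maps to -0.1 or 0.1
print(uniform_quantize(x, bits=32))  # effectively unchanged at full precision
```

Per-word memory is then dimension × precision, e.g., 25 features × 4 bits = 100 bits per word.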

SLIDE 12

Impact of Dimension

Plot: downstream instability vs. embedding dimension for sentiment analysis and NER.

SLIDE 13

Impact of Precision

Plot: downstream instability vs. embedding precision for sentiment analysis and NER.

SLIDE 14

Stability-Memory Tradeoff

Plot: downstream instability vs. embedding memory for sentiment analysis and NER, with an annotated instability gap of 11% across the memory range.

SLIDE 15

Outline

Q: How do we define downstream instability? A: % prediction disagreement.
Q: What embedding hyperparameters impact downstream instability? A: Hyperparameters related to memory.
Q: How can we theoretically understand downstream instability? A: Using our eigenspace instability measure (EIS).
Q: How can we select embedding hyperparameters to minimize instability? A: Using the EIS (or k-NN) measures.


SLIDE 16

Goal: Embedding distance measure

Diagram: training on Data 1 yields Emb 1 (X) and Predictions 1; retraining on Data 1 + ∆ yields Emb 2 (X̃) and Predictions 2. The goal is a measure Distance(Emb 1, Emb 2) that predicts the downstream instability between the two sets of predictions.

The measure must relate the distance between the embeddings to the downstream instability.

SLIDE 17

Eigenspace Instability Measure (EIS)

Key insight: The predictions of a linear regression model trained on an embedding X depend on the left singular vectors of X [1].

Singular Value Decomposition: Emb (X) = U S Vᵀ, where the columns of U are the left singular vectors.

[1] May et al. On the downstream performance of compressed word embeddings. NeurIPS, 2019.

SLIDE 18

Eigenspace Instability Measure (EIS)

• EIS measures the similarity of the left singular vectors of two embeddings: for embeddings X and X̃, EIS(X, X̃) = similarity(U, Ũ).
• Can be computed in O(nd²) time, where n is the size of the vocabulary and d is the dimension.
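The full EIS additionally weights directions (see the paper), so the snippet below is only a simplified, assumed sketch of the underlying idea: it extracts the left singular vectors of two embedding matrices and compares the subspaces they span using the eigenspace overlap score, one of the related measures cited later in this deck. The synthetic matrices are hypothetical stand-ins for embeddings trained on Data 1 and Data 1 + ∆.

```python
import numpy as np

def left_singular_vectors(emb):
    """U from the SVD emb = U S V^T; rows index vocabulary words, columns index features."""
    U, S, Vt = np.linalg.svd(emb, full_matrices=False)
    return U

def eigenspace_overlap(emb_1, emb_2):
    """||U1^T U2||_F^2 / d: 1.0 for identical subspaces, near 0 for unrelated ones."""
    U1, U2 = left_singular_vectors(emb_1), left_singular_vectors(emb_2)
    d = max(U1.shape[1], U2.shape[1])
    return np.linalg.norm(U1.T @ U2, ord="fro") ** 2 / d

# Hypothetical n x d embedding matrices; the second is a small perturbation of the first.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 25))
X_tilde = X + 0.01 * rng.normal(size=X.shape)
print(eigenspace_overlap(X, X_tilde))  # close to 1 for nearly identical embeddings
```

The SVD of an n × d matrix costs O(nd²), which is consistent with the stated running time.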


SLIDE 19

Eigenspace Instability Measure (EIS)

Theorem (informal): EIS is equal to the expected mean-squared difference between the predictions of the linear models trained on X and X̃. This gives a direct theoretical connection between the EIS measure and downstream instability.
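As an illustrative sanity check of this connection (a simulation under assumed synthetic data and plain least squares, not the formal statement, which takes an expectation over the labels), one can fit linear models on features from two embeddings and measure the mean-squared difference between their predictions:

```python
import numpy as np

rng = np.random.default_rng(1)
n, d = 500, 20
X = rng.normal(size=(n, d))                   # embedding 1 as downstream features
X_tilde = X + 0.05 * rng.normal(size=(n, d))  # embedding 2: a perturbed refresh
y = rng.normal(size=n)                        # synthetic labels for the downstream task

def least_squares_predictions(feats, y):
    """Fit w = argmin ||feats @ w - y||^2 and return the model's predictions."""
    w, *_ = np.linalg.lstsq(feats, y, rcond=None)
    return feats @ w

pred = least_squares_predictions(X, y)
pred_tilde = least_squares_predictions(X_tilde, y)
print(np.mean((pred - pred_tilde) ** 2))  # mean-squared prediction difference
```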


SLIDE 20

Outline

Q: How do we define downstream instability? A: % prediction disagreement.
Q: What embedding hyperparameters impact downstream instability? A: Hyperparameters related to memory.
Q: How can we theoretically understand downstream instability? A: Using our eigenspace instability measure (EIS).
Q: How can we select embedding hyperparameters to minimize instability? A: Using the EIS (or k-NN) measures.


SLIDE 21

Embedding measure for downstream instability?


• EIS measure
• k-NN measure [1,2,3]
• Semantic displacement (SD) [4]
• PIP loss [5]
• Eigenspace overlap (EO) [6]

[1] Hellrich & Hahn, COLING, 2016; [2] Antoniak & Mimno, TACL, 2018; [3] Wendlandt et al., NAACL-HLT, 2018; [4] Hamilton et al., ACL, 2016; [5] Yin & Shen, NeurIPS, 2018; [6] May et al., NeurIPS, 2019

SLIDE 22

Correlation with Downstream Instability

EIS and k-NN measures strongly correlate with downstream instability.

Bar chart: Spearman correlation with downstream instability for Eigenspace Instability (EIS), 1 - k-NN Measure, Semantic Displacement, PIP Loss, and 1 - Eigenspace Overlap.
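A minimal sketch of how such a correlation could be computed, assuming you have recorded one measure's value and the observed downstream instability for a set of embedding pairs; the numbers here are made up for illustration:

```python
from scipy.stats import spearmanr

# Hypothetical values of one measure and the observed % disagreement for five
# (Emb 1, Emb 2) pairs; real experiments use many more pairs.
measure_values = [0.12, 0.35, 0.20, 0.50, 0.41]
instability = [3.1, 7.8, 4.2, 9.5, 8.1]

rho, p_value = spearmanr(measure_values, instability)
print(f"Spearman correlation: {rho:.2f} (p={p_value:.3f})")
```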

SLIDE 23

Selection Task Setup

• Use an embedding distance measure to select hyperparameters for a fixed memory budget (see the sketch below); e.g., 100-dim 1-bit, 50-dim 2-bit, and 25-dim 4-bit embeddings all use the same memory.
• Record the difference in downstream instability (% disagreement) between the selected hyperparameters and the oracle (lowest-instability) choice.
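A sketch of the selection loop. The `train_embedding` and `distance` functions and the candidate (dimension, precision) grid are hypothetical placeholders standing in for the deck's setup; `distance` could be EIS or 1 - k-NN:

```python
def select_config(corpus_old, corpus_new, candidates, train_embedding, distance):
    """Among equal-memory (dimension, precision) configurations, pick the one whose
    embeddings change least (by `distance`) when the training corpus is refreshed.

    `train_embedding(corpus, dim=..., bits=...)` and `distance(emb_old, emb_new)`
    are caller-supplied (hypothetical) functions.
    """
    best_config, best_dist = None, float("inf")
    for dim, bits in candidates:  # e.g., [(100, 1), (50, 2), (25, 4)] at 100 bits/word
        emb_old = train_embedding(corpus_old, dim=dim, bits=bits)
        emb_new = train_embedding(corpus_new, dim=dim, bits=bits)
        d = distance(emb_old, emb_new)
        if d < best_dist:
            best_config, best_dist = (dim, bits), d
    return best_config
```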

SLIDE 24

Selection Task Results


EIS and k-NN measures outperform other measures as selection criteria.

Bar chart: absolute % difference to the oracle (Diff. to Oracle, Abs. %) on SST-2, MR, SUBJ, MPQA, and CoNLL-2003 for each selection criterion: EIS, 1 - k-NN, SD, PIP, and 1 - EO.

SLIDE 25

Our theoretically grounded measure improves stability by up to 34% over a full-precision baseline in the same amount of memory.


SLIDE 26

Stability-Memory Tension on Knowledge Graph (KG) Embeddings

Plot: downstream instability vs. embedding memory for knowledge graph embeddings.

SLIDE 27

Conclusion

• Exposed a stability-memory tradeoff for word embeddings.
• Proposed the EIS measure to understand downstream instability.
• Evaluated measures for hyperparameter selection to minimize instability.

Check out the paper for extended experiments with more embedding algorithms and downstream tasks!

Code: http://bit.ly/embstability
Comments or Questions: mleszczy@stanford.edu
