On the Downstream Performance of Compressed Word Embeddings



SLIDE 1

On the Downstream Performance of Compressed Word Embeddings. NeurIPS Spotlight 12/12/19.

On the Downstream Performance of Compressed Word Embeddings

Avner May, Jian Zhang, Tri Dao, Chris Ré Stanford University

SLIDE 4

Word Embeddings

Important for strong NLP performance
Take a lot of memory

SLIDE 5

Word Embedding Compression

SLIDE 9

What determines whether a compressed embedding matrix will perform well on downstream tasks?

[Figure: two "Train model" steps and a "??" label.]

SLIDE 12

Motivating Observation

Existing ways of measuring compression quality often fail to explain relative downstream performance.

Better compression quality measure, but worse downstream performance.

SLIDE 18

Our Contributions: Outline

1. Define a new measure of compression quality.
2. Prove generalization bounds using this measure.
3. Show strong empirical correlation with downstream performance.
4. Use measure to select compressed embeddings: up to 2x lower selection error rates than the next best measure.

SLIDE 23

Defining the Measure: Intuition from Linear Regression

Observation: Predictions are determined by the data matrix’s left singular vectors.

Embedding matrix = singular value decomposition (SVD).
Regression label: y.
Linear regressor predictions: project y onto the span of the left singular vectors.
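The observation above can be checked numerically. The following is a minimal NumPy sketch, assuming a random matrix as a stand-in for an embedding matrix with full column rank; the sizes and variable names are illustrative and do not come from the talk. It fits ordinary least squares and compares the training-point predictions with the projection of y onto the span of the left singular vectors.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 50, 5                      # hypothetical sizes: n rows, d dimensions
X = rng.standard_normal((n, d))   # stand-in for an embedding (data) matrix
y = rng.standard_normal(n)        # regression labels

# Thin SVD: X = U @ diag(S) @ Vt, where U has shape (n, d).
U, S, Vt = np.linalg.svd(X, full_matrices=False)

# Ordinary least-squares fit and its predictions on the training points.
w, *_ = np.linalg.lstsq(X, y, rcond=None)
preds_ols = X @ w

# The same predictions, obtained by projecting y onto span(U).
preds_proj = U @ (U.T @ y)

max_abs_diff = float(np.max(np.abs(preds_ols - preds_proj)))
print("largest difference between the two predictions:", max_abs_diff)
```

The singular values and right singular vectors never enter `preds_proj`, which is the sense in which only the left singular vectors matter for the predictions in this setting.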

SLIDE 25

Defining the Measure: Eigenspace Overlap Score (EOS)

Intuition: Measures similarity between the spans of the left singular vectors of the compressed and uncompressed embedding matrices.

[Figure: SVD of the compressed embedding and SVD of the uncompressed embedding, feeding into the eigenspace overlap score (EOS).]
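The formula on this slide did not survive extraction. As a sketch, assuming the definition given in the accompanying paper, EOS = ||U^T Ũ||_F^2 / max(d, k), where U (n x d) and Ũ (n x k) hold the left singular vectors of the uncompressed and compressed embedding matrices. The function name and the toy matrices below are illustrative, not taken from the authors' code, and both inputs are assumed to have full column rank.

```python
import numpy as np

def eigenspace_overlap_score(X, X_tilde):
    """EOS between X (n x d) and X_tilde (n x k).

    Assumes both matrices have full column rank, so the thin SVD's
    left singular vectors span exactly the column space of each matrix.
    """
    U, _, _ = np.linalg.svd(X, full_matrices=False)
    U_t, _, _ = np.linalg.svd(X_tilde, full_matrices=False)
    d, k = U.shape[1], U_t.shape[1]
    return float(np.linalg.norm(U.T @ U_t, ord="fro") ** 2 / max(d, k))

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 8))              # toy "uncompressed" matrix

# Mixing the columns with an invertible matrix keeps the column span.
same_span = X @ rng.standard_normal((8, 8))
# Keeping half of the columns gives a 4-dimensional subspace of span(X).
truncated = X[:, :4]
# An unrelated random matrix shares little of the span.
unrelated = rng.standard_normal((100, 8))

eos_same = eigenspace_overlap_score(X, same_span)
eos_truncated = eigenspace_overlap_score(X, truncated)
eos_unrelated = eigenspace_overlap_score(X, unrelated)
print(eos_same, eos_truncated, eos_unrelated)
```

Under this definition the score lies between 0 and 1, reaches 1 when the two spans coincide, and equals k/d when the compressed span is a k-dimensional subspace of the uncompressed one.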

SLIDE 28

Theoretical Results: Linear Regression

Theorem (informal): Expected difference in test mean-squared error attained by compressed vs. uncompressed embeddings is determined by EOS.

Higher EOS, better downstream performance.
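A simplified special case illustrates why EOS controls the error gap. This sketch is not the paper's theorem statement; it assumes a fixed design, noise-free labels y = Uz with z having zero mean and identity covariance, EOS defined as ||U^T Ũ||_F^2 / max(d, k), and k <= d. Under those assumptions the expected gap in mean-squared error between the compressed and uncompressed least-squares fits works out to (d/n)(1 - EOS). All sizes and the way the "compressed" matrix is generated are hypothetical.

```python
import numpy as np

def left_singular_vectors(M):
    """Left singular vectors from the thin SVD of M."""
    U, _, _ = np.linalg.svd(M, full_matrices=False)
    return U

rng = np.random.default_rng(0)
n, d, k = 200, 10, 6
X = rng.standard_normal((n, d))
# Hypothetical "compressed" embedding: fewer columns plus perturbation.
X_tilde = X[:, :k] + 0.5 * rng.standard_normal((n, k))

U = left_singular_vectors(X)
U_t = left_singular_vectors(X_tilde)
eos = np.linalg.norm(U.T @ U_t, ord="fro") ** 2 / max(d, k)

# Monte Carlo over label vectors y = U z (one column per trial).
trials = 5000
Z = rng.standard_normal((d, trials))
Y = U @ Z
resid_uncompressed = Y - U @ (U.T @ Y)      # zero up to rounding error
resid_compressed = Y - U_t @ (U_t.T @ Y)
mse_gap = (np.sum(resid_compressed ** 2, axis=0)
           - np.sum(resid_uncompressed ** 2, axis=0)) / n

empirical_gap = float(mse_gap.mean())
predicted_gap = float(d / n * (1.0 - eos))
print("empirical gap:", empirical_gap, "predicted gap:", predicted_gap)
```

In this setting a higher EOS shrinks the predicted gap toward zero, matching the slide's reading that higher EOS goes with better downstream performance.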

SLIDE 32

Empirical Correlation: Beyond Linear Regression

EOS attains strong correlation with downstream model accuracy.

[Figure: two plots of downstream accuracy (higher is better) against compression quality (higher is better), one using EOS and one using negative PIP loss [1].]

[1] Yin and Shen, On the Dimensionality of Word Embeddings. NeurIPS 2018.

SLIDE 35

EOS as a Selection Criterion

EOS attains up to 2x lower selection error rates than the second-best measure.

[Figure: selection error rate (%) across NLP tasks.]

[1] Avron et al., ICML 2017. [2] Yin and Shen, NeurIPS 2018. [3] Zhang et al., AISTATS 2019.
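The slide does not define the selection error rate, so the sketch below rests on an assumption: for each pair of compressed embeddings, the quality measure picks the one with the higher score, and an error is counted when the picked embedding has the lower downstream accuracy. The function name, the tie handling, and every number are hypothetical and only illustrate the bookkeeping.

```python
from itertools import combinations

def selection_error_rate(scores, accuracies):
    """Fraction of pairs where the higher-scoring item has lower accuracy.

    Pairs with tied scores or tied accuracies are skipped (an assumption).
    """
    errors = 0
    total = 0
    for i, j in combinations(range(len(scores)), 2):
        if scores[i] == scores[j] or accuracies[i] == accuracies[j]:
            continue
        total += 1
        picked, other = (i, j) if scores[i] > scores[j] else (j, i)
        if accuracies[picked] < accuracies[other]:
            errors += 1
    return errors / total if total else 0.0

# Made-up values for four compressed embeddings, for illustration only.
quality_scores = [0.9, 0.7, 0.5, 0.3]       # e.g. EOS of each embedding
task_accuracy = [0.80, 0.82, 0.70, 0.65]    # downstream accuracy of each

rate = selection_error_rate(quality_scores, task_accuracy)
print("selection error rate:", rate)
```

With these toy values only the first pair is ranked the wrong way round, so one of the six pairs counts as a selection error.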

SLIDE 40

Our Contributions: Summary

1. Defined a new measure of compression quality.
2. Proved generalization bounds using this measure.
3. Showed strong empirical correlation with downstream performance.
4. Used measure to select compressed embeddings.

SLIDE 41

THANK YOU! Poster #185, 5-7 pm today!

Paper: https://arxiv.org/pdf/1909.01264.pdf
Code: https://github.com/HazyResearch/smallfry
E-mail: avnermay@cs.stanford.edu