A La Carte Embedding: Cheap but Effective Induction of Semantic Feature Vectors - PowerPoint PPT Presentation



slide-1
SLIDE 1

A La Carte Embedding: Cheap but Effective Induction of Semantic Feature Vectors

Mikhail Khodak*,1, Nikunj Saunshi*,1, Yingyu Liang2, Tengyu Ma3, Brandon Stewart1, Sanjeev Arora1

1: Princeton University, 2: University of Wisconsin-Madison, 3: FAIR/Stanford University

slide-2
SLIDE 2

Motivations

Distributed representations for words / text have had lots of successes in NLP (language models, machine translation, text classification)

ACL 2018

slide-3
SLIDE 3

Motivations

Distributed representations for words / text have had lots of successes in NLP (language models, machine translation, text classification)

Motivations for our work:

  • Can we induce embeddings for all kinds of features, especially those with very few occurrences (e.g. ngrams, rare words)?

ACL 2018

slide-4
SLIDE 4

Motivations

Distributed representations for words / text have had lots of successes in NLP (language models, machine translation, text classification)

Motivations for our work:

  • Can we induce embeddings for all kinds of features, especially those with very few occurrences (e.g. ngrams, rare words)?
  • Can we develop simple methods for unsupervised text embedding that compete well with state-of-the-art LSTM methods?

ACL 2018

slide-5
SLIDE 5

Motivations

Distributed representations for words / text have had lots of successes in NLP (language models, machine translation, text classification)

Motivations for our work:

  • Can we induce embeddings for all kinds of features, especially those with very few occurrences (e.g. ngrams, rare words)?
  • Can we develop simple methods for unsupervised text embedding that compete well with state-of-the-art LSTM methods?

ACL 2018

We make progress on both problems:

  • Simple and efficient method for embedding features (ngrams, rare words, synsets)
  • Simple text embeddings using ngram embeddings which perform well on classification tasks

slide-6
SLIDE 6

Word embeddings

  • Core idea: Cooccurring words are trained to have high inner product
  • E.g. LSA, word2vec, GloVe and variants

ACL 2018

slide-7
SLIDE 7

Word embeddings

  • Core idea: Cooccurring words are trained to have high inner product
  • E.g. LSA, word2vec, GloVe and variants
  • Require few passes over a very large text corpus and do non-convex optimization

ACL 2018

Diagram: text corpus → optimize objective → word embeddings $v_w \in \mathbb{R}^d$

slide-8
SLIDE 8

Word embeddings

  • Core idea: Cooccurring words are trained to have high inner product
  • E.g. LSA, word2vec, GloVe and variants
  • Require few passes over a very large text corpus and do non-convex optimization
  • Used for solving analogies, language models, machine translation, text classification …

ACL 2018

Diagram: text corpus → optimize objective → word embeddings $v_w \in \mathbb{R}^d$

slide-9
SLIDE 9

Feature embeddings

  • Capturing meaning of other natural language features
  • E.g. ngrams, phrases, sentences, annotated words, synsets

ACL 2018

slide-10
SLIDE 10

Feature embeddings

  • Capturing meaning of other natural language features
  • E.g. ngrams, phrases, sentences, annotated words, synsets
  • Interesting setting: features with zero or few occurrences

ACL 2018

slide-11
SLIDE 11

Feature embeddings

  • Capturing meaning of other natural language features
  • E.g. ngrams, phrases, sentences, annotated words, synsets
  • Interesting setting: features with zero or few occurrences
  • One approach (extension of word embeddings): Learn embeddings for all features in a text corpus

ACL 2018

Diagram: text corpus → optimize objective → feature embeddings $v_f \in \mathbb{R}^d$

slide-12
SLIDE 12

Feature embeddings

Issues

  • Usually need to learn embeddings for all features together
  • Need to learn many parameters
  • Computation cost paid is prix fixe rather than à la carte
  • Bad quality for rare features

ACL 2018

slide-13
SLIDE 13

Feature embeddings

Firth revisited: Feature derives meaning from words around it

ACL 2018

slide-14
SLIDE 14

Feature embeddings

Firth revisited: Feature derives meaning from words around it

Given a feature $f$ and one (few) context(s) of words around it, can we find a reliable embedding for $f$ efficiently?

ACL 2018

slide-15
SLIDE 15

Feature embeddings

Firth revisited: Feature derives meaning from words around it

Given a feature $f$ and one (few) context(s) of words around it, can we find a reliable embedding for $f$ efficiently?

  • Petrichor: the earthy scent produced when rain falls on dry soil
  • Scientists attending ACL work on cutting edge research in NLP
  • Roger Federer won the first set_NN of the match

ACL 2018

slide-16
SLIDE 16

Problem setup

Diagram: a feature $f$ shown inside a context $w_1 \dots w_k\ f\ w_{k+1} \dots$; together with word embeddings $v_w \in \mathbb{R}^d$ it is fed to an algorithm that outputs a feature embedding $v_f \in \mathbb{R}^d$

ACL 2018

Given: Text corpus and high quality word embeddings trained on it
Input: A feature in context(s)
Output: Good quality embedding for the feature

slide-17
SLIDE 17

Linear approach

  • Given a feature f and words in a context c around it

!"

#$% = 1

|)| *

+∈-

!+

ACL 2018

slide-18
SLIDE 18

Linear approach

  • Given a feature f and words in a context c around it

!"

#$% = 1

|)| *

+∈-

!+

  • Issues
  • stop words (“is”, “the”) are frequent but are less informative
  • Word vectors tend to share common components which will be amplified

ACL 2018
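To make the plain averaging step concrete, here is a minimal numpy sketch (not from the slides); `word_vectors` is an assumed dict mapping words to pretrained embedding vectors:

```python
import numpy as np

def context_average(context_words, word_vectors):
    """Plain average of the embeddings of the words surrounding a feature."""
    vecs = [word_vectors[w] for w in context_words if w in word_vectors]
    if not vecs:
        raise ValueError("no in-vocabulary context words")
    return np.mean(vecs, axis=0)

# Hypothetical usage with the "petrichor" example from an earlier slide:
# ctx = "the earthy scent produced when rain falls on dry soil".split()
# v_avg = context_average(ctx, word_vectors)
```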

slide-19
SLIDE 19

Potential fixes

  • Ignore stop words

ACL 2018

slide-20
SLIDE 20

Potential fixes

  • Ignore stop words
  • SIF weights1: Down-weight frequent words (similar to tf-idf)

1: Arora et al. ’17

ACL 2018

!" = 1 |&| '

(∈*

+( !(

+( = , , + .( .( is frequency of w in corpus

slide-21
SLIDE 21

Potential fixes

  • Ignore stop words
  • SIF weights1: Down-weight frequent words (similar to tf-idf)
  • All-but-the-top2: Remove the component of top direction from word vectors

1: Arora et al. ’17, 2: Mu et al. ‘18

ACL 2018

!" = 1 |&| '

(∈*

+( !(

, = -./_1234&-2.5 !( !(

6 = 347.!4_&.7/.545-(!(, ,)

!" = 1 |&| '

(∈*

!(

6 = ; − ,,= !( >?@

+( = A A + /( /( is frequency of w in corpus
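A hedged sketch of how these two fixes combine in code; `word_vectors`, `word_freq` (relative corpus frequencies $p_w$), and the smoothing constant `a` (a common choice is around 1e-3, as in Arora et al. ’17) are assumptions, not part of the slides:

```python
import numpy as np

def sif_abtt_average(context_words, word_vectors, word_freq, a=1e-3):
    """SIF-weighted context average with the top direction removed."""
    V = np.stack(list(word_vectors.values()))                    # all word vectors
    u = np.linalg.svd(V - V.mean(0), full_matrices=False)[2][0]  # top direction
    out = []
    for w in context_words:
        if w not in word_vectors:
            continue
        alpha = a / (a + word_freq.get(w, 0.0))                  # down-weight frequent words
        v = word_vectors[w]
        out.append(alpha * (v - u * (u @ v)))                    # remove top component
    return np.mean(out, axis=0)
```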

slide-22
SLIDE 22
  • Down-weighting and removing directions can be achieved by matrix multiplication

$$\underbrace{v_f}_{\text{induced embedding}} \approx \underbrace{A}_{\text{induction matrix}}\ \frac{1}{|c|} \sum_{w \in c} v_w = A\, v_f^{\text{avg}}$$

Our more general approach

ACL 2018

slide-23
SLIDE 23
  • Down-weighting and removing directions can be achieved by matrix multiplication
  • Learn $A$ by using words as features
  • Learn $A$ by linear regression; this is unsupervised

$$\underbrace{v_f}_{\text{induced embedding}} \approx \underbrace{A}_{\text{induction matrix}}\ \frac{1}{|c|} \sum_{w \in c} v_w = A\, v_f^{\text{avg}}$$

Our more general approach

ACL 2018

$$A^* = \arg\min_A \sum_{w} \big\| v_w - A\, v_w^{\text{avg}} \big\|_2^2$$
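Since $A^*$ is an ordinary least-squares problem, it can be solved in closed form. A minimal sketch, assuming a precomputed `avg_context_vectors` map (the average context embedding $v_w^{avg}$ for each word) alongside `word_vectors`:

```python
import numpy as np

def learn_induction_matrix(word_vectors, avg_context_vectors):
    """Fit A* by least squares over all words with both vectors available."""
    words = [w for w in word_vectors if w in avg_context_vectors]
    X = np.stack([avg_context_vectors[w] for w in words])   # (n, d) regressors v_w^avg
    Y = np.stack([word_vectors[w] for w in words])          # (n, d) targets v_w
    # min_A sum_w ||v_w - A v_w^avg||^2  <=>  min_B ||X B - Y||_F^2 with B = A^T
    A_T, *_ = np.linalg.lstsq(X, Y, rcond=None)
    return A_T.T                                            # (d, d) induction matrix
```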

slide-24
SLIDE 24

Theoretical justification

  • [Arora et al. TACL ’18] prove that under a generative model for text, there exists a matrix $A$ which satisfies $v_w \approx A\, v_w^{\text{avg}}$

ACL 2018

slide-25
SLIDE 25

Theoretical justification

  • [Arora et al. TACL ’18] prove that under a generative model for text, there exists a matrix $A$ which satisfies $v_w \approx A\, v_w^{\text{avg}}$
  • Empirically we find that the best $A^*$ recovers the original word vectors:

$$\text{cosine}\big(v_w,\ A^* v_w^{\text{avg}}\big) \ge 0.9$$

ACL 2018

slide-26
SLIDE 26

A la carte embeddings

  • 1. Learn induction matrix

$$A^* = \arg\min_A \sum_{w} \big\| v_w - A\, v_w^{\text{avg}} \big\|_2^2$$

ACL 2018

Diagram: word embeddings $v_w$ + linear regression → $A^*$

slide-27
SLIDE 27

A la carte embeddings

Diagram: feature $f$ shown in a context $w_1 \dots w_k\ f\ w_{k+1} \dots$

  • 1. Learn induction matrix
  • 2. A la carte embeddings

$$A^* = \arg\min_A \sum_{w} \big\| v_w - A\, v_w^{\text{avg}} \big\|_2^2$$

$$v_f^{\text{alc}} = A^*\, v_f^{\text{avg}} = A^*\ \frac{1}{|c_f|} \sum_{w \in c_f} v_w$$

ACL 2018

Diagram: word embeddings + linear regression → $A^*$; applying $A^*$ to the context average $v_f^{\text{avg}}$ gives $v_f^{\text{alc}}$

slide-28
SLIDE 28

A la carte embeddings

Diagram: feature $f$ shown in a context $w_1 \dots w_k\ f\ w_{k+1} \dots$

  • 1. Learn induction matrix
  • 2. A la carte embeddings

$$A^* = \arg\min_A \sum_{w} \big\| v_w - A\, v_w^{\text{avg}} \big\|_2^2$$

$$v_f^{\text{alc}} = A^*\, v_f^{\text{avg}} = A^*\ \frac{1}{|c_f|} \sum_{w \in c_f} v_w$$

ACL 2018

Diagram: word embeddings + linear regression → $A^*$; applying $A^*$ to the context average $v_f^{\text{avg}}$ gives $v_f^{\text{alc}}$

Only once!!
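Putting the two steps together, a minimal sketch of the induction step for a new feature, reusing the hypothetical `context_average` helper sketched earlier; `A` is the induction matrix from the regression step:

```python
import numpy as np

def a_la_carte_embedding(feature_contexts, A, word_vectors):
    """Induce v_f^alc from one or more contexts of the feature.

    feature_contexts: list of contexts, each a list of words around the feature.
    A: (d, d) induction matrix, learned once from the corpus.
    """
    avgs = [context_average(ctx, word_vectors) for ctx in feature_contexts]
    v_avg = np.mean(avgs, axis=0)   # average over all available contexts
    return A @ v_avg                # v_f^alc = A* v_f^avg
```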
slide-29
SLIDE 29

Advantages

  • à la carte: Compute embedding only for given feature
  • Simple optimization: Linear regression
  • Computational efficiency: One pass over corpus and contexts
  • Sample efficiency: Learn only $d^2$ parameters for $A^*$ (rather than $nd$ for $n$ features)
  • Versatility: Works for any feature which has at least 1 context

ACL 2018

slide-30
SLIDE 30

Effect of induction matrix

  • We plot the extent to which $A^*$ down-weights words, as a function of word frequency, compared to all-but-the-top

slide-31
SLIDE 31

Effect of induction matrix

  • We plot the extent to which $A^*$ down-weights words, as a function of word frequency, compared to all-but-the-top

Plot: “Change in Embedding Norm under Transform”, $\|A^* v_w\| / \|v_w\|$ vs. $\log(\mathrm{count}_w)$

  • $A^*$ mainly down-weights words with very high and very low frequency
  • All-but-the-top mainly down-weights frequent words
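The quantity being plotted can be computed per word as follows (a sketch; `A_star`, `word_vectors`, and `word_count` are assumed inputs):

```python
import numpy as np

def norm_change(word, A_star, word_vectors, word_count):
    """Return the (x, y) point this word contributes to the plot."""
    v = word_vectors[word]
    ratio = np.linalg.norm(A_star @ v) / np.linalg.norm(v)   # how much A* shrinks v_w
    return np.log(word_count[word]), ratio
```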

slide-32
SLIDE 32

Effect of number of contexts

Contextual Rare Words (CRW) dataset1 providing contexts for rare words

  • Task: Predict human-rated similarity scores for pairs of words
  • Evaluation: Spearman’s rank coefficient between inner product and score

ACL 2018

1: Subset of RW dataset [Luong et al. ’13]

slide-33
SLIDE 33

Effect of number of contexts

Contextual Rare Words (CRW) dataset1 providing contexts for rare words

  • Task: Predict human-rated similarity scores for pairs of words
  • Evaluation: Spearman’s rank coefficient between inner product and score

Compare to the following methods:

  • Average of words in context
  • Average of non stop words
  • SIF weighted average
  • all-but-the-top

1: Subset of RW dataset [Luong et al. ’13]

ACL 2018

Plot: CRW performance vs. number of contexts per word for à la carte, SIF + all-but-the-top, SIF, Average + all-but-the-top, Average (no stop words), and Average
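A sketch of this evaluation protocol, assuming `pairs` is a list of (word1, word2, human_score) triples from CRW and `induced` maps each rare word to its induced vector; scipy's `spearmanr` computes the rank correlation:

```python
from scipy.stats import spearmanr

def crw_spearman(pairs, induced):
    """Spearman correlation between inner-product similarities and human scores."""
    preds = [float(induced[w1] @ induced[w2]) for w1, w2, _ in pairs]
    golds = [score for _, _, score in pairs]
    return spearmanr(preds, golds).correlation
```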

slide-34
SLIDE 34
Nonce definitional task1

  • Task: Find embedding for an unseen word/concept given its definition
  • Evaluation: Rank of word/concept based on cosine similarity with the true embedding

Example: iodine: is a chemical element with symbol I and atomic number 53

1: Herbelot and Baroni ‘17

ACL 2018

slide-35
SLIDE 35
Nonce definitional task1

  • Task: Find embedding for an unseen word/concept given its definition
  • Evaluation: Rank of word/concept based on cosine similarity with the true embedding

Example: iodine: is a chemical element with symbol I and atomic number 53

Method                    Mean Reciprocal Rank   Median Rank
word2vec (modified)       0.00007                111012
average                   0.00945                3381
average, no stop words    0.03686                861
nonce2vec1                0.04907                623
à la carte                0.07058                165.5

1: Herbelot and Baroni ‘17

ACL 2018
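A minimal sketch of the rank computation behind these numbers (`word_vectors` is the reference embedding space and `induced_vec` the à la carte vector built from the definition, both assumptions; MRR is then the average of 1/rank over all test nonces):

```python
import numpy as np

def definitional_rank(target, induced_vec, word_vectors):
    """Rank of the held-out word among the vocabulary by cosine similarity."""
    def cos(a, b):
        return float(a @ b) / (np.linalg.norm(a) * np.linalg.norm(b))
    sims = {w: cos(induced_vec, v) for w, v in word_vectors.items()}
    ranking = sorted(sims, key=sims.get, reverse=True)
    return ranking.index(target) + 1   # 1-based rank
```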
slide-36
SLIDE 36

Ngram embeddings

Induce embeddings for ngrams using contexts from a text corpus. We evaluate the quality of the embedding for a bigram $g = (w_1, w_2)$ by looking at the closest words to this embedding by cosine similarity.

ACL 2018

Method                             beef up           cutting edge                 harry potter        tight lipped
$v_g^{add} = v_{w_1} + v_{w_2}$    meat, out         cut, edges                   deathly, azkaban    loose, fitting
$v_g^{avg}$                        but, however      which, both                  which, but          but, however
ECO1                               meats, meat       weft, edges                  robards, keach      scaly, bristly
Sent2Vec2                          add, reallocate   science, multidisciplinary   naruto, pokemon     wintel, codebase
à la carte ($A^* v_g^{avg}$)       need, improve     innovative, technology       deathly, hallows    worried, very

1: Poliak ’17, 2: Pagliardini et al. ‘18
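One way to gather the contexts an ngram needs is a single pass over a tokenized corpus. This hypothetical helper averages a window of words around each occurrence of a bigram (the window size of 5 is an assumption, not from the slides); the result is then multiplied by $A^*$ as before:

```python
import numpy as np

def bigram_context_average(bigram, corpus_sentences, word_vectors, window=5):
    """Average the word vectors in a +/- `window` word window around each occurrence."""
    w1, w2 = bigram
    total, count = None, 0
    for sent in corpus_sentences:                       # one pass over the corpus
        for i in range(len(sent) - 1):
            if sent[i] == w1 and sent[i + 1] == w2:
                ctx = sent[max(0, i - window): i] + sent[i + 2: i + 2 + window]
                for w in ctx:
                    if w in word_vectors:
                        v = word_vectors[w]
                        total = v.copy() if total is None else total + v
                        count += 1
    return total / count if count else None             # feed this into A* afterwards
```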

slide-37
SLIDE 37

Unsupervised text embeddings

Diagram: “This movie is great!” → Transducer → text embedding $v \in \mathbb{R}^d$

ACL 2018

slide-38
SLIDE 38

Unsupervised text embeddings

  • Sparse (Bag-of-words, Bag-of-ngrams): good performance
  • Linear (sum of word/ngram embeddings): competes with Bag-of-ngrams and LSTMs on some tasks
  • LSTM (predict surrounding words / sentences): SOTA on some tasks

Diagram: “This movie is great!” → Transducer → text embedding $v \in \mathbb{R}^d$

ACL 2018

slide-39
SLIDE 39

A la carte text embeddings

ACL 2018

Linear schemes are typically weighted sums of ngram embeddings

slide-40
SLIDE 40

A la carte text embeddings

Types of ngram embeddings: DisC, ECO, Sent2Vec, A La Carte, compared on whether they are compositional or learned, and on flexibility and quality

Linear schemes are typically weighted sums of ngram embeddings

ACL 2018

slide-41
SLIDE 41

A la carte text embeddings

!"#$%&'()

(

= + !,#-" , + !/01-2&

23$

, … , + !(1-2&

23$

DisC ECO Sent2Vec A La Carte

ACL 2018

Types of ngrams embeddings A La Carte text embeddings are as concatenations of sum of à la carte ngram embeddings (as in DisC)

Linear schemes are typically weighted sums of ngram embeddings

Compositional Learned Flexible High quality
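A minimal sketch of this concatenation, where `ngram_vector` stands in for a lookup or induction of the à la carte ngram embedding (an assumed helper, not the authors' code):

```python
import numpy as np

def text_embedding(tokens, word_vectors, ngram_vector, max_n=3):
    """Concatenate the sum of word vectors with sums of induced ngram vectors."""
    d = len(next(iter(word_vectors.values())))
    parts = [np.sum([word_vectors.get(w, np.zeros(d)) for w in tokens], axis=0)]
    for n in range(2, max_n + 1):
        block = np.zeros(d)
        for i in range(len(tokens) - n + 1):
            block += ngram_vector(tuple(tokens[i:i + n]))   # a la carte ngram vector
        parts.append(block)
    return np.concatenate(parts)                             # dimension = max_n * d
```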

slide-42
SLIDE 42

A la carte text embeddings

Method           n     dimension   MR    CR    SUBJ  MPQA  TREC  SST (±1)  SST   IMDB
Bag-of-ngrams    1-3   100K-1M     77.8  78.3  91.8  85.8  90.0  80.9      42.3  89.8
Skip-thoughts1   -     4800        80.3  83.8  94.2  88.9  93.0  85.1      45.8  -
SDAE2            -     2400        74.6  78.0  90.8  86.9  78.4  -         -     -
CNN-LSTM3        -     4800        77.8  82.0  93.6  89.4  92.6  -         -     -
MC-QT4           -     4800        82.4  86.0  94.8  90.2  92.4  87.6      -     -
DisC5            2-3   ≤ 4800      80.1  81.5  92.6  87.9  90.0  85.5      46.7  89.6
Sent2Vec6        1-2   700         76.3  79.1  91.2  87.2  85.8  80.2      31.0  85.5
à la carte       2     2400        81.3  83.7  93.5  87.6  89.0  85.8      47.8  90.3
à la carte       3     4800        81.8  84.3  93.8  87.6  89.0  86.7      48.1  90.9

(methods grouped as Sparse, LSTM, and Linear)

ACL 2018

1: Kiros et al. ‘15, 2: Hill et al. ’16, 3: Gan et al. ‘17, 4: Logeswaran and Lee ’18, 5: Arora et al. ’18, 6: Pagliardini et al. ‘18

$$v_{\text{document}}^{\text{alc}} = \Big( \sum v_{\text{word}}\,,\ \ \sum v_{\text{bigram}}^{\text{alc}}\,,\ \dots,\ \sum v_{n\text{-gram}}^{\text{alc}} \Big)$$

slide-43
SLIDE 43

Conclusions

  • Simple and efficient method for inducing embeddings for many kinds of features, given at least one context of usage

  • Embeddings produced are in same semantic space as word embeddings
  • Good empirical performance for rare words, ngrams and synsets
  • Text embeddings that compete with unsupervised LSTMs

Code is on GitHub: https://github.com/NLPrinceton/ALaCarte
CRW dataset available: http://nlp.cs.princeton.edu/CRW/

ACL 2018

slide-44
SLIDE 44

Future work

  • Zero shot learning of feature embeddings
  • Compositional approaches
  • Harder-to-annotate features (synsets)
  • Contexts based on other syntactic structures

ACL 2018

slide-45
SLIDE 45

Thank you!

Questions?

{nsaunshi, mkhodak}@cs.princeton.edu

ACL 2018