
The Influence of Down-Sampling Strategies on SVD Word Embedding Stability



  1. The Influence of Down-Sampling Strategies on SVD Word Embedding Stability
     Johannes Hellrich, Bernd Kampe & Udo Hahn
     Jena University Language & Information Engineering (JULIE) Lab, Friedrich Schiller University Jena, Jena, Germany, www.julielab.de
     RepEval 2019, June 6th, 2019, Minneapolis, USA

  2. Typical Word Embeddings are Unstable
     [Figure: lots of corpus text → random embeddings → random processing → final embeddings, illustrated with the words tiger, cat and dog.]

  3. Typical Word Embeddings are Unstable
     [Figure: the same pipeline as on the previous slide, with tiger, cat and dog arranged differently.]

  4. Measuring Stability
     [Figure: one text corpus yields three embedding models, each containing tiger, cat and dog.]
     $$ j@n := \frac{1}{|A|} \sum_{a \in A} \frac{\left| \bigcap_{m \in M} \mathrm{msw}(a, n, m) \right|}{\left| \bigcup_{m \in M} \mathrm{msw}(a, n, m) \right|} $$
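
The j@n measure on this slide can be sketched in a few lines of Python. The sketch reads msw(a, n, m) as the set of the n most similar words for anchor word a in model m, with A the anchor words and M the trained models. The data layout (a dict from anchor word to a ranked neighbour list) and the toy neighbour lists are assumptions made for illustration, not the authors' implementation.

```python
from typing import Dict, List, Sequence, Set

# Hypothetical layout: each model maps an anchor word to its neighbours,
# sorted from most to least similar.
Model = Dict[str, List[str]]


def msw(anchor: str, n: int, model: Model) -> Set[str]:
    """Return the n most similar words for `anchor` in `model`."""
    return set(model[anchor][:n])


def j_at_n(anchors: Sequence[str], n: int, models: Sequence[Model]) -> float:
    """Mean Jaccard overlap of the top-n neighbour sets across all models."""
    total = 0.0
    for a in anchors:
        neighbour_sets = [msw(a, n, m) for m in models]
        shared = set.intersection(*neighbour_sets)  # words every model agrees on
        seen = set.union(*neighbour_sets)           # words any model proposes
        total += len(shared) / len(seen)
    return total / len(anchors)


# Invented toy data: two models, two anchor words.
model_a = {"cat": ["dog", "tiger", "lion"], "dog": ["cat", "wolf", "fox"]}
model_b = {"cat": ["tiger", "dog", "puma"], "dog": ["cat", "wolf", "fox"]}

stability_top2 = j_at_n(["cat", "dog"], 2, [model_a, model_b])
stability_top3 = j_at_n(["cat", "dog"], 3, [model_a, model_b])
```

With this toy data the top-2 neighbour sets agree completely, while the top-3 sets for "cat" share only two of four distinct words, so the score drops. The sketch assumes n ≥ 1 and that every anchor word is present in every model.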

  5. Why SVD Embeddings?
     [Figure: lots of corpus text → counting → SVD → final embeddings (tiger, cat, dog).]
     Example co-occurrence counts for the context words roar and food: dog 475, 156; cat 823, 492; tiger 51, 19.

  6. Why SVD Embeddings?
     [Figure: lots of corpus text → counting & down-sampling → SVD → final embeddings (tiger, cat, dog).]
     Counts are replaced with association values in SVD PPMI (Levy et al., TACL 2015). Example values for the context words roar and food: dog 0.02, 0.01; cat 0.5, 0.4; tiger 0.01, 0.19.
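
The counting → PPMI → SVD pipeline from these two slides can be sketched with NumPy as follows. This is a minimal sketch under stated assumptions: the toy count matrix is invented, the function names `ppmi` and `svd_embeddings` are hypothetical, and scaling the left singular vectors by the singular values is only one common choice, not necessarily the configuration used in the talk.

```python
import numpy as np


def ppmi(counts: np.ndarray) -> np.ndarray:
    """Positive pointwise mutual information of a word-by-context count matrix."""
    p_wc = counts / counts.sum()             # joint probabilities
    p_w = p_wc.sum(axis=1, keepdims=True)    # word marginals
    p_c = p_wc.sum(axis=0, keepdims=True)    # context marginals
    with np.errstate(divide="ignore", invalid="ignore"):
        pmi = np.log(p_wc / (p_w * p_c))
    pmi[~np.isfinite(pmi)] = 0.0             # zero counts would give -inf or nan
    return np.maximum(pmi, 0.0)              # keep positive associations only


def svd_embeddings(matrix: np.ndarray, dim: int) -> np.ndarray:
    """Truncated SVD; rows of the result are `dim`-dimensional word vectors."""
    u, s, _ = np.linalg.svd(matrix, full_matrices=False)
    return u[:, :dim] * s[:dim]


# Invented toy counts: 3 words (rows) by 3 context words (columns).
counts = np.array([[10.0, 0.0, 2.0],
                   [8.0, 1.0, 3.0],
                   [0.0, 9.0, 7.0]])

association = ppmi(counts)
embeddings = svd_embeddings(association, dim=2)
```

Nothing in this pipeline draws random numbers, so repeating it on the same count matrix yields the same vectors. Any instability therefore has to enter earlier, through how the counts themselves are produced.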

  7. Why Down-Sampling?
     • Avoids over-representing frequent words
     • Closer context words are more salient than distant ones
     → Increased performance (Mikolov, NIPS 2013)

  8. Down-Sampling Mechanism
     • Probabilistic: word2vec, SVD PPMI
     • Weighting: GloVe, New: SVD wPPMI
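
The difference between the two mechanisms can be illustrated for distance-based down-sampling of context words. The sketch below is an assumption-laden illustration rather than any of the listed tools' code. The probabilistic variant samples a window size uniformly from 1 to `window` at each position. The weighting variant adds the fractional weight (window - d + 1) / window for a context word at distance d, which is the probability that the sampler would have kept that word. Function names and the example sentence are hypothetical, and frequency-based down-sampling is not covered.

```python
import random
from collections import Counter
from typing import Counter as CounterType, List, Tuple

Pair = Tuple[str, str]


def cooc_probabilistic(tokens: List[str], window: int,
                       rng: random.Random) -> CounterType[Pair]:
    """Count co-occurrences with a randomly sampled window size per position."""
    counts: CounterType[Pair] = Counter()
    for i, word in enumerate(tokens):
        size = rng.randint(1, window)  # random draw: results vary between runs
        for d in range(1, size + 1):
            if i + d < len(tokens):
                counts[(word, tokens[i + d])] += 1.0
                counts[(tokens[i + d], word)] += 1.0
    return counts


def cooc_weighted(tokens: List[str], window: int) -> CounterType[Pair]:
    """Count co-occurrences with a fixed fractional weight per distance."""
    counts: CounterType[Pair] = Counter()
    for i, word in enumerate(tokens):
        for d in range(1, window + 1):
            if i + d < len(tokens):
                weight = (window - d + 1) / window  # P(sampled size >= d)
                counts[(word, tokens[i + d])] += weight
                counts[(tokens[i + d], word)] += weight
    return counts


tokens = "the cat chased the dog".split()
weighted = cooc_weighted(tokens, window=2)
sampled = cooc_probabilistic(tokens, window=2, rng=random.Random(0))
```

In this sketch the weighted counts equal the expected value of the sampled counts, but they are identical on every run. The sampled counts depend on the random seed.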

  9. Experimental Design I/II
     • Three corpora:
       • Corpus of Historical American English, 2000s decade (COHA; 28M tokens)
       • English News Crawl Corpus (NEWS; 550M tokens)
       • Wikipedia (WIKI; 1.7G tokens)
     → Other studies mostly used COHA-sized corpora!

  10. Experimental Design II/II
     • Train 10 models each with SGNS, GloVe, SVD PPMI (no / probabilistic down-sampling) and SVD wPPMI
     • Evaluate intrinsically with four word similarity and two analogy test sets
     • Measure stability with j@10 for the 1k most frequent words

  11. Stability Results
     GloVe's high stability (Antoniak & Mimno, TACL 2018; Wendlandt et al., NAACL 2018) holds only for small corpora.

  12. Exemplary Accuracy Results
     A Wilcoxon rank-sum test shows SVD wPPMI and SGNS to be indistinguishable in accuracy over all test sets and corpora.

  13. Conclusion
     • Typical word embeddings are unstable
     • Down-sampling details greatly affect stability
     • GloVe's stability is worse than claimed in the literature
     • SVD wPPMI embeddings provide SGNS-like performance and perfect stability
     • See the paper for additional results (and bootstrapping)

  14. The Influence of Down-Sampling Strategies on SVD Word Embedding Stability
     Johannes Hellrich, Bernd Kampe & Udo Hahn
     Jena University Language & Information Engineering (JULIE) Lab, Friedrich Schiller University Jena, Jena, Germany, www.julielab.de
