Graph-Based Word Embeddings Learning


  1. Graph-Based Word Embeddings Learning
     (Post-)Doctoral Seminar of Group ILES, LIMSI, 10/04/2018
     Presenter: Zheng ZHANG
     Supervisors: Pierre ZWEIGENBAUM & Yue MA

  2. One year ago… (28/03/2017)
     • Our plan: Using graph-of-words for word2vec training
     • Difficulty: Optimization for big data

  3. Graph-based word2vec training
     • graph-of-words → word co-occurrence networks (matrices)
       Definition: a graph whose vertices represent the unique terms of the
       document and whose edges represent co-occurrences between the terms
       within a fixed-size sliding window. Networks and matrices are
       interchangeable.
     • A new context → negative examples
       word2vec already implicitly uses the statistics of word co-occurrences
       for context word selection, but not for negative example selection.
     Ref. Rousseau F., Vazirgiannis M. (2015) Main Core Retention on
     Graph-of-Words for Single-Document Keyword Extraction.
     https://safetyapp.shinyapps.io/GoWvis/
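The sliding-window definition above can be sketched in a few lines of Python (an illustrative toy, not the deck's actual implementation; names are hypothetical):

```python
from collections import defaultdict

def build_cooccurrence_network(tokens, window_size=2):
    """Build a word co-occurrence network from a token sequence: vertices
    are unique terms, edges link words that co-occur within a fixed-size
    sliding window, and edge weights count the co-occurrences."""
    edges = defaultdict(int)
    for i, w in enumerate(tokens):
        for j in range(i + 1, min(i + 1 + window_size, len(tokens))):
            if tokens[j] != w:
                # Undirected edge: store the pair in a canonical order.
                a, b = sorted((w, tokens[j]))
                edges[(a, b)] += 1
    return dict(edges)

tokens = "the quick brown fox jumps over the lazy dog".split()
net = build_cooccurrence_network(tokens, window_size=2)
```

The edge-weight dictionary is exactly the sparse co-occurrence matrix, which is why the slide can treat networks and matrices as interchangeable.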

  4. Skip-gram negative sampling
     • Why? The softmax calculation is too expensive → replace every
       log P(w_O | w_I) term in the Skip-gram objective.
     • What? Distinguish the target word from draws from the noise
       distribution using logistic regression, where there are k negative
       examples for each data sample.
     • How?
       • Advantages:
         • Cheap to calculate.
         • All valid words can be selected as negative examples.
       • Drawbacks:
         • Not targeted for training words.
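The count-based noise distribution that standard word2vec draws its k negative examples from can be sketched as follows (the 3/4 power is the exponent used in the original word2vec; the function names here are illustrative):

```python
import numpy as np

def noise_distribution(word_counts, power=0.75):
    """word2vec's noise distribution: unigram counts raised to the 3/4
    power, then normalized. Note it depends only on word counts, not on
    which training word the negatives are drawn for."""
    counts = np.asarray(word_counts, dtype=np.float64) ** power
    return counts / counts.sum()

def sample_negatives(p_noise, k, rng):
    """Draw k negative-example word ids from the noise distribution."""
    return rng.choice(len(p_noise), size=k, p=p_noise)

rng = np.random.default_rng(0)
p = noise_distribution([100, 10, 1])   # toy vocabulary of 3 word ids
neg = sample_negatives(p, k=5, rng=rng)
```

Because `p` is fixed for the whole vocabulary, every training word sees the same negatives distribution, which is the drawback the next slide illustrates.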

  5. Drawbacks of skip-gram negative sampling
     • Negative sampling is not targeted for training words. It is only
       based on the word count.
     [Figures: word count f(w) plotted over word_id; heat map of
     lg(P_n(w)), the negative examples distribution, over word_id ×
     word_id: the same distribution for every training word ("Same!").]

  6. Graph-based negative sampling
     • Based on the word co-occurrence network (matrix)
     [Figure: heat map of lg(word co-occurrence) over word_id × word_id,
     showing the word co-occurrence distribution.]
     • Three methods to generate the noise distribution:
       • Training-word context distribution
       • Difference between the unigram distribution and the training
         words' context distribution
       • Random walks on the word co-occurrence network
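The third method, random walks on the co-occurrence network, might be sketched as follows (a toy illustration using powers of the row-stochastic transition matrix; the paper's actual walk procedure and smoothing may differ):

```python
import numpy as np

def random_walk_distribution(cooc, start_word, steps=2):
    """Approximate a per-training-word noise distribution by a
    `steps`-step random walk on the word co-occurrence matrix `cooc`
    (rows/columns indexed by word id). Returns the probability of
    landing on each word when starting from `start_word`."""
    # Row-normalize the co-occurrence counts into transition probabilities.
    P = cooc / cooc.sum(axis=1, keepdims=True)
    dist = np.zeros(cooc.shape[0])
    dist[start_word] = 1.0
    for _ in range(steps):
        dist = dist @ P
    return dist

# Toy symmetric co-occurrence matrix for 3 word ids.
cooc = np.array([[0., 2., 1.],
                 [2., 0., 3.],
                 [1., 3., 0.]])
d = random_walk_distribution(cooc, start_word=0, steps=2)
```

Unlike the count-based distribution, this one differs per training word, which is the point of making negative sampling "targeted".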

  7. Graph-based negative sampling
     • Evaluation results
     • Total time on the entire English Wikipedia corpus (~2.19 billion
       tokens), trained on the server prevert (50 threads used):
       2.5 h (word co-occurrence network/matrix generation)
       + 8 h (word2vec training)

  8. How to generate a large word co-occurrence network within 3 hours?
     • Difficulties:
       • Large corpus
       • NLP-application oriented (tokenizer, word preprocessing,
         POS-tagging, weighted word co-occurrences…)
       • Grid search of the window size
     • We (joint work with Ruiqing YIN in group TLP) developed a tool for
       that!
       • Multiprocessing
       • Built-in methods to preprocess words, analyze sentences, extract
         word pairs and define edge weights. User-customized functions
         supported.
       • Works with other graph libraries (igraph, NetworkX and
         graph-tool) as a front end providing data, to boost network
         generation speed.
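The multiprocessing idea can be sketched in plain Python (an illustrative sketch only, not corpus2graph's actual API: sentences are fanned out to worker processes, and the partial edge-weight counts are merged):

```python
from collections import Counter
from multiprocessing import Pool

def extract_pairs(sentence, window_size=2):
    """One unit of work per worker process: tokenize a sentence and
    extract word pairs within a fixed-size sliding window."""
    tokens = sentence.lower().split()
    pairs = []
    for i, w in enumerate(tokens):
        for j in range(i + 1, min(i + 1 + window_size, len(tokens))):
            pairs.append(tuple(sorted((w, tokens[j]))))
    return Counter(pairs)

def build_network_parallel(sentences, processes=4):
    """Fan sentences out to worker processes, then merge the partial
    counts into a single edge-weight table (the co-occurrence network)."""
    with Pool(processes) as pool:
        partials = pool.map(extract_pairs, sentences)
    total = Counter()
    for c in partials:
        total.update(c)
    return total

sentences = ["The cat sat on the mat", "The cat ran"]
net = build_network_parallel(sentences, processes=2)
```

The merged `Counter` maps word pairs to edge weights, so it can be handed directly to igraph, NetworkX or graph-tool to build the graph object.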

  9. One year ago… (28/03/2017)
     • Our plan: Using graph-of-words for word2vec training
     • Difficulty: Optimization for big data

  10. Progress and future work
      • Our plan: Using graph-of-words for word2vec training
        • GNEG: Graph-Based Negative Sampling for word2vec (submitted as a
          short paper to ACL 2018)
        • Future work:
          • Graph-based context selection
          • Re-assign the training order
          • Adapt for multi-lingual word embeddings training
      • Difficulty: Optimization for big data
        • Efficient Generation and Processing of Word Co-occurrence
          Networks Using corpus2graph (to appear in TextGraphs 2018)
        • An open-source NLP-application-oriented tool: a public version
          will be available on GitHub
          (https://github.com/zzcoolj/corpus2graph) by the end of this
          week.
        • Future work:
          • More built-in methods
          • Efficient graph processing

  11. Merci (Thank you)
