

SLIDE 1

Graph-Based Word Embeddings Learning

Presenter: Zheng ZHANG
Supervisors: Pierre ZWEIGENBAUM & Yue MA

(Post-) Doctoral Seminar of Group ILES, LIMSI, 10/04/2018

SLIDE 2

One year ago…

  • Our plan: Using graph-of-words for word2vec training
  • Difficulty: Optimization for big data


28/03/2017

SLIDE 3

Graph-based word2vec training

  • graph-of-words → word co-occurrence networks (matrices)
  • Definition: a graph whose vertices represent the unique terms of the document and whose edges represent co-occurrences between the terms within a fixed-size sliding window (a construction sketch follows below).
  • Networks and matrices are interchangeable.
  • A new context → negative examples
  • word2vec already implicitly uses the statistics of word co-occurrences for the context word selection, but not for the negative examples selection.

  • Ref. Rousseau F., Vazirgiannis M. (2015) Main Core Retention on Graph-of-Words for Single-Document Keyword Extraction.

https://safetyapp.shinyapps.io/GoWvis/
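A minimal sketch of the graph-of-words construction described above, in plain Python with NetworkX; the window size and whitespace tokenization are simplifying assumptions, not the exact setup from the slides:

```python
import networkx as nx

def graph_of_words(tokens, window_size=3):
    """Build a graph-of-words: vertices are unique terms, edges link
    terms that co-occur within a fixed-size sliding window."""
    g = nx.Graph()
    g.add_nodes_from(set(tokens))
    for i, w in enumerate(tokens):
        for c in tokens[i + 1 : i + window_size]:
            if c == w:
                continue
            # Weight an edge by how often the pair co-occurs.
            if g.has_edge(w, c):
                g[w][c]["weight"] += 1
            else:
                g.add_edge(w, c, weight=1)
    return g

g = graph_of_words("the quick brown fox jumps over the lazy dog".split())
print(g.number_of_nodes(), g.number_of_edges())
```

Since networks and matrices are interchangeable, the same graph can be read back as a co-occurrence matrix (e.g. with nx.to_numpy_array).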

SLIDE 4

Skip-gram negative sampling

  • Why?
  • The softmax calculation is too expensive → replace every log p(w_O | w_I) term in the Skip-gram objective.
  • What?
  • Distinguish the target word from draws from the noise distribution using logistic regression, where there are k negative examples for each data sample.
  • How? (see the objective below)
  • Advantages:
  • Cheap to calculate.
  • All valid words could be selected as negative examples.
  • Drawbacks:
  • Not targeted for training words.
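For reference, this is presumably the standard skip-gram negative-sampling objective from Mikolov et al. (2013) that the slide illustrated, with $v_w$ the input vector, $v'_w$ the output vector, $P_n(w)$ the noise distribution and $k$ the number of negative examples per data sample:

$$\log \sigma\left({v'_{w_O}}^{\top} v_{w_I}\right) + \sum_{i=1}^{k} \mathbb{E}_{w_i \sim P_n(w)}\left[\log \sigma\left(-{v'_{w_i}}^{\top} v_{w_I}\right)\right]$$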


SLIDE 5

Drawbacks of skip-gram negative sampling

  • Negative sampling is not targeted for training words. It is only based on the word count.

[Figure: heat map of the negative examples distribution lg(P_n(w)), word_id × word_id, shown next to a plot of the word count c(w). Every row of the heat map is the same!]
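The count-only noise distribution the bullet refers to is word2vec's default: unigram counts raised to the 3/4 power and normalized, so it is identical for every training word. A minimal sketch (variable names are illustrative):

```python
import numpy as np

def unigram_noise_distribution(word_counts, power=0.75):
    """word2vec's default noise distribution P_n(w): unigram counts
    raised to the 3/4 power, normalized. Identical for every training
    word, i.e. not targeted."""
    counts = np.asarray(word_counts, dtype=np.float64)
    weights = counts ** power
    return weights / weights.sum()

# Example: counts indexed by word_id; draw k=5 negative examples.
p_n = unigram_noise_distribution([50, 10, 5, 1])
negatives = np.random.choice(len(p_n), size=5, p=p_n)
```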

SLIDE 6

word_id word_id lg($%&' (% − %((*&&+,(+)

Graph-based negative sampling

  • Based on the word co-occurrence network (matrix)
  • Three methods to generate the noise distribution (a sketch of the first one follows below):
  • Training-word context distribution
  • Difference between the unigram distribution and the training words' context distribution
  • Random walks on the word co-occurrence network

[Figure: heat map of the word co-occurrence distribution, lg(word co-occurrence count) over word_id × word_id]
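A minimal sketch of the first method, assuming a dense co-occurrence matrix: normalizing the training word's co-occurrence row yields a noise distribution targeted at that specific word (names and the smoothing term are illustrative assumptions, not the paper's code):

```python
import numpy as np

def context_noise_distribution(cooc, word_id, smoothing=1e-8):
    """Targeted noise distribution for one training word: normalize its
    row of the word co-occurrence matrix, so words that frequently
    co-occur with it are sampled as negatives more often."""
    row = cooc[word_id].astype(np.float64) + smoothing  # avoid zero-sum rows
    return row / row.sum()

# Toy 4-word symmetric co-occurrence matrix.
cooc = np.array([[0, 3, 1, 0],
                 [3, 0, 2, 1],
                 [1, 2, 0, 4],
                 [0, 1, 4, 0]])
p_n = context_noise_distribution(cooc, word_id=1)
negatives = np.random.choice(cooc.shape[0], size=5, p=p_n)
```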

SLIDE 7

Graph-based negative sampling

  • Evaluation results
  • Total time: the entire English Wikipedia corpus (~2.19 billion tokens) trained on the server prevert (50 threads used):
  • 2.5 h for word co-occurrence network (matrix) generation + 8 h for word2vec training

SLIDE 8

How to generate a large word co-occurrence network within 3 hours?

  • Difficulties:
  • Large corpus
  • NLP-application oriented (tokenizer, word preprocessing, POS-tagging, weighted word co-occurrences…)
  • Grid search over the window size
  • We (joint work with Ruiqing YIN in group TLP) developed a tool for that!
  • Multiprocessing
  • Built-in methods to preprocess words, analyze sentences, extract word pairs and define edge weights.
  • User-customized functions supported.
  • Works with other graph libraries (igraph, NetworkX and graph-tool) as a front end providing data, to boost network generation speed (see the sketch below).
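The front-end pattern the last bullet describes, sketched with standard-library multiprocessing and NetworkX; the extract_pairs helper and the shard file names are illustrative assumptions, not corpus2graph's actual API:

```python
import multiprocessing as mp
from collections import Counter
import networkx as nx

def extract_pairs(path, window_size=5):
    """Count co-occurring word pairs in one corpus shard (illustrative helper)."""
    pairs = Counter()
    with open(path, encoding="utf-8") as f:
        for line in f:
            tokens = line.split()
            for i, w in enumerate(tokens):
                for c in tokens[i + 1 : i + window_size]:
                    pairs[tuple(sorted((w, c)))] += 1
    return pairs

if __name__ == "__main__":
    files = ["part-0.txt", "part-1.txt"]  # assumed corpus shards
    totals = Counter()
    with mp.Pool() as pool:  # process shards in parallel
        for partial in pool.map(extract_pairs, files):
            totals.update(partial)
    # Hand the aggregated weighted edge list to a graph library as a front end.
    g = nx.Graph()
    g.add_weighted_edges_from((w, c, n) for (w, c), n in totals.items())
```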

SLIDE 9

One year ago…

  • Our plan: Using graph-of-words for word2vec training
  • Difficulty: Optimization for big data


28/03/2017

SLIDE 10

Progress and Future work

  • Our plan: Using graph-of-words for word2vec training
  • GNEG: Graph-Based Negative Sampling for word2vec (submitted as a short paper to ACL 2018)
  • Future work:
  • Graph-based context selection
  • Re-assign the training order
  • Adapt for multilingual word embeddings training
  • Difficulty: Optimization for big data
  • Efficient Generation and Processing of Word Co-occurrence Networks Using corpus2graph (to appear at TextGraphs 2018)
  • An open-source, NLP-application-oriented tool: the public version will be available on GitHub (https://github.com/zzcoolj/corpus2graph) by the end of this week.
  • Future work:
  • More built-in methods
  • Efficient graph processing


SLIDE 11

Thank you