

SLIDE 1

Graph-Based Word Embeddings Learning

Presenter: Zheng ZHANG
Supervisors: Pierre ZWEIGENBAUM & Yue MA

(Post-) Doctoral Seminar of Group ILES, LIMSI, 10/04/2018

SLIDE 2

One year ago…

  • Our plan: Using graph-of-words for word2vec training
  • Difficulty: Optimization for big data


28/03/2017

SLIDE 3

Graph-based word2vec training

  • graph-of-words → word co-occurrence networks (matrices)
  • Definition: a graph whose vertices represent the unique terms of the document and whose edges represent co-occurrences between the terms within a fixed-size sliding window (a construction sketch follows below).
  • Networks and matrices are interchangeable.
  • A new context → negative examples
  • word2vec already implicitly uses the statistics of word co-occurrences for the context word selection, but not for the negative examples selection.

  • Ref. Rousseau F., Vazirgiannis M. (2015) Main Core Retention on Graph-of-Words for Single-Document Keyword Extraction.

https://safetyapp.shinyapps.io/GoWvis/
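A minimal sketch of the graph-of-words construction described above, in plain Python with NetworkX; the window size and whitespace tokenization are simplifying assumptions, not the exact setup from the slides:

```python
import networkx as nx

def graph_of_words(tokens, window_size=3):
    """Build a graph-of-words: vertices are unique terms, edges link
    terms that co-occur within a fixed-size sliding window."""
    g = nx.Graph()
    g.add_nodes_from(set(tokens))
    for i, w in enumerate(tokens):
        for c in tokens[i + 1 : i + window_size]:
            if c == w:
                continue
            # Weight an edge by how often the pair co-occurs.
            if g.has_edge(w, c):
                g[w][c]["weight"] += 1
            else:
                g.add_edge(w, c, weight=1)
    return g

g = graph_of_words("the quick brown fox jumps over the lazy dog".split())
print(g.number_of_nodes(), g.number_of_edges())
```

Since networks and matrices are interchangeable, the same graph can be read back as a co-occurrence matrix (e.g. with nx.to_numpy_array).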

SLIDE 4

Skip-gram negative sampling

  • Why?
  • The softmax calculation is too expensive → replace every log p(w_O | w_I) term in the Skip-gram objective.
  • What?
  • Distinguish the target word from draws from the noise distribution using logistic regression, where there are k negative examples for each data sample.
  • How? (see the objective below)
  • Advantages:
  • Cheap to calculate.
  • All valid words could be selected as negative examples.
  • Drawbacks:
  • Not targeted for training words.
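For reference, this is presumably the standard skip-gram negative-sampling objective from Mikolov et al. (2013) that the slide illustrated, with $v_w$ the input vector, $v'_w$ the output vector, $P_n(w)$ the noise distribution and $k$ the number of negative examples per data sample:

$$\log \sigma\left({v'_{w_O}}^{\top} v_{w_I}\right) + \sum_{i=1}^{k} \mathbb{E}_{w_i \sim P_n(w)}\left[\log \sigma\left(-{v'_{w_i}}^{\top} v_{w_I}\right)\right]$$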


SLIDE 5

Drawbacks of skip-gram negative sampling

  • Negative sampling is not targeted for training words. It is only based on the word count.

[Figure: heat map of the negative examples distribution lg(P_n(w)), word_id × word_id, shown next to a plot of the word count c(w). Every row of the heat map is the same!]
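The count-only noise distribution the bullet refers to is word2vec's default: unigram counts raised to the 3/4 power and normalized, so it is identical for every training word. A minimal sketch (variable names are illustrative):

```python
import numpy as np

def unigram_noise_distribution(word_counts, power=0.75):
    """word2vec's default noise distribution P_n(w): unigram counts
    raised to the 3/4 power, normalized. Identical for every training
    word, i.e. not targeted."""
    counts = np.asarray(word_counts, dtype=np.float64)
    weights = counts ** power
    return weights / weights.sum()

# Example: counts indexed by word_id; draw k=5 negative examples.
p_n = unigram_noise_distribution([50, 10, 5, 1])
negatives = np.random.choice(len(p_n), size=5, p=p_n)
```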

SLIDE 6

word_id word_id lg($%&' (% − %((*&&+,(+)

Graph-based negative sampling

  • Based on the word co-occurrence network (matrix)
  • Three methods to generate the noise distribution (a sketch of the first one follows below):
  • Training-word context distribution
  • Difference between the unigram distribution and the training words' context distribution
  • Random walks on the word co-occurrence network

[Figure: heat map of the word co-occurrence distribution, lg(word co-occurrence count) over word_id × word_id]
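A minimal sketch of the first method, assuming a dense co-occurrence matrix: normalizing the training word's co-occurrence row yields a noise distribution targeted at that specific word (names and the smoothing term are illustrative assumptions, not the paper's code):

```python
import numpy as np

def context_noise_distribution(cooc, word_id, smoothing=1e-8):
    """Targeted noise distribution for one training word: normalize its
    row of the word co-occurrence matrix, so words that frequently
    co-occur with it are sampled as negatives more often."""
    row = cooc[word_id].astype(np.float64) + smoothing  # avoid zero-sum rows
    return row / row.sum()

# Toy 4-word symmetric co-occurrence matrix.
cooc = np.array([[0, 3, 1, 0],
                 [3, 0, 2, 1],
                 [1, 2, 0, 4],
                 [0, 1, 4, 0]])
p_n = context_noise_distribution(cooc, word_id=1)
negatives = np.random.choice(cooc.shape[0], size=5, p=p_n)
```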

SLIDE 7

Graph-based negative sampling

  • Evaluation results
  • Total time: the entire English Wikipedia corpus (~2.19 billion tokens) trained on the server prevert (50 threads used):
  • 2.5 h for word co-occurrence network (matrix) generation + 8 h for word2vec training

SLIDE 8

How to generate a large word co-occurrence network within 3 hours?

  • Difficulties:
  • Large corpus
  • NLP-application oriented (tokenizer, word preprocessing, POS-tagging, weighted word co-occurrences…)
  • Grid search over the window size
  • We (joint work with Ruiqing YIN in group TLP) developed a tool for that!
  • Multiprocessing
  • Built-in methods to preprocess words, analyze sentences, extract word pairs and define edge weights.
  • User-customized functions supported.
  • Works with other graph libraries (igraph, NetworkX and graph-tool) as a front end providing data, to boost network generation speed (see the sketch below).
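The front-end pattern the last bullet describes, sketched with standard-library multiprocessing and NetworkX; the extract_pairs helper and the shard file names are illustrative assumptions, not corpus2graph's actual API:

```python
import multiprocessing as mp
from collections import Counter
import networkx as nx

def extract_pairs(path, window_size=5):
    """Count co-occurring word pairs in one corpus shard (illustrative helper)."""
    pairs = Counter()
    with open(path, encoding="utf-8") as f:
        for line in f:
            tokens = line.split()
            for i, w in enumerate(tokens):
                for c in tokens[i + 1 : i + window_size]:
                    pairs[tuple(sorted((w, c)))] += 1
    return pairs

if __name__ == "__main__":
    files = ["part-0.txt", "part-1.txt"]  # assumed corpus shards
    totals = Counter()
    with mp.Pool() as pool:  # process shards in parallel
        for partial in pool.map(extract_pairs, files):
            totals.update(partial)
    # Hand the aggregated weighted edge list to a graph library as a front end.
    g = nx.Graph()
    g.add_weighted_edges_from((w, c, n) for (w, c), n in totals.items())
```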

SLIDE 9

One year ago…

  • Our plan: Using graph-of-words for word2vec training
  • Difficulty: Optimization for big data


28/03/2017

SLIDE 10

Progress and Future work

  • Our plan: Using graph-of-words for word2vec training
  • GNEG: Graph-Based Negative Sampling for word2vec (submitted as a short paper to ACL 2018)
  • Future work:
  • Graph-based context selection
  • Re-assign the training order
  • Adapt for multilingual word embeddings training
  • Difficulty: Optimization for big data
  • Efficient Generation and Processing of Word Co-occurrence Networks Using corpus2graph (to appear at TextGraphs 2018)
  • An open-source, NLP-application-oriented tool: the public version will be available on GitHub (https://github.com/zzcoolj/corpus2graph) by the end of this week.
  • Future work:
  • More built-in methods
  • Efficient graph processing


SLIDE 11

Thank you