SLIDE 1

Learning Word Embeddings for Low-resource Languages by PU Learning

Chao Jiang, Hsiang-Fu Yu, Cho-Jui Hsieh, Kai-Wei Chang

SLIDE 2

Word Embeddings are useful

  • Many success stories:
  • Named entity recognition
  • Document ranking
  • Sentiment analysis
  • Question answering
  • Image captioning
  • Pre-trained word vectors have been widely used:
  • GloVe [Pennington+14]: 3,900+ citations
  • Word2Vec [Mikolov+13]: 7,600+ citations

SLIDE 3

Existing English embeddings are trained on large collections of text

  • Word2Vec is trained on the Google News dataset (100 billion tokens).
  • GloVe is trained on a crawled corpus (840 billion tokens).

SLIDE 4

How about other languages?

SLIDE 5

How about other languages?

  • # Wikipedia articles in different languages
  • English: ~ 2.5 M
  • German: ~ 800 K
  • French: ~ 700 K
  • Czech: ~100 K
  • Danish: ~95K
  • Chichewa: 58

High-resource languages: 23 languages have more than 100K articles.
Low-resource languages: 60 languages have 10K to 100K articles.
Very low-resource languages: 183 languages have fewer than 10K articles.

SLIDE 6

Sparsity of the co-occurrence matrix

  • Word embeddings are trained based on co-occurrence statistics
  • When the training corpus is small:
  • Many word pairs are unobserved
  • The co-occurrence matrix is very sparse
  • Example: the text8 dataset
  • 17,000,000 tokens and 71,000 distinct words
  • The co-occurrence matrix has more than 5,000,000,000 entries; > 99% are zeros.
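As a quick size check on those numbers: a vocabulary of 71,000 distinct words gives

$$71{,}000 \times 71{,}000 \approx 5.0 \times 10^9$$

matrix entries.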

SLIDE 7

Zeros in the co-occurrence matrix

  • True zeros
  • Word pairs that are unlikely to co-occur
  • Missing entries
  • Word pairs that can co-occur
  • Unobserved in the training data

[Figure: a center-word × context-word co-occurrence matrix over words (alien, table, …, cake, space); among the zero entries, some are true zeros and others are missing entries.]

SLIDE 8

Motivation

SLIDE 9

Our contributions

1. Propose a PU-Learning framework for training word embeddings
2. Design an efficient learning algorithm to deal with all negative pairs
3. Demonstrate that unobserved word pairs provide valuable information

SLIDE 10

PU-Learning for Training Word Embeddings

SLIDE 11

PU Learning Framework

1. Pre-processing: building the co-occurrence matrix
2. Matrix factorization by PU-Learning
3. Post-processing

SLIDE 12

Step 1 – Building the co-occurrence matrix

  • Count word co-occurrence statistics
  • We follow [Levy+15] to scale the co-occurrence counts by the PPMI metric

[Figure: sliding a context window over "… the black cat likes milk …" produces the pairs (cat, the), (cat, black), (cat, likes), (cat, milk), …; the counts fill a center-word × context-word matrix, which is then scaled by PPMI.]
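To make Step 1 concrete, here is a minimal Python sketch of building a PPMI-scaled co-occurrence matrix. This is our own illustration, not the authors' code: the name build_ppmi is hypothetical, a dense matrix is used for clarity, and a real implementation would use sparse storage.

```python
import numpy as np

def build_ppmi(tokens, window=2):
    """Count center/context co-occurrences within +-window, then scale by PPMI."""
    vocab = {w: i for i, w in enumerate(dict.fromkeys(tokens))}
    n = len(vocab)
    C = np.zeros((n, n))
    for i, center in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                C[vocab[center], vocab[tokens[j]]] += 1.0
    total = C.sum()
    joint = C / total                  # P(center, context)
    pw = C.sum(axis=1) / total         # P(center)
    pc = C.sum(axis=0) / total         # P(context)
    with np.errstate(divide="ignore", invalid="ignore"):
        pmi = np.log(joint / np.outer(pw, pc))
    # PPMI = max(PMI, 0); unobserved pairs stay exactly zero.
    return np.where(C > 0, np.maximum(pmi, 0.0), 0.0)

A = build_ppmi("the black cat likes milk".split(), window=2)
```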

SLIDE 13

[Figure: the resulting center-word × context-word matrix (center words: cat, dog, …, table, happy; context words: black, blue, …, yellow, milk) contains frequent zeros.]

SLIDE 14

Step 2 - PU-Learning for matrix factorization

[Figure: the PPMI matrix A is factorized into low-rank word and context matrices W and H, so that each entry of A is approximated by an inner product of their rows.]
slide-15
SLIDE 15

Step 2 - PU-Learning for matrix factorization

  • The objective combines a weighting function, a reconstruction error, and a regularization term:

$$\min_{W,H}\ \sum_{i,j} C_{ij}\,\big(A_{ij} - \mathbf{w}_i^\top \mathbf{h}_j\big)^2 \;+\; \lambda\,\big(\lVert W\rVert_F^2 + \lVert H\rVert_F^2\big)$$

  • $C_{ij}$: weighting function; $(A_{ij} - \mathbf{w}_i^\top \mathbf{h}_j)^2$: reconstruction error; the $\lambda$ term: regularization

SLIDE 16

Step 2 – Weighting function

[Figure: the weighting function assigns a separate weight to the frequent zeros in the matrix.]
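One way to write this weighting, consistent with Slide 27 where 𝜍 is the weight for zero entries (the unit weight on observed entries is an illustrative assumption, not taken from the slides):

$$C_{ij} = \begin{cases} 1 & \text{if } A_{ij} > 0 \quad (\text{observed pair}) \\ \varsigma & \text{if } A_{ij} = 0 \quad (\text{true zero or missing}) \end{cases}$$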

SLIDE 17

Step 2 - PU-Learning for matrix factorization

  • We consider all entries
  • Both positive and zero entries

SLIDE 18

Step 2 - PU-Learning for matrix factorization

  • We design an efficient coordinate descent algorithm (see paper for details)
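As a rough illustration of what such an update looks like, below is a naive dense Python sketch of coordinate descent on the weighted objective above. It is entirely our own sketch: pu_mf, rho, and lam are hypothetical names, rho plays the role of the zero-entry weight 𝜍, and the paper's algorithm is far more efficient because it exploits the shared weight on all zero entries instead of materializing the full matrix.

```python
import numpy as np

def pu_mf(A, rank=4, rho=0.1, lam=0.01, iters=20, seed=0):
    """Naive coordinate descent for weighted (PU) matrix factorization.

    Minimizes  sum_ij C_ij (A_ij - w_i . h_j)^2 + lam (||W||^2 + ||H||^2),
    where C_ij = 1 for observed (positive) entries and rho for zeros.
    """
    rng = np.random.default_rng(seed)
    m, n = A.shape
    W = 0.1 * rng.standard_normal((m, rank))
    H = 0.1 * rng.standard_normal((n, rank))
    Cw = np.where(A > 0, 1.0, rho)       # confidence weights C_ij
    R = A - W @ H.T                      # residual matrix
    for _ in range(iters):
        for k in range(rank):
            # Remove the rank-1 contribution of coordinate k from the residual.
            R += np.outer(W[:, k], H[:, k])
            # Closed-form weighted least-squares update for each coordinate.
            W[:, k] = ((Cw * R) @ H[:, k]) / (Cw @ (H[:, k] ** 2) + lam)
            H[:, k] = ((Cw * R).T @ W[:, k]) / (Cw.T @ (W[:, k] ** 2) + lam)
            # Put the updated rank-1 term back.
            R -= np.outer(W[:, k], H[:, k])
    return W, H

A = np.array([[0.8, 0.0, 0.2],
              [0.0, 0.5, 0.0],
              [0.1, 0.0, 0.7]])
W, H = pu_mf(A, rank=2)
```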


SLIDE 23

Step 3 – Post-processing

SLIDE 24

Experiments

SLIDE 25

Results on English


Simulating the low-resource setting: embeddings are trained on a subset of Wikipedia with 32M tokens.

[Charts: analogy task on the Google dataset; word similarity task on WS353.]

SLIDE 26

Results on Danish (more results in paper)

[Charts: analogy task on the Google dataset; word similarity task on WS353.]

Danish Wikipedia with 64M tokens. Test sets were translated by Google Translate (with 90% accuracy, as verified by native speakers).

SLIDE 27

  • Weight for zero entries in the co-occurrence matrix
  • Zero entries can be true zeros or missing
  • 𝜍 reflects how confident we are that the zero entries are true zeros
SLIDE 28

Take home messages

  • A PU-Learning framework for learning word embeddings in the low-resource setting
  • Unobserved word pairs provide valuable information

Thanks!