Analogies Explained: Towards Understanding Word Embeddings — PowerPoint PPT Presentation



SLIDE 1

Analogies Explained

Towards Understanding Word Embeddings

Carl Allen, Tim Hospedales · June 13, 2019

School of Informatics, University of Edinburgh

SLIDE 4

The Problem: linking semantics to geometry

from: “man is to king as woman is to queen”
explain: wking − wman + wwoman ≈ wqueen

or rather:

[Figure: 2-D projection of word embeddings (man, woman, king, queen, royal, crown, reign, lord, prince, princess, …); the point wK − wM + wW lands closest to wqueen.]

SLIDE 8

Word2Vec: SkipGram with Negative Sampling

Mikolov et al. (2013a,b)

[Figure: network diagram with target words w1 … wn (rows of embedding matrix W) and context words c1 … cn (rows of embedding matrix C).]

  • computing p(cj|wi) by softmax is expensive
  • instead, use a sigmoid with k negative samples
  • Levy and Goldberg (2014): at the optimum,

  w⊤i cj ≈ log [ p(wi, cj) / (p(wi) p(cj)) ] − log k = PMI(wi, cj) − log k

i.e. W⊤C ≈ PMI − log k.
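The factorisation above can be checked mechanically. Below is a minimal sketch that builds a PMI matrix from hypothetical co-occurrence counts (invented numbers, not real corpus statistics) and forms the shifted matrix PMI − log k that SGNS implicitly factorises as W⊤C, per Levy and Goldberg (2014):

```python
import numpy as np

# Hypothetical co-occurrence counts between 4 target and 4 context words
# (invented numbers, for illustration only).
counts = np.array([
    [10.0, 2.0, 1.0, 0.5],
    [ 2.0, 8.0, 0.5, 1.0],
    [ 1.0, 0.5, 6.0, 3.0],
    [ 0.5, 1.0, 3.0, 7.0],
])

p_joint = counts / counts.sum()           # p(w_i, c_j)
p_w = p_joint.sum(axis=1, keepdims=True)  # p(w_i)
p_c = p_joint.sum(axis=0, keepdims=True)  # p(c_j)

# PMI(w_i, c_j) = log p(w_i, c_j) / (p(w_i) p(c_j))
pmi = np.log(p_joint / (p_w * p_c))

k = 5  # number of negative samples
# Levy and Goldberg (2014): at the optimum, W.T @ C approximates this
# shifted PMI matrix element-wise.
shifted_pmi = pmi - np.log(k)
print(shifted_pmi.round(3))
```

A trained SGNS model would not recover this matrix exactly (the embedding dimension is a low-rank bottleneck), but it is the quantity the objective targets.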

SLIDE 16

Routemap

semantic:  “man is to king as woman is to queen”
    ⇕
  man transforms to king as woman transforms to queen
    ⇕
  {woman, king} paraphrases {man, queen}
    ⇓
  PMIking − PMIman + PMIwoman ≈ PMIqueen
    ⇓  (using PMIi ≈ w⊤i C)
geometric:  wking − wman + wwoman ≈ wqueen

SLIDE 18

Paraphrase† of W by w∗

Intuition: word w∗ ∈ E paraphrases word set W = {w1, ..., wm} ⊆ E if w∗ and W are semantically interchangeable.

[Figure: the context distributions p(E|w∗) and p(E|W), overlaid over dictionary words w1 … wn.]

Definition (D1): w∗ ∈ E paraphrases W ⊆ E, |W| < l, if the paraphrase error ρW,w∗ ∈ Rn is (element-wise) small:

  (ρW,w∗)j = log [ p(cj|w∗) / p(cj|W) ],   cj ∈ E

†Inspired by Gittens et al. (2017)
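When the conditional distributions are known, the paraphrase error of D1 is a direct computation. A minimal numeric sketch with hypothetical distributions (the values below are invented for illustration):

```python
import numpy as np

# Hypothetical context distributions over a 5-word dictionary E
# (invented values, for illustration only).
p_c_given_wstar = np.array([0.40, 0.25, 0.15, 0.12, 0.08])  # p(c_j | w*)
p_c_given_W     = np.array([0.38, 0.27, 0.14, 0.13, 0.08])  # p(c_j | W)

# Element-wise paraphrase error (D1): rho_j = log p(c_j|w*) / p(c_j|W)
rho = np.log(p_c_given_wstar / p_c_given_W)

# w* paraphrases W iff rho is element-wise small.
print(np.abs(rho).max())  # here well below 0.1
```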

SLIDE 24

Summing PMI vectors of a paraphrase

PMI1 + PMI2 ≈ PMI∗ ?

PMI(w∗, cj) − ( PMI(w1, cj) + PMI(w2, cj) )
  = log [ p(w∗|cj) / p(w∗) ] − log [ p(w1|cj) p(w2|cj) / (p(w1) p(w2)) ] + log [ p(W|cj) / p(W|cj) ] + log [ p(W) / p(W) ]
  = log [ p(cj|w∗) / p(cj|W) ]             ← (ρW,w∗)j, the paraphrase error
  + log [ p(W|cj) / (p(w1|cj) p(w2|cj)) ]  ← (σW)j, the conditional independence error
  − log [ p(W) / (p(w1) p(w2)) ]           ← τW, the independence error

Lemma 1: For any word w∗ ∈ E and word set W ⊆ E, |W| < l:

  PMI∗ = Σ wi∈W PMIi + ρW,w∗ + σW − τW·1
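Lemma 1 is an algebraic identity, so it holds for any consistent set of probabilities. The sketch below verifies it numerically with arbitrary hypothetical values; none of the numbers model a real corpus:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 6  # number of context words c_j

# Arbitrary hypothetical probabilities; Lemma 1 is an identity, so any
# positive, mutually consistent values work.
p_c = rng.dirichlet(np.ones(n))              # p(c_j)
p_w1, p_w2, p_wstar, p_W = 0.05, 0.04, 0.03, 0.02
p_w1_c    = rng.uniform(0.01, 0.1, n)        # p(w1 | c_j)
p_w2_c    = rng.uniform(0.01, 0.1, n)        # p(w2 | c_j)
p_wstar_c = rng.uniform(0.01, 0.1, n)        # p(w* | c_j)
p_W_c     = rng.uniform(0.01, 0.1, n)        # p(W  | c_j), W = {w1, w2}

def pmi(p_w_c, p_w):
    # PMI(w, c_j) = log p(w|c_j) / p(w)
    return np.log(p_w_c / p_w)

# Error terms as defined on the slide:
rho   = np.log((p_wstar_c * p_c / p_wstar) / (p_W_c * p_c / p_W))  # paraphrase error
sigma = np.log(p_W_c / (p_w1_c * p_w2_c))    # conditional independence error
tau   = np.log(p_W / (p_w1 * p_w2))          # independence error (scalar)

lhs = pmi(p_wstar_c, p_wstar)                                      # PMI*
rhs = pmi(p_w1_c, p_w1) + pmi(p_w2_c, p_w2) + rho + sigma - tau
print(np.allclose(lhs, rhs))  # prints True: Lemma 1 holds exactly
```

The p(cj) factors inside rho cancel; they are kept only to mirror the definition ρj = log p(cj|w∗)/p(cj|W) via Bayes' rule.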

SLIDE 27

Generalised Paraphrase (of W by W∗)

Lemma 1: For any word w∗ ∈ E and word set W ⊆ E, |W| < l:

  PMI∗ = Σ wi∈W PMIi + ρW,w∗ + σW − τW·1

Replace word w∗ with word set W∗ ⊆ E:

[Figure: the context distributions p(E|W∗) and p(E|W), overlaid over dictionary words w1 … wn.]

Lemma 2: For any word sets W, W∗ ⊆ E, |W|, |W∗| < l:

  Σ wi∈W∗ PMIi = Σ wi∈W PMIi + ρW,W∗ + σW − σW∗ − (τW − τW∗)·1
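Lemma 2 can be verified the same way. A sketch with hypothetical probabilities for W = {w1, w2} and W∗ = {w3, w4} (invented values; only the algebraic relations matter):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 6  # number of context words c_j

# Hypothetical probabilities; W = {w1, w2}, W* = {w3, w4}.
names = ["w1", "w2", "w3", "w4", "W", "Wstar"]
p_c_given = {w: rng.uniform(0.01, 0.1, n) for w in names}  # p(word/set | c_j)
p = {"w1": 0.05, "w2": 0.04, "w3": 0.06, "w4": 0.03, "W": 0.02, "Wstar": 0.025}

def pmi(w):
    return np.log(p_c_given[w] / p[w])  # PMI(w, c_j) = log p(w|c_j)/p(w)

def sigma(s, members):  # conditional independence error of set s
    prod = np.ones(n)
    for w in members:
        prod *= p_c_given[w]
    return np.log(p_c_given[s] / prod)

def tau(s, members):    # independence error of set s (scalar)
    prod = 1.0
    for w in members:
        prod *= p[w]
    return np.log(p[s] / prod)

# rho^{W,W*}_j = log p(c_j|W*) / p(c_j|W); the p(c_j) factors cancel.
rho = np.log(p_c_given["Wstar"] / p["Wstar"]) - np.log(p_c_given["W"] / p["W"])

lhs = pmi("w3") + pmi("w4")                              # sum over W*
rhs = (pmi("w1") + pmi("w2") + rho                       # sum over W + errors
       + sigma("W", ["w1", "w2"]) - sigma("Wstar", ["w3", "w4"])
       - (tau("W", ["w1", "w2"]) - tau("Wstar", ["w3", "w4"])))
print(np.allclose(lhs, rhs))  # prints True: Lemma 2 holds exactly
```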

SLIDE 30

Paraphrase: the link from semantics to geometry

Lemma 2: For any word sets W, W∗ ⊆ E, |W|, |W∗| < l:

  Σ wi∈W∗ PMIi = Σ wi∈W PMIi + ρW,W∗ + σW − σW∗ − (τW − τW∗)·1

So, if W = {woman, king} paraphrases W∗ = {man, queen}, then:

  PMIqueen ≈ PMIking − PMIman + PMIwoman + σW − σW∗ − (τW − τW∗)·1

where σW − σW∗ − (τW − τW∗)·1 is the net dependence error.

SLIDE 31

Routemap

“man is to king as woman is to queen”
    ⇕
  man transforms to king as woman transforms to queen
    ⇕
  {woman, king} paraphrases {man, queen}
    ⇓  (up to the dependence error)
  PMIking − PMIman + PMIwoman ≈ PMIqueen
    ⇓  (using PMIi ≈ w⊤i C)
  wking − wman + wwoman ≈ wqueen

SLIDE 35

Word Transformation: a change of perspective

A paraphrase w∗ of W can be thought of as a word transformation from some w ∈ W to w∗ by adding W+ = {wi ∈ W, wi ≠ w}, e.g.

  {man, royal} ≈P king  ⟹  man —(+royal)→ king

Added words contextualise w, such that the induced distribution better aligns with that of w∗.

A paraphrase of W by W∗ can be thought of as a word transformation from some w ∈ W to some w∗ ∈ W∗ by adding to both …

  w, +W+  ≈P  w∗, +W−

… or adding to one side and subtracting from the other:

  w —(+W+, −W−)→ w∗    (word transformation)

SLIDE 38

Word Transformation: a change of perspective (cont.)

  w —(+W+, −W−)→ w∗  where W ≈P W∗    (word transformation)

A generalised paraphrase is a word transformation from w ∈ W to w∗ ∈ W∗, where:

  • added words narrow context
  • subtracted words broaden context

This provides a “richer dictionary” to explain the difference between w and w∗, or rather, how “w is to w∗”.

Definition (D4): We say “wa is to wa∗ as wb is to wb∗” iff there exist W+, W− ⊆ E that simultaneously transform wa to wa∗ and wb to wb∗.

SLIDE 42

Word Transformation: a change of perspective (cont.)

That is, we say: “man is to king as woman is to queen” iff there exist W+, W− ⊆ E that simultaneously transform man to king and woman to queen. Let W+ = {king}, W− = {man}:

  man —(+king, −man)→ king      (word transformation)
  woman —(+king, −man)→ queen   (word transformation)


SLIDE 46

The Solution: linking semantics to geometry

“man is to king as woman is to queen” implies:

  wking − wman + wwoman ≈ wqueen   (up to the ρ, σ, τ error terms)

[Figure: 2-D projection of word embeddings (man, woman, king, queen, royal, crown, reign, lord, prince, princess, …); wK − wM + wW lands near wqueen, offset by the ρ, σ, τ error terms.]
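In practice this is the familiar vector-arithmetic analogy test: find the word nearest to wking − wman + wwoman. A minimal sketch with small hypothetical embeddings; real use would query vectors from a trained model such as word2vec:

```python
import numpy as np

# Hypothetical 4-d embeddings for illustration; real vectors would come
# from a trained model.
emb = {
    "man":    np.array([0.9, 0.1, 0.0, 0.1]),
    "woman":  np.array([0.1, 0.9, 0.0, 0.1]),
    "king":   np.array([0.9, 0.1, 0.8, 0.2]),
    "queen":  np.array([0.1, 0.9, 0.8, 0.2]),
    "prince": np.array([0.8, 0.2, 0.6, 0.1]),
}

def analogy(a, a_star, b, vocab):
    """Return the word whose embedding is closest (by cosine) to
    w_a* - w_a + w_b, excluding the three query words."""
    target = emb[a_star] - emb[a] + emb[b]
    def cos(u, v):
        return u @ v / (np.linalg.norm(u) * np.linalg.norm(v))
    candidates = [w for w in vocab if w not in {a, a_star, b}]
    return max(candidates, key=lambda w: cos(emb[w], target))

print(analogy("man", "king", "woman", emb))  # prints "queen"
```

Excluding the query words from the candidates matters: with real embeddings, the nearest neighbour of wking − wman + wwoman is often wking itself.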

SLIDE 47

References

Alex Gittens, Dimitris Achlioptas, and Michael W. Mahoney. Skip-Gram − Zipf + Uniform = Vector Additivity. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 69–76, 2017.

Omer Levy and Yoav Goldberg. Neural word embedding as implicit matrix factorization. In Advances in Neural Information Processing Systems, 2014.

Tomas Mikolov, Kai Chen, Greg Corrado, and Jeffrey Dean. Efficient estimation of word representations in vector space. arXiv preprint arXiv:1301.3781, 2013a.

Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg S. Corrado, and Jeff Dean. Distributed representations of words and phrases and their compositionality. In Advances in Neural Information Processing Systems, 2013b.