GloVe: Global Vectors for Word Representation


slide-1
SLIDE 1

GloVe: Global Vectors for Word Representation

Jeffrey Pennington, Richard Socher, Christopher D. Manning
Presented by Chris Kedzie
March 25, 2015

Chris Kedzie GloVe March 25, 2015 1 / 30

slide-2
SLIDE 2

Overview

1. Introduction
2. Problem
3. GloVe Model
4. Experiments

slide-3
SLIDE 3

GloVe

1. Introduction
2. Problem
3. GloVe Model
4. Experiments

slide-4
SLIDE 4

Word Representations: A history

slide-7
SLIDE 7

Neural Language Models – Recurrent NNLM

[Figure: recurrent NNLM diagram with nodes h_{t−1}, w_t, h_t, and an output at step t+1.]

slide-9
SLIDE 9

Neural Language Models – Continuous BOW

[Figure: continuous bag-of-words diagram with context words w_{t−2}, w_{t−1}, w_{t+1}, w_{t+2}, their average w_avg, and an output at step t.]

slide-10
SLIDE 10

Linear Relationships

Semantic: $w_{\text{king}} - w_{\text{man}} + w_{\text{woman}} \approx w_{\text{queen}}$

Syntactic: $w_{\text{easy}} - w_{\text{easiest}} + w_{\text{luckiest}} \approx w_{\text{lucky}}$

slide-13
SLIDE 13

Scalable Embedding Learning

Noise Contrastive Estimation: no more normalization required!

[Figure: the same diagram with context words w_{t−2}, w_{t−1}, w_{t+1}, w_{t+2} and their average w_avg, now pairing w_avg with a word w.]

slide-14
SLIDE 14

GloVe

1. Introduction
2. Problem
3. GloVe Model
4. Experiments

slide-28
SLIDE 28

Local Online Optimization

Lots of time spent scanning context windows to learn a distribution for P(w|the)

Theatre or theater is a collaborative form of fine art that uses live performers to present the experience of a real or imagined event before a live audience in a specific place. The performers may communicate this experience to the audience through combinations of gesture, speech, song, music, and dance. Elements of art and stagecraft are used to enhance the physicality, presence and immediacy of the experience. The specific place of the performance is also named by the word "theatre" as derived from the Ancient Greek θέατρον (théatron, "a place for viewing"), itself from θεάομαι (theáomai, "to see", "to watch", "to observe"). Modern Western theatre comes, in large measure, from ancient Greek drama, from which it borrows technical terminology, classification into genres, and many of its themes, stock characters, and plot elements. Theatre artist Patrice Pavis defines theatricality, theatrical language, stage writing, and the specificity of theatre as synonymous expressions that differentiate theatre from the other performing arts, literature, and the arts in general. Theatre today, broadly defined, includes performances of plays and musicals, ballets, operas and various other forms.

There’s got to be a better way!

slide-30
SLIDE 30

Matrix Factorization Methods

Methods such as SVD and COALS factorize the co-occurrence matrix directly.

Main drawback: frequent words like "the" and "a" have an outsized effect on the learned representations.
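
A minimal sketch of the matrix-factorization idea, assuming a tiny hand-made co-occurrence matrix and a truncated SVD. The function name `svd_embeddings`, the toy counts, and the choice to split the singular values evenly between word and context vectors are illustrative assumptions, not something the slides specify.

```python
import numpy as np


def svd_embeddings(X, d):
    """Return d-dimensional word and context vectors from a truncated SVD of X."""
    U, S, Vt = np.linalg.svd(X, full_matrices=False)
    scale = np.sqrt(S[:d])          # assumption: share singular values evenly
    word_vecs = U[:, :d] * scale    # one row per word
    ctx_vecs = Vt[:d, :].T * scale  # one row per context word
    return word_vecs, ctx_vecs


# Hypothetical symmetric co-occurrence counts for a 3-word vocabulary.
X = np.array([[0.0, 4.0, 1.0],
              [4.0, 0.0, 2.0],
              [1.0, 2.0, 0.0]])

W2, C2 = svd_embeddings(X, d=2)   # compressed 2-d embeddings
W3, C3 = svd_embeddings(X, d=3)   # full rank: reproduces X exactly
print(W2.shape, np.allclose(W3 @ C3.T, X))
```

Because the factorization fits raw counts, the largest entries (typically those involving very frequent words) dominate the reconstruction error, which is the drawback noted on the slide.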

slide-31
SLIDE 31

GloVe

1. Introduction
2. Problem
3. GloVe Model
4. Experiments

slide-32
SLIDE 32

GloVe Model

$$J = \sum_{i,j=1}^{V} f(X_{ij}) \left( w_i^T \tilde{w}_j + b_i + \tilde{b}_j - \log X_{ij} \right)^2$$

slide-38
SLIDE 38

Notation!

$X \in \mathbb{R}^{V \times V}$: word co-occurrence matrix

$X_{ij}$: frequency of word $i$ co-occurring with word $j$

$X_i = \sum_{k=1}^{V} X_{ik}$: total number of times any word occurs in the context of word $i$ in the corpus

$P_{ij} = P(j \mid i) = \frac{X_{ij}}{X_i}$: a.k.a. probability of word $j$ occurring within the context of word $i$

$w \in \mathbb{R}^d$: a word embedding of dimension $d$

$\tilde{w} \in \mathbb{R}^d$: a context word embedding of dimension $d$
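
The notation can be made concrete with a small sketch that builds X, X_i, and P from a toy sentence. The helper `cooccurrence_matrix`, the example sentence, the symmetric window, and the use of unweighted counts are all assumptions made for illustration.

```python
import numpy as np


def cooccurrence_matrix(tokens, window=1):
    """Count how often each word appears within `window` positions of another."""
    vocab = sorted(set(tokens))
    index = {word: i for i, word in enumerate(vocab)}
    X = np.zeros((len(vocab), len(vocab)))
    for pos, word in enumerate(tokens):
        lo = max(0, pos - window)
        hi = min(len(tokens), pos + window + 1)
        for ctx_pos in range(lo, hi):
            if ctx_pos != pos:
                X[index[word], index[tokens[ctx_pos]]] += 1.0
    return X, vocab


tokens = "the cat sat on the mat".split()
X, vocab = cooccurrence_matrix(tokens, window=1)

X_i = X.sum(axis=1)       # X_i = sum_k X_ik
P = X / X_i[:, None]      # P_ij = X_ij / X_i, each row is P(. | i)

the, cat = vocab.index("the"), vocab.index("cat")
print(vocab, X_i[the], P[the, cat])
```

A single pass over the corpus produces every count at once, so P(w|the) is read off one row of X rather than being learned by repeatedly scanning context windows.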

slide-43
SLIDE 43

Motivation

Prob. and Ratio         k = solid     k = gas       k = water     k = fashion
P(k|ice)                1.9 × 10^-4   6.6 × 10^-5   3.0 × 10^-3   1.7 × 10^-5
P(k|steam)              2.2 × 10^-5   7.8 × 10^-4   2.2 × 10^-3   1.8 × 10^-5
P(k|ice) / P(k|steam)   8.9           8.5 × 10^-2   1.36          0.96
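
As a quick check of the pattern in the table, the ratios can be recomputed from the printed probabilities. This is only a sketch: the dictionary names are illustrative, and because the inputs are the rounded values shown on the slide, the recomputed ratios come out close to, but not exactly equal to, the printed ones.

```python
# Probabilities copied from the table above.
p_ice = {"solid": 1.9e-4, "gas": 6.6e-5, "water": 3.0e-3, "fashion": 1.7e-5}
p_steam = {"solid": 2.2e-5, "gas": 7.8e-4, "water": 2.2e-3, "fashion": 1.8e-5}

ratios = {k: p_ice[k] / p_steam[k] for k in p_ice}

for k, r in ratios.items():
    # Large ratio: k relates to ice; small ratio: k relates to steam;
    # ratio near 1: k relates to both or to neither.
    print(f"{k:8s} {r:.3g}")
```

The ratio separates the discriminative words (solid, gas) from the non-discriminative ones (water, fashion) far more clearly than either raw probability does.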

slide-45
SLIDE 45

Derivation

$$F(w_i, w_j, \tilde{w}_k) = \frac{P_{ik}}{P_{jk}}$$

F should encode information in the ratio $\frac{P_{ik}}{P_{jk}}$.

slide-46
SLIDE 46

Derivation

$$F(w_i - w_j, \tilde{w}_k) = \frac{P_{ik}}{P_{jk}}$$

slide-50
SLIDE 50

Derivation

$$F\left((w_i - w_j)^T \tilde{w}_k\right) = \frac{P_{ik}}{P_{jk}}$$

Some more desiderata: F should be unchanged by exchanging $w \leftrightarrow \tilde{w}$ and $X \leftrightarrow X^T$.

This requires that

$$F\left((w_i - w_j)^T \tilde{w}_k\right) = \frac{F\left(w_i^T \tilde{w}_k\right)}{F\left(w_j^T \tilde{w}_k\right)} \;\Rightarrow\; F\left(w_i^T \tilde{w}_k\right) = P_{ik}$$

$$F\left(w_i^T \tilde{w}_k - w_j^T \tilde{w}_k\right) = \frac{F\left(w_i^T \tilde{w}_k\right)}{F\left(w_j^T \tilde{w}_k\right)}$$

With $F = \exp$:

$$\exp\left(w_i^T \tilde{w}_k - w_j^T \tilde{w}_k\right) = \frac{\exp\left(w_i^T \tilde{w}_k\right)}{\exp\left(w_j^T \tilde{w}_k\right)}$$

slide-56
SLIDE 56

Derivation

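$$\exp\left(w_i^T \tilde{w}_k - w_j^T \tilde{w}_k\right) = \frac{\exp\left(w_i^T \tilde{w}_k\right)}{\exp\left(w_j^T \tilde{w}_k\right)}$$

$$w_i^T \tilde{w}_k = \log P_{ik} = \log X_{ik} - \log X_i$$

The term $\log X_i$ does not depend on $k$, so it is absorbed into a bias $b_i$; a second bias $\tilde{b}_k$ is added to keep the expression symmetric in the two words:

$$w_i^T \tilde{w}_k = \log X_{ik} - b_i - \tilde{b}_k$$

$$w_i^T \tilde{w}_k + b_i + \tilde{b}_k = \log X_{ik}$$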

slide-61
SLIDE 61

Derivation

This suggests a least-squares objective function, but...

$$J = \sum_{i,j=1}^{V} \left( w_i^T \tilde{w}_j + b_i + \tilde{b}_j - \log X_{ij} \right)^2$$

$$\Rightarrow\; J = \sum_{i,j=1}^{V} f(X_{ij}) \left( w_i^T \tilde{w}_j + b_i + \tilde{b}_j - \log X_{ij} \right)^2$$

where f has the following desiderata:

1. f(0) = 0
2. f(x) should be non-decreasing so that rare co-occurrences are not overweighted.
3. f(x) should be relatively small for large values of x, so that frequent co-occurrences are not overweighted.

and f is defined as

$$f(x) = \begin{cases} \left( \dfrac{x}{x_{\max}} \right)^{\alpha} & \text{if } x < x_{\max} \\ 1 & \text{otherwise} \end{cases}$$

slide-63
SLIDE 63

Weighting Function

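
A small sketch of the weighting function f, using the values α = 3/4 and x_max = 100 that the slides report for the paper as defaults. The function name `glove_weight` and the vectorized NumPy form are illustrative choices.

```python
import numpy as np


def glove_weight(x, x_max=100.0, alpha=0.75):
    """f(x) = (x / x_max)^alpha for x < x_max, and 1 otherwise."""
    x = np.asarray(x, dtype=float)
    return np.where(x < x_max, (x / x_max) ** alpha, 1.0)


counts = np.array([0.0, 1.0, 50.0, 100.0, 1000.0])
weights = glove_weight(counts)
print(weights)  # rises from 0 and is capped at 1 for counts >= x_max
```

The three desiderata are visible in the output: the weight is zero at zero, it never decreases as the count grows, and it stops growing once the count reaches x_max.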

slide-64
SLIDE 64

Optimization

$$J = \sum_{i,j=1}^{V} f(X_{ij}) \left( w_i^T \tilde{w}_j + b_i + \tilde{b}_j - \log X_{ij} \right)^2$$

where

$$f(x) = \begin{cases} \left( \dfrac{x}{x_{\max}} \right)^{\alpha} & \text{if } x < x_{\max} \\ 1 & \text{otherwise} \end{cases}$$

In this paper: $\alpha = \frac{3}{4}$ and $x_{\max} = 100$.

The model is trained using AdaGrad, stochastically sampling non-zero elements from X. An initial learning rate of 0.05 is used.
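
The training procedure can be sketched as follows. This is an illustrative toy, not the authors' implementation: the co-occurrence matrix, the embedding dimension, the initialization, the epoch count, the `eps` term, and all function names are assumptions; only the objective, α, x_max, AdaGrad, and the 0.05 learning rate come from the slide.

```python
import numpy as np


def weight(x, x_max=100.0, alpha=0.75):
    """f(x) from the slide: (x / x_max)^alpha below x_max, 1 otherwise."""
    return (x / x_max) ** alpha if x < x_max else 1.0


def loss(X, W, C, b, c):
    """Objective J, summed over the non-zero entries of X only."""
    total = 0.0
    for i, j in np.argwhere(X > 0):
        diff = W[i] @ C[j] + b[i] + c[j] - np.log(X[i, j])
        total += weight(X[i, j]) * diff ** 2
    return total


def train(X, dim=4, epochs=100, lr=0.05, eps=1e-8, seed=0):
    rng = np.random.default_rng(seed)
    V = X.shape[0]
    # Small random vectors and zero biases (initialization is an assumption).
    W = rng.uniform(-0.5, 0.5, (V, dim)) / dim
    C = rng.uniform(-0.5, 0.5, (V, dim)) / dim
    b, c = np.zeros(V), np.zeros(V)
    # AdaGrad keeps a running sum of squared gradients per parameter.
    gW, gC = np.zeros_like(W), np.zeros_like(C)
    gb, gc = np.zeros(V), np.zeros(V)

    pairs = np.argwhere(X > 0)
    history = [loss(X, W, C, b, c)]
    for _ in range(epochs):
        rng.shuffle(pairs)  # visit the non-zero entries in random order
        for i, j in pairs:
            diff = W[i] @ C[j] + b[i] + c[j] - np.log(X[i, j])
            scale = 2.0 * weight(X[i, j]) * diff  # dJ/d(diff) for this entry
            grad_w, grad_c = scale * C[j], scale * W[i]
            gW[i] += grad_w ** 2
            gC[j] += grad_c ** 2
            gb[i] += scale ** 2
            gc[j] += scale ** 2
            W[i] -= lr * grad_w / np.sqrt(gW[i] + eps)
            C[j] -= lr * grad_c / np.sqrt(gC[j] + eps)
            b[i] -= lr * scale / np.sqrt(gb[i] + eps)
            c[j] -= lr * scale / np.sqrt(gc[j] + eps)
        history.append(loss(X, W, C, b, c))
    return W, C, b, c, history


# Hypothetical co-occurrence counts for a 4-word vocabulary.
X = np.array([[0.0, 8.0, 2.0, 1.0],
              [8.0, 0.0, 3.0, 1.0],
              [2.0, 3.0, 0.0, 5.0],
              [1.0, 1.0, 5.0, 0.0]])
W, C, b, c, history = train(X)
print(history[0], history[-1])
```

Only non-zero entries are visited, which sidesteps log 0 and keeps the cost proportional to the number of observed co-occurrences rather than to V².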

slide-65
SLIDE 65

GloVe

1. Introduction
2. Problem
3. GloVe Model
4. Experiments

slide-68
SLIDE 68

Word Analogies

a is to b as c is to ?

Paris is to France as Tokyo is to ?

$$\arg\max_{w'} \; \text{cosine-sim}(w_b - w_a + w_c, \; w')$$
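
The analogy rule can be sketched with a handful of hand-made vectors. The 3-dimensional embeddings below are hypothetical values chosen so the arithmetic is easy to follow, and excluding the three query words from the candidates is an assumption (a common convention) rather than something stated on the slide.

```python
import numpy as np

# Hypothetical toy embeddings, chosen by hand for illustration only.
vectors = {
    "paris":   np.array([1.0, 0.0, 0.0]),
    "france":  np.array([1.0, 1.0, 0.0]),
    "tokyo":   np.array([0.0, 0.0, 1.0]),
    "japan":   np.array([0.0, 1.0, 1.0]),
    "berlin":  np.array([-1.0, 0.0, 0.0]),
    "germany": np.array([-1.0, 1.0, 0.0]),
}


def cosine_sim(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))


def analogy(a, b, c):
    """Answer 'a is to b as c is to ?' via arg max of cosine similarity."""
    query = vectors[b] - vectors[a] + vectors[c]
    # Assumption: the query words themselves are not allowed as answers.
    candidates = [w for w in vectors if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine_sim(query, vectors[w]))


print(analogy("paris", "france", "tokyo"))
```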

slide-69
SLIDE 69

Word Analogies – Results

slide-72
SLIDE 72

Word Similarities

Humans scored similarity of word pairs.

word 1   word 2    mean human score (0-10)   cosine similarity (-1 to 1)
king     cabbage   0.23                      0.11
king     queen     8.58                      0.78
king     rook      5.92                      0.25

Embeddings are evaluated by Spearman rank correlation of human scores to cosine similarity.
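
A minimal sketch of the evaluation metric, applied to the three example pairs from the table. The helper names are illustrative, and the ranking routine assumes there are no tied scores (ties would need average ranks, which this sketch does not implement).

```python
import numpy as np


def ranks(values):
    """Rank positions 0..n-1 of each value; assumes no ties."""
    order = np.argsort(values)
    result = np.empty(len(values))
    result[order] = np.arange(len(values))
    return result


def spearman(a, b):
    """Spearman correlation = Pearson correlation of the ranks."""
    return float(np.corrcoef(ranks(a), ranks(b))[0, 1])


human = [0.23, 8.58, 5.92]    # king-cabbage, king-queen, king-rook
cosine = [0.11, 0.78, 0.25]

rho = spearman(human, cosine)
print(rho)
```

Because only the ordering matters, the differing scales of the human scores and the cosine similarities do not affect the result; on these three pairs both columns order the pairs identically.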

slide-73
SLIDE 73

Word Similarities – Results

slide-75
SLIDE 75

Named Entity Recognition

NER is a sequence tagging task where the goal is to identify named entities:

Jim     bought   300   shares   of   Acme    Corp    .       in   2006     .
B-PER   O        O     O        O    B-ORG   I-ORG   I-ORG   O    B-TIME   O

The discrete features of an existing system (Stanford NER) were combined with word embeddings, which were treated as additional features in a linear-chain CRF model.
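
One way to picture "embeddings as additional features" is to concatenate a token's discrete indicators with its embedding before handing the result to the tagger. Everything in this sketch is hypothetical: the two discrete features, the 3-dimensional embedding table, and the function name are stand-ins, and the CRF itself is not shown.

```python
import numpy as np

# Hypothetical 3-d embeddings for a few lowercased tokens.
embeddings = {
    "jim":  np.array([0.2, -0.1, 0.7]),
    "acme": np.array([0.9, 0.3, -0.4]),
}
DIM = 3


def token_features(token):
    """Concatenate simple discrete indicators with the token's embedding."""
    discrete = np.array([
        float(token[0].isupper()),  # starts with a capital letter
        float(token.isdigit()),     # consists only of digits
    ])
    # Unknown words fall back to a zero vector (an assumption).
    dense = embeddings.get(token.lower(), np.zeros(DIM))
    return np.concatenate([discrete, dense])


sentence = "Jim bought 300 shares of Acme Corp . in 2006 .".split()
features = np.stack([token_features(tok) for tok in sentence])
print(features.shape)
```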

slide-76
SLIDE 76

Named Entity Recognition – Results


slide-77
SLIDE 77

The end! Thanks!
