
slide-1
SLIDE 1

Improving the Compositionality of Word Embeddings

MASTER THESIS

Author: Thijs SCHEEPERS Supervisors:

  • dr. Evangelos KANOULAS
  • dr. Efstratios GAVVES
slide-2
SLIDE 2

A far-out goal for Artificial Intelligence

Truly understanding

slide-3
SLIDE 3

What is your name?

Such a simple question

from Her by Spike Jonze (2013)

slide-4
SLIDE 4

"What is your name?"

01010111 01101000 01100001 01110100 00100000 01101001 01110011 00100000 01111001 01101111 01110101 01110010 00100000 01101110 01100001 01101101 01100101 00111111

Transforming to Binary
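The binary transformation shown above can be reproduced in a couple of lines; a minimal sketch, assuming plain 8-bit ASCII character codes:

```python
def to_binary(text: str) -> str:
    """Encode each character as its 8-bit ASCII code point."""
    return " ".join(format(ord(ch), "08b") for ch in text)

print(to_binary("What is your name?"))
# first group is 01010111, the letter 'W'
```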

slide-5
SLIDE 5

"What is your name?"

01010111 01101000 01100001 01110100 00100000 01101001 01110011 00100000 01111001 01101111 01110101 01110010 00100000 01101110 01100001 01101101 01100101 00111111

ASCII

slide-6
SLIDE 6

"What is your name?"

What is your name

1 … 1 … … 1 … … 1

Bag-of-words

100,000
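A bag-of-words vector like the sparse 100,000-dimensional one above can be sketched with a toy vocabulary; the words and counts below are purely illustrative:

```python
def bag_of_words(sentence, vocabulary):
    """Count how often each vocabulary word occurs in the sentence.

    Real systems use vocabularies of ~100,000 words, so the
    resulting vector is almost entirely zeros (sparse).
    """
    counts = [0] * len(vocabulary)
    index = {word: i for i, word in enumerate(vocabulary)}
    for token in sentence.lower().split():
        if token in index:
            counts[index[token]] += 1
    return counts

vocab = ["what", "is", "your", "name", "cat", "mouse"]  # toy vocabulary
print(bag_of_words("What is your name", vocab))  # [1, 1, 1, 1, 0, 0]
```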

slide-7
SLIDE 7

Improving the Compositionality of
 Word Embeddings

TITLE OF THE MASTER THESIS

slide-8
SLIDE 8

"What is your name?"

What is your name

[Figure: each word is mapped to a 300-dimensional vector of real numbers, e.g. (0.23, 1.56, …, 0.78)]

Word Embeddings

slide-9
SLIDE 9

Word Embeddings encode Lexical Semantics, i.e. word meaning

What is your name

[Figure: the same four words, each with its 300-dimensional embedding vector]

slide-10
SLIDE 10

[Figure: plot of the most frequent words in the dictionary definitions, e.g. "a", "the", "in", "and", "to", "that", "not", "genus", "person", "plant"]

slide-11
SLIDE 11

Word Embedding space ‘Netherlands’ + {capital} = ‘Amsterdam’

⅕ · ((‘Berlin’ − ‘Germany’) + (‘Stockholm’ − ‘Sweden’) + (‘Washington DC’ − ‘United States’) + (‘Beijing’ − ‘China’) + (‘London’ − ‘United Kingdom’)) ≈ {capital}
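The capital-offset arithmetic above can be sketched with toy 2-D vectors; the numeric values are made up for illustration (real embeddings are 300-dimensional):

```python
def vec_sub(a, b):
    return [x - y for x, y in zip(a, b)]

def vec_add(a, b):
    return [x + y for x, y in zip(a, b)]

def vec_scale(s, a):
    return [s * x for x in a]

# Toy 2-D embeddings -- purely illustrative values.
emb = {
    "Berlin": [2.0, 1.0], "Germany": [1.0, 0.0],
    "Paris":  [2.1, 1.2], "France":  [1.1, 0.2],
}

# Average the capital-minus-country offsets to estimate {capital}.
offsets = [vec_sub(emb["Berlin"], emb["Germany"]),
           vec_sub(emb["Paris"], emb["France"])]
capital = vec_scale(1 / len(offsets),
                    [sum(component) for component in zip(*offsets)])
print(vec_add(emb["Germany"], capital))  # lands near emb["Berlin"]
```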

slide-12
SLIDE 12
[Figure: two-dimensional projection of country and capital vectors (Japan/Tokyo, France/Paris, Russia/Moscow, Germany/Berlin, Italy/Rome, Spain/Madrid, Greece/Athens, Turkey/Ankara, China/Beijing, Poland/Warsaw, Portugal/Lisbon)]

from Mikolov et al. (2013)

slide-13
SLIDE 13

Improving the Compositionality of
 Word Embeddings

TITLE OF THE MASTER THESIS

slide-14
SLIDE 14

Combine encodings of word meanings in such a way that a good encoding of their joint meaning is created

Word Embedding Composition

slide-15
SLIDE 15

"What is your name?"

[Figure: a composition function f takes the 300-dimensional embeddings of "What", "is", "your", "name" and produces a single vector for the whole sentence: f(What, is, your, name) = sentence embedding]

Word Embedding Composition

slide-16
SLIDE 16

Overview

1. Evaluating compositionality
2. Tuning word embeddings for better algebraic composition
3. Neural methods for composing word embeddings

slide-17
SLIDE 17

Introducing CompVecEval, a method to evaluate word embeddings on their compositionality

1. Evaluating compositionality

slide-18
SLIDE 18

Dictionaries

A pragmatic solution for word meaning

slide-19
SLIDE 19

cat /kat/

A small domesticated carnivorous mammal with soft fur, a short snout, and retractable claws. It is widely kept as a pet or for catching mice, and many breeds have been developed.

slide-20
SLIDE 20
slide-21
SLIDE 21

cat /kat/

A method of examining body organs by scanning them with X-rays and using a computer to construct a series of cross-sectional scans along a single axis.

slide-22
SLIDE 22

[Diagram: the definition tokens x[0…2] = "a", "human", "being" are composed by a function f_c into a single vector c, which should lie close to the embedding of the lemma "person"]
slide-23
SLIDE 23

Dictionary

1. WordNet (Miller and Fellbaum 1998)
2. We use 4,119 datapoints for our evaluation method, and 72,322 datapoints for tuning

slide-24
SLIDE 24

Popular pretrained Word Embeddings

1. Word2Vec (Mikolov et al. 2013)
2. GloVe (Pennington et al. 2014)
3. fastText (Bojanowski et al. 2016)
4. Paragram (Wieting et al. 2015)

slide-25
SLIDE 25

Word2Vec Skip-gram

[Diagram: Skip-gram predicts the context words w(t−2), w(t−1), w(t+1), w(t+2), e.g. "the" and "ate", from the center word w(t) in "cat ate the mouse"]
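Skip-gram's (center, context) training pairs for this sentence can be generated as follows; a minimal sketch with a window size of 2 (the window size here is an assumption, not necessarily the thesis setting):

```python
def skipgram_pairs(tokens, window=2):
    """Yield (center, context) pairs as used for Skip-gram training."""
    pairs = []
    for t, center in enumerate(tokens):
        for offset in range(-window, window + 1):
            c = t + offset
            if offset != 0 and 0 <= c < len(tokens):
                pairs.append((center, tokens[c]))
    return pairs

print(skipgram_pairs("the cat ate the mouse".split()))
# includes ('ate', 'the'), ('ate', 'cat'), ('ate', 'mouse'), ...
```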

slide-26
SLIDE 26

Additive Compositionality proven for Skip-Gram*

"Skip-Gram − Zipf + Uniform = Vector Additivity"
Alex Gittens, Dimitris Achlioptas, and Michael W. Mahoney
Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics, pages 69–76, Vancouver, Canada, 2017. https://doi.org/10.18653/v1/P17-1007

1. Uniform distribution is assumed
2. Definition of compositionality

slide-27
SLIDE 27

Evaluating by Ranking

1. We rank lemmas according to their Euclidean distance
2. We use a ball-tree algorithm to make this efficient
3. We considered several ranking metrics, and chose to use Mean Reciprocal Rank
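The ranking step can be sketched in plain Python. This brute-force version replaces the ball tree with exhaustive distance computation and uses toy 2-D lemma embeddings; the lemmas and values are illustrative:

```python
import math

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def mean_reciprocal_rank(composed, lemma_embeddings, targets):
    """Rank all lemmas by distance to each composed definition vector,
    then average 1/rank of the correct lemma."""
    total = 0.0
    for vec, target in zip(composed, targets):
        ranked = sorted(lemma_embeddings,
                        key=lambda lemma: euclidean(vec, lemma_embeddings[lemma]))
        total += 1.0 / (ranked.index(target) + 1)
    return total / len(composed)

lemmas = {"person": [1.0, 1.0], "cat": [4.0, 0.0], "city": [0.0, 4.0]}
print(mean_reciprocal_rank([[0.9, 1.1]], lemmas, ["person"]))  # 1.0
```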

slide-28
SLIDE 28

Algebraic Composition

1. Addition
2. Averaging
3. Multiplication
4. Max-pooling
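The four algebraic composition functions above can be sketched in a few lines, using toy 2-D vectors for illustration:

```python
def compose(vectors, method="add"):
    """Algebraically compose a list of word vectors into one vector."""
    dims = zip(*vectors)  # iterate per embedding dimension
    if method == "add":   # summation
        return [sum(d) for d in dims]
    if method == "avg":   # averaging
        return [sum(d) / len(vectors) for d in dims]
    if method == "mult":  # element-wise multiplication
        out = []
        for d in dims:
            product = 1.0
            for x in d:
                product *= x
            out.append(product)
        return out
    if method == "max":   # max-pooling per dimension
        return [max(d) for d in dims]
    raise ValueError(method)

words = [[1.0, 2.0], [3.0, 0.5]]
print(compose(words, "add"))  # [4.0, 2.5]
print(compose(words, "max"))  # [3.0, 2.0]
```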

slide-29
SLIDE 29

Evaluation Results

slide-30
SLIDE 30

Improving existing word embeddings by tuning them to algebraically compose lexicographic data

2. Tuning for algebraic composition

slide-31
SLIDE 31

[Diagram: the definition tokens x[0…2] = "a", "human", "being" are composed by f_c into c; the objective minimizes the distance from c to the positive lemma yᵖ ("person") and maximizes the distance to a negative example yⁿ (a random lemma)]

slide-32
SLIDE 32

Objective Function

triplet loss := Σᵢ₌₁ᴺ max( ‖cᵢ − yᵢᵖ‖₂ − ‖cᵢ − yᵢⁿ‖₂ + α , 0 )

1. Triplet loss function
2. Negative example
3. Within a margin α
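The objective above can be sketched numerically; optimization of the embeddings is omitted, and the vectors and margin value below are illustrative:

```python
import math

def distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def triplet_loss(c, y_pos, y_neg, alpha=0.5):
    """Sum of max(||c_i - y_i^p|| - ||c_i - y_i^n|| + alpha, 0) over the batch."""
    return sum(max(distance(ci, p) - distance(ci, n) + alpha, 0.0)
               for ci, p, n in zip(c, y_pos, y_neg))

composed = [[1.0, 1.0]]  # composed definition vector
positive = [[1.0, 1.2]]  # embedding of the correct lemma
negative = [[4.0, 4.0]]  # embedding of a random lemma
print(triplet_loss(composed, positive, negative))  # 0.0: already within margin
```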

slide-33
SLIDE 33

CompVecEval

slide-34
SLIDE 34

Evaluating tuned embeddings

1. We evaluated using CompVecEval
2. But also using 15 existing sentence representation evaluation methods
3. And 13 existing word representation evaluation methods

slide-35
SLIDE 35

STS14

SENTENCE REPRESENTATION EVALUATION

slide-36
SLIDE 36

STS14

SENTENCE REPRESENTATION EVALUATION

slide-37
SLIDE 37

SimLex-999

WORD REPRESENTATION EVALUATION

slide-38
SLIDE 38

SimLex-999

WORD REPRESENTATION EVALUATION

slide-39
SLIDE 39

Instead of tuning word embeddings for algebraic composition, we now turn to learnable composition functions

3. Neural models for composing embeddings

slide-40
SLIDE 40

Learnable composition functions

1. Projection function
2. Recurrent composition functions
3. Convolutional composition functions

slide-41
SLIDE 41

“I have to read this book.” “I have this book to read.”

slide-42
SLIDE 42

[Diagram: f_c is applied step by step to "a", "human", "being" (x[0…2]), carrying hidden states h₋₁, h₀, h₁; the final state c = h₂ should match the lemma "person"]

Recurrent Composition Function

slide-43
SLIDE 43

Gated Recurrent Unit

[Diagram: the GRU cell f_gru takes the input xᵢ and previous state h_prev, computes an update gate u, a reset gate r, and a candidate state h̃, and emits h_next]
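A single GRU step can be sketched with scalar toy weights; real cells operate on vectors and weight matrices, and the weight values below are made up for illustration:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def gru_cell(x, h_prev, W):
    """One GRU step on a scalar input and state.

    u: update gate, r: reset gate, h_tilde: candidate state.
    W is a dict of scalar weights -- a toy stand-in for weight matrices.
    """
    u = sigmoid(W["ux"] * x + W["uh"] * h_prev)
    r = sigmoid(W["rx"] * x + W["rh"] * h_prev)
    h_tilde = math.tanh(W["cx"] * x + W["ch"] * (r * h_prev))
    return (1.0 - u) * h_prev + u * h_tilde

# Compose a "sentence" of scalar word features by repeated GRU steps.
W = {"ux": 0.5, "uh": 0.1, "rx": 0.3, "rh": 0.2, "cx": 1.0, "ch": 0.8}
h = 0.0
for x in [0.2, -0.4, 0.9]:  # stand-ins for word embeddings
    h = gru_cell(x, h, W)
print(h)  # final state = composed representation
```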

slide-44
SLIDE 44

CompVecEval

slide-45
SLIDE 45

STS14 on GloVe

SENTENCE REPRESENTATION EVALUATION

slide-46
SLIDE 46

SimLex-999 on GloVe

WORD REPRESENTATION EVALUATION

slide-47
SLIDE 47

Expanding to encyclopedic data

slide-48
SLIDE 48
slide-49
SLIDE 49
slide-50
SLIDE 50

Semantic representations and their composition improve when tuned on lexicographic data. Simple summation is a good composition function; averaging should not be the default choice.

Conclusion

slide-51
SLIDE 51

Please feel free to ask anything

Questions?

slide-52
SLIDE 52

Multi-word lemmas

[Diagram: for multi-word lemmas both sides are composed: the definition "a human being" is composed by f_c into c, the positive lemma tokens yᵖ[0…1] ("homo", "sapien") into yᵖ, and negative tokens yⁿ[0…1] ("<lemma>", "<other>") into yⁿ; the objective minimizes the distance from c to yᵖ and maximizes the distance to yⁿ]

slide-53
SLIDE 53

CompVecEval

slide-54
SLIDE 54

Sentence Evaluation

slide-55
SLIDE 55

Word Evaluation

slide-56
SLIDE 56

Vocabulary Overlap

COMPVECEVAL

slide-57
SLIDE 57

CompVecEval

ALTERNATIVE VOCABULARY

slide-58
SLIDE 58

Vocabulary Overlap

SENTENCE REPRESENTATION EVALUATION

slide-59
SLIDE 59

Statistical Significance

GLOVE AND WORD2VEC

slide-60
SLIDE 60

Statistical Significance

FASTTEXT AND PARAGRAM

slide-61
SLIDE 61

Improving the Compositionality of Word Embeddings

MASTER THESIS

Author: Thijs SCHEEPERS Supervisors:

  • dr. Evangelos KANOULAS
  • dr. Efstratios GAVVES
slide-62
SLIDE 62

Thanks!

slide-63
SLIDE 63

Announcements

slide-64
SLIDE 64

[Thesis excerpts shown on this slide:]

FIGURE 5.9: Progression of sentence representation evaluation scores during tuning of word2vec and Paragram with additive, GRU and Bi-GRU composition. Pearson's r × 100 is displayed for STS14 and SICK-R, and accuracy × 100 for SICK-E. Panels: (A) STS14 Word2Vec, (B) STS14 Paragram, (C) SICK Relatedness Word2Vec, (D) SICK Relatedness Paragram, (E) SICK Entailment Word2Vec, (F) SICK Entailment Paragram.

TABLE 3.1: Results from evaluating ranking. MRR, MAP and MP@10 are denoted ×100, as percentages. The random baseline is computed on the dataset itself and is not specific to any of the word embeddings.

                 Word2Vec   GloVe   fastText   Paragram
MRR    random      0.7 %
       +          16.8 %   11.9 %    20.7 %    26.5 %
       avg         2.0 %    3.3 %     3.0 %     3.8 %
       ×           0.6 %    0.9 %     0.9 %     1.0 %
       max         6.6 %   13.7 %    14.6 %    20.5 %
MNR    random     54.2 %
       +          83.9 %   83.5 %    86.3 %    90.3 %
       avg        71.7 %   75.5 %    71.2 %    71.2 %
       ×          62.8 %   65.2 %    59.0 %    54.6 %
       max        63.1 %   83.7 %    78.5 %    85.7 %
MAP    random      0.6 %
       +          15.3 %   10.8 %    18.9 %    24.8 %
       avg         1.8 %    2.9 %     2.6 %     3.4 %
       ×           0.6 %    0.8 %     0.8 %     0.9 %
       max         6.0 %   12.4 %    13.3 %    18.9 %
MP@10  random      0.1 %
       +           2.9 %    2.2 %     3.6 %     5.2 %
       avg         0.3 %    0.6 %     0.5 %     0.8 %
       ×           0.1 %    0.1 %     0.1 %     0.3 %
       max         1.1 %    2.4 %     2.4 %     3.8 %

When comparing pretrained embeddings, Paragram is the clear winner. It is also the only embedding that was already tuned from an original embedding (namely GloVe); the original paper [135] showed significant improvements over GloVe on sentence evaluation tasks, so this is in line with our results. There is also a large margin between composition by averaging and additive composition: averaging is not necessarily the best approach when embeddings are evaluated on more than just their angle. Multiplicative composition clearly fails to produce a meaningful representation, with results close to the random baseline, in direct contradiction of the statements made by Mitchell and Lapata [89]. Composition by max-pooling works surprisingly well, especially considering the loss of information inherent in the operation: it discretely selects the maximum value per embedding dimension, which evidently carries semantic value. Word2Vec performs relatively poorly on this compositional dataset, which the nature of its training data can explain: Word2Vec was trained on news data, whereas fastText and GloVe use more definitional data (Wikipedia and Common Crawl, respectively). Many researchers still use Word2Vec as a starting point for their training procedure; our evaluation shows there are better options. fastText also performs well with additive composition.

UNIVERSITEIT VAN AMSTERDAM

MASTER THESIS

Improving the Compositionality of Word Embeddings

Author: Mathijs J. SCHEEPERS
Supervisors: dr. Evangelos KANOULAS, dr. Efstratios GAVVES
Assessor: prof. dr. Maarten DE RIJKE

A thesis submitted to the Board of Examiners in partial fulfillment of the requirements for the degree of Master of Science in Artificial Intelligence. November 29, 2017.

http://github.com/tscheepers/CompVec

slide-65
SLIDE 65
slide-66
SLIDE 66

Enschede Rightersbleek-Zandvoort 10 2.06 7521 BE Enschede +31 (0)53 711 34 99

slide-67
SLIDE 67

Enschede Rightersbleek-Zandvoort 10 2.06 7521 BE Enschede +31 (0)53 711 34 99 Amsterdam Kruithuisstraat 13 1018 WJ Amsterdam +31 (0)20 261 47 49

slide-68
SLIDE 68

Amsterdam Kruithuisstraat 13 1018 WJ Amsterdam +31 (0)20 261 47 49

slide-69
SLIDE 69

Amsterdam Kruithuisstraat 13 1018 WJ Amsterdam +31 (0)20 261 47 49

slide-70
SLIDE 70
slide-71
SLIDE 71

Now: drinks at café De Polder