Improving the Compositionality of Word Embeddings
MASTER THESIS
Author: Thijs SCHEEPERS
Supervisors: dr. Evangelos KANOULAS, dr. Efstratios GAVVES
Truly understanding
A far-out goal for Artificial Intelligence
Such a simple question
from Her by Spike Jonze (2013)
01010111 01101000 01100001 01110100 00100000 01101001 01110011 00100000 01111001 01101111 01110101 01110010 00100000 01101110 01100001 01101101 01100101 00111111
Transforming to Binary
ASCII
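The transformation on the slide can be sketched in a few lines: each character of the question is mapped to its 8-bit ASCII code, and the process is reversible.

```python
# Minimal sketch of the slide's "transforming to binary" step:
# every character is mapped to its 8-bit ASCII code.
text = "What is your name?"
bits = " ".join(format(ord(ch), "08b") for ch in text)
print(bits[:35])  # 01010111 01101000 01100001 01110100  ("What")

# Decoding the bit groups recovers the original sentence.
decoded = "".join(chr(int(b, 2)) for b in bits.split())
print(decoded)    # What is your name?
```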
Bag-of-words
What is your name → a sparse bag-of-words vector over a vocabulary of roughly 100,000 words: a 1 at each word's index, zeros everywhere else
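A bag-of-words encoding can be sketched as follows; the toy vocabulary below stands in for the ~100,000-word vocabulary on the slide, so the real vectors are extremely sparse.

```python
# Sketch: bag-of-words over a tiny toy vocabulary (the thesis
# setting uses ~100,000 words, so real vectors are mostly zeros).
vocab = {"what": 0, "is": 1, "your": 2, "name": 3, "cat": 4}

def bag_of_words(sentence, vocab):
    vec = [0] * len(vocab)
    for word in sentence.lower().split():
        if word in vocab:
            vec[vocab[word]] += 1
    return vec

print(bag_of_words("What is your name", vocab))  # [1, 1, 1, 1, 0]
```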
Word Embeddings
What is your name → four dense 300-dimensional word embedding vectors (e.g. [0.23, 1.56, …], [0.93, 1.62, …], [0.82, …, 0.91], [0.87, 1.32, …])
[Word-frequency figure: the most common words in the lexicographic definitions — the, a, in, and, to, that, an, with, by, for, as, is, from, …]
⅕ · [ (‘Berlin’ − ‘Germany’) + (‘Stockholm’ − ‘Sweden’) + (‘Washington DC’ − ‘United States’) + (‘Beijing’ − ‘China’) + (‘London’ − ‘United Kingdom’) ] ≈ {capital}
[2-D projection of country and capital embeddings, showing near-parallel country→capital offsets: Japan–Tokyo, France–Paris, Russia–Moscow, Germany–Berlin, Italy–Rome, Spain–Madrid, Greece–Athens, Turkey–Ankara, China–Beijing, Poland–Warsaw, Portugal–Lisbon]
from Mikolov et al. (2013)
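The averaged "capital" offset can be illustrated with toy vectors; everything below is made up for the sketch (real embeddings would come from e.g. word2vec), and the capitals are constructed so the relation holds exactly.

```python
import numpy as np

# Toy illustration of the averaged {capital} offset from the slide.
# Vectors are synthetic: each capital is built as country + offset,
# so averaging the five differences recovers the offset exactly.
rng = np.random.default_rng(0)
offset = rng.normal(size=8)  # the latent {capital} direction
countries = {c: rng.normal(size=8) for c in
             ["Germany", "Sweden", "United States", "China", "United Kingdom"]}
capitals = {"Berlin": countries["Germany"] + offset,
            "Stockholm": countries["Sweden"] + offset,
            "Washington DC": countries["United States"] + offset,
            "Beijing": countries["China"] + offset,
            "London": countries["United Kingdom"] + offset}

pairs = [("Berlin", "Germany"), ("Stockholm", "Sweden"),
         ("Washington DC", "United States"), ("Beijing", "China"),
         ("London", "United Kingdom")]
estimate = np.mean([capitals[a] - countries[b] for a, b in pairs], axis=0)
print(np.allclose(estimate, offset))  # True by construction
```

In real embeddings the differences only approximately agree, which is why averaging over several pairs gives a more stable direction than any single pair.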
Combine encodings of word meanings in such a way that a good encoding of their joint meaning is created
Word Embedding Composition
What is your name → four 300-dimensional word embeddings, composed into a single 300-dimensional vector (e.g. [1.65, …, 1.63, 0.99])
1. Evaluating compositionality
2. Tuning word embeddings for better algebraic composition
3. Neural methods for composing word embeddings
Introducing CompVecEval, a method to evaluate word embeddings on their compositionality
A pragmatic solution for word meaning
A small domesticated carnivorous mammal with soft fur, a short snout, and retractable claws. It is widely kept as a pet or for catching mice, and many breeds have been developed.
A method of examining body organs by scanning them with X-rays and using a computer to construct a series of cross-sectional scans along a single axis.
[Diagram: the definition words x[0…2] = “a human being” are composed by f_c into a vector c, which should lie close to the embedding of the lemma “person”]
1. WordNet (Miller and Fellbaum 1998)
2. We use 4,119 datapoints for our evaluation method, and 72,322 datapoints for tuning
1. Word2Vec (Mikolov et al. 2013)
2. GloVe (Pennington et al. 2014)
3. fastText (Bojanowski et al. 2016)
4. Paragram (Wieting et al. 2015)
[Skip-Gram diagram: the centre word w_t in “cat ate the mouse” predicts its context words w_{t−2}, w_{t−1}, w_{t+1}, w_{t+2}, e.g. “ate” predicts “the” and “cat”]
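The (centre, context) training pairs that Skip-Gram is trained on can be sketched directly; the window size of 2 matches the four context positions on the slide.

```python
# Sketch: generate Skip-Gram (centre, context) training pairs for
# the slide's example sentence, with a symmetric window of 2.
def skipgram_pairs(tokens, window=2):
    pairs = []
    for t, center in enumerate(tokens):
        for j in range(max(0, t - window), min(len(tokens), t + window + 1)):
            if j != t:
                pairs.append((center, tokens[j]))
    return pairs

pairs = skipgram_pairs("the cat ate the mouse".split())
print(pairs[:4])  # [('the', 'cat'), ('the', 'ate'), ('cat', 'the'), ('cat', 'ate')]
```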
Additive Compositionality proven for Skip-Gram*
[First page of the cited paper: Alex Gittens, Dimitris Achlioptas, and Michael W. Mahoney, “Skip-Gram – Zipf + Uniform = Vector Additivity”, Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics, pages 69–76, Vancouver, Canada, July 30 – August 4, 2017. https://doi.org/10.18653/v1/P17-1007 — the paper provides a theoretical justification for the additive compositionality of word vectors learned with the Skip-Gram model, shows that additivity holds in a strict sense (small distance rather than small angle) under certain assumptions on the corpus-generating process, and connects Skip-Gram to the Sufficient Dimensionality Reduction framework of Globerson and Tishby]
1. Uniform distribution is assumed
2. Definition of compositionality
1. We rank lemmas according to their Euclidean distance
2. We use a ball-tree algorithm to make this efficient
3. We considered several ranking metrics, and chose to use Mean Reciprocal Rank
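The ranking step can be sketched as follows; the thesis uses a ball tree for speed, but this illustration uses brute-force Euclidean distances, and the toy lemma vectors below are made up.

```python
import numpy as np

# Sketch of CompVecEval-style scoring: each composed definition
# vector is ranked against all lemma embeddings by Euclidean
# distance, and Mean Reciprocal Rank is computed over the queries.
# (The thesis accelerates the search with a ball tree.)
def mean_reciprocal_rank(composed, lemma_vecs, target_idx):
    rr = []
    for c, t in zip(composed, target_idx):
        dists = np.linalg.norm(lemma_vecs - c, axis=1)
        rank = np.argsort(dists).tolist().index(t) + 1  # 1-based rank of target
        rr.append(1.0 / rank)
    return float(np.mean(rr))

lemmas = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 0.0]])   # toy lemma embeddings
queries = np.array([[0.9, 1.1], [2.1, 0.1]])              # composed definitions
print(mean_reciprocal_rank(queries, lemmas, [1, 2]))      # 1.0: both ranked first
```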
1. Addition
2. Averaging
3. Multiplication
4. Max-pooling
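The four algebraic composition functions above can be sketched in one helper; each reduces a stack of word embeddings (one row per word) to a single vector of the same dimension.

```python
import numpy as np

# The four element-wise composition functions from the slide.
def compose(embs, method):
    if method == "addition":
        return embs.sum(axis=0)
    if method == "averaging":
        return embs.mean(axis=0)
    if method == "multiplication":
        return embs.prod(axis=0)
    if method == "max-pooling":
        return embs.max(axis=0)
    raise ValueError(f"unknown method: {method}")

words = np.array([[1.0, 2.0],
                  [3.0, 0.5]])        # two toy 2-d word embeddings
print(compose(words, "addition"))     # [4.  2.5]
print(compose(words, "max-pooling"))  # [3. 2.]
```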
Improving existing word embeddings by tuning them to algebraically compose lexicographic data
[Diagram: the definition x[0…2] = “a human being” is composed by f_c into c; the distance to the positive lemma y_p (“person”) is minimized, while the distance to a negative y_n (a random lemma) is maximized]
triplet loss := $\sum_{i=1}^{N} \max\big( \lVert c_i - y_i^p \rVert_2 - \lVert c_i - y_i^n \rVert_2 + \alpha,\ 0 \big)$
1. Triplet loss function
2. Negative example
3. Within a margin
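The triplet loss can be sketched directly from the formula: pull each composed vector c_i towards its positive lemma y_p and push it away from a random negative y_n, up to the margin α (plain Euclidean norms here; the toy vectors are made up).

```python
import numpy as np

# Sketch of the triplet loss: sum over i of
#   max(||c_i - y_p_i||_2 - ||c_i - y_n_i||_2 + alpha, 0)
def triplet_loss(c, y_pos, y_neg, alpha=1.0):
    d_pos = np.linalg.norm(c - y_pos, axis=1)  # distance to positive lemma
    d_neg = np.linalg.norm(c - y_neg, axis=1)  # distance to negative lemma
    return float(np.maximum(d_pos - d_neg + alpha, 0.0).sum())

c = np.array([[0.0, 0.0]])
y_pos = np.array([[0.0, 1.0]])  # distance 1 from c
y_neg = np.array([[3.0, 0.0]])  # distance 3 from c
print(triplet_loss(c, y_pos, y_neg, alpha=1.0))  # max(1 - 3 + 1, 0) = 0.0
```

With a larger margin the same triplet starts contributing: α = 3 gives max(1 − 3 + 3, 0) = 1.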
1. We evaluated using CompVecEval
2. But also using 15 existing sentence representation evaluation methods
3. And 13 existing word representation evaluation methods
SENTENCE REPRESENTATION EVALUATION
WORD REPRESENTATION EVALUATION
Instead of tuning word embeddings for algebraic composition, we now turn to learnable composition functions
1. Projection function
2. Recurrent composition functions
3. Convolutional composition functions
[Diagram: recurrent composition — the embeddings x[0…2] of “a human being” are fed through a GRU cell f_gru (inputs x_i, h_prev → h_next), producing hidden states h₋₁, h₀, h₁, h₂; the final state c = h₂ is trained to match “person”]
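Recurrent composition with a GRU cell can be sketched as below; the weights are random stand-ins (in the thesis they would be learned), and the three input vectors play the role of the embeddings of “a human being”.

```python
import numpy as np

# Minimal numpy sketch of GRU-based composition: the word
# embeddings x_0..x_2 are consumed one at a time, and the final
# hidden state is taken as the composed vector c.
def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_compose(xs, Wz, Wr, Wh, dim):
    h = np.zeros(dim)
    for x in xs:
        xh = np.concatenate([x, h])
        z = sigmoid(Wz @ xh)                              # update gate
        r = sigmoid(Wr @ xh)                              # reset gate
        h_tilde = np.tanh(Wh @ np.concatenate([x, r * h]))  # candidate state
        h = (1 - z) * h + z * h_tilde
    return h  # c = h_T

rng = np.random.default_rng(0)
dim = 4
Wz, Wr, Wh = (rng.normal(size=(dim, 2 * dim)) for _ in range(3))
words = [rng.normal(size=dim) for _ in range(3)]  # embeddings of "a human being"
c = gru_compose(words, Wz, Wr, Wh, dim)
print(c.shape)  # (4,)
```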
Semantic representations and composition can improve when tuned on lexicographic data.
Simple summation is a good composition function; don’t consider averaging.
Please feel free to ask anything
[Backup diagram: the composed definition vector c is pulled towards a composed positive y_p (e.g. the lemma “homo sapiens”, itself composed by f_c) and pushed away from a composed negative y_n built from another lemma]
COMPVECEVAL
ALTERNATIVE VOCABULARY
SENTENCE REPRESENTATION EVALUATION
GLOVE AND WORD2VEC
FASTTEXT AND PARAGRAM
UNIVERSITEIT VAN AMSTERDAM