improving the compositionality of word embeddings

Improving the Compositionality of Word Embeddings: Master Thesis



  1. Improving the Compositionality of Word Embeddings. Master Thesis. Author: Thijs Scheepers. Supervisors: dr. Evangelos Kanoulas, dr. Efstratios Gavves

  2. Truly understanding: a far-off goal for Artificial Intelligence

  3. “What is your name?” Such a simple question. From Her by Spike Jonze (2013)

  4. “What is your name?” Transforming to Binary: 01010111 01101000 01100001 01110100 00100000 01101001 01110011 00100000 01111001 01101111 01110101 01110010 00100000 01101110 01100001 01101101 01100101 00111111

  5. “What is your name?” in ASCII: 01010111 01101000 01100001 01110100 00100000 01101001 01110011 00100000 01111001 01101111 01110101 01110010 00100000 01101110 01100001 01101101 01100101 00111111
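The ASCII encoding on this slide can be reproduced with a few lines of Python. This is a minimal sketch; the helper name `to_ascii_bits` is made up for illustration and is not part of the presentation.

```python
def to_ascii_bits(text: str) -> str:
    """Encode each character as its 8-bit ASCII code, separated by spaces."""
    return " ".join(format(byte, "08b") for byte in text.encode("ascii"))


bits = to_ascii_bits("What is your name?")
print(bits)
```

The bit string carries no information about meaning: two sentences that mean the same thing can have entirely different encodings.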

  6. “What is your name?” as Bag-of-words: each word (What, is, your, name) becomes a sparse vector of 100,000 dimensions that is 0 everywhere except for a single 1 at the word's vocabulary index
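A minimal sketch of the one-hot and bag-of-words representation, assuming a toy vocabulary of four words instead of the 100,000 shown on the slide. The names `one_hot`, `vocab`, and `word_to_index` are illustrative only.

```python
def one_hot(index: int, size: int) -> list:
    """Return a vector of zeros with a single 1 at the given index."""
    vec = [0] * size
    vec[index] = 1
    return vec


# Toy vocabulary (an assumption for illustration; the slide uses 100,000 words).
vocab = ["is", "name", "what", "your"]
word_to_index = {word: i for i, word in enumerate(vocab)}

tokens = "what is your name".split()
one_hots = [one_hot(word_to_index[t], len(vocab)) for t in tokens]

# The bag-of-words vector is the element-wise sum of the one-hot vectors.
bag = [sum(column) for column in zip(*one_hots)]

print(one_hots)
print(bag)
```

Every pair of distinct one-hot vectors is equally far apart, so this representation says nothing about how similar two words are.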

  7. Improving the Compositionality of Word Embeddings (title of the master thesis)

  8. “What is your name?” as Word Embeddings: each word is a dense vector of 300 dimensions. What = (0.23, 1.56, …, -0.78, 0.93); is = (1.62, -0.25, …, -0.53, 1.72); your = (-1.60, 0.82, …, 0.91, -1.39); name = (0.87, 1.32, …, -1.41, -0.91)

  9. Word Embeddings encode Lexical Semantics, i.e. word meaning. [Figure: the 300-dimensional vectors for What, is, your, name: What = (0.23, 1.56, …, -0.78, 0.93); is = (1.62, -0.25, …, -0.53, 1.72); your = (-1.60, 0.82, …, 0.91, -1.39); name = (0.87, 1.32, …, -1.41, -0.91)]

  10. [Figure: two-dimensional scatter plot of a word embedding space, showing frequent words such as 'government', 'language', 'animal', 'plant', and 'tree'; both axes run from roughly -20 to 20]

  11. Word Embedding space: ⅕ · [(‘Berlin’ – ‘Germany’) + (‘Stockholm’ – ‘Sweden’) + (‘Washington DC’ – ‘United States’) + (‘Beijing’ – ‘China’) + (‘London’ – ‘United Kingdom’)] ≈ {capital}, and ‘Netherlands’ + {capital} = ‘Amsterdam’
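The vector arithmetic on this slide can be sketched as follows. The 2-D vectors below are invented toy values, not real embeddings, and the helpers `offset` and `nearest` are hypothetical names. Euclidean distance is used here for simplicity; cosine similarity is the more common choice in practice.

```python
import math

# Hypothetical 2-D toy vectors, invented for illustration only.
emb = {
    "Germany": [1.0, 0.0], "Berlin": [1.1, 1.0],
    "Sweden": [2.0, 0.2], "Stockholm": [2.0, 1.1],
    "China": [3.0, -0.5], "Beijing": [3.1, 0.6],
    "Netherlands": [1.5, 0.1], "Amsterdam": [1.55, 1.1],
}


def offset(pairs):
    """Average the (capital - country) difference vectors."""
    diffs = [
        [c - k for c, k in zip(emb[capital], emb[country])]
        for capital, country in pairs
    ]
    return [sum(dims) / len(diffs) for dims in zip(*diffs)]


def nearest(query, exclude):
    """Return the vocabulary word closest to the query vector."""
    candidates = [word for word in emb if word not in exclude]
    return min(candidates, key=lambda word: math.dist(emb[word], query))


capital_offset = offset(
    [("Berlin", "Germany"), ("Stockholm", "Sweden"), ("Beijing", "China")]
)
query = [n + o for n, o in zip(emb["Netherlands"], capital_offset)]
answer = nearest(query, exclude={"Netherlands"})
print(answer)
```

Averaging over several country and capital pairs gives a less noisy estimate of the {capital} direction than a single pair would.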

  12. [Figure: two-dimensional projection of country and capital vectors (China-Beijing, Russia-Moscow, Japan-Tokyo, Turkey-Ankara, Poland-Warsaw, Germany-Berlin, France-Paris, Italy-Rome, Greece-Athens, Spain-Madrid, Portugal-Lisbon); both axes run from -2 to 2; from Mikolov et al. (2013)]

  13. Improving the Compositionality of Word Embeddings (title of the master thesis)

  14. Word Embedding Composition: combine encodings of word meanings in such a way that a good encoding of their joint meaning is created

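  15. “What is your name?” Word Embedding Composition: f(What, is, your, name) = (-0.13, 1.65, …, 1.63, 0.99), a single 300-dimensional vector computed from What = (0.23, 1.56, …, -0.78, 0.93), is = (1.62, -0.25, …, -0.53, 1.72), your = (-1.60, 0.82, …, 0.91, -1.39), and name = (0.87, 1.32, …, -1.41, -0.91)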
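The slide does not say which function f is used. As a minimal sketch, two simple algebraic choices are element-wise summation and averaging. The 3-D vectors below are invented toy values, and `compose_sum` and `compose_mean` are illustrative names.

```python
def compose_sum(vectors):
    """Compose word vectors by element-wise summation."""
    return [sum(dims) for dims in zip(*vectors)]


def compose_mean(vectors):
    """Compose word vectors by element-wise averaging."""
    count = len(vectors)
    return [total / count for total in compose_sum(vectors)]


# Hypothetical 3-D toy embeddings (the slide's vectors have 300 dimensions).
emb = {
    "what": [1.0, 0.0, 2.0],
    "is": [0.0, 1.0, 0.0],
    "your": [1.0, 1.0, 0.0],
    "name": [2.0, 2.0, 2.0],
}

vectors = [emb[word] for word in "what is your name".split()]
summed = compose_sum(vectors)
averaged = compose_mean(vectors)
print(summed, averaged)
```

Both functions ignore word order, so "your name is what" composes to the same vector. That is one motivation for considering learned, order-aware composition functions.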

  16. Overview: 1. Evaluating compositionality 2. Tuning word embeddings for better algebraic composition 3. Neural methods for composing word embeddings

  17. 1. Evaluating compositionality. Introducing CompVecEval, a method to evaluate word embeddings on their compositionality

  18. Dictionaries: a pragmatic solution for word meaning

  19. cat /kat/ A small domesticated carnivorous mammal with soft fur, a short snout, and retractable claws. It is widely kept as a pet or for catching mice, and many breeds have been developed.

  20. cat /kat/ A method of examining body organs by scanning them with X-rays and using a computer to construct a series of cross-sectional scans along a single axis.

  21. [Diagram: the word 'person' paired with its definition 'a human being'; labels in the figure: c, f, c, x, [0…2]]

  22. Dictionary: 1. WordNet (Miller and Fellbaum 1998) 2. We use 4,119 data points for our evaluation method, and 72,322 data points for tuning
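The slides do not spell out how CompVecEval scores an embedding. One plausible way to evaluate on dictionary data is sketched below as an assumption, not as the thesis' actual metric: compose the words of a definition, rank all candidate lemmas by cosine similarity to the composed vector, and report the rank of the correct lemma. The toy vectors and the names `compose`, `cosine`, and `rank_of_lemma` are hypothetical.

```python
import math

# Hypothetical 2-D toy embeddings, invented for illustration only.
emb = {
    "a": [0.1, 0.1], "human": [1.0, 0.2], "being": [0.8, 0.3],
    "person": [1.0, 0.3], "cat": [0.2, 1.0],
}


def compose(words):
    """Assumed composition function: element-wise sum of the word vectors."""
    return [sum(dims) for dims in zip(*(emb[w] for w in words))]


def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))


def rank_of_lemma(lemma, definition, lemmas):
    """Rank (1 = best) of the correct lemma among all candidate lemmas."""
    composed = compose(definition.split())
    ranked = sorted(lemmas, key=lambda w: cosine(emb[w], composed), reverse=True)
    return ranked.index(lemma) + 1


rank = rank_of_lemma("person", "a human being", ["person", "cat"])
print(rank)
```

Under this assumed setup, lower ranks averaged over all dictionary data points would indicate embeddings that compose better.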

  23. Popular pretrained Word Embeddings 1. Word2Vec (Mikolov et al. 2013) 2. GloVe (Pennington et al. 2014) 3. fastText (Bojanowski et al. 2016) 4. Paragram (Wieting et al. 2015)

  24. Word2Vec Skip-gram: for the sentence "the cat ate the mouse", the center word w_t = "ate" is used to predict its context words w_{t-2}, w_{t-1}, w_{t+1}, w_{t+2}
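A minimal sketch of how Skip-gram training pairs can be generated from the example sentence, assuming a context window of two words on each side, as the indices w_{t-2} to w_{t+2} on the slide suggest. The function name `skipgram_pairs` is illustrative and does not come from the Word2Vec code base.

```python
def skipgram_pairs(tokens, window=2):
    """Return (center, context) pairs for every position in the sentence."""
    pairs = []
    for t, center in enumerate(tokens):
        start = max(0, t - window)
        stop = min(len(tokens), t + window + 1)
        for c in range(start, stop):
            if c != t:  # the center word is not its own context
                pairs.append((center, tokens[c]))
    return pairs


tokens = "the cat ate the mouse".split()
pairs = skipgram_pairs(tokens)
ate_pairs = [pair for pair in pairs if pair[0] == "ate"]
print(ate_pairs)
```

Each pair becomes one training example in which the model learns to predict the context word from the center word.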
