  1. SI425 : NLP Set 9 Word2Vec - Neural Words Fall 2020 : Chambers

2. Why are these so different? Last time: words are vectors of observed counts.

3. How big are these vectors?
Big vectors: the size of your vocabulary.
How similar are two words? sim(eat, devour) = cosine(v_eat, v_devour) = 0.72
Problem: lots of zeros and huge vectors!
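
A quick sketch of the cosine similarity behind that number, using numpy and invented count vectors (the counts below are made up for illustration, so the result will not match the slide's 0.72):

```python
import numpy as np

def cosine(u, v):
    """Cosine similarity between two word-count vectors."""
    return np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

# Invented co-occurrence counts over a tiny 5-word context vocabulary.
eat    = np.array([10, 0, 3, 7, 0])
devour = np.array([ 6, 1, 2, 4, 0])

print(cosine(eat, devour))  # real count vectors are vocabulary-sized and mostly zeros
```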

4. Other Problem
Problem: lots of zeros and huge vectors! We'll shrink them.
Other problem: counts still miss a lot of similarity.
[Figure: observed counts of "dice" and "slice" for "Apple" vs. "Peach": zero overlap on "cutting" counts!]

5. Today's goals
• Shrink these vectors to a reasonable size.
• Optimize the vector values to be "useful" to NLP: word prediction! Rather than just counting with no goal…
• Force synonyms to be similar to each other, don't just "hope".
• Similar to our Lab 2 goal of generation: predict your neighbor.

6. Why do we care?
• Words as vectors let us represent any span of text.

7. Why do we care?
• Our input is now a vector representation. Logistic Regression!
[Figure: the sentence "The cat ate mice" as a vector, combined with "Dickens" weights to produce a "Dickens" score.]

8. Word2Vec
• Learn word embeddings (vectors) by predicting neighboring words.
• Step 1: create a random vector for each word.
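
A minimal sketch of Step 1 in numpy. The toy vocabulary and embedding size are assumptions made for illustration, not values from the slides:

```python
import numpy as np

rng = np.random.default_rng(0)
vocab = ["alice", "ate", "dinner", "very", "quickly"]  # toy vocabulary (assumption)
dim = 50                                               # embedding size (assumption)

# Step 1: every word starts as a small random vector; training will adjust it.
embeddings = {w: rng.normal(scale=0.1, size=dim) for w in vocab}
print(embeddings["ate"][:5])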

9. Word2Vec
• Step 2: find a huge corpus of written text.
• Step 3: use each word to "predict" its neighbor: "Alice ate dinner very quickly", e.g. P("Alice").
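
One way to read Steps 2-3 is to turn each sentence into (center, neighbor) training pairs using a small context window; a sketch (the window size of 2 is an assumption):

```python
def skipgram_pairs(tokens, window=2):
    """Yield (center, neighbor) pairs from one tokenized sentence."""
    for i, center in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                yield center, tokens[j]

print(list(skipgram_pairs("Alice ate dinner very quickly".split())))
# ('Alice', 'ate'), ('Alice', 'dinner'), ('ate', 'Alice'), ('ate', 'dinner'), ...
```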

10. Word2Vec
• How to compute probabilities? Score all the words!
[Figure: raw scores for every word are normalized into probabilities.]
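
A common way to realize "score all the words, then normalize" is a dot-product score against every output embedding followed by a softmax. A sketch with invented embedding matrices (the vocabulary, dimensions, and two-matrix setup are assumptions for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
vocab = ["alice", "ate", "dinner", "very", "quickly"]
word2id = {w: i for i, w in enumerate(vocab)}
in_vecs = rng.normal(scale=0.1, size=(len(vocab), 50))   # input (center) embeddings
out_vecs = rng.normal(scale=0.1, size=(len(vocab), 50))  # output (neighbor) embeddings

def neighbor_probs(center):
    """Score every vocabulary word against the center word, then softmax-normalize."""
    scores = out_vecs @ in_vecs[word2id[center]]  # one score per vocabulary word
    scores -= scores.max()                        # numerical stability
    e = np.exp(scores)
    return e / e.sum()

print(neighbor_probs("ate"))  # sums to 1.0 over the vocabulary
```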

11. Word2Vec
• The loss function is again how far off your predicted probability is from the correct word ("Alice").
• How do you get high probabilities? High scores!!
• How do you get high scores? When the input word embedding is similar to the target word embedding.
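
That loss can be written as the negative log of the probability assigned to the correct neighbor (cross-entropy); a tiny sketch with invented numbers:

```python
import numpy as np

# Suppose the model assigned these probabilities over a 5-word vocabulary,
# and the correct neighbor ("Alice") is word index 0 (numbers invented).
probs = np.array([0.05, 0.40, 0.30, 0.15, 0.10])
correct = 0

loss = -np.log(probs[correct])  # small only when the correct word gets high probability
print(loss)                     # about 3.0 here; raising P("Alice") lowers the loss
```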

12. Why it works
• All the "food words" need to score "eat" highly. They'll thus adjust weights to be similar to "eat", which means similar to each other!
• All the "action verbs" need to score adverbs like "quickly" higher. They'll adjust weights to be similar to it!
• All the "people names" do people things, so need to score words like "talk", "walk", "think" highly. Their vectors will slowly turn into each other!

13. An added detail…
• Make sure the training data includes negative examples.
• It helps to push weights away from wrong answers.

Positive Examples    Negative Examples
(Alice, ate)         (Table, ate)
(Puppy, ate)         (Idea, ate)
(Baby, ate)          (The, ate)
(Peacock, ate)       (Paint, ate)
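
A sketch of how such negatives can be generated: keep the observed (word, "ate") pairs as positives and pair randomly drawn words with "ate" as negatives. This draws uniformly for simplicity; real word2vec samples negatives from a smoothed unigram distribution, and the vocabulary below is invented:

```python
import random

rng = random.Random(0)
vocab = ["Alice", "table", "puppy", "idea", "baby", "the", "peacock", "paint"]

positives = [("Alice", "ate"), ("puppy", "ate"), ("baby", "ate"), ("peacock", "ate")]

def negative_samples(context, k):
    """Pair k randomly drawn vocabulary words with the context word as 'wrong' examples."""
    return [(rng.choice(vocab), context) for _ in range(k)]

negatives = negative_samples("ate", k=4)
print(negatives)  # e.g. [('idea', 'ate'), ('the', 'ate'), ...]; may occasionally hit a true pair
```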

14. Examples
• Color-coded numbers: blue is negative, red positive.

15. Vector semantics?

16. Algebra with words?

17. Demo with Python's gensim
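
A minimal sketch of what such a gensim demo might look like; the toy corpus is invented, a real demo would train on a large corpus or load pretrained vectors, and the parameter names follow gensim 4.x (where `size` became `vector_size`):

```python
from gensim.models import Word2Vec

# Toy corpus: a list of tokenized sentences (a real run needs far more text).
sentences = [
    "alice ate dinner very quickly".split(),
    "the puppy ate lunch very slowly".split(),
    "bob devoured his dinner quickly".split(),
]

model = Word2Vec(sentences, vector_size=50, window=2, min_count=1, sg=1, epochs=200)

print(model.wv.most_similar("dinner", topn=3))   # nearest neighbors by cosine
print(model.wv.similarity("quickly", "slowly"))  # pairwise similarity

# "Algebra with words" (needs a much larger corpus to give the famous analogy results):
# model.wv.most_similar(positive=["king", "woman"], negative=["man"])
```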

18. Other Overviews of Word2Vec
• Blog post by Adrian Colyer: https://blog.acolyer.org/2016/04/21/the-amazing-power-of-word-vectors/
• The Illustrated Guide to Word2Vec: http://jalammar.github.io/illustrated-word2vec/
• The original research paper: https://papers.nips.cc/paper/5021-distributed-representations-of-words-and-phrases-and-their-compositionality.pdf
