SLIDE 1
SI425: NLP, Set 9: Word2Vec, Neural Words
Fall 2020: Chambers

SLIDE 2

Why are these so different?
- Last time: words are vectors of observed counts
SLIDE 3
How big are these vectors?
- Big vectors: the size of your vocabulary
- How similar are two words? sim(eat, devour) = cosine(v_eat, v_devour) = 0.72 (sketch below)
- Problem: lots of zeros and huge vectors!
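A minimal sketch of that cosine computation, using made-up toy counts rather than the slide's real vectors:

```python
# Cosine similarity over count vectors: dot product over norms.
import numpy as np

def cosine(u, v):
    return np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

# Hypothetical co-occurrence counts over a tiny context vocabulary.
eat    = np.array([12, 0, 7, 3, 0, 1])
devour = np.array([9, 1, 4, 0, 0, 2])

print(cosine(eat, devour))  # in [0, 1] for non-negative counts
```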
SLIDE 4
Other Problem
- Problem: lots of zeros and huge vectors! (we’ll shrink them)
- Other problem: counts still miss a lot of similarity
- [Figure: count vectors for “Apple” and “Peach” with context words “slice” and “dice”. Zero overlap on “cutting” counts!]
SLIDE 5
Today’s goals
- Shrink these vectors to a reasonable size
- Optimize the vector values to be “useful” to NLP: word prediction!
- Rather than just counting with no goal…
- Force synonyms to be similar to each other; don’t just “hope”.
- Similar to our Lab 2 goal of generation: predict your neighbor.
SLIDE 6
Why do we care?
- Words as vectors let us represent any span of text (sketch below)
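One common recipe, shown here as a sketch (the slide doesn't specify one): average the word vectors in the span. The 50-dimensional random vectors are placeholders for trained embeddings.

```python
# Represent a span of text as the average of its word vectors.
import numpy as np

def span_vector(words, vectors):
    return np.mean([vectors[w] for w in words], axis=0)

rng = np.random.default_rng(0)
words = "the cat ate mice".split()
vectors = {w: rng.normal(size=50) for w in words}  # placeholder embeddings

print(span_vector(words, vectors).shape)  # (50,)
```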
SLIDE 7
Why do we care?
- Our input is now a vector representation
[Diagram: “The cat ate mice” → span vector → “Dickens” weights → “Dickens” score → Logistic Regression!]
SLIDE 8
Word2Vec
- Learn word embeddings (vectors) by predicting
neighboring words
- Step 1: create a random vector for each word (sketch below)
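A sketch of step 1; the vocabulary and the 50-dimensional size are illustrative choices:

```python
# Step 1: one small random vector per vocabulary word.
import numpy as np

vocab = ["Alice", "ate", "dinner", "very", "quickly"]
rng = np.random.default_rng(42)

# Small random values; training will nudge them somewhere useful.
embeddings = {w: rng.normal(scale=0.1, size=50) for w in vocab}
print(embeddings["Alice"][:5])
```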
SLIDE 9
Word2Vec
- Step 2: find a huge corpus of written text
- Step 3: use each word to “predict” its neighbor
“Alice ate dinner very quickly”
[Figure: computing the neighbor probability P(“Alice”)]
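A sketch of how step 3 turns the sentence into training pairs, assuming a context window of 2 (a typical but arbitrary choice):

```python
# Step 3: each word "predicts" every neighbor within the window.
sentence = "Alice ate dinner very quickly".split()
window = 2

pairs = []
for i, center in enumerate(sentence):
    for j in range(max(0, i - window), min(len(sentence), i + window + 1)):
        if j != i:
            pairs.append((center, sentence[j]))

# First few pairs: ('Alice', 'ate'), ('Alice', 'dinner'),
# ('ate', 'Alice'), ('ate', 'dinner'), ...
print(pairs[:4])
```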
SLIDE 10
Word2Vec
- How to compute probabilities? Score all the words!
[Figure: raw scores for every word, normalized into probabilities]
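A sketch of that figure: dot-product scores against every word's vector, then a softmax to turn scores into probabilities. The sizes here are toy values:

```python
# Score all the words, then normalize with a softmax.
import numpy as np

rng = np.random.default_rng(0)
E = rng.normal(size=(5, 50))      # toy embedding matrix, one row per word
v_input = rng.normal(size=50)     # embedding of the input word

scores = E @ v_input                    # one raw score per vocabulary word
exp = np.exp(scores - scores.max())     # shift for numerical stability
probs = exp / exp.sum()
print(probs, probs.sum())               # probabilities summing to 1.0
```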
SLIDE 11
Word2Vec
- The loss function is again how far off your predicted probability is from the correct word (“Alice”)
- How do you get high probabilities? High scores!!
- How do you get high scores? When the input word embedding is similar to the target word embedding. (sketch below)
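As a sketch, the loss is the negative log probability assigned to the correct word. This reuses the toy setup above, with index 0 standing in for “Alice”:

```python
# Cross-entropy loss: -log P(correct word).
import numpy as np

def softmax(scores):
    e = np.exp(scores - scores.max())
    return e / e.sum()

rng = np.random.default_rng(0)
E = rng.normal(size=(5, 50))
v_input = rng.normal(size=50)

probs = softmax(E @ v_input)
loss = -np.log(probs[0])   # index 0 = "Alice"; small when its score is high
print(loss)
```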
SLIDE 12
Why it works
- All the “food words” need to score “eat” highly. They’ll thus adjust weights to be similar to “eat”, which means similar to each other!
- All the “action verbs” need to score adverbs like “quickly” higher. They’ll adjust weights to be similar to it!
- All the “people names” do people things, so they need to score words like “talk”, “walk”, “think” highly. Their vectors will slowly turn into each other!
SLIDE 13
An added detail…
- Make sure the training data includes negative examples
- It helps to push weights away from wrong answers
Positive Examples   Negative Examples
(Alice, ate)        (Table, ate)
(Puppy, ate)        (Idea, ate)
(Baby, ate)         (The, ate)
(Peacock, ate)      (Paint, ate)
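A sketch of the negative-sampling loss on pairs like these; the vectors are random stand-ins, and the pair lists mirror the table above:

```python
# Negative sampling: push positive pairs' scores up, negatives' down.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
vec = {w: rng.normal(scale=0.1, size=50)
       for w in ["Alice", "ate", "Table", "Idea"]}

positives = [("Alice", "ate")]                   # observed neighbors
negatives = [("Table", "ate"), ("Idea", "ate")]  # sampled non-neighbors

loss = 0.0
for w, c in positives:
    loss -= np.log(sigmoid(vec[w] @ vec[c]))     # want a high score
for w, c in negatives:
    loss -= np.log(sigmoid(-(vec[w] @ vec[c])))  # want a low score
print(loss)
```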
SLIDE 14
Examples
- [Figure: embedding values shown as color-coded numbers; blue is negative, red positive]
SLIDE 15
Vector semantics?
SLIDE 16
Algebra with words?
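The classic example: vec(king) - vec(man) + vec(woman) lands near vec(queen). A sketch with hand-crafted 2-d toy vectors chosen so the arithmetic works exactly (real trained embeddings are what make this interesting):

```python
# Word analogy by vector arithmetic and nearest-neighbor search.
import numpy as np

def cosine(u, v):
    return np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

def analogy(a, b, c, vectors):
    """Find the word closest to vec(b) - vec(a) + vec(c)."""
    target = vectors[b] - vectors[a] + vectors[c]
    candidates = [w for w in vectors if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine(vectors[w], target))

# Hand-crafted toy vectors where the analogy holds exactly.
vectors = {
    "man":   np.array([1.0, 0.0]),
    "woman": np.array([1.0, 1.0]),
    "king":  np.array([2.0, 0.0]),
    "queen": np.array([2.0, 1.0]),
}
print(analogy("man", "king", "woman", vectors))  # "queen"
```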
SLIDE 17
Demo with Python’s gensim
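A minimal version of the demo, assuming gensim is installed and its downloader can fetch a small pretrained model (downloads on first use):

```python
# Load pretrained vectors and query them with gensim.
import gensim.downloader as api

model = api.load("glove-wiki-gigaword-50")  # small pretrained model

print(model.most_similar("eat", topn=3))    # nearest neighbors of "eat"
print(model.similarity("eat", "devour"))    # cosine similarity
print(model.most_similar(positive=["king", "woman"],
                         negative=["man"], topn=1))  # roughly "queen"
```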
SLIDE 18
Other Overviews of Word2Vec
- Blog post by Adrian Colyer
https://blog.acolyer.org/2016/04/21/the-amazing-power-of-word-vectors/
- The Illustrated Word2Vec, by Jay Alammar
http://jalammar.github.io/illustrated-word2vec/
- The original research paper! (Mikolov et al., 2013)
https://papers.nips.cc/paper/5021-distributed-representations-of-words-and-phrases-and-their-compositionality.pdf