SI425 : NLP, Set 9: Word2Vec - Neural Words, Fall 2020 : Chambers


SLIDE 1

SI425 : NLP

Set 9 Word2Vec - Neural Words

Fall 2020 : Chambers

SLIDE 2

Why are these so different?

Last time: words are vectors of observed counts

SLIDE 3

How big are these vectors?

  • Big vectors: the size of your vocabulary
  • How similar are two words? sim(eat, devour) = cosine(v_eat, v_devour) = 0.72
  • Problem: Lots of zeros and huge vectors!
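
As a concrete sketch of the similarity computation above (the counts below are made up; real count vectors have one dimension per vocabulary word):

    import numpy as np

    def cosine(u, v):
        # Cosine similarity: dot product divided by the product of lengths.
        return np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

    # Hypothetical co-occurrence counts over 5 context words (made up);
    # real count vectors are vocabulary-sized and mostly zeros.
    eat = np.array([12, 0, 7, 3, 0])
    devour = np.array([9, 1, 4, 0, 0])

    print(cosine(eat, devour))  # on real data, something like 0.72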

SLIDE 4

Other Problem

Problem: Lots of zeros and huge vectors!

  • We’ll shrink them

Other problem: counts still miss a lot of similarity

(figure: co-occurrence count vectors for “Apple” and “Peach” against the cutting verbs “slice” and “dice”)

Zero overlap on “cutting” counts!

SLIDE 5

Today’s goals

  • Shrink these vectors to a reasonable size
  • Optimize the vector values to be “useful” to NLP: word prediction!

  • Rather than just counting with no goal…
  • Force synonyms to be similar to each other, don’t just “hope”.
  • Similar to our Lab 2 goal of generation, predict your neighbor.

SLIDE 6

Why do we care?

  • Words as vectors let us represent any span of text

SLIDE 7

Why do we care?

  • Our input is now a vector representation


"The cat ate mice” “Dickens” weights “Dickens” score Logistic Regression!

SLIDE 8

Word2Vec

  • Learn word embeddings (vectors) by predicting neighboring words

  • Step 1: create a random vector for each word

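Step 1 as a sketch, with a toy vocabulary and a small dimension (real word2vec embeddings are typically 100-300 dimensions):

    import numpy as np

    rng = np.random.default_rng(0)
    vocab = ["alice", "ate", "dinner", "very", "quickly"]  # toy vocabulary
    dim = 50

    # One small random vector per word; training will nudge these into place.
    E = {w: rng.normal(scale=0.1, size=dim) for w in vocab}
    print(E["alice"][:5])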

SLIDE 9

Word2Vec

  • Step 2: find a huge corpus of written text
  • Step 3: use each word to “predict” its neighbor

“Alice ate dinner very quickly”


(diagram: predicting a neighbor, e.g. P(“Alice”) from a nearby word)
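
Step 3, sketched as code: slide over the sentence and pair each word with its neighbors. The window size of 2 here is an arbitrary choice for illustration:

    def skipgram_pairs(tokens, window=2):
        # Each word "predicts" every neighbor within the window.
        pairs = []
        for i in range(len(tokens)):
            for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
                if j != i:
                    pairs.append((tokens[i], tokens[j]))
        return pairs

    print(skipgram_pairs("Alice ate dinner very quickly".split()))
    # [('Alice', 'ate'), ('Alice', 'dinner'), ('ate', 'Alice'), ...]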

SLIDE 10

Word2Vec

  • How to compute probabilities? Score all the words!


(diagram: raw scores for every vocabulary word are normalized into probabilities)
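
The normalization step is a softmax; a minimal sketch with made-up scores:

    import numpy as np

    def softmax(scores):
        # Exponentiate and normalize so the scores become probabilities.
        exp = np.exp(scores - np.max(scores))  # shift for numerical stability
        return exp / exp.sum()

    scores = np.array([2.1, 0.3, -1.0, 4.0])  # made-up scores for 4 words
    print(softmax(scores))  # sums to 1; the highest score dominates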

SLIDE 11

Word2Vec

  • The loss function is again how far off your prediction probability is from the correct word (“Alice”)

  • How do you get high probabilities? High scores!!
  • How do you get high scores?


When the input word embedding is similar to the target word embedding.
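
In word2vec the score for a candidate word is the dot product between the input embedding and the candidate’s embedding, so similar vectors produce high scores. A toy illustration with invented numbers:

    import numpy as np

    v_ate = np.array([0.2, 0.5, 0.1])      # input word embedding (made up)
    v_alice = np.array([0.3, 0.4, 0.0])    # similar direction -> high score
    v_table = np.array([-0.5, -0.1, 0.9])  # different direction -> low score

    print(np.dot(v_ate, v_alice))  # 0.26 (high)
    print(np.dot(v_ate, v_table))  # -0.06 (low)

    # The loss is -log P(correct word); it shrinks as that probability rises.
    print(-np.log(0.8))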

SLIDE 12

Why it works

  • All the “food words” need to score “eat” highly. They’ll thus adjust weights to be similar to “eat”, which means similar to each other!
  • All the “action verbs” need to score adverbs like “quickly” higher. They’ll adjust weights to be similar to it!
  • All the “people names” do people things, so they need to score words like “talk”, “walk”, and “think” highly. Their vectors will slowly turn into each other!

SLIDE 13

An added detail…

  • Make sure the training data includes negative examples
  • It helps to push weights away from wrong answers


Positive examples: (Alice, ate), (Puppy, ate), (Baby, ate), (Peacock, ate)
Negative examples: (Table, ate), (Idea, ate), (The, ate), (Paint, ate)
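
A sketch of assembling one such training batch; for simplicity the negatives here are drawn uniformly, while word2vec actually samples them by smoothed word frequency:

    import numpy as np

    rng = np.random.default_rng(0)
    vocab = ["table", "idea", "the", "paint", "puppy", "baby", "peacock"]

    # One observed pair, labeled 1, plus k random negatives, labeled 0.
    positive = ("Alice", "ate", 1)
    negatives = [(str(rng.choice(vocab)), "ate", 0) for _ in range(3)]
    print([positive] + negatives)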

SLIDE 14

Examples

  • Color-coded numbers: blue is negative, red is positive.

SLIDE 15

Vector semantics?

SLIDE 16

Algebra with words?

SLIDE 17

Demo with Python’s gensim

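A minimal sketch of such a demo, assuming gensim 4.x; the tiny corpus is invented, and a real demo would train on a large text collection or load pretrained vectors:

    from gensim.models import Word2Vec

    sentences = [
        "alice ate dinner very quickly".split(),
        "the puppy ate dinner slowly".split(),
        "alice walked and talked".split(),
    ]

    # vector_size= was called size= before gensim 4.0
    model = Word2Vec(sentences, vector_size=50, window=2, min_count=1, epochs=50)
    print(model.wv.most_similar("alice", topn=3))

    # "Algebra with words" works best on pretrained vectors, e.g.:
    #   import gensim.downloader
    #   wv = gensim.downloader.load("glove-wiki-gigaword-50")
    #   wv.most_similar(positive=["king", "woman"], negative=["man"])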

SLIDE 18

Other Overviews of Word2Vec

  • Blog post by Adrian Colyer

https://blog.acolyer.org/2016/04/21/the-amazing-power-of-word-vectors/

  • “The Illustrated Word2Vec”, by Jay Alammar

http://jalammar.github.io/illustrated-word2vec/

  • The original research paper (Mikolov et al., 2013)!

https://papers.nips.cc/paper/5021-distributed-representations-of-words-and-phrases-and-their-compositionality.pdf
