SLIDE 1
SI425: NLP, Set 9: Word2Vec, Neural Words
Fall 2020: Chambers

SLIDE 2

Why are these so different?
- Last time: words are vectors of observed counts
SLIDE 3
How big are these vectors?
- Big vectors: the size of your vocabulary
- How similar are two words? sim(eat, devour) = cosine(v_eat, v_devour) = 0.72 (sketch below)
- Problem: lots of zeros and huge vectors!
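A minimal sketch of that cosine computation, using made-up toy counts rather than the slide's real vectors:

```python
# Cosine similarity over count vectors: dot product over norms.
import numpy as np

def cosine(u, v):
    return np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

# Hypothetical co-occurrence counts over a tiny context vocabulary.
eat    = np.array([12, 0, 7, 3, 0, 1])
devour = np.array([9, 1, 4, 0, 0, 2])

print(cosine(eat, devour))  # in [0, 1] for non-negative counts
```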
SLIDE 4
Other Problem
- Problem: lots of zeros and huge vectors! (we’ll shrink them)
- Other problem: counts still miss a lot of similarity
- [Figure: count vectors for “Apple” and “Peach” with context words “slice” and “dice”. Zero overlap on “cutting” counts!]
SLIDE 5
Today’s goals
- Shrink these vectors to a reasonable size
- Optimize the vector values to be “useful” to NLP: word prediction!
- Rather than just counting with no goal…
- Force synonyms to be similar to each other; don’t just “hope”.
- Similar to our Lab 2 goal of generation: predict your neighbor.
SLIDE 6
Why do we care?
- Words as vectors let us represent any span of text (sketch below)
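One common recipe, shown here as a sketch (the slide doesn't specify one): average the word vectors in the span. The 50-dimensional random vectors are placeholders for trained embeddings.

```python
# Represent a span of text as the average of its word vectors.
import numpy as np

def span_vector(words, vectors):
    return np.mean([vectors[w] for w in words], axis=0)

rng = np.random.default_rng(0)
words = "the cat ate mice".split()
vectors = {w: rng.normal(size=50) for w in words}  # placeholder embeddings

print(span_vector(words, vectors).shape)  # (50,)
```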
SLIDE 7
Why do we care?
- Our input is now a vector representation
[Diagram: “The cat ate mice” → span vector → “Dickens” weights → “Dickens” score → Logistic Regression!]
SLIDE 8
Word2Vec
- Learn word embeddings (vectors) by predicting
neighboring words
- Step 1: create a random vector for each word (sketch below)
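A sketch of step 1; the vocabulary and the 50-dimensional size are illustrative choices:

```python
# Step 1: one small random vector per vocabulary word.
import numpy as np

vocab = ["Alice", "ate", "dinner", "very", "quickly"]
rng = np.random.default_rng(42)

# Small random values; training will nudge them somewhere useful.
embeddings = {w: rng.normal(scale=0.1, size=50) for w in vocab}
print(embeddings["Alice"][:5])
```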
SLIDE 9
Word2Vec
- Step 2: find a huge corpus of written text
- Step 3: use each word to “predict” its neighbor
“Alice ate dinner very quickly”
[Figure: computing the neighbor probability P(“Alice”)]
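A sketch of how step 3 turns the sentence into training pairs, assuming a context window of 2 (a typical but arbitrary choice):

```python
# Step 3: each word "predicts" every neighbor within the window.
sentence = "Alice ate dinner very quickly".split()
window = 2

pairs = []
for i, center in enumerate(sentence):
    for j in range(max(0, i - window), min(len(sentence), i + window + 1)):
        if j != i:
            pairs.append((center, sentence[j]))

# First few pairs: ('Alice', 'ate'), ('Alice', 'dinner'),
# ('ate', 'Alice'), ('ate', 'dinner'), ...
print(pairs[:4])
```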
SLIDE 10
Word2Vec
- How to compute probabilities? Score all the words!
[Figure: raw scores for every word, normalized into probabilities]
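A sketch of that figure: dot-product scores against every word's vector, then a softmax to turn scores into probabilities. The sizes here are toy values:

```python
# Score all the words, then normalize with a softmax.
import numpy as np

rng = np.random.default_rng(0)
E = rng.normal(size=(5, 50))      # toy embedding matrix, one row per word
v_input = rng.normal(size=50)     # embedding of the input word

scores = E @ v_input                    # one raw score per vocabulary word
exp = np.exp(scores - scores.max())     # shift for numerical stability
probs = exp / exp.sum()
print(probs, probs.sum())               # probabilities summing to 1.0
```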
SLIDE 11
Word2Vec
- The loss function is again how far off your predicted probability is from the correct word (“Alice”)
- How do you get high probabilities? High scores!!
- How do you get high scores? When the input word embedding is similar to the target word embedding. (sketch below)
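As a sketch, the loss is the negative log probability assigned to the correct word. This reuses the toy setup above, with index 0 standing in for “Alice”:

```python
# Cross-entropy loss: -log P(correct word).
import numpy as np

def softmax(scores):
    e = np.exp(scores - scores.max())
    return e / e.sum()

rng = np.random.default_rng(0)
E = rng.normal(size=(5, 50))
v_input = rng.normal(size=50)

probs = softmax(E @ v_input)
loss = -np.log(probs[0])   # index 0 = "Alice"; small when its score is high
print(loss)
```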
SLIDE 12
Why it works
- All the “food words” need to score “eat” highly. They’ll thus adjust weights to be similar to “eat”, which means similar to each other!
- All the “action verbs” need to score adverbs like “quickly” higher. They’ll adjust weights to be similar to it!
- All the “people names” do people things, so they need to score words like “talk”, “walk”, “think” highly. Their vectors will slowly turn into each other!
SLIDE 13
An added detail…
- Make sure the training data includes negative examples
- It helps to push weights away from wrong answers
Positive Examples   Negative Examples
(Alice, ate)        (Table, ate)
(Puppy, ate)        (Idea, ate)
(Baby, ate)         (The, ate)
(Peacock, ate)      (Paint, ate)
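A sketch of the negative-sampling loss on pairs like these; the vectors are random stand-ins, and the pair lists mirror the table above:

```python
# Negative sampling: push positive pairs' scores up, negatives' down.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
vec = {w: rng.normal(scale=0.1, size=50)
       for w in ["Alice", "ate", "Table", "Idea"]}

positives = [("Alice", "ate")]                   # observed neighbors
negatives = [("Table", "ate"), ("Idea", "ate")]  # sampled non-neighbors

loss = 0.0
for w, c in positives:
    loss -= np.log(sigmoid(vec[w] @ vec[c]))     # want a high score
for w, c in negatives:
    loss -= np.log(sigmoid(-(vec[w] @ vec[c])))  # want a low score
print(loss)
```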
SLIDE 14
Examples
- [Figure: embedding values shown as color-coded numbers; blue is negative, red positive]
SLIDE 15
Vector semantics?
SLIDE 16
Algebra with words?
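The classic example: vec(king) - vec(man) + vec(woman) lands near vec(queen). A sketch with hand-crafted 2-d toy vectors chosen so the arithmetic works exactly (real trained embeddings are what make this interesting):

```python
# Word analogy by vector arithmetic and nearest-neighbor search.
import numpy as np

def cosine(u, v):
    return np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

def analogy(a, b, c, vectors):
    """Find the word closest to vec(b) - vec(a) + vec(c)."""
    target = vectors[b] - vectors[a] + vectors[c]
    candidates = [w for w in vectors if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine(vectors[w], target))

# Hand-crafted toy vectors where the analogy holds exactly.
vectors = {
    "man":   np.array([1.0, 0.0]),
    "woman": np.array([1.0, 1.0]),
    "king":  np.array([2.0, 0.0]),
    "queen": np.array([2.0, 1.0]),
}
print(analogy("man", "king", "woman", vectors))  # "queen"
```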
SLIDE 17
Demo with Python’s gensim
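A minimal version of the demo, assuming gensim is installed and its downloader can fetch a small pretrained model (downloads on first use):

```python
# Load pretrained vectors and query them with gensim.
import gensim.downloader as api

model = api.load("glove-wiki-gigaword-50")  # small pretrained model

print(model.most_similar("eat", topn=3))    # nearest neighbors of "eat"
print(model.similarity("eat", "devour"))    # cosine similarity
print(model.most_similar(positive=["king", "woman"],
                         negative=["man"], topn=1))  # roughly "queen"
```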
SLIDE 18
Other Overviews of Word2Vec
- Blog post by Adrian Colyer
https://blog.acolyer.org/2016/04/21/the-amazing-power-of-word-vectors/
- The Illustrated Word2Vec, by Jay Alammar
http://jalammar.github.io/illustrated-word2vec/
- The original research paper! (Mikolov et al., 2013)
https://papers.nips.cc/paper/5021-distributed-representations-of-words-and-phrases-and-their-compositionality.pdf