Word Embeddings Revisited: Contextual Embeddings
CS 6956: Deep Learning for NLP
Overview
- Word types and tokens
- Training contextual embeddings
- Embeddings from Language Models (ELMo)
How many words…
How many words are in the sentence below?

Ask not what your country can do for you, ask what you can do for your country
(Ignoring capitalization and the comma)

Seventeen words: ask, not, what, your, country, can, do, for, you, ask, what, you, can, do, for, your, country
Or only nine words: ask, can, country, do, for, not, what, your, you

When we say "words", which interpretation do we mean?
Which of these interpretations did we use when we looked at word embeddings?
Word types
Types are abstract and unique objects
– Sets or concepts: e.g., there is only one thing called laptop
– Think entries in a dictionary
In the example above, the nine distinct words (ask, can, country, do, for, not, what, you, your) are the word types.
Word tokens
Tokens are instances of the types
– Usage of a concept: this laptop, my laptop, your laptop
In the example above, the seventeen word occurrences are the word tokens.
The type-token distinction
- A larger philosophical discussion
– See the Stanford Encyclopedia of Philosophy for a nuanced discussion
- The distinction is broadly applicable and we implicitly reason about it
  – "We got the same gift": the same gift type vs. the same gift token
Word embeddings revisited
- All the word embedding methods we saw so far trained embeddings for word types
  – Used word occurrences, but the final embeddings are type embeddings
  – Type embeddings = lookup tables
- Can we embed word tokens instead?
- What makes a word token different from a word type?
  – We have the context of the word to inform the embedding
  – We may be able to resolve word sense ambiguity
Overview
- Word types and tokens
- Training contextual embeddings
- Embeddings from Language Models (ELMo)
Word embeddings should…
- Unify superficially different words
  – bunny and rabbit are similar
- Capture information about how words can be used
  – go and went are similar, but slightly different from each other
- Separate accidentally similar-looking words
  – Words are polysemous: The bank was robbed again vs. We walked along the river bank
  – Sense embeddings

Type embeddings can address the first two requirements.
Word sense can be disambiguated using the context ⇒ contextual embeddings.
Type embeddings vs token embeddings
- Type embeddings can be thought of as a lookup table
  – Map words to vectors independent of any context
  – A big matrix
- Token embeddings should be functions
  – Construct embeddings for a word on the fly
  – There is no fixed "bank" embedding; the usage decides what the word vector is (see the sketch below)
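A minimal PyTorch sketch of this contrast. The toy vocabulary, dimensions, and the small bidirectional GRU used as the contextualizer are illustrative assumptions, not the actual ELMo architecture; the point is only that a type embedding is indexed by word identity alone, while a token embedding is a function of the whole sentence.

```python
import torch
import torch.nn as nn

vocab = {"the": 0, "bank": 1, "was": 2, "robbed": 3, "river": 4}  # toy vocabulary
dim = 8

# Type embeddings: a lookup table, i.e. one fixed vector per word type
type_emb = nn.Embedding(len(vocab), dim)

# Token embeddings: a function that builds vectors on the fly from the whole sentence
contextualizer = nn.GRU(dim, dim, bidirectional=True, batch_first=True)

def token_embeddings(word_ids):
    """Return one context-dependent vector per token in the sentence."""
    x = type_emb(word_ids).unsqueeze(0)  # (1, seq_len, dim)
    out, _ = contextualizer(x)           # (1, seq_len, 2 * dim)
    return out.squeeze(0)

s1 = torch.tensor([vocab[w] for w in ["the", "bank", "was", "robbed"]])
s2 = torch.tensor([vocab[w] for w in ["the", "river", "bank"]])

# "bank" gets the same type vector in both sentences...
print(torch.equal(type_emb(s1)[1], type_emb(s2)[2]))                      # True
# ...but different token vectors, because its contexts differ.
print(torch.allclose(token_embeddings(s1)[1], token_embeddings(s2)[2]))  # False (almost surely)
```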
Contextual embeddings
The big new thing in 2017-18

Two popular models: ELMo [Peters et al. 2018] and BERT [Devlin et al. 2018]
Other work in this direction: ULMFiT [Howard and Ruder 2018]

We will look at ELMo now. We will visit BERT later in the semester.
Overview
- Word types and tokens
- Training contextual embeddings
- Embeddings from Language Models (ELMo)
Embeddings from Language Models (ELMo)
Two key insights:
1. The embedding of a word type should depend on its context
   – But the size of the context should not be fixed
   – No Markov assumption
   – Need arbitrary context: use a bidirectional RNN
2. Language models are already encoding the contextual meaning of words
   – Use the internal states of a language model as the word embedding
The ELMo model
- Embed word types into a vector
  – Can use pre-trained embeddings (GloVe)
  – Can train a character-based model to get a context-independent embedding
- Deep bidirectional LSTM language model over the embeddings
  – Two layers of BiLSTMs, but could be more
- Loss = language model loss
  – Cross-entropy over the probability of seeing the word in a context
Specific training and modeling details are in the paper.
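Below is a much-simplified sketch of this training setup, assuming a plain lookup table for the base embeddings and ignoring ELMo's character CNN, projection layers, and other details from the paper. Note that the "bidirectional" language model is really a forward LM and a backward LM trained jointly; a single LSTM run with `bidirectional=True` would let each direction see the very word it is asked to predict.

```python
import torch
import torch.nn as nn

class BiLMSketch(nn.Module):
    """Toy two-layer forward + backward LSTM language model (ELMo-style), heavily simplified."""
    def __init__(self, vocab_size, emb_dim=100, hidden_dim=128, num_layers=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)  # context-independent base embeddings
        self.fwd = nn.LSTM(emb_dim, hidden_dim, num_layers, batch_first=True)
        self.bwd = nn.LSTM(emb_dim, hidden_dim, num_layers, batch_first=True)
        self.out = nn.Linear(hidden_dim, vocab_size)    # softmax layer, shared by both directions

    def forward(self, word_ids):
        x = self.embed(word_ids)                        # (batch, seq_len, emb_dim)
        h_fwd, _ = self.fwd(x)                          # left-to-right hidden states
        h_bwd, _ = self.bwd(torch.flip(x, dims=[1]))    # right-to-left hidden states
        h_bwd = torch.flip(h_bwd, dims=[1])
        return self.out(h_fwd), self.out(h_bwd)

# Language model loss: cross-entropy over the probability of the next/previous word.
model = BiLMSketch(vocab_size=10000)
loss_fn = nn.CrossEntropyLoss()
words = torch.randint(0, 10000, (4, 12))                # a batch of toy word-id sequences

fwd_logits, bwd_logits = model(words)
# The forward LM at position t predicts word t+1; the backward LM at position t predicts word t-1.
loss = (loss_fn(fwd_logits[:, :-1].reshape(-1, 10000), words[:, 1:].reshape(-1)) +
        loss_fn(bwd_logits[:, 1:].reshape(-1, 10000), words[:, :-1].reshape(-1)))
loss.backward()
```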
The ELMo model
- Embed word types into a vector
- Deep bidirectional LSTM language model over the embeddings
  – Two layers of BiLSTMs, but could be more
- Hidden state of each BiLSTM cell = embedding for the word
  – Which one do we use?
- The ELMo answer: All of them
Using ELMo in a task
A sentence is fed through ELMo, which produces multiple embeddings for each word:
– Layer 2 hidden state $h_{2,i}$ for word $i$
– Layer 1 hidden state $h_{1,i}$ for word $i$
– Base word embedding $h_{0,i}$ for word $i$

These are combined by a linear interpolation of all the embeddings:

$$\mathrm{ELMo}_i^{task} = \delta^{task} \left( t_0^{task} h_{0,i} + t_1^{task} h_{1,i} + t_2^{task} h_{2,i} \right)$$

– The interpolation weights $t_j^{task}$ are part of the task parameters
– $\delta^{task}$ is a scaling term that scales the entire ELMo vector
– Could optionally fine-tune the entire language model on task data
(A sketch of this combination is shown below.)
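Here is a minimal sketch of this combination, assuming the three per-word vectors already live in a common dimension. The slide's $t_j$ and $\delta$ are modeled as learnable scalars; following Peters et al. 2018, the layer weights are softmax-normalized before mixing.

```python
import torch
import torch.nn as nn

class ELMoCombiner(nn.Module):
    """Task-specific mix of ELMo layers: delta * sum_j t_j * h_j (in the slide's notation)."""
    def __init__(self, num_layers=3):
        super().__init__()
        self.t = nn.Parameter(torch.zeros(num_layers))  # interpolation weights t_0, t_1, t_2
        self.delta = nn.Parameter(torch.ones(1))        # scaling term for the whole ELMo vector

    def forward(self, layers):
        # layers: (num_layers, seq_len, dim) holding h_0 (base), h_1, h_2 for each word
        weights = torch.softmax(self.t, dim=0)          # normalized, as in Peters et al. 2018
        mixed = (weights.view(-1, 1, 1) * layers).sum(dim=0)
        return self.delta * mixed                       # (seq_len, dim): one ELMo vector per word

# Toy usage: three layers of representations for a 10-word sentence, 256 dimensions each.
layers = torch.randn(3, 10, 256)
elmo_vectors = ELMoCombiner()(layers)
print(elmo_vectors.shape)                               # torch.Size([10, 256])
```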
Evaluating ELMo
General idea:
– Pick an NLP task that uses a neural network model
– Replace the context-independent word embeddings with ELMo
  - Or perhaps append ELMo to the context-independent embeddings
– Train the new model with these embeddings
  - Also train the ELMo parameters $\delta^{task}$ and $t_j^{task}$
– Compare using the official metric for the task
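A toy sketch of the "append to the context-independent embeddings" option for a tagging-style model. The `elmo_layers` function is a stand-in for a frozen, pre-trained biLM (in practice one would load released ELMo weights); the vocabulary size, dimensions, and classifier are illustrative. Because $\delta$ and the $t_j$ are registered as ordinary parameters of the task model, they are trained with the task loss.

```python
import torch
import torch.nn as nn

def elmo_layers(word_ids, elmo_dim=256):
    """Stand-in for a frozen pre-trained biLM: returns the three per-word ELMo layers."""
    return torch.randn(3, word_ids.shape[0], elmo_dim)  # (num_layers, seq_len, elmo_dim)

class TaggerWithELMo(nn.Module):
    """Toy tagger: GloVe-style type embeddings with ELMo token embeddings appended."""
    def __init__(self, vocab_size, num_tags, glove_dim=100, elmo_dim=256):
        super().__init__()
        self.glove = nn.Embedding(vocab_size, glove_dim)  # context-independent embeddings
        self.t = nn.Parameter(torch.zeros(3))             # interpolation weights, trained with the task
        self.delta = nn.Parameter(torch.ones(1))          # scaling term, trained with the task
        self.classifier = nn.Linear(glove_dim + elmo_dim, num_tags)

    def forward(self, word_ids):
        layers = elmo_layers(word_ids)                                   # (3, seq_len, elmo_dim)
        weights = torch.softmax(self.t, dim=0).view(-1, 1, 1)
        elmo_vecs = self.delta * (weights * layers).sum(dim=0)           # (seq_len, elmo_dim)
        features = torch.cat([self.glove(word_ids), elmo_vecs], dim=-1)  # append ELMo to GloVe
        return self.classifier(features)                                 # per-word tag scores

model = TaggerWithELMo(vocab_size=10000, num_tags=17)
scores = model(torch.randint(0, 10000, (12,)))  # a 12-word toy sentence
print(scores.shape)                             # torch.Size([12, 17])
```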
ELMo improves a broad range of tasks
[Table of results from Peters et al. 2018]
Since the paper was published, similar improvements have been reported on other tasks as well.
Coming up…
We will revisit context-dependent embeddings one more time
– BERT, which uses the transformer architecture