Word Meaning: Distributional Representations & Word Sense Disambiguation
CMSC 723 / LING 723 / INST 725 Marine Carpuat
Slides credit: Dan Jurafsky
Reminders: Read the syllabus. Make sure you have access to Piazza. Get started on …
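Two core questions about word meaning:
Semantic similarity: given two words, how similar are they in meaning?
Word sense disambiguation: given a word that has more than one meaning, which one is used in a specific context?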
Semantic similarity examples: “fast” is similar to “rapid”; “tall” is similar to “height”.
Question answering:
Q: “How tall is Mt. Everest?”
Candidate A: “The official height of Mount Everest is 29029 feet”
Kulkarni, Al-Rfou, Perozzi, Skiena 2015
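Distributional hypothesis (Harris 1954): “oculist and eye-doctor … occur in almost the same environments”
“If A and B have almost identical environments we say that they are synonyms.”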
           As You Like It   Twelfth Night   Julius Caesar   Henry V
battle     1                1               8               15
soldier    2                2               12              36
fool       37               58              1               5
clown      6                117             0               0
             aardvark  computer  data  pinch  result  sugar  …
apricot      0         0         0     1      0       1
pineapple    0         0         0     1      0       1
digital      0         2         1     0      1       0
information  0         1         6     0      4       0
sugar, a sliced lemon, a tablespoonful of apricot preserve or jam, a pinch each of,
their enjoyment. Cautiously she sampled her first pineapple and another fruit whose taste she likened
well suited to programming on the digital computer. In finding the optimal R-stage policy from
for the purpose of gathering data and information necessary for the study authorized in the
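Context window of ±1-3 words: the resulting similarity is very syntactic.
Context window of ±4-10 words: the resulting similarity is more semantic.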
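The window size is the only knob in a count-based word-context matrix. A minimal sketch of counting co-occurrences within a symmetric ±k window, assuming whitespace-tokenized text; the function name `cooccurrence_counts` and the toy sentence are hypothetical.

```python
from collections import Counter, defaultdict


def cooccurrence_counts(tokens, window=2):
    """Count context words within +/- `window` positions of each target word.

    Returns {target: Counter({context_word: count})}. The target token itself
    is skipped, and the window is clipped at the sentence boundaries.
    """
    counts = defaultdict(Counter)
    for i, target in enumerate(tokens):
        start = max(0, i - window)
        end = min(len(tokens), i + window + 1)
        for j in range(start, end):
            if j != i:
                counts[target][tokens[j]] += 1
    return counts


# Hypothetical toy sentence, whitespace-tokenized.
tokens = "the digital computer stores data on the computer".split()
narrow = cooccurrence_counts(tokens, window=1)  # immediate neighbors only
wide = cooccurrence_counts(tokens, window=4)    # wider, more topical context
```

With `window=1`, "digital" only sees "the" and "computer"; with `window=4` it also picks up "data", a topically related word.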
Raw frequency is not a great measure of association between words. We would rather have a measure that asks whether a context word is particularly informative about the target word.
Pointwise mutual information: do events x and y co-occur more often than if they were independent? PMI between two words (Church & Hanks 1989): do words x and y co-occur more often than if they were independent?
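$$\mathrm{PMI}(\mathit{word}_1, \mathit{word}_2) = \log_2 \frac{P(\mathit{word}_1, \mathit{word}_2)}{P(\mathit{word}_1)\,P(\mathit{word}_2)}$$
$$\mathrm{PMI}(X, Y) = \log_2 \frac{P(x, y)}{P(x)\,P(y)}$$
$$\mathrm{PPMI}(\mathit{word}_1, \mathit{word}_2) = \max\left(\log_2 \frac{P(\mathit{word}_1, \mathit{word}_2)}{P(\mathit{word}_1)\,P(\mathit{word}_2)},\ 0\right)$$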
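The definitions translate directly into code. A minimal sketch taking probabilities as inputs; the function names are illustrative, and mapping a zero joint probability to 0 is an assumption made here (PMI itself is undefined there).

```python
import math


def pmi(p_xy, p_x, p_y):
    """Pointwise mutual information in bits: log2 P(x,y) / (P(x) P(y))."""
    return math.log2(p_xy / (p_x * p_y))


def ppmi(p_xy, p_x, p_y):
    """Positive PMI: negative PMI values are replaced with 0.

    A zero joint probability makes PMI undefined (log of 0); this sketch
    maps that case to 0 as well, which is an assumption.
    """
    if p_xy == 0:
        return 0.0
    return max(pmi(p_xy, p_x, p_y), 0.0)


# Independent events: P(x,y) = P(x)P(y), so PMI is log2(1) = 0.
print(pmi(0.25, 0.5, 0.5))   # 0.0
# Co-occurring twice as often as chance gives 1 bit.
print(pmi(0.5, 0.5, 0.5))    # 1.0
# Co-occurring less often than chance: PMI < 0, PPMI clips it to 0.
print(ppmi(0.1, 0.5, 0.5))   # 0.0
```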
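Computing PPMI on a term-context matrix: matrix F has W rows (words) and C columns (contexts); f_ij is the number of times word w_i occurs in context c_j.
$$p_{ij} = \frac{f_{ij}}{\sum_{i=1}^{W}\sum_{j=1}^{C} f_{ij}} \qquad p_{i*} = \frac{\sum_{j=1}^{C} f_{ij}}{\sum_{i=1}^{W}\sum_{j=1}^{C} f_{ij}} \qquad p_{*j} = \frac{\sum_{i=1}^{W} f_{ij}}{\sum_{i=1}^{W}\sum_{j=1}^{C} f_{ij}}$$
$$\mathrm{pmi}_{ij} = \log_2 \frac{p_{ij}}{p_{i*}\,p_{*j}} \qquad \mathrm{ppmi}_{ij} = \begin{cases} \mathrm{pmi}_{ij} & \text{if } \mathrm{pmi}_{ij} > 0 \\ 0 & \text{otherwise} \end{cases}$$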
p(w=information, c=data) = 6/19 = .32
p(w=information) = 11/19 = .58
p(c=data) = 7/19 = .37

p(w,context)  computer  data  pinch  result  sugar   p(w)
apricot       0.00      0.00  0.05   0.00    0.05    0.11
pineapple     0.00      0.00  0.05   0.00    0.05    0.11
digital       0.11      0.05  0.00   0.05    0.00    0.21
information   0.05      0.32  0.00   0.21    0.00    0.58
p(context)    0.16      0.37  0.11   0.26    0.11
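These estimates can be checked with exact fractions. A minimal sketch, assuming the raw counts from the word-context count table (the aardvark column, all zeros here, is dropped); the helper names `p_joint`, `p_word`, and `p_context` are hypothetical.

```python
from fractions import Fraction

# Raw counts f_ij (rows: words, columns: contexts).
contexts = ["computer", "data", "pinch", "result", "sugar"]
counts = {
    "apricot":     [0, 0, 1, 0, 1],
    "pineapple":   [0, 0, 1, 0, 1],
    "digital":     [2, 1, 0, 1, 0],
    "information": [1, 6, 0, 4, 0],
}
N = sum(sum(row) for row in counts.values())  # total count, 19 here


def p_joint(word, context):
    """p(w, c) = f_wc / N"""
    return Fraction(counts[word][contexts.index(context)], N)


def p_word(word):
    """p(w) = row sum / N"""
    return Fraction(sum(counts[word]), N)


def p_context(context):
    """p(c) = column sum / N"""
    j = contexts.index(context)
    return Fraction(sum(row[j] for row in counts.values()), N)


print(p_joint("information", "data"), p_word("information"), p_context("data"))
# 6/19 11/19 7/19
```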
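Equivalently, writing N for the total count:
$$p_{ij} = \frac{f_{ij}}{N} \qquad p(w_i) = \frac{\sum_{j=1}^{C} f_{ij}}{N} \qquad p(c_j) = \frac{\sum_{i=1}^{W} f_{ij}}{N} \qquad \text{where } N = \sum_{i=1}^{W}\sum_{j=1}^{C} f_{ij}$$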
PPMI(w,context)  computer  data  pinch  result  sugar
apricot          -         -     2.25   -       2.25
pineapple        -         -     2.25   -       2.25
digital          1.66      0.00  -      0.00    -
information      0.00      0.57  -      0.47    -
("-" marks a zero co-occurrence count, for which PMI is undefined.)
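The whole PPMI table can be computed from the raw counts. A minimal sketch in plain Python, assuming the same counts as the word-context table; `ppmi_matrix` is a hypothetical name, and it returns `None` where the count is zero (PMI undefined).

```python
import math

contexts = ["computer", "data", "pinch", "result", "sugar"]
counts = {
    "apricot":     [0, 0, 1, 0, 1],
    "pineapple":   [0, 0, 1, 0, 1],
    "digital":     [2, 1, 0, 1, 0],
    "information": [1, 6, 0, 4, 0],
}


def ppmi_matrix(counts):
    """Return {word: [ppmi per context]}; None marks a zero count (PMI undefined)."""
    rows = list(counts.values())
    n = sum(sum(row) for row in rows)
    col_sums = [sum(col) for col in zip(*rows)]
    table = {}
    for word, row in counts.items():
        p_w = sum(row) / n
        out = []
        for j, f in enumerate(row):
            if f == 0:
                out.append(None)
                continue
            pmi = math.log2((f / n) / (p_w * (col_sums[j] / n)))
            out.append(max(pmi, 0.0))  # clip negative PMI to 0
        table[word] = out
    return table


for word, row in ppmi_matrix(counts).items():
    print(f"{word:<12}", ["-" if x is None else f"{x:.2f}" for x in row])
```

For example, apricot/pinch gives log2((1/19) / ((2/19)(2/19))) = log2(4.75), about 2.25.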
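Weighting PMI: PMI is biased toward infrequent events. Computing the context probability from counts raised to a power α < 1 gives rare contexts slightly higher probability:
$$\mathrm{PPMI}_\alpha(w, c) = \max\left(\log_2 \frac{P(w, c)}{P(w)\,P_\alpha(c)},\ 0\right) \qquad P_\alpha(c) = \frac{\mathrm{count}(c)^\alpha}{\sum_{c'} \mathrm{count}(c')^\alpha}$$
Example with α = .75 and two events, P(a) = .99 and P(b) = .01:
$$P_\alpha(a) = \frac{.99^{.75}}{.99^{.75} + .01^{.75}} = .97 \qquad P_\alpha(b) = \frac{.01^{.75}}{.99^{.75} + .01^{.75}} = .03$$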
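A minimal sketch of α-weighted PPMI, assuming the same word-context counts as above; `p_alpha` and `ppmi_alpha` are hypothetical names, and zero-count cells are mapped to 0.0 here as a simplifying assumption.

```python
import math

counts = {
    "apricot":     [0, 0, 1, 0, 1],
    "pineapple":   [0, 0, 1, 0, 1],
    "digital":     [2, 1, 0, 1, 0],
    "information": [1, 6, 0, 4, 0],
}  # columns: computer, data, pinch, result, sugar


def p_alpha(context_counts, alpha=0.75):
    """P_alpha(c) = count(c)**alpha / sum over c' of count(c')**alpha."""
    weighted = [c ** alpha for c in context_counts]
    total = sum(weighted)
    return [x / total for x in weighted]


def ppmi_alpha(counts, alpha=0.75):
    """PPMI using the alpha-weighted context distribution; zero counts map to 0.0."""
    rows = list(counts.values())
    n = sum(sum(row) for row in rows)
    pc = p_alpha([sum(col) for col in zip(*rows)], alpha)
    return {
        word: [
            max(math.log2((f / n) / ((sum(row) / n) * pc[j])), 0.0) if f > 0 else 0.0
            for j, f in enumerate(row)
        ]
        for word, row in counts.items()
    }


# The two-event example: P(a) = .99, P(b) = .01.
print([round(p, 2) for p in p_alpha([0.99, 0.01])])  # [0.97, 0.03]
```

With `alpha=1.0` the weighting disappears and the result equals plain PPMI; with `alpha=0.75` a rare context such as pinch gets a larger P_α(c), so its PMI with apricot drops.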
Laplace (add-2) smoothing: add 2 to every count before computing PPMI.

Add-2 smoothed count(w, context)
             computer  data  pinch  result  sugar
apricot      2         2     3      2       3
pineapple    2         2     3      2       3
digital      4         3     2      3       2
information  3         8     2      6       2

PPMI(w, context) [add-2]
             computer  data  pinch  result  sugar
apricot      0.00      0.00  0.56   0.00    0.56
pineapple    0.00      0.00  0.56   0.00    0.56
digital      0.62      0.00  0.00   0.00    0.00
information  0.00      0.58  0.00   0.37    0.00
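The smoothed table can be reproduced by adding k to every cell and then computing PPMI. A minimal sketch, assuming the raw counts from the word-context table; `add_k` and `ppmi` are hypothetical helper names.

```python
import math

counts = {
    "apricot":     [0, 0, 1, 0, 1],
    "pineapple":   [0, 0, 1, 0, 1],
    "digital":     [2, 1, 0, 1, 0],
    "information": [1, 6, 0, 4, 0],
}  # columns: computer, data, pinch, result, sugar


def add_k(counts, k=2):
    """Laplace-style smoothing: add k to every cell of the count matrix."""
    return {word: [f + k for f in row] for word, row in counts.items()}


def ppmi(counts):
    """PPMI per cell; f * N / (row_sum * col_sum) equals p_ij / (p_i* p_*j)."""
    rows = list(counts.values())
    n = sum(sum(row) for row in rows)
    col_sums = [sum(col) for col in zip(*rows)]
    return {
        word: [max(math.log2(f * n / (sum(row) * col_sums[j])), 0.0) if f > 0 else 0.0
               for j, f in enumerate(row)]
        for word, row in counts.items()
    }


smoothed = add_k(counts, k=2)
for word, row in ppmi(smoothed).items():
    print(f"{word:<12}", [f"{x:.2f}" for x in row])
```

Smoothing shrinks the large PPMI values of rare pairs (apricot/pinch falls from about 2.25 to about 0.56) because every context now has non-trivial mass.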
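tf-idf weighting: df_i is the number of documents containing word i, and N is the total number of documents.
$$\mathrm{idf}_i = \log \frac{N}{\mathrm{df}_i} \qquad w_{ij} = \mathrm{tf}_{ij}\,\mathrm{idf}_i$$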
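A minimal sketch of tf-idf on the Shakespeare term-document counts, assuming raw counts as tf, base-10 logarithms, and that the four plays are the whole collection (N = 4); these are illustrative choices, not the only variant in use.

```python
import math

plays = ["As You Like It", "Twelfth Night", "Julius Caesar", "Henry V"]
tf = {
    "battle":  [1, 1, 8, 15],
    "soldier": [2, 2, 12, 36],
    "fool":    [37, 58, 1, 5],
    "clown":   [6, 117, 0, 0],
}


def tfidf(tf, n_docs):
    """w_ij = tf_ij * log10(N / df_i), with raw counts as tf (an assumption)."""
    weights = {}
    for word, row in tf.items():
        df = sum(1 for f in row if f > 0)  # number of documents containing the word
        idf = math.log10(n_docs / df)
        weights[word] = [f * idf for f in row]
    return weights


weights = tfidf(tf, len(plays))
print(weights["battle"])  # [0.0, 0.0, 0.0, 0.0]: occurs in every play, so idf = 0
```

A word that appears in every document gets idf = 0 and drops out entirely, while "clown", found in only two plays, keeps weight log10(2) per occurrence.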
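$$\cos(\vec{v}, \vec{w}) = \frac{\vec{v} \cdot \vec{w}}{|\vec{v}|\,|\vec{w}|} = \frac{\vec{v}}{|\vec{v}|} \cdot \frac{\vec{w}}{|\vec{w}|} = \frac{\sum_{i=1}^{N} v_i w_i}{\sqrt{\sum_{i=1}^{N} v_i^2}\,\sqrt{\sum_{i=1}^{N} w_i^2}}$$
The numerator is the dot product; dividing each vector by its length turns it into a unit vector.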
v_i is the PPMI value for word v in context i; w_i is the PPMI value for word w in context i.
cos(v, w) is the cosine similarity of v and w.
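A minimal sketch of cosine similarity for plain Python lists. It works on any vectors, such as PPMI rows; here it uses the raw word-context counts so the result is easy to verify by hand. Returning 0.0 for an all-zero vector is an assumption, since the formula divides by zero there.

```python
import math


def cosine(v, w):
    """Dot product divided by the product of the vector lengths."""
    dot = sum(vi * wi for vi, wi in zip(v, w))
    norm_v = math.sqrt(sum(vi * vi for vi in v))
    norm_w = math.sqrt(sum(wi * wi for wi in w))
    if norm_v == 0 or norm_w == 0:
        return 0.0  # undefined for zero vectors; 0.0 is a convention (assumption)
    return dot / (norm_v * norm_w)


# Raw counts over the contexts computer, data, pinch, result, sugar.
apricot = [0, 0, 1, 0, 1]
pineapple = [0, 0, 1, 0, 1]
digital = [2, 1, 0, 1, 0]
information = [1, 6, 0, 4, 0]

print(round(cosine(digital, information), 2))  # 0.67
```

Here cos(digital, information) = 12 / (sqrt(6) · sqrt(53)), while apricot and digital share no contexts and get 0.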
Evaluation example (TOEFL multiple-choice synonym question): “Levied” is closest in meaning to: imposed, believed, requested, correlated
Word sense disambiguation: given a word that has more than one meaning, which one is used in a specific context?
http://articles.latimes.com/2013/may/20/local/la-me-ln-big-rig-crash-20130520
this purpose
Zeugma test: a conjunction that joins two distinct senses of a word does so in an uncomfortable way (* marks the odd sentence):
Which flight serves breakfast?
Which flights serve BWI?
*Which flights serve breakfast and BWI?
coincidental
meaning
pronunciation
https://wordnet.princeton.edu/
in many NLP applications
Noun
{pipe, tobacco pipe} (a tube with a small bowl at one end; used for smoking tobacco)
{pipe, pipage, piping} (a long tube made of metal or plastic that is used to carry water or oil or gas etc.)
{pipe, tube} (a hollow cylindrical shape)
{pipe} (a tubular wind instrument)
{organ pipe, pipe, pipework} (the flues and stops on a pipe organ)
Verb
{shriek, shrill, pipe up, pipe} (utter a shrill cry)
{pipe} (transport by pipeline) “pipe oil, water, and gas into the desert”
{pipe} (play on a pipe) “pipe a tune”
{pipe} (trim with piping) “pipe the skirt”
Part of speech   Word forms   Synsets
Noun             117,798      82,115
Verb             11,529       13,767
Adjective        21,479       18,156
Adverb           4,481        3,621
Total            155,287      117,659
http://wordnet.princeton.edu/
62% accuracy in this case!
Example context: “The bank can guarantee deposits will eventually cover future tuition costs because it invests in adjustable-rate mortgage securities.” (sense inventory: WordNet)
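A minimal sketch of simplified Lesk applied to this sentence: pick the sense whose signature (gloss plus examples) shares the most content words with the context. The two signatures are modeled on WordNet's first two noun senses of "bank" and are illustrative, not quoted from the database; the stopword list and function names are hypothetical.

```python
import re

# Small hypothetical stopword list; a real system would use a fuller one.
STOPWORDS = {"a", "an", "the", "and", "of", "on", "in", "at", "to", "it",
             "he", "they", "my", "that", "can", "will", "because", "into", "up"}


def content_words(text):
    """Lowercase alphabetic tokens minus stopwords."""
    return set(re.findall(r"[a-z]+", text.lower())) - STOPWORDS


def simplified_lesk(target, context, senses):
    """Return (sense, overlap) for the sense whose signature shares the most
    content words with the context. The target word itself is excluded;
    ties keep the first sense listed."""
    context_words = content_words(context) - {target}
    best_sense, best_overlap = None, -1
    for sense, signature in senses.items():
        overlap = len(context_words & (content_words(signature) - {target}))
        if overlap > best_overlap:
            best_sense, best_overlap = sense, overlap
    return best_sense, best_overlap


# Illustrative signatures modeled on WordNet's first two noun senses of "bank".
senses = {
    "bank1": "a financial institution that accepts deposits and channels the money "
             "into lending activities; he cashed a check at the bank; "
             "that bank holds the mortgage on my home",
    "bank2": "sloping land (especially the slope beside a body of water); "
             "they pulled the canoe up on the bank; "
             "he sat on the bank of the river and watched the currents",
}
sentence = ("The bank can guarantee deposits will eventually cover future tuition "
            "costs because it invests in adjustable-rate mortgage securities.")

print(simplified_lesk("bank", sentence, senses))  # ('bank1', 2)
```

The financial sense wins on the overlap {deposits, mortgage}; the riverbank sense shares no content words with the sentence.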
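[Figure: supervised classification. Training: documents labeled label1 … label4 are given to a supervised machine learning algorithm, which produces a classifier. Testing: the classifier assigns label1, label2, label3, or label4 to an unlabeled document.]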
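The figure does not name a learning algorithm. As one possible instance, here is a minimal sketch of a multinomial naive Bayes sense classifier with add-one smoothing, trained on hypothetical bag-of-words contexts for "bass"; the data, labels, and function names are all illustrative.

```python
import math
from collections import Counter, defaultdict


def train_nb(examples):
    """examples: list of (context_words, sense_label) pairs."""
    label_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for words, label in examples:
        label_counts[label] += 1
        word_counts[label].update(words)
        vocab.update(words)
    return label_counts, word_counts, vocab


def predict_nb(model, words):
    """Return the label maximizing log P(label) + sum of log P(word | label)."""
    label_counts, word_counts, vocab = model
    total = sum(label_counts.values())
    best, best_score = None, float("-inf")
    for label, n in label_counts.items():
        score = math.log(n / total)
        denom = sum(word_counts[label].values()) + len(vocab)  # add-one smoothing
        for w in words:
            if w in vocab:  # ignore words never seen in training
                score += math.log((word_counts[label][w] + 1) / denom)
        if score > best_score:
            best, best_score = label, score
    return best


# Hypothetical training data: contexts of "bass" labeled with a sense.
train = [
    ("fish caught bass river".split(), "FISH"),
    ("bass guitar played band".split(), "MUSIC"),
    ("striped bass fishing lake".split(), "FISH"),
    ("bass player jazz band".split(), "MUSIC"),
]
model = train_nb(train)
print(predict_nb(model, "played in a band".split()))  # MUSIC
```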
training data
Feature Functions
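A minimal sketch of two common kinds of WSD feature functions: collocational features (the word at each relative position around the target) and bag-of-words features (unordered words in the window). The feature-name format, the `<pad>` token, and the function name are hypothetical choices.

```python
def wsd_features(tokens, i, window=2):
    """Sparse features for the target word tokens[i], as {name: 1}.

    - collocational: the word at each relative position -window..+window
      (positions past the sentence edge are filled with "<pad>")
    - bag of words: each word in the window, ignoring position
    """
    feats = {}
    for offset in range(-window, window + 1):
        if offset == 0:
            continue
        j = i + offset
        word = tokens[j] if 0 <= j < len(tokens) else "<pad>"
        feats[f"w[{offset:+d}]={word}"] = 1
    for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
        if j != i:
            feats[f"bow={tokens[j]}"] = 1
    return feats


tokens = "an electric guitar and bass player stand off".split()
features = wsd_features(tokens, tokens.index("bass"))
print(sorted(features))
```

Collocational features keep word order, which helps with fixed phrases; bag-of-words features capture the topic of the surrounding text. Both can feed the same classifier.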
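Recap: semantic similarity (given two words, how similar are they in meaning?) and word sense disambiguation (given a word that has more than one meaning, which one is used in a specific context?), covering unsupervised disambiguation (Lesk) and supervised disambiguation.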