Lecture 6: Representing Words
Kai-Wei Chang CS @ UCLA kw@kwchang.net Course webpage: https://uclanlp.github.io/CS269-17/
1 ML in NLP
Bag-of-Words with N-grams
- N-grams: a contiguous sequence of n tokens from a given piece of text
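To make the n-gram definition concrete, here is a minimal Python sketch that collects the unigrams and bigrams of a sentence into a bag (a multiset of counts). The whitespace tokenizer, the lowercasing, and the helper names `ngrams` and `bag_of_ngrams` are assumptions made for illustration; they are not part of the lecture.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return every contiguous window of n tokens as a tuple."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def bag_of_ngrams(text, n_max=2):
    """Count all n-grams of the text for n = 1 .. n_max."""
    tokens = text.lower().split()  # assumption: plain whitespace tokenization
    bag = Counter()
    for n in range(1, n_max + 1):
        bag.update(ngrams(tokens, n))
    return bag


bag = bag_of_ngrams("a dog is chasing a cat")
```

The bag ignores where in the text an n-gram occurs; only the bigram and higher-order entries retain some local word order.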
CS 6501: Natural Language Processing 2
http://recognize-speech.com/language-model/n-gram-model/comparison
https://books.google.com/ngrams
- $P(w_0, w_1, w_2, \ldots, w_n) = P(w_1 \mid w_0)\, P(w_2 \mid w_1) \cdots P(w_n \mid w_{n-1}) = \prod_{i=1}^{n} P(w_i \mid w_{i-1})$
- $w_0$ is a dummy word representing "begin of a sentence"
- $P(w_0, \text{"a"}, \text{"dog"}, \ldots, \text{"cat"}) = P(\text{"a"} \mid w_0)\, P(\text{"dog"} \mid \text{"a"}) \cdots P(\text{"cat"} \mid \text{"a"})$
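The bigram factorization can be sketched in a few lines of Python. This is an unsmoothed maximum-likelihood sketch under stated assumptions: the `<s>` token stands in for the dummy word $w_0$, the three-sentence corpus is a toy example, and the function names `train_bigram` and `sentence_prob` are hypothetical.

```python
from collections import Counter

BOS = "<s>"  # assumption: this token plays the role of the dummy word w_0


def train_bigram(sentences):
    """Count bigrams (prev, cur) and how often each word appears as prev."""
    bigram, context = Counter(), Counter()
    for sent in sentences:
        words = [BOS] + sent.split()
        for prev, cur in zip(words, words[1:]):
            bigram[(prev, cur)] += 1
            context[prev] += 1
    return bigram, context


def sentence_prob(sentence, bigram, context):
    """Product over i of P(w_i | w_{i-1}), estimated by count ratios (no smoothing)."""
    words = [BOS] + sentence.split()
    prob = 1.0
    for prev, cur in zip(words, words[1:]):
        if context[prev] == 0:  # context never observed in training
            return 0.0
        prob *= bigram[(prev, cur)] / context[prev]
    return prob


# Toy corpus, for illustration only.
corpus = [
    "a dog is chasing a cat",
    "the boy is following a rabbit",
    "a fox was chasing a bird",
]
bigram, context = train_bigram(corpus)
p_seen = sentence_prob("a dog is chasing a cat", bigram, context)
p_unseen = sentence_prob("a cat is chasing a dog", bigram, context)
```

Without smoothing, any sentence containing a bigram that never occurred in training gets probability zero, even when it is perfectly plausible. Sharing statistics across similar words is one way to reduce this sparsity.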
Cluster 3: a, the
Cluster 46: dog, cat, fox, rabbit, bird, boy
Cluster 64: is, was
Cluster 8: chasing, following, biting, ...
Class sequence: C3 C46 C64 C8 C3 C46
a dog is chasing a cat
the boy is following a rabbit
a fox was chasing a bird
$P(w_0, w_1, w_2, \ldots, w_n) = P(C(w_1) \mid C(w_0))\, P(C(w_2) \mid C(w_1)) \cdots P(C(w_n) \mid C(w_{n-1}))\; P(w_1 \mid C(w_1))\, P(w_2 \mid C(w_2)) \cdots P(w_n \mid C(w_n))$
$= \prod_{i=1}^{n} P(C(w_i) \mid C(w_{i-1}))\, P(w_i \mid C(w_i))$
$P(w_0, w_1, w_2, \ldots, w_n) = \prod_{i=1}^{n} P(C(w_i) \mid C(w_{i-1}))\, P(w_i \mid C(w_i))$
- A vocabulary set $V$
- A function $C: V \to \{1, 2, 3, \ldots, k\}$
  - A partition of the vocabulary into k classes
- Conditional probability $P(c' \mid c)$ for $c, c' \in \{1, \ldots, k\}$
- Conditional probability $P(w \mid c)$ for $c \in \{1, \ldots, k\}$, $w \in V$
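The pieces of the class-based bigram model can be sketched directly as Python dictionaries. The class ids 3, 46, 64, and 8 follow the cluster table in these slides. Everything else is an assumption for illustration: the start class id `START = 0` for the dummy word, the deterministic transition table, the uniform emission probabilities, and the function name `class_bigram_prob`.

```python
# Function C: V -> class id, following the cluster table in the slides.
C = {
    "a": 3, "the": 3,
    "dog": 46, "cat": 46, "fox": 46, "rabbit": 46, "bird": 46, "boy": 46,
    "is": 64, "was": 64,
    "chasing": 8, "following": 8, "biting": 8,
}
START = 0  # hypothetical class id for the dummy word w_0

# Made-up parameters, for illustration only.
# P(c' | c): every class deterministically moves to the next one in the pattern.
trans = {(START, 3): 1.0, (3, 46): 1.0, (46, 64): 1.0, (64, 8): 1.0, (8, 3): 1.0}

# P(w | c): uniform over the words assigned to class c.
members = {}
for word, cls in C.items():
    members.setdefault(cls, []).append(word)
emit = {(w, c): 1.0 / len(ws) for c, ws in members.items() for w in ws}


def class_bigram_prob(words, C, trans, emit, start=START):
    """Product over i of P(C(w_i) | C(w_{i-1})) * P(w_i | C(w_i))."""
    prob = 1.0
    prev_c = start
    for w in words:
        c = C[w]
        prob *= trans.get((prev_c, c), 0.0) * emit.get((w, c), 0.0)
        prev_c = c
    return prob


p1 = class_bigram_prob("a dog is chasing a cat".split(), C, trans, emit)
p2 = class_bigram_prob("a fox was biting the boy".split(), C, trans, emit)
p3 = class_bigram_prob("dog a is".split(), C, trans, emit)
```

Under these toy parameters the two sentences with class sequence C3 C46 C64 C8 C3 C46 receive the same nonzero probability, while a word order that breaks the class pattern receives zero. The model needs on the order of k^2 + |V| parameters instead of |V|^2 for a word bigram model.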
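- $LL(\theta, C) = \log \prod_{i=1}^{n} P(C(w_i) \mid C(w_{i-1}))\, P(w_i \mid C(w_i)) = \sum_{i=1}^{n} \log \left[ P(C(w_i) \mid C(w_{i-1}))\, P(w_i \mid C(w_i)) \right]$
- $\max_{\theta, C} LL(\theta, C)$
- For a fixed clustering $C$: $\max_{\theta} LL(\theta, C)$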
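- $\max_{\theta} LL(\theta, C)$ for a fixed clustering $C$ is attained by the count-based estimates
$P(c' \mid c) = \frac{\#(c, c')}{\#(c)}, \qquad P(w \mid c) = \frac{\#(w, c)}{\#(c)}$
where $\#(c, c')$ is the number of times class $c'$ follows class $c$ in the corpus, $\#(w, c)$ is the number of times word $w$ occurs with class $c$, and $\#(c)$ is the number of occurrences of class $c$.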
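A minimal sketch of the count-based estimates, assuming a fixed clustering and a toy corpus. The start class id `0`, the corpus, and the function name `estimate_parameters` are illustrative assumptions. The sketch keeps two separate denominators so that both distributions normalize exactly: how often class c appears as a context (for transitions) and how often it appears as a token (for emissions). The two differ only at sentence boundaries.

```python
from collections import Counter

# Clustering taken from the cluster table in the slides.
C = {
    "a": 3, "the": 3,
    "dog": 46, "cat": 46, "fox": 46, "rabbit": 46, "bird": 46, "boy": 46,
    "is": 64, "was": 64,
    "chasing": 8, "following": 8, "biting": 8,
}


def estimate_parameters(sentences, C, start=0):
    """Maximum-likelihood estimates of P(c' | c) and P(w | c) by counting."""
    trans_count, context_count = Counter(), Counter()
    emit_count, class_count = Counter(), Counter()
    for sent in sentences:
        prev = start  # hypothetical class id of the dummy word w_0
        for w in sent.split():
            c = C[w]
            trans_count[(prev, c)] += 1   # #(c, c') with c = prev, c' = c
            context_count[prev] += 1      # #(c) as a conditioning context
            emit_count[(w, c)] += 1       # #(w, c)
            class_count[c] += 1           # #(c) as a token class
            prev = c
    trans = {pair: n / context_count[pair[0]] for pair, n in trans_count.items()}
    emit = {wc: n / class_count[wc[1]] for wc, n in emit_count.items()}
    return trans, emit


# Toy corpus, for illustration only.
corpus = [
    "a dog is chasing a cat",
    "the boy is following a rabbit",
    "a fox was chasing a bird",
]
trans, emit = estimate_parameters(corpus, C)
```

On this corpus every observed class transition has probability 1, and the emission table reflects word frequency within each class, for example "a" is far more likely than "the" within cluster 3.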
See section 9.2: http://ciml.info/dl/v0_99/ciml-v0_99-ch09.pdf
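$\max_{\theta} \sum_{i=1}^{n} \log P(C(w_i) \mid C(w_{i-1}))\, P(w_i \mid C(w_i)) = n \sum_{c=1}^{k} \sum_{c'=1}^{k} p(c, c') \log \frac{p(c, c')}{p(c)\, p(c')} + G$
where $G$ is a constant that does not depend on the clustering $C$, and
$p(c, c') = \frac{\#(c, c')}{\sum_{c, c'} \#(c, c')}, \qquad p(c) = \frac{\#(c)}{\sum_{c} \#(c)}$
Note that $\frac{p(c, c')}{p(c)\, p(c')} = \frac{p(c \mid c')}{p(c)}$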
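The clustering-dependent term, $\sum_{c, c'} p(c, c') \log \frac{p(c, c')}{p(c)\, p(c')}$, is the mutual information between adjacent classes, and it can be computed directly from counts. The sketch below is illustrative only: it treats the corpus as one long token sequence without sentence boundaries, and the function name `clustering_quality` and the toy text are assumptions.

```python
import math
from collections import Counter


def clustering_quality(words, C):
    """Sum over (c, c') of p(c, c') * log( p(c, c') / (p(c) * p(c')) )."""
    classes = [C[w] for w in words]
    pair = Counter(zip(classes, classes[1:]))  # #(c, c')
    uni = Counter(classes)                     # #(c)
    n_pairs = sum(pair.values())
    n_uni = sum(uni.values())
    quality = 0.0
    for (c, c2), count in pair.items():        # unseen pairs contribute 0
        p_pair = count / n_pairs
        p_c, p_c2 = uni[c] / n_uni, uni[c2] / n_uni
        quality += p_pair * math.log(p_pair / (p_c * p_c2))
    return quality


# Toy text, for illustration only.
text = ("a dog is chasing a cat the boy is following a rabbit "
        "a fox was chasing a bird").split()

# Clustering from the slides' cluster table, restricted to the words in the text.
good = {"a": 3, "the": 3,
        "dog": 46, "cat": 46, "boy": 46, "rabbit": 46, "fox": 46, "bird": 46,
        "is": 64, "was": 64, "chasing": 8, "following": 8}
# Degenerate clustering: every word in a single class.
single = {w: 1 for w in good}

q_good = clustering_quality(text, good)
q_single = clustering_quality(text, single)
```

Putting every word in one class makes the class of the next word carry no information, so the quality is zero. A clustering whose classes predict each other well scores higher.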
See the class notes here: http://web.cs.ucla.edu/~kwchang/teaching/NLP16/slides/classnote.pdf
- Create a new cluster $C_{m+1}$ (we now have m+1 clusters)
- Choose two clusters from the m+1 clusters based on