SI425 : NLP
Set 5 Naïve Bayes Classification
Fall 2020 : Chambers
Motivation
We want to predict something. We have some text related to this something.
something = target label Y
text = text features X
Given X, what is the …
Alas the day! take heed of him; he stabbed me in mine own house, and that most beastly: in good faith, he cares not what mischief he does. If his weapon be out: he will foin like any devil; he will spare neither man, woman, nor child.
{ Charles Dickens, William Shakespeare, Herman Melville, Jane Austen, Homer, Leo Tolstoy }
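$$\hat{y} = \arg\max_{y_k} P(Y = y_k \mid X)$$
Bayes rule:
$$P(Y = y_i \mid X = x_j) = \frac{P(X = x_j \mid Y = y_i)\,P(Y = y_i)}{P(X = x_j)}$$
$$P(Y = y_i \mid X = x_j) = \frac{P(X = x_j \mid Y = y_i)\,P(Y = y_i)}{\sum_k P(X = x_j \mid Y = y_k)\,P(Y = y_k)}$$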
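The Bayes rule above, with the denominator expanded as a sum over classes, can be sketched in a few lines. This is a minimal illustration, assuming the prior P(Y = y_k) and the likelihood P(X = x_j | Y = y_k) of one observed passage are already known for each class; `posterior`, `predict`, and the numbers in the example are hypothetical, not taken from the slides.

```python
def posterior(priors, likelihoods):
    """Bayes rule: return P(Y = y_i | X = x_j) for every class y_i.

    priors:      dict mapping class y to P(Y = y)
    likelihoods: dict mapping class y to P(X = x_j | Y = y) for the observed x_j
    """
    # Denominator P(X = x_j) = sum_k P(X = x_j | Y = y_k) * P(Y = y_k)
    evidence = sum(likelihoods[y] * priors[y] for y in priors)
    return {y: likelihoods[y] * priors[y] / evidence for y in priors}


def predict(priors, likelihoods):
    """Return argmax_{y_k} P(Y = y_k | X = x_j)."""
    post = posterior(priors, likelihoods)
    return max(post, key=post.get)


# Hypothetical numbers: equal priors, passage 4x as likely under Shakespeare
priors = {"dickens": 0.5, "shakespeare": 0.5}
likelihoods = {"dickens": 0.001, "shakespeare": 0.004}
print(predict(priors, likelihoods))  # shakespeare
```

Because the denominator is the same for every class, the argmax does not need it; computing it anyway turns the scores into posteriors that sum to 1 (here about 0.2 and 0.8).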
Remaining slides adapted from Tom Mitchell.
$$P(X_1, \dots, X_n \mid Y) = \prod_i P(X_i \mid Y)$$
Sky    Temp  Humid   Wind    Water  Forecast  Play?
sunny  warm  normal  strong  warm   same      yes
sunny  warm  high    strong  warm   same      yes
rainy  cold  high    strong  warm   change    no
sunny  warm  high    strong  cool   change    yes

P(Sky = sunny | Play = yes) = ?
P(Humid = high | Play = yes) = ?
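The two questions about the weather table can be answered by counting: keep only the rows with Play = yes, then take the fraction of those rows that have the feature value. This is a minimal sketch of that maximum-likelihood count; `cond_prob` and the row encoding are illustrative names, not from the slides.

```python
from fractions import Fraction

COLUMNS = ["Sky", "Temp", "Humid", "Wind", "Water", "Forecast", "Play"]
ROWS = [
    ("sunny", "warm", "normal", "strong", "warm", "same", "yes"),
    ("sunny", "warm", "high", "strong", "warm", "same", "yes"),
    ("rainy", "cold", "high", "strong", "warm", "change", "no"),
    ("sunny", "warm", "high", "strong", "cool", "change", "yes"),
]
DATA = [dict(zip(COLUMNS, row)) for row in ROWS]


def cond_prob(data, feature, value, label):
    """Count-based estimate of P(feature = value | Play = label)."""
    in_class = [row for row in data if row["Play"] == label]
    matches = sum(1 for row in in_class if row[feature] == value)
    return Fraction(matches, len(in_class))


print(cond_prob(DATA, "Sky", "sunny", "yes"))   # 1
print(cond_prob(DATA, "Humid", "high", "yes"))  # 2/3
```

`Fraction` keeps the estimates exact, which makes it easy to check them against hand counts: all three Play = yes rows are sunny, and two of the three have high humidity.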
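Assuming the $X_i$ are conditionally independent given $Y$:
$$P(Y = y_k \mid X_1, \dots, X_n) = \frac{P(Y = y_k)\prod_i P(X_i \mid Y = y_k)}{\sum_j P(Y = y_j)\prod_i P(X_i \mid Y = y_j)}$$
Classification rule for a new instance $X^{new} = \langle X_1, \dots, X_n \rangle$:
$$Y^{new} \leftarrow \arg\max_{y_k} P(Y = y_k)\prod_i P(X_i^{new} \mid Y = y_k)$$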
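The classification rule above can be sketched end to end on the weather table: estimate P(Y) and each P(X_i | Y) by counting, then score every class by the prior times the product of per-feature likelihoods. This is a minimal sketch, assuming unsmoothed count estimates; the names `train` and `classify` are hypothetical, and the denominator is omitted because it is the same for every class.

```python
from collections import Counter, defaultdict
from fractions import Fraction

TRAIN = [  # (Sky, Temp, Humid, Wind, Water, Forecast), Play?
    (("sunny", "warm", "normal", "strong", "warm", "same"), "yes"),
    (("sunny", "warm", "high", "strong", "warm", "same"), "yes"),
    (("rainy", "cold", "high", "strong", "warm", "change"), "no"),
    (("sunny", "warm", "high", "strong", "cool", "change"), "yes"),
]


def train(examples):
    """Unsmoothed count estimates of P(Y) and P(X_i | Y)."""
    class_counts = Counter(label for _, label in examples)
    # value_counts[label][i][value] = # examples of `label` whose i-th feature is `value`
    value_counts = defaultdict(lambda: defaultdict(Counter))
    for x, label in examples:
        for i, value in enumerate(x):
            value_counts[label][i][value] += 1
    prior = {y: Fraction(c, len(examples)) for y, c in class_counts.items()}

    def likelihood(i, value, y):
        return Fraction(value_counts[y][i][value], class_counts[y])

    return prior, likelihood


def classify(x_new, prior, likelihood):
    """Y_new <- argmax_{y_k} P(Y = y_k) * prod_i P(X_i^new | Y = y_k)."""
    scores = {}
    for y, p_y in prior.items():
        score = p_y
        for i, value in enumerate(x_new):
            score *= likelihood(i, value, y)
        scores[y] = score
    return max(scores, key=scores.get), scores


prior, likelihood = train(TRAIN)
label, scores = classify(("sunny", "warm", "high", "strong", "cool", "change"), prior, likelihood)
print(label)  # yes
```

For this instance the "yes" score is 3/4 × 1 × 1 × 2/3 × 1 × 1/3 × 1/3 = 1/18, while "no" scores 0 because no "no" row is sunny. That zero shows the weakness of unsmoothed counts: one unseen feature value wipes out a class, which is why smoothing is commonly added in practice (not shown here).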
… This is just a language model!
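Y1 = dickens, Y2 = twain
Compare $P(Y_1)\,P(X \mid Y_1)$ against $P(Y_2)\,P(X \mid Y_2)$, where
$$P(X \mid Y_1) = P_{Y_1}(X), \qquad P(X \mid Y_2) = P_{Y_2}(X)$$
Bigrams:
$$P_{Y_1}(X) = \prod_i P_{Y_1}(x_i \mid x_{i-1}), \qquad P_{Y_2}(X) = \prod_i P_{Y_2}(x_i \mid x_{i-1})$$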
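One way to make "one bigram language model per author" concrete is to train a separate bigram model on each author's sentences and pick the author maximizing log P(Y) + Σ_i log P_Y(x_i | x_{i-1}). This sketch rests on several assumptions not in the slides: whitespace tokenization, a `<s>` start token, no end token, add-one smoothing over a shared vocabulary so unseen bigrams do not zero the product, and made-up toy sentences (not real Dickens or Twain text). `BigramLM` and `classify` are hypothetical names.

```python
import math
from collections import Counter


class BigramLM:
    """Add-one smoothed bigram model P_Y(x_i | x_{i-1}) for one class Y."""

    def __init__(self, sentences):
        self.histories = Counter()  # counts of each history token x_{i-1}
        self.bigrams = Counter()    # counts of (x_{i-1}, x_i) pairs
        self.vocab = set()
        for sentence in sentences:
            tokens = ["<s>"] + sentence.lower().split()
            self.vocab.update(tokens)
            for prev, cur in zip(tokens, tokens[1:]):
                self.histories[prev] += 1
                self.bigrams[(prev, cur)] += 1

    def log_prob(self, sentence, vocab_size):
        tokens = ["<s>"] + sentence.lower().split()
        total = 0.0
        for prev, cur in zip(tokens, tokens[1:]):
            # Laplace (add-one) smoothing: unseen bigrams get a small nonzero probability
            p = (self.bigrams[(prev, cur)] + 1) / (self.histories[prev] + vocab_size)
            total += math.log(p)
        return total


def classify(sentence, corpora):
    """argmax_Y log P(Y) + log P_Y(X), with one bigram LM per author."""
    n_total = sum(len(sents) for sents in corpora.values())
    models = {y: BigramLM(sents) for y, sents in corpora.items()}
    # Shared vocabulary (including test words) so every model uses the same V
    vocab = set().union(*(m.vocab for m in models.values()))
    vocab.update(sentence.lower().split())
    scores = {
        y: math.log(len(corpora[y]) / n_total) + models[y].log_prob(sentence, len(vocab))
        for y in corpora
    }
    return max(scores, key=scores.get), scores


# Made-up toy training sentences, for illustration only
CORPORA = {
    "dickens": ["the fog was thick in the city", "the fog crept over the river"],
    "twain": ["the raft drifted down the river", "the boy drifted on the raft"],
}
print(classify("the fog was on the river", CORPORA)[0])        # dickens
print(classify("the boy drifted down the river", CORPORA)[0])  # twain
```

Working in log space avoids underflow when many small bigram probabilities are multiplied over a long passage.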
… language is where the real gains come from in NLP.
$$\prod_{i=1}^{n} P(X_i \mid Y)$$
…
$$\frac{\text{Count}_{\text{DICKENS}}(\text{``FEAT-NAME''})}{\#\text{ dickens sentences}}$$
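The ratio above counts how many Dickens sentences contain a feature and divides by the number of Dickens sentences. This is a minimal sketch of that estimate, assuming binary (present/absent) features; `feature_prob`, `word_features` (one feature per distinct lowercase word), and the example sentences are hypothetical stand-ins for whatever features and data the course uses.

```python
from fractions import Fraction


def word_features(sentence):
    # Hypothetical feature extractor: one binary feature per distinct lowercase word
    return set(sentence.lower().split())


def feature_prob(sentences, feature, extract):
    """Count_CLASS(feature) / (# CLASS sentences), all sentences from one class."""
    count = sum(1 for s in sentences if feature in extract(s))
    return Fraction(count, len(sentences))


# Made-up "dickens" sentences, for illustration only
dickens = ["the fog was thick in the city", "the fog crept over the river", "it was a cold night"]
print(feature_prob(dickens, "fog", word_features))  # 2/3
print(feature_prob(dickens, "the", word_features))  # 2/3
```

Each sentence contributes at most once per feature: "the" appears twice in the first sentence but still counts as one sentence, which is what makes this a sentence count rather than a token count.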