Text classification I (Naïve Bayes)
CE-324: Modern Information Retrieval
Sharif University of Technology
- M. Soleymani
Spring 2020
Most slides have been adapted from: Profs. Manning, Nayak & Raghavan (CS-276, Stanford)
} Text classification: definition and relevance to information retrieval
} Docs are represented in this (typically high-dimensional) space
} Example: C = {spam, non-spam}
} You have an information need to monitor, say:
} Unrest in the Niger delta region
} You want to rerun an appropriate query periodically to find new news
} You will be sent new documents that are found
} I.e., it's not ranking but classification (relevant vs. not relevant)
} Long used by "information professionals"
} A modern mass instantiation is Google Alerts
From: "" <takworlld@hotmail.com> Subject: real estate is the only way... gem oalvgkay Anyone can buy real estate with no money down Stop paying rent TODAY ! There is no need to spend hundreds or even thousands for similar courses I am 22 years old and I have already purchased 6 properties using the methods outlined in this truly INCREDIBLE ebook. Change your life NOW ! ================================================= Click Below to order: http://www.wholesaledaily.com/sales/nmd.htm =================================================
} A representation of a document d
} Issue: how to represent text documents
} Usually some type of high-dimensional space: bag of words
} A fixed set of classes: C = {c1, c2, …, cJ}
} The category of d: γ(d) ∈ C
} Ξ³(d) is a classification function
} We want to build classification functions ("classifiers").
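The bag-of-words representation above can be sketched in a few lines; the tokenizer here (lowercased whitespace split) is a simplifying assumption, not the course's actual preprocessing.

```python
from collections import Counter

def bag_of_words(doc: str) -> Counter:
    # A document becomes an unordered multiset of its tokens;
    # word order and grammar are discarded.
    return Counter(doc.lower().split())

vec = bag_of_words("Chinese Beijing Chinese")
print(vec)  # Counter({'chinese': 2, 'beijing': 1})
```

Each distinct term becomes one dimension of the (typically high-dimensional) document space, with its count as the coordinate.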
§ Used by the original Yahoo! Directory
§ Looksmart, about.com, ODP, PubMed
§ Means we need automatic classification methods for big problems
} A document d
} A fixed set of classes: C = {c1, c2, …, cJ}
} A training set D of documents each with a label in C
} A learning method or algorithm which will enable us to learn a classifier γ
} For a test document d, we assign it the class γ(d) ∈ C
} One for each unique combination of a class and a sequence of words
} We would need a very, very large number of training examples to estimate that many parameters
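To make the explosion concrete, here is a back-of-the-envelope count for a model over full word sequences versus Naive Bayes; the vocabulary size and sequence length are made-up illustrative figures.

```python
V = 100_000  # vocabulary size (illustrative assumption)
C = 2        # number of classes
n = 10       # length of the word sequence (illustrative assumption)

# One parameter per (class, length-n word sequence): |C| * |V|^n
full_sequence_model = C * V ** n

# Naive Bayes: one P(t|c) per (term, class), plus one prior per class
naive_bayes_model = C * V + C

print(full_sequence_model)  # 2 * 10**50
print(naive_bayes_model)    # 200002
```

The conditional-independence assumption is exactly what collapses the first count down to the second.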
} n_d: length of doc d (number of tokens)
} P(t_i|c_j): probability of term t_i occurring in a doc of class c_j
} P(c_j): prior probability of class c_j
c_MAP = argmax_{c ∈ C} P(c|d) = argmax_{c ∈ C} P(c) ∏_{i=1}^{n_d} P(t_i|c)
c_MAP = argmax_{c ∈ C} [ log P(c) + Σ_{i=1}^{n_d} log P(t_i|c) ]
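Implementations compute this sum of logs rather than the raw product, since multiplying many small probabilities underflows floating point. A minimal sketch with made-up, already-smoothed toy parameters:

```python
import math

# Toy smoothed parameters (assumed purely for illustration)
prior = {"spam": 0.5, "ham": 0.5}
cond = {
    "spam": {"buy": 0.4, "now": 0.4, "meeting": 0.2},
    "ham":  {"buy": 0.1, "now": 0.2, "meeting": 0.7},
}

def classify(tokens):
    # c_MAP = argmax_c [ log P(c) + sum_i log P(t_i | c) ]
    def score(c):
        return math.log(prior[c]) + sum(math.log(cond[c][t]) for t in tokens)
    return max(prior, key=score)

print(classify(["buy", "now"]))      # spam
print(classify(["meeting", "now"]))  # ham
```

Because log is monotonic, the argmax over log scores is the same class as the argmax over the original products.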
} N_j: number of docs in class c_j
} T_{i,j}: number of occurrences of t_i in training docs from class c_j
P̂(c_j) = N_j / N
P̂(t_i|c_j) = T_{i,j} / Σ_k T_{k,j}
} If a term in d never occurred in the training docs of class c, its estimated probability is zero and the whole product vanishes
} Thus d cannot be assigned to class c
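Add-one (Laplace) smoothing avoids these zero estimates. A sketch of parameter estimation with smoothing; the tiny training set is an assumption for illustration (it mirrors the classic China example used in the worked example of this lecture):

```python
from collections import Counter

def train_nb(docs):
    """docs: list of (token_list, class_label) pairs."""
    classes = {c for _, c in docs}
    vocab = {t for tokens, _ in docs for t in tokens}
    prior, cond = {}, {}
    for c in classes:
        class_docs = [tokens for tokens, label in docs if label == c]
        prior[c] = len(class_docs) / len(docs)  # N_c / N
        counts = Counter(t for tokens in class_docs for t in tokens)
        total = sum(counts.values())
        # Add-one smoothing: P(t|c) = (T_tc + 1) / (sum_k T_kc + |V|)
        cond[c] = {t: (counts[t] + 1) / (total + len(vocab)) for t in vocab}
    return prior, cond

# Assumed toy training set
docs = [
    ("Chinese Beijing Chinese".split(), "c"),
    ("Chinese Chinese Shanghai".split(), "c"),
    ("Chinese Macao".split(), "c"),
    ("Tokyo Japan Chinese".split(), "not-c"),
]
prior, cond = train_nb(docs)
print(prior["c"])            # 0.75
print(cond["c"]["Chinese"])  # 6/14, about 0.4286
```

Every term in the vocabulary now gets a strictly positive conditional probability in every class, so no class is ruled out by a single unseen term.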
} Estimate the parameters of the Naive Bayes classifier
} Classify the test doc
} Estimated parameters (with add-one smoothing):
P̂(c) = 3/4, P̂(¬c) = 1/4
P̂(Chinese|c) = (5+1)/(8+6) = 6/14 = 3/7
P̂(Tokyo|c) = P̂(Japan|c) = (0+1)/(8+6) = 1/14
P̂(Chinese|¬c) = (1+1)/(3+6) = 2/9
P̂(Tokyo|¬c) = P̂(Japan|¬c) = (1+1)/(3+6) = 2/9
} Classifying the test doc:
P̂(c|d) ∝ 3/4 × (6/14)³ × 1/14 × 1/14 ≈ 0.0003
P̂(¬c|d) ∝ 1/4 × (2/9)³ × 2/9 × 2/9 ≈ 0.0001
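As a sanity check, plugging the smoothed estimates behind this example (assumed to be the standard ones: 3/4, 6/14 and 1/14 for class c; 1/4 and 2/9 for the other class) into the product reproduces the two scores:

```python
# Smoothed estimates assumed from the worked example:
# P(c) = 3/4, P(Chinese|c) = 6/14, P(Tokyo|c) = P(Japan|c) = 1/14
# P(not-c) = 1/4, P(Chinese|not-c) = P(Tokyo|not-c) = P(Japan|not-c) = 2/9
p_c     = 3/4 * (6/14) ** 3 * (1/14) * (1/14)
p_not_c = 1/4 * (2/9) ** 3 * (2/9) * (2/9)

print(round(p_c, 4))      # 0.0003
print(round(p_not_c, 4))  # 0.0001
# p_c > p_not_c, so the test doc is assigned to class c
```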
} This is optimal: the time is linear in the size of the data, which any method must at least scan.
} Naive Bayes is terrible for correct probability estimation…
} but it often performs well at choosing the correct class.
} Optimal if independence assumptions hold (never true for text, yet the classifier can still decide well)
} More robust to non-relevant features than some more complex learning methods
} More robust to concept drift (changing definition of a class over time) than some more complex learning methods