AUTOMATIC CLASSIFICATION:
NAÏVE BAYES
WM&R 2019/20 – 2 UNITS
- R. Basili
(many slides borrowed from H. Schütze)
Università di Roma “Tor Vergata” Email: basili@info.uniroma2.it
SUMMARY
The nature of probabilistic modeling
A sample document for the task (a spam email):

From: "" <takworlld@hotmail.com>
Subject: real estate is the only way... gem oalvgkay
Anyone can buy real estate with no money down
Stop paying rent TODAY!
There is no need to spend hundreds or even thousands for similar courses
I am 22 years old and I have already purchased 6 properties using the methods outlined in this truly INCREDIBLE ebook.
Change your life NOW!
=================================================
Click Below to order:
http://www.wholesaledaily.com/sales/nmd.htm
=================================================
The classification task. Given:
- a description x ∈ X, where X is the language or instance space;
- a fixed set of classes C = {c1, c2, …, cn};
the goal is to learn a classification function whose domain is X and that assigns to each x the class(es) of C suitable for x.
An example: classifying scientific papers.
Classes: MULTIMEDIA, GUI, GARB.COLL., SEMANTICS, ML, PLANNING (grouped under the top-level classes (AI), (Programming), (HCI))
Training data (bag-of-words per class):
- PLANNING: planning, temporal, reasoning, plan, language, ...
- SEMANTICS: programming, semantics, language, proof, ...
- ML: learning, intelligence, algorithm, reinforcement, network, ...
- GARB.COLL.: garbage, collection, memory, region, ...
Test data: "Artificial Intelligence in the Path Planning Optimization of Mobile Agent Navigation"
(Note: in real life there is often a hierarchy; and you may get papers on ML approaches to Garb. Coll., i.e. c is a multiclassification function)
Binary (two-class) tasks: “spam” vs. “not-spam”, “contains adult language” vs. “doesn’t”, “is a fake” vs. “it isn’t”
Manual classification: performed by human editors (e.g., of a newspaper, or a social network)
Hand-coded rule-based classifiers: rules expressed as standing queries (everything in IR query languages + accumulators); accuracy can be high if the rules are carefully refined over time by a subject expert
Classification by supervised machine learning: the approach adopted by many systems (Autonomy, MSN, Yahoo!, Cortana),
Bayesian methods:
- learning and classification methods based on probability theory;
- Bayes' theorem plays a critical role in probabilistic learning and classification;
- they build a generative model that approximates how data is produced;
- they use the prior probability of each category, before any information about an item is available;
- categorization produces a posterior probability distribution over the possible categories given a description of an item.
Bayes' rule: the conjunction of two events A and B can be used as a joint event:
P(A, B) = P(A|B) P(B) = P(B|A) P(A), which yields P(A|B) = P(B|A) P(A) / P(B).
The Maximum A Posteriori (MAP) hypothesis:

h_MAP = argmax_{h ∈ H} P(h|D) = argmax_{h ∈ H} P(D|h) P(h) / P(D) = argmax_{h ∈ H} P(D|h) P(h)

since P(D) is constant with respect to h. If every hypothesis in H is a priori equally probable, only P(D|h) needs to be maximized, giving the Maximum Likelihood (ML) hypothesis:

h_ML = argmax_{h ∈ H} P(D|h)
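A toy numeric illustration of the two criteria (all numbers invented for this sketch, not from the slides): when priors are uneven, the MAP and ML hypotheses can disagree.

```python
# Toy MAP vs. ML hypothesis selection (hypothetical numbers).
# For each hypothesis h: prior P(h) and likelihood P(D|h) of some observed data D.
hypotheses = {
    "h1": {"prior": 0.9, "likelihood": 0.2},
    "h2": {"prior": 0.1, "likelihood": 0.8},
}

# h_ML = argmax_h P(D|h): the prior is ignored.
h_ml = max(hypotheses, key=lambda h: hypotheses[h]["likelihood"])

# h_MAP = argmax_h P(D|h) * P(h): P(D) is constant over h, so it is dropped.
h_map = max(hypotheses,
            key=lambda h: hypotheses[h]["likelihood"] * hypotheses[h]["prior"])

print(h_ml)   # h2: the highest likelihood
print(h_map)  # h1: 0.2 * 0.9 = 0.18 beats 0.8 * 0.1 = 0.08
```

The example shows exactly why the prior matters: the data favor h2, but the strong prior on h1 overturns that preference.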
Task: classify a new instance document D, described by a tuple of attribute values D = (x1, x2, …, xn), into one of the classes cj ∈ C
The MAP class for a document D = (x1, x2, …, xn):

c_MAP = argmax_{cj ∈ C} P(cj | x1, x2, …, xn)
      = argmax_{cj ∈ C} P(x1, x2, …, xn | cj) P(cj) / P(x1, x2, …, xn)
      = argmax_{cj ∈ C} P(x1, x2, …, xn | cj) P(cj)
Given a document D = (x1, x2, …, xn) = (x^D_1, x^D_2, …, x^D_n), we need to estimate:
- P(x^D_i) for the different properties/features, i = 1, …, n
- P(x^D_1, x^D_2, …, x^D_n | Cj) for the different tuples and classes
in order to compute P(Cj | x^D_1, x^D_2, …, x^D_n) for j = 1, …, k.
A first idea: the features of D = (x1, x2, …, xn) = (x^D_1, x^D_2, …, x^D_n) correspond to words, as derived from the content extracted from the text. Multiple occurrences of the same word wordi augment the probability P(xi) = P(X = wordi).
Alternatively, each feature Xi corresponds to the outcome of a test on a word from the vocabulary V, whose outcomes are binary: for each wordi in the dictionary, that is, one urn per word is accessed, and the observation can be written as Xi = 0 or Xi = 1 (absence vs. presence of wordi).
This is the basis for the so-called multivariate binomial (Bernoulli) model!
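In this view a document reduces to a 0/1 vector over the vocabulary. A minimal sketch (the vocabulary and document below are invented for illustration):

```python
import re

# Toy vocabulary: one binary feature Xi per word.
vocabulary = ["flu", "fever", "cough", "goal", "match"]

def to_binary_vector(text, vocab):
    """Xi = 1 if word_i occurs in the document, 0 otherwise.
    Repeated occurrences do not change the value: only presence counts."""
    tokens = set(re.findall(r"[a-z]+", text.lower()))
    return [1 if w in tokens else 0 for w in vocab]

doc = "Fever and cough: flu season. Fever again."
print(to_binary_vector(doc, vocabulary))  # [1, 1, 1, 0, 0]
```

Note how the second occurrence of "Fever" is ignored: this is precisely what distinguishes the Bernoulli representation from count-based ones.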
Estimating P(x1, x2, …, xn | Cj) directly is problematic: there are too many possible tuples, so reliable estimates would require far more training examples than any realistic number of training examples available. Hence the Naïve Bayes conditional independence assumption: the features Xi are independent of each other given the class.
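Under the assumption, the class-conditional probability of a full tuple factorizes into per-feature terms. A numeric sketch (all probabilities invented):

```python
# Naive Bayes conditional independence (toy, invented probabilities):
# P(X1,...,Xn | C) is approximated by the product of the P(Xi | C).
p_xi_given_flu = {"fever": 0.9, "cough": 0.8, "muscle_ache": 0.7}

def joint_given_class(observed, cond_probs):
    """P(x1,...,xn | C) under independence: a product over features.
    `observed` maps feature -> 1 (present) or 0 (absent)."""
    p = 1.0
    for feat, x in observed.items():
        p *= cond_probs[feat] if x == 1 else (1.0 - cond_probs[feat])
    return p

obs = {"fever": 1, "cough": 1, "muscle_ache": 0}
print(joint_given_class(obs, p_xi_given_flu))  # 0.9 * 0.8 * 0.3 = 0.216
```

Three parameters (one per feature) replace the 2^3 entries of the full conditional table; with n features the saving is exponential.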
Under this assumption only O(|X|•|C|) parameters must be estimated: the features simply detect term presence and are independent of each other given the class.
[Figure: a Naïve Bayes network for the Flu example — the class node Flu with feature nodes X1, …, X5, labeled fever, sinus, cough, runny-nose, muscle-ache.]
P(Cj, X1, …, Xn) = P(Cj) · Π_i P(Xi | Cj)

[Figure: the general Naïve Bayes network — the class node C with feature nodes X1, …, X6.]
Once P(Cj | x^D_1, x^D_2, …, x^D_n) is computed for j = 1, …, k, strict multiclassification is not applicable: seemingly, the model can output the n most likely classes by ranking the posterior probabilities.
The relative-frequency (maximum likelihood) estimate is:

P^(X5 = t | C = nf) = N(X5 = t, C = nf) / N(C = nf)

What if we have seen no training case where a patient with C = nf (no flu) had muscle aches (X5 = t)? Then the estimate is zero, and a zero probability can never be conditioned away, no matter the other evidence!
Laplace (add-one) smoothing avoids zero estimates:

P^(Xi = xi | Cj) = (N(Xi = xi, Cj) + 1) / (N(Cj) + k)

where k is the number of values of Xi (k = 2 for binary features).

[Figure: the Flu network again — the class node Flu with features X1, …, X5 (fever, sinus, cough, runny-nose, muscle-ache).]
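A minimal numeric sketch of why the add-one correction matters (the counts are invented):

```python
# Relative-frequency vs. Laplace-smoothed estimates (invented counts).
def mle(count_xt_c, count_c):
    """P^(Xi=t | C) = N(Xi=t, C) / N(C): zero if the pair was never observed."""
    return count_xt_c / count_c

def laplace(count_xt_c, count_c, k=2):
    """Add-one smoothing; k = number of values of Xi (2 for binary features)."""
    return (count_xt_c + 1) / (count_c + k)

# Suppose no "no flu" patient among 50 training cases had muscle aches:
print(mle(0, 50))      # 0.0 -> wipes out all other evidence in the product
print(laplace(0, 50))  # 1/52, small but non-zero
```

The smoothed estimate stays close to the raw frequency when counts are large, but never reaches zero, so no single feature can veto a class.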
Multiplying many probabilities, which by definition lie between 0 and 1, can result in floating-point underflow. Since log(xy) = log(x) + log(y), it is better to perform all computations by summing logs of probabilities rather than multiplying probabilities. The class with the highest final (log) score is still the most probable.
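The underflow problem, and the log-space fix, can be observed directly (a sketch with made-up probabilities):

```python
import math

# Multiplying many small probabilities underflows to 0.0 in floating point...
probs = [1e-5] * 80
product = 1.0
for p in probs:
    product *= p
print(product)  # 0.0 -- the true value, 1e-400, is below double-precision range

# ...but summing logs stays perfectly representable,
# and the argmax over such scores is unchanged.
log_score = sum(math.log(p) for p in probs)
print(log_score)  # 80 * ln(1e-5), about -921.03
```

With 80 features at probability 1e-5 each, the plain product is already indistinguishable from zero for every class, while the log scores remain comparable.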
c_NB = argmax_{cj ∈ C} [ log P(cj) + Σ_{i ∈ positions} log P(xi | cj) ]
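Putting the pieces together, a compact multivariate Bernoulli Naïve Bayes sketch — training data and vocabulary are invented for illustration, and the helper names are not from the slides:

```python
import math

def train_bernoulli_nb(docs, labels, vocab):
    """Estimate log P(c) and Laplace-smoothed P(Xi=1 | c) from binary data."""
    classes = set(labels)
    n_c = {c: labels.count(c) for c in classes}
    log_prior = {c: math.log(n_c[c] / len(labels)) for c in classes}
    cond = {}  # cond[c][w] = P(word w present | class c), add-one smoothed
    for c in classes:
        class_docs = [set(d) for d, l in zip(docs, labels) if l == c]
        cond[c] = {w: (sum(w in d for d in class_docs) + 1) / (n_c[c] + 2)
                   for w in vocab}
    return log_prior, cond

def classify(doc, log_prior, cond, vocab):
    """c_NB = argmax_c [ log P(c) + sum_i log P(Xi = xi | c) ] in log space."""
    present = set(doc)
    scores = {}
    for c in log_prior:
        s = log_prior[c]
        for w in vocab:
            p = cond[c][w]
            s += math.log(p) if w in present else math.log(1.0 - p)
        scores[c] = s
    return max(scores, key=scores.get)

vocab = ["buy", "cheap", "meeting", "project"]
docs = [["buy", "cheap"], ["cheap", "buy", "buy"],
        ["meeting", "project"], ["project"]]
labels = ["spam", "spam", "ham", "ham"]
prior, cond = train_bernoulli_nb(docs, labels, vocab)
print(classify(["cheap", "buy"], prior, cond, vocab))  # spam, on this toy data
```

Both the absence and the presence of each vocabulary word contribute to the score, which is characteristic of the Bernoulli model (a multinomial model would only score the words that occur).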
Summary:
- Text classification involves several engineering/modeling decisions.
- The multivariate binomial (or Bernoulli) model described here is a simple Bayesian model.
- Its goal is reproducing the classification function c : X → C (or, as in the multiclassification case, c : X → 2^C).
- Its variables correspond to individual words of the vocabulary, and their values range in the {0,1} set.
- Robust estimation is obtained by adopting sums of logarithms instead of products of probabilities and by applying smoothing.