Natural Language Processing
Classification I
Dan Klein – UC Berkeley
Classification: automatically make a decision about inputs.
Example: document → category.
Example: image of digit → digit.
Some definitions:
INPUTS: close the ____
CANDIDATE SET: {door, table, …}
CANDIDATES: door, table
TRUE OUTPUTS: door
FEATURE VECTORS: indicator features of the input/candidate pair, e.g. "close" in x ∧ y="door"; x-1="the" ∧ y="door"; x-1="the" ∧ y="table"; y occurs in x.
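As a concrete illustration, here is a minimal sketch of such joint feature extraction in Python; the function name and the exact feature strings are hypothetical, not from the slides:

```python
def extract_features(x, y):
    """Indicator features of the (input, candidate) pair.

    x: context tokens with a gap, e.g. ["close", "the", "____"]
    y: a candidate fill, e.g. "door"
    """
    feats = {}
    if "close" in x:                      # "close" in x AND y = <candidate>
        feats['"close" in x & y=' + y] = 1.0
    gap = x.index("____")                 # conjunction with the previous word
    if gap > 0:
        feats["x[-1]=" + x[gap - 1] + " & y=" + y] = 1.0
    if y in x:                            # candidate occurs in the input
        feats["y occurs in x"] = 1.0
    return feats

# Features for the candidate "door" in "close the ____".
print(extract_features(["close", "the", "____"], "door"))
```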
Feature vectors: sometimes we think of the input as having its own features (example input: xi = "Apple Computers"), which are multiplied by outputs to form the candidates' feature vectors. For the input "… win the election …", the input features "win" and "election" are replicated into one block per candidate output class.
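A sketch of that block construction (the class names are hypothetical): the same input features are keyed into the block belonging to each candidate, so a single weight vector can score every class.

```python
def block_features(input_feats, y):
    """Copy the input features into the block that belongs to class y.

    The full feature vector has one block per output class; for a given
    candidate y, only y's block is nonzero.
    """
    return {(y, name): value for name, value in input_feats.items()}

# Input features for "... win the election ...".
input_feats = {"win": 1.0, "election": 1.0}
for y in ["POLITICS", "SPORTS"]:          # hypothetical class names
    print(y, block_features(input_feats, y))
```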
Non-block feature vectors: sometimes the features of candidates cannot be decomposed in this regular way. Example: a parse tree's features may be the productions present in the tree (S → NP VP, NP → N N, VP → V NP, VP → V N, and so on), so different candidate parses of the same sentence share some features and differ on others.
Linear models, scoring: each candidate y for an input x (e.g. each candidate labeling of "… win the election …") gets a linear score w⊤f(x, y).
Linear models, decision rule: predict the highest-scoring candidate, y* = argmax over y of w⊤f(x, y).
Binary decision rule example: weights BIAS = -3, free = 4, money = 2; for an email containing "free money", with feature counts (1, 1, 2), the score is (-3)(1) + (4)(1) + (2)(2) = 5 > 0, so the classifier outputs +1 = SPAM.
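In code, the binary decision rule is just a sign test on a sparse dot product; the weights and counts below are the slide's spam example:

```python
def score(w, feats):
    """Sparse dot product w . f(x)."""
    return sum(w.get(k, 0.0) * v for k, v in feats.items())

w = {"BIAS": -3.0, "free": 4.0, "money": 2.0}    # weights from the example
feats = {"BIAS": 1.0, "free": 1.0, "money": 2.0}
label = +1 if score(w, feats) > 0 else -1        # score = 5.0 > 0 -> +1 = SPAM
print(label)
```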
Multiclass decision rule: with more than two classes, the highest-scoring candidate wins, and the decision boundaries become more complex.
Linear models, Naive Bayes: Naive Bayes is a linear model whose weights are (log) local conditional probabilities. Learning is trivial (relative-frequency counting) and is backed by an understanding of modeling. Problem: the weights that maximize the data-likelihood criterion, i.e. that best describe the data, aren't the ones that are best for classification.
How to pick the weights? The ideal would be the weights with the greatest test-set accuracy / F1 / whatever the evaluation metric is, but we don't have the test set. Should we maximize training-set accuracy? That is a hard, discontinuous optimization problem. Though, min-error training for MT does exactly this.
Loss functions: a loss function declares how costly each mistake is, and mistakes can be weighted unequally (e.g. false positives costing more than false negatives, or Hamming distance over structured labels).
Linear models, the perceptron: process the training instances (e.g. xi = "Apple Computers") one at a time, reacting only to mistakes. Start with zero weights; try to classify each instance; if correct, make no change; if wrong, add the true candidate's feature vector to the weights and subtract the predicted candidate's.
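A minimal sketch of that update loop in Python, assuming a feature function like the extract_features sketch above; the function and parameter names are mine, not the slides':

```python
def score(w, feats):
    return sum(w.get(k, 0.0) * v for k, v in feats.items())

def perceptron(train, classes, feats_fn, epochs=5):
    """Online multiclass perceptron: only mistakes change the weights."""
    w = {}
    for _ in range(epochs):
        for x, y_true in train:
            # Classify with the current weights: highest score wins.
            y_hat = max(classes, key=lambda y: score(w, feats_fn(x, y)))
            if y_hat != y_true:
                # Mistake: pull weights toward the true candidate's features
                # and away from the predicted candidate's features.
                for k, v in feats_fn(x, y_true).items():
                    w[k] = w.get(k, 0.0) + v
                for k, v in feats_fn(x, y_hat).items():
                    w[k] = w.get(k, 0.0) - v
    return w
```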
Properties of perceptrons:
Separability: the training data is separable if some parameters classify it perfectly.
Convergence: if the training data is separable, the perceptron will separate it (binary case).
Mistake bound: the maximum number of mistakes (binary case) is related to the margin, or degree of separability.
[Figure: a separable point set beside a non-separable one.]
Problems with the perceptron:
Overtraining: test / held-out accuracy usually rises, then falls. Overtraining isn't the typically discussed source of overfitting, but it can be important.
Noise: if the data isn't separable, the weights often thrash around; averaging weight vectors over time can help (averaged perceptron).
Mediocre generalization: the perceptron finds a "barely" separating solution.
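The averaging fix is a small change to the sketch above: keep a running sum of the weight vectors after every example and return the average. This naive version accumulates in O(|w|) per example; real implementations use lazy-update tricks.

```python
def averaged_perceptron(train, classes, feats_fn, epochs=5):
    """Perceptron with weight averaging to damp thrashing on noisy data."""
    def score(w, feats):
        return sum(w.get(k, 0.0) * v for k, v in feats.items())

    w, w_sum, steps = {}, {}, 0
    for _ in range(epochs):
        for x, y_true in train:
            y_hat = max(classes, key=lambda y: score(w, feats_fn(x, y)))
            if y_hat != y_true:
                for k, v in feats_fn(x, y_true).items():
                    w[k] = w.get(k, 0.0) + v
                for k, v in feats_fn(x, y_hat).items():
                    w[k] = w.get(k, 0.0) - v
            # Accumulate the current weights after every example.
            steps += 1
            for k, v in w.items():
                w_sum[k] = w_sum.get(k, 0.0) + v
    return {k: v / steps for k, v in w_sum.items()}
```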
Which separator to choose? The weight vectors that correctly classify the training data form a feasible space; we need a criterion for picking one point in it.
Maximum margin: we want the true candidate to beat each alternative by the loss of that mistake, i.e. by a margin mi(y) (with zero-one loss, the required margin is simply 1).
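Written out, the margin constraints require the true candidate yi* to outscore every alternative y by the loss of that alternative:

```latex
\forall i,\ \forall y \neq y_i^{*}:\qquad
w^{\top} f(x_i, y_i^{*}) \;\ge\; w^{\top} f(x_i, y) + m_i(y),
\qquad m_i(y) = 1 \ \text{under zero--one loss}
```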
Support vectors: the few training examples that lie closest to the separator and determine it; the remaining examples could be removed without changing the solution. [Figure: maximum-margin separator with its support vectors; the slide warns to remember how this diagram is broken!]
Why not maximize the raw margin directly? Remember the margin condition above: we could take any separating w and scale it up to make our margin look arbitrarily large. So we instead fix the functional margin to 1 and minimize ||w||.
Almost-separable data: allow slack ξi on noisy examples, resulting in a soft-margin classifier.
The constant C is a knob: larger C penalizes slack more heavily relative to the margin term. Note: other choices of how to penalize slacks exist!
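Putting margin, slack, and the C knob together, one standard primal form of the soft-margin objective (with a linear slack penalty, one common choice) is:

```latex
\min_{w,\ \xi \ge 0}\;\; \frac{1}{2}\,\lVert w \rVert^{2} \;+\; C \sum_i \xi_i
\qquad \text{s.t.}\qquad
w^{\top} f(x_i, y_i^{*}) \;\ge\; w^{\top} f(x_i, y) + m_i(y) - \xi_i
\quad \forall i,\ \forall y \neq y_i^{*}
```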
Linear models, maximum entropy (logistic regression): convert the linear scores into probabilities by exponentiating, to make them positive, and normalizing, so they sum to one: P(y | x; w) = exp(w⊤f(x, y)) / Σy′ exp(w⊤f(x, y′)).
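A sketch of the exponentiate-and-normalize step (the max-subtraction is just for numerical stability; names are mine):

```python
import math

def score(w, feats):
    return sum(w.get(k, 0.0) * v for k, v in feats.items())

def maxent_probs(w, x, classes, feats_fn):
    """P(y | x; w): exponentiate the scores, then normalize."""
    scores = {y: score(w, feats_fn(x, y)) for y in classes}
    m = max(scores.values())              # subtract the max for stability
    exps = {y: math.exp(s - m) for y, s in scores.items()}
    z = sum(exps.values())
    return {y: e / z for y, e in exps.items()}
```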
One motivation for maximum entropy: the model does a good job of being uncertain on noisy cases… though in practice the posteriors are often quite peaked.
Objective comparison: with the hinge loss you stop gaining objective once the true label wins by enough; adding the regularizer gives the SVM objective, which can be optimized by stochastic subgradient descent (e.g. Pegasos: Shalev-Shwartz et al. '07). (The plot is really only right in the binary case.)
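Since Pegasos is cited here, a hedged sketch of its core step: a stochastic subgradient update on the regularized hinge objective, with the projection step and constants simplified from Shalev-Shwartz et al. 2007.

```python
def pegasos_step(w, x, y_true, classes, feats_fn, lam, t):
    """One stochastic subgradient step on the hinge (SVM) objective."""
    def score(feats):
        return sum(w.get(k, 0.0) * v for k, v in feats.items())

    eta = 1.0 / (lam * t)                            # step size 1/(lambda * t)
    # Most dangerous wrong candidate under the current weights.
    y_bad = max((y for y in classes if y != y_true),
                key=lambda y: score(feats_fn(x, y)))
    margin = score(feats_fn(x, y_true)) - score(feats_fn(x, y_bad))
    # Subgradient of the L2 regularizer: shrink all weights.
    for k in list(w):
        w[k] *= 1.0 - eta * lam
    # Hinge term: step only if the margin of 1 is violated.
    if margin < 1.0:
        for k, v in feats_fn(x, y_true).items():
            w[k] = w.get(k, 0.0) + eta * v
        for k, v in feats_fn(x, y_bad).items():
            w[k] = w.get(k, 0.0) - eta * v
    return w
```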
Comparing the losses: you can make the hinge loss exactly zero (on separable data)… but not the log loss, which stays positive no matter how decisively the true label wins.
Example, sensors: two rain sensors M1 and M2 each report + or −, and it is either raining (r) or sunny (s).
Reality: P(+,+,r) = 3/8, P(−,−,r) = 1/8, P(+,+,s) = 1/8, P(−,−,s) = 3/8 (the two sensors always agree).
NB model: Raining? → M1, M2, with the sensors conditionally independent given the weather.
NB factors: P(r) = 1/2, P(+|r) = 3/4, P(+|s) = 1/4.
NB predictions: P(r,+,+) = (1/2)(3/4)(3/4) = 9/32 and P(s,+,+) = (1/2)(1/4)(1/4) = 1/32, so P(r|+,+) = 9/10 and P(s|+,+) = 1/10, while in reality P(r|+,+) = 3/4: the model is overconfident because it double-counts the correlated sensors.
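The overconfidence is easy to verify numerically (a quick check of the slide's arithmetic):

```python
# NB factors read off the reality table: P(r)=1/2, P(+|r)=3/4, P(+|s)=1/4.
p_r, p_plus_r, p_plus_s = 0.5, 0.75, 0.25

nb_r = p_r * p_plus_r ** 2          # P(r,+,+) = (1/2)(3/4)(3/4) = 9/32
nb_s = (1 - p_r) * p_plus_s ** 2    # P(s,+,+) = (1/2)(1/4)(1/4) = 1/32
print(nb_r / (nb_r + nb_s))         # 0.9  -- NB posterior, overconfident

# Reality: P(r|+,+) = P(+,+,r) / (P(+,+,r) + P(+,+,s)) = (3/8) / (4/8).
print((3 / 8) / (3 / 8 + 1 / 8))    # 0.75 -- true posterior
```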
Example, stoplights: a stoplight has north-south (NS) and east-west (EW) lights, and is either working (w) or broken (b).
Reality: lights working: P(g,r,w) = 3/7 and P(r,g,w) = 3/7; lights broken: P(r,r,b) = 1/7.
NB model: Working? → NS, EW.
NB factors: P(w) = 6/7, P(r|w) = 1/2, P(g|w) = 1/2; P(b) = 1/7, P(r|b) = 1, P(g|b) = 0.
What does the model say when both lights are red? P(b,r,r) = (1/7)(1)(1) = 1/7 = 4/28, while P(w,r,r) = (6/7)(1/2)(1/2) = 6/28, so P(w|r,r) = 6/10 and we infer the lights are working even though both are red! Now imagine boosting P(b) to 1/2: P(b,r,r) = (1/2)(1)(1) = 1/2 = 4/8 and P(w,r,r) = (1/2)(1/2)(1/2) = 1/8, so P(w|r,r) = 1/5 and the model gets the broken case right, but only by sacrificing data likelihood: changing the parameters bought conditional accuracy at the expense of describing the data.