NLP Programming Tutorial 6 – Advanced Discriminative Learning
Graham Neubig Nara Institute of Science and Technology (NAIST)
Given an input sentence, predict a yes/no label:

Given:
    Gonso was a Sanron sect priest (754-827) in the late Nara and early Heian periods.
Predict: Yes!

Given:
    Shichikuzan Chigogataki Fudomyoo is a historical site located at Magura, Maizuru City, Kyoto Prefecture.
Predict: No!
The perceptron predicts by taking the sign of the score:

    y = sign(w ⋅ ϕ(x)) = sign(Σ_i w_i ϕ_i(x))
Perceptron training:

    create map w
    for I iterations
        for each labeled pair x, y in the data
            phi = create_features(x)
            y' = predict_one(w, phi)
            if y' != y
                update_weights(w, phi, y)
– Features for positive examples get a higher weight
– Features for negative examples get a lower weight
→ Every time we update, our predictions get better!

    update_weights(w, phi, y)
        for name, value in phi:
            w[name] += value * y
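A minimal runnable sketch of the prediction and update above in Python; the sparse feature map and the feature names are illustrative choices, not fixed by the pseudocode:

```python
from collections import defaultdict

def predict_one(w, phi):
    # sign of the score w . phi(x): +1 or -1
    score = sum(w[name] * value for name, value in phi.items())
    return 1 if score >= 0 else -1

def update_weights(w, phi, y):
    # y = +1 raises the weights of the active features, y = -1 lowers them
    for name, value in phi.items():
        w[name] += value * y

w = defaultdict(float)
phi = {'unigram "Maizuru"': 1, 'unigram "Kyoto"': 1}
update_weights(w, phi, -1)        # a negative example
print(w['unigram "Kyoto"'])       # -1.0
```

After one update on a negative example, the classifier already labels that example correctly.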
[Figure: the logistic function, mapping the score w⋅ϕ(x) to a probability p(y|x)]

In other words:

    P(y=1∣x) = e^{w⋅ϕ(x)} / (1 + e^{w⋅ϕ(x)})
[Figure: the step function used in the perceptron vs. the logistic function, both plotted over the score w⋅ϕ(x)]

    Perceptron:        p(y=1∣x) jumps from 0 to 1 at w⋅ϕ(x) = 0
    Logistic function: p(y=1∣x) = e^{w⋅ϕ(x)} / (1 + e^{w⋅ϕ(x)})
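Where the perceptron applies a hard step to the score, the logistic function changes smoothly; a small sketch:

```python
import math

def logistic(score):
    # p(y=1|x) = e^{w.phi(x)} / (1 + e^{w.phi(x)}) = 1 / (1 + e^{-score})
    return 1.0 / (1.0 + math.exp(-score))

print(logistic(0.0))                              # 0.5: the boundary case w.phi(x) = 0
print(logistic(5.0) > 0.99, logistic(-5.0) < 0.01)  # large |score| -> near-certain labels
```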
Logistic regression chooses the w that maximizes the likelihood of all answers yᵢ given the examples xᵢ:

    w = argmax_w Π_i P(y_i ∣ x_i)
Stochastic gradient descent for online learning (including logistic regression):

    create map w
    for I iterations
        for each labeled pair x, y in the data
            w += α * dP(y|x)/dw

(the direction that will increase the probability of y)
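The loop above, fleshed out in Python; the gradient expression for the logistic model is the one derived below, and the learning rate α = 0.1 and toy data are illustrative assumptions:

```python
import math

def sgd_update(w, phi, y, alpha):
    # dP(y|x)/dw = y * phi(x) * e^{w.phi} / (1 + e^{w.phi})^2  (derived below)
    score = sum(w.get(name, 0.0) * value for name, value in phi.items())
    grad = y * math.exp(score) / (1.0 + math.exp(score)) ** 2
    for name, value in phi.items():
        w[name] = w.get(name, 0.0) + alpha * grad * value

def train_sgd(data, iterations, alpha=0.1):
    # data: list of (phi, y) pairs with y in {+1, -1}
    w = {}
    for _ in range(iterations):
        for phi, y in data:
            sgd_update(w, phi, y, alpha)
    return w

w = train_sgd([({"good": 1}, 1), ({"bad": 1}, -1)], iterations=10)
print(w["good"] > 0, w["bad"] < 0)   # True True
```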
[Figure: the gradient dp(y|x)/d(w⋅ϕ(x)), largest when w⋅ϕ(x) is near 0]

    d/dw P(y=1∣x)  = d/dw [ e^{w⋅ϕ(x)} / (1 + e^{w⋅ϕ(x)}) ]
                   = ϕ(x) e^{w⋅ϕ(x)} / (1 + e^{w⋅ϕ(x)})²

    d/dw P(y=−1∣x) = d/dw [ 1 − e^{w⋅ϕ(x)} / (1 + e^{w⋅ϕ(x)}) ]
                   = −ϕ(x) e^{w⋅ϕ(x)} / (1 + e^{w⋅ϕ(x)})²
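As a quick sanity check on the scalar part of the derivative (ϕ(x) factors out by the chain rule), we can compare it against a central finite difference; the test point s = 0.7 is arbitrary:

```python
import math

def p_y1(s):
    # P(y=1|x) as a function of the score s = w . phi(x)
    return math.exp(s) / (1.0 + math.exp(s))

s = 0.7
analytic = math.exp(s) / (1.0 + math.exp(s)) ** 2     # e^s / (1+e^s)^2
numeric = (p_y1(s + 1e-6) - p_y1(s - 1e-6)) / 2e-6    # finite-difference estimate
print(abs(analytic - numeric) < 1e-6)   # True
```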
Example (all weights initially 0, so w⋅ϕ(x) = 0):

    x = A site , located in Maizuru , Kyoto        y = -1

    d/dw P(y=−1∣x) = −[ e⁰ / (1 + e⁰)² ] ϕ(x) = −0.25 ϕ(x)

Updated weights (the "," unigram fires twice, so its update is doubled):

    w_unigram “A” = -0.25         w_unigram “site” = -0.25
    w_unigram “,” = -0.5          w_unigram “located” = -0.25
    w_unigram “in” = -0.25        w_unigram “Maizuru” = -0.25
    w_unigram “Kyoto” = -0.25
Next example (with the weights above, w⋅ϕ(x) = w_“,” + w_“in” + w_“Kyoto” = −0.5 − 0.25 − 0.25 = −1):

    x = Shoken , monk born in Kyoto        y = 1

    d/dw P(y=1∣x) = [ e⁻¹ / (1 + e⁻¹)² ] ϕ(x) = 0.196 ϕ(x)

Updated weights:

    w_unigram “A” = -0.25         w_unigram “site” = -0.25
    w_unigram “,” = -0.304        w_unigram “located” = -0.25
    w_unigram “in” = -0.054       w_unigram “Maizuru” = -0.25
    w_unigram “Kyoto” = -0.054
    w_unigram “Shoken” = 0.196    w_unigram “monk” = 0.196
    w_unigram “born” = 0.196
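The two update factors in these examples can be checked directly (w⋅ϕ(x) = 0 for the first example and −1 for the second):

```python
import math

def grad_factor(score):
    # e^{w.phi} / (1 + e^{w.phi})^2: the scalar part of the gradient
    return math.exp(score) / (1.0 + math.exp(score)) ** 2

print(grad_factor(0.0))    # 0.25, so the y = -1 update is -0.25 * phi(x)
print(grad_factor(-1.0))   # ~0.1966, the 0.196 factor in the y = +1 update
```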
[Figure: training examples (O and X) separated by a decision boundary]
[Figure: the same examples; more than one boundary classifies them all correctly]
The margin is the distance from the boundary to the nearest example:

[Figure: the boundary drawn with its margin to the nearest O and X examples]
Maximizing the margin is the idea behind support vector machines, which also have good convergence properties.
SVM tutorial: http://disi.unitn.it/moschitti/material/Interspeech2010-Tutorial.Moschitti.pdf
Software: LIBSVM, LIBLINEAR, SVMlight
Perceptron training with updates under a margin:

    create map w
    for I iterations
        for each labeled pair x, y in the data
            phi = create_features(x)
            val = w * phi * y
            if val <= margin
                update_weights(w, phi, y)

(A correct classifier will always make w * phi * y > 0.)
If margin = 0, this is the perceptron algorithm.
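A runnable sketch of margin-based training; the data format, feature names, and the margin value below are illustrative assumptions:

```python
def update_weights(w, phi, y):
    for name, value in phi.items():
        w[name] = w.get(name, 0.0) + value * y

def train_margin(data, iterations, margin):
    # data: list of (phi, y) pairs; margin = 0 recovers the plain perceptron
    w = {}
    for _ in range(iterations):
        for phi, y in data:
            val = sum(w.get(name, 0.0) * v for name, v in phi.items()) * y
            if val <= margin:   # update even on correct but low-confidence answers
                update_weights(w, phi, y)
    return w

data = [({"good": 1}, 1), ({"bad": 1}, -1)]
w = train_margin(data, iterations=3, margin=1.0)
print(w)   # each feature is pushed until its example clears the margin
```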
Which classifier is better for the positive (+1) sentence “he saw a robbery in the park”?

    Classifier 1: he +3, saw …, a +0.5, bird -1, robbery +1, in +5, the -3, park -2
    Classifier 2: bird -1, robbery +1

Probably classifier 2! It doesn't use irrelevant information.
– L2 regularization: a small penalty on small weights, so weights shrink but rarely reach zero
– L1 regularization: the same penalty everywhere, so many weights become zero → small model

[Figure: the L1 and L2 penalty curves as a function of the weight value]
Perceptron update with L1 regularization:

    update_weights(w, phi, y, c)
        for name, value in w:
            if abs(value) <= c:
                w[name] = 0                  # if the absolute value <= c, set the weight to zero
            else:
                w[name] -= sign(value) * c   # if value > 0 decrease by c; if value < 0 increase by c
        for name, value in phi:
            w[name] += value * y
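The same update in Python; the penalty is applied to every existing weight before the perceptron-style update (c and the example weights are illustrative):

```python
def sign(x):
    return 1 if x >= 0 else -1

def update_weights(w, phi, y, c):
    # first the L1 penalty on every existing weight ...
    for name in list(w):
        if abs(w[name]) <= c:
            w[name] = 0.0                  # clip small weights to exactly zero
        else:
            w[name] -= sign(w[name]) * c   # shrink large weights toward zero by c
    # ... then the usual perceptron update
    for name, value in phi.items():
        w[name] = w.get(name, 0.0) + value * y

w = {"rare": 0.05, "frequent": -0.5}
update_weights(w, {"new": 1}, 1, c=0.1)
print(w["rare"], w["new"])   # 0.0 1.0; "frequent" shrinks to about -0.4
```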
Worked example with regularization c = 0.1: the weight vector receives updates of {1, 0} on the 1st and 5th turns and {0, -1} on the 3rd turn.

    Turn   Reg. change    w after reg.   Update    w after update
    1      {0, 0}         {0, 0}         {1, 0}    {1, 0}
    2      {-0.1, 0}      {0.9, 0}       {0, 0}    {0.9, 0}
    3      {-0.1, 0}      {0.8, 0}       {0, -1}   {0.8, -1}
    4      {-0.1, 0.1}    {0.7, -0.9}    {0, 0}    {0.7, -0.9}
    5      {-0.1, 0.1}    {0.6, -0.8}    {1, 0}    {1.6, -0.8}
    6      {-0.1, 0.1}    {1.5, -0.7}    {0, 0}    {1.5, -0.7}
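The table above can be reproduced in a few lines; the dense two-element weight vector and turn numbering are just for this check:

```python
c = 0.1
updates = {1: (1.0, 0.0), 3: (0.0, -1.0), 5: (1.0, 0.0)}
w = [0.0, 0.0]
for turn in range(1, 7):
    # regularization step: clip small weights, shrink large ones by c
    for i in range(2):
        if abs(w[i]) <= c:
            w[i] = 0.0
        else:
            w[i] -= c if w[i] > 0 else -c
    # update step
    u = updates.get(turn, (0.0, 0.0))
    w = [w[0] + u[0], w[1] + u[1]]
print([round(x, 1) for x in w])   # [1.5, -0.7], matching the last row
```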
Problem: the regularization loop runs over every weight in w for every example — VERY SLOW!

    update_weights(w, phi, y, c)
        for name, value in w:              # this loop is VERY SLOW
            if abs(value) <= c:
                w[name] = 0
            else:
                w[name] -= sign(value) * c
        for name, value in phi:
            w[name] += value * y
Solution: lazy evaluation, a trick used in many applications — regularize a weight only when it is actually used:

    getw(w, name, c, iter, last)
        if iter != last[name]:
            # regularize several times at once
            c_size = c * (iter - last[name])
            if abs(w[name]) <= c_size:
                w[name] = 0
            else:
                w[name] -= sign(w[name]) * c_size
            last[name] = iter
        return w[name]
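A Python sketch of the lazy update; `last` records the iteration at which each weight was last regularized, so all missed penalties are applied in one batch when the weight is next read (the example weights are illustrative):

```python
def getw(w, name, c, it, last):
    # apply every regularization step skipped since this weight was last touched
    if it != last.get(name, 0):
        c_size = c * (it - last.get(name, 0))
        if abs(w.get(name, 0.0)) <= c_size:
            w[name] = 0.0
        else:
            w[name] -= (1 if w[name] > 0 else -1) * c_size
        last[name] = it
    return w.get(name, 0.0)

w, last = {"seen": 1.0, "tiny": 0.2}, {}
print(getw(w, "seen", 0.1, 3, last))   # ~0.7: three penalties of 0.1 at once
print(getw(w, "tiny", 0.1, 3, last))   # 0.0: clipped, since 0.2 <= 3 * 0.1
```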
Exercise: train with these improvements (e.g., a regularization constant of 0.001) and compare accuracy with the plain perceptron.