Sequential Data Modeling – Conditional Random Fields
Graham Neubig Nara Institute of Science and Technology (NAIST)
Binary Prediction (2 choices)
  Given: a book review ("Oh, man I love this book!" / "This book is so boring...")
  Predict: is it positive? (yes / no)

Multi-class Prediction (several choices)
  Given: a tweet ("On the way to the park!" / "公園に行くなう!")
  Predict: its language (English / Japanese)

Structured Prediction (millions of choices)
  Given: a sentence ("I read a book")
  Predict: its parts-of-speech

    I   read   a     book
    N   VBD    DET   NN
Given:   Gonso was a Sanron sect priest (754-827) in the late Nara and early Heian periods.
Predict: Yes!

Given:   Shichikuzan Chigogataki Fudomyoo is a historical site located at Magura, Maizuru City, Kyoto Prefecture.
Predict: No!
Each feature has a weight that is positive if it indicates “yes” and negative if it indicates “no”:

  Feature                        Weight
  contains “priest”                 2
  contains “(<#>-<#>)”              1
  contains “site”                  -3
  contains “Kyoto Prefecture”      -1

For a new example, sum the weights of the features that fire:

  Kuya (903-972) was a priest born in Kyoto Prefecture.
  2 + -1 + 1 = 2
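The weighted sum above can be sketched in a few lines of Python. The weights are the ones on the slide; the regular expressions used to detect each feature are an assumption made for illustration, not part of the original material.

```python
import re

# Weights from the slide.
WEIGHTS = {
    "priest": 2,
    "(<#>-<#>)": 1,
    "site": -3,
    "Kyoto Prefecture": -1,
}

# Assumed detectors for each "contains ..." feature.
PATTERNS = {
    "priest": r"\bpriest\b",
    "(<#>-<#>)": r"\(\d+-\d+\)",
    "site": r"\bsite\b",
    "Kyoto Prefecture": r"Kyoto Prefecture",
}

def score(sentence):
    """Sum the weights of all features that fire on the sentence."""
    return sum(WEIGHTS[name]
               for name, pattern in PATTERNS.items()
               if re.search(pattern, sentence))

kuya = "Kuya (903-972) was a priest born in Kyoto Prefecture."
print(score(kuya))  # 2 + 1 - 1 = 2
```

A positive score corresponds to predicting “yes” and a negative score to predicting “no”.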
[Figure: p(y|x) plotted against w*phi(x)]
In other words:
[Figure, two panels: "Perceptron" (the function used in the perceptron) and "Logistic Function"; each plots p(y|x) against w⋅ϕ(x)]
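A minimal sketch contrasting the two functions named above. It assumes the logistic form P(y=1|x) = e^(w⋅ϕ(x)) / (1 + e^(w⋅ϕ(x))) that appears in the derivative slides, and it assumes that a score of exactly 0 maps to +1 in the perceptron.

```python
import math

def step(score):
    """Hard decision of the perceptron (a tie at 0 goes to +1, an assumption)."""
    return 1 if score >= 0 else -1

def logistic(score):
    """P(y = 1 | x) = e^s / (1 + e^s), where s = w . phi(x)."""
    return math.exp(score) / (1.0 + math.exp(score))

for s in (-5, 0, 5):
    print(s, step(s), round(logistic(s), 3))
```

The step function jumps from -1 to +1 at zero, while the logistic function moves smoothly from 0 to 1, which is what makes it differentiable.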
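Find the parameters w that maximize the conditional likelihood of all answers y_i given the example x_i:

$$\hat{w} = \operatorname*{argmax}_{w} \prod_i P(y_i \mid x_i; w)$$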
create map w
for I iterations
    for each labeled pair x, y in the data
        phi = create_features(x)
        y' = predict_one(w, phi)
        if y' != y
            w += y * phi
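A runnable sketch of the perceptron loop above. The unigram feature set in `create_features` and the tie-breaking rule in `predict_one` are assumptions for illustration; the two training sentences are the examples used later in the slides.

```python
from collections import defaultdict

def create_features(x):
    """Unigram counts (an assumed feature set)."""
    phi = defaultdict(int)
    for word in x.split():
        phi[word] += 1
    return phi

def predict_one(w, phi):
    """Return +1 if the weighted feature sum is non-negative, else -1."""
    score = sum(w[name] * value for name, value in phi.items())
    return 1 if score >= 0 else -1

def train_perceptron(data, iterations):
    w = defaultdict(float)
    for _ in range(iterations):
        for x, y in data:
            phi = create_features(x)
            if predict_one(w, phi) != y:
                # On a mistake, move the weights toward the correct label.
                for name, value in phi.items():
                    w[name] += y * value
    return w

data = [("A site , located in Maizuru , Kyoto", -1),
        ("Shoken , monk born in Kyoto", 1)]
w = train_perceptron(data, iterations=5)
print([predict_one(w, create_features(x)) for x, _ in data])
```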
Online training for probabilistic models (including logistic regression):

create map w
for I iterations
    for each labeled pair x, y in the data
        w += α * dP(y|x)/dw

In other words, move w in the direction that will increase the probability of y.
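[Figure: dp(y|x)/d(w*phi(x)) plotted against w*phi(x)]

$$\frac{d}{dw} P(y=1 \mid x) = \frac{d}{dw} \frac{e^{w \cdot \phi(x)}}{1 + e^{w \cdot \phi(x)}} = \phi(x) \frac{e^{w \cdot \phi(x)}}{\left(1 + e^{w \cdot \phi(x)}\right)^2}$$

$$\frac{d}{dw} P(y=-1 \mid x) = \frac{d}{dw} \left(1 - \frac{e^{w \cdot \phi(x)}}{1 + e^{w \cdot \phi(x)}}\right) = -\phi(x) \frac{e^{w \cdot \phi(x)}}{\left(1 + e^{w \cdot \phi(x)}\right)^2}$$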
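The closed-form derivative can be sanity-checked numerically. The sketch below works with the scalar score s = w⋅ϕ(x), so the result must still be multiplied by ϕ(x) to obtain the derivative with respect to w; the finite-difference helper `numeric` is an illustrative addition, not part of the slides.

```python
import math

def p_positive(score):
    """P(y = 1 | x) as a function of the score s = w . phi(x)."""
    return math.exp(score) / (1.0 + math.exp(score))

def dp_dscore(score, y):
    """Closed-form derivative of P(y | x) with respect to the score."""
    d = math.exp(score) / (1.0 + math.exp(score)) ** 2
    return d if y == 1 else -d

def numeric(score, y, eps=1e-6):
    """Central finite difference, used only as a check."""
    if y == 1:
        p = p_positive
    else:
        p = lambda s: 1.0 - p_positive(s)
    return (p(score + eps) - p(score - eps)) / (2 * eps)

for s in (-1.0, 0.0, 2.0):
    print(s, dp_dscore(s, 1), numeric(s, 1))
```

The derivative is largest at s = 0, where the model is least certain, and shrinks as the score moves away from zero in either direction.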
x = A site , located in Maizuru , Kyoto
y = -1

$$\frac{d}{dw} P(y=-1 \mid x) = -\frac{e^{0}}{\left(1 + e^{0}\right)^2} \phi(x) = -0.25\,\phi(x)$$

Weights after the update:

  w_unigram “A”       = -0.25
  w_unigram “site”    = -0.25
  w_unigram “,”       = -0.5
  w_unigram “located” = -0.25
  w_unigram “in”      = -0.25
  w_unigram “Maizuru” = -0.25
  w_unigram “Kyoto”   = -0.25
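x = Shoken , monk born in Kyoto
y = 1

With the weights from the previous example, w⋅ϕ(x) = -0.5 - 0.25 - 0.25 = -1 (from “,”, “in”, and “Kyoto”), so:

$$\frac{d}{dw} P(y=1 \mid x) = \frac{e^{-1}}{\left(1 + e^{-1}\right)^2} \phi(x) = 0.196\,\phi(x)$$

Weights after the update:

  w_unigram “A”       = -0.25
  w_unigram “site”    = -0.25
  w_unigram “,”       = -0.304
  w_unigram “located” = -0.25
  w_unigram “in”      = -0.054
  w_unigram “Maizuru” = -0.25
  w_unigram “Kyoto”   = -0.054
  w_unigram “Shoken”  = 0.196
  w_unigram “monk”    = 0.196
  w_unigram “born”    = 0.196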
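The two updates in this worked example can be reproduced with a short script. It assumes unigram count features, initial weights of zero, and a learning rate α = 1, which is what the numbers on the slides imply.

```python
import math
from collections import defaultdict

def create_features(x):
    """Unigram counts (assumed feature set, matching the w_unigram weights)."""
    phi = defaultdict(int)
    for word in x.split():
        phi[word] += 1
    return phi

def sgd_update(w, x, y, alpha=1.0):
    """One step of w += alpha * dP(y|x)/dw for the logistic model."""
    phi = create_features(x)
    score = sum(w[k] * v for k, v in phi.items())
    scale = math.exp(score) / (1.0 + math.exp(score)) ** 2
    for k, v in phi.items():
        w[k] += alpha * y * scale * v

w = defaultdict(float)
sgd_update(w, "A site , located in Maizuru , Kyoto", -1)
sgd_update(w, "Shoken , monk born in Kyoto", 1)
for word in (",", "in", "Kyoto", "Shoken"):
    print(word, round(w[word], 3))
```

The printed values agree with the slide to within rounding; the slide rounds the gradient scale to 0.196 before adding it.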
y∈{−1,+1}
X_i:  I     visited   Nara
Y_i:  PRN   VBD       NNP
Candidate tag sequences for "time flies", with their features and scores:

  time flies / N V:  φT,<S>,N=1 φT,N,V=1 φT,V,</S>=1 φE,N,time=1 φE,V,flies=1    φ( )*w=3
  time flies / V N:  φT,<S>,V=1 φT,V,N=1 φT,N,</S>=1 φE,V,time=1 φE,N,flies=1    φ( )*w=0
  time flies / N N:  φT,<S>,N=1 φT,N,N=1 φT,N,</S>=1 φE,N,time=1 φE,N,flies=1    φ( )*w=2
  time flies / V V:  φT,<S>,V=1 φT,V,V=1 φT,V,</S>=1 φE,V,time=1 φE,V,flies=1    φ( )*w=1

Weights: wT,<S>,N=1  wE,N,time=1  wT,V,</S>=1
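The scores above can be reproduced by counting transition (T) and emission (E) features and taking the dot product with the weights. The tuple encoding of feature names is an assumption made for this sketch.

```python
from collections import Counter

def create_features(words, tags):
    """Count transition (T) and emission (E) features of a tagged sentence."""
    phi = Counter()
    prev = "<S>"
    for word, tag in zip(words, tags):
        phi[("T", prev, tag)] += 1
        phi[("E", tag, word)] += 1
        prev = tag
    phi[("T", prev, "</S>")] += 1
    return phi

# Weights from the slide; all other weights are taken to be 0.
w = {("T", "<S>", "N"): 1, ("E", "N", "time"): 1, ("T", "V", "</S>"): 1}

def score(words, tags):
    """phi(Y, X) * w for one candidate tag sequence."""
    phi = create_features(words, tags)
    return sum(w.get(f, 0) * v for f, v in phi.items())

words = ["time", "flies"]
for tags in (["N", "V"], ["V", "N"], ["N", "N"], ["V", "V"]):
    print(tags, score(words, tags))
```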
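Exponentiate each score:

  time flies / N V:  exp(φ( )*w)=20.09
  time flies / V N:  exp(φ( )*w)=1.00
  time flies / N N:  exp(φ( )*w)=7.39
  time flies / V V:  exp(φ( )*w)=2.72

Then turn the scores into probabilities by normalizing (the Softmax function):

$$P(Y \mid X) = \frac{e^{w \cdot \phi(Y, X)}}{\sum_{\tilde{Y}} e^{w \cdot \phi(\tilde{Y}, X)}}$$

  P(N V | time flies) = 0.6439
  P(N N | time flies) = 0.2369
  P(V V | time flies) = 0.0871
  P(V N | time flies) = 0.0321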
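A minimal sketch of the normalization step, starting from the four sequence scores of the "time flies" example (3, 2, 1 and 0). The dictionary layout is an assumption for illustration.

```python
import math

# Scores phi(Y, X) * w of the four candidate tag sequences.
scores = {("N", "V"): 3, ("N", "N"): 2, ("V", "V"): 1, ("V", "N"): 0}

def softmax(scores):
    """Exponentiate every score and divide by the sum over all candidates."""
    z = sum(math.exp(s) for s in scores.values())
    return {y: math.exp(s) / z for y, s in scores.items()}

probs = softmax(scores)
for tags, p in probs.items():
    print(tags, round(p, 4))
```

Enumerating every candidate like this is only feasible for very short sentences, since the number of tag sequences grows exponentially with sentence length.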
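Lattice for "time flies":  <S> → {N, V} (time) → {N, V} (flies) → </S>

Features on each edge:

  <S> → N (time):   φT,<S>,N=1   φE,N,time=1
  <S> → V (time):   φT,<S>,V=1   φE,V,time=1
  N → N (flies):    φT,N,N=1     φE,N,flies=1
  V → V (flies):    φT,V,V=1     φE,V,flies=1
  V → N (flies):    φT,V,N=1     φE,N,flies=1
  N → V (flies):    φT,N,V=1     φE,V,flies=1
  V → </S>:         φT,V,</S>=1
  N → </S>:         φT,N,</S>=1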
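Lattice for "time flies":  <S> → {N, V} (time) → {N, V} (flies) → </S>

Score and probability of each edge:

  <S> → N:    e^(w*φ)=7.39   P=.881
  <S> → V:    e^(w*φ)=1.00   P=.119
  N → N:      e^(w*φ)=1.00   P=.237
  V → N:      e^(w*φ)=1.00   P=.032
  N → V:      e^(w*φ)=1.00   P=.644
  V → V:      e^(w*φ)=1.00   P=.087
  V → </S>:   e^(w*φ)=2.72   P=.731
  N → </S>:   e^(w*φ)=1.00   P=.269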
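The edge probabilities above can be computed without enumerating tag sequences. The following is a sketch of the forward-backward algorithm for this two-tag lattice; the weights are the three non-zero weights from the earlier slide, and the function names and the `(previous tag, tag, position)` keys are assumptions made for illustration.

```python
import math

TAGS = ["N", "V"]
# Weights from the slides; every other weight is taken to be 0.
w = {("T", "<S>", "N"): 1.0, ("E", "N", "time"): 1.0, ("T", "V", "</S>"): 1.0}

def potential(prev, tag, word=None):
    """e^(w * phi) for one edge; the edge into </S> has no emission feature."""
    score = w.get(("T", prev, tag), 0.0)
    if word is not None:
        score += w.get(("E", tag, word), 0.0)
    return math.exp(score)

def edge_marginals(words):
    n = len(words)
    fwd = [{t: 0.0 for t in TAGS} for _ in range(n)]  # weight of all prefixes
    bwd = [{t: 0.0 for t in TAGS} for _ in range(n)]  # weight of all suffixes
    for t in TAGS:
        fwd[0][t] = potential("<S>", t, words[0])
        bwd[n - 1][t] = potential(t, "</S>")
    for i in range(1, n):
        for t in TAGS:
            fwd[i][t] = sum(fwd[i - 1][p] * potential(p, t, words[i])
                            for p in TAGS)
    for i in range(n - 2, -1, -1):
        for t in TAGS:
            bwd[i][t] = sum(potential(t, nxt, words[i + 1]) * bwd[i + 1][nxt]
                            for nxt in TAGS)
    z = sum(fwd[n - 1][t] * bwd[n - 1][t] for t in TAGS)
    marginals = {}
    for t in TAGS:
        marginals[("<S>", t, 0)] = fwd[0][t] * bwd[0][t] / z
        marginals[(t, "</S>", n)] = fwd[n - 1][t] * bwd[n - 1][t] / z
    for i in range(1, n):
        for p in TAGS:
            for t in TAGS:
                marginals[(p, t, i)] = (fwd[i - 1][p]
                                        * potential(p, t, words[i])
                                        * bwd[i][t] / z)
    return marginals

marginals = edge_marginals(["time", "flies"])
for edge, prob in marginals.items():
    print(edge, round(prob, 3))
```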
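The probability of a tag sequence is

$$P(Y \mid X) = \frac{e^{w \cdot \phi(Y, X)}}{\sum_{\tilde{Y}} e^{w \cdot \phi(\tilde{Y}, X)}}$$

and training finds the weights that maximize the likelihood of the training data:

$$\hat{w} = \operatorname*{argmax}_{w} \prod_i P(Y_i \mid X_i; w)$$

Taking the log gives

$$\log P(Y \mid X) = w \cdot \phi(Y, X) - \log \sum_{\tilde{Y}} e^{w \cdot \phi(\tilde{Y}, X)}$$

so we need the gradient

$$\frac{d}{dw} \log P(Y \mid X)$$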
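$$\frac{d}{dw} \log P(Y \mid X) = \phi(Y, X) - \frac{d}{dw} \log \sum_{\tilde{Y}} e^{w \cdot \phi(\tilde{Y}, X)}$$

$$= \phi(Y, X) - \frac{1}{\sum_{\tilde{Y}} e^{w \cdot \phi(\tilde{Y}, X)}} \sum_{\tilde{Y}} \frac{d}{dw} e^{w \cdot \phi(\tilde{Y}, X)}$$

$$= \phi(Y, X) - \sum_{\tilde{Y}} \frac{e^{w \cdot \phi(\tilde{Y}, X)}}{\sum_{\tilde{Y}'} e^{w \cdot \phi(\tilde{Y}', X)}} \phi(\tilde{Y}, X)$$

$$= \phi(Y, X) - \sum_{\tilde{Y}} P(\tilde{Y} \mid X)\, \phi(\tilde{Y}, X)$$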
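$$\frac{d}{dw} \log P(Y \mid X) = \phi(Y, X) - \sum_{\tilde{Y}} P(\tilde{Y} \mid X)\, \phi(\tilde{Y}, X)$$

In other words: add the correct feature vector, then subtract the expectation of the features.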
Correct answer:

  time flies / N V:  φT,<S>,N=1 φT,N,V=1 φT,V,</S>=1 φE,N,time=1 φE,V,flies=1    P=.644

Other candidates:

  time flies / N N:  φT,<S>,N=1 φT,N,N=1 φT,N,</S>=1 φE,N,time=1 φE,N,flies=1    P=.237
  time flies / V V:  φT,<S>,V=1 φT,V,V=1 φT,V,</S>=1 φE,V,time=1 φE,V,flies=1    P=.087
  time flies / V N:  φT,<S>,V=1 φT,V,N=1 φT,N,</S>=1 φE,V,time=1 φE,N,flies=1    P=.032

Gradient (correct feature value minus expected feature value):

  φT,<S>,N, φE,N,time    = 1-.644-.237 = .119
  φT,N,V                 = 1-.644 = .356
  φT,V,</S>, φE,V,flies  = 1-.644-.087 = .269
  φT,V,N                 = 0-.032 = -.032
  φT,N,N                 = 0-.237 = -.237
  φT,V,V                 = 0-.087 = -.087
  φT,<S>,V, φE,V,time    = 0-.032-.087 = -.119
  φT,N,</S>, φE,N,flies  = 0-.032-.237 = -.269
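The gradient values above can be checked by brute force: enumerate all four tag sequences, weight each feature vector by its probability, and subtract the result from the features of the correct answer. This sketch reuses the tuple feature encoding assumed earlier; it is practical only for tiny examples like this one.

```python
import math
from collections import Counter, defaultdict
from itertools import product

TAGS = ["N", "V"]
w = {("T", "<S>", "N"): 1.0, ("E", "N", "time"): 1.0, ("T", "V", "</S>"): 1.0}

def create_features(words, tags):
    phi = Counter()
    prev = "<S>"
    for word, tag in zip(words, tags):
        phi[("T", prev, tag)] += 1
        phi[("E", tag, word)] += 1
        prev = tag
    phi[("T", prev, "</S>")] += 1
    return phi

def gradient(words, gold):
    """phi(Y, X) minus the expected features over all candidate sequences."""
    candidates = list(product(TAGS, repeat=len(words)))
    feats = {y: create_features(words, y) for y in candidates}
    expw = {y: math.exp(sum(w.get(f, 0.0) * v for f, v in feats[y].items()))
            for y in candidates}
    z = sum(expw.values())
    grad = defaultdict(float)
    for f, v in feats[tuple(gold)].items():
        grad[f] += v                      # add the correct feature vector
    for y in candidates:
        for f, v in feats[y].items():
            grad[f] -= expw[y] / z * v    # subtract the expectation
    return grad

grad = gradient(["time", "flies"], ["N", "V"])
for f in sorted(grad):
    print(f, round(grad[f], 3))
```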
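Computing $\sum_{\tilde{Y}} P(\tilde{Y} \mid X)\, \phi(\tilde{Y}, X)$ by enumerating every tag sequence takes $O(T^{|X|})$ time.
T = number of tags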
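To make the growth concrete, the number of candidate tag sequences can be counted directly. The tag-set size of 45 and the sentence length of 20 below are hypothetical values chosen only for illustration.

```python
from itertools import product

def count_sequences(num_tags, sentence_length):
    """Number of distinct tag sequences: T ** |X|."""
    return num_tags ** sentence_length

# For the toy example, explicit enumeration agrees with the formula.
toy = list(product(["N", "V"], repeat=2))
print(len(toy), count_sequences(2, 2))

# Hypothetical realistic setting: 45 tags, a 20-word sentence.
print(count_sequences(45, 20))
```

The second number is far too large to enumerate, which is why the expectation has to be computed edge by edge instead.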
First edges of the lattice, for "time":

  <S> → N:   φT,<S>,N=1 φE,N,time=1    e^(w*φ)=7.39   P=.881
  <S> → V:   φT,<S>,V=1 φE,V,time=1    e^(w*φ)=1.00   P=.119

Gradient from the edge probabilities:

  φT,<S>,N, φE,N,time = 1-.881 = .119
  φT,<S>,V, φE,V,time = 0-.119 = -.119

Gradient from enumerating all Y:

  φT,<S>,N, φE,N,time = 1-.644-.237 = .119
  φT,<S>,V, φE,V,time = 0-.032-.087 = -.119
Same answer as when we explicitly expand all Y!
As in logistic regression, the weights can be trained with stochastic gradient descent:

create map w
for I iterations
    for each labeled pair X, Y in the data
        gradient = φ(Y,X)
        calculate e^(φ(y,x)*w) for each edge
        run forward-backward algorithm to get P(edge)
        for each edge
            gradient -= P(edge)*φ(edge)
        w += α * gradient
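One way the training step above might look in Python, as a sketch rather than a reference implementation. It assumes a first-order lattice with transition (T) and emission (E) features, a fixed tag list, and α = 1; the helper names `lattice_edges`, `edge_marginals`, and `sgd_step` are hypothetical. The usage at the bottom starts from the slide weights and performs one update on "time flies" / N V.

```python
import math
from collections import defaultdict

def edge_features(prev, tag, word):
    """Features on one edge; the edge into </S> has no emission feature."""
    feats = [("T", prev, tag)]
    if word is not None:
        feats.append(("E", tag, word))
    return feats

def lattice_edges(words, tags):
    """Yield (position, prev, tag, word) for every edge, left to right."""
    n = len(words)
    for i in range(n + 1):
        prevs = ["<S>"] if i == 0 else tags
        curs = ["</S>"] if i == n else tags
        word = None if i == n else words[i]
        for p in prevs:
            for t in curs:
                yield i, p, t, word

def edge_marginals(w, words, tags):
    """Forward-backward: P(edge) for every edge in the lattice."""
    n = len(words)
    pot = {(i, p, t): math.exp(sum(w[f] for f in edge_features(p, t, word)))
           for i, p, t, word in lattice_edges(words, tags)}
    fwd = [defaultdict(float) for _ in range(n + 2)]
    bwd = [defaultdict(float) for _ in range(n + 2)]
    fwd[0]["<S>"] = 1.0
    bwd[n + 1]["</S>"] = 1.0
    for (i, p, t), v in pot.items():                  # left to right
        fwd[i + 1][t] += fwd[i][p] * v
    for (i, p, t), v in reversed(list(pot.items())):  # right to left
        bwd[i][p] += v * bwd[i + 1][t]
    z = fwd[n + 1]["</S>"]
    return {(i, p, t): fwd[i][p] * v * bwd[i + 1][t] / z
            for (i, p, t), v in pot.items()}

def sgd_step(w, words, gold, tags, alpha=1.0):
    grad = defaultdict(float)
    prev = "<S>"
    for tag, word in zip(gold + ["</S>"], words + [None]):
        for f in edge_features(prev, tag, word):
            grad[f] += 1.0                 # gradient = phi(Y, X)
        prev = tag
    for (i, p, t), prob in edge_marginals(w, words, tags).items():
        word = None if i == len(words) else words[i]
        for f in edge_features(p, t, word):
            grad[f] -= prob                # gradient -= P(edge) * phi(edge)
    for f, g in grad.items():
        w[f] += alpha * g

w = defaultdict(float, {("T", "<S>", "N"): 1.0,
                        ("E", "N", "time"): 1.0,
                        ("T", "V", "</S>"): 1.0})
sgd_step(w, ["time", "flies"], ["N", "V"], ["N", "V"])
print(round(w[("T", "N", "V")], 3), round(w[("T", "<S>", "N")], 3))
```

After the step, each weight has moved by exactly the gradient values computed in the brute-force example, for instance 0 + .356 for φT,N,V and 1 + .119 for φT,<S>,N.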
Online Stochastic Gradient Descent

create map w
for I iterations
    for each labeled pair x, y in the data
        w += α * dP(y|x)/dw

Batch Stochastic Gradient Descent

create map w
for I iterations
    gradient = 0
    for each labeled pair x, y in the data
        gradient += α * dP(y|x)/dw
    w += gradient
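The difference between the two loops is only where the weight update happens. The sketch below applies both to the binary logistic model from earlier in the deck; the two tiny feature dictionaries are hypothetical data chosen so that the results differ visibly.

```python
import math
from collections import defaultdict

def gradient(w, phi, y):
    """dP(y|x)/dw for the logistic model, as a sparse dict."""
    score = sum(w[k] * v for k, v in phi.items())
    scale = y * math.exp(score) / (1.0 + math.exp(score)) ** 2
    return {k: scale * v for k, v in phi.items()}

def train_online(data, iterations, alpha):
    w = defaultdict(float)
    for _ in range(iterations):
        for phi, y in data:
            for k, g in gradient(w, phi, y).items():
                w[k] += alpha * g        # update after every example
    return w

def train_batch(data, iterations, alpha):
    w = defaultdict(float)
    for _ in range(iterations):
        total = defaultdict(float)
        for phi, y in data:
            for k, g in gradient(w, phi, y).items():
                total[k] += alpha * g    # accumulate; w stays fixed
        for k, g in total.items():
            w[k] += g                    # one update per pass over the data
    return w

# Hypothetical toy data: feature dict and label.
data = [({"site": 1, "Kyoto": 1}, -1), ({"monk": 1, "Kyoto": 1}, 1)]
w_online = train_online(data, 1, 1.0)
w_batch = train_batch(data, 1, 1.0)
print(dict(w_online))
print(dict(w_batch))
```

In the batch version both gradients are computed with the same weights, so the two contributions to "Kyoto" cancel exactly; in the online version the second example already sees the weights changed by the first.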
derivatives (the Hessian matrix)
algorithm (L-BFGS):
http://homes.cs.washington.edu/~galen/files/quasi-newton-notes.pdf
Example (label +1): he saw a robbery in the park

Classifier 1:  he +3, saw, a +0.5, bird -1, robbery +1, in +5, the -3, park -2
Classifier 2:  bird -1, robbery +1
Probably classifier 2! It doesn't use irrelevant information.
small penalty on small weights
become zero → small model
[Figure: L2 and L1 penalty curves]
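The two penalties can be compared numerically. This sketch assumes the usual definitions, the sum of absolute values for L1 and the sum of squares for L2; the weight values are hypothetical.

```python
def l1_penalty(weights):
    """Sum of absolute values."""
    return sum(abs(v) for v in weights)

def l2_penalty(weights):
    """Sum of squares."""
    return sum(v * v for v in weights)

for value in (0.1, 1.0, 3.0):
    print(value, l1_penalty([value]), l2_penalty([value]))
```

For a weight of 0.1 the L2 penalty is ten times smaller than the L1 penalty, while for a weight of 3 it is three times larger.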
penalty to the log likelihood (for the whole corpus)
$$\hat{w} = \operatorname*{argmax}_{w} \left( \prod_i P(Y_i \mid X_i; w) \right) - c \sum_{w \in \mathbf{w}} w^2$$

L2 Regularization
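In a gradient-based trainer, the L2 penalty term -c Σ w² contributes -2cw to the gradient of each weight. The sketch below shows one such update; it assumes `gradient` holds the gradient of the unregularized objective, and the function name and dictionary representation are hypothetical.

```python
def l2_regularized_step(w, gradient, alpha, c):
    """Return new weights: w + alpha * (gradient - 2 * c * w)."""
    keys = set(w) | set(gradient)
    return {k: w.get(k, 0.0)
               + alpha * (gradient.get(k, 0.0) - 2.0 * c * w.get(k, 0.0))
            for k in keys}

# With a zero data gradient, the penalty alone shrinks the weight toward 0.
new_w = l2_regularized_step({"a": 1.0}, {"a": 0.0}, alpha=0.1, c=0.5)
print(new_w)
```

The shrinkage is proportional to the weight itself, so large weights are pulled back strongly while small weights are barely affected.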
discriminative prediction models