Conditional Random Fields
Dietrich Klakow
Overview
- Sequence Labeling
- Bayesian Networks
- Markov Random Fields
- Conditional Random Fields
- Software example
Sequence Labeling Tasks
Sequence: a sentence
Pierre Vinken , 61 years old , will join the board as a nonexecutive director Nov. 29 .
POS Labels
Pierre/NNP Vinken/NNP ,/, 61/CD years/NNS old/JJ ,/, will/MD join/VB the/DT board/NN as/IN a/DT nonexecutive/JJ director/NN Nov./NNP 29/CD ./.
Chunking

Task: find phrase boundaries.

Pierre/B-NP Vinken/I-NP ,/O 61/B-NP years/I-NP old/B-ADJP ,/O will/B-VP join/I-VP the/B-NP board/I-NP as/B-PP a/B-NP nonexecutive/I-NP director/I-NP Nov./B-NP 29/I-NP ./O
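A chunker's output is just such a BIO tag sequence; recovering the phrases means grouping consecutive B-X/I-X labels. A minimal sketch in plain Python (tokens and tags taken from the start of the example sentence):

```python
# Recover phrase spans from BIO chunk labels.
tokens = ["Pierre", "Vinken", ",", "61", "years", "old"]
tags = ["B-NP", "I-NP", "O", "B-NP", "I-NP", "B-ADJP"]

def bio_spans(tokens, tags):
    """Group consecutive B-X/I-X labels into (phrase_type, token_list) spans."""
    spans, current = [], None
    for tok, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            if current:
                spans.append(current)
            current = (tag[2:], [tok])
        elif tag.startswith("I-") and current and current[0] == tag[2:]:
            current[1].append(tok)
        else:  # "O" (or an inconsistent I- tag) closes the open span
            if current:
                spans.append(current)
            current = None
    if current:
        spans.append(current)
    return spans

print(bio_spans(tokens, tags))
# [('NP', ['Pierre', 'Vinken']), ('NP', ['61', 'years']), ('ADJP', ['old'])]
```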
Named Entity Tagging
Pierre/B-PERSON Vinken/I-PERSON ,/O 61/B-DATE:AGE years/I-DATE:AGE old/I-DATE:AGE ,/O will/O join/O the/O board/B-ORG_DESC:OTHER as/O a/O nonexecutive/O director/B-PER_DESC Nov./B-DATE:DATE 29/I-DATE:DATE ./O
Supertagging
Pierre Vinken , 61 years old , will join the board as a nonexecutive director Nov. 29 .

N/N N , N/N N (S[adj]\NP)\NP , (S[dcl]\NP)/(S[b]\NP) ((S[b]\NP)/PP)/NP NP[nb]/N N PP/NP NP[nb]/N N/N N ((S\NP)\(S\NP))/N[num] N[num] .
Hidden Markov Model
HMM: just an application of a Bayes classifier:

$(\hat{\pi}_1, \hat{\pi}_2, \ldots, \hat{\pi}_N) = \arg\max_{\pi_1, \pi_2, \ldots, \pi_N} P(\pi_1, \pi_2, \ldots, \pi_N, x_1, x_2, \ldots, x_N)$
Decomposition of Probabilities
$P(\pi_1, \pi_2, \ldots, \pi_N, x_1, x_2, \ldots, x_N) = \prod_{i=1}^{N} P(x_i \mid \pi_i) \, P(\pi_i \mid \pi_{i-1})$

$P(\pi_i \mid \pi_{i-1})$: transition probability
$P(x_i \mid \pi_i)$: emission probability
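The decomposition multiplies one transition and one emission term per position. A minimal sketch with invented toy probabilities (the values and the START symbol are illustrative, not from the slides):

```python
# HMM joint probability as a product of transition and emission terms.
# All probabilities below are made up for illustration.
transition = {("START", "NNP"): 0.4, ("NNP", "NNP"): 0.3}
emission = {("NNP", "Pierre"): 0.01, ("NNP", "Vinken"): 0.005}

def joint(labels, words):
    """P(pi_1..pi_N, x_1..x_N) = prod_i P(x_i | pi_i) * P(pi_i | pi_{i-1})."""
    p, prev = 1.0, "START"
    for pi, x in zip(labels, words):
        p *= transition[(prev, pi)] * emission[(pi, x)]
        prev = pi
    return p

print(joint(["NNP", "NNP"], ["Pierre", "Vinken"]))
# 0.4 * 0.01 * 0.3 * 0.005 ≈ 6e-06
```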
Graphical view of the HMM

[Figure: label sequence π1, π2, π3, ..., πN with observation sequence X1, X2, X3, ..., XN; each label πi emits the observation Xi.]
Criticism
- HMMs model only limited dependencies
→ come up with more flexible models
→ come up with a graphical description
Bayesian Networks
Example for a Bayesian Network

From Russell and Norvig 1995, AI: A Modern Approach: the cloudy (C) / sprinkler (S) / rain (R) / wet grass (W) network. Corresponding joint distribution:

$P(C, S, R, W) = P(W \mid S, R) \, P(S \mid C) \, P(R \mid C) \, P(C)$
Naïve Bayes
$P(x_1, \ldots, x_D \mid z) = \prod_{i=1}^{D} P(x_i \mid z)$

Observations x1, ..., xD are assumed to be independent given the class z.
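Classification with this factorization multiplies the per-feature likelihoods with a class prior and takes the argmax. A toy sketch with invented word likelihoods:

```python
# Naive Bayes: observations are treated as independent given the class z.
# All probabilities below are invented toy values.
likelihood = {
    ("spam", "free"): 0.20, ("spam", "meeting"): 0.01,
    ("ham", "free"): 0.02,  ("ham", "meeting"): 0.20,
}
prior = {"spam": 0.5, "ham": 0.5}

def score(z, words):
    p = prior[z]
    for x in words:
        p *= likelihood[(z, x)]   # prod_i P(x_i | z)
    return p

words = ["free", "meeting"]
best = max(prior, key=lambda z: score(z, words))
print(best, score("spam", words), score("ham", words))
```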
Markov Random Fields
- Undirected graphical model
- New term:
- clique in an undirected graph:
- Set of nodes such that every node is
connected to every other node
- maximal clique: no further node can be added without destroying the clique property
Example
[Figure: example graph. Cliques are marked in green and blue; the maximal clique is the blue one.]
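For small graphs the clique definitions can be checked by brute force. A sketch on a made-up four-node graph (not the figure from the slide); single nodes are skipped for brevity:

```python
# Enumerate cliques and maximal cliques of a small undirected graph.
from itertools import combinations

nodes = {"a", "b", "c", "d"}                      # invented example graph
edges = {("a", "b"), ("b", "c"), ("a", "c"), ("c", "d")}

def adj(u, v):
    return (u, v) in edges or (v, u) in edges

def is_clique(s):
    """Every pair of nodes in s must be connected."""
    return all(adj(u, v) for u, v in combinations(s, 2))

cliques = [set(s) for r in range(2, len(nodes) + 1)
           for s in combinations(sorted(nodes), r) if is_clique(s)]
# Maximal = not a strict subset of any other clique.
maximal = [c for c in cliques if not any(c < d for d in cliques)]
print(maximal)  # the maximal cliques are {a, b, c} and {c, d}
```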
Factorization
$p(x) = \frac{1}{Z} \prod_{C \in M} \Psi_C(x_C)$

$Z = \sum_{x} \prod_{C \in M} \Psi_C(x_C)$

with
- $x$: all nodes
- $x_C$: nodes in clique $C$
- $M$: set of all maximal cliques
- $\Psi_C(x_C) \geq 0$: potential function

The joint distribution is described by the graph. The normalization $Z$ is sometimes called the partition function.
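For a tiny graph the partition function can be computed by explicit enumeration. A sketch for a three-node chain of binary variables with two maximal cliques {x1, x2} and {x2, x3} and an invented potential table:

```python
# p(x) = (1/Z) * prod_C Psi_C(x_C) for a three-node binary chain.
from itertools import product

def psi(a, b):
    """Invented potential table shared by both cliques."""
    return 2.0 if a == b else 0.5

# Partition function: sum the unnormalized product over all assignments.
Z = sum(psi(x1, x2) * psi(x2, x3)
        for x1, x2, x3 in product([0, 1], repeat=3))

def p(x1, x2, x3):
    return psi(x1, x2) * psi(x2, x3) / Z

print(Z, p(0, 0, 0))
# The normalized probabilities sum to one:
assert abs(sum(p(*x) for x in product([0, 1], repeat=3)) - 1.0) < 1e-12
```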
Example
[Figure: graph over the nodes x1, x2, x3, x4, x5.] What are the maximal cliques? Write down the joint probability described by this graph. (white board)
Energy Function
Define

$\Psi_C(x_C) = e^{-E(x_C)}$

and insert into the joint distribution:

$p(x) = \frac{1}{Z} \, e^{-\sum_{C \in M} E(x_C)}$
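The step relies on a simple algebraic identity: a product of potentials exp(−E) equals the exponential of the summed energies. A quick numeric check with made-up energies:

```python
# prod_C exp(-E_C) == exp(-sum_C E_C)
import math

energies = [0.5, 1.2, 0.3]          # invented E(x_C) values for three cliques
prod_of_potentials = math.prod(math.exp(-e) for e in energies)
exp_of_sum = math.exp(-sum(energies))
print(prod_of_potentials, exp_of_sum)
```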
Conditional Random Fields
Definition
Markov random field where each random variable yi is conditioned on the complete input sequence x = (x1, ..., xn).

[Figure: chain y1, y2, ..., yn-1, yn; every yi is connected to the full input x = (x1 ... xn); y = (y1 ... yn).]
Distribution
$p(y \mid x) = \frac{1}{Z(x)} \exp\left( \sum_{i=1}^{n} \sum_{j=1}^{N} \lambda_j f_j(y_{i-1}, y_i, x, i) \right)$

with
- $\lambda_j$: parameters to be trained
- $f_j(y_{i-1}, y_i, x, i)$: feature function (see maximum entropy models)
Example feature functions
$f_1(y_{i-1}, y_i, x, i) = \begin{cases} 1 & \text{if } y_{i-1} = \text{IN} \text{ and } y_i = \text{NNP} \\ 0 & \text{else} \end{cases}$ (modeling transitions)

$f_2(y_{i-1}, y_i, x, i) = \begin{cases} 1 & \text{if } y_i = \text{NNP} \text{ and } x_i = \text{September} \\ 0 & \text{else} \end{cases}$ (modeling emissions)
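Feature functions of this kind are plain indicators and can be written directly. A sketch (the example word list is invented):

```python
# The slide's two indicator features: a transition feature and an
# emission feature over (previous tag, current tag, sentence, position).
def f1(y_prev, y, x, i):
    """Transition feature: previous tag IN followed by tag NNP."""
    return 1 if y_prev == "IN" and y == "NNP" else 0

def f2(y_prev, y, x, i):
    """Emission feature: tag NNP on the word 'September'."""
    return 1 if y == "NNP" and x[i] == "September" else 0

x = ["in", "September"]                 # invented example sentence
print(f1("IN", "NNP", x, 1), f2("IN", "NNP", x, 1))  # 1 1
```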
Training
- Like in maximum entropy models: generalized iterative scaling
- Convergence: the objective p(y|x) is convex, so there is a unique maximum
- Convergence is slow; improved algorithms exist
Decoding: Auxiliary Matrix

Define an additional start symbol y0 = START and a stop symbol yn+1 = STOP, and define the matrix $M_i(x)$ such that

$M_i(x)\left[ y_{i-1}, y_i \right] = \exp\left( \sum_{j=1}^{N} \lambda_j f_j(y_{i-1}, y_i, x, i) \right)$
Reformulate Probability
With that definition we have

$p(y \mid x) = \frac{1}{Z(x)} \prod_{i=1}^{n+1} M_i(x)\left[ y_{i-1}, y_i \right]$

with

$Z(x) = \sum_{y_1} \sum_{y_2} \cdots \sum_{y_n} M_1(x)\left[ y_0, y_1 \right] M_2(x)\left[ y_1, y_2 \right] \cdots M_{n+1}(x)\left[ y_n, y_{n+1} \right]$
Use Matrix Properties
$Z(x) = \left[ M_1(x) \, M_2(x) \cdots M_{n+1}(x) \right]_{y_0 = \text{START}, \; y_{n+1} = \text{STOP}}$

using the matrix product

$\left[ M_1(x) M_2(x) \right]_{y_0, y_2} = \sum_{y_1} M_1(x)\left[ y_0, y_1 \right] M_2(x)\left[ y_1, y_2 \right]$
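The matrix formulation can be checked against brute-force enumeration on a toy problem: summing over all label sequences must give the same Z(x) as the (START, STOP) entry of the matrix product. A sketch with an invented two-label alphabet and a score function standing in for exp(Σ λ_j f_j):

```python
# Z(x) via brute force vs. via the product of auxiliary matrices M_i(x).
import math
from itertools import product

labels = ["A", "B"]          # invented label alphabet
n = 3                        # sequence length

def weight(u, v, i):
    """Stand-in for exp(sum_j lambda_j f_j(u, v, x, i)); made-up scores.
    Transitions back into START or out of STOP are forbidden."""
    if v == "START" or u == "STOP":
        return 0.0
    return math.exp(0.5 if u == v else -0.2)

# Brute force: sum over all label sequences, with y_0=START, y_{n+1}=STOP.
def path_score(ys):
    seq = ["START"] + list(ys) + ["STOP"]
    return math.prod(weight(seq[i - 1], seq[i], i) for i in range(1, n + 2))

Z_brute = sum(path_score(ys) for ys in product(labels, repeat=n))

# Matrix version: Z(x) is the (START, STOP) entry of M_1(x)...M_{n+1}(x).
states = ["START"] + labels + ["STOP"]

def M(i):
    return [[weight(u, v, i) for v in states] for u in states]

def matmul(P, Q):
    return [[sum(P[r][k] * Q[k][c] for k in range(len(Q)))
             for c in range(len(Q[0]))] for r in range(len(P))]

prod_M = M(1)
for i in range(2, n + 2):
    prod_M = matmul(prod_M, M(i))

Z_matrix = prod_M[states.index("START")][states.index("STOP")]
print(Z_brute, Z_matrix)
```

Forbidding re-entry into START and exit from STOP makes every spurious path through the extended state set contribute zero, so the two computations agree exactly.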
Software
CRF++
- See http://crfpp.sourceforge.net/
Summary
- Sequence labeling problems
- CRFs are
  - flexible
  - expensive to train
  - fast to decode