Conditional Random Fields (Dietrich Klakow)



slide-1
SLIDE 1

Conditional Random Fields

Dietrich Klakow

slide-2
SLIDE 2

Overview

  • Sequence Labeling
  • Bayesian Networks
  • Markov Random Fields
  • Conditional Random Fields
  • Software example
slide-3
SLIDE 3

Sequence Labeling Tasks

slide-4
SLIDE 4

Sequence: a sentence

Pierre Vinken , 61 years old , will join the board as a nonexecutive director Nov. 29 .

slide-5
SLIDE 5

POS Labels

Pierre Vinken , 61 years old , will join the board as a nonexecutive director Nov. 29 .

NNP NNP , CD NNS JJ , MD VB DT NN IN DT JJ NN NNP CD .

slide-6
SLIDE 6

Chunking

Task: find phrase boundaries

slide-7
SLIDE 7

Chunking

Pierre Vinken , 61 years old , will join the board as a nonexecutive director Nov. 29 .

B-NP I-NP O B-NP I-NP B-ADJP O B-VP I-VP B-NP I-NP B-PP B-NP I-NP I-NP B-NP I-NP O
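
The B-/I-/O labels encode phrase boundaries: B-X opens a chunk of type X, I-X continues it, and O marks a token outside any chunk. A minimal sketch of reading the chunks back out of the labels follows; the function name `bio_to_chunks` is made up for this illustration.

```python
def bio_to_chunks(tokens, tags):
    """Group tokens into (chunk_type, [tokens]) spans from B-/I-/O tags."""
    chunks = []
    current_type, current_tokens = None, []
    for token, tag in zip(tokens, tags):
        # A chunk starts at B-X, or at an I-X that does not continue the open chunk.
        starts_new = tag.startswith("B-") or (
            tag.startswith("I-") and tag[2:] != current_type
        )
        if tag == "O" or starts_new:
            # Close the chunk that was open, if any.
            if current_tokens:
                chunks.append((current_type, current_tokens))
            current_type, current_tokens = None, []
        if tag != "O":
            current_type = tag[2:]
            current_tokens.append(token)
    if current_tokens:
        chunks.append((current_type, current_tokens))
    return chunks


tokens = ("Pierre Vinken , 61 years old , will join the board as a "
          "nonexecutive director Nov. 29 .").split()
tags = ("B-NP I-NP O B-NP I-NP B-ADJP O B-VP I-VP B-NP I-NP "
        "B-PP B-NP I-NP I-NP B-NP I-NP O").split()

chunks = bio_to_chunks(tokens, tags)
for chunk_type, words in chunks:
    print(chunk_type, " ".join(words))
```

On the example sentence this yields eight chunks, beginning with the noun phrase "Pierre Vinken".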

slide-8
SLIDE 8

Named Entity Tagging

Pierre Vinken , 61 years old , will join the board as a nonexecutive director Nov. 29 .

B-PERSON I-PERSON O B-DATE:AGE I-DATE:AGE I-DATE:AGE O O O O B-ORG_DESC:OTHER O O O B-PER_DESC B-DATE:DATE I-DATE:DATE O

slide-9
SLIDE 9

Supertagging

Pierre Vinken , 61 years old , will join the board as a nonexecutive director Nov. 29 .

N/N N , N/N N (S[adj]\NP)\NP , (S[dcl]\NP)/(S[b]\NP) ((S[b]\NP)/PP)/NP NP[nb]/N N PP/NP NP[nb]/N N/N N ((S\NP)\(S\NP))/N[num] N[num] .

slide-10
SLIDE 10

Hidden Markov Model

slide-11
SLIDE 11

HMM: just an Application of a Bayes Classifier

$$(\hat{\pi}_1, \hat{\pi}_2, \ldots, \hat{\pi}_N) = \arg\max_{\pi_1, \pi_2, \ldots, \pi_N} \left[ P(x_1, x_2, \ldots, x_N, \pi_1, \pi_2, \ldots, \pi_N) \right]$$

slide-12
SLIDE 12

Decomposition of Probabilities

$$P(x_1, x_2, \ldots, x_N, \pi_1, \pi_2, \ldots, \pi_N) = \prod_{i=1}^{N} P(x_i \mid \pi_i)\, P(\pi_i \mid \pi_{i-1})$$

$P(x_i \mid \pi_i)$: emission probability

$P(\pi_i \mid \pi_{i-1})$: transition probability
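
A minimal sketch of this decomposition on a toy HMM. The labels, words, and all probability values are hypothetical, and the table `start` stands in for the transition out of an initial state, P(π_1 | π_0). The decoder is a brute-force version of the Bayes classifier: it scores every label sequence with the joint probability and keeps the best one.

```python
from itertools import product

# Hypothetical toy parameters, chosen only for illustration.
labels = ["N", "V"]
start = {"N": 0.7, "V": 0.3}                      # P(pi_1 | pi_0)
trans = {"N": {"N": 0.4, "V": 0.6},               # P(pi_i | pi_{i-1})
         "V": {"N": 0.8, "V": 0.2}}
emit = {"N": {"fish": 0.6, "swim": 0.4},          # P(x_i | pi_i)
        "V": {"fish": 0.3, "swim": 0.7}}


def joint(xs, pis):
    """P(x_1..x_N, pi_1..pi_N) = prod_i P(x_i | pi_i) P(pi_i | pi_{i-1})."""
    p, prev = 1.0, None
    for x, pi in zip(xs, pis):
        p_trans = start[pi] if prev is None else trans[prev][pi]
        p *= emit[pi][x] * p_trans
        prev = pi
    return p


def decode_brute_force(xs):
    """Argmax over all label sequences; the cost grows exponentially with N."""
    return max(product(labels, repeat=len(xs)), key=lambda pis: joint(xs, pis))


xs = ["fish", "swim"]
best = decode_brute_force(xs)
print(best, joint(xs, best))
```

Enumerating all sequences is only feasible for tiny inputs; the Viterbi algorithm finds the same argmax by dynamic programming.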

slide-13
SLIDE 13

Graphical view HMM

[Figure: HMM as a graphical model. Observation sequence: X1, X2, X3, …, XN. Label sequence: π1, π2, π3, …, πN.]

slide-14
SLIDE 14

Criticism

  • HMMs model only limited dependencies

→ come up with more flexible models
→ come up with a graphical description

slide-15
SLIDE 15

Bayesian Networks

slide-16
SLIDE 16

Example for Bayesian Network

Example network from Russell and Norvig 95, AI: A Modern Approach

Corresponding joint distribution:

$$P(C, S, R, W) = P(W \mid S, R)\, P(S \mid C)\, P(R \mid C)\, P(C)$$
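
A minimal sketch of evaluating this factorization with binary variables. The conditional probability tables below are illustrative values assumed for this example; the slide gives only the structure of the network.

```python
from itertools import product

# Illustrative conditional probability tables (assumed values, not from the slides).
p_c = 0.5                                   # P(C = true)
p_s_given_c = {True: 0.1, False: 0.5}       # P(S = true | C)
p_r_given_c = {True: 0.8, False: 0.2}       # P(R = true | C)
p_w_given_sr = {(True, True): 0.99, (True, False): 0.9,
                (False, True): 0.9, (False, False): 0.0}  # P(W = true | S, R)


def bernoulli(p_true, value):
    """Probability of a binary value given P(value = True)."""
    return p_true if value else 1.0 - p_true


def joint(c, s, r, w):
    """P(C, S, R, W) = P(W | S, R) P(S | C) P(R | C) P(C)."""
    return (bernoulli(p_w_given_sr[(s, r)], w)
            * bernoulli(p_s_given_c[c], s)
            * bernoulli(p_r_given_c[c], r)
            * bernoulli(p_c, c))


# Summing the joint over all 16 assignments must give 1.
total = sum(joint(*values) for values in product([True, False], repeat=4))
print(total)
```

Because every factor is a properly normalized conditional distribution, no separate normalization constant is needed. This is the main contrast with the undirected models that follow.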

slide-17
SLIDE 17

Naïve Bayes

$$P(x_1, \ldots, x_D \mid z) = \prod_{i=1}^{D} P(x_i \mid z)$$

Observations x1, …, xD are assumed to be conditionally independent given z

slide-18
SLIDE 18

Markov Random Fields

slide-19
SLIDE 19
  • Undirected graphical model
  • New term: clique in an undirected graph
      • Clique: a set of nodes such that every node is connected to every other node
      • Maximal clique: a clique to which no node can be added without destroying the clique property
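
A minimal sketch of both definitions on a small graph. The graph itself is hypothetical, and the brute-force enumeration over all node subsets is only meant for tiny examples.

```python
from itertools import combinations

# Hypothetical undirected graph, chosen only for illustration.
nodes = ["x1", "x2", "x3", "x4"]
edges = {frozenset(e) for e in
         [("x1", "x2"), ("x1", "x3"), ("x2", "x3"), ("x3", "x4")]}


def is_clique(subset):
    """Every pair of nodes in the subset must be connected by an edge."""
    return all(frozenset(pair) in edges for pair in combinations(subset, 2))


def cliques():
    """All non-empty cliques, found by checking every subset of nodes."""
    return [set(s) for k in range(1, len(nodes) + 1)
            for s in combinations(nodes, k) if is_clique(s)]


def maximal_cliques():
    """Cliques that are not a proper subset of any other clique."""
    found = cliques()
    return [c for c in found if not any(c < other for other in found)]


print(maximal_cliques())
```

Here the triangle {x1, x2, x3} and the edge {x3, x4} are the maximal cliques; the single nodes and the edges inside the triangle are cliques but not maximal.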

slide-20
SLIDE 20

Example

Cliques: green and blue
Maximal clique: blue

slide-21
SLIDE 21

Factorization

Joint distribution described by the graph:

$$p(x) = \frac{1}{Z} \prod_{C \in C_M} \Psi_C(x_C)$$

Normalization:

$$Z = \sum_{x} \prod_{C \in C_M} \Psi_C(x_C)$$

$x = (x_1, \ldots, x_N)$: all nodes
$x_C$: nodes in clique $C$
$C_M$: set of all maximal cliques
$\Psi_C(x_C)$: potential function ($\Psi_C(x_C) \ge 0$)

Z is sometimes called the partition function
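
A minimal sketch of the factorization for a small binary Markov random field. The graph (maximal cliques {x1, x2, x3} and {x3, x4}) and the potential values are hypothetical; any non-negative potentials would work the same way.

```python
from itertools import product


# Hypothetical potentials on the maximal cliques {x1, x2, x3} and {x3, x4}.
def psi_123(x1, x2, x3):
    return 2.0 if x1 == x2 == x3 else 1.0


def psi_34(x3, x4):
    return 3.0 if x3 == x4 else 1.0


def unnormalized(x):
    """Product of the clique potentials for one assignment x = (x1, x2, x3, x4)."""
    x1, x2, x3, x4 = x
    return psi_123(x1, x2, x3) * psi_34(x3, x4)


states = list(product([0, 1], repeat=4))
Z = sum(unnormalized(x) for x in states)   # partition function: sum over all x


def p(x):
    """Joint distribution p(x) = (1/Z) * prod_C psi_C(x_C)."""
    return unnormalized(x) / Z


print(Z, p((0, 0, 0, 0)))
```

The sum defining Z runs over every joint assignment, so its cost grows exponentially with the number of nodes. Exploiting the graph structure to avoid that is what makes chain-shaped models tractable.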

slide-22
SLIDE 22

Example

[Figure: undirected graph with nodes x1, x2, x3, x4, x5]

  • What are the maximal cliques?
  • Write down the joint probability described by this graph (white board)

slide-23
SLIDE 23

Energy Function

Define

$$\Psi_C(x_C) = \exp\left( E_C(x_C) \right)$$

Insert into joint distribution:

$$p(x) = \frac{1}{Z} \exp\left( \sum_{C \in C_M} E_C(x_C) \right)$$

slide-24
SLIDE 24

Conditional Random Fields

slide-25
SLIDE 25

Definition

Markov random field where each random variable yi is conditioned on the complete input sequence x1, …, xn

[Figure: label nodes y1, y2, y3, …, yn-1, yn and the input node x]

x = (x1, …, xn), y = (y1, …, yn)

slide-26
SLIDE 26

Distribution

$$p(y \mid x) = \frac{1}{Z(x)} \exp\left( \sum_{i=1}^{n} \sum_{j=1}^{N} \lambda_j f_j(y_{i-1}, y_i, x, i) \right)$$

$\lambda_j$: parameters to be trained

$f_j(y_{i-1}, y_i, x, i)$: feature function (see maximum entropy models)

slide-27
SLIDE 27

Example feature functions

   = = =

else y and y if

i 1

  • i

1 ) , , , (

1 1

NNP IN i x y y f

i i

   = = =

else x and y if

i i

1 ) , , , (

1 2

September NNP i x y y f

i i

Modeling transitions Modeling emissions
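
A minimal sketch of how the two feature functions enter the CRF distribution. The label set, the weights, the START symbol for y_0, and the input sentence are assumptions made for this illustration, and Z(x) is computed by brute-force enumeration, which is only feasible for very short inputs.

```python
import math
from itertools import product

LABELS = ["IN", "NNP", "CD"]            # small illustrative label set (assumption)


def f1(y_prev, y, x, i):
    """Transition feature: previous label IN followed by label NNP."""
    return 1 if y_prev == "IN" and y == "NNP" else 0


def f2(y_prev, y, x, i):
    """Emission feature: label NNP on the word 'September' (i is 1-based)."""
    return 1 if y == "NNP" and x[i - 1] == "September" else 0


features = [f1, f2]
lambdas = [1.5, 2.0]                    # hypothetical weights, normally trained


def score(y, x):
    """sum_i sum_j lambda_j f_j(y_{i-1}, y_i, x, i) for positions i = 1..n."""
    padded = ["START"] + list(y)        # y_0 = START is an assumption here
    return sum(lam * f(padded[i - 1], padded[i], x, i)
               for i in range(1, len(x) + 1)
               for lam, f in zip(lambdas, features))


def p(y, x):
    """p(y | x) = exp(score(y, x)) / Z(x), with Z(x) summed by brute force."""
    z = sum(math.exp(score(cand, x)) for cand in product(LABELS, repeat=len(x)))
    return math.exp(score(y, x)) / z


x = ["in", "September"]
best = max(product(LABELS, repeat=len(x)), key=lambda y: p(y, x))
print(best, round(p(best, x), 3))
```

With these weights the labeling (IN, NNP) fires both features at position 2 and therefore receives the highest probability.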

slide-28
SLIDE 28

Training

  • Like in maximum entropy models: generalized iterative scaling
  • Convergence:
      • The log-likelihood log p(y|x) is a concave function of the parameters λ, so there are no local maxima other than the global maximum
      • Convergence is slow
      • Improved algorithms exist
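
For reference, a sketch of the quantity that training maximizes and its gradient. It assumes training pairs (x^(k), y^(k)), a notation introduced here and not used on the slides. The gradient for each weight is the feature count observed in the training data minus the feature count expected under the current model, the same condition that appears in maximum entropy models.

```latex
\mathcal{L}(\lambda) = \sum_{k} \log p\left(y^{(k)} \mid x^{(k)}\right)

\frac{\partial \mathcal{L}}{\partial \lambda_j}
  = \sum_{k} \left[ \sum_{i} f_j\left(y^{(k)}_{i-1}, y^{(k)}_i, x^{(k)}, i\right)
  - \sum_{y} p\left(y \mid x^{(k)}\right) \sum_{i} f_j\left(y_{i-1}, y_i, x^{(k)}, i\right) \right]
```

At the maximum the two counts agree for every feature. The expectation over all label sequences y is what makes each training step expensive.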

slide-29
SLIDE 29

Decoding: Auxiliary Matrix

Define an additional start symbol $y_0 = \text{START}$ and stop symbol $y_{n+1} = \text{STOP}$. Define the matrix $M^i(x)$ such that

$$\left[ M^i(x) \right]_{y_{i-1} y_i} = M^i_{y_{i-1} y_i}(x) = \exp\left( \sum_{j=1}^{N} \lambda_j f_j(y_{i-1}, y_i, x, i) \right)$$
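
One way the matrix entries can be used for decoding is a Viterbi-style search: replace the sum over label sequences by a maximum and keep the best path into each label at every position. The sketch below works with the logarithm of the matrix entries. The label set, the two features, and their weights are hypothetical, and the final transition into STOP is left out because no feature in this toy model fires on it.

```python
LABELS = ["IN", "NNP"]                  # hypothetical label set
LAMBDA_TRANS, LAMBDA_EMIT = 1.5, 2.0    # hypothetical weights


def log_m(y_prev, y, x, i):
    """Log of the matrix entry: sum_j lambda_j f_j(y_{i-1}, y_i, x, i), i 1-based."""
    s = 0.0
    if y_prev == "IN" and y == "NNP":              # transition feature
        s += LAMBDA_TRANS
    if y == "NNP" and x[i - 1] == "September":     # emission feature
        s += LAMBDA_EMIT
    return s


def viterbi(x):
    """Most probable label sequence by dynamic programming over positions."""
    # best[y] = (score of the best path ending in label y, that path)
    best = {y: (log_m("START", y, x, 1), [y]) for y in LABELS}
    for i in range(2, len(x) + 1):
        new_best = {}
        for y in LABELS:
            prev = max(LABELS, key=lambda yp: best[yp][0] + log_m(yp, y, x, i))
            new_best[y] = (best[prev][0] + log_m(prev, y, x, i),
                           best[prev][1] + [y])
        best = new_best
    return max(best.values(), key=lambda t: t[0])[1]


print(viterbi(["in", "September", "in", "September"]))
```

The work per position is quadratic in the number of labels and linear in the sequence length, in contrast to enumerating every label sequence.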

slide-30
SLIDE 30

Reformulate Probability

With that definition we have

$$p(y \mid x) = \frac{1}{Z(x)} \prod_{i=1}^{n+1} M^i_{y_{i-1} y_i}(x)$$

with

$$Z(x) = \sum_{y_1} \sum_{y_2} \cdots \sum_{y_n} M^1_{y_0 y_1}(x)\, M^2_{y_1 y_2}(x) \cdots M^{n+1}_{y_n y_{n+1}}(x)$$

slide-31
SLIDE 31

Use Matrix Properties

Use the matrix product

$$\left[ M^1(x)\, M^2(x) \right]_{y_0 y_2} = \sum_{y_1} M^1_{y_0 y_1}(x)\, M^2_{y_1 y_2}(x)$$

to obtain

$$Z(x) = \left[ M^1(x)\, M^2(x) \cdots M^{n+1}(x) \right]_{y_0 = \text{START},\; y_{n+1} = \text{STOP}}$$
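
A minimal sketch that checks the matrix-product form of Z(x) against the explicit sum over all label sequences. The label set, the sequence length, and the stand-in `log_potential` (which takes the place of the weighted feature sum) are made up for this example. Setting the entries for impossible START/STOP transitions to zero is a design choice of this sketch, so that only proper label paths contribute.

```python
import math
from itertools import product

LABELS = ["A", "B"]                     # hypothetical label set
STATES = ["START"] + LABELS + ["STOP"]  # y_0 = START, y_{n+1} = STOP
n = 3                                   # sequence length


def log_potential(y_prev, y, i):
    """Stand-in for sum_j lambda_j f_j(y_{i-1}, y_i, x, i); values are made up."""
    return 0.5 * i if y_prev == y else 0.1


def allowed(y_prev, y, i):
    """START only opens the sequence (i = 1) and STOP only closes it (i = n + 1)."""
    if y_prev == "STOP" or y == "START":
        return False
    return (y_prev == "START") == (i == 1) and (y == "STOP") == (i == n + 1)


def M(i):
    """Matrix for position i: exp(log_potential), or 0 for disallowed pairs."""
    return [[math.exp(log_potential(a, b, i)) if allowed(a, b, i) else 0.0
             for b in STATES] for a in STATES]


def matmul(P, Q):
    """Plain matrix product of two lists of rows."""
    return [[sum(P[r][k] * Q[k][c] for k in range(len(Q)))
             for c in range(len(Q[0]))] for r in range(len(P))]


prod_M = M(1)
for i in range(2, n + 2):
    prod_M = matmul(prod_M, M(i))
Z_matrix = prod_M[STATES.index("START")][STATES.index("STOP")]

# Brute force: explicit sum over all label sequences y_1..y_n.
Z_brute = 0.0
for y in product(LABELS, repeat=n):
    path = ["START", *y, "STOP"]
    Z_brute += math.exp(sum(log_potential(path[i - 1], path[i], i)
                            for i in range(1, n + 2)))

print(Z_matrix, Z_brute)
```

The two values agree up to floating-point rounding. The matrix product needs n matrix multiplications, while the explicit sum visits every one of the |labels|^n sequences.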

slide-32
SLIDE 32

Software

slide-33
SLIDE 33

CRF++

  • See http://crfpp.sourceforge.net/
slide-34
SLIDE 34

Summary

  • Sequence labeling problems
  • CRFs are
      • flexible
      • expensive to train
      • fast to decode