  1. Introduction to CRFs
     Isabelle Tellier, 02-08-2013

  2. Plan
     1. What is annotation for?
     2. Linear and tree-shaped CRFs
     3. State of the Art
     4. Conclusion

  3. 1. What is annotation for?
     What is annotation?
     – inputs can be texts, trees, or any structure built on items from a finite vocabulary
     – to annotate such a structure = to associate with each of its items an output label belonging to another finite vocabulary
     – the structure is given and preserved

  4. 1. What is annotation for?
     Examples of text annotations
     – POS ("part of speech") labeling:
       item = "word", annotation = morphosyntactic label (Det, N, etc.) in the text
     – named entities (NE), IE:
       item = "word", annotation = type (D for Date, E for Event, P for Place...) + position in the NE (B for Begin, I for In, O for Out)

         In  2016  the  Olympic  Games  will  take  place  in  Rio  de  Janeiro
         O   DB    O    EB       EI     O     O     O      O   PB   PI  PI

     – segmentation of a text into "chunks", phrases, clauses...
     – segmentation of a document into sections (e.g. distinguish Title, Menus, Adverts, etc. in a Web page)
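     To make this type+position encoding concrete, here is a minimal Python sketch that produces the labels of the example above from entity spans. The helper name encode_entities and the span format are illustrative assumptions, not part of the original slides.

         # Hypothetical sketch: encode named-entity spans as type+position
         # labels (B = Begin, I = In, O = Out), as in the slide's example.
         def encode_entities(tokens, spans):
             """spans: list of (start, end, type), end exclusive, e.g. 'D' for Date."""
             labels = ["O"] * len(tokens)
             for start, end, etype in spans:
                 labels[start] = etype + "B"          # first token of the entity
                 for i in range(start + 1, end):
                     labels[i] = etype + "I"          # tokens inside the entity
             return labels

         tokens = "In 2016 the Olympic Games will take place in Rio de Janeiro".split()
         spans = [(1, 2, "D"), (3, 5, "E"), (9, 12, "P")]  # Date, Event, Place
         print(list(zip(tokens, encode_entities(tokens, spans))))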

  5. 1. What is annotation for?
     Examples of text annotations
     – text alignment for automatic translation: an alignment matrix marks which source and target words correspond

         J'  aime  le  chocolat
         I   like  chocolate        (J' ↔ I, aime ↔ like, chocolat ↔ chocolate; le is unaligned)

     – correspondence matrices are projected into pairs of annotations, each word being annotated with the position of its counterpart:

         J'_1  aime_2  le_3  chocolat_4     annotated:  1  2  -  3
         I_1   like_2  chocolate_3          annotated:  1  2  4

  6. 1. What is annotation for?
     Examples of tree annotations

     [Figure: syntactic tree of the sentence "Sligos va prendre pied au Royaume-Uni" (SENT, NP, VN, VP, PP nodes), annotated with the functions SUJ, PRED, OBJ, MOD]

     – syntactic functions, SRL (Semantic Role Labeling: agent, patient...) on a syntactic tree
     – label = value of an attribute in an XML node

  7. 1. What is annotation for?
     Examples of tree annotations

     [Figure: an HTML tree (BODY, DIV, TABLE, TR, TD, A, SPAN, #text, @href nodes) and its labeling with editing operations]

     – on the left: an HTML tree
     – on the right: a labeling with editing operations
     – DelN, DelST: delete a node / a subtree
     – channel, item, title, link, description: rename a node

  8. 1. What is annotation for?
     Examples of tree annotations
     – execution of the editing operations

     [Figure: the HTML tree after executing the operations, reduced to channel, item, title, link, description nodes]

     – implemented application: generation of RSS feeds from HTML pages
     – other possible application: extraction of portions of Web pages

  9. 1. What is annotation for?
     Summary
     – many tasks can be considered as annotation tasks
     – for this, you need to specify:
       – the nature of the input items
       – the relationships between items: order relations of the input structure (sequence, tree...)
       – the nature of the annotations and their meaning
       – the relationships between annotations
       – the relationships between the items and their corresponding annotations
     – pre-processing and post-processing are often necessary

  10. Plan
      1. What is annotation for?
      2. Linear and tree-shaped CRFs
      3. State of the Art
      4. Conclusion

  11. 2. Linear and Tree-shaped CRFs
      Basic notions
      – classical notations: x is the input, y its annotation (of the same structure)
      – x and y are decomposed into random variables: x = {X_1, X_2, ..., X_n} and y = {Y_1, Y_2, ..., Y_n}
      – a graphical model defines dependencies between the random variables in a graph
      – in a generative model (HMM, PCFG), there are oriented dependencies from the Y_i to the X_j
      – in contrast, in a discriminative model (CRF), it is possible to compute p(y|x) directly, without knowing p(x)
      – learning: find the best possible parameters for p(y|x) from annotated examples (x, y) by maximizing the likelihood
      – annotation: for a new x, compute ŷ = argmax_y p(y|x)

  12. 2. Linear and Tree-shaped CRFs
      Basic properties of CRFs
      – define a non-oriented graph on the variables Y_i (implicitly: every variable X is connected)
      – CRFs are Markovian discriminative models: p(Y_i | X) depends only on X and on the Y_j (j ≠ i) such that Y_i and Y_j are connected
      – CRFs are defined by (Lafferty, McCallum & Pereira 2001):

          p(y|x) = (1/Z(x)) · exp( Σ_{c ∈ C} Σ_k λ_k · f_k(y_c, x, i) )

      – C is the set of cliques of the graph
      – y_c: the values of y on the clique c
      – Z(x): a normalization factor
      – the f_k are user-provided features
      – the λ_k are the parameters of the model (weights for the f_k)
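      As a sanity check on this definition, a minimal Python sketch for the linear-chain case, where the cliques are consecutive label pairs. The function names and the (y_prev, y_cur, x, i) feature convention are assumptions for illustration; Z(x) is computed by brute-force enumeration, which is only feasible for tiny examples (real toolkits use dynamic programming, cf. slide 19).

          from itertools import product
          from math import exp

          # Unnormalized log-score of a labeling y: the weighted feature sum
          # over all positions (cliques = consecutive label pairs).
          def log_score(y, x, features, weights):
              return sum(w * f(y[i - 1] if i > 0 else None, y[i], x, i)
                         for i in range(len(x))
                         for f, w in zip(features, weights))

          # p(y|x) as defined above, with Z(x) enumerated over all labelings.
          def crf_probability(y, x, features, weights, labels):
              Z = sum(exp(log_score(yp, x, features, weights))
                      for yp in product(labels, repeat=len(x)))
              return exp(log_score(y, x, features, weights)) / Z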

  13. 2. Linear and Tree-shaped CRFs
      The usual graph for linear CRFs: a chain  Y_1 - ... - Y_{i-1} - Y_i - Y_{i+1} - ... - Y_N
      – the features can use any information in x combined with any information in y_c
      – examples of features f_k(y_{i-1}, y_i, x, i) at position i:
        * f_k(y_{i-1}, y_i, x, i) = 1 if x_{i-1} ∈ {the, a} and y_{i-1} = Det and y_i = N; = 0 otherwise
        * f_k′(y_{i-1}, y_i, x, i) = 1 if {Mr, Mrs, Miss} ∩ {x_{i-3}, ..., x_{i-1}} ≠ ∅ and y_i = NE; = 0 otherwise
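      The same two features, written as Python functions under the hypothetical (y_prev, y_cur, x, i) convention of the previous sketch:

          # Fires when the previous word is "the" or "a" and the tags are Det, N.
          def f_det_noun(y_prev, y_cur, x, i):
              return 1 if i > 0 and x[i - 1] in {"the", "a"} \
                        and y_prev == "Det" and y_cur == "N" else 0

          # Fires when a title (Mr, Mrs, Miss) occurs among the three
          # preceding words and the current tag is NE.
          def f_title_ne(y_prev, y_cur, x, i):
              return 1 if {"Mr", "Mrs", "Miss"} & set(x[max(0, i - 3):i]) \
                        and y_cur == "NE" else 0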

  14. 2. Linear and Tree-shaped CRFs
      Generate features from the labeled examples

          x:  La   bonne  soupe  fume  .
          y:  Det  Adj    N      V     ponct

      Definition of features in software:
      – define a pattern (any shape on x, at most clique-width on y)
      – corresponding instance:
        f_1(y_{i-1}, y_i, x, i) = 1 if (x_i = La) AND (y_i = Det); = 0 otherwise

  15. 2. Linear and Tree-shaped CRFs
      Generate features from the labeled examples

          x:  La   bonne  soupe  fume  .
          y:  Det  Adj    N      V     ponct

      Associated feature:
        f_2(y_{i-1}, y_i, x, i) = 1 if (x_i = bonne) AND (y_i = Adj); = 0 otherwise

  16. 2. Linear and Tree-shaped CRFs
      Generate features from the labeled examples

          x:  La   bonne  soupe  fume  .
          y:  Det  Adj    N      V     ponct

      Associated feature:
        f_4(y_{i-1}, y_i, x, i) = 1 if (x_{i-1} = La) AND (y_{i-1} = Det) AND (x_i = bonne) AND (y_i = Adj); = 0 otherwise
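      A sketch of this pattern-driven feature generation, assuming two pattern shapes: a unigram on (x_i, y_i) and a bigram on the whole clique. Each distinct instantiation found in the labeled data becomes one binary feature. The tuple encoding is an illustrative convention, loosely in the spirit of CRF++-style templates, not any particular tool's format.

          # Instantiate the patterns at every position of a labeled example;
          # the resulting set contains one entry per binary feature.
          def generate_features(x, y):
              feats = set()
              for i in range(len(x)):
                  feats.add(("U", x[i], y[i]))                          # f: x_i AND y_i
                  if i > 0:
                      feats.add(("B", x[i - 1], y[i - 1], x[i], y[i]))  # f: clique bigram
              return feats

          x = ["La", "bonne", "soupe", "fume", "."]
          y = ["Det", "Adj", "N", "V", "ponct"]
          for f in sorted(map(str, generate_features(x, y))):
              print(f)   # includes the instances behind f_1, f_2 and f_4 above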

  17. 2. Linear and Tree-shaped CRFs
      Transform an HMM into a linear CRF

      [Figure: a toy HMM with states Det, Adj, N, V_intr; transition probabilities 1/3, 2/3, 2/3, 1; emission probabilities la: 2/3, une: 1/3 (Det); bonne: 1/2, grande: 1/2 (Adj); soupe: 2/3, bonne: 1/3 (N); fume: 4/5, soupe: 1/5 (V_intr)]

      – f_1(y_i, x, i) = 1 if y_i = Det and x_i = la (= 0 otherwise), λ_1 = log(2/3)
      – f_2(y_{i-1}, y_i, x, i) = 1 if y_{i-1} = Det and y_i = Adj (= 0 otherwise), λ_2 = log(1/3)
      – (for a missing transition, λ = −∞)
      – the computation of p(y|x) is the same in both cases
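      A sketch of this translation in Python: every HMM emission probability becomes a (word, tag) feature with weight log p, and every transition probability a (tag, tag) feature with weight log p. The dictionaries reproduce part of the toy HMM above; the encoding of features as tuples is illustrative.

          from math import log

          emissions   = {("Det", "la"): 2/3, ("Det", "une"): 1/3,
                         ("Adj", "bonne"): 1/2, ("Adj", "grande"): 1/2}
          transitions = {("Det", "Adj"): 1/3, ("Det", "N"): 2/3}

          weights = {}
          for (tag, word), p in emissions.items():
              weights[("emit", word, tag)] = log(p)    # feature: x_i = word AND y_i = tag
          for (prev, cur), p in transitions.items():
              weights[("trans", prev, cur)] = log(p)   # feature: y_{i-1} = prev AND y_i = cur
          # A feature absent from `weights` corresponds to a zero HMM
          # probability, i.e. weight -infinity, so p(y|x) computed from
          # these weights matches the HMM.
          print(weights)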

  18. 2. Linear and Tree-shaped CRFs
      Possible graphs for trees

      [Figure: two possible CRF graph structures over the nodes of a syntactic tree, with output labels SUJ, PRED, OBJ, MOD and ⊥ for nodes bearing no function]

  19. 2. Linear and Tree-shaped CRFs
      Implementations
      – learning step: maximize the (penalized) log-likelihood of the training set S

          L(λ) = Σ_{(x,y) ∈ S} log p(y|x) + penalty...

        by gradient-based optimization (L-BFGS)
      – annotation by Viterbi (linear chains), inside-outside (trees), message passing (general graphs)...
      – computation in O(K · N · |Y|^c), where c is the size of the largest clique
      – implementations available: Mallet, GRMM, CRFSuite, CRF++, Wapiti, XCRF (for trees with cliques of width 3), Factorie
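      As an illustration of the linear-chain case, a minimal Viterbi sketch: it finds argmax_y Σ_i score(y_{i-1}, y_i, x, i) by dynamic programming in O(N · |Y|²). The score argument stands for the weighted feature sum on a clique and is assumed to come from a trained model; this does not mirror any specific library's API.

          def viterbi(x, labels, score):
              n = len(x)
              # Best log-score of any sequence ending in label y at position 0.
              best = {y: score(None, y, x, 0) for y in labels}
              back = []   # back[i-1][y] = best predecessor of y at position i
              for i in range(1, n):
                  ptr, new = {}, {}
                  for y in labels:
                      prev = max(labels, key=lambda yp: best[yp] + score(yp, y, x, i))
                      ptr[y] = prev
                      new[y] = best[prev] + score(prev, y, x, i)
                  back.append(ptr)
                  best = new
              # Recover the best path by following back-pointers from the end.
              y = max(best, key=best.get)
              path = [y]
              for ptr in reversed(back):
                  y = ptr[y]
                  path.append(y)
              return list(reversed(path))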

  20. Plan
      1. What is annotation for?
      2. Linear and tree-shaped CRFs
      3. State of the Art
      4. Conclusion

  21. 3. State of the Art
      Use of CRFs for labeling tasks
      – NE recognition (McCallum & Li, 2003)
      – IE from tables (Pinto et al., 2003)
      – POS labeling (Altun et al., 2003)
      – shallow parsing (Sha & Pereira, 2003)
      – SRL for trees (Cohn & Blunsom, 2005)
      – tree transformation (Gilleron et al., 2006)
      – non-linguistic uses: image labeling/segmentation, RNA alignment...

  22. 3. State of the Art
      Extensions of the graph
      – add dependencies in the graph: skip-chain CRFs, dynamic (multi-level) CRFs...
      – use CRFs for syntactic parsing (Finkel et al., 2008)
      – build the tree structure of a CRF (Bradley & Guestrin, 2010)
      – CRFs for general graphs (grid-shaped for images)
      How to build the features
      – nearly always binary
      – feature induction (McCallum, 2003)
      – features allow integrating external knowledge... (cf. below)
      – more general features may be more effective (Pu et al., 2010)

  23. 3. State of the Art
      About the learning step
      – unsupervised or semi-supervised CRFs (difficult, not very effective)
      – add an L1 penalty to the likelihood to select the best features (Lavergne & Yvon, 2010)
      – add constraints at different possible levels (features, likelihood, labels...): LREC 2012 tutorial (Druck et al., 2012)
      – MCMC inference methods

  24. 3. State of the Art
      Linguistic interest
      – sequential vs. direct complex labeling?
      – how to integrate linguistic knowledge?
        – as external constraints
        – as additional labeled input data
        – as features

  25. Plan
      1. What is annotation for?
      2. Linear and tree-shaped CRFs
      3. State of the Art
      4. Conclusion

  26. Conclusion
      Strengths
      – very effective for many tasks
      – allow the integration of many distinct sources of information
      – many easy-to-use libraries available
      Weaknesses
      – unsupervised/semi-supervised learning is not well supported
      – not very incremental
      – learning complexity remains high with large cliques or a large label vocabulary
