  1. Introduction to Natural Language Processing
PARSING: Earley, Bottom-Up Chart Parsing
Jean-Cédric Chappelier (Jean-Cedric.Chappelier@epfl.ch), Artificial Intelligence Laboratory
M. Rajman, LIA. Introduction to Natural Language Processing (CS-431), I&C.

  2. Objectives of this lecture
➥ After the CYK algorithm, present two other algorithms used for syntactic parsing

  3. Earley Parsing
A top-down (predictive) algorithm (bottom-up = inference, top-down = search).
3 advantages:
✌ best known worst-case complexity (same as CYK)
✌ adaptive complexity for less complex languages (e.g. regular languages)
✌ no need for a special form of the CF grammar
2 drawbacks:
➷ no way to correct/reconstruct non-parsable sentences ("early error detection")
➷ not very intuitive

  4. Earley Parsing (2)
Idea: on-line (i.e. during parsing) binarization of the grammar ☞ dotted rules and "Earley items".
Dotted rule: X → X_1 ... X_k • X_{k+1} ... X_m, with X → X_1 ... X_m a rule of the grammar.
Earley item: a dotted rule together with an integer i (0 ≤ i ≤ n, where n is the size of the input string) ☞ the part before the dot (•) represents the subpart of the rule that derives a substring of the input string starting at position i+1.
Example: (VP → V • NP, 2) is an Earley item for the input string "the cat ate a mouse" (word positions 1 2 3 4 5).
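The dotted-rule notation maps directly onto a small data structure. A minimal Python sketch (the `Item` type and `show` helper are mine, not from the slides):

```python
from collections import namedtuple

# An Earley item is a dotted rule plus the start position i:
# (X -> X_1 ... X_k . X_{k+1} ... X_m, i).  We keep the rule's head,
# its full right-hand side, the dot position, and the integer i.
Item = namedtuple("Item", ["head", "rhs", "dot", "start"])

def show(item):
    """Render an item the way the slides do, e.g. (VP -> V . NP, 2)."""
    parts = [item.head, "->", *item.rhs[:item.dot], ".", *item.rhs[item.dot:]]
    return "(" + " ".join(parts) + f", {item.start})"

# The slide's example item for "the cat ate a mouse":
print(show(Item("VP", ("V", "NP"), 1, 2)))  # (VP -> V . NP, 2)
```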

  5. Earley Parsing (3)
Principle: starting from all possible (S → • X_1 ... X_m, 0), parallel construction of all the dotted rules deriving (larger and larger) substrings of the input string, up to the point where the whole input sentence is derived.
☞ construction of sets of items E_j such that:
(X → α • β, i) ∈ E_j ⟺ ∃ γ, δ : S ⇒* γ X δ and γ ⇒* w_1 ... w_i and α ⇒* w_{i+1} ... w_j
Example: in the former example, (VP → V • NP, 2) ∈ E_3.
The input string (of length n) is syntactically correct (accepted) iff at least one (S → X_1 ... X_m •, 0) is in E_n.

  6. Earley Parsing (4)
➊ Initialization: construction of E_0
1. For each rule S → X_1 ... X_m in the grammar: add (S → • X_1 ... X_m, 0) to E_0
2. For each (X → • Y β, 0) in E_0 and every rule Y → γ, add (Y → • γ, 0) to E_0
3. Iterate (2) until convergence of E_0
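The three steps above can be sketched in Python. The grammar is the "I think" example used later in the deck; function and variable names are mine, not from the slides:

```python
# Grammar of the "I think" example; each right-hand side is a tuple.
GRAMMAR = {
    "S":    [("NP", "VP")],
    "NP":   [("Pron",), ("Det", "N")],
    "VP":   [("V",), ("V", "S"), ("V", "NP")],
    "Pron": [("I",)],
    "V":    [("think",)],
}

def build_E0(grammar, start="S"):
    """Build E_0: seed with all (S -> . alpha, 0), then close under prediction."""
    # Items are (head, rhs, dot, i); all E_0 items have dot = 0 and i = 0.
    E0 = {(start, rhs, 0, 0) for rhs in grammar[start]}
    changed = True
    while changed:                          # step 3: iterate until convergence
        changed = False
        for head, rhs, dot, i in list(E0):
            Y = rhs[dot]                    # symbol right after the dot
            for gamma in grammar.get(Y, []):  # step 2: predict Y -> gamma
                item = (Y, gamma, 0, 0)
                if item not in E0:
                    E0.add(item)
                    changed = True
    return E0

E0 = build_E0(GRAMMAR)
# E0 holds the four items of the worked example: (S -> . NP VP, 0),
# (NP -> . Pron, 0), (NP -> . Det N, 0), (Pron -> . I, 0).
```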

  7. Earley Parsing: Interpretation
➋ Iterations: building derivations of w_1 ... w_j (the sets E_j)
1. Linking with words: introduce word w_j whenever a derivation of w_1 ... w_{j−1} can "eat" w_j (i.e. there is a • right before w_j)
2. Stepping in the derivation: whenever a non-terminal X can derive a subsequence starting at w_{i+1}, and there exists a subderivation ending at w_i which can "eat" X, do it!
3. Prediction (of useful items): if at some place Y could be "eaten" by some rule, introduce all the rules that might (later on) produce Y

  8. Earley Parsing (continued)
➋ Iterations: construction of the E_j sets (1 ≤ j ≤ n)
1. For all (X → α • w_j β, i) in E_{j−1}, add (X → α w_j • β, i) to E_j
2. For all (X → γ •, i) in E_j, for all (Y → α • X β, k) in E_i, add (Y → α X • β, k) to E_j
3. For all (Y → α • X β, i) in E_j and for each rule X → γ, add (X → • γ, j) to E_j
4. Repeat from (2) while E_j keeps changing

  9. Earley Parsing: Full Example
Example for "I think", with the grammar:
S → NP VP    NP → Pron    NP → Det N
VP → V    VP → V S    VP → V NP
Pron → I    V → think

  10. E_0: (S → • NP VP, 0), (NP → • Pron, 0), (NP → • Det N, 0), (Pron → • I, 0)
E_1: (Pron → I •, 0), (NP → Pron •, 0), (S → NP • VP, 0), (VP → • V, 1), (VP → • V S, 1), (VP → • V NP, 1), (V → • think, 1)
E_2: (V → think •, 1), (VP → V •, 1), (VP → V • S, 1), (VP → V • NP, 1), (S → NP VP •, 0), (S → • NP VP, 2), (NP → • Pron, 2), (NP → • Det N, 2), (Pron → • I, 2)
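The initialization plus the three iteration rules (scan, complete, predict) fit in a short self-contained recognizer; run on the "I think" grammar it reproduces the E_0, E_1, E_2 sets listed above. This is a sketch, all names are mine, not from the slides:

```python
def earley_recognize(grammar, start, words):
    """Earley recognizer: items are (head, rhs, dot, i) tuples."""
    n = len(words)
    E = [set() for _ in range(n + 1)]
    # Initialization: all (S -> . alpha, 0), closed under prediction.
    for rhs in grammar[start]:
        E[0].add((start, rhs, 0, 0))
    close(E, 0, grammar)
    for j in range(1, n + 1):
        # Rule 1 (scan): a dot directly before w_j moves over it.
        for head, rhs, dot, i in E[j - 1]:
            if dot < len(rhs) and rhs[dot] == words[j - 1]:
                E[j].add((head, rhs, dot + 1, i))
        close(E, j, grammar)
    # Accepted iff some (S -> alpha ., 0) is in E_n.
    return any(h == start and d == len(r) and i == 0
               for h, r, d, i in E[n]), E

def close(E, j, grammar):
    """Rules 2 and 3 (complete, predict), repeated until E_j stops changing."""
    changed = True
    while changed:
        changed = False
        for head, rhs, dot, i in list(E[j]):
            if dot == len(rhs):
                # Rule 2 (complete): head fully derived; advance dots in E_i.
                for h2, r2, d2, k in list(E[i]):
                    if d2 < len(r2) and r2[d2] == head:
                        item = (h2, r2, d2 + 1, k)
                        if item not in E[j]:
                            E[j].add(item); changed = True
            else:
                # Rule 3 (predict): introduce rules that could later produce
                # the symbol after the dot, starting at position j.
                for gamma in grammar.get(rhs[dot], []):
                    item = (rhs[dot], gamma, 0, j)
                    if item not in E[j]:
                        E[j].add(item); changed = True

GRAMMAR = {
    "S":    [("NP", "VP")],
    "NP":   [("Pron",), ("Det", "N")],
    "VP":   [("V",), ("V", "S"), ("V", "NP")],
    "Pron": [("I",)],
    "V":    [("think",)],
}
ok, E = earley_recognize(GRAMMAR, "S", ["I", "think"])
print(ok)  # True: (S -> NP VP ., 0) is in E_2
```

The until-convergence loops mirror rule 4 of the previous slide; a production implementation would use an agenda instead of re-scanning the whole set.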

  11. Link between CYK and Earley
(X → α • β, i) ∈ E_j ⟺ (X → α • β) ∈ cell_{j−i, i+1}
[Figure: the triangular CYK chart for "I think", with each Earley item (X → α • β, i) ∈ E_j placed in cell (j−i, i+1); e.g. (S → NP VP •, 0) ∈ E_2 appears in the top cell (2, 1).]

  12. Bottom-up Chart Parsing
Idea: keep the best of both CYK and Earley ☞ on-line binarization "à la" Earley (and even better) within a bottom-up, CYK-like algorithm.
Mainly:
• no need for indices in the items: the cell position is enough
• factorize (with respect to α) all the X → α • β into α • ... ☞ possible when processing bottom-up
• replace all the X → α • simply by X
• suppression of the X → • α items ☞ possible when processing bottom-up (and without lookahead)

  13. Bottom-up Chart Parsing: Example
[Figure: chart for "The crocodile ate the cat", combining the lexical cells (Det, N, V, Det, N) bottom-up into NP, VP and finally S.]

  14. Bottom-up Chart Parsing (3)
More formally, a CYK-like algorithm in which, if cell contents are denoted by [α • ..., i, j] and [X, i, j] respectively (i: length of the covered span, j: its start position), then initialization is:
w_{ij} ⇒ [X, i, j] for X → w_{ij} ∈ R
and the completion phase becomes (association of two cells):
[α • ..., i, j] ⊕ [X, k, j+i] ⇒ [α X • ..., i+k, j] if Y → α X β ∈ R
[α • ..., i, j] ⊕ [X, k, j+i] ⇒ [Y, i+k, j] if Y → α X ∈ R
and ("self-filling"):
[X, i, j] ⇒ [X • ..., i, j] if Y → X β ∈ R
[X, i, j] ⇒ [Y, i, j] if Y → X ∈ R
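Assuming i is the span length and j its start position (matching the CYK cell indexing of the earlier slide), the association and self-filling rules can be sketched in Python; all names are mine, not from the slides:

```python
def chart_parse(grammar, words, start="S"):
    """Bottom-up chart recognizer with cells indexed by (length, start)."""
    n = len(words)
    cells = [(l, j) for l in range(1, n + 1) for j in range(1, n - l + 2)]
    complete = {c: set() for c in cells}   # [X, l, j]: finished non-terminals
    partial = {c: set() for c in cells}    # [alpha ., l, j]: factorized prefixes
    rules = [(lhs, rhs) for lhs, rhss in grammar.items() for rhs in rhss]

    # Initialization: [X, 1, j] for each lexical rule X -> w_j.
    for j, w in enumerate(words, start=1):
        for lhs, rhs in rules:
            if rhs == (w,):
                complete[(1, j)].add(lhs)

    for l in range(1, n + 1):
        for j in range(1, n - l + 2):
            cc, cp = complete[(l, j)], partial[(l, j)]
            # Association of two cells: [alpha ., i, j] (+) [X, k, j+i].
            for i in range(1, l):
                k = l - i
                for alpha in partial[(i, j)]:
                    for X in complete[(k, j + i)]:
                        ext = alpha + (X,)
                        for lhs, rhs in rules:
                            if rhs[:len(ext)] == ext:
                                if len(rhs) == len(ext):
                                    cc.add(lhs)   # [Y, i+k, j]
                                else:
                                    cp.add(ext)   # [alpha X ., i+k, j]
            # Self-filling: [X, l, j] => [X ., l, j] and/or [Y, l, j].
            changed = True
            while changed:
                changed = False
                for X in list(cc):
                    for lhs, rhs in rules:
                        if rhs[0] == X:
                            if len(rhs) > 1 and (X,) not in cp:
                                cp.add((X,)); changed = True
                            if len(rhs) == 1 and lhs not in cc:
                                cc.add(lhs); changed = True
    return start in complete[(n, 1)], complete

GRAMMAR = {
    "S":    [("NP", "VP")],
    "NP":   [("Pron",), ("Det", "N")],
    "VP":   [("V",), ("V", "S"), ("V", "NP")],
    "Pron": [("I",)],
    "V":    [("think",)],
}
ok, chart = chart_parse(GRAMMAR, ["I", "think"])
print(ok)  # True: S spans the whole input, cell (2, 1)
```

Note how the partial cells store only the prefix α, never the rule name or start index: the factorization and index suppression of the previous slide.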

  15. Bottom-up Chart Parsing: Example
[Figure: chart for "The dog hate the cat". Initialization fills the bottom cells with Det, N, V, Det, N; completion then combines adjacent cells step by step.]

  16. [Figure: the resulting chart for "The crocodile ate the cat", with NP, VP and S built on top of the lexical cells Det, N, V, Det, N.]

  17. Dealing with compounds
Example of how to deal with compounds during the initialization phase:
[Figure: initialization for "credit card": each word gets its own lexical cells, and a compound N is also entered for the whole two-word span.]

  18. Complexity
Still in O(n³). What coefficient for n³ (with respect to the grammar parameters)?
m(R′) · |NT| · n³
where:
R′: the set of non-lexical grammar rules
m(R′): the number of internal nodes of the trie of the right-hand sides of the non-lexical grammar rules
NT: the set of non-terminals
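One plausible reading of m(R′), sketched in Python: build the trie of the right-hand sides of the non-lexical rules of the "I think" grammar and count the nodes that have at least one child (root included; this counting convention is my assumption, not stated on the slide):

```python
# Right-hand sides of the non-lexical rules of the "I think" grammar.
rhs_list = [("NP", "VP"),                  # S  -> NP VP
            ("Pron",), ("Det", "N"),       # NP -> Pron | Det N
            ("V",), ("V", "S"), ("V", "NP")]  # VP -> V | V S | V NP

def internal_nodes(sequences):
    """Insert all sequences into a trie of nested dicts, then count
    the internal nodes (nodes with at least one child, root included)."""
    trie = {}
    for seq in sequences:
        node = trie
        for sym in seq:
            node = node.setdefault(sym, {})
    def count(node):
        return (1 if node else 0) + sum(count(c) for c in node.values())
    return count(trie)

print(internal_nodes(rhs_list))  # -> 4 (root, NP-, Det-, V- branches)
```

The shared "V" prefix of the three VP rules is stored once, which is exactly why the trie size, not the raw rule count, drives the coefficient.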

  19. Keypoints
➟ The way the algorithms work (Earley items, linking, stepping, prediction, link between CYK and Earley)
➟ Worst-case complexity O(n³)
➟ Advantages and drawbacks of the algorithms

  20. References
[1] D. Jurafsky & J. H. Martin, Speech and Language Processing, pp. 377–385, Prentice Hall, 2000.
[2] R. Dale, H. Moisl, H. Somers (eds.), Handbook of Natural Language Processing, pp. 69–73, Dekker, 2000.
