Chart Parsing: The Earley Algorithm

Informatics 2A: Lecture 18 Bonnie Webber (revised by Frank Keller)

School of Informatics University of Edinburgh bonnie@inf.ed.ac.uk

26 October 2007

Informatics 2A: Lecture 17 Chart Parsing: The Earley Algorithm

1 Adding Prediction to the Chart
  • Prediction
  • Dotted Rules

2 The Earley Algorithm
  • Parsing Operations
  • Details of the Algorithm
  • Visualizing the Chart
  • Comparing Earley and CYK

Reading: J&M (1st ed), ch. 10 (pp. 377–385) or J&M (2nd ed), ch. 13 (pp. 10–25);
NLTK Book, http://nltk.org/doc/en/advanced-parsing.pdf, pp. 8–19


Prediction

As we saw in the last lecture, the CYK algorithm avoids redundant work by storing sub-trees in a chart. We can avoid even more work by adding prediction to the chart. We need a new data structure: a dotted rule stands for a partially constructed constituent, with the dot indicating how much has already been found and how much is still predicted.

Dotted rules are generated from ordinary grammar rules. The grammar rule VP → V NP yields the following dotted rules:

VP → . V NP    incomplete edge
VP → V . NP    incomplete edge
VP → V NP .    complete edge


Dotted Rules

With dotted rules, an arc/edge in the chart records:
  • which rule has been used in the analysis;
  • which part of the rule has already been found (left of the dot), and which part is still predicted to be found (right of the dot);
  • the start and end position of the material left of the dot.

For example, the input . . . 1 with 2 the 3 telescope 4 . . . could lead to the following dotted rule:

NP → Det . N, [2, 3]

This means the word from input position 2 to 3 is spanned by a Det, and an N is predicted to come next; if found, it will yield an NP.
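To make the data structure concrete, here is one possible Python representation of a dotted rule together with its span; the class, field, and method names are my own choices, not from the lecture:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class DottedRule:
    """A dotted rule with a span, e.g. NP -> Det . N, [2, 3]."""
    lhs: str      # left-hand side of the rule
    rhs: tuple    # right-hand side symbols
    dot: int      # how many RHS symbols have already been found
    start: int    # start position of the material left of the dot
    end: int      # end position of the material left of the dot

    def is_complete(self) -> bool:
        """A complete edge has the dot at the far right."""
        return self.dot == len(self.rhs)

    def next_symbol(self):
        """The symbol predicted to come next (None for a complete edge)."""
        return None if self.is_complete() else self.rhs[self.dot]

    def __str__(self):
        syms = list(self.rhs)
        syms.insert(self.dot, ".")
        return f"{self.lhs} -> {' '.join(syms)}, [{self.start}, {self.end}]"

# The example from the slide: a Det spans positions 2-3, an N is predicted.
edge = DottedRule("NP", ("Det", "N"), 1, 2, 3)
```

Printing `edge` yields `NP -> Det . N, [2, 3]`, mirroring the slide's notation (with ASCII `->` in place of the arrow).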


Dotted Rules

Example of a chart containing dotted rules (in graph representation), for the input 1 with 2 the 3 telescope 4:

Prep → with . , [1, 2]
Det → the . , [2, 3]
N → telescope . , [3, 4]
NP → . Det N, [2, 2]
NP → Det . N, [2, 3]
NP → Det N . , [2, 4]
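Viewed abstractly, such a chart is just a set of labeled edges between input positions. A minimal sketch, assuming the spans suggested by the node numbers above (the tuple encoding is my own, not from the lecture):

```python
# Each chart edge connects two input positions and carries a dotted rule.
# Encoding: (start, end, "LHS -> material-found . material-predicted").
chart_edges = {
    (1, 2, "Prep -> with ."),
    (2, 3, "Det -> the ."),
    (3, 4, "N -> telescope ."),
    (2, 2, "NP -> . Det N"),
    (2, 3, "NP -> Det . N"),
    (2, 4, "NP -> Det N ."),
}

# Complete edges have the dot at the far right.
complete = {e for e in chart_edges if e[2].endswith(".")}
```

Note that incomplete edges like `NP -> . Det N` can span zero words ([2, 2]): nothing has been found yet, everything is still predicted.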


Parsing Operations

The Earley algorithm comprises three main operations:

Predictor: an incomplete edge looks for a symbol to the right of its dot; if there is no matching symbol in the chart, one is predicted by adding all matching rules with an initial dot.

Scanner: an incomplete edge looks for a POS to the right of its dot; this POS prediction is compared to the input, and a complete edge is added to the chart if it matches.

Completer: a complete edge is combined with an incomplete edge that is looking for it to form another complete edge.


Properties of the Algorithm

The Earley algorithm is a bottom-up chart parser with top-down prediction. Top-down prediction: the algorithm builds up parse trees bottom-up, but the incomplete edges (the predictions) are generated top-down, starting with the start symbol of the grammar. Also, edges are added in left-to-right order: if A → X . B, [i, j] is added before C → Y . D, [i′, j′] then j ≤ j′.


Adding Entries to the Chart

The main recursive step is Enqueue(state, chart entry), where a chart entry is a set of states:

Enqueue(state, chart entry)
1  if state is not already in chart entry
2  then Push(state, chart entry)


Main Parsing Function

Earley Parse(words, grammar)
 1  Enqueue((γ → . S, [0, 0]), chart[0])
 2  for i ← 0 to length(words)
 3  do for each state in chart[i]
 4     do if Incomplete(state) and
 5              Next Cat(state) is not a part of speech
 6           then Predictor(state)
 7        else if Incomplete(state) and
 8              Next Cat(state) is a part of speech
 9           then Scanner(state)
10        else Completer(state)
11  return chart


Predictor, Scanner, Completer

Predictor((A → α . B β, [i, j]))
1  for each (B → γ) in Grammar Rules For(B, grammar)
2  do Enqueue((B → . γ, [j, j]), chart[j])

Scanner((A → α . B β, [i, j]))
1  if B ⊂ Parts Of Speech(word[j])
2  then Enqueue((B → word[j] . , [j, j + 1]), chart[j + 1])

Completer((B → γ . , [j, k]))
1  for each (A → α . B β, [i, j]) in chart[j]
2  do Enqueue((A → α B . β, [i, k]), chart[k])
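The pseudocode above translates fairly directly into Python. The following is a recognizer-only sketch under my own encoding choices: a state is a tuple (lhs, rhs, dot, begin), the dummy symbol γ is spelled "GAMMA", and the tiny grammar and lexicon at the end are an illustrative assumption, not the lecture's full grammar:

```python
from collections import defaultdict

def earley_parse(words, grammar, lexicon, start="S"):
    """Earley recognizer. grammar maps nonterminals to lists of RHS tuples;
    lexicon maps words to sets of POS tags. chart[j] holds the states
    (lhs, rhs, dot, begin) whose spanned material ends at position j."""
    chart = defaultdict(list)

    def enqueue(state, j):               # Enqueue: push only if not present
        if state not in chart[j]:
            chart[j].append(state)

    enqueue(("GAMMA", (start,), 0, 0), 0)        # gamma -> . S, [0, 0]
    for j in range(len(words) + 1):
        i = 0
        while i < len(chart[j]):         # chart[j] may grow while we scan it
            lhs, rhs, dot, begin = chart[j][i]
            if dot < len(rhs) and rhs[dot] in grammar:
                # Predictor: next category is phrasal; add its rules with an
                # initial dot, spanning [j, j].
                for gamma in grammar[rhs[dot]]:
                    enqueue((rhs[dot], gamma, 0, j), j)
            elif dot < len(rhs):
                # Scanner: next category is a POS; if it matches the input
                # word, add a complete POS edge to chart[j + 1].
                if j < len(words) and rhs[dot] in lexicon.get(words[j], ()):
                    enqueue((rhs[dot], (words[j],), 1, j), j + 1)
            else:
                # Completer: combine this complete edge with every incomplete
                # edge in chart[begin] that is looking for lhs.
                for alhs, arhs, adot, abegin in list(chart[begin]):
                    if adot < len(arhs) and arhs[adot] == lhs:
                        enqueue((alhs, arhs, adot + 1, abegin), j)
            i += 1
    return chart

# Tiny illustrative grammar (an assumption, not the lecture's grammar).
toy_grammar = {"S": [("NP", "VP")], "NP": [("Det", "N"), ("N",)],
               "VP": [("IV",)]}
toy_lexicon = {"the": {"Det"}, "fish": {"N"}, "swim": {"IV"}}

chart = earley_parse("the fish swim".split(), toy_grammar, toy_lexicon)
# The input is accepted iff the dummy edge gamma -> S . spans the whole input.
accepted = ("GAMMA", ("S",), 1, 0) in chart[3]
```

With the lecture's full grammar, the same function reproduces the trace shown in the following slides, ending with S → NP VP . spanning [0, 5].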


Visualizing the Chart

Grammatical rules     Lexical rules
S → NP VP             Det → a | the             (determiner)
NP → Det Nom          N → fish | frogs | soup   (noun)
NP → Nom              Prep → in | for           (preposition)
Nom → N SRel          TV → saw | ate            (transitive verb)
Nom → N               IV → fish | swim          (intransitive verb)
VP → TV NP            Relpro → that             (relative pronoun)
VP → IV PP
VP → IV
PP → Prep NP
SRel → Relpro VP
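For concreteness, this toy grammar might be encoded in Python as plain dictionaries; the dict-of-tuples layout is an assumption of mine, not part of the slides. Phrasal rules map a nonterminal to its right-hand sides, and the lexical rules become a word-to-POS lexicon:

```python
# Phrasal rules: nonterminal -> list of right-hand sides (tuples of symbols).
grammar = {
    "S":    [("NP", "VP")],
    "NP":   [("Det", "Nom"), ("Nom",)],
    "Nom":  [("N", "SRel"), ("N",)],
    "VP":   [("TV", "NP"), ("IV", "PP"), ("IV",)],
    "PP":   [("Prep", "NP")],
    "SRel": [("Relpro", "VP")],
}

# Lexical rules: word -> set of parts of speech.
# Note "fish" is lexically ambiguous between N and IV.
lexicon = {
    "a": {"Det"}, "the": {"Det"},
    "fish": {"N", "IV"}, "frogs": {"N"}, "soup": {"N"},
    "in": {"Prep"}, "for": {"Prep"},
    "saw": {"TV"}, "ate": {"TV"},
    "swim": {"IV"},
    "that": {"Relpro"},
}
```

The lexical ambiguity of "fish" is what drives the two scan entries at step 1 of the trace below.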


Step 0

j = 0

label  dotted rule        begin  end  reason
       γ → . S            0      0    Earley Parse
1      S → . NP VP        0      0    enter
2      NP → . Det Nom     0      0    predict from 1
3      NP → . Nom         0      0    predict from 1
4      Nom → . N SRel     0      0    predict from 3
5      Nom → . N          0      0    predict from 3


Step 1

j = 1

label  dotted rule          begin  end  reason
6      N → fish .           0      1    scan
7      IV → fish .          0      1    scan
8      Nom → N . SRel       0      1    complete from 4 using 6
9      Nom → N .            0      1    complete from 5 using 6
10     NP → Nom .           0      1    complete from 3 using 9
11     SRel → . Relpro VP   1      1    predict from 8
12     S → NP . VP          0      1    complete from 1 using 10
13     VP → . TV NP         1      1    predict from 12
14     VP → . IV            1      1    predict from 12
15     VP → . IV PP         1      1    predict from 12


Step 2

j = 2

label  dotted rule      begin  end  reason
16     IV → swim .      1      2    scan
17     VP → IV .        1      2    complete from 14 using 16
18     S → NP VP .      0      2    complete from 12 using 17
19     VP → IV . PP     1      2    complete from 15 using 16
20     PP → . Prep NP   2      2    predict from 19


Step 3

j = 3

label  dotted rule       begin  end  reason
21     Prep → in .       2      3    scan
22     PP → Prep . NP    2      3    complete from 20 using 21
23     NP → . Det Nom    3      3    predict from 22
24     NP → . Nom        3      3    predict from 22
25     Nom → . N SRel    3      3    predict from 24
25     Nom → . N         3      3    predict from 24


Step 4

j = 4

label  dotted rule      begin  end  reason
26     Det → the .      3      4    scan
27     NP → Det . Nom   3      4    complete from 23 using 26
28     Nom → . N SRel   4      4    predict from 27
29     Nom → . N        4      4    predict from 27


Step 5

j = 5

label  dotted rule         begin  end  reason
30     N → soup .          4      5    scan
31     Nom → N . SRel      4      5    complete from 28 using 30
32     SRel → . Relpro VP  5      5    predict from 31
33     Nom → N .           4      5    complete from 29 using 30
34     NP → Det Nom .      3      5    complete from 27 using 33
35     PP → Prep NP .      2      5    complete from 22 using 34
36     VP → IV PP .        1      5    complete from 19 using 35
37     S → NP VP .         0      5    complete from 12 using 36


Variations on Earley Algorithm Control Structure

The Earley algorithm can be run with other control structures:
  • left-corner: we predict incomplete edges bottom-up instead of top-down (for details on left-corner parsing, see lecture 30);
  • agenda-based: uses an ordered list of “pending” edges, which allows us to prioritize edges that are likely to be correct.

There are also probabilistic variants of the Earley algorithm that can deal with probabilistic CFGs (introduced in lectures 19 and 20).


Comparing Earley and CYK

Both Earley and CYK are bottom-up chart parsing algorithms, but Earley also uses top-down prediction to avoid building edges that will not lead to a valid parse. To illustrate a difference between Earley and CYK, consider an NP with an object relative clause:

The milk that we drank          Nom → N Relpro NP TV
The milk which we drank         Nom → N NP TV
The milk we drank yesterday     Relpro → that | which


Comparing Earley and CYK

An object relative clause can lead, in English, to a form of local ambiguity called a garden path sentence:

The horse raced past the barn fell

which comes from lexical ambiguity: raced ⊂ TV, IV. Object relative clauses also allow a kind of center embedding that is particularly difficult to understand:

The cheese the rat the cat chased ate was moldy

We will return to garden paths at the end of this course, when we deal with human sentence processing.


Comparing Earley and CYK

Consider the string:

0 The 1 ride 2 the 3 horse 4 gave 5 was 6 wild 7

where ride ⊂ N, TV, IV. What happens when CYK gets to horse? What happens when Earley gets to horse? With Earley, edges are only added when they can serve to extend a possible parse tree, so the string ride the horse won’t be analyzed as a VP.


Summary

The Earley algorithm adds prediction to chart parsing. This avoids building redundant structure. Dotted rules indicate whether a constituent is complete or incomplete. The algorithm consists of three steps:
  • Predictor: an incomplete edge specifies what it needs to be complete;
  • Scanner: for each word, the POS is added to the chart;
  • Completer: a complete edge combines with an incomplete one that’s looking for it.

In contrast to CYK, Earley only adds edges to the chart if they can extend a possible parse tree (top-down prediction).
