Chart Parsing: The Earley Algorithm
Informatics 2A: Lecture 18
Bonnie Webber (revised by Frank Keller)
School of Informatics, University of Edinburgh
bonnie@inf.ed.ac.uk
26 October 2007
Informatics 2A: Lecture 17 Chart Parsing: The Earley Algorithm
1 Adding Prediction to the Chart
  Prediction
  Dotted Rules
2 The Earley Algorithm
  Parsing Operations
  Details of the Algorithm
  Visualizing the Chart
  Comparing Earley and CYK
Reading: J&M (1st ed), ch. 10 (pp. 377–385) or J&M (2nd ed), ch. 13 (pp. 10-25); NLTK Book, http://nltk.org/doc/en/advanced-parsing.pdf, pp. 8-19
Prediction
As we saw in the last lecture, the CYK algorithm avoids redundant work by storing sub-trees in a chart. We can avoid even more work by adding prediction to the chart.
We need a new data structure: a dotted rule stands for a partially constructed constituent, with the dot indicating how much has already been found and how much is still predicted.
Dotted rules are generated from ordinary grammar rules. The grammar rule VP → V NP yields the following dotted rules:
VP → . V NP    incomplete edge
VP → V . NP    incomplete edge
VP → V NP .    complete edge
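The way dotted rules are derived from a grammar rule can be sketched in a few lines of Python. This is only an illustration, not code from the lecture: the function name `dotted_rules`, the tuple representation of the right-hand side, and the ASCII arrow `->` are assumptions made for the example.

```python
from typing import List, Tuple


def dotted_rules(lhs: str, rhs: Tuple[str, ...]) -> List[str]:
    """Return one dotted rule per possible dot position in lhs -> rhs.

    Hypothetical helper for illustration; a rule with n right-hand-side
    symbols has n + 1 dot positions.
    """
    results = []
    for dot in range(len(rhs) + 1):
        # Symbols left of the dot are already found, those right of it are predicted.
        symbols = list(rhs[:dot]) + ["."] + list(rhs[dot:])
        status = "complete" if dot == len(rhs) else "incomplete"
        results.append(f"{lhs} -> {' '.join(symbols)}  ({status} edge)")
    return results


for line in dotted_rules("VP", ("V", "NP")):
    print(line)
```

For the rule VP → V NP this prints the three dotted rules listed above, with only the last one marked as complete.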
Dotted Rules
With dotted rules, an arc/edge in the chart records:
- which rule has been used in the analysis;
- which part of the rule has already been found (left of the dot), and which part is still predicted to be found (right of the dot);
- the start and end position of the material left of the dot.
For example, the input
. . . 1 with 2 the 3 telescope 4 . . .
could lead to the following dotted rule:
NP → Det . N, [2, 3]
This means that the word spanning input positions 2 to 3 ("the") has been found to be a Det, and an N is predicted to come next; if found, it will yield an NP.
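One possible way to store such an edge is as a small record holding the rule, the dot position and the span. The sketch below is an assumption-laden illustration rather than the lecture's own data structure: the class name `Edge`, its field names and its helper methods are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Edge:
    """A chart edge: a dotted rule plus the span of the material left of the dot."""
    lhs: str               # left-hand side of the rule, e.g. "NP"
    rhs: Tuple[str, ...]   # right-hand side symbols, e.g. ("Det", "N")
    dot: int               # number of right-hand-side symbols already found
    start: int             # input position where the found material begins
    end: int               # input position where the found material ends

    def is_complete(self) -> bool:
        # The edge is complete when the dot is at the far right of the rule.
        return self.dot == len(self.rhs)

    def next_symbol(self) -> Optional[str]:
        # The symbol predicted to come next, or None for a complete edge.
        return None if self.is_complete() else self.rhs[self.dot]

    def __str__(self) -> str:
        symbols = list(self.rhs[:self.dot]) + ["."] + list(self.rhs[self.dot:])
        return f"{self.lhs} -> {' '.join(symbols)}, [{self.start}, {self.end}]"


# The edge from the example: a Det found between positions 2 and 3, N predicted.
edge = Edge("NP", ("Det", "N"), dot=1, start=2, end=3)
print(edge)
print(edge.next_symbol(), edge.is_complete())
```

Making the record immutable (`frozen=True`) also makes it hashable, so edges could be kept in a set to avoid adding the same edge to the chart twice; that choice is a design assumption of this sketch.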