Basic Parsing Algorithms – Chart Parsing


  1. Basic Parsing Algorithms – Chart Parsing
     Seminar: Recent Advances in Parsing Technology, WS 2011/2012
     Anna Schmidt

  2. Talk Outline
     - Chart Parsing – Basics
     - Chart Parsing – Algorithms
       – Earley Algorithm
       – CKY Algorithm
         → Basics
         → BitPar: Efficient Implementation of CKY

  3. Chart Parsing – Basics

  4. Chart Parsing – Basics
     - First proposed by Martin Kay
     - Dynamic programming approach
       – Partial results of the computation are stored and (re)used later if needed
         → The same problem is not solved more than once
     - Operates on a CFG
     - Functionality: recogniser / parser
       … this talk focuses on recogniser functionality

  5. Main Components
     - Chart
     - Edges
     - Agenda

  6. Component: Chart
     - Is a well-formed substring table (WFST)
       – Stores partial and complete analyses of substrings
       – Information is stored in one triangular half of a two-dimensional array of size (n+1)*(n+1) | n*n
     - Can also be understood as a (directed) graph
       – Vertices: positions between input words, e.g. 0 Mary 1 feeds 2 the 3 otter 4
       – Edges connect vertices
     - Allows no duplicate entries
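
A minimal sketch of such a table in Python, assuming cells are addressed by (start, end) vertex pairs and that an entry can be any hashable or comparable value. The class name `Chart` and its methods are hypothetical and only illustrate the two properties named on the slide: one triangular half is used, and duplicates are rejected.

```python
from collections import defaultdict


class Chart:
    """Well-formed substring table indexed by (start, end) vertex pairs."""

    def __init__(self, n_words):
        self.n = n_words
        # Only cells with start <= end are ever filled (one triangular half).
        self.cells = defaultdict(list)

    def add(self, start, end, entry):
        """Store entry in cell (start, end); return False if it is a duplicate."""
        if not (0 <= start <= end <= self.n):
            raise ValueError("span lies outside the used half of the chart")
        cell = self.cells[(start, end)]
        if entry in cell:
            return False
        cell.append(entry)
        return True


words = "Mary feeds the otter".split()
chart = Chart(len(words))
chart.add(0, 1, "N")    # "Mary" spans vertices 0 to 1
chart.add(0, 1, "N")    # duplicate entry, silently rejected
chart.add(2, 4, "NP")   # "the otter" spans vertices 2 to 4
```

A dictionary keyed by spans is used here instead of a literal two-dimensional array so that empty cells cost nothing; an array of size (n+1)*(n+1) would work equally well.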

  7. Component: Edge
     - Data structure storing information about a particular step in the parsing process
     - Edges inhabit cells of the chart
     - An edge contains
       – Start and end position in the input string
       – A dotted rule
       – Optionally, an edge probability

  8. Component: Edge
     - A dotted rule consists of
       – Left-hand side (LHS) = a non-terminal symbol
       – Right-hand side (RHS) = a sequence of non-terminal or terminal symbols
       – A dot between RHS symbols indicating which constituents have already been found
     - Edges can be
       – Active / incomplete: the dot is not the last element of the RHS
       – Inactive / complete: the dot is the last element of the RHS
     - Example: S → NP • VP (0,1)
       (active: an NP has been found between positions 0 and 1, a VP is still needed)
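
The edge description above could be modelled roughly as follows. This is a sketch under assumptions: the class name `Edge`, its field names, and the choice to store the dot as an integer index into the RHS are illustrative, not taken from the talk.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Edge:
    lhs: str               # left-hand side, a non-terminal
    rhs: Tuple[str, ...]   # right-hand side symbols
    dot: int               # number of RHS symbols already found
    start: int             # start position in the input
    end: int               # end position in the input

    def is_complete(self) -> bool:
        """Inactive / complete: the dot is the last element of the RHS."""
        return self.dot == len(self.rhs)

    def next_symbol(self) -> Optional[str]:
        """Symbol directly after the dot, or None for a complete edge."""
        return None if self.is_complete() else self.rhs[self.dot]

    def __str__(self) -> str:
        symbols = list(self.rhs)
        symbols.insert(self.dot, "•")
        return f"{self.lhs} → {' '.join(symbols)} ({self.start},{self.end})"


active = Edge("S", ("NP", "VP"), dot=1, start=0, end=1)
inactive = Edge("NP", ("N",), dot=1, start=0, end=1)
print(active)    # S → NP • VP (0,1)
print(inactive)  # NP → N • (0,1)
```

Making the dataclass frozen keeps edges hashable, which is convenient when a chart has to reject duplicate entries.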

  9. Component: Agenda
     - Organises the order in which tasks are executed
     - Here, all tasks (edges) are collected on the agenda before being put on the chart
     - The ordering of the agenda determines what is processed first
       → Therefore also which parse is found first
       – Queue, stack, ordering with respect to probabilities, …
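
To illustrate how the agenda discipline changes the processing order, here is a small sketch. The class `Agenda`, the strategy names, and the probabilities attached to the three edges are all made up for the example; they are not from the talk.

```python
import heapq
from collections import deque


class Agenda:
    """Holds pending edges; the strategy decides which one comes out next."""

    def __init__(self, strategy="queue"):
        if strategy not in ("queue", "stack", "best-first"):
            raise ValueError(f"unknown strategy: {strategy}")
        self.strategy = strategy
        self.items = [] if strategy == "best-first" else deque()
        self.counter = 0  # tie-breaker so the heap never compares edges

    def push(self, edge, prob=1.0):
        if self.strategy == "best-first":
            # heapq is a min-heap, so negate the probability.
            heapq.heappush(self.items, (-prob, self.counter, edge))
            self.counter += 1
        else:
            self.items.append(edge)

    def pop(self):
        if self.strategy == "best-first":
            return heapq.heappop(self.items)[2]
        if self.strategy == "queue":
            return self.items.popleft()   # first in, first out
        return self.items.pop()           # last in, first out

    def __bool__(self):
        return bool(self.items)


def processing_order(strategy):
    agenda = Agenda(strategy)
    # Illustrative probabilities only.
    for edge, prob in [("NP → • N", 0.3), ("NP → • DET N", 0.6), ("VP → • V NP", 0.1)]:
        agenda.push(edge, prob)
    order = []
    while agenda:
        order.append(agenda.pop())
    return order
```

With these inputs, `processing_order("queue")` returns the edges in insertion order, `"stack"` returns them reversed, and `"best-first"` starts with `NP → • DET N` because it carries the highest probability.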

  10. Parsing Strategies
      - Kay differentiates parsing strategies along two dimensions:
        – Bottom-up versus top-down
        – Directed versus undirected
      - Directed bottom-up
        – Only build edges for phrases that can actually be incorporated into a higher-level structure
          → Left-Corner Parser
      - Directed top-down
        – Only build a new (active) edge if the next word of the input can be used to extend such an edge
          → Earley
      - Undirected varieties: no such restrictions
        → Undirected bottom-up: CKY

  11. Parsing Strategies: ways of achieving directedness
      - Reachability table:
        – Contains, for each non-terminal N, the set of all symbols that can be the first element of a string dominated by N
        – For example: NP can start with DET, N, ADJ, but not with V
      - Rule selection table:
        – M*N table, where M = non-terminals excluding pre-terminals and N = all non-terminals
        – Contains all grammar rules applicable in a situation where M is the 'upper' and N is the 'lower' symbol
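
A reachability table of this kind can be computed as a transitive closure over the first RHS symbol of each rule. The sketch below assumes a grammar without empty productions; the function name `reachability` and the rule list `RULES` (the phrase-level rules used in this talk's Earley example) are chosen for illustration.

```python
def reachability(rules):
    """Map each LHS to every symbol that can begin a string it dominates."""
    table = {lhs: set() for lhs, _ in rules}
    for lhs, rhs in rules:
        table[lhs].add(rhs[0])            # direct first symbol
    changed = True
    while changed:                        # repeat until nothing new is added
        changed = False
        for lhs in table:
            for symbol in list(table[lhs]):
                new = table.get(symbol, set()) - table[lhs]
                if new:
                    table[lhs] |= new
                    changed = True
    return table


# Phrase-level rules only; N, DET and V are treated as pre-terminals.
RULES = [
    ("S", ("NP", "VP")),
    ("NP", ("N",)),
    ("NP", ("DET", "N")),
    ("VP", ("V", "NP")),
]
table = reachability(RULES)
```

For this grammar `table["NP"]` contains `N` and `DET` but not `V`, so a directed parser would not try to start an NP at a verb.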

  12. Chart Parsing: Advantages
      - No repeated computation of the same subproblem
      - Deals well with left-recursive grammars
      - Deals well with ambiguity
      - No backtracking necessary

  13. Earley Algorithm

  14. Earley Algorithm
      - Proposed by Jay Earley
      - Top-down search
      - Can handle all CFGs
      - Efficient:
        – O(n^3) in the general case
        – Faster for particular types of grammar, e.g. O(n^2) for unambiguous grammars

  15. Terminology
      - In his paper, Earley does not use the notion of a 'chart'
      - He represents the parsing process as sets of states
        – Index of each state set = end position of all states in the set
        – A state largely corresponds to an edge
          - Contains a dotted rule
          - Pointer to the start position
          - End position can be derived from the state set

  16. Terminology
      - The two formalisms are very similar
      - Examples are easier to follow when represented in charts
      - So we will stick with 'chart' representations
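
The correspondence between state sets and chart cells can be sketched in a few lines. The function name `states_to_chart` and the representation of a state as a (dotted rule, start position) pair are assumptions made for this illustration.

```python
def states_to_chart(state_sets):
    """Regroup Earley state sets into chart cells keyed by (start, end)."""
    chart = {}
    for end, states in enumerate(state_sets):   # set index = end position
        for dotted_rule, start in states:       # each state points to its start
            chart.setdefault((start, end), []).append(dotted_rule)
    return chart


# Two state sets for the beginning of "Mary feeds the otter" (abbreviated).
state_sets = [
    [("S → • NP VP", 0), ("NP → • N", 0)],
    [("N → Mary •", 0), ("NP → N •", 0), ("S → NP • VP", 0), ("VP → • V NP", 1)],
]
chart = states_to_chart(state_sets)
```

Here the four states of set 1 end up in two different cells, (0,1) and (1,1), which is the only difference between the two views.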

  17. Algorithm – Components
      - Initialization
      - Predictor
      - Scanner
      - Completer
      - The algorithm operates on one half of an array of size (n+1)*(n+1)
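
The four components can be put together in a compact recogniser. This is a sketch, not the implementation used in the talk: it stores states in sets indexed by end position (as in Earley's formulation), assumes a grammar without empty productions, uses no lookahead, and treats words as terminals inside ordinary rules. The names `earley_recognise` and `GRAMMAR` are hypothetical; the rules are the ones that appear in the talk's example charts.

```python
GRAMMAR = {
    "S": [("NP", "VP")],
    "NP": [("N",), ("DET", "N")],
    "VP": [("V", "NP")],
    "N": [("Mary",), ("otter",)],
    "DET": [("the",)],
    "V": [("feeds",)],
}


def earley_recognise(words, grammar, start="S"):
    tokens = list(words) + ["eos"]
    n = len(tokens)
    # chart[j] holds states (lhs, rhs, dot, start) that end at position j.
    chart = [[] for _ in range(n + 1)]

    def add(j, state):
        if state not in chart[j]:               # no duplicate entries
            chart[j].append(state)

    add(0, ("X", (start, "eos"), 0, 0))          # initialisation: X → • S eos
    for j in range(n + 1):
        for lhs, rhs, dot, i in chart[j]:        # chart[j] grows while we iterate
            if dot == len(rhs):                  # completer
                for lhs2, rhs2, dot2, i2 in chart[i]:
                    if dot2 < len(rhs2) and rhs2[dot2] == lhs:
                        add(j, (lhs2, rhs2, dot2 + 1, i2))
            elif rhs[dot] in grammar:            # predictor
                for expansion in grammar[rhs[dot]]:
                    add(j, (rhs[dot], expansion, 0, j))
            elif j < n and tokens[j] == rhs[dot]:  # scanner
                add(j + 1, (lhs, rhs, dot + 1, i))
    return ("X", (start, "eos"), 2, 0) in chart[n]


accepted = earley_recognise("Mary feeds the otter".split(), GRAMMAR)
rejected = earley_recognise("Mary the otter".split(), GRAMMAR)
```

The input is accepted when the completed start state `X → S eos •` spanning the whole input is found in the last state set. Supporting empty productions would require extra care in the completer, which this sketch deliberately leaves out.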

  18. Initialise
      Input: 0 Mary 1 feeds 2 the 3 otter 4 eos 5
      Chart, listed by cell (start,end); unlisted cells are empty:
      (0,0): X → • S eos

  19. Predict
      Input: 0 Mary 1 feeds 2 the 3 otter 4 eos 5
      Chart, listed by cell (start,end); unlisted cells are empty:
      (0,0): X → • S eos; S → • NP VP; NP → • N; NP → • DET N; N → • Mary; N → • otter; DET → • the

  20. Scan
      Input: 0 Mary 1 feeds 2 the 3 otter 4 eos 5
      Chart, listed by cell (start,end); unlisted cells are empty:
      (0,0): X → • S eos; S → • NP VP; NP → • N; NP → • DET N; N → • Mary; N → • otter; DET → • the
      (0,1): N → Mary •

  21. Complete
      Input: 0 Mary 1 feeds 2 the 3 otter 4 eos 5
      Chart, listed by cell (start,end); unlisted cells are empty:
      (0,0): X → • S eos; S → • NP VP; NP → • N; NP → • DET N; N → • Mary; N → • otter; DET → • the
      (0,1): N → Mary •; NP → N •; S → NP • VP

  22. Predict
      Input: 0 Mary 1 feeds 2 the 3 otter 4 eos 5
      Chart, listed by cell (start,end); unlisted cells are empty:
      (0,0): X → • S eos; S → • NP VP; NP → • N; NP → • DET N; N → • Mary; N → • otter; DET → • the
      (0,1): N → Mary •; NP → N •; S → NP • VP
      (1,1): VP → • V NP; V → • feeds

  23. Scan
      Input: 0 Mary 1 feeds 2 the 3 otter 4 eos 5
      Chart, listed by cell (start,end); unlisted cells are empty:
      (0,0): X → • S eos; S → • NP VP; NP → • N; NP → • DET N; N → • Mary; N → • otter; DET → • the
      (0,1): N → Mary •; NP → N •; S → NP • VP
      (1,1): VP → • V NP; V → • feeds
      (1,2): V → feeds •

  24. Complete
      Input: 0 Mary 1 feeds 2 the 3 otter 4 eos 5
      Chart, listed by cell (start,end); unlisted cells are empty:
      (0,0): X → • S eos; S → • NP VP; NP → • N; NP → • DET N; N → • Mary; N → • otter; DET → • the
      (0,1): N → Mary •; NP → N •; S → NP • VP
      (1,1): VP → • V NP; V → • feeds
      (1,2): V → feeds •; VP → V • NP

  25. Predict
      Input: 0 Mary 1 feeds 2 the 3 otter 4 eos 5
      Chart, listed by cell (start,end); unlisted cells are empty:
      (0,0): X → • S eos; S → • NP VP; NP → • N; NP → • DET N; N → • Mary; N → • otter; DET → • the
      (0,1): N → Mary •; NP → N •; S → NP • VP
      (1,1): VP → • V NP; V → • feeds
      (1,2): V → feeds •; VP → V • NP
      (2,2): NP → • N; NP → • DET N; N → • Mary; N → • otter; DET → • the

  26. Scan
      Input: 0 Mary 1 feeds 2 the 3 otter 4 eos 5
      Chart, listed by cell (start,end); unlisted cells are empty:
      (0,0): X → • S eos; S → • NP VP; NP → • N; NP → • DET N; N → • Mary; N → • otter; DET → • the
      (0,1): N → Mary •; NP → N •; S → NP • VP
      (1,1): VP → • V NP; V → • feeds
      (1,2): V → feeds •; VP → V • NP
      (2,2): NP → • N; NP → • DET N; N → • Mary; N → • otter; DET → • the
      (2,3): DET → the •

  27. Complete
      Input: 0 Mary 1 feeds 2 the 3 otter 4 eos 5
      Chart, listed by cell (start,end); unlisted cells are empty:
      (0,0): X → • S eos; S → • NP VP; NP → • N; NP → • DET N; N → • Mary; N → • otter; DET → • the
      (0,1): N → Mary •; NP → N •; S → NP • VP
      (1,1): VP → • V NP; V → • feeds
      (1,2): V → feeds •; VP → V • NP
      (2,2): NP → • N; NP → • DET N; N → • Mary; N → • otter; DET → • the
      (2,3): DET → the •; NP → DET • N

  28. Predict
      Input: 0 Mary 1 feeds 2 the 3 otter 4 eos 5
      Chart, listed by cell (start,end); unlisted cells are empty:
      (0,0): X → • S eos; S → • NP VP; NP → • N; NP → • DET N; N → • Mary; N → • otter; DET → • the
      (0,1): N → Mary •; NP → N •; S → NP • VP
      (1,1): VP → • V NP; V → • feeds
      (1,2): V → feeds •; VP → V • NP
      (2,2): NP → • N; NP → • DET N; N → • Mary; N → • otter; DET → • the
      (2,3): DET → the •; NP → DET • N
      (3,3): N → • Mary; N → • otter

  29. Scan
      Input: 0 Mary 1 feeds 2 the 3 otter 4 eos 5
      Chart, listed by cell (start,end); unlisted cells are empty:
      (0,0): X → • S eos; S → • NP VP; NP → • N; NP → • DET N; N → • Mary; N → • otter; DET → • the
      (0,1): N → Mary •; NP → N •; S → NP • VP
      (1,1): VP → • V NP; V → • feeds
      (1,2): V → feeds •; VP → V • NP
      (2,2): NP → • N; NP → • DET N; N → • Mary; N → • otter; DET → • the
      (2,3): DET → the •; NP → DET • N
      (3,3): N → • Mary; N → • otter
      (3,4): N → otter •

  30. Complete
      Input: 0 Mary 1 feeds 2 the 3 otter 4 eos 5
      Chart, listed by cell (start,end); unlisted cells are empty:
      (0,0): X → • S eos; S → • NP VP; NP → • N; NP → • DET N; N → • Mary; N → • otter; DET → • the
      (0,1): N → Mary •; NP → N •; S → NP • VP
      (1,1): VP → • V NP; V → • feeds
      (1,2): V → feeds •; VP → V • NP
      (2,2): NP → • N; NP → • DET N; N → • Mary; N → • otter; DET → • the
      (2,3): DET → the •; NP → DET • N
      (2,4): NP → DET N •
      (3,3): N → • Mary; N → • otter
      (3,4): N → otter •

  31. Complete
      Input: 0 Mary 1 feeds 2 the 3 otter 4 eos 5
      Chart, listed by cell (start,end); unlisted cells are empty:
      (0,0): X → • S eos; S → • NP VP; NP → • N; NP → • DET N; N → • Mary; N → • otter; DET → • the
      (0,1): N → Mary •; NP → N •; S → NP • VP
      (1,1): VP → • V NP; V → • feeds
      (1,2): V → feeds •; VP → V • NP
      (1,4): VP → V NP •
      (2,2): NP → • N; NP → • DET N; N → • Mary; N → • otter; DET → • the
      (2,3): DET → the •; NP → DET • N
      (2,4): NP → DET N •
      (3,3): N → • Mary; N → • otter
      (3,4): N → otter •
