Parsing beyond context-free grammar: Tree Adjoining Grammar Parsing



  1. Parsing beyond context-free grammar: Tree Adjoining Grammar Parsing
     Laura Kallmeyer, Wolfgang Maier, University of Tübingen, ESSLLI Course 2008

     Overview
       1. Tree Adjoining Grammars
       2. An Earley parser for TAG
          (a) Introduction
          (b) Items
          (c) Inference Rules
       3. LR Parsing
          (a) Introduction
          (b) Construction of the automaton
          (c) The recognizer

     Tree Adjoining Grammars (1)
     A Tree Adjoining Grammar (TAG) (Joshi & Schabes 1997) is a tree-rewriting system, i.e., a set of elementary trees with two operations:
     • adjunction: replacing an internal node with a new tree. The new tree is an auxiliary tree and has a special leaf, the foot node.
     • substitution: replacing a leaf with a new tree. The new tree is an initial tree.
     Notation: $\gamma[p, \gamma']$ is the tree obtained by replacing the node at position $p$ in $\gamma$ with the tree $\gamma'$ (by substitution or adjunction).

     Tree Adjoining Grammars (2)
     (1) John sometimes laughs
     Elementary trees (in bracket notation): [NP John], [S NP [VP [V laughs]]], [VP [ADV sometimes] VP*]
     Derived tree laugh[1, john][2, sometimes]: [S [NP John] [VP [ADV sometimes] [VP [V laughs]]]]
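The two operations and the notation $\gamma[p, \gamma']$ can be sketched in a few lines of Python. This is a minimal illustration, not code from the course: the `Node` class, the helper names, and the use of tuples such as `(2,)` for node positions (with `()` for the root) are assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Node:
    label: str
    children: List["Node"] = field(default_factory=list)
    foot: bool = False  # True only for the foot node of an auxiliary tree


def get(tree: Node, addr: Tuple[int, ...]) -> Node:
    """Return the node at position addr (daughters are numbered from 1)."""
    node = tree
    for i in addr:
        node = node.children[i - 1]
    return node


def replace_at(tree: Node, addr: Tuple[int, ...], new: Node) -> Node:
    """Return a copy of tree in which the node at addr is replaced by new."""
    if not addr:
        return new
    kids = list(tree.children)
    kids[addr[0] - 1] = replace_at(kids[addr[0] - 1], addr[1:], new)
    return Node(tree.label, kids, tree.foot)


def substitute(tree: Node, addr: Tuple[int, ...], initial: Node) -> Node:
    """Substitution: replace a leaf by an initial tree with the same root label."""
    target = get(tree, addr)
    assert not target.children and target.label == initial.label
    return replace_at(tree, addr, initial)


def plug_foot(aux: Node, subtree: Node) -> Node:
    """Copy the auxiliary tree, putting subtree in place of its foot node."""
    if aux.foot:
        return subtree
    return Node(aux.label, [plug_foot(c, subtree) for c in aux.children])


def adjoin(tree: Node, addr: Tuple[int, ...], aux: Node) -> Node:
    """Adjunction: replace an internal node by an auxiliary tree; the old
    subtree ends up below the foot node."""
    target = get(tree, addr)
    assert target.children and target.label == aux.label
    return replace_at(tree, addr, plug_foot(aux, target))


def leaves(tree: Node) -> List[str]:
    if not tree.children:
        return [tree.label]
    return [w for c in tree.children for w in leaves(c)]


# Elementary trees of example (1)
laugh = Node("S", [Node("NP"), Node("VP", [Node("V", [Node("laughs")])])])
john = Node("NP", [Node("John")])
sometimes = Node("VP", [Node("ADV", [Node("sometimes")]), Node("VP", foot=True)])

# laugh[1, john][2, sometimes]
derived = adjoin(substitute(laugh, (1,), john), (2,), sometimes)
print(" ".join(leaves(derived)))
```

The sketch checks only that labels match; adjunction constraints on individual nodes are left out.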

  2. Tree Adjoining Grammars (3)
     A Tree Adjoining Grammar (TAG) is a quadruple $G = \langle N, T, I, A \rangle$ such that
     • $T$ and $N$ are disjoint alphabets of terminals and nonterminals,
     • $I$ is a finite set of initial trees, and
     • $A$ is a finite set of auxiliary trees.
     The trees in $I \cup A$ are called elementary trees.
     $G$ is lexicalized iff each elementary tree has at least one leaf with a terminal label.
     TAG allows one to specify for each node
       1. whether adjunction is mandatory and
       2. which trees can be adjoined.

     Tree Adjoining Grammars (4)
     A derivation starts with an initial tree. In a final derived tree, all leaves must have terminal labels.
     Let $G = \langle N, T, I, A \rangle$ be a TAG. Let $\gamma$ and $\gamma'$ be finite trees.
     • $\gamma \Rightarrow \gamma'$ in $G$ iff there is a node position $p$ and an instance $\gamma_0'$ of a tree $\gamma_0 \in I \cup A$ (or of a tree derived from some such $\gamma_0$) such that $\gamma' = \gamma[p, \gamma_0']$.
     • $\stackrel{*}{\Rightarrow}$ is the reflexive transitive closure of $\Rightarrow$.
     • The tree language of $G$ is $L_T(G) := \{ \gamma \mid$ there is an $\alpha \in I$ such that $\alpha \stackrel{*}{\Rightarrow} \gamma$, all leaves in $\gamma$ have terminal labels and there are no OA (obligatory adjunction) nodes in $\gamma \}$.

     Tree Adjoining Grammars (5)
     Languages TAG can generate:
     • $\{ ww \mid w \in \{a, b\}^* \}$
     • $L_4 := \{ a^n b^n c^n d^n \mid n \geq 0 \}$
     Languages TAG cannot generate:
     • $\{ w^n \mid w \in \{a, b\}^* \}$ for any $n > 2$.
       ⇒ TAGs generate only a limited amount of cross-serial dependencies.
     • $L_k := \{ a_1^n a_2^n a_3^n \dots a_k^n \mid n \geq 0 \}$ for any $k > 4$.
       ⇒ TAG can "count up to 4, not further".
     • $L := \{ a^{2^n} \mid n \geq 0 \}$.
       ⇒ TAG cannot generate languages whose word lengths grow exponentially.

     Tree Adjoining Grammars (6)
     TAGs are mildly context-sensitive:
     • TAGs are slightly more powerful than CFGs; they can describe a limited amount of cross-serial dependencies.
     • TAGs are polynomially parsable (complexity $O(n^6)$).
     • TALs are of constant growth.
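To make the claim about $L_4$ concrete, here is a small sketch that derives $a^n b^n c^n d^n$ by repeated adjunction. The grammar is not given on the slides; the code assumes the usual textbook one, with an initial tree S(ε) and an auxiliary tree S(a, S(b, S*, c), d) whose root and foot do not allow adjunction. The tuple encoding and all function names are hypothetical.

```python
# Trees are (label, children) pairs; a leaf has an empty child list.
# The leaf label "" stands for the empty word.
ALPHA = ("S", [("", [])])  # assumed initial tree: S over the empty word


def beta(subtree):
    """Assumed auxiliary tree S(a, S(b, S*, c), d), returned with its foot
    node S* already replaced by subtree."""
    return ("S", [("a", []), ("S", [("b", []), subtree, ("c", [])]), ("d", [])])


def adjoin_at(tree, addr, aux):
    """Adjoin aux at position addr (a tuple of 1-based daughter indices)."""
    label, kids = tree
    if not addr:
        return aux(tree)
    i = addr[0] - 1
    return (label, kids[:i] + [adjoin_at(kids[i], addr[1:], aux)] + kids[i + 1:])


def tree_yield(tree):
    """Concatenate the leaf labels from left to right."""
    label, kids = tree
    if not kids:
        return label
    return "".join(tree_yield(k) for k in kids)


def derive(n):
    """Adjoin beta n times. After the first adjunction at the root, the only
    node where adjunction is still allowed is the inner S of the copy of
    beta adjoined last, which sits at position (2, ..., 2)."""
    tree = ALPHA
    for step in range(n):
        tree = adjoin_at(tree, (2,) * step, beta)
    return tree


for n in range(4):
    print(n, repr(tree_yield(derive(n))))
```

The adjunction constraints are not enforced by the code; `derive` simply picks the positions that respect them. Each adjunction adds one a, b, c and d, so word length grows by a constant 4 per step, in line with the constant growth property.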

  3. Earley Parsing: Introduction (1)
     • The left-to-right CKY parser (Vijay-Shanker & Joshi, 1985) is very slow: $O(n^6)$ in the worst case and in the best case (just as in the CFG version of CKY, too many partial trees are produced that are not pertinent to the final tree).
     • This behaviour is due to the purely bottom-up approach; no predictive information whatsoever is used.
     • Goal: an Earley-style parser. First presented in Schabes & Joshi (1988). Here, we present the algorithm from Joshi & Schabes (1997). We assume a TAG without substitution nodes.

     Earley Parsing: Introduction (2)
     • Earley parsing: left-to-right scanning of the string (using predictions to restrict the hypothesis space).
     • Traversal of elementary trees, with the current position marked by a dot. The dot can have exactly four positions with respect to the node: left above (la), left below (lb), right above (ra), right below (rb).

     Earley Parsing: Introduction (3)
     General idea: Whenever we are
     • left above a node, we can predict an adjunction and start the traversal of the adjoined tree;
     • left of a foot node, we can move back to the adjunction site and traverse the tree below it;
     • right of an adjunction site, we continue the traversal of the adjoined tree at the right of its foot node;
     • right above the root of an auxiliary tree, we can move back to the right of the adjunction site.

     Earley Parsing: Items (1)
     What kind of information do we need in an item characterizing a partial parsing result?
     $[\alpha, dot, pos, i, j, k, l, sat?]$
     where
     • $\alpha \in I \cup A$ is a (dotted) tree, $dot$ and $pos$ are the address and location of the dot;
     • $i, j, k, l$ are indices on the input string, where $i, l \in \{0, \dots, n\}$, $j, k \in \{0, \dots, n\} \cup \{-\}$, $n = |w|$, and $-$ means an unbound value;
     • $sat?$ is a flag. It controls (prevents) multiple adjunctions at a single node ($sat? = 1$ means that something has already been adjoined to the dotted node).
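One possible way to hold such an item in code is a small immutable record. The sketch below is an assumption-laden illustration, not the authors' implementation: the class name `Item`, the use of `None` for the unbound value, a Boolean for `sat?`, and the ordering check `i <= j <= k <= l` (which follows from the intended meaning of the indices but is not stated on the slide) are all choices made here.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

POSITIONS = ("la", "lb", "rb", "ra")  # the four dot positions


@dataclass(frozen=True)
class Item:
    tree: str              # name of the elementary tree alpha
    dot: Tuple[int, ...]   # address of the dotted node, () for the root
    pos: str               # one of POSITIONS
    i: int
    j: Optional[int]       # None plays the role of the unbound value
    k: Optional[int]
    l: int
    sat: bool = False      # True corresponds to sat? = 1, False to nil

    def is_well_formed(self, n: int) -> bool:
        """Check the index conditions for an input of length n."""
        if self.pos not in POSITIONS or not (0 <= self.i <= self.l <= n):
            return False
        if self.j is None or self.k is None:
            # j and k are either both bound or both unbound
            return self.j is None and self.k is None
        # assumed: the span below the foot node lies inside the item's span
        return self.i <= self.j <= self.k <= self.l


start = Item("alpha", (), "la", 0, None, None, 0)
print(start, start.is_well_formed(3))
```

Because the record is frozen, items are hashable and can be stored in a set, which is the natural container for a chart that must not contain duplicates.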

  4. Earley Parsing: Items (2)
     What do the items mean?
     • $[\alpha, dot, la, i, j, k, l, nil]$: In $\alpha$, the part left of the dot ranges from $i$ to $l$. If $\alpha$ is an auxiliary tree, the part below the foot node ranges from $j$ to $k$.
     • $[\alpha, dot, lb, i, -, -, i, nil]$: In $\alpha$, the part below the dotted node starts at position $i$.
     • $[\alpha, dot, rb, i, j, k, l, sat?]$: In $\alpha$, the part below the dotted node ranges from $i$ to $l$. If $\alpha$ is an auxiliary tree, the part below the foot node ranges from $j$ to $k$. If $sat? = nil$, nothing was adjoined to the dotted node; $sat? = 1$ means that adjunction took place.
     • $[\alpha, dot, ra, i, j, k, l, nil]$: In $\alpha$, the part left of and below the dotted node ranges from $i$ to $l$. If $\alpha$ is an auxiliary tree, the part below the foot node ranges from $j$ to $k$.

     Earley Parsing: Items (3)
     Some notational conventions:
     • We use Gorn addresses for the nodes: 0 is the address of the root, $i$ ($1 \leq i$) is the address of the $i$th daughter of the root, and for $p \neq 0$, $p \cdot i$ is the address of the $i$th daughter of the node at address $p$.
     • For a tree $\alpha$ and a Gorn address $dot$, $\alpha(dot)$ denotes the node at address $dot$ in $\alpha$ (if defined).
     • For a node $n$, $Adj(n)$ is the set of trees adjoinable at $n$. $nil \in Adj(n)$ signifies that adjunction is not obligatory. $Adj(n) = \emptyset$ if $n$ has a terminal or $\epsilon$ as label.

     Earley Parsing: Inference Rules (1)
     $$\text{ScanTerm:}\quad \frac{[\alpha, dot, la, i, j, k, l, nil]}{[\alpha, dot, ra, i, j, k, l+1, nil]}\quad \alpha(dot) \text{ labelled } w_{l+1}$$
     [Figure: dotted tree; the part of $\alpha$ left of the dot spans $w_{i+1} \dots w_l$, and the scanned leaf is $w_{l+1}$.]
     $$\text{Scan-}\epsilon\text{:}\quad \frac{[\alpha, dot, la, i, j, k, l, nil]}{[\alpha, dot, ra, i, j, k, l, nil]}\quad \alpha(dot) \text{ labelled } \epsilon$$

     Earley Parsing: Inference Rules (2)
     $$\text{PredictAdjoinable:}\quad \frac{[\alpha, dot, la, i, j, k, l, nil]}{[\beta, 0, la, l, -, -, l, nil]}\quad \beta \in Adj(\alpha(dot))$$
     [Figure: a dot left above a node labelled $A$ in $\alpha$, with $w_{i+1} \dots w_l$ to its left, leads to a dot left above the root $A$ of an auxiliary tree with foot node $A^*$.]
     $$\text{PredictNoAdj:}\quad \frac{[\alpha, dot, la, i, j, k, l, nil]}{[\alpha, dot, lb, l, -, -, l, nil]}\quad nil \in Adj(\alpha(dot))$$
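The four rules above translate almost directly into functions from an item to the items it licenses. The following is a sketch under stated assumptions: the toy grammar in `LABEL` and `ADJ` is invented for the example, addresses are tuples with `()` for the root (the slides write 0), `None` stands for both the unbound index and `nil` in $Adj$, and `False` stands for `sat? = nil`.

```python
from collections import namedtuple

Item = namedtuple("Item", "tree dot pos i j k l sat")

# Hypothetical toy grammar: alpha = S(a), beta = S(b, S*).
LABEL = {  # (tree, address) -> node label; "" would stand for epsilon
    ("alpha", ()): "S", ("alpha", (1,)): "a",
    ("beta", ()): "S", ("beta", (1,)): "b", ("beta", (2,)): "S",
}
ADJ = {  # (tree, address) -> adjoinable trees; None stands for nil
    ("alpha", ()): {"beta", None},
    ("beta", ()): {None},
    ("beta", (2,)): {None},
}


def adj(tree, dot):
    """Adj(alpha(dot)); empty for nodes labelled with a terminal or epsilon."""
    return ADJ.get((tree, dot), set())


def scan_term(item, w):
    # w_{l+1} in the rule is w[l] with Python's 0-based indexing
    if item.pos == "la" and item.l < len(w) and LABEL[item.tree, item.dot] == w[item.l]:
        yield item._replace(pos="ra", l=item.l + 1)


def scan_eps(item):
    if item.pos == "la" and LABEL[item.tree, item.dot] == "":
        yield item._replace(pos="ra")


def predict_adjoinable(item):
    if item.pos == "la":
        for beta in adj(item.tree, item.dot):
            if beta is not None:
                yield Item(beta, (), "la", item.l, None, None, item.l, False)


def predict_no_adj(item):
    if item.pos == "la" and None in adj(item.tree, item.dot):
        yield item._replace(pos="lb", i=item.l, j=None, k=None)


start = Item("alpha", (), "la", 0, None, None, 0, False)
print(list(predict_adjoinable(start)))
print(list(predict_no_adj(start)))
```

Since $T$ and $N$ are disjoint, comparing a node label with an input symbol in `scan_term` can only succeed at a terminal leaf, so no separate terminal check is needed in this sketch.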

  5. Earley Parsing: Inference Rules (3)
     $$\text{PredictAdjoined:}\quad \frac{[\beta, dot, lb, l, -, -, l, nil]}{[\alpha, dot', lb, l, -, -, l, nil]}\quad dot = foot(\beta),\ \beta \in Adj(\alpha(dot'))$$
     [Figure: a dot left below the foot node $A^*$ of $\beta$ leads to a dot left below the adjunction site $A$ in $\alpha$.]

     Earley Parsing: Inference Rules (4)
     $$\text{Complete I:}\quad \frac{[\alpha, dot, rb, i, j, k, l, nil],\ [\beta, dot', lb, i, -, -, i, nil]}{[\beta, dot', rb, i, i, l, l, nil]}\quad dot' = foot(\beta),\ \beta \in Adj(\alpha(dot))$$
     [Figure: the subtree below the adjunction site $A$ in $\alpha$ spans $w_{i+1} \dots w_l$; the dot moves to right below the foot node $A^*$ of $\beta$, which now spans $w_{i+1} \dots w_l$.]

     Earley Parsing: Inference Rules (5)
     $$\text{Complete II:}\quad \frac{[\alpha, dot, rb, i, j, k, l, sat?],\ [\alpha, dot, la, h, -, -, i, nil]}{[\alpha, dot, ra, h, j, k, l, nil]}\quad \alpha(dot) \in N$$
     or
     $$\frac{[\alpha, dot, rb, i, -, -, l, sat?],\ [\alpha, dot, la, h, j, k, i, nil]}{[\alpha, dot, ra, h, j, k, l, nil]}\quad \alpha(dot) \in N$$
     [Figure: the part left of the node $A$ spans $w_{h+1} \dots w_i$ and the part below it spans $w_{i+1} \dots w_l$; together they give a dot right above $A$ spanning $w_{h+1} \dots w_l$.]

     Earley Parsing: Inference Rules (6)
     $$\text{Adjoin:}\quad \frac{[\beta, 0, ra, i, j, k, l, nil],\ [\alpha, dot, rb, j, p, q, k, nil]}{[\alpha, dot, rb, i, p, q, l, 1]}\quad \beta \in Adj(\alpha(dot))$$
     [Figure: $\beta$ spans $w_{i+1} \dots w_j$ and $w_{k+1} \dots w_l$ around its foot node; the subtree below the adjunction site in $\alpha$ spans $w_{j+1} \dots w_k$; after adjunction the node spans $w_{i+1} \dots w_l$.]
     $sat? = 1$ prevents the new item from being reused in another Adjoin application.
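The remaining rules combine two items, or an item and an adjunction site. A possible rendering as Python functions is sketched below for the same kind of hypothetical toy grammar (alpha = S(a), beta = S(b, S*), input "ba"). The tables `FOOT`, `ADJ` and `NONTERMINAL_NODES`, the function names, and the encodings (`None` for the unbound index, `False`/`True` for `nil`/1) are assumptions of the sketch. Rules that move the dot from one node to another are not part of this excerpt, so one intermediate item is written out by hand.

```python
from collections import namedtuple

Item = namedtuple("Item", "tree dot pos i j k l sat")

FOOT = {"beta": (2,)}                      # auxiliary tree -> foot address
ADJ = {("alpha", ()): {"beta", None}}      # adjunction sites of the toy grammar
NONTERMINAL_NODES = {("alpha", ()), ("beta", ()), ("beta", (2,))}


def adjoinable(beta, tree, dot):
    return beta in ADJ.get((tree, dot), set())


def predict_adjoined(foot_item, tree, dot):
    """From left below the foot of beta to left below a site (tree, dot)."""
    b = foot_item
    if b.pos == "lb" and FOOT.get(b.tree) == b.dot and adjoinable(b.tree, tree, dot):
        yield Item(tree, dot, "lb", b.l, None, None, b.l, False)


def complete_1(site, foot_item):
    """The subtree below the site is done: move to right below the foot."""
    s, b = site, foot_item
    if (s.pos == "rb" and not s.sat and b.pos == "lb" and b.i == b.l == s.i
            and FOOT.get(b.tree) == b.dot and adjoinable(b.tree, s.tree, s.dot)):
        yield Item(b.tree, b.dot, "rb", s.i, s.i, s.l, s.l, False)


def complete_2(below, left):
    """Combine the part left of a node with the part below it."""
    r, a = below, left
    if (r.pos == "rb" and a.pos == "la" and (r.tree, r.dot) == (a.tree, a.dot)
            and (r.tree, r.dot) in NONTERMINAL_NODES and a.l == r.i
            and (a.j is None or r.j is None)):
        j, k = (r.j, r.k) if a.j is None else (a.j, a.k)
        yield Item(r.tree, r.dot, "ra", a.i, j, k, r.l, False)


def adjoin(aux_root, site):
    """Wrap a completed auxiliary tree around a completed, unsaturated site."""
    b, s = aux_root, site
    if (b.pos == "ra" and b.dot == () and s.pos == "rb" and not s.sat
            and (s.i, s.l) == (b.j, b.k) and adjoinable(b.tree, s.tree, s.dot)):
        yield Item(s.tree, s.dot, "rb", b.i, s.j, s.k, b.l, True)


# Input "ba": beta covers w1 = b, alpha's S covers w2 = a.
foot_lb = Item("beta", (2,), "lb", 1, None, None, 1, False)
site = Item("alpha", (), "rb", 1, None, None, 2, False)
foot_rb = next(complete_1(site, foot_lb))
foot_ra = next(complete_2(foot_rb, Item("beta", (2,), "la", 0, None, None, 1, False)))
root_ra = Item("beta", (), "ra", 0, 1, 2, 2, False)  # written by hand, see above
print(next(adjoin(root_ra, site)))
```

Returning generators keeps the rules uniform: a rule that does not apply simply yields nothing, which makes it easy to plug all of them into one agenda-driven loop.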
