
Kallmeyer/Maier, ESSLLI 2008

Parsing beyond context-free grammar: Tree Adjoining Grammar Parsing
Laura Kallmeyer, Wolfgang Maier
University of Tübingen
ESSLLI Course 2008

Overview
1. Tree Adjoining Grammars
2. An Earley parser for TAG
   (a) Introduction
   (b) Items
   (c) Inference Rules
3. LR Parsing
   (a) Introduction
   (b) Construction of the automaton
   (c) The recognizer

Tree Adjoining Grammars (1)
A Tree Adjoining Grammar (TAG) (Joshi & Schabes 1997) is a tree-rewriting system, i.e., a set of elementary trees with two operations:
• adjunction: replacing an internal node with a new tree. The new tree is an auxiliary tree and has a special leaf, the foot node.
• substitution: replacing a leaf with a new tree. The new tree is an initial tree.
Notation: γ[p, γ′] is the tree one obtains from replacing the node at position p in γ with the tree γ′ (by substitution or adjunction).

Tree Adjoining Grammars (2)
(1) John sometimes laughs
Elementary trees (bracket notation; VP* marks the foot node):
  laugh:     [S NP [VP [V laughs]]]
  john:      [NP John]
  sometimes: [VP [ADV sometimes] VP*]
Derived tree, laugh[1, john][2, sometimes]:
  [S [NP John] [VP [ADV sometimes] [VP [V laughs]]]]
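The two operations can be tried out on example (1). The following is a minimal sketch, not the authors' implementation: the `Node` class, the function names, and the encoding of a node position as a tuple of 1-based daughter indices (with the empty tuple for the root) are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Node:
    label: str
    children: List["Node"] = field(default_factory=list)
    foot: bool = False  # True only for the foot node of an auxiliary tree


def copy_tree(node: Node) -> Node:
    return Node(node.label, [copy_tree(c) for c in node.children], node.foot)


def node_at(tree: Node, address: Tuple[int, ...]) -> Node:
    # Assumed encoding: () is the root, (2, 1) is the 1st daughter of the 2nd daughter.
    for i in address:
        tree = tree.children[i - 1]
    return tree


def find_foot(tree: Node) -> Optional[Node]:
    if tree.foot:
        return tree
    for child in tree.children:
        found = find_foot(child)
        if found is not None:
            return found
    return None


def substitute(tree: Node, address: Tuple[int, ...], initial: Node) -> Node:
    """Replace the leaf at `address` with a copy of the initial tree."""
    result = copy_tree(tree)
    target = node_at(result, address)
    if target.children or target.label != initial.label:
        raise ValueError("substitution needs a leaf with the same label")
    target.children = copy_tree(initial).children
    return result


def adjoin(tree: Node, address: Tuple[int, ...], auxiliary: Node) -> Node:
    """Replace the node at `address` with a copy of the auxiliary tree;
    the material that was below that node ends up below the foot node."""
    result = copy_tree(tree)
    target = node_at(result, address)
    aux = copy_tree(auxiliary)
    foot = find_foot(aux)
    if foot is None or not (target.label == aux.label == foot.label):
        raise ValueError("adjunction needs matching root, foot and target labels")
    foot.children = target.children              # old subtree hangs below the foot
    foot.foot, target.foot = target.foot, False  # keep a foot marker if target had one
    target.children = aux.children               # target now acts as root of the adjoined tree
    return result


def yield_of(tree: Node) -> List[str]:
    if not tree.children:
        return [tree.label]
    return [w for c in tree.children for w in yield_of(c)]


# Elementary trees of example (1)
laugh = Node("S", [Node("NP"), Node("VP", [Node("V", [Node("laughs")])])])
john = Node("NP", [Node("John")])
sometimes = Node("VP", [Node("ADV", [Node("sometimes")]), Node("VP", foot=True)])

# laugh[1, john][2, sometimes]
derived = adjoin(substitute(laugh, (1,), john), (2,), sometimes)
print(" ".join(yield_of(derived)))
```

Both functions work on copies, so the elementary trees stay unchanged and can be reused in other derivation steps.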

Tree Adjoining Grammars (3)
A Tree Adjoining Grammar (TAG) is a quadruple G = ⟨N, T, I, A⟩ such that
• T and N are disjoint alphabets of terminals and nonterminals,
• I is a finite set of initial trees, and
• A is a finite set of auxiliary trees.
The trees in I ∪ A are called elementary trees.
G is lexicalized iff each elementary tree has at least one leaf with a terminal label.
TAG allows one to specify for each node
1. whether adjunction is mandatory and
2. which trees can be adjoined.

Tree Adjoining Grammars (4)
A derivation starts with an initial tree. In a final derived tree, all leaves must have terminal labels:
Let G = ⟨N, T, I, A⟩ be a TAG. Let γ and γ′ be finite trees.
• γ ⇒ γ′ in G iff there is a node position p and an instance γ′_0 of a tree that is, or is derived from, some γ_0 ∈ I ∪ A, such that γ′ = γ[p, γ′_0].
  ⇒* is the reflexive transitive closure of ⇒.
• The tree language of G is L_T(G) := {γ | there is an α ∈ I such that α ⇒* γ, all leaves in γ have terminal labels and there are no OA (obligatory adjunction) nodes in γ}.

Tree Adjoining Grammars (5)
Languages TAG can generate:
• {ww | w ∈ {a, b}*}
• L_4 := {a^n b^n c^n d^n | n ≥ 0}
Languages TAG cannot generate:
• {w^n | w ∈ {a, b}*} for any n > 2.
  ⇒ TAG can generate only a limited amount of cross-serial dependencies.
• L_k := {a_1^n a_2^n a_3^n ... a_k^n | n ≥ 0} for any k > 4.
  ⇒ TAG can “count up to 4, not further”.
• L := {a^(2^n) | n ≥ 0}.
  ⇒ TAG cannot generate languages whose word lengths grow exponentially.

Tree Adjoining Grammars (6)
TAGs are mildly context-sensitive:
• TAGs are slightly more powerful than CFGs: they can describe a limited amount of cross-serial dependencies.
• TAGs are polynomially parsable (complexity O(n^6)).
• TALs are of constant growth.
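To make the example languages concrete, the sketch below tests membership with direct string checks. It says nothing about how a TAG derives these strings; it only spells out the set definitions. The function names are hypothetical, and the single-character symbols stand in for a_1 ... a_k.

```python
def in_copy_language(s: str) -> bool:
    """{ww | w in {a, b}*}: the second half repeats the first half."""
    half, rest = divmod(len(s), 2)
    return rest == 0 and set(s) <= {"a", "b"} and s[:half] == s[half:]


def in_count_language(s: str, symbols: str = "abcd") -> bool:
    """{a_1^n ... a_k^n | n >= 0}; the default symbols give L_4."""
    n, rest = divmod(len(s), len(symbols))
    return rest == 0 and s == "".join(ch * n for ch in symbols)


def in_exponential_language(s: str) -> bool:
    """{a^(2^n) | n >= 0}: only a's, and the length is a power of two."""
    m = len(s)
    return m >= 1 and set(s) == {"a"} and m & (m - 1) == 0


print(in_copy_language("abab"), in_count_language("aabbccdd"))

# Word lengths in {a^(2^n)}: the gaps between consecutive lengths keep growing,
# which is what the constant growth property rules out.
lengths = [m for m in range(1, 40) if in_exponential_language("a" * m)]
print(lengths)
```

Passing `symbols="abcde"` checks membership in L_5, a language that is outside the reach of TAG according to the list above.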

Earley Parsing: Introduction (1)
• The left-to-right CKY parser (Vijay-Shanker & Joshi, 1985) is very slow: O(n^6) in the worst case and in the best case (just as in the CFG version of CKY, too many partial trees that are not pertinent to the final tree are produced).
• This behaviour is due to the pure bottom-up approach; no predictive information whatsoever is used.
• Goal: Earley-style parser! First in Schabes & Joshi (1988). Here, we present the algorithm from Joshi & Schabes (1997). We assume a TAG without substitution nodes.

Earley Parsing: Introduction (2)
• Earley parsing: left-to-right scanning of the string (using predictions to restrict the hypothesis space).
• Traversal of elementary trees, with the current position marked by a dot. The dot can have exactly four positions with respect to the node: left above (la), left below (lb), right above (ra), right below (rb).

Earley Parsing: Introduction (3)
General idea: Whenever we are
• left above a node, we can predict an adjunction and start the traversal of the adjoined tree;
• left of a foot node, we can move back to the adjunction site and traverse the tree below it;
• right of an adjunction site, we continue the traversal of the adjoined tree at the right of its foot node;
• right above the root of an auxiliary tree, we can move back to the right of the adjunction site.

Earley Parsing: Items (1)
What kind of information do we need in an item characterizing a partial parsing result?
  [α, dot, pos, i, j, k, l, sat?]
where
• α ∈ I ∪ A is a (dotted) tree, and dot and pos are the address and location of the dot;
• i, j, k, l are indices on the input string, where i, l ∈ {0, ..., n}, j, k ∈ {0, ..., n} ∪ {−}, n = |w|, and − means an unbound value;
• sat? is a flag. It controls (prevents) multiple adjunctions at a single node (sat? = 1 means that something has already been adjoined to the dotted node).
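The item format above maps naturally onto an immutable record that can be stored in a chart. The following is a minimal sketch under stated assumptions: the class and field names are hypothetical, an unbound index − is encoded as `None`, the dot address is a tuple of 1-based daughter indices with the empty tuple for the root, and the check that j and k are bound together is an assumption read off the items shown on the slides.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

POSITIONS = ("la", "lb", "ra", "rb")  # left above, left below, right above, right below


@dataclass(frozen=True)
class Item:
    tree: str              # name of the elementary tree α
    dot: Tuple[int, ...]   # address of the dotted node; () is the root
    pos: str               # one of POSITIONS
    i: int                 # left boundary of the recognized span
    j: Optional[int]       # left boundary of the span below the foot node, None = unbound
    k: Optional[int]       # right boundary of the span below the foot node, None = unbound
    l: int                 # right boundary of the recognized span
    sat: bool = False      # True corresponds to sat? = 1, False to nil

    def __post_init__(self):
        if self.pos not in POSITIONS:
            raise ValueError(f"unknown dot position: {self.pos!r}")
        # Assumption of this sketch: the foot span is either fully bound or fully unbound.
        if (self.j is None) != (self.k is None):
            raise ValueError("j and k must be both bound or both unbound")

    def __str__(self) -> str:
        def show(v: Optional[int]) -> str:
            return "-" if v is None else str(v)

        dot = ".".join(map(str, self.dot)) or "0"
        flag = "1" if self.sat else "nil"
        return (f"[{self.tree}, {dot}, {self.pos}, {self.i}, "
                f"{show(self.j)}, {show(self.k)}, {self.l}, {flag}]")


# Frozen dataclasses are hashable, so a chart can be a plain set without duplicates.
chart = {Item("beta", (), "la", 2, None, None, 2),
         Item("beta", (), "la", 2, None, None, 2)}
print(len(chart), next(iter(chart)))
```

Making the record immutable is a design choice for this sketch: inference rules then always build new items instead of modifying items that are already in the chart.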

Earley Parsing: Items (2)
What do the items mean?
• [α, dot, la, i, j, k, l, nil]: In α, the part left of the dot ranges from i to l. If α is an auxiliary tree, the part below the foot node ranges from j to k.
• [α, dot, lb, i, −, −, i, nil]: In α, the part below the dotted node starts at position i.
• [α, dot, rb, i, j, k, l, sat?]: In α, the part below the dotted node ranges from i to l. If α is an auxiliary tree, the part below the foot node ranges from j to k. If sat? = nil, nothing was adjoined to the dotted node; sat? = 1 means that adjunction took place.
• [α, dot, ra, i, j, k, l, nil]: In α, the part left of and below the dotted node ranges from i to l. If α is an auxiliary tree, the part below the foot node ranges from j to k.

Earley Parsing: Items (3)
Some notational conventions:
• We use Gorn addresses for the nodes: 0 is the address of the root, i (1 ≤ i) is the address of the i-th daughter of the root, and for p ≠ 0, p·i is the address of the i-th daughter of the node at address p.
• For a tree α and a Gorn address dot, α(dot) denotes the node at address dot in α (if defined).
• For a node n, Adj(n) is the set of trees adjoinable at n. nil ∈ Adj(n) signifies that adjunction is not obligatory. Adj(n) = ∅ if n has a terminal or ε as label.

Earley Parsing: Inference Rules (1)
ScanTerm:
    [α, dot, la, i, j, k, l, nil]
    -------------------------------   α(dot) labelled w_{l+1}
    [α, dot, ra, i, j, k, l+1, nil]

Scan-ε:
    [α, dot, la, i, j, k, l, nil]
    -----------------------------   α(dot) labelled ε
    [α, dot, ra, i, j, k, l, nil]

Earley Parsing: Inference Rules (2)
PredictAdjoinable:
    [α, dot, la, i, j, k, l, nil]
    -----------------------------   β ∈ Adj(α(dot))
    [β, 0, la, l, −, −, l, nil]

PredictNoAdj:
    [α, dot, la, i, j, k, l, nil]
    -----------------------------   nil ∈ Adj(α(dot))
    [α, dot, lb, l, −, −, l, nil]
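The side conditions of the scanning rules need the label of α(dot), which is a lookup by Gorn address. The sketch below combines that lookup with ScanTerm and Scan-ε. It is an illustration under assumptions, not the authors' code: the `Item` record, the toy tree table `TREES`, the empty string as the encoding of ε, and the empty tuple as the encoding of the root address 0 are all hypothetical choices.

```python
from dataclasses import dataclass, replace
from typing import Optional, Sequence, Tuple

EPSILON = ""  # assumed encoding of the empty word ε


@dataclass(frozen=True)
class Item:
    tree: str
    dot: Tuple[int, ...]   # () stands for the root address 0
    pos: str               # "la", "lb", "ra" or "rb"
    i: int
    j: Optional[int]       # None = unbound
    k: Optional[int]
    l: int
    sat: bool = False      # False = nil, True = 1


# Hypothetical toy grammar: tree name -> (label, list of daughters)
TREES = {"alpha": ("S", [("a", []), (EPSILON, [])])}


def label_at(tree, address: Tuple[int, ...]) -> Optional[str]:
    """Label of the node at a Gorn address, or None if the address is undefined."""
    node = tree
    for idx in address:
        daughters = node[1]
        if not 1 <= idx <= len(daughters):
            return None
        node = daughters[idx - 1]
    return node[0]


def scan_term(item: Item, words: Sequence[str]) -> Optional[Item]:
    """ScanTerm: move the dot from la to ra over a leaf labelled w_{l+1}."""
    if item.pos != "la" or item.sat or item.l >= len(words):
        return None
    # w_{l+1} in the 1-based numbering of the rule is words[l] in a 0-based list.
    if label_at(TREES[item.tree], item.dot) != words[item.l]:
        return None
    return replace(item, pos="ra", l=item.l + 1)


def scan_epsilon(item: Item) -> Optional[Item]:
    """Scan-ε: move the dot from la to ra over a leaf labelled ε, consuming nothing."""
    if item.pos != "la" or item.sat:
        return None
    if label_at(TREES[item.tree], item.dot) != EPSILON:
        return None
    return replace(item, pos="ra")


words = ["a"]
before_a = Item("alpha", (1,), "la", 0, None, None, 0)
after_a = scan_term(before_a, words)
print(after_a)
```

Each function returns `None` when the rule does not apply, so a chart loop can try every rule on every item and keep only the results that are not `None`.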

Earley Parsing: Inference Rules (3)
PredictAdjoined:
    [β, dot, lb, l, −, −, l, nil]
    ------------------------------   dot = foot(β), β ∈ Adj(α(dot′))
    [α, dot′, lb, l, −, −, l, nil]

Earley Parsing: Inference Rules (4)
Complete I:
    [α, dot, rb, i, j, k, l, 1], [β, dot′, lb, i, −, −, i, nil]
    -----------------------------------------------------------   dot′ = foot(β), β ∈ Adj(α(dot))
    [β, dot′, rb, i, i, l, l, nil]

Earley Parsing: Inference Rules (5)
Complete II:
    [α, dot, rb, i, j, k, l, sat?], [α, dot, la, h, −, −, i, nil]
    -------------------------------------------------------------   α(dot) ∈ N
    [α, dot, ra, h, j, k, l, nil]
or
    [α, dot, rb, i, −, −, l, sat?], [α, dot, la, h, j, k, i, nil]
    -------------------------------------------------------------   α(dot) ∈ N
    [α, dot, ra, h, j, k, l, nil]

Earley Parsing: Inference Rules (6)
Adjoin:
    [β, 0, ra, i, j, k, l, nil], [α, dot, rb, j, p, q, k, nil]
    ----------------------------------------------------------   β ∈ Adj(α(dot))
    [α, dot, rb, i, p, q, l, 1]
sat? = 1 prevents the new item from being reused in another Adjoin application.
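As an illustration of how the indices are combined, the sketch below implements Adjoin and Complete II as functions from premise items to a conclusion item and runs them on spans taken from "John sometimes laughs". The `Item` record, the function names, the tables `ADJ` and `NONTERMINAL_NODES`, and the tree names are hypothetical; unbound indices are encoded as `None` and the root address 0 as the empty tuple.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Item:
    tree: str
    dot: Tuple[int, ...]   # () stands for the root address 0
    pos: str
    i: int
    j: Optional[int]       # None = unbound
    k: Optional[int]
    l: int
    sat: bool = False      # False = nil, True = 1


# Hypothetical grammar tables for example (1): Adj(α(dot)) and the nonterminal nodes.
ADJ = {("laugh", (2,)): {"sometimes"}}
NONTERMINAL_NODES = {("laugh", (2,))}


def adjoin(b: Item, a: Item) -> Optional[Item]:
    """Adjoin: [β,0,ra,i,j,k,l,nil], [α,dot,rb,j,p,q,k,nil] gives [α,dot,rb,i,p,q,l,1]."""
    if b.dot != () or b.pos != "ra" or b.sat:
        return None
    if a.pos != "rb" or a.sat:                 # sat = 1 blocks a second adjunction
        return None
    if b.j is None or (a.i, a.l) != (b.j, b.k):
        return None                            # foot span of β must equal the span of α's subtree
    if b.tree not in ADJ.get((a.tree, a.dot), set()):
        return None
    return Item(a.tree, a.dot, "rb", b.i, a.j, a.k, b.l, True)


def complete_node(rb: Item, la: Item) -> Optional[Item]:
    """Complete II: combine the part below the node (rb) with the part left of it (la)."""
    if rb.pos != "rb" or la.pos != "la" or la.sat:
        return None
    if (rb.tree, rb.dot) != (la.tree, la.dot) or la.l != rb.i:
        return None
    if (rb.tree, rb.dot) not in NONTERMINAL_NODES:
        return None
    if rb.j is not None and la.j is not None:
        return None                            # at most one premise carries the foot span
    j, k = (rb.j, rb.k) if rb.j is not None else (la.j, la.k)
    return Item(rb.tree, rb.dot, "ra", la.i, j, k, rb.l, False)


# Positions: 0 John 1 sometimes 2 laughs 3
vp_below = Item("laugh", (2,), "rb", 2, None, None, 3)   # VP subtree covers "laughs"
aux_done = Item("sometimes", (), "ra", 1, 2, 3, 3)       # auxiliary tree covers 1..3, foot 2..3
vp_left = Item("laugh", (2,), "la", 0, None, None, 1)    # "John" lies left of the VP node

vp_adjoined = adjoin(aux_done, vp_below)
vp_complete = complete_node(vp_adjoined, vp_left)
print(vp_adjoined)
print(vp_complete)
```

Trying `adjoin(aux_done, vp_adjoined)` returns `None`, which is the effect of the sat? flag described above.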
