Parsing beyond context-free grammar

Kallmeyer/Maier ESSLLI 2008

Parsing beyond context-free grammar: Introduction
Laura Kallmeyer, Wolfgang Maier
University of Tübingen
ESSLLI Course 2008

Overview
1. CFG and natural languages
2. Parsing: Some preliminary notions
3. Polynomial extensions of CFG
4. Parsing schemata

CFG and natural languages (1)
Definition of a CFG: A context-free grammar (CFG) is a tuple G = ⟨N, T, P, S⟩ such that
1. N and T are disjoint alphabets, the nonterminals and terminals of G,
2. P ⊂ N × (N ∪ T)* is a finite set of productions (also called rewriting rules). A production ⟨A, α⟩ is usually written A → α.
3. S ∈ N is the start symbol.

CFG and natural languages (2)
Sample CFG G_telescope:
• Nonterminals: {S, NP, VP, PP, N, V, P, D}
• Terminals: {the, man, telescope, saw, girl, with, John}
• Productions:
  S → NP VP
  NP → D N
  VP → VP PP | V NP
  N → N PP
  PP → P NP
  N → man | girl | telescope
  D → the
  N → John
  P → with
  V → saw
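The CFG definition and the sample grammar G_telescope can be written down as plain data and checked against the three conditions of the definition. The following is a minimal sketch; the names `P_RULES` and `is_cfg` are illustrative assumptions, not part of the course material.

```python
# G_telescope as plain Python data (productions as (lhs, rhs) pairs).
N = {"S", "NP", "VP", "PP", "N", "V", "P", "D"}
T = {"the", "man", "telescope", "saw", "girl", "with", "John"}
P_RULES = [
    ("S", ("NP", "VP")),
    ("NP", ("D", "N")),
    ("VP", ("VP", "PP")),
    ("VP", ("V", "NP")),
    ("N", ("N", "PP")),
    ("PP", ("P", "NP")),
    ("N", ("man",)), ("N", ("girl",)), ("N", ("telescope",)),
    ("D", ("the",)),
    ("N", ("John",)),
    ("P", ("with",)),
    ("V", ("saw",)),
]
START = "S"


def is_cfg(nonterminals, terminals, productions, start):
    """Check the three conditions of the CFG definition."""
    # 1. N and T are disjoint alphabets.
    disjoint = nonterminals.isdisjoint(terminals)
    # 2. Every production is a pair in N x (N u T)*.
    well_formed = all(
        lhs in nonterminals
        and all(sym in nonterminals or sym in terminals for sym in rhs)
        for lhs, rhs in productions
    )
    # 3. The start symbol is a nonterminal.
    return disjoint and well_formed and start in nonterminals


print(is_cfg(N, T, P_RULES, START))
```

Storing each right-hand side as a tuple keeps the empty string (an ε-production) representable as `()`, although G_telescope does not need it.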

CFG and natural languages (3)
Let G = ⟨N, T, P, S⟩ be a CFG. The (string) language L(G) of G is the set { w ∈ T* | S ⇒* w } where
• for w, w′ ∈ (N ∪ T)*: w ⇒ w′ iff there is an A → α ∈ P and there are v, u ∈ (N ∪ T)* such that w = vAu and w′ = vαu.
• ⇒* is the reflexive transitive closure of ⇒:
  – w ⇒^0 w for all w ∈ (N ∪ T)*, and
  – for all w, w′ ∈ (N ∪ T)*: w ⇒^n w′ iff there is a v such that w ⇒^(n−1) v and v ⇒ w′.
  – for all w, w′ ∈ (N ∪ T)*: w ⇒* w′ iff there is an i ∈ ℕ such that w ⇒^i w′.
A language is called context-free iff it is generated by a CFG.

CFG and natural languages (4)
Context-free languages (CFLs)
• can be recognized in polynomial time (O(n^3));
• are accepted by push-down automata;
• have nice closure properties (e.g., closure under homomorphisms, intersection with regular languages ...);
• satisfy a pumping lemma;
• can describe nested dependencies ({ w w^R | w ∈ T* }).

CFG and natural languages (5)
Question: Is CFG powerful enough to describe all natural language phenomena?
Answer: No. There are constructions in natural languages that cannot be adequately described with a context-free grammar.
Example: cross-serial dependencies in Dutch and in Swiss German.
Dutch:
(1) ... dat Wim Jan Marie de kinderen zag helpen leren zwemmen
    ... that Wim Jan Marie the children saw help teach swim
    ‘... that Wim saw Jan help Marie teach the children to swim’

CFG and natural languages (6)
Swiss German:
(2) ... das mer em Hans es huus hälfed aastriiche
    ... that we Hans_Dat house_Acc helped paint
    ‘... that we helped Hans paint the house’
(3) ... das mer d’chind em Hans es huus lönd hälfe aastriiche
    ... that we the children_Acc Hans_Dat house_Acc let help paint
    ‘... that we let the children help Hans paint the house’
Swiss German uses case marking and displays cross-serial dependencies.
Shieber (1985) shows that Swiss German is not context-free.
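The derivation relation ⇒ and its closure ⇒*, which define L(G) above, can be sketched as a one-step rewrite function plus a breadth-first search over sentential forms. The toy grammar below for the nested-dependency language { w w^R | w ∈ {a, b}* } (S → aSa | bSb | ε) is an assumption chosen for illustration, and the pruning step relies on the fact that terminals are never erased during a derivation.

```python
from collections import deque

# Hypothetical toy CFG for { w w^R | w in {a, b}* }: S -> aSa | bSb | epsilon.
PRODUCTIONS = [("S", ("a", "S", "a")), ("S", ("b", "S", "b")), ("S", ())]
NONTERMINALS = {"S"}


def one_step(form, productions):
    """Return every w' with form => w' (rewrite one nonterminal occurrence)."""
    results = []
    for i, symbol in enumerate(form):
        for lhs, rhs in productions:
            if symbol == lhs:
                # form = v A u  becomes  v alpha u
                results.append(form[:i] + rhs + form[i + 1:])
    return results


def derives(start, target, productions, nonterminals):
    """Decide start =>* target by breadth-first search over sentential forms."""
    target = tuple(target)
    seen = {(start,)}
    queue = deque(seen)
    while queue:
        form = queue.popleft()
        if form == target:
            return True
        for nxt in one_step(form, productions):
            terminal_count = sum(1 for s in nxt if s not in nonterminals)
            # Prune: a form with more terminals than the target can never reach it.
            if terminal_count <= len(target) and nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False


print(derives("S", "abba", PRODUCTIONS, NONTERMINALS))  # nested: in the language
print(derives("S", "abab", PRODUCTIONS, NONTERMINALS))  # copy pattern: not derivable here
```

This exhaustive search is only meant to mirror the definition; it is not an efficient recognition procedure.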

CFG and natural languages (7)
In general, because of the closure properties, the following holds:
A formalism that can generate cross-serial dependencies can also generate the copy language { ww | w ∈ {a, b}* }.
The copy language is not context-free.
Therefore we are interested in extensions of CFG in order to describe all natural language phenomena.

Parsing: Some preliminary notions (1)
Let G be a grammar, L the string language of G and w ∈ T*.
Recognition: decide whether w ∈ L.
Example: Take the CFG G_telescope.
Input: “the man saw the girl”. Output: yes.
Input: “the man saw saw the girl”. Output: no.

Parsing: Some preliminary notions (2)
Parsing: decide whether w ∈ L and if so, then give its parse tree.
Example:
Input: “the man saw the girl”.
Output: [S [NP [D the] [N man]] [VP [V saw] [NP [D the] [N girl]]]]
Input: “the man saw saw the girl”. Output: no.

Parsing: Some preliminary notions (3)
The time and space complexity of a parsing algorithm can be determined depending on the length n of the input string and (sometimes) the size of the grammar.
Complexity classes:
P (PTIME): problems that can be solved deterministically in an amount of time that is polynomial in the size of the input.
I.e., there is a constant c and a k such that the parsing of a string of length n takes an amount of time ≤ c·n^k. Notation: O(n^k).
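The recognition task for G_telescope can be illustrated with a small chart-based recognizer. The sketch below uses the CYK algorithm, which the slides do not name; it is chosen here as an assumption because every production of G_telescope is either binary or lexical, and because its three nested loops over string positions give the O(n^3) behaviour mentioned for context-free languages. The table names `BINARY` and `LEXICAL` are illustrative.

```python
# G_telescope split into binary rules (indexed by right-hand side) and lexical rules.
BINARY = {
    ("NP", "VP"): {"S"},
    ("D", "N"): {"NP"},
    ("VP", "PP"): {"VP"},
    ("V", "NP"): {"VP"},
    ("N", "PP"): {"N"},
    ("P", "NP"): {"PP"},
}
LEXICAL = {
    "man": {"N"}, "girl": {"N"}, "telescope": {"N"}, "John": {"N"},
    "the": {"D"}, "with": {"P"}, "saw": {"V"},
}


def recognize(words, start="S"):
    """CYK recognition: decide whether the word list is in L(G_telescope)."""
    n = len(words)
    if n == 0:
        return False  # the grammar has no epsilon-productions
    # chart[i][j] holds the nonterminals that derive words[i:j].
    chart = [[set() for _ in range(n + 1)] for _ in range(n + 1)]
    for i, word in enumerate(words):
        chart[i][i + 1] = set(LEXICAL.get(word, ()))
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):  # split point
                for left in chart[i][k]:
                    for right in chart[k][j]:
                        chart[i][j] |= BINARY.get((left, right), set())
    return start in chart[0][n]


print(recognize("the man saw the girl".split()))      # slide example: yes
print(recognize("the man saw saw the girl".split()))  # slide example: no
```

Storing backpointers in each chart cell instead of bare nonterminals would turn this recognizer into a parser that can return the parse tree.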

Parsing: Some preliminary notions (4)
NP: problems whose positive solutions can be verified in polynomial time given the right information, or equivalently, whose solutions can be non-deterministically found in polynomial time.
NP-complete: the hardest problems in NP. A problem in NP is NP-complete if any problem in NP can be transformed into it in polynomial time.
Open question: P ≠ NP?
In this course we are interested in extensions of CFG that are in P.

Polynomial extensions of CFG (1)
Tree Adjoining Grammars (TAG), Joshi & Schabes (1997):
• Tree-rewriting grammar.
• Extension of CFG that allows replacing not only leaves but also internal nodes with new trees.
• Can generate the copy language.
Example: TAG for the copy language (trees in bracket notation; NA marks a node where adjunction is not allowed, S* marks the foot node)
  Initial tree: [S ε]
  Auxiliary tree for a: [S_NA a [S S*_NA a]]
  Auxiliary tree for b: [S_NA b [S S*_NA b]]

Polynomial extensions of CFG (2)
Example: TAG derivation of abab:
[Figure: the auxiliary tree for a is adjoined into the initial tree; the auxiliary tree for b is then adjoined at the inner S node of the result, giving a tree with yield abab.]

Polynomial extensions of CFG (3)
Linear Indexed Grammars (LIG), Gazdar (1988), Vijay-Shanker (1987):
• Context-free productions where the nonterminal symbols are equipped with stacks containing indices.
• Three types of productions:
  – A[...] → X_1 ... X_i[...] ... X_n with X_j ∈ N ∪ T for j ≠ i, X_i ∈ N.
  – A[...] → B[f ...]
  – A[f ...] → X_1 ... X_i[...] ... X_n with X_j ∈ N ∪ T for j ≠ i, X_i ∈ N.
LIGs are weakly equivalent to TAG.
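The adjunction operation behind the TAG derivation of abab can be sketched on trees encoded as nested tuples. The encoding is an assumption made for illustration: a node is `(label, children)`, a terminal leaf is a string, and a foot node carries a label ending in `*`. The null-adjunction (NA) constraints are not enforced here; the caller simply picks a permitted adjunction site.

```python
# Elementary trees for the copy language, in a hypothetical tuple encoding.
ALPHA = ("S", [""])                                 # initial tree: S over epsilon
BETA_A = ("S", ["a", ("S", [("S*", []), "a"])])     # auxiliary tree for a
BETA_B = ("S", ["b", ("S", [("S*", []), "b"])])     # auxiliary tree for b


def plug_foot(aux, subtree_children):
    """Copy aux, hanging subtree_children below its foot node."""
    label, children = aux
    if label.endswith("*"):
        return (label.rstrip("*"), subtree_children)
    return (label, [c if isinstance(c, str) else plug_foot(c, subtree_children)
                    for c in children])


def adjoin(tree, path, aux):
    """Adjoin aux at the node reached by following child indices in path."""
    label, children = tree
    if not path:
        # The node is replaced by aux; its old children move below the foot node.
        return plug_foot(aux, children)
    new_children = list(children)
    new_children[path[0]] = adjoin(children[path[0]], path[1:], aux)
    return (label, new_children)


def tree_yield(tree):
    """Concatenate the terminal leaves from left to right."""
    if isinstance(tree, str):
        return tree
    return "".join(tree_yield(child) for child in tree[1])


step1 = adjoin(ALPHA, (), BETA_A)      # adjoin at the root of the initial tree
step2 = adjoin(step1, (1,), BETA_B)    # adjoin at the inner S (second child of the root)
print(tree_yield(step1), tree_yield(step2))
```

Each adjunction adds one symbol to the left half and the same symbol to the right half of the yield, which is why this grammar generates strings of the form ww.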

Polynomial extensions of CFG (4)
Example: LIG for the copy language
  S_0 → S[#]
  S[..] → a S_a[..]      S_a[..] → S[a..]
  S[..] → b S_b[..]      S_b[..] → S[b..]
  S[..] → T[..]
  T[a..] → T[..] a       T[b..] → T[..] b
  T[#] → ε

Polynomial extensions of CFG (5)
Linear Context-free rewriting systems (LCFRS), Weir (1988):
Grammars that have an underlying context-free structure. An LCFRS consists of
• a (generalized) context-free grammar that generates a set of terms,
• a yield function that specifies the structures corresponding to these terms, and
• a function specifying the strings yielded by these structures.
LCFRS is more powerful than TAG and LIG.

Polynomial extensions of CFG (6)
Multiple Context-free Grammars (MCFG), Seki et al. (1991):
5-tuple ⟨N, T, F, P, S⟩ such that
• N, T are non-terminals and terminals where each non-terminal A has a dimension, dim(A). From a non-terminal of dimension k, k-tuples of terminal strings are derived. The dimension of the start symbol S is 1.
• F is a finite set of functions and P is a set of productions A_0 → f[A_1, ..., A_k] with f ∈ F. The idea is that f describes how to compute the yield of A_0 (a dim(A_0)-tuple of terminal strings) from the yields of A_1, ..., A_k.
f must be linear in the sense that each of its arguments is used at most once to compute the new string tuple.

Polynomial extensions of CFG (7)
MCFG example:
S → f[A], A → g[A], A → h[ ].
h[ ] = (ab, cd), g[(x_1, x_2)] = (a x_1 b, c x_2 d), f[(x_1, x_2)] = (x_1 x_2).
Language: { a^n b^n c^n d^n | n ≥ 1 }.
LCFRS and MCFG are weakly equivalent.
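The MCFG example can be made concrete by writing its three yield functions as ordinary functions on string tuples. In this minimal sketch the functions `h`, `g` and `f` follow the definitions in the example, while the helper `derive` and its parameter `n` are assumptions added for illustration: it applies A → h[ ] once, A → g[A] n − 1 times, and finally S → f[A].

```python
def h():
    """Yield function of A -> h[]: the base pair."""
    return ("ab", "cd")


def g(pair):
    """Yield function of A -> g[A]: wrap both components in parallel."""
    x1, x2 = pair
    return ("a" + x1 + "b", "c" + x2 + "d")


def f(pair):
    """Yield function of S -> f[A]: concatenate the two components."""
    x1, x2 = pair
    return x1 + x2


def derive(n):
    """String obtained with one use of h, n - 1 uses of g, and one use of f."""
    pair = h()
    for _ in range(n - 1):
        pair = g(pair)
    return f(pair)


for n in (1, 2, 3):
    print(derive(n))
```

Each function uses each of its arguments exactly once, so the linearity condition on f ∈ F is respected; the two components of the pair grow in lockstep, which is what keeps the counts of a, b, c and d equal.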
