Efficient parsing with a large-scale unification-based grammar - PowerPoint PPT Presentation

0. Efficient parsing with a large-scale unification-based grammar Lessons from a multi-year, multi-team endeavour Liviu Ciortuz Department of Computer Science University of Iasi, Romania “ALEAR” Workshop, FP7 E.U. Project Humbold Universit¨ at, Berlin, Germany November 2008

1. PLAN • Fore-ground: LinGO, the large-scale HPSG for English Key efficciency issues in parsing with large-scale unification grammars • Back-ground: Unification-based grammars in the small OSF- and OSF-theory unification FS expansion Compilation of OSF- and OSF-theory unification LIGHT: The language and the system Two classes of feature paths: QC and GR

2. 1. Fore-ground: Based on “Collaborative Language Engineering” St. Oepen, D. Flickiger J. Tsujii, H. Uszko- reit (eds.), Center for Studies of Language and Information, Stanford, 2002 L. Ciortuz. “ LIGHT – a constraint language and compiler system for typed-unification grammars.” In LNAI vol. 2479, M. Jarke, J. K¨ ohler, G. Lakemeyer (eds.), Springer- Verlag, 2002, pp. 3–17. L. Ciortuz. On two classes of feature paths in large-scale unification grammars. In New Developments in Parsing Technolo- gies , Hurry Bunt, Giorgio Satta, John Car- roll (eds.), Kluwer Academic Publishers, 2004, pp. 203–227.

3. 1.1. LinGO – the English Resource Grammar EUBP version, www.delph-in.net Short description: from “Efficiency in Unification-Based Parsing”, Natural Language Engineering, special issue, 6(1), 2000 • Support theory: HPSG — Head-driven Phrase Structure Grammar [Pollard and Sag, 1987, 1994] • Size: un-expanded: 2.47MB, expanded: 40.34MB; 15059 types, 62 rules, 6897 lexical extries  TDL / PAGE, [Kiefer, 1994], DFKI   Type Description Language    • Developed within: LKB, [Copestake, 1999], CSLI Stanford    Linguistic Knowledge Base   Applications: machine translation of spoken and edited language, email auto response, consumer opinion tracking, question answering

4. Systems running LinGO ERG DFKI Saarbruecken TDL / PAGE Parsing (Control) Stanford Univ. LKB interpreter DFKI Saarbruecken PET FS Unifier Haiffa Univ. [AMALIA] (Logic) ALE Tokyo Univ. LiLFeS compiler / AM DFKI Saarbruecken OSF LIGHT

5. Some comparisons on performances in processing LinGO reported by [Oepen, Callmeier, 2000] version year test suite av. parsing time (sec.) space (Kb) ‘tsnlp’ 3.69 19016 TDL / PAGE 1996 ‘aged’ 2.16 79093 ‘tsnlp’ 0.03 333 PET 2000 ‘aged’ 0.14 1435

6. Performances of LIGHT w.r.t. other systems processing LinGO system optimization average parsing time on CSLI test-suite (sec./sentence) LIGHT quick-check 0.04 PET quick-check 0.04 LiLFeS CFG filter 0.06 LIGHT without quick-check 0.07 PET without quick-check 0.11

7. 1.2 Key efficiency issues in parsing with large-scale (LinGO-like) unification-based grammars (I) • choosing the right logical framework, and making your grammar a logical, declarative grammar • grammar expansion: full vs. partial expansion • sort lattice encoding • FS unification: compilation • FS sharing • lexicon pre-compilation

8. Key efficiency issues in parsing with large-scale (LinGO-like) unification-based grammars (II) • exploring grammar particularities: quick check (QC) pre-unification filtering (generalised) grammar reduction (GR) • two-step parsing ◦ hyper-active parsing ◦ ambiguity packing (based on FS subsumption) ◦ grammar approximation: CFGs

9. 2. Back-ground: PLAN 2.1 Unification-based grammars in the small 2.2 The Logics of feature structures 2.2.1 OSF notions 2.2.2 OSF- and OSF-theory unification 2.2.3 The osf unify function 2.2.4 The type-consistent OSF unifier 2.2.5 Feature Structure expansion 2.3 Compiled OSF-unification 2.4 Compiled OSF-theory unification 2.5 LIGHT : the language and the system 2.6 Two classes of feature paths in unification grammars: quick ckeck (QC) paths, and generalised reduction (GR) paths

10. 2.1 Unification-based grammars in the small Two sample feature structures OSF notation vp satisfy_HPSG_principles [ ARGS < verb [ CAT #1, [ HEAD #1, SUBCAT #2, OBJECT #3:np, HEAD top SUBJECT #2:sign ], [ CAT #1, #3 >, SUBCAT #3|#2 ], HEAD #1, COMP top SUBJECT #2 ] [ CAT #3, SUBCAT nil ] ]

11. HPSG principles as feature constraints • head principle: satisfy HPSG principles [ head.cat = cat ] • saturation principle: satisfy HPSG principles [ comp.subcat = nil ] • subcategorization principle: satisfy HPSG principles [ head.subcat = comp.cat | subcat ]

12. A sample top sort hierarchy start string phrase_or_word categ diff_list list categ_list cons det noun adjective verb phrase categ_cons nil word satisfy_HPSG_principles det_le noun_le adjective_le verb_le pnoun_le lh_phrase rh_phrase iverb_le tverb_le cnoun_le is nice kisses the mary girl thinkslaughs embarrasses embarrassed john meets pretty met kissed

13. An expaned feature structure... rewritten as a rule lh_phrase lh_phrase [ PHON list, [ PHON list, CAT #1:categ, CAT #1:categ, SUBCAT #2:categ_list, SUBCAT #2:categ_list, HEAD #4, HEAD #4:phrase_or_word COMP #5 ] [ PHON list, <- CAT #1, SUBCAT #3|#2 ], #4:phrase_or_word [ PHON list, COMP #5:phrase_or_word CAT #1, [ PHON list, SUBCAT #3|#2 ], CAT #3, #5:phrase_or_word SUBCAT nil ], ARGS <#4, #5> ] [ PHON list, CAT #3, SUBCAT nil ].

14. Tree representation lh_phrase of a feature structure PHON COMP CAT HEAD ARGS SUBCAT diff_list #1 #4 list #2 #5 FIRST REST #4 list phrase_or_word REST FIRST PHON SUBCAT CAT #1 diff_list nil list #5 categ phrase_or_word REST FIRST PHON SUBCAT CAT #3 #2 diff_list #3 nil

A simple typed-unification HPSG-like grammar types: start[ SUBCAT nil ] cons [ FIRST top, REST list ] diff_list [ FIRST_LIST list, program: // rules REST_LIST list ] categ_cons lh_phrase [ FIRST categ, [ HEAD #1, REST categ_list ] COMP #2, phrase_or_word ARGS <#1,#2> ] [ PHON list, rh_phrase CAT categ, [ HEAD #1, SUBCAT categ_list ] COMP #2, phrase ARGS <#2,#1> ] [ HEAD #1:phrase_or_word, COMP #2:phrase_or_word, query: // lexical entries ARGS cons ] satisfy_HPSG_principles the[ PHON <"the"> ] [ CAT #1, girl[ PHON <"girl"> ] SUBCAT #2, john[ PHON <"john"> ] HEAD top mary[ PHON <"mary"> ] [ CAT #1, nice[ PHON <"nice"> ] SUBCAT #3|#2 ], embarrassed[ PHON <"embarrassed"> ] COMP top pretty[ PHON <"pretty"> ] [ CAT #3, met[ PHON <"met"> ] SUBCAT nil ] ] kissed[ PHON <"kissed"> ] det_le is[ PHON <"is">, [ CAT det, CAT verb, SUBCAT nil ] SUBCAT <adjective, noun> ] noun_le laughs[ PHON <"laughs"> ] [ CAT noun ] kisses[ PHON <"kisses"> ] pnoun_le thinks[ PHON <"thinks">, [ SUBCAT nil ] CAT verb, cnoun_le SUBCAT <verb, noun> ] [ SUBCAT <det> ] meets[ PHON <"meets"> ] adjective_le embarrasses[ PHON <"embarrasses"> ] [ CAT adjective, SUBCAT nil ] iverb_le [ CAT verb, SUBCAT <noun> ] tverb_le [ CAT verb, SUBCAT <noun, noun> ]

A simple typed-unification grammar sorts: sign:top. rule:sign. np:rule. vp:rule. query: // lexical entries s:rule. lex_entry:sign. the det:lex_entry. [ HEAD top noun:lex_entry. [ TRANS top verb:lex_entry. [ DETNESS + ] ], the:det. PHON < "the" > ] a:det. a cat:noun. [ HEAD top mouse:noun. [ TRANS top catches:verb. [ DETNESS - ] ], PHON < "a" > ] types: cat [ HEAD top 3sing [ AGR 3sing, [ NR sing, TRANS top PERS third ] [ PRED cat ] ], PHON < "cat" > ] program: // rules mouse [ HEAD top np [ AGR 3sing, [ ARGS < det TRANS top [ HEAD top [ PRED mouse ] ], [ TRANS #1 ] ], PHON < "mouse" > ] noun catches [ HEAD #2:top [ HEAD top [ TRANS #1 ], [ AGR #2:3sing, KEY-ARG + ] >, TENSE present, HEAD #2 ] TRANS top vp [ ARG1 #3, [ ARGS < verb ARG2 #1, [ HEAD #1, PRED catches ] ], OBJECT #3:np, OBJECT sign SUBJECT #2:np, [ HEAD top KEY-ARG + ], [ TRANS #1 ] ], #3 >, PHON < "catches" >, HEAD #1, SUBJECT sign SUBJECT #2 ] [ HEAD top s [ AGR #2, [ ARGS < #2:np, TRANS #3 ] ] ] vp [ HEAD #1, SUBJECT #2, KEY-ARG + ] >, HEAD #1 ] The context-free backbone of the above grammar np → det ∗ noun det ⇒ the | a vp → ∗ verb np noun ⇒ cat | mouse s → np ∗ vp verb ⇒ catches

17. Parsing The cat catches a mouse 12 DC 7 11 6 KC 10 RC 5 6 9 DC DC 4 1 5 8 7 0 3 KC 2 KC KC 4 3 1 0 2 the cat catches a mouse 0 1 2 3 4 5

18. syn. rule / lex. categ. start - end env s → .np vp. 0 − 5 12 7 s → np .vp. 2 − 5 11 6 10 vp → .verb np. 2 − 5 5 9 np → .det noun. 3 − 5 4 8 np → det .noun. 4 − 5 3 The final content of the 7 vp → .verb. np 2 − 3 2 chart when parsing np → .det noun. 0 − 2 6 1 The cat catches a mouse np → det .noun. 1 − 2 5 0 det ⇒ the 0 − 1 4 3 noun ⇒ cat 1 − 2 2 verb ⇒ catches 2 − 3 1 det ⇒ a 3 − 4 noun ⇒ mouse 4 − 5 0

Efficient parsing with a large-scale unification-based grammar - PowerPoint PPT Presentation

0. Efficient parsing with a large-scale unification-based grammar Lessons from a multi-year, multi-team endeavour Liviu Ciortuz Department of Computer Science University of Iasi, Romania ALEAR Workshop, FP7 E.U. Project Humbold

Introduction to Bottom-Up Parsing Shift-reduce parsing The LR parsing algorithm

unification 2016 unification strategic roadmap succession unification strategic roadmap

CSC 4181 Compiler Construction Parsing 1 1 Outline Top-down v.s. Bottom-up Top-down parsing

Parsing with unification Frederik Fouvry Department of Computational Linguistics and Phonetics

Unification in the Description Logic EL EL - unification Minimal unifiers Franz Baader and

Robust Incremental Neural Semantic Graph Parsing Jan Buys and Phil Blunsom Dependency Parsing vs

Basic Parsing Algorithms Chart Parsing Seminar Recent Advances in Parsing Technology WS

Unification Parsing Typed Feature Structures demo: agree grammar engineering Ling 571: Deep

Projective unification in modal logic II Projective unification in modal logic II Piotr Wojtylak

Introduction to Unification Theory Syntactic Unification Temur Kutsia RISC, Johannes Kepler

Introduction to Unification Theory Higher-Order Unification Temur Kutsia RISC, Johannes Kepler

Unification on Subvarieties of Introduction Algebraic Unification Pseudocomplemented lattices

Graph-Based Parsing Joakim Nivre Uppsala University Department of Linguistics and Philology

A large-scale International IPv6 Network A large-scale International IPv6 Network www.6net.org

Dependency Parsing & Feature-based Parsing Ling571 Deep Processing Techniques for NLP

FINANCING LARGE SCALE SOLAR Large Scale Solar Conference - Sydney Gloria Chan Director, Large

Chapter 5 WIRELESS INTERNET: IEEE 802.11B Abstract This tutorial article describes the IEEE

Miscellaneous SIENA topics Christian Steglich Behavioral and Social Sciences University of

Variabilit et figement dexpressions polylexicales: annotation multilingue, codage lexical et

Factivity and Presupposition in Dependent Type Semantics Koji Mineshima Ochanomizu University

Leveraging Mobile Interaction with Sensor-Driven and Multimodal User Interfaces Andreas

Overview Partial Constituent Fronting in German The phenomenon: Partial constituent fronting

Preservation of uniform prox-regularity of sets and application to constrained optimization

Introduction Detecting Errors in Effects of Annotation Errors Detecting Errors in Corpus

Efficient parsing with a large-scale unification-based grammar - PowerPoint PPT Presentation

0. Efficient parsing with a large-scale unification-based grammar Lessons from a multi-year, multi-team endeavour Liviu Ciortuz Department of Computer Science University of Iasi, Romania ALEAR Workshop, FP7 E.U. Project Humbold

Introduction to Bottom-Up Parsing Shift-reduce parsing The LR parsing algorithm

unification 2016 unification strategic roadmap succession unification strategic roadmap

CSC 4181 Compiler Construction Parsing 1 1 Outline Top-down v.s. Bottom-up Top-down parsing

Parsing with unification Frederik Fouvry Department of Computational Linguistics and Phonetics

Unification in the Description Logic EL EL - unification Minimal unifiers Franz Baader and

Robust Incremental Neural Semantic Graph Parsing Jan Buys and Phil Blunsom Dependency Parsing vs

Basic Parsing Algorithms Chart Parsing Seminar Recent Advances in Parsing Technology WS

Unification Parsing Typed Feature Structures demo: agree grammar engineering Ling 571: Deep

Projective unification in modal logic II Projective unification in modal logic II Piotr Wojtylak

Introduction to Unification Theory Syntactic Unification Temur Kutsia RISC, Johannes Kepler

Introduction to Unification Theory Higher-Order Unification Temur Kutsia RISC, Johannes Kepler

Unification on Subvarieties of Introduction Algebraic Unification Pseudocomplemented lattices

Graph-Based Parsing Joakim Nivre Uppsala University Department of Linguistics and Philology

A large-scale International IPv6 Network A large-scale International IPv6 Network www.6net.org

Dependency Parsing &amp; Feature-based Parsing Ling571 Deep Processing Techniques for NLP

FINANCING LARGE SCALE SOLAR Large Scale Solar Conference - Sydney Gloria Chan Director, Large

Chapter 5 WIRELESS INTERNET: IEEE 802.11B Abstract This tutorial article describes the IEEE

Miscellaneous SIENA topics Christian Steglich Behavioral and Social Sciences University of

Variabilit et figement dexpressions polylexicales: annotation multilingue, codage lexical et

Factivity and Presupposition in Dependent Type Semantics Koji Mineshima Ochanomizu University

Leveraging Mobile Interaction with Sensor-Driven and Multimodal User Interfaces Andreas

Overview Partial Constituent Fronting in German The phenomenon: Partial constituent fronting

Preservation of uniform prox-regularity of sets and application to constrained optimization

Introduction Detecting Errors in Effects of Annotation Errors Detecting Errors in Corpus

Dependency Parsing & Feature-based Parsing Ling571 Deep Processing Techniques for NLP