Some Fine Points of Hybrid Natural Language Processing

LREC 2008, Marrakech, Morocco, 28th May 2008 ● Some Fine Points of Hybrid Natural Language Processing ● Peter Adolphs, DFKI GmbH, Language Technology Lab, Berlin ● Stephan Oepen, Universitetet i Oslo, Department of Informatics ● Ulrich Callmeier, acrolinx GmbH, Berlin ● Berthold Crysmann, Universität Bonn ● Dan Flickinger, Stanford University, CSLI ● Bernd Kiefer, DFKI GmbH, Language Technology Lab, Saarbrücken

Motivation ● hybrid processing, integrating annotations of ‘shallow’ tools into HPSG parsing ● different tools make different assumptions ● example: PTB-style tokenizers for English – e.g.: Don't you! → <do, n't, you, !> – contracted verb forms are split – punctuation is split off the preceding word form ● we need to adapt annotations of different tools to the requirements of our grammar ● goal: a declarative, expressive, scalable device

Token Feature Structures ● feature structures for describing tokens ● different annotations provided as feature structures ● lattice of structured categories (token feature structures) as input to the parser
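As a rough illustration of the slide above, a token feature structure can be pictured as a nested attribute-value map, and the parser input as a lattice of such structures. The sketch below is plain Python, not the formalism used in the talk; the `TokenEdge` class, the `token` helper, the feature names (`FORM`, `FROM`, `TO`, `POS`) and the tag probabilities are assumptions made for illustration only.

```python
from dataclasses import dataclass


@dataclass
class TokenEdge:
    """One lattice edge: a token feature structure between two chart vertices."""
    start: int
    end: int
    fs: dict  # nested attribute-value map (hypothetical feature names)


def token(start, end, form, char_from, char_to, pos_tags):
    """Bundle annotations from different tools into one feature structure."""
    fs = {
        "FORM": form,           # surface string from the tokenizer
        "FROM": char_from,      # character offsets into the raw input
        "TO": char_to,
        "POS": dict(pos_tags),  # tagger output: tag -> probability (made up here)
    }
    return TokenEdge(start, end, fs)


# "Don't you!" with PTB-style tokenization, as in the motivation example.
lattice = [
    token(0, 1, "Do", 0, 2, {"VB": 0.9, "VBP": 0.1}),
    token(1, 2, "n't", 2, 5, {"RB": 1.0}),
    token(2, 3, "you", 6, 9, {"PRP": 1.0}),
    token(3, 4, "!", 9, 10, {".": 1.0}),
]

forms = [edge.fs["FORM"] for edge in lattice]
```

Keeping every annotation inside one structure per token is what lets later rules constrain several annotations at once.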

Generalized Chart ● tools may assume different tokenizations (paradigm case: input from speech recognizers) ● chart: directed acyclic graph (DAG) whose vertices are abstract objects rather than indexed token boundary positions
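One way to picture such a chart is as a graph whose vertices are opaque objects, so that competing tokenizations can share some vertices without agreeing on a global position index. The following is a minimal sketch under that reading; the `Chart` class and the vertex names `v0` to `v3` are hypothetical and not taken from the talk.

```python
from collections import defaultdict


class Chart:
    """A chart as a DAG: vertices are arbitrary hashable objects, not positions."""

    def __init__(self):
        self.out_edges = defaultdict(list)  # vertex -> [(label, target vertex)]

    def add_edge(self, source, label, target):
        self.out_edges[source].append((label, target))

    def paths(self, source, goal):
        """Yield every label sequence leading from source to goal."""
        if source == goal:
            yield []
            return
        for label, target in self.out_edges[source]:
            for rest in self.paths(target, goal):
                yield [label] + rest


chart = Chart()
# Two tools disagree on how to tokenize "Don't"; both analyses share the
# start and end vertices but not the vertex in between.
chart.add_edge("v0", "Don't", "v2")
chart.add_edge("v0", "Do", "v1")
chart.add_edge("v1", "n't", "v2")
chart.add_edge("v2", "you", "v3")

all_paths = sorted(chart.paths("v0", "v3"))
```

Here `all_paths` contains both tokenizations of the same input, which is the situation the generalized chart is meant to accommodate.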

Chart Mapping ● chart mapping: non-monotonic rewrite mechanism on feature structure chart edges ● general format: [ CONTEXT : ] INPUT → OUTPUT ● CONTEXT, INPUT, OUTPUT are sequences of feature structures (each possibly empty) ● resource-sensitive: chart edges that let a rule fire may be removed (namely, all INPUT edges)
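The rule format above can be approximated in a few lines of Python. This is a simplified sketch, not the actual mechanism: feature structures are reduced to a single `form` string, patterns are plain predicates instead of unification, INPUT items must be adjacent, and CONTEXT items may match anywhere in the chart. The names `Edge`, `Rule`, `match_adjacent` and `apply_rule` are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Edge:
    start: int
    end: int
    form: str


@dataclass
class Rule:
    context: list     # predicates over edges; matched but never consumed
    inputs: list      # predicates over adjacent edges; matched and consumed
    output: Callable  # builds the OUTPUT edges from the matched INPUT edges


def match_adjacent(chart, predicates, start=None):
    """Yield tuples of adjacent edges satisfying the predicates in order."""
    if not predicates:
        yield ()
        return
    for edge in chart:
        if (start is None or edge.start == start) and predicates[0](edge):
            for rest in match_adjacent(chart, predicates[1:], edge.end):
                yield (edge,) + rest


def apply_rule(chart, rule):
    """Fire the rule once if it matches; return the rewritten chart."""
    chart = set(chart)
    if not all(any(p(e) for e in chart) for p in rule.context):
        return chart
    matched = next(match_adjacent(chart, rule.inputs), None)
    if matched is None:
        return chart
    # Resource-sensitive step: INPUT edges are removed, OUTPUT edges are added.
    return (chart - set(matched)) | set(rule.output(matched))


is_word = lambda e: e.form.isalpha()
is_punct = lambda e: e.form in {"!", "?", "."}

# Re-attach punctuation to the preceding word form.
attach_punct = Rule(
    context=[],
    inputs=[is_word, is_punct],
    output=lambda m: [Edge(m[0].start, m[1].end, m[0].form + m[1].form)],
)
# Same rule, but only licensed if some edge with form "?" is present.
blocked = Rule([lambda e: e.form == "?"], attach_punct.inputs, attach_punct.output)

chart = {Edge(0, 1, "Do"), Edge(1, 2, "n't"), Edge(2, 3, "you"), Edge(3, 4, "!")}
rewritten = apply_rule(chart, attach_punct)
unchanged = apply_rule(chart, blocked)
```

The removal of the matched INPUT edges is what makes the rewrite non-monotonic: the chart after the rule fires no longer contains the separate `you` and `!` edges.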

Chart Mapping – Example ● example: recombining split contracted forms ● rules extended with regular expression matches ● regex capture groups can be referred to in the output ● rules themselves described as feature structures, thus we can use re-entrancies
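The recombination of split contracted forms can be sketched with ordinary Python regular expressions. The patterns, the output template and the function name below are assumptions for illustration, and the code works on a flat token list rather than on a chart of feature structures; it only shows how capture groups from the matched INPUT items can be reused to build the OUTPUT form.

```python
import re

# Hypothetical rule: two adjacent tokens whose forms match these patterns are
# replaced by one token whose form is built from the capture groups.
FIRST = re.compile(r"^(.+)$")
SECOND = re.compile(r"^([nN]'[tT])$")
OUTPUT_TEMPLATE = "{first}{second}"  # refers to group 1 of each INPUT item


def recombine_contractions(tokens):
    """Merge <verb, n't> pairs back into a single contracted form."""
    result = []
    i = 0
    while i < len(tokens):
        if i + 1 < len(tokens):
            m1 = FIRST.match(tokens[i])
            m2 = SECOND.match(tokens[i + 1])
            if m1 and m2:
                merged = OUTPUT_TEMPLATE.format(first=m1.group(1), second=m2.group(1))
                result.append(merged)
                i += 2  # both INPUT tokens are consumed
                continue
        result.append(tokens[i])
        i += 1
    return result


recombined = recombine_contractions(["Do", "n't", "you", "!"])
```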

Chart Mapping – Examples ● light-weight named entity recognition ● fixing broken tokenization

Previous Architecture (Simplified) ● preprocessing has to provide the input chart as expected by the grammar ● this has to be ensured by specialized conversion routines without recourse to the grammar ● changes to the grammar have to be reflected in these data adaptation routines ● pipeline (diagram): natural language input → Preprocessing → Lexical Instantiation → Syntactic Parsing → SYN ... SEM ...

Proposed Architecture (Simplified) ● proposal: token mapping performs certain preprocessing steps within the grammar ● advantages: – full control for the grammar writer, using the same formalism as for the grammar – makes assumptions by the grammar explicit – removes complexity from preprocessing ● pipeline (diagram): natural language input → Preprocessing → Token Mapping → Lexical Instantiation → Syntactic Parsing → SYN ... SEM ...

Hybrid Processing ● shaping the search space of the parser: – widening search space (e.g. unknown word handling) – narrowing search space (e.g. removing / postponing the processing of edges) ● constraints on the search space – hard: categorial conditions for introduction / removal of chart edges – soft: probabilistic disambiguation, prioritize parser's tasks on the agenda

Lexical Instantiation ● native and generic lexical entries (les) ● selection of appropriate generic lexical entries originally controlled by the parser (hard-coded) ● strategy: – map from part-of-speech tags to generic les – instantiate generic le for highest ranked pos tag where no native le is available ● disadvantage: – not flexible enough (e.g. no chain of responsibility) – partial lexical coverage: We’ll bus to Paris.

Lexical Instantiation ● proposal: try to instantiate all generic les for all tokens ● token feature structure is unified into a predefined path in the lexical entry ● selection of compatible tokens by constraints on the token feature structure ● example: [figure not preserved]
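The proposal can be sketched as follows, under strong simplifying assumptions: feature structures are nested dicts without re-entrancies, each token carries a single `POS` value, and the predefined path is a top-level feature called `TOKEN`. The entry names, feature names and the `unify` helper are hypothetical and are not the grammar's actual definitions.

```python
def unify(a, b):
    """Unify two nested dicts; return None on a clash (no re-entrancies)."""
    if isinstance(a, dict) and isinstance(b, dict):
        result = dict(a)
        for key, value in b.items():
            if key in result:
                merged = unify(result[key], value)
                if merged is None:
                    return None
                result[key] = merged
            else:
                result[key] = value
        return result
    return a if a == b else None


# Each generic entry states its constraints on the token under TOKEN.
GENERIC_LES = {
    "generic_noun": {"CAT": "noun", "TOKEN": {"POS": "NN"}},
    "generic_verb": {"CAT": "verb", "TOKEN": {"POS": "VB"}},
}


def instantiate(token_fs):
    """Try every generic entry; keep those whose TOKEN constraints unify."""
    edges = []
    for name, entry in GENERIC_LES.items():
        lexical_fs = unify(entry, {"TOKEN": token_fs})
        if lexical_fs is not None:
            edges.append((name, lexical_fs))
    return edges


# "bus" tagged as a verb, as in "We'll bus to Paris."
bus = {"FORM": "bus", "POS": "VB"}
instantiated = instantiate(bus)
```

In this sketch the choice of generic entry falls out of unification failure rather than a hard-coded tag mapping, which is the shift the slide describes.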

Lexical Filtering ● after lexical instantiation, native and generic les may be available in the same chart cell ● we can restrict lexical instantiation by positing constraints on the token feature structures ● but we might also want to prevent some lexical chart edges in certain contexts (set operations) ● proposal: lexical filtering phase ● same formalism as for token mapping: chart mapping rules with empty OUTPUT list
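A filtering rule with an empty OUTPUT list simply deletes the edges it matches. The sketch below hard-codes one possible policy as an assumption: a generic edge is dropped when a native edge of the same category occupies the same chart cell. The `LexEdge` class, the entry names and the policy itself are illustrative and not taken from the talk.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LexEdge:
    start: int
    end: int
    entry: str
    cat: str
    generic: bool


def filter_generics(chart):
    """CONTEXT: a native edge; INPUT: a same-cell, same-category generic edge;
    OUTPUT: empty, so the matched generic edge is removed."""
    native = {(e.start, e.end, e.cat) for e in chart if not e.generic}
    return {
        e for e in chart
        if not (e.generic and (e.start, e.end, e.cat) in native)
    }


# Chart cell for "bus" in "We'll bus to Paris.": a native noun entry plus
# two generic entries produced by lexical instantiation.
chart = {
    LexEdge(1, 2, "bus_n", "noun", generic=False),
    LexEdge(1, 2, "generic_noun", "noun", generic=True),
    LexEdge(1, 2, "generic_verb", "verb", generic=True),
}
filtered = filter_generics(chart)
```

Under this policy the redundant generic noun disappears while the generic verb survives, so the verbal reading of "bus" remains available to the parser.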

Proposed Architecture ● use feature structures to describe tokens ● chart mapping: resource-sensitive rewriting of feature structure items ● chart mapping on token fs ● generic instantiation driven by compatibility with token fs ● lexical filtering with chart mapping ● pipeline (diagram): natural language input → Preprocessing → Token Mapping → Lexical Instantiation → Lexical Parsing → Lexical Filtering → Syntactic Parsing → SYN ... SEM ...

Applications ● fine grained control over instantiation of generic lexical entries ● mapping external morphological information into the grammar's universe ● chart dependency filter (optimizing parsing performance) ● activate syntactic rules only for certain spans of the input (e.g., in hybrid grammar checking)

Conclusions ● versatile device for many applications ● external information is made accessible to the grammar ● pre-processing can be better controlled with grammar-specific means ● reduces the need for special code inside and outside the parser ● outlook: consolidation of our current parsers and grammars

Thank you!

Acknowledgements ● DELPH-IN community and beyond, especially Nuria Bertomeu, Ann Copestake, Remy Sanouillet, Ulrich Schäfer and Benjamin Waldron for numerous in-depth discussions ● funding: – ProFIT program of the German federal state of Berlin and the EFRE program of the EU (to the DFKI project Checkpoint) – the University of Oslo (through its scientific partnership with CSLI)

