


Natural Language Parsing Technology

Foundations of Language Science and Technology (WS 2014/2015)

Bernd Kiefer
Language Technology Lab, DFKI GmbH
Department of Computational Linguistics, Saarland University

November 2014



Outline

Overview
Basic Parsing Algorithms
    Parsing Strategies
    CYK Algorithm
    Earley’s Algorithm
Parsing with Probabilistic Context-Free Grammar
    PCFG
    Inside-Outside Algorithm
Recent Advances in Parsing Technology




Language & Grammar

• Language
    – Structural
    – Productive
    – Ambiguous, yet efficient in human-human communication
• Grammar
    – Generalization of regularities in language structures
    – Morphology & syntax, often complemented by phonetics, phonology, semantics, and pragmatics



Ambiguity

• Human languages are ambiguous on almost every layer
• Grammar frameworks are designed to represent necessary ambiguities and eliminate unnecessary ones
• Parsing models are responsible for retrieving the valid analyses according to the grammar



Syntactic Parser as NLP Component

Morph. Analysis → PoS Tagging → NER → Chunking → Syntactic Parsing → Semantic Analysis → …



Trees (or not)

[Figure: three analyses of “Sue gave Paul an old penny” —
• a phrase-structure tree, roughly (S (NP Sue) (VP (V gave) (NP Paul) (NP (Det an) (N (A old) (N penny)))))
• a dependency graph rooted in “gave”, with relations SBJ (Sue), IOBJ (Paul), DOBJ (penny), ADJ (old), DET (an)
• an HPSG-style feature structure for “gave”: PHON|ORTH ⟨"gave"⟩; SYNSEM|LOC|CAT [HEAD verb, VAL [SUBJ ⟨NP[1]⟩, COMPS ⟨NP[2], NP[3]⟩]]; CONT|RELS {give_rel [ARG1 [1], ARG2 [2], ARG3 [3]]}]



Chomsky Hierarchy

• Type 0 (unrestricted rewriting systems): α → β, with α, β ∈ (V_N ∪ V_T)∗
• Type 1 (context-sensitive grammars): β A γ → β ω γ, with A ∈ V_N and β, γ, ω ∈ (V_N ∪ V_T)∗
• Type 2 (context-free grammars): A → α, with A ∈ V_N and α ∈ (V_N ∪ V_T)∗
• Type 3 (regular grammars): A → xB ∨ A → x, with A, B ∈ V_N and x ∈ V_T



Context-Free Grammar

A CFG is a quadruple ⟨V_T, V_N, P, S⟩:

• V_T: terminal symbols
• V_N: non-terminal symbols
• P: context-free productions A → α, with A ∈ V_N and α ∈ (V_N ∪ V_T)∗
• S: start symbol



Context-Free Phrase Structure Grammar

• S → NP VP
• NP → Det N
• N → Adj N
• VP → V
• VP → V NP
• VP → Adv VP
• N → dog | cat
• Det → the | a
• V → chases | sleeps
• Adj → gray | lazy
• Adv → fiercely



CFG Derivation

• If φ = γAδ, ω = γαδ, and A → α ∈ P, then ω follows from φ: φ ⇒ ω
• If φ₁, φ₂, …, φ_m is a sequence of strings such that φ_i ⇒ φ_{i+1} for all i (1 ≤ i ≤ m−1), then φ₁, φ₂, …, φ_m is a derivation from φ₁ to φ_m
• The “derivable” relation φ₁ ⇒∗ φ_m is the transitive, reflexive closure of ⇒



Outline

Overview
Basic Parsing Algorithms
    Parsing Strategies
    CYK Algorithm
    Earley’s Algorithm
Parsing with Probabilistic Context-Free Grammar
    PCFG
    Inside-Outside Algorithm
Recent Advances in Parsing Technology



Parsing Strategies

• Top-down: start from the start symbol and expand the tree with grammar rules (e.g. replacing the LHS symbol with the RHS sequence of a CFG production)
• Bottom-up: start from the input sequence and apply grammar rules to build trees upwards (e.g. reducing an RHS sequence to its LHS symbol)



Top-Down Parsing

• Goal-directed search
• Wastes time on trees that do not match the input sentence
• A pure top-down (left-first) approach cannot parse (left-)recursive grammars, as the rules below show

• 1. S → NP VP
• 2. NP → NP PP
• 3. …

[Tree: the left-recursive NP expands without bound — S → NP VP, NP ⇒ NP PP ⇒ NP PP PP ⇒ …]



Bottom-Up Parsing

• Use the input to guide the search (data-driven)
• Wastes time on trees that do not result in S
• Recursive unary rules still create an infinite parse forest for a sentence of finite length

• 1. A → B | a
• 2. B → A
• 3. …

[Tree: the unary cycle grows without bound above the terminal a — A ⇒ B ⇒ A ⇒ …]



Problems

• Left-recursion: NP → NP PP
• Ambiguity
• Repeated parsing of subtrees



Dynamic Programming (DP)

• Divisibility: the optimal solution of a subproblem is part of the optimal solution of the whole problem
• Memoization: solve small problems only once and remember the answers

Example

Calculating Fibonacci numbers: F_n = F_{n−1} + F_{n−2} (F₀ = 0, F₁ = 1)
Pascal’s Triangle (binomial coefficients): C(n+1, k+1) = C(n, k) + C(n, k+1)
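To make the memoization idea concrete, here is a minimal Python sketch (illustrative, not from the original slides):

    from functools import lru_cache

    @lru_cache(maxsize=None)      # remember the answers to subproblems
    def fib(n: int) -> int:
        """Fibonacci with memoization: each F_i is computed exactly once."""
        if n < 2:
            return n              # base cases F_0 = 0, F_1 = 1
        return fib(n - 1) + fib(n - 2)

    print(fib(50))  # 12586269025, in linear instead of exponential time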



CYK Algorithm

• Cocke–Younger–Kasami, also known as the CKY algorithm
• Essentially a bottom-up chart-parsing algorithm using dynamic programming
• Requires the CFG to be in Chomsky Normal Form (CNF):
    – A → B C
    – A → a
    – S → ε
    – A, B, C ∈ V_N, a ∈ V_T, B, C ≠ S
• Fills a two-dimensional array: C[i][j] contains all the possible syntactic interpretations of the substring w_{i+1} … w_j
• Complexity: O(n³)



CYK Algorithm

for all i, j with 0 ≤ i < j ≤ n do
    C[i][j] ← ∅
end for
for all A → w_i ∈ P do
    C[i−1][i] ← {A} ∪ C[i−1][i]
end for
for s = 2 … n do
    for all A → B C ∈ P and all i, k with 0 ≤ i < k < i + s ≤ n do
        if B ∈ C[i][k] ∧ C ∈ C[k][i+s] then
            C[i][i+s] ← {A} ∪ C[i][i+s]
        end if
    end for
end for
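The same algorithm as a runnable Python sketch; the grammar encoding (a lexical word-to-nonterminals map and a binary rule list) is an assumption for illustration:

    from collections import defaultdict

    def cyk(words, lexical, binary):
        """CYK recognizer (a minimal sketch). lexical maps a word to the set
        of nonterminals A with A -> word; binary lists (A, B, C) for rules
        A -> B C. Returns the chart: chart[(i, j)] is the set of nonterminals
        deriving words[i:j]."""
        n = len(words)
        chart = defaultdict(set)
        for i, w in enumerate(words):          # lexical rules fill span-1 cells
            chart[(i, i + 1)] |= lexical.get(w, set())
        for s in range(2, n + 1):              # span length, short to long
            for i in range(0, n - s + 1):      # span start
                for k in range(i + 1, i + s):  # split point
                    for A, B, C in binary:
                        if B in chart[(i, k)] and C in chart[(k, i + s)]:
                            chart[(i, i + s)].add(A)
        return chart

The input is accepted iff S ends up in chart[(0, n)]; storing back pointers instead of bare symbols turns the recognizer into a parser.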



CYK Chart Example

Grammar:
• S → NP VP | N VP | N V | NP V
• VP → V NP | V N | VP PP
• NP → D N | NP PP | N PP
• PP → P NP | P N
• N → john, girl, car
• V → saw, walks
• P → in
• D → the, a

Input: 0 john 1 saw 2 the 3 girl 4 in 5 a 6 car 7

The chart cells C[i][j] fill bottom-up, from short spans to long ones:
• Span 1 (lexical): N(0,1), V(1,2), D(2,3), N(3,4), P(4,5), D(5,6), N(6,7)
• Span 2: S(0,2) from N V; NP(2,4) and NP(5,7) from D N
• Span 3: VP(1,4) from V NP; PP(4,7) from P NP
• Span 4: S(0,4) from N VP; NP(3,7) from N PP
• Span 5: NP(2,7) from NP PP
• Span 6: VP(1,7), derived twice — V(1,2) NP(2,7) and VP(1,4) PP(4,7) — the PP-attachment ambiguity
• Span 7: S(0,7) from N VP; the input is accepted
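Feeding this grammar to the cyk sketch from the previous slide reproduces the chart; the dictionary encoding below is illustrative:

    lexical = {"john": {"N"}, "girl": {"N"}, "car": {"N"},
               "saw": {"V"}, "walks": {"V"}, "in": {"P"},
               "the": {"D"}, "a": {"D"}}
    binary = [("S", "NP", "VP"), ("S", "N", "VP"), ("S", "N", "V"), ("S", "NP", "V"),
              ("VP", "V", "NP"), ("VP", "V", "N"), ("VP", "VP", "PP"),
              ("NP", "D", "N"), ("NP", "NP", "PP"), ("NP", "N", "PP"),
              ("PP", "P", "NP"), ("PP", "P", "N")]

    chart = cyk("john saw the girl in a car".split(), lexical, binary)
    print("S" in chart[(0, 7)])  # True: the sentence is in the language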


Earley’s Algorithm

• Use dynamic programming to do top-down search
• Chart: a set of items ⟨h, i, A → α · β⟩
    – h, i: positions in the input, 0 ≤ h ≤ i ≤ n
    – A → α · β: a dotted rule (A → αβ ∈ P)
    – α: the RHS prefix that has already been matched against the input from h to i
    – β: the RHS suffix yet to be found



Earley’s Algorithm

• Initialize:
    foreach S → α ∈ P: C ← C ∪ {⟨0, 0, S → · α⟩}
• Scan(i):
    if w_i = a ∧ ⟨h, i−1, A → α · a β⟩ ∈ C: C ← C ∪ {⟨h, i, A → α a · β⟩}
• Complete(i):
    foreach ⟨h, i, A → α ·⟩ ∈ C: foreach ⟨k, h, B → γ · A δ⟩ ∈ C: C ← C ∪ {⟨k, i, B → γ A · δ⟩}
• Predict(i):
    foreach ⟨h, i, A → α · B β⟩ ∈ C: foreach B → γ ∈ P: C ← C ∪ {⟨i, i, B → · γ⟩}
• Parse:
    Initialize
    for i = 1 … n: Predict(i−1); Scan(i); Complete(i)
    if ∃ ⟨0, n, S → α ·⟩ ∈ C return success else return failure
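A minimal Python sketch of an Earley recognizer; it follows the item representation above but collapses Predict and Complete into a fixpoint pass at each position, and the data-structure choices are assumptions for illustration:

    def earley(words, grammar, start="S"):
        """Earley recognizer (a minimal sketch). grammar maps each nonterminal
        to a list of right-hand sides (tuples of symbols); any symbol that is
        not a grammar key counts as a terminal. An item (h, A, rhs, dot) in
        chart[i] stands for <h, i, A -> alpha . beta>."""
        n = len(words)
        chart = [set() for _ in range(n + 1)]
        for rhs in grammar[start]:                        # Initialize
            chart[0].add((0, start, rhs, 0))
        for i in range(n + 1):
            added = True
            while added:                                  # Predict + Complete to fixpoint
                added = False
                for (h, A, rhs, dot) in list(chart[i]):
                    if dot < len(rhs) and rhs[dot] in grammar:        # Predict
                        for gamma in grammar[rhs[dot]]:
                            item = (i, rhs[dot], gamma, 0)
                            if item not in chart[i]:
                                chart[i].add(item)
                                added = True
                    elif dot == len(rhs):                             # Complete
                        for (k, B, rhs2, dot2) in list(chart[h]):
                            if dot2 < len(rhs2) and rhs2[dot2] == A:
                                item = (k, B, rhs2, dot2 + 1)
                                if item not in chart[i]:
                                    chart[i].add(item)
                                    added = True
            if i < n:                                                 # Scan
                for (h, A, rhs, dot) in chart[i]:
                    if dot < len(rhs) and rhs[dot] == words[i]:
                        chart[i + 1].add((h, A, rhs, dot + 1))
        return any(h == 0 and A == start and dot == len(rhs)
                   for (h, A, rhs, dot) in chart[n])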



Earley Chart: Example

Grammar:
• 1. S → NP VP
• 2. VP → v NP
• 3. VP → v
• 4. NP → det n

Input: 0 the/det 1 dog/n 2 chases/v 3 a/det 4 cat/n 5

Chart items, grouped by end position:
• 0: S → ·NP VP, NP → ·det n
• 1: NP → det · n
• 2: NP → det n·, S → NP · VP, VP → ·v, VP → ·v NP
• 3: S → NP VP·, VP → v·, VP → v · NP, NP → ·det n
• 4: NP → det · n
• 5: S → NP VP·, VP → v NP·, NP → det n·

The completed item S → NP VP· spanning positions 0 to 5 shows the input is accepted.
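Running the earley sketch above on the PoS-tag sequence of this example confirms acceptance (the dictionary encoding is illustrative):

    grammar = {"S": [("NP", "VP")],
               "VP": [("v", "NP"), ("v",)],
               "NP": [("det", "n")]}

    # PoS tags of "the dog chases a cat"
    print(earley(["det", "n", "v", "det", "n"], grammar))  # True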



Outline

Overview
Basic Parsing Algorithms
    Parsing Strategies
    CYK Algorithm
    Earley’s Algorithm
Parsing with Probabilistic Context-Free Grammar
    PCFG
    Inside-Outside Algorithm
Recent Advances in Parsing Technology



Probabilistic Context-Free Grammar

A PCFG is a quintuple ⟨V_T, V_N, P, S, Pr⟩

• Pr : P → [0, 1] such that ∀A ∈ V_N: Σ_{A→α ∈ P} Pr(A → α) = 1
• Pr(A → α) can be understood as the conditional probability of observing A → α in a derivation, given A: P(A → α | A)



Various Probabilities

Joint probability: P(x, y)

• Input sequence: x
• A parse y with corresponding derivation sequence S ⇒^{r₁} γ₁ ⇒^{r₂} γ₂ ⇒^{r₃} … ⇒^{r_k} x, where r_i is the production rule used in the i-th derivation step
• P(x, y) = Π_{i=1}^{k} Pr(r_i)
• Σ_{y ∈ T(G), x = yield(y)} P(x, y) = 1
• More generally, P(x, y | A) = Π_{i=1}^{k} Pr(r_i) is the probability of a sub-parse y rooted in A that generates the input x by the derivation sequence A ⇒^{r₁} γ₁ ⇒^{r₂} … ⇒^{r_k} x



Various Probabilities

Structural language model: P(x)

• P(x) = Σ_{y ∈ T(x)} P(x, y)
• T(x) is the set of parse trees for the input sequence x



PCFG Example

• 1. S → NP VP (1.0)
• 2. NP → Det N (0.8)
• 3. NP → NP PP (0.2)
• 4. VP → V NP (0.7)
• 5. VP → VP PP (0.3)
• 6. PP → P NP (1.0)
• 7. V → saw (1.0)
• 8. N → man (0.3)
• 9. N → girl (0.4)
• 10. N → telescope (0.3)
• 11. Det → a (0.4)
• 12. Det → the (0.6)
• 13. P → with (1.0)

For “the man saw a girl with a telescope”, the grammar licenses two trees. Both share the constituents NP(the man) = 0.8 · 0.6 · 0.3 = 0.144, NP(a girl) = 0.8 · 0.4 · 0.4 = 0.128, NP(a telescope) = 0.8 · 0.4 · 0.3 = 0.096, and PP(with a telescope) = 1.0 · 1.0 · 0.096 = 0.096.

• NP attachment, (S (NP the man) (VP (V saw) (NP (NP a girl) (PP with a telescope)))):
  NP(a girl with a telescope) = 0.2 · 0.128 · 0.096 = 0.0024576; VP = 0.7 · 1.0 · 0.0024576 = 0.00172032; S = 1.0 · 0.144 · 0.00172032 ≈ 0.000247726
• VP attachment, (S (NP the man) (VP (VP (V saw) (NP a girl)) (PP with a telescope))):
  VP(saw a girl) = 0.7 · 1.0 · 0.128 = 0.0896; VP(saw a girl with a telescope) = 0.3 · 0.0896 · 0.096 = 0.00258048; S = 1.0 · 0.144 · 0.00258048 ≈ 0.000371589

Under this PCFG, the VP attachment is the more probable analysis.
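The two numbers can be recomputed in a few lines of Python (a worked check of the slide’s figures, nothing more):

    # "the man saw a girl with a telescope": shared constituents
    np_the_man     = 0.8 * 0.6 * 0.3             # NP -> Det N; Det -> the; N -> man
    np_a_girl      = 0.8 * 0.4 * 0.4
    np_a_telescope = 0.8 * 0.4 * 0.3
    pp_with_tel    = 1.0 * 1.0 * np_a_telescope  # PP -> P NP; P -> with

    # NP attachment: [saw [[a girl] [with a telescope]]]
    s_np = 1.0 * np_the_man * (0.7 * 1.0 * (0.2 * np_a_girl * pp_with_tel))
    # VP attachment: [[saw [a girl]] [with a telescope]]
    s_vp = 1.0 * np_the_man * (0.3 * (0.7 * 1.0 * np_a_girl) * pp_with_tel)

    print(s_np, s_vp)  # ~0.000247726 vs. ~0.000371589: VP attachment wins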


Parsing with PCFG

• The Earley and CYK algorithms can be adapted to carry probabilities
• Best parse tree y∗ for a sentence x: y∗ = argmax_{y ∈ T(x)} P(x, y)
• The N-best parses can be recovered with a Viterbi-like algorithm (see the sketch below)
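To illustrate how CYK carries probabilities, here is a minimal Viterbi-CYK sketch; the best/back structures and the grammar encoding (word → {A: Pr(A → word)}, (A, B, C) → Pr(A → B C)) are assumptions for illustration:

    from collections import defaultdict

    def viterbi_cyk(words, lexical, binary):
        """Probabilistic CYK (a minimal sketch): best[(i, j)][A] is the highest
        probability of a subtree rooted in A over words[i:j]; back stores the
        split point and children that achieved it."""
        n = len(words)
        best = defaultdict(dict)
        back = {}
        for i, w in enumerate(words):
            for A, p in lexical.get(w, {}).items():      # Pr(A -> w)
                best[(i, i + 1)][A] = p
        for s in range(2, n + 1):
            for i in range(0, n - s + 1):
                j = i + s
                for k in range(i + 1, j):
                    for (A, B, C), p in binary.items():  # Pr(A -> B C)
                        if B in best[(i, k)] and C in best[(k, j)]:
                            q = p * best[(i, k)][B] * best[(k, j)][C]
                            if q > best[(i, j)].get(A, 0.0):
                                best[(i, j)][A] = q
                                back[(i, j, A)] = (k, B, C)
        return best, back

Following the back pointers down from (0, n, S) recovers y∗; keeping the k best entries per cell instead of a single one gives N-best parsing.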



Learning PCFG Probabilities

• Given a treebank, with Maximum-Likelihood Estimation (MLE): Pr(A → α) = #(A → α) / #(A) — see the counting sketch below
• When the grammar is large (e.g. through lexicalization), smoothing is necessary to overcome data sparseness
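A minimal counting sketch of the MLE estimate, assuming the treebank has already been flattened into (LHS, RHS) rule occurrences:

    from collections import Counter

    def mle_pcfg(rule_occurrences):
        """MLE rule probabilities: Pr(A -> alpha) = #(A -> alpha) / #(A).
        rule_occurrences is an iterable of (lhs, rhs) pairs, one pair per
        occurrence in the treebank, e.g. ("NP", ("Det", "N"))."""
        rule_count = Counter(rule_occurrences)                        # #(A -> alpha)
        lhs_count = Counter(lhs for lhs, _ in rule_count.elements())  # #(A)
        return {(lhs, rhs): count / lhs_count[lhs]
                for (lhs, rhs), count in rule_count.items()}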



Inside-Outside Algorithm

• When there is no labeled data (treebank), the probabilities of a PCFG can be updated to maximize the likelihood of a set of unlabeled sentences:
  Pr∗ = argmax_{Pr} Π_x P(x) = argmax_{Pr} Π_x Σ_{y ∈ T(x)} P(x, y)
• An Expectation-Maximization procedure can be used to iteratively find Pr∗



Inside Probability

Definition

The inside probability β_j(p, q) is the probability that the sequence w_{p+1} … w_q is generated by a tree rooted in node N^j:
β_j(p, q) = P(w_{p+1} … w_q | N^j_{pq})

• β₁(0, n) = P(w₁ w₂ … w_n), where N¹ = S
• The calculation can be carried out bottom-up:
  β_j(k−1, k) = Pr(N^j → w_k), N^j ∈ V_N   (1)
  β_j(p, q) = Σ_{r,s} Σ_{d=p+1}^{q−1} Pr(N^j → N^r N^s) · β_r(p, d) · β_s(d, q)   (2)
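A minimal Python sketch of the bottom-up computation (1)–(2), using the same probabilistic grammar encoding as the Viterbi-CYK sketch earlier:

    from collections import defaultdict

    def inside(words, lexical, binary):
        """Inside probabilities for a CNF PCFG (a minimal sketch):
        beta[(p, q)][A] = P(words[p:q] | A)."""
        n = len(words)
        beta = defaultdict(lambda: defaultdict(float))
        for k, w in enumerate(words):                    # (1): beta_j(k-1, k)
            for A, p in lexical.get(w, {}).items():
                beta[(k, k + 1)][A] = p
        for s in range(2, n + 1):                        # (2): longer spans
            for p in range(0, n - s + 1):
                q = p + s
                for (A, B, C), prob in binary.items():   # sum over rules and splits d
                    for d in range(p + 1, q):
                        beta[(p, q)][A] += prob * beta[(p, d)][B] * beta[(d, q)][C]
        return beta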



Outside Probability

Definition

The outside probability α_j(p, q) is the total probability of beginning with the start symbol and generating N^j_{pq} and all the words outside w_{p+1} … w_q:
α_j(p, q) = P(w₁ … w_p, N^j_{pq}, w_{q+1} … w_n)

• N^j_{pq} means N^j ⇒∗ w_{p+1} … w_q
• P(w₁ w₂ … w_n, N^j_{pq}) = α_j(p, q) · β_j(p, q)
• P(w₁ w₂ … w_n) = Σ_j α_j(k−1, k) · Pr(N^j → w_k), for any k



Outside Probability (cont.)

• The calculation is top-down:
  α_j(0, n) = 1 if N^j = S, and 0 otherwise   (3)
  α_j(p, q) = Σ_{f,g} Σ_{e=q+1}^{n} α_f(p, e) · Pr(N^f → N^j N^g) · β_g(q, e)
            + Σ_{f,g} Σ_{e=0}^{p−1} α_f(e, q) · Pr(N^f → N^g N^j) · β_g(e, p)   (4)



Calculating Expected Counts

The expected number of times N^j is used in a derivation of the sentence w₁ … w_n:

E[N^j | w₁ … w_n] = Σ_{p=0}^{n−1} Σ_{q=p+1}^{n} P(N^j_{pq} | w₁ … w_n)   (5)
                  = Σ_{p=0}^{n−1} Σ_{q=p+1}^{n} P(N^j_{pq}, w₁ … w_n) / P(w₁ … w_n)
                  = Σ_{p=0}^{n−1} Σ_{q=p+1}^{n} α_j(p, q) · β_j(p, q) / P(w₁ … w_n)



Calculating Expected Counts (cont.)

The expected number of times the rule N^j → N^r N^s is used in a derivation of the sentence w₁ … w_n:

E[N^j → N^r N^s | w₁ … w_n] = Σ_{p=0}^{n−1} Σ_{q=p+1}^{n} P(N^j_{pq}, N^j → N^r N^s | w₁ … w_n)   (6)
  = [Σ_{p=0}^{n−1} Σ_{q=p+1}^{n} Σ_{d=p+1}^{q−1} α_j(p, q) · Pr(N^j → N^r N^s) · β_r(p, d) · β_s(d, q)] / P(w₁ … w_n)



Update Formula

For a single sentence, rule probabilities can be re-estimated as

P̂r(N^j → N^r N^s) = E[N^j → N^r N^s, N^j | w₁ … w_n] / E[N^j | w₁ … w_n]   (7)
  = [Σ_{p=0}^{n−1} Σ_{q=p+1}^{n} Σ_{d=p+1}^{q−1} α_j(p, q) · Pr(N^j → N^r N^s) · β_r(p, d) · β_s(d, q)]
    / [Σ_{p=0}^{n−1} Σ_{q=p+1}^{n} α_j(p, q) · β_j(p, q)]

Similarly, for unary rules,

P̂r(N^j → w^k) = [Σ_{h=1}^{n} α_j(h−1, h) · P(w_h = w^k) · β_j(h−1, h)]
    / [Σ_{p=0}^{n−1} Σ_{q=p+1}^{n} α_j(p, q) · β_j(p, q)]   (8)



Multiple Training Sentences

For each sentence w⃗_i in the training corpus, define

f_i(p, q, j, r, s) = [Σ_{d=p+1}^{q−1} α_j(p, q) · Pr(N^j → N^r N^s) · β_r(p, d) · β_s(d, q)] / P(w₁ … w_n)   (9)
g_i(h, j, k) = α_j(h−1, h) · P(w_h = w^k) · β_j(h−1, h) / P(w₁ … w_n)   (10)
h_i(p, q, j) = α_j(p, q) · β_j(p, q) / P(w₁ … w_n)   (11)

then

P̂r(N^j → N^r N^s) = [Σ_{i=1}^{m} Σ_{p=0}^{n_i−1} Σ_{q=p+1}^{n_i} f_i(p, q, j, r, s)]
    / [Σ_{i=1}^{m} Σ_{p=0}^{n_i−1} Σ_{q=p+1}^{n_i} h_i(p, q, j)]   (12)

P̂r(N^j → w^k) = [Σ_{i=1}^{m} Σ_{h=1}^{n_i} g_i(h, j, k)]
    / [Σ_{i=1}^{m} Σ_{p=0}^{n_i−1} Σ_{q=p+1}^{n_i} h_i(p, q, j)]   (13)



Inside-Outside Algorithm

Initialize an arbitrary set of rule probabilities Pr₀
repeat
    F = G = H ← 0
    for each sentence w⃗^k = w^k₁ … w^k_n in the corpus do
        calculate the inside probabilities β_j(p, q)
        calculate the outside probabilities α_j(p, q)
        accumulate the counts F, G and H
    end for
    update the rule probabilities Pr_{i+1}(N^j → N^r N^s) and Pr_{i+1}(N^j → w^h)
until |P_{Pr_{i+1}}(W) − P_{Pr_i}(W)| ≤ ε
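A Python sketch of this loop, reusing inside from above and assuming a hypothetical outside(words, lexical, binary, beta) that returns α in the same nested-dictionary format; F, G and H accumulate the quantities (9)–(11), and the update step applies (12)–(13):

    from collections import defaultdict

    def inside_outside(corpus, lexical, binary, start="S", iterations=20):
        """EM re-estimation for a CNF PCFG (a minimal sketch; outside() is an
        assumed helper implementing (3)-(4), not shown). Updates the rule
        probabilities in place."""
        for _ in range(iterations):
            F = defaultdict(float)   # expected counts of N^j -> N^r N^s
            G = defaultdict(float)   # expected counts of N^j -> w^k
            H = defaultdict(float)   # expected counts of N^j
            for words in corpus:
                n = len(words)
                beta = inside(words, lexical, binary)
                alpha = outside(words, lexical, binary, beta)  # assumed, eqs. (3)-(4)
                Z = beta[(0, n)][start]                        # P(w_1 ... w_n)
                if Z == 0.0:
                    continue                                   # sentence not covered
                for (p, q), cells in list(beta.items()):       # eq. (11)
                    for A, b in cells.items():
                        H[A] += alpha[(p, q)][A] * b / Z
                for (A, B, C), prob in binary.items():         # eq. (9)
                    for p in range(n - 1):
                        for q in range(p + 2, n + 1):
                            for d in range(p + 1, q):
                                F[(A, B, C)] += (alpha[(p, q)][A] * prob *
                                                 beta[(p, d)][B] * beta[(d, q)][C]) / Z
                for k, w in enumerate(words):                  # eq. (10)
                    for A, p in lexical.get(w, {}).items():
                        G[(A, w)] += alpha[(k, k + 1)][A] * p / Z
            for (A, B, C) in binary:                           # update step, eq. (12)
                binary[(A, B, C)] = F[(A, B, C)] / H[A] if H[A] else 0.0
            for w in lexical:                                  # update step, eq. (13)
                for A in lexical[w]:
                    lexical[w][A] = G[(A, w)] / H[A] if H[A] else 0.0
        return lexical, binary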



Outline

Overview
Basic Parsing Algorithms
    Parsing Strategies
    CYK Algorithm
    Earley’s Algorithm
Parsing with Probabilistic Context-Free Grammar
    PCFG
    Inside-Outside Algorithm
Recent Advances in Parsing Technology



Statistical Constituent Parsing

• Collins parser [Collins, 1997]
• Reranking model [Charniak and Johnson, 2005]
• Self-training [McClosky et al., 2006]
• Latent-variable PCFG [Petrov et al., 2006]



Statistical Dependency Parsing

• Graph-based approach [Eisner, 1996, McDonald et al., 2005]
    – Edge-factored scoring model
    – Efficient algorithms to find the maximum spanning tree
    – Allows non-projective dependency structures
• Transition-based approach [Nivre et al., 2007, Sagae and Tsujii, 2008]
    – (Near-)deterministic parsing
    – Projective/pseudo-projective parsing



Parsing with Richer Formalisms

• TAG
• CCG
• LFG
• HPSG



Parser Evaluation

• Evaluation against a “gold standard”
    – e.g. PARSEVAL
• Application-based evaluation



Domain Adaptability and Multilinguality

• Statistical parsing models usually perform well in in-domain tests but suffer a significant accuracy drop when tested on out-of-domain data
• Differences between languages (morphology, word order, etc.) require different parsing models



Open Questions

• How relevant is linguistic study to the development of parsers?
• How do we evaluate a parser?
• How do we make trade-offs between adequacy, accuracy, and efficiency?



References I

Charniak, E. and Johnson, M. (2005). Coarse-to-fine n-best parsing and MaxEnt discriminative reranking. In Proceedings of the 43rd Annual Meeting of the Association for Computational Linguistics (ACL ’05), pages 173–180, Ann Arbor, Michigan.

Collins, M. (1997). Three generative, lexicalised models for statistical parsing. In Proceedings of the 35th Annual Meeting of the Association for Computational Linguistics, pages 16–23, Madrid, Spain.

Eisner, J. (1996). Three new probabilistic models for dependency parsing: An exploration. In Proceedings of the 16th International Conference on Computational Linguistics (COLING-96), pages 340–345, Copenhagen, Denmark.



References II

McClosky, D., Charniak, E., and Johnson, M. (2006). Effective self-training for parsing. In Proceedings of HLT-NAACL 2006, pages 152–159, New York, USA.

McDonald, R., Pereira, F., Ribarov, K., and Hajič, J. (2005). Non-projective dependency parsing using spanning tree algorithms. In Proceedings of HLT-EMNLP 2005, pages 523–530, Vancouver, Canada.

Nivre, J., Nilsson, J., Hall, J., Chanev, A., Eryiğit, G., Kübler, S., Marinov, S., and Marsi, E. (2007). MaltParser: A language-independent system for data-driven dependency parsing. Natural Language Engineering, 13(1):1–41.



References III

Petrov, S., Barrett, L., Thibaux, R., and Klein, D. (2006). Learning accurate, compact, and interpretable tree annotation. In Proceedings of the 21st International Conference on Computational Linguistics and 44th Annual Meeting of the Association for Computational Linguistics, pages 433–440, Sydney, Australia.

Sagae, K. and Tsujii, J. (2008). Shift-reduce dependency DAG parsing. In COLING ’08: Proceedings of the 22nd International Conference on Computational Linguistics, pages 753–760, Manchester, UK.
