SLIDE 1

STATISTICAL PARSING

PCFGs, probabilistic CYK, dependency parsing

  • Jurafsky, D. and Martin, J. H. (2009): Speech and Language Processing. An Introduction to Natural Language Processing, Computational Linguistics and Speech Recognition. Second Edition. Pearson: New Jersey. Chapter 14.
  • Manning, C. D. and Schütze, H. (1999): Foundations of Statistical Natural Language Processing. MIT Press: Cambridge, Massachusetts. Chapters 11, 12.
  • With further examples by Ray Mooney, UT Austin.

SLIDE 2

Statistical Parsing

SLIDE 3

Statistical Parsing

  • Statistical parsing uses a probabilistic model of syntax in order to assign a probability to each parse tree.
  • Provides a principled approach to resolving syntactic ambiguity.
  • Allows supervised learning of parsers from treebanks of parse trees provided by human linguists.
  • Also allows unsupervised learning of parsers from unannotated text, but the accuracy of such parsers has been limited.

SLIDE 4

Probabilistic Context Free Grammar (PCFG)

A probabilistic context-free grammar PCFG = (W, N, N1, R, P) consists of

  • a terminal vocabulary W = {w1, …, wV}
  • a set of non-terminals N = {N1, …, Nn}
  • a start symbol N1 ∈ N
  • a set of rules R = {Ni → Dj}, where Dj is a sequence over W ∪ N
  • a corresponding set of probabilities P on rules, such that the probabilities per LHS sum to 1

  • A PCFG is a probabilistic version of a CFG in which each production has a probability.
  • The probabilities of all productions rewriting a given non-terminal must sum to 1, defining a distribution for each non-terminal.
  • String generation is now probabilistic: production probabilities are used to non-deterministically select a production for rewriting a given non-terminal.
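A minimal sketch of this definition in Python (an illustration, not from the slides; the class and method names are assumptions). It stores rules as (LHS, RHS-tuple) pairs and checks the sum-to-1 constraint per left-hand side:

    from collections import defaultdict

    class PCFG:
        """Minimal PCFG container: rules maps (lhs, rhs-tuple) -> probability."""
        def __init__(self, start="S"):
            self.start = start
            self.rules = {}

        def add(self, lhs, rhs, prob):
            self.rules[(lhs, tuple(rhs))] = prob

        def bad_distributions(self, tol=1e-9):
            """Return the non-terminals whose rule probabilities do not sum to 1."""
            totals = defaultdict(float)
            for (lhs, _), p in self.rules.items():
                totals[lhs] += p
            return {lhs: t for lhs, t in totals.items() if abs(t - 1.0) > tol}

    g = PCFG()
    g.add("S", ["NP", "VP"], 0.8)
    g.add("S", ["Aux", "NP", "VP"], 0.1)
    g.add("S", ["VP"], 0.1)
    assert g.bad_distributions() == {}  # the three S-rules form a distribution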

SLIDE 5

Simple PCFG for a subset of English

Grammar                      Prob.    Lexicon                                   Prob.

S → NP VP                    0.8      Det → the | a | that | this              0.6 0.2 0.1 0.1
S → Aux NP VP                0.1      Noun → book | flight | meal | money      0.1 0.5 0.2 0.2
S → VP                       0.1      Verb → book | include | prefer           0.5 0.2 0.3
NP → Pronoun                 0.2      Pronoun → I | he | she | me              0.5 0.1 0.1 0.3
NP → Proper-Noun             0.2      Proper-Noun → Houston | NWA              0.8 0.2
NP → Det Nominal             0.6      Aux → does                               1.0
Nominal → Noun               0.3      Prep → from | to | on | near | through   0.25 0.25 0.1 0.2 0.2
Nominal → Nominal Noun       0.2
Nominal → Nominal PP         0.5
VP → Verb                    0.2
VP → Verb NP                 0.5
VP → VP PP                   0.3
PP → Prep NP                 1.0

(The probabilities for each left-hand side sum to 1.)

SLIDE 6

Derivation Probability

  • Assume productions for each node are chosen independently.
  • The probability of a derivation is the product of the probabilities of its productions.

P(T1) = 0.1 × 0.5 × 0.5 × 0.6 × 0.6 × 0.5 × 0.3 × 1.0 × 0.2 × 0.2 × 0.5 × 0.8 = 2.16e-5

T1 (each constituent annotated with the probability of the production that expands it):

(S 0.1
  (VP 0.5
    (Verb book 0.5)
    (NP 0.6
      (Det the 0.6)
      (Nominal 0.5
        (Nominal 0.3 (Noun flight 0.5))
        (PP 1.0 (Prep through 0.2) (NP 0.2 (Proper-Noun Houston 0.8)))))))
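A small sketch that recomputes P(T1) by walking the tree and multiplying the probability of every production used. The encoding is an assumption: internal nodes as (label, children) pairs, leaves as strings.

    def derivation_prob(tree, rule_probs):
        """Product of the probabilities of all productions used in the tree."""
        if isinstance(tree, str):          # a terminal contributes no rule
            return 1.0
        label, children = tree
        rhs = tuple(c if isinstance(c, str) else c[0] for c in children)
        p = rule_probs[(label, rhs)]
        for child in children:
            p *= derivation_prob(child, rule_probs)
        return p

    pp = ("PP", [("Prep", ["through"]), ("NP", [("Proper-Noun", ["Houston"])])])
    nominal = ("Nominal", [("Nominal", [("Noun", ["flight"])]), pp])
    np = ("NP", [("Det", ["the"]), nominal])
    T1 = ("S", [("VP", [("Verb", ["book"]), np])])

    probs = {("S", ("VP",)): 0.1, ("VP", ("Verb", "NP")): 0.5,
             ("Verb", ("book",)): 0.5, ("NP", ("Det", "Nominal")): 0.6,
             ("Det", ("the",)): 0.6, ("Nominal", ("Nominal", "PP")): 0.5,
             ("Nominal", ("Noun",)): 0.3, ("Noun", ("flight",)): 0.5,
             ("PP", ("Prep", "NP")): 1.0, ("Prep", ("through",)): 0.2,
             ("NP", ("Proper-Noun",)): 0.2, ("Proper-Noun", ("Houston",)): 0.8}

    print(derivation_prob(T1, probs))      # ≈ 2.16e-05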

SLIDE 7

Syntactic Disambiguation

  • Resolve ambiguity by picking most probable parse tree.

P(T2) = 0.1 × 0.3 × 0.5 × 0.6 × 0.5 × 0.6 × 0.3 × 1.0 × 0.5 × 0.2 × 0.2 × 0.8 = 1.296e-5

T2:

(S 0.1
  (VP 0.3
    (VP 0.5
      (Verb book 0.5)
      (NP 0.6 (Det the 0.6) (Nominal 0.3 (Noun flight 0.5))))
    (PP 1.0 (Prep through 0.2) (NP 0.2 (Proper-Noun Houston 0.8)))))

SLIDE 8

Sentence Probability

  • The probability of a sentence is the sum of the probabilities of all of its derivations:

P("book the flight through Houston") = P(T1) + P(T2) = 2.16e-5 + 1.296e-5 = 3.456e-5

[Trees T1 and T2 as on Slides 6 and 7]

SLIDE 9

Three Tasks for PCFGs

  • Observation likelihood: given a PCFG, how do we efficiently compute the probability of a sentence?
  • Most likely derivation: given a PCFG and a sentence, how do we find the derivation that best explains the sentence?
  • Training: given a set of sentences and a space of possible PCFGs, how do we find the PCFG parameters that best explain the observations?

Sound familiar? (These are the analogs of the three classic HMM problems.)

SLIDE 10

Probabilistic CKY

  • An analog to the Viterbi algorithm efficiently determines the most probable derivation (parse tree) for a sentence.
  • CKY can be modified for PCFG parsing by including in each cell a probability for each non-terminal.
  • Cell [i,j] must retain the most probable derivation of each constituent (non-terminal) covering words i+1 through j, together with its associated probability.
  • When transforming the grammar to CNF, production probabilities must be set so as to preserve the probability of derivations.
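A compact sketch of probabilistic CKY under an assumed encoding: lex_rules maps a word to (non-terminal, probability) pairs, and bin_rules lists CNF binary rules as (lhs, B, C, probability). With the CNF grammar of the next slide it yields .0000216 for the example sentence worked through on Slides 12-21.

    from collections import defaultdict

    def pcky(words, lex_rules, bin_rules, start="S"):
        """Viterbi-style CKY over a CNF PCFG. best[(i, j)][NT] holds the
        probability of the best derivation of NT covering words i+1..j."""
        n = len(words)
        best = defaultdict(dict)
        back = {}                                    # backpointers for tree recovery
        for j, w in enumerate(words, start=1):       # fill length-1 spans
            for nt, p in lex_rules.get(w, []):
                best[(j - 1, j)][nt] = p
                back[((j - 1, j), nt)] = w
        for span in range(2, n + 1):                 # longer spans, bottom-up
            for i in range(n - span + 1):
                j = i + span
                for k in range(i + 1, j):            # split point
                    for lhs, b, c, p in bin_rules:
                        cand = p * best[(i, k)].get(b, 0.0) * best[(k, j)].get(c, 0.0)
                        if cand > best[(i, j)].get(lhs, 0.0):
                            best[(i, j)][lhs] = cand
                            back[((i, j), lhs)] = (k, b, c)
        return best[(0, n)].get(start, 0.0), back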

SLIDE 11

Probabilistic conversion to CNF

Original Grammar                  Chomsky Normal Form

S → NP VP              0.8        S → NP VP                                 0.8
S → Aux NP VP          0.1        S → X1 VP                                 0.1
                                  X1 → Aux NP                               1.0
S → VP                 0.1        S → book | include | prefer               0.01 0.004 0.006
                                  S → Verb NP                               0.05
                                  S → VP PP                                 0.03
NP → Pronoun           0.2        NP → I | he | she | me                    0.1 0.02 0.02 0.06
NP → Proper-Noun       0.2        NP → Houston | NWA                        0.16 0.04
NP → Det Nominal       0.6        NP → Det Nominal                          0.6
Nominal → Noun         0.3        Nominal → book | flight | meal | money    0.03 0.15 0.06 0.06
Nominal → Nominal Noun 0.2        Nominal → Nominal Noun                    0.2
Nominal → Nominal PP   0.5        Nominal → Nominal PP                      0.5
VP → Verb              0.2        VP → book | include | prefer              0.1 0.04 0.06
VP → Verb NP           0.5        VP → Verb NP                              0.5
VP → VP PP             0.3        VP → VP PP                                0.3
PP → Prep NP           1.0        PP → Prep NP                              1.0

SLIDE 12

Probabilistic CKY Parsing

[Lexicon and CNF grammar as on Slide 11; CKY chart for "Book the flight through Houston" being filled in]

  [Book]         S: .01, VP: .1, Verb: .5, Nominal: .03, Noun: .1
  [the]          Det: .6
  [flight]       Nominal: .15, Noun: .5
  [Book the]     none
  [the flight]   NP = .6 × .6 × .15 = .054
SLIDE 13

Probabilistic CKY Parsing (continued)

New cell [Book the flight]: VP = .5 × .5 × .054 = .0135
SLIDE 14

Probabilistic CKY Parsing (continued)

New cell [Book the flight]: S = .05 × .5 × .054 = .00135
SLIDE 15

Probabilistic CKY Parsing (continued)

New cells: [through] Prep = .2; the other new spans ending at "through" are empty (none)
SLIDE 16

Probabilistic CKY Parsing (continued)

New cells: [Houston] NP = .16, PropNoun = .8; [through Houston] PP = 1.0 × .2 × .16 = .032
SLIDE 17

Probabilistic CKY Parsing (continued)

New cell [flight through Houston]: Nominal = .5 × .15 × .032 = .0024
SLIDE 18

Probabilistic CKY Parsing (continued)

New cell [the flight through Houston]: NP = .6 × .6 × .0024 = .000864
SLIDE 19

Probabilistic CKY Parsing (continued)

New cell [Book the flight through Houston]: S = .05 × .5 × .000864 = .0000216 (via S → Verb NP)
SLIDE 20

Probabilistic CKY Parsing (continued)

Second derivation for [Book the flight through Houston]: S = .03 × .0135 × .032 = .00001296 (via S → VP PP)
SLIDE 21

Probabilistic CKY Parsing (continued)

Pick the most probable parse, i.e. take the max to combine the probabilities of multiple derivations of each constituent in each cell: the top cell keeps S = max(.0000216, .00001296) = .0000216.
SLIDE 22

PCFG: Observation likelihood

  • There is an analog of the Forward algorithm for HMMs, called the Inside algorithm, for efficiently determining how likely a string is to be produced by a PCFG.
  • A PCFG can therefore be used as a syntax-based language model to choose between alternative sentences for speech recognition or machine translation.

Grammar:

S → NP VP      0.9        A → ε           0.6
S → VP         0.1        A → Adj A       0.4
NP → Det A N   0.5        PP → Prep NP    1.0
NP → NP PP     0.3        VP → V NP       0.7
NP → PropN     0.2        VP → VP PP      0.3

O1 = "The dog big barked."
O2 = "The big dog barked."

Is P(O2 | Grammar) > P(O1 | Grammar)?

SLIDE 23

Inside Algorithm

  • Like CKY for PCFGs, but sum the probabilities of multiple derivations per constituent instead of taking the max.

[CKY chart for "Book the flight through Houston" as on Slides 12-21, with the two S derivations in the top cell combined by summation:]

S: .00001296 + .0000216 = .00003456

Sum the probabilities of multiple derivations of each constituent in each cell.
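A sketch of the Inside algorithm under the same assumed encoding as the CKY sketch above; the only change is that derivations are summed (+=) rather than maximized. On the example sentence it returns .0000216 + .00001296 = .00003456, matching the chart.

    from collections import defaultdict

    def inside_prob(words, lex_rules, bin_rules, start="S"):
        """Total probability of the sentence: the same chart traversal as
        probabilistic CKY, but derivations are summed, not maximized."""
        n = len(words)
        chart = defaultdict(float)               # (i, j, NT) -> summed probability
        for j, w in enumerate(words, start=1):
            for nt, p in lex_rules.get(w, []):
                chart[(j - 1, j, nt)] += p
        for span in range(2, n + 1):
            for i in range(n - span + 1):
                j = i + span
                for k in range(i + 1, j):
                    for lhs, b, c, p in bin_rules:
                        chart[(i, j, lhs)] += p * chart[(i, k, b)] * chart[(k, j, c)]
        return chart[(0, n, start)]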

SLIDE 24

PCFG: Supervised training

  • If parse trees are provided for training sentences, a grammar and its parameters can all be estimated directly from counts accumulated from the treebank (with appropriate smoothing).

[Figure: treebank of parse trees for sentences such as "John put the dog in the pen" → supervised PCFG training → grammar with rule probabilities, as on Slide 22]

SLIDE 25

Treebanks

  • analog to POS-annotated corpora, but for syntax trees
  • English Penn Treebank: the standard corpus for testing syntactic parsing; consists of 1.2M words of text from the Wall Street Journal (WSJ)
    – Typical to train on about 40,000 parsed sentences and test on an additional standard disjoint test set of 2,416 sentences.

https://dingo.sbs.arizona.edu/~sandiway/treebankviewer/index.html

SLIDE 26

Penn Treebank Bracketed Format

Every production rule is represented by

  • an opening bracket "("
  • the left-hand side
  • the sequence of right-hand-side symbols
    – non-terminals expanded by further production rules
    – terminals
  • a closing bracket ")"

Traces: -NONE- and a trace number

( (S (NP-SBJ (DT The) (NNP Illinois) (NNP Supreme) (NNP Court) )
     (VP (VBD ordered)
         (NP-1 (DT the) (NN commission) )
         (S (NP-SBJ (-NONE- *-1) )
            (VP (TO to)
                (VP (VP (VB audit)
                        (NP (NP (NNP Commonwealth) (NNP Edison) (POS 's) )
                            (NN construction) (NNS expenses) ))
                    (CC and)
                    (VP (VB refund)
                        (NP (DT any) (JJ unreasonable) (NNS expenses) ))))))
     (. .) ))

SLIDE 27

Estimating Probabilities of Productions

  • The set of production rules can be taken directly from the set of rewrites in the treebank.
  • Parameters can be directly estimated from frequency counts in the treebank.
  • This might result in a grammar that linguists do not like:
    – e.g. Penn Treebank: flat, long RHSs
    – no recursion: will have rules like NP → Det NN, NP → Det JJ NN, NP → Det JJ JJ NN, NP → Det JJ JJ JJ NN, …

P(α → β | α) = C(α → β) / Σγ C(α → γ) = C(α → β) / C(α)
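A sketch of this relative-frequency estimate over a treebank, with trees assumed encoded as (label, children) pairs as in the earlier sketch:

    from collections import Counter

    def estimate_rule_probs(trees):
        """P(a -> b | a) = C(a -> b) / C(a), counted over a treebank."""
        rule_counts, lhs_counts = Counter(), Counter()
        def visit(node):
            if isinstance(node, str):          # terminal: nothing to count
                return
            label, children = node
            rhs = tuple(c if isinstance(c, str) else c[0] for c in children)
            rule_counts[(label, rhs)] += 1
            lhs_counts[label] += 1
            for child in children:
                visit(child)
        for tree in trees:
            visit(tree)
        return {rule: n / lhs_counts[rule[0]] for rule, n in rule_counts.items()}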

SLIDE 28
Vanilla PCFG Limitations

  • Independence assumptions miss structural dependencies between rules.
  • Since the probabilities of productions do not rely on specific words or concepts, only general structural disambiguation is possible.
  • Consequently, vanilla PCFGs cannot resolve syntactic ambiguities that require semantics to resolve, e.g. ate with fork vs. meatballs.
  • In order to work well, PCFGs must be lexicalized, i.e. productions must be specialized to specific words by including their head word in their LHS non-terminals (e.g. VP-ate).
  • A general preference for attaching PPs to NPs rather than VPs can be learned by a vanilla PCFG, but the desired preference can depend on specific words.

[Figure: a PCFG parser with the Slide-22 grammar must choose between attaching "in the pen" to the NP "the dog" or to the VP in "John put the dog in the pen."]

SLIDE 29

Parsing Evaluation Metrics

  • PARSEVAL metrics measure the fraction of the constituents that match between the computed and human parse trees. If P is the system's parse tree and T is the human parse tree (the "gold standard"):
    – Recall = (# correct constituents in P) / (# constituents in T)
    – Precision = (# correct constituents in P) / (# constituents in P)
    – F1: the harmonic mean of precision and recall
  • A constituent is here understood as the word span produced by a non-terminal.
  • Labeled precision and labeled recall require the non-terminal label on the constituent node to be correct for the constituent to count as correct.
  • Unlabeled precision and unlabeled recall compare only the tree structure.

Current state of the art: around 90% labeled F1 on well-behaved treebanks.
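A minimal sketch of the metrics, with each labeled constituent represented as a (label, start, end) triple (an assumed encoding); on the example of the next slide it gives precision = recall = F1 = 83.3%.

    def parseval(gold, system):
        """gold, system: non-empty sets of (label, start, end) constituents.
        For unlabeled scores, drop the label from the triples."""
        correct = len(gold & system)
        recall = correct / len(gold)
        precision = correct / len(system)
        f1 = 2 * precision * recall / (precision + recall) if correct else 0.0
        return precision, recall, f1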

SLIDE 30

Parsing Evaluation Metric Example

Correct Tree T:

(S (VP (Verb book)
       (NP (Det the)
           (Nominal (Nominal (Noun flight))
                    (PP (Prep through)
                        (NP (Proper-Noun Houston)))))))

Computed Tree P:

(S (VP (VP (Verb book)
           (NP (Det the)
               (Nominal (Noun flight))))
       (PP (Prep through)
           (NP (Proper-Noun Houston)))))

# Constituents in T: 12    # Constituents in P: 12    # Correct constituents: 10
Recall = 10/12 = 83.3%    Precision = 10/12 = 83.3%    F1 = 83.3%

Note: PP is correct for the span "through Houston"; VP is correct for the span "book … Houston".

SLIDE 31

Constituency vs Dependency Parsing


  • “No phrases”
  • Abstracts away word-order information
  • Verb is usually the root
SLIDE 32

Dependency Parsing

  • Alternative to phrase-structure grammar: define a parse as a directed graph between the words of a sentence, representing dependencies between words.
  • No nodes for phrasal structure.
  • A phrase-structure parse can be converted to a dependency tree by making the head of each non-head child of a node depend on the head of the head child.

[Figure: phrase-structure tree for "John liked the dog in the pen" with lexical heads percolated (John-NNP, liked-VBD, dog-NN, in-IN, pen-NN), and the resulting unlabeled dependency tree and labeled dependency tree with arcs nsubj(liked, John), dobj(liked, dog), det(dog, the), det(pen, the)]

SLIDE 33

Intuition behind Dependency Parsing

  • Syntactic structure consists of lexical items, linked by binary asymmetric relations called dependencies.
  • The superior (start of the arc) is called the head, the inferior is called the dependent.

Dependency grammars explicitly represent:

  • head-dependent relations (directed arcs from head to dependent)
  • functional categories (arc labels)
  • possibly structural categories like POS

SLIDE 34

Two main classes of dependency relations

  • Clausal relations describe syntactic roles with respect to a predicate (often a verb), e.g.:
    – NSUBJ
    – DOBJ
    – IOBJ
    – …
  • Modifier relations categorize the ways that words can modify their heads, e.g.:
    – NMOD
    – DET
    – CASE
    – …

SLIDE 35

Common types of dependency relations

SLIDE 36

Common types of dependency relations

SLIDE 37

Dependency Parsing: Notational Variants

[Figure: alternative notations for drawing dependency arcs; the head-to-dependent arrow notation is the one used in this lecture]
SLIDE 38

Criteria for heads and dependents

Criteria for a syntactic relation between a head H and a dependent D in a construction C:

1. H determines the syntactic category of C; H can replace C.
2. H determines the semantic category of C; D specifies H.
3. H is obligatory; D may be optional.
4. H selects D and determines whether D is obligatory.
5. The form of D depends on H (agreement or government).
6. The linear position of D is specified with reference to H.

Issues:

  • Syntactic (and morphological) versus semantic criteria
  • Exocentric versus endocentric constructions

SLIDE 39

Properties of a dependency tree

A dependency graph G = (V, E) with word nodes V and directed edges E is a dependency tree if:

1. There is a single designated root node that has no incoming arcs.
2. With the exception of the root node, each vertex has exactly one incoming arc.
3. There is a unique path from the root node to each vertex in V.
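A sketch that checks these three conditions, assuming the tree is encoded as a head map from word position to head position, with 0 standing for an artificial ROOT:

    def is_well_formed(heads):
        """heads: {dependent: head}; head 0 denotes the designated root."""
        if sum(1 for h in heads.values() if h == 0) != 1:
            return False                   # condition 1: exactly one root
        # condition 2 holds by construction: a dict gives each vertex one head
        for node in heads:                 # condition 3: unique path from root
            seen = set()
            while node != 0:
                if node in seen:           # a cycle means no path from the root
                    return False
                seen.add(node)
                node = heads[node]
        return True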

SLIDE 40

Projectivity of a dependency tree

  • Projectivity is an additional constraint on a dependency tree.
  • An arc from a head to a dependent is projective if there is a path from the head to every word that lies between the head and the dependent in the sentence.
  • In a non-projective dependency graph, arcs cross one another; a test for this is sketched below.
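A sketch of a projectivity test under the same head-map encoding: the tree is projective iff no two arcs, viewed as spans over word positions, strictly interleave.

    def is_projective(heads):
        """heads: {dependent: head}, positions 1-based, 0 = ROOT."""
        spans = [tuple(sorted((h, d))) for d, h in heads.items()]
        for idx, (l1, r1) in enumerate(spans):
            for l2, r2 in spans[idx + 1:]:
                # strictly interleaved spans correspond to crossing arcs
                if l1 < l2 < r1 < r2 or l2 < l1 < r2 < r1:
                    return False
        return True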

SLIDE 41

Transition-based dependency parsing


  • Based on shift-reduce parsing, originally designed for the analysis of programming languages (Aho and Ullman, 1972):
    – a context-free grammar (CFG);
    – a stack;
    – a list of tokens to be parsed.
  • Changes by Nivre (2003):
    – no grammar;
    – Reduce generates a dependency.
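A sketch of the resulting parsing loop in its arc-standard variant (one common instantiation; the exact transition system defined on the following slides may differ). Word positions are 1-based, 0 is ROOT, and oracle is any function from a configuration to one of the three actions; the sketch assumes the oracle only proposes legal actions.

    def transition_parse(n_words, oracle):
        """Run SHIFT / LEFTARC / RIGHTARC until the buffer is empty and only
        ROOT remains on the stack; returns the (head, dependent) arcs."""
        stack, buffer, arcs = [0], list(range(1, n_words + 1)), []
        while buffer or len(stack) > 1:
            action = oracle(stack, buffer, arcs)
            if action == "SHIFT":
                stack.append(buffer.pop(0))
            elif action == "LEFTARC":          # top becomes head of second
                dependent = stack.pop(-2)
                arcs.append((stack[-1], dependent))
            else:                              # RIGHTARC: second becomes head of top
                dependent = stack.pop()
                arcs.append((stack[-1], dependent))
        return arcs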

SLIDE 42

Three actions for Transition-based parsing

SLIDE 43

Three actions for Transition-based parsing

SLIDE 44

A generic transition-based dependency parser

SLIDE 45

Example of transition-based parsing

SLIDE 46

Oracle Approximation by Machine Learning

  • Data-driven deterministic parsing:
    – deterministic parsing needs an oracle that tells us which of the possible steps to take
    – an oracle can be approximated by a classifier
    – the classifier can be trained from treebank data
  • Learning method for dependency parsing: approximate a function from parser state to parser action. Classifiers used:
    – Support Vector Machines
    – Memory-based learning
    – Maximum Entropy modeling
  • Typical features:
    – word and POS of tokens on top of the stack and next in the queue
    – word and POS of tokens at certain distances and in structural relations
    – dependency types of heads, left/right children, siblings of tokens
  • Results come very close to PCFG-based parsing and are obtained much faster.

SLIDE 47


Generation of training data for Oracle in a parser

SLIDE 48

Generation of training data for Oracle in a parser

More formally, an algorithm for generating oracle training data runs the parser over each treebank sentence, at every configuration choosing the action that is consistent with the gold tree.
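A sketch of such an algorithm for the arc-standard system sketched earlier (an assumed instantiation, not necessarily the one on the slide): the gold tree determines the correct action at each configuration, and the resulting (configuration, action) pairs become the classifier's training data.

    def oracle_action(stack, buffer, arcs, gold_heads):
        """Static oracle: gold_heads maps dependent -> head; arcs holds the
        (head, dependent) pairs built so far."""
        if len(stack) >= 2:
            top, second = stack[-1], stack[-2]
            if gold_heads.get(second) == top:
                return "LEFTARC"
            if gold_heads.get(top) == second and all(
                    (top, d) in arcs
                    for d, h in gold_heads.items() if h == top):
                return "RIGHTARC"          # only once top has all its dependents
        return "SHIFT"

Plugging oracle_action (with gold_heads bound, e.g. via a lambda) into the transition_parse loop above replays the gold derivation while the configuration-action pairs are recorded.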

SLIDE 49

Generation of training data for Oracle in a parser


SLIDE 50

Features for Oracle in a dependency parser

[Figure: a parser configuration (stack, buffer, partial arcs), the feature templates defined over it, and concrete feature examples]
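A sketch instantiating a few such templates (the feature names are hypothetical; real parsers define many more). words and tags are lists indexed by position, with index 0 reserved for ROOT:

    def extract_features(stack, buffer, words, tags):
        """Word/POS of the stack top and buffer front, plus one combined
        template."""
        feats = {}
        if stack:
            s1 = stack[-1]
            feats["s1.word"] = words[s1] if s1 else "ROOT"
            feats["s1.pos"] = tags[s1] if s1 else "ROOT"
        if buffer:
            b1 = buffer[0]
            feats["b1.word"] = words[b1]
            feats["b1.pos"] = tags[b1]
        if stack and buffer:                   # pair template over POS tags
            feats["s1.pos+b1.pos"] = feats["s1.pos"] + "+" + feats["b1.pos"]
        return feats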

SLIDE 51

Neural Dependency Parser (Chen & Manning, 2014)

Chen, Danqi and Christopher D. Manning (2014): A fast and accurate dependency parser using neural networks. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP).

[Table: accuracy and parsing speed on PTB + Stanford dependencies]

SLIDE 52

Neural Dependency Parser (Chen & Manning, 2014)

[Figure: architecture of the Chen & Manning (2014) parser]

SLIDE 53

Neural Dependency Parser (Chen & Manning, 2014)

[Table: accuracy and parsing speed on PTB + Stanford dependencies]

SLIDE 54

Neural Dependency Parsing (Dyer et al., 2015)

  • Transitions learned by three "stack" LSTMs:
    – can push/pop elements
    – maintain a continuous representation of the stack state
    – 'infinite' history and lookahead

S: stack, collects the parse; A: parser actions; B: buffer of words

Dyer, Chris, Miguel Ballesteros, Wang Ling, Austin Matthews, and Noah A. Smith (2015): Transition-based dependency parsing with stack long short-term memory. In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics, pages 334–343.

SLIDE 55

Multilingual Dependency Parsing

  • 2006 CoNLL-X shared task: 12 languages
  • Data sources: dependency treebanks and phrase-structure treebanks converted to dependency structures
  • Main evaluation metric: labeled accuracy per word
  • Top scores range from 91.7 (Japanese) to 65.7 (Turkish)
  • Top systems over all languages:
    – approximate second-order non-projective spanning trees with online learning
    – labeled deterministic pseudo-projective parsing with support vector machines

For English, phrase-structure grammar parsers score slightly higher than dependency parsers. For some other languages, dependency parsers score higher. How much has parser development been influenced by English-specific phenomena?

SLIDE 56

Universal Dependencies

http://universaldependencies.org

  • Attempt to have the same simplified set of 17 POS tags and 37 dependency types for all languages
  • Currently available for 50 languages, 15 more announced
  • Guiding principle: cross-language applicability
  • Greatly simplifies multilingual applications

SLIDE 57

Advantages of Dependency Parsing

  • Complexity: projective parsing in O(n), non-projective parsing in O(n²)
  • Transparency (for labeled dependency graphs):
    – direct encoding of argument structure
    – interpretability of fragments
  • Suitable for free word order languages (for non-projective approaches)

SLIDE 58

Dependency Parsing towards Semantics

  • Collapsing rules: move the preposition into the relation
  • Propagation rules for conjunctions: the relation applies to all conjuncts

These "collapsed and conjunction-propagated" dependency parses proved advantageous for semantic tasks.

de Marneffe, Marie-Catherine, Bill MacCartney, and Christopher D. Manning (2006): Generating typed dependency parses from phrase structure parses. In Proceedings of LREC-2006, pages 449–454, Genova, Italy.

SLIDE 59

Statistical Parsing Conclusions

  • Statistical models such as PCFGs allow for the probabilistic resolution of ambiguities.
  • PCFGs can be easily learned from treebanks.
  • Current statistical parsers are quite accurate for English, but not yet at the level of human-expert agreement.
  • For other languages, only dependency parsers are more or less reliable.
  • Dependency parsers are faster, contain different information than phrase-structure grammar parsers, and try to stay as deterministic as possible.
  • Recent advances in transition-based parsing use neural networks.

Main challenge:

  • Treebanking is very expensive.

SLIDE 60

STATISTICAL MACHINE TRANSLATION

noisy channel model, word alignment, phrase-based translation

coming up next
