SLIDE 1

Constituency Parsing

Spring 2020

2020-03-24

CMPT 825: Natural Language Processing

SFU NatLangLab

Adapted from slides from Danqi Chen and Karthik Narasimhan (with some content from David Bamman, Chris Manning, Mike Collins, and Graham Neubig)

SLIDE 2

Project Milestone

  • Project Milestone due Tuesday 3/31
  • PDF (2-4 pages) in the style of a conference (e.g. ACL/EMNLP) submission
  • https://2020.emnlp.org/files/emnlp2020-templates.zip
  • Milestone should include:
  • Title and Abstract - motivate the problem, describe your goals, and highlight your findings
  • Approach - details on your main approach and baselines. Be specific. Make clear what part is original, what code you are writing yourself, and what code you are using
  • Experiments - describe the dataset, evaluation metrics, what experiments you plan to run, and any results you have so far. Also provide training details, training times, etc.
  • Future Work - what is your plan for the rest of the project
  • References - provide references using BibTeX
  • Milestone will be graded based on progress and writing quality
SLIDE 3

Overview

  • Constituency structure vs dependency structure
  • Context-free grammar (CFG)
  • Probabilistic context-free grammar (PCFG)
  • The CKY algorithm
  • Evaluation
  • Lexicalized PCFGs
  • Neural methods for constituency parsing
SLIDE 4

Syntactic structure: constituency and dependency

Two views of linguistic structure

  • Constituency = phrase structure grammar = context-free grammars (CFGs)
  • Dependency
SLIDE 5

Constituency structure

  • Phrase structure organizes words into nested constituents
  • Starting units: words. the, cuddly, cat, by, the, door are given a category (part-of-speech tags): DT, JJ, NN, IN, DT, NN
  • Words combine into phrases with categories: the cuddly cat (NP → DT JJ NN), by (IN), the door (NP → DT NN)
  • Phrases can combine into bigger phrases recursively: by the door (PP → IN NP), the cuddly cat by the door (NP → NP PP)
SLIDE 6

Dependency structure

  • Dependency structure shows which words depend on (modify or are arguments of) which other words.

Example: Satellites spot whales from space (dependency relations: nsubj, dobj, nmod, case)

This Thursday
SLIDE 7

Why do we need sentence structure?

  • We need to understand sentence structure in order to be able to interpret language correctly
  • Humans communicate complex ideas by composing words together into bigger units
  • We need to know what is connected to what
SLIDE 8

Syntactic parsing

  • Syntactic parsing is the task of recognizing a sentence and assigning a structure to it.

Input: Boeing is located in Seattle.
Output: (parse tree shown on slide)
SLIDE 9

Syntactic parsing

  • Used as an intermediate representation for downstream applications

Syntax-based machine translation. English word order: subject-verb-object; Japanese word order: subject-object-verb

Image credit: http://vas3k.com/blog/machine_translation/
SLIDE 10

Syntactic parsing

  • Used as an intermediate representation for downstream applications

Relation extraction

Image credit: (Zhang et al, 2018)
SLIDE 11

Beyond syntactic parsing

Nested sentiment analysis

Example: "This film doesn't care about cleverness, wit or any other kind of intelligent humor." → Negative

Recursive deep models for semantic compositionality over a sentiment treebank. Socher et al, EMNLP 2013
SLIDE 12

Context-free grammars (CFG)

  • Widely used formal system for modeling constituency structure in English and other natural languages
  • A context-free grammar G = (N, Σ, R, S), where
  • N is a set of non-terminal symbols
  • Σ is a set of terminal symbols
  • R is a set of rules of the form X → Y1 Y2 … Yn for n ≥ 1, X ∈ N, Yi ∈ (N ∪ Σ)
  • S ∈ N is a distinguished start symbol
SLIDE 13

A Context-Free Grammar for English

Grammar and lexicon (tables shown on slide)

S: sentence, VP: verb phrase, NP: noun phrase, PP: prepositional phrase, DT: determiner, Vi: intransitive verb, Vt: transitive verb, NN: noun, IN: preposition
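To make this concrete, here is a minimal sketch of how such a grammar and lexicon might be written down in Python. The specific rules and words are illustrative stand-ins built from the categories listed above, not the exact grammar shown on the slide.

```python
# A toy CFG in the spirit of the grammar/lexicon on this slide.
# Only the category names (S, NP, VP, PP, DT, NN, Vi, Vt, IN) come from the slide;
# the rules and words are illustrative.

GRAMMAR = {              # non-terminal -> list of right-hand sides
    "S":  [("NP", "VP")],
    "VP": [("Vi",), ("Vt", "NP"), ("VP", "PP")],
    "NP": [("DT", "NN"), ("NP", "PP")],
    "PP": [("IN", "NP")],
}

LEXICON = {              # pre-terminal -> possible words
    "DT": ["the"],
    "NN": ["man", "dog", "telescope"],
    "Vi": ["sleeps"],
    "Vt": ["saw"],
    "IN": ["with", "in"],
}

START = "S"

# Non-terminals N are the keys of GRAMMAR and LEXICON; terminals are the words.
N = set(GRAMMAR) | set(LEXICON)
SIGMA = {w for words in LEXICON.values() for w in words}
print(sorted(N), sorted(SIGMA))
```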

SLIDE 14

(Left-most) Derivations

  • Given a CFG G, a left-most derivation is a sequence of strings s1, s2, …, sn, where
  • s1 = S
  • sn ∈ Σ*: all possible strings made up of words from Σ
  • Each si for i = 2, …, n is derived from si−1 by picking the left-most non-terminal X in si−1 and replacing it by some β where X → β ∈ R
  • sn: the yield of the derivation
SLIDE 15

(Left-most) Derivations

  • s1 = S
  • s2 = NP VP
  • s3 = DT NN VP
  • s4 = the NN VP
  • s5 = the man VP
  • s6 = the man Vi
  • s7 = the man sleeps

A derivation can be represented as a parse tree!

  • A string s ∈ Σ* is in the language defined by the CFG if there is at least one derivation whose yield is s
  • The set of possible derivations may be finite or infinite
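A small sketch that replays this left-most derivation in Python, rewriting the left-most non-terminal at each step. The rule choices are hard-coded to follow the example above, and the set of non-terminals is an assumption based on the toy grammar.

```python
# Replay the left-most derivation s1..s7: at each step, rewrite the
# left-most non-terminal with the chosen rule.

NONTERMINALS = {"S", "NP", "VP", "DT", "NN", "Vi"}

def expand_leftmost(s, lhs, rhs):
    """Replace the left-most non-terminal (must equal lhs) with rhs."""
    for i, sym in enumerate(s):
        if sym in NONTERMINALS:
            assert sym == lhs, f"left-most non-terminal is {sym}, not {lhs}"
            return s[:i] + list(rhs) + s[i + 1:]
    raise ValueError("no non-terminal left to expand")

steps = [("S", ["NP", "VP"]), ("NP", ["DT", "NN"]), ("DT", ["the"]),
         ("NN", ["man"]), ("VP", ["Vi"]), ("Vi", ["sleeps"])]

s = ["S"]
print("s1 =", " ".join(s))
for n, (lhs, rhs) in enumerate(steps, start=2):
    s = expand_leftmost(s, lhs, rhs)
    print(f"s{n} =", " ".join(s))   # ends with: s7 = the man sleeps
```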

SLIDE 16

Ambiguity

  • Some strings may have more than one derivation (i.e. more than one parse tree!).
SLIDE 17

“Classical” NLP Parsing

  • In fact, sentences can have a very large number of possible parses

The board approved [its acquisition] [by Royal Trustco Ltd.] [of Toronto] [for $27 a share] [at its monthly meeting].

The five bracketings of a b c d: ((ab)c)d, (a(bc))d, (ab)(cd), a((bc)d), a(b(cd)). Catalan number: Cn = (1 / (n + 1)) · (2n choose n)  (see the sketch at the end of this slide)

  • It is also difficult to construct a grammar with enough coverage
  • A less constrained grammar can parse more sentences but results in more parses for even simple sentences
  • There is no way to choose the right parse!
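To make the growth concrete, here is a quick check of the Catalan numbers that count binary bracketings (C3 = 5 matches the five bracketings of a b c d above):

```python
from math import comb

def catalan(n):
    # C_n = (1 / (n + 1)) * C(2n, n): number of binary bracketings of n + 1 symbols
    return comb(2 * n, n) // (n + 1)

print([catalan(n) for n in range(1, 11)])
# [1, 2, 5, 14, 42, 132, 429, 1430, 4862, 16796]
# C_3 = 5 matches the bracketings of "a b c d";
# a 20-word sentence already allows C_19 = 1,767,263,190 bracketings.
```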
SLIDE 18

Statistical parsing

  • Learning from data: treebanks
  • Adding probabilities to the rules: probabilistic CFGs (PCFGs)

Treebanks: a collection of sentences paired with their parse trees

The Penn Treebank Project (Marcus et al, 1993)

SLIDE 19

Treebanks

  • Standard setup (WSJ portion of Penn Treebank):
  • 40,000 sentences for training
  • 1,700 for development
  • 2,400 for testing
  • Why build a treebank instead of a grammar?
  • Broad coverage
  • Frequencies and distributional information
  • A way to evaluate systems
SLIDE 20

Probabilistic context-free grammars (PCFGs)

  • A probabilistic context-free grammar (PCFG) consists of:
  • A context-free grammar G = (N, Σ, R, S)
  • For each rule α → β ∈ R, there is a parameter q(α → β) ≥ 0. For any X ∈ N:

    Σ_{α → β ∈ R: α = X} q(α → β) = 1
SLIDE 21

Probabilistic context-free grammars (PCFGs)

For any derivation (parse tree) t containing rules α1 → β1, α2 → β2, …, αl → βl, the probability of the parse is:

    P(t) = ∏_{i=1}^{l} q(αi → βi)

Example:

    P(t) = q(S → NP VP) × q(NP → DT NN) × q(DT → the) × q(NN → man) × q(VP → Vi) × q(Vi → sleeps)
         = 1.0 × 0.3 × 1.0 × 0.7 × 0.4 × 1.0 = 0.084

Why do we want Σ_{α → β ∈ R: α = X} q(α → β) = 1?
SLIDE 22

Deriving a PCFG from a treebank

  • Training data: a set of parse trees t1, t2, …, tm
  • A PCFG (N, Σ, S, R, q):
  • N is the set of all non-terminals seen in the trees
  • Σ is the set of all words seen in the trees
  • S is taken to be the start symbol S.
  • R is taken to be the set of all rules α → β seen in the trees
  • The maximum-likelihood parameter estimates are:

    qML(α → β) = Count(α → β) / Count(α)

Example: if we have seen the rule VP → Vt NP 105 times and the non-terminal VP 1000 times, then q(VP → Vt NP) = 0.105

  • Can add smoothing
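A minimal sketch of this maximum-likelihood estimate, assuming trees are stored as nested (label, children...) tuples; the two-tree "treebank" is made up purely for illustration.

```python
from collections import Counter

def rules(tree):
    """Yield (lhs, rhs) rules from a tree given as (label, child, child, ...),
    where leaves are plain word strings."""
    label, *children = tree
    rhs = tuple(c if isinstance(c, str) else c[0] for c in children)
    yield (label, rhs)
    for c in children:
        if not isinstance(c, str):
            yield from rules(c)

def estimate_pcfg(treebank):
    rule_count, lhs_count = Counter(), Counter()
    for tree in treebank:
        for lhs, rhs in rules(tree):
            rule_count[(lhs, rhs)] += 1
            lhs_count[lhs] += 1
    # q_ML(lhs -> rhs) = Count(lhs -> rhs) / Count(lhs)
    return {(lhs, rhs): c / lhs_count[lhs] for (lhs, rhs), c in rule_count.items()}

# Toy "treebank" (made up, for illustration only).
t1 = ("S", ("NP", ("DT", "the"), ("NN", "man")), ("VP", ("Vi", "sleeps")))
t2 = ("S", ("NP", ("DT", "the"), ("NN", "dog")), ("VP", ("Vi", "sleeps")))
q = estimate_pcfg([t1, t2])
print(q[("NN", ("man",))])   # 0.5: NN -> man seen once, NN seen twice
```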

SLIDE 23

CFG vs PCFG

  • A CFG tells us whether a sentence is in the language it defines
  • A PCFG gives us a mechanism for assigning scores (here, probabilities) to different parses for the same sentence.
SLIDE 24

Parsing with PCFGs

  • Given a sentence s and a PCFG, how do we find the highest scoring parse tree for s?

    argmax_{t ∈ 𝒯(s)} P(t), where 𝒯(s) is the set of parse trees for s

  • The CKY algorithm: applies to a PCFG in Chomsky normal form (CNF)
  • Chomsky Normal Form (CNF): all the rules take one of the two following forms:
  • Binary: X → Y1 Y2, where X ∈ N, Y1 ∈ N, Y2 ∈ N
  • Unary: X → Y, where X ∈ N, Y ∈ Σ
  • Can convert any PCFG into an equivalent grammar in CNF!
  • However, the trees will look different
  • Possible to do a "reverse transformation"
SLIDE 25

Converting PCFGs into a CNF grammar

  • n-ary rules (n > 2), e.g. NP → DT NNP VBG NN: binarize by introducing intermediate non-terminals (see the sketch after this list)
  • Unary rules: VP → Vi, Vi → sleeps
  • Eliminate all the unary rules recursively by adding VP → sleeps
  • We will come back to this later!
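Here is a rough sketch of binarizing an n-ary rule by introducing intermediate non-terminals. The naming scheme for the new symbols (NP|<...>) is one common convention, an assumption rather than the course's exact transformation; in a PCFG, the first new rule keeps the original probability and the intermediate rules get probability 1.

```python
def binarize(lhs, rhs):
    """Right-binarize an n-ary rule into a chain of binary rules,
    introducing intermediate non-terminals."""
    rules = []
    current_lhs = lhs
    rhs = list(rhs)
    while len(rhs) > 2:
        head, rest = rhs[0], rhs[1:]
        new_nt = f"{lhs}|<{'-'.join(rest)}>"   # intermediate symbol; naming is a convention choice
        rules.append((current_lhs, (head, new_nt)))
        current_lhs, rhs = new_nt, rest
    rules.append((current_lhs, tuple(rhs)))
    return rules

for r in binarize("NP", ["DT", "NNP", "VBG", "NN"]):
    print(r)
# ('NP',              ('DT', 'NP|<NNP-VBG-NN>'))
# ('NP|<NNP-VBG-NN>', ('NNP', 'NP|<VBG-NN>'))
# ('NP|<VBG-NN>',     ('VBG', 'NN'))
```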
SLIDE 26

The CKY algorithm

  • Dynamic programming
  • Given a sentence x1, x2, …, xn, denote π(i, j, X) as the highest score for any parse tree that dominates words xi, …, xj and has non-terminal X ∈ N as its root.
  • Output: π(1, n, S)
  • Initially, for i = 1, 2, …, n:

    π(i, i, X) = q(X → xi) if X → xi ∈ R, 0 otherwise

Example chart: Book the flight through Houston (positions 0 1 2 3 4 5)
SLIDE 27

The CKY algorithm

  • For all (i, j) such that 1 ≤ i < j ≤ n, and for all X ∈ N:

    π(i, j, X) = max_{X → Y Z ∈ R, i ≤ k < j} q(X → Y Z) × π(i, k, Y) × π(k + 1, j, Z)

Consider all ways span (i, j) can be split into two parts (k is the split point).

  • Also store backpointers which allow us to recover the parse tree

Cells contain:
  • Best score for a parse of span (i, j) for each non-terminal X
  • Backpointers
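A compact sketch of this recurrence, assuming a CNF PCFG given as dictionaries of binary rules and lexical rules with probabilities; the toy grammar and sentence are illustrative.

```python
# CKY over a CNF PCFG: pi[(i, j, X)] is the best score for a parse of
# words i..j rooted in X; back[(i, j, X)] stores the choice that achieved it.

def cky(words, binary_rules, lexical_rules, start="S"):
    n = len(words)
    pi, back = {}, {}
    # Initialization: pi(i, i, X) = q(X -> word_i)
    for i, w in enumerate(words):
        for X, q in lexical_rules.get(w, {}).items():
            pi[(i, i, X)] = q
            back[(i, i, X)] = w
    # Recurrence over spans of increasing length
    for length in range(1, n):
        for i in range(n - length):
            j = i + length
            for (X, Y, Z), q in binary_rules.items():
                for k in range(i, j):          # k is the split point
                    score = q * pi.get((i, k, Y), 0.0) * pi.get((k + 1, j, Z), 0.0)
                    if score > pi.get((i, j, X), 0.0):
                        pi[(i, j, X)] = score
                        back[(i, j, X)] = (k, Y, Z)
    return pi.get((0, n - 1, start), 0.0), back

# Toy grammar (illustrative)
binary_rules = {("S", "NP", "VP"): 1.0, ("NP", "DT", "NN"): 1.0, ("VP", "Vt", "NP"): 1.0}
lexical_rules = {"the": {"DT": 1.0}, "man": {"NN": 0.6}, "dog": {"NN": 0.4},
                 "saw": {"Vt": 1.0}}
score, back = cky("the man saw the dog".split(), binary_rules, lexical_rules)
print(score)   # 0.24 = q(NN -> man) * q(NN -> dog); all other rules have probability 1
```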

SLIDE 28

The CKY algorithm

Running time?

O(n³ |R|)
SLIDE 29

CKY with unary rules

  • In practice, we also allow unary rules X → Y where X, Y ∈ N (conversion to/from the normal form is easier)
  • How does this change CKY? Add a unary update:

    π(i, j, X) = max_{X → Y ∈ R} q(X → Y) × π(i, j, Y)

  • Compute the unary closure: if there is a rule chain X → Y1, Y1 → Y2, …, Yk → Y, add q(X → Y) = q(X → Y1) × ⋯ × q(Yk → Y)
  • Apply the unary update once after the binary rules
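A short sketch of computing the unary closure by repeatedly relaxing chains (a max-product version of transitive closure); the example rules follow the VP → Vi, Vi → sleeps case from the CNF slide.

```python
# Unary closure: best probability of rewriting X into Y through any chain
# of unary rules (including chains of length > 1).

def unary_closure(unary_rules):
    """unary_rules: dict (X, Y) -> q(X -> Y). Returns dict with best chain probabilities."""
    closure = dict(unary_rules)
    changed = True
    while changed:                      # relax until no chain improves
        changed = False
        for (x, y1), q1 in list(closure.items()):
            for (y2, z), q2 in list(closure.items()):
                if y1 == y2 and closure.get((x, z), 0.0) < q1 * q2:
                    closure[(x, z)] = q1 * q2
                    changed = True
    return closure

rules = {("VP", "Vi"): 0.4, ("Vi", "sleeps"): 1.0}   # illustrative probabilities
print(unary_closure(rules)[("VP", "sleeps")])        # 0.4: VP => Vi => sleeps
```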
SLIDE 30

Evaluating constituency parsing

SLIDE 31

Evaluating constituency parsing

  • Recall: (# correct constituents in candidate) / (# constituents in gold tree)
  • Precision: (# correct constituents in candidate) / (# constituents in candidate)
  • Labeled precision/recall require getting the non-terminal label correct
  • F1 = (2 × precision × recall) / (precision + recall)
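A small sketch of this evaluation, assuming each tree is reduced to a set of (label, start, end) constituents. The example sets are made up so that the counts match the worked example on the next slide (3 correct, 7 in the candidate, 8 in the gold tree).

```python
def prf(gold, pred):
    """gold, pred: sets of (label, start, end) constituents."""
    correct = len(gold & pred)
    precision = correct / len(pred) if pred else 0.0
    recall = correct / len(gold) if gold else 0.0
    f1 = (2 * precision * recall / (precision + recall)) if precision + recall else 0.0
    return precision, recall, f1

# Made-up example: 3 correct constituents, 7 in the candidate, 8 in the gold tree.
gold = {("S", 0, 8), ("NP", 0, 2), ("VP", 2, 8), ("NP", 3, 5),
        ("PP", 5, 8), ("NP", 6, 8), ("NN", 1, 2), ("DT", 0, 1)}
pred = {("S", 0, 8), ("NP", 0, 2), ("VP", 2, 8), ("VP", 2, 5),
        ("NP", 3, 8), ("PP", 4, 8), ("NP", 6, 7)}
print(prf(gold, pred))   # (0.4285..., 0.375, 0.4): 3/7 precision, 3/8 recall, 40.0% F1
```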
SLIDE 32

Evaluating constituency parsing

  • Precision: 3/7 = 42.9%
  • Recall: 3/8 = 37.5%
  • F1 = 40.0%
  • Tagging accuracy: 100%
SLIDE 33

Weaknesses of PCFGs

  • Strong independence assumptions
  • Each production (e.g., NP → DT NN) is independent of the rest of the tree
  • Lack of sensitivity to context (where is the non-terminal in the tree, is it a subject or an object?)
  • Lack of sensitivity to lexical information (words)
SLIDE 34

Weaknesses of PCFGs

  • Lack of sensitivity to lexical information (words)

The only difference between these two parses: q(VP → VP PP) vs q(NP → NP PP)

Difficult to determine the correct parse without looking at the words!
SLIDE 35

Weaknesses of PCFGs

  • Lack of sensitivity to lexical information (words)

Exactly the same set of context-free rules!

SLIDE 36

Lexicalized PCFGs

  • Key idea: add headwords to trees
  • Each context-free rule has one special child that is the head of the rule (a core idea in syntax)
  • Annotate the parent with more information
SLIDE 37

Head finding rules

SLIDE 38

Lexicalized PCFGs

  • Further reading: Michael Collins. 2003. Head-Driven Statistical Models for Natural Language Parsing.
  • Results for a PCFG: 70.6% recall, 74.8% precision
  • Results for a lexicalized PCFG: 88.1% recall, 88.3% precision

Drawbacks:
  • Dramatically increases the size of the grammar -> less training data for each production
  • Increases the complexity of the model (running time and memory)
SLIDE 39

Further improvements to parsing

  • Discriminative reranking
  • PCFG is a generative model
  • Use discriminative models with more global features to score parses and rerank candidate parses from the PCFG
  • Self-training (incorporate unlabeled data)
  • Train on some data to get an initial good model
  • Then run the model on unlabeled data, combine the newly labeled data with the gold labeled data, and retrain
  • Ensembles
  • Combine multiple models

Beyond supervised learning: grammar induction = learning a grammar from unlabeled data

Charniak parser with self-training + reranking: 92.1 F1 (McClosky et al 2006)
SLIDE 40

Using Neural Networks for Constituency Parsing

SLIDE 41

Parsing with Neural Networks

What can neural networks bring?

  • Better phrase representations
  • Embeddings for words, tags, and nodes
  • Leverage pretrained embeddings
  • Learned scoring functions
  • Fewer independence assumptions
SLIDE 42

Parsing as Seq2Seq (Vinyals et al, 2015)

88.3 F1

  • Linearize the parse tree and train an LSTM seq2seq model with attention

Output may not be structurally correct (e.g. unbalanced parentheses)
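A sketch of one way to linearize a tree into a bracketed token sequence for a seq2seq model. The nested-tuple tree encoding and the exact token format are assumptions for illustration, not the paper's exact normalization; mapping the output back to a tree fails exactly when the predicted brackets are unbalanced, which is the issue noted above.

```python
def linearize(tree):
    """Depth-first linearization of a (label, children...) tree into tokens."""
    if isinstance(tree, str):            # leaf word
        return [tree]
    label, *children = tree
    tokens = [f"({label}"]
    for child in children:
        tokens += linearize(child)
    tokens.append(f"){label}")           # closing token carries the label too
    return tokens

tree = ("S", ("NP", ("DT", "the"), ("NN", "man")), ("VP", ("Vi", "sleeps")))
print(" ".join(linearize(tree)))
# (S (NP (DT the )DT (NN man )NN )NP (VP (Vi sleeps )Vi )VP )S
```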

SLIDE 43

Recursive Neural Networks (Socher et al, 2013)

  • Continuous representations for words and non-terminal nodes
  • Compositional representations for non-terminal nodes
  • Use neural networks to get compositional representations as well as scores for composition

Compositional Vector Grammar = PCFG + TreeRNN
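A rough numpy sketch of the composition step: a parent vector computed from its two children, plus a scalar score for that composition. Dimensions and weights are random placeholders, and the per-category weights of the full CVG are not modeled here.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 4                                   # embedding size (placeholder)
W = rng.normal(size=(d, 2 * d)) * 0.1   # composition weights (in the CVG these can
b = np.zeros(d)                         #  depend on the children's syntactic categories)
u = rng.normal(size=d) * 0.1            # scoring vector

def compose(h_left, h_right):
    """Parent representation and score from two child representations."""
    h_parent = np.tanh(W @ np.concatenate([h_left, h_right]) + b)
    score = float(u @ h_parent)
    return h_parent, score

h_np = rng.normal(size=d)               # stand-ins for child phrase vectors
h_pp = rng.normal(size=d)
h_parent, score = compose(h_np, h_pp)
print(h_parent, score)
```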

SLIDE 44

Recursive Neural Networks (Socher et al, 2013)

  • Weights can be tied or parameterized by constituency type
  • Weights depend on the discrete category of the children (NP, VP)

(Figure: tree with node labels and node embeddings)

90.4 F1
SLIDE 45

Recurrent Neural Network Grammars (Dyer et al, 2016)

Transition parsers

  • Like seq2seq, but the output is a sequence of operations that builds the tree incrementally
  • The sequence can guarantee structural consistency

Predict the next action from the current configuration: stack, buffer, history of actions
SLIDE 46

Recurrent Neural Network Grammars (Dyer et al, 2016)

Top-down parsing: parser transitions (table of configurations before and after each action shown on slide)

S: stack of open non-terminals and completed subtrees
B: buffer of unprocessed terminal symbols
x: terminal symbol, X: non-terminal symbol, τ: completed subtree

Actions:
  • NT(X): open (create) a new non-terminal of type X
  • SHIFT: move x from the buffer to the stack
  • REDUCE: close (finish) the open non-terminal on the stack
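A toy simulation of these transitions with plain Python lists for the stack and buffer; it only tracks the symbolic configuration, with none of the neural scoring, and the action sequence for "the man sleeps" is chosen by hand.

```python
def run_transitions(words, actions):
    """Apply NT(X) / SHIFT / REDUCE actions to a stack and buffer."""
    stack, buffer = [], list(words)
    for act in actions:
        if act.startswith("NT("):
            stack.append(("OPEN", act[3:-1]))          # open a new non-terminal
        elif act == "SHIFT":
            stack.append(buffer.pop(0))                # move word from buffer to stack
        elif act == "REDUCE":                          # close the open non-terminal
            children = []
            while not (isinstance(stack[-1], tuple) and stack[-1][0] == "OPEN"):
                children.append(stack.pop())
            label = stack.pop()[1]
            stack.append((label, *reversed(children))) # completed subtree
    return stack

tree = run_transitions(
    "the man sleeps".split(),
    ["NT(S)", "NT(NP)", "SHIFT", "SHIFT", "REDUCE", "NT(VP)", "SHIFT", "REDUCE", "REDUCE"],
)
print(tree)   # [('S', ('NP', 'the', 'man'), ('VP', 'sleeps'))]
```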

SLIDE 47

Recurrent Neural Network Grammars (Dyer et al, 2016)

  • BiLSTM to get the composite representation of a non-terminal on REDUCE
SLIDE 48

Recurrent Neural Network Grammars (Dyer et al, 2016)

Transition parsers

  • Like seq2seq, but the output is a sequence of operations that builds the tree incrementally
  • The sequence can guarantee structural consistency

Predict the next action from the current configuration: stack, buffer, history of actions

91.2 F1
SLIDE 49

Span Labeling (Stern et al. 2017)

  • Simple idea: decide whether a span is a constituent in the tree or not
  • Scores labels and spans independently
  • Allows for various loss functions (local vs structured) and inference algorithms (CKY vs top-down)
  • Word representation
  • Span representation
  • Label scoring
SLIDE 50

Span Labeling (Stern et al. 2017)

  • Bidirectional LSTM to get forward/backward encodings (fi, bi) for each position i
  • Span (i, j) representation: concatenate vector differences, sij = [fj − fi, bi − bj]
  • Feedforward neural networks to predict scores for spans and labels:

    Sspan(i, j) = vs⊤ g(Ws sij + bs)   (scalar)
    Slabels(i, j) = Vl g(Wl sij + bl)   (vector)
    Slabel(i, j, l) = l-th element of Slabels(i, j)

(Gaddy et al, 2018)
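A numpy sketch of the span representation and the two scorers, assuming the forward/backward encodings are already available (random stand-ins here) and using random placeholder weights.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, h, L = 6, 8, 16, 5        # sentence length, encoding size, hidden size, number of labels (placeholders)

# Stand-ins for BiLSTM outputs: f[i], b[i] are forward/backward encodings at fencepost i.
f = rng.normal(size=(n + 1, d))
b = rng.normal(size=(n + 1, d))

def span_repr(i, j):
    """s_ij = [f_j - f_i, b_i - b_j] for span (i, j)."""
    return np.concatenate([f[j] - f[i], b[i] - b[j]])

# One-layer feedforward scorers (random placeholder weights).
Ws, bs, vs = rng.normal(size=(h, 2 * d)), np.zeros(h), rng.normal(size=h)
Wl, bl, Vl = rng.normal(size=(h, 2 * d)), np.zeros(h), rng.normal(size=(L, h))
g = np.tanh

def s_span(i, j):                 # scalar span score
    return float(vs @ g(Ws @ span_repr(i, j) + bs))

def s_labels(i, j):               # vector of label scores; entry l is S_label(i, j, l)
    return Vl @ g(Wl @ span_repr(i, j) + bl)

print(s_span(0, 3), s_labels(0, 3).argmax())
```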

SLIDE 51

Span Labeling (Stern et al. 2017)

91.8 F1

Greedy top-down parsing:
  • Recursively, for each span (i, j):
  • Assign a label: l̂ = argmax_l Slabel(i, j, l)
  • Pick a split point: k̂ = argmax_k Ssplit(i, k, j) = argmax_k [Sspan(i, k) + Sspan(k, j)]

Running time? O(n²)
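A sketch of this greedy top-down decoder: label the current span, pick the split maximizing the sum of child span scores, and recurse. The scorers here are cached random stand-ins; in practice they come from the networks on the previous slides.

```python
import random
from functools import lru_cache

random.seed(0)
LABELS = ["S", "NP", "VP", "PP"]

# Stand-in scorers (cached random numbers) in place of the trained span/label networks.
@lru_cache(maxsize=None)
def s_span(i, j):
    return random.random()

@lru_cache(maxsize=None)
def s_label(i, j, label):
    return random.random()

def parse(i, j, words):
    """Greedy top-down parse of fencepost span (i, j), j > i."""
    label = max(LABELS, key=lambda l: s_label(i, j, l))        # assign a label
    if j - i == 1:
        return (label, words[i])                               # single word: stop recursing
    k = max(range(i + 1, j),                                   # pick the split point:
            key=lambda k: s_span(i, k) + s_span(k, j))         # argmax_k S(i, k) + S(k, j)
    return (label, parse(i, k, words), parse(k, j, words))

words = "the cuddly cat sleeps".split()
print(parse(0, len(words), words))
```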

SLIDE 52

Self-Attentional Encoding (Kitaev and Klein, 2018)

93.6 F1, 95.1 F1 (+ELMo)

  • Self-attention based encoding
  • Learned scoring function s(i, j, l) for each span from token i to token j with label l
  • CKY for decoding to find the best tree
  • Berkeley neural parser: https://github.com/nikitakit/self-attentive-parser