Statistical Parsing: paper presentation
Michael Collins (2003). "Head-driven statistical models for natural language parsing". In: Computational Linguistics.
Çağrı Çöltekin, University of Tübingen, Seminar für Sprachwissenschaft
December 2016
Introduction/Motivation: a summary of the paper
What is the paper about?
- A head-driven, lexicalized PCFG
- PCFGs cannot capture many linguistic phenomena
- Lexicalizing PCFGs allows capturing lexical dependencies,
but parameter estimation becomes difficult (many rules, sparse data)
- The main idea is factoring the rule probabilities into parts
that are easy to estimate
- The paper does that in a linguistically-motivated way
- The resulting parser works better than PCFGs and some
others in the literature
Three models
Model 1
- Lexicalize the PCFG
- Condition the probability of a rule on parts of its LHS
- Condition probabilities of non-heads on their distance to the head
Model 2
- Add the complement-adjunct distinction (use subcategorization frames)
Model 3
- Add conditions for wh-movement
An overview of the paper
- 2. Background: PCFGs, lexicalization, estimation (MLE)
- 3. Model definitions
- 4. Special cases: mainly related to treebank format
- 5. Practical issues: parameter estimation, unknown words,
parsing algorithm
- 6. Results
- 7. Discussion
- 8. Related work
- 9. Conclusions
Probabilistic context-free grammars
- A CFG augmented with probabilities for each rule
- Assigns a proper probability distribution to parse trees
– if all rule probabilities with the same LHS sum to 1
– if all derivations terminate in a finite number of steps
- The main problem is estimating probabilities associated
with each rule X → β
- Maximum-likelihood estimate:
P(X → β) = count(X → β) / count(X)
- With rule probabilities, parsing is finding the best tree
T_best = arg max_T P(T|S) = arg max_T P(T, S) / P(S) = arg max_T P(T, S)
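As a minimal sketch of the MLE step, assuming a toy treebank given as lists of (LHS, RHS) rule occurrences (all names below are illustrative, not from the paper):

    from collections import Counter

    # Toy treebank: each tree is represented by the list of (LHS, RHS)
    # rule occurrences in its derivation.
    trees = [
        [("S", ("NP", "VP")), ("NP", ("IBM",)), ("VP", ("VBD", "NP")),
         ("VBD", ("bought",)), ("NP", ("Lotus",))],
        [("S", ("NP", "VP")), ("NP", ("IBM",)), ("VP", ("VBD",)),
         ("VBD", ("slept",))],
    ]

    rule_counts = Counter(r for tree in trees for r in tree)        # count(X → β)
    lhs_counts = Counter(lhs for tree in trees for lhs, _ in tree)  # count(X)

    def p_rule(lhs, rhs):
        """MLE estimate: count(X → β) / count(X)."""
        return rule_counts[(lhs, rhs)] / lhs_counts[lhs]

    print(p_rule("VP", ("VBD", "NP")))  # 0.5: one of two VP expansions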
Probabilistic context-free grammars (2)
- In PCFGs, the rule applications in a derivation are assumed to be independent
- The probability of a tree is the product of the probabilities
of the rules used in its derivation
- PCFGs cannot capture lexical or structural dependencies
Lexicalizing PCFGs
- Replace non-terminal X with X(h), where h is a tuple with
the lexical word and its POS tag
- Now the grammar can capture (head-driven) lexical
dependencies
- But the number of nonterminals grows by a factor of |V| × |T|
- Estimation becomes difficult (many rules, data sparsity)
- Note: the Penn Treebank (PTB) does not annotate heads; they
are annotated automatically (based on heuristics)
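A minimal sketch of such a head-finding heuristic is shown below; the tiny HEAD_RULES table is an illustrative stand-in for the much larger head-percolation tables actually used, not the paper's exact rules:

    # Toy head-percolation table: for each parent label, the child labels
    # to search for, in priority order. Real tables are far more detailed.
    HEAD_RULES = {
        "S": ["VP", "S", "NP"],
        "VP": ["VBD", "VB", "VP", "NP"],
        "NP": ["NN", "NNP", "NNS", "NP"],
    }

    def find_head(parent, children):
        """Return the index of the head child according to the toy rules."""
        for label in HEAD_RULES.get(parent, []):
            for i, child_label in enumerate(children):
                if child_label == label:
                    return i
        return 0  # fallback: the leftmost child

    print(find_head("S", ["NP", "NP", "VP"]))  # 2: the VP is the head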
Example lexicalized derivation
(TOP (S(bought,VBD)
  (NP(week,NN) (JJ(last,JJ) Last) (NN(week,NN) week))
  (NP(IBM,NNP) (NNP(IBM,NNP) IBM))
  (VP(bought,VBD) (VBD(bought,VBD) bought)
    (NP(Lotus,NNP) (NNP(Lotus,NNP) Lotus)))))
Example rules:
TOP → S(bought,VBD)
S(bought,VBD) → NP(week,NN) NP(IBM,NNP) VP(bought,VBD)
VP(bought,VBD) → VBD(bought,VBD) NP(Lotus,NNP)
JJ(last,JJ) → Last
Model 1: the generative story
Each lexicalized CF rule is taken to be of the form
X(h) → ⟨left dependents⟩ H(h) ⟨right dependents⟩
- 1. Generate the head with probability Ph(H|X, h)
- 2. Generate the left modifier(s) independently, each with
probability Pl(Li(li)|X, h, H)
- 3. Generate the right modifier(s) independently, each with
probability Pr(Ri(ri)|X, h, H)
- A special left/right dependent label ‘STOP’ terminates the
generation
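This factorization can be sketched as follows for scoring a single lexicalized rule; the probability tables Ph, Pl, Pr are passed in as functions, and all names are illustrative (a sketch of the decomposition, not the paper's implementation):

    STOP = ("STOP", None)  # special symbol terminating each side

    def p_rule_model1(X, h, H, lefts, rights, Ph, Pl, Pr):
        """Model 1 (without distance): P(X(h) → Ln..L1 H(h) R1..Rm),
        decomposed into head, left-modifier, and right-modifier choices."""
        p = Ph(H, X, h)              # 1. generate the head label
        for li in lefts + [STOP]:    # 2. left modifiers, then STOP
            p *= Pl(li, X, h, H)
        for ri in rights + [STOP]:   # 3. right modifiers, then STOP
            p *= Pr(ri, X, h, H)
        return p

    # Usage with dummy uniform distributions, just to show the decomposition:
    uniform = lambda *args: 0.1
    print(p_rule_model1("S", ("bought", "VBD"), "VP",
                        [("NP", ("IBM", "NNP"))], [],
                        uniform, uniform, uniform))  # 0.1**4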
Model 1: distance
- Model 1 also conditions the left and right dependents on
their distance from the head. For example, Pl is estimated as Pl(Li(li)|X, h, H, distance(i − 1))
- Two distance measures:
– Is the intervening string of length 0? (adjacency)
– Does the intervening string contain a verb? (clausal modifiers)
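Both cues can be read off the surface string between the head and the modifier being generated; a minimal sketch, assuming the intervening material is given as a list of POS tags:

    def distance_features(intervening_tags):
        """Collins-style distance cues for the string between head and
        modifier: (is the string empty?, does it contain a verb?)."""
        is_adjacent = len(intervening_tags) == 0
        contains_verb = any(tag.startswith("VB") for tag in intervening_tags)
        return is_adjacent, contains_verb

    print(distance_features([]))                   # (True, False): adjacent
    print(distance_features(["DT", "VBD", "NN"]))  # (False, True): verb in between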
Model 2: the generative story
Main idea: condition the right/left modifiers on subcategorization frames (LC and RC), which are the left and right complements of the head.
- 1. Generate the head with probability Ph(H|X, h)
- 2. Choose left and right subcategorization frames, with
probabilities Plc(LC|X, H, h) and Prc(RC|X, H, h)
- 3. Generate the left/right modifier(s) independently, each
with probability Pl(Li(li)|X, h, H, LC) and Pr(Ri(ri)|X, h, H, RC)
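One way to sketch the extra bookkeeping for one side: the subcat frame is a multiset of required complements that shrinks as they are generated, and STOP is only scored once the frame is empty (the frame encoding and all names are assumptions of this sketch):

    from collections import Counter

    def score_left_side(modifiers, X, h, H, LC, Pl):
        """Model 2 sketch for the left side: a generated complement fills a
        slot in the remaining subcat frame; STOP requires an empty frame."""
        remaining = Counter(LC)                     # e.g. Counter({"NP-C": 1})
        p = 1.0
        for label, word in modifiers:
            frame = tuple(sorted(remaining.elements()))
            p *= Pl((label, word), X, h, H, frame)  # conditioned on remaining LC
            if remaining[label] > 0:                # complement fills a slot
                remaining[label] -= 1
        p *= Pl(("STOP", None), X, h, H, ())        # frame is empty by now
        return p

    uniform = lambda *args: 0.1
    print(score_left_side([("NP-C", "IBM")], "S", ("bought", "VBD"), "VP",
                          ["NP-C"], uniform))       # 0.1 * 0.1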
Model 3: traces and wh-movement
The idea: mark and propagate ‘gaps’. Example tree for ‘The store that IBM bought last week’:
(NP(store)
  (NP(store) The store)
  (SBAR(that)(+gap)
    (WHNP(that) (WDT that))
    (S(bought)(+gap)
      (NP-C(IBM) IBM)
      (VP(bought)(+gap) (VBD bought) TRACE (NP(week) last week)))))
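A minimal sketch of the gap marking (the tree encoding and names are assumptions of this sketch): a node receives +gap if it dominates a TRACE, so the feature is passed up from the trace through its ancestors:

    def mark_gaps(tree):
        """tree = (label, children); children are subtrees or word strings.
        Adds '(+gap)' to every node label that dominates a TRACE."""
        label, children = tree
        if label == "TRACE":
            return tree, True
        marked, has_gap = [], False
        for child in children:
            if isinstance(child, tuple):
                child, g = mark_gaps(child)
                has_gap = has_gap or g
            marked.append(child)
        return (label + "(+gap)" if has_gap else label, marked), has_gap

    t = ("S(bought)", [("NP-C(IBM)", ["IBM"]),
                       ("VP(bought)", [("VBD", ["bought"]), ("TRACE", []),
                                       ("NP(week)", ["last", "week"])])])
    print(mark_gaps(t)[0][0])  # S(bought)(+gap)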
Special cases
- Non-recursive (base) NPs are marked as NPB
- Coordination: allow only a single phrase after a CC
- Punctuation: remove all punctuation except non-initial/non-final
commas and colons; treat the remaining marks like coordination
- Empty subjects: introduce a dummy empty subject during
preprocessing
Parameter estimation
Parameters are estimated with three levels of backoff (see Table 1 in the paper for details), using a version of Witten-Bell smoothing:
e = λ1e1 + (1 − λ1)(λ2e2 + (1 − λ2)e3)
where λ1 = f1 / (f1 + 5u1), f1 is the relevant number of tokens (the count in the denominator) and u1 is the relevant number of types. The other λs are calculated similarly.
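A sketch of this interpolation, generalized to any number of backoff levels; the inputs (estimates ei, token counts fi, type counts ui) would come from training-data counts, and the constant 5 follows the formula above:

    def wb_lambda(f, u, c=5):
        """λ = f / (f + c·u): trust a level more when it has many tokens (f)
        relative to the number of distinct outcomes (u) in that context."""
        return f / (f + c * u) if (f + c * u) > 0 else 0.0

    def backoff_estimate(levels):
        """levels: [(e_i, f_i, u_i), ...], most specific first.
        Computes e = λ1e1 + (1 − λ1)(λ2e2 + (1 − λ2)e3) for three levels."""
        e = levels[-1][0]  # start from the most general estimate
        for e_i, f_i, u_i in reversed(levels[:-1]):
            lam = wb_lambda(f_i, u_i)
            e = lam * e_i + (1 - lam) * e
        return e

    # A sparse specific estimate backs off towards denser, more general ones:
    print(backoff_estimate([(0.8, 2, 2), (0.5, 40, 10), (0.3, 1000, 50)]))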
Unknown words and parsing algorithm
- During training, all words with frequencies less than 6
were replaced with UNKNOWN
- During testing, the POS tags for unknown words were
assigned using the tagger by Ratnaparkhi (1996)
- The parsing algorithm is a version of the CKY parser with
O(n⁵) complexity
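A sketch of the unknown-word preprocessing step, assuming tokenized training sentences (the threshold matches the slide; names are illustrative):

    from collections import Counter

    def replace_rare_words(sentences, threshold=6):
        """Replace words seen fewer than `threshold` times with UNKNOWN, so
        the model has statistics for UNKNOWN already at training time."""
        counts = Counter(word for sent in sentences for word in sent)
        return [[w if counts[w] >= threshold else "UNKNOWN" for w in sent]
                for sent in sentences]

    train = [["IBM", "bought", "Lotus"], ["IBM", "slept"]] * 3
    print(replace_rare_words(train)[0])  # ['IBM', 'UNKNOWN', 'UNKNOWN']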
Results
- Model 2 performs better than Model 1
- Model 2 also performs better than, or similarly to,
earlier and state-of-the-art models
- Details: Table 2 on page 608 of the paper
More on results
- Phrase-label precision/recall results do not reveal
attachment problems
- Extracted dependencies are more useful (Figure 12 on
page 610)
- The parser recovers ‘core’ dependencies successfully
- The main problems are with adjuncts and coordination
More on distance measure
- The distance measure seems to help Model 1 capture
subcategorization preferences
- As the distance from the head increases,
– the probability of attaching a new modifier decreases
– the probability of attaching ‘STOP’ increases
- The distance measure is also useful for preferring
right-branching structures
- Structural (e.g., close attachment) vs. lexical/semantic
preferences: structural preferences seem to be necessary. For example:
– John was believed to have been shot by Bill
– Flip said that Squeaky will do the work yesterday
Choice of representation
- The parser prefers PTB-style (flat) trees
- For binary representations, do pre-/post-processing
- This would have an effect on capturing structural (but not
lexical) preferences
- Preprocessing steps, e.g., NPB labeling, seem to be
important
- In general, the parser works best with
– flat trees
– different constituent labels at different levels
The need to break down rules
- The main benefit is that the parser can use rules it has not
seen in the training data
- The parser can also learn some regularities in the rules
- Compare with Charniak (1997), which only allows rules
seen in the training data
- This is more important for the PTB, which uses flat rules:
PTB:
VP → V NP
VP → V NP PP
VP → V NP PP PP
…
Alternative (binary):
VP → V NP
VP → VP PP
- In the PTB, 54.5% of the rules (of the form used by this parser)
occur only once
Summary
- Accurate generative parser that breaks down rules
- Does well on ‘core’ dependencies; adjuncts and
coordination are the main sources of error
- Conditioning on either adjacency or subcategorization is
needed for good accuracy
- The models work well with flat (PTB-style) trees
- Breaking down the rules has useful properties (the parser
can use rules not seen in training)
Bibliography
Collins, Michael (2003). “Head-driven statistical models for natural language parsing”. In: Computational Linguistics 29.4, pp. 589–637. doi: 10.1162/089120103322753356.
Ratnaparkhi, Adwait (1996). “A maximum entropy model for part-of-speech tagging”. In: Proceedings of the Conference on Empirical Methods in Natural Language Processing. Vol. 1, pp. 133–142.