

Slide 1

Natural Language Processing CSCI 4152/6509 — Lecture 30 Efficient PCFG Inference

Instructor: Vlado Keselj Time and date: 09:35–10:25, 27-Mar-2020 Location: On-line Delivery

CSCI 4152/6509, Vlado Keselj Lecture 30 1 / 22

Slide 2

Previous Lecture

Are NLs context-free?

Natural Language Phenomena

◮ agreement, movement, subcategorization

Typical phrase structure rules in English:

◮ Sentence, NP, VP, PP, ADJP, ADVP
◮ Additional notes about typical phrase structure rules in English

Heads and dependency


Slide 3

Head-feature Principle

Head Feature Principle: a set of characteristic features of the head word is transferred to the containing phrase.

Examples of annotating the head in a context-free rule:

NP → DT NNH

or:

[NP] → [DT] H[NN]

HPSG — Head-driven Phrase Structure Grammar


Slide 4

Dependency Tree

Dependency grammar example with “That man caught the butterfly with a net.”

[Figure: dependency tree of the sentence, rooted at the verb “caught”]


Slide 5

Arguments and Adjuncts

There are two kinds of dependents:

1. arguments, which are required dependents; e.g., We deprived him of food.

2. adjuncts, which are not required;

   ⋆ they have a “less tight” link to the head, and
   ⋆ can be moved around more easily

Example: We deprived him of food yesterday in the restaurant.


Slide 6

Efficient Inference in PCFG Model

• Consider the marginalization task:

  P(sentence) = ?

• or:

  P(sentence) = P(w1 w2 … wn | S)

• One way to compute:

  P(sentence) = Σ_{t∈T} P(t),

  where T is the set of parse trees of the sentence

• Likely inefficient; need a parsing algorithm


Slide 7

Efficient PCFG Marginalization

Idea: adapt the CYK algorithm to store marginal probabilities.

Replace the algorithm line:

  β[i, j, k] ← β[i, j, k] OR (β[i, l, k1] AND β[i + l, j − l, k2])

with:

  β[i, j, k] ← β[i, j, k] + P(N^k → N^k1 N^k2) · β[i, l, k1] · β[i + l, j − l, k2]

and the first-chart-row line:

  β[i, 1, k] ← 1

with:

  β[i, 1, k] ← P(N^k → wi)


Slide 8

Probabilistic CYK for Marginalization

Require: sentence = w1 … wn, and a PCFG in CNF with nonterminals N^1 … N^m, where N^1 is the start symbol
Ensure: P(sentence) is returned
 1: allocate β ∈ R^{n×n×m} and initialize all entries to 0
 2: for i ← 1 to n do
 3:   for all rules N^k → wi do
 4:     β[i, 1, k] ← P(N^k → wi)
 5: for j ← 2 to n do
 6:   for i ← 1 to n − j + 1 do
 7:     for l ← 1 to j − 1 do
 8:       for all rules N^k → N^k1 N^k2 do
 9:         β[i, j, k] ← β[i, j, k] + P(N^k → N^k1 N^k2) · β[i, l, k1] · β[i + l, j − l, k2]
10: return β[1, n, 1]
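As a concrete check, the algorithm above can be sketched in a few lines of Python, with the example grammar from the next slide hard-coded as dictionaries (the data structures and names are my own choices for illustration, not the course's code):

```python
from collections import defaultdict

lexical = {  # (nonterminal, word) -> probability
    ('N', 'time'): 0.5, ('NP', 'time'): 0.4,
    ('V', 'flies'): 0.7, ('N', 'flies'): 0.2,
    ('V', 'like'): 0.3, ('P', 'like'): 1.0,
    ('D', 'an'): 1.0, ('N', 'arrow'): 0.3,
}
binary = {  # (parent, left, right) -> probability
    ('S', 'NP', 'VP'): 1.0,
    ('VP', 'V', 'NP'): 0.5, ('VP', 'V', 'PP'): 0.5,
    ('NP', 'N', 'N'): 0.2, ('NP', 'D', 'N'): 0.4,
    ('PP', 'P', 'NP'): 1.0,
}

def cyk_marginal(words):
    """beta[i, j, k] = total probability of N^k deriving words i..i+j-1 (i is 1-based)."""
    n = len(words)
    beta = defaultdict(float)
    for i, w in enumerate(words, start=1):          # first chart row
        for (k, word), p in lexical.items():
            if word == w:
                beta[(i, 1, k)] += p
    for j in range(2, n + 1):                       # span length
        for i in range(1, n - j + 2):               # span start
            for l in range(1, j):                   # split point
                for (k, k1, k2), p in binary.items():
                    beta[(i, j, k)] += p * beta[(i, l, k1)] * beta[(i + l, j - l, k2)]
    return beta[(1, n, 'S')]

print(cyk_marginal("time flies like an arrow".split()))  # ≈ 0.01716
```

Running it on “time flies like an arrow” yields approximately 0.01716, matching the chart worked out on slide 10.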


Slide 9

PCFG Marginalization Example (grammar)

S → NP VP /1      VP → V NP /.5     N → time /.5
NP → time /.4     VP → V PP /.5     N → arrow /.3
NP → N N /.2      PP → P NP /1      N → flies /.2
NP → D N /.4      D → an /1         V → like /.3
V → flies /.7     P → like /1
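As a quick sanity check (an illustrative sketch, not part of the lecture), one can verify that this is a proper PCFG: the probabilities of all rules sharing a left-hand side sum to 1.

```python
from collections import defaultdict

# The example grammar as (lhs, rhs, probability) triples.
rules = [
    ('S',  'NP VP', 1.0), ('VP', 'V NP', 0.5), ('VP', 'V PP', 0.5),
    ('NP', 'time', 0.4), ('NP', 'N N', 0.2), ('NP', 'D N', 0.4),
    ('PP', 'P NP', 1.0), ('N', 'time', 0.5), ('N', 'arrow', 0.3),
    ('N', 'flies', 0.2), ('D', 'an', 1.0), ('V', 'like', 0.3),
    ('V', 'flies', 0.7), ('P', 'like', 1.0),
]

totals = defaultdict(float)
for lhs, rhs, p in rules:
    totals[lhs] += p

for lhs, total in sorted(totals.items()):
    print(lhs, total)  # each total should be 1 (up to floating-point rounding)
```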


Slide 10

PCFG Marginalization Example (chart)

Lexical row, β[i, 1, ·]:
  time:  NP: 0.4, N: 0.5
  flies: V: 0.7, N: 0.2
  like:  V: 0.3, P: 1
  an:    D: 1
  arrow: N: 0.3

Longer spans:
  “time flies”:           NP: 0.02   (N N, P(NP → N N): 0.5 × 0.2 × 0.2 = 0.02)
  “an arrow”:             NP: 0.12   (D N, P(NP → D N): 1 × 0.3 × 0.4 = 0.12)
  “like an arrow”:        PP: 0.12   (P NP, P(PP → P NP): 1 × 0.12 × 1 = 0.12)
                          VP: 0.018  (V NP, P(VP → V NP): 0.3 × 0.12 × 0.5 = 0.018)
  “flies like an arrow”:  VP: 0.042  (V PP, P(VP → V PP): 0.7 × 0.12 × 0.5 = 0.042)
  whole sentence:         S: NP VP, P(S → NP VP): 0.4 × 0.042 × 1 = 0.0168
                          add NP VP, P(S → NP VP): 0.02 × 0.018 × 1 = 0.00036
                          0.0168 + 0.00036 = 0.01716

P(time flies like an arrow) = 0.01716


Slide 11

Conditioning

Conditioning in the PCFG model: P(tree | sentence)

Use the formula:

  P(tree | sentence) = P(tree, sentence) / P(sentence) = P(tree) / P(sentence)

P(tree) — directly evaluated
P(sentence) — computed by marginalization
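Plugging in the numbers from the marginalization example gives a concrete instance of the formula (illustrative arithmetic only):

```python
# Conditional probability of the most likely parse of "time flies like an arrow",
# using the values worked out on the marginalization slides.
p_tree = 0.0168        # P(tree) of the best parse (the VP -> V PP reading)
p_sentence = 0.01716   # P(sentence): the sum over both parse trees
p_cond = p_tree / p_sentence
print(p_cond)          # ≈ 0.979: this parse carries almost all the probability mass
```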


Slide 12

Completion

Finding the most likely parse tree of a sentence:

  arg max_tree P(tree | sentence)

Use the CYK algorithm in which line 9 is replaced with:

  9: β[i, j, k] ← max(β[i, j, k], P(N^k → N^k1 N^k2) · β[i, l, k1] · β[i + l, j − l, k2])

Return the most likely tree


Slide 13

CYK-based Completion Algorithm

Require: sentence = w1 … wn, and a PCFG in CNF with nonterminals N^1 … N^m, where N^1 is the start symbol
Ensure: the most likely parse tree is returned
 1: allocate β ∈ R^{n×n×m} and initialize all entries to 0
 2: for i ← 1 to n do
 3:   for all rules N^k → wi do
 4:     β[i, 1, k] ← P(N^k → wi)
 5: for j ← 2 to n do
 6:   for i ← 1 to n − j + 1 do
 7:     for l ← 1 to j − 1 do
 8:       for all rules N^k → N^k1 N^k2 do
 9:         β[i, j, k] ← max(β[i, j, k], P(N^k → N^k1 N^k2) · β[i, l, k1] · β[i + l, j − l, k2])
10: return Reconstruct(1, n, 1, β)


Slide 14

Algorithm: Reconstruct(i, j, k, β)

Require: β — table from CYK, i — index of the first word, j — length of the substring, k — index of the nonterminal
Ensure: a most probable tree with root N^k and leaves wi … wi+j−1 is returned
 1: if j = 1 then
 2:   return tree with root N^k and child wi
 3: for l ← 1 to j − 1 do
 4:   for all rules N^k → N^k1 N^k2 do
 5:     if β[i, j, k] = P(N^k → N^k1 N^k2) · β[i, l, k1] · β[i + l, j − l, k2] then
 6:       create a tree t with root N^k
 7:       t.left_child ← Reconstruct(i, l, k1, β)
 8:       t.right_child ← Reconstruct(i + l, j − l, k2, β)
 9:       return t
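Putting the completion and reconstruction algorithms together, here is a minimal Python sketch using the example grammar (the dictionaries, function names, and tuple-based tree representation are my own choices, not the course's code):

```python
from collections import defaultdict

lexical = {  # (nonterminal, word) -> probability
    ('N', 'time'): 0.5, ('NP', 'time'): 0.4,
    ('V', 'flies'): 0.7, ('N', 'flies'): 0.2,
    ('V', 'like'): 0.3, ('P', 'like'): 1.0,
    ('D', 'an'): 1.0, ('N', 'arrow'): 0.3,
}
binary = {  # (parent, left, right) -> probability
    ('S', 'NP', 'VP'): 1.0,
    ('VP', 'V', 'NP'): 0.5, ('VP', 'V', 'PP'): 0.5,
    ('NP', 'N', 'N'): 0.2, ('NP', 'D', 'N'): 0.4,
    ('PP', 'P', 'NP'): 1.0,
}

def cyk_complete(words):
    """Viterbi CYK: beta[i, j, k] = best probability of N^k over words i..i+j-1."""
    n = len(words)
    beta = defaultdict(float)
    for i, w in enumerate(words, start=1):
        for (k, word), p in lexical.items():
            if word == w:
                beta[(i, 1, k)] = max(beta[(i, 1, k)], p)
    for j in range(2, n + 1):               # span length
        for i in range(1, n - j + 2):       # span start
            for l in range(1, j):           # split point
                for (k, k1, k2), p in binary.items():
                    cand = p * beta[(i, l, k1)] * beta[(i + l, j - l, k2)]
                    beta[(i, j, k)] = max(beta[(i, j, k)], cand)
    return beta

def reconstruct(i, j, k, beta, words):
    """Rebuild a most probable tree (as nested tuples) by re-checking which
    rule and split point produced the stored maximum, as on slide 14."""
    if j == 1:
        return (k, words[i - 1])
    for l in range(1, j):
        for (kk, k1, k2), p in binary.items():
            cand = p * beta[(i, l, k1)] * beta[(i + l, j - l, k2)]
            if kk == k and cand > 0 and cand == beta[(i, j, k)]:
                return (k, reconstruct(i, l, k1, beta, words),
                        reconstruct(i + l, j - l, k2, beta, words))

words = "time flies like an arrow".split()
beta = cyk_complete(words)
print(beta[(1, len(words), 'S')])                 # ≈ 0.0168
print(reconstruct(1, len(words), 'S', beta, words))
```

The reconstructed tree is (S (NP time) (VP (V flies) (PP (P like) (NP (D an) (N arrow))))), matching the final tree on slide 18.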


Slide 15

PCFG Completion Example (grammar)

S → NP VP /1      VP → V NP /.5     N → time /.5
NP → time /.4     VP → V PP /.5     N → arrow /.3
NP → N N /.2      PP → P NP /1      N → flies /.2
NP → D N /.4      D → an /1         V → like /.3
V → flies /.7     P → like /1


Slide 16

PCFG Completion Example (chart)

Lexical row, β[i, 1, ·]:
  time:  NP: 0.4, N: 0.5
  flies: V: 0.7, N: 0.2
  like:  V: 0.3, P: 1
  an:    D: 1
  arrow: N: 0.3

Longer spans (keeping the maximum instead of the sum):
  “time flies”:           NP: 0.02   (N N, P(NP → N N): 0.5 × 0.2 × 0.2 = 0.02)
  “an arrow”:             NP: 0.12   (D N, P(NP → D N): 1 × 0.3 × 0.4 = 0.12)
  “like an arrow”:        PP: 0.12   (P NP, P(PP → P NP): 1 × 0.12 × 1 = 0.12)
                          VP: 0.018  (V NP, P(VP → V NP): 0.3 × 0.12 × 0.5 = 0.018)
  “flies like an arrow”:  VP: 0.042  (V PP, P(VP → V PP): 0.7 × 0.12 × 0.5 = 0.042)
  whole sentence:         S: NP VP, P(S → NP VP): 0.4 × 0.042 × 1 = 0.0168
                          choose max: NP VP, P(S → NP VP): 0.02 × 0.018 × 1 = 0.00036
                          max(0.0168, 0.00036) = 0.0168

max P(tree | time flies like an arrow) = 0.0168


Slide 17

PCFG Completion Example (tree reconstruction)

Tree reconstruction: start at S in β[1, 5, ·] and follow the maximizing choices back:

  S: 0.0168 = 0.4 × 0.042 × 1  (NP VP, P(S → NP VP));  NP: 0.4 (“time”), VP: 0.042
  VP: 0.042 = 0.7 × 0.12 × 0.5 (V PP, P(VP → V PP));   V: 0.7 (“flies”), PP: 0.12
  PP: 0.12  = 1 × 0.12 × 1     (P NP, P(PP → P NP));   P: 1 (“like”), NP: 0.12
  NP: 0.12  = 1 × 0.3 × 0.4    (D N, P(NP → D N));     D: 1 (“an”), N: 0.3 (“arrow”)


Slide 18

PCFG Completion Example (final tree)

The most probable tree:

(S (NP time)
   (VP (V flies)
       (PP (P like)
           (NP (D an) (N arrow)))))


Slide 19

Issues with PCFGs

1. Structural dependencies

   ◮ Dependency on position in a tree
   ◮ Example: consider rules NP → PRP and NP → DT NN
   ◮ PRP is more likely as a subject than as an object
   ◮ NL parse trees are usually deeper on their right side

2. Lexical dependencies

   ◮ Example: the PP-attachment problem
   ◮ In a PCFG, decided using probabilities of higher-level rules; e.g., NP → NP PP, VP → VBD NP, and VP → VBD NP PP
   ◮ Actually, they frequently depend on the actual words

Slide 20

PP-Attachment Example

Consider sentences:

◮ “Workers dumped sacks into a bin.” and
◮ “Workers dumped sacks of fish.”

and rules:

◮ NP → NP PP
◮ VP → VBD NP
◮ VP → VBD NP PP

Slide 21

A Solution: Probabilistic Lexicalized CFGs

Use heads of phrases
Expanded set of rules, e.g.: VP(dumped) → VBD(dumped) NP(sacks) PP(into)
Large number of new rules leads to a sparse data problem
Solution: new independence assumptions
Solutions proposed by Charniak, Collins, and others around 1999
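To make the rule blow-up concrete, here is a back-of-envelope sketch (the vocabulary and grammar sizes below are hypothetical, chosen only for illustration): under the head-feature principle the parent of a binary rule inherits its head from one child, so each unlexicalized binary rule has on the order of V × V lexicalized variants for a head vocabulary of size V.

```python
# Hypothetical sizes, for illustration only.
V = 10_000               # head-word vocabulary size (assumed)
binary_rules = 50        # number of unlexicalized CNF rules (assumed)

# The parent's head is tied to one child's head, so a binary rule leaves
# roughly V choices for the head child times V for the other child.
lexicalized_variants = binary_rules * V * V
print(lexicalized_variants)  # billions of potential rules -> sparse data
```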


Slide 22

Parser Evaluation (not covered)

We will not cover the Parser Evaluation section in class.
It will not be on the exam.
Notes are provided for your information.
