

1. Natural Language Processing CSCI 4152/6509 — Lecture 30: Efficient PCFG Inference
   Instructor: Vlado Keselj
   Time and date: 09:35–10:25, 27-Mar-2020
   Location: On-line Delivery

2. Previous Lecture
   - Are NLs context-free?
   - Natural Language Phenomena
     - agreement, movement, subcategorization
   - Typical phrase structure rules in English:
     - Sentence, NP, VP, PP, ADJP, ADVP
     - Additional notes about typical phrase structure rules in English
   - Heads and dependency

3. Head-feature Principle
   - Head Feature Principle: the characteristic features of the head word are transferred to the containing phrase.
   - Examples of annotating the head in a context-free rule: NP → DT NN_H, or [NP] → [DT] [NN]_H
   - HPSG: Head-driven Phrase Structure Grammar

4. Dependency Tree
   - A dependency grammar example with "That man caught the butterfly with a net."
   - [Figure: dependency tree of the sentence, rooted at the verb "caught"]

5. Arguments and Adjuncts
   There are two kinds of dependents:
   1. arguments, which are required dependents, e.g., "We deprived him of food."
   2. adjuncts, which are not required;
      - they have a "less tight" link to the head, and
      - can be moved around more easily
      Example: "We deprived him of food yesterday in the restaurant."

6. Efficient Inference in PCFG Model
   - Consider the marginalization task: P(sentence) = ?
   - or: P(sentence) = P(w_1 w_2 ... w_n | S)
   - One way to compute it: P(sentence) = Σ_{t ∈ T} P(t), summing over the set T of parse trees of the sentence
   - Likely inefficient; we need a parsing algorithm

7. Efficient PCFG Marginalization
   - Idea: adapt the CYK algorithm to store marginal probabilities
   - Replace the algorithm line
       β[i, j, k] ← β[i, j, k] OR (β[i, l, k1] AND β[i+l, j−l, k2])
     with
       β[i, j, k] ← β[i, j, k] + P(N_k → N_k1 N_k2) · β[i, l, k1] · β[i+l, j−l, k2]
   - and the first-chart-row line
       β[i, 1, k] ← 1
     with
       β[i, 1, k] ← P(N_k → w_i)

8. Probabilistic CYK for Marginalization
   Require: sentence = w_1 ... w_n, and a PCFG in CNF with nonterminals N_1 ... N_m, where N_1 is the start symbol
   Ensure: P(sentence) is returned
    1: allocate β ∈ R^(n×n×m) and initialize all entries to 0
    2: for i ← 1 to n do
    3:   for all rules N_k → w_i do
    4:     β[i, 1, k] ← P(N_k → w_i)
    5: for j ← 2 to n do
    6:   for i ← 1 to n − j + 1 do
    7:     for l ← 1 to j − 1 do
    8:       for all rules N_k → N_k1 N_k2 do
    9:         β[i, j, k] ← β[i, j, k] + P(N_k → N_k1 N_k2) · β[i, l, k1] · β[i+l, j−l, k2]
   10: return β[1, n, 1]
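
The pseudocode translates almost directly into code. Below is a minimal Python sketch of the algorithm (not part of the slides): the dictionary-based grammar encoding and the name pcfg_marginalize are illustrative choices, with rule probabilities supplied as one table for lexical rules N_k → w_i and one for binary rules N_k → N_k1 N_k2.

```python
from collections import defaultdict

def pcfg_marginalize(words, lexical, binary, start="S"):
    """Compute P(sentence) for a PCFG in Chomsky Normal Form.

    words   -- the sentence as a list of tokens w_1 .. w_n
    lexical -- {(nonterminal, word): probability} for rules N_k -> w_i
    binary  -- {(parent, left, right): probability} for rules N_k -> N_k1 N_k2
    start   -- the start symbol N_1
    """
    n = len(words)
    # beta[(i, j)][k] = total probability that nonterminal k derives the span
    # of length j starting at position i (1-based indices, as on the slides).
    beta = defaultdict(lambda: defaultdict(float))

    # First chart row: spans of length 1, filled from the lexical rules.
    for i in range(1, n + 1):
        for (k, word), p in lexical.items():
            if word == words[i - 1]:
                beta[(i, 1)][k] = p

    # Longer spans: sum over binary rules and split points l.
    for j in range(2, n + 1):             # span length
        for i in range(1, n - j + 2):     # span start
            for l in range(1, j):         # length of the left sub-span
                for (k, k1, k2), p in binary.items():
                    beta[(i, j)][k] += p * beta[(i, l)][k1] * beta[(i + l, j - l)][k2]

    return beta[(1, n)][start]
```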

9. PCFG Marginalization Example (grammar)
   S  → NP VP / 1     VP → V NP / .5     N → time / .5
   NP → time / .4     VP → V PP / .5     N → arrow / .3
   NP → N N / .2      PP → P NP / 1      N → flies / .2
   NP → D N / .4      D → an / 1         V → like / .3
   V  → flies / .7    P → like / 1

10. PCFG Marginalization Example (chart)
   Chart for "time flies like an arrow" (figure); the non-zero β entries and how they are computed:
   - Length-1 spans: time: NP 0.4, N 0.5; flies: V 0.7, N 0.2; like: V 0.3, P 1; an: D 1; arrow: N 0.3
   - "time flies": NP 0.02, via N N and P(NP → N N): 0.2 × 0.5 × 0.2 = 0.02
   - "an arrow": NP 0.12, via D N and P(NP → D N): 0.4 × 1 × 0.3 = 0.12
   - "like an arrow": PP 0.12, via P NP and P(PP → P NP): 1 × 1 × 0.12 = 0.12; and VP 0.018, via V NP and P(VP → V NP): 0.5 × 0.3 × 0.12 = 0.018
   - "flies like an arrow": VP 0.042, via V PP and P(VP → V PP): 0.5 × 0.7 × 0.12 = 0.042
   - Whole sentence, S via NP VP and P(S → NP VP): 1 × 0.4 × 0.042 = 0.0168 and 1 × 0.02 × 0.018 = 0.00036; add:
     P(time flies like an arrow) = 0.0168 + 0.00036 = 0.01716
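
As a check, the chart can be reproduced with the pcfg_marginalize sketch above, using the grammar of slide 9 encoded in the illustrative dictionary format:

```python
# Grammar of slide 9, encoded for the pcfg_marginalize sketch above.
lexical = {
    ("NP", "time"): 0.4, ("N", "time"): 0.5, ("N", "flies"): 0.2,
    ("V", "flies"): 0.7, ("V", "like"): 0.3, ("P", "like"): 1.0,
    ("D", "an"): 1.0, ("N", "arrow"): 0.3,
}
binary = {
    ("S", "NP", "VP"): 1.0, ("VP", "V", "NP"): 0.5, ("VP", "V", "PP"): 0.5,
    ("NP", "N", "N"): 0.2, ("NP", "D", "N"): 0.4, ("PP", "P", "NP"): 1.0,
}

print(pcfg_marginalize("time flies like an arrow".split(), lexical, binary))
# prints approximately 0.01716, i.e. 0.0168 + 0.00036 as in the chart
```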

11. Conditioning
   - Conditioning in the PCFG model: P(tree | sentence)
   - Use the formula:
       P(tree | sentence) = P(tree, sentence) / P(sentence) = P(tree) / P(sentence)
   - P(tree) — directly evaluated
   - P(sentence) — marginalization
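
As a worked example with the numbers from the marginalization chart: the parse in which "flies" is the verb and "like an arrow" is a PP has P(tree) = 0.0168 (the product of its rule probabilities), so P(tree | sentence) = 0.0168 / 0.01716 ≈ 0.979.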

12. Completion
   - Finding the most likely parse tree of a sentence: argmax_tree P(tree | sentence)
   - Use the CYK algorithm in which line 9 is replaced with:
       9: β[i, j, k] ← max(β[i, j, k], P(N_k → N_k1 N_k2) · β[i, l, k1] · β[i+l, j−l, k2])
   - Return the most likely tree

13. CYK-based Completion Algorithm
   Require: sentence = w_1 ... w_n, and a PCFG in CNF with nonterminals N_1 ... N_m, where N_1 is the start symbol
   Ensure: The most likely parse tree is returned
    1: allocate β ∈ R^(n×n×m) and initialize all entries to 0
    2: for i ← 1 to n do
    3:   for all rules N_k → w_i do
    4:     β[i, 1, k] ← P(N_k → w_i)
    5: for j ← 2 to n do
    6:   for i ← 1 to n − j + 1 do
    7:     for l ← 1 to j − 1 do
    8:       for all rules N_k → N_k1 N_k2 do
    9:         β[i, j, k] ← max(β[i, j, k], P(N_k → N_k1 N_k2) · β[i, l, k1] · β[i+l, j−l, k2])
   10: return Reconstruct(1, n, 1, β)

14. Algorithm: Reconstruct(i, j, k, β)
   Require: β — table from CYK, i — index of the first word, j — length of the covered sub-string, k — index of the non-terminal
   Ensure: a most probable tree with root N_k and leaves w_i ... w_{i+j−1} is returned
    1: if j = 1 then
    2:   return tree with root N_k and child w_i
    3: for l ← 1 to j − 1 do
    4:   for all rules N_k → N_k1 N_k2 do
    5:     if β[i, j, k] = P(N_k → N_k1 N_k2) · β[i, l, k1] · β[i+l, j−l, k2] then
    6:       create a tree t with root N_k
    7:       t.left child ← Reconstruct(i, l, k1, β)
    8:       t.right child ← Reconstruct(i+l, j−l, k2, β)
    9:       return t
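
A minimal Python sketch combining the completion algorithm with Reconstruct, in the same style as the marginalization sketch (again, the names and the tuple encoding of trees are illustrative assumptions). The equality test mirrors line 5 of Reconstruct; a practical implementation would usually store back-pointers instead of re-testing products.

```python
from collections import defaultdict

def pcfg_complete(words, lexical, binary, start="S"):
    """Return (probability, tree) for a most likely parse under a CNF PCFG."""
    n = len(words)
    beta = defaultdict(lambda: defaultdict(float))

    # Spans of length 1: lexical rules N_k -> w_i.
    for i in range(1, n + 1):
        for (k, word), p in lexical.items():
            if word == words[i - 1]:
                beta[(i, 1)][k] = p

    # Longer spans: keep the maximum over rules and split points (line 9 with max).
    for j in range(2, n + 1):
        for i in range(1, n - j + 2):
            for l in range(1, j):
                for (k, k1, k2), p in binary.items():
                    cand = p * beta[(i, l)][k1] * beta[(i + l, j - l)][k2]
                    if cand > beta[(i, j)][k]:
                        beta[(i, j)][k] = cand

    def reconstruct(i, j, k):
        # Rebuild a most probable subtree with root k over words i .. i+j-1,
        # represented as nested tuples (root, left, right) or (root, word).
        if j == 1:
            return (k, words[i - 1])
        for l in range(1, j):
            for (k0, k1, k2), p in binary.items():
                if k0 == k and beta[(i, j)][k] == p * beta[(i, l)][k1] * beta[(i + l, j - l)][k2]:
                    return (k, reconstruct(i, l, k1), reconstruct(i + l, j - l, k2))

    return beta[(1, n)][start], reconstruct(1, n, start)
```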

15. PCFG Completion Example (grammar)
   S  → NP VP / 1     VP → V NP / .5     N → time / .5
   NP → time / .4     VP → V PP / .5     N → arrow / .3
   NP → N N / .2      PP → P NP / 1      N → flies / .2
   NP → D N / .4      D → an / 1         V → like / .3
   V  → flies / .7    P → like / 1

16. PCFG Completion Example (chart)
   Chart for "time flies like an arrow" (figure); the entries are the same as in the marginalization chart, but the final step takes a maximum instead of a sum:
   - Length-1 spans: time: NP 0.4, N 0.5; flies: V 0.7, N 0.2; like: V 0.3, P 1; an: D 1; arrow: N 0.3
   - "time flies": NP 0.02 (N N); "an arrow": NP 0.12 (D N); "like an arrow": PP 0.12 (P NP), VP 0.018 (V NP); "flies like an arrow": VP 0.042 (V PP)
   - Whole sentence, S via NP VP: 1 × 0.4 × 0.042 = 0.0168 and 1 × 0.02 × 0.018 = 0.00036; choose max:
     max(0.0168, 0.00036) = 0.0168, the probability of the most likely tree

17. PCFG Completion Example (tree reconstruction)
   (Figure) Reconstruction starts at the S entry for the whole sentence (chosen as max(0.0168, 0.00036) = 0.0168) and follows the entries that produced each maximum:
   - S 0.0168 via P(S → NP VP): NP "time" (0.4) and VP "flies like an arrow" (0.042)
   - VP 0.042 via P(VP → V PP): V "flies" (0.7) and PP "like an arrow" (0.12)
   - PP 0.12 via P(PP → P NP): P "like" (1) and NP "an arrow" (0.12)
   - NP 0.12 via P(NP → D N): D "an" (1) and N "arrow" (0.3)

18. PCFG Completion Example (final tree)
   The most probable tree:
     S
       NP: time
       VP
         V: flies
         PP
           P: like
           NP
             D: an
             N: arrow
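
With the grammar tables from the marginalization usage example, the pcfg_complete sketch reproduces this tree and its probability:

```python
prob, tree = pcfg_complete("time flies like an arrow".split(), lexical, binary)
print(prob)  # approximately 0.0168
print(tree)
# ('S', ('NP', 'time'),
#       ('VP', ('V', 'flies'),
#             ('PP', ('P', 'like'),
#                   ('NP', ('D', 'an'), ('N', 'arrow')))))
```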

19. Issues with PCFGs
   1. Structural dependencies
      - dependency on position in a tree
      - Example: consider rules NP → PRP and NP → DT NN
      - PRP is more likely as a subject than as an object
      - NL parse trees are usually deeper on their right side
   2. Lexical dependencies
      - Example: the PP-attachment problem
      - In a PCFG, PP attachment is decided using the probabilities of higher-level rules; e.g., NP → NP PP, VP → VBD NP, and VP → VBD NP PP
      - Actually, such attachments frequently depend on the actual words

20. PP-Attachment Example
   Consider the sentences:
   - "Workers dumped sacks into a bin." and
   - "Workers dumped sacks of fish."
   and the rules:
   - NP → NP PP
   - VP → VBD NP
   - VP → VBD NP PP

21. A Solution: Probabilistic Lexicalized CFGs
   - use heads of phrases
   - expanded set of rules, e.g.: VP(dumped) → VBD(dumped) NP(sacks) PP(into)
   - large number of new rules; sparse data problem
   - solution: new independence assumptions
   - proposed solutions by Charniak, Collins, etc. around 1999
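
To make the rule expansion concrete, here is a small illustrative sketch (the tuple representation is an assumption, not from the slides) of how a lexicalized rule can be encoded by pairing each category with its head word:

```python
# A lexicalized nonterminal pairs a phrase category with its head word;
# this tuple encoding is purely illustrative.
lex_rule = (
    ("VP", "dumped"),                                       # left-hand side: VP(dumped)
    [("VBD", "dumped"), ("NP", "sacks"), ("PP", "into")],   # right-hand side
)

# Since every (category, head word) pair behaves as a separate nonterminal,
# the number of rules grows with the vocabulary -- the sparse data problem
# that motivates the new independence assumptions mentioned above.
```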

22. Parser Evaluation (not covered)
   - We will not cover the Parser Evaluation section in class
   - It will not be on the exam
   - Notes are provided for your information
