NLP Programming Tutorial 8 – Phrase Structure Parsing

Graham Neubig
Nara Institute of Science and Technology (NAIST)




Interpreting Language is Hard!

I saw a girl with a telescope

  • “Parsing” resolves structural ambiguity in a formal way
Two Types of Parsing

  • Dependency parsing: focuses on relations between words
  • Phrase structure parsing: focuses on identifying phrases and their recursive structure

I saw a girl with a telescope

[Figure: dependency and phrase-structure analyses of the sentence; the phrase-structure tree uses POS tags PRP VBD DT NN IN DT NN and phrase labels NP, PP, VP, S]


Recursive Structure?

I saw a girl with a telescope

[Figure: the tree is built up recursively; the POS tags PRP VBD DT NN IN DT NN combine into phrases NP, PP, and VP, then into S, with one NP attachment still marked "???"]

Different Structure, Different Interpretation

I saw a girl with a telescope

[Figure: the "???" can be resolved in two ways: the PP "with a telescope" attaches either to the VP or to the NP "a girl", giving two different trees and two different interpretations]


Non-Terminals, Pre-Terminals, Terminals

I saw a girl with a telescope

[Figure: in the tree, words such as "I" and "telescope" are terminals; POS tags such as PRP and NN, which dominate a single terminal, are pre-terminals; phrase labels such as NP, VP, and S are non-terminals]


Parsing as a Prediction Problem

  • Given a sentence X, predict its parse tree Y
  • A type of "structured" prediction (similar to POS tagging, word segmentation, etc.)

[Figure: the sentence "I saw a girl with a telescope" paired with its parse tree]


Probabilistic Model for Parsing

  • Given a sentence X, predict the most probable parse tree Y:

argmax_Y P(Y∣X)

[Figure: the sentence with its highest-probability parse tree]


Probabilistic Generative Model

  • We assume some probabilistic model generated the parse tree Y and the sentence X jointly: P(Y, X)
  • The parse tree with the highest joint probability also has the highest conditional probability given X:

argmax_Y P(Y∣X) = argmax_Y P(Y, X)


Probabilistic Context Free Grammar (PCFG)

  • How do we define a joint probability for a parse tree?

P( [parse tree of "I saw a girl with a telescope"] ) = ?


Probabilistic Context Free Grammar (PCFG)

  • PCFG: Define probability for each node

I saw a girl with a telescope

[Figure: each node of the tree is annotated with its rule probability, e.g. P(S → NP VP), P(VP → VBD NP PP), P(PP → IN NP), P(NP → DT NN), P(PRP → "I"), P(NN → "telescope")]


Probabilistic Context Free Grammar (PCFG)

  • PCFG: Define probability for each node
  • Parse tree probability is product of node probabilities

P(tree) = P(S → NP VP)
        * P(NP → PRP)       * P(PRP → "I")
        * P(VP → VBD NP PP) * P(VBD → "saw")
        * P(NP → DT NN)     * P(DT → "a")   * P(NN → "girl")
        * P(PP → IN NP)     * P(IN → "with")
        * P(NP → DT NN)     * P(DT → "a")   * P(NN → "telescope")
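Read as code, this product is just a recursive multiplication over the tree's rules. A minimal Python sketch, using made-up rule probabilities and a hypothetical `tree_prob` helper (a smaller tree than the slides' example, for brevity):

```python
# Hypothetical PCFG rule probabilities (illustrative values only)
rule_prob = {
    ("S", ("NP", "VP")): 0.8,
    ("NP", ("PRP",)): 0.5,
    ("VP", ("VBD", "NP")): 0.4,
    ("NP", ("NN",)): 0.5,
    ("PRP", ("I",)): 0.4,
    ("VBD", ("saw",)): 0.05,
    ("NN", ("him",)): 0.1,
}

def tree_prob(tree):
    """P(tree) = product of the probabilities of all rules in the tree.
    A tree is (label, children); a leaf child is a plain word string."""
    label, children = tree
    rhs = tuple(c if isinstance(c, str) else c[0] for c in children)
    p = rule_prob[(label, rhs)]
    for child in children:
        if not isinstance(child, str):
            p *= tree_prob(child)  # multiply in the subtree's rules
    return p

tree = ("S", [("NP", [("PRP", ["I"])]),
              ("VP", [("VBD", ["saw"]), ("NP", [("NN", ["him"])])])])
print(tree_prob(tree))  # 0.8 * 0.5 * 0.4 * 0.4 * 0.05 * 0.5 * 0.1
```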


Probabilistic Parsing

  • Given this model, parsing is the algorithm to find

argmax_Y P(Y, X)

  • Can we use the Viterbi algorithm as we did before?
  • Answer: No!
  • Reason: Parse candidates are not graphs, but hypergraphs.


What is a Hypergraph?

  • Let's say we have two parse trees for "I saw a girl with a telescope"
  • Most parts of the two trees are the same!

[Figure: both trees contain the nodes PRP 0,1, VBD 1,2, DT 2,3, NN 3,4, IN 4,5, DT 5,6, NN 6,7, NP 0,1, NP 2,4, NP 5,7, PP 4,7, VP 1,7, S 0,7; the second tree additionally contains NP 2,7]


  • Create a graph with all the same nodes and all the edges of both trees

[Figure: the merged graph containing every node of both trees, including NP 2,7]


  • Overlay the edges of the first tree, then the edges of the second tree: at VP 1,7 there are two choices! Choose the red edge and you get the first tree; choose the blue edge and you get the second tree.

[Figure: the merged hypergraph with the first tree's edges in red and the second tree's edges in blue]


Why a “Hyper”graph?

  • The "degree" of an edge is its number of children
  • The degree of a hypergraph is the maximum degree over all of its edges
  • A graph is a hypergraph of degree 1!

[Figure: degree-1 edges such as PRP 0,1 → "I" and VBD 1,2 → "saw"; a degree-2 edge VP 1,7 → VBD 1,2 NP 2,7; a degree-3 edge VP 1,7 → VBD 1,2 NP 2,4 PP 4,7]


Weighted Hypergraphs

  • Like graphs, we can add weights to hypergraph edges
  • Use the negative log probability of each rule

[Figure: the hypergraph with edge weights such as −log(P(S → NP VP)), −log(P(VP → VBD NP PP)), −log(P(VP → VBD NP)), and −log(P(PRP → "I"))]


Solving Hypergraphs

  • Parsing = finding the minimum-weight path through a hypergraph
  • For graphs we can do this with the Viterbi algorithm:
  • Forward: calculate the score of the best path to each state
  • Backward: recover the best path
  • For hypergraphs, the algorithm is almost identical!
  • Inside: calculate the score of the best subtree for each node
  • Outside: recover the best tree

Review: Viterbi Algorithm (Forward Step)

[Figure: a graph with nodes 0-3 and weighted edges e1 (0→1, 2.5), e2 (0→2, 1.4), e3 (1→2, 4.0), e4 (1→3, 2.1), e5 (2→3, 2.3)]

best_score[0] = 0
for each node in the graph (ascending order)
    best_score[node] = ∞
    for each incoming edge of node
        score = best_score[edge.prev_node] + edge.score
        if score < best_score[node]
            best_score[node] = score
            best_edge[node] = edge
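The forward step can be sketched as runnable Python on the slides' example graph. The edge tuples and the `viterbi_forward` helper name are illustrative, not part of the tutorial's code:

```python
import math

# The example graph from the slides: (name, prev_node, next_node, score)
edges = [("e1", 0, 1, 2.5), ("e2", 0, 2, 1.4), ("e3", 1, 2, 4.0),
         ("e4", 1, 3, 2.1), ("e5", 2, 3, 2.3)]

def viterbi_forward(edges, num_nodes):
    best_score = [0.0] + [math.inf] * (num_nodes - 1)
    best_edge = [None] * num_nodes
    for node in range(1, num_nodes):          # ascending order
        for name, prev, nxt, score_edge in edges:
            if nxt != node:
                continue                      # not an incoming edge of node
            score = best_score[prev] + score_edge
            if score < best_score[node]:
                best_score[node] = score
                best_edge[node] = name
    return best_score, best_edge

scores, back = viterbi_forward(edges, 4)
print(scores)  # approximately [0.0, 2.5, 1.4, 3.7]
print(back)    # [None, 'e1', 'e2', 'e5']
```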


Example:

Initialize: best_score[0] = 0
Check e1: score = 0 + 2.5 = 2.5 (< ∞) → best_score[1] = 2.5, best_edge[1] = e1
Check e2: score = 0 + 1.4 = 1.4 (< ∞) → best_score[2] = 1.4, best_edge[2] = e2
Check e3: score = 2.5 + 4.0 = 6.5 (> 1.4) → no change!
Check e4: score = 2.5 + 2.1 = 4.6 (< ∞) → best_score[3] = 4.6, best_edge[3] = e4
Check e5: score = 1.4 + 2.3 = 3.7 (< 4.6) → best_score[3] = 3.7, best_edge[3] = e5


Result of Forward Step

[Figure: the solved graph, with best scores 0.0, 2.5, 1.4, 3.7 at nodes 0-3]

best_score = ( 0.0, 2.5, 1.4, 3.7 )
best_edge = ( NULL, e1, e2, e5 )


Review: Viterbi Algorithm (Backward Step)

[Figure: the solved example graph from the forward step]

best_path = [ ]
next_edge = best_edge[best_edge.length - 1]
while next_edge != NULL
    add next_edge to best_path
    next_edge = best_edge[next_edge.prev_node]
reverse best_path
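Continuing the example, the backward step can be sketched in Python. The `prev_node` map and the `viterbi_backward` helper name are illustrative; `best_edge` is the result of the forward step above:

```python
# Back-pointers produced by the forward step on the example graph
best_edge = [None, "e1", "e2", "e5"]
# Source node of each edge (names are illustrative)
prev_node = {"e1": 0, "e2": 0, "e3": 1, "e4": 1, "e5": 2}

def viterbi_backward(best_edge, prev_node):
    best_path = []
    next_edge = best_edge[len(best_edge) - 1]  # start from the last node
    while next_edge is not None:
        best_path.append(next_edge)
        next_edge = best_edge[prev_node[next_edge]]
    best_path.reverse()
    return best_path

print(viterbi_backward(best_edge, prev_node))  # ['e2', 'e5']
```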


Example of Backward Step

[Figure: the solved example graph]

Initialize: best_path = [], next_edge = best_edge[3] = e5
Process e5: best_path = [e5], next_edge = best_edge[2] = e2
Process e2: best_path = [e5, e2], next_edge = best_edge[0] = NULL
Reverse: best_path = [e2, e5]


Inside Step for Hypergraphs:

  • Find the score of the best subtree of VP1,7

[Figure: VP 1,7 has two incoming hyperedges, e1: VP → VBD 1,2 NP 2,4 PP 4,7, and e2: VP → VBD 1,2 NP 2,7]

score(e1) = −log(P(VP → VBD NP PP)) + best_score[VBD1,2] + best_score[NP2,4] + best_score[PP4,7]
score(e2) = −log(P(VP → VBD NP)) + best_score[VBD1,2] + best_score[NP2,7]

best_edge[VP1,7] = argmin_{e1,e2} score(e)
best_score[VP1,7] = score(best_edge[VP1,7])


Building Hypergraphs from Grammars

  • OK, we can solve hypergraphs, but what we have is a grammar and a sentence
  • How do we build a hypergraph?

A Grammar:
P(S → NP VP) = 0.8       P(S → PRP VP) = 0.2
P(VP → VBD NP PP) = 0.6  P(VP → VBD NP) = 0.4
P(NP → DT NN) = 0.5      P(NP → NN) = 0.5
P(PRP → "I") = 0.4       P(VBD → "saw") = 0.05
P(DT → "a") = 0.6        ...

A Sentence: I saw a girl with a telescope


CKY Algorithm

  • The CKY (Cocke-Kasami-Younger) algorithm creates and solves hypergraphs
  • The grammar must be in Chomsky normal form (CNF): every rule has either two non-terminals or one terminal on its right side
  • Rules can be converted into CNF:

S → NP VP       OK
S → PRP VP      OK
VP → VBD NP     OK
VP → VBD NP PP  Not OK! → binarize: VP → VBD VP' and VP' → NP PP
NP → PRP        Not OK! → combine with PRP → "I" into NP_PRP → "I"
NP → NN         Not OK! (unary) → combined with its pre-terminal rule in the same way
PRP → "I"       OK
VBD → "saw"     OK
DT → "a"        OK
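Binarization in the style of VP → VBD VP', VP' → NP PP can be sketched as a small function. The `binarize` helper and the primed-symbol naming scheme are illustrative assumptions, not the tutorial's code:

```python
def binarize(lhs, rhs):
    """Split an n-ary rule lhs → rhs into binary CNF rules by
    introducing primed helper symbols (VP', VP'', ...)."""
    rules = []
    while len(rhs) > 2:
        new_sym = lhs + "'"                 # fresh helper non-terminal
        rules.append((lhs, [rhs[0], new_sym]))
        lhs, rhs = new_sym, rhs[1:]         # binarize the remainder
    rules.append((lhs, list(rhs)))
    return rules

print(binarize("VP", ["VBD", "NP", "PP"]))
# [('VP', ['VBD', "VP'"]), ("VP'", ['NP', 'PP'])]
```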


CKY Algorithm

  • Start by expanding all rules for terminals, with scores
  • Expand all possible nodes for span 0,2
  • Expand all possible nodes for span 1,3
  • Expand all possible nodes for span 0,3

[Figure: the CKY chart for "I saw him" is filled bottom-up: first the pre-terminal and unary nodes (PRP 0,1, NP 0,1, VBD 1,2, VP 1,2, PRP 2,3, NP 2,3), then S 0,2 (score 0.5 + 3.2 + 1.0 = 4.7) and SBAR 0,2 (5.3), then VP 1,3 (5.0), then S 0,3 (5.9) and SBAR 0,3 (6.1)]


CKY Algorithm

  • Find the S that covers the entire sentence and its best edge
  • Expand its left child and right child recursively until we have our tree

[Figure: starting from S 0,3 in the chart for "I saw him", following the best edges downward recovers the complete tree]


Printing Parse Trees

  • Standard text format for parse tree: “Penn Treebank”

[Figure: the subtree for "with a telescope": a PP over IN and NP, with the NP over DT and NN]

(PP (IN with) (NP (DT a) (NN telescope)))
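The bracketed format can be produced by a short recursive function. This is a sketch; the `tb_format` name and the (label, children) tree representation are assumptions, not the tutorial's code:

```python
def tb_format(tree):
    """Render a (label, children) tree in Penn Treebank bracket notation;
    a leaf child is a plain word string."""
    label, children = tree
    parts = [c if isinstance(c, str) else tb_format(c) for c in children]
    return "(" + label + " " + " ".join(parts) + ")"

pp = ("PP", [("IN", ["with"]),
             ("NP", [("DT", ["a"]), ("NN", ["telescope"])])])
print(tb_format(pp))  # (PP (IN with) (NP (DT a) (NN telescope)))
```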


Printing Parse Trees

  • Hypergraphs printed recursively, starting at top:

I saw a girl with a telescope

[Figure: the full parse tree of the sentence]

print(S0,7) = "(S " + print(NP0,1) + " " + print(VP1,7) + ")"
print(NP0,1) = "(NP " + print(PRP0,1) + ")"
print(PRP0,1) = "(PRP I)"
...


Pseudo-Code


CKY Pseudo-Code: Read Grammar

# Read a grammar in the format "lhs \t rhs \t prob \n"
make list nonterm   # list of (lhs, rhs1, rhs2, log_prob)
make map preterm    # map preterm[rhs] = [ (lhs, log_prob) ... ]
for rule in grammar_file
    split rule into lhs, rhs, prob (with "\t")   # rule P(lhs → rhs) = prob
    split rhs into rhs_symbols (with " ")
    if length(rhs_symbols) == 1:   # if this is a pre-terminal rule
        add (lhs, log(prob)) to preterm[rhs]
    else:                          # otherwise it is a non-terminal rule
        add (lhs, rhs_symbols[0], rhs_symbols[1], log(prob)) to nonterm
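A runnable Python sketch of this step (the `read_grammar` name and the three-line toy grammar are illustrative; the tab-separated format follows the pseudocode):

```python
import math
from collections import defaultdict

def read_grammar(lines):
    """Read rules in the format "lhs \t rhs \t prob"."""
    nonterm = []                 # (lhs, rhs1, rhs2, log_prob)
    preterm = defaultdict(list)  # word -> [(lhs, log_prob), ...]
    for line in lines:
        lhs, rhs, prob = line.strip().split("\t")
        rhs_symbols = rhs.split(" ")
        if len(rhs_symbols) == 1:   # pre-terminal rule
            preterm[rhs].append((lhs, math.log(float(prob))))
        else:                       # binary non-terminal rule
            nonterm.append((lhs, rhs_symbols[0], rhs_symbols[1],
                            math.log(float(prob))))
    return nonterm, preterm

grammar = ["S\tNP VP\t1.0", "NP\tI\t0.5", "VP\tsaw\t0.5"]
nonterm, preterm = read_grammar(grammar)
print(nonterm)        # [('S', 'NP', 'VP', 0.0)]
print(preterm["I"])   # [('NP', log 0.5)]
```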


CKY Pseudo-Code: Add Pre-Terminals

split line into words
make map best_score   # index: sym_i,j   value: best log prob
make map best_edge    # index: sym_i,j   value: (lsym_i,k, rsym_k,j)
# Add the pre-terminal symbols
for i in 0 .. length(words)-1:
    for lhs, log_prob in preterm where P(lhs → words[i]) > 0:
        best_score[lhs_i,i+1] = log_prob
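In Python this step might look as follows; the tiny `preterm` map here is hand-filled with illustrative probabilities rather than read from the tutorial's grammar file:

```python
import math
from collections import defaultdict

# Illustrative pre-terminal rules: word -> [(lhs, log_prob), ...]
preterm = {"I": [("PRP", math.log(0.4))],
           "saw": [("VBD", math.log(0.05))],
           "him": [("PRP", math.log(0.4))]}

words = "I saw him".split(" ")
best_score = defaultdict(lambda: -math.inf)  # key: (sym, i, j)
best_edge = {}

# Add a pre-terminal node sym_i,i+1 for every rule producing words[i]
for i in range(len(words)):
    for lhs, log_prob in preterm.get(words[i], []):
        best_score[(lhs, i, i + 1)] = log_prob

print(sorted(best_score))
# [('PRP', 0, 1), ('PRP', 2, 3), ('VBD', 1, 2)]
```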


CKY Pseudo-Code: Combine Non-Terminals

for j in 2 .. length(words):    # j is the right side of the span
    for i in j-2 .. 0:          # i is the left side (note: reverse order!)
        for k in i+1 .. j-1:    # k is where the second child begins
            # Try every grammar rule log(P(sym → lsym rsym)) = log_prob
            for sym, lsym, rsym, log_prob in nonterm:
                # Both children must have a probability
                if best_score[lsym_i,k] > -∞ and best_score[rsym_k,j] > -∞:
                    # Find the log probability of this node/edge
                    my_lp = best_score[lsym_i,k] + best_score[rsym_k,j] + log_prob
                    # If this is the best edge, update
                    if my_lp > best_score[sym_i,j]:
                        best_score[sym_i,j] = my_lp
                        best_edge[sym_i,j] = (lsym_i,k, rsym_k,j)
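A self-contained Python sketch of the same loops on a toy grammar (the grammar, probabilities, and hand-filled pre-terminal scores are illustrative, not the tutorial's data):

```python
import math
from collections import defaultdict

# Toy binary rules: (sym, lsym, rsym, log_prob)
nonterm = [("S", "NP", "VP", math.log(1.0)),
           ("VP", "VBD", "NP", math.log(1.0))]
words = "I saw him".split(" ")
best_score = defaultdict(lambda: -math.inf)
best_edge = {}
# Pre-terminal scores, as if filled in by the previous step
for sym, i, lp in [("NP", 0, math.log(0.5)), ("VBD", 1, math.log(0.05)),
                   ("NP", 2, math.log(0.5))]:
    best_score[(sym, i, i + 1)] = lp

n = len(words)
for j in range(2, n + 1):            # right side of the span
    for i in range(j - 2, -1, -1):   # left side (reverse order!)
        for k in range(i + 1, j):    # where the second child begins
            for sym, lsym, rsym, log_prob in nonterm:
                left = best_score[(lsym, i, k)]
                right = best_score[(rsym, k, j)]
                if left > -math.inf and right > -math.inf:
                    my_lp = left + right + log_prob
                    if my_lp > best_score[(sym, i, j)]:
                        best_score[(sym, i, j)] = my_lp
                        best_edge[(sym, i, j)] = ((lsym, i, k), (rsym, k, j))

print(best_edge[("S", 0, 3)])  # (('NP', 0, 1), ('VP', 1, 3))
```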


CKY Pseudo-Code: Print Tree

print(S_0,length(words))   # print the "S" that spans all the words

subroutine print(sym_i,j):
    if sym_i,j exists in best_edge:   # for non-terminals
        return "(" + sym + " " + print(best_edge[sym_i,j][0]) + " "
                               + print(best_edge[sym_i,j][1]) + ")"
    else:                             # for terminals
        return "(" + sym + " " + words[i] + ")"
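A Python sketch of this subroutine; the `print_tree` name and the hand-filled `best_edge` chart (as CKY would produce for "I saw him") are illustrative:

```python
words = "I saw him".split(" ")
# Back-pointers as the combination step would leave them
best_edge = {("S", 0, 3): (("NP", 0, 1), ("VP", 1, 3)),
             ("VP", 1, 3): (("VBD", 1, 2), ("NP", 2, 3))}

def print_tree(node):
    sym, i, j = node
    if node in best_edge:   # non-terminal: recurse on both children
        left, right = best_edge[node]
        return "(" + sym + " " + print_tree(left) + " " + print_tree(right) + ")"
    else:                   # (pre-)terminal: emit the word it covers
        return "(" + sym + " " + words[i] + ")"

print(print_tree(("S", 0, len(words))))
# (S (NP I) (VP (VBD saw) (NP him)))
```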



Exercise

  • Write cky.py
  • Test the program
  • Input: test/08-input.txt
  • Grammar: test/08-grammar.txt
  • Answer: test/08-output.txt
  • Run the program on actual data:
  • data/wiki-en-test.grammar, data/wiki-en-short.tok
  • Visualize the trees
  • script/print-trees.py < wiki-en-test.trees
  • (Requires NLTK: http://nltk.org/)
  • Challenge: think of a way to handle unknown words

Thank You!