CS 533: Natural Language Processing
Constituency Parsing
Karl Stratos
Rutgers University
Karl Stratos CS 533: Natural Language Processing 1/37
Project Logistics
1. Proposal (due 3/24)
2. Milestone (due 4/15)
3. Presentation (tentatively ...)
◮ Extend and apply recent machine learning methods to ...
◮ Search last/this year’s AISTATS/ICLR/ICML/NeurIPS/UAI publications
◮ Extend a recent machine learning method in NLP
◮ Search last/this year’s ACL/CoNLL/EMNLP/NAACL
◮ Reimplement and replicate an existing technically challenging ...
◮ No available public code!
◮ There may be a limited number of predefined projects
◮ No promise
◮ Priority given to those with higher assignment grades
◮ Language models and conditional language models
◮ Pretrained representations from language modeling, evaluation
◮ Tagging, parsing (today)
◮ Latent-variable models (EM, VAEs)
◮ Information extraction (entity linking)
◮ Recall the HMM joint probability (with y0 = * and yn+1 = STOP):
p(x1 . . . xn, y1 . . . yn+1) = ∏_{i=1}^{n+1} q(yi | yi−1) × ∏_{i=1}^{n} e(xi | yi)
◮ The forward algorithm computes α(i, y) = p(x1 . . . xi, yi = y) in O(|Y|^2 n)
◮ The backward algorithm computes β(i, y) = p(xi+1 . . . xn | yi = y) in O(|Y|^2 n)
◮ Once forward and backward probabilities are computed, useful quantities follow
◮ Debugging tip: p(x1 . . . xn) = Σ_y α(i, y)β(i, y) for any i
◮ Marginal decoding: for i = 1 . . . n, predict the y ∈ Y with highest marginal p(yi = y | x1 . . . xn)
◮ Tag pair conditional marginals (will be useful later)
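The forward-backward computation and the debugging tip above can be sketched as follows. The HMM parameters are made up for illustration, and the STOP transition is omitted for simplicity.

```python
import numpy as np

# Toy HMM with 2 tags and 2 word types; all numbers are made up.
q = np.array([[0.7, 0.3],
              [0.4, 0.6]])   # q[y', y] = p(y_i = y | y_{i-1} = y')
e = np.array([[0.9, 0.1],
              [0.2, 0.8]])   # e[y, x] = p(x_i = x | y_i = y)
init = np.array([0.5, 0.5])  # initial tag distribution (STOP omitted)

def forward_backward(x):
    """alpha[i, y] = p(x_1..x_{i+1}, y_{i+1}=y); beta[i, y] = p(x_{i+2}..x_n | y_{i+1}=y)."""
    n, K = len(x), len(init)
    alpha = np.zeros((n, K))
    beta = np.zeros((n, K))
    alpha[0] = init * e[:, x[0]]
    for i in range(1, n):              # forward pass: O(|Y|^2 n)
        alpha[i] = (alpha[i - 1] @ q) * e[:, x[i]]
    beta[n - 1] = 1.0
    for i in range(n - 2, -1, -1):     # backward pass: O(|Y|^2 n)
        beta[i] = q @ (e[:, x[i + 1]] * beta[i + 1])
    return alpha, beta

x = [0, 1, 0]
alpha, beta = forward_backward(x)
# Debugging tip: sum_y alpha(i, y) * beta(i, y) equals p(x_1 ... x_n) for every i.
totals = (alpha * beta).sum(axis=1)
print(totals)
# Marginal decoding: argmax_y of the posterior marginal at each position.
print((alpha * beta / totals[:, None]).argmax(axis=1))
```

All three entries of `totals` agree, which is exactly the check suggested on the slide.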
(Some slides adapted from Chris Manning, Mike Collins)
◮ Start by labeling words with POS tags
◮ Recursively combine constituents according to rules
◮ Most compelling example of latent structure in language
the man saw the moon with a telescope
◮ Ambiguous: did the man use a telescope, or does the moon have one?
◮ Hypothesis: parsing is intimately related to intelligence
◮ Also some useful applications, e.g., relation extraction
◮ N: non-terminal symbols (constituents)
◮ Σ: terminal symbols (words)
◮ R: rules of form X → Y1 . . . Ym where X ∈ N, Yi ∈ N ∪ Σ
◮ S ∈ N: start symbol
◮ s1 = S
◮ For i = 2 . . . n, si is obtained from si−1 by replacing some non-terminal X in si−1 with β, for some rule X → β ∈ R
◮ sn ∈ Σ∗ (aka the “yield” of the derivation)
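For concreteness, the derivation sequence s1, . . . , sn can be sketched for a hypothetical toy CFG. The rules below are invented; each non-terminal has a single expansion, so the leftmost derivation is deterministic.

```python
# Hypothetical toy CFG: each non-terminal has exactly one rule.
rules = {
    "S":   ["NP", "VP"],
    "NP":  ["the", "man"],
    "VP":  ["saw", "NP2"],
    "NP2": ["the", "moon"],
}

def leftmost_derivation(start="S"):
    """Return s_1, ..., s_n, rewriting the leftmost non-terminal at each step."""
    current = [start]
    steps = [current]
    while any(sym in rules for sym in current):
        i = next(k for k, sym in enumerate(current) if sym in rules)
        current = current[:i] + rules[current[i]] + current[i + 1:]
        steps.append(current)
    return steps

for s in leftmost_derivation():
    print(" ".join(s))   # ends with the yield: "the man saw the moon"
```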
◮ Rule-based: manually construct some CFG that recognizes as many well-formed sentences as possible
◮ Never enough, and no way to choose the right parse
◮ Statistical: annotate sentences with parse trees (aka a “treebank”) and learn from the data
◮ Standard setup: WSJ portion of the Penn Treebank
◮ 40,000 trees for training
◮ 1,700 trees for validation
◮ 2,400 trees for evaluation
◮ Building a treebank vs building a grammar?
◮ Broad coverage, more natural annotation
◮ Contains distributional information of English
◮ Can be used to evaluate parsers
◮ N: non-terminal symbols (constituents)
◮ Σ: terminal symbols (words)
◮ R: rules of form X → Y1 . . . Ym where X ∈ N, Yi ∈ N ∪ Σ
◮ S ∈ N: start symbol
◮ q: rule probability q(α → β) ≥ 0 for every rule α → β ∈ R, with Σ_β q(X → β) = 1 for any X ∈ N
◮ N: all non-terminal symbols (constituents) seen in the data
◮ Σ: all terminal symbols (words) seen in the data
◮ R: all rules seen in the data
◮ S ∈ N: special start symbol (if the data does not already contain one)
◮ q: MLE estimate q(X → β) = count(X → β) / count(X)
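The MLE estimate is just rule counting. A minimal sketch, with a fabricated handful of rule occurrences standing in for a treebank:

```python
from collections import Counter

# Fabricated (LHS, RHS) rule occurrences read off treebank trees.
observed = [
    ("S", ("NP", "VP")), ("S", ("NP", "VP")),
    ("NP", ("the", "man")), ("NP", ("the", "moon")),
    ("VP", ("saw", "NP")), ("VP", ("V", "NP")),
]

rule_count = Counter(observed)
lhs_count = Counter(lhs for lhs, _ in observed)

# q(X -> beta) = count(X -> beta) / count(X)
q = {rule: c / lhs_count[rule[0]] for rule, c in rule_count.items()}

print(q[("S", ("NP", "VP"))])     # 1.0
print(q[("NP", ("the", "man"))])  # 0.5
```

By construction the probabilities of all rules with the same left-hand side sum to one.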
◮ Define Sh = Σ_{t: height(t)≤h} p(t) and let S∗ = lim_{h→∞} Sh
◮ Total probability of parses is less than one: S∗ < 1!
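This can be checked numerically. The slide's specific grammar is not visible here, so assume the classic one-symbol example: S → S S with probability 2/3 and S → a with probability 1/3. A parse of height ≤ h+1 is either the terminal rule or a binary split whose two subtrees both have height ≤ h, giving the recursion Sh+1 = 1/3 + (2/3) Sh^2, whose limit is the smallest root of s = 1/3 + (2/3) s^2, namely 1/2.

```python
# Assumed example grammar (not necessarily the slide's):
#   S -> S S  with prob p = 2/3
#   S -> a    with prob 1 - p = 1/3
# Iterate S_{h+1} = (1 - p) + p * S_h^2 starting from S_0 = 0.
p = 2.0 / 3.0
s = 0.0
for _ in range(500):
    s = (1 - p) + p * s * s
print(s)   # converges to 0.5: half the probability mass leaks to infinite trees
```

An "inconsistent" PCFG like this one assigns total probability less than one to finite parses.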
◮ For 1 ≤ i ≤ j ≤ n, for all X ∈ N, compute π(i, j, X) bottom-up, shorter spans before longer ones
◮ Return π(1, n, S)
◮ Total runtime? O(n^2) spans × O(n|R|) work per span = O(n^3 |R|)
◮ For 1 ≤ i ≤ j ≤ n, for all X ∈ N, define π(i, j, X) = max_{t∈GEN(xi...xj): root(t)=X} p(t)
◮ Base: π(i, i, X) = q(X → xi)
◮ Main: π(i, j, X) = max_{i≤k<j, X→Y Z∈R} q(X → Y Z) × π(i, k, Y) × π(k + 1, j, Z)
◮ We have max_{t∈GEN(x1...xn)} p(t) = π(1, n, S)
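The base and main cases can be sketched as a CKY chart in Python. The CNF grammar and probabilities below are invented for illustration.

```python
# A sketch of CKY/Viterbi for a CNF PCFG (binary rules plus X -> word).
binary = {  # (X, Y, Z) -> q(X -> Y Z)
    ("S", "NP", "VP"): 1.0,
    ("VP", "V", "NP"): 1.0,
}
lexical = {  # (X, word) -> q(X -> word)
    ("NP", "dogs"): 0.5, ("NP", "cats"): 0.5,
    ("V", "chase"): 1.0,
}

def cky(words):
    n = len(words)
    pi = {}    # pi[(i, j, X)] = max prob of a tree rooted at X over words[i..j]
    back = {}  # backpointers for recovering the argmax tree
    for i, w in enumerate(words):               # base: pi(i, i, X) = q(X -> w)
        for (X, word), p in lexical.items():
            if word == w:
                pi[(i, i, X)] = p
    for length in range(1, n):                  # main: shorter spans first
        for i in range(n - length):
            j = i + length
            for (X, Y, Z), p in binary.items():
                for k in range(i, j):
                    score = p * pi.get((i, k, Y), 0.0) * pi.get((k + 1, j, Z), 0.0)
                    if score > pi.get((i, j, X), 0.0):
                        pi[(i, j, X)] = score
                        back[(i, j, X)] = (k, Y, Z)
    return pi, back

pi, back = cky(["dogs", "chase", "cats"])
print(pi[(0, 2, "S")])   # 0.25 = q(NP -> dogs) * q(NP -> cats)
```

Total work is O(n^3 |R|): every (span, rule, split point) combination is touched once.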
◮ Can we calculate something like the marginal probability that X spans xi . . . xj in the parse?
◮ Yes, by combining the inside algorithm with the outside algorithm
◮ Base. β(1, n, S) = 1 and β(1, n, X) = 0 for all X ≠ S
◮ Main. For l = n − 2 . . . 0, for i = 1 . . . n − l (set j = i + l), for X ∈ N,
β(i, j, X) = Σ_{k>j} Σ_{Z→X Y∈R} β(i, k, Z) × α(j + 1, k, Y) × q(Z → X Y)
+ Σ_{k<i} Σ_{Z→Y X∈R} β(k, j, Z) × α(k, i − 1, Y) × q(Z → Y X)
(Figure: the two cases, X as the left child of Z with sibling Y spanning xj+1 . . . xk, and X as the right child of Z with sibling Y spanning xk . . . xi−1)
◮ Inside + outside: for 1 ≤ i ≤ j ≤ n, for all X ∈ N, µ(i, j, X) = α(i, j, X) × β(i, j, X)
◮ New parsing objective: find the max marginal parse argmax_{t∈GEN(x1...xn)} Σ_{(i,j,X)∈t} µ(i, j, X)
◮ Labeled recall algorithm, O(n^3 |N|) (Goodman, 1996): γ(i, j) = max_X µ(i, j, X) + max_{i≤k<j} [γ(i, k) + γ(k + 1, j)]
◮ How many algorithms do we need for max marginal parsing?
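Combining inside probabilities α with outside probabilities β to get span marginals µ(i, j, X) = α(i, j, X) × β(i, j, X) can be sketched on a tiny made-up CNF grammar. A useful sanity check, analogous to the HMM debugging tip: Σ_X µ(i, i, X) equals the sentence probability for every position i, since each tree has exactly one preterminal over each word.

```python
# Inside (a = alpha) and outside (b = beta) for a tiny, invented CNF PCFG.
binary = {("S", "NP", "VP"): 1.0, ("VP", "V", "NP"): 1.0}
lexical = {("NP", "dogs"): 0.5, ("NP", "cats"): 0.5, ("V", "chase"): 1.0}
symbols = {lhs for (lhs, _, _) in binary} | {lhs for (lhs, _) in lexical}

def inside(words):
    n = len(words)
    a = {}
    for i, w in enumerate(words):                # base: a(i, i, X) = q(X -> w)
        for (X, word), p in lexical.items():
            if word == w:
                a[(i, i, X)] = a.get((i, i, X), 0.0) + p
    for length in range(1, n):                   # sum over split points and rules
        for i in range(n - length):
            j = i + length
            for (X, Y, Z), p in binary.items():
                for k in range(i, j):
                    a[(i, j, X)] = a.get((i, j, X), 0.0) + \
                        p * a.get((i, k, Y), 0.0) * a.get((k + 1, j, Z), 0.0)
    return a

def outside(words, a, start="S"):
    n = len(words)
    b = {(0, n - 1, start): 1.0}                 # base: full span, start symbol
    for length in range(n - 2, -1, -1):          # wider spans feed narrower ones
        for i in range(n - length):
            j = i + length
            for X in symbols:
                total = 0.0
                for (Z, Y1, Y2), p in binary.items():
                    if Y1 == X:                  # X is the left child of Z
                        for k in range(j + 1, n):
                            total += b.get((i, k, Z), 0.0) * a.get((j + 1, k, Y2), 0.0) * p
                    if Y2 == X:                  # X is the right child of Z
                        for k in range(0, i):
                            total += b.get((k, j, Z), 0.0) * a.get((k, i - 1, Y1), 0.0) * p
                if total:
                    b[(i, j, X)] = total
    return b

words = ["dogs", "chase", "cats"]
a = inside(words)
b = outside(words, a)
mu = {key: a[key] * b.get(key, 0.0) for key in a}
# Sanity check: sum_X mu(i, i, X) = p(sentence) for every i.
for i in range(3):
    print(sum(mu.get((i, i, X), 0.0) for X in symbols))   # each prints 0.25
```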
◮ Precision: fraction of predicted labeled constituents that appear in the gold tree
◮ Recall: fraction of gold labeled constituents that appear in the predicted tree
◮ Labeled F1: harmonic mean of labeled precision and recall
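A minimal sketch of labeled bracket scoring over (label, i, j) spans. This is a simplification: the standard evalb scorer adds further conventions (e.g., handling of punctuation and the root), and the span sets below are made up.

```python
# Simplified labeled precision / recall / F1 over sets of (label, i, j) spans.
def labeled_f1(gold_spans, pred_spans):
    gold, pred = set(gold_spans), set(pred_spans)
    correct = len(gold & pred)
    precision = correct / len(pred) if pred else 0.0
    recall = correct / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

gold = {("S", 0, 4), ("NP", 0, 1), ("VP", 2, 4), ("NP", 3, 4)}
pred = {("S", 0, 4), ("NP", 0, 1), ("VP", 2, 4), ("PP", 3, 4)}
print(labeled_f1(gold, pred))   # (0.75, 0.75, 0.75): the mislabeled span costs both
```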
◮ What is the tagging accuracy?
◮ PCFG: extremely strong conditional independence assumptions (each rule expansion depends only on the non-terminal being expanded)
◮ Can consider lexicalizing the grammar (Collins, 2003)
◮ Vanilla PCFG: 70.6% recall, 74.8% precision
◮ Lexicalized PCFG: 88.1% recall, 88.3% precision
◮ Neuralized constituency parsing (Kitaev and Klein, 2018): 94.9%