Dependency Grammars and Parsing
CMSC 473/673, UMBC
Outline
Review: PCFGs and CKY Dependency Grammars Dependency Parsing (Shift-reduce) From syntax to semantics
Probabilistic Context Free Grammar
Set of weighted (probabilistic) rewrite rules over terminals and non-terminals
  Terminals: the words in the language (the lexicon), e.g., Baltimore
  Non-terminals: symbols that can trigger rewrite rules, e.g., S, NP, Noun
  (Sometimes) Pre-terminals: symbols that can only trigger lexical rewrites, e.g., Noun
1.0   S → NP VP
 .4   NP → Det Noun
 .3   NP → Noun
 .2   NP → Det AdjP
 .1   NP → NP PP
1.0   PP → P NP
.34   AdjP → Adj Noun
.26   VP → V NP
.0003 Noun → Baltimore
…

Q: What are the distributions? What must sum to 1?
A: P(X → Y Z | X): for each non-terminal X, the probabilities of all rules rewriting X must sum to 1.
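The answer above can be checked mechanically. A quick sketch in Python: group the slide's rules by left-hand side and verify each group totals 1 (only LHSs whose full rule set appears on the slide are included here).

```python
from collections import defaultdict

# (LHS, RHS): probability, transcribed from the slide's grammar
rules = {
    ('S', ('NP', 'VP')): 1.0,
    ('NP', ('Det', 'Noun')): .4, ('NP', ('Noun',)): .3,
    ('NP', ('Det', 'AdjP')): .2, ('NP', ('NP', 'PP')): .1,
    ('PP', ('P', 'NP')): 1.0,
}

totals = defaultdict(float)
for (lhs, rhs), p in rules.items():
    totals[lhs] += p

# every LHS listed in full should total 1.0 (up to floating-point error)
print(dict(totals))
```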
Probabilistic Context Free Grammar
p(tree for “Baltimore is a great city”)
= p(S → NP VP) × p(NP → Noun) × p(Noun → Baltimore) × p(VP → Verb NP) × p(Verb → is) × p(NP ⇒ a great city)

i.e., the product of the probabilities of the individual rules used in the derivation
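As a small numeric sketch of this product: only p(Noun → Baltimore) = .0003, p(NP → Noun) = .3, and p(VP → V NP) = .26 come from the slide's grammar; the remaining two values are made up for illustration.

```python
import math

rule_probs = [
    1.0,     # S → NP VP
    0.3,     # NP → Noun
    0.0003,  # Noun → Baltimore
    0.26,    # VP → Verb NP
    0.001,   # Verb → is          (made-up value)
    0.05,    # NP ⇒ a great city  (probability of the whole subtree; made-up)
]

# the parse probability is the product of the rule probabilities
p = math.prod(rule_probs)

# in practice this is computed in log space to avoid underflow
log_p = sum(math.log(q) for q in rule_probs)
```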
“Papa ate the caviar with a spoon”
S → NP VP      NP → Papa
NP → Det N     N → caviar
NP → NP PP     N → spoon
VP → V NP      V → spoon
VP → VP PP     V → ate
PP → P NP      P → with
               Det → the
               Det → a
Example from Jason Eisner
Entire grammar shown; assume uniform weights
First: Let’s find all NPs
(NP, 0, 1): Papa (NP, 2, 4): the caviar (NP, 5, 7): a spoon (NP, 2, 7): the caviar with a spoon
Second: Let’s find all VPs
(VP, 1, 7): ate the caviar with a spoon (VP, 1, 4): ate the caviar
Third: Let’s find all Ss
(S, 0, 7): Papa ate the caviar with a spoon (S, 0, 4): Papa ate the caviar
(Chart figure: in the start × end table, the cells (NP, 0, 1), (VP, 1, 7), and (S, 0, 7) are filled.)
CKY Recognizer
Input: a string of N words; a grammar in CNF
Output: True (with parse) / False
Data structure: an N×(N+1) table T
  Rows indicate span start (0 to N-1)
  Columns indicate span end (1 to N)
  T[i][j] lists constituents spanning i → j
For Viterbi in HMMs: build the table left-to-right
For CKY over trees:
  1. build spans smallest-to-largest, and
  2. within each width, left-to-right
CKY Recognizer
T = Cell[N][N+1]
for (j = 1; j ≤ N; ++j) {
  T[j-1][j].add(X for non-terminal X in G if X → word_j)
}
for (width = 2; width ≤ N; ++width) {
  for (start = 0; start ≤ N - width; ++start) {
    end = start + width
    for (mid = start+1; mid < end; ++mid) {
      for (non-terminal Y : T[start][mid]) {
        for (non-terminal Z : T[mid][end]) {
          T[start][end].add(X for rule X → Y Z : G)
        }
      }
    }
  }
}
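The pseudocode above transcribes directly into runnable Python. The sketch below uses the “Papa” grammar from these slides (its rules are already in CNF); the dict-based grammar encoding is a choice of convenience, not part of the algorithm.

```python
def cky_recognize(words, lexical, binary):
    """lexical: word -> set of non-terminals; binary: (Y, Z) -> set of X for rules X -> Y Z."""
    n = len(words)
    # table[i][j] holds the non-terminals spanning words i..j
    table = [[set() for _ in range(n + 1)] for _ in range(n)]
    for j, w in enumerate(words, start=1):               # width-1 (lexical) spans
        table[j - 1][j] = set(lexical.get(w, ()))
    for width in range(2, n + 1):                        # smallest-to-largest
        for start in range(0, n - width + 1):            # left-to-right
            end = start + width
            for mid in range(start + 1, end):            # all split points
                for Y in table[start][mid]:
                    for Z in table[mid][end]:
                        table[start][end] |= binary.get((Y, Z), set())
    return 'S' in table[0][n], table

# the "Papa" grammar from the slides
lexical = {'Papa': {'NP'}, 'caviar': {'N'}, 'spoon': {'N', 'V'},
           'ate': {'V'}, 'with': {'P'}, 'the': {'Det'}, 'a': {'Det'}}
binary = {('NP', 'VP'): {'S'}, ('Det', 'N'): {'NP'}, ('NP', 'PP'): {'NP'},
          ('V', 'NP'): {'VP'}, ('VP', 'PP'): {'VP'}, ('P', 'NP'): {'PP'}}

ok, chart = cky_recognize('Papa ate the caviar with a spoon'.split(), lexical, binary)
```

Running it fills exactly the cells worked through above: (NP, 2, 7) for “the caviar with a spoon”, (VP, 1, 4) for “ate the caviar”, and (S, 0, 7) for the full sentence.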
CKY is Versatile: PCFG Tasks
Task                                            PCFG algorithm           HMM analog
Find any parse                                  CKY recognizer           (none)
Find the most likely parse
  (for an observed sequence)                    weighted CKY (Viterbi)   Viterbi
Calculate the (log) likelihood of an
  observed sequence w1, …, wN                   Inside algorithm         Forward algorithm
Learn the grammar parameters                    Inside-Outside (EM)      Forward-Backward /
                                                                         Baum-Welch (EM)
Outline
Review: PCFGs and CKY Dependency Grammars Dependency Parsing (Shift-reduce) From syntax to semantics
Structure vs. Word Relations
Constituency trees/analyses: based on structure Dependency analyses: based on word relations
Remember: (P)CFGs Help Clearly Show Ambiguity
I ate the meal with friends
(Figure: two parse trees; the PP “with friends” attaches either to the VP or to the NP “the meal”.)
CFGs to (Labeled) Dependencies

I ate the meal with friends

(Figure: the constituency structure collapses into word-to-word arcs; labeling the arcs gives relations such as nsubj, dobj, and nmod.)
Labeled Dependencies

Word-to-word labeled relations between a governor (head) and a dependent

Example: nsubj(ate → Chris)

de Marneffe et al., 2014
http://universaldependencies.org/
(Labeled) Dependency Parse

Directed graphs
  Vertices: linguistic blobs in a sentence
  Edges: (labeled) arcs

Often directed trees:
  1. A single root node with no incoming arcs
  2. Each vertex except root has exactly one incoming arc
  3. Unique path from the root node to each vertex
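The three tree conditions are easy to check mechanically. A minimal sketch, assuming the parse is encoded as a {dependent: head} dict over word positions 1..n with 0 as the root:

```python
def is_dependency_tree(heads):
    """Check the three directed-tree conditions for {dependent: head}, root = 0."""
    nodes = set(heads) | {0}
    # 1. a single root with no incoming arcs: 0 never appears as a dependent
    if 0 in heads:
        return False
    # 2. each non-root vertex has exactly one head (the dict guarantees one
    #    entry per dependent), and every head is a known vertex
    if any(h not in nodes for h in heads.values()):
        return False
    # 3. unique path from the root: following heads upward from any vertex
    #    must reach 0 without revisiting a node (i.e., no cycles)
    for v in heads:
        seen = set()
        while v != 0:
            if v in seen:
                return False
            seen.add(v)
            v = heads[v]
    return True

print(is_dependency_tree({1: 2, 2: 0, 3: 4, 4: 2}))  # "Papa ate the caviar"
print(is_dependency_tree({1: 2, 2: 1}))              # cycle: no path to the root
```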
Projective Dependency Trees

No crossing arcs (SLP3: Figs 14.2, 14.3)

✔ Projective    ✖ Not projective

Non-projective parses capture:
  - certain long-range dependencies
  - free word order
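The “no crossing arcs” condition can be tested directly by comparing every pair of arcs as intervals over word positions. A small sketch, using the same {dependent: head} encoding (positions 1..n, root head = 0):

```python
def is_projective(heads):
    """True iff no two dependency arcs cross, for heads = {dependent: head}."""
    # treat each arc as an interval (min, max) over word positions
    arcs = [(min(h, d), max(h, d)) for d, h in heads.items()]
    for (i, j) in arcs:
        for (k, l) in arcs:
            # arcs cross when exactly one endpoint of one lies strictly
            # inside the other; iterating ordered pairs covers both orders
            if i < k < j < l:
                return False
    return True

print(is_projective({1: 2, 2: 0, 3: 4, 4: 2}))  # projective tree
print(is_projective({1: 3, 2: 4, 3: 0, 4: 3}))  # arc 2->4 crosses arc 1->3
```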
Are CFGs for Naught?

Nope! Simple algorithm from Xia and Palmer (2011):
  1. Mark the head child of each node in a phrase structure, using “appropriate” head rules.
  2. In the dependency structure, make the head of each non-head child depend on the head of the head-child.

Papa ate the caviar with a spoon

(Figure: the constituency tree, with pre-terminals NP V D N P D N under phrases NP, NP, PP, VP, VP, S; marking head children propagates the head words caviar, spoon, and finally ate up the tree, yielding a dependency tree rooted at ate.)
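To make the two steps concrete, here is a toy head-percolation sketch in Python on the Papa tree. The tree encoding and the head rules are invented for this illustration (they are not from Xia and Palmer); the PP rule follows the content-head convention the slide's figure suggests, taking the NP rather than the preposition as head.

```python
def head_word(tree, head_rules, deps):
    """tree: (label, [children]) or (POS, word). Returns the phrase's head word,
    adding (head, dependent) pairs to deps along the way."""
    label, kids = tree
    if isinstance(kids, str):                 # pre-terminal: (POS, word)
        return kids
    head_idx = head_rules[label](kids)        # step 1: mark the head child
    heads = [head_word(k, head_rules, deps) for k in kids]
    for i, h in enumerate(heads):             # step 2: non-head children depend
        if i != head_idx:                     #         on the head child's head
            deps.add((heads[head_idx], h))
    return heads[head_idx]

# very rough, made-up head rules for this one tree
head_rules = {
    'S':  lambda kids: next(i for i, k in enumerate(kids) if k[0] == 'VP'),
    'VP': lambda kids: 0,                       # leftmost child (verb or inner VP)
    'NP': lambda kids: len(kids) - 1 if kids[-1][0] == 'N' else 0,
    'PP': lambda kids: 1,                       # content head: the NP, not the P
}

tree = ('S', [('NP', 'Papa'),
              ('VP', [('VP', [('V', 'ate'),
                              ('NP', [('D', 'the'), ('N', 'caviar')])]),
                      ('PP', [('P', 'with'),
                              ('NP', [('D', 'a'), ('N', 'spoon')])])])])
deps = set()
root = head_word(tree, head_rules, deps)
```

With these rules the sentence head comes out as ate, with arcs such as ate→Papa, ate→caviar, ate→spoon, caviar→the, matching the figure's propagated heads.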
Dependency Post-Processing (Keep Tree Structure)
Dependency Post-Processing (Get Possible Graph Structure)
Amaranthus, collectively known as amaranth, is a cosmopolitan genus of annual or short-lived perennial plants.
Outline
Review: PCFGs and CKY Dependency Grammars Dependency Parsing (Shift-reduce) From syntax to semantics
(Some) Dependency Parsing Algorithms

Dynamic programming: Eisner algorithm (Eisner, 1996)
Transition-based: shift-reduce, arc standard
Graph-based: maximum spanning tree
Shift-Reduce Parsing

Recall from CMSC 331: bottom-up
Tools: input words, a special root symbol ($), and a stack to hold configurations
Shift:
  – move tokens onto the stack
Reduce:
  – match the top elements of the stack against a grammar rule's RHS
  – if a match occurs, replace them on the stack with the rule's LHS symbol
Shift-Reduce Dependency Parsing

Tools: input words, a special root symbol ($), and a stack to hold configurations
Shift:
  – move tokens onto the stack
Reduce:
  – decide whether the top two elements of the stack form a valid (good) grammatical dependency
  – if there's a valid relation, keep only the head on the stack

decide how? Search problem!
what is valid? Learn it!
what are the possible actions?
Shift-Reduce Actions (“arc standard”)

Possibility: Assign the current word as the head of some previously seen word
  Action: LEFTARC. Assert a head-dependent relation between the word at the top of the stack and the word directly beneath it; remove the lower word from the stack.

Possibility: Assign some previously seen word as the head of the current word
  Action: RIGHTARC. Assert a head-dependent relation between the second word on the stack and the word at the top; remove the word at the top of the stack.

Possibility: Defer processing the current word until later
  Action: SHIFT. Remove the word from the front of the input buffer and push it onto the stack.
Arc Standard Parsing

state ← {[root], [words], []}
while state ≠ {[root], [], [(deps)]} {
  t ← ORACLE(state)
  state ← APPLY(t, state)
}
return state

At training time, ORACLE comes from a treebank; at parse time it is replaced by a learned PREDICT (what is valid? Learn it!).
Parse steps for “Papa ate the caviar”:

Deps                                       Stack                  Word Buffer          Action
—                                          [$]                    Papa ate the caviar  SHIFT
—                                          [Papa, $]              ate the caviar       SHIFT
—                                          [ate, Papa, $]         the caviar           LEFTARC
ate→Papa                                   [ate, $]               the caviar           SHIFT
ate→Papa                                   [the, ate, $]          caviar               SHIFT
ate→Papa                                   [caviar, the, ate, $]                       LEFTARC
ate→Papa, caviar→the                       [caviar, ate, $]                            RIGHTARC
ate→Papa, caviar→the, ate→caviar           [ate, $]                                    RIGHTARC
ate→Papa, caviar→the, ate→caviar, $→ate    [$]                                         (done)
Learning An Oracle (Predictor)

Training data: a dependency treebank
Input: a configuration
Output: one of {LEFTARC, RIGHTARC, SHIFT}

t ← ORACLE(state)

- Choose LEFTARC if it produces a correct head-dependent relation given the reference parse and the current configuration
- Choose RIGHTARC if:
  - it produces a correct head-dependent relation given the reference parse, and
  - all of the dependents of the word at the top of the stack have already been assigned
- Otherwise, choose SHIFT
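These three rules can be run directly as a static oracle driving the arc standard loop. A minimal sketch, assuming a well-formed projective reference parse given as a {dependent: head} dict over positions 1..n with 0 as the root ($):

```python
def static_oracle(stack, deps, heads):
    """Pick a transition from the reference parse `heads` (dependent -> head)."""
    if len(stack) >= 2:
        top, below = stack[-1], stack[-2]
        # LEFTARC: correct if the top of the stack heads the word beneath it
        if below != 0 and heads[below] == top:
            return 'LEFTARC'
        # RIGHTARC: correct if the word beneath heads the top, AND the top has
        # already collected all of its own dependents
        if heads[top] == below and all(
                heads[d] != top or (top, d) in deps for d in heads):
            return 'RIGHTARC'
    return 'SHIFT'                            # otherwise, SHIFT

def parse(words, heads):
    """Arc standard loop; words[0] is the root symbol $."""
    stack, buffer, deps = [0], list(range(1, len(words))), set()
    while buffer or len(stack) > 1:
        t = static_oracle(stack, deps, heads)
        if t == 'SHIFT':
            stack.append(buffer.pop(0))
        elif t == 'LEFTARC':
            deps.add((stack[-1], stack[-2])); del stack[-2]
        else:  # RIGHTARC
            deps.add((stack[-2], stack[-1])); stack.pop()
    return deps

# "Papa ate the caviar": ate heads Papa and caviar; the depends on caviar; $ -> ate
words = ['$', 'Papa', 'ate', 'the', 'caviar']
heads = {1: 2, 2: 0, 3: 4, 4: 2}
print(parse(words, heads))
```

Stepping through this reproduces the SHIFT, SHIFT, LEFTARC, SHIFT, SHIFT, LEFTARC, RIGHTARC, RIGHTARC trace in the table above.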
Training the Predictor

Predict action t given configuration s: t = φ(s)
Extract features of the configuration
  Examples: word forms, lemmas, POS, morphological features
How? Perceptron, maxent, support vector machines, multilayer perceptrons / neural networks
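An illustrative φ(s): sparse binary features over the top of the stack and the front of the buffer, the kind a linear classifier (perceptron, maxent) over {LEFTARC, RIGHTARC, SHIFT} would consume. The feature names here are made up for the sketch, not from any standard toolkit.

```python
def phi(stack, buffer, pos):
    """Extract sparse binary features from a parser configuration."""
    feats = {}
    if stack:
        s1 = stack[-1]
        feats[f's1.w={s1}'] = 1                       # top-of-stack word form
        feats[f's1.t={pos.get(s1, "ROOT")}'] = 1      # ... and its POS tag
    if len(stack) >= 2:
        feats[f's2.w={stack[-2]}'] = 1                # second word on the stack
    if buffer:
        feats[f'b1.w={buffer[0]}'] = 1                # front of the buffer
        feats[f'b1.t={pos.get(buffer[0], "?")}'] = 1
    if stack and buffer:
        feats[f's1.w+b1.w={stack[-1]}+{buffer[0]}'] = 1   # pair feature
    return feats

pos = {'Papa': 'NNP', 'ate': 'VBD', 'the': 'DT', 'caviar': 'NN'}
print(phi(['$', 'Papa'], ['ate', 'the', 'caviar'], pos))
```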
If time permits: work through “Papa ate the caviar” on the board with the arc standard loop:

state ← {[root], [words], []}
while state ≠ {[root], [], [(deps)]} {
  t ← ORACLE(state)
  state ← APPLY(t, state)
}
return state
Arc Standard Parsing

state ← {[root], [words], []}
while state ≠ {[root], [], [(deps)]} {
  t ← ORACLE(state)
  state ← APPLY(t, state)
}
return state

Q: What is the time complexity?
A: Linear (each word is shifted onto the stack once and removed by one arc action)

Q: What's potentially problematic?
A: This is a greedy algorithm
Becoming Less Greedy
Beam search: a breadth-first search strategy (CMSC 471/671). At each stage, keep the K best options open.
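A toy sketch of the idea: instead of committing to the single best transition, keep the K highest-scoring partial states at each step. The state/expansion interface below is invented for the sketch (for a parser, states would be configurations and expansions the scored transitions); the demo uses a trivial digit-sequence problem so it runs standalone.

```python
import heapq

def beam_search(initial, expand, is_final, K=4):
    """expand(state) -> [(score_delta, next_state), ...]; keep the K best."""
    beam = [(0.0, initial)]
    while not all(is_final(s) for _, s in beam):
        candidates = []
        for score, s in beam:
            if is_final(s):
                candidates.append((score, s))       # finished states stay eligible
            else:
                for delta, nxt in expand(s):
                    candidates.append((score + delta, nxt))
        # prune: keep only the K highest-scoring states
        beam = heapq.nlargest(K, candidates, key=lambda x: x[0])
    return max(beam, key=lambda x: x[0])

def expand(s):                       # toy expansion: append a digit, score = digit
    return [(float(d), s + (d,)) for d in (0, 1, 2)]

best_score, best_state = beam_search((), expand, lambda s: len(s) == 3, K=2)
```

With K=1 this reduces to the greedy parser; larger K trades time for the chance to recover from a locally attractive but globally wrong early transition.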
Evaluation
Exact match (per-sentence accuracy)
Unlabeled Attachment Score (UAS)
Labeled Attachment Score (LS, LAS)
Recall/Precision/F1 for particular relation types
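UAS and LAS are simple per-token accuracies: the fraction of words whose head is correct (UAS) and whose head and label are both correct (LAS). A small sketch, assuming each parse is a list of (head, label) pairs, one per token; the encoding is illustrative, not a standard API.

```python
def attachment_scores(gold, pred):
    """gold, pred: lists of (head, label) per token. Returns (UAS, LAS)."""
    assert len(gold) == len(pred)
    n = len(gold)
    uas = sum(g[0] == p[0] for g, p in zip(gold, pred)) / n   # head correct
    las = sum(g == p for g, p in zip(gold, pred)) / n         # head + label correct
    return uas, las

gold = [(2, 'nsubj'), (0, 'root'), (4, 'det'), (2, 'dobj')]   # Papa ate the caviar
pred = [(2, 'nsubj'), (0, 'root'), (4, 'det'), (2, 'nmod')]   # one mislabeled arc
print(attachment_scores(gold, pred))  # (1.0, 0.75)
```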
Outline
Review: PCFGs and CKY Dependency Grammars Dependency Parsing (Shift-reduce) From syntax to semantics
Parsing as a Core NLP Problem
(Pipeline figure: sentences 1-4 are fed, as independent operations, through a Parser equipped with a Grammar; the output parses go to Evaluation against gold (correct) reference trees to produce a score, and also feed other NLP tasks: entity coreference, MT, Q&A, …)
From Dependencies to Shallow Semantics
From Syntax to Shallow Semantics

“Open Information Extraction”: Angeli et al. (2015)

A sampling of efforts:
  http://corenlp.run/ (constituency & dependency)
  https://github.com/hltcoe/predpatt
  http://openie.allenai.org/
  http://www.cs.rochester.edu/research/knext/browse/ (constituency trees)
  http://rtw.ml.cmu.edu/rtw/