

slide-1
SLIDE 1

Dependency Grammars and Parsing

CMSC 473/673 UMBC

slide-2
SLIDE 2

Outline

Review: PCFGs and CKY
Dependency Grammars
Dependency Parsing (Shift-reduce)
From syntax to semantics

slide-3
SLIDE 3

Probabilistic Context Free Grammar

Set of weighted (probabilistic) rewrite rules over terminals and non-terminals
  • Terminals: the words in the language (the lexicon), e.g., Baltimore
  • Non-terminals: symbols that can trigger rewrite rules, e.g., S, NP, Noun
  • (Sometimes) Pre-terminals: symbols that can only trigger lexical rewrites, e.g., Noun

1.0 S → NP VP
.4 NP → Det Noun
.3 NP → Noun
.2 NP → Det AdjP
.1 NP → NP PP
1.0 PP → P NP
.34 AdjP → Adj Noun
.26 VP → V NP
.0003 Noun → Baltimore
…
Q: What are the distributions? What must sum to 1?

A: P(X → Y Z | X): for each left-hand side X, the probabilities of all rules rewriting X must sum to 1 (e.g., for NP above: .4 + .3 + .2 + .1 = 1.0).

slide-4
SLIDE 4

Probabilistic Context Free Grammar

p( [parse tree of "Baltimore is a great city"] ) =
    p(S → NP VP) * p(NP → Noun) * p(Noun → Baltimore) *
    p(VP → Verb NP) * p(Verb → is) * p(NP → a great city)

product of probabilities of individual rules used in the derivation
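To make the product concrete, here is a minimal Python sketch (not from the slides; the weights for VP → Verb NP, Verb → is, and NP → a great city, and the tree encoding, are illustrative assumptions):

```python
# Sketch: probability of a derivation = product of the probabilities of the
# rules it uses. Only S -> NP VP, NP -> Noun, and Noun -> Baltimore come from
# the slide-3 grammar; the other weights are assumed for illustration.
rule_prob = {
    ("S", ("NP", "VP")): 1.0,
    ("NP", ("Noun",)): 0.3,
    ("Noun", ("Baltimore",)): 0.0003,
    ("VP", ("Verb", "NP")): 0.26,           # assumed weight
    ("Verb", ("is",)): 0.01,                # assumed lexical weight
    ("NP", ("a", "great", "city")): 0.001,  # assumed weight (one rewrite)
}

def tree_prob(tree):
    """tree = (label, [subtrees or terminal strings]); returns the product
       of rule probabilities over the whole derivation."""
    label, children = tree
    rhs = tuple(c if isinstance(c, str) else c[0] for c in children)
    p = rule_prob[(label, rhs)]
    for child in children:
        if not isinstance(child, str):
            p *= tree_prob(child)
    return p

t = ("S", [("NP", [("Noun", ["Baltimore"])]),
           ("VP", [("Verb", ["is"]), ("NP", ["a", "great", "city"])])])
print(tree_prob(t))  # product of the six rule probabilities
```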

slide-5
SLIDE 5

“Papa ate the caviar with a spoon”

S → NP VP     NP → Det N     NP → NP PP
VP → V NP     VP → VP PP     PP → P NP
NP → Papa     N → caviar     N → spoon     V → spoon
V → ate       P → with       Det → the     Det → a

Example from Jason Eisner
Entire grammar; assume uniform weights

First: Let’s find all NPs

(NP, 0, 1): Papa
(NP, 2, 4): the caviar
(NP, 5, 7): a spoon
(NP, 2, 7): the caviar with a spoon

Second: Let’s find all VPs

(VP, 1, 7): ate the caviar with a spoon
(VP, 1, 4): ate the caviar

Third: Let’s find all Ss

(S, 0, 7): Papa ate the caviar with a spoon
(S, 0, 4): Papa ate the caviar

[CKY chart figure: rows index span start (0–6), columns index span end (1–7); cells hold entries such as (NP, 0, 1), (VP, 1, 7), and (S, 0, 7)]

slide-6
SLIDE 6

CKY Recognizer

Input:
  • string of N words
  • grammar in CNF
Output: True (with parse) / False
Data structure: N*N table T
  • Rows indicate span start (0 to N-1)
  • Columns indicate span end (1 to N)
  • T[i][j] lists constituents spanning i → j
For Viterbi in HMMs: build table left-to-right
For CKY in trees:

  • 1. build smallest-to-largest &
  • 2. left-to-right
slide-7
SLIDE 7

CKY Recognizer

T = Cell[N][N+1]
for (j = 1; j ≤ N; ++j) {
    T[j-1][j].add(X for non-terminal X in G if X → word_j)
}
for (width = 2; width ≤ N; ++width) {
    for (start = 0; start ≤ N - width; ++start) {
        end = start + width
        for (mid = start+1; mid < end; ++mid) {
            for (non-terminal Y : T[start][mid]) {
                for (non-terminal Z : T[mid][end]) {
                    T[start][end].add(X for rule X → Y Z : G)
                }
            }
        }
    }
}

[diagram: a new cell entry X is built from a rule X → Y Z, with Y taken from T[start][mid] and Z from T[mid][end]]
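Here is a runnable sketch of the recognizer above, assuming the CNF grammar is given as two plain dicts (the dict representation and names are assumptions for illustration, not part of the slides):

```python
def cky_recognize(words, lexical_rules, binary_rules, start_symbol="S"):
    """lexical_rules: word -> set of non-terminals X with X -> word
       binary_rules: (Y, Z) -> set of non-terminals X with X -> Y Z"""
    n = len(words)
    table = [[set() for _ in range(n + 1)] for _ in range(n)]
    for j in range(1, n + 1):                   # width-1 spans from the lexicon
        table[j - 1][j] |= lexical_rules.get(words[j - 1], set())
    for width in range(2, n + 1):               # smallest to largest
        for start in range(0, n - width + 1):   # left to right
            end = start + width
            for mid in range(start + 1, end):
                for Y in table[start][mid]:
                    for Z in table[mid][end]:
                        table[start][end] |= binary_rules.get((Y, Z), set())
    return start_symbol in table[0][n]

# A CNF fragment of the slide-5 grammar, enough for "Papa ate the caviar"
lex = {"Papa": {"NP"}, "ate": {"V"}, "the": {"Det"}, "caviar": {"N"}}
binary = {("Det", "N"): {"NP"}, ("V", "NP"): {"VP"}, ("NP", "VP"): {"S"}}
print(cky_recognize("Papa ate the caviar".split(), lex, binary))  # True
```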

slide-8
SLIDE 8

CKY is Versatile: PCFG Tasks

Task / PCFG algorithm / HMM analog:
  • Find any parse: CKY recognizer (HMM analog: none)
  • Find the most likely parse for an observed sequence: weighted CKY (Viterbi) (HMM analog: Viterbi)
  • Calculate the (log) likelihood of an observed sequence w1, …, wN: Inside algorithm (HMM analog: Forward algorithm)
  • Learn the grammar parameters: Inside-outside algorithm (EM) (HMM analog: Forward-backward / Baum-Welch (EM))

slide-9
SLIDE 9

Outline

Review: PCFGs and CKY
Dependency Grammars
Dependency Parsing (Shift-reduce)
From syntax to semantics

slide-10
SLIDE 10

Structure vs. Word Relations

Constituency trees/analyses: based on structure
Dependency analyses: based on word relations

slide-11
SLIDE 11

Remember: (P)CFGs Help Clearly Show Ambiguity

I ate the meal with friends

[figure: two constituency parses of the sentence, differing in whether the PP "with friends" attaches to the VP or to the NP "the meal"]

slide-12
SLIDE 12

CFGs to Dependencies

I ate the meal with friends

[figure: the two constituency parses being converted to word-to-word dependency arcs]

slide-13
SLIDE 13

CFGs to Dependencies

I ate the meal with friends

[figure: the two constituency parses being converted to word-to-word dependency arcs]

slide-14
SLIDE 14

CFGs to Dependencies

I ate the meal with friends

[figure: the two constituency parses being converted to word-to-word dependency arcs]

slide-15
SLIDE 15

CFGs to Labeled Dependencies

I ate the meal with friends

[figure: the two parses converted to labeled dependencies, using the relations nsubj, dobj, and nmod]

slide-16
SLIDE 16

Labeled Dependencies

Word-to-word labeled relations: governor (head) → dependent

[example: "Chris ate", with an nsubj arc from the governor "ate" to the dependent "Chris"]

slide-17
SLIDE 17

Labeled Dependencies

Word-to-word labeled relations: governor (head) → dependent

[example: "Chris ate", with an nsubj arc from "ate" to "Chris"] (de Marneffe et al., 2014)

slide-18
SLIDE 18

http://universaldependencies.org/

slide-19
SLIDE 19

(Labeled) Dependency Parse

Directed graphs
  • Vertices: linguistic blobs in a sentence
  • Edges: (labeled) arcs

slide-20
SLIDE 20

(Labeled) Dependency Parse

Directed graphs
  • Vertices: linguistic blobs in a sentence
  • Edges: (labeled) arcs
Often directed trees:

  • 1. A single root node with no incoming arcs
  • 2. Each vertex except root has exactly one incoming arc
  • 3. Unique path from the root node to each vertex
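As a quick illustration (not from the slides), the three properties can be checked given a head index for each token; the encoding below, heads[i] = head of token i with 0 standing for the root, is an assumption:

```python
def is_dependency_tree(heads):
    """heads[i] = head of token i (tokens 1..n; 0 is the root).
       Checks: each token has exactly one incoming arc, and each token has a
       unique, cycle-free path up to the root."""
    n = len(heads)
    if sorted(heads) != list(range(1, n + 1)):   # exactly one head per token
        return False
    for i in heads:                              # walk up to the root
        seen, node = set(), i
        while node != 0:
            if node in seen:                     # cycle: root unreachable
                return False
            seen.add(node)
            node = heads.get(node, 0)
    return True

# "Chris ate": ate (token 2) hangs off the root, Chris (token 1) off ate
print(is_dependency_tree({1: 2, 2: 0}))   # True
print(is_dependency_tree({1: 2, 2: 1}))   # False: cycle, no path to the root
```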
slide-21
SLIDE 21

Projective Dependency Trees

No crossing arcs

SLP3: Figs 14.2, 14.3

✔ Projective

slide-22
SLIDE 22

Projective Dependency Trees

No crossing arcs

SLP3: Figs 14.2, 14.3

✔ Projective ✖ Not projective

slide-23
SLIDE 23

Projective Dependency Trees

No crossing arcs

SLP3: Figs 14.2, 14.3

✔ Projective ✖ Not projective
Non-projective parses capture:
  • certain long-range dependencies
  • free word order
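Projectivity can be checked directly as "no two arcs cross"; here is a small sketch (illustrative, not from the slides) using the same heads[i] encoding as above:

```python
def is_projective(heads):
    """heads[d] = head of token d (1-based positions; 0 is the root).
       The parse is projective iff no two arcs cross when drawn above the
       sentence, i.e. no two arc spans strictly interleave."""
    spans = [(min(h, d), max(h, d)) for d, h in heads.items()]
    for i, (l1, r1) in enumerate(spans):
        for l2, r2 in spans[i + 1:]:
            if l1 < l2 < r1 < r2 or l2 < l1 < r2 < r1:   # strict interleaving
                return False
    return True

print(is_projective({1: 2, 2: 0, 3: 4, 4: 2}))   # True: nested/adjacent arcs only
print(is_projective({1: 0, 2: 1, 3: 1, 4: 2}))   # False: arc 1-3 crosses arc 2-4
```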
slide-24
SLIDE 24

Are CFGs for Naught?

Nope! Simple algorithm from Xia and Palmer (2011)

  • 1. Mark the head child of each node in a phrase structure, using “appropriate” head rules.
  • 2. In the dependency structure, make the head of each non-head child depend on the head of the head-child.

slide-25
SLIDE 25

Are CFGs for Naught?

Nope! Simple algorithm from Xia and Palmer (2011)

  • 1. Mark the head child of each node in a phrase structure, using “appropriate” head rules.
  • 2. In the dependency structure, make the head of each non-head child depend on the head of the head-child.

Papa ate the caviar with a spoon

[figure: phrase-structure tree over the sentence, with node labels NP V D N P D N, NP, NP, PP, VP, VP, S]

slide-26
SLIDE 26

Are CFGs for Naught?

Nope! Simple algorithm from Xia and Palmer (2011)

  • 1. Mark the head child of each node in a phrase structure, using “appropriate” head rules.
  • 2. In the dependency structure, make the head of each non-head child depend on the head of the head-child.

Papa ate the caviar with a spoon

[figure: the same tree, with lexical heads percolated so far: spoon, caviar]

slide-27
SLIDE 27

Are CFGs for Naught?

Nope! Simple algorithm from Xia and Palmer (2011)

  • 1. Mark the head child of each node in a phrase structure, using “appropriate” head rules.
  • 2. In the dependency structure, make the head of each non-head child depend on the head of the head-child.

Papa ate the caviar with a spoon

[figure: the same tree, with lexical heads percolated so far: ate, spoon, spoon, caviar]

slide-28
SLIDE 28

Are CFGs for Naught?

Nope! Simple algorithm from Xia and Palmer (2011)

  • 1. Mark the head child of each node in a phrase structure, using “appropriate” head rules.
  • 2. In the dependency structure, make the head of each non-head child depend on the head of the head-child.

Papa ate the caviar with a spoon

[figure: the same tree, with lexical heads percolated so far: ate, spoon, spoon, caviar, ate]

slide-29
SLIDE 29

Are CFGs for Naught?

Nope! Simple algorithm from Xia and Palmer (2011)

  • 1. Mark the head child of each node in a phrase structure, using “appropriate” head rules.
  • 2. In the dependency structure, make the head of each non-head child depend on the head of the head-child.

Papa ate the caviar with a spoon

[figure: the same tree, with lexical heads percolated to every node: ate, spoon, spoon, caviar, ate, ate]
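A compact Python sketch of the same head-percolation idea (the head-rule table and tree encoding are toy assumptions for illustration, not the actual Xia and Palmer rules):

```python
# Toy head rules: for each phrase label, which child label is the head child.
HEAD_RULES = {"S": "VP", "VP": "V", "NP": "N", "PP": "P"}   # assumed for illustration

def head_word(tree):
    """tree = (phrase_label, [children]) or (POS, 'word'); returns the lexical head."""
    label, kids = tree
    if isinstance(kids, str):                       # pre-terminal: (POS, word)
        return kids
    head_label = HEAD_RULES.get(label)
    head_child = next((k for k in kids if k[0] == head_label), kids[-1])
    return head_word(head_child)

def to_dependencies(tree, deps=None):
    """Step 2: the head of each non-head child depends on the head of the head child."""
    deps = [] if deps is None else deps
    label, kids = tree
    if isinstance(kids, str):
        return deps
    h = head_word(tree)
    for k in kids:
        hk = head_word(k)
        if hk != h:
            deps.append((h, hk))                    # (governor, dependent)
        to_dependencies(k, deps)
    return deps

t = ("S", [("NP", [("N", "Papa")]),
           ("VP", [("V", "ate"),
                   ("NP", [("Det", "the"), ("N", "caviar")])])])
print(to_dependencies(t))   # [('ate', 'Papa'), ('ate', 'caviar'), ('caviar', 'the')]
```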

slide-30
SLIDE 30

Dependency Post-Processing (Keep Tree Structure)

slide-31
SLIDE 31

Dependency Post-Processing (Get Possible Graph Structure)

Amaranthus, collectively known as amaranth, is a cosmopolitan genus of annual or short-lived perennial plants.

slide-32
SLIDE 32

Outline

Review: PCFGs and CKY
Dependency Grammars
Dependency Parsing (Shift-reduce)
From syntax to semantics

slide-33
SLIDE 33

(Some) Dependency Parsing Algorithms

Dynamic programming: Eisner algorithm (Eisner, 1996)
Transition-based: shift-reduce, arc standard
Graph-based: maximum spanning tree

slide-34
SLIDE 34

(Some) Dependency Parsing Algorithms

Dynamic programming: Eisner algorithm (Eisner, 1996)
Transition-based: shift-reduce, arc standard
Graph-based: maximum spanning tree

slide-35
SLIDE 35

Shift-Reduce Parsing

Recall from CMSC 331
Bottom-up
Tools: input words, some special root symbol ($), and a stack to hold configurations

slide-36
SLIDE 36

Shift-Reduce Parsing

Recall from CMSC 331
Bottom-up
Tools: input words, some special root symbol ($), and a stack to hold configurations
Shift:
  – move tokens onto the stack
  – match the top two elements of the stack against the grammar (RHS)
Reduce:
  – if a match occurred, place the LHS symbol on the stack

slide-37
SLIDE 37

Shift-Reduce Dependency Parsing

Tools: input words, some special root symbol ($), and a stack to hold configurations
Shift:
  – move tokens onto the stack
  – decide if the top two elements of the stack form a valid (good) grammatical dependency
Reduce:
  – if there’s a valid relation, place the head on the stack

slide-38
SLIDE 38

Shift-Reduce Dependency Parsing

Tools: input words, some special root symbol ($), and a stack to hold configurations
Shift:
  – move tokens onto the stack
  – decide if the top two elements of the stack form a valid (good) grammatical dependency
Reduce:
  – if there’s a valid relation, place the head on the stack

decide how? Search problem!

slide-39
SLIDE 39

Shift-Reduce Dependency Parsing

Tools: input words, some special root symbol ($), and a stack to hold configurations
Shift:
  – move tokens onto the stack
  – decide if the top two elements of the stack form a valid (good) grammatical dependency
Reduce:
  – if there’s a valid relation, place the head on the stack

decide how? Search problem! what is valid? Learn it!

slide-40
SLIDE 40

Shift-Reduce Dependency Parsing

Tools: input words, some special root symbol ($), and a stack to hold configurations
Shift:
  – move tokens onto the stack
  – decide if the top two elements of the stack form a valid (good) grammatical dependency
Reduce:
  – if there’s a valid relation, place the head on the stack

decide how? Search problem! what is valid? Learn it! what are the possible actions?

slide-41
SLIDE 41

Shift-Reduce Dependency Parsing

Tools: input words, some special root symbol ($), and a stack to hold configurations
Shift:
  – move tokens onto the stack
  – decide if the top two elements of the stack form a valid (good) grammatical dependency
Reduce:
  – if there’s a valid relation, place the head on the stack

decide how? Search problem! what is valid? Learn it! what are the possible actions?

slide-42
SLIDE 42

Shift-Reduce Actions

Possibilities:
  • Assign the current word as the head of some previously seen word
  • Assign some previously seen word as the head of the current word
  • Wait on processing the current word; add it for later

slide-43
SLIDE 43

Shift-Reduce Actions

Possibility / Action name / Action meaning:
  • LEFTARC (assign the current word as the head of some previously seen word): assert a head-dependent relation between the word at the top of the stack and the word directly beneath it; remove the lower word from the stack.
  • RIGHTARC (assign some previously seen word as the head of the current word): assert a head-dependent relation between the second word on the stack and the word at the top; remove the word at the top of the stack.
  • SHIFT (wait on processing the current word; add it for later): remove the word from the front of the input buffer and push it onto the stack.

slide-44
SLIDE 44

Shift-Reduce Actions

(LEFTARC / RIGHTARC / SHIFT action table repeated from Slide 43.)

“Arc standard”
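For concreteness, a minimal sketch of what applying these three transitions to a configuration might look like (the (stack, buffer, deps) representation is an assumption, not a prescribed API):

```python
def apply_transition(action, stack, buffer, deps):
    """Apply one arc-standard transition; deps collects (head, dependent) arcs."""
    if action == "SHIFT":
        stack.append(buffer.pop(0))          # front of buffer -> top of stack
    elif action == "LEFTARC":
        dep = stack.pop(-2)                  # word directly beneath the top
        deps.append((stack[-1], dep))        # head is the word at the top
    elif action == "RIGHTARC":
        dep = stack.pop()                    # word at the top
        deps.append((stack[-1], dep))        # head is the second word on the stack
    return stack, buffer, deps

stack, buffer, deps = ["$", "Papa", "ate"], ["the", "caviar"], []
print(apply_transition("LEFTARC", stack, buffer, deps))
# (['$', 'ate'], ['the', 'caviar'], [('ate', 'Papa')])
```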

slide-45
SLIDE 45

Arc Standard Parsing

state ← {[root], [words], []}

slide-46
SLIDE 46

Arc Standard Parsing

state ← {[root], [words], []}
while state not final {
}

slide-47
SLIDE 47

Arc Standard Parsing

state ← {[root], [words], []}
while state not final {
    t ← ORACLE(state)
    state ← APPLY(t, state)
}

slide-48
SLIDE 48

Arc Standard Parsing

state ← {[root], [words], []}
while state not final {
    t ← ORACLE(state)
    state ← APPLY(t, state)
}
return state

slide-49
SLIDE 49

Arc Standard Parsing

state ← {[root], [words], []}
while state ≠ {[root], [], [(deps)]} {
    t ← ORACLE(state)
    state ← APPLY(t, state)
}
return state

slide-50
SLIDE 50

Arc Standard Parsing

state ← {[root], [words], []}
while state ≠ {[root], [], [(deps)]} {
    t ← ORACLE(state)
    state ← APPLY(t, state)
}
return state

(LEFTARC / RIGHTARC / SHIFT action table repeated for reference; see Slide 43.)

slide-51
SLIDE 51

Arc Standard Parsing

state ← {[root], [words], []}
while state ≠ {[root], [], [(deps)]} {
    t ← ORACLE(state)
    state ← APPLY(t, state)
}
return state

(LEFTARC / RIGHTARC / SHIFT action table repeated for reference; see Slide 43.)

slide-52
SLIDE 52

Arc Standard Parsing

state ← {[root], [words], []}
while state ≠ {[root], [], [(deps)]} {
    t ← PREDICT(state)
    state ← APPLY(t, state)
}
return state

what is valid? Learn it!

(LEFTARC / RIGHTARC / SHIFT action table repeated for reference; see Slide 43.)

slide-53
SLIDE 53

Shift-Reduce Dependency Parsing

Tools: input words, some special root symbol ($), and a stack to hold configurations
Shift:
  – move tokens onto the stack
  – decide if the top two elements of the stack form a valid (good) grammatical dependency
Reduce:
  – if there’s a valid relation, place the head on the stack

decide how? Search problem! what is valid? Learn it! what are the possible actions?

slide-54
SLIDE 54

Learning An Oracle (Predictor)

Training data: dependency treebank Input: configuration Output: {LEFTARC, RIGHTARC, SHIFT}

t ← ORACLE(state)

slide-55
SLIDE 55

Learning An Oracle (Predictor)

Training data: dependency treebank Input: configuration Output: {LEFTARC, RIGHTARC, SHIFT}

t ← ORACLE(state)

  • Choose LEFTARC if it produces a correct head-dependent relation given the reference parse and the current configuration

slide-56
SLIDE 56

Learning An Oracle (Predictor)

Training data: dependency treebank Input: configuration Output: {LEFTARC, RIGHTARC, SHIFT}

t ← ORACLE(state)

  • Choose LEFTARC if it produces a correct head-dependent relation given the reference parse and the current configuration
  • Choose RIGHTARC if
    • it produces a correct head-dependent relation given the reference parse and
slide-57
SLIDE 57

Learning An Oracle (Predictor)

Training data: dependency treebank Input: configuration Output: {LEFTARC, RIGHTARC, SHIFT}

t ← ORACLE(state)

  • Choose LEFTARC if it produces a correct head-dependent relation given the reference parse and the current configuration
  • Choose RIGHTARC if
    • it produces a correct head-dependent relation given the reference parse and
    • all of the dependents of the word at the top of the stack have already been assigned

slide-58
SLIDE 58

Learning An Oracle (Predictor)

Training data: dependency treebank Input: configuration Output: {LEFTARC, RIGHTARC, SHIFT}

t ← ORACLE(state)

  • Choose LEFTARC if it produces a correct head-dependent relation given the reference parse and the current configuration
  • Choose RIGHTARC if
    • it produces a correct head-dependent relation given the reference parse and
    • all of the dependents of the word at the top of the stack have already been assigned

Deps | Stack | Word buffer | Action
(none) | $ | Papa ate the caviar | SHIFT
(none) | Papa $ | ate the caviar | SHIFT
(none) | ate Papa $ | the caviar | LEFTARC
ate->Papa | ate $ | the caviar | SHIFT
ate->Papa | the ate $ | caviar | SHIFT
ate->Papa | caviar the ate $ | (empty) | LEFTARC
ate->Papa, caviar->the | caviar ate $ | (empty) | RIGHTARC
ate->Papa, caviar->the, ate->caviar | ate $ | (empty) | RIGHTARC
ate->Papa, caviar->the, ate->caviar, $->ate | $ | (empty) | (done)

slide-59
SLIDE 59

Learning An Oracle (Predictor)

Training data: dependency treebank Input: configuration Output: {LEFTARC, RIGHTARC, SHIFT}

t ← ORACLE(state)

  • Choose LEFTARC if it produces a correct head-dependent relation given the reference parse and the current configuration
  • Choose RIGHTARC if
    • it produces a correct head-dependent relation given the reference parse and
    • all of the dependents of the word at the top of the stack have already been assigned
  • Otherwise, choose SHIFT
slide-60
SLIDE 60

Training the Predictor

Predict action t given configuration s: t = φ(s)

slide-61
SLIDE 61

Training the Predictor

Predict action t given configuration s: t = φ(s)
Extract features of the configuration
Examples: word forms, lemmas, POS, morphological features

slide-62
SLIDE 62

Training the Predictor

Predict action t given configuration s: t = φ(s)
Extract features of the configuration
Examples: word forms, lemmas, POS, morphological features
How? Perceptron, Maxent, Support Vector Machines, Multilayer Perceptrons, Neural Networks
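As a rough sketch of φ(s), the feature templates below (word form and POS of the stack top, the item beneath it, and the buffer front) are illustrative assumptions; any of the listed classifiers could then map the feature dict to SHIFT / LEFTARC / RIGHTARC:

```python
def extract_features(stack, buffer, pos_tags):
    """A few illustrative feature templates over a parser configuration."""
    feats = {}
    if stack:
        feats["s1.word"], feats["s1.pos"] = stack[-1], pos_tags.get(stack[-1], "ROOT")
    if len(stack) > 1:
        feats["s2.word"], feats["s2.pos"] = stack[-2], pos_tags.get(stack[-2], "ROOT")
    if buffer:
        feats["b1.word"], feats["b1.pos"] = buffer[0], pos_tags.get(buffer[0], "NONE")
    return feats

print(extract_features(["$", "ate"], ["the", "caviar"],
                       {"ate": "V", "the": "Det", "caviar": "N"}))
# {'s1.word': 'ate', 's1.pos': 'V', 's2.word': '$', 's2.pos': 'ROOT',
#  'b1.word': 'the', 'b1.pos': 'Det'}
```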

slide-63
SLIDE 63

Papa ate the caviar

If time permits, work through on board

state ← {[root], [words], []}
while state ≠ {[root], [], [(deps)]} {
    t ← ORACLE(state)
    state ← APPLY(t, state)
}
return state

(LEFTARC / RIGHTARC / SHIFT action table repeated for reference; see Slide 43.)

slide-64
SLIDE 64

Parse steps for “Papa ate the caviar”

Deps | Stack | Word buffer | Action
(none) | $ | Papa ate the caviar | SHIFT
(none) | Papa $ | ate the caviar | SHIFT
(none) | ate Papa $ | the caviar | LEFTARC
ate->Papa | ate $ | the caviar | SHIFT
ate->Papa | the ate $ | caviar | SHIFT
ate->Papa | caviar the ate $ | (empty) | LEFTARC
ate->Papa, caviar->the | caviar ate $ | (empty) | RIGHTARC
ate->Papa, caviar->the, ate->caviar | ate $ | (empty) | RIGHTARC
ate->Papa, caviar->the, ate->caviar, $->ate | $ | (empty) | (done)
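The trace above can be reproduced with a short sketch of the arc-standard loop driven by a static oracle read off the gold heads (word strings are used as token ids for brevity, and the data structures are illustrative assumptions):

```python
def static_oracle(stack, gold_heads, assigned):
    """Follow the rules from the previous slides: LEFTARC, then RIGHTARC only
       once all of the top word's dependents are assigned, otherwise SHIFT."""
    if len(stack) >= 2:
        top, below = stack[-1], stack[-2]
        if gold_heads.get(below) == top:
            return "LEFTARC"
        if gold_heads.get(top) == below and all(
                d in assigned for d, h in gold_heads.items() if h == top):
            return "RIGHTARC"
    return "SHIFT"

def arc_standard_parse(words, gold_heads):
    stack, buffer, deps, assigned = ["$"], list(words), [], set()
    while not (stack == ["$"] and not buffer):
        t = static_oracle(stack, gold_heads, assigned)
        if t == "SHIFT":
            stack.append(buffer.pop(0))
        elif t == "LEFTARC":
            dep = stack.pop(-2); deps.append((stack[-1], dep)); assigned.add(dep)
        else:  # RIGHTARC
            dep = stack.pop(); deps.append((stack[-1], dep)); assigned.add(dep)
    return deps

gold = {"Papa": "ate", "the": "caviar", "caviar": "ate", "ate": "$"}
print(arc_standard_parse("Papa ate the caviar".split(), gold))
# [('ate', 'Papa'), ('caviar', 'the'), ('ate', 'caviar'), ('$', 'ate')]
```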

slide-65
SLIDE 65

Arc Standard Parsing

state ← {[root], [words], []}
while state ≠ {[root], [], [(deps)]} {
    t ← ORACLE(state)
    state ← APPLY(t, state)
}
return state

Q: What is the time complexity?

slide-66
SLIDE 66

Arc Standard Parsing

state ← {[root], [words], []}
while state ≠ {[root], [], [(deps)]} {
    t ← ORACLE(state)
    state ← APPLY(t, state)
}
return state

Q: What is the time complexity? A: Linear: each word is shifted once and removed by exactly one LEFTARC or RIGHTARC, so an n-word sentence takes about 2n transitions.

slide-67
SLIDE 67

Arc Standard Parsing

state ← {[root], [words], []}
while state ≠ {[root], [], [(deps)]} {
    t ← ORACLE(state)
    state ← APPLY(t, state)
}
return state

Q: What is the time complexity? A: Linear
Q: What’s potentially problematic?

slide-68
SLIDE 68

Arc Standard Parsing

state ← {[root], [words], []}
while state ≠ {[root], [], [(deps)]} {
    t ← ORACLE(state)
    state ← APPLY(t, state)
}
return state

Q: What is the time complexity? A: Linear
Q: What’s potentially problematic? A: This is a greedy algorithm

slide-69
SLIDE 69

Becoming Less Greedy

Beam search: a breadth-first search strategy (CMSC 471/671)
At each stage, keep the K best options open

slide-70
SLIDE 70

Evaluation

Exact match (per-sentence accuracy)
Unlabeled Attachment Score (UAS)
Labeled Attachment Score (LS, LAS)
Recall / Precision / F1 for particular relation types
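For concreteness, a minimal sketch of how the attachment scores are computed (the per-token (head, label) encoding is an assumption for illustration):

```python
def attachment_scores(gold, pred):
    """gold, pred: dict token_index -> (head_index, relation_label).
       UAS = fraction of tokens with the correct head;
       LAS = fraction with the correct head AND the correct label."""
    n = len(gold)
    uas = sum(pred[i][0] == gold[i][0] for i in gold) / n
    las = sum(pred[i] == gold[i] for i in gold) / n
    return uas, las

gold = {1: (2, "nsubj"), 2: (0, "root"), 3: (4, "det"), 4: (2, "dobj")}
pred = {1: (2, "nsubj"), 2: (0, "root"), 3: (4, "det"), 4: (2, "nmod")}
print(attachment_scores(gold, pred))   # (1.0, 0.75): all heads right, one label wrong
```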

slide-71
SLIDE 71

Outline

Review: PCFGs and CKY
Dependency Grammars
Dependency Parsing (Shift-reduce)
From syntax to semantics

slide-72
SLIDE 72

Parsing as a Core NLP Problem

[diagram: sentences 1–4 are fed to a Parser (using a Grammar); the resulting parses go both to Evaluation against gold (correct) reference trees, producing a score, and to other NLP tasks (entity coref., MT, Q&A, …) as independent operations]

slide-73
SLIDE 73

From Dependencies to Shallow Semantics

slide-74
SLIDE 74

From Syntax to Shallow Semantics

Angeli et al. (2015)

“Open Information Extraction”

slide-75
SLIDE 75

From Syntax to Shallow Semantics

http://corenlp.run/ (constituency & dependency)
https://github.com/hltcoe/predpatt
http://openie.allenai.org/
http://www.cs.rochester.edu/research/knext/browse/ (constituency trees)
http://rtw.ml.cmu.edu/rtw/

Angeli et al. (2015)

“Open Information Extraction”: a sampling of efforts