PCFGs: Viterbi CKY
CMSC 473/673, UMBC
November 13th, 2017
Recap from last time…
Probabilistic Context Free Grammar
A set of weighted (probabilistic) rewrite rules over terminals and non-terminals. Terminals: the words in the language (the lexicon), e.g., Baltimore. Non-terminals: symbols that can trigger rewrite rules, e.g., S, NP, Noun. (Sometimes) Pre-terminals: non-terminals that can only trigger lexical rewrites, e.g., Noun.
1.0 S → NP VP
.4 NP → Det Noun
.3 NP → Noun
.2 NP → Det AdjP
.1 NP → NP PP
1.0 PP → P NP
.34 AdjP → Adj Noun
.26 VP → V NP
.0003 Noun → Baltimore
…
Q: What are the distributions? What must sum to 1?
A: p(X → Y Z | X). For each left-hand side X, the probabilities of all rules rewriting X must sum to 1.
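To make this concrete, here is a small Python sketch. The rules and probabilities are the ones from the slide, restricted to the left-hand sides whose rules are fully listed (S, NP, PP); the dict encoding is just an illustrative convention.

```python
from collections import defaultdict

# Rules from the slide, encoded as (lhs, rhs) -> probability.
rules = {
    ("S", ("NP", "VP")): 1.0,
    ("NP", ("Det", "Noun")): 0.4,
    ("NP", ("Noun",)): 0.3,
    ("NP", ("Det", "AdjP")): 0.2,
    ("NP", ("NP", "PP")): 0.1,
    ("PP", ("P", "NP")): 1.0,
}

# Each distribution is p(rhs | lhs), so the rules for each
# left-hand side must sum to 1.
totals = defaultdict(float)
for (lhs, _), p in rules.items():
    totals[lhs] += p

for lhs, total in totals.items():
    assert abs(total - 1.0) < 1e-9, lhs
print(round(totals["NP"], 6))  # 1.0
```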
Probabilistic Context Free Grammar
For the tree over "Baltimore is a great city":

p( [S [NP [Noun Baltimore]] [VP [Verb is] [NP a great city]]] )
  = p(S → NP VP) * p(NP → Noun) * p(Noun → Baltimore)
    * p(VP → Verb NP) * p(Verb → is) * p(NP → a great city)

i.e., the product of the probabilities of the individual rules used in the derivation.
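The computation is just a running product; a Python sketch follows. Only the rule probabilities that appear on the earlier grammar slide (S → NP VP, NP → Noun, Noun → Baltimore, VP → V NP) are real; the probabilities for Verb → is and for the NP subtree over "a great city" are made-up placeholders for illustration.

```python
# Rules used in the derivation of "Baltimore is a great city".
# The .02 and .001 entries are hypothetical, not from the slides.
derivation = [
    ("S -> NP VP", 1.0),
    ("NP -> Noun", 0.3),
    ("Noun -> Baltimore", 0.0003),
    ("VP -> Verb NP", 0.26),
    ("Verb -> is", 0.02),            # hypothetical
    ("NP -> a great city", 0.001),   # hypothetical: stands in for the NP subtree
]

# The tree's probability is the product of the rules in its derivation.
p = 1.0
for rule, prob in derivation:
    p *= prob
print(p)
```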
Probabilistic Context Free Grammar (PCFG) Tasks
Find the most likely parse (for an observed sequence) Calculate the (log) likelihood of an observed sequence w1, …, wN Learn the grammar parameters
CKY Precondition
Grammar must be in Chomsky Normal Form (CNF):
binary rules rewrite a non-terminal as two non-terminals: X → Y Z
unary rules rewrite a non-terminal as a single terminal: X → a
no ternary (or longer) rules
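Stated as code, a minimal sketch of a CNF check. The encoding (right-hand side as a tuple, plus an explicit set of non-terminal names) is an illustrative convention, not something from the slides.

```python
def is_cnf_rule(lhs, rhs, nonterminals):
    """Check one rule (lhs -> rhs) against the CNF conditions."""
    if len(rhs) == 2:                 # binary: both symbols must be non-terminals
        return all(sym in nonterminals for sym in rhs)
    if len(rhs) == 1:                 # unary: the symbol must be a terminal
        return rhs[0] not in nonterminals
    return False                      # no ternary(+) or empty right-hand sides

nts = {"S", "NP", "VP", "PP", "Det", "N", "V", "P"}
print(is_cnf_rule("S", ("NP", "VP"), nts))        # True
print(is_cnf_rule("NP", ("Papa",), nts))          # True
print(is_cnf_rule("VP", ("V", "NP", "PP"), nts))  # False: ternary
```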
“Papa ate the caviar with a spoon”
S → NP VP    NP → Det N    NP → NP PP
VP → V NP    VP → VP PP    PP → P NP
NP → Papa    N → caviar    N → spoon
V → spoon    V → ate       P → with
Det → the    Det → a
Example from Jason Eisner
This is the entire grammar; assume uniform rule weights.
Goal: (S, 0, 7)
First: Let’s find all NPs
(NP, 0, 1): Papa (NP, 2, 4): the caviar (NP, 5, 7): a spoon (NP, 2, 7): the caviar with a spoon
Second: Let’s find all VPs
(VP, 1, 7): ate the caviar with a spoon (VP, 1, 4): ate the caviar
Third: Let’s find all Ss
(S, 0, 7): Papa ate the caviar with a spoon (S, 0, 4): Papa ate the caviar
[Chart: rows index span start (0-6), columns span end (1-7); the cells (NP, 0, 1) and (VP, 1, 7) combine via S → NP VP into the goal (S, 0, 7).]
CKY Recognizer
Input: a string of N words and a grammar in CNF. Output: True (with a parse) / False. Data structure: an N x (N+1) table T. Rows indicate span start (0 to N-1); columns indicate span end (1 to N). T[i][j] lists the constituents spanning i to j.
CKY Recognizer
T = Cell[N][N+1]
for (j = 1; j ≤ N; ++j) {
  T[j-1][j].add(X for non-terminal X in G if X → word_j)
}
for (width = 2; width ≤ N; ++width) {
  for (start = 0; start ≤ N - width; ++start) {
    end = start + width
    for (mid = start+1; mid < end; ++mid) {
      for (non-terminal Y : T[start][mid]) {
        for (non-terminal Z : T[mid][end]) {
          T[start][end].add(X for rule X → Y Z : G)
        }
      }
    }
  }
}
(adjacent cells for Y and Z combine via a rule X → Y Z)
Q: What do we return? A: whether S is in T[0][N]. Q: How do we get the parse? A: follow backpointers.
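The pseudocode above can be turned into a short runnable sketch. This Python version uses Eisner's toy grammar and example sentence from this deck; the dict-of-sets grammar encoding is my own convention for illustration.

```python
from collections import defaultdict

# Eisner's toy grammar in CNF: binary rules indexed by (Y, Z) -> {X},
# plus a lexicon mapping each word to the pre-terminals that rewrite to it.
binary = defaultdict(set)
for lhs, y, z in [("S","NP","VP"), ("NP","Det","N"), ("NP","NP","PP"),
                  ("VP","V","NP"), ("VP","VP","PP"), ("PP","P","NP")]:
    binary[(y, z)].add(lhs)
lexical = {"Papa": {"NP"}, "caviar": {"N"}, "spoon": {"N", "V"},
           "ate": {"V"}, "with": {"P"}, "the": {"Det"}, "a": {"Det"}}

def cky_recognize(words, start_symbol="S"):
    n = len(words)
    table = [[set() for _ in range(n + 1)] for _ in range(n)]
    for j in range(1, n + 1):                 # width-1 spans: the lexicon
        table[j - 1][j] = set(lexical.get(words[j - 1], ()))
    for width in range(2, n + 1):             # wider spans, bottom-up
        for start in range(0, n - width + 1):
            end = start + width
            for mid in range(start + 1, end): # try every split point
                for y in table[start][mid]:
                    for z in table[mid][end]:
                        table[start][end] |= binary[(y, z)]
    return start_symbol in table[0][n]

print(cky_recognize("Papa ate the caviar with a spoon".split()))  # True
```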
Work through parse on board
"Papa ate the caviar with a spoon"

Completed chart (rows: span start, columns: span end):

end →     1    2    3    4    5    6    7
start 0:  NP   .    .    S    .    .    S
start 1:  .    V    .    VP   .    .    VP
start 2:  .    .    Det  NP   .    .    NP
start 3:  .    .    .    N    .    .    .
start 4:  .    .    .    .    P    .    PP
start 5:  .    .    .    .    .    Det  NP
start 6:  .    .    .    .    .    .    N, V

Example from Jason Eisner
CKY Recognizer (variant: loop over rules instead of cell symbols)

T = Cell[N][N+1]
for (j = 1; j ≤ N; ++j) {
  T[j-1][j].add(X for non-terminal X in G if X → word_j)
}
for (width = 2; width ≤ N; ++width) {
  for (start = 0; start ≤ N - width; ++start) {
    end = start + width
    for (mid = start+1; mid < end; ++mid) {
      for (rule X → Y Z : G) {
        T[start][end].add(X if Y in T[start][mid] and Z in T[mid][end])
      }
    }
  }
}
CKY Recognizer (variant: Boolean table indexed by non-terminal)

T = bool[K][N][N+1]
for (j = 1; j ≤ N; ++j) {
  for (non-terminal X in G if X → word_j) { T[X][j-1][j] = True }
}
for (width = 2; width ≤ N; ++width) {
  for (start = 0; start ≤ N - width; ++start) {
    end = start + width
    for (mid = start+1; mid < end; ++mid) {
      for (rule X → Y Z : G) {
        T[X][start][end] |= T[Y][start][mid] & T[Z][mid][end]
      }
    }
  }
}
CKY Viterbi Parser
Input: a string of N words and a probabilistic grammar in CNF. Output: the most likely parse with its probability (or None). Data structure: a K x N x (N+1) table T, for K non-terminal symbols in the grammar. Rows indicate span start (0 to N-1); columns indicate span end (1 to N). T[X][i][j] stores the probability of the most likely X constituent spanning i to j (plus a backpointer).
T = WeightedCell[K][N][N+1]
T[*][*][*] = 0
for (j = 1; j ≤ N; ++j) {
  for (non-terminal X in G if X → word_j) { T[X][j-1][j] = p(X → word_j) }
}
for (width = 2; width ≤ N; ++width) {
  for (start = 0; start ≤ N - width; ++start) {
    end = start + width
    for (mid = start+1; mid < end; ++mid) {
      for (rule X → Y Z : G) {
        T[X][start][end] = max(T[X][start][end],
                               p(X → Y Z) * T[Y][start][mid] * T[Z][mid][end])
      }
    }
  }
}
(store a backpointer whenever the max improves)
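A runnable Python sketch of the same algorithm, with backpointers. The dict-based grammar encoding is my convention; the weights are Eisner's toy grammar used in the worked example on the following slides.

```python
from collections import defaultdict

# Weighted Eisner grammar (probabilities as given on the slide).
binary = [("S","NP","VP",1.0), ("NP","Det","N",0.6), ("NP","NP","PP",0.3),
          ("VP","V","NP",0.6), ("VP","VP","PP",0.4), ("PP","P","NP",1.0)]
lexical = {"Papa": [("NP",0.1)], "caviar": [("N",0.6)],
           "spoon": [("N",0.4), ("V",0.1)], "ate": [("V",0.9)],
           "with": [("P",1.0)], "the": [("Det",0.5)], "a": [("Det",0.5)]}

def viterbi_cky(words, start_symbol="S"):
    n = len(words)
    best = defaultdict(float)   # (X, i, j) -> best probability (0 if absent)
    back = {}                   # (X, i, j) -> (mid, Y, Z) backpointer
    for j in range(1, n + 1):   # width-1 spans: the lexicon
        for x, p in lexical.get(words[j - 1], []):
            best[(x, j - 1, j)] = p
    for width in range(2, n + 1):
        for start in range(0, n - width + 1):
            end = start + width
            for mid in range(start + 1, end):
                for x, y, z, p in binary:
                    score = p * best[(y, start, mid)] * best[(z, mid, end)]
                    if score > best[(x, start, end)]:   # max replaces set-union
                        best[(x, start, end)] = score
                        back[(x, start, end)] = (mid, y, z)
    return best[(start_symbol, 0, n)], back

prob, back = viterbi_cky("Papa ate the caviar with a spoon".split())
print(round(prob, 6))  # best parse probability, ~.000467
```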
CKY Viterbi
“Papa ate the caviar with a spoon”
1.0 S → NP VP    .6 NP → Det N    .3 NP → NP PP
.6 VP → V NP     .4 VP → VP PP    1.0 PP → P NP
.1 NP → Papa     .6 N → caviar    .4 N → spoon
.1 V → spoon     .9 V → ate       1.0 P → with
.5 Det → the     .5 Det → a
Example from Jason Eisner
Entire grammar
"Papa ate the caviar with a spoon"

Width-1 spans come straight from the lexical rules:
NP .1 (Papa), V .9 (ate), Det .5 (the), N .6 (caviar), P 1.0 (with), Det .5 (a), N .4 and V .1 (spoon)

A wider span multiplies a rule's weight by the scores of its two sub-spans. For example, the NP over "the caviar" (span 2-4), via .6 NP → Det N:

.6 * .5 * .6 = .18

Continuing up the chart:
NP[5,7] = .6 * .5 * .4 = .12 ("a spoon")
PP[4,7] = 1.0 * 1.0 * .12 = .12 ("with a spoon")
VP[1,4] = .6 * .9 * .18 = .0972 ("ate the caviar")
NP[2,7] = .3 * .18 * .12 ≈ .0065 ("the caviar with a spoon")
VP[1,7] = max(.6 * .9 * .0065 ≈ .00351, .4 * .0972 * .12 ≈ .00467) = .00467 (attaching the PP to the VP wins)
S[0,4] = 1.0 * .1 * .0972 = .00972 ("Papa ate the caviar")
S[0,7] = 1.0 * .1 * .00467 ≈ .000467 (the whole sentence)

Final chart (rows: span start, columns: span end):

end →     1      2     3       4         5      6       7
start 0:  NP .1  .     .       S .00972  .      .       S .000467
start 1:  .      V .9  .       VP .0972  .      .       VP .00467
start 2:  .      .     Det .5  NP .18    .      .       NP .0065
start 3:  .      .     .       N .6      .      .       .
start 4:  .      .     .       .         P 1.0  .       PP .12
start 5:  .      .     .       .         .      Det .5  NP .12
start 6:  .      .     .       .         .      .       N .4, V .1

Example from Jason Eisner
CKY Recognizer

T = bool[K][N][N+1]
T[*][*][*] = False
for (j = 1; j ≤ N; ++j) {
  T[X][j-1][j] |= { True for non-terminal X in G if X → word_j }
}
for (width = 2; width ≤ N; ++width) {
  for (start = 0; start ≤ N - width; ++start) {
    end = start + width
    for (mid = start+1; mid < end; ++mid) {
      T[X][start][end] |= { T[Y][start][mid] & T[Z][mid][end] for rule X → Y Z : G }
    }
  }
}

Adapted from Jason Eisner
CKY Comparison
Recognizer
T = bool[K][N][N+1]
T[*][*][*] = False
for (j = 1; j ≤ N; ++j) {
  T[X][j-1][j] |= { True for non-terminal X in G if X → word_j }
}
for (width = 2; width ≤ N; ++width) {
  for (start = 0; start ≤ N - width; ++start) {
    end = start + width
    for (mid = start+1; mid < end; ++mid) {
      T[X][start][end] |= { T[Y][start][mid] & T[Z][mid][end] for rule X → Y Z : G }
    }
  }
}
Viterbi
T = WeightedCell[K][N][N+1]
T[*][*][*] = 0
for (j = 1; j ≤ N; ++j) {
  T[X][j-1][j] max= { p(X → word_j) for non-terminal X in G if X → word_j }
}
for (width = 2; width ≤ N; ++width) {
  for (start = 0; start ≤ N - width; ++start) {
    end = start + width
    for (mid = start+1; mid < end; ++mid) {
      T[X][start][end] max= { p(X → Y Z) * T[Y][start][mid] * T[Z][mid][end] for rule X → Y Z : G }
    }
  }
}
General CKY

T = SemiringCell[K][N][N+1]
T[*][*][*] = ⓪
for (j = 1; j ≤ N; ++j) {
  T[X][j-1][j] ⊕= { p(X → word_j) for non-terminal X in G if X → word_j }
}
for (width = 2; width ≤ N; ++width) {
  for (start = 0; start ≤ N - width; ++start) {
    end = start + width
    for (mid = start+1; mid < end; ++mid) {
      T[X][start][end] ⊕= { p(X → Y Z) ⊗ T[Y][start][mid] ⊗ T[Z][mid][end] for rule X → Y Z : G }
    }
  }
}

Adapted from Jason Eisner
CKY Algorithms
Algorithm   | Weights              | ⊕   | ⊗   | ⓪     | ①
Recognizer  | Boolean (True/False) | or  | and | False | True
Viterbi     | [0, 1]               | max | *   | 0     | 1
Adapted from Jason Eisner
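The whole family can be written once over an abstract semiring and then instantiated per algorithm: (max, *, 0) gives Viterbi, (or, and, False) gives the recognizer. A Python sketch, with an illustrative grammar encoding (lists of weighted rules and a weighted lexicon):

```python
def general_cky(words, lexical, binary, plus, times, zero, start="S"):
    """CKY over an arbitrary semiring: `plus` aggregates alternative
    analyses of a cell, `times` combines a rule weight with its two
    sub-span values, `zero` is the value of an empty cell."""
    n = len(words)
    t = {}  # (X, i, j) -> semiring value
    def get(key): return t.get(key, zero)
    for j in range(1, n + 1):                 # width-1 spans: the lexicon
        for x, w in lexical.get(words[j - 1], []):
            t[(x, j - 1, j)] = plus(get((x, j - 1, j)), w)
    for width in range(2, n + 1):
        for s in range(0, n - width + 1):
            e = s + width
            for m in range(s + 1, e):
                for x, y, z, w in binary:
                    v = times(times(w, get((y, s, m))), get((z, m, e)))
                    t[(x, s, e)] = plus(get((x, s, e)), v)
    return get((start, 0, n))

binary = [("S","NP","VP",1.0), ("NP","Det","N",0.6), ("NP","NP","PP",0.3),
          ("VP","V","NP",0.6), ("VP","VP","PP",0.4), ("PP","P","NP",1.0)]
lexical = {"Papa": [("NP",0.1)], "caviar": [("N",0.6)],
           "spoon": [("N",0.4), ("V",0.1)], "ate": [("V",0.9)],
           "with": [("P",1.0)], "the": [("Det",0.5)], "a": [("Det",0.5)]}
sent = "Papa ate the caviar with a spoon".split()

# Viterbi semiring: ([0,1], max, *, 0, 1)
print(general_cky(sent, lexical, binary, max, lambda a, b: a * b, 0.0))
# Boolean semiring: treat any listed rule as weight True
bool_lex = {w: [(x, True) for x, _ in xs] for w, xs in lexical.items()}
bool_bin = [(x, y, z, True) for x, y, z, _ in binary]
print(general_cky(sent, bool_lex, bool_bin,
                  lambda a, b: a or b, lambda a, b: a and b, False))  # True
```

The inside (sum-product) algorithm falls out of the same code with ([0,1], +, *, 0, 1).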