PCFGs: Viterbi CKY (CMSC 473/673, UMBC, November 13th, 2017)

SLIDE 1

PCFGs: Viterbi CKY

CMSC 473/673 UMBC November 13th, 2017

SLIDE 2

Recap from last time…

SLIDE 3

Probabilistic Context Free Grammar

Set of weighted (probabilistic) rewrite rules, comprised of terminals and non-terminals.
Terminals: the words in the language (the lexicon), e.g., Baltimore.
Non-terminals: symbols that can trigger rewrite rules, e.g., S, NP, Noun.
(Sometimes) Pre-terminals: symbols that can only trigger lexical rewrites, e.g., Noun.

1.0 S  NP VP .4 NP  Det Noun .3 NP  Noun .2 NP  Det AdjP .1 NP  NP PP 1.0 PP  P NP .34 AdjP  Adj Noun .26 VP  V NP .0003 Noun  Baltimore … Q: What are the distributions? What must sum to 1?

A: p(X → Y Z | X). For each non-terminal X, the probabilities of all rules rewriting X must sum to 1.
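This normalization can be checked mechanically. A minimal Python sketch with a hypothetical rule encoding (the `(lhs, rhs) -> probability` dictionary is my own; the probabilities are the slide's):

```python
from collections import defaultdict

# Hypothetical encoding of the slide's grammar: (lhs, rhs) -> probability.
rules = {
    ("S", ("NP", "VP")): 1.0,
    ("NP", ("Det", "Noun")): 0.4,
    ("NP", ("Noun",)): 0.3,
    ("NP", ("Det", "AdjP")): 0.2,
    ("NP", ("NP", "PP")): 0.1,
    ("PP", ("P", "NP")): 1.0,
}

# Each left-hand side defines one conditional distribution p(X -> rhs | X),
# so the rule probabilities must sum to 1 per LHS.
totals = defaultdict(float)
for (lhs, _), p in rules.items():
    totals[lhs] += p

for lhs, total in totals.items():
    assert abs(total - 1.0) < 1e-9, lhs
```

Note there is one distribution per non-terminal, not one distribution over the whole grammar.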

SLIDE 4

Probabilistic Context Free Grammar

p(tree for "Baltimore is a great city")
  = p(S → NP VP)
  * p(NP → Noun) * p(Noun → Baltimore)
  * p(VP → Verb NP) * p(Verb → is)
  * p(NP →* "a great city")    (shorthand for the probability of the subtree deriving "a great city")

product of probabilities of the individual rules used in the derivation
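In code this is just a product over the rules in the derivation. A sketch where the grammar probabilities come from the previous slide, while the value for Verb → is and for the "a great city" subtree are made-up placeholders (the slides do not give them):

```python
import math

# Probability of a derivation = product of its rule probabilities.
rule_probs = [
    1.0,     # S -> NP VP
    0.3,     # NP -> Noun
    0.0003,  # Noun -> Baltimore
    0.26,    # VP -> Verb NP
    0.1,     # Verb -> is                  (assumed value, not from the slides)
    0.001,   # NP subtree "a great city"   (assumed value, not from the slides)
]

p_tree = math.prod(rule_probs)
```

Because these products shrink very fast, real implementations sum log-probabilities instead.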

SLIDE 5

Probabilistic Context Free Grammar (PCFG) Tasks

Find the most likely parse (for an observed sequence)
Calculate the (log) likelihood of any observed sequence w1, …, wN
Learn the grammar parameters

SLIDE 6

CKY Precondition

Grammar must be in Chomsky Normal Form (CNF):
non-terminal → non-terminal non-terminal
non-terminal → terminal

X → Y Z
X → a

Binary rules can only involve non-terminals; unary rules can only involve terminals; no ternary (or longer) rules.
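The CNF condition is easy to state as a predicate. A small sketch (the function and its encoding are my own, not from the slides):

```python
def is_cnf_rule(lhs, rhs, nonterminals):
    """A rule is in CNF if it is either X -> Y Z (two non-terminals)
    or X -> a (a single terminal)."""
    if len(rhs) == 2:
        return all(sym in nonterminals for sym in rhs)
    if len(rhs) == 1:
        return rhs[0] not in nonterminals
    return False  # empty and ternary-or-longer rules are not CNF

nts = {"S", "NP", "VP", "PP", "Det", "N", "V", "P"}
assert is_cnf_rule("S", ("NP", "VP"), nts)          # binary, non-terminals only
assert is_cnf_rule("Det", ("the",), nts)            # unary, terminal only
assert not is_cnf_rule("VP", ("V", "NP", "PP"), nts)  # ternary: not CNF
```

Any CFG can be converted to a weakly equivalent CNF grammar, which is why CKY can assume this form.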

SLIDE 7

“Papa ate the caviar with a spoon”

S  NP VP NP  Det N NP  NP PP VP  V NP VP  VP PP PP  P NP NP  Papa N  caviar N  spoon V  spoon V  ate P  with Det  the Det  a

1 2 3 4 5 6 7

Example from Jason Eisner

Entire grammar. Assume uniform weights.

Goal: (S, 0, 7)

SLIDE 8

“Papa ate the caviar with a spoon”

S  NP VP NP  Det N NP  NP PP VP  V NP VP  VP PP PP  P NP NP  Papa N  caviar N  spoon V  spoon V  ate P  with Det  the Det  a

1 2 3 4 5 6 7

Example from Jason Eisner

Entire grammar. Assume uniform weights.

First: Let’s find all NPs

(NP, 0, 1): Papa (NP, 2, 4): the caviar (NP, 5, 7): a spoon (NP, 2, 7): the caviar with a spoon

Second: Let’s find all VPs

(VP, 1, 7): ate the caviar with a spoon (VP, 1, 4): ate the caviar

Third: Let’s find all Ss

(S, 0, 7): Papa ate the caviar with a spoon (S, 0, 4): Papa ate the caviar

[Span diagram: (NP, 0, 1), (VP, 1, 7), and (S, 0, 7) marked between start and end positions 0–7 under the words]

SLIDE 9

CKY Recognizer

Input: string of N words; grammar in CNF
Output: True (with parse) / False
Data structure: N*N table T
  Rows indicate span start (0 to N-1)
  Columns indicate span end (1 to N)
  T[i][j] lists constituents spanning i → j

SLIDE 10

CKY Recognizer

T = Cell[N][N+1]
for (j = 1; j ≤ N; ++j) {
    T[j-1][j].add(X for non-terminal X in G if X → word_j)
}
for (width = 2; width ≤ N; ++width) {
    for (start = 0; start ≤ N - width; ++start) {    // ≤, so the full-sentence span is built
        end = start + width
        for (mid = start+1; mid < end; ++mid) {
            for (non-terminal Y : T[start][mid]) {
                for (non-terminal Z : T[mid][end]) {
                    T[start][end].add(X for rule X → Y Z : G)
                }
            }
        }
    }
}

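The pseudocode maps nearly line-for-line onto runnable Python. A sketch with a hypothetical grammar encoding (rule lists of my own design; note `start` ranges over 0 … N − width inclusive):

```python
from collections import defaultdict

def cky_recognize(words, binary_rules, lexical_rules):
    """CKY recognizer: True iff S derives the whole sentence."""
    n = len(words)
    table = defaultdict(set)  # (start, end) -> set of non-terminals
    # Width-1 spans: lexical rules.
    for j in range(1, n + 1):
        for lhs, terminal in lexical_rules:
            if terminal == words[j - 1]:
                table[(j - 1, j)].add(lhs)
    # Wider spans: combine adjacent constituents with binary rules.
    for width in range(2, n + 1):
        for start in range(0, n - width + 1):
            end = start + width
            for mid in range(start + 1, end):
                for lhs, (y, z) in binary_rules:
                    if y in table[(start, mid)] and z in table[(mid, end)]:
                        table[(start, end)].add(lhs)
    return "S" in table[(0, n)]

# The Eisner example grammar from the earlier slides.
binary = [("S", ("NP", "VP")), ("NP", ("Det", "N")), ("NP", ("NP", "PP")),
          ("VP", ("V", "NP")), ("VP", ("VP", "PP")), ("PP", ("P", "NP"))]
lexical = [("NP", "Papa"), ("N", "caviar"), ("N", "spoon"), ("V", "spoon"),
           ("V", "ate"), ("P", "with"), ("Det", "the"), ("Det", "a")]

sentence = "Papa ate the caviar with a spoon".split()
assert cky_recognize(sentence, binary, lexical)
```

With the loop bound written as `start < n - width` (as on the slide), the full-sentence span would never be built; the `+ 1` in `range` is the fix.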

SLIDE 11

CKY Recognizer

T = Cell[N][N+1]
for (j = 1; j ≤ N; ++j) {
    T[j-1][j].add(X for non-terminal X in G if X → word_j)
}
for (width = 2; width ≤ N; ++width) {
    for (start = 0; start ≤ N - width; ++start) {
        end = start + width
        for (mid = start+1; mid < end; ++mid) {
            for (non-terminal Y : T[start][mid]) {
                for (non-terminal Z : T[mid][end]) {
                    T[start][end].add(X for rule X → Y Z : G)
                }
            }
        }
    }
}

Q: What do we return?
A: S in T[0][N]
Q: How do we get the parse?
A: Follow backpointers

SLIDE 12

“Papa ate the caviar with a spoon”

S  NP VP NP  Det N NP  NP PP VP  V NP VP  VP PP PP  P NP NP  Papa N  caviar N  spoon V  spoon V  ate P  with Det  the Det  a

1 2 3 4 5 6 7

Example from Jason Eisner

Entire grammar. Assume uniform weights.

Work through parse on board

SLIDE 13

“Papa ate the caviar with a spoon”

1 2 3 4 5 6 7

Example from Jason Eisner

Completed chart (rows = span start, columns = span end):

start\end |  1 |  2 |  3  |  4 |  5 |  6  |  7
    0     | NP |    |     | S  |    |     | S
    1     |    | V  |     | VP |    |     | VP
    2     |    |    | Det | NP |    |     | NP
    3     |    |    |     | N  |    |     |
    4     |    |    |     |    | P  |     | PP
    5     |    |    |     |    |    | Det | NP
    6     |    |    |     |    |    |     | N, V


SLIDE 20

CKY Recognizer

T = Cell[N][N+1]
for (j = 1; j ≤ N; ++j) {
    T[j-1][j].add(X for non-terminal X in G if X → word_j)
}
for (width = 2; width ≤ N; ++width) {
    for (start = 0; start ≤ N - width; ++start) {
        end = start + width
        for (mid = start+1; mid < end; ++mid) {
            for (non-terminal Y : T[start][mid]) {
                for (non-terminal Z : T[mid][end]) {
                    T[start][end].add(X for rule X → Y Z : G)
                }
            }
        }
    }
}

SLIDE 21

CKY Recognizer

T = Cell[N][N+1]
for (j = 1; j ≤ N; ++j) {
    T[j-1][j].add(X for non-terminal X in G if X → word_j)
}
for (width = 2; width ≤ N; ++width) {
    for (start = 0; start ≤ N - width; ++start) {
        end = start + width
        for (mid = start+1; mid < end; ++mid) {
            for (rule X → Y Z : G) {
                T[start][end].add(X if Y in T[start][mid] and Z in T[mid][end])
            }
        }
    }
}

SLIDE 22

CKY Recognizer

T = bool[K][N][N+1]
for (j = 1; j ≤ N; ++j) {
    for (non-terminal X in G if X → word_j) {
        T[X][j-1][j] = True
    }
}
for (width = 2; width ≤ N; ++width) {
    for (start = 0; start ≤ N - width; ++start) {
        end = start + width
        for (mid = start+1; mid < end; ++mid) {
            for (rule X → Y Z : G) {
                T[X][start][end] |= T[Y][start][mid] & T[Z][mid][end]    // |=, so a True found at one mid survives
            }
        }
    }
}

SLIDE 23

Probabilistic Context Free Grammar (PCFG) Tasks

Find the most likely parse (for an observed sequence)
Calculate the (log) likelihood of an observed sequence w1, …, wN
Learn the grammar parameters

SLIDE 24

CKY Viterbi Parser

Input: string of N words; probabilistic grammar in CNF
Output: parse with probability (or None)
Data structure: K*N*N table T
  K non-terminal symbols in the grammar
  Rows indicate span start (0 to N-1)
  Columns indicate span end (1 to N)
  T[X][i][j] holds the most likely constituent rooted in X spanning i → j, with its probability

SLIDE 25

T = WeightedCell[K][N][N+1]
for (j = 1; j ≤ N; ++j) {
    T[X][j-1][j] = p(X → word_j) for non-terminal X in G if X → word_j
}
for (width = 2; width ≤ N; ++width) {
    for (start = 0; start ≤ N - width; ++start) {
        end = start + width
        for (mid = start+1; mid < end; ++mid) {
            T[X][start][end] max= { p(X → Y Z) * T[Y][start][mid] * T[Z][mid][end]
                                    for rule X → Y Z : G }
        }
    }
}

(max= accumulates the maximum over all mid points and rules; keep backpointers to recover the argmax parse.)

CKY Viterbi
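A runnable sketch of Viterbi CKY with backpointers, using the weighted Eisner grammar from the following slides; the encoding ((lhs, rhs, probability) triples) and names are my own:

```python
def viterbi_cky(words, binary_rules, lexical_rules, start_symbol="S"):
    """Return (probability of best parse, backpointer table)."""
    n = len(words)
    best = {}  # (X, start, end) -> probability of best X-subtree over that span
    back = {}  # (X, start, end) -> (rule right-hand side, mid) backpointer
    # Width-1 spans: lexical rules.
    for j in range(1, n + 1):
        for lhs, terminal, p in lexical_rules:
            if terminal == words[j - 1] and p > best.get((lhs, j - 1, j), 0.0):
                best[(lhs, j - 1, j)] = p
                back[(lhs, j - 1, j)] = (terminal, None)
    # Wider spans: max over split points and binary rules.
    for width in range(2, n + 1):
        for start in range(0, n - width + 1):
            end = start + width
            for mid in range(start + 1, end):
                for lhs, (y, z), p in binary_rules:
                    score = (p * best.get((y, start, mid), 0.0)
                               * best.get((z, mid, end), 0.0))
                    if score > best.get((lhs, start, end), 0.0):
                        best[(lhs, start, end)] = score
                        back[(lhs, start, end)] = ((y, z), mid)
    return best.get((start_symbol, 0, n), 0.0), back

binary = [("S", ("NP", "VP"), 1.0), ("NP", ("Det", "N"), 0.6),
          ("NP", ("NP", "PP"), 0.3), ("VP", ("V", "NP"), 0.6),
          ("VP", ("VP", "PP"), 0.4), ("PP", ("P", "NP"), 1.0)]
lexical = [("NP", "Papa", 0.1), ("N", "caviar", 0.6), ("N", "spoon", 0.4),
           ("V", "spoon", 0.1), ("V", "ate", 0.9), ("P", "with", 1.0),
           ("Det", "the", 0.5), ("Det", "a", 0.5)]

prob, back = viterbi_cky("Papa ate the caviar with a spoon".split(),
                         binary, lexical)
```

Following the backpointers from ("S", 0, 7) reconstructs the argmax tree; `prob` matches the hand computation on the later chart slides (.1 * .00467 ≈ .000467).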

SLIDE 26

“Papa ate the caviar with a spoon”

1.0 S  NP VP .6 NP  Det N .3 NP  NP PP .6 VP  V NP .4 VP  VP PP 1.0 PP  P NP .1 NP  Papa .6 N  caviar .4 N  spoon .1 V  spoon .9 V  ate 1.0 P  with .5 Det  the .5 Det  a

1 2 3 4 5 6 7

Example from Jason Eisner

Entire grammar

SLIDE 27

“Papa ate the caviar with a spoon”

1 2 3 4 5 6 7

Example from Jason Eisner

Chart after the lexical step (probabilities filled in for width-1 spans only):

start\end |   1   |   2  |    3   |   4  |   5   |    6   |     7
    0     | .1 NP |      |        |  S   |       |        | S
    1     |       | .9 V |        |  VP  |       |        | VP
    2     |       |      | .5 Det |  NP  |       |        | NP
    3     |       |      |        | .6 N |       |        |
    4     |       |      |        |      | 1.0 P |        | PP
    5     |       |      |        |      |       | .5 Det | NP
    6     |       |      |        |      |       |        | .4 N, .1 V

(Grammar as on slide 26.)

SLIDE 28

“Papa ate the caviar with a spoon”

1 2 3 4 5 6 7

Example from Jason Eisner

start\end |   1   |   2  |    3   |    4   |   5   |    6   |     7
    0     | .1 NP |      |        |  S     |       |        | S
    1     |       | .9 V |        |  VP    |       |        | VP
    2     |       |      | .5 Det | .18 NP |       |        | NP
    3     |       |      |        | .6 N   |       |        |
    4     |       |      |        |        | 1.0 P |        | .12 PP
    5     |       |      |        |        |       | .5 Det | .12 NP
    6     |       |      |        |        |       |        | .4 N, .1 V

Example update: (NP, 2, 4) via .6 NP → Det N: .5 * .6 * .6 = .18

SLIDE 29

“Papa ate the caviar with a spoon”

1 2 3 4 5 6 7

Example from Jason Eisner

New entries: (VP, 1, 4) = .6 * .9 * .18 = .0972 via VP → V NP; (NP, 2, 7) = .3 * .18 * .12 ≈ .0065 via NP → NP PP.
(Chart as on the previous slide, with these values added.)

SLIDE 30

“Papa ate the caviar with a spoon”

1 2 3 4 5 6 7

Example from Jason Eisner

New entries: two candidates for (VP, 1, 7): .6 * .9 * .0065 ≈ .00351 via VP → V NP, and .4 * .0972 * .12 ≈ .00467 via VP → VP PP.
(Chart as on the previous slide, with these values added.)

SLIDE 31

“Papa ate the caviar with a spoon”

1 2 3 4 5 6 7

Example from Jason Eisner

New entry: (S, 0, 7) = 1.0 * .1 * .00467 ≈ .000467 via S → NP VP.
(Chart as on the previous slide, with this value added.)

SLIDE 32

“Papa ate the caviar with a spoon”

1 2 3 4 5 6 7

Example from Jason Eisner

Final chart:

start\end |   1   |   2  |    3   |    4     |   5   |    6   |     7
    0     | .1 NP |      |        |  S       |       |        | .000467 S
    1     |       | .9 V |        | .0972 VP |       |        | .00467 VP
    2     |       |      | .5 Det | .18 NP   |       |        | .0065 NP
    3     |       |      |        | .6 N     |       |        |
    4     |       |      |        |          | 1.0 P |        | .12 PP
    5     |       |      |        |          |       | .5 Det | .12 NP
    6     |       |      |        |          |       |        | .4 N, .1 V

The losing candidate .00351 for (VP, 1, 7) is discarded: Viterbi keeps only the max, .00467.

SLIDE 33

T = WeightedCell[K][N][N+1]
T[*][*][*] = 0
for (j = 1; j ≤ N; ++j) {
    T[X][j-1][j] = p(X → word_j) for non-terminal X in G if X → word_j
}
for (width = 2; width ≤ N; ++width) {
    for (start = 0; start ≤ N - width; ++start) {
        end = start + width
        for (mid = start+1; mid < end; ++mid) {
            T[X][start][end] max= { p(X → Y Z) * T[Y][start][mid] * T[Z][mid][end]
                                    for rule X → Y Z : G }
        }
    }
}

CKY Viterbi

Adapted from Jason Eisner

SLIDE 34

T = bool[K][N][N+1]
T[*][*][*] = False
for (j = 1; j ≤ N; ++j) {
    T[X][j-1][j] |= { True for non-terminal X in G if X → word_j }
}
for (width = 2; width ≤ N; ++width) {
    for (start = 0; start ≤ N - width; ++start) {
        end = start + width
        for (mid = start+1; mid < end; ++mid) {
            T[X][start][end] |= { True & T[Y][start][mid] & T[Z][mid][end]
                                  for rule X → Y Z : G }
        }
    }
}

CKY Recognizer

Adapted from Jason Eisner

SLIDE 35

CKY Comparison

Recognizer

T = bool[K][N][N+1]
T[*][*][*] = False
for (j = 1; j ≤ N; ++j) {
    T[X][j-1][j] |= { True for non-terminal X in G if X → word_j }
}
for (width = 2; width ≤ N; ++width) {
    for (start = 0; start ≤ N - width; ++start) {
        end = start + width
        for (mid = start+1; mid < end; ++mid) {
            T[X][start][end] |= { True & T[Y][start][mid] & T[Z][mid][end]
                                  for rule X → Y Z : G }
        }
    }
}

Viterbi

T = WeightedCell[K][N][N+1]
T[*][*][*] = 0
for (j = 1; j ≤ N; ++j) {
    T[X][j-1][j] = p(X → word_j) for non-terminal X in G if X → word_j
}
for (width = 2; width ≤ N; ++width) {
    for (start = 0; start ≤ N - width; ++start) {
        end = start + width
        for (mid = start+1; mid < end; ++mid) {
            T[X][start][end] max= { p(X → Y Z) * T[Y][start][mid] * T[Z][mid][end]
                                    for rule X → Y Z : G }
        }
    }
}

SLIDE 36

T = SemiRingCell[K][N][N+1]
T[*][*][*] = ⓪
for (j = 1; j ≤ N; ++j) {
    T[X][j-1][j] ⊕= { p(X → word_j) for non-terminal X in G if X → word_j }
}
for (width = 2; width ≤ N; ++width) {
    for (start = 0; start ≤ N - width; ++start) {
        end = start + width
        for (mid = start+1; mid < end; ++mid) {
            T[X][start][end] ⊕= [ p(X → Y Z) ⊗ T[Y][start][mid] ⊗ T[Z][mid][end]
                                  for rule X → Y Z : G ]
        }
    }
}

General CKY

Adapted from Jason Eisner
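The semiring view can be made concrete in a few lines. A Python sketch (my own encoding): the same loop computes the recognizer with (or, and, False), the Viterbi score with (max, *, 0), and the inside/likelihood quantity with (+, *, 0).

```python
import operator

def general_cky(words, binary_rules, lexical_rules, plus, times, zero,
                start_symbol="S"):
    """CKY parameterized by a semiring (plus = ⊕, times = ⊗, zero = ⓪)."""
    n = len(words)
    chart = {}
    def get(key):
        return chart.get(key, zero)  # unfilled cells hold ⓪
    # Width-1 spans: ⊕-accumulate lexical rule weights.
    for j in range(1, n + 1):
        for lhs, terminal, w in lexical_rules:
            if terminal == words[j - 1]:
                chart[(lhs, j - 1, j)] = plus(get((lhs, j - 1, j)), w)
    # Wider spans: ⊗ over a rule and its two children, ⊕ over splits and rules.
    for width in range(2, n + 1):
        for start in range(0, n - width + 1):
            end = start + width
            for mid in range(start + 1, end):
                for lhs, (y, z), w in binary_rules:
                    contrib = times(times(w, get((y, start, mid))),
                                    get((z, mid, end)))
                    chart[(lhs, start, end)] = plus(get((lhs, start, end)),
                                                    contrib)
    return get((start_symbol, 0, n))

# Weighted Eisner grammar from the earlier slides.
binary = [("S", ("NP", "VP"), 1.0), ("NP", ("Det", "N"), 0.6),
          ("NP", ("NP", "PP"), 0.3), ("VP", ("V", "NP"), 0.6),
          ("VP", ("VP", "PP"), 0.4), ("PP", ("P", "NP"), 1.0)]
lexical = [("NP", "Papa", 0.1), ("N", "caviar", 0.6), ("N", "spoon", 0.4),
           ("V", "spoon", 0.1), ("V", "ate", 0.9), ("P", "with", 1.0),
           ("Det", "the", 0.5), ("Det", "a", 0.5)]
words = "Papa ate the caviar with a spoon".split()

best = general_cky(words, binary, lexical,
                   plus=max, times=operator.mul, zero=0.0)       # Viterbi
total = general_cky(words, binary, lexical,
                    plus=operator.add, times=operator.mul, zero=0.0)  # inside
```

Here `best` is the Viterbi probability of the best parse, while `total` sums over both parses of the PP-attachment ambiguity, giving the sentence likelihood; swapping in boolean weights with plus=or, times=and, zero=False recovers the recognizer.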

SLIDE 37

CKY Algorithms

            Weights               ⊕    ⊗    ⓪      ①
Recognizer  Boolean (True/False)  or   and  False  True
Viterbi     [0,1]                 max  *    0      1

Adapted from Jason Eisner