Probabilistic Context-Free Grammars (PCFGs)
Berlin Chen 2003
References:
1. Speech and Language Processing, chapter 12
2. Foundations of Statistical Natural Language Processing, chapters 11, 12
Parsing as Search
A PCFG (Booth, 1969) attaches a probability P(A→β), i.e. P(A→β | A), to each rule, such that the probabilities of all expansions of a nonterminal sum to one:

∀A: Σ_β P(A → β) = 1

Terminals are words; nonterminals are syntactic categories and lexical categories.
The probability of a particular parse is defined as the product of the probabilities of all the rules used in the parse tree.
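As a concrete sketch (the toy grammar, rule probabilities, and helper name below are illustrative, not from the slides): walk the tree and multiply the probability of the rule applied at each node.

```python
# Hypothetical rule probabilities for a toy grammar (illustrative only).
rule_prob = {
    ("S", ("NP", "VP")): 1.0,
    ("NP", ("astronomers",)): 0.1,
    ("VP", ("V", "NP")): 0.7,
    ("V", ("saw",)): 1.0,
    ("NP", ("stars",)): 0.18,
}

def parse_probability(tree):
    """Product of rule probabilities over all internal nodes of the parse.
    A tree is (label, child, ...) where leaf children are plain strings."""
    label, *children = tree
    rhs = tuple(c if isinstance(c, str) else c[0] for c in children)
    p = rule_prob[(label, rhs)]
    for c in children:
        if not isinstance(c, str):
            p *= parse_probability(c)
    return p

tree = ("S", ("NP", "astronomers"), ("VP", ("V", "saw"), ("NP", "stars")))
p = parse_probability(tree)  # product of the five rule probabilities (≈ 0.0126)
```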
Notation: N^j_{kl} means that the nonterminal N^j spans word positions k through l of the input string w_1 … w_k … w_l … w_n, i.e. it dominates c + 1 words, where l = k + c.
The model rests on three assumptions: place invariance (a subtree's probability does not depend on where in the string its span starts), context-freeness (it does not depend on the words outside the span), and ancestor-freeness (it does not depend on nodes outside the subtree). Together with the chain rule, these let the probability of a parse factor into rule probabilities.
The probability of a sentence is the sum of the probabilities of all its parses. In Chomsky normal form every rule is either N^j → N^r N^s (for syntactic categories) or N^j → w^k (for lexical categories), and for each nonterminal N^j:

Σ_{r,s} P(N^j → N^r N^s) + Σ_k P(N^j → w^k) = 1

With n nonterminals and V terminals this gives an n³ matrix of parameters for the syntactic rules and an nV matrix for the lexical rules: n³ + nV parameters in all.
(Collins, 1999; Ney, 1991)
[Figure: a nonterminal N^a dominating the span w_i … w_j of the input w_1 … w_n]
Base case (word span 1): A ⇒ w_i iff there is a rule A → w_i; here A must be a lexical category.

π(i, i, A) = P(A → w_i)

Recursive case: A ⇒* w_{ij} iff there is at least one rule A → B C and some position k, i ≤ k < j, such that B derives the first k − i + 1 words (w_{ik}) and C derives the last j − k words (w_{(k+1)j}); here A must be a syntactic category. Choose the maximum among all possibilities:

π(i, j, A) = max_{A→BC, i≤k<j} P(A → B C) · π(i, k, B) · π(k+1, j, C)
[Figure: A expanded as B C, with B spanning begin…m and C spanning m+1…end]

Finding the most likely parse for a sentence: initialize the table entries to zero; for an m-word input string and n nonterminals the algorithm runs in O(m³n³) time. Backpointers are kept as bookkeeping so that the best parse can be recovered.
Training the PCFG
(Baker, 1979; Young, 1990)
Inside probability: the probability of the words w_p … w_q given that N^j dominates them:

β_j(p, q) = P(w_{pq} | N^j_{pq}, G)

Outside probability: the probability of everything outside that span together with the constituent itself:

α_j(p, q) = P(w_{1(p−1)}, N^j_{pq}, w_{(q+1)m} | G)
The sentence probability is the inside probability of the root spanning the whole string:

P(w_{1m} | G) = P(N^1 ⇒* w_{1m} | G) = β_1(1, m)

Base case (word span = 1):

β_j(k, k) = P(w_k | N^j_{kk}, G) = P(N^j → w_k | G)

The inductive case (word span > 1) follows.
For all j and all 1 ≤ p < q ≤ m, sum over the first rule N^j → N^r N^s used and the split point d:

β_j(p, q) = P(w_{pq} | N^j_{pq}, G)
 = Σ_{r,s} Σ_{d=p}^{q−1} P(w_{pd}, N^r_{pd}, w_{(d+1)q}, N^s_{(d+1)q} | N^j_{pq}, G)
 = Σ_{r,s} Σ_{d=p}^{q−1} P(N^r_{pd}, N^s_{(d+1)q} | N^j_{pq}, G) · P(w_{pd} | N^r_{pd}, G) · P(w_{(d+1)q} | N^s_{(d+1)q}, G)
 = Σ_{r,s} Σ_{d=p}^{q−1} P(N^j → N^r N^s) · β_r(p, d) · β_s(d+1, q)

The second line uses the chain rule; the third uses the context-free and ancestor-free assumptions; the last uses the place-invariance assumption and the binary-rule form.
β_VP(2, 5) = P(VP → V NP) β_V(2, 2) β_NP(3, 5) + P(VP → VP PP) β_VP(2, 3) β_PP(4, 5)
 = 0.7 × 1.0 × 0.01296 + 0.3 × 0.126 × 0.18 = 0.015876

β_S(1, 5) = P(S → NP VP) β_NP(1, 1) β_VP(2, 5) = 1.0 × 0.1 × 0.015876 = 0.0015876
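These numbers match the five-word "astronomers saw stars with ears" example from Foundations of Statistical NLP, ch. 11, which the deck cites. A minimal sketch of the inside algorithm with that textbook grammar reproduces them:

```python
from collections import defaultdict

# The CNF grammar of the textbook example (Foundations of Statistical NLP,
# ch. 11); its inside probabilities reproduce the worked numbers.
binary = {  # P(A -> B C)
    ("S", "NP", "VP"): 1.0, ("VP", "V", "NP"): 0.7, ("VP", "VP", "PP"): 0.3,
    ("NP", "NP", "PP"): 0.4, ("PP", "P", "NP"): 1.0,
}
lexical = {  # P(A -> w)
    ("NP", "astronomers"): 0.1, ("NP", "saw"): 0.04, ("NP", "stars"): 0.18,
    ("NP", "ears"): 0.18, ("NP", "telescopes"): 0.1,
    ("V", "saw"): 1.0, ("P", "with"): 1.0,
}

def inside(words):
    """beta[(A, p, q)] = P(w_p..w_q | A spans p..q); positions are 1-based."""
    m = len(words)
    beta = defaultdict(float)
    for k, w in enumerate(words, start=1):           # word span = 1
        for (A, word), prob in lexical.items():
            if word == w:
                beta[(A, k, k)] += prob
    for span in range(2, m + 1):                     # word span > 1
        for p in range(1, m - span + 2):
            q = p + span - 1
            for (A, B, C), prob in binary.items():
                for d in range(p, q):                # split point
                    beta[(A, p, q)] += prob * beta[(B, p, d)] * beta[(C, d + 1, q)]
    return beta

beta = inside("astronomers saw stars with ears".split())
# beta[("VP", 2, 5)] ≈ 0.015876 and beta[("S", 1, 5)] ≈ 0.0015876, as above
```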
Base case: the outside probability of the root spanning the whole sentence is 1:

α_1(1, m) = 1,  α_j(1, m) = 0 for j ≠ 1

The sentence probability can also be computed from the outside probabilities at any word position k (the N^j here are lexical categories):

P(w_{1m} | G) = Σ_j P(w_{1(k−1)}, w_k, w_{(k+1)m}, N^j_{kk} | G)
 = Σ_j P(w_{1(k−1)}, N^j_{kk}, w_{(k+1)m} | G) · P(w_k | N^j_{kk}, G)
 = Σ_j α_j(k, k) P(N^j → w_k)

(chain rule; context-free and place-invariant assumptions)
A nonterminal N^j_{pq} is either the left or the right child of its parent, so the outside probability sums over both configurations:

α_j(p, q) = Σ_{f,g} Σ_{e=q+1}^{m} α_f(p, e) P(N^f → N^j N^g) β_g(q+1, e)
   + Σ_{f,g} Σ_{e=1}^{p−1} α_f(e, q) P(N^f → N^g N^j) β_g(e, p−1)

The first term covers N^j as the left child (its sibling N^g spans w_{(q+1)e}); the second covers N^j as the right child (its sibling spans w_{e(p−1)}). (Chain rule, context-free and ancestor-free assumptions.)
(The intermediate algebraic steps expand the joint probability by the chain rule and regroup the terms into α, rule-probability, and β factors.)
Multiplying the outside and inside probabilities of a span gives the joint probability of the sentence and of N^j dominating w_{pq}:

α_j(p, q) β_j(p, q) = P(w_{1(p−1)}, N^j_{pq}, w_{(q+1)m} | G) · P(w_{pq} | N^j_{pq}, G) = P(w_{1m}, N^j_{pq} | G)
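A sketch of the outside recursion, reusing the same illustrative textbook grammar and checking the identity Σ_j α_j(k,k) P(N^j → w_k) = P(w_{1m} | G) at one position (`inside`/`outside` are hypothetical helper names):

```python
from collections import defaultdict

# Same illustrative textbook grammar as in the inside-probability sketch.
binary = {("S", "NP", "VP"): 1.0, ("VP", "V", "NP"): 0.7,
          ("VP", "VP", "PP"): 0.3, ("NP", "NP", "PP"): 0.4,
          ("PP", "P", "NP"): 1.0}
lexical = {("NP", "astronomers"): 0.1, ("NP", "saw"): 0.04,
           ("NP", "stars"): 0.18, ("NP", "ears"): 0.18,
           ("NP", "telescopes"): 0.1, ("V", "saw"): 1.0, ("P", "with"): 1.0}

def inside(words):
    m = len(words)
    beta = defaultdict(float)
    for k, w in enumerate(words, start=1):
        for (A, word), prob in lexical.items():
            if word == w:
                beta[(A, k, k)] += prob
    for span in range(2, m + 1):
        for p in range(1, m - span + 2):
            q = p + span - 1
            for (A, B, C), prob in binary.items():
                for d in range(p, q):
                    beta[(A, p, q)] += prob * beta[(B, p, d)] * beta[(C, d + 1, q)]
    return beta

def outside(words, beta, root="S"):
    """alpha[(A, p, q)] = P(w_1..w_{p-1}, A spans p..q, w_{q+1}..w_m | G)."""
    m = len(words)
    alpha = defaultdict(float)
    alpha[(root, 1, m)] = 1.0               # base case: alpha_1(1, m) = 1
    for span in range(m - 1, 0, -1):        # from long spans down to words
        for p in range(1, m - span + 2):
            q = p + span - 1
            for (F, B, C), prob in binary.items():
                for e in range(q + 1, m + 1):   # N^j = B is the left child
                    alpha[(B, p, q)] += alpha[(F, p, e)] * prob * beta[(C, q + 1, e)]
                for e in range(1, p):           # N^j = C is the right child
                    alpha[(C, p, q)] += alpha[(F, e, q)] * prob * beta[(B, e, p - 1)]
    return alpha

words = "astronomers saw stars with ears".split()
beta = inside(words)
alpha = outside(words, beta)
sentence_prob = beta[("S", 1, 5)]
# Identity check at k = 3: sum_j alpha_j(k, k) P(N^j -> w_k) = P(w_1m | G)
k = 3
check = sum(alpha[(A, k, k)] * prob
            for (A, word), prob in lexical.items() if word == words[k - 1])
```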
Finding the most likely parse: define δ_i(p, q) as the highest inside probability of any single parse of w_{pq} rooted in N^i. A span can be built as N^i_{pq} → N^{j1}_{p r1} N^{k1}_{(r1+1)q}, N^i_{pq} → N^{j2}_{p r2} N^{k2}_{(r2+1)q}, … — different combinations of child categories and different word ranges (split points). Store the best choice for backtracking.
The algorithm, for 1 ≤ i ≤ n and word positions 1 ≤ p ≤ q ≤ m:

1. Initialization: δ_i(p, p) = P(N^i → w_p)
2. Induction: δ_i(p, q) = max_{1≤j,k≤n; p≤r<q} P(N^i → N^j N^k) δ_j(p, r) δ_k(r+1, q)
   Store the backtrace (three elements stored): ψ_i(p, q) = argmax_{(j,k,r)} P(N^i → N^j N^k) δ_j(p, r) δ_k(r+1, q)
3. Termination: the probability of the most likely parse is δ_1(1, m)
4. Path readout: if X_{pq} = N^i and ψ_i(p, q) = (j, k, r), then the left child of the node is N^j_{pr} and the right child is N^k_{(r+1)q}

[Figure: N^i expanded as N^j over w_p … w_r and N^k over w_{r+1} … w_q]
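The procedure can be sketched as follows, with the same illustrative textbook grammar as earlier (`viterbi_parse` and `readout` are hypothetical names):

```python
# Illustrative textbook grammar (Foundations of Statistical NLP, ch. 11).
binary = {("S", "NP", "VP"): 1.0, ("VP", "V", "NP"): 0.7,
          ("VP", "VP", "PP"): 0.3, ("NP", "NP", "PP"): 0.4,
          ("PP", "P", "NP"): 1.0}
lexical = {("NP", "astronomers"): 0.1, ("NP", "saw"): 0.04,
           ("NP", "stars"): 0.18, ("NP", "ears"): 0.18,
           ("NP", "telescopes"): 0.1, ("V", "saw"): 1.0, ("P", "with"): 1.0}

def viterbi_parse(words, root="S"):
    m = len(words)
    delta, psi = {}, {}
    for p, w in enumerate(words, start=1):               # 1. initialization
        for (A, word), prob in lexical.items():
            if word == w and prob > delta.get((A, p, p), 0.0):
                delta[(A, p, p)] = prob
    for span in range(2, m + 1):                         # 2. induction
        for p in range(1, m - span + 2):
            q = p + span - 1
            for (A, B, C), prob in binary.items():
                for r in range(p, q):
                    cand = (prob * delta.get((B, p, r), 0.0)
                                 * delta.get((C, r + 1, q), 0.0))
                    if cand > delta.get((A, p, q), 0.0):
                        delta[(A, p, q)] = cand
                        psi[(A, p, q)] = (B, C, r)       # three elements stored
    def readout(A, p, q):                                # 4. path readout
        if p == q:
            return (A, words[p - 1])
        B, C, r = psi[(A, p, q)]
        return (A, readout(B, p, r), readout(C, r + 1, q))
    best = delta.get((root, 1, m), 0.0)                  # 3. termination
    return (best, readout(root, 1, m)) if best > 0.0 else (0.0, None)

prob, tree = viterbi_parse("astronomers saw stars with ears".split())
# The NP-attachment parse wins with probability ≈ 0.0009072
```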
Re-estimation: from the expected count of the number of times a particular rule is used in parsing the training data, compute the new probability for that rule.
Starting from an initial grammar G_i (the initial parameters may be randomly chosen), each iteration produces a re-estimated grammar G_{i+1}.
Each iteration is guaranteed not to decrease the likelihood of the training data:

P(W | G_{i+1}) ≥ P(W | G_i)
The joint probability that the sentence is derived and that N^j spans w_{pq} at some point in the derivation:

P(N^1 ⇒* w_{1m}, N^j ⇒* w_{pq} | G) = α_j(p, q) β_j(p, q)

The probability of all possible parses of the sentence is P(N^1 ⇒* w_{1m} | G) = β_1(1, m), so the posterior probability that N^j is used for this span is:

π_j(p, q) = P(N^j ⇒* w_{pq} | N^1 ⇒* w_{1m}, G) = α_j(p, q) β_j(p, q) / β_1(1, m)

Summing over all regions N^j could dominate in the sentence gives the expected number of times N^j is used in the derivation:

E(N^j used in the derivation) = Σ_{p=1}^{m} Σ_{q=p}^{m} π_j(p, q)
Similarly, the expected number of times the rule N^j → N^r N^s is used:

E(N^j → N^r N^s used) = Σ_{p=1}^{m−1} Σ_{q=p+1}^{m} Σ_{d=p}^{q−1} α_j(p, q) P(N^j → N^r N^s) β_r(p, d) β_s(d+1, q) / β_1(1, m)

The re-estimated rule probability is the ratio of the two expectations:

P̂(N^j → N^r N^s) = [Σ_{p=1}^{m−1} Σ_{q=p+1}^{m} Σ_{d=p}^{q−1} α_j(p, q) P(N^j → N^r N^s) β_r(p, d) β_s(d+1, q) / β_1(1, m)] / [Σ_{p=1}^{m} Σ_{q=p}^{m} π_j(p, q)]
The training formulas for a single sentence.
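A compact sketch of one such re-estimation step, under the same illustrative textbook grammar (`inside`, `outside`, and `reestimate` are hypothetical names). On an unambiguous sentence like "astronomers saw stars", a single step already moves all probability mass onto the rules actually used:

```python
from collections import defaultdict

# Illustrative textbook grammar (Foundations of Statistical NLP, ch. 11).
binary = {("S", "NP", "VP"): 1.0, ("VP", "V", "NP"): 0.7,
          ("VP", "VP", "PP"): 0.3, ("NP", "NP", "PP"): 0.4,
          ("PP", "P", "NP"): 1.0}
lexical = {("NP", "astronomers"): 0.1, ("NP", "saw"): 0.04,
           ("NP", "stars"): 0.18, ("NP", "ears"): 0.18,
           ("NP", "telescopes"): 0.1, ("V", "saw"): 1.0, ("P", "with"): 1.0}

def inside(words):
    m = len(words)
    beta = defaultdict(float)
    for k, w in enumerate(words, start=1):
        for (A, word), prob in lexical.items():
            if word == w:
                beta[(A, k, k)] += prob
    for span in range(2, m + 1):
        for p in range(1, m - span + 2):
            q = p + span - 1
            for (A, B, C), prob in binary.items():
                for d in range(p, q):
                    beta[(A, p, q)] += prob * beta[(B, p, d)] * beta[(C, d + 1, q)]
    return beta

def outside(words, beta, root="S"):
    m = len(words)
    alpha = defaultdict(float)
    alpha[(root, 1, m)] = 1.0
    for span in range(m - 1, 0, -1):
        for p in range(1, m - span + 2):
            q = p + span - 1
            for (F, B, C), prob in binary.items():
                for e in range(q + 1, m + 1):
                    alpha[(B, p, q)] += alpha[(F, p, e)] * prob * beta[(C, q + 1, e)]
                for e in range(1, p):
                    alpha[(C, p, q)] += alpha[(F, e, q)] * prob * beta[(B, e, p - 1)]
    return alpha

def reestimate(words):
    """One inside-outside re-estimation step on a single sentence."""
    m = len(words)
    beta = inside(words)
    alpha = outside(words, beta)
    Z = beta[("S", 1, m)]                      # P(w_1m | G) = beta_1(1, m)
    num = defaultdict(float)                   # expected rule counts
    den = defaultdict(float)                   # expected nonterminal counts
    for (A, p, q), b in list(beta.items()):    # den[A] = sum_pq pi_A(p, q)
        den[A] += alpha[(A, p, q)] * b / Z
    for (A, B, C), prob in binary.items():     # E(A -> B C used)
        for p in range(1, m):
            for q in range(p + 1, m + 1):
                for d in range(p, q):
                    num[(A, B, C)] += (alpha[(A, p, q)] * prob *
                                       beta[(B, p, d)] * beta[(C, d + 1, q)]) / Z
    for h, w in enumerate(words, start=1):     # E(A -> w used)
        for (A, word), prob in lexical.items():
            if word == w:
                num[(A, word)] += alpha[(A, h, h)] * prob / Z
    return {rule: c / den[rule[0]] for rule, c in num.items() if den[rule[0]] > 0}

new_probs = reestimate("astronomers saw stars".split())
# On this unambiguous sentence, e.g. P(NP -> astronomers) re-estimates to 0.5
```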
For lexical rules, the expected number of times N^j → w^k is used:

E(N^j → w^k used) = Σ_{h=1}^{m} P(w_h = w^k) π_j(h, h)

where P(w_h = w^k) acts as an indicator function (1 when the h-th word is w^k, 0 otherwise). The re-estimated probability:

P̂(N^j → w^k) = [Σ_{h=1}^{m} P(w_h = w^k) π_j(h, h)] / [Σ_{p=1}^{m} Σ_{q=p}^{m} π_j(p, q)]
The training formulas for a single sentence.
For a training corpus of sentences W = (W_1, …, W_ω), define for each sentence W_i:

f_i(p, q, j, r, s) = Σ_{d=p}^{q−1} α_j(p, q) P(N^j → N^r N^s) β_r(p, d) β_s(d+1, q) / P(N^1 ⇒* W_i | G)

h_i(p, q, j) = α_j(p, q) β_j(p, q) / P(N^1 ⇒* W_i | G)

g_i(h, j, k) = P(w_h = w^k) α_j(h, h) β_j(h, h) / P(N^1 ⇒* W_i | G)
The training formulas using all sentences.
P̂(N^j → N^r N^s) = [Σ_{i=1}^{ω} Σ_{p=1}^{m_i−1} Σ_{q=p+1}^{m_i} f_i(p, q, j, r, s)] / [Σ_{i=1}^{ω} Σ_{p=1}^{m_i} Σ_{q=p}^{m_i} h_i(p, q, j)]

P̂(N^j → w^k) = [Σ_{i=1}^{ω} Σ_{h=1}^{m_i} g_i(h, j, k)] / [Σ_{i=1}^{ω} Σ_{p=1}^{m_i} Σ_{q=p}^{m_i} h_i(p, q, j)]
A weakness of plain PCFGs is that rule probabilities ignore structural context, e.g. whether an NP is a subject or an object in the sentence. Subjects tend to refer to the topic or old information, while objects introduce new referents. In Switchboard (for declarative sentences), 91% of subjects are pronouns (9% lexical nouns), while 66% of objects are lexical nouns (34% pronouns).
Attachment ambiguity: in "Moscow sent more than 100,000 soldiers into Afghanistan", the PP can attach to the object (NP → NP PP) or to the verb phrase (VP → VP PP).
Pronouns, proper names, and definite NPs appear more commonly in subject position; NPs containing post-head modifiers and bare nouns appear more commonly in object position.
Black et al., 1992
NP → NP PP
VP(dumped) → VBD(dumped) NP(sacks) PP(into)   [3×10⁻¹⁰]
VP(dumped) → VBD(dumped) NP(cats) PP(into)    [8×10⁻¹¹]
VP(dumped) → VBD(dumped) NP(hats) PP(into)    [4×10⁻¹⁰]
VP(dumped) → VBD(dumped) NP(sacks) PP(above)  [1×10⁻¹²]
……
[Figure: the incorrect parse vs. the correct parse]
n: the syntactic category of a parse-tree node
h(n): the headword of a parse-tree node
P(r | VP, dumped): the probability of expanding a VP headed by "dumped" with rule r
P(r | VP, slept): the probability of expanding a VP headed by "slept" with rule r
Head-head probability example: P(head(n) = sacks | n = NP, h(m(n)) = dumped), the probability that an NP has headword "sacks" given that its mother node, X(dumped), is headed by "dumped" — conditioning on the mother's head rather than using the prior probability of the headword.
The probability of a lexicalized parse T factors over its nodes n ∈ T into a head-rule probability and a head-head probability:

P(T) = Π_{n∈T} P(r(n) | n, h(n)) · P(h(n) | n, h(m(n)))
P(VP → VBD NP PP | VP, dumped) = C(VP(dumped) → VBD NP PP) / Σ_β C(VP(dumped) → β) = 6/9 = .67
P(VP → VBD NP | VP, dumped) = C(VP(dumped) → VBD NP) / Σ_β C(VP(dumped) → β) = …/9
P(into | PP, dumped) = C(X(dumped) → … PP(into) …) / Σ C(X(dumped) → … PP …) = 2/9 = .22
P(into | PP, sacks) = C(X(sacks) → … PP(into) …) / Σ C(X(sacks) → … PP …) = …

(Counting from the Brown corpus.)
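As a sketch of the counting (the counts of 2 and 1 for the other expansions are hypothetical, chosen only so VP(dumped) totals 9 as in the 6/9 figure above; `head_rule_prob` is a hypothetical name):

```python
from collections import Counter

# Hypothetical treebank counts; only the count of 6 for VBD NP PP and the
# total of 9 echo the slide's Brown-corpus example.
counts = Counter({
    ("VP", "dumped", ("VBD", "NP", "PP")): 6,
    ("VP", "dumped", ("VBD", "NP")): 2,
    ("VP", "dumped", ("VBD", "PP")): 1,
})

def head_rule_prob(cat, head, rhs):
    """P(cat -> rhs | cat, head) = C(cat(head) -> rhs) / sum_b C(cat(head) -> b)."""
    total = sum(c for (n, h, _), c in counts.items() if n == cat and h == head)
    return counts[(cat, head, rhs)] / total if total else 0.0

p_vbd_np_pp = head_rule_prob("VP", "dumped", ("VBD", "NP", "PP"))  # 6/9 ≈ .67
```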
Example: "Harry eats apples", with categories Harry: NP, eats: VP/NP (where VP = S\NP), apples: NP.
Labeled recall = (# of correct constituents in the candidate parse of sentence s) / (# of constituents in the treebank parse of sentence s)
Labeled precision = (# of correct constituents in the candidate parse of sentence s) / (# of total constituents in the candidate parse of sentence s)
A constituent counts as correct only if it has the same starting point, ending point, and non-terminal symbol as the "gold standard" parse.
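A sketch of these metrics over bracketed trees (the tree encoding and the names `constituents`/`parseval` are illustrative; for simplicity, preterminals are counted as constituents here):

```python
from collections import Counter

def constituents(tree, start=0):
    """List (label, start, end) spans of a bracketed tree; a tree is
    (label, child, ...) with string leaves; end positions are exclusive."""
    label, *children = tree
    pos, spans = start, []
    for c in children:
        if isinstance(c, str):
            pos += 1                      # a word advances the position
        else:
            sub = constituents(c, pos)
            spans.extend(sub)
            pos = sub[0][2]               # end of that child's span
    spans.insert(0, (label, start, pos))
    return spans

def parseval(candidate, gold):
    """Labeled precision and recall: a candidate constituent is correct only
    if its label, start, and end all match a gold constituent."""
    cand, gld = Counter(constituents(candidate)), Counter(constituents(gold))
    correct = sum((cand & gld).values())  # multiset intersection
    return correct / sum(cand.values()), correct / sum(gld.values())

gold = ("S", ("NP", "I"), ("VP", ("V", "saw"), ("NP", ("D", "the"), ("N", "man"))))
cand = ("S", ("NP", "I"), ("VP", ("V", "saw"), ("NP", "the"), ("N", "man")))
precision, recall = parseval(cand, gold)  # 5/6 and 5/7
```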