SLIDE 1

PCFG: Probabilistic Context Free Grammars

Presenter: Ba Dat Nguyen
Advisor: Dr. Martin Theobald
Max-Planck-Institut für Informatik, Saarbrücken, Germany

SLIDE 2: Outline
  • Introduction
  • Probabilistic Context Free Grammars
    • Parsing
    • Context Free Grammars
    • Probabilistic Context Free Grammars
    • Inside-Outside Algorithm
  • Extension
    • Distance
    • Complement/adjunct distinction
    • Traces and Wh-movement

SLIDE 3

The world is full of ambiguity.

SLIDE 4: Solution

PCFGs are a good way to resolve ambiguity in syntactic structure.

SLIDE 5: Outline
  • Introduction
  • Probabilistic Context Free Grammars
    • Parsing
    • Context Free Grammars
    • Probabilistic Context Free Grammars
    • Inside-Outside Algorithm
  • Extension
    • Distance
    • Complement/adjunct distinction
    • Traces and Wh-movement

SLIDE 6: Language and Grammar
  • Language
    • Structural
    • Ambiguous
  • Grammar
    • Generalization of regularities in language structures
    • Morphology and syntax

SLIDE 7: Parsing
  • The process of working out the grammatical structure of sentences.
  • Basic parsing algorithms
    • Parsing strategies
    • CYK algorithm
    • Earley algorithm

SLIDE 8: Example of parsing
  • “She is a nice girl”

  (S (NP (PRP She))
     (VP (VBZ is)
         (NP (DT a) (JJ nice) (NN girl))))

SLIDE 9: Outline
  • Introduction
  • Probabilistic Context Free Grammars
    • Parsing
    • Context Free Grammars
    • Probabilistic Context Free Grammars
    • Inside-Outside Algorithm
  • Extension
    • Distance
    • Complement/adjunct distinction
    • Traces and Wh-movement

SLIDE 10: Chomsky hierarchy

  Regular:           A -> aB,  A -> a
  Context-free:      A -> γ
  Context-sensitive: αAβ -> αγβ
  Unrestricted:      α -> β

  where A, B are nonterminals, a is a terminal, and α, β, γ are strings of terminals and nonterminals.

SLIDE 11: Context Free Grammars (CFG)
  • A Context Free Grammar consists of:
    • A set of terminals $\{w^k\}$, k = 1, ..., V
    • A set of nonterminals $\{N^i\}$, i = 1, ..., n
    • A designated start symbol $N^1$
    • A set of rules $\{N^i \to \zeta^j\}$, where $\zeta^j$ is a sequence of terminals and nonterminals

SLIDE 12: Example of a CFG

  S  -> NP VP        NP -> NP PP
  PP -> P NP         NP -> astronomers
  VP -> V NP         NP -> ears
  VP -> VP PP        NP -> saw
  P  -> with         NP -> stars
  V  -> saw          NP -> telescopes
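
To make the CYK algorithm listed on Slide 7 concrete, here is a minimal recognizer sketch in Python over exactly this grammar. It is an illustration with our own naming (`cyk_recognize`, `RULES`, `LEXICON`), not code from the slides; CYK assumes Chomsky normal form, which this grammar satisfies apart from the unary lexical rules handled during chart initialization.

```python
from collections import defaultdict

# Binary rules of the slide's grammar; lexical rules kept separately.
RULES = [
    ("S", ("NP", "VP")), ("NP", ("NP", "PP")), ("PP", ("P", "NP")),
    ("VP", ("V", "NP")), ("VP", ("VP", "PP")),
]
LEXICON = {
    "astronomers": {"NP"}, "ears": {"NP"}, "saw": {"NP", "V"},
    "stars": {"NP"}, "telescopes": {"NP"}, "with": {"P"},
}

def cyk_recognize(words):
    n = len(words)
    chart = defaultdict(set)          # chart[(i, j)] = labels spanning words[i:j]
    for i, w in enumerate(words):     # width-1 spans come from the lexicon
        chart[(i, i + 1)] = set(LEXICON.get(w, ()))
    for width in range(2, n + 1):     # build wider spans bottom-up
        for i in range(n - width + 1):
            j = i + width
            for k in range(i + 1, j):                 # try every split point
                for lhs, (b, c) in RULES:
                    if b in chart[(i, k)] and c in chart[(k, j)]:
                        chart[(i, j)].add(lhs)
    return "S" in chart[(0, n)]

print(cyk_recognize("astronomers saw stars with ears".split()))  # True
```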

SLIDE 13: Ambiguous sentences

  Tree 1: (S (NP astronomers)
             (VP (V saw)
                 (NP (NP stars) (PP (P with) (NP ears)))))

  Tree 2: (S (NP astronomers)
             (VP (VP (V saw) (NP stars))
                 (PP (P with) (NP ears))))

  Which one is better?

SLIDE 14: Outline
  • Introduction
  • Probabilistic Context Free Grammars
    • Parsing
    • Context Free Grammars
    • Probabilistic Context Free Grammars
    • Inside-Outside Algorithm
  • Extension
    • Distance
    • Complement/adjunct distinction
    • Traces and Wh-movement

SLIDE 15: Probabilistic CFG
  • A Probabilistic Context Free Grammar (PCFG) consists of:
    • A CFG
    • A corresponding set of probabilities on rules such that:

      $\forall i \quad \sum_j P(N^i \to \zeta^j) = 1$

SLIDE 16: Example of a PCFG

  S  -> NP VP    1.0     NP -> NP PP        0.4
  PP -> P NP     1.0     NP -> astronomers  0.1
  VP -> V NP     0.7     NP -> ears         0.18
  VP -> VP PP    0.3     NP -> saw          0.04
  P  -> with     1.0     NP -> stars        0.18
  V  -> saw      1.0     NP -> telescopes   0.1
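
As a quick sanity check of the constraint from Slide 15, the same grammar can be written as a Python mapping and each left-hand side verified to sum to 1. A minimal sketch with our own naming, not part of the original slides:

```python
from collections import defaultdict

PCFG = {
    ("S",  ("NP", "VP")): 1.0,   ("NP", ("NP", "PP")): 0.4,
    ("PP", ("P", "NP")): 1.0,    ("NP", ("astronomers",)): 0.1,
    ("VP", ("V", "NP")): 0.7,    ("NP", ("ears",)): 0.18,
    ("VP", ("VP", "PP")): 0.3,   ("NP", ("saw",)): 0.04,
    ("P",  ("with",)): 1.0,      ("NP", ("stars",)): 0.18,
    ("V",  ("saw",)): 1.0,       ("NP", ("telescopes",)): 0.1,
}

totals = defaultdict(float)
for (lhs, _), p in PCFG.items():
    totals[lhs] += p
# Every left-hand side must expand with total probability 1.
assert all(abs(t - 1.0) < 1e-9 for t in totals.values())
```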

SLIDE 17: Probability of a tree

  (NP (NP stars) (PP (P with) (NP ears)))

  By the chain rule:

  $P(\text{NP} \to \text{NP PP},\ \text{NP} \to \text{stars},\ \text{PP} \to \text{P NP},\ \text{P} \to \text{with},\ \text{NP} \to \text{ears})$
  $= P(\text{NP} \to \text{NP PP}) \cdot P(\text{NP} \to \text{stars} \mid \text{NP} \to \text{NP PP}) \cdot P(\text{PP} \to \text{P NP} \mid \text{NP} \to \text{NP PP},\ \text{NP} \to \text{stars}) \cdots$

SLIDE 18: Assumptions
  • Place invariance: $\forall k \;\; P(N^j_{k(k+c)} \to \zeta)$ is the same
  • Context-free: $P(N^j_{kl} \to \zeta \mid \text{anything outside } k \text{ through } l) = P(N^j_{kl} \to \zeta)$
  • Ancestor-free: $P(N^j_{kl} \to \zeta \mid \text{any ancestor nodes outside } N^j_{kl}) = P(N^j_{kl} \to \zeta)$

SLIDE 19: Probability of a tree

  (NP (NP stars) (PP (P with) (NP ears)))

  With the independence assumptions:

  $P(\text{NP} \to \text{NP PP},\ \text{NP} \to \text{stars},\ \text{PP} \to \text{P NP},\ \text{P} \to \text{with},\ \text{NP} \to \text{ears})$
  $= P(\text{NP} \to \text{NP PP}) \cdot P(\text{NP} \to \text{stars}) \cdot P(\text{PP} \to \text{P NP}) \cdot P(\text{P} \to \text{with}) \cdot P(\text{NP} \to \text{ears})$

SLIDE 20: Ambiguity

  Tree 1 (PP attaches to the NP):
  (S [1.0] (NP [0.1] astronomers)
           (VP [0.7] (V [1.0] saw)
                     (NP [0.4] (NP [0.18] stars)
                               (PP [1.0] (P [1.0] with) (NP [0.18] ears)))))
  P = 1.0 × 0.1 × 0.7 × 1.0 × 0.4 × 0.18 × 1.0 × 1.0 × 0.18 = 0.0009072

  Tree 2 (PP attaches to the VP):
  (S [1.0] (NP [0.1] astronomers)
           (VP [0.3] (VP [0.7] (V [1.0] saw) (NP [0.18] stars))
                     (PP [1.0] (P [1.0] with) (NP [0.18] ears))))
  P = 1.0 × 0.1 × 0.3 × 0.7 × 1.0 × 0.18 × 1.0 × 1.0 × 0.18 = 0.0006804
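
The two products above can be reproduced by walking each tree and multiplying rule probabilities. A short sketch reusing the `PCFG` mapping defined after Slide 16; the nested-tuple tree encoding is our own, not the presenter's:

```python
# Trees are nested tuples: (label, child, ...); a lexical child is a string.
def tree_prob(tree):
    label, *children = tree
    if len(children) == 1 and isinstance(children[0], str):
        return PCFG[(label, (children[0],))]          # e.g. NP -> stars
    p = PCFG[(label, tuple(c[0] for c in children))]  # e.g. S -> NP VP
    for child in children:
        p *= tree_prob(child)
    return p

t1 = ("S", ("NP", "astronomers"),
           ("VP", ("V", "saw"),
                  ("NP", ("NP", "stars"),
                         ("PP", ("P", "with"), ("NP", "ears")))))
t2 = ("S", ("NP", "astronomers"),
           ("VP", ("VP", ("V", "saw"), ("NP", "stars")),
                  ("PP", ("P", "with"), ("NP", "ears"))))

print(tree_prob(t1), tree_prob(t2))  # ~0.0009072 and ~0.0006804
```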

SLIDE 21: Outline
  • Introduction
  • Probabilistic Context Free Grammars
    • Parsing
    • Context Free Grammars
    • Probabilistic Context Free Grammars
    • Inside-Outside Algorithm
  • Extension
    • Distance
    • Complement/adjunct distinction
    • Traces and Wh-movement

SLIDE 22: Probability of a rule
  • Given a training set of annotated sentences, with C(·) the number of times a particular rule is used:

  $P(N^j \to \zeta) = \dfrac{C(N^j \to \zeta)}{\sum_\gamma C(N^j \to \gamma)}$
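
A minimal sketch of this relative-frequency estimate over a toy treebank; the tree encoding matches the scoring sketch after Slide 20, and all names and the example trees are illustrative:

```python
from collections import Counter

def count_rules(tree, counts):
    label, *children = tree
    if len(children) == 1 and isinstance(children[0], str):
        counts[(label, (children[0],))] += 1          # lexical rule
    else:
        counts[(label, tuple(c[0] for c in children))] += 1
        for child in children:
            count_rules(child, counts)

def mle_probabilities(treebank):
    counts = Counter()
    for tree in treebank:
        count_rules(tree, counts)
    lhs_totals = Counter()
    for (lhs, _), c in counts.items():
        lhs_totals[lhs] += c
    return {rule: c / lhs_totals[rule[0]] for rule, c in counts.items()}

treebank = [
    ("S", ("NP", "astronomers"), ("VP", ("V", "saw"), ("NP", "stars"))),
    ("S", ("NP", "stars"), ("VP", ("V", "saw"), ("NP", "ears"))),
]
probs = mle_probabilities(treebank)
print(probs[("NP", ("stars",))])  # 0.5: NP -> stars used in 2 of 4 NP expansions
```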

SLIDE 23: Probability of a rule

  But how can we estimate rule probabilities when there is no annotated data?

SLIDE 24: Maximum Likelihood Estimation
  • Goal: $\hat{\mu} = \arg\max_\mu P(O_{\text{training}} \mid \mu)$, where $\mu$ is the set of parameters of the current grammar and $O$ the observed training data.
  • There is no known analytic method to choose µ to maximize P(O | µ).
  • Instead, locally maximize P(O | µ) by iterative hill-climbing, a special case of the Expectation Maximization (EM) method.
  • The Inside-Outside algorithm is a form of EM, using the inside and outside probabilities estimated from the training set.

SLIDE 25: Training a PCFG
  • We are given:
    • A set of training sentences
    • A set of terminals
    • A set of nonterminals
  • Initial probabilities for the rules are estimated (perhaps chosen randomly).
  • The inside-outside algorithm is then used to train the grammar.

SLIDE 26: Inside-Outside probabilities

  For a nonterminal $N^j$ spanning words $w_p \ldots w_q$ of a sentence $w_1 \ldots w_m$ rooted at $N^1$:
  • Outside probability $\alpha_j(p, q)$: associated with the words $w_1 \ldots w_{p-1}$ and $w_{q+1} \ldots w_m$ outside the span
  • Inside probability $\beta_j(p, q)$: associated with the words $w_p \ldots w_q$ inside the span

SLIDE 27: Inside probabilities
  • The inside probability is the probability that the word sequence $w_p \ldots w_q$ is generated by a tree rooted at node $N^j$:

    $\beta_j(p, q) = P(w_{pq} \mid N^j_{pq})$

  • The calculation can be carried out bottom-up:

    Base case:  $\beta_j(k, k) = P(N^j \to w_k)$
    Induction:  $\beta_j(p, q) = \sum_{r,s} \sum_{d=p}^{q-1} P(N^j \to N^r N^s)\, \beta_r(p, d)\, \beta_s(d+1, q)$
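
This recursion transcribes directly into Python for a PCFG in Chomsky normal form. A sketch under our own naming, using the Slide 16 grammar split into binary and lexical rules:

```python
from collections import defaultdict

def inside_probs(words, binary_rules, lexical_rules):
    n = len(words)
    beta = defaultdict(float)               # beta[(j, p, q)], 1-based positions
    for k, w in enumerate(words, start=1):  # base case: beta_j(k,k) = P(N^j -> w_k)
        for (j, word), prob in lexical_rules.items():
            if word == w:
                beta[(j, k, k)] = prob
    for width in range(2, n + 1):           # induction over wider spans
        for p in range(1, n - width + 2):
            q = p + width - 1
            for (j, (r, s)), prob in binary_rules.items():
                for d in range(p, q):       # all split points
                    beta[(j, p, q)] += prob * beta[(r, p, d)] * beta[(s, d + 1, q)]
    return beta

binary = {("S", ("NP", "VP")): 1.0, ("VP", ("V", "NP")): 0.7,
          ("VP", ("VP", "PP")): 0.3, ("NP", ("NP", "PP")): 0.4,
          ("PP", ("P", "NP")): 1.0}
lexical = {("NP", "astronomers"): 0.1, ("NP", "ears"): 0.18, ("NP", "saw"): 0.04,
           ("NP", "stars"): 0.18, ("NP", "telescopes"): 0.1,
           ("V", "saw"): 1.0, ("P", "with"): 1.0}

beta = inside_probs("astronomers saw stars with ears".split(), binary, lexical)
print(beta[("S", 1, 5)])  # 0.0015876 = 0.0009072 + 0.0006804 (both parses)
```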

SLIDE 28: Outside probabilities
  • The outside probability is the total probability of beginning with the start symbol $N^1$ and generating $N^j_{pq}$ together with all the words outside $w_p \ldots w_q$:

    $\alpha_j(p, q) = P(w_{1(p-1)},\ N^j_{pq},\ w_{(q+1)m})$

  • Base case: $\alpha_1(1, m) = 1$ and $\alpha_j(1, m) = 0$ for $j \neq 1$
  • Induction (top-down):

    $\alpha_j(p, q) = \sum_{f,g} \sum_{e=q+1}^{m} \alpha_f(p, e)\, P(N^f \to N^j N^g)\, \beta_g(q+1, e) \;+\; \sum_{f,g} \sum_{e=1}^{p-1} \alpha_f(e, q)\, P(N^f \to N^g N^j)\, \beta_g(e, p-1)$

SLIDE 29: Inside-Outside Algorithm

  We have:

  $\alpha_j(p, q)\, \beta_j(p, q) = P(N^1 \Rightarrow^* w_{1m},\ N^j \Rightarrow^* w_{pq})$

SLIDE 30: Inside-Outside Algorithm

  $E(N^j \text{ is used}) = \dfrac{\sum_{p=1}^{m} \sum_{q=p}^{m} \alpha_j(p,q)\, \beta_j(p,q)}{P(N^1 \Rightarrow^* w_{1m})}$

  $E(N^j \to N^r N^s \text{ is used}) = \dfrac{\sum_{p=1}^{m-1} \sum_{q=p+1}^{m} \sum_{d=p}^{q-1} \alpha_j(p,q)\, P(N^j \to N^r N^s)\, \beta_r(p,d)\, \beta_s(d+1,q)}{P(N^1 \Rightarrow^* w_{1m})}$
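
These expectations combine the α and β values, so the outside recursion from Slide 28 is needed alongside the inside one. A sketch continuing the `inside_probs` example after Slide 27, with our own naming:

```python
from collections import defaultdict

def outside_probs(n, binary_rules, beta, start="S"):
    alpha = defaultdict(float)
    alpha[(start, 1, n)] = 1.0                  # base case: alpha_1(1, m) = 1
    for width in range(n - 1, 0, -1):           # top-down over shrinking spans
        for p in range(1, n - width + 2):
            q = p + width - 1
            for (f, (b, c)), prob in binary_rules.items():
                for e in range(q + 1, n + 1):   # span is the left child of N^f
                    alpha[(b, p, q)] += alpha[(f, p, e)] * prob * beta[(c, q + 1, e)]
                for e in range(1, p):           # span is the right child of N^f
                    alpha[(c, p, q)] += alpha[(f, e, q)] * prob * beta[(b, e, p - 1)]
    return alpha

words = "astronomers saw stars with ears".split()
beta = inside_probs(words, binary, lexical)     # from the Slide 27 sketch
alpha = outside_probs(len(words), binary, beta)
# Consistency check: summing alpha_j(k,k) * beta_j(k,k) over all j at any
# word position k recovers the total sentence probability.
print(sum(alpha[(j, 1, 1)] * beta[(j, 1, 1)]
          for j in ("S", "NP", "VP", "PP", "V", "P")))  # 0.0015876
```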

SLIDE 31: Inside-Outside Algorithm

  Therefore:

  $\hat{P}(N^j \to N^r N^s) = \dfrac{\sum_{p=1}^{m-1} \sum_{q=p+1}^{m} \sum_{d=p}^{q-1} \alpha_j(p,q)\, P(N^j \to N^r N^s)\, \beta_r(p,d)\, \beta_s(d+1,q)}{\sum_{p=1}^{m} \sum_{q=p}^{m} \alpha_j(p,q)\, \beta_j(p,q)}$

  $\hat{P}(N^j \to w^k) = \dfrac{\sum_{h=1}^{m} \alpha_j(h,h)\, P(N^j \to w^k,\ w_h = w^k)}{\sum_{p=1}^{m} \sum_{q=p}^{m} \alpha_j(p,q)\, \beta_j(p,q)}$

SLIDE 32: Inside-Outside Algorithm

  For each sentence $W_i$ in the corpus, define:

  $f_i(p, q, j, r, s) = \dfrac{\sum_{d=p}^{q-1} \alpha_j(p,q)\, P(N^j \to N^r N^s)\, \beta_r(p,d)\, \beta_s(d+1,q)}{P(N^1 \Rightarrow^* W_i)}$

  $g_i(h, j, k) = \dfrac{\alpha_j(h,h)\, P(N^j \to w^k,\ w_h = w^k)}{P(N^1 \Rightarrow^* W_i)}$

  $h_i(p, q, j) = \dfrac{\alpha_j(p,q)\, \beta_j(p,q)}{P(N^1 \Rightarrow^* W_i)}$

SLIDE 33: Inside-Outside Algorithm

  We have:

  $\hat{P}(N^j \to N^r N^s) = \dfrac{\sum_{i=1}^{l} \sum_{p=1}^{m_i-1} \sum_{q=p+1}^{m_i} f_i(p, q, j, r, s)}{\sum_{i=1}^{l} \sum_{p=1}^{m_i} \sum_{q=p}^{m_i} h_i(p, q, j)}$

  $\hat{P}(N^j \to w^k) = \dfrac{\sum_{i=1}^{l} \sum_{h=1}^{m_i} g_i(h, j, k)}{\sum_{i=1}^{l} \sum_{p=1}^{m_i} \sum_{q=p}^{m_i} h_i(p, q, j)}$
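
Putting the pieces together, one EM re-estimation step can be sketched by accumulating these f, g, and h quantities with the `inside_probs` and `outside_probs` sketches above; the structure and naming here are illustrative, and a real implementation would iterate until convergence:

```python
from collections import defaultdict

def reestimate(corpus, binary_rules, lexical_rules, start="S"):
    num_bin = defaultdict(float)   # expected uses of each binary rule (f terms)
    num_lex = defaultdict(float)   # expected uses of each lexical rule (g terms)
    den = defaultdict(float)       # expected uses of each nonterminal (h terms)
    nonterms = {j for (j, _) in binary_rules} | {j for (j, _) in lexical_rules}
    for words in corpus:
        n = len(words)
        beta = inside_probs(words, binary_rules, lexical_rules)
        alpha = outside_probs(n, binary_rules, beta, start)
        sent_prob = beta[(start, 1, n)]
        if sent_prob == 0.0:
            continue                           # skip unparsable sentences
        for (j, (r, s)), prob in binary_rules.items():
            for p in range(1, n):
                for q in range(p + 1, n + 1):
                    for d in range(p, q):
                        num_bin[(j, (r, s))] += (alpha[(j, p, q)] * prob
                            * beta[(r, p, d)] * beta[(s, d + 1, q)]) / sent_prob
        for h, w in enumerate(words, start=1):
            for (j, word) in lexical_rules:
                if word == w:
                    num_lex[(j, word)] += alpha[(j, h, h)] * beta[(j, h, h)] / sent_prob
        for j in nonterms:
            for p in range(1, n + 1):
                for q in range(p, n + 1):
                    den[j] += alpha[(j, p, q)] * beta[(j, p, q)] / sent_prob
    new_bin = {r: c / den[r[0]] for r, c in num_bin.items() if den[r[0]] > 0}
    new_lex = {r: c / den[r[0]] for r, c in num_lex.items() if den[r[0]] > 0}
    return new_bin, new_lex

corpus = ["astronomers saw stars with ears".split()]
print(reestimate(corpus, binary, lexical))     # one re-estimation step
```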

SLIDE 34: Discussion
  • The Inside-Outside algorithm is quite slow: $O(m^3 n^3)$ for each sentence, where
    • m is the length of the sentence
    • n is the number of nonterminals
  • The algorithm is very sensitive to the initialization of the parameters.
  • In practice, a PCFG is a worse language model for English than an n-gram model (n > 1).

SLIDE 35: Outline
  • Introduction
  • Probabilistic Context Free Grammars
    • Parsing
    • Context Free Grammars
    • Probabilistic Context Free Grammars
    • Inside-Outside Algorithm
  • Extension
    • Distance
    • Complement/adjunct distinction
    • Traces and Wh-movement

SLIDE 36: More features
  • Add lexical features: words and POS tags inside the nonterminals.
  • Use the head child of the phrase, which inherits the head word from its parent.

  TOP        -> S(bought)
  S(bought)  -> NP(week) NP(Marks) VP(bought)
  NP(week)   -> JJ(Last) NN(week)
  NP(Marks)  -> NNP(Marks)
  VP(bought) -> VB(bought) NP(Brooks)
  NP(Brooks) -> NNP(Brooks)

SLIDE 37: Example

  (TOP (S(bought) (NP(week) (JJ Last) (NN week))
                  (NP(Marks) (NNP Marks))
                  (VP(bought) (VB bought)
                              (NP(Brooks) (NNP Brooks)))))

SLIDE 38: Using Distance
  • Decompose the generation of each rule into probabilities for:
    • The head constituent label of the phrase: $P_H(H \mid P, h)$
    • The modifiers to the right of the head: $\prod_{i=1}^{m+1} P_R(R_i(r_i) \mid P, H, h, R_1(r_1) \ldots R_{i-1}(r_{i-1}))$
    • The modifiers to the left of the head: $\prod_{i=1}^{n+1} P_L(L_i(l_i) \mid P, H, h, L_1(l_1) \ldots L_{i-1}(l_{i-1}))$

SLIDE 39: Using Distance

  For example:

  $P(\text{S(bought)} \to \text{NP(week) NP(Marks) VP(bought)})$
  $= P_h(\text{VP} \mid \text{S, bought}) \cdot P_l(\text{NP(Marks)} \mid \text{S, VP, bought}) \cdot P_l(\text{NP(week)} \mid \text{S, VP, bought, NP(Marks)}) \cdots$

  With distance:

  $= P_h(\text{VP} \mid \text{S, bought}) \cdot P_l(\text{NP(Marks)} \mid \text{S, VP, bought}) \cdot P_l(\text{NP(week)} \mid \text{S, VP, bought}) \cdots$

SLIDE 40: Outline
  • Introduction
  • Probabilistic Context Free Grammars
    • Parsing
    • Context Free Grammars
    • Probabilistic Context Free Grammars
    • Inside-Outside Algorithm
  • Extension
    • Distance
    • Complement/adjunct distinction
    • Traces and Wh-movement

SLIDE 41: Adding the complement/adjunct distinction

  (TOP (S(bought) (NP(week) (JJ Last) (NN week))
                  (NP-C(Marks) (NNP Marks))
                  (VP(bought) (VB bought)
                              (NP-C(Brooks) (NNP Brooks)))))

  It would be useful to identify “Marks” as a subject and “Last week” as an adjunct!

SLIDE 42: Adding the complement/adjunct distinction
  • Add a “-C” suffix to all nonterminals in the training data that satisfy:
    • The nonterminal must be: an NP, SBAR, or S whose parent is an S; an NP, SBAR, S, or VP whose parent is a VP; or an S whose parent is an SBAR.
    • The nonterminal must not have one of the following semantic tags: ADV, VOC, BNF, DIR, EXT, LOC, MNR, TMP, CLR, or PRP.

SLIDE 43: Adding the complement/adjunct distinction
  • The model now uses the probabilities:
    • Head constituent label of the phrase: $P_H(H \mid P, h)$
    • Left and right subcat frames: $P_{lc}(LC \mid P, H, h)$ and $P_{rc}(RC \mid P, H, h)$
    • Modifiers to the right of the head: $P_R(R_i(r_i) \mid P, H, h, \text{distance}_R(i-1), RC)$
    • Modifiers to the left of the head: $P_L(L_i(l_i) \mid P, H, h, \text{distance}_L(i-1), LC)$

SLIDE 44: Adding the complement/adjunct distinction
  • For example:

  $P(\text{S(bought)} \to \text{NP(week) NP-C(Marks) VP(bought)})$
  $= P_h(\text{VP} \mid \text{S, bought}) \cdot P_{lc}(\{\text{NP-C}\} \mid \text{S, VP, bought}) \cdot P_l(\text{NP-C(Marks)} \mid \text{S, VP, bought}, \{\text{NP-C}\}) \cdot P_l(\text{NP(week)} \mid \text{S, VP, bought}, \{\})$

SLIDE 45: Outline
  • Introduction
  • Probabilistic Context Free Grammars
    • Parsing
    • Context Free Grammars
    • Probabilistic Context Free Grammars
    • Inside-Outside Algorithm
  • Extension
    • Distance
    • Complement/adjunct distinction
    • Traces and Wh-movement

SLIDE 46: Traces and Wh-movement
  • Add a “gap” feature to each nonterminal in the tree and propagate gaps through the tree until they are finally discharged as a trace complement.
  • For example:

    (1) NP         -> NP SBAR(+gap)
    (2) SBAR(+gap) -> WHNP S-C(+gap)
    (3) S(+gap)    -> NP-C VP(+gap)
    (4) VP(+gap)   -> VB TRACE NP

SLIDE 47: Traces and Wh-movement

  (NP(store) (NP(store) The store)
             (SBAR(that)(+gap) (WHNP(that) (WDT that))
                               (S(bought)(+gap) (NP-C(Marks) Marks)
                                                (VP(bought)(+gap) (VBD bought)
                                                                  TRACE
                                                                  (NP(week) last week)))))

  $P(\text{VP(bought)(+gap)} \to \text{VB(bought) TRACE NP(week)})$
  $= P_h(\text{VB} \mid \text{VP, bought}) \cdot P_G(\text{Right} \mid \text{VP, bought, VB}) \cdot P_{RC}(\{\text{NP-C}\} \mid \text{VP, bought, VB}) \cdot P_R(\text{TRACE} \mid \text{VP, bought, VB}, \{\text{NP-C, gap}\}) \cdot P_R(\text{NP(week)} \mid \text{VP, bought, VB})$

SLIDE 48: Experiment
  • Models are trained on sections 02-21 of the Wall Street Journal portion of the Penn Treebank and tested on section 23.
  • Labelled Precision = number of correct constituents in the proposed parse / number of constituents in the proposed parse
  • Labelled Recall = number of correct constituents in the proposed parse / number of constituents in the treebank parse
  • Crossing Brackets = number of constituents which violate constituent boundaries with a constituent in the treebank parse
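
For concreteness, the precision and recall metrics reduce to set operations over labelled spans. A small sketch with hypothetical constituents encoded as (label, start, end); the names and example spans are illustrative:

```python
def labelled_precision_recall(proposed, gold):
    proposed, gold = set(proposed), set(gold)
    correct = len(proposed & gold)             # correct constituents
    return correct / len(proposed), correct / len(gold)

proposed = {("S", 0, 5), ("NP", 0, 1), ("VP", 1, 5), ("NP", 2, 5), ("PP", 3, 5)}
gold     = {("S", 0, 5), ("NP", 0, 1), ("VP", 1, 5), ("VP", 1, 3), ("PP", 3, 5)}
print(labelled_precision_recall(proposed, gold))  # (0.8, 0.8): 4 of 5 spans match
```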

SLIDE 49: References
  1. Manning and Schütze, Foundations of Statistical Natural Language Processing.
  2. Stanford parser: http://nlp.stanford.edu:8080/parser/
  3. Collins, Three Generative, Lexicalised Models for Statistical Parsing, ACL 1997.

SLIDE 50

Thanks!