Probabilistic Parsing: Issues & Improvement (LING 571, Deep Processing Techniques for NLP) - PowerPoint PPT Presentation



slide-1
SLIDE 1

Probabilistic Parsing: 
 Issues & Improvement

LING 571 — Deep Processing Techniques for NLP October 14, 2019 Shane Steinert-Threlkeld

1

slide-2
SLIDE 2

Announcements

  • HW2 grades posted (mean 87)
  • Reference code available in
  • /dropbox/19-20/571/hw2/reference_code
  • NB: not needed for HW3; you can assume that all grammars are already in CNF

2

slide-3
SLIDE 3

Homework Tips

  • Use nltk.load for reading grammars; it will save you and the TAs time and headaches!
  • Run your code on patas to produce the output you submit in the TAR file
  • Some discrepancies have been found that seem due to different environments
  • readme.{txt|pdf}: this should NOT be inside your TAR file, but a separate upload on Canvas

3

slide-4
SLIDE 4

Notes on HW #3

  • Python’s range has many use cases via its start, end, and step arguments
  • range(n) is equivalent to range(0, n, 1)
  • Reminder: the rhs= argument in NLTK’s grammar.productions() method only matches the first symbol of the RHS, not the entire string
  • You’ll want to implement an efficient look-up based on the full RHS
  • HW3: compare your output to running your HW1 parser on the same grammar/sentences (the order of output for ambiguous sentences could differ)

4
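Since rhs= matching is per-symbol, it pays to index productions by their entire RHS once, up front. A minimal sketch with plain tuples standing in for NLTK production objects (the names productions and by_rhs are illustrative):

```python
from collections import defaultdict

# Toy stand-in for grammar productions: (lhs, rhs_tuple) pairs.
productions = [
    ("S", ("NP", "VP")),
    ("NP", ("DT", "NN")),
    ("NP", ("NP", "PP")),
    ("VP", ("VBD", "NP")),
]

# Index productions by their *entire* RHS so CKY can look up
# all rules A -> B C for a candidate pair (B, C) in O(1).
by_rhs = defaultdict(list)
for lhs, rhs in productions:
    by_rhs[rhs].append(lhs)

print(by_rhs[("NP", "PP")])  # all LHS symbols rewriting to NP PP
```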

slide-5
SLIDE 5

Indigenous Peoples’ Day

  • Seattle/Sealth
  • For those of you taking 550:
  • The Lushootseed spelling [IPA] of Chief Seattle/Sealth:
  • siʔaɫ [ˈsiʔaːɬ]
  • Duwamish — Dxʷdəwʔabš [dxʷdɐwʔabʃ]
  • IPA resources:
  • https://en.wikipedia.org/wiki/International_Phonetic_Alphabet
  • http://web.mit.edu/6.mitx/www/24.900%20IPA/IPAapp.html

5

slide-6
SLIDE 6

Indigenous Peoples’ Day

  • Studying non-English languages gives more holistic insight for NLP tasks
  • Many interesting phenomena in non-Indo-European languages
  • Lushootseed exhibits a debatable distinction between verbs and nouns (link to Glottolog page for more references)

  • ʔux̌ʷ ti sbiaw


goes that-which is-a-coyote
 “The/a coyote goes”

  • sbiaw ti ʔux̌ʷ


is-a-coyote that-which goes
 “The one who goes is a coyote”

  • (Translation distinction provided for clarity — semantically equivalent)
  • Lillooet Salish quantification has repercussions for e.g. English (Matthewson 2001)

6

via Beck, 2013

slide-7
SLIDE 7

Indigenous Peoples’ Day

  • UW American Indian Studies Courses
  • (Sometimes including language courses, e.g. Southern Lushootseed)
  • At the new Burke Museum on campus:
  • https://www.burkemuseum.org/calendar/indigenous-peoples-day

7

slide-8
SLIDE 8

PCFG Induction

8

slide-9
SLIDE 9

Learning Probabilities

  • Simplest way:
  • Use treebank of parsed sentences
  • To compute the probability of a rule, count:
  • the number of times the nonterminal α is expanded at all: Σ_γ Count(α → γ)
  • the number of times α is expanded by the given rule: Count(α → β)
  • Alternative: Learn probabilities by re-estimating
  • (Later)

9

P(α → β | α) = Count(α → β) / Σ_γ Count(α → γ) = Count(α → β) / Count(α)
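The estimate above is plain relative-frequency normalization. A minimal sketch, with toy rule counts matching the induction example on the next slides:

```python
from collections import Counter, defaultdict

# Rule counts as read off a treebank: Count(alpha -> beta).
rule_counts = Counter({
    ("NP", ("NNP", "NNP")): 2,
    ("NP", ("NP", "PP")): 1,
    ("NP", ("NP", ",", "NP")): 1,
    ("NP", ("DT", "NNP", "VBG", "NN")): 1,
})

# Count(alpha) = total expansions of each LHS nonterminal.
lhs_totals = defaultdict(int)
for (lhs, rhs), c in rule_counts.items():
    lhs_totals[lhs] += c

# P(alpha -> beta | alpha) = Count(alpha -> beta) / Count(alpha)
probs = {(lhs, rhs): c / lhs_totals[lhs]
         for (lhs, rhs), c in rule_counts.items()}
print(probs[("NP", ("NNP", "NNP"))])  # 0.4
```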

slide-10
SLIDE 10

Inducing a PCFG

10

(S (NP (NNP Mr.) (NNP Vinken))
   (VP (VBZ is)
       (NP (NP (NN chairman))
           (PP (IN of)
               (NP (NP (NNP Elsevier) (NNP N.V.))
                   (, ,)
                   (NP (DT the) (NNP Dutch) (VBG publishing) (NN group))))))
   (. .))

slide-11
SLIDE 11

Inducing a PCFG

11

(Tree as above.) Rule counts so far:

  S → NP VP .    1        S → *    1

slide-12
SLIDE 12

Inducing a PCFG

12

(Tree as above.) Rule counts so far:

  S → NP VP .     1        S → *    1
  NP → NNP NNP    1        NP → *   1

slide-13
SLIDE 13

Inducing a PCFG

13

(Tree as above.) Rule counts so far:

  S → NP VP .     1        S → *    1
  NP → NNP NNP    1        NP → *   1
  VP → VBZ NP     1        VP → *   1

slide-14
SLIDE 14

Inducing a PCFG

14

(Tree as above.) Rule counts so far:

  S → NP VP .     1        S → *    1
  NP → NNP NNP    1        NP → *   2
  VP → VBZ NP     1        VP → *   1
  NP → NP PP      1

slide-15
SLIDE 15

Inducing a PCFG

15

(Tree as above.) Rule counts so far:

  S → NP VP .     1        S → *    1
  NP → NNP NNP    1        NP → *   2
  VP → VBZ NP     1        VP → *   1
  NP → NP PP      1        PP → *   1
  PP → IN NP      1

slide-16
SLIDE 16

Inducing a PCFG

16

(Tree as above.) Rule counts so far:

  S → NP VP .     1        S → *    1
  NP → NNP NNP    1        NP → *   3
  VP → VBZ NP     1        VP → *   1
  NP → NP PP      1        PP → *   1
  PP → IN NP      1
  NP → NP , NP    1

slide-17
SLIDE 17

Inducing a PCFG

17

(Tree as above.) Rule counts so far:

  S → NP VP .     1        S → *    1
  NP → NNP NNP    2        NP → *   4
  VP → VBZ NP     1        VP → *   1
  NP → NP PP      1        PP → *   1
  PP → IN NP      1
  NP → NP , NP    1

slide-18
SLIDE 18

Inducing a PCFG

18

(Tree as above.) Rule counts so far:

  S → NP VP .           1        S → *    1
  NP → NNP NNP          2        NP → *   5
  VP → VBZ NP           1        VP → *   1
  NP → NP PP            1        PP → *   1
  PP → IN NP            1
  NP → NP , NP          1
  NP → DT NNP VBG NN    1


slide-20
SLIDE 20

Inducing a PCFG

20

(Tree as above.) Dividing each NP count by the NP total (5):

  S → NP VP .           1          S → *    1
  NP → NNP NNP          2/5        NP → *   5
  VP → VBZ NP           1          VP → *   1
  NP → NP PP            1/5        PP → *   1
  PP → IN NP            1
  NP → NP , NP          1/5
  NP → DT NNP VBG NN    1/5

slide-21
SLIDE 21

Inducing a PCFG

21

(Tree as above.) As probabilities:

  S → NP VP .           1          S → *    1
  NP → NNP NNP          0.4        NP → *   5
  VP → VBZ NP           1          VP → *   1
  NP → NP PP            0.2        PP → *   1
  PP → IN NP            1
  NP → NP , NP          0.2
  NP → DT NNP VBG NN    0.2
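The counting procedure illustrated above can be sketched over a tuple-encoded treebank (the tuple encoding and the helper name count_rules are assumptions for illustration):

```python
from collections import Counter

# Hypothetical tuple encoding of a parse tree: (label, child, child, ...),
# where a preterminal's single child is the word string.
tree = ("S",
        ("NP", ("NNP", "Mr."), ("NNP", "Vinken")),
        ("VP", ("VBZ", "is"), ("NP", ("NN", "chairman"))))

def count_rules(tree, counts):
    """Count one CFG rule per internal node: label -> tuple of child labels."""
    label, *children = tree
    if all(isinstance(c, str) for c in children):
        return  # preterminal: lexical rules not counted here
    counts[(label, tuple(c[0] for c in children))] += 1
    for c in children:
        count_rules(c, counts)

counts = Counter()
count_rules(tree, counts)
print(counts[("NP", ("NNP", "NNP"))])  # 1
```

Running this over every tree in a treebank and then normalizing per LHS yields the rule probabilities shown above.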

slide-22
SLIDE 22

Problems with PCFGs

22

slide-23
SLIDE 23

Problems with PCFGs

  • Independence Assumption
  • Assume that rule probabilities are independent
  • Lack of Lexical Conditioning
  • Lexical items should influence the choice of analysis

23

slide-24
SLIDE 24

Issues with PCFGs: Independence Assumption

  • Context Free ⇒ Independence Assumption
  • Rule expansion is context-independent
  • Allows us to multiply probabilities
  • If we have two rules:
  • NP → DT NN [0.28]
  • NP → PRP [0.25]
  • What does this new data tell us?
  • NP → DT NN [0.09 if NPΘ=subject else 0.66]
  • NP → PRP [0.91 if NPΘ=subject else 0.34]

24

Semantic role of NPs in the Switchboard corpus:

            Pronominal   Non-Pronominal
  Subject   91%          9%
  Object    34%          66%

…Can try parent annotation

slide-25
SLIDE 25

Issues with PCFGs: Lexical Conditioning

25

Two parses of “workers dumped sacks into a bin”:

  (S (NP (NNS workers))
     (VP (VBD dumped) (NP (NNS sacks))
         (PP (P into) (NP (DT a) (NN bin)))))
  (“into a bin” = location of sacks after dumping) OK!

  *(S (NP (NNS workers))
      (VP (VBD dumped)
          (NP (NP (NNS sacks))
              (PP (P into) (NP (DT a) (NN bin))))))
  (“into a bin” = *the sacks which were located in the PP) not OK

slide-26
SLIDE 26

Issues with PCFGs: Lexical Conditioning

26

  *(S (NP (NNS workers))
      (VP (VBD dumped)
          (NP (NP (NNS sacks))
              (PP (P into) (NP (DT a) (NN bin))))))
  (“into a bin” = *the sacks which were located in the PP) not OK

  (S (NP (NNS workers))
     (VP (VBD dumped)
         (NP (NP (NNS sacks))
             (PP (P in) (NP (DT a) (NN bin))))))
  (“in a bin” = location of sacks before dumping) OK!

slide-27
SLIDE 27

Issues with PCFGs: Lexical Conditioning

  • workers dumped sacks into a bin
  • into should prefer modifying dumped
  • into should disprefer modifying sacks
  • fishermen caught tons of herring
  • of should prefer modifying tons
  • of should disprefer modifying caught

27

slide-28
SLIDE 28

Issues with PCFGs: Coordination Ambiguity

28

Two parses of “dogs in houses and cats”:

  (NP (NP (NP (Noun dogs)) (PP (Prep in) (NP (Noun houses))))
      (Conj and)
      (NP (Noun cats)))

  (NP (NP (Noun dogs))
      (PP (Prep in)
          (NP (NP (Noun houses)) (Conj and) (NP (Noun cats)))))

slide-29
SLIDE 29

Issues with PCFGs: Coordination Ambiguity

29

The two parses of “dogs in houses and cats” use exactly the same rules:

  Left parse:  NP → NP Conj NP, NP → NP PP, PP → Prep NP, NP → Noun (×3),
               Noun → “dogs” | “houses” | “cats”, Prep → “in”, Conj → “and”
  Right parse: NP → NP PP, PP → Prep NP, NP → NP Conj NP, NP → Noun (×3),
               Noun → “dogs” | “houses” | “cats”, Prep → “in”, Conj → “and”

Same Rules! (So a PCFG assigns both parses the same probability.)


slide-32
SLIDE 32

Improving PCFGs

32

slide-33
SLIDE 33

Improving PCFGs

  • Parent Annotation
  • Lexicalization
  • Markovization
  • Reranking

33

slide-34
SLIDE 34

Improving PCFGs: Parent Annotation

  • To handle the NP → PRP [0.91 if NPΘ=subject else 0.34]

34

(S (NP (Pron I))
   (VP (V prefer)
       (NP (Det a)
           (Nom (Nom (NN flight))
                (PP (IN on) (NP (NNP TWA)))))))

Annotate each node with its parent

slide-35
SLIDE 35
  • To handle the NP → PRP [0.91 if NPΘ=subject else 0.34]

Improving PCFGs: Parent Annotation

35

(S (NP^S (Pron^NP I))
   (VP^S (V^VP prefer)
       (NP^VP (Det^NP a)
           (Nom^NP (Nom^Nom (NN^Nom flight))
                (PP^Nom (IN^PP on) (NP^PP (NNP^NP TWA)))))))

Annotate each node with its parent


slide-37
SLIDE 37
Improving PCFGs: Parent Annotation

  • Advantages:
  • Captures structural dependencies in grammar
  • Disadvantages:
  • Explodes the number of rules in the grammar
  • Same problem with subcategorization
  • Results in sparsity problems
  • Strategies exist to find an optimal number of splits
  • Petrov et al. (2006)

37
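Parent annotation itself is a simple tree transform. A sketch over a tuple-encoded tree (the tuple encoding and the ^ separator are assumptions for illustration; NLTK-style Tree objects would work the same way):

```python
def annotate_parents(tree, parent=None):
    """Return a copy of the tree with each node relabeled label^parent."""
    label, *children = tree
    new_label = f"{label}^{parent}" if parent else label
    new_children = [c if isinstance(c, str) else annotate_parents(c, label)
                    for c in children]  # words (strings) stay untouched
    return (new_label, *new_children)

t = ("S", ("NP", ("Pron", "I")), ("VP", ("V", "prefer")))
print(annotate_parents(t))
# ('S', ('NP^S', ('Pron^NP', 'I')), ('VP^S', ('V^VP', 'prefer')))
```

Reading rules off the annotated trees yields the split categories (NP^S vs. NP^VP) that capture the subject/object asymmetry, at the cost of a much larger grammar.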

slide-38
SLIDE 38

Improving PCFGs

  • Parent Annotation
  • Lexicalization
  • Markovization
  • Reranking

38

slide-39
SLIDE 39

Improving PCFGs: Lexical “Heads”

  • Remember back to syntax intro (Lecture #1)
  • Phrases are “headed” by key words
  • VP are headed by V
  • NP by NN, NNS, PRON
  • PP by PREP
  • We can take advantage of this in our grammar!

39

slide-40
SLIDE 40

Improving PCFGs: Lexical Dependencies

  • As we’ve seen, some rules should be conditioned on certain words
  • Proposal: annotate nonterminals with lexical head

VP → VBD NP PP VP(dumped) → VBD(dumped) NP(sacks) PP(into)

  • Additionally: annotate with lexical head + POS

VP(dumped, VBD) → VBD(dumped, VBD) NP(sacks, NNS) PP(into, IN)

40

slide-41
SLIDE 41

Lexicalized Parse Tree

41

Internal rules:
  TOP → S(prefer, V)
  S(prefer, V) → NP(I, Pron) VP(prefer, V)
  NP(I, Pron) → Pron(I, Pron)
  VP(prefer, V) → V(prefer, V) NP(flight, NN)
  NP(flight, NN) → Det(a, Det) Nom(flight, NN)
  Nom(flight, NN) → Nom(flight, NN) PP(on, IN)
  PP(on, IN) → IN(on, IN) NP(TWA, NNP)

Lexical rules:
  Pron(I, Pron) → I
  V(prefer, V) → prefer
  Det(a, Det) → a
  NN(flight, NN) → flight
  IN(on, IN) → on
  NNP(TWA, NNP) → TWA

Tree:
  (TOP (S[prefer,V]
        (NP[I,Pron] (Pron[I,Pron] I))
        (VP[prefer,V] (V[prefer,V] prefer)
          (NP[flight,NN] (Det[a,Det] a)
            (Nom[flight,NN] (Nom[flight,NN] (NN[flight,NN] flight))
              (PP[on,IN] (IN[on,IN] on)
                (NP[TWA,NNP] (NNP[TWA,NNP] TWA))))))))


slide-45
SLIDE 45

Improving PCFGs: Lexical Dependencies

45

(S[dumped,VBD]
   (NP[workers,NNS] (NNS[workers,NNS] workers))
   (VP[dumped,VBD] (VBD[dumped,VBD] dumped)
      (NP[sacks,NNS] (NNS[sacks,NNS] sacks))
      (PP[into,P] (P[into,P] into)
         (NP[bin,NN] (DT[a,DT] a) (NN[bin,NN] bin)))))

  • Upshot: heads propagate up the tree:
  • VP → VBD(dumped, VBD) NP(sacks, NNS) PP(into, P)  ✔
  • NP → NNS(sacks, NNS) PP(into, P)  ✘
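Head propagation can be sketched with a toy head-finding table (the table and tuple encoding are simplifications; real lexicalized parsers use the Magerman/Collins head rules):

```python
# Simplified head table: which child categories can supply a phrase's head.
HEAD_CHILD = {"S": ("VP",), "VP": ("VBD",), "NP": ("NNS", "NN"), "PP": ("P",)}

def lexicalize(tree):
    """Return (label, head_word, children): heads percolate up the tree."""
    label, *children = tree
    if len(children) == 1 and isinstance(children[0], str):
        return (label, children[0], [])  # preterminal: the word is its head
    lex = [lexicalize(c) for c in children]
    head = next((h for (lab, h, _) in lex if lab in HEAD_CHILD.get(label, ())),
                lex[0][1])  # fall back to the first child's head
    return (label, head, lex)

t = ("VP", ("VBD", "dumped"), ("NP", ("NNS", "sacks")),
     ("PP", ("P", "into"), ("NP", ("DT", "a"), ("NN", "bin"))))
print(lexicalize(t)[1])  # 'dumped'
```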

slide-46
SLIDE 46

Improving PCFGs: Lexical Dependencies

  • Downside:
  • Rules far too specialized — will be sparse
  • Solution:
  • Assume conditional independence
  • Create more rules

46

slide-47
SLIDE 47

Improving PCFGs: Collins Parser

  • Proposal:
  • LHS → LeftOfHead … Head … RightOfHead
  • Instead of calculating P(EntireRule), which is sparse:
  • Calculate:
  • Probability that LHS has nonterminal phrase H given head-word hw…
  • × Probability of modifiers to the left given head-word hw…
  • × Probability of modifiers to the right given head-word hw…

47
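Under this decomposition, a rule's probability becomes the product of one head choice and a sequence of left/right modifier choices, each conditioned on the head word. A numeric sketch with invented probability tables (all numbers are illustrative, not treebank estimates):

```python
# Toy conditional probabilities (illustrative numbers only).
p_head  = {("VBD", "VP", "dumped"): 1.0}          # P(head child | LHS, hw)
p_left  = {("STOP", "VP", "VBD", "dumped"): 1.0}  # P(left modifier | ...)
p_right = {("NP(sacks)", "VP", "VBD", "dumped"): 0.67,
           ("PP(into)",  "VP", "VBD", "dumped"): 0.22,
           ("STOP",      "VP", "VBD", "dumped"): 1.0}

# P(VP -> VBD NP(sacks) PP(into) | VP, dumped) decomposes as:
prob = (p_head[("VBD", "VP", "dumped")]
        * p_left[("STOP", "VP", "VBD", "dumped")]
        * p_right[("NP(sacks)", "VP", "VBD", "dumped")]
        * p_right[("PP(into)", "VP", "VBD", "dumped")]
        * p_right[("STOP", "VP", "VBD", "dumped")])
print(round(prob, 4))  # 0.1474
```

Each factor is estimated from far more frequent events than the whole rule, which is what fights the sparsity of fully lexicalized rules.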

slide-48
SLIDE 48

Collins Parser Example

48

Two candidate parses of “workers dumped sacks into a bin”: the PP attached to the VP (dumped … into a bin), and the PP attached to the NP (*sacks into a bin).

slide-49
SLIDE 49

Collins Parser Example

49

P(VP → VBD NP | VP, dumped)
  = Count(VP(dumped) → VBD NP) / Σ_β Count(VP(dumped) → β)
  = 1/9 = 0.11

P(VP → VBD NP PP | VP, dumped)
  = Count(VP(dumped) → VBD NP PP) / Σ_β Count(VP(dumped) → β)
  = 6/9 = 0.67

P_R(into | PP, sacks)
  = Count(X(sacks) → … PP(into) …) / Σ Count(X(sacks) → … PP …)
  = 0

P_R(into | PP, dumped)
  = Count(X(dumped) → … PP(into) …) / Σ Count(X(dumped) → … PP …)
  = 2/9 = 0.22

slide-50
SLIDE 50

Improving PCFGs

  • Parent Annotation
  • Lexicalization
  • Markovization
  • Reranking

50

slide-51
SLIDE 51

CNF Factorization & Markovization

  • CNF Factorization:
  • Converts n-ary branching to binary branching
  • Can maintain information about original structure
  • Neighborhood history and parent

51

slide-52
SLIDE 52

Different Markov Orders

52

Original (n-ary rule):
  NP → DT JJ NN NNS

Markov order 3:
  NP → DT NP:JJ+NN+NNS
  NP:JJ+NN+NNS → JJ NP:NN+NNS
  NP:NN+NNS → NN NNS

Markov order 2:
  NP → DT NP:JJ+NN
  NP:JJ+NN → JJ NP:NN+NNS
  NP:NN+NNS → NN NNS

Markov order 1:
  NP → DT NP:JJ
  NP:JJ → JJ NP:NN
  NP:NN → NN NNS

Markov order 0:
  NP → DT NP
  NP → JJ NP
  NP → NN NNS
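The order-k right-factorization above can be sketched as a small function (binarize is a hypothetical helper; order controls how many of the remaining siblings the new labels remember):

```python
def binarize(lhs, rhs, order):
    """Right-factor an n-ary rule into binary rules, keeping `order`
    symbols of lookahead in new nonterminal names (order 0 reuses lhs)."""
    rules = []
    parent = lhs
    rest = list(rhs)
    while len(rest) > 2:
        first, rest = rest[0], rest[1:]
        # New label records the next `order` children still to be generated.
        new = lhs + ":" + "+".join(rest[:order]) if order else lhs
        rules.append((parent, (first, new)))
        parent = new
    rules.append((parent, tuple(rest)))
    return rules

for r in binarize("NP", ("DT", "JJ", "NN", "NNS"), order=2):
    print(r)
```

With order=2 this reproduces the Markov order-2 factorization shown above; order=0 collapses all intermediate labels back to NP.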

slide-53
SLIDE 53

Markovization and Costs

53

  PCFG                                              Time(s)  Words/s  |V|    |P|    LR    LP    F1
  Right-factored                                    4848     6.7      10105  23220  69.2  73.8  71.5
  Right-factored, Markov order-2                    1302     24.9     2492   11659  68.8  73.8  71.3
  Right-factored, Markov order-1                    445      72.7     564    6354   68.0  73.0  70.5
  Right-factored, Markov order-0                    206      157.1    99     3803   61.2  65.5  63.3
  Parent-annotated, Right-factored, Markov order-2  7510     4.3      5876   22444  76.2  78.3  77.2

  (from Mohri & Roark 2006)

slide-54
SLIDE 54

Improving PCFGs

  • Parent Annotation
  • Lexicalization
  • Markovization
  • Reranking

54

slide-55
SLIDE 55

Reranking

  • Issue: Locality
  • PCFG probabilities associated with rewrite rules
  • Context-free grammars are, well, context-free
  • Previous approaches create new rules to incorporate context
  • Need approach that incorporates broader, global info

55

slide-56
SLIDE 56

Discriminative Parse Reranking

  • General approach:
  • Parse using (L)PCFG
  • Obtain top-N parses
  • Re-rank top-N using better features
  • Use discriminative model (e.g. MaxEnt) to rerank with features:
  • right-branching vs. left-branching
  • speaker identity
  • conjunctive parallelism
  • fragment frequency

56
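The reranking step reduces to scoring each N-best candidate with a weighted feature sum and taking the argmax. A sketch with invented features and weights (a trained MaxEnt model would supply the weights):

```python
# N-best list from the base parser: log probability plus global features.
# Feature values and weights are invented for illustration.
nbest = [
    {"parse": "A", "logp": -10.2, "right_branch": 3, "parallel": 0},
    {"parse": "B", "logp": -10.5, "right_branch": 5, "parallel": 1},
]
weights = {"logp": 1.0, "right_branch": 0.1, "parallel": 0.8}

def score(cand):
    """Linear rescoring over base probability + global features."""
    return sum(weights[f] * v for f, v in cand.items() if f != "parse")

best = max(nbest, key=score)
print(best["parse"])  # 'B': global features outweigh the base probability
```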

slide-57
SLIDE 57

Reranking Effectiveness

  • How can reranking improve?
  • Results from Collins and Koo (2005), with 50-best
  • “Oracle” is to automatically choose the correct parse if in N-best

57

  System           Accuracy
  Baseline         0.897
  Oracle           0.968
  Discriminative   0.917

slide-58
SLIDE 58

Improving PCFGs:
 Tradeoffs

  • Pros:
  • Increased accuracy/specificity
  • e.g. Lexicalization, Parent annotation, Markovization, etc.

  • Cons:
  • Explode grammar size
  • Increased processing time
  • Increased data requirements
  • How can we balance?

58

slide-59
SLIDE 59

Improving PCFGs: Efficiency

  • Beam thresholding
  • Heuristic Filtering

59

slide-60
SLIDE 60

Efficiency

  • PCKY is O(|G| · n³)
  • Grammar can be huge
  • Grammar can be extremely ambiguous
  • Hundreds of analyses not unusual
  • …but only care about best parses
  • Can we use this to improve efficiency?

60

slide-61
SLIDE 61

Beam Thresholding

  • Inspired by Beam Search
  • Assume low probability parses unlikely to yield high probability overall
  • Keep only top k most probable partial parses
  • Retain only k choices per cell
  • For large grammars, maybe 50-100
  • For small grammars, 5 or 10

61
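Per-cell pruning amounts to keeping the k best entries of each cell's nonterminal-to-probability map (prune_cell is a hypothetical helper for illustration):

```python
import heapq

def prune_cell(cell, k):
    """Keep only the k highest-probability entries in a CKY chart cell.
    `cell` maps nonterminal -> best probability found so far."""
    return dict(heapq.nlargest(k, cell.items(), key=lambda kv: kv[1]))

cell = {"NP": 0.02, "S": 0.001, "VP": 0.0005, "X": 1e-8}
print(prune_cell(cell, 2))  # only NP and S survive
```

In log space the same idea often appears as a beam width: drop entries more than some margin below the cell's best score.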

slide-62
SLIDE 62

Heuristic Filtering

  • Intuition: Some rules/partial parses unlikely to create best parse
  • Proposal: Don’t store these in table.
  • Exclude:
  • Low frequency: singletons
  • Low probability: constituents X s.t. P(X) < 10⁻²⁰⁰
  • Low relative probability:
  • Exclude X if there exists Y s.t. P(Y) > 100 × P(X)

62
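The three filters can be sketched as a single predicate applied before storing a constituent in the chart (keep is a hypothetical helper; thresholds as on the slide):

```python
def keep(constituent_prob, best_in_cell, rule_count):
    """Apply the three filters: frequency, absolute and relative probability."""
    if rule_count <= 1:                        # singleton rule
        return False
    if constituent_prob < 1e-200:              # absolute probability threshold
        return False
    if best_in_cell > 100 * constituent_prob:  # relative threshold vs. best Y
        return False
    return True

print(keep(1e-5, 2e-4, 7))   # True
print(keep(1e-5, 9e-3, 7))   # False: >100x less probable than the best
```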