
Slide 1

Natural Language Processing

Parsing III

Dan Klein – UC Berkeley

Slide 2

Unsupervised Tagging

Slide 3

Unsupervised Tagging?

  • AKA part-of-speech induction
  • Task:
      • Raw sentences in
      • Tagged sentences out
  • Obvious thing to do:
      • Start with a (mostly) uniform HMM
      • Run EM
      • Inspect results
Slide 4

EM for HMMs: Process

  • Alternate between recomputing distributions over the hidden variables (the tags) and re-estimating the parameters
  • Crucial step: we want to tally up how many (fractional) counts of each kind of transition and emission we have under the current parameters:

        count(t -> t') = sum_i P(s_i = t, s_{i+1} = t' | w_1..w_n)
        count(t, w)    = sum over i with w_i = w of P(s_i = t | w_1..w_n)

  • These are the same quantities we needed to train a CRF! (sketched below)
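As a concrete illustration, here is a minimal NumPy sketch of that E-step tally (the function name and interfaces are my own; a real implementation would work in log space or rescale alpha and beta to avoid underflow on long sentences):

    import numpy as np

    def expected_counts(obs, pi, A, B):
        """Forward-backward pass for one sentence.

        obs: list of word ids; pi[t]: initial tag probs;
        A[t, t2]: transition probs; B[t, w]: emission probs.
        Returns the expected (fractional) transition and emission
        counts tallied in the E-step."""
        n, T = len(obs), len(pi)
        alpha = np.zeros((n, T))    # alpha[i, t] = P(w_1..i, s_i = t)
        beta = np.zeros((n, T))     # beta[i, t]  = P(w_{i+1}..n | s_i = t)
        alpha[0] = pi * B[:, obs[0]]
        for i in range(1, n):
            alpha[i] = (alpha[i - 1] @ A) * B[:, obs[i]]
        beta[-1] = 1.0
        for i in range(n - 2, -1, -1):
            beta[i] = A @ (B[:, obs[i + 1]] * beta[i + 1])
        Z = alpha[-1].sum()         # sentence likelihood

        trans = np.zeros_like(A)    # E[count(t -> t')]
        emit = np.zeros_like(B)     # E[count(t emits w)]
        for i in range(n - 1):
            trans += np.outer(alpha[i], B[:, obs[i + 1]] * beta[i + 1]) * A / Z
        for i in range(n):
            emit[:, obs[i]] += alpha[i] * beta[i] / Z
        return trans, emit

The M-step then just normalizes these tables row by row to get new transition and emission probabilities.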
Slide 5

Merialdo: Setup

  • Some (discouraging) experiments [Merialdo 94]
  • Setup:
      • You know the set of allowable tags for each word
      • Fix k training examples to their true labels
          • Learn P(w|t) on these examples
          • Learn P(t|t-1,t-2) on these examples
      • On n examples, re-estimate with EM
  • Note: we know the allowed tags but not their frequencies
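A minimal sketch of the supervised initialization step, assuming plain relative-frequency estimates with no smoothing (names are mine); EM would then re-estimate on the n unlabeled sentences with the forward-backward tally above, restricting each word's emissions to its allowed tags:

    from collections import Counter

    def init_from_labeled(tagged_sents):
        """Estimate P(w|t) and the trigram P(t|t-1,t-2) from k
        hand-labeled sentences by relative frequency."""
        emit, trans = Counter(), Counter()
        tag_count, hist_count = Counter(), Counter()
        for sent in tagged_sents:                # sent: list of (word, tag)
            tags = ["<s>", "<s>"] + [t for _, t in sent]
            for w, t in sent:
                emit[(t, w)] += 1
                tag_count[t] += 1
            for t2, t1, t in zip(tags, tags[1:], tags[2:]):
                trans[(t2, t1, t)] += 1          # history (t-2, t-1) -> t
                hist_count[(t2, t1)] += 1
        p_emit = {k: v / tag_count[k[0]] for k, v in emit.items()}
        p_trans = {k: v / hist_count[k[:2]] for k, v in trans.items()}
        return p_emit, p_trans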
Slide 6

Merialdo: Results

Slide 7

Latent Variable PCFGs

Slide 8

The Game of Designing a Grammar

  • Annotation refines base treebank symbols to improve statistical fit of the grammar
  • Parent annotation [Johnson ’98]

Slide 9

The Game of Designing a Grammar

  • Annotation refines base treebank symbols to improve statistical fit of the grammar
  • Parent annotation [Johnson ’98]
  • Head lexicalization [Collins ’99, Charniak ’00]

Slide 10

The Game of Designing a Grammar

  • Annotation refines base treebank symbols to improve statistical fit of the grammar
  • Parent annotation [Johnson ’98]
  • Head lexicalization [Collins ’99, Charniak ’00]
  • Automatic clustering?

Slide 11

Latent Variable Grammars

[Figure: a sentence paired with its parse tree, the grammar parameters, and the latent derivations]

Slide 12

Learning Latent Annotations

EM algorithm:
  • Brackets are known
  • Base categories are known
  • Only induce subcategories

Just like Forward-Backward for HMMs: a forward (inside) and backward (outside) pass, run over the fixed tree instead of a chain.

[Figure: a binary tree of latent symbols X1..X7 over the sentence "He was right ."]
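Because the bracketing is observed, the E-step reduces to an inside pass up each training tree and an outside pass back down. A sketch under assumed interfaces (rules[(A,B,C)] as a k x k x k array of split-rule probabilities and lex[(A,w)] as a length-k emission vector are hypothetical names):

    import numpy as np

    class Node:
        def __init__(self, label, children=(), word=None):
            self.label, self.children, self.word = label, list(children), word

    def inside(node, rules, lex):
        """Inside score I[a] = P(yield | node has subcategory a)."""
        if node.word is not None:
            node.inside = lex[(node.label, node.word)]
            return node.inside
        left, right = node.children
        I_l, I_r = inside(left, rules, lex), inside(right, rules, lex)
        G = rules[(node.label, left.label, right.label)]   # G[a, b, c]
        node.inside = np.einsum('abc,b,c->a', G, I_l, I_r)
        return node.inside

    def outside(node, rules, out_vec):
        """Outside scores; run inside() first. Expected rule counts
        follow as out[a] * G[a,b,c] * I_l[b] * I_r[c] / P(tree)."""
        node.outside = out_vec
        if node.word is not None:
            return
        left, right = node.children
        G = rules[(node.label, left.label, right.label)]
        outside(left, rules, np.einsum('abc,a,c->b', G, out_vec, right.inside))
        outside(right, rules, np.einsum('abc,a,b->c', G, out_vec, left.inside))

Calling inside(root, rules, lex) and then outside(root, rules, root_vec), with root_vec a one-hot vector for an unsplit ROOT, yields all the expected counts the M-step needs.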

Slide 13

Refinement of the DT tag

[Figure: the DT tag refined into subcategories DT-1 through DT-4]

Slide 14

Hierarchical refinement

Slide 15

Hierarchical Estimation Results

[Figure: parsing accuracy (F1, 74-90) vs. total number of grammar symbols (100-1700)]

    Model                   F1
    Flat Training           87.3
    Hierarchical Training   88.4

Slide 16

Refinement of the , tag

  • Splitting all categories equally is wasteful:
Slide 17

Adaptive Splitting

  • Want to split complex categories more
  • Idea: split everything, then roll back the splits which were least useful (scored as in the sketch below)
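Following the split-merge idea, one way to score a split is to approximate, at every node where the split symbol occurs, how much the training likelihood would drop if its two subcategories were merged back; the splits with the smallest loss get rolled back. A hedged sketch (variable names are mine):

    import numpy as np

    def merge_loss(occurrences, p1, p2):
        """Approximate log-likelihood loss of merging subsymbols A-1, A-2.

        occurrences: list of (in1, in2, out1, out2) inside/outside
        scores at each node labeled A in the training trees;
        p1, p2: relative frequencies of A-1 and A-2 (p1 + p2 = 1).
        Splits with the smallest loss are merged back (e.g. 50%)."""
        loss = 0.0
        for in1, in2, out1, out2 in occurrences:
            merged_in = p1 * in1 + p2 * in2      # inside score if merged
            merged_out = out1 + out2             # outside score if merged
            split_prob = in1 * out1 + in2 * out2 # contribution when split
            loss += np.log(split_prob) - np.log(merged_in * merged_out)
        return loss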

Slide 18

Adaptive Splitting Results

    Model              F1
    Previous           88.4
    With 50% Merging   89.5

Slide 19

Number of Phrasal Subcategories

[Figure: number of subcategories (5 to 40) per phrasal category, from NP, VP, PP at the top down to X, ROOT, LST at the bottom]

Slide 20

Number of Lexical Subcategories

[Figure: number of subcategories (10 to 70) per POS tag, from NNP, JJ, NNS, NN at the top down to SYM, RP, LS, # at the bottom]

Slide 21

Learned Splits

  • Proper Nouns (NNP):

        NNP-14   Oct.   Nov.        Sept.
        NNP-12   John   Robert      James
        NNP-2    J.     E.          L.
        NNP-1    Bush   Noriega     Peters
        NNP-15   New    San         Wall
        NNP-3    York   Francisco   Street

  • Personal pronouns (PRP):

        PRP-0    It     He          I
        PRP-1    it     he          they
        PRP-2    it     them        him

Slide 22

Learned Splits

  • Relative adverbs (RBR):

        RBR-0    further   lower     higher
        RBR-1    more      less      More
        RBR-2    earlier   Earlier   later

  • Cardinal Numbers (CD):

        CD-7     one       two       Three
        CD-4     1989      1990      1988
        CD-11    million   billion   trillion
        CD-0     1         50        100
        CD-3     1         30        31
        CD-9     78        58        34

Slide 23

Final Results (Accuracy)

           Parser                                 F1 (≤ 40 words)   F1 (all)
    ENG    Charniak & Johnson ’05 (generative)    90.1              89.6
           Split / Merge                          90.6              90.1
    GER    Dubey ’05                              76.3              –
           Split / Merge                          80.8              80.1
    CHN    Chiang et al. ’02                      80.0              76.6
           Split / Merge                          86.3              83.4

Still higher numbers come from reranking / self-training methods.

Slide 24

Efficient Parsing for Hierarchical Grammars

Slide 25

Coarse‐to‐Fine Inference

  • Example: PP attachment

[Figure: a sentence with many candidate PP attachment sites marked with question marks]

Slide 26

Hierarchical Pruning

    coarse:          … QP  NP  VP …
    split in two:    … QP1 QP2  NP1 NP2  VP1 VP2 …
    split in four:   … QP1 QP2 QP3 QP4  NP1 NP2 NP3 NP4  VP1 VP2 VP3 VP4 …
    split in eight:  …
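Structurally, the pruning cascade looks like the following sketch; the grammar objects and their inside_outside, refine, and viterbi methods are hypothetical stand-ins for a real chart parser at each refinement level:

    def parse_coarse_to_fine(sentence, grammars, threshold=1e-4):
        """grammars: increasingly refined PCFGs, each a split of the
        previous level. Each level prunes chart items whose posterior
        is below threshold before the next level rescores them."""
        mask = None                      # no pruning at the coarsest level
        for level, g in enumerate(grammars):
            # posteriors over (start, end, symbol) chart items
            posteriors = g.inside_outside(sentence, mask)
            survivors = {it for it, p in posteriors.items() if p >= threshold}
            if level + 1 < len(grammars):
                # allow only the split versions of surviving coarse items
                mask = {fine for it in survivors
                             for fine in grammars[level + 1].refine(it)}
        return grammars[-1].viterbi(sentence, mask)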

Slide 27

Bracket Posteriors

Slide 28

[Figure: parsing times of 1621 min, 111 min, 35 min, and finally 15 min as more coarse-to-fine pruning is added, with no search error]

Slide 29

Other Syntactic Models

Slide 30

Parse Reranking

  • Assume the number of parses is very small
  • We can represent each parse T as an arbitrary feature vector φ(T)
      • Typically, all local rules are features
      • Also non-local features, like how right-branching the overall tree is
      • [Charniak and Johnson 05] gives a rich set of features
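The decision rule itself is just a dot product over the k-best list; here is a minimal sketch, where featurize is a hypothetical stand-in for whatever extractor computes φ(T) and w is a learned weight vector:

    import numpy as np

    def rerank(parses, featurize, w):
        """Score each candidate tree T by w . phi(T) and return the
        argmax over the (small) k-best list."""
        scores = [w @ featurize(t) for t in parses]
        return parses[int(np.argmax(scores))]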
Slide 31

K‐Best Parsing

[Huang and Chiang 05, Pauls, Klein, Quirk 10]

Slide 32

Dependency Parsing

  • Lexicalized parsers can be seen as producing dependency trees
  • Each local binary tree corresponds to an attachment in the dependency graph

[Figure: dependency tree for "the lawyer questioned the witness": "questioned" heads "lawyer" and "witness", each of which heads a "the"]

Slide 33

Dependency Parsing

  • Pure dependency parsing is only cubic [Eisner 99]
  • Some work on non-projective dependencies
      • Common in, e.g., Czech parsing
      • Can be done with MST algorithms [McDonald and Pereira 05]

[Figure: Eisner's combination step — adjacent spans Y[h] over (i, k) and Z[h'] over (k, j) combine into X[h] over (i, j), attaching h' to h]
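For reference, a compact sketch of first-order projective decoding with Eisner's O(n^3) complete/incomplete spans, given a matrix of arc scores (the function name is mine, and it returns only the best tree score; backpointers are omitted for brevity):

    import numpy as np

    def eisner_score(scores):
        """Viterbi score of the best projective dependency tree.
        scores[h][m]: score of arc h -> m; token 0 is the ROOT."""
        n = len(scores)
        NEG = float("-inf")
        # [i, j, d]: d=1 means headed at i, d=0 means headed at j
        inc = np.full((n, n, 2), NEG)   # incomplete: arc between i and j
        com = np.full((n, n, 2), NEG)   # complete: half-constituent
        for i in range(n):
            com[i, i, 0] = com[i, i, 1] = 0.0
        for span in range(1, n):
            for i in range(n - span):
                j = i + span
                # incomplete spans: join two complete halves, add an arc
                best = max(com[i, r, 1] + com[r + 1, j, 0] for r in range(i, j))
                inc[i, j, 0] = best + scores[j][i]     # arc j -> i
                inc[i, j, 1] = best + scores[i][j]     # arc i -> j
                # complete spans: extend an incomplete span
                com[i, j, 0] = max(com[i, r, 0] + inc[r, j, 0]
                                   for r in range(i, j))
                com[i, j, 1] = max(inc[i, r, 1] + com[r, j, 1]
                                   for r in range(i + 1, j + 1))
        return com[0, n - 1, 1]         # everything attached under ROOT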

Slide 34

Shift-Reduce Parsers

  • Another way to derive a tree: repeatedly shift words onto a stack, or reduce the top of the stack (see the sketch below)
  • Parsing:
      • No useful dynamic programming search
      • Can still use beam search [Ratnaparkhi 97]
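A greedy arc-standard sketch of the idea for dependency trees; score_action is a hypothetical learned scorer, and a beam-search version would keep the k highest-scoring action sequences instead of the single greedy one:

    def shift_reduce(words, score_action):
        """Greedy arc-standard dependency parsing sketch."""
        stack, buffer, arcs = [], list(range(len(words))), []
        while buffer or len(stack) > 1:
            actions = []
            if buffer:
                actions.append("shift")
            if len(stack) >= 2:
                actions += ["left-arc", "right-arc"]
            act = max(actions, key=lambda a: score_action(stack, buffer, a))
            if act == "shift":
                stack.append(buffer.pop(0))
            elif act == "left-arc":       # second-from-top becomes dependent of top
                dep = stack.pop(-2)
                arcs.append((stack[-1], dep))
            else:                         # right-arc: top becomes dependent of second
                dep = stack.pop()
                arcs.append((stack[-1], dep))
        return arcs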
Slide 35

Data-oriented parsing:

  • Rewrite large (possibly lexicalized) subtrees in a single step
  • Formally, a tree-insertion grammar
  • Derivational ambiguity: whether subtrees were generated atomically or compositionally
  • Finding the most probable parse is NP-complete
Slide 36

TIG: Insertion

Slide 37

Tree-adjoining grammars

  • Start with local trees
  • Can insert structure with adjunction operators
  • Mildly context-sensitive
  • Models long-distance dependencies naturally
  • … as well as other weird stuff that CFGs don't capture well (e.g. cross-serial dependencies)

Slide 38

TAG: Long Distance

Slide 39

CCG Parsing

  • Combinatory Categorial Grammar
      • Fully (mono-)lexicalized grammar
      • Categories encode argument sequences
      • Very closely related to the lambda calculus (more later)
      • Can have spurious ambiguities (why?)
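As a tiny illustration of categories encoding argument sequences, here is a minimal sketch of the two application combinators, forward (X/Y Y => X) and backward (Y X\Y => X); the class layout is my own:

    from dataclasses import dataclass
    from typing import Optional

    @dataclass(frozen=True)
    class Cat:
        """A CCG category: either an atom like S or NP, or a slash
        category result/arg (seeks arg right) or result\\arg (left)."""
        atom: Optional[str] = None
        result: Optional["Cat"] = None
        slash: Optional[str] = None     # "/" or "\\"
        arg: Optional["Cat"] = None

    def apply(left: Cat, right: Cat) -> Optional[Cat]:
        """Forward application X/Y Y => X; backward Y X\\Y => X."""
        if left.slash == "/" and left.arg == right:
            return left.result
        if right.slash == "\\" and right.arg == left:
            return right.result
        return None

    # "John sleeps": NP  S\NP  =>  S  (backward application)
    NP, S = Cat(atom="NP"), Cat(atom="S")
    sleeps = Cat(result=S, slash="\\", arg=NP)
    assert apply(NP, sleeps) == S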