
Slide 1

Natural Language Processing

Parsing III

Dan Klein – UC Berkeley

Slide 2

Unsupervised Tagging

Slide 3

Unsupervised Tagging?

  • AKA part-of-speech induction
  • Task:
      • Raw sentences in
      • Tagged sentences out
  • Obvious thing to do:
      • Start with a (mostly) uniform HMM
      • Run EM
      • Inspect results
Slide 4

EM for HMMs: Process

  • Alternate between recomputing distributions over the hidden variables (the tags) and re-estimating the parameters
  • Crucial step: we want to tally up how many (fractional) counts of each kind of transition and emission we have under the current parameters:

        count(t -> t') = sum_i P(s_i = t, s_{i+1} = t' | w_1..w_n)
        count(t, w)    = sum over i with w_i = w of P(s_i = t | w_1..w_n)

  • These are the same quantities we needed to train a CRF! (sketched below)
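As a concrete illustration, here is a minimal NumPy sketch of that E-step tally (the function name and interfaces are my own; a real implementation would work in log space or rescale alpha and beta to avoid underflow on long sentences):

    import numpy as np

    def expected_counts(obs, pi, A, B):
        """Forward-backward pass for one sentence.

        obs: list of word ids; pi[t]: initial tag probs;
        A[t, t2]: transition probs; B[t, w]: emission probs.
        Returns the expected (fractional) transition and emission
        counts tallied in the E-step."""
        n, T = len(obs), len(pi)
        alpha = np.zeros((n, T))    # alpha[i, t] = P(w_1..i, s_i = t)
        beta = np.zeros((n, T))     # beta[i, t]  = P(w_{i+1}..n | s_i = t)
        alpha[0] = pi * B[:, obs[0]]
        for i in range(1, n):
            alpha[i] = (alpha[i - 1] @ A) * B[:, obs[i]]
        beta[-1] = 1.0
        for i in range(n - 2, -1, -1):
            beta[i] = A @ (B[:, obs[i + 1]] * beta[i + 1])
        Z = alpha[-1].sum()         # sentence likelihood

        trans = np.zeros_like(A)    # E[count(t -> t')]
        emit = np.zeros_like(B)     # E[count(t emits w)]
        for i in range(n - 1):
            trans += np.outer(alpha[i], B[:, obs[i + 1]] * beta[i + 1]) * A / Z
        for i in range(n):
            emit[:, obs[i]] += alpha[i] * beta[i] / Z
        return trans, emit

The M-step then just normalizes these tables row by row to get new transition and emission probabilities.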
Slide 5

Merialdo: Setup

  • Some (discouraging) experiments [Merialdo 94]
  • Setup:
      • You know the set of allowable tags for each word
      • Fix k training examples to their true labels
          • Learn P(w|t) on these examples
          • Learn P(t|t-1,t-2) on these examples
      • On n examples, re-estimate with EM
  • Note: we know the allowed tags but not their frequencies
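A minimal sketch of the supervised initialization step, assuming plain relative-frequency estimates with no smoothing (names are mine); EM would then re-estimate on the n unlabeled sentences with the forward-backward tally above, restricting each word's emissions to its allowed tags:

    from collections import Counter

    def init_from_labeled(tagged_sents):
        """Estimate P(w|t) and the trigram P(t|t-1,t-2) from k
        hand-labeled sentences by relative frequency."""
        emit, trans = Counter(), Counter()
        tag_count, hist_count = Counter(), Counter()
        for sent in tagged_sents:                # sent: list of (word, tag)
            tags = ["<s>", "<s>"] + [t for _, t in sent]
            for w, t in sent:
                emit[(t, w)] += 1
                tag_count[t] += 1
            for t2, t1, t in zip(tags, tags[1:], tags[2:]):
                trans[(t2, t1, t)] += 1          # history (t-2, t-1) -> t
                hist_count[(t2, t1)] += 1
        p_emit = {k: v / tag_count[k[0]] for k, v in emit.items()}
        p_trans = {k: v / hist_count[k[:2]] for k, v in trans.items()}
        return p_emit, p_trans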
Slide 6

Merialdo: Results

Slide 7

Latent Variable PCFGs

Slide 8

The Game of Designing a Grammar

  • Annotation refines base treebank symbols to improve statistical fit of the grammar
  • Parent annotation [Johnson ’98]

Slide 9

The Game of Designing a Grammar

  • Annotation refines base treebank symbols to improve statistical fit of the grammar
  • Parent annotation [Johnson ’98]
  • Head lexicalization [Collins ’99, Charniak ’00]

Slide 10

The Game of Designing a Grammar

  • Annotation refines base treebank symbols to improve statistical fit of the grammar
  • Parent annotation [Johnson ’98]
  • Head lexicalization [Collins ’99, Charniak ’00]
  • Automatic clustering?

Slide 11

Latent Variable Grammars

[Figure: a sentence paired with its parse tree, the grammar parameters, and the latent derivations]

Slide 12

Learning Latent Annotations

EM algorithm:
  • Brackets are known
  • Base categories are known
  • Only induce subcategories

Just like Forward-Backward for HMMs: a forward (inside) and backward (outside) pass, run over the fixed tree instead of a chain.

[Figure: a binary tree of latent symbols X1..X7 over the sentence "He was right ."]
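Because the bracketing is observed, the E-step reduces to an inside pass up each training tree and an outside pass back down. A sketch under assumed interfaces (rules[(A,B,C)] as a k x k x k array of split-rule probabilities and lex[(A,w)] as a length-k emission vector are hypothetical names):

    import numpy as np

    class Node:
        def __init__(self, label, children=(), word=None):
            self.label, self.children, self.word = label, list(children), word

    def inside(node, rules, lex):
        """Inside score I[a] = P(yield | node has subcategory a)."""
        if node.word is not None:
            node.inside = lex[(node.label, node.word)]
            return node.inside
        left, right = node.children
        I_l, I_r = inside(left, rules, lex), inside(right, rules, lex)
        G = rules[(node.label, left.label, right.label)]   # G[a, b, c]
        node.inside = np.einsum('abc,b,c->a', G, I_l, I_r)
        return node.inside

    def outside(node, rules, out_vec):
        """Outside scores; run inside() first. Expected rule counts
        follow as out[a] * G[a,b,c] * I_l[b] * I_r[c] / P(tree)."""
        node.outside = out_vec
        if node.word is not None:
            return
        left, right = node.children
        G = rules[(node.label, left.label, right.label)]
        outside(left, rules, np.einsum('abc,a,c->b', G, out_vec, right.inside))
        outside(right, rules, np.einsum('abc,a,b->c', G, out_vec, left.inside))

Calling inside(root, rules, lex) and then outside(root, rules, root_vec), with root_vec a one-hot vector for an unsplit ROOT, yields all the expected counts the M-step needs.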

Slide 13

Refinement of the DT tag

[Figure: the DT tag refined into subcategories DT-1 through DT-4]

Slide 14

Hierarchical refinement

Slide 15

Hierarchical Estimation Results

[Figure: parsing accuracy (F1, 74-90) vs. total number of grammar symbols (100-1700)]

    Model                   F1
    Flat Training           87.3
    Hierarchical Training   88.4

Slide 16

Refinement of the , tag

  • Splitting all categories equally is wasteful:
Slide 17

Adaptive Splitting

  • Want to split complex categories more
  • Idea: split everything, then roll back the splits which were least useful (scored as in the sketch below)
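Following the split-merge idea, one way to score a split is to approximate, at every node where the split symbol occurs, how much the training likelihood would drop if its two subcategories were merged back; the splits with the smallest loss get rolled back. A hedged sketch (variable names are mine):

    import numpy as np

    def merge_loss(occurrences, p1, p2):
        """Approximate log-likelihood loss of merging subsymbols A-1, A-2.

        occurrences: list of (in1, in2, out1, out2) inside/outside
        scores at each node labeled A in the training trees;
        p1, p2: relative frequencies of A-1 and A-2 (p1 + p2 = 1).
        Splits with the smallest loss are merged back (e.g. 50%)."""
        loss = 0.0
        for in1, in2, out1, out2 in occurrences:
            merged_in = p1 * in1 + p2 * in2      # inside score if merged
            merged_out = out1 + out2             # outside score if merged
            split_prob = in1 * out1 + in2 * out2 # contribution when split
            loss += np.log(split_prob) - np.log(merged_in * merged_out)
        return loss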

Slide 18

Adaptive Splitting Results

    Model              F1
    Previous           88.4
    With 50% Merging   89.5

Slide 19

Number of Phrasal Subcategories

[Figure: number of subcategories (5 to 40) per phrasal category, from NP, VP, PP at the top down to X, ROOT, LST at the bottom]

Slide 20

Number of Lexical Subcategories

[Figure: number of subcategories (10 to 70) per POS tag, from NNP, JJ, NNS, NN at the top down to SYM, RP, LS, # at the bottom]

Slide 21

Learned Splits

  • Proper Nouns (NNP):

        NNP-14   Oct.   Nov.        Sept.
        NNP-12   John   Robert      James
        NNP-2    J.     E.          L.
        NNP-1    Bush   Noriega     Peters
        NNP-15   New    San         Wall
        NNP-3    York   Francisco   Street

  • Personal pronouns (PRP):

        PRP-0    It     He          I
        PRP-1    it     he          they
        PRP-2    it     them        him

Slide 22

Learned Splits

  • Relative adverbs (RBR):

        RBR-0    further   lower     higher
        RBR-1    more      less      More
        RBR-2    earlier   Earlier   later

  • Cardinal Numbers (CD):

        CD-7     one       two       Three
        CD-4     1989      1990      1988
        CD-11    million   billion   trillion
        CD-0     1         50        100
        CD-3     1         30        31
        CD-9     78        58        34

Slide 23

Final Results (Accuracy)

           Parser                                 F1 (≤ 40 words)   F1 (all)
    ENG    Charniak & Johnson ’05 (generative)    90.1              89.6
           Split / Merge                          90.6              90.1
    GER    Dubey ’05                              76.3              –
           Split / Merge                          80.8              80.1
    CHN    Chiang et al. ’02                      80.0              76.6
           Split / Merge                          86.3              83.4

Still higher numbers come from reranking / self-training methods.

Slide 24

Efficient Parsing for Hierarchical Grammars

Slide 25

Coarse‐to‐Fine Inference

  • Example: PP attachment

[Figure: a sentence with many candidate PP attachment sites marked with question marks]

Slide 26

Hierarchical Pruning

    coarse:          … QP  NP  VP …
    split in two:    … QP1 QP2  NP1 NP2  VP1 VP2 …
    split in four:   … QP1 QP2 QP3 QP4  NP1 NP2 NP3 NP4  VP1 VP2 VP3 VP4 …
    split in eight:  …
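Structurally, the pruning cascade looks like the following sketch; the grammar objects and their inside_outside, refine, and viterbi methods are hypothetical stand-ins for a real chart parser at each refinement level:

    def parse_coarse_to_fine(sentence, grammars, threshold=1e-4):
        """grammars: increasingly refined PCFGs, each a split of the
        previous level. Each level prunes chart items whose posterior
        is below threshold before the next level rescores them."""
        mask = None                      # no pruning at the coarsest level
        for level, g in enumerate(grammars):
            # posteriors over (start, end, symbol) chart items
            posteriors = g.inside_outside(sentence, mask)
            survivors = {it for it, p in posteriors.items() if p >= threshold}
            if level + 1 < len(grammars):
                # allow only the split versions of surviving coarse items
                mask = {fine for it in survivors
                             for fine in grammars[level + 1].refine(it)}
        return grammars[-1].viterbi(sentence, mask)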

Slide 27

Bracket Posteriors

Slide 28

[Figure: parsing times of 1621 min, 111 min, 35 min, and finally 15 min as more coarse-to-fine pruning is added, with no search error]

Slide 29

Other Syntactic Models

Slide 30

Parse Reranking

  • Assume the number of parses is very small
  • We can represent each parse T as an arbitrary feature vector φ(T)
      • Typically, all local rules are features
      • Also non-local features, like how right-branching the overall tree is
      • [Charniak and Johnson 05] gives a rich set of features
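The decision rule itself is just a dot product over the k-best list; here is a minimal sketch, where featurize is a hypothetical stand-in for whatever extractor computes φ(T) and w is a learned weight vector:

    import numpy as np

    def rerank(parses, featurize, w):
        """Score each candidate tree T by w . phi(T) and return the
        argmax over the (small) k-best list."""
        scores = [w @ featurize(t) for t in parses]
        return parses[int(np.argmax(scores))]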
Slide 31

K‐Best Parsing

[Huang and Chiang 05, Pauls, Klein, Quirk 10]

Slide 32

Dependency Parsing

  • Lexicalized parsers can be seen as producing dependency trees
  • Each local binary tree corresponds to an attachment in the dependency graph

[Figure: dependency tree for "the lawyer questioned the witness": "questioned" heads "lawyer" and "witness", each of which heads a "the"]

Slide 33

Dependency Parsing

  • Pure dependency parsing is only cubic [Eisner 99]
  • Some work on non-projective dependencies
      • Common in, e.g., Czech parsing
      • Can be done with MST algorithms [McDonald and Pereira 05]

[Figure: Eisner's combination step — adjacent spans Y[h] over (i, k) and Z[h'] over (k, j) combine into X[h] over (i, j), attaching h' to h]
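For reference, a compact sketch of first-order projective decoding with Eisner's O(n^3) complete/incomplete spans, given a matrix of arc scores (the function name is mine, and it returns only the best tree score; backpointers are omitted for brevity):

    import numpy as np

    def eisner_score(scores):
        """Viterbi score of the best projective dependency tree.
        scores[h][m]: score of arc h -> m; token 0 is the ROOT."""
        n = len(scores)
        NEG = float("-inf")
        # [i, j, d]: d=1 means headed at i, d=0 means headed at j
        inc = np.full((n, n, 2), NEG)   # incomplete: arc between i and j
        com = np.full((n, n, 2), NEG)   # complete: half-constituent
        for i in range(n):
            com[i, i, 0] = com[i, i, 1] = 0.0
        for span in range(1, n):
            for i in range(n - span):
                j = i + span
                # incomplete spans: join two complete halves, add an arc
                best = max(com[i, r, 1] + com[r + 1, j, 0] for r in range(i, j))
                inc[i, j, 0] = best + scores[j][i]     # arc j -> i
                inc[i, j, 1] = best + scores[i][j]     # arc i -> j
                # complete spans: extend an incomplete span
                com[i, j, 0] = max(com[i, r, 0] + inc[r, j, 0]
                                   for r in range(i, j))
                com[i, j, 1] = max(inc[i, r, 1] + com[r, j, 1]
                                   for r in range(i + 1, j + 1))
        return com[0, n - 1, 1]         # everything attached under ROOT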

Slide 34

Shift-Reduce Parsers

  • Another way to derive a tree: repeatedly shift words onto a stack, or reduce the top of the stack (see the sketch below)
  • Parsing:
      • No useful dynamic programming search
      • Can still use beam search [Ratnaparkhi 97]
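A greedy arc-standard sketch of the idea for dependency trees; score_action is a hypothetical learned scorer, and a beam-search version would keep the k highest-scoring action sequences instead of the single greedy one:

    def shift_reduce(words, score_action):
        """Greedy arc-standard dependency parsing sketch."""
        stack, buffer, arcs = [], list(range(len(words))), []
        while buffer or len(stack) > 1:
            actions = []
            if buffer:
                actions.append("shift")
            if len(stack) >= 2:
                actions += ["left-arc", "right-arc"]
            act = max(actions, key=lambda a: score_action(stack, buffer, a))
            if act == "shift":
                stack.append(buffer.pop(0))
            elif act == "left-arc":       # second-from-top becomes dependent of top
                dep = stack.pop(-2)
                arcs.append((stack[-1], dep))
            else:                         # right-arc: top becomes dependent of second
                dep = stack.pop()
                arcs.append((stack[-1], dep))
        return arcs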
Slide 35

Data-oriented parsing:

  • Rewrite large (possibly lexicalized) subtrees in a single step
  • Formally, a tree-insertion grammar
  • Derivational ambiguity: whether subtrees were generated atomically or compositionally
  • Finding the most probable parse is NP-complete
Slide 36

TIG: Insertion

Slide 37

Tree-adjoining grammars

  • Start with local trees
  • Can insert structure with adjunction operators
  • Mildly context-sensitive
  • Models long-distance dependencies naturally
  • … as well as other weird stuff that CFGs don't capture well (e.g. cross-serial dependencies)

Slide 38

TAG: Long Distance

Slide 39

CCG Parsing

  • Combinatory Categorial Grammar
      • Fully (mono-)lexicalized grammar
      • Categories encode argument sequences
      • Very closely related to the lambda calculus (more later)
      • Can have spurious ambiguities (why?)
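As a tiny illustration of categories encoding argument sequences, here is a minimal sketch of the two application combinators, forward (X/Y Y => X) and backward (Y X\Y => X); the class layout is my own:

    from dataclasses import dataclass
    from typing import Optional

    @dataclass(frozen=True)
    class Cat:
        """A CCG category: either an atom like S or NP, or a slash
        category result/arg (seeks arg right) or result\\arg (left)."""
        atom: Optional[str] = None
        result: Optional["Cat"] = None
        slash: Optional[str] = None     # "/" or "\\"
        arg: Optional["Cat"] = None

    def apply(left: Cat, right: Cat) -> Optional[Cat]:
        """Forward application X/Y Y => X; backward Y X\\Y => X."""
        if left.slash == "/" and left.arg == right:
            return left.result
        if right.slash == "\\" and right.arg == left:
            return right.result
        return None

    # "John sleeps": NP  S\NP  =>  S  (backward application)
    NP, S = Cat(atom="NP"), Cat(atom="S")
    sleeps = Cat(result=S, slash="\\", arg=NP)
    assert apply(NP, sleeps) == S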