2 Implicit Negative Evidence that it does not sufficiently - PDF document

Guiding Unsupervised Grammar Induction Using Contrastive Estimation ∗ Noah A. Smith and Jason Eisner Department of Computer Science / Center for Language and Speech Processing Johns Hopkins University 3400 North Charles Street, Baltimore, MD 21218 USA { nasmith,jason } @cs.jhu.edu Abstract modeling pose different challenges and are evaluated differ- ently. We regard traditional natural language grammar induc- We describe a novel training criterion for proba- tion evaluated against a treebank (also known as unsupervised bilistic grammar induction models, contrastive es- parsing) as just another task ; we call it M ATCH L INGUIST . timation [Smith and Eisner, 2005], which can be A grammar induced for punctuation restoration or language interpreted as exploiting implicit negative evidence modeling for speech recognition might look strange to a lin- and includes a wide class of likelihood-based ob- guist, yet do better on those tasks. By the same token, tra- jective functions. This criterion is a generaliza- ditional treebank-style linguistic annotations may not be the tion of the function maximized by the Expectation- best kind of syntax for language modeling. Maximization algorithm [Dempster et al. , 1977]. But without fully-observed data, how might one tell a CE is a natural fit for log-linear models, which can learner to focus on one task or another? We propose that this include arbitrary features but for which EM is com- is conveyed in the choice of an objective function that guides putationally difficult. We show that, using the same a statistical learner toward the right kinds of grammars for features, log-linear dependency grammar models the task at hand. We offer a flexible class of “contrastive” ob- trained using CE can drastically outperform EM- jective functions within which something appropriate may be trained generative models on the task of match- designed for existing and novel tasks. ing human linguistic annotations (the M ATCH L IN - In this paper, we evaluate our learned models on M ATCH - GUIST task). The selection of an implicit negative L INGUIST , which is a crucial task for natural language en- evidence class—a “neighborhood”—appropriate to gineering. Automatic natural language grammar induction a given task has strong implications, but a good would bridge the gap between resource limitations (anno- neighborhood one can target the objective of gram- tated treebanks are expensive, domain-specific, and language- mar induction to a specific application. specific) and the promise of exploiting syntactic structure in many applications. We argue that M ATCH L INGUIST , just like other tasks, requires guidance. 1 Introduction For example, M ATCH L INGUIST is decidedly different from the task that is explicitly solved by the Expectation- Grammars are formal objects with many applications. They Maximization algorithm [Dempster et al. , 1977]: M AXI - become particularly interesting when they allow ambiguity MIZE L IKELIHOOD . EM tries to fit the numerical parameters (cf. programming language grammars), introducing the no- of a (fixed) statistical model of hidden structure to the train- tion that one grammar may be preferable to another for a par- ing data. To recover traditional or useful syntactic structure, ticular use. Given an induced grammar, a researcher could try it is not enough to maximize training data likelihood [Car- to apply it cleverly to her task and then measure its helpful- roll and Charniak, 1992, inter alia ], and EM is notorious for ness on that task. This paper turns that scenario around. mediocre results. Our results suggest that part of the reason Given a task, our question is how to induce a grammar— EM performs badly is that it offers very little guidance to the from unannotated data—that is especially appropriate for the learner. The alternative we propose is contrastive estimation . task. Different grammars are likely to be better for differ- It is within the same statistical modeling paradigm as EM, but ent tasks. In natural language engineering, for example, ap- generalizes it by defining a notion of learner guidance. plications like automatic essay grading, punctuation correc- Contrastive estimation makes use of a set of examples that tion, spelling correction, machine translation, and language are similar in some way to an observed example (its neighborhood ), but mostly perturbed or damaged in a particular ∗ This work was supported by a Fannie and John Hertz Founda- way. CE requires the learner to move probability mass to tion Fellowship to the first author and NSF ITR grant IIS-0313193 a given example, taking only from the example’s neighbor- to the second author. The views expressed are not necessarily en- hood. The neighborhood of a particular example is defined by dorsed by the sponsors. The authors also thank colleagues at CLSP the neighborhood function ; different neighborhood functions and two anonymous reviewers for comments on this work.

2 Implicit Negative Evidence that it does not sufficiently - PDF document

Guiding Unsupervised Grammar Induction Using Contrastive Estimation Noah A. Smith and Jason Eisner Department of Computer Science / Center for Language and Speech Processing Johns Hopkins University 3400 North Charles Street, Baltimore, MD

Implicit Guarantees and Risk Taking: Implicit Guarantees and Risk Taking: Implicit Guarantees and

Implicit Bias Implicit bias Implicit bias refers to attitudes or stereotypes that affect our

Implicit Surfaces Implicit Surfaces An implicit surface is simply an iso-contour CIS 781 of a

Implicit Bias: Transcript Inclusive Teaching Series: Implicit Bias Welcome to the third module of

Implicit Extremes and Implicit MaxStable Laws Stilian Stoev ( sstoev@umich.edu ) University of

Multi-core Programming: Implicit Parallelism Tuukka Haapasalo April 16, 2009 Tuukka Haapasalo

Implicit Surfaces CPSC 599.86 / 601.86 Sonny Chan University of Calgary (some board work happened

The Negative Marker in Romanian Negative Concord Gianina Iord achioaia Seminar f ur

Negative Equity Taxation and Disclosure October 19, 2009 Negative Equity History: State of Ohio

Signed numbers Goals unsigned numbers - non-negative integers signed numbers - positive/negative

Mountain Sheep Evidence Evidence 2: Horn Growth Evidence

Chapter 6 Evidence Chapter 6. Audit Evidence Why does the auditor need evidence ? 1.

EVIDENCE EVIDENCE- -BASED HEALTH CARE BASED HEALTH CARE BASED HEALTH CARE EVIDENCE EVIDENCE

Evidence evaluation for discrete data Evidence evaluation for discrete data Evidence evaluation

Implicit Bias and Race Mikah K. Thompson, Esq. Director of Affirmative Action & Adjunct

Theory and Practice Implicit Leadership Theory (ILT) Romance of Leadership (RoL) Implicit

CPSC 121: Models of Computation Module 9: Proof Techniques (part 2) Mathematical Induction

CSCI 136 Data Structures & Advanced Programming Mathematical Induction Fall 2020

Monte Carlo method for kinetic chemotaxis model and its applications on traveling pulse and

Long-term Monitoring of Accreting Pulsars with Fermi GBM Mark H. Finger, Elif Beklen, P .

Union-find These slides are not fully polished: - some transitions are rough - some topics are

Finger Search Searching in a sorted array 2 3 5 7 8 11 13 14 15 17 18 20 24 25 26 28 29 31

H ( C ) = (1 . 0 log 2 1 . 0 + 0 . 0 log 2 0 . 0) = 0 . 0 4 Wednesday, 11 Sep. 2019 Machine

COL351: Slides for Lecture Component 18 Thanks to Miles Jones, Russell Impagliazzo, and Sanjoy

2 Implicit Negative Evidence that it does not sufficiently - PDF document

Guiding Unsupervised Grammar Induction Using Contrastive Estimation Noah A. Smith and Jason Eisner Department of Computer Science / Center for Language and Speech Processing Johns Hopkins University 3400 North Charles Street, Baltimore, MD

Implicit Guarantees and Risk Taking: Implicit Guarantees and Risk Taking: Implicit Guarantees and

Implicit Bias Implicit bias Implicit bias refers to attitudes or stereotypes that affect our

Implicit Surfaces Implicit Surfaces An implicit surface is simply an iso-contour CIS 781 of a

Implicit Bias: Transcript Inclusive Teaching Series: Implicit Bias Welcome to the third module of

Implicit Extremes and Implicit MaxStable Laws Stilian Stoev ( sstoev@umich.edu ) University of

Multi-core Programming: Implicit Parallelism Tuukka Haapasalo April 16, 2009 Tuukka Haapasalo

Implicit Surfaces CPSC 599.86 / 601.86 Sonny Chan University of Calgary (some board work happened

The Negative Marker in Romanian Negative Concord Gianina Iord achioaia Seminar f ur

Negative Equity Taxation and Disclosure October 19, 2009 Negative Equity History: State of Ohio

Signed numbers Goals unsigned numbers - non-negative integers signed numbers - positive/negative

Mountain Sheep Evidence Evidence 2: Horn Growth Evidence

Chapter 6 Evidence Chapter 6. Audit Evidence Why does the auditor need evidence ? 1.

EVIDENCE EVIDENCE- -BASED HEALTH CARE BASED HEALTH CARE BASED HEALTH CARE EVIDENCE EVIDENCE

Evidence evaluation for discrete data Evidence evaluation for discrete data Evidence evaluation

Implicit Bias and Race Mikah K. Thompson, Esq. Director of Affirmative Action &amp; Adjunct

Theory and Practice Implicit Leadership Theory (ILT) Romance of Leadership (RoL) Implicit

CPSC 121: Models of Computation Module 9: Proof Techniques (part 2) Mathematical Induction

CSCI 136 Data Structures &amp; Advanced Programming Mathematical Induction Fall 2020

Monte Carlo method for kinetic chemotaxis model and its applications on traveling pulse and

Long-term Monitoring of Accreting Pulsars with Fermi GBM Mark H. Finger, Elif Beklen, P .

Union-find These slides are not fully polished: - some transitions are rough - some topics are

Finger Search Searching in a sorted array 2 3 5 7 8 11 13 14 15 17 18 20 24 25 26 28 29 31

H ( C ) = (1 . 0 log 2 1 . 0 + 0 . 0 log 2 0 . 0) = 0 . 0 4 Wednesday, 11 Sep. 2019 Machine

COL351: Slides for Lecture Component 18 Thanks to Miles Jones, Russell Impagliazzo, and Sanjoy

Implicit Bias and Race Mikah K. Thompson, Esq. Director of Affirmative Action & Adjunct

CSCI 136 Data Structures & Advanced Programming Mathematical Induction Fall 2020