Guiding Unsupervised Grammar Induction Using Contrastive Estimation*

Noah A. Smith and Jason Eisner
Department of Computer Science / Center for Language and Speech Processing
Johns Hopkins University
3400 North Charles Street, Baltimore, MD 21218 USA
{nasmith,jason}@cs.jhu.edu

* This work was supported by a Fannie and John Hertz Foundation Fellowship to the first author and NSF ITR grant IIS-0313193 to the second author. The views expressed are not necessarily endorsed by the sponsors. The authors also thank colleagues at CLSP and two anonymous reviewers for comments on this work.

Abstract

We describe a novel training criterion for probabilistic grammar induction models, contrastive estimation (CE) [Smith and Eisner, 2005], which can be interpreted as exploiting implicit negative evidence and includes a wide class of likelihood-based objective functions. This criterion is a generalization of the function maximized by the Expectation-Maximization algorithm [Dempster et al., 1977]. CE is a natural fit for log-linear models, which can include arbitrary features but for which EM is computationally difficult. We show that, using the same features, log-linear dependency grammar models trained using CE can drastically outperform EM-trained generative models on the task of matching human linguistic annotations (the MATCHLINGUIST task). The selection of an implicit negative evidence class, a "neighborhood," appropriate to a given task has strong implications, but with a good neighborhood one can target the objective of grammar induction to a specific application.

1 Introduction

Grammars are formal objects with many applications. They become particularly interesting when they allow ambiguity (cf. programming language grammars), introducing the notion that one grammar may be preferable to another for a particular use. Given an induced grammar, a researcher could try to apply it cleverly to her task and then measure its helpfulness on that task. This paper turns that scenario around. Given a task, our question is how to induce a grammar, from unannotated data, that is especially appropriate for the task.

Different grammars are likely to be better for different tasks. In natural language engineering, for example, applications like automatic essay grading, punctuation correction, spelling correction, machine translation, and language modeling pose different challenges and are evaluated differently. We regard traditional natural language grammar induction evaluated against a treebank (also known as unsupervised parsing) as just another task; we call it MATCHLINGUIST. A grammar induced for punctuation restoration or language modeling for speech recognition might look strange to a linguist, yet do better on those tasks. By the same token, traditional treebank-style linguistic annotations may not be the best kind of syntax for language modeling.

But without fully-observed data, how might one tell a learner to focus on one task or another? We propose that this is conveyed in the choice of an objective function that guides a statistical learner toward the right kinds of grammars for the task at hand. We offer a flexible class of "contrastive" objective functions within which something appropriate may be designed for existing and novel tasks.

In this paper, we evaluate our learned models on MATCHLINGUIST, which is a crucial task for natural language engineering. Automatic natural language grammar induction would bridge the gap between resource limitations (annotated treebanks are expensive, domain-specific, and language-specific) and the promise of exploiting syntactic structure in many applications. We argue that MATCHLINGUIST, just like other tasks, requires guidance.

For example, MATCHLINGUIST is decidedly different from the task that is explicitly solved by the Expectation-Maximization algorithm [Dempster et al., 1977]: MAXIMIZELIKELIHOOD. EM tries to fit the numerical parameters of a (fixed) statistical model of hidden structure to the training data. To recover traditional or useful syntactic structure, it is not enough to maximize training data likelihood [Carroll and Charniak, 1992, inter alia], and EM is notorious for mediocre results. Our results suggest that part of the reason EM performs badly is that it offers very little guidance to the learner. The alternative we propose is contrastive estimation. It is within the same statistical modeling paradigm as EM, but generalizes it by defining a notion of learner guidance.

Contrastive estimation makes use of a set of examples that are similar in some way to an observed example (its neighborhood), but mostly perturbed or damaged in a particular way. CE requires the learner to move probability mass to a given example, taking only from the example's neighborhood. The neighborhood of a particular example is defined by the neighborhood function; different neighborhood functions
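The contrastive criterion just described can be sketched concretely. The following is a minimal illustration, not the paper's implementation: a toy log-linear sentence model with hypothetical bigram features (the `score`, `neighborhood`, and `contrastive_log_likelihood` names and the adjacent-transposition neighborhood are choices made for this sketch), where each observed sentence's probability is renormalized over its small neighborhood rather than over all possible sentences.

```python
import math

def score(sentence, weights):
    """Unnormalized log score: sum of weights of adjacent word pairs."""
    return sum(weights.get((a, b), 0.0) for a, b in zip(sentence, sentence[1:]))

def neighborhood(sentence):
    """The observed sentence plus every single adjacent-word transposition
    (one simple choice of neighborhood function)."""
    yield tuple(sentence)
    for i in range(len(sentence) - 1):
        swapped = list(sentence)
        swapped[i], swapped[i + 1] = swapped[i + 1], swapped[i]
        yield tuple(swapped)

def contrastive_log_likelihood(corpus, weights):
    """Sum over examples of log p(x) - log sum_{x' in N(x)} p(x').

    The log-linear normalizer runs only over the neighborhood, so mass
    moved onto the observed example is taken from its perturbed variants.
    """
    total = 0.0
    for sent in corpus:
        log_num = score(sent, weights)
        log_den = math.log(sum(math.exp(score(n, weights))
                               for n in neighborhood(sent)))
        total += log_num - log_den
    return total

corpus = [("the", "dog", "barks")]
favor_observed = {("the", "dog"): 1.0, ("dog", "barks"): 1.0}
uniform = {}
# Weights that prefer the observed word order pull mass away from the
# transposed neighbors, so they score higher under the contrastive objective.
assert contrastive_log_likelihood(corpus, favor_observed) > \
       contrastive_log_likelihood(corpus, uniform)
```

A learner would adjust `weights` by gradient ascent on this objective; the point of the sketch is that the denominator is a small, explicitly enumerable set, which is what makes CE tractable for log-linear models where the global normalizer is not.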
