SLIDE 1

Why NLP Needs Theoretical Syntax

(It in Fact Already Uses It)

Owen Rambow
Center for Computational Learning Systems
Columbia University, New York City
rambow@ccls.columbia.edu

SLIDE 2

Key Issue: Representation

  • Aravind Joshi to statisticians (adapted):

“You know how to count, but we tell you what to count”

  • Linguistic representations are not naturally occurring!
  • They are devised by linguists
  • Example: English Penn Treebank

– Beatrice Santorini (thesis: historical syntax of Yiddish)
– Lots of linguistic theory went into the PTB
– The PTB annotation manual is a comprehensive descriptive grammar of English

SLIDE 3

What Sort of Representations for Syntax?

  • Syntax: links between text and meaning
  • Text consists of words -> lexical models

– Lexicalized formalisms
– Note: bi- and monolexical versions of CFG

  • Need to link to meaning (for example, PropBank)

– Extended domain of locality to locate predicate-argument structure
– Note: importance of dashtags etc. in PTB II

  • Tree Adjoining Grammar! (but CCG is also cool, and LFG has its own appeal)
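The "extended domain of locality" argument can be sketched in a few lines of toy code (an illustrative mini-example of LTAG substitution and adjunction; the trees and helper names are my own, not from the talk): a lexicalized elementary tree for "saw" carries both of its NP argument slots, so the predicate and all its arguments live in one tree.

```python
# Toy LTAG fragment (illustrative; not code from the talk).
import copy

def node(label, *children):
    return {"label": label, "children": list(children)}

# Initial tree for "saw"; "NP!" marks a substitution slot.
# Both argument slots sit in the verb's own tree: extended domain of locality.
alpha_saw = node("S",
                 node("NP!"),                      # subject slot
                 node("VP",
                      node("V", node("saw")),
                      node("NP!")))                # object slot

alpha_john = node("NP", node("John"))
alpha_mary = node("NP", node("Mary"))

# Auxiliary tree for "apparently"; "VP*" is the foot node.
beta_apparently = node("VP",
                       node("ADV", node("apparently")),
                       node("VP*"))

def substitute(tree, sub):
    """Plug sub into the first slot labeled with sub's root label + '!'."""
    for i, child in enumerate(tree["children"]):
        if child["label"] == sub["label"] + "!":
            tree["children"][i] = sub
            return True
        if substitute(child, sub):
            return True
    return False

def _attach_at_foot(aux, label, subtree):
    """Put subtree where the foot node (label + '*') sits."""
    for i, child in enumerate(aux["children"]):
        if child["label"] == label + "*":
            aux["children"][i] = subtree
            return True
        if _attach_at_foot(child, label, subtree):
            return True
    return False

def adjoin(tree, aux):
    """Splice a copy of aux in at the first node matching aux's root label."""
    for i, child in enumerate(tree["children"]):
        if child["label"] == aux["label"]:
            new_aux = copy.deepcopy(aux)
            _attach_at_foot(new_aux, aux["label"], child)
            tree["children"][i] = new_aux
            return True
        if adjoin(child, aux):
            return True
    return False

def leaves(tree):
    if not tree["children"]:
        return [tree["label"]]
    return [w for c in tree["children"] for w in leaves(c)]

substitute(alpha_saw, alpha_john)   # fill the subject slot
substitute(alpha_saw, alpha_mary)   # fill the object slot
adjoin(alpha_saw, beta_apparently)  # modify the VP
print(" ".join(leaves(alpha_saw)))  # -> John apparently saw Mary
```

Reading the predicate-argument structure off `alpha_saw` is trivial because "saw" and its two NP slots were never separated; a plain CFG derivation offers no such single-structure guarantee.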

SLIDE 4

Why isn’t everyone using TAG?

  • The PTB is not annotated with a TAG
  • Need to do linguistic interpretation on the PTB to extract a TAG (Chen 2001, Xia 2001)

  • This is not surprising: all linguistic representations need to be interpreted (Rambow 2010)

– Extraction of a (P)CFG is simple and requires little interpretation
– Extraction of a bilexical (P)CFG is not: it requires head percolation, which is interpretation
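The contrast in the two sub-bullets can be made concrete (an illustrative mini-example of my own, not the actual PTB tools): reading plain CFG rules off a Penn-Treebank-style bracketing is mechanical, but bilexicalizing those rules requires a head-percolation table saying which child supplies the head word, and that table encodes linguistic interpretation.

```python
# Illustrative sketch (not the PTB extraction tools themselves).

# A tree is (label, children); a leaf is (pos_tag, word).
tree = ("S",
        [("NP", [("NNP", "John")]),
         ("VP", [("VBD", "saw"),
                 ("NP", [("DT", "the"), ("NN", "dog")])])])

# Toy head-percolation table: which child label supplies the head.
# Real tables (e.g. Magerman/Collins style) are much larger.
HEAD_TABLE = {"S": ["VP"], "VP": ["VBD", "VB"], "NP": ["NN", "NNP"]}

def is_leaf(node):
    return isinstance(node[1], str)

def cfg_rules(node, rules):
    """Read off plain CFG rules: purely mechanical, no interpretation."""
    if is_leaf(node):
        return
    label, children = node
    rules.append((label, tuple(c[0] for c in children)))
    for c in children:
        cfg_rules(c, rules)

def head_word(node):
    """Percolate head words upward via HEAD_TABLE: this IS interpretation."""
    if is_leaf(node):
        return node[1]
    label, children = node
    for wanted in HEAD_TABLE.get(label, []):
        for c in children:
            if c[0] == wanted:
                return head_word(c)
    return head_word(children[0])  # fallback: leftmost child

rules = []
cfg_rules(tree, rules)
print(rules[0])         # ('S', ('NP', 'VP'))
print(head_word(tree))  # saw
```

Deleting `HEAD_TABLE` breaks `head_word` but leaves `cfg_rules` untouched, which is exactly the asymmetry the slide points out.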

SLIDE 5

Why isn’t everyone using TAG Parsers?

  • Unclear how well they are performing

– Phrase-structure (PS) evaluation is irrelevant

  • MICA parser (Bangalore et al 2009):

– High 80s on a linguistically motivated predicate-argument structure dependency
– MALT does slightly better on the same representation
– But MICA output comes fully interpreted; MALT's does not

  • Once we have a good syntactic pred-arg structure, tasks like semantic role labeling (PropBank) are easier

– 95% on args given gold pred-arg structure (Chen and Rambow 2002)

SLIDE 6

What Have We Learned About TAG Parsing?

  • Large TAG grammars are not easy to manage computationally (MICA: 5,000 trees, 1,200 used in parsing)

  • Small TAG grammars lose too much information
  • Need to investigate:

– Dynamic creation of TAG grammars (trees created in response to need) (note: LTAG-spinal, Shen 2006)
– “Bushes”: underspecified trees
– Metagrammars (Kinyon 2003)

SLIDE 7

What about All Those Other Languages?

  • Can’t do treebanks for 3,000 languages
  • Need to understand cross-linguistic variation and use that understanding in computational models

– Cross-linguistic variation: theoretical syntax
– Models: NLP
– Link: metagrammars for TAG

SLIDE 8

Summary

  • Treebanks already encode insights from theoretical syntax
  • Require interpretation for non-trivial models
  • Applications other than Parseval require richer representations (and richer evaluations)
  • But probably English is not the right language to argue for the need for richer syntactic knowledge
  • Real coming bottleneck: NLP for 3,000 languages