SLIDE 1

Why NLP Needs Theoretical Syntax

(It in Fact Already Uses It)

Owen Rambow
Center for Computational Learning Systems
Columbia University, New York City
rambow@ccls.columbia.edu

SLIDE 2

Key Issue: Representation

  • Aravind Joshi to statisticians (adapted):

“You know how to count, but we tell you what to count”

  • Linguistic representations are not naturally occurring!
  • They are devised by linguists
  • Example: English Penn Treebank

– Beatrice Santorini (thesis: historical syntax of Yiddish)
– Lots of linguistic theory went into the PTB
– The PTB annotation manual is a comprehensive descriptive grammar of English

SLIDE 3

What Sort of Representations for Syntax?

  • Syntax: links between text and meaning
  • Text consists of words -> lexical models

– Lexicalized formalisms
– Note: bi- and monolexical versions of CFG

  • Need to link to meaning (for example, PropBank)

– Extended domain of locality to locate predicate-argument structure
– Note: importance of dashtags etc. in PTB II

  • Tree Adjoining Grammar! (but CCG is also cool, and LFG has its own appeal)
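The "extended domain of locality" argument can be sketched in a few lines of toy code (an illustrative mini-example of LTAG substitution and adjunction; the trees and helper names are my own, not from the talk): a lexicalized elementary tree for "saw" carries both of its NP argument slots, so the predicate and all its arguments live in one tree.

```python
# Toy LTAG fragment (illustrative; not code from the talk).
import copy

def node(label, *children):
    return {"label": label, "children": list(children)}

# Initial tree for "saw"; "NP!" marks a substitution slot.
# Both argument slots sit in the verb's own tree: extended domain of locality.
alpha_saw = node("S",
                 node("NP!"),                      # subject slot
                 node("VP",
                      node("V", node("saw")),
                      node("NP!")))                # object slot

alpha_john = node("NP", node("John"))
alpha_mary = node("NP", node("Mary"))

# Auxiliary tree for "apparently"; "VP*" is the foot node.
beta_apparently = node("VP",
                       node("ADV", node("apparently")),
                       node("VP*"))

def substitute(tree, sub):
    """Plug sub into the first slot labeled with sub's root label + '!'."""
    for i, child in enumerate(tree["children"]):
        if child["label"] == sub["label"] + "!":
            tree["children"][i] = sub
            return True
        if substitute(child, sub):
            return True
    return False

def _attach_at_foot(aux, label, subtree):
    """Put subtree where the foot node (label + '*') sits."""
    for i, child in enumerate(aux["children"]):
        if child["label"] == label + "*":
            aux["children"][i] = subtree
            return True
        if _attach_at_foot(child, label, subtree):
            return True
    return False

def adjoin(tree, aux):
    """Splice a copy of aux in at the first node matching aux's root label."""
    for i, child in enumerate(tree["children"]):
        if child["label"] == aux["label"]:
            new_aux = copy.deepcopy(aux)
            _attach_at_foot(new_aux, aux["label"], child)
            tree["children"][i] = new_aux
            return True
        if adjoin(child, aux):
            return True
    return False

def leaves(tree):
    if not tree["children"]:
        return [tree["label"]]
    return [w for c in tree["children"] for w in leaves(c)]

substitute(alpha_saw, alpha_john)   # fill the subject slot
substitute(alpha_saw, alpha_mary)   # fill the object slot
adjoin(alpha_saw, beta_apparently)  # modify the VP
print(" ".join(leaves(alpha_saw)))  # -> John apparently saw Mary
```

Reading the predicate-argument structure off `alpha_saw` is trivial because "saw" and its two NP slots were never separated; a plain CFG derivation offers no such single-structure guarantee.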

SLIDE 4

Why isn’t everyone using TAG?

  • The PTB is not annotated with a TAG
  • Need to do linguistic interpretation on the PTB to extract a TAG (Chen 2001, Xia 2001)

  • This is not surprising: all linguistic representations need to be interpreted (Rambow 2010)

– Extraction of a (P)CFG is simple and requires little interpretation
– Extraction of a bilexical (P)CFG is not: it requires head percolation, which is interpretation
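The contrast in the two sub-bullets can be made concrete (an illustrative mini-example of my own, not the actual PTB tools): reading plain CFG rules off a Penn-Treebank-style bracketing is mechanical, but bilexicalizing those rules requires a head-percolation table saying which child supplies the head word, and that table encodes linguistic interpretation.

```python
# Illustrative sketch (not the PTB extraction tools themselves).

# A tree is (label, children); a leaf is (pos_tag, word).
tree = ("S",
        [("NP", [("NNP", "John")]),
         ("VP", [("VBD", "saw"),
                 ("NP", [("DT", "the"), ("NN", "dog")])])])

# Toy head-percolation table: which child label supplies the head.
# Real tables (e.g. Magerman/Collins style) are much larger.
HEAD_TABLE = {"S": ["VP"], "VP": ["VBD", "VB"], "NP": ["NN", "NNP"]}

def is_leaf(node):
    return isinstance(node[1], str)

def cfg_rules(node, rules):
    """Read off plain CFG rules: purely mechanical, no interpretation."""
    if is_leaf(node):
        return
    label, children = node
    rules.append((label, tuple(c[0] for c in children)))
    for c in children:
        cfg_rules(c, rules)

def head_word(node):
    """Percolate head words upward via HEAD_TABLE: this IS interpretation."""
    if is_leaf(node):
        return node[1]
    label, children = node
    for wanted in HEAD_TABLE.get(label, []):
        for c in children:
            if c[0] == wanted:
                return head_word(c)
    return head_word(children[0])  # fallback: leftmost child

rules = []
cfg_rules(tree, rules)
print(rules[0])         # ('S', ('NP', 'VP'))
print(head_word(tree))  # saw
```

Deleting `HEAD_TABLE` breaks `head_word` but leaves `cfg_rules` untouched, which is exactly the asymmetry the slide points out.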

SLIDE 5

Why isn’t everyone using TAG Parsers?

  • Unclear how well they are performing

– Phrase-structure (PS) evaluation is irrelevant

  • MICA parser (Bangalore et al 2009):

– High 80s on a linguistically motivated predicate-argument structure dependency
– MALT does slightly better on the same representation
– But MICA output comes fully interpreted; MALT's does not

  • Once we have a good syntactic pred-arg structure, tasks like semantic role labeling (PropBank) are easier

– 95% on args given gold pred-arg structure (Chen and Rambow 2002)

SLIDE 6

What Have We Learned About TAG Parsing?

  • Large TAG grammars are not easy to manage computationally (MICA: 5,000 trees, 1,200 used in parsing)

  • Small TAG grammars lose too much information
  • Need to investigate:

– Dynamic creation of TAG grammars (trees created in response to need) (note: LTAG-spinal, Shen 2006)
– “Bushes”: underspecified trees
– Metagrammars (Kinyon 2003)

SLIDE 7

What about All Those Other Languages?

  • Can’t do treebanks for 3,000 languages
  • Need to understand cross-linguistic variation and use that understanding in computational models

– Cross-linguistic variation: theoretical syntax
– Models: NLP
– Link: metagrammars for TAG

SLIDE 8

Summary

  • Treebanks already encode insights from theoretical syntax
  • Require interpretation for non-trivial models
  • Applications other than Parseval require richer representations (and richer evaluations)
  • But probably English is not the right language to argue for the need for richer syntactic knowledge
  • Real coming bottleneck: NLP for 3,000 languages