SLIDE 1

Matching needs and resources: How NLP can help theoretical linguistics

Alexis Dimitriadis

Utrecht institute of Linguistics OTS Utrecht University

SLIDE 2

Motivation

  • Insight: Linguistic research could be carried out more efficiently (or: with better results) if all that NLP horsepower could be brought to bear. But how?
  • Some linguists have both theoretical and NLP knowledge, and can choose the techniques to apply. But most theoretical linguists are not self-sufficient in this respect:
  • Theoretical linguists should want help from NLP.
  • Computational linguists should want to help.
SLIDE 3

How NLP could help I

Best-case scenario

  • Collaboration, to tackle a research question interesting to both sides. Examples: Language genealogy, learning algorithms (e.g., learning OT constraint ranking).
  • However: suitable research problems are limited.
  • Many theoretical projects have modest (i.e., boring) computational needs. A large corpus of English is enough for a lot of linguists...

SLIDE 4

Some collaborative research topics

  • Computational cladistics for Indo-European language families. (Ringe, Warnow and Taylor 2002)
  • Algorithms for learning constraint rankings for Optimality-Theoretic systems (phonology, stress, syntax...) (Tesar and Smolensky 2000)
  • Cognitive modeling of various linguistic phenomena (cf. Workshop 5)
  • Studying the semantics of lexical entailments, using a new custom-developed corpus. (Winter et al., ongoing)
SLIDE 5

How NLP could help II

Lending a helping hand

  • Freebies: Use of existing corpora, parsers etc.
  • Altruism: The computational linguist undertakes to help the theorist. Examples: Linguist’s Search Engine (Resnik et al. 2005), Natural Language Toolkit (Bird et al. 2009).

Problems: Existing resources are often hard to use for the uninitiated. Altruism is a limited resource. How can it be used most effectively?

SLIDE 6

How NLP could help III

Learning from NLP methodology

  • Data-driven orientation
  • Reproducibility, inter-annotator consistency
  • Testing against a corpus of data
  • Evaluation metrics
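
One concrete instance of the points above is measuring inter-annotator consistency. The following is a minimal sketch, assuming two annotators have labelled the same items; it computes Cohen's kappa, a standard chance-corrected agreement measure. The function name and the judgement data are hypothetical and chosen only for illustration.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two equally long, non-empty label lists")
    n = len(labels_a)
    # Observed agreement: share of items given the same label by both.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: sum over labels of the product of the two
    # annotators' proportions for that label.
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    expected = sum(counts_a[lab] * counts_b[lab] for lab in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used one and the same label throughout
    return (observed - expected) / (1 - expected)


# Hypothetical acceptability judgements from two annotators.
annotator_1 = ["ok", "ok", "bad", "ok", "bad", "bad", "ok", "ok"]
annotator_2 = ["ok", "bad", "bad", "ok", "bad", "ok", "ok", "ok"]
print(round(cohen_kappa(annotator_1, annotator_2), 3))
```

A kappa near 1 indicates agreement well above chance; a value near 0 indicates agreement no better than chance.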
SLIDE 7

Utilizing concrete resources

and technical stumbling blocks

  • Many existing, boring tools and resources would be useful to a theoretical linguist: corpora, parsers, web crawlers...
  • However, linguists typically lack the necessary technical know-how, compilers, or even knowledge that such tools exist.

SLIDE 8

The Linguist’s Search Engine

Resnik and Elkiss (2005)

  • Searching the web by word, POS and syntactic structure. Search engine results are parsed and filtered with enriched queries.
  • Stored core corpus for fast results; supplemented with real-time crawling and parsing if desired.
  • User-friendly web interface, designed for the theoretical linguist. Graphical query-by-example query construction.
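
To make the idea of searching by word and POS concrete, here is a minimal sketch. It is not the Linguist's Search Engine's actual implementation, and it does not cover syntactic structure. It assumes a tiny hand-tagged corpus and a pattern in which each slot constrains the word, the tag, or both. All names and data are hypothetical.

```python
# Toy corpus: each sentence is a list of (word, POS) pairs, tagged by hand.
CORPUS = [
    [("the", "DT"), ("dog", "NN"), ("gave", "VBD"),
     ("her", "PRP"), ("a", "DT"), ("bone", "NN")],
    [("she", "PRP"), ("gave", "VBD"), ("a", "DT"), ("bone", "NN"),
     ("to", "TO"), ("the", "DT"), ("dog", "NN")],
    [("they", "PRP"), ("gave", "VBD"), ("him", "PRP"),
     ("the", "DT"), ("book", "NN")],
]

# Pattern slots are (word, tag); None means "anything goes".
# This one looks for double-object uses of "gave": gave PRP DT NN.
PATTERN = [("gave", None), (None, "PRP"), (None, "DT"), (None, "NN")]


def matches(token, constraint):
    """True if a (word, tag) token satisfies a (word, tag) constraint."""
    word, tag = token
    want_word, want_tag = constraint
    word_ok = want_word is None or word.lower() == want_word
    tag_ok = want_tag is None or tag == want_tag
    return word_ok and tag_ok


def search(corpus, pattern):
    """Return every contiguous word sequence that matches the pattern."""
    hits = []
    for sentence in corpus:
        for start in range(len(sentence) - len(pattern) + 1):
            window = sentence[start:start + len(pattern)]
            if all(matches(t, c) for t, c in zip(window, pattern)):
                hits.append(" ".join(word for word, _ in window))
    return hits


for hit in search(CORPUS, PATTERN):
    print(hit)
```

The second sentence is skipped because "gave" is followed there by a determiner rather than a pronoun, which is the kind of filtering a plain word search cannot do.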

SLIDE 9

The Linguist’s Search Engine

Resnik and Elkiss (2005)

  • Defunct. Too much work to create and support such special tools?

SLIDE 10

The Natural Language Toolkit

Bird et al. (2009)

  • A collection of Python modules for text analysis and various NLP tasks
  • Command-line environment for interactive linguistic exploration
  • Relatively little technical skill required
  • Documented in a very accessible book (Bird et al. 2009) targeted to the “ordinary working linguist”

SLIDE 11

Benefits of the no-frills approach

  • Command line tools are easier to write and maintain. Also better for scripting, creating workflows, etc.
  • An integrated tool is less flexible in the tasks it can carry out. A little scripting unlocks the power of computational techniques.
  • Command line tools still need to be reasonably easy to install and use, and should be well documented at an accessible level.
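
As an illustration of how far "a little scripting" can go, here is a minimal sketch of a keyword-in-context (KWIC) concordance using only the Python standard library. The function name, the tokenization by a simple regular expression, and the sample text are assumptions made for this example; a real project would likely use a proper tokenizer and read from corpus files.

```python
import re


def kwic(text, keyword, width=2):
    """Return keyword-in-context lines with `width` words on each side."""
    # Crude tokenization: lowercase the text and keep runs of word characters.
    tokens = re.findall(r"\w+", text.lower())
    target = keyword.lower()
    lines = []
    for i, token in enumerate(tokens):
        if token == target:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append(f"{left} [{token}] {right}".strip())
    return lines


# Hypothetical sample text.
SAMPLE = ("Every linguist needs a corpus. "
          "A large corpus of English is enough for many linguists.")

for line in kwic(SAMPLE, "corpus"):
    print(line)
```

Because the function returns plain strings, its output can be piped into other command line tools or reused in a larger workflow.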

SLIDE 12

Benefits of the no-frills approach

  • The NLTK and similar resources will still be beyond the reach of linguists unable, or unwilling, to make the required time investment.

  • Is this a big problem? I don’t believe it is.
SLIDE 13

Conclusions I

  • To benefit from NLP, theoretical linguists must incorporate its methods and values in their work habits.
  • Joint research is the best scenario, but is limited in reach. Many needs are simpler.
  • Tool creation is not a strength of theoretical linguists. Making resources available is great, and the tools don’t need to be point-and-click.

SLIDE 14

Conclusions II

  • We theoretical linguists need to help ourselves, by adopting methodological insights and learning to use the associated techniques and tools.
  • Hopefully, computationally savvy theorists will go on to formulate more research questions suitable for real collaboration with computational linguists.