Matching needs and resources: How NLP can help theoretical linguistics

  1. Matching needs and resources: How NLP can help theoretical linguistics
     Alexis Dimitriadis
     Utrecht institute of Linguistics OTS, Utrecht University

  2. Motivation
     • Insight: Linguistic research could be carried out more efficiently (or with better results) if all that NLP horsepower could be brought to bear. But how?
     • Some linguists have both theoretical and NLP knowledge, and can choose the techniques to apply. But most theoretical linguists are not self-sufficient in this respect:
     • Theoretical linguists should want help from NLP.
     • Computational linguists should want to help.

  3. How NLP could help I: Best-case scenario
     • Collaboration, to tackle a research question interesting to both sides. Examples: language genealogy, learning algorithms (e.g., learning OT constraint ranking).
     • However, suitable research problems are limited.
     • Many theoretical projects have modest (i.e., boring) computational needs. A large corpus of English is enough for a lot of linguists...

  4. Some collaborative research topics
     • Computational cladistics for Indo-European language families (Ringe, Warnow and Taylor 2002)
     • Algorithms for learning constraint rankings for Optimality-Theoretic systems: phonology, stress, syntax... (Tesar and Smolensky 2000)
     • Cognitive modeling of various linguistic phenomena (cf. Workshop 5)
     • Studying the semantics of lexical entailments, using a new custom-developed corpus (Winter et al., ongoing)

  5. How NLP could help II: Lending a helping hand
     • Freebies: Use of existing corpora, parsers etc.
     • Altruism: The computational linguist undertakes to help the theorist. Examples: Linguist’s Search Engine (Resnik et al. 2005), Natural Language Toolkit (Bird et al. 2009).
     Problems: Existing resources are often hard to use for the uninitiated. Altruism is a limited resource. How is it most effective?

  6. How NLP could help III: Learning from NLP methodology
     • Data-driven orientation
     • Reproducibility, inter-annotator consistency
     • Testing against a corpus of data
     • Evaluation metrics
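
Inter-annotator consistency can be made concrete with a chance-corrected agreement score. The slides do not name a particular metric; the sketch below assumes Cohen's kappa for two annotators, and the function name `cohen_kappa` and the grammaticality judgments are invented for illustration.

```python
from collections import Counter

def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    # Observed agreement: share of items given the same label by both.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement, from each annotator's own label distribution.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[lab] * freq_b[lab] for lab in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used one identical label throughout
    return (observed - expected) / (1 - expected)

# Hypothetical data: two annotators mark six sentences as
# grammatical ("ok") or ungrammatical ("*").
annotator_1 = ["ok", "ok", "*", "ok", "*", "*"]
annotator_2 = ["ok", "*", "*", "ok", "*", "ok"]
kappa = cohen_kappa(annotator_1, annotator_2)
print(round(kappa, 3))
```

A kappa near 1 indicates agreement well above chance, while a value near 0 indicates agreement no better than chance; reporting such a figure alongside collected judgments is one way to address the reproducibility point above.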

  7. Utilizing concrete resources and technical stumbling blocks
     • Many existing, boring tools and resources would be useful to a theoretical linguist: corpora, parsers, web crawlers...
     • However, linguists typically lack the necessary technical know-how, compilers, or even knowledge that such tools exist.

  9. The Linguist’s Search Engine (Resnik and Elkiss 2005)
     • Searching the web by word, POS and syntactic structure. Search engine results are parsed and filtered with enriched queries.
     • Stored core corpus for fast results; supplemented with real-time crawling and parsing if desired.
     • User-friendly, easy-to-use web interface, designed for the theoretical linguist. Graphical query-by-example query construction.
     • Defunct. Too much work to create and support such special tools?
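
To give a feel for what searching by word and POS involves, here is a minimal sketch over an already-tagged sentence. It is not the Linguist's Search Engine's implementation and it ignores syntactic structure entirely; the function `find_pattern`, the query format, and the hand-tagged sentence are all assumptions made for illustration.

```python
def find_pattern(tagged_sentence, pattern):
    """Yield start indices where consecutive tokens satisfy `pattern`.

    Each pattern element is a (word, tag) pair; None means "anything".
    Words are compared case-insensitively, tags exactly.
    """
    span = len(pattern)
    for start in range(len(tagged_sentence) - span + 1):
        window = tagged_sentence[start:start + span]
        if all((w is None or w == tok.lower()) and (t is None or t == tag)
               for (tok, tag), (w, t) in zip(window, pattern)):
            yield start

# Hypothetical hand-tagged sentence (Penn Treebank-style tags).
sentence = [("She", "PRP"), ("gave", "VBD"), ("him", "PRP"),
            ("the", "DT"), ("book", "NN"), ("and", "CC"),
            ("gave", "VBD"), ("the", "DT"), ("dog", "NN"),
            ("a", "DT"), ("bone", "NN")]

# Query: the word "gave" followed directly by a determiner and a noun.
query = [("gave", None), (None, "DT"), (None, "NN")]
matches = list(find_pattern(sentence, query))
print(matches)
```

Mixing word and tag constraints in one query is what lets a linguist ask for a construction rather than a string, which a plain web search cannot do.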

  10. The Natural Language Toolkit (Bird et al. 2009)
      • A collection of Python modules for text analysis and various NLP tasks
      • Interactive command-line environment for linguistic exploration
      • Relatively little technical skill required
      • Documented in a very accessible book (Bird et al. 2009) targeted to the “ordinary working linguist”

  12. Benefits of the no-frills approach
      • Command line tools are easier to write and maintain. Also better for scripting, creating workflows, etc.
      • An integrated tool is less flexible in the tasks it can carry out. A little scripting unlocks the power of computational techniques.
      • Command line tools still need to be reasonably easy to install and use, and should be well documented at an accessible level.
      • The NLTK and similar resources will still be beyond the reach of linguists unable, or unwilling, to make the required time investment.
      • Is this a big problem? I don’t believe it is.
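
The claim that a little scripting goes a long way can be illustrated with a keyword-in-context (KWIC) concordance in a dozen lines. This is a sketch using only the Python standard library, not code from the NLTK; the function `concordance`, its `width` parameter, and the sample text are assumptions for illustration.

```python
import re

def concordance(text, keyword, width=30):
    """Return keyword-in-context lines for each match of `keyword`."""
    lines = []
    # Whole-word, case-insensitive match of the literal keyword.
    pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
    for match in pattern.finditer(text):
        left = text[max(0, match.start() - width):match.start()]
        right = text[match.end():match.end() + width]
        # Right-align the left context so the keywords line up.
        lines.append(f"{left:>{width}}[{match.group()}]{right}")
    return lines

sample = ("The linguist wrote a script. The script searched the corpus, "
          "and the corpus answered the question.")
hits = concordance(sample, "corpus", width=20)
for line in hits:
    print(line)
```

Because the result is an ordinary list of strings, it can be filtered, counted, or written to a file by the next step of a workflow, which is the flexibility the slide attributes to command line tools.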

  13. Conclusions I
      • To benefit from NLP, theoretical linguists must incorporate its methods and values in their work habits.
      • Joint research is the best scenario, but is limited in reach. Many needs are simpler.
      • Tool creation is not a strength of theoretical linguists. Making resources available is great, and the tools don’t need to be point-and-click.

  14. Conclusions II
      • We theoretical linguists need to help ourselves, by adopting methodological insights and learning to use the associated techniques and tools.
      • Hopefully, computationally savvy theorists will go on to formulate more research questions suitable for real collaboration with computational linguists.
