
SLIDE 1

Teaching “Unstructured Information Management: Theory and Applications” to Computational Linguistics Students

Iryna Gurevych, Christof Müller, Torsten Zesch Ubiquitous Knowledge Processing Group Telecooperation, Computer Science Department Darmstadt University of Technology

SLIDE 2

Typical NLP course

  • Project topic
      • Yet another tokenizer

  • Project results
      • Unstable software
      • Works only under special preconditions
      • Hard-coded configuration
          • “The software has to be installed in directory foo”
          • “The name of the input file has to be foobar”
SLIDE 3

Goals of our NLP course

  • Teach the basics of unstructured information management
  • Separate software engineering from NLP
      • Provide a framework and preprocessing components
  • Enable students to:
      • Concentrate on the computational linguistics part
      • Work on more challenging/motivating tasks

Using UIMA to reach these goals

SLIDE 4

Course outline

  • Compact seminar
      • 6 sessions, 4 hours each
  • Sessions
      • 1. Lecture
      • 2. UIMA basics
      • 3. Annotators
      • 4. Consumers & Readers
      • 5. CPEs & PEAR packages
      • 6. Wrap up, Q&A
  • Student projects
  • Course requirements (MA level)
      • Participation
      • Implement a practical project
      • Deliver results as PEAR package
      • Write a course paper

SLIDE 5

Student projects

  • Suitable tasks were defined in collaboration with the lecturers
  • Selected projects:
      • Annotating Wikipedia articles
      • Extracting lexical semantic information from blogs
      • Named entity recognition
      • Sentiment detection
      • Word sense disambiguation

SLIDE 6

Annotating Wikipedia Articles

  • Annotate structural elements in Wikipedia articles
      • Sections, paragraphs, lists, bold terms, ...
  • Visualize annotations
  • Wikipedia API is provided to retrieve articles

Components:

  • Wikipedia article reader (UIMA reader)
  • Structural elements annotator (UIMA analysis engine)
  • Visualizer (UIMA consumer)
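
The reader, analysis engine, and consumer roles on this slide can be illustrated with a minimal sketch in plain Python. It does not use the UIMA API: `Document`, `Annotation`, `read_articles`, `annotate_structure`, and `render` are hypothetical names, and the regular expressions assume simple wiki markup (`== Heading ==` for sections, `'''term'''` for bold terms). The point is only that annotations are stored as typed character spans alongside the unchanged text.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Annotation:
    kind: str   # e.g. "Section" or "BoldTerm" (hypothetical type names)
    begin: int  # character offset where the span starts
    end: int    # character offset just past the span


@dataclass
class Document:
    text: str
    annotations: list = field(default_factory=list)


def read_articles(raw_articles):
    """Reader role: wrap each raw wiki-markup string in a Document."""
    for raw in raw_articles:
        yield Document(text=raw)


def annotate_structure(doc):
    """Analysis role: mark section headings and bold terms by offset."""
    for m in re.finditer(r"^==+[^=\n]+==+[ \t]*$", doc.text, flags=re.MULTILINE):
        doc.annotations.append(Annotation("Section", m.start(), m.end()))
    for m in re.finditer(r"'''(.+?)'''", doc.text):
        doc.annotations.append(Annotation("BoldTerm", m.start(1), m.end(1)))
    return doc


def render(doc):
    """Consumer role: list each annotation with the text it covers."""
    return [f"{a.kind}: {doc.text[a.begin:a.end]}" for a in doc.annotations]


sample = "'''UIMA''' is a framework.\n== History ==\nSome text."
for doc in read_articles([sample]):
    print("\n".join(render(annotate_structure(doc))))
```

Because each stage only reads or extends the shared document object, a stage can be swapped out (for example, a different visualizer) without touching the others.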

SLIDE 7

Lexical Semantic Information from Blogs

  • Analyze blogs
  • Find keywords
  • Detect semantic relations between keywords

Desired output: [figure not preserved]

SLIDE 8

Lexical Semantic Information from Blogs

UIMA components as proposed by the students.

SLIDE 9

Named Entity Recognition

  • Hybrid approach: rules + gazetteers
  • Preprocessing components were provided
  • GermaNet and Wikipedia are accessed as UIMA resources
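
A hybrid of rules and gazetteers can be sketched as follows. This is an illustration only, not the students' implementation: the gazetteer entries, the single title-based person rule, and the function name `recognize_entities` are all assumptions, and no UIMA, GermaNet, or Wikipedia access is involved.

```python
import re

# Hypothetical toy gazetteer; a real one would be far larger.
GAZETTEER = {
    "Darmstadt": "LOCATION",
    "Tübingen": "LOCATION",
    "Wikipedia": "ORGANIZATION",
}

# Hypothetical rule: a title followed by capitalized words marks a person.
PERSON_RULE = re.compile(
    r"\b(?:Prof\.|Dr\.|Herr|Frau)\s+"
    r"((?:[A-ZÄÖÜ][\w-]+)(?:\s+[A-ZÄÖÜ][\w-]+)*)"
)


def recognize_entities(text):
    """Return (begin, end, label, surface) tuples sorted by offset."""
    entities = []
    # Rule pass: pattern-based matches are collected first.
    for m in PERSON_RULE.finditer(text):
        entities.append((m.start(1), m.end(1), "PERSON", m.group(1)))
    # Gazetteer pass: exact lookups, skipped where a rule already matched.
    for name, label in GAZETTEER.items():
        for m in re.finditer(r"\b" + re.escape(name) + r"\b", text):
            overlaps = any(m.start() < e[1] and e[0] < m.end() for e in entities)
            if not overlaps:
                entities.append((m.start(), m.end(), label, name))
    return sorted(entities)


for entity in recognize_entities("Prof. Erhard Hinrichs visited Darmstadt."):
    print(entity)
```

Letting the rule pass take precedence over the gazetteer is one possible conflict policy; the reverse order is equally plausible.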
SLIDE 10

Sentiment Detection

  • Detect sentiment expressions and link them with the judged entity
  • Preprocessing components were provided
  • A robust NER component is required, but not yet available for UIMA
  • Used the GATE-UIMA interoperability layer to integrate the ANNIE tool

Components:

  • Text input reader (UIMA reader)
  • Sentiment Detector (UIMA analysis engine)
  • Result writer (UIMA consumer)
  • NER (GATE component, connected through UIMA-GATE and GATE-UIMA)
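
Linking a sentiment expression with the entity it judges can be sketched with a nearest-entity heuristic. This is an assumed simplification, not the method used in the project: the lexicon `SENTIMENT_LEXICON`, the function `link_sentiment`, and the idea of passing entity token positions in directly (standing in for the output of an NER step) are all hypothetical.

```python
# Hypothetical toy lexicon mapping sentiment words to a polarity.
SENTIMENT_LEXICON = {
    "great": "positive",
    "excellent": "positive",
    "poor": "negative",
    "terrible": "negative",
}


def link_sentiment(tokens, entity_positions):
    """Pair each sentiment word with the closest entity token.

    tokens: list of word strings
    entity_positions: token indexes that an NER step marked as entities
    """
    links = []
    for i, token in enumerate(tokens):
        polarity = SENTIMENT_LEXICON.get(token.lower())
        if polarity is None or not entity_positions:
            continue
        # Choose the entity with the smallest token distance.
        nearest = min(entity_positions, key=lambda j: abs(j - i))
        links.append((token, polarity, tokens[nearest]))
    return links


tokens = "The camera is great but the battery is terrible".split()
print(link_sentiment(tokens, entity_positions=[1, 6]))
```

Token distance is a crude proxy for the judged entity; it makes clear why the quality of the upstream NER component matters for this task.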

SLIDE 11

Word Sense Disambiguation

  • Implements the WSD approach by Patwardhan and Pedersen (2006)
  • Necessary word glosses are generated using GermaNet
  • GermaNet is accessed as a UIMA resource
  • Preprocessing components were provided

Components:

  • Text input reader (UIMA reader)
  • Provided preprocessing components and WSD (UIMA analysis engines)
  • Result writer (UIMA consumer)
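
To show why glosses are needed for disambiguation, here is a deliberately simplified gloss-overlap sketch. It is a stand-in, not the approach of the cited paper: it just counts words shared between the context and each sense's gloss. The English glosses, the sense labels, and the function `disambiguate` are hypothetical; the course project generated its glosses from GermaNet.

```python
# Hypothetical glosses for two senses of "bank".
GLOSSES = {
    "bank#finance": "institution that accepts deposits and lends money",
    "bank#river": "sloping land beside a body of water such as a river",
}


def disambiguate(context, glosses):
    """Pick the sense whose gloss shares the most words with the context."""
    context_words = set(context.lower().split())

    def overlap(sense):
        return len(context_words & set(glosses[sense].lower().split()))

    # sorted() makes tie-breaking deterministic (alphabetically first sense).
    return max(sorted(glosses), key=overlap)


print(disambiguate("she walked along the river to the water", GLOSSES))
```

A practical version would at least filter stop words, since function words shared by chance can dominate the overlap count.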

SLIDE 12

Lessons Learned

  • Advantages of using UIMA
      • Provides necessary preprocessing tools
      • Enables more challenging/motivating tasks
      • Uniform structure of project results (PEAR package)
      • Students can concentrate on their core competences
      • Focus is on modeling rather than programming

  • Challenges
      • Complexity of the UIMA architecture
      • Motivating students

  • Possible solution
      • Provide a preconfigured work environment vs. learn UIMA

SLIDE 13

Acknowledgments

  • Prof. Erhard Hinrichs for his idea to offer the course
  • Participating ISCL students: Jonathan Khoo, Niels Ott, Sladjana Pavlovic, Maria Tchalakova, Bela Usabaev, Desislava Zhekova, Ramon Ziai

http://www.ukp.tu-darmstadt.de/

Thank you very much!