Resources for Computational Linguistics. Annotation Tools: RSTTool & MMAX



SLIDE 1

Resources for Computational Linguistics

Annotation Tools: RSTTool & MMAX

Presentation by Ekaterina Biehl

SLIDE 2

In the Last Session

  • Corpus Linguistics;
  • Corpus;
  • Unannotated;
  • Annotated;
  • Annotation;
  • Levels of Annotation;
  • POS Tagging;
  • Grammatical Parsing;
  • Semantic Tagging;
  • Discoursal and Text Annotation (RST, Discourse Tags, Anaphoric Annotation);
  • Prosodic Annotation ...

SLIDE 3

So, we need annotation tools!

SLIDE 4

Annotation Tools

What is important?

  • should be able to do your task;
  • speed, stability, and practical usability;
  • ready and easy to use;
  • standardized input/output format (XML).

SLIDE 5

Today's Session

  1. Text analysis => RSTTool;
  2. Multi-level annotation => MMAX;
  3. Examples.

SLIDE 6

Questions

  1. What is to be annotated?
  2. What are the markables?
  3. What are the guidelines and the annotation scheme?

SLIDE 7

RSTTool

Michael O'Donnell

SLIDE 8

RSTTool

A graphical interface that facilitates marking up the RST structure of a text:

  • segmentation of the text;
  • graphical linking of the segments into an RST tree.

SLIDE 9

RST = Rhetorical Structure Theory

  • offers an explanation of the coherence of texts, i.e. of texts with no gaps and no non sequiturs;
  • describes text structure by means of „building blocks“ at two levels:
      • principal level: „nuclearity“ and „relations“;
      • second level: schemas.

SLIDE 10

„Nuclearity“

  • Mononuclear: Nucleus => Satellite
  • Multi-nuclear: Span = Other Span
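
The nuclearity distinction above can be made concrete with a small data structure. The following is only a minimal sketch and not RSTTool's internal representation: the class names, the helper `nucleus_text`, and the example sentences are assumptions introduced for illustration. A mononuclear relation holds one nucleus plus a satellite, while a multinuclear relation holds several spans of equal status.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class Segment:
    """A leaf of the tree: one segmented unit of text."""
    text: str


@dataclass
class Relation:
    """A relation over child spans (hypothetical representation).

    Mononuclear: one nucleus with a satellite attached to it.
    Multinuclear: two or more nuclei of equal status, no satellite.
    """
    name: str
    nuclei: List["Node"]
    satellites: List["Node"] = field(default_factory=list)

    def is_multinuclear(self) -> bool:
        return len(self.nuclei) > 1 and not self.satellites


Node = Union[Segment, Relation]


def nucleus_text(node: Node) -> str:
    """Follow only the nuclei down to the leaves and join their text."""
    if isinstance(node, Segment):
        return node.text
    return " ".join(nucleus_text(child) for child in node.nuclei)


# Invented example analysis: an "evidence" relation whose satellite is
# itself a multinuclear "list" of two segments.
tree = Relation(
    name="evidence",
    nuclei=[Segment("The tool saves time.")],
    satellites=[
        Relation(
            name="list",
            nuclei=[
                Segment("Segments are marked by clicking."),
                Segment("Links are drawn by dragging."),
            ],
        )
    ],
)

print(tree.is_multinuclear())                # False
print(tree.satellites[0].is_multinuclear())  # True
print(nucleus_text(tree))                    # The tool saves time.
```

Keeping nuclei and satellites in separate lists makes the mononuclear/multinuclear check a simple property of the node; other encodings (for example, a role label on each child) would work equally well.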

SLIDE 11

Rhetorical Structure Theory

Example of the analysis tree:

SLIDE 12

RSTTool

Example of the annotation

SLIDE 13

RSTTool: Summary

  1. Annotation tool for a particular purpose;
  2. Tree visualisation;
  3. Graphical interface;
  4. Analysis time reduction;
  5. Statistics.

SLIDE 14

MMAX

was developed at EML Research, Heidelberg, by

  • Christoph Müller
  • Dr. Michael Strube

SLIDE 15

MMAX

  • „light-weight and highly customizable annotation tool“ (Müller & Strube 2001a, 2001b, 2003);
  • supports the multi-level annotation of (potentially multi-modal) corpora;
  • based on the concept of markables carrying attributes and standing in certain relations to each other.

SLIDE 16

Concepts: Markable

  • carries the annotation information;
  • can be defined on arbitrary levels of linguistic annotation;
  • is an entity that can consist of arbitrary sets of elements from the base data.

SLIDE 17

Concepts: Markable II

  • can represent multiple levels of linguistic description;
  • can be overlapping or discontinuous;
  • follows the principle of stand-off annotation.
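
The markable concept can be sketched in a few lines. This is a minimal illustration under stated assumptions, not MMAX's actual data model: the `Markable` class, the level names, and the example sentence are invented here. The point is that the tokens are stored once and never changed, while each markable only refers to token positions, which is what allows overlapping and discontinuous markables.

```python
from dataclasses import dataclass
from typing import FrozenSet, List

# Base data: the tokens are stored once and are never modified by annotation.
tokens: List[str] = ["Peter", "called", "his", "old", "friend", "up", "."]


@dataclass(frozen=True)
class Markable:
    """Hypothetical stand-off markable: a level name plus a set of token positions."""
    level: str
    token_ids: FrozenSet[int]

    def text(self) -> str:
        return " ".join(tokens[i] for i in sorted(self.token_ids))

    def is_discontinuous(self) -> bool:
        ids = sorted(self.token_ids)
        return ids[-1] - ids[0] + 1 != len(ids)

    def overlaps(self, other: "Markable") -> bool:
        return bool(self.token_ids & other.token_ids)


noun_phrase = Markable("coref", frozenset({2, 3, 4}))  # "his old friend"
pronoun = Markable("coref", frozenset({2}))            # "his"
verb = Markable("syntax", frozenset({1, 5}))           # "called ... up"

print(verb.text())                    # called up
print(verb.is_discontinuous())        # True
print(noun_phrase.overlaps(pronoun))  # True
print(noun_phrase.overlaps(verb))     # False
```

Because the annotation lives apart from the text, adding a further level means adding more markables; the token list itself stays untouched.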

SLIDE 18

Concepts: Attributes

  • markables can have arbitrarily many attributes (name-value pairs);
  • nominal attributes have a closed set of possible values.
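
A closed value set is easy to check mechanically. The sketch below assumes a hypothetical scheme fragment; the attribute names, the value sets, and the function `invalid_attributes` are invented for illustration and are not taken from MMAX.

```python
from typing import Dict, List, Set

# Hypothetical scheme fragment: each nominal attribute lists its allowed values.
NOMINAL_ATTRIBUTES: Dict[str, Set[str]] = {
    "np_form": {"pronoun", "proper_noun", "definite_np", "indefinite_np"},
    "grammatical_role": {"subject", "object", "other"},
}


def invalid_attributes(attributes: Dict[str, str]) -> List[str]:
    """Return the names of attributes that are unknown or whose value
    lies outside the closed set, in alphabetical order."""
    return sorted(
        name
        for name, value in attributes.items()
        if value not in NOMINAL_ATTRIBUTES.get(name, set())
    )


good = {"np_form": "pronoun", "grammatical_role": "subject"}
bad = {"np_form": "pronoun", "grammatical_role": "topic", "colour": "red"}

print(invalid_attributes(good))  # []
print(invalid_attributes(bad))   # ['colour', 'grammatical_role']
```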

SLIDE 19

Concepts: Relations

  • relations between markables:
      • member relation: markables having the same value in an attribute;
      • pointer relation: directed relations between a source markable and arbitrarily many target markables.
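
The two relation types can be illustrated as follows. This is a sketch under assumptions: the markable ids, the attribute name `coref_set`, and the storage as plain dictionaries are all hypothetical. A member relation is derived by grouping markables on a shared attribute value; a pointer relation is stored as a directed mapping from one source to a list of targets.

```python
from collections import defaultdict
from typing import Dict, List

# Markables as plain attribute dictionaries (ids and attribute names invented).
markables: Dict[str, Dict[str, str]] = {
    "m1": {"text": "Peter", "coref_set": "set_1"},
    "m2": {"text": "he", "coref_set": "set_1"},
    "m3": {"text": "Mary", "coref_set": "set_2"},
    "m4": {"text": "they", "coref_set": "set_3"},
}

# Pointer relation: one source markable points to arbitrarily many targets.
pointers: Dict[str, List[str]] = {"m4": ["m1", "m3"]}


def member_sets(attribute: str) -> Dict[str, List[str]]:
    """Group markable ids by their value for the given attribute."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for markable_id, attrs in markables.items():
        if attribute in attrs:
            groups[attrs[attribute]].append(markable_id)
    return dict(groups)


print(member_sets("coref_set"))
# {'set_1': ['m1', 'm2'], 'set_2': ['m3'], 'set_3': ['m4']}
print([markables[t]["text"] for t in pointers["m4"]])
# ['Peter', 'Mary']
```

The member relation is undirected and needs no extra storage beyond the attribute itself, whereas the pointer relation keeps its direction explicitly.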

SLIDE 20

Guideline => Annotation Scheme

  • Guideline = instruction
  • Annotation scheme = formal guideline:
      • describes which phenomena are to be annotated using which set of attributes;
      • defines all attributes for a linguistic level.

SLIDE 21

MMAX: the Tool

  • is written in Java;
  • uses XML for its data files;
  • consists of a main annotation window, a search window and an attribute window.

SLIDE 22

EXtensible Markup Language

  • XML is a markup language much like HTML;
  • XML was designed to describe data and to focus on what data is;
  • XML tags are not predefined. You must define your own tags;
  • XML does not DO anything. XML was created to structure, store and send information.
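
As a small illustration of self-defined tags, the sketch below reads an invented XML fragment with Python's standard `xml.etree.ElementTree` module. The tag and attribute names (`annotation`, `markable`, `span`, `np_form`) are assumptions made up for this example and are not the actual MMAX file format.

```python
import xml.etree.ElementTree as ET

# Invented document: these tag and attribute names are NOT the real MMAX
# format; they only show that XML lets you define your own vocabulary.
xml_text = """
<annotation level="coref">
  <markable id="m1" span="w1" np_form="proper_noun"/>
  <markable id="m2" span="w3" np_form="pronoun"/>
</annotation>
""".strip()

root = ET.fromstring(xml_text)

# The XML itself "does" nothing; the program decides what the tags mean.
forms = {m.get("id"): m.get("np_form") for m in root.iter("markable")}

print(root.tag, root.get("level"))  # annotation coref
print(forms)                        # {'m1': 'proper_noun', 'm2': 'pronoun'}
```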

SLIDE 23

MMAX: the Tool

Example of the annotation

SLIDE 24

MMAX: Summary

  • annotation as a set of simple concepts based on the notion of „markable“;
  • almost any kind of annotation can be done;
  • multiple levels;
  • stand-off annotation;
  • can express highly customizable annotation schemes;
  • is compatible with the ISO standard (ISO TC37 SC4).

SLIDE 25

Conclusion:

RSTTool:

  • Rhetorical Structure Theory;
  • tree visualization;
  • dedicated annotation tool.

MMAX:

  • flexible tool for almost any kind of annotation;
  • annotation refers to markables;
  • simple and customisable.

SLIDE 26

References:

Corpus Linguistics- Annotation:

  • http://www.coli.uni-saarland.de/courses/korbay/Complingres/Slides/corpora.pdf
  • http://bowland-files.lancs.ac.uk/monkey/ihe/linguistics/contents.htm
  • http://coli.lili.uni-bielefeld.de/forschung/xbrac/pdf/xbrac-dipperetal-sfb.pdf

RSTTool:

  • http://www.wagsoft.com/RSTTool
  • http://www.sfu.ca/rst

MMAX:

  • http://www.eml-research.de/english/research/nlp/download/sigdial03.pdf
  • http://www.eml-research.de/english/research/nlp/download/mmax.php

XML:

  • http://www.w3schools.com/

ISO:

  • http://www.tc37sc4.org

TEI:

  • http://www.tei-c.org/