slide-1
SLIDE 1

WebAnno: a flexible, web-based annotation tool for CLARIN

Richard Eckart de Castilho, Chris Biemann, Iryna Gurevych, Seid Muhie Yimam

#WebAnno

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International license. If you are interested in using this material under different conditions, please contact us.

slide-2
SLIDE 2

24.10.2014 | Computer Science Department | UKP Lab | Richard Eckart de Castilho 2

WebAnno – an annotation tool for text

Team tool

Allows a distributed team of annotators to work on a corpus
Supports different roles within the team (e.g. user / manager)

Flexible

Multi-layer annotation with configurable annotation layers
Different annotation modes including correction and learning modes

Web-based

Available to annotators everywhere, no installation effort
All configuration performed through the web interface

Platform independent

Platform independent Java-based application

Open source

Allows the community to participate

slide-3
SLIDE 3

WebAnno – an annotation tool for CLARIN

Developed based on the requirements of CLARIN F-AG 7…

Dipper et al. NoSta-D: A corpus of German non-standard varieties. Non-Standard Data Sources in Corpus-Based Research (2013): 69-76.
Benikova et al. NoSta-D Named Entity Annotation for German: Guidelines and Dataset. Proceedings of LREC. 2014.

… but also used beyond F-AG 7

Pedersen et al. Semantic Annotation of the Danish CLARIN Reference Corpus. Proceedings 10th Joint ISO-ACL SIGSEM Workshop on Interoperable Sem. Annotation. 2014.

… used and recognized beyond CLARIN

Search “WebAnno” on Google Scholar
See our public users mailing list

WebAnno is the first annotation tool to support WebLicht TCF

Worked with TCF developers to improve TCF support for updating files!

WebAnno team is constantly in touch with the community

Visit http://webanno.googlecode.com after the talk to participate in our survey!

slide-4
SLIDE 4

Annotation examples

Part-of-Speech & syntactic dependencies
Co-reference
Named entities

slide-5
SLIDE 5

Main Menu

Annotate texts from scratch
Review and correct previously annotated documents
Employ integrated machine learning capabilities
Compare annotations from different annotators and merge them
Assign workload to annotators and monitor their progress
Create new projects
Create new user accounts

slide-6
SLIDE 6

Workflow of a WebAnno project

[Figure: project workflow diagram; recoverable label: EXPORT FINAL DATASET]

slide-7
SLIDE 7

Curation

highlight sentences with disagreement
display annotators
color-coded agreement
curator’s editor
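The idea of highlighting sentences with disagreement can be illustrated with a minimal sketch. It assumes each annotator's work is a list of sentences, each a list of labels; the function name and data layout are hypothetical and are not WebAnno's internal model.

```python
def sentences_with_disagreement(annotations):
    """Return indices of sentences where annotators' label sequences differ.

    annotations: dict mapping annotator name -> list of sentences,
    where each sentence is a list of labels (hypothetical layout).
    """
    annotators = list(annotations)
    n_sentences = len(annotations[annotators[0]])
    flagged = []
    for i in range(n_sentences):
        # Collect the distinct label sequences produced for sentence i.
        versions = {tuple(annotations[a][i]) for a in annotators}
        if len(versions) > 1:
            flagged.append(i)
    return flagged


example = {
    "anna": [["DET", "NOUN"], ["VERB"]],
    "ben": [["DET", "NOUN"], ["NOUN"]],
}
flagged = sentences_with_disagreement(example)
```

A curation view could then highlight only the flagged sentences, so the curator does not have to review sentences on which all annotators agree.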

slide-8
SLIDE 8

Built-in layers vs. custom layers

WebAnno offers various built-in annotation layers

User can immediately start annotating
Only linguistic layers
Layer semantics are known

Custom layers allow WebAnno to be adapted to unforeseen tasks

Adapt to non-linguistic annotation tasks
Adapt to unforeseen linguistic annotation tasks
Layer semantics are unknown

Import/export of annotated data

Layers with known semantics convert from/to many formats (TCF, CoNLL, …)
Layers with unknown semantics convert from/to generic formats (XMI, …)
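To illustrate why layers with known semantics can be converted to column-based formats, here is a minimal sketch that writes tokens and POS tags in a simplified CoNLL-style layout (one token per line, tab-separated columns, blank line after each sentence). The column set and the function name are assumptions for illustration; this is neither a specific CoNLL variant nor WebAnno's exporter.

```python
def to_conll_like(sentences):
    """Serialize sentences of (form, pos) pairs into a CoNLL-style string.

    Columns (assumed for this sketch): token index, word form, POS tag.
    """
    lines = []
    for sentence in sentences:
        for index, (form, pos) in enumerate(sentence, start=1):
            lines.append(f"{index}\t{form}\t{pos}")
        lines.append("")  # blank line marks the sentence boundary
    return "\n".join(lines)


output = to_conll_like([[("WebAnno", "NNP"), ("works", "VBZ")]])
```

A custom layer with unknown semantics has no agreed-upon column to map to, which is why a generic format is needed for it.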

slide-9
SLIDE 9

Layer types

Existing built-in layers were generalized into three layer types

Span layer – POS, lemma, named entity, …

Relation layer – Syntactic dependencies, …
Attaches to span annotations
Directed, reversible arcs

Chain layer – Co-reference chains, …
Undirected arcs

Layers can be further customized using “behaviours”

Character-based or token-based
Single/multiple token
Crossing of sentence boundaries
Stacking
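The three layer types can be pictured as three small data structures. The sketch below is one possible reading, with hypothetical class and field names; it is not WebAnno's actual type system.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass(frozen=True)
class Span:
    """Span layer: a labeled character range, e.g. a POS tag or named entity."""
    begin: int
    end: int
    label: str


@dataclass(frozen=True)
class Relation:
    """Relation layer: a directed, labeled arc between two span annotations."""
    source: Span
    target: Span
    label: str

    def reversed(self) -> "Relation":
        # "Reversible" is modeled here as swapping the arc's end points.
        return Relation(self.target, self.source, self.label)


@dataclass
class Chain:
    """Chain layer: an ordered list of spans, e.g. a co-reference chain."""
    label: str
    members: List[Span] = field(default_factory=list)

    def links(self) -> List[Tuple[Span, Span]]:
        # Each consecutive pair of members forms one link of the chain.
        return list(zip(self.members, self.members[1:]))


text = "Anna sees her dog"
anna = Span(0, 4, "NNP")
sees = Span(5, 9, "VBZ")
her = Span(10, 13, "PRP$")
subject = Relation(sees, anna, "nsubj")
coref = Chain("coref", [anna, her])
```

In this reading, behaviours such as "token-based" or "crossing of sentence boundaries" would be validation rules applied to the `begin` and `end` offsets of a span.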

slide-10
SLIDE 10

Custom layer examples

Person (span) / Relationship (relation)
Semantic predicates and arguments (span/relation)

slide-11
SLIDE 11

Custom layer examples

Person (span) / Relationship (relation)
Semantic predicates and arguments (span/relation)

slide-12
SLIDE 12

Custom layer configuration

Control behavior
Features
Layers
Controlled vocabulary

slide-13
SLIDE 13

Integrated machine-learning

Annotating data from scratch is more work than correcting
WebAnno learns from pre-annotated data and makes suggestions
Accept suggestions with a single click
Correct suggestions to improve training data
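The learn, suggest and correct loop described above can be sketched with a deliberately simple stand-in learner. The slides do not say which learning method WebAnno uses, so this example substitutes a most-frequent-tag baseline; all function names are hypothetical.

```python
from collections import Counter, defaultdict


def train(tagged_tokens):
    """Learn the most frequent tag per (lowercased) token from annotated data."""
    counts = defaultdict(Counter)
    for token, tag in tagged_tokens:
        counts[token.lower()][tag] += 1
    return {token: c.most_common(1)[0][0] for token, c in counts.items()}


def suggest(model, tokens):
    """Propose a tag for each token; None means no suggestion is available."""
    return [(token, model.get(token.lower())) for token in tokens]


training_data = [("the", "DET"), ("dog", "NOUN"), ("The", "DET")]
model = train(training_data)
first_pass = suggest(model, ["The", "cat"])

# The annotator accepts "DET" and supplies the missing tag for "cat";
# the correction is added to the training data and the model is retrained.
training_data.append(("cat", "NOUN"))
model = train(training_data)
second_pass = suggest(model, ["The", "cat"])
```

Each accepted or corrected suggestion grows the training data, so later suggestions cover more tokens and the annotator has less to correct.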

slide-14
SLIDE 14

Example: Chunking

Chunk training data
Part-of-Speech training data
POS-tagger model
Chunker model
POS tagged text
Chunk suggestions
Chunk-annotated documents

Externally pre-annotated primary data
Data annotated in WebAnno
Externally pre-annotated secondary data

slide-15
SLIDE 15

Automation configuration

Primary training data
Secondary training data
Training data example

slide-16
SLIDE 16

Deploy WebAnno as you need it

personal workstation
on-premise group server
cloud-based group server
CLARIN infrastructure service to come…
migrate projects
click to start
webanno-standalone.jar

slide-17
SLIDE 17

Where we want to go from here…

Extend the scope of WebAnno

Support for slot-based annotation layers (semantic annotations)
Tagset constraints
Support for more built-in linguistic layers

Improve continuously based on user feedback

More efficient annotation interface
Support for additional corpus formats
… your feedback?

Deploy as a CLARIN infrastructure service

CLARIN AAI support
Reduce administrative overhead for operators
Self-service for project managers

slide-18
SLIDE 18

#WebAnno

http://webanno.googlecode.com

Visit me in the demo session!