The NITE XML Toolkit

SLIDE 1

The NITE XML Toolkit

Jonathan Kilgour and Jean Carletta

University of Edinburgh

Dialogue Interest Group Dec 2009

Kilgour&Carletta NXT

SLIDE 2

A toy example of linguistic data

SLIDE 3

NITE XML Toolkit

Open source toolkit for handling annotations with temporal ordering and full structural relations

Data storage format designed to support distributed corpus development

Libraries for data handling, query, and writing graphical user interfaces

Configurable end-user browsing and annotation tools for common tasks

Command line utilities for analysis, feature extraction

SLIDE 4

[Figure: the toy example on a timeline, t (s) from 47.0 to 49.0, for the utterance "have to deal with it the the government doesn't". Aligned annotation layers include phones (ph), syllables (syl), words with part-of-speech tags, phonwords with timings (e.g. "the" 47.48-47.61, "doesn't" 47.96-48.18), syntactic non-terminals (nt), prosodic phrases (minor, major), accents (nuclear, plain), disfluency and repair (reparandum), movement traces (source, target), markables, kontrast (contrast, backgd), and dialogue acts (da statement).]

SLIDE 5

[Figure: the same annotated timeline as on slide 4, t (s) from 47.0 to 49.0.]

SLIDE 6

Community support

Stand-off annotation using multiple files under version control

Dependency structure for keeping track of which annotations rely on which versions of which other annotations

Multiple competing annotations for the same thing (different humans for a reliability assessment, different automatic processes for a competition)

Logical query language, because this is the only way to analyse this kind of data
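
To make the stand-off idea concrete, here is a minimal Python sketch, not NXT's actual file format or API. It assumes two hypothetical layers that would normally live in separate files: timed phonwords, and words that point at a phonword by id. The element names, attribute names, ids, and the grouping of "does" and "n't" under "doesn't" are illustrative assumptions; the timings and tags are the ones shown in the toy example. The function at the end stands in for a query that has to cross both layers.

```python
import xml.etree.ElementTree as ET

# Hypothetical layer 1: timed phonwords (timings from the toy example).
PHONWORDS_XML = """
<phonwords>
  <phonword id="pw1" start="47.48" end="47.61">the</phonword>
  <phonword id="pw2" start="47.96" end="48.18">doesn't</phonword>
</phonwords>
"""

# Hypothetical layer 2: words with POS tags. Each word refers to its
# phonword by id instead of repeating the timing (stand-off annotation).
WORDS_XML = """
<words>
  <word id="w1" pos="DT" phon="pw1">the</word>
  <word id="w2" pos="VBZ" phon="pw2">does</word>
  <word id="w3" pos="RB" phon="pw2">n't</word>
</words>
"""


def load_layers():
    """Parse both layers and index phonwords by id for pointer lookup."""
    phonwords = {p.get("id"): p for p in ET.fromstring(PHONWORDS_XML)}
    words = list(ET.fromstring(WORDS_XML))
    return phonwords, words


def words_with_pos_in_window(pos, start, end):
    """Return (text, start, end) for words with the given POS tag whose
    phonword lies entirely inside the time window [start, end]."""
    phonwords, words = load_layers()
    hits = []
    for w in words:
        p = phonwords[w.get("phon")]  # follow the stand-off pointer
        p_start, p_end = float(p.get("start")), float(p.get("end"))
        if w.get("pos") == pos and p_start >= start and p_end <= end:
            hits.append((w.text, p_start, p_end))
    return hits


if __name__ == "__main__":
    print(words_with_pos_in_window("VBZ", 47.9, 48.2))
```

Because the word layer only stores pointers, the timing layer can be revised or replaced without touching the part-of-speech annotation, which is the property the version-control and dependency points above rely on.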

SLIDE 7

What’s wrong with NXT

Flexibility makes it harder to just start using it

  • need to formally describe corpus structure
  • some users struggle with logic

No indexing of locations within still images or video frames

Not enough packaging (connection to automatic tools, authoring corpus structure description)

Not "sold" enough, not known very well in America

SLIDE 8

Butterflies: deixis

SLIDE 9

Butterflies: Bible studies

SLIDE 10

Butterflies: movie review analysis

SLIDE 11

Butterflies: dialogue system strategy

SLIDE 12

Butterflies: eyetracking

SLIDE 13

Butterflies: eyetracking

SLIDE 14

Flock of birds

SLIDE 15

Google Earth mashup
