SLIDE 1

Multi-Source Adjustment of Multi-Layer Annotation: the Bits of Wisdom Approach

Kilian Evang 20 January 2012

http://gmb.let.rug.nl

SLIDE 2

The Goal

◮ Groningen Meaning Bank (GMB) project: build a corpus of 100,000 semantically annotated texts
◮ manual annotation too expensive
◮ bootstrapping approach:
  ⊲ use state-of-the-art NLP toolchain to produce first approximation
  ⊲ collect data from various sources to incrementally correct and refine annotation

SLIDE 8

Bits of Wisdom

◮ not changes (diffs, patches, corrections), but
◮ assertions (facts, constraints), e.g.
  ⊲ there is a token boundary at character offset 5
  ⊲ the POS tag of the token between offsets 4 and 7 is MD
◮ not necessarily correct
◮ encode expert wisdom, collective wisdom, or automatic wisdom
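
One way to picture such assertions as data is a pair of small immutable records. This is only a sketch: the class names, the field names, and the `source` field are assumptions made for illustration, not the GMB database schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BoundaryBow:
    """Asserts that a boundary does ("+") or does not ("-") exist at an offset."""
    level: str     # "token" or "sentence"
    polarity: str  # "+" or "-"
    offset: int    # character offset in the raw text
    source: str    # hypothetical field: "expert", "crowd" or "tool"


@dataclass(frozen=True)
class TagBow:
    """Asserts the tag of the token spanning [start, end) on one layer."""
    layer: str     # e.g. "pos"
    tag: str       # e.g. "MD"
    start: int
    end: int
    source: str    # hypothetical field, as above


# The two example assertions from the slide, each as its own record.
bows = [
    BoundaryBow("token", "+", 5, source="expert"),
    TagBow("pos", "MD", 4, 7, source="crowd"),
]
```

Because a record states a fact rather than an edit, it can be stored once and re-applied whenever the toolchain output changes.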

SLIDE 9

Boundary Bows

◮ applicable output type: tokenized text
◮ boundary(token, +, 152)
  ⊲ read: there is a token boundary at character offset 152
  ⊲ ... the Popular Movement for the Liberation of Angola ( MPLA ), led by ...
◮ boundary(sentence, −, 179)
  ⊲ read: there is no sentence boundary at character offset 179
  ⊲ Macrumors.com , a website that ...
◮ application: insert or remove one token or sentence boundary if needed
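
The application step can be sketched as follows, assuming that a tokenization is represented as a set of character offsets at which boundaries occur. That representation and the function names `apply_boundary_bow` and `split_at` are illustrative assumptions; whitespace handling is ignored.

```python
from typing import List, Set


def apply_boundary_bow(boundaries: Set[int], polarity: str, offset: int) -> Set[int]:
    """Return a copy of `boundaries` that agrees with the assertion.

    "+" inserts the boundary if it is missing, "-" removes it if present.
    If the output already agrees with the assertion, nothing changes.
    """
    updated = set(boundaries)
    if polarity == "+":
        updated.add(offset)
    elif polarity == "-":
        updated.discard(offset)
    else:
        raise ValueError(f"unknown polarity: {polarity!r}")
    return updated


def split_at(text: str, boundaries: Set[int]) -> List[str]:
    """Cut `text` at every boundary offset that lies strictly inside it."""
    cuts = sorted(b for b in boundaries if 0 < b < len(text))
    edges = [0] + cuts + [len(text)]
    return [text[a:b] for a, b in zip(edges, edges[1:])]


text = "MPLA)"
boundaries: Set[int] = set()                                 # tool output: one token
with_split = apply_boundary_bow(boundaries, "+", 4)          # assert a boundary
without_split = apply_boundary_bow(with_split, "-", 4)       # assert its absence
```

Here `split_at(text, with_split)` yields the two tokens `MPLA` and `)`, while `split_at(text, without_split)` yields the original single token.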

SLIDE 10

Tag Bows

◮ applicable output type: tokenized and tagged text
◮ Example Bow: tag(pos, VBZ, 616, 623)
  ⊲ read: the token between character offsets 616 and 623 has the POS tag VBZ
◮ application: change the tag if needed; do nothing if the token does not exist
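
A minimal sketch of this application rule, assuming tokens are records with character offsets and a per-layer tag dictionary. The `Token` class and `apply_tag_bow` are hypothetical names, not part of the GMB toolchain.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Token:
    start: int
    end: int
    tags: Dict[str, str] = field(default_factory=dict)  # layer name -> tag


def apply_tag_bow(tokens: List[Token], layer: str, tag: str,
                  start: int, end: int) -> bool:
    """Set the tag of the token spanning exactly [start, end).

    Returns True if such a token exists (its tag now matches the assertion)
    and False if no token has that span, in which case nothing is changed.
    """
    for token in tokens:
        if token.start == start and token.end == end:
            token.tags[layer] = tag
            return True
    return False


tokens = [Token(616, 623, {"pos": "NNS"})]
applied = apply_tag_bow(tokens, "pos", "VBZ", 616, 623)   # span exists
skipped = apply_tag_bow(tokens, "pos", "VB", 616, 620)    # no such token
```

The second call leaves the token list untouched, which is what lets a tag assertion survive a later change in tokenization without causing damage.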

SLIDE 11

How Bows are created

◮ GMB Explorer: editing interface for experts, edits go straight to Bow DB, toolchain re-runs on save
◮ Wordrobe: multiple choice, answers generate Bow by majority vote
◮ External tools: scripts extract Bows from tool output
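
The majority-vote step for multiple-choice answers might look like the sketch below. The slide does not say how ties or small samples are handled, so the strict-majority threshold is an assumption, and `majority_answer` is a hypothetical name.

```python
from collections import Counter
from typing import Optional, Sequence


def majority_answer(answers: Sequence[str]) -> Optional[str]:
    """Return the answer chosen by more than half of the players, else None.

    Assumption: only a strict majority produces an assertion; ties and
    empty answer lists produce nothing.
    """
    if not answers:
        return None
    answer, count = Counter(answers).most_common(1)[0]
    return answer if count * 2 > len(answers) else None


clear_vote = majority_answer(["VBZ", "VBZ", "NNS"])
tied_vote = majority_answer(["VBZ", "NNS"])
```

In this sketch `clear_vote` is `"VBZ"` and could be turned into a tag assertion, while `tied_vote` is `None` and generates nothing.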

SLIDE 12

Judging Bows

◮ Bows may contradict each other; the judge component decides which one to apply
◮ currently: preference given to expert Bow, then to most recent Bow
◮ future:
  ⊲ use existing voting techniques
  ⊲ use confidence scores output by external tools
  ⊲ use conflicts for active learning
  ⊲ ...
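
The current preference rule can be sketched as a ranking over contradicting assertions about the same target. The `source` and `created` fields and the `judge` function are assumptions for illustration; the sketch also assumes that expert status outranks recency.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Tuple


@dataclass(frozen=True)
class TagBow:
    layer: str
    tag: str
    start: int
    end: int
    source: str   # hypothetical: "expert", "crowd" or "tool"
    created: int  # hypothetical timestamp; larger means more recent


def rank(bow: TagBow) -> Tuple[bool, int]:
    """Expert assertions outrank all others; recency breaks remaining ties."""
    return (bow.source == "expert", bow.created)


def judge(bows: Iterable[TagBow]) -> Dict[Tuple[str, int, int], TagBow]:
    """Pick one winning assertion per (layer, start, end) target."""
    winners: Dict[Tuple[str, int, int], TagBow] = {}
    for bow in bows:
        target = (bow.layer, bow.start, bow.end)
        best = winners.get(target)
        if best is None or rank(bow) > rank(best):
            winners[target] = bow
    return winners


conflicting = [
    TagBow("pos", "NNS", 616, 623, source="tool", created=3),
    TagBow("pos", "VBZ", 616, 623, source="expert", created=1),
    TagBow("pos", "VB", 616, 623, source="crowd", created=2),
]
winner = judge(conflicting)[("pos", 616, 623)]
```

Here the expert assertion wins although it is the oldest. Swapping `rank` for a voting or confidence-based score would be one way to explore the future directions listed above.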

SLIDE 14

Summary

◮ NLP toolchain + feedback sources
◮ feedback stored in database as Bows
◮ judging and application interleaved with toolchain
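
The interleaving can be illustrated with a toy pipeline in which each tool is followed by a correction step before the next tool runs. Everything here is a stand-in: the stages, the document dictionary, and the hard-coded correction are assumptions, not the GMB toolchain.

```python
from typing import Callable, Dict, List, Tuple

Doc = Dict[str, object]
Step = Callable[[Doc], Doc]


def run_pipeline(doc: Doc, stages: List[Tuple[Step, Step]]) -> Doc:
    """Run each tool, then apply the judged assertions for its layer."""
    for tool, apply_bows in stages:
        doc = apply_bows(tool(doc))
    return doc


def tokenize(doc: Doc) -> Doc:
    # Toy tokenizer: whitespace split only.
    return {**doc, "tokens": doc["text"].split()}


def fix_tokens(doc: Doc) -> Doc:
    # Stands in for applying boundary assertions: split off a trailing ")".
    tokens: List[str] = []
    for tok in doc["tokens"]:
        if len(tok) > 1 and tok.endswith(")"):
            tokens += [tok[:-1], ")"]
        else:
            tokens.append(tok)
    return {**doc, "tokens": tokens}


def tag(doc: Doc) -> Doc:
    # Toy tagger: sees the corrected tokens, not the raw tokenizer output.
    tags = ["WORD" if t.isalnum() else "PUNCT" for t in doc["tokens"]]
    return {**doc, "tags": tags}


def no_bows(doc: Doc) -> Doc:
    # No assertions stored for this layer.
    return doc


result = run_pipeline({"text": "MPLA) leads"},
                      [(tokenize, fix_tokens), (tag, no_bows)])
```

Because the correction runs between the two tools, the tagger receives three tokens instead of two, so a fix on one layer carries over to every later layer.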
