
Multi-Source Adjustment of Multi-Layer Annotation: the Bits of Wisdom Approach



  1. Multi-Source Adjustment of Multi-Layer Annotation: the Bits of Wisdom Approach. Kilian Evang, 20 January 2012. http://gmb.let.rug.nl

  2. The Goal
     - Groningen Meaning Bank (GMB) project: build a corpus of 100,000 semantically annotated texts
     - manual annotation is too expensive
     - bootstrapping approach:
       - use a state-of-the-art NLP toolchain to produce a first approximation
       - collect data from various sources to incrementally correct and refine the annotation

  8. Bits of Wisdom
     - not changes (diffs, patches, corrections), but
     - assertions (facts, constraints), e.g.
       - there is a token boundary at character offset 5
       - the POS tag of the token between offsets 4 and 7 is MD
     - not necessarily correct
     - encode expert wisdom, collective wisdom, or automatic wisdom
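The two example assertions on this slide can be written down as plain data. The following is only a minimal sketch: the class names `BoundaryBow` and `TagBow`, their field names, and the `describe` method are assumptions made for illustration, not the GMB project's actual schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BoundaryBow:
    """Hypothetical record: asserts that a boundary exists (or not) at an offset."""
    level: str      # "token" or "sentence"
    present: bool   # True: there is a boundary; False: there is none
    offset: int     # character offset in the raw text

    def describe(self):
        article = "a" if self.present else "no"
        return f"there is {article} {self.level} boundary at character offset {self.offset}"


@dataclass(frozen=True)
class TagBow:
    """Hypothetical record: asserts the tag of the token spanning [start, end)."""
    layer: str      # annotation layer, e.g. "pos"
    tag: str        # asserted tag, e.g. "MD"
    start: int
    end: int

    def describe(self):
        return (f"the {self.layer.upper()} tag of the token between offsets "
                f"{self.start} and {self.end} is {self.tag}")


# The two assertions given as examples on the slide.
bows = [BoundaryBow("token", True, 5), TagBow("pos", "MD", 4, 7)]
for bow in bows:
    print(bow.describe())
```

Because each record states a fact about the text rather than an edit operation, it remains meaningful even when the annotation it refers to has changed in the meantime.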

  9. Boundary Bows
     - applicable output type: tokenized text
     - boundary(token, +, 152)
       - read: there is a token boundary at character offset 152
       - ... the Popular Movement for the Liberation of Angola (MPLA|), led by ... (| marks the offset)
     - boundary(sentence, −, 179)
       - read: there is no sentence boundary at character offset 179
       - Macrumors.|com, a website that ... (| marks the offset)
     - application: insert or remove one token or sentence boundary if needed
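Applying a boundary Bow can be sketched as a set operation, assuming the boundaries of one level (token or sentence) are held as a set of character offsets. That representation and the function name `apply_boundary_bow` are assumptions for illustration; the slides do not say how the GMB toolchain stores boundaries.

```python
def apply_boundary_bow(boundaries, polarity, offset):
    """Return a new set of boundary offsets that satisfies the assertion.

    boundaries: set of character offsets at which a boundary currently exists
    polarity:   "+" asserts a boundary at offset, "-" asserts there is none
    """
    updated = set(boundaries)
    if polarity == "+":
        updated.add(offset)        # insert the boundary if it is missing
    elif polarity == "-":
        updated.discard(offset)    # remove the boundary if it is present
    else:
        raise ValueError(f"unknown polarity: {polarity!r}")
    return updated


# Made-up boundary sets around the offsets used on the slide.
token_boundaries = apply_boundary_bow({148, 153}, "+", 152)
sentence_boundaries = apply_boundary_bow({120, 179}, "-", 179)
```

Applying the same Bow a second time changes nothing, which is the "if needed" part of the slide: the operation only acts when the current annotation violates the assertion.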

  10. Tag Bows
     - applicable output type: tokenized and tagged text
     - example Bow: tag(pos, VBZ, 616, 623)
       - read: the token between character offsets 616 and 623 has the POS tag VBZ
     - application: change the tag if needed; do nothing if the token does not exist
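The application rule for tag Bows can be sketched as follows, assuming tokens are dictionaries with `start` and `end` offsets plus one entry per annotation layer. The token layout and the function name `apply_tag_bow` are illustrative assumptions, and the surrounding tokens and tags are made up.

```python
def apply_tag_bow(tokens, layer, tag, start, end):
    """Return a copy of tokens in which the token spanning exactly
    [start, end) carries the asserted tag on the given layer.

    If no token has exactly that span, the list is returned unchanged,
    mirroring "do nothing if the token does not exist".
    """
    result = []
    for token in tokens:
        if token["start"] == start and token["end"] == end:
            token = {**token, layer: tag}   # copy with the tag replaced
        result.append(token)
    return result


tokens = [
    {"start": 610, "end": 615, "pos": "NN"},
    {"start": 616, "end": 623, "pos": "NNS"},
]
retagged = apply_tag_bow(tokens, "pos", "VBZ", 616, 623)   # token exists
unchanged = apply_tag_bow(tokens, "pos", "VBZ", 616, 620)  # no such token
```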

  11. How Bows are created
     - GMB Explorer: editing interface for experts; edits go straight to the Bow DB, and the toolchain re-runs on save
     - Wordrobe: multiple-choice questions; answers generate a Bow by majority vote
     - External tools: scripts extract Bows from tool output
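Turning multiple-choice answers into a single Bow by majority vote might look like the sketch below. The function name `majority_answer`, the `min_votes` threshold, and the requirement of a strict majority are assumptions; the slides do not state how many answers are needed or how ties are handled.

```python
from collections import Counter


def majority_answer(answers, min_votes=3):
    """Return the answer chosen by a strict majority of players, or None.

    min_votes is a hypothetical threshold: with fewer answers than this,
    no Bow is generated at all.
    """
    if not answers or len(answers) < min_votes:
        return None
    top, count = Counter(answers).most_common(1)[0]
    if count * 2 <= len(answers):
        return None                 # no strict majority, so no Bow
    return top


# Three players answer a question about the POS tag of one token.
winning_tag = majority_answer(["VBZ", "VBZ", "NNS"])
```

Only when `majority_answer` returns a value would a tag Bow with that value be written to the database.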

  12. Judging Bows
     - Bows may contradict each other; a judge component decides which one to apply
     - currently: preference given to the expert Bow, the most recent Bow
     - future:
       - use existing voting techniques
       - use confidence scores output by external tools
       - use conflicts for active learning
       - ...
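The current judging policy can be sketched as a selection over conflicting Bows. This sketch assumes that expert status takes priority and recency breaks ties, which is one reading of the slide; the `Bow` fields, the `target` key used to detect conflicts, and the integer timestamps are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Bow:
    target: tuple    # what the assertion is about, e.g. ("pos", 616, 623)
    value: str       # the asserted value, e.g. "VBZ"
    source: str      # "expert", "crowd", or "tool" (assumed labels)
    timestamp: int   # larger means more recent


def priority(bow):
    # Assumed ordering: expert Bows first, then the most recent Bow.
    return (bow.source == "expert", bow.timestamp)


def judge(bows):
    """Pick one winning Bow per target among possibly conflicting Bows."""
    winners = {}
    for bow in bows:
        current = winners.get(bow.target)
        if current is None or priority(bow) > priority(current):
            winners[bow.target] = bow
    return winners


span = ("pos", 616, 623)
chosen = judge([
    Bow(span, "NNS", "tool", timestamp=30),
    Bow(span, "VBZ", "expert", timestamp=10),
    Bow(span, "VB", "crowd", timestamp=20),
])
```

Here the expert Bow wins even though it is the oldest. Swapping `priority` for a voting scheme or a confidence score would cover the future directions listed on the slide without changing `judge`.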

  14. Summary
     - NLP toolchain + feedback sources
     - feedback stored in a database as Bows
     - judging and application interleaved with the toolchain
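Interleaving Bow application with the toolchain means that each tool's output is corrected before the next tool runs, so downstream tools see the corrected annotation. The sketch below illustrates this with a toy tokenizer and a toy tagger; every function here is a stand-in invented for the example, not a component of the GMB toolchain, and only positive token boundary Bows are handled.

```python
import re


def tokenize(doc):
    # Toy tokenizer: every run of non-whitespace characters is one token.
    spans = [(m.start(), m.end()) for m in re.finditer(r"\S+", doc["text"])]
    return {**doc, "tokens": spans}


def apply_boundary_bows(doc, offsets):
    # Split any token that strictly contains an asserted boundary offset.
    tokens = []
    for start, end in doc["tokens"]:
        cuts = sorted(o for o in offsets if start < o < end)
        edges = [start, *cuts, end]
        tokens.extend(zip(edges, edges[1:]))
    return {**doc, "tokens": tokens}


def tag(doc):
    # Toy tagger: assigns one placeholder tag per token it is given.
    return {**doc, "pos": ["UNK" for _ in doc["tokens"]]}


def no_bows(doc, bows):
    # Placeholder for layers that have no Bows in this example.
    return doc


def run_pipeline(text, stages, bows_by_layer):
    doc = {"text": text}
    for layer, tool, apply_bows in stages:
        doc = tool(doc)                                        # run the tool
        doc = apply_bows(doc, bows_by_layer.get(layer, []))    # then correct it
    return doc


stages = [("token", tokenize, apply_boundary_bows), ("pos", tag, no_bows)]
# One Bow asserting a token boundary at offset 3 splits "cannot" in two.
result = run_pipeline("cannot go", stages, {"token": [3]})
```

Because the split happens before tagging, the tagger produces three tags rather than two.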
