SLIDE 1

A Framework and Tool for Collaborative Extraction of Reliable Information

Graham Neubig (1), Shinsuke Mori (2), Masahiro Mizukami (1)

(1) Nara Institute of Science and Technology, (2) Kyoto University

SLIDE 2

Background

SLIDE 3

What is Information Extraction?

  • Find useful information from large amounts of noise

[Figure: an information source (e.g. the Internet) containing info about hobbies, word-of-mouth information, and info about events]

SLIDE 4

Information Extraction in Times of Crisis

  • Noise is particularly prevalent in times of crisis

[Figure: an information source (e.g. the Internet) containing provision of safety info, requests for safety info, and evacuation shelters/rescue supplies]

ANPI_NLP Project [Neubig+ 11], #99japan Project [Aida+ 13]

SLIDE 5

Necessities for Crisis-time Information Extraction

  • Speed
    • Necessary to provide information ASAP to those in need
  • Absolute Reliability
    • Provision of mistaken information could be deadly
    • In general, info will likely require confirmation before consumption
  • Difficult to Predict Needs
    • Wildfire → Wind, Earthquake → Diapers, Radiation
  • Many volunteers! [Starbird+10, Neubig+11]
  • Challenge: How do we let volunteers work as efficiently as possible to provide reliable information quickly?

SLIDE 6

This Work

  • We propose a method for efficient extraction of reliable information:
    • Use machine learning (relevance feedback) to decide which examples to show to annotators
    • Web-based collaborative interface to allow multiple annotators to work on a single task
  • Evaluation on data from Twitter
  • Toolkit freely available as open source

webigator: http://www.phontron.com/webigator

SLIDE 7

Information Extraction Framework

SLIDE 8

Information Extraction Task

Example tweets:

  • They really need to open more evacuation areas in Sendai!
  • They are distributing water at Ishinomaki High School today.
  • I was able to fill up my car at the gas station at XXX.
  • Got to the evacuation center, but I'm almost out of battery!

Tasks:

  • Information filtering: Remove documents with no actionable information
  • Information extraction: Identify which terms fill slots (e.g. status, location)
  • For Twitter, documents are small but numerous, so filtering is a challenge

Tweets kept after filtering:

  • I was able to fill up my car at the gas station at XXX.
  • They are distributing water at Ishinomaki High School today.

SLIDE 9

Information Filtering as Classification

  • Binary classification of “useful or not?”
  • Define features, use machine learning to learn weights
  • Notable for large proportion of negative examples
[Figure: positive (Pos.) and negative (Neg.) examples under normal classification versus filtering; in filtering, negatives dominate]
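The "define features, learn weights" idea above can be sketched as follows. This is a minimal illustration, not the toolkit's implementation: it assumes bag-of-words features and a logistic-regression model trained by stochastic gradient steps, and the function names and toy tweets are made up for the example.

```python
import math
from collections import defaultdict

def features(text):
    """Bag-of-words features: each lowercased token maps to its count."""
    feats = defaultdict(float)
    for token in text.lower().split():
        feats[token] += 1.0
    return feats

def prob_useful(weights, feats):
    """Logistic model: P(useful | text) = sigmoid(weights . features)."""
    score = sum(weights.get(name, 0.0) * value for name, value in feats.items())
    return 1.0 / (1.0 + math.exp(-score))

def train(examples, epochs=20, rate=0.5):
    """Learn one weight per feature by stochastic gradient steps."""
    weights = defaultdict(float)
    for _ in range(epochs):
        for text, label in examples:
            feats = features(text)
            error = label - prob_useful(weights, feats)
            for name, value in feats.items():
                weights[name] += rate * error * value
    return weights

# Made-up training data: 1 = useful, 0 = not useful.
# Negatives outnumber positives, as in the filtering setting.
TOY_DATA = [
    ("they are distributing water at the school today", 1),
    ("the gas station is open", 1),
    ("i am so tired today", 0),
    ("watching tv at home", 0),
    ("what a long day", 0),
    ("nothing to do at home", 0),
]

weights = train(TOY_DATA)
```

After training, `prob_useful(weights, features("distributing water"))` is above 0.5 and `prob_useful(weights, features("watching tv"))` is below it, because those words only occur in positive and negative examples respectively.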

SLIDE 10

Constructing a Classifier Requires Lots of Data

[Figure: classifiers learned from little data versus lots of data (bold = lots of data)]

SLIDE 11

Active Learning

  • Way to create a good classifier efficiently
  • Choose examples to annotate based on predictions
[Figure: unlabeled examples being labeled Positive or Negative]

SLIDE 16

Problems with Unbalanced Data

  • In information extraction, almost everything is negative

SLIDE 21

Our Simple Fix

  • Small change to example selection criterion
    • Standard: Select low confidence examples
    • Proposed: Select examples with high probability of being positive
  • Effective when final human check is necessary
  • Labeling a positive example = finding a highly reliable piece of information
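The two selection criteria can be contrasted in a few lines. This is a sketch under assumptions: "low confidence" is taken to mean a predicted probability closest to 0.5, and the function names and scores are hypothetical.

```python
def select_low_confidence(pool, prob_positive):
    """Standard criterion: the example the classifier is least sure about."""
    return min(pool, key=lambda example: abs(prob_positive(example) - 0.5))

def select_likely_positive(pool, prob_positive):
    """Proposed criterion: the example most likely to be positive."""
    return max(pool, key=prob_positive)

# Hypothetical classifier scores on an unbalanced pool:
# almost every example looks negative.
TOY_SCORES = {"t1": 0.03, "t2": 0.08, "t3": 0.45, "t4": 0.81, "t5": 0.01}

standard_pick = select_low_confidence(list(TOY_SCORES), TOY_SCORES.get)
proposed_pick = select_likely_positive(list(TOY_SCORES), TOY_SCORES.get)
# standard_pick is "t3" (score nearest 0.5); proposed_pick is "t4" (highest score)
```

With the proposed criterion, the annotator is shown "t4" first, the example most likely to be a confirmed, reliable piece of information.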

SLIDE 22

Our Simple Fix

  • Finds many positive examples quickly
  • Using these positive examples, learn characteristics that help pick out more
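The loop described above (show the most promising example, get a label, learn from it, repeat) might look like the following. This is an illustrative sketch only: it assumes a logistic model over word-presence features with one gradient step per label, and every name, the seed example, and the toy pool are hypothetical.

```python
import math
from collections import defaultdict

def features(text):
    """Word-presence features."""
    return set(text.lower().split())

def prob_positive(weights, text):
    score = sum(weights[token] for token in features(text))
    return 1.0 / (1.0 + math.exp(-score))

def update(weights, text, label, rate=1.0):
    """One logistic-regression gradient step on a single labeled example."""
    error = label - prob_positive(weights, text)
    for token in features(text):
        weights[token] += rate * error

def feedback_loop(pool, oracle, seed_positive, rounds):
    """Repeatedly show the annotator the highest-scoring unlabeled example."""
    weights = defaultdict(float)
    update(weights, seed_positive, 1)      # start from one known positive
    unlabeled = list(pool)
    found = []
    for _ in range(min(rounds, len(unlabeled))):
        best = max(unlabeled, key=lambda text: prob_positive(weights, text))
        unlabeled.remove(best)
        label = oracle(best)               # stands in for the human check
        if label == 1:
            found.append(best)
        update(weights, best, label)       # learn from the new label
    return found

# Made-up pool with hidden gold labels that play the role of the annotator.
TOY_POOL = {
    "water is being distributed at the high school": 1,
    "so bored at home tonight": 0,
    "food and water available at city hall": 1,
    "my cat is sleeping again": 0,
    "great movie tonight": 0,
}

found = feedback_loop(list(TOY_POOL), TOY_POOL.get,
                      "water supplies at the station", rounds=2)
```

On this toy pool, both positives are found within the first two rounds, since each confirmed positive raises the weights of words it shares with the remaining positive.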

SLIDE 23

Scaling Up

SLIDE 24

Too Much Data!

  • e.g. Twitter after the Great East Japan Earthquake = peak of 1237 tweets/second
  • Problems with:
    • Viewing even the high scoring tweets with one person
    • Rescoring every tweet after each round of learning

SLIDE 25

Collaborative Web-based Interface

  • Allow multiple annotators to cooperate

[Figure: several workers, each using a web UI, connected to one server. Diagram labels: Display Text, Submit Label, Text Retrieval/Scoring, Internet, Information List]
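The division of labor in the diagram can be illustrated with an in-memory stand-in for the server. This is not the webigator implementation: the class name, method names, and use of threads in place of web clients are all assumptions made for the sketch.

```python
import threading

class AnnotationServer:
    """In-memory stand-in for the server that several web UIs talk to."""

    def __init__(self, scored_texts):
        # scored_texts: iterable of (score, text); highest score is served first.
        self._queue = sorted(scored_texts, reverse=True)
        self._labels = {}
        self._lock = threading.Lock()

    def next_text(self):
        """Display Text: hand each text to at most one worker."""
        with self._lock:
            if not self._queue:
                return None
            _, text = self._queue.pop(0)
            return text

    def submit_label(self, text, label):
        """Submit Label: record one annotator's decision."""
        with self._lock:
            self._labels[text] = label

    def information_list(self):
        """Texts that annotators confirmed as useful."""
        with self._lock:
            return [text for text, label in self._labels.items() if label]

def worker(server, oracle):
    """One annotator: keep labeling until the server runs out of texts."""
    while True:
        text = server.next_text()
        if text is None:
            break
        server.submit_label(text, oracle(text))

# Made-up texts with gold labels that play the role of the annotators.
TOY = {
    "water at the school": True,
    "bored at home": False,
    "shelter open at city hall": True,
}
server = AnnotationServer((0.5, text) for text in TOY)
threads = [threading.Thread(target=worker, args=(server, TOY.get)) for _ in range(3)]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()
confirmed = sorted(server.information_list())
```

The lock ensures that no two workers receive the same text, which is the property that lets additional annotators add throughput rather than duplicate effort.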

SLIDE 26

Web Interface

SLIDE 27

Efficiency Improvements

1) Simple keyword search filter
2) Rescoring policy

  • Maintain a sorted list of highly scored examples
  • When retrieving next example:
    • Choose the example highest in the cache and rescore it
    • If, after rescoring, it is still better than the second best, return it
    • Otherwise, return to the beginning
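The rescoring policy can be sketched with a heap standing in for the sorted cache. This is one possible reading of the steps above, not the toolkit's code: the function name and toy scores are hypothetical, and ties are treated as "still best" so that the loop always terminates.

```python
import heapq

def next_example(cache, rescore):
    """Return the best example, rescoring lazily.

    cache: heap of (-stale_score, text) entries, best stale score on top.
    rescore: function mapping text to its score under the current model.
    """
    while cache:
        _, text = heapq.heappop(cache)
        fresh = rescore(text)
        # Still at least as good as the runner-up's cached score: return it.
        if not cache or fresh >= -cache[0][0]:
            return text, fresh
        # Otherwise put it back with its fresh score and start over.
        heapq.heappush(cache, (-fresh, text))
    return None

# Hypothetical scores: cached (stale) versus current model (fresh).
stale = {"a": 0.9, "b": 0.8, "c": 0.3}
fresh_scores = {"a": 0.2, "b": 0.7, "c": 0.3}

cache = [(-score, text) for text, score in stale.items()]
heapq.heapify(cache)
picked = next_example(cache, fresh_scores.get)
# picked is ("b", 0.7): "a" dropped after rescoring, "c" was never rescored
```

Only the examples near the top of the cache are rescored, which avoids rescoring every tweet after each round of learning.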

Type                     Keywords
Evacuation/Supplies      evacuation area, water supplies, food supplies
Safety Info Request      contact, cannot, waiting
Safety Info Provision    contact, safe
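A keyword filter over the table above could be as simple as the following sketch. The keywords are the ones listed on the slide; the case-insensitive substring matching, the function name, and the toy tweets are assumptions.

```python
KEYWORDS = {
    "Evacuation/Supplies": ["evacuation area", "water supplies", "food supplies"],
    "Safety Info Request": ["contact", "cannot", "waiting"],
    "Safety Info Provision": ["contact", "safe"],
}

def keyword_filter(texts, info_type):
    """Keep only texts containing at least one keyword for the given type."""
    keywords = KEYWORDS[info_type]
    return [text for text in texts
            if any(keyword in text.lower() for keyword in keywords)]

# Made-up tweets for illustration.
TOY_TWEETS = [
    "The evacuation area at the gym is full",
    "Cannot contact my brother in Sendai",
    "Nice weather today",
]

supplies = keyword_filter(TOY_TWEETS, "Evacuation/Supplies")
requests = keyword_filter(TOY_TWEETS, "Safety Info Request")
```

Substring matching is deliberately loose (for example, "safe" also matches "safety"). That is acceptable for a first-pass filter whose output is then scored and checked by annotators.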

SLIDE 28

Experiments

SLIDE 29

Evaluation

  • Compared Methods:
    • Keyword search
    • Proposed learning-based method
  • Target:
    • 179M tweets from the week after the Great East Japan Earthquake
    • Three types of info: evacuation/rescue supplies, safety info request, safety info provision
  • Evaluation measure:
    • Amount of reliable information extracted in 30 mins.
    • Use shared Google Doc as repository for information

SLIDE 30

Effect of Learning

  • Experiments with one annotator for three tasks
  • Observable increase in amount of information extracted and accuracy
  • Some tasks easier than others

[Figure: for each task (Rescue Supplies/Evacuation Areas, Safety Info. Request, Safety Info. Provision), plots of Filtering Accuracy and Information Extracted, w/ Learning vs. w/o Learning]

SLIDE 31

Effect of Collaboration

  • Experiments with 1-3 users using same interface
  • As expected, increasing users = increasing efficiency

[Figure: Pieces of Information vs. Time (Minutes) for 1 User, 2 Users, and 3 Users]

SLIDE 32

Conclusion

  • A method for information filtering that focuses on positive examples
  • More effective than simple keyword search
  • Remaining challenges:
    • Identification/clustering of duplicates
    • Application to identification of slots as well

webigator: http://www.phontron.com/webigator