SLIDE 1

Combining Crowd and Expert Labels using Decision Theoretic Active Learning

An T. Nguyen¹, Byron C. Wallace, Matthew Lease

University of Texas at Austin

HCOMP, 2015

¹ Presenter

SLIDE 3

The Problem: Label Collection

◮ Have some unlabeled data.
◮ Want labels of high quality at low cost.

Finite Pool Setting

◮ Care about label quality of the current data.
◮ Don't care (much) about future data.

SLIDE 9

Some Solutions

◮ Hire a domain expert to give labels.
◮ Crowdsource the labeling.
◮ Build a Prediction Model (Classifier).

Our work: A principled way to combine these:

◮ Which item? Which labeler?
◮ How to use the classifier?

SLIDE 15

Method: Previous work

Roy and McCallum 2001

◮ ‘Optimal’ Active Learning.
◮ Select the item to label by:

1. Consider each item.
2. Consider each possible label.
3. Add that (item, label) to the training set.
4. Retrain and evaluate.
5. Weight outcomes by (predictive) probabilities.
6. Select the one with the best expected outcome.

◮ Basically one-step look-ahead (sketched in code below).
◮ A (perhaps) better name: Decision Theoretic Active Learning.
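A minimal sketch of steps 1-6 for a binary classifier (our illustration, not the authors' exact code; the pool-wide uncertainty proxy used to "evaluate" in step 4 is an assumption of this sketch):

```python
import numpy as np
from sklearn.base import clone

def select_item_lookahead(model, X_train, y_train, X_pool):
    """One-step look-ahead in the spirit of Roy & McCallum (2001):
    pick the pool item whose labeling minimizes expected error."""
    probs = model.predict_proba(X_pool)            # current predictions
    best_item, best_score = None, np.inf
    for i in range(len(X_pool)):                   # 1. consider each item
        expected_err = 0.0
        for label in (0, 1):                       # 2. each possible label
            # 3. add the hypothetical (item, label) to the training set
            X_aug = np.vstack([X_train, X_pool[i:i + 1]])
            y_aug = np.append(y_train, label)
            m = clone(model).fit(X_aug, y_aug)     # 4. retrain...
            p = m.predict_proba(X_pool)
            err = np.mean(1.0 - p.max(axis=1))     # ...and evaluate
            expected_err += probs[i, label] * err  # 5. weight by probability
        if expected_err < best_score:              # 6. keep the best outcome
            best_item, best_score = i, expected_err
    return best_item
```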

SLIDE 20

Method: Our ideas

The key idea: Extend their algorithm to include expert/crowd/classifier.

◮ Consider (item, label, labeler).
◮ Have a Crowd Accuracy Model:

Pr(True Label | Crowd Label) = ?
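One simple way to realize such a crowd accuracy model (a sketch under our own assumptions; the paper's actual crowd model is more involved) is a Laplace-smoothed confusion matrix estimated on items that already have expert labels:

```python
from collections import Counter

def crowd_accuracy_model(expert_labels, crowd_labels, smoothing=1.0):
    """Estimate Pr(true label | crowd label) from items that have both
    kinds of labels, with Laplace smoothing so unseen pairs are not zero."""
    pair_counts = Counter(zip(crowd_labels, expert_labels))
    crowd_counts = Counter(crowd_labels)
    labels = sorted(set(expert_labels) | set(crowd_labels))
    return {
        (c, t): (pair_counts[(c, t)] + smoothing)
                / (crowd_counts[c] + smoothing * len(labels))
        for c in labels for t in labels
    }

# Example: Pr(true = 1 | crowd said 1)
model = crowd_accuracy_model([1, 0, 1, 1, 0], [1, 0, 0, 1, 0])
print(model[(1, 1)])  # -> 0.75
```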

Strategy: Loss Prediction/Minimization

◮ Loss for expert labels = 0.
◮ Predict loss for crowd labels.
◮ Predict loss for the classifier's predictions.
◮ Predict the loss reduction after adding a label by a labeler.

Decision Criterion: Loss Reduction / Cost
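Putting the pieces together, the selection rule can be sketched as follows; `expected_loss_reduction` stands in for the look-ahead estimate above, and the cost table is a hypothetical stand-in for the real pricing:

```python
def select_action(candidates, expected_loss_reduction, costs):
    """Greedy decision-theoretic selection: among (item, labeler)
    candidates, pick the pair with the highest expected loss
    reduction per unit cost."""
    return max(
        candidates,
        key=lambda pair: expected_loss_reduction(*pair) / costs[pair[1]],
    )

# Hypothetical usage with the costs from the evaluation section:
# costs = {"crowd": 1, "expert": 100}
# best = select_action([(i, w) for i in pool for w in costs],
#                      expected_loss_reduction, costs)
```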

SLIDE 24

Evaluation: Application

Evidence Based Medicine (EBM)

aims to inform patient care using the entirety of the evidence.

Biomedical Citation Screening

is the first step in EBM: identify relevant citations (paper abstracts, titles, keywords, ...).

Two characteristics:

◮ Very imbalanced (2-15% positive).
◮ Recall is much more important than Precision.

The expert

◮ An MD specialist.
◮ Very expensive: paid 100 times what a crowdworker is paid.

SLIDE 30

Evaluation: Data

Four Biomedical Citation Screening Datasets

◮ Have expert gold labels.
◮ Have crowd labels (5 for each item), collected via Amazon Mechanical Turk.

Strategy to use

1. Test/refine our methods using only the first and second datasets.
2. Finalize all details (e.g. hyper-parameters).
3. Test on the third and fourth.
4. Purpose: see how the method performs on real future data.
SLIDE 32

Evaluation: Setup

Active Learning Baseline: Uncertainty Sampling (US)

Select the item whose predicted probability is closest to 0.5.

Compare Four Algorithms

◮ US-Crowd: use only crowd labels.
◮ US-Expert: use only expert labels.
◮ US-Crowd+Expert: crowd first; expert if the crowd disagrees.
◮ Decision Theory: our method.
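For reference, uncertainty sampling for a binary classifier is essentially a one-liner (a sketch, not the paper's code):

```python
import numpy as np

def uncertainty_sampling(model, X_pool):
    """Pick the pool item whose predicted positive-class probability
    is closest to 0.5, i.e. where the classifier is least certain."""
    p_pos = model.predict_proba(X_pool)[:, 1]
    return int(np.argmin(np.abs(p_pos - 0.5)))
```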

SLIDE 35

Evaluation: Metric

Compare collected labels vs. gold labels

Collected labels include:

◮ Expert labels.
◮ Crowd labels (Majority Voting).
◮ Classifier predictions (trained on crowd & expert labels).
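Majority voting over an item's five crowd labels can be as simple as (sketch):

```python
from collections import Counter

def majority_vote(crowd_labels):
    """Aggregate one item's crowd labels (e.g. 5 per item) by majority.
    Ties go to whichever label Counter encountered first."""
    return Counter(crowd_labels).most_common(1)[0][0]

print(majority_vote([1, 0, 1, 1, 0]))  # -> 1
```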

We present: Cost-Loss Learning Curve

◮ One Expert Label = 100; one Crowd Label = 1.
◮ Loss = #False Positives + 10 × #False Negatives.
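A point on the cost-loss curve can then be computed as follows (our own illustration, using the stated costs and loss weights):

```python
def cost_and_loss(gold, collected, n_expert, n_crowd,
                  expert_cost=100, crowd_cost=1, fn_weight=10):
    """Total labeling cost and weighted loss of the collected labels,
    with false negatives penalized 10x (recall matters most here)."""
    fp = sum(1 for g, c in zip(gold, collected) if g == 0 and c == 1)
    fn = sum(1 for g, c in zip(gold, collected) if g == 1 and c == 0)
    cost = n_expert * expert_cost + n_crowd * crowd_cost
    loss = fp + fn_weight * fn
    return cost, loss

print(cost_and_loss([1, 0, 1, 0], [1, 1, 0, 0], n_expert=1, n_crowd=15))
# -> (115, 11): one false positive plus one weighted false negative
```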

SLIDE 36

Evaluation: Result: First Dataset

SLIDE 37

Evaluation: Result: Second Dataset

SLIDE 38

Evaluation: Result: Third (real future) Dataset

SLIDE 39

Evaluation: Result: Fourth (real future) Dataset

SLIDE 41

Discussion

Our method

◮ Overall effective; consistently good in the beginning.
◮ On ‘real future’ datasets: loses slightly at some points.

Future work

◮ Better worker model.
◮ Multi-step look-ahead.
◮ Quality Assurance/Guarantee.

SLIDE 45

Summary

We have presented

◮ High-level ideas of our method.
◮ Evaluation and results.

We have omitted

◮ Full algorithms. Implementation details.
◮ Heuristics to make this fast.
◮ Crowd Model. Active Sampling Correction.
◮ More results.
◮ See the paper.

Questions?

SLIDE 46

References I

Roy, Nicholas and Andrew McCallum (2001). “Toward Optimal Active Learning through Sampling Estimation of Error Reduction”. In: Proceedings of the 18th International Conference on Machine Learning (ICML).