A Correlated Worker Model for Grouped, Imbalanced and Multitask Data


slide-1
SLIDE 1

A Correlated Worker Model for Grouped, Imbalanced and Multitask Data

An T. Nguyen¹, Byron C. Wallace, Matthew Lease

University of Texas at Austin

UAI 2016

¹ Presenter

1

slide-6
SLIDE 6

Overview

◮ A model of workers in crowdsourcing.
◮ Idea: Transfer knowledge of worker quality.
◮ Variational EM learning.
◮ Apply to two datasets:
◮ Biomed Citation Screening: imbalanced, grouped.
◮ Galaxy Classification: multiple tasks.

2

slide-11
SLIDE 11

Background

◮ Crowdsourcing: collect labels quickly at low cost.
◮ But (usually) lower quality.
◮ Common solution: collect 5 labels for each instance...
◮ ...then aggregate them.
◮ Most previous work: improve (the estimates of) labels.
◮ Our work: improve (the estimates of) worker qualities.

3

slide-15
SLIDE 15

Motivation

for estimating worker qualities

Diagnostic insights.
Help workers improve.
Intelligent task routing (assign work to workers).

4

slide-18
SLIDE 18

Worker Quality Measure

Accuracy: simple but not enough.
→ Confusion matrix: Pr(worker label | true label)
Binary task (this work):
◮ Sensitivity: Pr(positive | positive).
◮ Specificity: Pr(negative | negative).

5

slide-22
SLIDE 22

Setting

Input
◮ Crowd labels for each instance.
◮ No instance-level features (future work).

Output
◮ For each worker: sensitivity and specificity.

Eval. Metric
◮ RMSE on sen. and spe.
◮ Gold sen. and spe.: computed from gold labels on the whole dataset.

6
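The evaluation metric above can be sketched in code. A minimal illustration (my own code, not from the paper; function names are assumptions) that computes a worker's empirical sensitivity and specificity against gold labels, and the RMSE between estimated and gold values:

```python
import numpy as np

def sen_spe(worker_labels, gold_labels):
    """Empirical sensitivity and specificity of one worker,
    given 0/1 worker labels and gold labels of equal length."""
    worker_labels = np.asarray(worker_labels)
    gold_labels = np.asarray(gold_labels)
    pos = gold_labels == 1
    neg = gold_labels == 0
    # Sensitivity: fraction of gold positives labeled positive.
    sen = worker_labels[pos].mean() if pos.any() else np.nan
    # Specificity: fraction of gold negatives labeled negative.
    spe = (1 - worker_labels[neg]).mean() if neg.any() else np.nan
    return sen, spe

def rmse(estimates, gold):
    """RMSE between estimated and gold sen./spe. across workers."""
    estimates, gold = np.asarray(estimates), np.asarray(gold)
    return float(np.sqrt(np.mean((estimates - gold) ** 2)))
```

For example, a worker who agrees with gold on half of the positives and half of the negatives gets sen. = spe. = 0.5.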

slide-24
SLIDE 24

Challenges

Sparsity: many workers label only a few instances.
Data is imbalanced:
◮ Many more negatives than positives.
◮ Difficult to estimate sensitivity.

7

slide-25
SLIDE 25

Idea

Transfer knowledge of worker quality

◮ Between classes.
◮ Within a group.
◮ Across multiple tasks.

8

slide-27
SLIDE 27

Previous models

(Raykar et al. 2010; Liu & Wang 2012; Kim & Ghahramani 2012)

Hidden variables:
◮ True label for each instance.
◮ Confusion matrix (sen. + spe.) for each worker.

Assumptions:
◮ Sen. & spe. are independent parameters.
◮ A single group of workers.
◮ Multiple tasks: independent models.

9

slide-28
SLIDE 28

Our Model

Assumptions:

◮ Sen. & Spe. are correlated.
◮ Multiple groups of workers (group membership is known).
◮ Sen. & Spe. in multiple tasks are correlated.

10

slide-32
SLIDE 32

The Base Model

(i indexes instances, j indexes workers)

Uj, Vj ∼ N(µ, C)
Zi ∼ Ber(θ)
Lij | Zi = 1 ∼ Ber(S(Uj))
Lij | Zi = 0 ∼ Ber(S(Vj))

11
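As a sanity check on the generative story, the four lines above can be simulated directly. A hedged sketch (my own code, following the slide's equations literally; the variable names and default parameter values are assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    # S(.) in the slides: maps a logit to a probability.
    return 1.0 / (1.0 + np.exp(-x))

def simulate(n_instances=200, n_workers=10, theta=0.3,
             mu=(1.5, 1.0), C=((1.0, 0.6), (0.6, 1.0))):
    """Sample crowd labels from the base model:
    (Uj, Vj) ~ N(mu, C), Zi ~ Ber(theta),
    Lij | Zi = 1 ~ Ber(S(Uj)), Lij | Zi = 0 ~ Ber(S(Vj))."""
    UV = rng.multivariate_normal(mu, C, size=n_workers)
    U, V = UV[:, 0], UV[:, 1]                     # per-worker quality logits
    Z = rng.binomial(1, theta, size=n_instances)  # latent true labels
    # The Bernoulli parameter depends on each instance's true label.
    P = np.where(Z[:, None] == 1, sigmoid(U)[None, :], sigmoid(V)[None, :])
    L = rng.binomial(1, P)                        # observed crowd labels
    return U, V, Z, L
```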

slide-36
SLIDE 36

Extensions

1. Worker Groups:
◮ Group membership is known.
◮ Model each group k with its own Normal distribution (µk, Ck).

2. Multiple tasks:
◮ Assume two tasks.
◮ (Sen1, Spe1) correlates with (Sen2, Spe2).
◮ (U1, V1, U2, V2) ∼ N(µ, C)

12

slide-40
SLIDE 40

Inference

For the Base Model

Approach: Variational EM
◮ E-step: infer Pr(U1..m, V1..m, Z1..n | L).
◮ M-step: maximize over the parameters µ, C, θ.

Variational Inference:
◮ Approximate the (complex) posterior Pr(· | ·)...
◮ ...by a simpler function q.
◮ Minimize KL(q || p)...
◮ ...equivalent to maximizing a lower bound on the log-likelihood.

13

slide-43
SLIDE 43

Inference

Meanfield Assumptions:

◮ q factorizes:

q(U1..m, V1..m, Z1..n) = ∏_{j=1}^{m} q(Uj) q(Vj) · ∏_{i=1}^{n} q(Zi)

◮ Factors:

q(Uj) = N(µ̃uj, σ̃²uj)
q(Vj) = N(µ̃vj, σ̃²vj)
q(Zi) = Ber(θ̃i)

◮ Optimize with respect to {µ̃uj, σ̃²uj, µ̃vj, σ̃²vj | j = 1...m} and {θ̃i | i = 1...n}.

14

slide-47
SLIDE 47

Optimization

Coordinate Descent: update one variable at a time.

Update Zi:

q*(Zi = 1) ∝ exp( log Ber(1 | θ) + Σj E_{Uj∼q(Uj)} log Ber(Lij | S(Uj)) )
q*(Zi = 0) ∝ exp( log Ber(0 | θ) + Σj E_{Vj∼q(Vj)} log Ber(Lij | S(Vj)) )

Intuition:
◮ Zi ≈ Prior + E(Crowd labels for i)
◮ Expectation taken with respect to worker quality.

15
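One way to read the q*(Zi) update is as code. A Monte Carlo sketch (my own, not the paper's Laplace-based implementation; the Gaussian expectations have no closed form and are approximated here by sampling):

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def log_ber(label, p, eps=1e-12):
    # log Ber(label | p) for a 0/1 label.
    return label * np.log(p + eps) + (1 - label) * np.log(1 - p + eps)

def update_qz(labels_i, mu_u, s2_u, mu_v, s2_v, theta, n_mc=500):
    """Mean-field update for q(Zi) of one instance.

    labels_i: dict {worker j: label Lij in {0, 1}}
    mu_u/s2_u, mu_v/s2_v: dicts of current q(Uj), q(Vj) moments.
    Returns the new theta_tilde_i = q(Zi = 1)."""
    log1 = np.log(theta)      # log Ber(1 | theta)
    log0 = np.log(1 - theta)  # log Ber(0 | theta)
    for j, lab in labels_i.items():
        # E_{Uj ~ q(Uj)} log Ber(Lij | S(Uj)), by Monte Carlo.
        u = rng.normal(mu_u[j], np.sqrt(s2_u[j]), n_mc)
        v = rng.normal(mu_v[j], np.sqrt(s2_v[j]), n_mc)
        log1 += log_ber(lab, sigmoid(u)).mean()
        log0 += log_ber(lab, sigmoid(v)).mean()
    m = max(log1, log0)       # normalize in log space for stability
    w1, w0 = np.exp(log1 - m), np.exp(log0 - m)
    return w1 / (w1 + w0)
```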

slide-52
SLIDE 52

Optimization

Update Uj:

q*(Uj) ∝ exp( E_{Vj∼q(Vj)} log N(Uj, Vj | µ, C) + Σi q(Zi = 1) log Ber(Lij | S(Uj)) )

Intuition:
◮ Uj = logit sensitivity of worker j.
◮ Uj ≈ E(correlation with specificity) + ...
◮ ...instances that worker j has labeled.

(Similar equation for Vj.)

16

slide-55
SLIDE 55

Optimization

Problem: the expectations E(·) are difficult to compute.
Solution: Laplace Variational Inference (Wang & Blei, 2013)
◮ Approximate these update equations...
◮ ...by a Laplace approximation.
◮ Details in the paper.

17

slide-57
SLIDE 57

Learning

E-step: infer the posterior distribution over hidden variables.
M-step: maximize µ, C, θ under the posterior.
◮ µ, C: sample mean and covariance.
◮ θ: average of {θ̃i | i = 1...n}.

18
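The M-step admits a compact sketch. A simplification of my own (it uses only the variational means as the sample points; the paper's M-step also involves the variances, and the names are assumptions):

```python
import numpy as np

def m_step(mu_u, mu_v, theta_tilde):
    """Re-estimate mu, C, theta from variational moments.

    mu_u, mu_v: per-worker variational means of Uj, Vj.
    theta_tilde: per-instance q(Zi = 1) values."""
    pts = np.column_stack([mu_u, mu_v])  # one (Uj, Vj) point per worker
    mu = pts.mean(axis=0)                # sample mean
    C = np.cov(pts, rowvar=False)        # sample covariance
    theta = float(np.mean(theta_tilde))  # average of theta_tilde_i
    return mu, C, theta
```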

slide-60
SLIDE 60

Evaluation

Citizen Science:
◮ Workers volunteer...
◮ ...to help science.
◮ Different from traditional crowdsourcing:
◮ large scale.
◮ (usually) higher quality.

Two real-world scenarios:
◮ Biomedical Citation Screening.
◮ Galaxy Morphological Classification.

19

slide-64
SLIDE 64

Scenario 1

Biomedical Citation Screening:
◮ Motivation: the biomedical literature is huge.
◮ Need to find relevant citations.

The RCT dataset:
◮ Identify Randomized Controlled Trial reports.
◮ Very imbalanced (3% positive).
◮ Workers fall into 2 groups...
◮ ...experts and novices.

20

slide-68
SLIDE 68

Scenario 1

Baselines:
◮ Majority Vote.
◮ Two Coin (Raykar et al. 2010).

Our method: two versions
◮ Full-Cov: the full model.
◮ Diag-Cov: constrain C to be diagonal:
◮ models worker groups only...
◮ ...but not the sen.-spe. correlation.

21
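The Majority Vote baseline is simple enough to state exactly. A minimal sketch (my own code; tie-breaking toward positive is an assumption, the paper may differ):

```python
import numpy as np

def majority_vote(L, mask=None):
    """Majority Vote: each instance gets the majority of its
    crowd labels (ties broken toward positive).

    L: (instances x workers) 0/1 label matrix.
    mask: optional boolean matrix marking which entries were
    actually labeled (all entries, if omitted)."""
    L = np.asarray(L)
    if mask is None:
        mask = np.ones_like(L, dtype=bool)
    votes = (L * mask).sum(axis=1)    # positive votes per instance
    counts = mask.sum(axis=1)         # labels collected per instance
    return (votes * 2 >= counts).astype(int)
```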

slide-69
SLIDE 69

Results: Sensitivity

22

slide-70
SLIDE 70

Results: Specificity

23

slide-73
SLIDE 73

Discussion

Our method has two parts: group and correlation.
◮ Group provides most of the improvement.
◮ Correlation gives an additional boost for sen.

24

slide-75
SLIDE 75

Scenario 2

Galaxy Morphological Classification:
◮ Motivation: few astronomers, a lot of galaxies.

Galaxy Zoo 2 dataset:
◮ Multiple questions: galaxy shape? number of spiral arms? ...
◮ Volunteers answer the questions.

25

slide-77
SLIDE 77

Scenario 2

Setting:
◮ Given all labels in the source task...
◮ ...and some labels in the target task.
◮ Predict worker sen. and spe. in the target task.

Compare:
◮ Single: only consider target labels.
◮ Accum: merge source labels into the target.
◮ Multi: our multi-task model.

26

slide-78
SLIDE 78

Result: Sensitivity

27

slide-79
SLIDE 79

Result: Specificity

28

slide-83
SLIDE 83

Discussion

Accum is surprisingly bad.
◮ The tasks are different, so naively merging labels is bad.

Our method
◮ shows good improvement...
◮ ...although sometimes modest.
◮ Again, the tasks are different...
◮ Many workers are better in the source task...
◮ ...but worse in the target task.
◮ Our method is still at least as good as the baseline.

29

slide-86
SLIDE 86

Conclusion

Summary
◮ Model correlation to transfer knowledge.
◮ Empirically improve estimates of worker quality.

Future work
◮ Extension: instance-level features.
◮ Application: task/instance routing.

Questions?

30