SLIDE 1 A Correlated Worker Model for Grouped, Imbalanced and Multitask Data
An T. Nguyen¹, Byron C. Wallace, Matthew Lease
University of Texas at Austin
UAI 2016
¹Presenter
SLIDE 6 Overview
◮ A model of workers in crowdsourcing.
◮ Idea: transfer knowledge of worker quality.
◮ Variational EM learning.
◮ Applied to two datasets:
  ◮ Biomedical Citation Screening: imbalanced, grouped.
  ◮ Galaxy Classification: multiple tasks.
SLIDE 11
Background
◮ Crowdsourcing: collect labels quickly at low cost.
◮ But (usually) lower quality.
◮ Common solution: collect 5 labels for each instance...
◮ ... then aggregate them.
◮ Most previous work: improve (the estimates of) labels.
◮ Our work: improve (the estimates of) worker qualities.
SLIDE 15
Motivation
for estimating worker qualities
Diagnostic insights.
Help workers improve.
Intelligent task routing (assign work to workers).
SLIDE 18
Worker Quality Measure
Accuracy: simple but not enough.
→ Confusion matrix: Pr(worker label | true label)
Binary task (this work):
◮ Sensitivity: Pr(positive | positive).
◮ Specificity: Pr(negative | negative).
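For concreteness, a minimal sketch (a hypothetical helper, not from the talk) of computing these two quantities for one worker against gold labels:

```python
import numpy as np

def sen_spe(worker_labels, gold_labels):
    """Return (sensitivity, specificity) of one worker's binary labels."""
    worker_labels = np.asarray(worker_labels)
    gold_labels = np.asarray(gold_labels)
    sensitivity = np.mean(worker_labels[gold_labels == 1] == 1)  # Pr(pos | pos)
    specificity = np.mean(worker_labels[gold_labels == 0] == 0)  # Pr(neg | neg)
    return sensitivity, specificity
```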
SLIDE 22 Setting
Input
◮ Crowd labels for each instance.
◮ No instance-level features (future work).
Output
◮ For each worker: sensitivity and specificity.
Evaluation
◮ RMSE on sen. and spe. (a sketch follows below).
◮ Gold sen./spe.: computed from gold labels on the whole dataset.
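A minimal sketch of the metric (hypothetical helper name); the same formula applies to sensitivity and specificity:

```python
import numpy as np

def rmse(estimated, gold):
    """RMSE between estimated and gold per-worker quality values."""
    estimated, gold = np.asarray(estimated), np.asarray(gold)
    return float(np.sqrt(np.mean((estimated - gold) ** 2)))
```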
SLIDE 24
Challenges
Sparsity: many workers label only a few instances.
Data is imbalanced:
◮ Many more negatives than positives.
◮ Difficult to estimate sensitivity.
SLIDE 25
Idea
Transfer knowledge of worker quality
◮ Between classes.
◮ Within a group.
◮ Across multiple tasks.
SLIDE 27
Previous models
(Raykar et al. 2010; Liu & Wang 2012; Kim & Ghahramani 2012)
Hidden vars:
◮ True label for each instance.
◮ Confusion matrix (sen. + spe.) for each worker.
Assumptions:
◮ Sen. & spe. are independent parameters.
◮ A single group of workers.
◮ Multiple tasks: independent models.
SLIDE 28
Our Model
Assumptions:
◮ Sen. & spe. are correlated.
◮ Multiple groups of workers (group membership is known).
◮ Sen. & spe. in multiple tasks are correlated.
SLIDE 32
The Base Model
(i indexes instances, j indexes workers)
(Uj, Vj) ∼ N(µ, C)
Zi ∼ Ber(θ)
Lij | Zi = 1 ∼ Ber(S(Uj))
Lij | Zi = 0 ∼ Ber(1 − S(Vj))
(S is the logistic sigmoid, so S(Uj) is worker j's sensitivity and S(Vj) their specificity.)
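A minimal sketch of this generative process in NumPy, assuming S is the logistic sigmoid; the parameter values are illustrative, not from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 100, 20                                 # instances, workers
theta = 0.3                                    # Pr(Z_i = 1)
mu = np.array([1.5, 2.0])                      # prior mean of (U_j, V_j)
C = np.array([[1.0, 0.5],
              [0.5, 1.0]])                     # covariance: sen-spe correlation

S = lambda x: 1.0 / (1.0 + np.exp(-x))         # logistic sigmoid

UV = rng.multivariate_normal(mu, C, size=m)    # one (U_j, V_j) per worker
Z = rng.binomial(1, theta, size=n)             # true labels
# L_ij | Z_i = 1 ~ Ber(S(U_j));  L_ij | Z_i = 0 ~ Ber(1 - S(V_j))
p = np.where(Z[:, None] == 1, S(UV[:, 0])[None, :], 1 - S(UV[:, 1])[None, :])
L = rng.binomial(1, p)                         # n x m crowd-label matrix
```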
SLIDE 36 Extensions
◮ Group membership is known.
◮ Model each group k with its own Normal dist. N(µk, Ck).
◮ Assume two tasks.
◮ (Sen1, Spe1) correlates with (Sen2, Spe2).
◮ (U1, V1, U2, V2) ∼ N(µ, C) (see the sketch below).
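A minimal sketch of the multitask prior, with an arbitrary positive-definite covariance standing in for a learned C:

```python
import numpy as np

rng = np.random.default_rng(1)
m = 50                                        # workers
mu = np.array([1.5, 2.0, 1.0, 1.8])           # mean of (U1, V1, U2, V2); illustrative
A = rng.standard_normal((4, 4))
C = A @ A.T + 0.1 * np.eye(4)                 # any positive-definite covariance
U1, V1, U2, V2 = rng.multivariate_normal(mu, C, size=m).T
S = lambda x: 1.0 / (1.0 + np.exp(-x))
sen1, spe1, sen2, spe2 = S(U1), S(V1), S(U2), S(V2)  # per-task worker quality
```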
SLIDE 40
Inference
For the Base Model
Approach: Variational EM
◮ E-step: infer Pr(U1..m, V1..m, Z1..n | L).
◮ M-step: maximize over the parameters µ, C, θ.
Variational Inference:
◮ Approximate the (complex) posterior Pr(· | L)...
◮ ... by a simpler function q.
◮ Minimize KL(q || p)...
◮ ... equivalent to maximizing a lower bound on the log-likelihood.
SLIDE 43 Inference
Mean-field assumptions:
◮ q factorizes:
q(U1..m, V1..m, Z1..n) = ∏_{j=1..m} q(Uj) q(Vj) · ∏_{i=1..n} q(Zi)
◮ Factors:
q(Uj) = N(µ̃_uj, σ̃²_uj)
q(Vj) = N(µ̃_vj, σ̃²_vj)
q(Zi) = Ber(θ̃_i)
◮ Optimize with respect to {µ̃_uj, σ̃²_uj, µ̃_vj, σ̃²_vj | j = 1...m} and {θ̃_i | i = 1...n}.
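A minimal sketch of a container for these variational parameters (hypothetical names; the update sketches below assume this layout):

```python
import numpy as np

m, n = 20, 100    # workers, instances
q_params = {
    "mu_u": np.zeros(m), "s2_u": np.ones(m),   # q(U_j) = N(mu_u[j], s2_u[j])
    "mu_v": np.zeros(m), "s2_v": np.ones(m),   # q(V_j) = N(mu_v[j], s2_v[j])
    "theta": np.full(n, 0.5),                  # q(Z_i) = Ber(theta[i])
}
```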
SLIDE 47 Optimization
Coordinate descent: update one variable at a time.
Update Zi (sums run over the workers j who labeled instance i):
q*(Zi = 1) ∝ exp( log Ber(1 | θ) + Σ_j E_{Uj∼q(Uj)} log Ber(Lij | S(Uj)) )
q*(Zi = 0) ∝ exp( log Ber(0 | θ) + Σ_j E_{Vj∼q(Vj)} log Ber(Lij | 1 − S(Vj)) )
Intuition:
◮ Zi ≈ prior + E(crowd labels for i).
◮ The expectation is taken with respect to worker quality.
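A minimal sketch of this update, approximating the expectations by Monte Carlo samples from q(Uj) and q(Vj). The labels_i layout is an assumption, and q follows the hypothetical q_params container sketched earlier:

```python
import numpy as np

def log_ber(l, p):
    """log Bernoulli pmf: log p if l == 1, else log(1 - p)."""
    return l * np.log(p) + (1 - l) * np.log(1 - p)

def update_z(labels_i, q, theta, rng, n_samples=200):
    """labels_i: hypothetical dict {worker j: label L_ij} for one instance i."""
    S = lambda x: 1.0 / (1.0 + np.exp(-x))
    log1 = np.log(theta)          # log Ber(1 | theta)
    log0 = np.log(1.0 - theta)    # log Ber(0 | theta)
    for j, l_ij in labels_i.items():
        u = rng.normal(q["mu_u"][j], np.sqrt(q["s2_u"][j]), n_samples)
        v = rng.normal(q["mu_v"][j], np.sqrt(q["s2_v"][j]), n_samples)
        log1 += log_ber(l_ij, S(u)).mean()        # E_q(Uj) log Ber(L_ij | S(U_j))
        log0 += log_ber(l_ij, 1 - S(v)).mean()    # E_q(Vj) log Ber(L_ij | 1 - S(V_j))
    c = max(log1, log0)                           # normalize in log space
    return np.exp(log1 - c) / (np.exp(log1 - c) + np.exp(log0 - c))  # new q(Z_i = 1)
```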
SLIDE 52 Optimization
Update Uj (the sum runs over the instances i that worker j labeled):
q*(Uj) ∝ exp( E_{Vj∼q(Vj)} log N(Uj, Vj | µ, C) + Σ_i q(Zi = 1) log Ber(Lij | S(Uj)) )
Intuition:
◮ Uj = logit sensitivity of worker j.
◮ Uj ≈ E(correlation with specificity) + ...
◮ ... evidence from the instances that worker j has labeled.
(A similar equation holds for Vj.)
SLIDE 55
Optimization
Problem: the expectations E[·] are difficult to compute in closed form.
Solution: Laplace Variational Inference (Wang & Blei, 2013)
◮ Approximate these update equations...
◮ ... by a Laplace approximation (a sketch follows below).
◮ Details in the paper.
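A minimal sketch of the idea for the Uj update, under stated assumptions: the expectation over q(Vj) is taken by Monte Carlo, the mode of the unnormalized log q*(Uj) is found with a generic 1-D optimizer, and the variance comes from finite-difference curvature at the mode. The labels_j layout is hypothetical; this illustrates the Laplace step, not the paper's exact derivation:

```python
import numpy as np
from scipy.optimize import minimize_scalar

def update_u(j, labels_j, q, mu, C, rng, n_samples=200):
    """labels_j: hypothetical list of (L_ij, q(Z_i = 1)) pairs for worker j."""
    S = lambda x: 1.0 / (1.0 + np.exp(-x))
    prec = np.linalg.inv(C)
    v = rng.normal(q["mu_v"][j], np.sqrt(q["s2_v"][j]), n_samples)

    def neg_log_q(u):
        # E_{Vj ~ q(Vj)} log N(u, Vj | mu, C), up to constants in u
        d = np.stack([np.full(n_samples, u - mu[0]), v - mu[1]])
        prior = -0.5 * np.mean(np.einsum("kn,kl,ln->n", d, prec, d))
        # sum over instances labeled by worker j, weighted by q(Z_i = 1)
        lik = sum(w * (l * np.log(S(u)) + (1 - l) * np.log(1 - S(u)))
                  for l, w in labels_j)
        return -(prior + lik)

    res = minimize_scalar(neg_log_q)       # mode of the unnormalized q*(U_j)
    h = 1e-4                               # curvature by finite differences
    curv = (neg_log_q(res.x + h) - 2 * neg_log_q(res.x) + neg_log_q(res.x - h)) / h**2
    return res.x, 1.0 / curv               # Laplace: new mean and variance of q(U_j)
```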
SLIDE 57
Learning
E-step: infer the posterior distribution over hidden variables.
M-step: maximize over µ, C, θ under the posterior.
◮ µ, C: sample mean and covariance.
◮ θ: average of {θ̃_i | i = 1...n}.
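A minimal sketch of the M-step as described on the slide, reusing the hypothetical q_params container from the mean-field sketch:

```python
import numpy as np

def m_step(q_params):
    """Point estimates from the E-step summaries (variational means)."""
    UV_hat = np.column_stack([q_params["mu_u"], q_params["mu_v"]])
    mu = UV_hat.mean(axis=0)              # sample mean over workers
    C = np.cov(UV_hat, rowvar=False)      # sample covariance over workers
    theta = q_params["theta"].mean()      # average of the theta-tilde_i
    return mu, C, theta
```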
SLIDE 60 Evaluation
Citizen Science:
◮ Workers volunteer...
◮ ... to help science.
◮ Different from traditional crowdsourcing:
  ◮ large scale.
  ◮ (usually) higher quality.
Two real-world scenarios:
◮ Biomedical Citation Screening.
◮ Galaxy Morphological Classification.
SLIDE 64
Scenario 1
Biomedical Citation Screening:
◮ Motivation: the biomedical literature is huge.
◮ Need to find relevant citations.
The RCT dataset:
◮ Identify reports of Randomized Controlled Trials.
◮ Very imbalanced (3% positive).
◮ Workers fall into 2 groups...
◮ ... experts and novices.
SLIDE 68 Scenario 1
Baselines:
◮ Majority Vote.
◮ Two Coin (Raykar et al. 2010).
Our method: two versions
◮ Full-Cov: the full model.
◮ Diag-Cov: constrain C to be diagonal:
  ◮ models worker groups...
  ◮ ... but not the sen-spe correlation.
SLIDE 69
Results: Sensitivity
SLIDE 70
Results: Specificity
SLIDE 73
Discussion
Our method has two parts: groups and correlation.
◮ Groups provide most of the improvement.
◮ Correlation gives an additional boost for sensitivity.
SLIDE 75
Scenario 2
Galaxy Morphological Classification:
◮ Motivation: few astronomers, lots of galaxies.
Galaxy Zoo 2 dataset:
◮ Multiple questions: galaxy shape? number of spiral arms? ...
◮ Volunteers answer the questions.
SLIDE 77
Scenario 2
Setting:
◮ Given all labels in the source task...
◮ ... and some labels in the target task.
◮ Predict worker sen. and spe. in the target task.
Compare:
◮ Single: only consider target labels.
◮ Accum: merge source labels into the target.
◮ Multi: our multi-task model.
SLIDE 78
Results: Sensitivity
SLIDE 79
Results: Specificity
SLIDE 83
Discussion
Accum is surprisingly bad.
◮ The tasks are different, so naively merging labels is bad.
Our method
◮ shows good improvement...
◮ ... although sometimes modest.
◮ Again, the tasks are different...
◮ Many workers are better in the source task...
◮ ... but worse in the target task.
◮ Our method is still at least as good as the baseline.
SLIDE 86
Conclusion
Summary
◮ Model correlation to transfer knowledge.
◮ Empirically improves estimates of worker quality.
Future work
◮ Extension: instance-level features.
◮ Application: task/instance routing.
Questions?