Bayesian Bias Mitigation for Crowdsourcing (Fabian L. Wauthier, UC Berkeley)



University of California, Berkeley

Bayesian Bias Mitigation for Crowdsourcing

Fabian L. Wauthier, UC Berkeley, with Michael I. Jordan. May 9, 2012.

Fabian L. Wauthier: Bayesian Bias Mitigation for Crowdsourcing, 1

The Problem of Bias in Crowdsourcing

◮ Crowdsourcing: collect data from a crowd and learn a model.
◮ E.g. Amazon Mechanical Turk.
◮ Labelers may be malicious/unhelpful, or tasks ambiguous/hard.
◮ ⇒ Crowdsourced data is biased.

  • The problem is systemic; there are no easy fixes.
  • The effects on learned models can be significant.

◮ Problem: Can we still learn from partially biased data?

Example: Scene Understanding

◮ “Robot: Get me the brown guitar behind the couch.”
◮ Human label data would be ambiguous:

  • Is the guitar brown or yellow?
  • Is it behind or next to the couch?

◮ There can be structural differences between labelers.
◮ How do we learn from this data?

Current Methodologies

◮ Bias is addressed in three stages of a pipeline:

  1. Data collection: active learning.
  2. Data curation: screening/weighting of data.
  3. Learning: noisy observation model.

◮ Common assumptions:

  • There exists a single truth.
  • Bias effects can be modeled as noise.

◮ Inappropriate when tasks are subjective or particularly hard.

Overview

Contribution I: Bayesian Preference Model
  BBMC Results
Contribution II: Approximate Active Learning
  Active Learning Results
Conclusion


Contribution I: Bayesian Preference Model

◮ Unify the pipeline steps in a single Bayesian model.

  • Model the sources of bias, not just its effects.
  • Labelers express accumulated, shared preferences.

◮ Benefits:

  • Allows multiple inconsistent labellings to coexist.
  • Active learning can be coherently integrated.
  • Bayesian inference combines data curation and learning.

Input Data

◮ Tasks i, labelers l.
◮ Example task: “Is the guitar behind or next to the couch?”
◮ Task covariates x_i ∈ R^d, i = 1, …, n, collected in X.
◮ Labels y_{i,l} ∈ {−1, 0, +1}, i = 1, …, n; l = 1, …, m, collected in the n × m matrix Y.

Bayesian Preference Model

Labelers express accumulated, shared preferences.

◮ Parameter γ_b models the effect of preference b = 1, …, K.
◮ An m × K binary matrix Z models parameter sharing.
◮ If z_{l,b} = 1, labeler l expresses preference b.
◮ Parameter β_l accumulates preferences:

  β_l = Σ_b z_{l,b} γ_b

◮ Likelihood (the product runs over observed labels, y_{i,l} ≠ 0):

  p(Y | X, Z, γ) = Π_l Π_{i : y_{i,l} ≠ 0} p(y_{i,l} | β_l^⊤ x_i)

◮ Similar preferences ⇒ similar labelling.
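
As a concrete reading of the two displayed formulas, both can be sketched in a few lines of numpy. The probit link Φ is an assumption borrowed from the synthetic experiment later in the deck; the slides only require some p(y_{i,l} | β_l^⊤ x_i):

```python
import numpy as np
from scipy.stats import norm

def accumulate_preferences(Z, gamma):
    # beta_l = sum_b z_{l,b} gamma_b: each labeler's parameter is the sum
    # of the preference effects they express. Z is (m, K) binary and
    # gamma is (K, d), so the result is (m, d), one row per labeler.
    return Z @ gamma

def log_likelihood(Y, X, Z, gamma):
    # log p(Y | X, Z, gamma): sum over labelers l and observed labels
    # (y_{i,l} != 0) of log p(y_{i,l} | beta_l^T x_i), with a probit
    # link: p(y = +1) = Phi(x^T beta), so log p(y) = log Phi(y x^T beta).
    beta = accumulate_preferences(Z, gamma)   # (m, d)
    scores = X @ beta.T                       # (n, m) entries x_i^T beta_l
    observed = Y != 0                         # y = 0 encodes "unobserved"
    return norm.logcdf(Y[observed] * scores[observed]).sum()
```

Here y_{i,l} = 0 marks a label that was never collected, matching the observation model used in the synthetic experiment.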

Priors

◮ Prior on γ_b: p(γ_b) = N(0, σ² I) for each b.
◮ Prior on Z: fix Z to be m × K.

  π_b | α ∼ Beta(α/K, 1),  b = 1, …, K   (1)
  z_{l,b} | π_b ∼ Bern(π_b),  l = 1, …, m   (2)

◮ As K → ∞, the distribution over Z converges to the Indian Buffet Process (IBP).
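
The finite prior (1)–(2) is straightforward to sample directly; a minimal sketch (the function name and interface are mine):

```python
import numpy as np

def sample_Z(m, K, alpha, rng):
    # pi_b | alpha ~ Beta(alpha/K, 1) for each of the K columns,
    # z_{l,b} | pi_b ~ Bern(pi_b) independently for each labeler l.
    pi = rng.beta(alpha / K, 1.0, size=K)
    return (rng.random((m, K)) < pi).astype(int)

# Since E[pi_b] = (alpha/K) / (alpha/K + 1), the expected number of
# preferences per labeler is K * (alpha/K) / (alpha/K + 1) -> alpha as
# K -> infinity, one way to see the Indian Buffet Process limit.
```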

Complete model

  p(Y, Z, γ | X) = p(Y | X, Z, γ) p(γ | Z) p(Z)

◮ Recall bias: different labelers can have different β’s.
  Example: disagreement over whether the guitar is behind or next to the couch.
◮ We want to predict labeler l’s labels.
◮ Labeler l could be in the crowd, or the gold standard.
◮ Required inference: p(β_l | X, Y), or equivalently p(z_{l,b}, γ_b, b = 1, …, K | X, Y).
◮ The model is complex; exact inference is intractable.
◮ Possible alternatives: Gibbs sampling, variational inference, slice sampling, etc.
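
The model's conditionals over (Z, γ) are non-standard, but the Gibbs mechanics relied on later (alternately resampling each block from its full conditional) can be illustrated on a toy bivariate Gaussian target; this is only an illustration of the sampler pattern, not the model's actual sampler:

```python
import numpy as np

def gibbs_bivariate_normal(rho, steps, rng):
    # Toy Gibbs sampler: the target is a bivariate normal with
    # correlation rho, whose full conditionals are
    #   x | y ~ N(rho * y, 1 - rho^2)  and  y | x ~ N(rho * x, 1 - rho^2).
    # Alternately drawing each coordinate from its conditional yields a
    # Markov chain whose stationary distribution is the joint target.
    x = y = 0.0
    out = []
    for _ in range(steps):
        x = rng.normal(rho * y, np.sqrt(1 - rho ** 2))
        y = rng.normal(rho * x, np.sqrt(1 - rho ** 2))
        out.append((x, y))
    return np.array(out)
```

For the preference model the same loop would instead alternate updates of the z_{l,b} entries and the γ_b vectors given everything else.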


Results: Synthetic Data

◮ X is a 2000 × 4 Gaussian matrix.
◮ Z is a 30 × 2 uniform binary matrix (m = 30, K = 2).
◮ γ_b Gaussian, b = 1, 2; β_l = Σ_b z_{l,b} γ_b.
◮ Observation probability ε = 0.1:

  y_{i,l} = 0 w.p. (1 − ε),  +1 w.p. ε Φ(x_i^⊤ β_l),  −1 otherwise.

◮ Inference: want to recover β_1 (say).
◮ Requires p(z_{1,b}, γ_b, b = 1, …, K | X, Y).
◮ For inference set K = 10.
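
This setup is easy to reproduce. Note that "y = 0 w.p. 1 − ε, +1 w.p. ε Φ(x_i^⊤ β_l), −1 otherwise" is equivalent to observing a label with probability ε and then drawing its sign from a probit model, which is how the sketch below factors it (names are mine):

```python
import numpy as np
from scipy.stats import norm

def make_synthetic(n=2000, d=4, m=30, K=2, eps=0.1, seed=0):
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(n, d))              # 2000 x 4 Gaussian matrix
    Z = rng.integers(0, 2, size=(m, K))      # 30 x 2 uniform binary matrix
    gamma = rng.normal(size=(K, d))          # Gaussian preference effects
    beta = Z @ gamma                         # beta_l = sum_b z_{l,b} gamma_b
    # Observe each (i, l) label w.p. eps; observed labels are +1 with
    # probability Phi(x_i^T beta_l), else -1. Unobserved labels are 0.
    observed = rng.random((n, m)) < eps
    plus = rng.random((n, m)) < norm.cdf(X @ beta.T)
    Y = np.where(observed, np.where(plus, 1, -1), 0)
    return X, Y, Z, gamma, beta
```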

BBMC Results

◮ Latent Z mostly correct after 1000 Gibbs steps.
◮ Gibbs sequence for γ_{1,1}.
  [Trace plot: iterations 1000–2000, values roughly in the range 0.55–0.75.]
◮ True β_1 and its posterior mean after 1000 burn-in iterations:

  β_1 = (0.6915, 0.0754, −0.6815, 0.6988)^⊤,  β̂_1 = (0.6514, 0.0535, −0.6473, 0.6957)^⊤   (3)

Results: Crowdsourced data

◮ Task: Is the triangle to the left of or above the rectangle?
◮ Labelled on Amazon Mechanical Turk: 523 tasks, 3 labels per task, 76 labelers.
◮ Want to predict the gold standard: compare centroid positions.
◮ All 26 labelers with over 20 labels have error above 0.16.
◮ The researcher also labels, and gives 60 gold standard labels.

BBMC Results

◮ Averaged log likelihood and error rate on the test set.
◮ Our model: BBMC.

  No Active Learning:
  Algorithm   Final Loglik     Final Error
  GOLD        −3716 ± 1695     0.0547 ± 0.0102
  CONS        −421.1 ± 2.6     0.0935 ± 0.0031
  BBMC        −219.1 ± 3.1     0.0309 ± 0.0033


Active Learning

◮ Want to predict labeler l’s labels. Need β_l.
◮ Not all labelers are useful for inferring β_l.
◮ If l and l′ share parameters, we can learn about β_l from l′:

  β_l = Σ_b z_{l,b} γ_b,  β_{l′} = Σ_b z_{l′,b} γ_b   (4)

◮ Active learning: repeatedly select training data that helps in learning β_l.
◮ Goal: cheaper training data, faster learning.

Approximate inference and Active Learning

◮ Suppose we start with training data Y.
◮ Query the task-labeler pair (i, l) that maximizes the expected utility of adding it:

  (i, l) = argmax_{(i′, l′)} E_{y_{i′,l′}} [ U( p(β | y_{i′,l′}, X, Y) ) ]

◮ Example utilities: U(·) = −Entropy(·), or U_µ(·) = ‖Mean(·) − µ‖₂².
◮ To score each (i′, l′) we need the posterior p(β | y_{i′,l′}, X, Y).
◮ With Gibbs sampling, that means a separate Gibbs sampler to score each (i′, l′).
◮ We are already running one Gibbs sampler for basic inference.
◮ Problem: Can we avoid running the extra scoring chains?
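
The selection rule can be sketched as follows. The hooks `predictive(i, l, y)` (predictive probability of label y) and `sampler(i, l, y)` (samples from p(β | y_{i′,l′} = y, X, Y)) are hypothetical placeholders for model-specific code, and the utility shown is a Gaussian log-determinant proxy for −Entropy, not necessarily the deck's exact choice:

```python
import numpy as np

def neg_entropy_proxy(samples):
    # Proxy for U = -Entropy under a Gaussian fit to posterior samples
    # of shape (S, d): utility is higher when the samples are more
    # concentrated, i.e. when log|Cov| is smaller.
    cov = np.cov(samples, rowvar=False) + 1e-9 * np.eye(samples.shape[1])
    return -0.5 * np.linalg.slogdet(cov)[1]

def select_query(candidates, predictive, sampler, utility=neg_entropy_proxy):
    # (i, l) = argmax_{(i', l')} E_{y} [ U(p(beta | y_{i',l'} = y, X, Y)) ]:
    # for each candidate pair, average the utility of the updated
    # posterior over the possible labels y in {-1, +1}.
    def expected_utility(pair):
        i, l = pair
        return sum(predictive(i, l, y) * utility(sampler(i, l, y))
                   for y in (-1, +1))
    return max(candidates, key=expected_utility)
```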

Contribution II: Approximate Active Learning

◮ The Gibbs sampler for p(β | X, Y) is a Markov chain used for inference.
◮ The sampler for p(β | y_{i′,l′}, X, Y) is a perturbed chain used for scoring.
◮ Naïve scoring:

  • Run the perturbed chain; sample from its stationary distribution.
  • Compute U(p(β | y_{i′,l′}, X, Y)).

◮ Our method:

  • Get approximate samples of p(β | y_{i′,l′}, X, Y) by transforming samples of p(β | X, Y).
  • Approximate U(p(β | y_{i′,l′}, X, Y)) from these.

  [Diagram: the unperturbed chain β_{t−1} → β_t → β_{t+1} → β_{t+2}, with perturbed samples β̂_{t+1}, β̂_{t+2}, β̂_{t+3} obtained by branching off the unperturbed states.]


slide-95
SLIDE 95

Contribution II: Approximate Active Learning

Approximate Scoring for Active Learning

◮ Suppose a chain p(βt|βt−1) and a perturbed chain p̂(β̂t|β̂t−1).
◮ Their stationary distributions are p∞(β) and p̂∞(β̂).
◮ Let βs ∼ p∞(β), s = 1, …, S, and approximate

  p̂∞(β̂) ≈ ∫ p̂(β̂|β) p∞(β) dβ ≈ (1/S) Σs=1..S p̂(β̂|βs).

◮ If p∞(β) = p̂∞(β), the first approximation is exact.
◮ Specialize to active learning:

  • Unperturbed chain = Gibbs sampler for p(β|X, Y).
  • Perturbed chain = Gibbs sampler for p(β|yi′,l′, X, Y).

Fabian L. Wauthier: Bayesian Bias Mitigation for Crowdsourcing, 21
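The mixture approximation can be sketched on a toy chain. This is not the talk's Gibbs sampler: it is a hypothetical AR(1) chain whose perturbed version adds a drift c, with all parameters (rho, sigma, c) made up for illustration. Pushing stationary samples of the unperturbed chain through a single perturbed transition draws from the mixture (1/S) Σs p̂(·|βs), instead of rerunning the perturbed chain to convergence.

```python
import numpy as np

rng = np.random.default_rng(1)
rho, sigma, c, S = 0.5, 1.0, 0.3, 200_000

# Stationary distribution of the unperturbed AR(1) chain
# beta_t = rho * beta_{t-1} + eps  is  N(0, sigma^2 / (1 - rho^2)).
beta = rng.normal(0.0, sigma / np.sqrt(1 - rho**2), size=S)

# Naive scoring would rerun the perturbed chain
# beta_t = rho * beta_{t-1} + c + eps  to convergence.  The cheap
# alternative: push the existing samples through ONE perturbed
# transition, i.e. sample from the mixture (1/S) sum_s p_hat(. | beta_s).
beta_hat = rho * beta + c + rng.normal(0.0, sigma, size=S)

true_mean = c / (1 - rho)       # stationary mean of the perturbed chain
approx_mean = beta_hat.mean()   # mean of the one-step mixture (about c)
```

In this toy case the one-step mixture reproduces the perturbed stationary variance exactly but only moves part of the way toward the new mean, which illustrates why the approximation is most accurate for small perturbations, i.e. when adding a single label barely changes the posterior.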


slide-99
SLIDE 99

Contribution II: Approximate Active Learning

Special Case: Discrete Random Walks

◮ Suppose W is n × n, positive, symmetric; P = D−1W, where D = diag(W1) holds the row sums.
◮ The stationary distribution is the left eigenvector of P for eigenvalue 1. Decompose

  A = D−1/2 W D−1/2   (5)
    = V Λ V⊤,  λ1 ≤ λ2 ≤ … ≤ λn = 1   (6)
  p∞ ∝ D1/2 vn   (7)

◮ Perturb the matrix: Ŵ = W + dW ≥ 0, with dW1 = 0 (so the row sums, and hence D, are unchanged).
◮ Then P̂ = D−1Ŵ = P + D−1dW = P + dP.

Fabian L. Wauthier: Bayesian Bias Mitigation for Crowdsourcing, 22
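Equations (5)–(7) can be checked numerically. This is a minimal NumPy sketch on a random weight matrix (not data from the talk): it builds the random walk P = D⁻¹W, symmetrizes it to A = D⁻¹/²WD⁻¹/², and recovers the stationary distribution from the top eigenvector.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4
R = rng.random((n, n))
W = R + R.T + 1.0                 # n x n, positive, symmetric
D = np.diag(W.sum(axis=1))        # D = diag(W 1), the row sums
P = np.linalg.solve(D, W)         # random walk P = D^{-1} W (row-stochastic)

# Eq. (5)-(7): A = D^{-1/2} W D^{-1/2} is symmetric; its top eigenvector
# v_n (eigenvalue lambda_n = 1) gives the stationary distribution
# p_inf proportional to D^{1/2} v_n.
Dr = np.sqrt(np.diag(D))
A = W / np.outer(Dr, Dr)
lam, V = np.linalg.eigh(A)        # eigenvalues in ascending order
v_n = V[:, -1]
p_inf = Dr * v_n
p_inf = p_inf / p_inf.sum()       # normalize (also fixes the sign)
```

Since A is similar to P (A = D¹/²PD⁻¹/²), its top eigenvalue is 1 and p∞ satisfies P⊤p∞ = p∞.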


slide-103
SLIDE 103

Contribution II: Approximate Active Learning

Special Case: Discrete Random Walks

◮ Matrix perturbation theory gives

  p̃∞ ≈ p∞ + D1/2 ( Σk≠n vk vk⊤ / (1 − λk) ) dP⊤ D−1/2 p∞   (8)

◮ This works for discrete random walks, but not in general.
◮ Our method is general and approximates

  p̂∞ ≈ P̂⊤ p∞ = p∞ + dP⊤ p∞.   (9)

◮ If D = I, the accuracy depends on the spectral gap:

  ||p̃∞ − p̂∞|| ≤ max{ 1, 1/(1 − λn−1) } ||dP⊤ p∞||.   (10)

Fabian L. Wauthier: Bayesian Bias Mitigation for Crowdsourcing, 23
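The one-step approximation (9) can be tested on a small random walk. This is an illustrative NumPy sketch on made-up weights, not the talk's experiment: it perturbs W with a dW whose rows sum to zero (so D is unchanged) and compares the exact perturbed stationary distribution with P̂⊤p∞.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4
R = rng.random((n, n))
W = R + R.T + 1.0                      # positive, symmetric
D = np.diag(W.sum(axis=1))
# For symmetric W the walk P = D^{-1} W is reversible, so p_inf is
# proportional to the row sums of W.
p_inf = W.sum(axis=1) / W.sum()

# Perturbation with zero row sums (keeps D fixed) and W + dW > 0.
dW = np.zeros((n, n))
dW[0, 1] += 0.2
dW[0, 2] -= 0.2
P_hat = np.linalg.solve(D, W + dW)     # = P + dP, still row-stochastic

def stationary(P):
    """Left eigenvector of P for eigenvalue 1, normalized to a distribution."""
    lam, V = np.linalg.eig(P.T)
    v = np.real(V[:, np.argmax(np.real(lam))])
    return v / v.sum()

p_hat_exact = stationary(P_hat)        # rerun-to-convergence answer
p_hat_onestep = P_hat.T @ p_inf        # approximation (9): one perturbed step

err_none = np.abs(p_hat_exact - p_inf).sum()          # ignore perturbation
err_onestep = np.abs(p_hat_exact - p_hat_onestep).sum()
```

One perturbed step preserves total probability and, because P̂⊤ is an L1 contraction on zero-sum differences, its error never exceeds that of ignoring the perturbation entirely.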

slide-104
SLIDE 104

Active Learning Results

Overview

Contribution I: Bayesian Preference Model
BBMC Results
Contribution II: Approximate Active Learning
Active Learning Results
Conclusion

Fabian L. Wauthier: Bayesian Bias Mitigation for Crowdsourcing, 24


slide-107
SLIDE 107

Active Learning Results

Results: Crowdsourced data

◮ Task: is the triangle to the left of or above the rectangle?
◮ Active learning methods can query 100 labels.
◮ Here: only query the gold standard (could be another labeler).

Fabian L. Wauthier: Bayesian Bias Mitigation for Crowdsourcing, 25


slide-109
SLIDE 109

Active Learning Results

◮ Averaged log likelihood and error rate on the test set.
◮ BBMC / BBMC-ACT: our method without / with active learning.

Algorithm    Final Loglik       Final Error
-- No Active Learning --
GOLD         −3716 ± 1695       0.0547 ± 0.0102
CONS         −421.1 ± 2.6       0.0935 ± 0.0031
BBMC         −219.1 ± 3.1       0.0309 ± 0.0033
-- Active Learning --
GOLD-ACT     −1957 ± 696        0.0290 ± 0.0037
CONS-ACT     −396.1 ± 3.6       0.0906 ± 0.0024
RAND-ACT     −186.0 ± 2.2       0.0292 ± 0.0029
DIS-ACT      −198.3 ± 5.8       0.0392 ± 0.0052
MCMC-ACT     −196.1 ± 6.7       0.0492 ± 0.0050
BBMC-ACT     −160.8 ± 3.9       0.0188 ± 0.0018

Fabian L. Wauthier: Bayesian Bias Mitigation for Crowdsourcing, 26


slide-117
SLIDE 117

Conclusion

Conclusion

◮ Bayesian model to mitigate label bias.

  • Unifies crowdsourcing pipeline into one model.
  • Labelers express accumulated, shared preferences.
  • Scales well in the number of tasks.
  • Performs well when consensus labels cannot be estimated.

◮ Approximate active learning for Gibbs sampling inference.

  • Fast scoring by reusing Gibbs samples.
  • Outperforms naïve MCMC scoring.

Fabian L. Wauthier: Bayesian Bias Mitigation for Crowdsourcing, 27

slide-118
SLIDE 118

Conclusion

Questions?

Fabian L. Wauthier: Bayesian Bias Mitigation for Crowdsourcing, 28