SLIDE 1
Probabilistic Modeling for Crowdsourcing Partially-Subjective Ratings
An T. Nguyen¹∗, Matthew Halpern¹, Byron C. Wallace², Matthew Lease¹
¹University of Texas at Austin  ²Northeastern University
HCOMP 2016
∗Presenter
SLIDE 4
Probabilistic Modeling
A popular approach to improving label quality: Dawid & Skene (1979).
◮ Model true labels as hidden variables.
◮ Model worker qualities as parameters.
◮ Estimation: EM algorithm.
Extensions:
◮ Bayesian (Kim & Ghahramani 2012)
◮ Communities (Venanzi et al. 2014)
◮ Instance features (Kamar et al. 2015)
SLIDE 6
Probabilistic Modeling
Common assumption: a single true label per instance (i.e., an objective task).
What about subjective tasks?
◮ No single true label.
◮ A gold standard may not be appropriate (Sen et al., CSCW 2015).
SLIDE 7
Video Rating Task
Data:
◮ Videos of user interactions on a smartphone.
◮ Varying hardware configurations (CPU frequency, cores, GPU).
Task:
◮ Watch a short video.
◮ Rate user satisfaction from 1 to 5.
◮ 370 videos, ≈ 50 AMT ratings each.
SLIDE 10
General Setting
For each instance:
◮ No single true label ...
(i.e., no instance-level gold standard)
◮ ... but a true distribution over labels.
(i.e., a gold standard on the instance's label distribution)
Our data: instances = videos; labels = distributions of ratings.
Two tasks:
◮ Predict that distribution.
◮ Detect unreliable workers.
SLIDE 15
Model
Intuition:
1. Unreliable workers tend to give unreliable ratings.
2. Unreliable ratings are independent of the instance.
(e.g., rating videos without watching them)
Assumptions:
1. Worker j has a parameter θj governing how often their labels are reliable.
2. Rating labels are sampled from Normal(µ, σ):
◮ Unreliable: µ, σ fixed.
◮ Reliable: µ, σ vary with the instance.
SLIDE 19
Model
(i indexes instances, j indexes workers)
Reliability indicator: Zij ∼ Ber(θj)
Labels: Lij | Zij = 0 ∼ N(3, s);  Lij | Zij = 1 ∼ N(µi, σi²)
Features → µ, σ: µi = wᵀxi, σi = exp(vᵀxi)
Prior: θj ∼ Beta(A, B)
[Plate diagram: θj ∼ Beta(A, B) → Zij ∼ Ber(θj) → Lij ∼ Normal, with (3, s) and (wᵀxi, vᵀxi) parameterizing the Normal; plates over instances and workers]
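A minimal generative sketch of this model in Python/numpy; the dimensions, hyperparameter values, and random feature vectors are illustrative assumptions, not the paper's settings:

```python
import numpy as np

rng = np.random.default_rng(0)
n_videos, n_workers, n_feats = 370, 50, 10
A, B, s = 2.0, 1.0, 1.0                  # Beta hyperparameters; std of unreliable ratings

x = rng.normal(size=(n_videos, n_feats))      # instance features x_i
w = rng.normal(size=n_feats)                  # weights for the mean
v = 0.1 * rng.normal(size=n_feats)            # weights for the log-std

theta = rng.beta(A, B, size=n_workers)        # theta_j ~ Beta(A, B): P(worker j gives a reliable label)
mu = x @ w                                    # mu_i = w^T x_i
sigma = np.exp(x @ v)                         # sigma_i = exp(v^T x_i)

Z = rng.binomial(1, theta, size=(n_videos, n_workers))            # Z_ij ~ Ber(theta_j)
L_rel = rng.normal(mu[:, None], sigma[:, None], size=Z.shape)     # L_ij | Z_ij = 1 ~ N(mu_i, sigma_i)
L_unrel = rng.normal(3.0, s, size=Z.shape)                        # L_ij | Z_ij = 0 ~ N(3, s)
L = np.where(Z == 1, L_rel, L_unrel)          # observed ratings
```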
SLIDE 21
Learning
(For the model without a prior on θ)
EM algorithm; iterate:
E-step: infer the posterior over Zij (analytic solution).
M-step: optimize the parameters w, v, and θ (BFGS).
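The analytic E-step is just Bayes' rule over the two mixture components. A sketch, assuming numpy/scipy and the variable names from the generative sketch above:

```python
import numpy as np
from scipy.stats import norm

def e_step(L, mu, sigma, theta, s=1.0):
    """Posterior responsibility r_ij = P(Z_ij = 1 | L_ij)."""
    p_rel = norm.pdf(L, loc=mu[:, None], scale=sigma[:, None])   # N(mu_i, sigma_i)
    p_unrel = norm.pdf(L, loc=3.0, scale=s)                      # N(3, s)
    num = theta[None, :] * p_rel
    return num / (num + (1.0 - theta[None, :]) * p_unrel)
```

The M-step would then maximize the expected complete-data log-likelihood over w and v (e.g., via scipy.optimize.minimize with method="BFGS") and set each θj to the average responsibility over worker j's labels.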
SLIDE 24
Learning
(For the Bayesian model, with a prior on θ)
Closed-form EM is not possible.
Mean-field: approximate the posterior p(z, θ) by q(z, θ) = ∏i,j q(Zij) ∏j q(θj).
Minimize KL(q ‖ p) by coordinate descent (similar to the LDA topic model; details in the paper).
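A sketch of one coordinate-descent sweep, using the standard mean-field updates for this Bernoulli-Beta structure; this is my reconstruction under those standard updates, not necessarily the paper's exact derivation:

```python
import numpy as np
from scipy.special import digamma
from scipy.stats import norm

def mean_field_sweep(L, mu, sigma, a, b, A=2.0, B=1.0, s=1.0):
    """One sweep updating q(Z_ij), then q(theta_j) = Beta(a_j, b_j)."""
    # Expected log-(un)reliability under q(theta_j)
    e_log_t = digamma(a) - digamma(a + b)
    e_log_1mt = digamma(b) - digamma(a + b)
    # q(Z_ij = 1) proportional to exp(E[log theta_j]) * N(L_ij; mu_i, sigma_i)
    log_rel = e_log_t[None, :] + norm.logpdf(L, mu[:, None], sigma[:, None])
    log_unrel = e_log_1mt[None, :] + norm.logpdf(L, 3.0, s)
    r = 1.0 / (1.0 + np.exp(log_unrel - log_rel))
    # q(theta_j): Beta prior (A, B) plus expected counts of (un)reliable labels
    return r, A + r.sum(axis=0), B + (1.0 - r).sum(axis=0)
```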
SLIDE 27
Evaluation
Difficulty: the task is subjective, so we don't know who is actually reliable.
Solution:
◮ Assume all labels in the data are reliable.
◮ Select p% of workers at random.
◮ Change q% of their labels to 'unreliable labels'.
◮ p and q are evaluation parameters
(p ∈ {0, 5, 10, 15, 20}, q ∈ {20, 40, 60, 80, 100}).
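A minimal sketch of this corruption protocol (numpy); `sample_unreliable` is a hypothetical stand-in for the spammer-label distribution described on the next slide:

```python
import numpy as np

def corrupt(L, p, q, sample_unreliable, rng):
    """Replace q% of the labels of a randomly chosen p% of workers."""
    L = L.copy()
    n_videos, n_workers = L.shape
    bad = rng.choice(n_workers, size=round(p / 100 * n_workers), replace=False)
    for j in bad:
        rows = rng.choice(n_videos, size=round(q / 100 * n_videos), replace=False)
        L[rows, j] = sample_unreliable(len(rows))
    return L, bad
```

For example, `corrupt(L, p=10, q=60, sample_unreliable=lambda n: rng.integers(1, 6, n), rng=rng)` would corrupt 10% of workers with uniform random ratings; the lambda is a placeholder, not the actual spammer distribution.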
SLIDE 30
Evaluation
Distribution of 'unreliable labels': collected via an AMT task.
◮ Workers pretend to be spammers.
◮ They give ratings without watching the video.
Recall our model:
◮ Unreliable labels ∼ N(3, s).
◮ i.e., we don't cheat: the injected labels come from real spammer behavior, not from the model's assumed distribution.
SLIDE 32
Baselines
Predict the ratings distribution (mean & variance):
◮ Two linear regression models ...
◮ ... one for the mean and one for the variance.
Detect unreliable workers: Average Deviation (AD; see the sketch below).
◮ For each instance: the deviation from the mean rating.
◮ For each worker: average those deviations.
◮ High AD → unreliable.
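A sketch of the AD baseline (numpy); I assume absolute deviation here, which the slide leaves unspecified:

```python
import numpy as np

def average_deviation(L):
    """Per-worker average absolute deviation from each video's mean rating.
    Higher scores suggest an unreliable worker; NaN marks missing ratings."""
    inst_mean = np.nanmean(L, axis=1, keepdims=True)   # mean rating per video
    return np.nanmean(np.abs(L - inst_mean), axis=0)   # averaged over videos
```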
SLIDE 33
Results (varying unreliable workers)
(Baselines: LR2 = Linear Regression, AD = Average Deviation. Ours: NEW = our model, B-NEW = our Bayesian model.)
SLIDE 36
Observations
◮ The Bayesian model (B-NEW) is better at prediction ...
◮ ... but worse at detecting unreliable workers.
Prior on the worker parameter θ:
◮ Reduces overfitting of w, v.
◮ Introduces a bias on workers.
Other experiments:
◮ Varying the unreliable ratings, the training data, and the number of workers.
◮ Similar results (in the paper).
SLIDE 40
Discussion
◮ Subjective tasks: common, but little prior work.
◮ Our method improves both prediction and detection.
Extensions:
◮ Improve recommendation systems.
◮ Other subjective tasks.
◮ More realistic evaluation.
◮ Better learning for the Bayesian model.
Data + code on GitHub.
Acknowledgments: reviewers, workers, NSF (and Angry Birds).
Questions?