SLIDE 1

Strategic Classification with Crowdsourcing

Yang Liu (joint work with Yiling Chen), yangl@seas.harvard.edu, Harvard University

  • Nov. 2016
SLIDE 2

(Non-strategic) Classification

Non-strategic classification: $y_i = f^*(x_i)$, where $f^* : \mathbb{R}^d \to \{-1, +1\}$.

  • Observing a set of training data, learn $f$ by minimizing the empirical risk:

$\tilde{f} = \arg\min_{f \in \mathcal{F}} \sum_{i=1}^{n} \ell(f(x_i), y_i).$

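Below is a minimal runnable sketch (not from the talk) of this empirical risk minimization step, assuming a linear hypothesis class and logistic loss standing in for $\ell$; the toy data and all names are illustrative.

```python
# A minimal ERM sketch (illustrative, not the talk's setup): linear classifier
# f(x) = sign(w . x), logistic loss standing in for l, plain gradient descent.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def erm_logistic(X, y, lr=0.1, steps=500):
    """Minimize (1/n) * sum_i log(1 + exp(-y_i * w.x_i)) over w by gradient descent."""
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(steps):
        margins = y * (X @ w)                                   # y_i * (w . x_i)
        grad = -(X * (y * sigmoid(-margins))[:, None]).mean(axis=0)
        w -= lr * grad
    return w

# toy data labeled by a ground-truth linear rule f*
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = np.sign(X @ np.array([1.0, -2.0]))
w_hat = erm_logistic(X, y)
print("training accuracy:", np.mean(np.sign(X @ w_hat) == y))
```
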
SLIDE 3


Strategic classification

When data comes from strategic data sources...

  • Outsource $x_i$ to get a label $\tilde{y}_i$.
  • Crowdsourcing, surveys, human reports, etc.

Such training data carries noise:

  • Intrinsic: due to limited worker expertise.
  • Strategic: due to lack of incentives.

SLIDE 7


Goal to achieve

The learner wants to learn a good, unbiased classifier:

  • Workers’ observations come from a flipping error model with rates $p_+, p_-$.
  • Workers are effort sensitive.
  • Elicit high-quality data from workers (for better performance).

Information elicitation without verification

  • Peer prediction: SCORE($\tilde{y}_i, \tilde{y}_j$).
  • DG13, RF15, SAFP16, KS16...
  • Exerting effort to produce high-quality data is usually a good equilibrium.

SLIDE 8


Our method

Joint learning and information elicitation:

  • SCORE($\tilde{y}_i, \tilde{y}_j$) ⇒ SCORE($\tilde{y}_i$, Machine) (see the sketch below).
  • “Machine prediction”.
  • How to obtain a good machine answer?

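A minimal sketch of the SCORE($\tilde{y}_i$, Machine) idea: pay worker $i$ by agreement between her report and a machine prediction. The agreement-style payment and all names here are illustrative assumptions, not necessarily the paper's exact scoring rule.

```python
# Illustrative scoring of a worker's reports against machine predictions
# (an agreement-style score; not necessarily the paper's exact rule).
from typing import Callable, Sequence, Tuple

def agreement_score(report: int, machine_label: int, reward: float = 1.0) -> float:
    """SCORE(report, machine): reward if the two labels agree, 0 otherwise."""
    return reward if report == machine_label else 0.0

def score_worker(reports: Sequence[Tuple[object, int]],
                 machine: Callable[[object], int]) -> float:
    """Average agreement of worker i's (x, y_tilde) reports with machine(x)."""
    return sum(agreement_score(y, machine(x)) for x, y in reports) / len(reports)

# usage, reusing the linear classifier w_hat from the earlier sketch:
# payment_i = score_worker(reports_i, machine=lambda x: int(np.sign(x @ w_hat)))
```
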
SLIDE 9


Classification with flipping errors [Natarajan et al. 13]

  • Suppose workers report truthfully; how do we de-bias?

$\tilde{\ell}(t, y) := \dfrac{(1 - p_{-y})\,\ell(t, y) - p_{y}\,\ell(t, -y)}{1 - p_{+} - p_{-}}, \qquad p_{+} + p_{-} < 1.$

  • Why does it work? It is unbiased in expectation:

$\mathbb{E}_{\tilde{y}}\big[\tilde{\ell}(t, \tilde{y})\big] = \ell(t, y), \quad \forall t.$

  • Find $\tilde{f}^*_{\tilde{\ell}}$ by minimizing the empirical risk w.r.t. $\tilde{\ell}(t, y)$:

$\tilde{f}^*_{\tilde{\ell}} = \arg\min_{f} \hat{R}_{\tilde{\ell}}(f) := \frac{1}{N} \sum_{j=1}^{N} \tilde{\ell}(f(x_j), \tilde{y}_j).$

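A small numerical check of this construction (the hinge loss and the flipping rates below are illustrative choices, not from the talk): averaging the surrogate over the flipped label reproduces the clean loss.

```python
# Numerical check of the unbiased surrogate loss of Natarajan et al. (2013).
# Loss choice (hinge) and the flipping rates are illustrative assumptions.

def hinge(t, y):
    return max(0.0, 1.0 - y * t)

def surrogate(loss, t, y, p_plus, p_minus):
    """l~(t, y) = [(1 - p_{-y}) * l(t, y) - p_y * l(t, -y)] / (1 - p_+ - p_-)."""
    p_y = p_plus if y == +1 else p_minus
    p_neg_y = p_minus if y == +1 else p_plus
    return ((1 - p_neg_y) * loss(t, y) - p_y * loss(t, -y)) / (1 - p_plus - p_minus)

p_plus, p_minus = 0.2, 0.3           # flipping rates, p_+ + p_- < 1
t, y = 0.4, +1                       # any prediction t and clean label y
p_flip = p_plus if y == +1 else p_minus
expected = (1 - p_flip) * surrogate(hinge, t, y, p_plus, p_minus) \
           + p_flip * surrogate(hinge, t, -y, p_plus, p_minus)
print(expected, hinge(t, y))         # the two coincide up to rounding
```
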
SLIDE 12


Our mechanism

For each worker i:

  • Estimate flipping errors $\tilde{p}_{i,+}, \tilde{p}_{i,-}$ based on $\{x_j, \tilde{y}_j\}_{j \neq i}$.
  • Train $\tilde{f}^*_{\tilde{\ell},-i}$ using [Natarajan et al. 13] with data from $j \neq i$.
  • Score worker $i$ by agreement: does $\tilde{y}_i$ agree with $\tilde{f}^*_{\tilde{\ell},-i}(x_i)$? (A schematic sketch of this loop follows below.)

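A schematic sketch of the per-worker loop above; `estimate_rates`, `train_debiased`, and `score` are hypothetical placeholders standing in for the estimator on the next slide, the [Natarajan et al. 13] training step, and the agreement score.

```python
# Schematic sketch of the per-worker mechanism (function names are placeholders,
# not from the paper): hold worker i out, estimate flipping rates, train a
# de-biased classifier on everyone else, then score worker i's reports.

def run_mechanism(data_by_worker, estimate_rates, train_debiased, score):
    """data_by_worker: {i: list of (x, y_tilde) reports}; returns {i: average score}."""
    # 1. estimate each worker's flipping rates (e.g., with the next slide's estimator)
    rates = {i: estimate_rates(i, data_by_worker) for i in data_by_worker}
    payments = {}
    for i, reports_i in data_by_worker.items():
        # 2. train a de-biased classifier on workers j != i [Natarajan et al. 13]
        others = {j: pairs for j, pairs in data_by_worker.items() if j != i}
        f_minus_i = train_debiased(others, {j: rates[j] for j in others})
        # 3. score worker i's reports against the machine prediction f_minus_i(x)
        payments[i] = sum(score(y, f_minus_i(x)) for x, y in reports_i) / len(reports_i)
    return payments
```
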
SLIDE 13


How to estimate error rates

How do we estimate them without ground truth? Match two observed statistics:

$P_{+}\,[\,p_{i,+}^2 + (1 - p_{i,+})^2\,] + P_{-}\,[\,p_{i,-}^2 + (1 - p_{i,-})^2\,] = \Pr(\text{matching})$

$P_{+}\,p_{i,+} + P_{-}\,(1 - p_{i,-}) = \text{fraction of } {-1} \text{ labels observed}$

  • Lemma: there is a unique pair $\tilde{p}_{i,+}, \tilde{p}_{i,-}$ with $\tilde{p}_{i,+} + \tilde{p}_{i,-} < 1$, i.e., the Bayesian-informative solution: $\Pr(y_i = s \mid \tilde{y}_i = s) > \text{Prior}(s)$, $s \in \{+, -\}$.

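A sketch of solving this two-equation system (the function name and test values are illustrative): given the class priors, the matching probability, and the observed fraction of $-1$ labels, the quadratic below has exactly one root with $\tilde{p}_{i,+} + \tilde{p}_{i,-} < 1$.

```python
# Sketch of the two-equation estimator (names and test values illustrative):
# substitute the second (linear) equation into the first and solve a quadratic.
import numpy as np

def estimate_flipping_rates(P_plus, P_minus, pr_matching, frac_minus):
    a, b = P_plus, P_minus
    # second equation gives p_+ = c0 + c1 * p_-; substitute into the first equation
    c0, c1 = (frac_minus - b) / a, b / a
    A = 2 * a * c1**2 + 2 * b
    B = 4 * a * c0 * c1 - 2 * a * c1 - 2 * b
    C = 2 * a * c0**2 - 2 * a * c0 + 1 - pr_matching
    for p_minus in np.roots([A, B, C]).real:
        p_plus = c0 + c1 * p_minus
        if 0 <= p_plus <= 1 and 0 <= p_minus <= 1 and p_plus + p_minus < 1:
            return p_plus, p_minus          # the Bayesian-informative solution
    return None                             # no informative solution found

# worked check: true rates (0.2, 0.3), priors (0.6, 0.4)
# => matching prob = 0.64, fraction of -1 labels = 0.4
print(estimate_flipping_rates(0.6, 0.4, 0.64, 0.4))   # ~ (0.2, 0.3)
```
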
SLIDE 16


Results

Effort exertion is a Bayes-Nash equilibrium (BNE). What are the benefits of this design?

  • Less redundant assignment: not all tasks are re-assigned ⇒ budget efficient.
  • Better incentives: reporting a symmetric uninformative signal or a permuted signal is not an equilibrium.
  • More learning flavor: no requirement of knowing workers’ data distribution.
  • Better privacy preservation, etc.

SLIDE 21


A case study: collusion is not an equilibrium

Suppose all workers $j \neq i$ collude by reporting $-1$. Then the two estimation equations become

$P_{+}\,[\,p_{i,+}^2 + (1 - p_{i,+})^2\,] + P_{-}\,[\,p_{i,-}^2 + (1 - p_{i,-})^2\,] = 1.$

$P_{+}\,p_{i,+} + P_{-}\,(1 - p_{i,-}) = 1.$

⇒ $\tilde{p}_{i,+} = 1$ ⇒ the solution interprets the absence of $+1$ labels as a high error rate.

$\tilde{\ell}(t, y = -1) := \dfrac{(1 - \tilde{p}_{i,+})\,\ell(t, -1) - \tilde{p}_{i,-}\,\ell(t, +1)}{1 - \tilde{p}_{i,+} - \tilde{p}_{i,-}} = \ell(t, +1)$

⇒ the surrogate loss punishes this particular class, so the trained classifier leans toward $+1$ ⇒ it is better to report $+1$ to match. (A numerical check follows below.)

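A numerical illustration of this argument (the hinge loss and the residual rate $\tilde{p}_{i,-}$ are illustrative): once the estimate is $\tilde{p}_{i,+} = 1$, the surrogate on a reported $-1$ equals the clean loss on $+1$ for every prediction $t$.

```python
# Numerical illustration of the collusion argument (loss and values illustrative):
# with the estimated rate p~_{i,+} = 1, the surrogate loss on a reported -1 label
# collapses to the clean loss on +1, so matching the colluders' -1 is penalized.

def hinge(t, y):
    return max(0.0, 1.0 - y * t)

def surrogate(loss, t, y, p_plus, p_minus):
    p_y = p_plus if y == +1 else p_minus
    p_neg_y = p_minus if y == +1 else p_plus
    return ((1 - p_neg_y) * loss(t, y) - p_y * loss(t, -y)) / (1 - p_plus - p_minus)

p_plus, p_minus = 1.0, 0.3     # rates inferred when everyone else reports -1
for t in (-1.5, 0.0, 0.7, 2.0):
    assert abs(surrogate(hinge, t, -1, p_plus, p_minus) - hinge(t, +1)) < 1e-12
print("surrogate(t, -1) == hinge(t, +1) for all tested t")
```
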
SLIDE 24


Summary

What we achieve

  • A classification problem with strategic data sources.
  • A classification-aided approach to elicit information.
  • The mechanism enjoys several favorable properties.

We hope to see more on how machine learning can help information elicitation.

Thank you!
