  1. Strategic Classification with Crowdsourcing. Yang Liu (joint work with Yiling Chen), yangl@seas.harvard.edu, Harvard University, Nov. 2016.

  2. (Non-strategic) Classification
     A target classifier $f^* : \mathbb{R}^d \to \{-1, +1\}$ assigns labels $y_i = f^*(x_i)$.
     • Observing a set of training data, the learner fits $f$ by empirical risk minimization:
       $\tilde{f} = \operatorname{argmin}_{f \in \mathcal{F}} \sum_{i=1}^{n} l(f(x_i), y_i).$
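     For concreteness, a minimal sketch of this non-strategic baseline: synthetic data and a scikit-learn logistic regression stand in for the hypothesis class $\mathcal{F}$ and the loss $l$; these choices are illustrative, not from the slides.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Toy non-strategic setting: features x_i in R^d with clean labels y_i = f*(x_i).
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 5))
w_star = rng.normal(size=5)
y = np.sign(X @ w_star)            # ground-truth labels from the target classifier f*

# Empirical risk minimization over a hypothesis class F (here: linear classifiers,
# with logistic loss as a surrogate for the 0/1 loss l).
f_hat = LogisticRegression().fit(X, y)
print("training accuracy:", f_hat.score(X, y))
```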

  3. Strategic classification
     When data comes from strategic data sources...
     • Outsource $x_i$ to get a label $\tilde{y}_i$.
     • Crowdsourcing, surveys, human reports, etc.
     Such training data carries noise:
     • Intrinsic: due to limited worker expertise.
     • Strategic: due to a lack of incentives.

  4. Goal to achieve
     The learner wants to learn a good, unbiased classifier.
     • Workers' observations come from a flipping error model with rates $p_+, p_-$.
     • Workers are effort sensitive.
     • Elicit high-quality data from workers (better performance).
     Information elicitation without verification:
     • Peer prediction: pay worker $i$ a score $\mathrm{SCORE}(\tilde{y}_i, \tilde{y}_j)$ based on a peer's report $\tilde{y}_j$ [DG13, RF15, SAFP16, KS16]; see the output-agreement sketch below.
     • Exerting effort to produce high-quality data is usually a good equilibrium.
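     As one simple illustration of the $\mathrm{SCORE}(\tilde{y}_i, \tilde{y}_j)$ idea, here is a minimal output-agreement sketch: worker $i$ is paid when their label matches a randomly chosen peer's label on the same task. The payment rule and constants are assumptions for illustration, not the specific mechanisms of DG13/RF15/SAFP16/KS16.

```python
import random

def output_agreement_score(report_i, report_j, bonus=1.0):
    """Pay worker i a bonus when their label matches peer j's label on the same task."""
    return bonus if report_i == report_j else 0.0

# A worker's expected score rises with label quality, which is why exerting effort
# is typically an equilibrium under such agreement-based scoring.
reports = {"worker_i": +1, "worker_j": +1, "worker_k": -1}
peer = random.choice(["worker_j", "worker_k"])
print(output_agreement_score(reports["worker_i"], reports[peer]))
```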

  5. Our method
     Joint learning and information elicitation:
     • Replace the peer score $\mathrm{SCORE}(\tilde{y}_i, \tilde{y}_j)$ with $\mathrm{SCORE}(\tilde{y}_i, \text{Machine})$, i.e. score each report against a "machine prediction".
     • How do we obtain a good machine answer?

  6. Classification with flipping errors [Natarajan et al. '13]
     Suppose workers report truthfully; how do we de-bias the flipping noise?
     • Use the surrogate loss
       $\tilde{l}(t, y) := \dfrac{(1 - p_{-y})\, l(t, y) - p_y\, l(t, -y)}{1 - p_+ - p_-}, \qquad p_+ + p_- < 1.$
     • Why does it work? It is unbiased in expectation over the noisy label:
       $\mathbb{E}_{\tilde{y}}[\tilde{l}(t, \tilde{y})] = l(t, y), \quad \forall t.$
     • Find $\tilde{f}^*_{\tilde{l}}$ by minimizing the empirical risk with respect to $\tilde{l}$:
       $\tilde{f}^*_{\tilde{l}} = \operatorname{argmin}_f \hat{R}_{\tilde{l}}(f) := \frac{1}{N} \sum_{j=1}^{N} \tilde{l}(f(x_j), \tilde{y}_j).$
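     A minimal sketch of this de-biasing step, assuming logistic loss as the base loss $l$ and a linear hypothesis class; the helper names and the optimizer choice are illustrative, not from Natarajan et al.

```python
import numpy as np
from scipy.optimize import minimize

def logistic_loss(t, y):
    """Base loss l(t, y) for a margin t = f(x) and a label y in {-1, +1}."""
    return np.log1p(np.exp(-y * t))

def unbiased_loss(t, y, p_plus, p_minus):
    """Surrogate loss l~(t, y): unbiased for l(t, true label) under flip rates p_+, p_-."""
    assert p_plus + p_minus < 1
    p_y = p_plus if y == 1 else p_minus          # flip rate of the observed label's class
    p_not_y = p_minus if y == 1 else p_plus      # flip rate of the opposite class
    return ((1 - p_not_y) * logistic_loss(t, y)
            - p_y * logistic_loss(t, -y)) / (1 - p_plus - p_minus)

def fit_debiased_linear(X, y_noisy, p_plus, p_minus):
    """Minimize the empirical risk of the unbiased loss over linear classifiers f(x) = w . x."""
    def empirical_risk(w):
        margins = X @ w
        return np.mean([unbiased_loss(t, y, p_plus, p_minus)
                        for t, y in zip(margins, y_noisy)])
    return minimize(empirical_risk, np.zeros(X.shape[1])).x
```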

  7. Our mechanism
     For each worker $i$:
     • Estimate the flipping errors $\tilde{p}_{i,+}, \tilde{p}_{i,-}$ from the other workers' data $\{x_j, \tilde{y}_j\}_{j \neq i}$.
     • Train $\tilde{f}^*_{\tilde{l}, -i}$ using [Natarajan et al. '13] on the data from $j \neq i$.
     • Score worker $i$ by whether the report $\tilde{y}_i$ agrees with the machine prediction $\tilde{f}^*_{\tilde{l}, -i}(x_i)$.
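     Read as pseudocode, the mechanism loops over workers, estimates error rates and trains the de-biased classifier on everyone else's data, and then scores each worker by agreement with that held-out machine prediction. The helper functions passed in (estimate_flip_rates, fit_debiased) are hypothetical placeholders for the two steps above, and the payment rule is a simple agreement bonus for illustration.

```python
def run_mechanism(data, estimate_flip_rates, fit_debiased, bonus=1.0):
    """data: dict worker_id -> (x_i, y_tilde_i). Returns a payment for each worker.

    estimate_flip_rates(other_data) -> (p_plus, p_minus)   # from {x_j, y~_j}, j != i
    fit_debiased(other_data, p_plus, p_minus) -> classifier with .predict(x) in {-1, +1}
    """
    payments = {}
    for i, (x_i, y_tilde_i) in data.items():
        others = {j: v for j, v in data.items() if j != i}
        p_plus, p_minus = estimate_flip_rates(others)          # leave-one-out error rates
        f_minus_i = fit_debiased(others, p_plus, p_minus)      # Natarajan et al. '13 on j != i
        machine = f_minus_i.predict(x_i)                       # the "machine prediction"
        payments[i] = bonus if machine == y_tilde_i else 0.0   # SCORE(y~_i, Machine)
    return payments
```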

  8. How to estimate error rates
     How do we estimate the flip rates without ground truth? Match two empirical statistics to the model (a numerical sketch follows below):
     $P_+ \left[ p_{i,+}^2 + (1 - p_{i,+})^2 \right] + P_- \left[ p_{i,-}^2 + (1 - p_{i,-})^2 \right] = \Pr(\text{matching})$
     $P_+\, p_{i,+} + P_-\, (1 - p_{i,-}) = \text{fraction of } {-1} \text{ labels observed}$
     • Lemma: there is a unique pair $\tilde{p}_{i,+}, \tilde{p}_{i,-}$ satisfying $\tilde{p}_{i,+} + \tilde{p}_{i,-} < 1$; this condition is equivalent to the worker being Bayesian informative: $\Pr(y_i = s \mid \tilde{y}_i = s) > \mathrm{Prior}(s)$ for $s \in \{+, -\}$.
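     A sketch of solving these two moment equations numerically, assuming "Pr(matching)" means the probability that two independent labels drawn with worker $i$'s flip rates agree; that reading, and the use of a generic root finder, are assumptions for illustration.

```python
from scipy.optimize import fsolve

def estimate_flip_rates(P_plus, P_minus, match_prob, frac_neg, init=(0.2, 0.2)):
    """Solve the two moment equations on the slide for worker i's flip rates (p_{i,+}, p_{i,-}).

    P_plus, P_minus : class priors P_+ = Pr(y = +1), P_- = Pr(y = -1)
    match_prob      : empirical Pr(matching), read here as the chance that two
                      independent labels with these flip rates agree (assumption)
    frac_neg        : empirical fraction of -1 labels reported by the worker
    """
    def moments(p):
        a, b = p  # a = p_{i,+}, b = p_{i,-}
        eq1 = (P_plus * (a**2 + (1 - a)**2)
               + P_minus * (b**2 + (1 - b)**2) - match_prob)
        eq2 = P_plus * a + P_minus * (1 - b) - frac_neg
        return [eq1, eq2]

    a, b = fsolve(moments, init)
    # The slide's lemma: there is a unique solution with p_{i,+} + p_{i,-} < 1
    # (Bayesian-informative worker); keep only that one.
    assert a + b < 1, "root finder landed on the other branch; retry with a different init"
    return a, b
```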

  9. Results
     Effort exertion is a Bayes-Nash equilibrium (BNE). Benefits of this approach:
     • Less redundant assignment: not all tasks are re-assigned ⇒ budget efficient.
     • Better incentives: reporting a symmetric uninformative signal or a permuted signal is not an equilibrium.
     • More of a learning flavor: no requirement of knowing the workers' data distribution.
     • Better privacy preservation, etc.
