  1. Perspectives on Infrastructure for Crowdsourcing
     Omar Alonso, Microsoft
     9 February 2011
     WSDM 2011 Workshop on Crowdsourcing for Search and Data Mining

  2. Disclaimer
     • The views and opinions expressed in this talk are mine and do not necessarily reflect the official policy or position of Microsoft.

  3. Disclaimer – II
     • Personal experience
       – MTurk, CrowdFlower, internal MS tools
     • IR focus
       – Relevance evaluation, assessment, ranking, query classification, etc.
       – TREC, INEX, Twitter, Facebook
     • Continuity
     • Industry perspective

  4. Introduction
     • Crowdsourcing is hot
     • Lots of interest in the research community
       – Articles showing good results
       – Workshops and tutorials (ECIR’10, SIGIR’10, NAACL’10, WSDM’11, WWW’11, etc.)
       – CrowdConf
     • Large companies are leveraging crowdsourcing
     • Start-ups
     • VCs are putting money into it

  5. Areas of interest
     • Social/behavioral science
     • Human factors
     • Algorithms
     • Databases
     • Distributed systems
     • Statistics

  6. Why Mechanical Turk
     • Brand (Amazon)
     • Speed of experimentation
     • Price
     • Diversity
     • Payments
     • Lots of problems and missing features
       – Still, people keep using it

  7. Pedal to the metal
     • You read the papers
     • You tell your boss that crowdsourcing is the way to go
     • You now need to produce hundreds of thousands of labels per month
     • Easy, right?

  8. Why not Mechanical Turk
     • Spam
     • Worker and task quality
     • No analytics
     • Need to build tools around it

  9. Alternatives?
     • First-mover advantage
     • The service hasn’t evolved that much
     • $$$
     • People are trying …
       – CrowdFlower, CloudCrowd, etc.

  10. Infrastructure thoughts

  11. The human
     • As a worker:
       – I hate it when instructions are not clear
       – I’m not a spammer; I just don’t get what you want
       – Boring tasks
       – Good pay is ideal, but it is not the only condition for engagement

  12. The human – features
     • Routing/recommendation of similar tasks based on past behavior and/or content (see the sketch after this slide)
     • Requester rating based on payment performance, rejected work, and overall task difficulty. A worker should be able to rate the quality of the work and also the quality of the requester.
     • Ability to comment on a task
     • Work categorization: as on a job-search site, all available work should be classified
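A minimal sketch of the task-routing idea above. The data layout is an assumption for illustration (each task carries a set of category tags; a worker's history is the list of tasks they completed); nothing here is a real MTurk or CrowdFlower API.

```python
from collections import Counter

def recommend_tasks(history, open_tasks, k=3):
    """Rank open tasks by overlap with the categories the worker
    has completed most often in the past (hypothetical schema)."""
    prefs = Counter(tag for task in history for tag in task["tags"])
    def score(task):
        return sum(prefs[tag] for tag in task["tags"])
    return sorted(open_tasks, key=score, reverse=True)[:k]

history = [{"tags": {"relevance", "search"}}, {"tags": {"relevance"}}]
open_tasks = [
    {"id": 1, "tags": {"image", "moderation"}},
    {"id": 2, "tags": {"relevance", "ranking"}},
]
print(recommend_tasks(history, open_tasks))  # task 2 ranks first
```

A production router would also use task content and collaborative signals; this only shows the shape of the behavior-based half.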

  13. The experimenter
     • As an experimenter:
       – Balancing act: an experiment that produces the right results and is also appealing to workers
       – Attrition
       – I want your honest answer for the task
       – I want qualified workers, and I want the system to do some of that work for me

  14. The experimenter – features
     • Ability to manage workers at different levels of expertise, including spammers and potential spammers.
     • Abstract the task as much as possible from the quality-control statistics. The developer should provide thresholds for good output.
     • Ability to mix different pools of workers based on different profiles and expertise levels.
     • Honey-pot management and incremental qualification tests based on expertise and past performance (see the sketch after this slide).
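A minimal honey-pot sketch, assuming an invented data layout (a dict of known answers for trap items, per-worker answer dicts) and an arbitrary threshold; it is an illustration of the idea, not a real platform feature.

```python
# Trap items with known answers, mixed invisibly into the task batch.
gold = {"item_17": "relevant", "item_42": "not_relevant"}
answers = {
    "worker_a": {"item_17": "relevant", "item_42": "not_relevant", "item_9": "relevant"},
    "worker_b": {"item_17": "not_relevant", "item_42": "not_relevant", "item_9": "relevant"},
}

def honeypot_accuracy(worker_answers, gold):
    """Fraction of trap items the worker answered correctly."""
    hits = sum(worker_answers.get(item) == label for item, label in gold.items())
    return hits / len(gold)

THRESHOLD = 0.75  # requester-chosen quality bar (arbitrary here)
for worker, wa in answers.items():
    acc = honeypot_accuracy(wa, gold)
    print(worker, acc, "keep" if acc >= THRESHOLD else "review/block")
```

The "incremental qualification" idea on the slide would extend this by tightening or loosening a worker's trap rate as their track record accumulates.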

  15. The system
     • Similarities with MapReduce approaches (a toy illustration follows this slide)
     • Integration of human computation into a programming language
     • I would like to program the crowd
     • Built-in statistics and other quality control
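To make the MapReduce analogy concrete, here is a toy sketch in which the "map" step fans each item out to several workers and the "reduce" step aggregates by majority vote. ask_crowd is a stand-in stub returning canned labels, since no real crowd platform is being called.

```python
from collections import Counter

def ask_crowd(item):
    """Stand-in for submitting a HIT and collecting answers; returns
    canned worker labels so the sketch runs without any crowd platform."""
    canned = {"q1": ["relevant", "relevant", "not_relevant"]}
    return canned[item]

def crowd_map(items, redundancy=3):
    # "map" step: fan each item out to several workers
    return {item: ask_crowd(item)[:redundancy] for item in items}

def crowd_reduce(labels):
    # "reduce" step: aggregate redundant answers, here by majority vote
    return {item: Counter(votes).most_common(1)[0][0]
            for item, votes in labels.items()}

print(crowd_reduce(crowd_map(["q1"])))  # {'q1': 'relevant'}
```

"Programming the crowd" in the slide's sense would mean a runtime where ask_crowd is a real, asynchronous primitive of the language.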

  16. The system – features
     • Performance and high availability
     • Built-in spam detection
     • Payments (including international markets)
     • A library of inter-rater agreement statistics, plus the ability to plug in a user-defined one (an example measure follows this slide)
     • Uncertainty management
     • High-level language for designing tasks
     • Analytics
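As one example of the kind of agreement statistic such a library would ship with, here is Cohen's kappa for two raters in plain Python; the sample labels are invented.

```python
from collections import Counter

def cohens_kappa(r1, r2):
    """Cohen's kappa for two raters labeling the same items."""
    n = len(r1)
    po = sum(a == b for a, b in zip(r1, r2)) / n       # observed agreement
    c1, c2 = Counter(r1), Counter(r2)
    pe = sum(c1[k] * c2[k] for k in c1) / (n * n)      # agreement expected by chance
    return (po - pe) / (1 - pe)

# Invented sample labels from two assessors:
r1 = ["rel", "rel", "not", "rel", "not"]
r2 = ["rel", "not", "not", "rel", "not"]
print(round(cohens_kappa(r1, r2), 3))  # 0.615
```

Fleiss' kappa or Krippendorff's alpha would be the natural generalizations for more than two workers per item.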

  17. Conclusions and questions
     • Social networking and crowdsourcing
     • Crowds, clouds, and algorithms
     • What is the best way to perform human computation?
     • What is the best way to combine CPU with HPU for solving problems?
     • What are the desirable integration points for a computation that involves both CPU and HPU?
