Reducing Label Cost by Combining Feature Labels and Crowdsourcing
Jay Pujara jay@cs.umd.edu Ben London blondon@cs.umd.edu Lise Getoor getoor@cs.umd.edu University of Maryland, College Park
Combining Learning Strategies to Reduce Label Cost 7/2/2011
McCallum, Andrew and Nigam, Kamal. Text classification by bootstrapping with keywords, EM, and shrinkage. ACL 1999.
Ambati, Vamshi, Vogel, Stephan, and Carbonell, Jaime. Active learning and crowd-sourcing for machine translation. LREC 2010.
Initialize S by applying feature labels F to data U
For t = 1, …, T:
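The slide elides the loop body, so the following is a hedged sketch of Active Bootstrapping: initialize from feature labels, then repeatedly train and query a crowd oracle on the most uncertain examples. The logistic-regression trainer (a stand-in for MEGAM), the uncertainty-sampling rule, and the `crowd` oracle are illustrative assumptions, not the authors' exact procedure.

```python
import numpy as np

def train_logreg(X, y, epochs=300, lr=0.1):
    """Tiny logistic regression (assumed stand-in for MEGAM)."""
    Xb = np.hstack([X, np.ones((len(X), 1))])  # fold in a bias term
    w = np.zeros(Xb.shape[1])
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-Xb @ w))
        w -= lr * Xb.T @ (p - y) / len(y)
    return w

def predict_proba(w, X):
    Xb = np.hstack([X, np.ones((len(X), 1))])
    return 1.0 / (1.0 + np.exp(-Xb @ w))

def active_bootstrap(U, F, crowd, T=3, k=10):
    """Sketch of Active Bootstrapping (loop body is an assumption).

    U: unlabeled examples; F: feature-label rule returning 0/1 or
    None; crowd: labeling oracle (e.g. a Mechanical Turk HIT).
    """
    # Initialize S by applying feature labels F to data U
    S = [(x, F(x)) for x in U if F(x) is not None]
    pool = [x for x in U if F(x) is None]
    w = None
    for t in range(T):
        X = np.array([x for x, _ in S])
        y = np.array([lab for _, lab in S], dtype=float)
        w = train_logreg(X, y)
        if not pool:
            break
        # Uncertainty sampling: query the crowd on the examples whose
        # predicted probability is closest to 0.5
        p = predict_proba(w, np.array(pool))
        picked = np.argsort(np.abs(p - 0.5))[:k]
        S += [(pool[i], crowd(pool[i])) for i in picked]
        pool = [x for i, x in enumerate(pool) if i not in set(picked)]
    return w
```

The feature labels give a cheap but noisy seed set; the crowd budget is then spent only where the model is least certain.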
Task: Sentiment Analysis (happy/sad tweets)
Data: 77,920 normalized tweets
Evaluation set: 500 hand-labeled tweets
Feature labels: happy and sad emoticons from Wikipedia's list of emoticons
Crowdsourcing: HIT on Amazon’s Mechanical Turk
Active Learning/Bootstrapping: MEGAM maximum-entropy classifier
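The emoticon feature labels described above can be applied with a rule like the following. The specific emoticon sets are a small assumed subset of Wikipedia's list, and the whitespace tokenization is an assumption about how the normalized tweets look:

```python
# Hypothetical emoticon-based feature labeling for tweets.
# Emoticon sets are an assumed subset of Wikipedia's list of emoticons.
HAPPY = {":)", ":-)", ":D", "=)", ":]"}
SAD = {":(", ":-(", ":[", "=(", ":'("}

def feature_label(tweet):
    """Return 'happy', 'sad', or None when no (or conflicting) emoticons."""
    tokens = tweet.split()
    has_happy = any(t in HAPPY for t in tokens)
    has_sad = any(t in SAD for t in tokens)
    if has_happy and not has_sad:
        return "happy"
    if has_sad and not has_happy:
        return "sad"
    return None  # abstain: leave the tweet unlabeled
```

Abstaining on conflicting or emoticon-free tweets keeps the seed labels high-precision; those tweets remain in the unlabeled pool for active learning.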
Yang, Jaewon and Leskovec, Jure. Patterns of temporal variation in online media. WSDM 2011.
Daumé III, Hal. MEGAM. http://www.cs.utah.edu/~hal/megam/
Wikipedia: List of emoticons. http://en.wikipedia.org/wiki/List_of_emoticons
Same amount of data per iteration: Active Bootstrapping outperforms Feature Labels + Crowdsourcing.
Even with additional starting data, Feature Labels + Crowdsourcing degrades over iterations.
Both methods cost about the same ($16), but Active Bootstrapping achieves lower error.
Active Bootstrapping combines the best of both worlds: broad coverage from feature labels and targeted accuracy from crowdsourced labels.
[Figure: error vs. number of labels (100–600) for Boot 1k, Boot 2k, Boot 10k, Crowd, A.B., Crowd Expert]
Method                  Err (iter. 0)   Err (iter. 8)
Feature Labels, 1K      .332            .367
Feature Labels, 2K      .302            .353
Feature Labels, 10K     .295            .348
Crowdsource, 2K         .374            .478
Active Bootstrapping    .332            .292
Reduce label cost by combining strategies
Introduce a new algorithm, Active Bootstrapping
Evaluate on a real-world dataset/task (sentiment analysis of tweets)