Task Force “Crowdsourcing”
Erfurt Meeting@QoMEX 2017 30 May 2017, 16.00-16.30 Tobias Hossfeld, Babak Naderi
WG1 Research
Agenda: Crowdsourcing TF
1. Overview on recent activities of the CS TF (Tobias, Babak, 10 min)
2. Short presentations:
– Uni Würzburg: Michael Seufert, Matthias Hirth (5 min)
– Uni Konstanz: Dietmar Saupe, Vlad Hosu, Franz Hahn (5 min)
– TU Berlin: Babak Naderi (5 min)
Joint Activities and Major Outcome
– Meeting wiki page: https://www3.informatik.uni-wuerzburg.de/qoewiki/qualinet:crowd:qomex2017meeting
– … Sebastian Möller (TU Berlin): a document that contains common guidance for subjective assessment testing in crowdsourcing across different media (within P.CROWD)
– The white paper "Best Practices and Recommendations for Crowdsourced QoE - Lessons Learned from the Qualinet Task Force 'Crowdsourcing'"
Joint Events
– 31 July - 4 Aug 2017, organized by Matthias Hirth and Tobias Hoßfeld, http://iotcrowd.org/
– Session on "Crowdsourcing" at PQS 2016 (Workshop on Perceptual Quality of Systems), Berlin, 2016; organizers: Judith Redi (TU Delft), Matthias Hirth (Uni of Würzburg), Tim Polzehl (TU Berlin); proceedings in the ISCA Archive
Joint Publications: Book
"Evaluation in the Crowd: Crowdsourcing and Human-Centred Experiments". Editors: Daniel Archambault, Helen C. Purchase, Tobias Hoßfeld
Chapters:
– Understanding the Crowd: Ethical and Practical Matters in the Academic Use of Crowdsourcing: Sheelagh Carpendale, Neha Gupta, Tobias Hoßfeld, David Martin, Babak Naderi, Judith Redi, Ernestasia Siahaan, Ina Wechsung
– Crowdsourcing for QoE Experiments: Sebastian Egger, Judith Redi, Sebastian Möller, Tobias Hossfeld, Matthias Hirth, Christian Keimel, and Babak Naderi
– Crowdsourcing Versus the Laboratory: Towards Human-Centered Experiments Using the Crowd: Ujwal Gadiraju, Sebastian Möller, Martin Nöllenburg, Dietmar Saupe, Sebastian Egger, Daniel W. Archambault, and Brian Fischer
– Crowdsourcing Technology to Support Academic Research: Matthias Hirth, Jason Jacques, Peter Rodgers, Ognjen Scekic, and Michael Wybrow
Joint Publications: PQS 2016 / PCS 2016
– "…Effective Task Performance on Crowdsourcing Tasks" by Vaggelis Mourelatos and Manolis Tzagarakis
– "…Tasks" by Vlad Hosu, Franz Hahn, Igor Zingman, and Dietmar Saupe
– "…Crowdsourced Subjective Quality Testing" by Michael Seufert and Tobias Hoßfeld
– "…crowdsourcing file download QoE study" by Andreas Sackl, Bruno Gardlo, and Raimund Schatz
– "…quality" by Vlad Hosu, Franz Hahn, Oliver Wiedemann, Sung-Hwan Jung, and Dietmar Saupe. 32nd Picture Coding Symposium (PCS 2016), Berlin, 2016
Joint Publications: QoMEX 2017
– "…Quality Evaluation" by Ondrej Zach, Michael Seufert, Matthias Hirth, Martin Slanina, and Phuoc Tran-Gia
– "…" by Kathrin Borchert, Matthias Hirth, Thomas Zinner, and Anja Göritz
– "…Streaming with YoMoApp" by Michael Seufert, Nikolas Wehner, Florian Wamser, Pedro Casas, Alessandro D'Alconzo, and Phuoc Tran-Gia
– "…" by Franz Hahn, Hui Men, Tamas Szirányi, Shujun Li, and Dietmar Saupe
– "…video quality database" by Hui Men, Hanhe Lin, and Dietmar Saupe
– "…Crowdsourcing Approach" by Rafael Zequeira Jiménez, Laura Fernández Gallardo, and Sebastian Möller
Recent and Future Activities
– National project proposal (under submission): "Analysis of influence factors and definition of subjective methods for evaluating the quality of speech services using crowdsourcing" (Sebastian Möller, Tobias Hoßfeld)
– National DFG project (accepted): "Design and Evaluation of new mechanisms for crowdsourcing as emerging paradigm for the organization of work in the Internet" (… Ralf Steinmetz, Christoph Rensing), http://dfg-crowdsourcing.de
– Adaptive crowdsourcing and automatic (multi-parameter) selection
– CS ITU-T: lab vs. crowdsourcing experiment as input for standardization
– ITU standardization on crowdsourcing (P.CROWD)
YoMoApp
– Android mobile app based on Android WebView: the YouTube mobile website can be displayed and videos can be streamed via the HTML5 video player
– Thus, in the app, YouTube is fully functional including all features, plus immersive monitoring
– JavaScript is used to monitor the playback via the HTML5 <video> element (player state/events, video playback time, buffer filling level, video quality level); see the monitoring sketch after this list
– Device characteristics, user interactions, network statistics, and subjective feedback can be obtained through the Android app (screen size, volume, location, cell ID, RAT, throughput, etc.)
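A minimal sketch of such playback monitoring (not YoMoApp's actual code; it assumes the WebView exposes a standard HTML5 <video> element and logs to the console; YouTube-specific quality-level tracking is omitted):

// Sketch only: grab the player's <video> element (assumes one exists).
const video = document.querySelector("video") as HTMLVideoElement;

// Seconds of media buffered ahead of the current playback position.
function bufferAhead(v: HTMLVideoElement): number {
  for (let i = 0; i < v.buffered.length; i++) {
    if (v.buffered.start(i) <= v.currentTime && v.currentTime <= v.buffered.end(i)) {
      return v.buffered.end(i) - v.currentTime;
    }
  }
  return 0;
}

// Player state/events; "waiting" marks the start of a stalling period.
for (const ev of ["play", "playing", "pause", "waiting", "ended"]) {
  video.addEventListener(ev, () =>
    console.log(ev, video.currentTime, bufferAhead(video)));
}

// Periodic samples of playback time and buffer filling level.
setInterval(() => console.log(video.currentTime, bufferAhead(video)), 1000);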
YoMoApp Portal
(screenshots of the YoMoApp portal dashboard)
YoMoApp portal: http://yomoapp.de/dashboard
– Sign in with a Google account
– Add the device ID (can be obtained in the YoMoApp statistics view)
– Multiple devices can be added for one account
– Select a device and browse statistics of all YoMoApp sessions
– Download log files of single/multiple sessions
More details on log files: http://www.comnet.informatik.uni-wuerzburg.de/research/cloud_applications_and_networks/internet_applications/yomoapp/
Use YoMoApp for your crowdsourced QoE study!
Institute of Computer Science, Chair of Communication Networks, University of Würzburg
QualiNet: Crowdsourcing TF
Matthias Hirth
Joint Research Project Würzburg – Duisburg/Essen
National DFG project: "Design and Evaluation of new mechanisms for crowdsourcing as emerging paradigm for the organization of work in the Internet"
Research objectives
– Processing of sensitive data
– Trade-offs between internal and external crowdsourcing
– Integration into day-to-day business
– Trade-offs between data quality and costs
– Combination of crowd-based and fixed sensors
– Task routing in mobile settings
– (Cost-)optimal selection of test stimuli
– Dynamic adaptation of test setup
Advertisement
Focus: combining objective measurements from IoT and subjective ratings from crowdsourcing users
When: 31 July - 4 Aug 2017
Where: Würzburg, Bavaria, Germany
More information available at http://iotcrowd.com
ACR vs. PC (VQA)
(figure: judgment cost comparison between ACR and PC; 0.75 ACR = 1 PC)
Perceptually-Guided Image Coding
(figure: Experiments A and B relate Δbitrate to subjective scores; apparent avg. bitrate improvement for best parameter settings: 10%)
Konstanz Natural Video Database
(figure: conventional databases vs. KoNViD-10k along several quality dimensions)
Crowdsourcing TF - Activities at TUB
– Factors that influence the performance of workers
CS ITU-T Experiments
Aim: a recommendation on speech quality assessment using crowdsourcing, comparable to ITU-T P.800
Step 1: replicating an ACR lab experiment in CS
(figure: qualification flow: Screening job → Anchoring job → LOT (listening-only test); includes a (speech) environment test, stimuli with different MOS, and check question(s); the qualification expires)
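For context, the LOT yields ACR ratings on the 5-point scale, from which a MOS per stimulus is computed as the mean of the valid ratings. A minimal sketch (the example ratings are made up):

// Mean opinion score from ACR ratings on the 5-point scale.
function mos(ratings: number[]): number {
  const valid = ratings.filter(r => r >= 1 && r <= 5); // basic sanity screening
  return valid.reduce((sum, r) => sum + r, 0) / valid.length;
}

console.log(mos([4, 5, 3, 4, 4])); // 4 (made-up ratings for one stimulus)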
CS ITU-T Experiments
We consider different:
– CS platforms: web-based (MTurk, Microworkers, CrowdFlower), mobile-based (Crowdee, Clickworker)
– Languages: EN, DE, …
– Speech datasets: Speechpool of ITU-T SG12/Q9 (Study Group 12, ITU-T)
WG1 Crowdsourcing TF: Contact Information
– Tobias Hoßfeld (University of Duisburg-Essen), tobias.hossfeld@uni-due.de
– Babak Naderi (TU Berlin), Babak.Naderi@telekom.de
– Wiki: https://www3.informatik.uni-wuerzburg.de/qoewiki/qualinet:crowd (for access, contact Tobias and Babak)
– Qualinet mail reflector for "Crowdsourcing": cs.wg2.qualinet@listes.epfl.ch. To subscribe, send an (empty) email to cs.wg2.qualinet-subscribe@listes.epfl.ch and follow the steps described at http://listes.epfl.ch/doc.cgi?liste=cs.wg2.qualinet
On average, 708,000 HITs were available daily in 2017 Q1 (taken from https://www.mturk.com).
Task Selection Strategies
– What leads workers to decide which task to take?
– Is it possible to predict a worker's decision (acceptance) given the task data?
Important Factors > Method
(figure: motives such as payment ($), enjoyment, and social aspects vs. obstacles such as perceived effort)
Would you perform the presented HIT? Why?
Crowd: US, Master workers
Payout: $0.05 + $0.05
Task: rate 60 HITs
Results: 43% of responses contained a reason
Labeling:
– 34 different labels
– Fleiss' kappa = 0.62 (see the sketch after this list)
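Fleiss' kappa measures chance-corrected agreement between several raters assigning categorical labels. A generic implementation sketch (not the study's analysis code; counts[i][j] is how many raters gave item i label j, with a fixed number of raters per item):

function fleissKappa(counts: number[][]): number {
  const N = counts.length;                          // number of items
  const n = counts[0].reduce((a, b) => a + b, 0);   // raters per item
  const k = counts[0].length;                       // number of categories
  const pj = new Array(k).fill(0);                  // per-category rating counts
  let pBar = 0;                                     // mean per-item agreement
  for (const row of counts) {
    const sumSq = row.reduce((a, c) => a + c * c, 0);
    pBar += (sumSq - n) / (n * (n - 1));
    row.forEach((c, j) => { pj[j] += c; });
  }
  pBar /= N;
  const pe = pj.reduce((a, c) => a + (c / (N * n)) ** 2, 0); // chance agreement
  return (pBar - pe) / (1 - pe);
}

// Perfect agreement on two items by three raters gives kappa = 1.
console.log(fleissKappa([[3, 0], [0, 3]])); // 1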
Measure Workload > Method
How to measure the expected workload of a HIT?
– Rating Scale Mental Effort (RSME): one-item estimation of the overall effort
– NASA Task Load Index (TLX): six items (mental demand, physical demand, temporal demand, effort, performance, and frustration) plus weighting; see the scoring sketch below
(figures: RSME scale and NASA TLX questionnaire)
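The weighted TLX score combines the six subscale ratings with weights obtained from 15 pairwise comparisons of the dimensions. A sketch of the standard scoring (the example numbers are made up):

// Overall NASA TLX workload: ratings (commonly 0-100) weighted by how often
// each dimension was picked in the 15 pairwise comparisons (weights sum to 15).
const DIMS = ["mental", "physical", "temporal", "effort", "performance", "frustration"] as const;

function nasaTlx(ratings: Record<string, number>, weights: Record<string, number>): number {
  let weighted = 0;
  for (const d of DIMS) weighted += ratings[d] * weights[d];
  return weighted / 15;
}

console.log(nasaTlx(
  { mental: 70, physical: 10, temporal: 50, effort: 60, performance: 30, frustration: 40 },
  { mental: 5, physical: 0, temporal: 3, effort: 4, performance: 1, frustration: 2 },
)); // ≈ 56.7 (made-up example)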
Measure Workload > Method
Crowd: US, >98% approval rate, >500 approved HITs
Task: rate 30 HITs x 10 times
Results: TLX ratings significantly correlated with the mean overall effort score (Pearson's r_TLX(29) = .92, p < 0.001); a correlation sketch follows below
Using RSME …
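The reported r is a plain Pearson correlation between the two score series; for reference, a minimal sketch:

// Pearson correlation coefficient between two equal-length series.
function pearson(x: number[], y: number[]): number {
  const n = x.length;
  const mx = x.reduce((a, b) => a + b, 0) / n;
  const my = y.reduce((a, b) => a + b, 0) / n;
  let num = 0, dx = 0, dy = 0;
  for (let i = 0; i < n; i++) {
    num += (x[i] - mx) * (y[i] - my);
    dx += (x[i] - mx) ** 2;
    dy += (y[i] - my) ** 2;
  }
  return num / Math.sqrt(dx * dy);
}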
Predict Acceptance > Method
Given the HIT ratings, predict whether a worker will accept the job.
A dataset of 373 actual HITs was collected (metadata, preview screenshot, HTML code).
Crowd: US, >98% approval rate, >500 approved HITs
Payouts: introductory $0.15; HIT rating $0.10 + $0.05
Task: rate 401 HITs x 15 times
6015 ratings collected
(Gold standard: 14 HITs were manipulated by size)
Predict Acceptance > Results
Modeling: based on 72% of observations
Accuracy: 88.8%, sensitivity: 85.5%, specificity: 91.1%
Testing:
Accuracy: 89.54%, sensitivity: 88.47%, specificity: 90.82%
Using the mean ratings: accuracy 69.9% (the metric definitions are sketched below)
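For reference, the three reported numbers come from the confusion matrix of the binary accept/reject prediction; a generic sketch (not the study's evaluation code):

// Accuracy, sensitivity (true-positive rate), and specificity (true-negative
// rate) from the counts of a binary accept/reject classifier.
function metrics(tp: number, fp: number, tn: number, fn: number) {
  return {
    accuracy: (tp + tn) / (tp + fp + tn + fn),
    sensitivity: tp / (tp + fn), // accepted HITs correctly predicted
    specificity: tn / (tn + fp), // rejected HITs correctly predicted
  };
}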
(figure: predicted mean expected workload (EW))
Workload Prediction > Method
Features (an extraction sketch follows below):
– Syntax-based: from the HTML code; 27 features, e.g. # <a>, # <input type=radio/checkbox/text>, …
– Semantic-based: from the text; 6 features, e.g. # words, # sentences, rareness score, …
– Visual: from the screenshot
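A sketch of how such syntax- and semantic-based features could be extracted (the regexes and feature names are illustrative, not the paper's exact feature set):

// Illustrative syntax features counted from the HIT's HTML source.
function syntaxFeatures(html: string): Record<string, number> {
  const count = (re: RegExp) => (html.match(re) ?? []).length;
  return {
    links: count(/<a[\s>]/gi),
    radios: count(/<input[^>]*type\s*=\s*["']?radio/gi),
    checkboxes: count(/<input[^>]*type\s*=\s*["']?checkbox/gi),
    textFields: count(/<input[^>]*type\s*=\s*["']?text/gi),
  };
}

// Illustrative semantic features from the visible task text.
function semanticFeatures(text: string): Record<string, number> {
  const words = text.split(/\s+/).filter(w => w.length > 0);
  const sentences = text.split(/[.!?]+/).filter(s => s.trim().length > 0);
  return { words: words.length, sentences: sentences.length };
}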
Workload Prediction > Results
Important features: HIT type; 100%, 98%, 75%, and 95% quartiles; sub-clauses; rewards; # unique stems; # words
Workload Prediction > Results
Using the predicted expected workload: accuracy 77.21%, sensitivity 77.71%, specificity 76.77%
Compared to individual ratings: a 12.33% decrease in accuracy
(figures: predicted mean EW and predicted profit)