SLIDE 1

Task Force “Crowdsourcing”

Erfurt Meeting@QoMEX 2017 30 May 2017, 16.00-16.30 Tobias Hossfeld, Babak Naderi

SLIDE 2

Agenda: Crowdsourcing TF

  • 1. Overview on recent activities of CS TF (Tobias, Babak, 10 min)
  • 2. Short research talks:

  – Uni Würzburg: Michael Seufert, Matthias Hirth (5 min)
  – Uni Konstanz: Dietmar Saupe, Vlad Hosu, Franz Hahn (5 min)
  – TU Berlin: Babak Naderi (5 min)

  • 3. Next steps: ITU-T experiments (Babak, 5min)

SLIDE 3

Joint Activities and Major Outcome

  • Collected in the wiki: https://www3.informatik.uni-wuerzburg.de/qoewiki/qualinet:crowd:qomex2017meeting
  • CS ITU-T Experiments on Audio: current status
  • ITU Standardization on Crowdsourcing: P.CROWD, Sebastian Möller (TU Berlin)
  • Started last QoMEX: an ITU-T standard on one CS Recommendation that contains common guidance across subjective assessment testing of different media in crowdsourcing (within P.CROWD)
  • ITU-T P.912 with an appendix focused on crowdsourcing, based on the white paper “Best practices and recommendations for crowdsourced QoE – Lessons learned from the Qualinet Task Force ‘Crowdsourcing’”, 2014. https://hal.archives-ouvertes.fr/hal-01078761

SLIDE 4

Joint Events

  • Summer school on “Crowdsourcing and IoT”, Würzburg, Germany, 31 July – 4 Aug 2017, organized by Matthias Hirth and Tobias Hossfeld, http://iotcrowd.org/
  • PQS 2016: 5th ISCA/DEGA Workshop on Perceptual Quality of Systems, Berlin, 2016
  • PQS Special Session on Crowdsourcing: Judith Redi (TU Delft), Matthias Hirth (Uni of Würzburg), Tim Polzehl (TU Berlin)
  • PQS 2016 publications in the ISCA Archive
  • Crowdsourcing TF meeting, co-located with PQS 2016, TU Berlin

SLIDE 5

Joint Publications: Book

  • Book: “Evaluation in the Crowd: Crowdsourcing and Human-Centred Experiments”. Editors: Daniel Archambault, Helen C. Purchase, Tobias Hoßfeld
    – Understanding The Crowd: ethical and practical matters in the academic use of crowdsourcing: Sheelagh Carpendale, Neha Gupta, Tobias Hoßfeld, David Martin, Babak Naderi, Judith Redi, Ernestasia Siahaan, Ina Wechsung
    – Crowdsourcing for QoE Experiments: Sebastian Egger, Judith Redi, Sebastian Möller, Tobias Hossfeld, Matthias Hirth, Christian Keimel, and Babak Naderi
    – Crowdsourcing Versus the Laboratory: Towards Human-centered Experiments Using the Crowd: Ujwal Gadiraju, Sebastian Möller, Martin Nöllenburg, Dietmar Saupe, Sebastian Egger, Daniel W. Archambault, and Brian Fischer
    – Crowdsourcing Technology to Support Academic Research: Matthias Hirth, Jason Jacques, Peter Rodgers, Ognjen Scekic, and Michael Wybrow

SLIDE 6

Joint Publications: PQS 2016 / PCS 2016

  • “Worker's Cognitive Abilities and Personality Traits as Predictors of Effective Task Performance on Crowdsourcing Tasks” by Vaggelis Mourelatos; Manolis Tzagarakis
  • “Reported Attention as a Promising Alternative to Gaze in IQA Tasks” by Vlad Hosu; Franz Hahn; Igor Zingman; Dietmar Saupe
  • “One Shot Crowdtesting: Approaching the Extremes of Crowdsourced Subjective Quality Testing” by Michael Seufert; Tobias Hoßfeld
  • “Size does matter. Comparing the results of a lab and a crowdsourcing file download QoE study” by Andreas Sackl; Bruno Gardlo; Raimund Schatz
  • “Saliency-driven image coding improves overall perceived JPEG quality” by Vlad Hosu, Franz Hahn, Oliver Wiedemann, Sung-Hwan Jung, Dietmar Saupe. 32nd Picture Coding Symposium (PCS 2016), Berlin, 2016

SLIDE 7

Joint Publications: QoMEX 2017

  • “On Use of Crowdsourcing for H.264/AVC and H.265/HEVC Video Quality Evaluation” by Ondrej Zach; Michael Seufert; Matthias Hirth; Martin Slanina; Phuoc Tran-Gia
  • “Collecting Subjective Ratings in Enterprise Environments” by Kathrin Borchert; Matthias Hirth; Thomas Zinner; Anja Göritz
  • “Unsupervised QoE Field Study for Mobile YouTube Video Streaming with YoMoApp” by Michael Seufert; Nikolas Wehner; Florian Wamser; Pedro Casas; Alessandro D'Alconzo; Phuoc Tran-Gia
  • “The Konstanz natural video database (KoNViD-1k)” by Vlad Hosu; Franz Hahn; Hui Men; Tamas Szirányi; Shujun Li; Dietmar Saupe
  • “Empirical evaluation of no-reference VQA methods on a natural video quality database” by Hui Men; Hanhe Lin; Dietmar Saupe
  • “Scoring Voice Likability using Pair-Comparison: Laboratory vs. Crowdsourcing Approach” by Rafael Zequeira Jiménez, Laura Fernández Gallardo and Sebastian Möller

SLIDE 8

Recent and Future Activities

  • (Joint) projects and project proposals
    – National project proposal (under submission): “Analysis of influence factors and definition of subjective methods for evaluating the quality of speech services using crowdsourcing” (Sebastian Möller, Tobias Hoßfeld)
    – National DFG project (accepted): “Design and Evaluation of new mechanisms for crowdsourcing as emerging paradigm for the organization of work in the Internet” (Tobias Hoßfeld, Phuoc Tran-Gia, Ralf Steinmetz, Christoph Rensing), http://dfg-crowdsourcing.de
  • Crowdsourcing and IoT
  • Enterprise crowdsourcing
  • Future plans and steps
    – Adaptive crowdsourcing and automatic (multi-parameter) selection
    – CS ITU-T: lab vs. crowdsourcing experiment as input for standardization
    – ITU standardization on crowdsourcing (P.CROWD)

SLIDE 9


Active TF!

SLIDE 10

Agenda: Crowdsourcing TF

  • 1. Overview on recent activities of CS TF (Tobias, Babak, 10 min)
  • 2. Short research talks:

  – Uni Würzburg: Michael Seufert, Matthias Hirth (5 min)
  – Uni Konstanz: Dietmar Saupe, Vlad Hosu, Franz Hahn (5 min)
  – TU Berlin: Babak Naderi (5 min)

  • 3. Next steps: ITU-T experiments (Babak, 5min)

SLIDE 11

Michael Seufert

YoMoApp

 Android mobile app based on Android WebView
 YouTube mobile website can be displayed, videos can be streamed via the HTML5 video player
 Thus, in the app, YouTube is fully functional including all features, plus immersive monitoring
 JavaScript is used to monitor the playback via the HTML5 <video> element (player state/events, video playback time, buffer filling level, video quality level)
 Device characteristics, user interactions, network statistics, and subjective feedback can be obtained through the Android app (screen size, volume, location, cell ID, RAT, throughput, etc.)

SLIDE 12

Michael Seufert

YoMoApp Portal

 YoMoApp portal: http://yomoapp.de/dashboard

SLIDE 13

Michael Seufert

YoMoApp Portal

 YoMoApp portal: http://yomoapp.de/dashboard

SLIDE 14

Michael Seufert

YoMoApp Portal

 YoMoApp portal: http://yomoapp.de/dashboard
 Sign in with Google account
 Add device ID (can be obtained in the YoMoApp statistics view)
 Multiple devices can be added for one account
 Select a device and browse statistics of all YoMoApp sessions
 Download log files of single/multiple sessions
  • Playout information log (playtime, buffered playtime, …)
  • Event log (device information, network information, …)
  • Statistics log (streaming overview, user rating)
 More details on log files: http://www.comnet.informatik.uni-wuerzburg.de/research/cloud_applications_and_networks/internet_applications/yomoapp/
 Use YoMoApp for your crowdsourced QoE study! 
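
As a small illustration of what can be done with the downloaded logs, here is a minimal Python sketch that estimates total stalling time from a playout log. The CSV layout and the column names ("timestamp", "playtime") are assumptions for illustration only; adapt them to the actual YoMoApp log format.

```python
# Minimal sketch (assumed CSV layout): estimate stalling time from a
# YoMoApp-style playout log with columns "timestamp" and "playtime" (seconds).
import csv

def stalling_seconds(path, eps=0.05):
    """Sum the wall-clock time during which the playback position did not advance."""
    with open(path, newline="") as f:
        rows = [(float(r["timestamp"]), float(r["playtime"])) for r in csv.DictReader(f)]
    stalled = 0.0
    for (t0, p0), (t1, p1) in zip(rows, rows[1:]):
        if p1 - p0 < eps:        # playtime did not move forward in this interval
            stalled += t1 - t0   # count the elapsed wall-clock time as stalling
    return stalled

# Example: print(stalling_seconds("playout_log.csv"))
```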

SLIDE 15

Chair of Communication Networks, University of Würzburg

Institute of Computer Science, Chair of Communication Networks

  • Prof. Dr.-Ing. P. Tran-Gia

QualiNet: Crowdsourcing TF

Matthias Hirth

SLIDE 16

Joint Research Project Würzburg – Duisburg/Essen

 National DFG project “Design and Evaluation of new mechanisms for crowdsourcing as emerging paradigm for the organization of work in the Internet”
 Research objectives
  • Enterprise crowdsourcing
    – Processing of sensitive data
    – Trade-offs between internal and external crowdsourcing
    – Integration into day-to-day business
  • Mobile crowdsourcing
    – Trade-offs between data quality and costs
    – Combination of crowd-based and fixed sensors
    – Task routing in mobile settings
  • Crowdsourced QoE
    – (Cost-)optimal selection of test stimuli
    – Dynamic adaptation of the test setup

SLIDE 17

Advertisement

 Focus: Combining objective measurements from IoT and subjective ratings from crowdsourcing users
 When: 31 July 2017 – 4 August 2017
 Where: Würzburg, Bavaria, Germany
 More information available at http://iotcrowd.com

SLIDE 18

2016

  • Scalability
  • where to filter
  • qualification/price tradeoff
  • thousands of stimuli
  • Improving Accuracy
  • ACR vs. PC
  • mathematical models
  • IQA/VQA
  • different dimensions of quality
  • verification of algorithms
  • Bias removal
  • grounding MOS
  • objective anchors
  • crowd as a predictor

SLIDE 19

ACR vs. PC (VQA)

ACR

  • 8 videos
  • 400 ratings

PC

  • 8 videos
  • 400 ratings

Judgment Cost

0.75 ACR = 1 PC
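
For context (background, not from the slide): a full pairwise-comparison design over n stimuli needs n(n-1)/2 pairs, whereas ACR collects one direct rating event per stimulus, so the required number of judgments scales very differently; the cost factor reported above (0.75 ACR = 1 PC) then converts between the two budgets. A minimal bookkeeping sketch:

```python
# Background sketch (not from the slide): judgment counts for ACR vs. a full
# pairwise-comparison (PC) design over the same n stimuli.
def acr_judgments(n_stimuli, votes_per_stimulus):
    return n_stimuli * votes_per_stimulus        # one direct rating per stimulus

def pc_judgments(n_stimuli, votes_per_pair):
    n_pairs = n_stimuli * (n_stimuli - 1) // 2   # all unordered pairs
    return n_pairs * votes_per_pair

n = 8                                            # videos, as on the slide
print(acr_judgments(n, 50))                      # 400 ACR judgments
print(pc_judgments(n, 14))                       # 28 pairs x 14 votes = 392 PC judgments
```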

SLIDE 20

Perceptually-Guided Image Coding

Experiment A

  • Standard JPEG PC (10 bitrates)
  • 30 users per pair

Experiment B

  • Standard vs Saliency-Driven JPEG
  • 3 parameters
  • 50 users per pair

[Figure: Δbitrate vs. subjective scores per task — average bitrate improvement for the best parameter settings: 10%]

SLIDE 21

Konstanz Natural Video Database

Conventional

  • small number of source sequences
  • little content diversity
  • artificial distortions
  • infeasible for blind VQA algorithms

KoNViD

  • YFCC100m baseline (~800,000 videos)
  • filtering methodology
  • ensure naturalness of videos
  • maximize diversity across many quality dimensions
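
To make the "maximize diversity" idea concrete, here is an illustrative Python sketch (not the authors' actual procedure): it sub-samples videos so that one assumed quality-related attribute, normalized to [0, 1], is covered evenly across bins; the real KoNViD methodology balances several dimensions jointly.

```python
# Illustrative sketch only (not the KoNViD procedure): pick a subset of videos so
# that one assumed attribute (e.g. blurriness in [0, 1]) is evenly covered.
import random
from collections import Counter

def diverse_sample(items, n_bins=10, n_select=100, seed=0):
    """items: list of (video_id, attribute_value) with attribute_value in [0, 1]."""
    rng = random.Random(seed)
    pool = items[:]
    rng.shuffle(pool)
    per_bin = n_select // n_bins
    bins, selected = Counter(), []
    for vid, value in pool:
        b = min(int(value * n_bins), n_bins - 1)
        if bins[b] < per_bin:            # keep every attribute bin equally represented
            bins[b] += 1
            selected.append(vid)
        if len(selected) == n_select:
            break
    return selected

# Example: diverse_sample([(i, random.random()) for i in range(5000)])
```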

SLIDE 22

KoNViD-10k

  • expand to 10k videos
  • expand to more quality dimensions
  • evaluate using PC
  • new basis for all no-reference VQA CS efforts

SLIDE 23

Crowdsourcing TF - Activities at TUB

  • Motivation of crowdworkers
  • What influences the motivation of workers, and how does motivation influence their performance?
  • What determines task selection strategies?
  • Speech quality assessment in crowdsourcing
  • Influence of environmental factors (Acoustics ’17)
  • Influence of task design
  • Speech-based trapping questions, temporal delays, task length
  • Voice likability – pairwise comparisons (CS vs. lab)
  • AI & CS: translation services and extremism detection

SLIDE 24

Agenda: Crowdsourcing TF

  • 1. Overview on recent activities of CS TF (Tobias, Babak, 10 min)
  • 2. Short research talks:

  – Uni Würzburg: Michael Seufert, Matthias Hirth (5 min)
  – Uni Konstanz: Dietmar Saupe, Vlad Hosu, Franz Hahn (5 min)
  – TU Berlin: Babak Naderi (5 min)

  • 3. Next steps: ITU-T experiments (Babak, 5min)

SLIDE 25

CS ITU-T Experiments

Aim: Recommendation on Speech Quality Assessment using Crowdsourcing, comparable to ITU-T P.800
Step 1: Replicating an ACR lab experiment in CS

Screening job
  • Demographics
  • Language test (speech)
  • Device & environment test
  • Hearing test

Anchoring job
  • 8 anchoring stimuli (different MOS)

LOT (listening-only test)
  • X stimuli to rate
  • Speech trapping question(s)
  • Environmental check
  • Temporal qualification expires

SLIDE 26

CS ITU-T Experiments

We consider different
  – CS platforms: web-based (MTurk, Microworkers, CrowdFlower), mobile-based (Crowdee, Clickworker)
  – Languages: EN, DE, …
  – Speech datasets: Speechpool of ITU-T SG12/Q9

  • TF members involved: TU Berlin, UDE, Uni. Würzburg, Uni. Konstanz, TU Delft
  • Participating in the new P.CROWD Recommendation (Study Group 12, ITU-T)

SLIDE 27

Thank you

SLIDE 28

WG1 Crowdsourcing TF: Contact Information

  • TF leaders
    – Tobias Hoßfeld (University of Duisburg-Essen), tobias.hossfeld@uni-due.de
    – Babak Naderi (TU Berlin), Babak.Naderi@telekom.de
  • Wiki page
    – https://www3.informatik.uni-wuerzburg.de/qoewiki/qualinet:crowd
    – Access to the wiki: contact Tobias and Babak
  • Mailing list
    – Qualinet mail reflector for “Crowdsourcing”: cs.wg2.qualinet@listes.epfl.ch
    – To subscribe to this list, simply send an (empty) e-mail to cs.wg2.qualinet-subscribe@listes.epfl.ch and follow the steps in the e-mail you receive. The instructions can also be found at http://listes.epfl.ch/doc.cgi?liste=cs.wg2.qualinet.

SLIDE 29


More results

SLIDE 30

On average, 708,000 HITs were available daily in 2017 Q1.

Taken from https://www.mturk.com

SLIDE 31

Task Selection Strategies

  • 1. What are the influencing factors for workers when deciding which task to take?
  • 2. How to measure them?
  • 3. Given the perceived factors, is it possible to predict a worker's decision (acceptance)?
  • 4. Is the expected workload predictable given the task data?

[Figure: motives ($, enjoyment, social) vs. obstacles (perceived effort)]

SLIDE 32

Important Factors > Method

Would you perform the presented HIT, and why?

Crowd: US, Master
Payout: $0.05 + $0.05
Task: Rate 60 HITs

Results: 43% of responses contained a reason

Labeling:
  – 34 different labels
  – Fleiss’ Kappa = 0.62
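
As a pointer for reproducing such an agreement figure, a minimal sketch using statsmodels (toy integer category codes, not the study's labels; the 0.62 above is the reported value and is not reproduced here):

```python
# Sketch: Fleiss' kappa for inter-annotator agreement on categorical labels.
# Toy data: rows = labeled responses, columns = annotators, values = category ids.
import numpy as np
from statsmodels.stats.inter_rater import aggregate_raters, fleiss_kappa

labels = np.array([
    [0, 0, 1],
    [1, 1, 1],
    [0, 2, 0],
    [2, 2, 2],
])
table, _categories = aggregate_raters(labels)   # -> items x categories count table
print(fleiss_kappa(table, method="fleiss"))
```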

SLIDE 33

Measure Workload > Method

How to measure the expected workload of a HIT?

  – Rating Scale Mental Effort (RSME): one-item estimation of the overall effort
  – NASA Task Load Index (TLX): six items (mental demand, physical demand, temporal demand, effort, performance, and frustration) + weighting
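
For reference, the standard weighted NASA-TLX score combines the six subscale ratings with weights obtained from the 15 pairwise comparisons; a minimal sketch (toy numbers, not data from the talk):

```python
# Sketch of the standard weighted NASA-TLX score: subscale ratings (assumed 0-100)
# are combined with weights from the 15 pairwise comparisons (weights sum to 15).
DIMENSIONS = ("mental", "physical", "temporal", "effort", "performance", "frustration")

def tlx_weighted(ratings, weights):
    assert sum(weights.values()) == 15, "weights must come from the 15 pairwise choices"
    return sum(ratings[d] * weights[d] for d in DIMENSIONS) / 15.0

# Toy example (made-up numbers):
ratings = {"mental": 70, "physical": 10, "temporal": 55, "effort": 60,
           "performance": 30, "frustration": 40}
weights = {"mental": 5, "physical": 0, "temporal": 3, "effort": 4,
           "performance": 1, "frustration": 2}
print(tlx_weighted(ratings, weights))   # ≈ 57.7
```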

SLIDE 34

Measure Workload > Method

Crowd: US, +98%, +500
Task: Rate 30 HITs x 10 times

Results: TLX ratings significantly correlated with the mean overall effort score (Pearson’s r_TLX(29) = .92, p < 0.001).

 Using RSME
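
A quick sketch of how such a correlation between the two workload measures can be checked (toy values standing in for the rated HITs, not the study's data):

```python
# Sketch: correlate per-HIT NASA-TLX scores with mean RSME ("overall effort") scores.
import numpy as np
from scipy.stats import pearsonr

tlx  = np.array([55.0, 40.0, 72.0, 33.0, 61.0, 48.0])   # weighted TLX score per HIT
rsme = np.array([60.0, 35.0, 80.0, 30.0, 65.0, 50.0])   # mean RSME rating per HIT

r, p = pearsonr(tlx, rsme)
print(f"Pearson r = {r:.2f}, p = {p:.4f}")
```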

SLIDE 35

Predict Acceptance > Method

Given the HIT ratings, predict whether a worker will accept the job.

A dataset of 373 actual HITs was collected (metadata, preview screenshot, HTML code).

Crowd: US, +98%, +500
Payouts: introductory $0.15; HIT rating $0.10 + $0.05
Task: Rate 401 HITs x 15 times

 6,015 ratings collected
(Gold standard: 14 HITs were manipulated by size)

SLIDE 36

Predict Acceptance > Results

Modeling (based on 72% of observations):
  Accuracy: 88.8%, Sensitivity: 85.5%, Specificity: 91.1%

Testing:
  Accuracy: 89.54%, Sensitivity: 88.47%, Specificity: 90.82%

Using the mean ratings: Accuracy 69.9%

[Figure: Predicted Mean EW]
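
For clarity, the three reported figures follow directly from the binary confusion matrix of the accept/not-accept classifier; a minimal sketch (toy counts, not the study's data):

```python
# Sketch: accuracy, sensitivity and specificity from a binary confusion matrix.
def classification_metrics(tp, fp, tn, fn):
    accuracy    = (tp + tn) / (tp + tn + fp + fn)
    sensitivity = tp / (tp + fn)   # true-positive rate: accepted HITs correctly predicted
    specificity = tn / (tn + fp)   # true-negative rate: declined HITs correctly predicted
    return accuracy, sensitivity, specificity

# Toy counts (made up):
print(classification_metrics(tp=85, fp=9, tn=91, fn=15))   # (0.88, 0.85, 0.91)
```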

SLIDE 37

Workload Prediction > Method

Features:
  – Syntax-based: from the HTML code; 27 features (e.g. # <a>, # <input type=radio/checkbox/text>, …)
  – Semantic-based: from the text; 6 features (e.g. # words, # sentences, rareness score, …)
  – Visual: from the screenshot, based on a saliency map and OCR; 13 features
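
As an illustration of the syntax-based features, a small Python sketch that counts tags and input types in a HIT's HTML (the concrete 27-feature list is not reproduced here; this only shows the counting mechanics):

```python
# Sketch: count simple syntax features (tag and <input type=...> frequencies) from HIT HTML.
from collections import Counter
from html.parser import HTMLParser

class SyntaxFeatures(HTMLParser):
    def __init__(self):
        super().__init__()
        self.counts = Counter()

    def handle_starttag(self, tag, attrs):
        self.counts[tag] += 1                        # e.g. number of <a> tags
        if tag == "input":
            itype = dict(attrs).get("type", "text")
            self.counts[f"input[{itype}]"] += 1      # radio / checkbox / text inputs

def syntax_features(html):
    parser = SyntaxFeatures()
    parser.feed(html)
    return parser.counts

print(syntax_features('<p>Rate this page</p><input type="radio"><input type="radio"><a href="#">help</a>'))
```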

SLIDE 38

Workload Prediction > Results

Important features: HIT type; the 100%, 98%, 75%, and 95% quartiles; sub-clauses; rewards; # unique stems; and # words.

SLIDE 39

Workload Prediction > Results

Using the predicted expected workload:
  Accuracy: 77.21%, Sensitivity: 77.71%, Specificity: 76.77%

Compared to using individual ratings: a 12.33% decrease in accuracy (89.54% → 77.21%).

[Figure: Predicted Mean EW, Predicted Profit]