Two Tools are Better Than One: Tool Diversity as a Means of Improving Aggregate Crowd Performance



SLIDE 1

IUI 2018

Two Tools are Better Than One: Tool Diversity as a Means of Improving Aggregate Crowd Performance

JEAN Y. SONG, RAYMOND FOK, ALAN LUNDGARD, FAN YANG, JUHO KIM, WALTER S. LASECKI

Michigan Interactive and Social Computing Group

SLIDE 2

CROMA LAB & KIXLAB | IUI 2018

Crowdsourcing Platforms

SLIDE 3

Crowdsourcing for Human Computation

https://playment.io/ https://www.crowdguru.de/en/

SLIDE 4

Crowdsourcing Strategy: Microtasking

Task → Divide → Microtasks

SLIDE 5

Crowdsourcing Strategy: Aggregation

Task → Divide → Microtasks (× 2 × 2 × 2 × 2 × 2) → Aggregate multiple answers

SLIDE 6

Crowdsourcing Strategy: Using Single Tool

Task → Divide → Microtasks (× 2 × 2 × 2 × 2 × 2, same tool or interface) → Aggregate multiple answers

SLIDE 7

Problem with using a single tool: systematic bias can accumulate, resulting in an inaccurate aggregated result.

SLIDE 8

  • Q. What is systematic bias?
  • A. Reliable, but not valid, performance

Figure panels: Reliable, not Valid; Not Reliable, but Valid; Not Reliable, not Valid; Reliable, Valid
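The reliable-versus-valid distinction can be made concrete with numbers. The sketch below is only an illustration: the measured quantity, the ground-truth value, and both answer sets are hypothetical, and "bias" and "spread" are simply the mean signed error and the standard deviation of the answers.

```python
from statistics import mean, pstdev


def bias_and_spread(answers, truth):
    """Return (bias, spread): mean signed error and standard deviation of the answers."""
    errors = [a - truth for a in answers]
    return mean(errors), pstdev(answers)


truth = 100.0  # hypothetical ground truth, e.g. an object's width in pixels

# Reliable but not valid: answers agree with each other but share an offset.
biased = [109.0, 110.0, 111.0, 110.0]
# Valid but not reliable: answers scatter widely around the truth.
noisy = [80.0, 120.0, 95.0, 105.0]

biased_bias, biased_spread = bias_and_spread(biased, truth)
noisy_bias, noisy_spread = bias_and_spread(noisy, truth)

print(biased_bias, biased_spread)  # large bias, small spread
print(noisy_bias, noisy_spread)    # no bias, large spread
```

Averaging more answers shrinks the spread but leaves a shared offset untouched, which is why a systematic bias survives aggregation.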

SLIDE 9

Example of Systematic (Error) Bias

Tool 1: Opensurfaces (TOG 2013)

Bell, Sean, et al. "Opensurfaces: A richly annotated catalog of surface appearance." ACM Transactions on Graphics (TOG) 32.4 (2013): 111.

Tool 2: Click’n’Cut (CrowdMM 2014)

Carlier, Axel, et al. "Click'n'Cut: Crowdsourced interactive segmentation with object candidates." International ACM Workshop on Crowdsourcing for Multimedia. 2014.

SLIDE 12

Proposed Approach: Use tool diversity as a means of improving aggregate crowd performance

SLIDE 13

What is Tool Diversity? A property that measures how differently tools can be built in terms of the biases they induce.

SLIDE 14

Analogy to Ensemble Learning

Figure: space of hypotheses containing f, h1, h2, h3, and h4, with weights w1 and w2.

  • f: best-performing hypothesis
  • hi: other hypotheses
  • wi: weights

Ensemble learning constructs a combination of two alternative hypotheses h1 and h2 with proper weights (w1 and w2), and approximates the best hypothesis f by averaging the two.
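As a rough illustration of the weighted combination described above, the following sketch averages two hypotheses whose errors point in opposite directions. The functions `f`, `h1`, `h2` and the equal weights are made up for this example and are not taken from the talk.

```python
def combine(h1, h2, w1, w2):
    """Return a hypothesis that is the weighted average of h1 and h2."""
    total = w1 + w2

    def ensemble(x):
        return (w1 * h1(x) + w2 * h2(x)) / total

    return ensemble


def f(x):
    """Hypothetical best-performing hypothesis."""
    return 2.0 * x


def h1(x):
    """Hypothetical alternative that overestimates f by 1."""
    return 2.0 * x + 1.0


def h2(x):
    """Hypothetical alternative that underestimates f by 1."""
    return 2.0 * x - 1.0


ensemble = combine(h1, h2, w1=0.5, w2=0.5)
print(ensemble(3.0), f(3.0))  # the opposite errors cancel at x = 3.0
```

The errors cancel exactly here only because the toy hypotheses are symmetric around `f`; with real hypotheses the weights would have to be chosen or learned.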

SLIDE 15

Proposed Method: Leverage Tool Diversity

Task → Divide → Microtasks (× 2 × 2 × 2 × 2 × 2, different tools)

SLIDE 16

Proposed Method: Leverage Tool Diversity

Task → Divide → Microtasks (× 2 × 2 × 2 × 2 × 2, different tools)

Semantic image segmentation task

SLIDE 17

Choosing the Tools

  • Q. How to diversify errors produced by different tool types?

SLIDE 18

Choosing the Tools

  • Q. How to diversify errors produced by different tool types?
  • Q. What are the different types of objects?
  • A. General objects, fuzzy materials, plants, furry objects, transparent objects, reflective surfaces (intuitive, deformability)

Tools: T1, T2, T3, T4

SLIDE 19

Instructions and Worker Interface

Worker Interface:

SLIDE 20

Instructions and Worker Interface

Instructions:

SLIDE 21

Experiment Settings

  • 12 different visual scenes
  • Total 51 objects
  • Six unique workers for each tool-scene pair (total 288+ workers)
  • Total 1224 object segmentations
  • Platform: Amazon Mechanical Turk
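The totals above are consistent with one another, as the short check below shows. It assumes four tools (T1 to T4, as labeled on the tool-selection slide); the constant names are chosen for this example.

```python
NUM_TOOLS = 4          # assumption: the four tools T1-T4
NUM_SCENES = 12
NUM_OBJECTS = 51
WORKERS_PER_PAIR = 6   # six unique workers per tool-scene pair

tool_scene_pairs = NUM_TOOLS * NUM_SCENES
worker_assignments = tool_scene_pairs * WORKERS_PER_PAIR
object_segmentations = NUM_OBJECTS * NUM_TOOLS * WORKERS_PER_PAIR

print(tool_scene_pairs)      # 48
print(worker_assignments)    # 288
print(object_segmentations)  # 1224
```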

Each worker was paid between $0.35 and $0.60 per task, depending on the number of objects they had to segment or on the level of difficulty of the given tool (a pay rate of ~$10/hr).

SLIDE 22

Results & Discussion

SLIDE 23

Performance of Individual Tools

SLIDE 24

Performance of Individual Tools

SLIDE 25

What we observed

SLIDE 26

Some of the Answers from Workers

SLIDE 27

How can we see the effect of leveraging tool diversity?

SLIDE 28

Comparison of Aggregation Methods

Method 1 (baseline). Single-tool aggregation (uniform majority voting): T1 → Aggregate; T2 → Aggregate

SLIDE 29

Comparison of Aggregation Methods

Method 2. Multiple-tool aggregation (uniform majority voting): T1 × T2 → Aggregate, with equal weights (w, w, w, w)

Method 3. Multiple-tool aggregation (expectation maximization): T1 × T2 → Aggregate, with distinct weights (w1, w2, w3, w4)
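The aggregation methods differ in which answers are pooled and how they are weighted. The sketch below is a toy illustration under stated assumptions: labels are binary per pixel, a pixel is kept when at least half of the total weight votes for it, and all masks and weights are hypothetical. In Method 3 the weights would be estimated by expectation maximization; here they are set by hand purely to show the mechanics.

```python
def weighted_vote(masks, weights=None):
    """Aggregate binary masks (lists of 0/1) by weighted majority voting.

    A pixel is labeled 1 when the masks voting 1 carry at least half of the
    total weight. With weights=None every mask counts equally.
    """
    if weights is None:
        weights = [1.0] * len(masks)
    half = sum(weights) / 2
    result = []
    for n in range(len(masks[0])):
        score = sum(w for mask, w in zip(masks, weights) if mask[n] == 1)
        result.append(1 if score >= half else 0)
    return result


truth = [0, 1, 1, 1, 0, 0]  # hypothetical ground truth for six pixels

# Hypothetical answers: tool 1 tends to over-segment, tool 2 to under-segment.
tool1 = [[1, 1, 1, 1, 1, 0], [1, 1, 1, 1, 0, 0], [0, 1, 1, 1, 1, 0]]
tool2 = [[0, 1, 1, 1, 0, 0], [0, 0, 1, 0, 0, 0], [0, 0, 1, 1, 0, 0]]

single_t1 = weighted_vote(tool1)              # Method 1 on tool 1
single_t2 = weighted_vote(tool2)              # Method 1 on tool 2
multi_uniform = weighted_vote(tool1 + tool2)  # Method 2
multi_weighted = weighted_vote(               # Method 3 style, hand-set weights
    tool1 + tool2, weights=[0.5, 0.5, 0.5, 1.0, 1.0, 1.0]
)

print(single_t1, single_t2, multi_uniform, multi_weighted)
```

On this toy data each single-tool aggregate keeps its tool's bias, while pooling both tools recovers `truth`. That outcome is a property of the chosen example, not a general guarantee.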

SLIDE 30

Comparison of Aggregation Methods

SLIDE 31

Comparison of Aggregation Methods

Figure labels: High recall, High precision

Pairing a high-recall tool with a high-precision tool gave the largest performance improvement.
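For reference, precision and recall of a binary segmentation can be computed as in the sketch below. It uses the standard definitions; the masks are hypothetical and only illustrate what a high-recall answer and a high-precision answer look like.

```python
def precision_recall(pred, truth):
    """Return (precision, recall) of a binary mask against the ground truth."""
    tp = sum(1 for p, t in zip(pred, truth) if p == 1 and t == 1)
    fp = sum(1 for p, t in zip(pred, truth) if p == 1 and t == 0)
    fn = sum(1 for p, t in zip(pred, truth) if p == 0 and t == 1)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


truth = [0, 1, 1, 1, 0, 0]           # hypothetical ground truth
high_recall = [1, 1, 1, 1, 1, 0]     # covers the object but spills over
high_precision = [0, 0, 1, 1, 0, 0]  # stays inside the object but misses part

print(precision_recall(high_recall, truth))     # precision 0.6, recall 1.0
print(precision_recall(high_precision, truth))  # precision 1.0, recall about 0.67
```

The two answers err in opposite directions (extra pixels versus missing pixels), which is the kind of complementary error that aggregation across tools can exploit.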

SLIDE 32

Generalization

SLIDE 33

Generalizability: Expected Human Error is Diverse

Figure labels: Tool 1, Tool 2, Aggregate; Reliable, Valid

SLIDE 34

Generalizability: Aggregation Improves Quality

Quality Improves

SLIDE 35

Generalizability: Objective Correct Answer Exists

  • Task with subjective answers: creative writing
  • Tasks with objective answers: image segmentation, live captioning, text annotation, handwriting recognition

SLIDE 36

Generalizability: Tolerates Imperfections

Example: Scribe (UIST 2012)

W.S. Lasecki, C.D. Miller, A. Sadilek, A. Abumoussa, D. Borrello, R. Kushalnagar, J.P. Bigham. Real-time Captioning by Groups of Non-Experts. UIST 2012.

SLIDE 37

Possible Future Applications

  • Application 1: Tagging Long Videos
  • Application 2: Multichannel NLP
  • Application 3: Complex/Diverse Annotation
  • Application 4: Computer-Human Integration

Figure labels: Context, Granularity; Text, Audio; Higher level, Lower level; Precision, Recall

SLIDE 38

Thank you!

Authors: Jean Y. Song (jyskwon@umich.edu / jyskwon.github.io), Raymond Fok, Alan Lundgard, Fan Yang, Juho Kim, Walter S. Lasecki

Funding: Denso Corporation, Toyota Research Institute, MCity at the University of Michigan, National Research Foundation of Korea

Michigan Interactive and Social Computing Group

SLIDE 39

Backup Slides

SLIDE 40

Tool 1

SLIDE 41

Tool 2

SLIDE 42

Tool 3

SLIDE 43

Tool 4

SLIDE 44

Pixel-Level Majority Voting (50% agreement)

Worker 1, Worker 2, Worker 3, Worker 4 → Aggregate → Final answer
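The two stages in the figure (aggregate, then final answer) can be sketched as a per-pixel agreement map followed by a threshold. This is an illustrative sketch only: the four 2 × 3 worker masks are hypothetical, and treating exactly 50% agreement as sufficient is an assumption about how ties are handled.

```python
def agreement_map(masks):
    """Per-pixel fraction of workers who labeled the pixel 1."""
    rows, cols = len(masks[0]), len(masks[0][0])
    return [
        [sum(m[r][c] for m in masks) / len(masks) for c in range(cols)]
        for r in range(rows)
    ]


def threshold(agreement, min_agreement=0.5):
    """Keep pixels whose agreement reaches min_agreement (ties kept: an assumption)."""
    return [[1 if a >= min_agreement else 0 for a in row] for row in agreement]


# Hypothetical 2 x 3 masks from four workers.
worker1 = [[1, 1, 0], [0, 1, 0]]
worker2 = [[1, 1, 0], [0, 0, 0]]
worker3 = [[0, 1, 1], [0, 1, 0]]
worker4 = [[0, 1, 0], [0, 0, 1]]

aggregate = agreement_map([worker1, worker2, worker3, worker4])
final_answer = threshold(aggregate)

print(aggregate)     # [[0.5, 1.0, 0.25], [0.0, 0.5, 0.25]]
print(final_answer)  # [[1, 1, 0], [0, 1, 0]]
```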

SLIDE 45

Expectation Maximization (Dawid-Skene Algorithm)

In an image, label a pixel as 1 if it belongs to the target object and 0 if it is background. Assume:

  • an image A with N total pixels
  • M crowd workers
  • the label that worker m assigns to pixel n is denoted z_mn
  • all labels from worker m form a vector Z_m
  • the true labels of A, to be estimated, form a vector Y
  • θ is the set of confusion matrices to be estimated

We can estimate the true labels Y by maximizing the marginal likelihood of the observed worker labels. The EM algorithm works iteratively, alternating 1) the expectation step and 2) the maximization step.
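A minimal sketch of the two alternating steps for binary pixel labels follows. It is not the authors' implementation. The names `Z`, `M`, `N`, and `theta` mirror the notation above; the initialization from vote fractions, the smoothing constant `eps`, the fixed iteration count, and the toy labels are all assumptions made for this example.

```python
def dawid_skene_binary(Z, n_iter=20, eps=1e-6):
    """Estimate true binary labels from worker labels Z[m][n] with EM.

    Returns (Y_hat, theta), where theta[m][k][l] approximates
    P(worker m answers l | true label is k).
    """
    M, N = len(Z), len(Z[0])
    # Initialize the posterior P(y_n = 1) with the fraction of workers voting 1.
    q = [sum(Z[m][n] for m in range(M)) / M for n in range(N)]
    theta = []
    for _ in range(n_iter):
        # Maximization step: re-estimate the class prior and confusion matrices.
        sum_q = sum(q)
        prior = min(max(sum_q / N, eps), 1 - eps)
        theta = []
        for m in range(M):
            tp = sum(q[n] * Z[m][n] for n in range(N))
            tn = sum((1 - q[n]) * (1 - Z[m][n]) for n in range(N))
            sens = (tp + eps) / (sum_q + 2 * eps)      # P(z = 1 | y = 1)
            spec = (tn + eps) / (N - sum_q + 2 * eps)  # P(z = 0 | y = 0)
            theta.append([[spec, 1 - spec], [1 - sens, sens]])
        # Expectation step: update the posterior of each pixel's true label.
        for n in range(N):
            p1, p0 = prior, 1 - prior
            for m in range(M):
                p1 *= theta[m][1][Z[m][n]]
                p0 *= theta[m][0][Z[m][n]]
            q[n] = p1 / (p1 + p0)
    return [1 if qn >= 0.5 else 0 for qn in q], theta


# Toy data: two workers who match the hypothetical truth and one noisy worker.
truth = [0, 1, 1, 1, 0, 0, 1, 0]
noisy = [1, 1, 0, 1, 0, 1, 1, 0]
Y_hat, theta = dawid_skene_binary([truth, truth, noisy])
print(Y_hat)
```

The estimated confusion matrices play the role of per-worker weights: a worker whose labels often disagree with the inferred truth contributes less to the posterior. For real images the product over workers should be computed in log space to avoid underflow.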