SLIDE 1

IUI 2018

Two Tools are Better Than One: Tool Diversity as a Means of Improving Aggregate Crowd Performance

JEAN Y. SONG, RAYMOND FOK, ALAN LUNDGARD, FAN YANG, JUHO KIM, WALTER S. LASECKI

Michigan Interactive and Social Computing Group

SLIDE 2

CROMA LAB & KIXLAB | IUI 2018

Crowdsourcing Platforms

SLIDE 3

Crowdsourcing for Human Computation

https://playment.io/ https://www.crowdguru.de/en/

SLIDE 4

Crowdsourcing Strategy: Microtasking

Diagram: Task → Divide → Microtasks …

SLIDE 5

Crowdsourcing Strategy: Aggregation

Diagram: Task → Divide → Microtasks … (× 2 each) → Aggregate multiple answers

SLIDE 6

Crowdsourcing Strategy: Using Single Tool

Diagram: Task → Divide → Microtasks … (× 2 each, same tool or interface) → Aggregate multiple answers

SLIDE 7

Problem with using a single tool: systematic bias can accumulate, resulting in an inaccurate aggregated result.

SLIDE 8

Q. What is Systematic Bias?

A. Performance that is reliable but not valid

Figure panels: Reliable, not Valid | Not Reliable, but Valid | Not Reliable, not Valid | Reliable, Valid
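The reliable/valid distinction can be sketched numerically. The snippet below is only an illustration with made-up worker answers and a hypothetical ground-truth value (`TRUE_VALUE`); it shows that averaging consistent but shifted answers leaves the shift in place, while averaging scattered but centered answers cancels the noise.

```python
from statistics import mean, pstdev

TRUE_VALUE = 10.0  # hypothetical ground truth, chosen for illustration

# Made-up worker answers (not data from the study).
reliable_not_valid = [12.0, 12.1, 11.9, 12.0]  # consistent, but shifted
valid_not_reliable = [7.0, 13.0, 9.0, 11.0]    # scattered, but centered


def describe(answers):
    """Return (bias of the averaged answer, spread of the individual answers)."""
    return mean(answers) - TRUE_VALUE, pstdev(answers)


bias_a, spread_a = describe(reliable_not_valid)
bias_b, spread_b = describe(valid_not_reliable)

print(f"reliable, not valid: bias={bias_a:.2f}, spread={spread_a:.2f}")
print(f"valid, not reliable: bias={bias_b:.2f}, spread={spread_b:.2f}")
```

In the first case the aggregate stays about 2 units off no matter how many similar answers are added, which is the failure mode the talk attributes to a single tool.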

SLIDE 9

Example of Systematic (Error) Bias

Tool 1: OpenSurfaces (TOG 2013)

Bell, Sean, et al. "OpenSurfaces: A richly annotated catalog of surface appearance." ACM Transactions on Graphics (TOG) 32.4 (2013): 111.

Tool 2: Click'n'Cut (CrowdMM 2014)

Carlier, Axel, et al. "Click'n'Cut: Crowdsourced interactive segmentation with object candidates." International ACM Workshop on Crowdsourcing for Multimedia. 2014.

SLIDE 12

Proposed Approach: Use tool diversity as a means of improving aggregate crowd performance

SLIDE 13

What is Tool Diversity? A property that measures how different tools can be built in terms of their induced biases.

SLIDE 14

Analogy to Ensemble Learning

Figure: space of hypotheses containing h1, h2, h3, h4, and f, with weights w1 and w2.

f: best-performing hypothesis; hi: other hypotheses; wi: weights

Ensemble learning constructs a combination of two alternative hypotheses h1 and h2 with proper weights (w1 and w2), and approximates the best hypothesis f by averaging the two.
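As a minimal sketch of this analogy, assume a hypothetical target `f` and two hypotheses that err in opposite directions (`h1` overestimates by 1, `h2` underestimates by 3). All functions and weights here are invented for illustration; the point is only that a properly weighted combination can land closer to `f` than either hypothesis alone, while a uniform average may not.

```python
def f(x):
    """Hypothetical best-performing hypothesis (the target)."""
    return 2.0 * x


def h1(x):
    """Hypothetical alternative that overestimates f by 1."""
    return f(x) + 1.0


def h2(x):
    """Hypothetical alternative that underestimates f by 3."""
    return f(x) - 3.0


def ensemble(x, w1, w2):
    """Weighted combination of the two alternative hypotheses."""
    return w1 * h1(x) + w2 * h2(x)


xs = [0.0, 1.0, 2.0, 3.0]


def max_error(h):
    """Largest absolute deviation from f over the sample points."""
    return max(abs(h(x) - f(x)) for x in xs)


err_h1 = max_error(h1)
err_h2 = max_error(h2)
err_uniform = max_error(lambda x: ensemble(x, 0.5, 0.5))
err_weighted = max_error(lambda x: ensemble(x, 0.75, 0.25))

print(err_h1, err_h2, err_uniform, err_weighted)
```

With weights 0.75 and 0.25 the two opposite errors cancel exactly in this toy setup; with uniform weights the combination is no better than `h1`.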

SLIDE 15

Proposed Method: Leverage Tool Diversity

Diagram: Task → Divide → Microtasks … (× 2 each, different tools)

SLIDE 16

Proposed Method: Leverage Tool Diversity

Diagram: Task → Divide → Microtasks … (× 2 each, different tools)

Semantic image segmentation task

SLIDE 17

Choosing the Tools

Q. How to diversify errors produced by different tool types?

SLIDE 18

Choosing the Tools

Q. How to diversify errors produced by different tool types?
Q. What are different types of objects?
A. General objects, fuzzy materials, plants, furry objects, transparent objects, reflective surfaces (intuitive, deformability)

Tools: T1, T2, T3, T4

SLIDE 19

Instructions and Worker Interface

Worker Interface:

SLIDE 20

Instructions and Worker Interface

Instructions:

SLIDE 21

Experiment Settings

12 different visual scenes
Total 51 objects
Six unique workers for each tool-scene pair (total 288+ workers)
Total 1224 object segmentations
Platform: Amazon Mechanical Turk
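The totals above can be cross-checked with a little arithmetic. The sketch below assumes four tools (T1 to T4, as named elsewhere in the deck) and that every worker segments every object in the assigned scene; both are assumptions, not statements from this slide.

```python
scenes = 12
objects_total = 51        # objects summed over all scenes
workers_per_pair = 6      # unique workers per tool-scene pair
tools = 4                 # assumption: tools T1-T4

tool_scene_pairs = scenes * tools
worker_assignments = tool_scene_pairs * workers_per_pair
segmentations = objects_total * tools * workers_per_pair

print(tool_scene_pairs, worker_assignments, segmentations)
```

Under these assumptions the counts come out to 288 worker assignments and 1224 object segmentations, matching the figures on the slide.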

Each worker was paid between $0.35 and $0.60 per task, depending on the number of objects they had to segment or on the level of difficulty of the given tool (a pay rate of ~$10/hr).

SLIDE 22

Results & Discussion

SLIDE 23

Performance of Individual Tools

SLIDE 25

What we observed

SLIDE 26

Some of the Answers from Workers

SLIDE 27

How can we see the effect of leveraging tool diversity?

SLIDE 28

Comparison of Aggregation Methods

Method 1. Single tool aggregation (uniform majority voting): Baseline
T1 → Aggregate
T2 → Aggregate

SLIDE 29

Comparison of Aggregation Methods

Method 2. Multiple tool aggregation (uniform majority voting)
T1 × T2 → Aggregate

Method 3. Multiple tool aggregation (expectation maximization)
T1 × T2 → Aggregate

Diagram labels (worker weights): w, w, w, w; w1, w2, w3, w4
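The difference between uniform and non-uniform weighting can be sketched for a single pixel. The function `weighted_vote`, the labels, and the weights below are hypothetical; in particular, the weights are hand-picked for illustration and are not outputs of expectation maximization.

```python
def weighted_vote(labels, weights, threshold=0.5):
    """Combine 0/1 labels for one pixel using per-worker weights.

    Returns 1 when the weighted share of 1-votes reaches the threshold.
    """
    score = sum(w * z for w, z in zip(weights, labels)) / sum(weights)
    return 1 if score >= threshold else 0


# Four workers label the same pixel (made-up labels).
labels = [1, 1, 1, 0]

uniform = weighted_vote(labels, [1, 1, 1, 1])          # every worker counts equally
weighted = weighted_vote(labels, [0.1, 0.1, 0.1, 0.7])  # the fourth worker is trusted most

print(uniform, weighted)
```

With equal weights the three 1-votes win; once the dissenting worker carries most of the weight, the aggregate flips to 0.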

SLIDE 30

Comparison of Aggregation Methods

SLIDE 31

Comparison of Aggregation Methods

Figure labels: High recall | High precision

High recall + high precision pairs gave the highest performance improvement.
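To make the precision/recall terminology concrete, the sketch below computes both metrics for two made-up binary masks (flattened to lists): a "loose" mask that over-segments and a "tight" mask that under-segments. The masks and helper names are hypothetical; the example only shows that such a pair makes errors on different pixels, and does not reproduce the study's results.

```python
def precision_recall(pred, truth):
    """Return (precision, recall) for two equal-length lists of 0/1 labels."""
    tp = sum(1 for p, t in zip(pred, truth) if p == 1 and t == 1)
    fp = sum(1 for p, t in zip(pred, truth) if p == 1 and t == 0)
    fn = sum(1 for p, t in zip(pred, truth) if p == 0 and t == 1)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def error_pixels(pred, truth):
    """Indices where the prediction disagrees with the ground truth."""
    return {i for i, (p, t) in enumerate(zip(pred, truth)) if p != t}


truth = [0, 1, 1, 1, 1, 0, 0, 0]
loose = [1, 1, 1, 1, 1, 1, 0, 0]  # covers the object plus some background
tight = [0, 0, 1, 1, 0, 0, 0, 0]  # stays inside the object but misses its edges

print("loose:", precision_recall(loose, truth))  # high recall, lower precision
print("tight:", precision_recall(tight, truth))  # high precision, lower recall
print("shared errors:", error_pixels(loose, truth) & error_pixels(tight, truth))
```

The loose mask errs only with false positives and the tight mask only with false negatives, so their error sets do not overlap.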

SLIDE 32

Generalization

SLIDE 33

Generalizability: Expected Human Error is Diverse

Figure labels: Tool 1, Tool 2, Aggregate (Reliable, Valid)

SLIDE 34

Generalizability: Aggregation Improves Quality

Quality Improves

SLIDE 35

Generalizability: Objective Correct Answer Exists

Tasks with subjective answers: creative writing
Tasks with objective answers: image segmentation, live captioning, text annotation, handwriting recognition

SLIDE 36

Generalizability: Tolerates Imperfections

Example: Scribe (UIST 2012)

W.S. Lasecki, C.D. Miller, A. Sadilek, A. Abumoussa, D. Borrello, R. Kushalnagar, J.P. Bigham. Real-time Captioning by Groups of Non-Experts. UIST 2012.

SLIDE 37

Possible Future Applications

Application 1: Tagging Long Videos
Application 2: Multichannel NLP
Application 3: Complex/Diverse Annotation
Application 4: Computer-Human Integration

Figure labels: Context, Granularity, Text, Audio, Higher level, Lower level, Precision, Recall

SLIDE 38

Thank you!

Authors: Jean Y. Song (jyskwon@umich.edu / jyskwon.github.io), Raymond Fok, Alan Lundgard, Fan Yang, Juho Kim, Walter S. Lasecki

Funding: Denso Corporation, Toyota Research Institute, MCity at the University of Michigan, National Research Foundation of Korea

Michigan Interactive and Social Computing Group

SLIDE 39

Backup Slides

SLIDE 40

Tool 1

SLIDE 41

Tool 2

SLIDE 42

Tool 3

SLIDE 43

Tool 4

SLIDE 44

Pixel-Level Majority Voting (50% agreement)

Figure: Worker 1, Worker 2, Worker 3, Worker 4 → Aggregate → Final answer
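Pixel-level majority voting can be sketched as follows. The masks are made-up 2×3 examples, `majority_vote` is a hypothetical helper, and treating exactly 50% agreement as foreground is an assumption about how ties are resolved.

```python
def majority_vote(masks, min_agreement=0.5):
    """Aggregate binary masks pixel by pixel.

    masks: list of equally sized 2-D lists of 0/1 labels, one per worker.
    A pixel is foreground when at least `min_agreement` of the workers
    marked it (assumption: exactly 50% counts as foreground).
    """
    n = len(masks)
    rows, cols = len(masks[0]), len(masks[0][0])
    return [
        [1 if sum(m[r][c] for m in masks) >= min_agreement * n else 0
         for c in range(cols)]
        for r in range(rows)
    ]


# Four workers, each labeling the same 2x3 image (made-up masks).
worker_masks = [
    [[1, 1, 0], [0, 1, 0]],
    [[1, 0, 0], [0, 1, 1]],
    [[1, 1, 0], [0, 0, 0]],
    [[0, 1, 0], [0, 1, 1]],
]

final_answer = majority_vote(worker_masks)
print(final_answer)
```

The bottom-right pixel receives two of four votes and is kept under the 50% rule; raising `min_agreement` above 0.5 would drop it.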

SLIDE 45

Expectation Maximization (Dawid-Skene Algorithm)

In an image, label a pixel as 1 if it belongs to a target object, and 0 if it is background. Assume:

an image A with N total pixels
M crowd workers
the label worker m assigns to pixel n is denoted z_mn
all labels from worker m form a vector Z_m
the true labels of A, to be estimated, form a vector Y
θ is the set of confusion matrices to be estimated

We can estimate the true labels Y by maximizing the marginal likelihood of the observed worker labels. The EM algorithm works iteratively by alternating 1) the expectation step and 2) the maximization step.
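A minimal sketch of this procedure for the binary case is given below. It is not the authors' implementation: the function name `dawid_skene_binary`, the initialisation from vote fractions, the fixed iteration count, and the smoothing constant `eps` are all assumptions, and each worker's 2×2 confusion matrix is summarised by two numbers (sensitivity and specificity).

```python
def dawid_skene_binary(Z, iterations=20, eps=1e-6):
    """Binary Dawid-Skene EM sketch.

    Z[m][n] is the 0/1 label worker m gave to pixel n.
    Returns (estimated labels, per-worker sensitivity, per-worker specificity).
    """
    M, N = len(Z), len(Z[0])
    # Initialise q[n] = P(y_n = 1) with the fraction of workers voting 1.
    q = [sum(Z[m][n] for m in range(M)) / M for n in range(N)]
    for _ in range(iterations):
        # Maximization step: re-estimate the class prior and confusion entries.
        s1 = sum(q)
        s0 = N - s1
        prior = (s1 + eps) / (N + 2 * eps)
        sens = [(sum(q[n] * Z[m][n] for n in range(N)) + eps) / (s1 + 2 * eps)
                for m in range(M)]
        spec = [(sum((1 - q[n]) * (1 - Z[m][n]) for n in range(N)) + eps)
                / (s0 + 2 * eps) for m in range(M)]
        # Expectation step: posterior of each true label given all worker labels.
        for n in range(N):
            a, b = prior, 1.0 - prior
            for m in range(M):
                if Z[m][n] == 1:
                    a *= sens[m]
                    b *= 1.0 - spec[m]
                else:
                    a *= 1.0 - sens[m]
                    b *= spec[m]
            q[n] = a / (a + b)
    return [1 if p >= 0.5 else 0 for p in q], sens, spec


# Made-up labels from three workers on an 8-pixel image.
Z = [
    [1, 1, 1, 1, 0, 0, 0, 0],  # worker 1: agrees with the others most often
    [1, 1, 1, 0, 0, 0, 0, 0],  # worker 2: misses one object pixel
    [1, 1, 0, 1, 0, 0, 1, 0],  # worker 3: noisier
]

Y_hat, sens, spec = dawid_skene_binary(Z)
print(Y_hat)
```

On this toy input the estimate coincides with the majority vote, and the estimated sensitivity of worker 1 ends up higher than that of worker 3, which is how EM comes to weight workers differently.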