Best of both worlds: Human-machine collaboration for object annotation

Olga Russakovsky (Stanford U.), Li-Jia Li (Snapchat), Fei-Fei Li (Stanford U.)

CVPR 2015


SLIDE 1

Best of both worlds: Human-machine collaboration for object annotation

Olga Russakovsky (Stanford U.), Li-Jia Li (Snapchat), Fei-Fei Li (Stanford U.)

CVPR 2015

SLIDE 2

Backpack

SLIDE 3

Backpack Flute Strawberry Traffic light Bathing cap Matchstick Racket Sea lion

SLIDE 4

Large-scale recognition

SLIDE 5

Large-scale recognition

Need benchmark datasets

SLIDE 6

PASCAL VOC 2005-2012

Classification: person, motorcycle
Detection
Segmentation

Person Motorcycle

Action: riding bicycle

Everingham, Van Gool, Williams, Winn and Zisserman. The PASCAL Visual Object Classes (VOC) Challenge. IJCV 2010.

20 object classes 22,591 images

SLIDE 7

Large Scale Visual Recognition Challenge (ILSVRC) 2010-2014

PASCAL VOC: 20 object classes, 22,591 images
ILSVRC DET: 200 object classes, 517,840 images
ILSVRC CLS-LOC: 1,000 object classes, 1,431,167 images

http://image-net.org/challenges/LSVRC/

[Example image: detections labeled Person, Person, Dog]
SLIDE 8

ILSVRC types of image annotations

Steel drum

Image classification

  • one object class per image
  • no bounding boxes

1,000 object classes 1,331,167 images

$

SLIDE 9

ILSVRC types of image annotations

Steel drum

Image classification

  • one object class per image
  • no bounding boxes

Single-object localization

  • one object class per image
  • bounding boxes around all instances of this class

Image classification: 1,000 object classes, 1,331,167 images
Single-object localization: 1,000 object classes, 573,966 images, 657,231 bounding boxes

$

$$

SLIDE 10

ILSVRC types of image annotations

Steel drum

Image classification

  • one object class per image
  • no bounding boxes

Single-object localization

  • one object class per image
  • bounding boxes around all instances of this class

Object detection

  • all target object classes
  • bounding boxes around all instances

Image classification: 1,000 object classes, 1,331,167 images
Single-object localization: 1,000 object classes, 573,966 images, 657,231 bounding boxes
Object detection: 200 object classes, 81,799 images, 228,981 bounding boxes

$

$$

$$$

Person Car Motorcycle Helmet

SLIDE 11

Q: How good is scene understanding with ILSVRC?

SLIDE 12

An unknown image

Q: How good is scene understanding with ILSVRC?

SLIDE 13

ILSVRC image classification: Table

Q: How good is scene understanding with ILSVRC?

SLIDE 14

ILSVRC single-object localization: Table

Q: How good is scene understanding with ILSVRC?

SLIDE 15

ILSVRC object detection: state-of-the-art output (removing wrong detections) Person Person Table Table TV Backpack

Q: How good is scene understanding with ILSVRC?

SLIDE 16

Person Person Table Table TV Backpack

Cup Cup Cup Table Couch Couch Potted Plant Potted Plant Lamp Potted Plant Tapeplayer

ILSVRC object detection: all instances of the 200 target objects

Lamp

Q: How good is scene understanding with ILSVRC?

SLIDE 17

One unsolved question:
 What would it take to recognize all the objects here?

SLIDE 18

Cost

The accuracy/cost tradeoff

Label quantity and quality per image

Dense manual annotation High accuracy Huge cost Many objects

SLIDE 19

Cost

The accuracy/cost tradeoff

Label quantity and quality per image

Dense manual annotation High accuracy Huge cost Many objects Fully automatic object detection Low cost Low accuracy Few objects


SLIDE 21

Cost

The accuracy/cost tradeoff

Label quantity and quality per image

Dense manual annotation High accuracy Huge cost Many objects Fully automatic object detection Low cost Low accuracy Few objects

Crowd engineering is improving

SLIDE 22

Cost

The accuracy/cost tradeoff

Label quantity and quality per image

Dense manual annotation High accuracy Huge cost Many objects Fully automatic object detection Low cost Low accuracy Few objects

Crowd engineering is improving

Humans need short, focused annotation tasks

Data

SLIDE 23

Cost

The accuracy/cost tradeoff

Label quantity and quality per image

Dense manual annotation High accuracy Huge cost Many objects Fully automatic object detection Low cost Low accuracy Few objects

Crowd engineering is improving

Object detectors are improving

SLIDE 24

Cost

The accuracy/cost tradeoff

Label quantity and quality per image

Dense manual annotation High accuracy Huge cost Many objects Fully automatic object detection Low cost Low accuracy Few objects

Crowd engineering is improving

Object detectors are improving

Object detectors are reasonably accurate on some classes

Algorithms

SLIDE 25

Cost

The accuracy/cost tradeoff

Label quantity and quality per image

Dense manual annotation High accuracy Huge cost Many objects Fully automatic object detection Low cost Low accuracy Few objects

O Russakovsky et al. Best of both worlds: human-machine collaboration for object annotation. CVPR 2015.

Crowd engineering is improving Object detectors are improving

SLIDE 26

Human-machine collaboration
 for object annotation

O Russakovsky et al. Best of both worlds: human-machine collaboration for object annotation. CVPR 2015.

SLIDE 27

Human-machine collaboration
 for object annotation

O Russakovsky et al. Best of both worlds: human-machine collaboration for object annotation. CVPR 2015.

Input image and constraints

SLIDE 28

Bed (0.5) Pillow (0.8)

Detections

For every box B, class C: P(det(B,C) | Image)

Human-machine collaboration
 for object annotation

O Russakovsky et al. Best of both worlds: human-machine collaboration for object annotation. CVPR 2015.

Input image and constraints

SLIDE 29

Multiple types of human input

Is this an object? Is there a fan? Is this a bed? Name this object. Outline another bed, if any. Are there more pillows? Name another object: pillow, bed, what else?

Human-machine collaboration
 for object annotation

Bed (0.5) Pillow (0.8)

Detections

For every box B, class C: P(det(B,C) | Image)

O Russakovsky et al. Best of both worlds: human-machine collaboration for object annotation. CVPR 2015.

Input image and constraints Solicit feedback

SLIDE 30

Human-machine collaboration
 for object annotation

Bed (0.6) Pillow (0.9)

Detections

For every box B, class C: P(det(B,C) | Image, User input)

Multiple types of human input

Is this an object? Is there a fan? Is this a bed? Name this object. Outline another bed, if any. Are there more pillows? Name another object: pillow, bed, what else?

O Russakovsky et al. Best of both worlds: human-machine collaboration for object annotation. CVPR 2015.

Input image and constraints Update state Solicit feedback

SLIDE 31

Human-machine collaboration
 for object annotation

Bed (0.6) Pillow (0.9)

Detections

For every box B, class C: P(det(B,C) | Image, User input)

Multiple types of human input

Is this an object? Is there a fan? Is this a bed? Name this object. Outline another bed, if any. Are there more pillows? Name another object: pillow, bed, what else?

O Russakovsky et al. Best of both worlds: human-machine collaboration for object annotation. CVPR 2015.

Input image and constraints Output detections Update state Solicit feedback

SLIDE 32

HCI in computer vision

Human-machine collaboration
 for object annotation

Bed (0.6) Pillow (0.9)

Detections

For every box B, class C: P(det(B,C) | Image, User input)

Multiple types of human input

Is this an object? Is there a fan? Is this a bed? Name this object. Outline another bed, if any. Are there more pillows? Name another object: pillow, bed, what else?

O Russakovsky et al. Best of both worlds: human-machine collaboration for object annotation. CVPR 2015.

Input image and constraints Output detections Update state Solicit feedback

Branson ECCV2010 Jain ICCV2013 Kovashka ICCV2011 Vondrick IJCV 2013 Wah ICCV2011 Wah CVPR2014 Parkash ECCV2012 Vijayanarasimhan IJCV2014 Biswas CVPR2013 Branson CVPR2014

SLIDE 33

Computer

Some qualitative results

Object Detection

...

O Russakovsky et al. Best of both worlds: human-machine collaboration for object annotation. CVPR 2015.

SLIDE 34

Computer Computer Human

Some qualitative results

Object Detection

...

Verify-box: Is the yellow box tight around a car? Answer: No

...

O Russakovsky et al. Best of both worlds: human-machine collaboration for object annotation. CVPR 2015.

SLIDE 35

Computer Computer Human

Some qualitative results

Object Detection

...

Verify-box: Is the yellow box tight around a car? Answer: No

...

Computer Human

... ...

Draw-box: Draw a box around a person

Answer: Yellow box below

O Russakovsky et al. Best of both worlds: human-machine collaboration for object annotation. CVPR 2015.

SLIDE 36

Computer Computer Human

Some qualitative results

Object Detection

...

Verify-box: Is the yellow box tight around a car? Answer: No

...

Computer

...

Car Person

Final Labeling

Computer Human

... ...

Draw-box: Draw a box around a person

Answer: Yellow box below

O Russakovsky et al. Best of both worlds: human-machine collaboration for object annotation. CVPR 2015.

SLIDE 37

Human-machine collaboration
 for object annotation

Bed (0.6) Pillow (0.9)

Detections

For every box B, class C: P(det(B,C) | Image, User input)

Multiple types of human input

Is this an object? Is there a fan? Is this a bed? Name this object. Outline another bed, if any. Are there more pillows? Name another object: pillow, bed, what else?

O Russakovsky et al. Best of both worlds: human-machine collaboration for object annotation. CVPR 2015.

Input image and constraints Output detections Update state Solicit feedback

SLIDE 38

Human-machine collaboration
 for object annotation

Input image and constraints

Bed (0.6) Pillow (0.9)

Detections

For every box B, class C: P(det(B,C) | Image, User input)

Multiple types of human input

Is this an object? Is there a fan? Is this a bed? Name this object. Outline another bed, if any. Are there more pillows? Name another object: pillow, bed, what else?

Solicit feedback

Output detections Update state

O Russakovsky et al. Best of both worlds: human-machine collaboration for object annotation. CVPR 2015.

SLIDE 39

What question to ask?

Current estimates

Is this an object? Is there a fan? Is this a bed? Name this object. Outline another bed, if any. Are there more pillows? Name another object: pillow, bed, what else?

O Russakovsky et al. Best of both worlds: human-machine collaboration for object annotation. CVPR 2015.

Decide which question to ask out of (infinitely) many options

Bed (0.5) Pillow (0.8)

SLIDE 40

What question to ask?

Current estimates

Is this an object? Is there a fan? Is this a bed? Name this object. Outline another bed, if any. Are there more pillows? Name another object: pillow, bed, what else?

O Russakovsky et al. Best of both worlds: human-machine collaboration for object annotation. CVPR 2015.

Decide which question to ask out of (infinitely) many options

Update estimates depending on: User answers (A), or User answers (B), or User answers (C), or …

Bed (0.5) Pillow (0.8)

SLIDE 41

What question to ask?

Current estimates

Is this an object? Is there a fan? Is this a bed? Name this object. Outline another bed, if any. Are there more pillows? Name another object: pillow, bed, what else?

O Russakovsky et al. Best of both worlds: human-machine collaboration for object annotation. CVPR 2015.

Decide which question to ask out of (infinitely) many options

Update estimates depending on: User answers (A), or User answers (B), or User answers (C), or …

Bed (0.5) Pillow (0.8)

State

SLIDE 42

What question to ask?

Current estimates

Is this an object? Is there a fan? Is this a bed? Name this object. Outline another bed, if any. Are there more pillows? Name another object: pillow, bed, what else?

O Russakovsky et al. Best of both worlds: human-machine collaboration for object annotation. CVPR 2015.

Decide which question to ask out of (infinitely) many options

Update estimates depending on: User answers (A), or User answers (B), or User answers (C), or …

Bed (0.5) Pillow (0.8)

State: need to decide on an action
SLIDE 43

What question to ask?

Current estimates

Is this an object? Is there a fan? Is this a bed? Name this object. Outline another bed, if any. Are there more pillows? Name another object: pillow, bed, what else?

O Russakovsky et al. Best of both worlds: human-machine collaboration for object annotation. CVPR 2015.

Decide which question to ask out of (infinitely) many options

Update estimates depending on: User answers (A), or User answers (B), or User answers (C), or …

Bed (0.5) Pillow (0.8)

State: need to decide on an action

Probabilistic transitions to new states

SLIDE 44

Model: Markov Decision Process (MDP)

State s1 State s2 State s3 State s4

POMDP in vision: Karayev CVPR2014; sensor placement: Vaisenberg PMC2013; HCI: Dai AAAI2010, Kamar AAMAS2012

SLIDE 45

Model: Markov Decision Process (MDP)

State s1, State s2, State s3, State s4

Action a1:
  transition to s2 with probability P(s1, a1, s2), reward R(s1, a1, s2)
  transition to s3 with probability P(s1, a1, s3), reward R(s1, a1, s3)

POMDP in vision: Karayev CVPR2014; sensor placement: Vaisenberg PMC2013; HCI: Dai AAAI2010, Kamar AAMAS2012

SLIDE 46

Model: Markov Decision Process (MDP)

State s1, State s2, State s3, State s4

Action a1:
  transition to s2 with probability P(s1, a1, s2), reward R(s1, a1, s2)
  transition to s3 with probability P(s1, a1, s3), reward R(s1, a1, s3)
Action a2:
  transition to s2 with probability P(s1, a2, s2), reward R(s1, a2, s2)
  transition to s4 with probability P(s1, a2, s4), reward R(s1, a2, s4)

POMDP in vision: Karayev CVPR2014; sensor placement: Vaisenberg PMC2013; HCI: Dai AAAI2010, Kamar AAMAS2012


SLIDE 48

State: set of object detections, with probabilities

Model: Markov Decision Process (MDP)

Bed (0.6) Pillow (0.9)

Computer+human

O Russakovsky et al. Best of both worlds: human-machine collaboration for object annotation. CVPR 2015.

SLIDE 49

State: set of object detections, with probabilities Action: a question to ask humans

Model: Markov Decision Process (MDP)

1) Is there a fan? (cost: 5.34 sec; error rates: .13/.02)
2) Is this a bed? (cost: 5.89 sec; error rates: .23/.07)
3) Is this an object? (cost: 5.71 sec; error rates: .29/.04)
4) Name this object. (cost: 9.67 sec; error rates: .25/.08/.06)
5) Are there more pillows? (cost: 7.57 sec; error rates: .25/.26)
6) Outline another bed, if any. (cost: 10.21 sec; error rates: .28/.16/.29)
7) Name another object: pillow, bed, what else? (cost: 9.46 sec; error rates: .02/.12/.05)

O Russakovsky et al. Best of both worlds: human-machine collaboration for object annotation. CVPR 2015.

SLIDE 50

State: set of object detections, with probabilities Action: a question to ask humans Transition probability: probability distribution over user responses

Model: Markov Decision Process (MDP)

O Russakovsky et al. Best of both worlds: human-machine collaboration for object annotation. CVPR 2015.

SLIDE 51

State: set of object detections, with probabilities Action: a question to ask humans Transition probability: probability distribution over user responses Reward: increase in estimated quality of labeling divided by the cost of actions

Model: Markov Decision Process (MDP)

O Russakovsky et al. Best of both worlds: human-machine collaboration for object annotation. CVPR 2015.

SLIDE 52

State: set of object detections, with probabilities Action: a question to ask humans Transition probability: probability distribution over user responses Reward: increase in estimated quality of labeling divided by the cost of actions Algorithm: 2-step lookahead search

Model: Markov Decision Process (MDP)

O Russakovsky et al. Best of both worlds: human-machine collaboration for object annotation. CVPR 2015.
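The four MDP components above can be sketched in code. This is a toy illustration, not the paper's released implementation: the dictionary-based state, the confidence-based quality proxy, and the helper names are all assumptions (the costs and error rates echo the binary questions listed earlier), and it runs a 1-step greedy choice where the paper uses a 2-step lookahead search.

```python
# Toy sketch of the MDP (hypothetical, not the paper's code).
# State: per-(box, class) detection probabilities.
# Action: a binary human question with a cost (sec) and an error rate.
# Reward: expected gain in labeling quality divided by the cost.

def labeling_quality(state):
    # Proxy for quality: probabilities near 0 or 1 are confident labels.
    return sum(max(p, 1.0 - p) for p in state.values()) / len(state)

def transition(state, action):
    # Enumerate (P(answer), next state) for a binary "is this a C?" question.
    box, e = action["box"], action["error_rate"]
    p = state[box]
    p_yes = p * (1 - e) + (1 - p) * e            # marginalize over the truth
    def posterior(yes):
        lt = (1 - e) if yes else e               # P(answer | box is a C)
        lf = e if yes else (1 - e)               # P(answer | box is not a C)
        z = p * lt + (1 - p) * lf
        return {**state, box: p * lt / z}        # Bayesian update of the state
    return [(p_yes, posterior(True)), (1 - p_yes, posterior(False))]

def expected_reward(state, action):
    q0 = labeling_quality(state)
    gain = sum(pa * (labeling_quality(s2) - q0)
               for pa, s2 in transition(state, action))
    return gain / action["cost"]

def best_action(state, actions):
    # 1-step greedy stand-in for the paper's 2-step lookahead search.
    return max(actions, key=lambda a: expected_reward(state, a))

state = {"bed": 0.5, "pillow": 0.8}
actions = [
    {"name": "Is this a bed?",    "box": "bed",    "cost": 5.89, "error_rate": 0.07},
    {"name": "Is this a pillow?", "box": "pillow", "cost": 5.89, "error_rate": 0.26},
]
print(best_action(state, actions)["name"])       # prints: Is this a bed?
```

The uncertain "bed" box wins: resolving a 0.5 probability with a reliable annotator raises expected labeling quality far more per second than re-asking about the already-confident pillow.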


SLIDE 54

Computing the transition probability

O Russakovsky et al. Best of both worlds: human-machine collaboration for object annotation. CVPR 2015.

Given:

  • An action/question A (e.g., “is there a fan in this image?”)
  • Possible truths T1, T2, … (e.g., T1 = “there is a fan”, T2 = “there is no fan”)
  • Image appearance I and all user responses so far U

Goal:

  • Compute the probability of user answer u (e.g., u = user says “yes”)
slide-57
SLIDE 57

Computing the transition probability

O Russakovsky et al. Best of both worlds: human-machine collaboration for object annotation. CVPR 2015.

Simplifying assumptions of [Branson ECCV10]: user’s answer is independent of (1) other users, and (2) image appearance Given:

  • An action/question A (e.g., “is there a fan in this image?”)
  • Possible truths T1, T2, … (e.g., T1 = “there is a fan”, T2 = “there is no fan”)
  • Image appearance I and all user responses so far U

Goal:

  • Compute the probability of user answer u (e.g., u = user says “yes”)
SLIDE 58

Computing the transition probability

O Russakovsky et al. Best of both worlds: human-machine collaboration for object annotation. CVPR 2015.

Simplifying assumptions of [Branson ECCV10]: user’s answer is independent of (1) other users, and (2) image appearance

Precomputed error rates

Given:

  • An action/question A (e.g., “is there a fan in this image?”)
  • Possible truths T1, T2, … (e.g., T1 = “there is a fan”, T2 = “there is no fan”)
  • Image appearance I and all user responses so far U

Goal:

  • Compute the probability of user answer u (e.g., u = user says “yes”)
SLIDE 59

Computing the transition probability

O Russakovsky et al. Best of both worlds: human-machine collaboration for object annotation. CVPR 2015.

Precomputed error rates Current estimate of the correct answer

Simplifying assumptions of [Branson ECCV10]: user’s answer is independent of (1) other users, and (2) image appearance Given:

  • An action/question A (e.g., “is there a fan in this image?”)
  • Possible truths T1, T2, … (e.g., T1 = “there is a fan”, T2 = “there is no fan”)
  • Image appearance I and all user responses so far U

Goal:

  • Compute the probability of user answer u (e.g., u = user says “yes”)
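Under these assumptions the transition probability reduces to marginalizing the precomputed error rates over the current estimate of the truth. A minimal numeric sketch; the probabilities are illustrative (the .13/.02 figures echo the fan question's error rates), and the helper name is hypothetical:

```python
# P(u | A, U, I) = sum over truths T of  P(u | T, A) * P(T | U, I),
# where P(u | T, A) comes from precomputed per-question error rates
# and P(T | U, I) is the current estimate of the correct answer.

def answer_probability(p_truth, p_answer_given_truth):
    # p_truth[t] = P(T = t | U, I); p_answer_given_truth[t] = P(u | T = t, A)
    return sum(p_truth[t] * p_answer_given_truth[t] for t in p_truth)

# "Is there a fan?": T1 = fan present, T2 = no fan.
p_truth = {"fan": 0.3, "no_fan": 0.7}              # current estimate
p_yes_given = {"fan": 1 - 0.13, "no_fan": 0.02}    # P(user says "yes" | T)
p_yes = answer_probability(p_truth, p_yes_given)
print(round(p_yes, 3))                             # 0.3*0.87 + 0.7*0.02 = 0.275
```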
SLIDE 60

Computing the correct answer

O Russakovsky et al. Best of both worlds: human-machine collaboration for object annotation. CVPR 2015.

Simplifying assumptions of [Branson ECCV10]: user’s answer is independent of (1) other users, and (2) image appearance Given:

  • Image appearance I and all user responses so far U

Goal:

  • Compute the probability of truth T (e.g., T = there is a fan in the image)

Current estimate of the correct answer Precomputed error rates Computer vision model

Number of users
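The current estimate itself is a Bayesian combination of the computer vision model's output with the likelihood of every response collected so far; the number of users enters as the number of likelihood terms. A sketch under the same independence assumptions (illustrative numbers, hypothetical helper name):

```python
# P(T | U, I)  ∝  P(T | I) * product over responses u_j of P(u_j | T),
# where P(T | I) comes from the computer vision model
# and each P(u_j | T) comes from precomputed error rates.

def posterior_truth(prior, likelihoods):
    # prior[t] = P(T = t | I); likelihoods: one dict P(u_j | T = t) per response
    unnorm = {}
    for t, p in prior.items():
        for lik in likelihoods:
            p *= lik[t]
        unnorm[t] = p
    z = sum(unnorm.values())
    return {t: p / z for t, p in unnorm.items()}

prior = {"fan": 0.3, "no_fan": 0.7}       # vision model alone: probably no fan
yes = {"fan": 0.87, "no_fan": 0.02}       # P("yes" | T), from the error rates
post = posterior_truth(prior, [yes, yes]) # two users both answered "yes"
print(round(post["fan"], 3))              # consistent "yes" answers dominate
```

Two consistent "yes" answers overwhelm the vision model's skeptical prior, which is exactly why soliciting a second user is sometimes worth the cost.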


SLIDE 62

Bed (0.6) Objects in image: curtains (prob 0.7), fan (0.3), plant (0.8), cow (0.1), …

An object (0.9)

Another bed in image (0.2) Another pillow in image (0.9) Pillow (0.9)

An object (0.1)

Computer+human

  • Image classifiers: 200-way CNN classifiers released with LSDA; probabilities from Platt scaling [Hoffman NIPS14, Yangqing Jia’s Caffe, Platt99]
  • Object detectors: 200 object RCNN detectors + Platt scaling [Girshick CVPR14, Yangqing Jia’s Caffe, Platt99]
  • Probability of object in region: objectness measure [Alexe PAMI2012]
  • Probability of another instance of the same class, probability of another class in image: statistics from ILSVRC2014 val-DET data

O Russakovsky et al. Best of both worlds: human-machine collaboration for object annotation. CVPR 2015.

Multiple computer vision models
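Platt scaling, used above to turn classifier and detector scores into probabilities, fits a sigmoid to raw scores on held-out labeled data. A minimal NumPy sketch with plain gradient descent on the log loss; the actual [Platt99] procedure uses regularized targets and a more careful optimizer, and the data here is a toy example:

```python
import numpy as np

def platt_fit(scores, labels, lr=0.1, steps=2000):
    """Fit p(y=1 | s) = sigmoid(a*s + b) by gradient descent on the log loss."""
    a, b = 1.0, 0.0
    s = np.asarray(scores, dtype=float)
    y = np.asarray(labels, dtype=float)
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(a * s + b)))
        grad = p - y                      # d(log loss)/d(logit)
        a -= lr * np.mean(grad * s)
        b -= lr * np.mean(grad)
    return a, b

def platt_prob(score, a, b):
    # Calibrated probability for a raw classifier/detector score.
    return 1.0 / (1.0 + np.exp(-(a * score + b)))

# Toy calibration set: positive examples have higher raw detector scores.
scores = [-2.0, -1.0, -0.5, 0.5, 1.0, 2.0]
labels = [0, 0, 0, 1, 1, 1]
a, b = platt_fit(scores, labels)
print(platt_prob(2.0, a, b) > platt_prob(-2.0, a, b))   # prints: True
```

The point of the calibration step is that raw SVM/CNN scores from different models are not comparable; mapping each through its own fitted sigmoid puts every detector and classifier on a common probability scale the MDP can reason over.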

SLIDE 63

Human-machine collaboration
 for object annotation

Bed (0.6) Pillow (0.9)

Detections

For every box B, class C: P(det(B,C) | Image, User input)

Multiple types of human input

Is this an object? Is there a fan? Is this a bed? Name this object. Outline another bed, if any. Are there more pillows? Name another object: pillow, bed, what else?

O Russakovsky et al. Best of both worlds: human-machine collaboration for object annotation. CVPR 2015.

Input image and constraints Output detections Update state Solicit feedback

SLIDE 64

2K images of ILSVRC2014 detection val set with at least 4 object instances
Human error rates computed from AMT experiments
Annotation experiments in simulation

O Russakovsky et al. Best of both worlds: human-machine collaboration for object annotation. CVPR 2015.

Results

SLIDE 65

Computer vision only

O Russakovsky et al. Best of both worlds: human-machine collaboration for object annotation. CVPR 2015.

Results

2K images of ILSVRC2014 detection val set with at least 4 object instances
Human error rates computed from AMT experiments
Annotation experiments in simulation

SLIDE 66

Computer vision only

Full model: computer vision + all human questions

O Russakovsky et al. Best of both worlds: human-machine collaboration for object annotation. CVPR 2015.

Results

2K images of ILSVRC2014 detection val set with at least 4 object instances
Human error rates computed from AMT experiments
Annotation experiments in simulation

SLIDE 67

Computer vision only

Only human
Full model: computer vision + all human questions

O Russakovsky et al. Best of both worlds: human-machine collaboration for object annotation. CVPR 2015.

Results

2K images of ILSVRC2014 detection val set with at least 4 object instances
Human error rates computed from AMT experiments
Annotation experiments in simulation

SLIDE 68

Computer vision only

Only human
Full model: computer vision + all human questions

O Russakovsky et al. Best of both worlds: human-machine collaboration for object annotation. CVPR 2015.

Takeaways
1) CV and humans are mutually beneficial

SLIDE 69

Only human
Full model

Computer vision only

O Russakovsky et al. Best of both worlds: human-machine collaboration for object annotation. CVPR 2015.

Takeaways
1) CV and humans are mutually beneficial

SLIDE 70

Only human
Full model

Computer vision only

O Russakovsky et al. Best of both worlds: human-machine collaboration for object annotation. CVPR 2015.

Takeaways
1) CV and humans are mutually beneficial
2) CV models are not perfectly calibrated

SLIDE 71

Only human
Full model

CV + binary questions Computer vision only

O Russakovsky et al. Best of both worlds: human-machine collaboration for object annotation. CVPR 2015.

Takeaways
1) CV and humans are mutually beneficial
2) CV models are not perfectly calibrated
3) Complex human tasks are necessary

SLIDE 72

Only human
Full model
Random order of questions
CV + binary questions
Computer vision only

O Russakovsky et al. Best of both worlds: human-machine collaboration for object annotation. CVPR 2015.

Takeaways
1) CV and humans are mutually beneficial
2) CV models are not perfectly calibrated
3) Complex human tasks are necessary
4) An MDP is effective for selecting tasks

SLIDE 73

Only human
Full model

CV + binary questions Computer vision only

ILSVRC annotation

Takeaways
1) CV and humans are mutually beneficial
2) CV models are not perfectly calibrated
3) Complex human tasks are necessary
4) An MDP is effective for selecting tasks
5) More efficient than ILSVRC annotation

O Russakovsky et al. Best of both worlds: human-machine collaboration for object annotation. CVPR 2015.

Random order of questions

SLIDE 74

What if humans were better?

O Russakovsky et al. Best of both worlds: human-machine collaboration for object annotation. CVPR 2015.

Current error rates

SLIDE 75

2x higher error rates

Current error rates

2x lower error rates
8x lower error rates

O Russakovsky et al. Best of both worlds: human-machine collaboration for object annotation. CVPR 2015.

What if humans were better?