The Multidimensional Wisdom of Crowds

SLIDE 1

The Multidimensional Wisdom of Crowds

Welinder P., Branson S., Belongie S., Perona P.
Experiment Presentation, [CS395T] Visual Recognition, Fall 2012
Presented by: Niveda Krishnamoorthy

SLIDE 2

Problem Overview

SLIDE 3

Slides from http://videolectures.net/nips2010_welinder_mwc/

SLIDE 4

Slides from http://videolectures.net/nips2010_welinder_mwc/

SLIDE 5

Slides from http://videolectures.net/nips2010_welinder_mwc/

SLIDE 6

Slides from http://videolectures.net/nips2010_welinder_mwc/

SLIDE 7

Slides from http://videolectures.net/nips2010_welinder_mwc/

SLIDE 8

Motivation

SLIDE 9

Distribution of Human Expertise – Task: Finding bluebirds (Experiment #1)

[Scatter plot: each annotator plotted by rate of correct detection vs. rate of correct rejection.]

SLIDE 10

Distribution of Human Expertise – Task: Finding bluebirds (Experiment #1)

[Same scatter plot; the 50%-error diagonal is marked, and annotators on it are highlighted as bots: they label at chance.]

SLIDE 11

Distribution of Human Expertise – Task: Finding bluebirds (Experiment #1)

[Same scatter plot; annotators with high rates of both correct detection and correct rejection are highlighted as competent.]

SLIDE 12

Distribution of Human Expertise – Task: Finding bluebirds (Experiment #1)

[Same scatter plot (axes equivalently: fraction of true positives vs. fraction of true negatives); annotators biased toward answering "no" are highlighted as pessimistic: high rate of correct rejection, low rate of correct detection.]

SLIDE 13

Distribution of Human Expertise – Task: Finding bluebirds (Experiment #1)

[Same scatter plot; annotators biased toward answering "yes" are highlighted as optimistic: high rate of correct detection, low rate of correct rejection.]

SLIDE 14

Distribution of Human Expertise – Task: Finding bluebirds (Experiment #1)

[Same scatter plot; annotators below the 50%-error diagonal are highlighted as adversarial: their labels are worse than chance, i.e. anti-correlated with the truth.]
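The five groups above can be read off an annotator's two rates directly. As a minimal sketch (not from the deck; the 0.1 margin and the decision order are assumptions), one might bucket annotators like this:

```python
def annotator_type(tpr, tnr, margin=0.1):
    """tpr = rate of correct detection, tnr = rate of correct rejection."""
    balanced_acc = (tpr + tnr) / 2
    if balanced_acc < 0.5 - margin:
        return "adversarial"   # below the 50%-error diagonal
    if abs(balanced_acc - 0.5) <= margin:
        return "bot"           # on the diagonal: labels near chance
    if tpr - tnr > margin:
        return "optimistic"    # biased toward answering "yes"
    if tnr - tpr > margin:
        return "pessimistic"   # biased toward answering "no"
    return "competent"         # high on both axes

print(annotator_type(0.95, 0.90))  # -> competent
print(annotator_type(0.98, 0.20))  # -> bot (near the diagonal overall)
print(annotator_type(0.90, 0.60))  # -> optimistic
```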

SLIDE 15

The Idea

SLIDE 16

Slides from http://videolectures.net/nips2010_welinder_mwc/

SLIDE 17

Slides from http://videolectures.net/nips2010_welinder_mwc/

SLIDE 18

Slides from http://videolectures.net/nips2010_welinder_mwc/

SLIDE 19

Slides from http://videolectures.net/nips2010_welinder_mwc/
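Slides 16–19 present the paper's generative model: each image i carries a latent value x_i, each annotator j a sensitivity w_j and a bias tau_j, and the observed labels arise from each annotator thresholding a noisy view of x_i. Below is a deliberately simplified 1-D MAP sketch of that idea; the logistic link, the simple Gaussian-style priors, and the L-BFGS optimizer are my assumptions (the paper's actual model is multidimensional, with per-annotator noise):

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import expit  # logistic sigmoid

def neg_log_posterior(params, L, n_img, n_ann, reg=1.0):
    # L[i, j] in {0, 1}, or NaN if annotator j did not label image i.
    x = params[:n_img]                    # per-image value x_i
    w = params[n_img:n_img + n_ann]       # per-annotator sensitivity
    tau = params[n_img + n_ann:]          # per-annotator bias / threshold
    p = expit(np.outer(x, w) - tau)       # P(label = 1), shape (n_img, n_ann)
    ll = np.where(L == 1, np.log(p + 1e-9), np.log(1 - p + 1e-9))
    ll = np.where(np.isnan(L), 0.0, ll)   # ignore missing labels
    prior = reg * (x @ x + (w - 1) @ (w - 1) + tau @ tau)
    return -ll.sum() + prior

def fit(L):
    """MAP-fit image and annotator parameters from a label matrix L."""
    n_img, n_ann = L.shape
    init = np.concatenate([np.zeros(n_img), np.ones(n_ann), np.zeros(n_ann)])
    res = minimize(neg_log_posterior, init, args=(L, n_img, n_ann),
                   method="L-BFGS-B")
    return res.x[:n_img]  # learned x_i: sign = label estimate, |x_i| = confidence
```

Thresholding the learned x_i at zero gives the ground-truth estimate whose error rate the next slide compares against majority voting.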

SLIDE 20

Error Rate for the Bluebirds Dataset

SLIDE 21

Estimating Image Difficulty

Complex Images

SLIDE 22

1D clusters from learned x_i values – Dataset: Bluebirds

SLIDE 23

1D clusters from learned x_i values – Dataset: Bluebirds
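Given learned x_i values from a model like the sketch above, the clusters shown here can be reproduced with any standard 1-D clustering step; a minimal sketch (the choice of k-means with 3 clusters is an assumption, meant to separate confident negatives, ambiguous images, and confident positives):

```python
import numpy as np
from sklearn.cluster import KMeans

x = np.array([-1.4, -1.2, -0.1, 0.05, 0.2, 1.1, 1.5])  # toy learned x_i values
clusters = KMeans(n_clusters=3, n_init=10).fit_predict(x.reshape(-1, 1))
print(clusters)  # groups: confident negative / ambiguous / confident positive
```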

SLIDE 24

How do these learned image complexities compare with vision-based techniques?

Vision-based measure:

Predicted time* to label an image as a measure of image complexity

*What’s It Going to Cost You?: Predicting Effort vs. Informativeness for Multi-Label Image Annotations. S. Vijayanarasimhan and K. Grauman. CVPR 2009.

Approach: Extract 2804-d feature vectors for the MSRC dataset

  • Pyramid of HoG
  • Color histogram
  • Grayscale histogram
  • Spatial pyramid of edge density (Canny edge)

Train a regressor on the top 200 features selected using ReliefF. Predict the time to label images in the bluebirds dataset (see the sketch below).
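A sketch of this pipeline appears below. The feature matrices and annotation times are random stand-ins, ReliefF comes from the third-party skrebate package, and the support-vector regressor is an assumption (the slide does not say which regressor was trained):

```python
import numpy as np
from skrebate import ReliefF          # pip install skrebate
from sklearn.svm import SVR

rng = np.random.default_rng(0)
X_msrc = rng.normal(size=(200, 2804))       # stand-in: PHoG + color/grayscale
t_msrc = rng.uniform(10, 60, size=200)      #   histograms + edge density
X_bluebirds = rng.normal(size=(108, 2804))  # stand-in bluebird descriptors

# Select the top 200 features by ReliefF score, then train the regressor.
selector = ReliefF(n_features_to_select=200, n_neighbors=10)
selector.fit(X_msrc, t_msrc)
reg = SVR().fit(selector.transform(X_msrc), t_msrc)

# Predicted time to label each bluebird image = its vision-based complexity.
t_pred = reg.predict(selector.transform(X_bluebirds))
```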

SLIDE 25

Vision-Based Complexity vs. Learned Image Complexity

[Scatter plot: learned image complexity vs. predicted time to label the image.]

SLIDE 26

Vision-Based Complexity vs. Learned Image Complexity

[Same scatter plot; images at the ends of the learned-complexity axis take less time to label.]

SLIDE 27

Vision-Based Complexity vs. Learned Image Complexity

[Same scatter plot; images towards the center of the learned-complexity axis take longer to label.]
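One way to quantify what these two plots show: images near the center of the learned-complexity axis are the ambiguous ones, so predicted label time should grow as the learned value approaches the decision boundary. A quick sketch with stand-in arrays (the rank correlation is my suggestion; the deck makes only the visual argument):

```python
import numpy as np
from scipy.stats import spearmanr

rng = np.random.default_rng(1)
x_learned = rng.normal(size=108)        # stand-in for learned x_i values
t_pred = rng.uniform(10, 60, size=108)  # stand-in for predicted label times

ambiguity = -np.abs(x_learned)          # larger = closer to the center
rho, p = spearmanr(ambiguity, t_pred)
print(f"Spearman rho = {rho:.2f} (p = {p:.3f})")
```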

SLIDE 28

Qualitative Comparison

SLIDE 29

Complex Images – Examples: False Negatives

[Example images, each shown with a per-image value: 0.1729917846, 0.032638129, 0.2354540127, 0.3173699459, 0.4584787866, 0.9812171195.]

SLIDE 30

Complex Images – Examples: False Positives

[Example images, each shown with a per-image value: 0.1455767129, 0.0405159085, 0.1051552874, 0.2087033725, 0.5611137687, 0.478586944, 0.02178887.]

SLIDE 31

Easy Images – Examples: True Negatives

[Example images, each shown with a per-image value: 1.339038233, 1.1632084806, 1.7174006439, 1.4096763371, 1.214442414, 1.7104861404, 1.4764330722.]

SLIDE 32

Easy Images – Examples: True Positives

[Example images, each shown with a per-image value: 1.1407150551, 1.0859884692, 1.1461191967, 1.0293133893, 1.077287623.]

SLIDE 33

Task: Finding ducks

Slides from http://videolectures.net/nips2010_welinder_mwc/

SLIDE 34

Slides from http://videolectures.net/nips2010_welinder_mwc/

SLIDE 35

2D clusters from learned x_i values – Dataset: Ducks

Recreated in MATLAB

SLIDE 36

Vision-Based Complexity vs. Learned Image Complexity

Recreated in MATLAB: Size of point is proportional to the predicted time needed to label it

SLIDE 37

Vision-Based Complexity vs. Learned Image Complexity

Images along the outer edge of the cluster take longer to label.

Recreated in MATLAB: Size of point is proportional to the predicted time needed to label it

SLIDE 38

Vision-Based Complexity vs. Learned Image Complexity

Similarly for other images along the cluster's outer edge.

Recreated in MATLAB: Size of point is proportional to the predicted time needed to label it

SLIDE 39

Vision-Based Complexity vs. Learned Image Complexity

Images in the wrong clusters take longer to label

Recreated in MATLAB: Size of point is proportional to the predicted time needed to label it

SLIDE 40

Vision-Based Complexity vs. Learned Image Complexity

Recreated in MATLAB: Size of point is proportional to the predicted time needed to label it

Images at the center can also take longer to label. Why?

SLIDE 41

Discussion

Is vision-based image complexity a good indicator of difficulty in labeling an image? What are the other factors?

SLIDE 42

Discussion

Is vision-based image complexity a good indicator of difficulty in labeling an image? What are the other factors?

  • Bird pose
  • Occlusions
  • Lighting

SLIDE 43

Discussion

1. The authors experiment only with a 2-dimensional model of human expertise. How would this model perform if the number of intrinsic dimensions were increased?

SLIDE 44

Extending this approach to a video dataset: the YouTube corpus

SLIDE 45

Example YouTube video with descriptions

  • A french bulldog is playing with a big ball
  • A small dog chases a big ball.
  • A French bulldog is running fast and playing with a blue yoga ball all by himself in a field.
  • The little dog pushed a big blue ball.
  • A dog is playing with a very large ball.
  • A dog chases a giant rubber ball around
  • A dog is playing with ball

http://youtu.be/FYyqIJ36dSU

SLIDE 46

Approach

The YouTube corpus is not cut out for this task as-is. Consider predicting the presence of the activity “run”:

  • 1. Selected 50 videos where “run” was the predicted activity using majority voting
  • 2. Selected 30 videos where “play” was the predicted activity using majority voting
  • 3. Selected 20 videos where “walk” was the predicted activity using majority voting

Ground-truth labels were assigned accordingly. Each video had a variable number of annotators; the 20 most frequent annotators were picked (a minimal sketch of this subsampling follows).
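A minimal sketch of the subsampling step (the data layout is an assumption: a mapping from video id to one activity label per annotator):

```python
from collections import Counter

annotations = {  # video id -> one activity label per annotator (toy data)
    "FYyqIJ36dSU": ["play", "run", "play"],
    "NKm8c_7mgx4": ["run", "run", "walk"],
    "abiezv1p7SY": ["walk", "walk", "run"],
}

def predicted_activity(labels):
    """Most frequent activity among a video's annotators (majority vote)."""
    return Counter(labels).most_common(1)[0][0]

by_activity = {"run": [], "play": [], "walk": []}
for vid, labels in annotations.items():
    act = predicted_activity(labels)
    if act in by_activity:
        by_activity[act].append(vid)

# Keep 50 "run", 30 "play", and 20 "walk" videos; for the binary "run" task
# the first group is labeled positive and the other two negative.
run_videos = by_activity["run"][:50]
neg_videos = by_activity["play"][:30] + by_activity["walk"][:20]
```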

SLIDE 47

Results

Subsampled “RUN” data

SLIDE 48

1D clusters from learned x_i values – Dataset: YouTube videos

SLIDE 49

How do these learned video complexities compare with vision-based techniques?

Vision-based measure:

  • Number of STIPs* in the video
  • STIP density

*Learning Realistic Human Actions from Movies. I. Laptev, M. Marszałek, C. Schmid and B. Rozenfeld. CVPR 2008.
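A sketch of the two measures (assumptions: the interest points come from an external detector such as Laptev's STIP code, loaded as one (frame, y, x) row per detection, and density is computed per frame, since the slide does not define the normalization):

```python
import numpy as np

def stip_measures(stips, n_frames):
    """Return (STIP count, STIPs per frame) for one video."""
    n_stips = len(stips)
    return n_stips, n_stips / n_frames

stips = np.array([[3, 120, 40], [3, 60, 200], [10, 90, 90]])  # toy detections
print(stip_measures(stips, n_frames=250))  # -> (3, 0.012)
```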

SLIDE 50

Vision-Based Complexity vs. Learned Video Complexity

[Scatter plot: learned video complexity vs. number of STIPs.]

SLIDE 51

Vision-Based Complexity vs. Learned Video Complexity

[Scatter plot: learned video complexity vs. STIP density.]

SLIDE 52

Vision-Based Complexity vs. Learned Video Complexity

[Same scatter plot: learned video complexity vs. STIP density, with PLAY/WALK and RUN videos marked.]

Not much correlation, but false negatives seem to have a higher STIP density.

SLIDE 53

Discussion

How can we quantify the complexity of a video? STIP density? Video length? Variety in STIPs? Confusion amongst multiple annotators? How can we quantify the effort involved in labeling a video? How do these relate to video ambiguity?

SLIDE 54

Qualitative Comparison – True positive

[Video shown with its learned complexity and STIP density: http://youtu.be/NKm8c_7mgx4]

SLIDE 55

Qualitative Comparison – True negative

[Video shown with its learned complexity and STIP density: http://youtu.be/abiezv1p7SY]

SLIDE 56

Qualitative Comparison – False positive

[Video shown with its learned complexity and STIP density: http://youtu.be/1l9Hx1kX_tQ]

SLIDE 57

Qualitative Comparison – False negative

[Video shown with its learned complexity and STIP density: http://youtu.be/8miosT-Fs1k]

SLIDE 58

Strengths

1. Each annotator is modeled as a multi-dimensional entity: competence, expertise, bias.
2. Can be extended to any domain to estimate the ground truth with the least error.
3. Models image complexities without even seeing the image.
4. Discovers groups of annotators with varying skill sets.

SLIDE 59

Discussion

1. Image difficulties are learned from human annotations only, which is great! But would the model perform better if image difficulty were incorporated as a known parameter (using some vision-based technique) into the graphical model?

SLIDE 60

?