Datasets for object recognition and scene understanding


slide-1
SLIDE 1

Datasets for object recognition and scene understanding

Slides adapted with gratitude from http://www.cs.washington.edu/education/courses/cse590v/11au/ (Neeraj Kumar and Brian Russell)

slide-2
SLIDE 2

1972

Slide credit: A. Torralba

slide-3
SLIDE 3

Marr, 1976

Slide credit: A. Torralba

slide-4
SLIDE 4

Caltech 101 and 256

Caltech 101: Fei-Fei, Fergus, Perona, 2004
9,146 images, 101 object classes

Caltech 256: Griffin, Holub, Perona, 2007
30,607 images, 256 object classes

Slide credit: A. Torralba

slide-5
SLIDE 5

MSRC

591 images, 23 object classes
Pixel-wise segmentation

  • J. Winn, A. Criminisi, and T. Minka, 2005
slide-6
SLIDE 6

LabelMe

labelme.csail.mit.edu
Tool went online July 1st, 2005
825,597 object annotations collected
199,250 images available for labeling

B.C. Russell, A. Torralba, K.P. Murphy, W.T. Freeman, IJCV 2008

slide-7
SLIDE 7
slide-8
SLIDE 8

Quality of the labeling

Average labeling quality

Object         25%   50%   75%
Person           7    12    21
Dog             16    28    52
Bird            13    37   168
Chair            7    10    15
Street lamp      5     9    15
House            5     7    12
Motorbike       12    22    36
Boat             6     9    14
Tree            11    20    36
Mug              6     8    11
Bottle           7     8    11
Car              8    15    22

slide-9
SLIDE 9

Extreme labeling

slide-10
SLIDE 10

The other extreme of extreme labeling

… things do not always look good…

slide-11
SLIDE 11

Testing

Most common labels: test adksdsa woiieiie …

slide-12
SLIDE 12

Sophisticated testing

Most common labels: Star Square Nothing …

slide-13
SLIDE 13

PASCAL VOC

2011 version - 20 object classes:

– Person: person
– Animal: bird, cat, cow, dog, horse, sheep
– Vehicle: aeroplane, bicycle, boat, bus, car, motorbike, train
– Indoor: bottle, chair, dining table, potted plant, sofa, tv/monitor

The train/val data has 11,530 images containing 27,450 ROI annotated objects and 5,034 segmentations

  • Three main competitions: classification, detection, and segmentation
  • Three "taster" competitions: person layout, action classification, and ImageNet large scale recognition

  • M. Everingham, L. Van Gool, C. K. I. Williams, J. Winn, A. Zisserman
slide-14
SLIDE 14

80,000,000 tiny images

75,000 non-abstract nouns from WordNet
7 online image search engines
Google: 80 million images
And after 1 year downloading images

  • A. Torralba, R. Fergus, W.T. Freeman. PAMI 2008

Slide credit: A. Torralba

slide-15
SLIDE 15

ImageNet

  • An ontology of images based on WordNet

– 22,000+ categories of visual concepts
– 15 million human-cleaned images
– www.image-net.org

~10^5+ nodes, ~10^8+ images
Example hierarchy: animal > shepherd dog, sheep dog > German shepherd, collie

Deng, Dong, Socher, Li & Fei-Fei, CVPR 2009

Slide credit: A. Torralba

slide-16
SLIDE 16

SUN Database

  • J. Xiao, J. Hays, K. Ehinger, A. Oliva, and A. Torralba, CVPR
  • Collected all the terms from WordNet that described scenes, places, and environments
  • Any concrete noun which could reasonably complete the phrase “I am in a place”, or “let’s go to the place”
  • 899 scene categories
  • 130,519 images
  • 397 scene categories with at least 100 images
  • 63,726 labeled objects
slide-17
SLIDE 17

Unbiased Look at Dataset Bias

Alyosha Efros (CMU) Antonio Torralba (MIT)

All the following slides are from A. Torralba and A. Efros

slide-18
SLIDE 18

Are datasets measuring the right thing?

  • In Machine Learning: Dataset is The World
  • In Recognition: Dataset is a representation of The World
  • Do datasets provide a good representation?
slide-19
SLIDE 19

Visual Data is Inherently Biased

  • The Internet is a tremendous repository of visual data (Flickr, YouTube, Picasa, etc.)
  • But it is not a random sample of the visual world
slide-20
SLIDE 20

Flickr Paris

slide-21
SLIDE 21

Google StreetView Paris

Knopp, Sivic, Pajdla, ECCV 2010

slide-22
SLIDE 22

Sampled Alyosha Efros’s Paris

slide-23
SLIDE 23

Sampling Bias

  • People like to take pictures on vacation
slide-24
SLIDE 24

Photographer Bias

  • People want their pictures to be recognizable and/or interesting

slide-25
SLIDE 25

Social Bias

“100 Special Moments” by Jason Salavon

slide-26
SLIDE 26

Our Question

  • How much does this bias affect standard datasets used for object recognition?

slide-27
SLIDE 27

“Name That Dataset!” game

__ Caltech 101
__ Caltech 256
__ MSRC
__ UIUC cars
__ Tiny Images
__ Corel
__ PASCAL 2007
__ LabelMe
__ COIL-100
__ ImageNet
__ 15 Scenes
__ SUN’09

slide-28
SLIDE 28

SVM plays “Name that dataset!”

slide-29
SLIDE 29

SVM plays “Name that dataset!”

  • 12 1-vs-all classifiers
  • Standard full-image features
  • 39% performance (chance is 8%)
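
The setup on this slide can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' code: it trains one linear SVM per dataset (1-vs-all) with plain subgradient descent, uses toy 2-D points as a stand-in for full-image features, and uses three invented dataset names instead of twelve. All function names, hyperparameters, and data here are hypothetical.

```python
import random

# Toy stand-in for the experiment: three hypothetical "datasets", each a 2-D
# Gaussian blob playing the role of full-image features.
CENTERS = {"dataset_a": (4.0, 0.0), "dataset_b": (-4.0, 0.0), "dataset_c": (0.0, 4.0)}

def make_toy_data(n_per_dataset, seed):
    rng = random.Random(seed)
    xs, labels = [], []
    for name, (cx, cy) in CENTERS.items():
        for _ in range(n_per_dataset):
            xs.append((rng.gauss(cx, 0.5), rng.gauss(cy, 0.5)))
            labels.append(name)
    return xs, labels

def dot(w, x):
    return sum(wj * xj for wj, xj in zip(w, x))

def train_linear_svm(xs, ys, epochs=100, lr=0.01, reg=0.001, seed=0):
    """Subgradient descent on the L2-regularized hinge loss; ys are +1 or -1."""
    rng = random.Random(seed)
    w, b = [0.0] * len(xs[0]), 0.0
    order = list(range(len(xs)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            violated = ys[i] * (dot(w, xs[i]) + b) < 1.0
            w = [wj * (1.0 - lr * reg) for wj in w]  # regularization shrinkage
            if violated:  # hinge loss is active: move toward the example
                w = [wj + lr * ys[i] * xj for wj, xj in zip(w, xs[i])]
                b += lr * ys[i]
    return w, b

def train_one_vs_all(xs, labels):
    """One binary classifier per dataset: that dataset is +1, the rest are -1."""
    return {
        name: train_linear_svm(xs, [1 if lab == name else -1 for lab in labels])
        for name in sorted(set(labels))
    }

def name_that_dataset(models, x):
    """Return the dataset whose classifier scores the image highest."""
    return max(models, key=lambda name: dot(models[name][0], x) + models[name][1])

train_x, train_y = make_toy_data(40, seed=1)
test_x, test_y = make_toy_data(20, seed=2)
models = train_one_vs_all(train_x, train_y)
hits = sum(name_that_dataset(models, x) == y for x, y in zip(test_x, test_y))
accuracy = hits / len(test_y)
chance = 1.0 / len(models)
print(f"accuracy {accuracy:.2f} vs chance {chance:.2f}")
```

With twelve datasets the chance level is 1/12, about 8%, which is the figure quoted above. The toy blobs are far easier to tell apart than real image collections, so the accuracy printed here says nothing about the 39% result.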

slide-30
SLIDE 30

SVM plays “Name that dataset!”

slide-31
SLIDE 31

Datasets have different goals…

  • Some are object-centric (e.g. Caltech, ImageNet)
  • Others are scene-centric (e.g. LabelMe, SUN’09)
  • What about playing “name that dataset” on bounding boxes?

slide-32
SLIDE 32

Similar results

Performance: 61% (chance: 20%)

slide-33
SLIDE 33

Where does this bias come from?

slide-34
SLIDE 34

Some bias is in the world

slide-35
SLIDE 35

Some bias is in the world

slide-36
SLIDE 36

Some bias comes from the way the data is collected

slide-37
SLIDE 37

Google mugs vs. mugs from LabelMe

slide-38
SLIDE 38

Measuring Dataset Bias

slide-39
SLIDE 39

Cross-Dataset Generalization

Classifier trained on MSRC cars

MSRC Caltech101 ImageNet PASCAL LabelMe SUN

slide-40
SLIDE 40

Cross-dataset Performance
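
One way to summarize a cross-dataset experiment is a train-by-test table plus, for each training set, the relative drop from testing on the same dataset to testing on the others. The sketch below assumes that definition of "drop" and uses invented AP values and placeholder dataset names; it does not reproduce any numbers from the slides.

```python
def percent_drop(results, train_name):
    """Relative drop (in %) from the same-dataset score to the mean score on the others."""
    row = results[train_name]
    self_score = row[train_name]
    others = [score for test_name, score in row.items() if test_name != train_name]
    mean_others = sum(others) / len(others)
    return 100.0 * (self_score - mean_others) / self_score

# results[train][test] = AP of a detector trained on `train` and tested on `test`.
# These values are invented for illustration only.
results = {
    "A": {"A": 0.80, "B": 0.40, "C": 0.20},
    "B": {"A": 0.50, "B": 0.60, "C": 0.40},
    "C": {"A": 0.30, "B": 0.30, "C": 0.50},
}

drops = {name: percent_drop(results, name) for name in results}
for name, drop in drops.items():
    print(f"train on {name}: {drop:.1f}% drop on other datasets")
```

A large drop for a training set suggests that a model trained on it has picked up properties specific to that dataset rather than to the object class.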

slide-41
SLIDE 41
slide-42
SLIDE 42

Dataset Value

slide-43
SLIDE 43

Mixing datasets

Task: car detection
Features: HOG
Test on Caltech 101

[Plot: AP vs. number of training examples. Curves: training on Caltech 101; adding additional data from PASCAL]

slide-44
SLIDE 44

Mixing datasets

Test on PASCAL

[Plot: AP vs. number of training examples. Curves: training on PASCAL; adding more PASCAL; adding more from LabelMe; adding more from Caltech 101]
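
The plots in these two slides report AP (average precision) as training data is added. The slides do not say which AP variant was used, so the sketch below assumes the simple non-interpolated form: the mean of the precision measured at the rank of each true positive. The function name and the example scores are hypothetical.

```python
def average_precision(scores, is_positive):
    """Non-interpolated AP: mean of precision at the rank of each true positive."""
    ranked = sorted(zip(scores, is_positive), key=lambda pair: pair[0], reverse=True)
    n_pos = sum(1 for _, pos in ranked if pos)
    if n_pos == 0:
        return 0.0
    hits, total = 0, 0.0
    for rank, (_, pos) in enumerate(ranked, start=1):
        if pos:
            hits += 1
            total += hits / rank  # precision among the top `rank` detections
    return total / n_pos

# Four detections sorted by confidence; the 1st and 3rd are correct.
ap_mixed = average_precision([0.9, 0.8, 0.7, 0.6], [True, False, True, False])
# Same detections, but both correct ones are ranked first.
ap_perfect = average_precision([0.9, 0.8, 0.7, 0.6], [True, True, False, False])
print(f"mixed ranking AP = {ap_mixed:.3f}, perfect ranking AP = {ap_perfect:.3f}")
```

Evaluating this on a fixed test set while growing the training set, once per added data source, would give curves of the kind the plots describe.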

slide-45
SLIDE 45

Negative Set Bias

Not all the bias comes from the appearance of the objects we care about

slide-46
SLIDE 46

Summary (from 2011)

  • Our best-performing techniques just don’t work in the real world

– e.g., try a person detector on a Hollywood film
– but new datasets (PASCAL, ImageNet) are better than older ones (MSRC, Caltech)

  • Classifiers are inherently designed to overfit to the type of data they are trained on

– but larger datasets are getting better

slide-47
SLIDE 47

Four Stages of Dataset Grief

  • 1. Denial

WHAT BIAS? I AM SURE THAT MY MSRC CLASSIFIER WILL WORK ON ANY DATA!

  • 2. Machine Learning

OF COURSE THERE IS BIAS! THAT’S WHY YOU MUST ALWAYS TRAIN AND TEST ON THE SAME DATASET.

  • 3. Despair

RECOGNITION IS HOPELESS. IT WILL NEVER WORK. WE WILL JUST KEEP OVERFITTING TO THE NEXT DATASET…

  • 4. Acceptance

BIAS IS HERE TO STAY, SO WE MUST BE VIGILANT THAT OUR ALGORITHMS DON’T GET DISTRACTED BY IT.
slide-48
SLIDE 48

Lessons that still apply in 2018

  • Datasets are bigger but still very biased
  • Specific insights about particular datasets less relevant, but overall message still critical
  • Also, exemplary analysis paper!
  • Some work since then
  • Undoing the damage of dataset bias (Khosla et al. https://people.csail.mit.edu/khosla/papers/eccv2012_khosla.pdf)
  • A deeper look at dataset bias (Tommasi et al. https://arxiv.org/pdf/1505.01257.pdf)
  • What makes ImageNet good for transfer learning (Huh et al. https://arxiv.org/pdf/1608.08614.pdf)

  • Work on domain adaptation/transfer learning
  • Work on fairness in machine learning