

slide-1
SLIDE 1

The Promise and Perils of Big Data

Some Slides from A. Efros and A. Torralba

slide-2
SLIDE 2

Why do we need data?

Most problems in vision are ambiguous and hard.

  • 2D -> 3D
  • Segmentation/Edges
slide-3
SLIDE 3

So, how do we solve these problems?

  • Magic of data!
  • Use data to learn better likelihoods: how things look.
  • Use data to learn priors: what is more likely than other things.

But how much data do we need?

slide-4
SLIDE 4

The extremes of learning

[Axis: number of training samples: 1, 10, 10², 10³, 10⁴, 10⁶⁻⁷, 10¹⁰. Few samples: extrapolation problem (generalization, transfer learning). Many samples: interpolation problem (correspondence, finding the differences). Datasets before 2012 sit at the low end; current datasets at the high end.]

slide-5
SLIDE 5

So how much data do humans use?

slide-6
SLIDE 6

What’s the Capacity of Visual Long Term Memory?

“Basically, my recollection is that we just separated the pictures into distinct thematic categories (e.g. cars, animals, single person, two people, plants, etc.). Only a few slides were selected which fell into each category, and they were visually distinct.” (according to Standing)

Standing (1973): 10,000 images, 83% recognition.

What we know: people can remember thousands of images.

What we don’t know: what are people remembering for each item? Sparse details or highly detailed? “Gist” only (e.g. dogs playing cards)?

High-fidelity visual memory is possible (Hollingworth 2004). Slide by Aude Oliva

slide-7
SLIDE 7

Massive Memory I: Methods

... ... ...

  • Showed 14 observers 2,500 categorically unique objects
  • 1 at a time, 3 seconds each; 800 ms blank between items
  • Study session lasted about 5.5 hours
  • Repeat-detection task (1-back up to 1024-back) to maintain focus
  • Followed by 300 2-alternative forced-choice tests

Slide by Aude Oliva

slide-8
SLIDE 8

Slide by Aude Oliva

slide-9
SLIDE 9

How far can we push the fidelity of visual LTM representation?

Same object category, different instance Slide by Aude Oliva

slide-10
SLIDE 10

How far can we push the fidelity of visual LTM representation?

Same object, different states Slide by Aude Oliva

slide-11
SLIDE 11

Visual Cognition Expert Predictions

92%

Massive Memory I: Recognition Memory Results

Replication of Standing (1973) Slide by Aude Oliva

slide-12
SLIDE 12

Results: 92% (novel objects), 88% (different exemplars), 87% (different states). Slide by Aude Oliva

Massive Memory I: Recognition Memory Results

slide-13
SLIDE 13

Extrapolation of Repeat Detection Data

Human performance for n = 1024

Power law (r² = .988); quadratic (r² = .988)

Brady, Konkle, Alvarez, Oliva (submitted)

Slide by Aude Oliva
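The extrapolation above rests on fitting a power law to the repeat-detection data. A minimal sketch of such a fit, via linear regression in log-log space, on synthetic data (the scale 0.95 and exponent -0.05 here are illustrative assumptions, not the study's numbers):

```python
import numpy as np

def fit_power_law(n, perf):
    """Fit perf ~ a * n**b by linear regression in log-log space."""
    b, log_a = np.polyfit(np.log(n), np.log(perf), 1)
    return np.exp(log_a), b

# Synthetic repeat-detection data: performance vs. n-back distance.
n = np.array([1.0, 4.0, 16.0, 64.0, 256.0, 1024.0])
perf = 0.95 * n ** -0.05          # assumed power-law-shaped data

a, b = fit_power_law(n, perf)
print(round(a, 2), round(b, 2))   # 0.95 -0.05: the fit recovers both parameters
```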

slide-14
SLIDE 14

How much data do computer vision researchers use?

slide-15
SLIDE 15

10 images (1972)

slide-16
SLIDE 16

10¹ images

slide-17
SLIDE 17

10¹ images

Marr, 1976

slide-18
SLIDE 18

10²⁻⁴ images

slide-19
SLIDE 19

10²⁻⁴ images. In 1996 DARPA released 14,000 images from over 1,000 individuals.

The faces and cars scale

slide-20
SLIDE 20

The PASCAL Visual Object Classes

  • M. Everingham, Luc van Gool, C. Williams, J. Winn, A. Zisserman, 2007

In 2007, the twenty selected object classes are:

  • Person: person
  • Animal: bird, cat, cow, dog, horse, sheep
  • Vehicle: aeroplane, bicycle, boat, bus, car, motorbike, train
  • Indoor: bottle, chair, dining table, potted plant, sofa, tv/monitor

slide-21
SLIDE 21

10²⁻⁴ images

slide-22
SLIDE 22

10⁵ images

slide-23
SLIDE 23

Caltech 101 and 256

Fei-Fei, Fergus, Perona, 2004; Griffin, Holub, Perona, 2007

10⁵ images

slide-24
SLIDE 24

Lotus Hill Research Institute image corpus

Z.Y. Yao, X. Yang, and S.C. Zhu, 2007

slide-25
SLIDE 25

LabelMe

B.C. Russell, A. Torralba, K.P. Murphy, W.T. Freeman, IJCV 2008

Labelme.csail.mit.edu; tool went online July 1st, 2005; 530,000 object annotations collected

10⁵ images

slide-26
SLIDE 26

Quality of labeling

Average labeling quality (values at the 25th/50th/75th percentiles):

  Object        25%  50%  75%
  Person          7   12   21
  Dog            16   28   52
  Bird           13   37  168
  Chair           7   10   15
  Street lamp     5    9   15
  House           5    7   12
  Motorbike      12   22   36
  Boat            6    9   14
  Tree           11   20   36
  Mug             6    8   11
  Bottle          7    8   11
  Car             8   15   22

slide-27
SLIDE 27

Extreme labeling

slide-28
SLIDE 28

The other extreme of extreme labeling

… things do not always look good…

slide-29
SLIDE 29

Creative testing

slide-30
SLIDE 30

Object statistics Scene statistics How representative of the visual world is it?

Scene and object biases

slide-31
SLIDE 31

10⁵ images

slide-32
SLIDE 32

10

6-7 images Things start getting out of hand

slide-33
SLIDE 33

Collecting big datasets

  • ESP game (CMU): Luis von Ahn and Laura Dabbish, 2004
  • LabelMe (MIT): Russell, Torralba, Freeman, 2005
  • StreetScenes (CBCL-MIT): Bileschi, Poggio, 2006
  • WhatWhere (Caltech): Perona et al., 2007
  • PASCAL challenge: 2006, 2007
  • Lotus Hill Institute: Song-Chun Zhu et al., 2007
  • 80 million images: Torralba, Fergus, Freeman, 2007

10⁶⁻⁷ images

slide-34
SLIDE 34

80,000,000 images

75,000 non-abstract nouns from WordNet; 7 online image search engines; Google: 80 million images, after 1 year of downloading.

  • A. Torralba, R. Fergus, W.T. Freeman. PAMI 2008

10⁶⁻⁷ images

slide-35
SLIDE 35
  • An ontology of images based on WordNet
  • ImageNet currently has:

– 22,000+ categories of visual concepts
– 15 million human-cleaned images (~700 images/category)
– 1/3+ released online @ www.image-net.org

~10⁵⁺ nodes, ~10⁸⁺ images. [Hierarchy example: animal → shepherd dog, sheep dog → German shepherd, collie.]

Deng, Dong, Socher, Li & Fei-Fei, CVPR 2009

10⁶⁻⁷ images

slide-36
SLIDE 36
slide-37
SLIDE 37

Alexander Sorokin, David Forsyth, "Utility data annotation with Amazon Mechanical Turk", First IEEE Workshop on Internet Vision at CVPR 08.

Labeling for money

slide-38
SLIDE 38

10⁶⁻⁷ images

slide-39
SLIDE 39

10⁸⁻¹¹ images

slide-40
SLIDE 40

Datasets in perspective

Number of images on my hard drive: 10⁴

Number of images seen during my first 10 years: 10⁸
(3 images/second × 60 × 60 × 16 × 365 × 10 = 630,720,000)

Number of images seen by all humanity: 10²⁰
(106,456,367,669 humans¹ × 100 years × 3 images/second × 60 × 60 × 16 × 365)

¹ from http://www.prb.org/Articles/2002/HowManyPeopleHaveEverLivedonEarth.aspx

Number of all 32×32 images: 10⁷³⁷³
(256^(32×32×3) ≈ 10⁷³⁷³)
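The slide's back-of-envelope numbers can be checked directly; a small sketch reproducing them:

```python
import math

# 3 images/second over 16 waking hours/day
sec_per_year = 60 * 60 * 16 * 365

# Images seen during the first 10 years of life
imgs_10_years = 3 * sec_per_year * 10
print(imgs_10_years)                    # 630720000, i.e. ~10^8

# Images seen by all humanity (~106.5 billion people, 100 years each)
humans_ever = 106_456_367_669           # from the PRB article cited above
imgs_humanity = humans_ever * 100 * 3 * sec_per_year
print(int(math.log10(imgs_humanity)))   # 20, i.e. ~10^20

# Distinct 32x32 RGB images: 256^(32*32*3). The slide rounds
# log10(256) ~ 2.4, which gives the quoted 10^7373.
print(round(32 * 32 * 3 * 2.4))         # 7373
```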

slide-41
SLIDE 41

When do we need big data?

slide-42
SLIDE 42
slide-43
SLIDE 43
slide-44
SLIDE 44
slide-45
SLIDE 45

Unreasonable Effectiveness of Data

Simple (dumb) algorithms plus lots of data are better than complicated algorithms.

Examples: machine translation, texture generation.

slide-46
SLIDE 46

Machine Translation

slide-47
SLIDE 47

Step 1: Source Sentence Chunking

  • Segment source sentence into overlapping n-grams via a sliding window
  • Typical n-gram length: 4 to 9 terms
  • Each term is a word or a known phrase
  • Any sentence length

[Diagram: sentence S1 … S9 chunked into overlapping 5-grams: S1-S5, S2-S6, S3-S7, S4-S8, S5-S9.] Slide by Jaime Carbonell
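The chunking step above can be sketched in a few lines; `chunk_sentence` is a hypothetical helper name for this sketch:

```python
def chunk_sentence(tokens, n=5):
    """Slide a window of length n over the sentence, producing
    overlapping n-grams (typical n: 4 to 9 terms)."""
    if len(tokens) <= n:
        return [tokens]
    return [tokens[i:i + n] for i in range(len(tokens) - n + 1)]

# The slide's example: S1..S9 with a 5-term window.
grams = chunk_sentence([f"S{i}" for i in range(1, 10)])
print(len(grams))            # 5 windows: S1-S5 through S5-S9
print(grams[0], grams[-1])
```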

slide-48
SLIDE 48

Step 2: Dictionary Lookup

  • Using a bilingual dictionary, list all possible target translations for each source word or phrase

[Diagram: source word-string S2 S3 S4 S5 S6 mapped through an inflected bilingual dictionary to target word lists (the “Flooding Set”): T2-a…T2-d, T3-a…T3-c, T4-a…T4-e, T5-a, T6-a…T6-c.]

Slide by Jaime Carbonell

slide-49
SLIDE 49

Step 3: Search Target Text

  • Using the Flooding Set, search the target text for word-strings containing one word from each group
  • Find the maximum number of words from the Flooding Set in a minimum-length word-string

– Words or phrases can be in any order
– Ignore function words in the initial step (T5 is a function word in this example)

[Flooding Set: T2-a…T2-d, T3-a…T3-c, T4-a…T4-e, T5-a, T6-a…T6-c.]

Slide by Jaime Carbonell
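The search step amounts to scanning the target corpus for short windows covering many Flooding Set groups. A minimal sketch with the slides' T-group labels (`best_window`, its tie-breaking rule, and the window cap are assumptions of this sketch, not the talk's algorithm):

```python
def best_window(target, flood, max_len=8):
    """Find the word-string covering the most Flooding Set groups
    in the fewest words (more matches first, then shorter window)."""
    best = (0, float("inf"), None)            # (matches, length, span)
    for i in range(len(target)):
        seen = set()
        for j in range(i, min(i + max_len, len(target))):
            for group, words in flood.items():
                if target[j] in words:
                    seen.add(group)
            cand = (len(seen), j - i + 1, (i, j + 1))
            if (cand[0], -cand[1]) > (best[0], -best[1]):
                best = cand
    return best

# Flooding Set groups from the slides (function word T5 ignored here).
flood = {"T2": {"T2-a", "T2-b", "T2-c", "T2-d"},
         "T3": {"T3-a", "T3-b", "T3-c"},
         "T4": {"T4-a", "T4-b", "T4-c", "T4-d", "T4-e"},
         "T6": {"T6-a", "T6-b", "T6-c"}}
# Target corpus fragment matching the Target Candidate 1 example.
corpus = ["T(x)", "T3-b", "T(x)", "T2-d", "T(x)", "T(x)", "T6-c", "T(x)"]

matches, length, span = best_window(corpus, flood)
print(matches, length)   # 3 groups covered in a 6-word window
```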

slide-50
SLIDE 50

Step 3: Search Target Text (Example)

[Diagram: a target corpus of filler words T(x) containing the run T3-b T(x) T2-d T(x) T(x) T6-c; this window is Target Candidate 1, covering three Flooding Set groups in six words.]

Slide by Jaime Carbonell

slide-51
SLIDE 51

Step 3: Search Target Text (Example)

[Diagram: the run T4-a T6-b T(x) T2-c T3-a is Target Candidate 2, covering four Flooding Set groups in five words.]

Slide by Jaime Carbonell

slide-52
SLIDE 52

Step 3: Search Target Text (Example)

[Diagram: the run T3-c T2-b T4-e T5-a T6-a is Target Candidate 3, five Flooding Set words with no fillers.]

Reintroduce function words after the initial match (e.g. T5).

Slide by Jaime Carbonell

slide-53
SLIDE 53

Scoring

  • Step 4: Score Word-String Candidates
  • Scoring of candidates is based on:

– Proximity (minimize extraneous words in the target n-gram ≈ precision)
– Number of word matches (maximize coverage ≈ recall)
– Regular words given more weight than function words
– Combined results (e.g., optimize F1 or a p-norm)

Target word-string candidate rankings:

  Candidate                        Proximity  Word matches  Regular words  Total
  T3-b T(x) T2-d T(x) T(x) T6-c      3rd        3rd           3rd          3rd
  T4-a T6-b T(x) T2-c T3-a           1st        2nd           1st          2nd
  T3-c T2-b T4-e T5-a T6-a           1st        1st           1st          1st

Slide by Jaime Carbonell
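The ranking above can be reproduced with a toy scoring function. Equal weights for coverage and proximity are an assumption of this sketch; the slides leave the combination open (F1, p-norm, etc.):

```python
def score(candidate, flood_words, function_words):
    """Coverage minus proximity penalty: count Flooding Set matches
    (function words down-weighted to zero) and subtract extraneous words."""
    matches = sum(1 for w in candidate
                  if w in flood_words and w not in function_words)
    extraneous = sum(1 for w in candidate if w not in flood_words)
    return matches - extraneous

flood = {"T2-b", "T2-c", "T2-d", "T3-a", "T3-b", "T3-c",
         "T4-a", "T4-e", "T5-a", "T6-a", "T6-b", "T6-c"}
func = {"T5-a"}          # T5 is the function word in the slide's example
cands = [["T3-b", "T(x)", "T2-d", "T(x)", "T(x)", "T6-c"],   # candidate 1
         ["T4-a", "T6-b", "T(x)", "T2-c", "T3-a"],           # candidate 2
         ["T3-c", "T2-b", "T4-e", "T5-a", "T6-a"]]           # candidate 3

scores = [score(c, flood, func) for c in cands]
print(scores)            # [0, 3, 4]: candidate 3 ranks 1st, as on the slide
```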

slide-54
SLIDE 54

Step 5: Select Candidates Using Overlap
(Propagate context over the entire sentence)

[Diagram: candidate word-strings for three overlapping source chunks (Word-String 1, 2, and 3 candidates); candidates from adjacent chunks share overlapping words, which links the chunks together.]

Slide by Jaime Carbonell

slide-55
SLIDE 55

Step 5: Select Candidates Using Overlap

[Diagram: two alternative candidate chains (Alternative 1, Alternative 2); the best translations are selected via maximal overlap.]

Slide by Jaime Carbonell

slide-56
SLIDE 56

A (Simple) Real Example of Overlap

Full sentence: “a United States soldier died and two others were injured Monday”

N-grams generated from Flooding:
  a United States soldier
  United States soldier died
  soldier died and two others
  died and two others were injured
  two others were injured Monday

Flooding gives n-gram fidelity; Overlap gives long-range fidelity: n-grams are connected via Overlap.

Slide by Jaime Carbonell
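The overlap chaining on this slide can be sketched directly; greedy left-to-right stitching by maximal suffix/prefix overlap is an assumption of this sketch:

```python
def overlap(a, b):
    """Length of the longest suffix of a that is a prefix of b."""
    for k in range(min(len(a), len(b)), 0, -1):
        if a[-k:] == b[:k]:
            return k
    return 0

def stitch(ngrams):
    """Chain n-grams left to right by maximal overlap."""
    out = list(ngrams[0])
    for g in ngrams[1:]:
        k = overlap(out, list(g))
        out.extend(g[k:])
    return out

# The slide's real example, as overlapping n-grams.
ngrams = [
    "a United States soldier".split(),
    "United States soldier died".split(),
    "soldier died and two others".split(),
    "died and two others were injured".split(),
    "two others were injured Monday".split(),
]
print(" ".join(stitch(ngrams)))
# a United States soldier died and two others were injured Monday
```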

slide-57
SLIDE 57

Texture Synthesis

slide-58
SLIDE 58
slide-59
SLIDE 59
slide-60
SLIDE 60
slide-61
SLIDE 61
slide-62
SLIDE 62
slide-63
SLIDE 63
slide-64
SLIDE 64

So, how do we use big data?

slide-65
SLIDE 65

Two ways to use lots of data

  • Brute-force vision: find that needle in the haystack and disregard the rest (a.k.a. kNN)
  • See what different subsets of data think of you
slide-66
SLIDE 66

kNN matching is great…

  • because we live in a (mostly) boring world!
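The needle-in-the-haystack idea is literally a nearest-neighbor lookup over tiny images. A minimal sketch on random stand-in data, using plain L2 distance (the data and descriptor are assumptions, not the papers' setup):

```python
import numpy as np

def knn_match(query, database, k=5):
    """Brute-force kNN over tiny images: return indices of the k
    closest matches in L2 distance, disregarding the rest."""
    q = query.reshape(-1)
    db = database.reshape(len(database), -1)
    dist = np.linalg.norm(db - q, axis=1)
    return np.argsort(dist)[:k]

# Toy stand-in for a tiny-images collection (32x32 RGB, values in [0, 1]).
rng = np.random.default_rng(0)
database = rng.random((1000, 32, 32, 3))
# A slightly perturbed copy of image 42 plays the query.
query = np.clip(database[42] + 0.01 * rng.standard_normal((32, 32, 3)), 0, 1)

print(knn_match(query, database, k=3)[0])   # 42: the near-duplicate wins
```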

slide-67
SLIDE 67

Lots Of Images

  • A. Torralba, R. Fergus, W.T. Freeman. PAMI 2008

slide-68
SLIDE 68

Lots Of Images

  • A. Torralba, R. Fergus, W.T. Freeman. PAMI 2008

slide-69
SLIDE 69

Lots Of Images

slide-70
SLIDE 70

Automatic Colorization Result

[Figure: grayscale input; high-resolution colorization of the input using the average.]

  • A. Torralba, R. Fergus, W.T. Freeman. 2008

slide-71
SLIDE 71

im2gps

Instead of using object labels, the web provides other kinds of metadata associated with large collections of images. Hays & Efros, CVPR 2008: 20 million geotagged and geographically text-labeled images.

slide-72
SLIDE 72

Hays & Efros. CVPR 2008

im2gps

slide-73
SLIDE 73

Image completion

Instead, generate proposals using millions of images (Hays, Efros, 2007).

[Figure: input; 16 nearest neighbors (gist + color matching); output.]
slide-74
SLIDE 74

With a good image similarity
 and a lot of data… Input image Nearest neighbors

22,000 LabelMe scenes

Hays, Efros, Siggraph 2006 Russell, Liu, Torralba, Fergus, Freeman. NIPS 2007

slide-75
SLIDE 75

With a good image similarity
 and a lot of data…

Russell, Liu, Torralba, Fergus, Freeman. NIPS 2007

slide-76
SLIDE 76

With a good image similarity
 and a lot of data…

Russell, Liu, Torralba, Fergus, Freeman. NIPS 2007

slide-77
SLIDE 77

With a good image similarity
 and a lot of data…

Russell, Liu, Torralba, Fergus, Freeman. NIPS 2007

slide-78
SLIDE 78

With a good image similarity
 and a lot of data…

Russell, Liu, Torralba, Fergus, Freeman. NIPS 2007

slide-79
SLIDE 79

Outputs

Russell, Liu, Torralba, Fergus, Freeman. NIPS 2007

slide-80
SLIDE 80

While many scenes are boring…

Slide by Antonio Torralba

slide-81
SLIDE 81

Some scenes are unique

Slide by Antonio Torralba

slide-82
SLIDE 82

Dealing with sparse data (rare scenes)

  • better similarity
slide-83
SLIDE 83


Medici Fountain, Paris

slide-84
SLIDE 84


slide-85
SLIDE 85


slide-86
SLIDE 86


Medici Fountain, Paris (winter)

slide-87
SLIDE 87


slide-88
SLIDE 88


slide-89
SLIDE 89


slide-90
SLIDE 90


slide-91
SLIDE 91


slide-92
SLIDE 92

OUR GOAL


slide-93
SLIDE 93


slide-94
SLIDE 94


Top Matches Input Query

slide-95
SLIDE 95


Top Matches Input Query

slide-96
SLIDE 96


Top Matches Input Query

slide-97
SLIDE 97

IMPORTANT PARTS?


Input Query Important Parts

slide-98
SLIDE 98


Top Matches Input Query

slide-99
SLIDE 99


“Data-driven Uniqueness”

slide-100
SLIDE 100

Search using Images


Top Matches Input Query

slide-101
SLIDE 101

Search using Sketches


slide-102
SLIDE 102

Search using Paintings


Input Painting Top Matches

slide-103
SLIDE 103

Search using Paintings


Input Painting Top Matches

slide-104
SLIDE 104

Dealing with sparse data (rare scenes)

  • better similarity
  • better alignment

– e.g. reduce resolution, sifting, warping, etc.

slide-105
SLIDE 105

Matching scenes

Two images taken from the same scene category, but different instances

  • They contain different objects, with different scales, perspectives and spatial locations

Liu, Yuen, Torralba, Sivic, Freeman. ECCV 08

slide-106
SLIDE 106

Image representation

SIFT: 128 dimensions/pixel. Visualization: map the 128 dimensions into 3D color space.

slide-107
SLIDE 107
slide-108
SLIDE 108

Scene parsing results

[Columns: query; best match; annotation of best match; warped best match to query; parsing result; ground truth.]

slide-109
SLIDE 109

Liu, Yuen, Torralba. CVPR 2009; Yuen, Torralba. ECCV 2010

Prediction

slide-110
SLIDE 110

Dealing with sparse data (rare scenes)

  • better similarity
  • better alignment
  • e.g. reduce resolution, sifting, warping, etc.
  • Use sub-images (primitives) to match
  • Allows matching from multiple images
slide-111
SLIDE 111

Predicting Surface Normals

slide-112
SLIDE 112

Matching Parts

slide-113
SLIDE 113

Matching Parts

slide-114
SLIDE 114

Sparse Detections Input Image

Matching Parts

slide-115
SLIDE 115

Dealing with sparse data (rare scenes)

  • better similarity
  • better alignment
  • Use sub-images (primitives) to match
  • Understand the simple stuff first

– e.g. tracking via recognition, background subtraction, “object pop-out”, etc.

slide-116
SLIDE 116

Recognize when it’s easy!

Ramanan, Forsyth, Zisserman, 2004

slide-117
SLIDE 117

Guess structure

David C. Lee, Martial Hebert, Takeo Kanade, CVPR’09

slide-118
SLIDE 118

David C. Lee, Martial Hebert, Takeo Kanade, CVPR’09

Guess structure

slide-119
SLIDE 119

Subtracting away structure

Structure Objects Wall appearance modeling David C. Lee, Martial Hebert, Takeo Kanade, CVPR’09

slide-120
SLIDE 120

Dealing with sparse data (rare scenes)

  • better similarity
  • better alignment

– e.g. reduce resolution, sifting, warping, etc.

  • segment into chunks

– e.g. segmentation for recognition approaches

  • get rid of simple stuff first

– e.g. background subtraction, “object pop-out”, etc.

  • Moving away from kNN methodology…
  • use data to make connections

– e.g. the Memex, manifold learning, data association, subpopulation means, etc.

slide-121
SLIDE 121

Memex – Knowledge Graph

slide-122
SLIDE 122

Manifolds

  • Images are high-dimensional: a 64×64 image is a 4096-dimensional vector.
  • But the set of possible images is much smaller!
  • Is there a subspace where the set of images lies?
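A toy sketch of why such a subspace can exist: if images are generated by only a few latent parameters, PCA recovers a tiny effective dimensionality inside the 4096-dimensional pixel space (the two-parameter linear generative model here is an assumption for illustration; real image manifolds are nonlinear):

```python
import numpy as np

# 64x64 images = 4096-dimensional vectors, but generated from just
# two latent parameters, so they occupy a 2-D subspace of R^4096.
rng = np.random.default_rng(0)
basis = rng.standard_normal((2, 64 * 64))   # two directions of variation
latent = rng.standard_normal((500, 2))      # 500 images, 2 degrees of freedom
images = latent @ basis                     # shape (500, 4096)

# PCA via SVD: count the directions that actually carry variance.
X = images - images.mean(axis=0)
s = np.linalg.svd(X, compute_uv=False)
effective_dim = int(np.sum(s > 1e-8 * s[0]))
print(effective_dim)   # 2
```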

slide-123
SLIDE 123

appearance variation

manifolds in vision

images from hormel corp.

Slide by Dave Thompson

slide-124
SLIDE 124

manifolds in vision

images from www.golfswingphotos.com

Slide by Dave Thompson

slide-125
SLIDE 125

reasonable distance metrics

?

Slide by Dave Thompson

slide-126
SLIDE 126

reasonable distance metrics

?

linear interpolation

Slide by Dave Thompson

slide-127
SLIDE 127

reasonable distance metrics

?

manifold interpolation

Slide by Dave Thompson

slide-128
SLIDE 128

reasonable distance metrics

slide-129
SLIDE 129

reasonable distance metrics

slide-130
SLIDE 130

Some observations about data collection

slide-131
SLIDE 131

Object distributions

slide-132
SLIDE 132

Classes sorted by frequency

SUN database, 200 categories: ~ Zipf’s law

  • The first 9 objects account for 50% of all training examples
  • 17 classes have more than 300 examples
  • 109 classes have fewer than 50 examples
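A quick sanity check that a Zipf-like (frequency proportional to 1/rank) distribution over 200 categories concentrates roughly half its mass in the top few classes, consistent with the slide's observation (the exact 1/rank form is an assumption, not a fit to the SUN data):

```python
import numpy as np

# Zipf-like object frequencies: count of the class at rank r ~ 1/r.
ranks = np.arange(1, 201)            # 200 categories, as on the slide
counts = 1.0 / ranks
share = counts.cumsum() / counts.sum()

top9 = float(share[8])               # mass held by the 9 most frequent classes
print(round(top9, 2))                # ~0.48, i.e. roughly half
```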

slide-133
SLIDE 133

Classes sorted by frequency

Rare objects are similar to 
 frequent objects

chair Swivel chair armchair Deck chair

Salakhutdinov, Torralba, and Tenenbaum, CVPR, 2011

slide-134
SLIDE 134

Rare objects are similar to 
 frequent objects

bus van truck car Classes sorted by frequency

Salakhutdinov, Torralba, and Tenenbaum, CVPR, 2011

slide-135
SLIDE 135

Some bias comes from the way the data is collected

slide-136
SLIDE 136

Google mugs Mugs from LabelMe

slide-137
SLIDE 137
slide-138
SLIDE 138

“Name That Dataset!” game

__ Caltech 101 __ Caltech 256 __ MSRC __ UIUC cars __ Tiny Images __ Corel __ PASCAL 2007 __ LabelMe __ COIL-100 __ ImageNet __ 15 Scenes __ SUN’09

slide-139
SLIDE 139

SVM plays “Name that dataset!”

slide-140
SLIDE 140

SVM plays “Name that dataset!”

  • 12 1-vs-all classifiers
  • Standard full-image features
  • 39% performance (chance is 8%)
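A minimal sketch of the classifier side of the game: one-vs-all linear SVMs trained by hinge-loss subgradient descent on stand-in features (the toy data, three "datasets" instead of twelve, and the training scheme are assumptions of this sketch, not the talk's setup):

```python
import numpy as np

def train_ova_svm(X, y, n_classes, epochs=50, lr=0.1, lam=0.01):
    """One-vs-all linear SVMs: one binary 'this dataset vs. the rest'
    classifier per class, trained by hinge-loss subgradient descent."""
    W = np.zeros((n_classes, X.shape[1]))
    for c in range(n_classes):
        t = np.where(y == c, 1.0, -1.0)          # +1 for class c, -1 otherwise
        for _ in range(epochs):
            margin = t * (X @ W[c])
            viol = margin < 1                    # margin-violating samples
            grad = lam * W[c] - (t[viol, None] * X[viol]).sum(axis=0) / len(X)
            W[c] -= lr * grad
    return W

# Toy stand-in: three "datasets" whose images have distinct mean statistics.
rng = np.random.default_rng(0)
n_sets, dim = 3, 20
means = 2.0 * rng.standard_normal((n_sets, dim))
X = np.vstack([m + 0.5 * rng.standard_normal((100, dim)) for m in means])
y = np.repeat(np.arange(n_sets), 100)

W = train_ova_svm(X, y, n_sets)
acc = (np.argmax(X @ W.T, axis=1) == y).mean()
print(acc)   # well above the 1/3 chance level: datasets are easy to tell apart
```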

slide-141
SLIDE 141

SVM plays “Name that dataset!”

slide-142
SLIDE 142

Datasets have different goals…

  • Some are object-centric (e.g. Caltech, ImageNet)
  • Others are scene-centric (e.g. LabelMe, SUN’09)
  • What about playing “name that dataset” on bounding boxes?
slide-143
SLIDE 143

Cross-Dataset Generalization

Classifier trained on MSRC cars

MSRC Caltech101 ImageNet PASCAL LabelMe SUN

slide-144
SLIDE 144

[Plot: AP vs. number of training examples, training on PASCAL, then adding more PASCAL, more from LabelMe, or more from Caltech 101.]

Mixing datasets

Test on PASCAL