SLIDE 1 Learning Deep Features for Scene Recognition using Places Database
Bolei Zhou, Agata Lapedriza, Jianxiong Xiao, Antonio Torralba, Aude Oliva (NIPS 2014)
Presented by Bora Çelikkale
SLIDE 2
INTRODUCTION
Human Visual Recognition
Samples the visual world several times per second: ~millions of images seen within a year
SLIDE 3 INTRODUCTION
Primate Brain
Hierarchical organization in layers
- of increasing processing complexity
Inspired CNNs
SLIDE 4
PROBLEM & MOTIVATION
Object classification has reached astonishing performance with large databases (ImageNet)
Iconic object-centric images do not contain the richness and diversity of visual information found in scenes
SLIDE 5
CONTRIBUTIONS
A scene-centric database 60x larger than SUN
Comparison metrics for scene datasets: density and diversity
SLIDE 6
SCENE DATASETS
Scene15 (Lazebnik et al. 2006): 15 categories, ~3,000 images
MIT Indoor67 (Quattoni & Torralba 2009): 67 categories of indoor places, 15,620 images
SUN (Xiao et al. 2010): 397 well-sampled categories, 130,519 images
Places (Zhou et al. 2014): 476 categories, 7,076,580 images
SLIDE 7 PLACES DATASET
Step 1: Query generation
Same categories as SUN, combined with 696 popular English adjectives
Queries sent to Google Images, Bing Images, and Flickr
>40M images downloaded
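A minimal sketch of what this query expansion could look like; the category and adjective lists below are tiny placeholder samples, not the SUN category list or the 696 adjectives actually used.

from itertools import product

# Placeholder samples; the real pipeline uses the SUN categories
# and 696 popular English adjectives.
categories = ["living room", "bedroom", "kitchen"]
adjectives = ["messy", "spare", "sunny", "dark"]

# One query per (adjective, category) pair, plus the bare category name,
# to be sent to the image search engines (Google Images, Bing Images, Flickr).
queries = list(categories) + [f"{a} {c}" for a, c in product(adjectives, categories)]

for q in queries[:5]:
    print(q)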
SLIDE 8 PLACES DATASET
Step 2: Duplicate removal
PCA-based duplicate removal, also run across SUN
Places & SUN therefore contain different images, which allows the two datasets to be combined
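A minimal sketch of PCA-based near-duplicate removal, assuming images are already represented as feature vectors (the "features" array is a random placeholder); the descriptors and thresholds used for Places are not reproduced here.

import numpy as np
from sklearn.decomposition import PCA
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(0)
features = rng.normal(size=(1000, 512))   # placeholder image descriptors

# Project descriptors into a low-dimensional PCA space so distance
# comparisons are cheap, then flag pairs that fall below a threshold.
pca = PCA(n_components=32).fit(features)
codes = pca.transform(features)

dist, idx = NearestNeighbors(n_neighbors=2).fit(codes).kneighbors(codes)

threshold = 0.1                            # placeholder value, tuned in practice
dup_pairs = [(i, int(j)) for i, (d, j) in enumerate(zip(dist[:, 1], idx[:, 1]))
             if d < threshold and i < j]
drop = {j for _, j in dup_pairs}           # keep one image per duplicate pair
keep = [i for i in range(len(codes)) if i not in drop]
print(f"kept {len(keep)} of {len(codes)} images")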
SLIDE 9 PLACES DATASET
Step 3: Annotation (with AMT)
Workers answer binary questions (e.g., "Is this a living room?")
Two-round setup:
- Round 1: default answer is NO
- Round 2: default answer is YES
Images shown per round: 750, plus 60 control images from SUN
Only results with >90% accuracy on the control images are kept
SLIDE 10
COMPARISON METRICS
Relative Density
SLIDE 11 COMPARISON METRICS
Relative Density
A dataset is denser when its images have closer (more similar) nearest neighbors
[Figure: nearest neighbor of a1 within dataset A vs. nearest neighbor of b1 within dataset B]
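A minimal sketch of a density-style comparison in the spirit of this slide, on placeholder feature vectors: for matched samples from datasets A and B, compare how far each image is from its nearest neighbor within its own dataset; the dataset whose images sit closer to their neighbors is the denser one. This illustrates the idea, not the paper's exact estimator.

import numpy as np
from sklearn.neighbors import NearestNeighbors

def nn_distances(X):
    """Distance from each point to its nearest neighbor within the same set."""
    dist, _ = NearestNeighbors(n_neighbors=2).fit(X).kneighbors(X)
    return dist[:, 1]   # column 0 is the point itself

def relative_density(A, B, n_samples=500, seed=0):
    """Fraction of matched draws where a1 is closer to its NN in A
    than b1 is to its NN in B (higher means A is denser relative to B)."""
    rng = np.random.default_rng(seed)
    a = rng.choice(nn_distances(A), size=n_samples)
    b = rng.choice(nn_distances(B), size=n_samples)
    return float(np.mean(a < b))

# Placeholder "datasets": B is more spread out than A, so A should look denser.
rng = np.random.default_rng(0)
A = rng.normal(scale=1.0, size=(2000, 64))
B = rng.normal(scale=2.0, size=(2000, 64))
print("relative density of A w.r.t. B:", relative_density(A, B))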
SLIDE 12 COMPARISON METRICS
Relative Diversity
Based on the Simpson index: the probability that two randomly sampled individuals belong to the same species (the lower this probability, the more diverse the set)
[Figure: nearest neighbors of a1 and b1 in their respective datasets]
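A minimal sketch of a Simpson-index-style diversity comparison, again on placeholder feature vectors: sample random pairs from each dataset and check how often a pair from A is more similar than a pair from B; the more often random pairs look alike, the less diverse the dataset. This is a continuous analogue for illustration, not the paper's exact estimator.

import numpy as np

def random_pair_distances(X, n_pairs, rng):
    """Distances between randomly drawn pairs of points from X."""
    i = rng.integers(0, len(X), size=n_pairs)
    j = rng.integers(0, len(X), size=n_pairs)
    return np.linalg.norm(X[i] - X[j], axis=1)

def relative_diversity(A, B, n_pairs=2000, seed=0):
    """1 minus the fraction of trials where a random pair from A is closer
    than a random pair from B (higher means A is more diverse relative to B)."""
    rng = np.random.default_rng(seed)
    dA = random_pair_distances(A, n_pairs, rng)
    dB = random_pair_distances(B, n_pairs, rng)
    return 1.0 - float(np.mean(dA < dB))

rng = np.random.default_rng(0)
A = rng.normal(scale=1.0, size=(2000, 64))   # tighter cluster: less diverse
B = rng.normal(scale=2.0, size=(2000, 64))   # more spread out: more diverse
print("relative diversity of A w.r.t. B:", relative_diversity(A, B))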
SLIDE 13 EXPERIMENTS
Experiment 1: Density & Diversity Comparison (AMT)
Relative diversity vs. relative density measured per category and per dataset
Workers are shown 12 pairs of images and select the most similar pair
Diversity: pairs are chosen at random within each database
Density: each image is paired with its 5th nearest neighbor in GIST space (to avoid near duplicates)
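A minimal sketch of how the density pairs could be formed, assuming precomputed GIST descriptors (the "gist" array below is a random placeholder).

import numpy as np
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(0)
gist = rng.normal(size=(500, 512))          # placeholder GIST descriptors

# Ask for 6 neighbors: index 0 is the image itself, indices 1-4 may be
# near duplicates, index 5 is kept as the pair shown to the workers.
_, idx = NearestNeighbors(n_neighbors=6).fit(gist).kneighbors(gist)
pairs = list(zip(range(len(gist)), idx[:, 5]))
print(pairs[:3])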
SLIDE 14 EXPERIMENTS
Experiment 2: Cross-Dataset Generalization
Training and testing across different datasets
Classifier: ImageNet-CNN features with a linear SVM
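A minimal sketch of the classifier setup described here, using an ImageNet-pretrained AlexNet from torchvision as a stand-in for the paper's Caffe ImageNet-CNN; the image tensors are random placeholders rather than real dataset loaders.

import torch
import torchvision.models as models
from sklearn.svm import LinearSVC

# ImageNet-pretrained AlexNet; fc7 activations (4096-d) serve as features.
alexnet = models.alexnet(weights=models.AlexNet_Weights.IMAGENET1K_V1).eval()

def fc7_features(images):
    """images: float tensor (N, 3, 224, 224), normalized as for ImageNet."""
    with torch.no_grad():
        x = alexnet.features(images)
        x = alexnet.avgpool(x).flatten(1)
        x = alexnet.classifier[:6](x)      # stop after the fc7 ReLU
    return x.numpy()

# Placeholder tensors standing in for images from two scene datasets.
train_imgs, train_labels = torch.randn(32, 3, 224, 224), torch.randint(0, 4, (32,))
test_imgs, test_labels = torch.randn(16, 3, 224, 224), torch.randint(0, 4, (16,))

svm = LinearSVC().fit(fc7_features(train_imgs), train_labels.numpy())
print("accuracy:", svm.score(fc7_features(test_imgs), test_labels.numpy()))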
SLIDE 15
EXPERIMENTS
Experiment 3: Comparison with Hand-Designed Features
SLIDE 16
EXPERIMENTS
Experiment 4: Training a CNN for Scene Recognition
2.5M images from 205 categories, trained with the AlexNet architecture
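A minimal training-loop sketch in PyTorch standing in for the paper's Caffe setup; the tensor dataset is a tiny random placeholder for the ~2.5M Places images, and the optimizer settings are generic defaults, not the paper's exact schedule.

import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset
import torchvision.models as models

# AlexNet with a 205-way classifier, trained from scratch.
model = models.alexnet(num_classes=205)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9, weight_decay=5e-4)
criterion = nn.CrossEntropyLoss()

# Placeholder data standing in for the Places205 training set.
images = torch.randn(64, 3, 224, 224)
labels = torch.randint(0, 205, (64,))
loader = DataLoader(TensorDataset(images, labels), batch_size=16, shuffle=True)

model.train()
for epoch in range(2):                     # a couple of toy epochs
    for x, y in loader:
        optimizer.zero_grad()
        loss = criterion(model(x), y)
        loss.backward()
        optimizer.step()
    print(f"epoch {epoch}: loss {loss.item():.3f}")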
SLIDE 17
PLACES-CNNs
Hybrid-AlexNet
Trained on Places + ImageNet: 3.5M images, 1183 categories
Accuracy = 0.5230 on the validation set
Places205-GoogLeNet (on 205 categories)
Accuracy: top1 = 0.5567, top5 = 0.8541 on validation set
Places205-VGG16 (on 205 categories)
Accuracy: top1 = 0.5890, top5 = 0.8770 on validation set
SLIDE 18
PLACES2 DATASET
400+ unique scene categories
>10M images
AlexNet top-1 accuracy: 43.0%
VGG16 top-1 accuracy: 47.6%
SLIDE 19
DEMO
http://places.csail.mit.edu/demo.html
http://places2.csail.mit.edu/demo.html
SLIDE 20
THANK YOU