SLIDE 1 Learning Deep Features for Scene Recognition using Places Database
Bolei Zhou, Agata Lapedriza, Jianxiong Xiao, Antonio Torralba, Aude Oliva (NIPS 2014)
Presented by Bora Çelikkale
SLIDE 2
INTRODUCTION
Human Visual Recognition
Samples the visual world several times per second: ~millions of images seen within a year
SLIDE 3 INTRODUCTION
Primate Brain
Hierarchical organization in layers
- of increasing processing complexity
Inspired CNNs
SLIDE 4
PROBLEM & MOTIVATION
Object classification has reached astonishing performance with large databases (ImageNet)
Iconic object-centric images do not contain the richness and diversity of visual information found in scenes
SLIDE 5
CONTRIBUTIONS
A scene-centric database 60x larger than SUN
Comparison metrics for scene datasets: density and diversity
SLIDE 6
SCENE DATASETS
Scene15 (Lazebnik et al. 2006): 15 categories, ~3,000 images
MIT Indoor67 (Quattoni & Torralba 2009): 67 categories of indoor places, 15,620 images
SUN (Xiao et al. 2010): 397 well-sampled categories, 130,519 images
Places (Zhou et al. 2014): 476 categories, 7,076,580 images
SLIDE 7 PLACES DATASET
Step 1: Query generation
Same categories as SUN, combined with 696 popular English adjectives
Queries sent to Google Images, Bing Images, and Flickr
>40M images downloaded
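A minimal sketch of what this query expansion could look like; the category and adjective lists below are tiny placeholder samples, not the SUN category list or the 696 adjectives actually used.

from itertools import product

# Placeholder samples; the real pipeline uses the SUN categories
# and 696 popular English adjectives.
categories = ["living room", "bedroom", "kitchen"]
adjectives = ["messy", "spare", "sunny", "dark"]

# One query per (adjective, category) pair, plus the bare category name,
# to be sent to the image search engines (Google Images, Bing Images, Flickr).
queries = list(categories) + [f"{a} {c}" for a, c in product(adjectives, categories)]

for q in queries[:5]:
    print(q)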
SLIDE 8 PLACES DATASET
Step 2: Duplicate removal
PCA-based duplicate removal, also run across SUN
Places & SUN therefore contain different images, which allows the two datasets to be combined
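A minimal sketch of PCA-based near-duplicate removal, assuming images are already represented as feature vectors (the "features" array is a random placeholder); the descriptors and thresholds used for Places are not reproduced here.

import numpy as np
from sklearn.decomposition import PCA
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(0)
features = rng.normal(size=(1000, 512))   # placeholder image descriptors

# Project descriptors into a low-dimensional PCA space so distance
# comparisons are cheap, then flag pairs that fall below a threshold.
pca = PCA(n_components=32).fit(features)
codes = pca.transform(features)

dist, idx = NearestNeighbors(n_neighbors=2).fit(codes).kneighbors(codes)

threshold = 0.1                            # placeholder value, tuned in practice
dup_pairs = [(i, int(j)) for i, (d, j) in enumerate(zip(dist[:, 1], idx[:, 1]))
             if d < threshold and i < j]
drop = {j for _, j in dup_pairs}           # keep one image per duplicate pair
keep = [i for i in range(len(codes)) if i not in drop]
print(f"kept {len(keep)} of {len(codes)} images")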
SLIDE 9 PLACES DATASET
Step 3: Annotation (with AMT)
Workers answer binary questions (e.g., "Is this a living room?")
Two-round setup:
- Round 1: default answer is NO
- Round 2: default answer is YES
Images shown per round: 750, plus 60 control images from SUN
Only results with >90% accuracy on the control images are kept
SLIDE 10
COMPARISON METRICS
Relative Density
SLIDE 11 COMPARISON METRICS
Relative Density
A dataset is denser when its images have closer (more similar) nearest neighbors
[Figure: nearest neighbor of a1 within dataset A vs. nearest neighbor of b1 within dataset B]
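A minimal sketch of a density-style comparison in the spirit of this slide, on placeholder feature vectors: for matched samples from datasets A and B, compare how far each image is from its nearest neighbor within its own dataset; the dataset whose images sit closer to their neighbors is the denser one. This illustrates the idea, not the paper's exact estimator.

import numpy as np
from sklearn.neighbors import NearestNeighbors

def nn_distances(X):
    """Distance from each point to its nearest neighbor within the same set."""
    dist, _ = NearestNeighbors(n_neighbors=2).fit(X).kneighbors(X)
    return dist[:, 1]   # column 0 is the point itself

def relative_density(A, B, n_samples=500, seed=0):
    """Fraction of matched draws where a1 is closer to its NN in A
    than b1 is to its NN in B (higher means A is denser relative to B)."""
    rng = np.random.default_rng(seed)
    a = rng.choice(nn_distances(A), size=n_samples)
    b = rng.choice(nn_distances(B), size=n_samples)
    return float(np.mean(a < b))

# Placeholder "datasets": B is more spread out than A, so A should look denser.
rng = np.random.default_rng(0)
A = rng.normal(scale=1.0, size=(2000, 64))
B = rng.normal(scale=2.0, size=(2000, 64))
print("relative density of A w.r.t. B:", relative_density(A, B))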
SLIDE 12 COMPARISON METRICS
Relative Diversity
Based on the Simpson index: the probability that two randomly sampled individuals belong to the same species (the lower this probability, the more diverse the set)
[Figure: nearest neighbors of a1 and b1 in their respective datasets]
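A minimal sketch of a Simpson-index-style diversity comparison, again on placeholder feature vectors: sample random pairs from each dataset and check how often a pair from A is more similar than a pair from B; the more often random pairs look alike, the less diverse the dataset. This is a continuous analogue for illustration, not the paper's exact estimator.

import numpy as np

def random_pair_distances(X, n_pairs, rng):
    """Distances between randomly drawn pairs of points from X."""
    i = rng.integers(0, len(X), size=n_pairs)
    j = rng.integers(0, len(X), size=n_pairs)
    return np.linalg.norm(X[i] - X[j], axis=1)

def relative_diversity(A, B, n_pairs=2000, seed=0):
    """1 minus the fraction of trials where a random pair from A is closer
    than a random pair from B (higher means A is more diverse relative to B)."""
    rng = np.random.default_rng(seed)
    dA = random_pair_distances(A, n_pairs, rng)
    dB = random_pair_distances(B, n_pairs, rng)
    return 1.0 - float(np.mean(dA < dB))

rng = np.random.default_rng(0)
A = rng.normal(scale=1.0, size=(2000, 64))   # tighter cluster: less diverse
B = rng.normal(scale=2.0, size=(2000, 64))   # more spread out: more diverse
print("relative diversity of A w.r.t. B:", relative_diversity(A, B))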
SLIDE 13 EXPERIMENTS
Experiment 1: Density & Diversity Comparison (AMT)
Relative diversity vs. relative density measured per category and per dataset
Workers are shown 12 pairs of images and select the most similar pair
Diversity: pairs are chosen at random within each database
Density: each image is paired with its 5th nearest neighbor in GIST space (to avoid near duplicates)
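A minimal sketch of how the density pairs could be formed, assuming precomputed GIST descriptors (the "gist" array below is a random placeholder).

import numpy as np
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(0)
gist = rng.normal(size=(500, 512))          # placeholder GIST descriptors

# Ask for 6 neighbors: index 0 is the image itself, indices 1-4 may be
# near duplicates, index 5 is kept as the pair shown to the workers.
_, idx = NearestNeighbors(n_neighbors=6).fit(gist).kneighbors(gist)
pairs = list(zip(range(len(gist)), idx[:, 5]))
print(pairs[:3])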
SLIDE 14 EXPERIMENTS
Experiment 2: Cross-Dataset Generalization
Training and testing across different datasets
Classifier: ImageNet-CNN features with a linear SVM
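A minimal sketch of the classifier setup described here, using an ImageNet-pretrained AlexNet from torchvision as a stand-in for the paper's Caffe ImageNet-CNN; the image tensors are random placeholders rather than real dataset loaders.

import torch
import torchvision.models as models
from sklearn.svm import LinearSVC

# ImageNet-pretrained AlexNet; fc7 activations (4096-d) serve as features.
alexnet = models.alexnet(weights=models.AlexNet_Weights.IMAGENET1K_V1).eval()

def fc7_features(images):
    """images: float tensor (N, 3, 224, 224), normalized as for ImageNet."""
    with torch.no_grad():
        x = alexnet.features(images)
        x = alexnet.avgpool(x).flatten(1)
        x = alexnet.classifier[:6](x)      # stop after the fc7 ReLU
    return x.numpy()

# Placeholder tensors standing in for images from two scene datasets.
train_imgs, train_labels = torch.randn(32, 3, 224, 224), torch.randint(0, 4, (32,))
test_imgs, test_labels = torch.randn(16, 3, 224, 224), torch.randint(0, 4, (16,))

svm = LinearSVC().fit(fc7_features(train_imgs), train_labels.numpy())
print("accuracy:", svm.score(fc7_features(test_imgs), test_labels.numpy()))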
SLIDE 15
EXPERIMENTS
Experiment 3: Comparison with Hand-Designed Features
SLIDE 16
EXPERIMENTS
Experiment 4: Training a CNN for Scene Recognition
2.5M images from 205 categories, trained with the AlexNet architecture
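A minimal training-loop sketch in PyTorch standing in for the paper's Caffe setup; the tensor dataset is a tiny random placeholder for the ~2.5M Places images, and the optimizer settings are generic defaults, not the paper's exact schedule.

import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset
import torchvision.models as models

# AlexNet with a 205-way classifier, trained from scratch.
model = models.alexnet(num_classes=205)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9, weight_decay=5e-4)
criterion = nn.CrossEntropyLoss()

# Placeholder data standing in for the Places205 training set.
images = torch.randn(64, 3, 224, 224)
labels = torch.randint(0, 205, (64,))
loader = DataLoader(TensorDataset(images, labels), batch_size=16, shuffle=True)

model.train()
for epoch in range(2):                     # a couple of toy epochs
    for x, y in loader:
        optimizer.zero_grad()
        loss = criterion(model(x), y)
        loss.backward()
        optimizer.step()
    print(f"epoch {epoch}: loss {loss.item():.3f}")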
SLIDE 17
PLACES-CNNs
Hybrid-AlexNet
Trained on Places + ImageNet: 3.5M images, 1183 categories
Accuracy = 0.5230 on the validation set
Places205-GoogLeNet (on 205 categories)
Accuracy: top1 = 0.5567, top5 = 0.8541 on validation set
Places205-VGG16 (on 205 categories)
Accuracy: top1 = 0.5890, top5 = 0.8770 on validation set
SLIDE 18
PLACES2 DATASET
400+ unique scene categories
>10M images
AlexNet top-1 accuracy: 43.0%
VGG16 top-1 accuracy: 47.6%
SLIDE 19
DEMO
http://places.csail.mit.edu/demo.html
http://places2.csail.mit.edu/demo.html
SLIDE 20
THANK YOU