SLIDE 1

KEMAL ÇİZMECİLER (08.03.2016)

SLIDE 2: PRESENTATION TOPIC

OBJECT DETECTORS EMERGE IN DEEP SCENE CNNS

Bolei Zhou, Aditya Khosla, Agata Lapedriza, Aude Oliva, Antonio Torralba. International Conference on Learning Representations (ICLR), 2015.

SLIDE 3: OUTLINE

§ Problem statement and motivation
§ Simplifying the input images
§ Visualizing the receptive fields
§ Identifying the semantics of internal units
§ Connections with other work
§ Future directions

SLIDE 4: PROBLEM DEFINITION

§ Study the internal representation learned by Places-CNN on a task other than object recognition (i.e., scene recognition)
§ Visualize those representations through the inner layers

SLIDE 5: IMAGENET CNN AND PLACES CNN

ImageNet CNN for object classification and Places CNN for scene classification share the same architecture: AlexNet.

Slide credit: Bolei Zhou

SLIDE 6: COMPARISON OF IMAGENET CNN AND PLACES CNN

§ The ImageNet-CNN from Jia (2013) is trained on 1.3 million images from 1000 object categories of ImageNet (ILSVRC 2012) and achieves a top-1 accuracy of 57.4%.
§ Places-CNN is trained on 2.4 million images from 205 scene categories of the Places Database (Zhou et al., 2014) and achieves a top-1 accuracy of 50.0%.

SLIDE 7: OBJECT REPRESENTATIONS IN COMPUTER VISION

Part-based models are used to represent objects and visual patterns:
• Object as a set of parts
• Relative locations between parts

Figure from Fischler & Elschlager (1973)
Slide credit: Bolei Zhou

SLIDE 8: HOW ARE OBJECTS REPRESENTED IN A CNN?

Three ways to inspect what a unit responds to:
• Deconvolution: Zeiler, M. et al. Visualizing and Understanding Convolutional Networks. ECCV 2014.
• Back-propagation: Simonyan, K. et al. Deep Inside Convolutional Networks: Visualising Image Classification Models and Saliency Maps. ICLR workshop, 2014.
• Strongly activating images: Girshick, R., Donahue, J., Darrell, T., Malik, J. Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation. CVPR 2014.

Slide credit: Bolei Zhou

SLIDE 9: DECONVNET

Matthew D. Zeiler, Rob Fergus. Visualizing and Understanding Convolutional Networks.

SLIDE 10: IMAGES HAVING HIGHEST ACTIVATIONS

The earlier layers such as pool1 and pool2 prefer similar images for both networks, while the later layers tend to be more specialized to the specific task of scene or object categorization.
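The ranking behind this comparison is straightforward; below is a minimal sketch, assuming a pre-computed activations matrix (the network forward passes themselves are omitted):

```python
import numpy as np

def top_activating_images(activations, k=10):
    """activations: (num_images, num_units) array holding each unit's
    maximum response per image. Returns, for every unit, the indices
    of the k images that activate it most strongly."""
    return np.argsort(-activations, axis=0)[:k].T  # shape (num_units, k)
```

Comparing these top-k image sets between ImageNet-CNN and Places-CNN, layer by layer, yields the observation above.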

SLIDE 11: HOW ARE OBJECTS REPRESENTED IN A CNN?

A CNN uses a distributed code to represent objects (units across conv1, conv2, conv3, conv4, pool5).

Slide credit: Bolei Zhou
Interactive visualization: http://people.csail.mit.edu/torralba/research/drawCNN/drawNet.html

SLIDE 12: DIFFERENCE FROM OTHER WORK

§ Girshick et al. (2014) proposed pre-training on an auxiliary task and then fine-tuning for the target task.
§ Here, the same network can do both object localization and scene recognition in a single forward pass.
§ Another set of recent works (Oquab et al., 2014; Bergamo et al., 2014) demonstrates that deep networks trained on object classification can localize without bounding-box supervision. However, they still require object-level supervision.

SLIDE 13: UNCOVERING THE CNN

§ Simplifying the input images
§ Visualizing the receptive fields
§ Identifying the semantics of internal units

SLIDE 14: SIMPLIFYING THE INPUT IMAGES

§ First approach: simplify the image so that it keeps as little visual information as possible while still having a high classification score for the same category.
§ Second approach: generate the minimal image representations using the fully annotated image set of the SUN Database (Xiao et al., 2014) instead of performing automatic segmentation.

SLIDE 15: SUN DATABASE

J. Xiao et al., 2010

SLIDE 16: SUN DATABASE

J. Xiao et al., 2010

SLIDE 17: HOW IS SIMPLIFICATION HANDLED?

§ At each iteration, remove the segment that produces the smallest decrease of the correct classification score, and repeat until the image is incorrectly classified (see the sketch below).
§ What remains is the minimal image representation.
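A minimal sketch of this greedy loop, assuming a hypothetical classify(image) that returns the class-score vector of Places-CNN and a list of precomputed boolean segment masks (both stand-ins for the paper's actual pipeline):

```python
import numpy as np

def minimal_representation(image, segments, classify, true_label):
    """Greedily remove the segment whose deletion hurts the correct
    class score least, until the image is misclassified; return the
    last correctly classified image."""
    current = image.copy()
    remaining = list(segments)  # boolean masks over the image
    while remaining:
        # score of the true class after removing each candidate segment
        scores = []
        for seg in remaining:
            cand = current.copy()
            cand[seg] = 0
            scores.append(classify(cand)[true_label])
        i = int(np.argmax(scores))  # removal with the smallest score drop
        cand = current.copy()
        cand[remaining[i]] = 0
        if np.argmax(classify(cand)) != true_label:
            break  # this removal would flip the prediction; stop here
        current = cand
        remaining.pop(i)
    return current
```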

SLIDE 18: RELATED IDEA: POISSON BLENDING

• A good blend should preserve the gradients of the source region without changing the background.

Slide by Derek Hoiem
Pérez, Patrick, Gangnet, Michel, and Blake, Andrew. Poisson Image Editing. ACM Trans. Graph., 2003.
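For reference, the Poisson editing objective from Pérez et al. (2003): solve for pixel values f over the pasted region Ω that reproduce the source gradients ∇g while matching the target image f* on the boundary:

```latex
\min_{f}\; \iint_{\Omega} \left\lVert \nabla f - \nabla g \right\rVert^{2}\, dx\, dy
\qquad \text{subject to} \qquad f\big|_{\partial \Omega} = f^{*}\big|_{\partial \Omega}
```

The minimizer satisfies the Poisson equation Δf = Δg over Ω with these Dirichlet boundary conditions, hence the name.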

SLIDE 19: MINIMAL IMAGE REPRESENTATION

SLIDE 20: INFERENCE FROM SIMPLIFICATION

§ For art gallery, the minimal image representations contained paintings (81%) and pictures (58%); in amusement park, carousel (75%), ride (64%), and roller coaster (50%); in bookstore, bookcase (96%), books (68%), and shelves (67%).
§ These results suggest that object detection is an important part of the representation built by the network to obtain discriminative information for scene classification.

SLIDE 21: VISUALIZING THE RECEPTIVE FIELDS

§ Replicate each image many times with small random occluders (image patches of size 11×11) at different locations in the image.
§ For each of the K images, center the discrepancy map on the spatial location of the unit that caused the maximum activation for the given image, then average the re-centered discrepancy maps (a sketch follows below).
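A minimal sketch of the per-image discrepancy map, using a dense sliding occluder rather than the random placement described above; activation_fn is a hypothetical hook returning the scalar response of the unit under study:

```python
import numpy as np

def discrepancy_map(image, activation_fn, patch=11, stride=3):
    """For each occluder position, record how much the unit's response
    drops relative to the unoccluded image; accumulate per pixel."""
    h, w = image.shape[:2]
    base = activation_fn(image)
    disc = np.zeros((h, w))
    counts = np.zeros((h, w))
    for y in range(0, h - patch + 1, stride):
        for x in range(0, w - patch + 1, stride):
            occluded = image.copy()
            occluded[y:y + patch, x:x + patch] = 0  # zeroed-out occluder
            drop = base - activation_fn(occluded)
            disc[y:y + patch, x:x + patch] += drop
            counts[y:y + patch, x:x + patch] += 1
    return disc / np.maximum(counts, 1)
```

Averaging the K re-centered maps (one per high-activation image) gives the empirical receptive field of the unit.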

SLIDE 22: VISUALIZING THE RECEPTIVE FIELDS

200K images from the scene-centric SUN database + the object-centric ImageNet.

SLIDE 23: ESTIMATING THE RECEPTIVE FIELDS

Estimated receptive fields at pool1, conv3, and pool5: the actual size of the RF is much smaller than the theoretical size.

Segmentation using the RF of units: highlight the regions within the RF that have the highest value in the feature map. The result is more semantically meaningful (see the sketch below).
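The segmentation step can be sketched as thresholding the upsampled feature map; the 0.5 threshold ratio here is an assumption for illustration, not the paper's calibrated value:

```python
import numpy as np
from scipy.ndimage import zoom

def segment_with_rf(feature_map, image_hw, thresh_ratio=0.5):
    """Upsample a unit's 2-D feature map to image resolution and keep
    the regions responding above a fraction of the peak activation."""
    h, w = image_hw
    fh, fw = feature_map.shape
    up = zoom(feature_map, (h / fh, w / fw), order=1)  # bilinear resize
    return up > thresh_ratio * up.max()  # boolean segmentation mask
```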

SLIDE 24: IDENTIFYING THE SEMANTICS

Workers provide tags without being constrained to a dictionary of terms or to pre-calculated segments. Three tasks:
(1) identify the concept;
(2) mark the set of images that do not fall into this theme;
(3) categorize the concept provided in (1) into one of six semantic groups ranging from low-level to high-level: from simple elements and colors to object parts and even scenes.
For each unit, precision is measured as the percentage of images that were selected as fitting the labeled concept.
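In formula form, the precision of a unit u over the top-ranked images shown to the workers is simply:

```latex
\mathrm{precision}(u) \;=\; \frac{\#\{\text{top images marked as fitting the labeled concept}\}}{\#\{\text{top images shown for unit } u\}}
```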

SLIDE 25: ANNOTATING THE SEMANTICS OF UNITS

Top-ranked segmented images are cropped and sent to Amazon Mechanical Turk (AMT) for annotation.

SLIDE 26: ANNOTATING THE SEMANTICS OF UNITS

Pool5, unit 76; label: ocean; type: scene; precision: 93%

SLIDE 27: ANNOTATING THE SEMANTICS OF UNITS

Pool5, unit 13; label: lamps; type: object; precision: 84%

SLIDE 28: ANNOTATING THE SEMANTICS OF UNITS

Pool5, unit 77; label: legs; type: object part; precision: 96%

SLIDE 29: ANNOTATING THE SEMANTICS OF UNITS

Pool5, unit 112; label: pool table; type: object; precision: 70%

SLIDE 30: ANNOTATING THE SEMANTICS OF UNITS

Pool5, unit 22; label: dinner table; type: scene; precision: 60%

SLIDE 31: IDENTIFYING THE SEMANTICS

Only units with a precision above 75%, as judged by the AMT workers, are considered; around 60% of the units on each layer were above that threshold. For both networks, units at the early layers (pool1, pool2) are more responsive to simple elements and colors, while those at later layers (conv4, pool5) carry more high-level semantics, responding more to objects and scenes.

SLIDE 32: DISTRIBUTION OF SEMANTIC TYPES AT EACH LAYER

Object detectors emerge within a CNN trained to classify scenes, without any object supervision!

Slide credit: Bolei Zhou

SLIDE 33: WHAT OBJECT CLASSES EMERGE?

The SUN database is used because dense object annotations are needed to study which object classes are most informative for scene categorization and what the natural object frequencies in scene images are. The segmentation shows the regions of the image for which the unit is above a certain threshold. Each unit appears to be selective to a particular appearance of the object.

SLIDE 34
SLIDE 35: WHAT OBJECT CLASSES EMERGE?

§ The categories found in pool5 tend to follow the target categories in ImageNet.
§ There are 6 units that detect lamps, each unit detecting a particular type of lamp, providing finer-grained discrimination.
§ There are 9 units selective to people, each one tuned to different scales or to people doing different tasks.

SLIDE 36: AND WHY?

§ The correlation between object frequency in the database and object frequency discovered by the units in pool5 is 0.54.
§ The correlation between the number of scene categories for which a particular object class is most useful and the discovered object classes is 0.84.
§ Also, there are 115 units not detecting objects, so we cannot rule out other representations being used in combination with objects.

SLIDE 37: EVALUATION ON SUN DATABASE

Figure annotations: correlation 0.53 and correlation 0.84.

SLIDE 38: EVALUATION ON SUN DATABASE

Evaluate the performance of the emerged object detectors.

SLIDE 39: CONCLUSION

§ Object detection emerges inside a CNN trained to recognize scenes.
§ Many more objects can be naturally discovered in a supervised setting tuned to scene classification rather than object classification.
§ Object localization is obtained in a single forward pass.
§ Besides taking the output of the last layer as a feature, the inner layers can show different levels of abstraction.

SLIDE 40: FUTURE DIRECTION

§ In what other tasks can we learn object classes or discriminative object detectors?
§ Can we pop up all object classes?

SLIDE 41: QUESTIONS

Thank you.