6.870 Object Recognition and Scene Understanding
student presentation
MIT
Nicolas Pinto
Antonio T...
(who knows a lot about vision)
a frog...
(who has big eyes) and thus should know a lot about vision...
a guy...
(who has big arms)
Object Recognition from Local Scale-Invariant Features
David G. Lowe, Computer Science Department, University of British Columbia, Vancouver, B.C., V6T 1Z4, Canada, lowe@cs.ubc.ca
Abstract: An object recognition system has been developed that uses a new class of local image features. The features are invariant to image translation, scaling, and rotation, and partially invariant to illumination changes and affine or 3D projection. Previous approaches to local feature generation lacked invariance to scale and were more sensitive to projective distortion and illumination change. The SIFT features share a number of properties in common with the responses of neurons in infe...
Lowe (1999)
Histograms of Oriented Gradients for Human Detection
Navneet Dalal and Bill Triggs, INRIA Rhône-Alpes
Dalal and Triggs (2005)
3 papers
A Discriminatively Trained, Multiscale, Deformable Part Model
Pedro Felzenszwalb, University of Chicago, pff@cs.uchicago.edu; David McAllester, Toyota Technological Institute at Chicago, mcallester@tti-c.org; Deva Ramanan, UC Irvine, dramanan@ics.uci.edu
Abstract: This paper describes a discriminatively trained, multiscale, deformable part model for object detection. Our system achieves a two-fold improvement in average precision...
Felzenszwalb et al. (2008)
yay!!
Scale-Invariant Feature Transform (SIFT)
adapted from Kucuktunc
adapted from Brown, ICCV 2003
SIFT local features are invariant...
adapted from David Lee
like me, they are robust...
... to changes in illumination, noise, viewpoint, occlusion, etc.
I am sure you want to know
how to build them
keypoints are taken as maxima/minima
in this setting, extrema are invariant to scale...
a DoG (Difference of Gaussians) pyramid is simple to compute...
even he can do it! (before / after)
adapted from Pallus and Fleishman
then we just have to find
neighborhood extrema
in this 3D DoG space
if a pixel is an extremum in its neighboring region, it becomes a candidate
keypoint
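The two steps above can be sketched in a few lines of Python. This is not Lowe's implementation, just an illustrative toy: it assumes you already have a stack of Gaussian-blurred layers (images as nested lists), subtracts adjacent layers to get the DoG pyramid, and marks a pixel as a candidate keypoint when it is a strict max or min over its 26 neighbors in (x, y, scale):

```python
def dog_layers(gaussian_layers):
    """Difference-of-Gaussians: subtract each blur level from the next."""
    return [
        [[b[y][x] - a[y][x] for x in range(len(a[0]))] for y in range(len(a))]
        for a, b in zip(gaussian_layers, gaussian_layers[1:])
    ]

def is_extremum(dog, s, y, x):
    """True if pixel (y, x) at scale s is a strict max or min over its
    26 neighbours in the 3D (x, y, scale) DoG space."""
    v = dog[s][y][x]
    neigh = [
        dog[s + ds][y + dy][x + dx]
        for ds in (-1, 0, 1) for dy in (-1, 0, 1) for dx in (-1, 0, 1)
        if (ds, dy, dx) != (0, 0, 0)
    ]
    return v > max(neigh) or v < min(neigh)
```

A real implementation would also build the Gaussian blurs themselves and handle image borders; those details are skipped here.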
too many keypoints?
low contrast
edges
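Pruning these two kinds of bad candidates can be sketched as below. Note the thresholds (0.03 contrast, edge ratio r = 10) come from Lowe's later 2004 journal version of SIFT, not necessarily this 1999 paper, and the Hessian is assumed precomputed; treat it as an illustrative filter, not the exact published procedure:

```python
def keep_keypoint(dog_value, hessian, contrast_thresh=0.03, edge_ratio=10.0):
    """Reject low-contrast candidates, then reject edge responses using
    the 2x2 Hessian [[dxx, dxy], [dxy, dyy]] of the DoG at the point:
    a large trace^2/det ratio means the point sits on an edge and is
    poorly localised."""
    if abs(dog_value) < contrast_thresh:
        return False
    (dxx, dxy), (_, dyy) = hessian
    tr, det = dxx + dyy, dxx * dyy - dxy * dxy
    if det <= 0 or tr * tr / det > (edge_ratio + 1) ** 2 / edge_ratio:
        return False
    return True
```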
adapted from Wikipedia
each selected keypoint is assigned one or more “dominant” orientations... ...this step is important to achieve rotation invariance
using the DoG pyramid to achieve scale invariance:
magnitude and orientation
* the peak ;-)
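A minimal sketch of this step: gradient magnitude and orientation from centered differences, then a magnitude-weighted 36-bin orientation histogram whose peak is taken as the dominant orientation. Real SIFT also Gaussian-weights the votes and interpolates the peak; both refinements are omitted here:

```python
import math

def grad(img, y, x):
    """Centred finite differences -> gradient magnitude and orientation."""
    dx = img[y][x + 1] - img[y][x - 1]
    dy = img[y + 1][x] - img[y - 1][x]
    mag = math.hypot(dx, dy)
    ori = math.atan2(dy, dx) % (2 * math.pi)   # in [0, 2*pi)
    return mag, ori

def dominant_orientation(img, ys, xs, nbins=36):
    """Magnitude-weighted orientation histogram over a neighbourhood;
    the peak bin gives the keypoint's dominant orientation (bin centre)."""
    hist = [0.0] * nbins
    for y in ys:
        for x in xs:
            mag, ori = grad(img, y, x)
            hist[int(ori / (2 * math.pi) * nbins) % nbins] += mag
    peak = max(range(nbins), key=lambda b: hist[b])
    return (peak + 0.5) * 2 * math.pi / nbins
```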
SIFT descriptor = a set of orientation histograms
16x16 neighborhood -> 4x4 array x 8 bins = 128 dimensions (normalized)
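The descriptor layout can be sketched directly from those numbers. This toy version takes precomputed gradient magnitudes and orientations over the 16x16 neighborhood; it skips the Gaussian weighting and trilinear interpolation that real SIFT uses, so it only illustrates the 4x4-cells-times-8-bins structure:

```python
import math

def sift_descriptor(mags, oris):
    """128-D SIFT-style descriptor: a 16x16 neighbourhood is split into
    4x4 cells, each accumulating an 8-bin orientation histogram
    weighted by gradient magnitude; the result is L2-normalised."""
    desc = [0.0] * 128
    for y in range(16):
        for x in range(16):
            cell = (y // 4) * 4 + (x // 4)              # which of the 4x4 cells
            b = int(oris[y][x] / (2 * math.pi) * 8) % 8  # which of the 8 bins
            desc[cell * 8 + b] += mags[y][x]
    norm = math.sqrt(sum(v * v for v in desc)) or 1.0
    return [v / norm for v in desc]
```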
How to match?
nearest neighbor, Hough transform voting, least-squares fit, etc.
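The nearest-neighbor option might look like this. The distance-ratio test (accept only if the best match is clearly closer than the second best) is the heuristic Lowe popularized; the 0.8 threshold here is illustrative, not a value taken from this paper:

```python
def euclid(a, b):
    """Euclidean distance between two descriptor vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def match(query, candidates, ratio=0.8):
    """Return the index of the nearest candidate descriptor, or None if
    the match is ambiguous (best distance not clearly below ratio *
    second-best distance)."""
    order = sorted(range(len(candidates)), key=lambda i: euclid(query, candidates[i]))
    best, second = order[0], order[1]
    if euclid(query, candidates[best]) < ratio * euclid(query, candidates[second]):
        return best
    return None
```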
SIFT is great!
\\ invariant to affine transformations \\ easy to understand \\ fast to compute
Extension example: Spatial Pyramid Matching using SIFT
Beyond Bags of Features: Spatial Pyramid Matching for Recognizing Natural Scene Categories
Svetlana Lazebnik (slazebni@uiuc.edu), Beckman Institute, University of Illinois
Cordelia Schmid (Cordelia.Schmid@inrialpes.fr), INRIA Rhône-Alpes, Montbonnot, France
Jean Ponce (ponce@cs.uiuc.edu), École Normale Supérieure, Paris, France
CVPR 2006
Histograms of Oriented Gradients for Human Detection
Navneet Dalal and Bill Triggs, INRIA Rhône-Alpes
{Navneet.Dalal,Bill.Triggs}@inrialpes.fr, http://lear.inrialpes.fr
Abstract
We study the question of feature sets for robust visual object recognition, adopting linear SVM based human detection as a test case. After reviewing existing edge and gradient based descriptors, we show experimentally that grids of Histograms of Oriented Gradient (HOG) descriptors significantly outperform existing feature sets for human detection. ... high-quality local contrast normalization in overlapping descriptor blocks are all important for good results. We briefly discuss previous work on human detection in §2, give an overview of our method in §3, describe our data sets in §4 and give a detailed description and experimental evaluation of each stage of the process in §5-6. The main conclusions are summarized in §7.
2 Previous Work
There is an extensive literature on object detection, but here we mention just a few relevant papers on human detection [18,17,22,16,20]. See [6] for a survey. Papageorgiou et al. [18] describe a pedestrian detector based on a polynomial SVM using rectified Haar wavelets as input descriptors, with...
first of all, let me put this paper in context
histograms of local image measurements have been quite successful
Swain & Ballard 1991 - Color Histograms
Schiele & Crowley 1996 - Receptive Field Histograms
Lowe 1999 - SIFT
Schneiderman & Kanade 2000 - Localized Histograms of Wavelets
Leung & Malik 2001 - Texton Histograms
Belongie et al. 2002 - Shape Context
Dalal & Triggs 2005 - Dense Orientation Histograms
...
tons of “feature sets” have been proposed
Gavrila & Philomen 1999 - Edge Templates + Nearest Neighbor
Papageorgiou & Poggio 2000, Mohan et al. 2001, DePoortere et al. 2002 - Haar Wavelets + SVM
Viola & Jones 2001 - Rectangular Differential Features + AdaBoost
Mikolajczyk et al. 2004 - Parts Based Histograms + AdaBoost
Ke & Sukthankar 2004 - PCA-SIFT
...
localizing humans in images is a challenging task...
difficult!
Wide variety of articulated poses
Variable appearance/clothing
Complex backgrounds
Unconstrained illumination
Occlusions
Different scales
...
masks: uncentered, centered, cubic-corrected, diagonal, Sobel
* the centered mask performs the best *
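The "centered" winner is just the 1-D mask [-1, 0, 1]. A tiny pure-Python sketch of applying such derivative masks along a pixel row (valid positions only; a real implementation would run this over the whole image in x and y):

```python
MASKS = {
    "uncentered": [-1, 1],
    "centered": [-1, 0, 1],   # the scheme Dalal & Triggs found to work best
}

def filter1d(row, mask):
    """Apply a 1-D derivative mask along a row of pixel values,
    keeping only positions where the mask fits entirely."""
    k = len(mask)
    return [sum(m * row[i + j] for j, m in enumerate(mask))
            for i in range(len(row) - k + 1)]
```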
remember SIFT ?
...after filtering, each “pixel” represents an oriented gradient...
...pixels are regrouped into “cells”, and they cast a weighted vote for an orientation histogram...
HOG (Histogram of Oriented Gradients)
a window can be represented like this
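The voting step can be sketched like this: every pixel in a cell votes into a 9-bin histogram over 0-180° (unsigned gradients), weighted by its gradient magnitude. Nine bins over 180° matches the paper's best setting, but the bilinear interpolation of votes between neighboring bins that the real descriptor uses is omitted here:

```python
def cell_histogram(mags, oris_deg, nbins=9):
    """Magnitude-weighted orientation histogram for one cell.
    Orientations are folded into [0, 180) degrees (unsigned gradients)."""
    hist = [0.0] * nbins
    for mag, ori in zip(mags, oris_deg):
        hist[int((ori % 180.0) / 180.0 * nbins) % nbins] += mag
    return hist
```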
then, cells are locally normalized using overlapping “blocks”
they used two types of blocks
and four different types of block normalization
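One of those four normalization schemes (plain L2-norm) might be sketched as follows; the epsilon guard against empty blocks is an implementation convenience, and the other schemes (L1-norm, L1-sqrt, L2-Hys) are not shown:

```python
def l2_normalize(block, eps=1e-5):
    """L2-norm block normalisation: divide the concatenated cell
    histograms of one block by their overall L2 length (eps keeps the
    division safe when the block is all zeros)."""
    norm = (sum(v * v for v in block) + eps * eps) ** 0.5
    return [v / norm for v in block]
```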
like SIFT, they gain invariance... ...to illumination changes, small deformations, etc.
finally, a sliding window is classified by a simple linear SVM
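The final stage reduces to a dot product per window. This sketch assumes the HOG feature vector of each window position has already been computed; the learned weights `w` and bias `b` here are placeholders, not values from the paper:

```python
def svm_score(w, x, b):
    """Linear SVM decision value for one window's feature vector."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def detect(window_features, w, b, thresh=0.0):
    """Classify every sliding-window position; return the indices of
    windows whose score clears the threshold."""
    return [i for i, x in enumerate(window_features)
            if svm_score(w, x, b) > thresh]
```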
during the learning phase, the algorithm “looked” for hard examples
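That hard-example search ("bootstrapping") is simple to state: rescan the person-free training images with the current detector and add whatever it wrongly fires on to the negative set before retraining. A minimal sketch, with `score` standing in for the current detector:

```python
def mine_hard_negatives(score, negative_windows, thresh=0.0):
    """Hard-negative mining: keep the negative windows the current
    detector wrongly scores above threshold (false positives); these
    become extra training negatives for the next SVM round."""
    return [x for x in negative_windows if score(x) > thresh]
```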
Training
adapted from Martial Hebert
average gradients, positive weights, negative weights
Figure 3. The performance of selected detectors on (left) MIT and (right) INRIA data sets. See the text for details.
90% @ 1e-5 FPPW
[DET curves (a)-(f), e.g. effect of overlap (cell size=8, num cell = 2x2, wt=0): miss rate vs. false positives per window (FPPW)]
Figure 4. For details see the text. (a) Using fine derivative scale significantly increases the performance. (‘c-cor’ is the 1D cubic-corrected point derivative). (b) Increasing the number of orientation bins increases performance significantly up to about 9 bins spaced over 0°–180°. (c) The effect of different block normalization schemes (see §6.4). (d) Using overlapping descriptor blocks decreases the miss rate by around 5%. (e) Reducing the 16 pixel margin around the 64×128 detection window decreases the performance by about 3%. (f) Using a Gaussian kernel SVM, exp(−γ‖x1 − x2‖²), improves the performance by about 3%.
Figure 5. The miss rate at 10⁻⁴ FPPW as the cell size (4x4 to 12x12 pixels) and block size (1x1 to 4x4 cells) vary. 3×3 blocks of 6×6 pixel cells perform best, with 10.4% miss rate.
Extension example:
Pyramid HoG++
A simple demo...
VIDEO HERE
so, it doesn’t work?!? no no, it works... ...it just doesn’t work well...
This paper describes one
They used the following methods:
HOG Features
Deformable Part Model
Latent SVM
HOG Features
Introduced by Dalal & Triggs (2005)
Deformable Part Model
Introduced by Fischler & Elschlager (1973)
Latent SVM
Introduced by the authors
HOG Features
Model Overview
detection: root filter, part filters, deformation models
HOG Features
// 8x8 pixel blocks window // features computed at different resolutions (pyramid)
HOG Pyramid
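A feature pyramid is just the same image represented at successively coarser scales, so that part filters can be evaluated at twice the root filter's resolution. This toy sketch downsamples by averaging 2x2 pixel blocks, which is a stand-in for the paper's actual resampling, not a reproduction of it:

```python
def downsample(img):
    """Halve resolution by averaging each 2x2 block of pixels."""
    h, w = len(img) // 2, len(img[0]) // 2
    return [[(img[2 * y][2 * x] + img[2 * y][2 * x + 1]
              + img[2 * y + 1][2 * x] + img[2 * y + 1][2 * x + 1]) / 4.0
             for x in range(w)] for y in range(h)]

def pyramid(img, levels):
    """Image pyramid: the input plus successively coarser versions."""
    out = [img]
    for _ in range(levels - 1):
        out.append(downsample(out[-1]))
    return out
```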
Deformable Part Model
// each part is a local property // springs capture spatial relationships // here, the springs can be “negative”
root filter, part filters, deformable model
detection score =
sum of filter responses - deformation cost

score of a placement (p0, ..., pn):
score = Σi Fi · φ(H, pi) - Σi di · (dxi, dyi, dxi², dyi²)
where the Fi are the filters, φ(H, pi) is the feature vector at position pi in the pyramid H, (dxi, dyi) is part i's position relative to the root location, and di gives the coefficients of a quadratic function on the placement
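Numerically, that score is easy to sketch. This toy version assumes the filter responses are already computed and scores a single placement; the real system maximizes over all part placements with dynamic programming, which is not shown. The coefficient ordering follows the quadratic deformation cost d · (dx, dy, dx², dy²):

```python
def dpm_score(root_response, part_responses, displacements, deform_coeffs):
    """DPM score of one placement: root filter response plus each part
    filter's response, minus a quadratic deformation cost on that
    part's displacement (dx, dy) from its anchor position."""
    score = root_response
    for resp, (dx, dy), (a, b, c, d) in zip(part_responses, displacements,
                                            deform_coeffs):
        score += resp - (a * dx + b * dy + c * dx * dx + d * dy * dy)
    return score
```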
Latent SVM
fβ(x) = max over z of β · Φ(x, z)
where β holds the filters and deformation parameters, Φ(x, z) the features, and z the part displacements
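The latent scoring rule itself is one line: take the best dot product over all latent choices. In this sketch each latent value z (a candidate set of part displacements) is represented simply by its precomputed feature vector Φ(x, z); how those vectors are built is the deformable-model machinery above:

```python
def latent_svm_score(beta, placements):
    """f_beta(x) = max over latent z of beta . Phi(x, z); `placements`
    is a list of precomputed feature vectors, one per latent value."""
    return max(sum(b * f for b, f in zip(beta, phi)) for phi in placements)
```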
Bonus
// Data Mining Hard Negatives // Model Initialization
Results
Pascal VOC 2006
Results
Models learned
Experiments ~ Dalal’s model ~ Dalal’s + LSVM
Examples
errors
A simple demo...
Conclusions
so, it doesn’t work?!? no no, it works... ...it just doesn’t work well... ...or there is a problem with the seat-computer interface...