Object Recognition - SIFT vs Convolutional Neural Networks

Department of Informatics, Intelligent Robotics, WS 2015/16
23.11.2015
Object Recognition - SIFT vs Convolutional Neural Networks
Josip Josifovski
4josifov@informatik.uni-hamburg.de

Outline
● Object recognition:
  ● Definition, problem, human vision system and machine vision
● Scale Invariant Feature Transform (SIFT)
  ● Algorithm details and example
● Convolutional Neural Networks (CNNs)
  ● Algorithm details and example
● Comparison of SIFT and CNN
  ● Biological plausibility, complexity, resources and applicability
● Summary
23.11.2015 Object recognition - SIFT vs CNNs 2

Object recognition - Definition
"The term recognition has been used to refer to many different visual capabilities, including identification, categorization and discrimination. Normally, when we speak of recognizing an object we mean that we have successfully categorized [it] as an instance of a particular object class."
Liter, Jeffrey C., and Heinrich H. Bülthoff. "An introduction to object recognition." Zeitschrift für Naturforschung C 53.7-8 (1998): 610-621.
● Identification – equality on a physical level
● Categorization – assigning an object to some category, as humans do
● Discrimination – classification, assigning an object to one class

Object recognition – Problem
http://www.kyb.tuebingen.mpg.de/typo3temp/pics/915b4f5fb5.jpg

How do humans do it?
Easy task for a human
● Two pathways for processing of visual input in the brain:
  ● Ventral pathway
  ● Dorsal pathway
● Hierarchical processing in the cortex:
  ● Increasing receptive fields
  ● Increasing complexity of details
Kruger, Norbert, et al. "Deep hierarchies in the primate visual cortex: What can we learn for computer vision?" IEEE Transactions on Pattern Analysis and Machine Intelligence 35.8 (2013): 1847-1871.

How do machines do it?
Hard task for a machine
● Different transformations, distortions, scene conditions, viewing angles
  http://manual.qooxdoo.org/2.0.4/_images/Transform.png
  http://starizona.com/acb/basics/optics/distortion.jpg
● Most often, recognition is done by extracting local features of an object and trying to match them with the features of an unknown object

Scale Invariant Feature Transform
● Published by David G. Lowe in 1999
● Invariant to scaling, rotation and translation
● Partially invariant to illumination changes and to affine or 3D projection
● Transforms an image into a large collection of local feature vectors (local descriptors called SIFT keys)
● Patented – University of British Columbia
Lowe, David G. "Object recognition from local scale-invariant features." Proceedings of the Seventh IEEE International Conference on Computer Vision. Vol. 2. IEEE, 1999.

SIFT steps
1) Scale-space extrema detection
● Convolving the image with a Gaussian kernel repeatedly to get more and more blurred versions of the image
● Calculating the difference image (DoG) as an approximation to the Laplacian of Gaussian (LoG)
  http://docs.opencv.org/master/sift_dog.jpg
2) Key-point localization
● Finding the extrema (maxima or minima) at each level of the pyramid
● Comparing each extremum to the layers above and below to check if it is stable
  http://docs.opencv.org/master/sift_local_extrema.jpg
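Steps 1 and 2 above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not Lowe's implementation: the function names, the sigma values, the clamped image borders and the test image are all made up for the example, there is no down-sampling between octaves, and stability is checked with the commonly used 26-neighbour comparison (8 neighbours in the same DoG level plus 9 in the level above and 9 in the level below).

```python
import math

def gaussian_kernel(sigma):
    """1-D Gaussian kernel, normalised so that the weights sum to 1."""
    radius = max(1, int(math.ceil(3 * sigma)))
    weights = [math.exp(-(i * i) / (2 * sigma * sigma))
               for i in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def blur(image, sigma):
    """Separable Gaussian blur; pixels outside the image repeat the border."""
    kernel = gaussian_kernel(sigma)
    radius = len(kernel) // 2
    height, width = len(image), len(image[0])

    def clamp(v, size):
        return min(max(v, 0), size - 1)

    # Horizontal pass, then vertical pass.
    rows = [[sum(kernel[k + radius] * image[y][clamp(x + k, width)]
                 for k in range(-radius, radius + 1))
             for x in range(width)] for y in range(height)]
    return [[sum(kernel[k + radius] * rows[clamp(y + k, height)][x]
                 for k in range(-radius, radius + 1))
             for x in range(width)] for y in range(height)]

def difference_of_gaussians(image, sigmas):
    """Blur with increasing sigma and subtract neighbouring blur levels."""
    blurred = [blur(image, s) for s in sigmas]
    return [[[b[y][x] - a[y][x] for x in range(len(image[0]))]
             for y in range(len(image))]
            for a, b in zip(blurred, blurred[1:])]

def scale_space_extrema(dog):
    """Return (level, y, x) of values strictly above or below all 26 neighbours."""
    found = []
    height, width = len(dog[0]), len(dog[0][0])
    for s in range(1, len(dog) - 1):
        for y in range(1, height - 1):
            for x in range(1, width - 1):
                centre = dog[s][y][x]
                neighbours = [dog[s + ds][y + dy][x + dx]
                              for ds in (-1, 0, 1)
                              for dy in (-1, 0, 1)
                              for dx in (-1, 0, 1)
                              if (ds, dy, dx) != (0, 0, 0)]
                if centre > max(neighbours) or centre < min(neighbours):
                    found.append((s, y, x))
    return found

# Hypothetical test image: a bright 4x4 square on a dark 16x16 background.
image = [[1.0 if 6 <= y < 10 and 6 <= x < 10 else 0.0 for x in range(16)]
         for y in range(16)]
dog = difference_of_gaussians(image, sigmas=[1.0, 1.6, 2.56, 4.1])
print(scale_space_extrema(dog))
```

A full implementation would additionally refine each candidate position and reject low-contrast or edge-like candidates; those steps are left out here.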

SIFT steps (cont.)
3) Orientation assignment
● Calculation of gradient magnitude and orientation at each pixel of the smoothed images in the pyramid
● Determining each key-point's orientation by calculating the orientation histogram of its neighborhood
4) Description generation
● Consider an 8-pixel radius (16x16) around a key-point in the pyramid level at which the key is detected
● Calculate an 8-bin orientation histogram for each 4x4 region. The descriptor is the 128-dimensional vector containing the histogram values of the 16 regions.
  http://www.codeproject.com/KB/recipes/619039/SIFT.JPG
5) Indexing and matching
● Creating a hash table (dictionary) with descriptors of sample images
● Descriptors extracted from a new image are matched to the ones from the dictionary to recognize objects
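The arithmetic of step 4 (16 regions of 4x4 pixels, 8 orientation bins each, 16 x 8 = 128 values) and the lookup of step 5 can be sketched as follows. This is a simplified illustration, not the published algorithm: the function names and the ramp test patches are hypothetical, the patch is not rotated to the key-point orientation, there is no Gaussian weighting or interpolation between bins, and a brute-force nearest-neighbour search stands in for the hash-table index mentioned on the slide.

```python
import math

def gradients(patch):
    """Central-difference gradient magnitude and orientation for every pixel."""
    size = len(patch)
    mags = [[0.0] * size for _ in range(size)]
    oris = [[0.0] * size for _ in range(size)]
    for y in range(size):
        for x in range(size):
            dx = patch[y][min(x + 1, size - 1)] - patch[y][max(x - 1, 0)]
            dy = patch[min(y + 1, size - 1)][x] - patch[max(y - 1, 0)][x]
            mags[y][x] = math.hypot(dx, dy)
            oris[y][x] = math.atan2(dy, dx) % (2 * math.pi)  # in [0, 2*pi]
    return mags, oris

def sift_like_descriptor(patch, cells=4, bins=8):
    """Square 16x16 patch -> 4x4 regions x 8 bins = 128 values, L2-normalised."""
    size = len(patch)
    cell = size // cells
    mags, oris = gradients(patch)
    hist = [0.0] * (cells * cells * bins)
    for y in range(size):
        for x in range(size):
            b = int(oris[y][x] / (2 * math.pi) * bins) % bins
            c = (y // cell) * cells + (x // cell)
            hist[c * bins + b] += mags[y][x]  # vote weighted by magnitude
    norm = math.sqrt(sum(v * v for v in hist))
    return [v / norm for v in hist] if norm > 0 else hist

def match(query, dictionary):
    """Label of the stored descriptor closest to the query (Euclidean distance)."""
    def distance(label):
        return math.sqrt(sum((q - d) ** 2
                             for q, d in zip(query, dictionary[label])))
    return min(dictionary, key=distance)

# Hypothetical sample patches: intensity ramps in x and in y.
horizontal = [[float(x) for x in range(16)] for _ in range(16)]
vertical = [[float(y) for _ in range(16)] for y in range(16)]
dictionary = {"horizontal": sift_like_descriptor(horizontal),
              "vertical": sift_like_descriptor(vertical)}

# A brighter, higher-contrast copy of the horizontal ramp as the "new image".
query = sift_like_descriptor([[3.0 * p for p in row] for row in horizontal])
print(len(query), match(query, dictionary))
```

Normalising the histogram to unit length is what makes the sketch insensitive to the contrast change in the query patch, which corresponds to the partial illumination invariance listed earlier.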

SIFT - Example
https://www.youtube.com/watch?v=3dY4uvSwiwE

Convolutional Neural Network (CNN)
● Follows the principles of visual processing in the brain
● Basic idea introduced by Fukushima in the 1980s
● Improved by Yann LeCun; the most popular model is LeNet
● Convolutional neural networks have recently become very popular in image and video processing
Yann LeCun, Bernhard Boser, John S. Denker, Donnie Henderson, Richard E. Howard, Wayne Hubbard, and Lawrence D. Jackel. "Backpropagation applied to handwritten zip code recognition." Neural Computation 1(4):541-551, 1989.

CNN – the architecture
● Basic principles:
  ● local receptive fields
  ● weight sharing
  ● subsampling
● Layer types:
  ● input layer
  ● convolutional layer
  ● subsampling layer
  ● output layer
● Training:
  ● Backpropagation
  ● Adaptive weights
Yann LeCun, Bernhard Boser, John S. Denker, Donnie Henderson, Richard E. Howard, Wayne Hubbard, and Lawrence D. Jackel. "Backpropagation applied to handwritten zip code recognition." Neural Computation 1(4):541-551, 1989.
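The three basic principles above can be made concrete with a minimal plain-Python sketch. It is an assumption-laden illustration rather than LeNet itself: the function names, the hand-set edge kernel and the toy image are invented for the example, average pooling is assumed as the subsampling operation, and the non-linearity and the backpropagation training of the weights are omitted.

```python
def convolve_valid(image, kernel):
    """Slide ONE shared kernel over the image (stride 1, no padding).

    Each output value only sees a kernel-sized window of the input
    (local receptive field), and every window uses the same weights
    (weight sharing).
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [[sum(kernel[i][j] * image[y + i][x + j]
                 for i in range(kh) for j in range(kw))
             for x in range(out_w)] for y in range(out_h)]

def subsample(feature_map, size=2):
    """Average over non-overlapping size x size blocks (subsampling layer)."""
    out_h = len(feature_map) // size
    out_w = len(feature_map[0]) // size
    return [[sum(feature_map[y * size + i][x * size + j]
                 for i in range(size) for j in range(size)) / (size * size)
             for x in range(out_w)] for y in range(out_h)]

# Hypothetical 6x6 input: dark left half, bright right half.
image = [[0, 0, 0, 1, 1, 1] for _ in range(6)]

# Hand-set kernel that responds to dark-to-bright vertical edges.
# In a trained CNN these weights would be learned by backpropagation.
edge_kernel = [[-1, 1],
               [-1, 1]]

feature_map = convolve_valid(image, edge_kernel)  # 5x5 feature map
pooled = subsample(feature_map)                   # 2x2 after subsampling
print(feature_map)
print(pooled)
```

The kernel has only four weights regardless of the image size, which is the practical effect of weight sharing; the subsampling step then reduces the resolution of the feature map while keeping the information that an edge was present.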

CNN – features and feature maps
● Different feature extractors (filters) emerge at different layers during the training of the network
● Low layer features: lines, contrast, color
● Medium layer features: corners or other edge/color conjunctions, textures
● High layer features: more complex, class specific
[Figure: examples of a low-level, a medium-level and a high-level feature]
Zeiler, Matthew D., and Rob Fergus. "Visualizing and understanding convolutional networks." Computer Vision – ECCV 2014. Springer International Publishing, 2014. 818-833.

CNN – Example
1) LeNet 5
   http://yann.lecun.com/exdb/lenet/index.html
2) ImageNet 2014: Interface for comparing human performance with the winner GoogLeNet
   http://cs.stanford.edu/people/karpathy/ilsvrc/

Comparison of SIFT and CNN
Biological plausibility: Since the most sophisticated vision system is the human one, the intuition is to understand it and apply its elements in computer vision
CNN
● It is a neural network model; it has been inspired by the way the brain works.
● The way of feature extraction and generalization from simple to complex is much more similar to the processing in the human visual system
SIFT
● Neurons in the inferior temporal cortex that respond to complex, scale-invariant features
● The feature extraction and learning process of SIFT is very different from the processing in the human brain

Comparison of SIFT and CNN (cont.)
Complexity and demand for resources: Design complexity, processing power and memory demands, training set, speed of output
CNN
● Needs experience to make design decisions
● High demand for processing during the training phase, memory needed to store the weights of the network
● The bigger the training set the better
● Slower than SIFT
SIFT
● Simpler design and fewer parameters to set compared to CNN
● Less processing power needed, memory needed for storing features for each image
● Smaller training set
● Fast

Comparison of SIFT and CNN (cont.)
Applicability: Range of problems and scenarios in which SIFT and CNN can be applied.
CNN
● More relevant for classification and categorization tasks, has very good generalization abilities
● Currently a very popular model for image and video tasks
SIFT
● More relevant for identification tasks
● SIFT and SIFT-like descriptors are used in a wide range of vision tasks
● Can be used for real-time scenarios

CNN and SIFT - Pros & Cons
SIFT
  Good:
  ● Identification tasks
  ● Simple to implement
  ● Fast
  Bad:
  ● Poor generalization
  ● Not robust to non-linear transformations
CNN
  Good:
  ● Classification tasks
  ● Strongly bio-inspired
  ● Very good generalization
  Bad:
  ● Lots of processing power
  ● Big training datasets
  ● Parameters to set

Questions?
Thank you for your attention
