PERCEPTION Damien Blond Alim Fazal Tory Richard April 11th, 2000

Outline 9.1: Introduction 9.2: Image Formation 9.3: Image Processing Operations for Early Vision 9.4: Extracting 3D Information using Vision 9.5: Using Vision for Manipulation and Navigation 9.6: Object Representation and Recognition 9.7: Summary

Introduction • Perception provides agents with information about the world they inhabit. • A sensor is anything that can change the computational state of the agent in response to a change in the state of the world. • The sensors that agents share with humans are vision, hearing, and touch.

Introduction • The main focus here is on processing the raw information that the sensors provide. • The sensory stimulus S is a function of the world W: $S = f(W)$ • To gain information about the world, the straightforward approach is to invert this equation: $W = f^{-1}(S)$

Introduction • A drawback of the straightforward approach is that it is trying to solve too difficult a problem. • In many cases, the agent does not need to know everything about the world. • Sometimes just one or two predicates are needed.

Introduction Some of the possible uses for vision: • Manipulation – Grasping and insertion need local shape information and feedback for motor control. • Navigation – Finding clear paths, avoiding obstacles, calculating one’s current velocity and orientation. • Object Recognition – A useful skill for distinguishing between multiple objects.

Outline 9.2: Image Formation Pinhole Camera Lens Systems Photometry of Image Formation

Image Formation • Vision works by gathering light scattered from objects in the scene and creating a 2-D image. • It’s important to understand the geometry of this process in order to obtain information about the scene.

Image Formation

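Image Formation Perspective Projection Equations: $-x/f = X/Z$, $-y/f = Y/Z$ $\Rightarrow$ $x = -fX/Z$, $y = -fY/Z$, where (X, Y, Z) is a point in the scene, (x, y) is its image, and f is the distance from the pinhole to the image plane.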

Image Formation • Perspective projection is often approximated by orthographic projection, but there is an important difference. • Orthographic projection does not pass the projection rays through a pinhole. • Instead, the rays run parallel to one another, either perpendicular to the image plane or at a consistent angle to it.
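
The difference between the two projections can be sketched in a few lines of Python. This is a minimal illustration, not part of the original slides: the function names are hypothetical, the scene point is given in camera coordinates (X, Y, Z) with f the pinhole-to-image-plane distance, and the orthographic case assumes rays perpendicular to the image plane.

```python
def perspective_project(point, f):
    """Pinhole projection: x = -f*X/Z, y = -f*Y/Z (image is inverted)."""
    X, Y, Z = point
    if Z == 0:
        raise ValueError("point lies in the plane of the pinhole (Z = 0)")
    return (-f * X / Z, -f * Y / Z)


def orthographic_project(point):
    """Orthographic projection with rays perpendicular to the image plane:
    the depth Z is simply dropped (an assumed, simplest-case variant)."""
    X, Y, _Z = point
    return (X, Y)


near = (2.0, 1.0, 4.0)   # a scene point 4 units from the pinhole
far = (2.0, 1.0, 8.0)    # the same point moved twice as far away

near_image = perspective_project(near, f=1.0)
far_image = perspective_project(far, f=1.0)
```

Under perspective projection the image coordinates shrink as Z grows, while the orthographic image of `near` and `far` is identical; that loss of distance dependence is the difference the slide refers to.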

Lens Systems • Both human and artificial eyes use a lens. • The lens is wider than a pinhole, so it lets in more light and collects more information. • The human eye focuses by changing the shape of the lens. • Artificial eyes focus by changing the distance between the lens and the image plane.

Photometry of Image Formation

Photometry of Image Formation • A processed image plane contains a brightness value for each pixel. • The brightness of a pixel p in the image is proportional to the amount of light directed toward the camera by the surface patch Sp that projects to pixel p. • The reflected light is characterized as either diffuse or specular.

Photometry of Image Formation • Diffuse reflection scatters light equally in all directions, and is common for dull surfaces. • It is described by the following equation, known as Lambert's formula: $E = \rho E_0 \cos\theta$, where $\rho$ is the coefficient of diffuse reflection (how much of the incoming light the surface scatters), $E_0$ is the intensity of the light source, and $\theta$ is the angle between the light direction and the surface normal.

Photometry of Image Formation • Specular reflection is described by Phong's formula: $E = \rho E_0 \cos^m\theta$ • $\rho$ is the coefficient of specular reflection • $E_0$ is the intensity of the light source • m is the 'shininess' of the surface • $\theta$ is the angle between the direction of mirror reflection and the direction to the viewer.

Photometry of Image Formation • In real life, surfaces exhibit a combination of diffuse and specular properties. • Modeling this on the computer is what computer graphics is all about. •Rendering realistic images is usually done by ray tracing.
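
The Lambert and Phong formulas above can be evaluated directly. The sketch below is illustrative only: the function names are hypothetical, angles are in radians, the clamp of negative cosines to zero (surface facing away) is an added assumption, and modelling a mixed surface as a plain sum of the two terms is one simple choice rather than anything the slides prescribe.

```python
import math


def lambert(rho, e0, theta):
    """Diffuse term E = rho * E0 * cos(theta).
    Negative cosines are clamped to 0 (assumption: no light from behind)."""
    return rho * e0 * max(0.0, math.cos(theta))


def phong(rho, e0, theta, m):
    """Specular term E = rho * E0 * cos(theta)**m, with m the shininess.
    theta is the angle used by Phong's formula; same clamping assumption."""
    return rho * e0 * max(0.0, math.cos(theta)) ** m


def surface_brightness(rho_d, rho_s, e0, theta_d, theta_s, m):
    """Hypothetical combination: diffuse plus specular contribution."""
    return lambert(rho_d, e0, theta_d) + phong(rho_s, e0, theta_s, m)


# A larger shininess m makes the specular term fall off faster with angle.
broad_highlight = phong(1.0, 1.0, math.pi / 3, 2)
tight_highlight = phong(1.0, 1.0, math.pi / 3, 10)
```

Here `tight_highlight` is much smaller than `broad_highlight`, which is why shiny surfaces show small, sharp highlights.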

Outline 9.3: Image Processing Operations for Early Vision Edge Detection

Image-Processing Operations Edge Detection • Edges are curves in the image plane across which there is a “significant” change in image brightness. • The goal of edge detection is the construction of an idealized line drawing

Image-Processing Operations •One idea to detect edges is to differentiate the image and look for places where the brightness undergoes a sharp change •Consider a 1-D example. Below is an intensity profile for a 1-D image.

Image-Processing Operations • Below we have the derivative of the previous graph. • The derivative has peaks at x=18, x=50 and x=75. • The true edge is at x=50; the other peaks are errors due to the presence of noise in the image.

Image-Processing Operations • This problem is countered by combining the differentiation operation with smoothing: the image is convolved with a smoothing function. • The mathematical concept of convolution allows us to perform many useful image-processing operations.

Image-Processing Operations • One standard form of smoothing is to use a Gaussian function. • Now using the idea of convolving with the Gaussian function we can revisit the 1-D example.

Image-Processing Operations • After convolving with the Gaussian, we can more easily see the edge at x=50. • Convolution lets us discover where edges are located, and this allows us to make an accurate line drawing.
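
The 1-D example can be reproduced in miniature. The sketch below is an assumption-laden illustration, not the data from the slides: the profile is synthetic (a gentle ramp edge centred at x=50, plus two single-pixel noise spikes at x=18 and x=75), the Gaussian uses sigma=3 truncated at three standard deviations, borders are handled by repeating the end samples, and all function names are hypothetical.

```python
import math


def gaussian_kernel(sigma):
    """Normalized, truncated Gaussian weights for offsets -3*sigma..3*sigma."""
    radius = int(3 * sigma)
    weights = [math.exp(-(k * k) / (2.0 * sigma * sigma))
               for k in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def convolve_1d(signal, kernel):
    """Convolve with a symmetric kernel, repeating the border samples."""
    radius = len(kernel) // 2
    n = len(signal)
    out = []
    for x in range(n):
        acc = 0.0
        for j, w in enumerate(kernel):
            idx = min(max(x + j - radius, 0), n - 1)  # clamp at the borders
            acc += w * signal[idx]
        out.append(acc)
    return out


def derivative(signal):
    """Central-difference derivative; the two end samples are set to 0."""
    n = len(signal)
    inner = [(signal[x + 1] - signal[x - 1]) / 2.0 for x in range(1, n - 1)]
    return [0.0] + inner + [0.0]


def strongest_rise(signal):
    """Index at which the derivative is largest."""
    d = derivative(signal)
    return max(range(len(d)), key=lambda x: d[x])


def make_profile():
    """Synthetic intensity profile: ramp edge at x=45..55 plus noise spikes."""
    profile = []
    for x in range(100):
        if x <= 45:
            profile.append(1.0)
        elif x >= 55:
            profile.append(2.0)
        else:
            profile.append(1.0 + 0.1 * (x - 45))
    profile[18] += 0.4  # noise spike
    profile[75] += 0.3  # noise spike
    return profile


profile = make_profile()
raw_peak = strongest_rise(profile)
smoothed_peak = strongest_rise(convolve_1d(profile, gaussian_kernel(3.0)))
```

On this profile the raw derivative is largest next to the noise spike at x=18, while the derivative of the smoothed profile is largest at the true edge, x=50.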

Image-Processing Operations • Here is an example of using convolution on a 2-D picture of the Mona Lisa.

Outline 9.4: Extracting 3D Information using Vision Motion Binocular Stereopsis Texture Gradient Shading Contour

Extracting 3-D Information Using Vision • We need to extract 3-D information to perform tasks such as manipulation, navigation, and recognition. • Three aspects: 1. Segmentation 2. Position & Orientation 3. Shape • To recover 3-D information there are a number of cues available, including motion, binocular stereopsis, texture, shading and contour.

Extracting 3-D Information Using Vision Motion • Optical Flow - the apparent motion in the image that results when a camera moves relative to the 3-D scene.

Extracting 3-D Information Using Vision • To measure Optical Flow, we need to find corresponding points between one time frame and the next. • One formula is the Sum of Squared Differences (SSD): $\mathrm{SSD}(D_x, D_y) = \sum_{(x,y)} \left( I(x, y, t) - I(x + D_x, y + D_y, t + D_t) \right)^2$

Extracting 3-D Information Using Vision • The other formula is Cross-Correlation (CC): $\mathrm{CC}(D_x, D_y) = \sum_{(x,y)} I(x, y, t)\, I(x + D_x, y + D_y, t + D_t)$ • Cross-correlation works best when there is texture in the scene, because texture produces significant brightness variation among the pixels.
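
Both measures can be computed by brute force over a small window. The following is a sketch under stated assumptions: frames are plain lists of rows indexed as `frame[y][x]`, the sums run over a square block whose corner and size are supplied by the caller, the caller keeps every shifted block inside the frame, and the function names and the tiny test frames are hypothetical.

```python
def ssd(frame_t, frame_t1, x0, y0, size, dx, dy):
    """Sum of squared differences between a block in frame_t and the block
    displaced by (dx, dy) in frame_t1. Lower means a better match."""
    total = 0
    for y in range(y0, y0 + size):
        for x in range(x0, x0 + size):
            diff = frame_t[y][x] - frame_t1[y + dy][x + dx]
            total += diff * diff
    return total


def cross_correlation(frame_t, frame_t1, x0, y0, size, dx, dy):
    """Sum of products over the same two blocks. Higher means a better match."""
    total = 0
    for y in range(y0, y0 + size):
        for x in range(x0, x0 + size):
            total += frame_t[y][x] * frame_t1[y + dy][x + dx]
    return total


def best_displacement(frame_t, frame_t1, x0, y0, size, max_shift):
    """Try every (dx, dy) within max_shift and keep the one with minimum SSD."""
    shifts = range(-max_shift, max_shift + 1)
    candidates = [(dx, dy) for dy in shifts for dx in shifts]
    return min(candidates,
               key=lambda d: ssd(frame_t, frame_t1, x0, y0, size, d[0], d[1]))


def blank(n):
    return [[0] * n for _ in range(n)]


# A small textured patch that moves by (dx, dy) = (1, 2) between frames.
frame_a = blank(8)
frame_b = blank(8)
patch = {(0, 0): 9, (1, 0): 5, (0, 1): 2, (1, 1): 7}  # (x offset, y offset)
for (ox, oy), value in patch.items():
    frame_a[3 + oy][3 + ox] = value
    frame_b[5 + oy][4 + ox] = value

flow = best_displacement(frame_a, frame_b, x0=3, y0=3, size=2, max_shift=2)
```

For these frames `flow` is `(1, 2)`: the SSD is zero only at the true displacement, and the cross-correlation is largest there as well.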

Extracting 3-D Information Using Vision Binocular Stereopsis • Binocular stereopsis uses multiple images separated in space, whereas motion uses multiple images over time. • Because a scene feature appears in a different place in each camera's image, superposing the two images shows a disparity in the location of important features.

Extracting 3-D Information Using Vision • This also allows us to easily determine depth. Knowing the distance between the cameras, and the point at which their lines of sight intersect, it only requires a few simple geometric calculations to determine the depth coordinate z for any given (x, y) coordinate.
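
As a rough illustration of how little geometry is needed, the sketch below uses a simpler setup than the one described above: it assumes two identical cameras with parallel optical axes (rather than converging lines of sight), separated by a known baseline, so that depth follows from similar triangles as Z = f * b / d. The function name, parameter names and numbers are hypothetical.

```python
def depth_from_disparity(focal_length, baseline, disparity):
    """Depth of a scene point for a parallel-axis stereo pair (assumed setup).

    focal_length: distance from lens to image plane, in pixels
    baseline:     distance between the two cameras, in metres
    disparity:    horizontal shift of the feature between images, in pixels
    """
    if disparity <= 0:
        raise ValueError("disparity must be positive for a point at finite depth")
    return focal_length * baseline / disparity


near_depth = depth_from_disparity(focal_length=800.0, baseline=0.5, disparity=40.0)
far_depth = depth_from_disparity(focal_length=800.0, baseline=0.5, disparity=10.0)
```

Larger disparities correspond to nearer points: here a 40-pixel disparity gives a depth of 10 m and a 10-pixel disparity gives 40 m.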

Extracting 3-D Information Using Vision Texture Gradient • Texture refers to a spatially repeating pattern on a surface that can be sensed visually. • In the images, the apparent size, shape, and spacing of the repeating texture elements (texels) vary.

Extracting 3-D Information Using Vision The two main causes of this variation are: • Varying distance from the camera to the different texture elements. • Varying orientation of the texel relative to the line of sight from the camera. The rates of change of these texel features can be expressed mathematically; they are called texture gradients.

Extracting 3-D Information Using Vision Texture can be used to determine shape via a two-step process: (a) measure the texture gradients and (b) estimate the surface shape, slant and tilt that could give rise to them.

Extracting 3-D Information Using Vision Shading • Shading is the variation in the intensity of light received from different portions of a surface in the scene. • Given the image brightness, I(x, y), our hope is to recover the scene geometry and the reflectance properties of the object. • But this has proved difficult to do in anything but the simplest cases.
