PERCEPTION Damien Blond Alim Fazal Tory Richard April 11th, 2000 - - PowerPoint PPT Presentation



slide-1
SLIDE 1

Damien Blond Alim Fazal Tory Richard April 11th, 2000

PERCEPTION

slide-2
SLIDE 2

Outline

  • 9.1: Introduction
  • 9.2: Image Formation
  • 9.3: Image Processing Operations for Early Vision
  • 9.4: Extracting 3D Information using Vision
  • 9.5: Using Vision for Manipulation and Navigation
  • 9.6: Object Representation and Recognition
  • 9.7: Summary

slide-3
SLIDE 3

Introduction

  • Perception provides agents with information about the world they inhabit.
  • A sensor is anything that can change the computational state of the agent in response to a change in the state of the world.
  • The sensors that agents share with humans are vision, hearing, and touch.

slide-4
SLIDE 4

Introduction

  • The main focus will be on processing the raw information that the sensors provide.
  • The sensory stimulus S is a function of the world W: $S = f(W)$
  • To gain information about the world, the straightforward approach is to invert this equation: $W = f^{-1}(S)$

slide-5
SLIDE 5

Introduction

  • A drawback of the straightforward approach is that it is trying to solve too difficult a problem.
  • In many cases, the agent does not need to know everything about the world.
  • Sometimes just one or two predicates are needed.
slide-6
SLIDE 6

Introduction

Some of the possible uses for vision:

  • Manipulation: grasping and insertion need local shape information and feedback for motor control.
  • Navigation: finding clear paths, avoiding obstacles, and calculating one's current velocity and orientation.
  • Object recognition: a useful skill for distinguishing between multiple objects.

slide-8
SLIDE 8

Outline

9.2: Image Formation
  • Pinhole Camera
  • Lens Systems
  • Photometry of Image Formation

slide-9
SLIDE 9

Image Formation

  • Vision works by gathering light scattered from objects in the scene and creating a 2-D image.
  • It is important to understand the geometry of the process in order to obtain information about the scene.

slide-10
SLIDE 10

Image Formation

slide-11
SLIDE 11

Image Formation

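Perspective Projection Equations

  • $-x/f = X/Z$, $-y/f = Y/Z$

=> $x = -fX/Z$, $y = -fY/Z$

where (X, Y, Z) is a point in the scene, (x, y) is its image, and f is the distance from the pinhole to the image plane.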
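A minimal sketch of the perspective projection equations above, assuming a pinhole at the origin and a scene point with Z > 0. The function name `project_perspective` is illustrative and does not come from the slides.

```python
def project_perspective(X, Y, Z, f):
    """Project scene point (X, Y, Z) onto the image plane of a pinhole camera.

    Implements x = -f*X/Z and y = -f*Y/Z; the minus signs reflect that the
    pinhole image is inverted.
    """
    if Z <= 0:
        raise ValueError("point must be in front of the pinhole (Z > 0)")
    return (-f * X / Z, -f * Y / Z)


# Doubling the depth of a point halves its image coordinates.
near = project_perspective(2.0, 1.0, 4.0, f=1.0)
far = project_perspective(2.0, 1.0, 8.0, f=1.0)
print(near, far)
```

The division by Z is what makes distant objects appear smaller in the image.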

slide-12
SLIDE 12

Image Formation

  • Perspective projection is often approximated by orthographic projection, but there is an important difference.
  • Orthographic projection does not pass the projection rays through a pinhole.
  • Instead, the rays run parallel to one another, either perpendicular to the image plane or at a consistent angle from it.
slide-13
SLIDE 13

Lens Systems

  • Both human and artificial eyes use a lens.
  • The lens is wider than a pinhole, so it admits more light and increases the information collected.
  • The human eye focuses by changing the shape of the lens.
  • Artificial eyes focus by changing the distance between the lens and the image plane.

slide-14
SLIDE 14

Photometry of Image Formation

slide-15
SLIDE 15

Photometry of Image Formation

slide-16
SLIDE 16

Photometry of Image Formation

  • A processed image plane contains a brightness value for each pixel.
  • The brightness of a pixel p in the image is proportional to the amount of light directed toward the camera by the surface patch $S_p$ that projects to pixel p.
  • The reflected light is characterized as either diffuse or specular reflection.

slide-17
SLIDE 17

Photometry of Image Formation

  • Diffuse reflection redirects light equally in all directions, and is common for dull surfaces.
  • It is described by Lambert's formula:

$E = \rho E_0 \cos\theta$

where $\rho$ is the coefficient of diffuse reflection (how much of the incoming light the surface reflects), $E_0$ is the intensity of the light source, and $\theta$ is the angle between the light direction and the surface normal.

slide-18
SLIDE 18

Photometry of Image Formation

  • Phong's formula for specular reflection:

$E = \rho E_0 \cos^m\theta$

  • $\rho$ is the coefficient of specular reflection
  • $E_0$ is the intensity of the light source
  • m is the 'shininess' of the surface
  • $\theta$ is the angle between the direction of mirror reflection and the direction to the viewer.
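The two reflection formulas (Lambert's and Phong's) can be evaluated numerically. The sketch below is illustrative only: the function names are made up, `rho` stands for the reflection coefficient in each formula, and clamping negative cosines to zero is an added assumption so that a surface facing away contributes no light.

```python
import math


def lambert(rho, e0, theta):
    """Diffuse brightness E = rho * E0 * cos(theta), clamped at zero."""
    return rho * e0 * max(0.0, math.cos(theta))


def phong(rho, e0, theta, m):
    """Specular brightness E = rho * E0 * cos(theta)**m, clamped at zero."""
    return rho * e0 * max(0.0, math.cos(theta)) ** m


theta = math.radians(60)  # cos(60 degrees) is 0.5
diffuse = lambert(rho=0.8, e0=1.0, theta=theta)
specular = phong(rho=0.8, e0=1.0, theta=theta, m=10)
print(diffuse, specular)
```

With m = 10 the specular term at 60 degrees is already far smaller than the diffuse term, which shows how a larger m concentrates the highlight around theta = 0.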

slide-19
SLIDE 19

Photometry of Image Formation

  • In real life, surfaces exhibit a combination of diffuse and specular properties.
  • Modeling this on the computer is what computer graphics is all about.
  • Rendering realistic images is usually done by ray tracing.
slide-21
SLIDE 21

Outline

9.3: Image Processing Operations for Early Vision
  • Edge Detection

slide-22
SLIDE 22

Image-Processing Operations

Edge Detection

  • Edges are curves in the image plane across which there is a “significant” change in image brightness.
  • The goal of edge detection is the construction of an idealized line drawing.

slide-23
SLIDE 23

Image-Processing Operations

  • One idea to detect edges is to differentiate the image and look for places where the brightness undergoes a sharp change.
  • Consider a 1-D example. Below is an intensity profile for a 1-D image.

slide-24
SLIDE 24

Image-Processing Operations

  • Below we have the derivative of the previous graph.
  • The derivative has peaks at x=18, x=50 and x=75.
  • The peaks away from the true edge at x=50 are errors caused by noise in the image.
slide-25
SLIDE 25

Image-Processing Operations

  • This problem is countered by combining a smoothing operation with the differentiation.
  • Smoothing is done by convolution, a mathematical operation that underlies many useful image-processing operations.

slide-26
SLIDE 26

Image-Processing Operations

  • One standard form of smoothing is to convolve the image with a Gaussian function.
  • Applying Gaussian smoothing, we can revisit the 1-D example.

slide-27
SLIDE 27

Image-Processing Operations

  • With smoothing applied, the edge at x=50 is much easier to see. Convolution thus lets us discover where edges are located, which in turn lets us make an accurate line drawing.
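The 1-D procedure can be sketched as follows: smooth a noisy intensity profile by convolving it with a Gaussian, differentiate, and take the strongest response as the edge. The profile here is synthetic (a step at x = 50 plus uniform noise), and the kernel radius, sigma, and border handling are all assumptions made for illustration, not values from the slides.

```python
import math
import random


def gaussian_kernel(sigma):
    """Return a normalized, sampled 1-D Gaussian of radius 3*sigma."""
    radius = int(3 * sigma)
    weights = [math.exp(-(i * i) / (2 * sigma * sigma))
               for i in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def convolve(signal, kernel):
    """Convolve with a symmetric kernel, repeating the end samples at the borders."""
    radius = len(kernel) // 2
    last = len(signal) - 1
    out = []
    for i in range(len(signal)):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = min(max(i + k - radius, 0), last)
            acc += w * signal[j]
        out.append(acc)
    return out


def derivative(signal):
    """Forward difference: d[i] = signal[i + 1] - signal[i]."""
    return [b - a for a, b in zip(signal, signal[1:])]


# Hypothetical profile: a step edge at x = 50 plus uniform noise.
random.seed(0)
profile = [(1.0 if x >= 50 else 0.0) + random.uniform(-0.2, 0.2)
           for x in range(100)]

smoothed = convolve(profile, gaussian_kernel(sigma=2.0))
slope = derivative(smoothed)
edge = max(range(len(slope)), key=lambda i: abs(slope[i]))
print("strongest edge response near x =", edge)
```

A larger sigma suppresses more noise but also blurs the edge, so its position becomes less precise.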

slide-28
SLIDE 28

Image-Processing Operations

  • Here is an example of applying convolution to a 2-D picture of the Mona Lisa.
slide-30
SLIDE 30

Outline

9.4: Extracting 3D Information using Vision
  • Motion
  • Binocular Stereopsis
  • Texture Gradient
  • Shading
  • Contour

slide-31
SLIDE 31

Extracting 3-D Information Using Vision

We need to extract 3-D information for performing certain tasks such as manipulation, navigation, and recognition.

Three aspects:
  • 1. Segmentation
  • 2. Position & Orientation
  • 3. Shape

To recover 3-D information there are a number of cues available, including motion, binocular stereopsis, texture, shading and contour.

slide-32
SLIDE 32

Extracting 3-D Information Using Vision

Motion

  • Optical flow: the apparent motion in the image that results when a camera moves relative to the 3-D scene.

slide-33
SLIDE 33

Extracting 3-D Information Using Vision

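  • To measure optical flow, we need to find corresponding points between one time frame and the next.
  • One measure of similarity is the sum of squared differences (SSD), which is smallest where the two image patches match best:

$SSD(D_x, D_y) = \sum_{(x,y)} \left( I(x, y, t) - I(x + D_x, y + D_y, t + D_t) \right)^2$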
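A minimal sketch of SSD matching, assuming images stored as nested lists indexed `frame[y][x]` and a small square patch. The frames, patch size, and search window are hypothetical, and the function names are illustrative.

```python
def ssd(frame_t, frame_next, x0, y0, dx, dy, half=1):
    """Sum of squared differences between the patch centred at (x0, y0) in
    frame_t and the patch centred at (x0 + dx, y0 + dy) in frame_next."""
    total = 0
    for y in range(y0 - half, y0 + half + 1):
        for x in range(x0 - half, x0 + half + 1):
            diff = frame_t[y][x] - frame_next[y + dy][x + dx]
            total += diff * diff
    return total


def best_displacement(frame_t, frame_next, x0, y0, search=1, half=1):
    """Try every (dx, dy) in the search window and keep the smallest SSD."""
    candidates = [(dx, dy)
                  for dy in range(-search, search + 1)
                  for dx in range(-search, search + 1)]
    return min(candidates,
               key=lambda d: ssd(frame_t, frame_next, x0, y0, d[0], d[1], half))


# Hypothetical 6x6 frames: a textured pattern that shifts one pixel in +x.
frame_t = [[(3 * x + 5 * y) % 7 for x in range(6)] for y in range(6)]
frame_next = [[frame_t[y][x - 1] if x > 0 else 0 for x in range(6)]
              for y in range(6)]

print(best_displacement(frame_t, frame_next, x0=3, y0=3))
```

The displacement with the smallest SSD is taken as the optical flow estimate at that point.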

slide-34
SLIDE 34

Extracting 3-D Information Using Vision

Another measure is the cross-correlation (CC), which is largest where the two image patches match best:

$CC(D_x, D_y) = \sum_{(x,y)} I(x, y, t) \, I(x + D_x, y + D_y, t + D_t)$

  • Cross-correlation works best when there is texture in the scene, because texture produces significant brightness variation among the pixels.
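The sketch below evaluates the cross-correlation on a hypothetical pair of frames and also illustrates the remark about texture: on a uniform region every displacement scores the same, so no match can be singled out. Frames are nested lists indexed `frame[y][x]`; all names and data are illustrative assumptions.

```python
def cross_correlation(frame_t, frame_next, x0, y0, dx, dy, half=1):
    """CC(dx, dy): sum over the patch of I(x, y, t) * I(x + dx, y + dy, t + dt)."""
    total = 0
    for y in range(y0 - half, y0 + half + 1):
        for x in range(x0 - half, x0 + half + 1):
            total += frame_t[y][x] * frame_next[y + dy][x + dx]
    return total


def shift(frame, sx, sy):
    """Return a copy of frame moved by (sx, sy); uncovered pixels become 0."""
    h, w = len(frame), len(frame[0])
    return [[frame[y - sy][x - sx] if 0 <= y - sy < h and 0 <= x - sx < w else 0
             for x in range(w)] for y in range(h)]


# Hypothetical 7x7 frame: a small bright blob on a dark background.
frame_t = [[0] * 7 for _ in range(7)]
frame_t[3][3] = 4
for bx, by in [(2, 3), (4, 3), (3, 2), (3, 4)]:
    frame_t[by][bx] = 2
frame_next = shift(frame_t, 1, 1)  # blob moves by +1 in x and +1 in y

window = [(dx, dy) for dy in range(-2, 3) for dx in range(-2, 3)]
scores = {d: cross_correlation(frame_t, frame_next, 3, 3, d[0], d[1])
          for d in window}
best = max(scores, key=scores.get)
print(best, scores[best])

# On a textureless (uniform) region every displacement scores the same.
flat = [[1] * 7 for _ in range(7)]
flat_scores = {cross_correlation(flat, flat, 3, 3, dx, dy) for dx, dy in window}
print(flat_scores)
```

This raw product sum is not normalized, so in real images bright regions can dominate the score; the example avoids that by using a dark background.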

slide-35
SLIDE 35

Extracting 3-D Information Using Vision

Binocular Stereopsis

  • Binocular stereopsis uses multiple images separated in space, whereas motion uses multiple images over time.
  • Because a given scene feature will be in a different place relative to the z-axis of each image plane, if we superpose the two images, there will be a disparity in the location of important features.

slide-36
SLIDE 36

Extracting 3-D Information Using Vision

  • This disparity also allows us to easily determine depth. Knowing the distance between the cameras and the point at which their lines of sight intersect, it only requires a few simple geometric calculations to determine the depth coordinate z for any given (x, y) coordinate.
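The slides do not spell out the geometric calculation. As a sketch under a simplifying assumption (two identical pinhole cameras with parallel optical axes rather than converging lines of sight, and non-inverted image coordinates), a point seen at horizontal positions x_left and x_right has disparity d = x_left - x_right and depth Z = f * b / d, where b is the distance between the cameras. The function name and the numbers are hypothetical.

```python
def depth_from_disparity(x_left, x_right, focal_length, baseline):
    """Depth Z = f * b / d for parallel cameras, where d = x_left - x_right."""
    disparity = x_left - x_right
    if disparity <= 0:
        raise ValueError("expected positive disparity for a point in front of the cameras")
    return focal_length * baseline / disparity


# Hypothetical numbers: f = 800 pixels, baseline = 0.5 m, disparity = 20 pixels.
z = depth_from_disparity(x_left=120, x_right=100, focal_length=800, baseline=0.5)
print(z)
```

Because depth is inversely proportional to disparity, distant points have small disparities and their depth estimates are more sensitive to measurement error.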

slide-37
SLIDE 37

Extracting 3-D Information Using Vision

Texture Gradient

  • Texture refers to a spatially repeating pattern on a surface that can be sensed visually.
  • In the images, the apparent size, shape, and spacing of the repeating texture elements (texels) vary.

slide-38
SLIDE 38

Extracting 3-D Information Using Vision

The two main causes for this variation are:

  • Varying distance from the camera to the different texture elements.
  • Varying orientation of the texel relative to the line of sight from the camera.

Mathematical analysis can express the rate of change of these texel features; these rates are called texture gradients.

slide-39
SLIDE 39

Extracting 3-D Information Using Vision

Texture can be used to determine shape via a two-step process: (a) measure the texture gradients and (b) estimate the surface shape, slant and tilt that could give rise to them.

slide-40
SLIDE 40

Extracting 3-D Information Using Vision

Shading

  • Shading is the variation in the intensity of light received from different portions of a surface in the scene.
  • Given the image brightness I(x, y), our hope is to recover the scene geometry and the reflectance properties of the object.
  • But this has proved difficult to do in anything but the simplest cases.

slide-41
SLIDE 41

Extracting 3-D Information Using Vision

  • The main problem is dealing with interreflections.
  • In most scenes the surfaces are illuminated not only by the light sources, but also by light reflected from other surfaces, which serve as secondary light sources.
  • These mutual illumination effects are quite significant.
slide-42
SLIDE 42

Extracting 3-D Information Using Vision

Contour

  • Lines in a line drawing give a vivid perception of 3-D shape and layout.
  • The task is to determine the exact significance of each line in an image.
  • This is also called the line labeling problem, because the task is to label each line according to its significance.

slide-43
SLIDE 43

Extracting 3-D Information Using Vision

  • In a simplified world, where all surface marks and shadows have been removed, all the lines can be classified as either limbs or edges.
  • A limb is the locus of points on the surface where the line of sight is tangent to the surface.
  • An edge is a surface normal discontinuity.
  • Edges can be further classified as convex, concave or occluding.

slide-44
SLIDE 44

Extracting 3-D Information Using Vision

  • "+" and "-" labels represent convex and concave edges respectively.
  • "<-" and "->" labels represent occluding edges.
  • "<-<-" and "->->" labels represent limbs.
slide-45
SLIDE 45

Extracting 3-D Information Using Vision

In 1971, Huffman and Clowes independently studied the line labeling problem for trihedral solids: objects in which exactly three plane surfaces come together at each vertex.

slide-46
SLIDE 46

Extracting 3-D Information Using Vision

For this trihedral world, Huffman and Clowes made an exhaustive list of all the different vertex types and the different ways in which they could be viewed from a general viewpoint.

slide-47
SLIDE 47

Extracting 3-D Information Using Vision

They created a junction dictionary to find a labeling for the line drawing. Later this work was generalized to arbitrary polyhedra and to piecewise smooth curved objects.
slide-49
SLIDE 49

Outline

9.5: Using Vision for Manipulation and Navigation
  • Driving Example
  • Lateral Control
  • Longitudinal Control

slide-50
SLIDE 50

Using Vision for Manipulation and Navigation

  • One of the main uses of vision is to provide information for manipulating objects and for navigating in a scene while avoiding obstacles.
  • Driving is a good example of this use of vision.

slide-51
SLIDE 51

Using Vision for Manipulation and Navigation

Figure 24.24: The information needed for visual control of a vehicle on a freeway.

slide-52
SLIDE 52

Using Vision for Manipulation and Navigation

The tasks for the driver in Figure 24.24:

  • 1. Keep moving at a reasonable speed. (v0)
  • 2. Lateral control. (dl = dr)
  • 3. Longitudinal control. (d2 = safe distance)
  • 4. Monitor vehicles in neighboring lanes and be prepared for action if one of them decides to change lanes.

slide-53
SLIDE 53

Using Vision for Manipulation and Navigation

  • The problem for the driver is to generate appropriate steering, acceleration or braking actions to best accomplish these tasks.
  • Focusing specifically on lateral and longitudinal control, what information is needed for these tasks?

slide-54
SLIDE 54

Using Vision for Manipulation and Navigation

Lateral Control:

  • The steering control system for the vehicle needs to detect edges corresponding to the lane marker segments and then needs to fit smooth curves to these.
  • The parameters of these curves carry information about the lateral position of the car, the direction it is pointing relative to the lane, and the curvature of the lane.

  • The dynamics of the vehicle are also needed.
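As an illustration of fitting a smooth curve to lane marker edges, the sketch below fits a quadratic x = a + b*y + c*y^2 by least squares, where y is the distance ahead of the car and x the lateral position of the marker. The quadratic model, the ground-plane coordinates, and the sample points are all assumptions; the slides do not say which curve family is used.

```python
def fit_quadratic(points):
    """Least-squares fit of x = a + b*y + c*y**2 to (y, x) samples."""
    # Build the 3x3 normal equations for the parameters (a, b, c).
    s = [sum(y ** k for y, _ in points) for k in range(5)]
    t = [sum(x * y ** k for y, x in points) for k in range(3)]
    m = [[s[i + j] for j in range(3)] + [t[i]] for i in range(3)]
    # Gauss-Jordan elimination with partial pivoting on the augmented matrix.
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(3):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [vr - factor * vc for vr, vc in zip(m[r], m[col])]
    return tuple(m[i][3] / m[i][i] for i in range(3))


# Hypothetical lane-marker edge points (y = distance ahead, x = lateral position),
# generated from x = 1.5 + 0.1*y + 0.01*y**2.
points = [(y, 1.5 + 0.1 * y + 0.01 * y * y) for y in (0.0, 5.0, 10.0, 15.0, 20.0)]
a, b, c = fit_quadratic(points)
print(round(a, 6), round(b, 6), round(c, 6))
```

Under this model, a is the lateral offset of the marker at the car, b reflects the direction the car is pointing relative to the lane, and 2c approximates the lane curvature when b is small.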
slide-55
SLIDE 55

Using Vision for Manipulation and Navigation

Longitudinal Control:

  • The driver needs to know the distance to the vehicles in front.
  • This can be accomplished using binocular stereopsis or optical flow.
  • The driving example makes one point very clear: for a specific task, one does not need to recover all the information that in principle can be recovered from an image.

slide-57
SLIDE 57

Outline

9.6: Object Representation and Recognition
  • Alignment Method
  • Projective Invariants
  • Representation of Models
  • Matching Models to Images

slide-58
SLIDE 58

Object Representation and Recognition

Problem:

  • Given: a scene consisting of one or more objects chosen from a collection of objects and an image of the scene taken from an unknown viewer position and orientation.
  • Determine: which of the objects from the collection are present in the scene and, for each object, its position and orientation relative to the viewer.

slide-59
SLIDE 59

Object Representation and Recognition

  • The two fundamental issues that any object recognition scheme must address are the representation of the models and the matching of models to images.
  • The most common way of representing 3-D objects in a recognition system is to use generalized cylinders.

slide-60
SLIDE 60

Object Representation and Recognition

  • Examples of generalized cylinders:

slide-61
SLIDE 61

Object Representation and Recognition

Alignment Method:

  • Handy for identifying 3-D objects without knowing their position or orientation with respect to the observer.
  • Accomplishes this by representing the object with a set of m features or distinguishing points in 3-D.
  • The points are then subjected to an unknown 3-D rotation R, followed by a translation by an unknown amount t and a projection, giving rise to image feature points on the image plane.
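The rotation, translation, and projection of model points can be sketched as below, together with a simple score for checking a pose hypothesis against observed image features. For brevity the rotation is restricted to the Z axis, the projection is the perspective projection x = -fX/Z, y = -fY/Z, and the model points, pose, and function names are hypothetical.

```python
import math


def rotate_z(point, angle):
    """Rotate a 3-D point about the Z axis by angle (radians)."""
    x, y, z = point
    c, s = math.cos(angle), math.sin(angle)
    return (c * x - s * y, s * x + c * y, z)


def transform_and_project(model_points, angle, t, f=1.0):
    """Apply rotation R, then translation t, then perspective projection."""
    image_points = []
    for p in model_points:
        x, y, z = rotate_z(p, angle)
        X, Y, Z = x + t[0], y + t[1], z + t[2]
        image_points.append((-f * X / Z, -f * Y / Z))
    return image_points


def alignment_error(predicted, observed):
    """Sum of squared image distances between predicted and observed features."""
    return sum((px - ox) ** 2 + (py - oy) ** 2
               for (px, py), (ox, oy) in zip(predicted, observed))


model = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]
observed = transform_and_project(model, math.pi / 2, (0.0, 0.0, 4.0))

# A correct pose hypothesis reproduces the observed features; a wrong one does not.
good = alignment_error(
    transform_and_project(model, math.pi / 2, (0.0, 0.0, 4.0)), observed)
bad = alignment_error(
    transform_and_project(model, 0.0, (0.0, 0.0, 4.0)), observed)
print(good, bad)
```

A full recognizer would also have to solve for the unknown R and t from feature correspondences; this sketch only covers predicting and scoring image features for a given pose.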

slide-62
SLIDE 62

Object Representation and Recognition

  • A disadvantage of this method is that it involves trying each model in the model library, resulting in a recognition complexity proportional to the number of models in the library.
  • A solution is to use geometric invariants as the shape representation. This approach uses the projective invariants measured from the image curves.

slide-63
SLIDE 63

Object Representation and Recognition

  • When a measured invariant corresponds to a value in the library, a recognition hypothesis is generated. It is verified by back-projecting the outline, just as in the alignment method.
  • An advantage of invariant shape representation is that models can be acquired directly from images, without making measurements on the actual objects, because the shape descriptors have the same value when measured in any image.
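The slides do not name a specific invariant. A classic example, used here purely as an illustration, is the cross-ratio of four collinear points, which keeps its value under any projective transformation of the line. The sketch uses exact fractions so that the two values can be compared directly; the points and the transformation coefficients are hypothetical.

```python
from fractions import Fraction


def cross_ratio(a, b, c, d):
    """Cross-ratio of four collinear points given by their 1-D coordinates."""
    return ((c - a) * (d - b)) / ((c - b) * (d - a))


def projective_map(x, p, q, r, s):
    """1-D projective transformation x -> (p*x + q) / (r*x + s)."""
    return (p * x + q) / (r * x + s)


points = [Fraction(v) for v in (0, 1, 3, 7)]
viewed = [projective_map(x, 2, 1, 1, 5) for x in points]

print(cross_ratio(*points), cross_ratio(*viewed))
```

Both values are 9/7 even though the individual coordinates change, which is the property that lets a descriptor measured in one image be matched against a library built from other images.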

slide-64
SLIDE 64

Object Representation and Recognition

  • Although the computer is capable of recognizing a broad array of images, there are some images that are currently nearly impossible for the computer to recognize.

slide-65
SLIDE 65

Object Representation and Recognition

  • Other images show ambiguities that humans are capable of handling with little difficulty.
  • Can a computer algorithm distinguish which object is intended when there are a number of possible objects?

slide-66
SLIDE 66

Object Representation and Recognition

  • Further Examples:
slide-67
SLIDE 67

Object Representation and Recognition

  • There are also other images that can exist in 2-D but not in 3-D. Can a computer algorithm detect this?

slide-68
SLIDE 68

Summary

  • Perception Agents
  • Perception Sensors
  • The Straightforward Approach
  • Manipulation
  • Navigation
  • Object Recognition
slide-69
SLIDE 69

Summary

  • Perspective Projection
  • Orthographic Projection
  • Lens Systems
  • Photometry of Image Formation
slide-70
SLIDE 70

Summary

  • Edges
  • Convolution
  • The smoothing Gaussian function
slide-71
SLIDE 71

Summary

  • Motion
  • Binocular stereopsis
  • Texture
  • Shading
  • Contour
slide-72
SLIDE 72

Summary

  • Some main uses of vision
  • Representation of models
  • Matching of models to images
  • Alignment Method
  • Projective Invariants
  • Problems with Object Recognition
slide-73
SLIDE 73

Sources

  • 533 Text book
  • http://sern.ucalgary.ca/courses/CPSC/533/W99/presentations/L2_24A_Lee_Wang/
  • http://sern.ucalgary.ca/courses/CPSC/533/W99/presentations/L1_24A_Kaasten_Steller_Hoang/main.htm
  • http://sern.ucalgary.ca/courses/CPSC/533/W99/presentations/L1_24_Schebywolok/index.html
  • http://sern.ucalgary.ca/courses/CPSC/533/W99/presentations/L2_24B_Doering_Grenier/
  • http://www.geocities.com/SoHo/Museum/3828/optical.html
  • http://members.spree.com/funNgames/katbug/