SLIDE 1
PERCEPTION
Damien Blond, Alim Fazal, Tory Richard
April 11th, 2000
SLIDE 2
SLIDE 3
Introduction
- Perception provides agents with information about the
world they inhabit.
- A sensor is anything that can change the computational
state of the agent in response to a change in the state of the world.
- The sensors that agents share with humans are vision,
hearing, and touch.
SLIDE 4
Introduction
- The main focus will be on processing the raw information
that the sensors provide.
- The sensory stimulus S is a function of the world W:
$S = f(W)$
- To gain information about the world, we can take
the straightforward approach and invert the equation: $W = f^{-1}(S)$
SLIDE 5
Introduction
- A drawback of the straightforward approach is that it is
trying to solve too difficult a problem.
- In many cases, the agent does not need to know
everything about the world.
- Sometimes just one or two predicates are needed.
SLIDE 6
Introduction
Some of the possible uses for Vision:
- Manipulation – Grasping, insertion, needs local shape
information and feedback for motor control.
- Navigation – Finding clear paths, avoiding obstacles,
calculating one’s current velocity and orientation.
- Object Recognition – A useful skill for distinguishing
between multiple objects.
SLIDE 7
Outline
9.1: Introduction
9.2: Image Formation
9.3: Image Processing Operations for Early Vision
9.4: Extracting 3D Information using Vision
9.5: Using Vision for Manipulation and Navigation
9.6: Object Representation and Recognition
9.7: Summary
SLIDE 8
Outline
9.2: Image Formation
- Pinhole Camera
- Lens Systems
- Photometry of Image Formation
SLIDE 9
Image Formation
- Vision works by gathering light scattered from objects in
the scene and creating a 2-D image.
- It is important to understand the geometry of the
process in order to obtain information about the scene.
SLIDE 10
Image Formation
SLIDE 11
Image Formation
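Perspective Projection Equations
- A scene point (X, Y, Z) projects through the pinhole to the image point (x, y), where f is the distance from the pinhole to the image plane:
$-x/f = X/Z, \quad -y/f = Y/Z \;\Rightarrow\; x = -fX/Z, \quad y = -fY/Z$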
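The projection equations can be checked numerically. The following is a minimal Python sketch, not part of the original slides; the function name `project_perspective` and the sample coordinates are illustrative assumptions.

```python
def project_perspective(point, f):
    """Project a scene point (X, Y, Z) through a pinhole onto the image plane.

    Uses x = -f*X/Z and y = -f*Y/Z; the minus signs express the image inversion.
    """
    X, Y, Z = point
    if Z <= 0:
        raise ValueError("point must lie in front of the pinhole (Z > 0)")
    return (-f * X / Z, -f * Y / Z)


# Hypothetical sample points: same X and Y, but the second is twice as deep.
near = project_perspective((2.0, 1.0, 4.0), f=1.0)
far = project_perspective((2.0, 1.0, 8.0), f=1.0)
print(near, far)
```

Doubling the depth Z halves the image coordinates, which is why distant objects look smaller.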
SLIDE 12
Image Formation
- Perspective projection is often approximated using
orthographic projection, but there is an important
difference.
- Orthographic projection does not project vectors
through a pinhole.
- Instead, the vectors run parallel, either perpendicular to
or at a consistent angle from the image plane.
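The difference between the two projections can be seen on a pair of points that differ only in depth. This is a rough sketch under simplifying assumptions: the orthographic version projects perpendicular to the image plane and ignores scale and image inversion, and both function names are made up for illustration.

```python
def project_orthographic(point):
    """Parallel projection perpendicular to the image plane: depth Z is dropped."""
    X, Y, _Z = point
    return (X, Y)


def project_perspective(point, f=1.0):
    """Pinhole projection: x = -f*X/Z, y = -f*Y/Z."""
    X, Y, Z = point
    return (-f * X / Z, -f * Y / Z)


# Hypothetical points with identical X and Y but different depth.
near, far = (2.0, 1.0, 4.0), (2.0, 1.0, 8.0)
same_under_ortho = project_orthographic(near) == project_orthographic(far)
same_under_persp = project_perspective(near) == project_perspective(far)
print(same_under_ortho, same_under_persp)
```

Under orthographic projection the two points land on the same image location, so depth has no effect on the image. Under perspective projection they do not.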
SLIDE 13
Lens Systems
- Both human and artificial eyes use a lens.
- The lens is wider than a pinhole, allowing more light to
enter, increasing the information collected.
- The human eye focuses by bending the shape of the lens.
- Artificial eyes focus by changing the distance between the
lens and the image plane.
SLIDE 14
Photometry of Image Formation
SLIDE 15
Photometry of Image Formation
SLIDE 16
Photometry of Image Formation
- A processed image plane contains a brightness value for
each pixel.
- The brightness of a pixel p in the image is proportional to
the amount of light directed toward the camera by the surface patch S_p that projects to pixel p.
- The reflected light is characterized as either diffuse or
specular.
SLIDE 17
Photometry of Image Formation
- Diffuse reflection redirects light equally in all directions,
and is common for dull surfaces.
- It is described by the following equation, known as
Lambert's formula: $E = \rho E_0 \cos\theta$, where $\rho$ describes how dull or shiny the surface is, $E_0$ is the intensity of the light source and $\theta$ is the angle between the light direction and the surface normal.
SLIDE 18
Photometry of Image Formation
- Specular reflection is described by Phong's formula:
$E = \rho E_0 \cos^m\theta$
- $\rho$ is the coefficient of specular reflection
- $E_0$ is the intensity of the light source
- m is the 'shininess' of the surface
- $\theta$ is the angle between the mirror-reflection direction
and the direction to the viewer.
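Lambert's and Phong's formulas can be compared side by side. This is a minimal sketch, not from the slides: the function names, the sample coefficients, and the clamping of negative cosines to zero are assumptions, and `theta` is simply whichever angle each formula calls for.

```python
import math


def lambert(rho_d, e0, theta):
    """Diffuse term: E = rho_d * E0 * cos(theta)."""
    # Clamping at zero (an added assumption) avoids negative brightness
    # for angles beyond 90 degrees.
    return rho_d * e0 * max(0.0, math.cos(theta))


def phong(rho_s, e0, m, theta):
    """Specular term: E = rho_s * E0 * cos(theta) ** m."""
    return rho_s * e0 * max(0.0, math.cos(theta)) ** m


theta = math.radians(60)
diffuse = lambert(0.8, 1.0, theta)
broad_highlight = phong(1.0, 1.0, 1, theta)
sharp_highlight = phong(1.0, 1.0, 10, theta)
print(diffuse, broad_highlight, sharp_highlight)
```

Raising the shininess m makes the specular term fall off much faster with angle, giving a smaller, sharper highlight.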
SLIDE 19
Photometry of Image Formation
- In real life, surfaces exhibit a combination of diffuse and
specular properties.
- Modeling this on the computer is what computer
graphics is all about.
- Rendering realistic images is usually done by ray tracing.
SLIDE 21
Outline
9.3: Image Processing Operations for Early Vision
- Edge Detection
SLIDE 22
Image-Processing Operations
Edge Detection
- Edges are curves in the image plane across which there is
a “significant” change in image brightness.
- The goal of edge detection is the construction of an
idealized line drawing.
SLIDE 23
Image-Processing Operations
- One way to detect edges is to differentiate the image and
look for places where the brightness undergoes a sharp change.
- Consider a 1-D example. Below is an intensity profile for
a 1-D image.
SLIDE 24
Image-Processing Operations
- Below we have the derivative of the previous graph.
- The derivative has peaks at x=18, x=50 and x=75.
- Not all of these correspond to true edges: the spurious peaks are due to the presence of noise in the image.
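The effect of noise on differentiation can be reproduced with a toy signal. In this sketch the intensity profile is invented for illustration (a step at x=50 plus two made-up noise bumps placed at x=18 and x=75 to mirror the peaks mentioned above), and the threshold of 0.3 is an arbitrary assumption.

```python
def derivative(signal):
    """Forward finite difference: d[i] = signal[i + 1] - signal[i]."""
    return [b - a for a, b in zip(signal, signal[1:])]


# Hypothetical profile: dark (1.0) up to x=49, bright (2.0) from x=50 on.
profile = [1.0] * 50 + [2.0] * 50
profile[18] += 0.4  # made-up noise bump
profile[75] += 0.4  # made-up noise bump

d = derivative(profile)
# d[i] measures the jump between x=i and x=i+1, so report the peak at i + 1.
peaks = [i + 1 for i, v in enumerate(d) if v > 0.3]
print(peaks)
```

A plain threshold on the derivative reports the noise bumps alongside the real step, which is the problem that smoothing is meant to address.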
SLIDE 25
Image-Processing Operations
- This problem is countered by combining the differentiation
operation with a smoothing function, applied by convolution.
- The mathematical concept of convolution allows us to
perform many useful image-processing operations.
SLIDE 26
Image-Processing Operations
- One standard form of smoothing is to use a Gaussian
function.
- Now using the idea of convolving with the Gaussian function
we can revisit the 1-D example.
SLIDE 27
Image-Processing Operations
- With the smoothing applied, the edge at x=50 is much easier
to see. Convolution lets us discover where edges are located, which in turn allows us to make an accurate line drawing.
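Gaussian smoothing followed by differentiation can be sketched in a few lines of plain Python. The profile below is invented (a step at x=50 with made-up noise bumps at x=18 and x=75), and the kernel width `sigma=2.0`, the radius of 6, and the border handling are all assumptions chosen for illustration.

```python
import math


def gaussian_kernel(sigma, radius):
    """Sampled Gaussian, normalised so the weights sum to 1."""
    w = [math.exp(-(k * k) / (2.0 * sigma * sigma))
         for k in range(-radius, radius + 1)]
    total = sum(w)
    return [v / total for v in w]


def convolve(signal, kernel):
    """1-D convolution; border samples are repeated past the ends."""
    r = len(kernel) // 2
    n = len(signal)
    out = []
    for i in range(n):
        acc = 0.0
        for k, wk in enumerate(kernel):
            j = min(max(i + r - k, 0), n - 1)  # clamp index at the borders
            acc += wk * signal[j]
        out.append(acc)
    return out


def derivative(signal):
    """Forward finite difference: d[i] = signal[i + 1] - signal[i]."""
    return [b - a for a, b in zip(signal, signal[1:])]


profile = [1.0] * 50 + [2.0] * 50
profile[18] += 0.4  # made-up noise bump
profile[75] += 0.4  # made-up noise bump

d = derivative(convolve(profile, gaussian_kernel(sigma=2.0, radius=6)))
strongest = max(range(len(d)), key=lambda i: d[i]) + 1
noise_response = max(abs(v) for v in d[10:30] + d[65:90])
print(strongest, noise_response)
```

After smoothing, the strongest response sits at the step (x=50), while the response around the noise bumps is several times weaker, so a single threshold can now separate them.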
SLIDE 28
Image-Processing Operations
- Here is an example of using convolution on a 2-D picture
of the Mona Lisa.
SLIDE 30
Outline
9.4: Extracting 3D Information using Vision
- Motion
- Binocular Stereopsis
- Texture Gradient
- Shading
- Contour
SLIDE 31
Extracting 3-D Information Using Vision
We need to extract 3-D information for performing certain tasks such as manipulation, navigation, and recognition.
Three aspects:
1. Segmentation
2. Position & Orientation
3. Shape
To recover 3-D information there are a number of cues available, including motion, binocular stereopsis, texture, shading and contour.
SLIDE 32
Extracting 3-D Information Using Vision
Motion
- Optical flow is the apparent motion in the image that results
when a camera moves relative to the 3-D scene.
SLIDE 33
Extracting 3-D Information Using Vision
- To measure Optical Flow, we need to find corresponding
points between one time frame and the next.
- One measure of how well two image patches match is the sum of squared differences (SSD):
$SSD(D_x, D_y) = \sum_{(x,y)} \left( I(x, y, t) - I(x + D_x, y + D_y, t + D_t) \right)^2$
SLIDE 34
Extracting 3-D Information Using Vision
Another measure is cross-correlation (CC):
$CC(D_x, D_y) = \sum_{(x,y)} I(x, y, t)\, I(x + D_x, y + D_y, t + D_t)$
- Cross-correlation works best when there is texture in the
scene, because texture produces significant brightness variation
among the pixels.
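Both measures can be used to search for the displacement of a small patch between two frames. The sketch below uses two tiny invented frames in which a textured 2x2 patch moves by (Dx, Dy) = (2, 1); the helper `make_frame`, the window coordinates, and the search range of plus or minus 2 pixels are assumptions made for illustration.

```python
def ssd(frame_a, frame_b, window, dx, dy):
    """Sum of squared differences over the window, displaced by (dx, dy)."""
    x0, y0, x1, y1 = window
    return sum(
        (frame_a[y][x] - frame_b[y + dy][x + dx]) ** 2
        for y in range(y0, y1)
        for x in range(x0, x1)
    )


def cross_correlation(frame_a, frame_b, window, dx, dy):
    """Sum of products over the window, displaced by (dx, dy)."""
    x0, y0, x1, y1 = window
    return sum(
        frame_a[y][x] * frame_b[y + dy][x + dx]
        for y in range(y0, y1)
        for x in range(x0, x1)
    )


def make_frame(size, top, left, patch):
    """Hypothetical helper: a black frame with one textured patch pasted in."""
    frame = [[0] * size for _ in range(size)]
    for r, row in enumerate(patch):
        for c, value in enumerate(row):
            frame[top + r][left + c] = value
    return frame


patch = [[5, 9], [7, 3]]
frame_t = make_frame(8, top=3, left=3, patch=patch)
frame_t_dt = make_frame(8, top=4, left=5, patch=patch)  # moved by Dx=2, Dy=1

window = (3, 3, 5, 5)  # x0, y0, x1, y1 around the patch in the first frame
shifts = [(dx, dy) for dy in range(-2, 3) for dx in range(-2, 3)]
best_ssd = min(shifts, key=lambda s: ssd(frame_t, frame_t_dt, window, *s))
best_cc = max(shifts,
              key=lambda s: cross_correlation(frame_t, frame_t_dt, window, *s))
print(best_ssd, best_cc)
```

SSD is minimised and cross-correlation is maximised at the true displacement. On a patch with no texture (constant brightness), many displacements would score equally well and the match would be ambiguous.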
SLIDE 35
Extracting 3-D Information Using Vision
Binocular Stereopsis
- Binocular stereopsis uses multiple images separated in space,
whereas motion uses multiple images separated in time.
- Because a given scene feature will be in a different place relative
to the z-axis of each image plane, superposing the two images reveals a disparity in the location of important features.
SLIDE 36
Extracting 3-D Information Using Vision
- This disparity also allows us to determine depth. Knowing the distance
between the cameras and the point at which their lines of sight intersect,
it only requires a few simple geometric calculations to determine the
depth coordinate z for any given (x, y) coordinate.
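One common simplification of the depth calculation assumes two identical cameras with parallel optical axes, rather than lines of sight that intersect at a fixation point as described above. Under that assumption, depth is Z = f * b / d, where f is the focal length, b the distance between the cameras, and d the disparity. The function name and the sample numbers below are illustrative only.

```python
def depth_from_disparity(focal_length, baseline, disparity):
    """Depth Z = f * b / d for two parallel cameras (a simplifying assumption).

    focal_length and disparity share one unit (e.g. pixels);
    the result has the unit of baseline (e.g. metres).
    """
    if disparity <= 0:
        raise ValueError("disparity must be positive")
    return focal_length * baseline / disparity


# Hypothetical numbers: f = 800 px, cameras 0.25 m apart.
far_object = depth_from_disparity(800.0, 0.25, 20.0)
near_object = depth_from_disparity(800.0, 0.25, 40.0)
print(far_object, near_object)
```

Larger disparity means a closer object, so small disparities at long range make depth estimates sensitive to measurement error.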
SLIDE 37
Extracting 3-D Information Using Vision
Texture Gradient
- Texture refers to a spatially
repeating pattern on a surface that can be sensed visually.
- In the images, the apparent
size, shape and spacing of the repeating texture elements (texels) vary.
SLIDE 38
Extracting 3-D Information Using Vision
The two main causes of this variation are:
- Varying distance from the camera to the different texture
elements.
- Varying orientation of the texels relative to the line of
sight from the camera.
With some mathematical analysis, the rate of change of these texel features can be expressed as quantities called texture gradients.
SLIDE 39
Extracting 3-D Information Using Vision
Texture can be used to determine shape via a two-step process: (a) measure the texture gradients and (b) estimate the surface shape, slant and tilt that could give rise to them.
SLIDE 40
Extracting 3-D Information Using Vision
Shading
- Shading is the variation in the intensity of light received from
different portions of a surface in the scene.
- Given the image brightness I(x, y), our hope is to
recover the scene geometry and the reflectance properties
of the object.
- But this has proved difficult to do in anything but the
simplest cases.
SLIDE 41
Extracting 3-D Information Using Vision
- The main problem is dealing with interreflections.
- In most scenes the surfaces are not only illuminated by
the light sources, but also by the light reflected from other surfaces which serve as a secondary light source.
- These mutual illumination effects are quite significant.
SLIDE 42
Extracting 3-D Information Using Vision
Contour
- Line drawings can give a vivid
perception of 3-D shape and layout.
- The problem is to determine the exact significance of each line in an
image.
- It is also called the line labeling problem, since each line is
labeled according to its significance.
SLIDE 43
Extracting 3-D Information Using Vision
- In a simplified world, where all surface marks and
shadows have been removed, all the lines can be classified as either limbs or edges.
- A limb is the locus of points on the surface where the line
of sight is tangent to the surface.
- An edge is a surface-normal discontinuity.
- Edges can be further divided into convex, concave
and occluding edges.
SLIDE 44
Extracting 3-D Information Using Vision
- "+" and "-" labels represent convex and concave edges respectively.
- "<-" and "->" labels represent occluding edges.
- "<-<-" and "->->" labels represent limbs.
SLIDE 45
Extracting 3-D Information Using Vision
In 1971 Huffman and Clowes independently studied the line labeling problem for trihedral solids:
objects in which exactly three plane surfaces come
together at each vertex.
SLIDE 46
Extracting 3-D Information Using Vision
For this trihedral world, Huffman and Clowes made an exhaustive list of all the different vertex types and the different ways in which they could be viewed from a general viewpoint.
SLIDE 47
Extracting 3-D Information Using Vision
They created a junction dictionary to find a labeling for the line drawing. Later this work was generalized to arbitrary polyhedra and to piecewise-smooth curved
objects.
SLIDE 49
Outline
9.5: Using Vision for Manipulation and Navigation
- Driving Example
- Lateral Control
- Longitudinal Control
SLIDE 50
Using Vision for Manipulation and Navigation
- One of the main uses of vision is to provide information
for manipulating objects as well as navigating in a scene while avoiding obstacles.
- A perfect example of the use of vision is the driving
example.
SLIDE 51
Using Vision for Manipulation and Navigation
Figure 24.24: The information needed for visual control of a vehicle on a freeway.
SLIDE 52
Using Vision for Manipulation and Navigation
The tasks for the driver in Figure 24.24:
1. Keep moving at a reasonable speed (v0).
2. Lateral control: stay centered in the lane (dl = dr).
3. Longitudinal control: keep a safe distance from the vehicle in front (d2 = safe distance).
4. Monitor vehicles in neighboring lanes and be prepared
for action if one of them decides to change lanes.
SLIDE 53
Using Vision for Manipulation and Navigation
- The problem for the driver is to generate appropriate
steering, actuation or braking actions to best accomplish these tasks.
- Focusing specifically on lateral and longitudinal control,
what information is needed for these tasks?
SLIDE 54
Using Vision for Manipulation and Navigation
Lateral Control:
- The steering control system for the vehicle needs to
detect edges corresponding to the lane marker segments and then needs to fit smooth curves to these.
- The parameters of these curves carry information about
the lateral position of the car, the direction it is pointing relative to the lane, and the curvature of the lane.
- The dynamics of the vehicle are also needed.
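As a rough illustration of curve fitting for lateral control, the sketch below fits a straight line to edge points from each lane marker by least squares. Everything here is assumed for the example: the points are invented and already expressed in road coordinates relative to the car, and a straight line is the simplest possible "smooth curve" (capturing lane curvature would need at least a quadratic term).

```python
def fit_line(samples):
    """Least-squares fit of x = a + b * y to (y, x) samples."""
    n = len(samples)
    mean_y = sum(y for y, _ in samples) / n
    mean_x = sum(x for _, x in samples) / n
    syy = sum((y - mean_y) ** 2 for y, _ in samples)
    sxy = sum((y - mean_y) * (x - mean_x) for y, x in samples)
    b = sxy / syy
    a = mean_x - b * mean_y
    return a, b


# Hypothetical edge points: y = distance ahead, x = lateral position (metres).
left_marker = [(0.0, -2.0), (5.0, -1.5), (10.0, -1.0), (15.0, -0.5)]
right_marker = [(0.0, 1.5), (5.0, 2.0), (10.0, 2.5), (15.0, 3.0)]

a_left, b_left = fit_line(left_marker)
a_right, b_right = fit_line(right_marker)

d_l, d_r = -a_left, a_right         # distance to each marker at the car
centering_error = d_l - d_r         # zero when dl = dr
heading = (b_left + b_right) / 2.0  # lane direction relative to the car
print(centering_error, heading)
```

The intercepts give the lateral position and the slopes give the direction of the lane relative to the car. A steering controller would then act on these values together with the vehicle dynamics.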
SLIDE 55
Using Vision for Manipulation and Navigation
Longitudinal Control:
- The driver needs to know the distance to the vehicles in
front.
- This can be accomplished using binocular
stereopsis or optical flow.
- The driving example makes one point very clear: for a
specific task, one does not need to recover all the information that in principle can be recovered from an image.
SLIDE 57
Outline
9.6: Object Representation and Recognition
- Alignment Method
- Projective Invariants
- Representation of Models
- Matching Models to Images
SLIDE 58
Object Representation and Recognition
Problem:
- Given: a scene consisting of one or more objects chosen
from a collection of objects and an image of the scene taken from an unknown viewer position and orientation.
- Determine: Which of the objects from the collection are
present in the scene and for each object, determine its position and orientation relative to the viewer.
SLIDE 59
Object Representation and Recognition
- The two fundamental issues that any object recognition
scheme must address are the representation of the models and the matching of models to images.
- The most common way of representing objects in a
recognition system in 3D is by using generalized cylinders.
SLIDE 60
Object Representation and Recognition
- Examples of
Generalized Cylinders:
SLIDE 61
Object Representation and Recognition
Alignment Method:
- Handy for identifying 3D objects without knowing their
position or orientation with respect to the observer.
- Accomplishes this by representing the object with a set
of m features or distinguishing points in 3D.
- The points are then subjected to 3D rotation R, followed
by a translation by unknown amount t and projection to give rise to image feature points on the image plane.
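The forward model used by the alignment method (rotate, translate, project) can be sketched as follows. This is only the prediction step: in the method itself R and t are unknown and must be solved for from feature correspondences. For brevity the rotation is restricted to the Z axis, and the function names, model points, and pose values are all hypothetical.

```python
import math


def rotate_z(point, angle):
    """Rotate a 3-D point about the Z axis (a special case of a rotation R)."""
    X, Y, Z = point
    c, s = math.cos(angle), math.sin(angle)
    return (c * X - s * Y, s * X + c * Y, Z)


def translate(point, t):
    """Translate a 3-D point by the vector t."""
    return tuple(p + d for p, d in zip(point, t))


def project(point, f=1.0):
    """Perspective projection: x = -f*X/Z, y = -f*Y/Z."""
    X, Y, Z = point
    return (-f * X / Z, -f * Y / Z)


def predict_image_points(model_points, angle, t, f=1.0):
    """Rotate, translate, then project every model feature point."""
    return [project(translate(rotate_z(p, angle), t), f) for p in model_points]


# Hypothetical model with m = 3 feature points and a made-up pose.
model = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]
image = predict_image_points(model, angle=math.pi / 2, t=(0.0, 0.0, 5.0))
print(image)
```

A recognition hypothesis can be checked by comparing predicted image points like these with the features actually found in the image.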
SLIDE 62
Object Representation and Recognition
- A disadvantage of this method is that it involves trying
each model in the model library, resulting in a recognition complexity proportional to the number of models in the library.
- A solution is provided by using geometric invariants as
the shape representation. This approach uses the projective invariants measured from the image curves.
SLIDE 63
Object Representation and Recognition
- When an invariant that is measured corresponds to a
value in the library, a recognition hypothesis is generated. This is verified by back projecting the outline just like the alignment method.
- An advantage of invariant shape representation is that
models can be acquired directly from images without making measurements on the actual objects because the shape descriptors have the same value when measured in any image.
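The slides do not name a specific invariant. A standard textbook example of a projective invariant is the cross-ratio of four collinear points, which keeps its value under any projective transformation of the line. The sketch below demonstrates this numerically; the point positions and the transformation coefficients are arbitrary assumptions.

```python
def cross_ratio(x1, x2, x3, x4):
    """Cross-ratio of four collinear points given by 1-D coordinates."""
    return ((x1 - x3) * (x2 - x4)) / ((x1 - x4) * (x2 - x3))


def projective_map(x, a, b, c, d):
    """1-D projective transformation x -> (a*x + b) / (c*x + d)."""
    return (a * x + b) / (c * x + d)


# Hypothetical points on a model line, and the same points after an
# arbitrary projective transformation (standing in for a change of view).
points = [0.0, 1.0, 2.0, 4.0]
viewed = [projective_map(x, 2.0, 1.0, 0.5, 3.0) for x in points]

before = cross_ratio(*points)
after = cross_ratio(*viewed)
print(before, after)
```

Because the value is the same in every view, it can be measured directly in an image and used as an index into the model library.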
SLIDE 64
Object Representation and Recognition
- Although the computer is capable of recognizing a broad
array of images, there are some images that are currently nearly impossible for the computer to recognize.
SLIDE 65
Object Representation and Recognition
- Other images show ambiguities that humans are capable
of handling with little difficulty.
- Can a computer algorithm distinguish which object is
intended when there are a number of possible objects?
SLIDE 66
Object Representation and Recognition
- Further Examples:
SLIDE 67
Object Representation and Recognition
- There are also images which exist in 2D but cannot
exist in 3D. Can a computer algorithm detect this?
SLIDE 68
Summary
- Perception Agents
- Perception Sensors
- The Straightforward Approach
- Manipulation
- Navigation
- Object Recognition
SLIDE 69
Summary
- Perspective Projection
- Orthographic Projection
- Lens Systems
- Photometry of Image Formation
SLIDE 70
Summary
- Edges
- Convolving
- The smoothing Gaussian function
SLIDE 71
Summary
- Motion
- Binocular stereopsis
- Texture
- Shading
- Contour
SLIDE 72
Summary
- Some main uses of vision
- Representation of models
- Matching of models to images
- Alignment Method
- Projective Invariants
- Problems with Object Recognition
SLIDE 73
Sources
- CPSC 533 textbook
- http://sern.ucalgary.ca/courses/CPSC/533/W99/presentations/L2_24A_Lee_Wang/
- http://sern.ucalgary.ca/courses/CPSC/533/W99/presentations/L1_24A_Kaasten_Steller_Hoang/main.htm
- http://sern.ucalgary.ca/courses/CPSC/533/W99/presentations/L1_24_Schebywolok/index.html
- http://sern.ucalgary.ca/courses/CPSC/533/W99/presentations/L2_24B_Doering_Grenier/
- http://www.geocities.com/SoHo/Museum/3828/optical.html
- http://members.spree.com/funNgames/katbug/