Computer Vision, CSPP 56553 Artificial Intelligence, March 3, 2004 (PowerPoint PPT Presentation)


SLIDE 1

Computer Vision

CSPP 56553 Artificial Intelligence March 3, 2004

SLIDE 2

Roadmap

  • Motivation

– Computer vision applications

  • Is a Picture worth a thousand words?

– Low level features

  • Feature extraction: intensity, color

– High level features

  • Top-down constraint: shape from stereo, motion,..
  • Case Study: Vision as Modern AI

– Fast, robust face detection (Viola & Jones 2002)

SLIDE 3

Perception

  • From observation to facts about world

– Analogous to speech recognition
– Stimulus (Percept) S, World W

  • S = g(W)

– Recognition: Derive world from percept

  • W=g’(S)
  • Is this possible?
SLIDE 4

Key Perception Problem

  • Massive ambiguity

– Optical illusions

  • Occlusion
  • Depth perception
  • “Objects are closer than they appear”
  • Is it full-sized or a miniature model?
SLIDE 5

Image Ambiguity

SLIDE 6

Handling Uncertainty

  • Identify single perfect correct solution

– Impossible!

  • Noise, ambiguity, complexity
  • Solution:

– Probabilistic model
– P(W|S) = αP(S|W)P(W)

  • Maximize image probability and model probability
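The Bayesian decoding above can be sketched as picking the world hypothesis W that maximizes P(S|W)·P(W). The hypotheses, prior, and likelihood values below are invented toy numbers purely for illustration:

```python
# Toy MAP inference for P(W|S) ∝ P(S|W) P(W).
# Hypotheses and probability values are invented for illustration.
prior = {"full-sized car": 0.7, "miniature model": 0.3}  # P(W)
likelihood = {                                           # P(S|W) for one percept S
    "full-sized car": 0.4,
    "miniature model": 0.6,
}

def map_world(prior, likelihood):
    """Return the world hypothesis maximizing P(S|W) * P(W)."""
    return max(prior, key=lambda w: likelihood[w] * prior[w])

print(map_world(prior, likelihood))  # full-sized: 0.4*0.7 = 0.28 beats 0.6*0.3 = 0.18
```

Note that the normalizer α never needs to be computed: it is the same for every hypothesis, so maximizing P(S|W)·P(W) suffices.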
SLIDE 7

Handling Complexity

  • Don’t solve the whole problem

– Don’t recover every object/position/color…

  • Solve restricted problem

– Find all the faces
– Recognize a person
– Align two images

SLIDE 8

Modern Computer Vision Applications

  • Face / Object detection
  • Medical image registration
  • Face recognition
  • Object tracking
SLIDE 9

Vision Subsystems

SLIDE 10

Image Formation

SLIDE 11

Images and Representations

  • Initially pixel images

– Image as NxM matrix of pixel values
– Alternate image codings

  • Grey-scale intensity values
  • Color encoding: intensities of RGB values
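These codings can be sketched with NumPy arrays; the array shapes and the channel-averaging conversion below are a minimal illustration, not a specific library's API:

```python
import numpy as np

# A grey-scale image is an N x M matrix of intensities (0-255);
# a color image adds a third axis for the R, G, B channels.
grey = np.zeros((4, 6), dtype=np.uint8)     # 4 rows x 6 columns
grey[1, 2] = 200                            # one bright pixel

color = np.zeros((4, 6, 3), dtype=np.uint8)
color[1, 2] = (255, 0, 0)                   # the same pixel, pure red

# A crude color-to-grey coding: average the three channels.
grey_from_color = color.mean(axis=2).astype(np.uint8)
print(grey.shape, color.shape, int(grey_from_color[1, 2]))  # (4, 6) (4, 6, 3) 85
```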
SLIDE 12

Images

SLIDE 13

Grey-scale Images

SLIDE 14

Color Images

SLIDE 15

Image Features

  • Grey-scale and color intensities

– Directly access image signal values
– Large number of measures

  • Possibly noisy
  • Only care about intensities as cues to world
  • Image Features:

– Mid-level representation
– Extract from raw intensities
– Capture elements of interest for image understanding

SLIDE 16

Edge Detection

SLIDE 17

Edge Detection

  • Find sharp demarcations in intensity
  • 1) Apply spatially oriented filters
  • E.g. vertical, horizontal, diagonal
  • 2) Label above-threshold pixels with edge orientation
  • 3) Combine edge segments with same orientation: line
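Steps 1 and 2 of this recipe can be sketched with Sobel-style oriented filters; the toy image and the threshold value here are made up for illustration:

```python
import numpy as np

# Steps 1-2 above: convolve with horizontally and vertically oriented
# (Sobel-style) filters, threshold the response, and record orientation.
sobel_x = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]])  # responds to vertical edges
sobel_y = sobel_x.T                                       # responds to horizontal edges

def filter_response(img, kernel):
    """Valid-mode 2-D correlation (no padding)."""
    h, w = kernel.shape
    H, W = img.shape
    out = np.zeros((H - h + 1, W - w + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i+h, j:j+w] * kernel)
    return out

# A toy image with a sharp vertical intensity step down the middle.
img = np.zeros((5, 6)); img[:, 3:] = 255.0
gx = filter_response(img, sobel_x)
gy = filter_response(img, sobel_y)
magnitude = np.hypot(gx, gy)
edges = magnitude > 500           # step 2: keep above-threshold pixels
orientation = np.arctan2(gy, gx)  # edge orientation per pixel
print(edges.any())                # True: the vertical step is detected
```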

SLIDE 18

Top-down Constraints

  • Goal: Extract objects from images

– Approach: apply knowledge about how the world works to identify coherent objects

SLIDE 19

Motion: Optical Flow

  • Find correspondences in sequential images

– Units which move together represent objects
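The correspondence idea can be sketched with simple block matching: for a patch in frame 1, search a neighbourhood of frame 2 for the best-matching patch; the displacement is the flow vector. The frames below are invented toy data, and real optical-flow methods are considerably more sophisticated:

```python
import numpy as np

def best_displacement(f1, f2, y, x, size=3, search=2):
    """Return the (dy, dx) minimizing squared error between a patch of f1 and f2."""
    patch = f1[y:y+size, x:x+size]
    best, best_err = (0, 0), np.inf
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            yy, xx = y + dy, x + dx
            if 0 <= yy and 0 <= xx and yy + size <= f2.shape[0] and xx + size <= f2.shape[1]:
                err = np.sum((patch - f2[yy:yy+size, xx:xx+size]) ** 2)
                if err < best_err:
                    best, best_err = (dy, dx), err
    return best

frame1 = np.zeros((8, 8)); frame1[2:5, 2:5] = 1.0  # a bright 3x3 "object"
frame2 = np.zeros((8, 8)); frame2[2:5, 3:6] = 1.0  # same object shifted right by 1
print(best_displacement(frame1, frame2, 2, 2))     # (0, 1)
```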
SLIDE 20

Stereo

SLIDE 21

Stereo Depth Resolution

SLIDE 22

Texture and Shading

SLIDE 23

Edge-Based 2D-to-3D Reconstruction

Assume a world of solid polyhedra with 3-edge vertices; apply Waltz line labeling via Constraint Satisfaction

SLIDE 24

Basic Object Recognition

  • Simple idea:

– extract 3-D shapes from image
– match against “shape library”

  • Problems:

– extracting curved surfaces from image
– representing shape of extracted object
– representing shape and variability of library object classes
– improper segmentation, occlusion
– unknown illumination, shadows, markings, noise, complexity, etc.

  • Approaches:

– index into library by measuring invariant properties of objects
– alignment of image feature with projected library object feature
– match image against multiple stored views (aspects) of library object
– machine learning methods based on image statistics

SLIDE 25

Hand-written Digit Recognition

SLIDE 26

Summary

  • Vision is hard:

– Noise, ambiguity, complexity

  • Prior knowledge is essential to constrain problem

– Cohesion of objects, optics, object features

  • Combine multiple cues

– Motion, stereo, shading, texture, …

  • Image/object matching:

– Library: features, lines, edges, etc

  • Apply domain knowledge: Optics
  • Apply machine learning: neural nets, nearest neighbor, CSP, etc.
SLIDE 27

Computer Vision Case Study

  • “Rapid Object Detection using a Boosted Cascade of Simple Features”, Viola/Jones ’01

  • Challenge:

– Object detection:

  • Find all faces in arbitrary images

– Real-time execution

  • 15 frames per second

– Need simple features, classifiers

SLIDE 28

Rapid Object Detection Overview

  • Fast detection with simple local features

– Simple fast feature extraction

  • Small number of computations per pixel
  • Rectangular features

– Feature selection with Adaboost

  • Sequential feature refinement

– Cascade of classifiers

  • Increasingly complex classifiers
  • Repeatedly rule out non-object areas
SLIDE 29

Picking Features

  • What cues do we use for object detection?

– Not direct pixel intensities
– Features

  • Can encode task specific domain knowledge (bias)

– Difficult to learn directly from data
– Reduce training set size

  • Feature system can speed processing
SLIDE 30

Rectangle Features

  • Treat rectangles as units

– Derive statistics

  • Two-rectangle features

– Two similar rectangular regions

  • Vertically or horizontally adjacent

– Sum pixels in each region

  • Compute difference between regions
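A two-rectangle feature can be sketched directly from these steps; the toy image and the region placement are invented for illustration:

```python
import numpy as np

# Two-rectangle feature: sum the pixels in two horizontally adjacent regions
# of equal size and take the difference between them.
def two_rect_feature(img, y, x, h, w):
    """Difference between adjacent left and right h x w rectangles at (y, x)."""
    left = img[y:y+h, x:x+w].sum()
    right = img[y:y+h, x+w:x+2*w].sum()
    return left - right

img = np.zeros((6, 6)); img[:, 3:] = 10.0   # dark left half, bright right half
print(two_rect_feature(img, 0, 0, 6, 3))    # -180.0: strong vertical-edge response
```

Three- and four-rectangle features (next slide) follow the same pattern with different sum/subtract groupings.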
SLIDE 31

Rectangle Features II

  • Three-rectangle features

– 3 similar rectangles: horizontally/vertically

  • Sum outside rectangles
  • Subtract from center region
  • Four-rectangle features

– Compute difference between diagonal pairs

  • HUGE feature set: ~180,000
SLIDE 32

Rectangle Features

SLIDE 33

Computing Features Efficiently

  • Fast detection requires fast feature calculation
  • Rapidly compute intermediate representation

– “Integral image”
– Value for point (x,y) is sum of pixels above and to the left
– ii(x,y) = Σ_{x’≤x, y’≤y} i(x’,y’)
– Computed by recurrence

  • s(x,y) = s(x,y-1) + i(x,y), where s(x,y) is the cumulative row sum
  • ii(x,y) = ii(x-1,y) + s(x,y)
  • Compute rectangle sum with 4 array references
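These recurrences and the 4-reference rectangle sum can be sketched as follows (the zero padding row/column is an implementation convenience, not part of the slide's formulation):

```python
import numpy as np

def integral_image(img):
    """Build ii(x,y) via the recurrences above, padded with a zero row/column."""
    H, W = img.shape
    ii = np.zeros((H + 1, W + 1))
    for y in range(H):
        s = 0.0                    # s(x, y): cumulative row sum
        for x in range(W):
            s += img[y, x]         # s(x,y) = s(x-1,y) + i(x,y) along the row
            ii[y + 1, x + 1] = ii[y, x + 1] + s
    return ii

def rect_sum(ii, y, x, h, w):
    """Sum of img[y:y+h, x:x+w] using only 4 array references."""
    return ii[y + h, x + w] - ii[y, x + w] - ii[y + h, x] + ii[y, x]

img = np.arange(16, dtype=float).reshape(4, 4)
ii = integral_image(img)
print(rect_sum(ii, 1, 1, 2, 2))  # 5 + 6 + 9 + 10 = 30.0
```

The key payoff: after one linear-time pass, any rectangle sum (and hence any rectangle feature) costs constant time regardless of its size.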
SLIDE 34

Rectangle Feature Summary

  • Rectangle features

– Relatively simple – Sensitive to bars, edges, simple structure

  • Coarse

– Rich enough for effective learning – Efficiently computable

SLIDE 35

Learning an Image Classifier

  • Supervised training: +/- examples
  • Many learning approaches possible
  • Adaboost:

– Selects features AND trains classifier – Improves performance of simple classifiers

  • Guaranteed to converge exponentially rapidly

– Basic idea: Simple classifier

  • Boosts performance by focusing on previous errors
SLIDE 36

Feature Selection and Training

  • Goal: Pick only useful features from ~180,000

– Idea: Small number of features effective

  • Learner selects single feature that best separates +/- examples

– Learner selects optimal threshold for each feature
– Classifier h(x) = 1 if p·f(x) < p·θ, 0 otherwise (p a polarity in {+1, −1})
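This weak classifier can be sketched in a few lines; the feature values, threshold, and polarity below are invented toy data, not values from the paper:

```python
# Weak classifier: h(x) = 1 if p * f(x) < p * theta, else 0, where f(x) is a
# single rectangle-feature value, theta a learned threshold, and p a polarity
# in {+1, -1} that lets the inequality point either way.
def weak_classifier(f_value, theta, p):
    return 1 if p * f_value < p * theta else 0

# Hypothetical feature values on face (+) and non-face (-) training windows:
faces = [-3.0, -2.5, -4.0]
non_faces = [1.0, 2.0, 0.5]
theta, p = 0.0, +1   # faces happen to produce smaller feature values here

accuracy = (sum(weak_classifier(f, theta, p) == 1 for f in faces) +
            sum(weak_classifier(f, theta, p) == 0 for f in non_faces)) / 6
print(accuracy)  # 1.0 on this toy data
```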

SLIDE 37

Basic Learning Results

  • Initial classification: Frontal faces

– 200 features
– Finds 95%, 1/14000 false positive
– Very fast

  • Adding features adds to computation time
  • Features interpretable

– Region around eyes is darker than nose/cheeks
– Eyes are darker than bridge of nose

SLIDE 38

Primary Features

SLIDE 39

“Attentional Cascade”

  • Goal: Improved classification, reduced time

– Insight: Small, fast classifiers can reject

  • But have very few false negatives

– Reject majority of uninteresting regions quickly – Focus computation on interesting regions

  • Approach: “Degenerate” decision tree
  • Aka “cascade”
  • Positive results passed to high detection classifiers

– Negative results rejected immediately
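The degenerate decision tree can be sketched as a short-circuiting loop; the stage classifiers here are toy threshold lambdas standing in for trained classifiers:

```python
# Attentional cascade: each stage either rejects the sub-window immediately
# (F branch) or passes it on; only windows surviving every stage are detections.
def cascade(stages, window):
    for stage in stages:
        if not stage(window):   # F: reject immediately, no further work
            return False
    return True                 # T through every stage: report a detection

# Hypothetical stages of increasing strictness on a scalar "face score":
stages = [lambda w: w > 0.1, lambda w: w > 0.5, lambda w: w > 0.9]
print(cascade(stages, 0.95), cascade(stages, 0.3))  # True False
```

Most windows fail the first cheap stage, so the average per-window cost stays close to the cost of that one stage.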

SLIDE 40

Cascade Schematic

[Cascade schematic: all sub-window features → CL 1 → CL 2 → CL 3 → … more classifiers; T (pass) continues to the next classifier, F (fail) rejects the sub-window]

SLIDE 41

Cascade Construction

  • Each stage is a trained classifier

– Tune threshold to minimize false negatives
– Good first stage classifier

  • Two feature strong classifier – eye/cheek + eye/nose
  • Tuned: Detect 100%; 40% false positives

– Very computationally efficient

  • ~60 microprocessor instructions
SLIDE 42

Cascading

  • Goal: Reject bad features quickly

– Most features are bad

  • Reject early in processing, little effort

– Good regions will trigger full cascade

  • Relatively rare
  • Classification is progressively more difficult

– Rejected the most obvious cases already

  • Deeper classifiers more complex, more error-prone
SLIDE 43

Cascade Training

  • Tradeoffs: Accuracy vs Cost

– More accurate classifiers: more features, complex
– More features, more complex: Slower
– Difficult optimization

  • Practical approach

– Each stage reduces false positive rate
– Bound reduction in false pos, increase in miss
– Add features to each stage until meet target
– Add stages until overall effectiveness targets met
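This stage-wise recipe can be sketched as nested loops. The `Stage` class below is a hypothetical stand-in: it simply assumes each added feature halves the stage's false-positive rate, which is not the paper's actual learner:

```python
# Sketch of cascade construction: add features to a stage until it meets its
# false-positive target, then add stages until the overall target is met.
class Stage:
    def __init__(self, n_features):
        self.n_features = n_features
        self.false_pos = 0.5 ** n_features  # assumed: each feature halves FP rate
        self.detection = 0.99               # assumed: high per-stage detection

def build_cascade(target_overall_fp, max_stage_fp=0.4):
    cascade, overall_fp = [], 1.0
    while overall_fp > target_overall_fp:
        n = 1
        stage = Stage(n)
        while stage.false_pos > max_stage_fp:  # add features until stage target met
            n += 1
            stage = Stage(n)
        cascade.append(stage)
        overall_fp *= stage.false_pos          # stage FP rates multiply
        # (in practice the negatives for the next stage are re-collected as the
        #  false positives of the cascade built so far)
    return cascade

print(len(build_cascade(1e-4)))  # 7 stages under these assumed rates
```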

SLIDE 44

Results

  • Task: Detect frontal upright faces

– Face/non-face training images

  • Face: ~5000 hand-labeled instances
  • Non-face: ~9500 random web-crawl, hand-checked

– Classifier characteristics:

  • 38 layer cascade
  • Increasing number of features: 1,10,25,… : 6061

– Classification: Average 10 features per window

  • Most rejected in first 2 layers
  • Process 384x288 image in 0.067 secs
SLIDE 45

Detection Tuning

  • Multiple detections:

– Many subwindows around face will alert
– Create disjoint subsets

  • For overlapping boundaries, only report one

– Return average of corners
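The merging step can be sketched as grouping overlapping boxes and averaging their corners; the box format (x1, y1, x2, y2), the overlap test, and the greedy grouping are a simple illustration, not the paper's exact procedure:

```python
# Merge multiple detections: group overlapping boxes into disjoint subsets
# and report one box per subset, averaging the corner coordinates.
def overlaps(a, b):
    """Axis-aligned rectangle overlap test for (x1, y1, x2, y2) boxes."""
    return not (a[2] < b[0] or b[2] < a[0] or a[3] < b[1] or b[3] < a[1])

def merge_detections(boxes):
    groups = []
    for box in boxes:
        for g in groups:
            if any(overlaps(box, other) for other in g):
                g.append(box)
                break
        else:
            groups.append([box])        # no overlap: start a new disjoint subset
    # One reported box per group: the per-coordinate average of its corners.
    return [tuple(sum(c) / len(g) for c in zip(*g)) for g in groups]

hits = [(10, 10, 50, 50), (12, 11, 52, 49), (200, 200, 240, 240)]
print(merge_detections(hits))  # two boxes: one averaged pair, one singleton
```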

  • Voting:

– 3 similarly trained detectors

  • Majority rules

– Improves overall performance

SLIDE 46

Conclusions

  • Fast, robust facial detection

– Simple, easily computable features
– Simple trained classifiers
– Classification cascade allows early rejection

  • Early classifiers also simple, fast

– Good overall classification in real-time

SLIDE 47

Some Results

SLIDE 48

Vision in Modern AI

  • Goals:

– Robustness
– Multidomain applicability
– Automatic acquisition
– Speed: Real time

  • Approach:

– Simple mechanisms, feature selection
– Machine learning: Tune features, classification