Generative Hierarchical Models for Image Analysis
Stuart Geman (with E. Borenstein, L.-B. Chang, W. Zhang)

I. Image modeling
II. Data likelihood
III. Priors: content/context sensitivity
I. Image modeling
- Red herrings?
- Bayesian (generative) image models
- II. Data likelihood
- III. Priors: content/context sensitivity
Practical vision problems: what is the end product of processing?
"The more you look, the more you see": machine vision (machine analysis) versus human vision.
Learning Theory: Pure learning

Given pairs $(x_k, y_k),\ k = 1, \dots, N$, where $y$ is an image and $x$ a label ("Tree" / "No Tree"), a black box produces a classifier $\hat{x}(y)$ that is OPTIMAL.
Performance of stressed biological systems: Super-rapid response…
In this circumstance: machine vision achieves biological performance
I. Image modeling
- Red herrings?
- Bayesian (generative) image models
- II. Data likelihood
- III. Priors: content/context sensitivity
- I. Bayesian (generative) image models
Prior

$P(x)$: a probability model on $I$, the set of possible "interpretations" or "parses"; $x \in I$ is a particular interpretation.
* very structured and constrained
* organizing principles: hierarchy and reusability (Amit, Buhmann, Felzenszwalb, Mumford, Poggio, Yuille, Zhu, etc.)
* non-Markovian (context/content sensitive)
Data likelihood

$P(y \mid x)$: a conditional probability model on the image $y$.

Posterior

$P(x \mid y) \propto P(y \mid x)\, P(x)$
I. Image modeling
- II. Data likelihood
- Feature distributions and data distributions
- Conditional modeling
- Examples: learning templates
- III. Priors: content/context sensitivity
Feature distributions and data distributions

$y = \{y_s\}_{s \in S}$: image patch, with $y_s$ the pixel intensity at $s \in S$.
$f(y)$: a "feature", e.g. variance of patch, histogram of gradients, SIFT features, template correlation.
$P_F(f; \theta)$: probability model on the feature.
Given a category (e.g. edge, corner, eye, face, (eye,pose),…), model patch through a feature model:
Problem: given samples $y^1, \dots, y^N$ of eye patches, learn $f$ and $P_F$. (Write $f$ for $f(y)$ and $P_F$ for $P_F(f)$ for short.)
Tempting to PRETEND that the data is $f(y^1), \dots, f(y^N)$:

$$L(f(y^1), \dots, f(y^N); f, \theta) = \prod_{k=1}^{N} P_F(f(y^k))$$

BUT the data is $y^1, \dots, y^N$, and $P_Y(y) = P_F(f(y))\, P_Y(y \mid F = f(y))$, so

$$L(y^1, \dots, y^N; f, \theta) = \prod_{k=1}^{N} P_F(f(y^k))\, P_Y(y^k \mid F = f(y^k))$$
Use maximum likelihood… but what is the likelihood?

Caution: $P_Y(y) = \frac{1}{Z} P_F(f(y))$ is different from $P_Y(y) = P_F(f(y))\, P_Y(y \mid F = f(y))$. The first is fine for estimating $\theta$ (i.e. $P_F$), but not fine for estimating $f$.
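The distinction can be made concrete on a tiny discrete toy space (the numbers below are illustrative, not from the talk): the proper data model $P_Y(y) = P_F(f(y))\,P_Y(y \mid F = f(y))$ versus the shortcut $\frac{1}{Z} P_F(f(y))$, which silently assumes a uniform conditional within each feature class.

```python
import numpy as np

# Toy space: "images" are integers 0..5; feature f(y) = y mod 2 (two classes).
ys = np.arange(6)
f = ys % 2

# A target feature distribution P_F and a conditional model P(y | F = f(y)).
P_F = np.array([0.7, 0.3])                    # P_F(0), P_F(1)
P_cond = np.zeros(6)
P_cond[f == 0] = np.array([0.5, 0.3, 0.2])    # non-uniform within class 0
P_cond[f == 1] = np.array([1/3, 1/3, 1/3])    # uniform within class 1

# Proper data model: P_Y(y) = P_F(f(y)) * P_Y(y | F = f(y)).
P_Y = P_F[f] * P_cond
assert np.isclose(P_Y.sum(), 1.0)

# Tempting shortcut: P(y) = P_F(f(y)) / Z.  It ignores the within-class
# conditional, so it disagrees with P_Y whenever that conditional is
# non-uniform (as in class 0 here).
P_short = P_F[f] / P_F[f].sum()
assert not np.allclose(P_Y, P_short)
```

The shortcut spreads each feature class's mass evenly, which is why it cannot be used to compare candidate features $f$.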
I. Image modeling
- II. Data likelihood
- Feature distributions and data distributions
- Conditional modeling
- Examples: learning templates
- III. Priors: content/context sensitivity
Conditional modeling
For any category $g$ (e.g. "eye") and feature $F = f(Y)$:

$$P_Y^g(y) = P_F^g(f(y))\, P_Y^g(y \mid F = f(y))$$

Easy to model $P_F^g$; hard to model $P_Y^g(y \mid F = f(y))$.

Proposal: start with a "null" or "background" distribution $P_Y^0$ and choose $P_Y^g$:
1. consistent with $P_F^g$, and
2. otherwise "as close as possible" to $P_Y^0$.
Specifically, given $g$, $f(\cdot)$, $P_F^g$, and a null distribution $P_Y^0$, choose

$$P_Y^g = \arg\min_{\tilde P :\ f(Y) \text{ has distribution } P_F^g} D(\tilde P \,\|\, P_Y^0)$$

Then

$$P_Y^g(y) = P_F^g(f(y))\, P_Y^0(y \mid F = f(y))$$
Conditional modeling: a perturbation of the null distribution
(where $D(\tilde P \,\|\, P) = \int \tilde P(y) \log \frac{\tilde P(y)}{P(y)}\, dy$ is the K-L divergence)
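A small numerical check of the closed form, on a discrete toy space (the specific numbers are illustrative assumptions): tilting the null by the ratio of feature marginals produces a distribution whose feature law matches the target exactly, and any other distribution satisfying the constraint is farther from the null in K-L divergence.

```python
import numpy as np

rng = np.random.default_rng(0)

# Discrete "image" space with a feature f(y) in {0,1,2} (4 images per class).
n = 12
f = np.arange(n) % 3

# Null distribution P0 and a target feature law P_F^g.
P0 = rng.random(n); P0 /= P0.sum()
P_Fg = np.array([0.6, 0.3, 0.1])

# Null feature marginal P0_F(j) = sum over {y : f(y) = j} of P0(y).
P0_F = np.array([P0[f == j].sum() for j in range(3)])

# KL-minimizing perturbation:
# P^g(y) = P_F^g(f(y)) * P0(y | F = f(y)) = P0(y) * P_F^g(f(y)) / P0_F(f(y)).
Pg = P0 * P_Fg[f] / P0_F[f]
assert np.isclose(Pg.sum(), 1.0)

# Its feature marginal equals the target exactly.
Pg_F = np.array([Pg[f == j].sum() for j in range(3)])
assert np.allclose(Pg_F, P_Fg)

def kl(p, q):
    return np.sum(p * np.log(p / q))

# Move a little mass within one feature class: same feature marginal,
# strictly no closer to the null (Pg is the I-projection).
Q = Pg.copy()
i0, i1 = np.where(f == 0)[0][:2]
eps = min(Q[i0] / 2, 1e-3)
Q[i0] -= eps; Q[i1] += eps
assert kl(Q, P0) >= kl(Pg, P0) - 1e-12
```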
Estimation
Given $y^1, \dots, y^N$, $P_F^g(f) = P_F(f; \theta)$, and $P_Y^g(y) = P_F^g(f(y))\, P_Y^0(y \mid F = f(y))$, estimate $f$ and $\theta$:

$$\arg\max_{f,\,\theta} L(y^1, \dots, y^N; f, \theta) = \arg\max_{f,\,\theta} \prod_{k=1}^{N} \frac{P_F^g(f(y^k))}{P_F^0(f(y^k))}$$
In fact, for an arbitrary mixture (e.g. over poses, templates, vector quanta, …):

$$\arg\max L(y^1, \dots, y^N; \epsilon_1, \dots, \epsilon_M, f_1, \dots, f_M, \theta_1, \dots, \theta_M) = \arg\max \prod_{k=1}^{N} \sum_{m=1}^{M} \epsilon_m \frac{P_{F_m}^g(f_m(y^k))}{P_{F_m}^0(f_m(y^k))}$$

where $P_{F_m}^g(f) = P_{F_m}(f; \theta_m)$, $m = 1, 2, \dots, M$, and

$$P_Y^g(y) = \sum_{m=1}^{M} \epsilon_m P_{F_m}^g(f_m(y))\, P_Y^0(y \mid F_m = f_m(y))$$
I. Image modeling
- II. Data likelihood
- Feature distributions and data distributions
- Conditional modeling
- Examples: learning templates
- III. Priors: content/context sensitivity
Example: learning eye templates
Take $f_m(y) = c_{T_m}(y) = \mathrm{corr}(T_m, y)$ (template correlation) and model eyes as a mixture:

$$P_Y(y) = \sum_{m=1}^{M} \epsilon_m P_{C_{T_m}}(c_{T_m}(y))\, P_Y^0(y \mid C_{T_m} = c_{T_m}(y)), \qquad P_{C_{T_m}}(c) \propto e^{-\lambda_m (1 - c)}$$
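The slides do not spell out which correlation is meant; a minimal sketch, assuming the usual mean-subtracted normalized correlation between a template and a patch of the same size:

```python
import numpy as np

def template_correlation(T, y):
    """Normalized correlation c_T(y) between template T and patch y
    (2-D arrays of matching shape); value in [-1, 1]."""
    t = T.ravel() - T.mean()
    v = y.ravel() - y.mean()
    return float(t @ v / (np.linalg.norm(t) * np.linalg.norm(v)))

rng = np.random.default_rng(1)
T = rng.random((15, 15))
# A patch equal to the template correlates perfectly; its negation, anti-perfectly.
assert abs(template_correlation(T, T) - 1.0) < 1e-12
assert abs(template_correlation(T, -T) + 1.0) < 1e-12
```

Because $c_T$ is invariant to the patch's brightness and contrast, the feature model only constrains patch shape, which is what makes the null-conditional factor $P_Y^0(y \mid C_T = c_T(y))$ carry the rest.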
($y = \{y_s\}_{s \in S}$: image patch, with $y_s$ the pixel intensity at $s \in S$.)
Null distribution $P_Y^0$: sample $y_s$ iid $N(0, \sigma^2)$. Then $P_{C_T}^0(c) \approx N(0, 1/|S|)$; the variance is roughly $10/|S|$ for random image patches and roughly $15/|S|$ for smooth image patches.
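The $N(0, 1/|S|)$ claim for the iid null is easy to sanity-check by Monte Carlo (a sketch; the $10/|S|$ and $15/|S|$ figures for natural and smooth patches would need real image data):

```python
import numpy as np

rng = np.random.default_rng(2)
S = 15 * 15                               # |S|: pixels in a 15x15 patch

def corr(t, v):
    t = t - t.mean(); v = v - v.mean()
    return t @ v / (np.linalg.norm(t) * np.linalg.norm(v))

T = rng.standard_normal(S)                # a fixed (flattened) template
# Correlations of the template against iid standard-normal "patches".
c = np.array([corr(T, rng.standard_normal(S)) for _ in range(20000)])

# Under the iid-pixel null, C_T is approximately N(0, 1/|S|).
assert abs(c.mean()) < 0.01
assert 0.7 / S < c.var() < 1.3 / S
```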
Null distribution, for estimation: only $P_{C_T}^0(c)$ matters…
Example: learning eye templates
With $N = 500$, compute

$$\arg\max_{T_1, \dots, T_M,\, \lambda,\, \epsilon} L(y^1, \dots, y^N \mid T_1, \dots, T_M, \lambda_1, \dots, \lambda_M, \epsilon_1, \dots, \epsilon_M) = \arg\max \prod_{k=1}^{N} \sum_{m=1}^{M} \epsilon_m \frac{e^{-\lambda_m (1 - c_{T_m}(y^k))}}{P_{C_{T_m}}^0(c_{T_m}(y^k))}$$
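A schematic EM loop in the spirit of this objective, on synthetic data. This is a sketch under loud assumptions: $\lambda$ is held fixed, the null-correction is omitted (constant on this synthetic set), and the M-step template update (responsibility-weighted patch average) is a common heuristic rather than the talk's exact update.

```python
import numpy as np

rng = np.random.default_rng(3)

def corr(t, v):
    t = t - t.mean(); v = v - v.mean()
    return t @ v / (np.linalg.norm(t) * np.linalg.norm(v) + 1e-12)

# Synthetic training set: noisy copies of two ground-truth templates A and B.
A, B = rng.standard_normal(100), rng.standard_normal(100)   # flattened 10x10
Y = np.array([(A if k % 2 == 0 else B) + 0.5 * rng.standard_normal(100)
              for k in range(200)])

# Init: one template from the data, the other its least-correlated patch.
M, lam = 2, 20.0
T = [Y[0].copy(), Y[int(np.argmin([corr(Y[0], y) for y in Y]))].copy()]
eps = np.full(M, 1.0 / M)

for _ in range(10):
    # E-step: responsibilities proportional to eps_m * exp(-lam*(1 - c_{T_m}(y_k))).
    C = np.array([[corr(T[m], y) for m in range(M)] for y in Y])
    W = eps * np.exp(lam * (C - 1.0))
    W /= W.sum(axis=1, keepdims=True)
    # M-step (heuristic): mixing weights and weighted-average templates.
    eps = W.mean(axis=0)
    T = [W[:, m] @ Y / W[:, m].sum() for m in range(M)]

# Each learned template should match one ground-truth template (up to swap).
match = max(min(corr(T[0], A), corr(T[1], B)),
            min(corr(T[0], B), corr(T[1], A)))
assert match > 0.9
```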
Examples of faces from the FERET database
samples from training set learned templates
Example: learning eye templates, mixing over position, scale, and template
Top to bottom: EM iterations
Example: learning (right) eye templates
What if we forget all this nonsense and just maximize

$$\prod_{k=1}^{N} \sum_{m=1}^{M} \epsilon_m e^{-\lambda_m (1 - c_{T_m}(y^k))}$$

instead of

$$\prod_{k=1}^{N} \sum_{m=1}^{M} \epsilon_m \frac{e^{-\lambda_m (1 - c_{T_m}(y^k))}}{P_{C_{T_m}}^0(c_{T_m}(y^k))} \,?$$
How good are the templates? A classification experiment…
Classify East Asian vs. South Asian (mixing over 4 scales and 8 templates)
East Asian: (L) examples of training images, (M) progression of EM, (R) trained templates
South Asian: (L) examples of training images, (M) progression of EM, (R) trained templates
Classification Rate: 97%
Other examples: noses. 16 templates; multiple scales, shifts, and rotations
samples from training set learned templates
Other examples: mixture of noses and mouths
samples from training set (1/2 noses, 1/2 mouths) 32 learned templates
Other examples: train on 58 faces …half with glasses…half without
32 learned templates samples from training set 6 learned templates
Other examples: train on 58 faces …half with glasses…half without
6 learned templates random eight of the 58 faces row 2 to 5, top to bottom: templates ordered by posterior likelihood
Other examples: train on 58 faces …half with glasses…half without
top row: the six learned templates row 2 to 5, top to bottom: Training images ordered by correlation
Other examples: train random patches (“sparse representation”)
500 random 15x15 training patches from random internet images; 24 10x10 templates
Other examples: coarse representation
Use $f(y) = \mathrm{Corr}(T, D(y))$, where $D$ downconverts. (Go the other way for super-resolution: $f(y) = \mathrm{Corr}(D(T), y)$?)
training of 8 low-res (10x10) templates
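A minimal sketch of the downconverted correlation, assuming $D$ is block averaging (the talk does not specify the downconversion):

```python
import numpy as np

def downconvert(y, k):
    """Block-average a (k*h, k*w) patch down to (h, w)."""
    h, w = y.shape[0] // k, y.shape[1] // k
    return y[:h * k, :w * k].reshape(h, k, w, k).mean(axis=(1, 3))

def corr(t, v):
    t = t.ravel() - t.mean(); v = v.ravel() - v.mean()
    return t @ v / (np.linalg.norm(t) * np.linalg.norm(v))

rng = np.random.default_rng(4)
T = rng.standard_normal((10, 10))          # low-res template
y = np.kron(T, np.ones((3, 3)))            # 30x30 patch consistent with T

# f(y) = Corr(T, D(y)): a high-res patch built from T downconverts back to T.
assert abs(corr(T, downconvert(y, 3)) - 1.0) < 1e-9
```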
Grenander: “pattern synthesis=pattern analysis”
32 samples from the mixture model $P_Y$, with a white-noise null distribution
(approximate) sampling…
32 samples from the mixture model $P_Y$, with a Caltech 101 null distribution
(approximate) sampling…
32 samples from the mixture model $P_Y$, with a null distribution built from a population of smooth image patches
I. Image modeling
- II. Data likelihood
- III. Priors: content/context sensitivity
- Hierarchical models and the Markov dilemma
- Conditional modeling
- Examples: detecting faces and reading license plates
Hierarchical models and the Markov Dilemma
$x^p \in \{0,1\}$, $x^p = 1$: 'pair of eyes'
$x^l \in \{0,1\}$, $x^l = 1$: 'left eye'
$x^r \in \{0,1\}$, $x^r = 1$: 'right eye'
Markov model
Markov property…
Estimation, Computation, Representation. But: given $x^p = 1$, there are probabilistic constraints on the poses and appearances of the left and right eyes.
I. Image modeling
- II. Data likelihood
- III. Priors: content/context sensitivity
- Hierarchical models and the Markov dilemma
- Conditional modeling
- Examples: detecting faces and reading license plates
Hierarchical models and the Markov Dilemma
$x^p \in \{0,1\}$, $x^p = 1$: 'pair of eyes'
$x^l \in \{0,1\}$, $x^l = 1$: 'left eye'
$x^r \in \{0,1\}$, $x^r = 1$: 'right eye'
Markov model

More generally: $P^0(x)$ a Markov distribution; $B$ e.g. 'pair of eyes'; $a(x)$ an attribute (e.g. relative poses of the two eyes); $P_A(a(x) \mid 1_B(x) = 1)$ the desired conditional distribution.

Choose

$$\tilde P = \arg\min_{P:\ P(A = a(x) \mid 1_B(x) = 1) = P_A(a(x) \mid 1_B(x) = 1)} D(P \,\|\, P^0)$$

then, for $x$ with $1_B(x) = 1$,

$$\tilde P(x) = P^0(x)\, \frac{P_A(a(x) \mid 1_B(x) = 1)}{P^0(A = a(x) \mid 1_B(x) = 1)}$$
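A toy instance of this perturbed-Markov construction (all numbers are illustrative): under the Markov backbone, left and right eye poses are independent given 'pair of eyes'; the perturbation tilts the backbone so the relative pose follows a chosen attribute law.

```python
import numpy as np
from itertools import product

# Markov backbone: given x^p = 1, left/right eye poses are independent
# and uniform over 3 positions -- no constraint on relative pose.
poses = range(3)
P0 = {(l, r): 1 / 9 for l, r in product(poses, poses)}

def a(x):                                    # attribute: relative pose
    return x[1] - x[0]

avals = sorted({a(x) for x in P0})

# Desired attribute law: strongly favor aligned eyes (relative pose 0).
P_A = {v: (0.8 if v == 0 else 0.2 / (len(avals) - 1)) for v in avals}

# Null attribute marginal, then the perturbation
# P~(x) = P0(x) * P_A(a(x)) / P0_A(a(x)).
P0_A = {v: sum(p for x, p in P0.items() if a(x) == v) for v in avals}
Pt = {x: p * P_A[a(x)] / P0_A[a(x)] for x, p in P0.items()}

# P~ is a proper distribution whose attribute law is exactly P_A.
assert abs(sum(Pt.values()) - 1.0) < 1e-12
for v in avals:
    got = sum(p for x, p in Pt.items() if a(x) == v)
    assert abs(got - P_A[v]) < 1e-12
```

Within each attribute level the backbone's conditional is untouched, which is the sense in which $\tilde P$ is "as close as possible" to the Markov model.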
I. Image modeling
- II. Data likelihood
- III. Priors: content/context sensitivity
- Hierarchical models and the Markov dilemma
- Conditional modeling
- Examples: detecting faces and reading license plates
Architecture for reading license plates

Hierarchy (top to bottom):
- license plates
- license numbers (3 digits + 3 letters, 4 digits + 2 letters)
- plate boundaries; strings (2 letters, 3 digits, 3 letters, 4 digits)
- generic letter, generic number; L-junctions of sides
- characters; plate sides
- parts of characters; parts of plate sides
Markov backbone compositional distribution
Sample random 4-digit strings:
“pattern synthesis=pattern analysis” (Markov versus perturbed Markov)
Original image Zoomed license region Top object: Markov distribution Top object: perturbed (“content-sensitive”) distribution
Markov versus perturbed Markov
Test set: 385 images, mostly from Logan Airport
Courtesy of Visics Corporation
Original Image Top object Top 10 objects Top 25 objects
Image interpretation
Test image Top objects
Image interpretation
- 385 images
- Six plates read with mistakes (>98%)
- Approx. 99.5% characters read correctly
- Zero false positives
Performance
Performance: errors
Performance: errors
Face detection (conditional modeling and overlapping patches)
Blue: best instantiation Green: second-best instantiation Turquoise: third-best
Blue: best instantiation Green: second-best instantiation Turquoise: third-best
In summary…

I am advocating strong representation:
- grammatical rules that are content sensitive
- data models that