Image Analysis, Stuart Geman (with E. Borenstein, L.-B. Chang, W. Zhang): PowerPoint PPT presentation

SLIDE 1

Generative Hierarchical Models for Image Analysis

Stuart Geman

(with E. Borenstein, L.-B. Chang, W. Zhang)

SLIDE 2

I. Image modeling

  • II. Data likelihood
  • III. Priors: content/context sensitivity
SLIDE 3

I. Image modeling

  • Red herrings?
  • Bayesian (generative) image models
  • II. Data likelihood
  • III. Priors: content/context sensitivity
SLIDE 4

Practical vision problems: What is the end-product of processing?

human vision: “The more you look, the more you see”
machine vision: machine analysis

SLIDE 5

Learning Theory: Pure learning

Given labeled samples $(x_1, y_1), (x_2, y_2), \ldots, (x_N, y_N)$, where $y$ = image and $x$ = label (“Tree” / “No Tree”), produce an OPTIMAL classifier: a “black box” mapping images $y$ to labels $x$.

SLIDE 6

Performance of stressed biological systems: Super-rapid response…

In this circumstance: machine vision achieves biological performance

SLIDE 7

I. Image modeling

  • Red herrings?
  • Bayesian (generative) image models
  • II. Data likelihood
  • III. Priors: content/context sensitivity
SLIDE 8
  • I. Bayesian (generative) image models

Prior

$P(x)$: a probability model on $I$, the set of possible "interpretations" or "parses"; $x \in I$ is a particular interpretation.
  • very structured and constrained
  • organizing principles: hierarchy and reusability (Amit, Buhmann, Felzenszwalb, Mumford, Poggio, Yuille, Zhu, etc.)
  • non-Markovian (context/content sensitive)

Data likelihood

$P(y \mid x)$: a conditional probability model on images $y$.

Posterior

$P(x \mid y) \propto P(y \mid x)\,P(x)$
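The posterior computation can be sketched on a finite toy space; all the numbers below are invented for illustration only.

```python
import numpy as np

# Bayes rule P(x|y) ∝ P(y|x) P(x) on a toy space of 3 "interpretations".
prior = np.array([0.5, 0.3, 0.2])          # P(x)
likelihood = np.array([0.1, 0.6, 0.3])     # P(y | x) for one observed image y
posterior = prior * likelihood
posterior /= posterior.sum()               # normalize: P(x | y)
```

The most probable interpretation is then the argmax of the posterior.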

SLIDE 9

I. Image modeling

  • II. Data likelihood
  • Feature distributions and data

distributions

  • Conditional modeling
  • Examples: learning templates
  • III. Priors: content/context sensitivity
SLIDE 10

Feature distributions and data distributions

$S$ = image patch (a set of pixel sites); $y = \{y_s\}_{s \in S}$, where $y_s$ = pixel intensity at $s \in S$.

Given a category (e.g. edge, corner, eye, face, (eye, pose), …), model the patch through a feature model:

$f: y \mapsto F(y;\theta)$, a "feature": e.g. variance of the patch, histogram of gradients, SIFT features, template correlation

with a probability model $P(F;\alpha)$ on the feature. Write $f(y)$ for $F(y;\theta)$, and $P_F(f)$ for $P(F;\alpha)$, for short.

Problem: given samples $y^1,\ldots,y^N$ of eye patches, learn $\theta$ and $\alpha$.

SLIDE 11

Use maximum likelihood… but what is the likelihood?

Tempting to PRETEND that the data is $f(y^1),\ldots,f(y^N)$:

$L\big(f(y^1),\ldots,f(y^N);\theta,\alpha\big) = \prod_{k=1}^{N} P_F\big(f(y^k)\big)$

BUT the data is $y^1,\ldots,y^N$, and

$P_Y(y) = P_F\big(f(y)\big)\,P_Y\big(y \mid F = f(y)\big)$

$L\big(y^1,\ldots,y^N;\theta,\alpha\big) = \prod_{k=1}^{N} P_F\big(f(y^k)\big)\,P_Y\big(y^k \mid F = f(y^k)\big)$

Caution: this is different from $P_Y(y) = \frac{1}{Z}\,P_F\big(f(y)\big)$.

The first (feature) likelihood is fine for estimating $\alpha$ (i.e. $P_F$), but not fine for estimating $\theta$ (i.e. $f$).
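The caution can be checked on a finite toy space: normalizing $P_F(f(y))$ over $y$ is not the same distribution as $P_F(f(y))\,P_Y(y \mid F = f(y))$. A minimal sketch; the space, feature, and probabilities are invented for illustration.

```python
import numpy as np

# 6 discrete "images" carrying a 3-valued feature f(y).
ys = np.arange(6)
f = ys // 2                                       # feature class of each y
pY = np.array([0.05, 0.15, 0.1, 0.3, 0.25, 0.15])  # a data distribution P_Y
pF = np.array([pY[f == v].sum() for v in range(3)])  # feature marginal P_F

# correct factorization: P_F(f(y)) * P_Y(y | F = f(y)) reproduces P_Y exactly
cond = pY / pF[f]                                 # P_Y(y | F = f(y))
reconstructed = pF[f] * cond

# the normalized-feature construction (1/Z) * P_F(f(y)) does not
q = pF[f] / pF[f].sum()
```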

SLIDE 12

I. Image modeling

  • II. Data likelihood
  • Feature distributions and data

distributions

  • Conditional modeling
  • Examples: learning templates
  • III. Priors: content/context sensitivity
SLIDE 13

Conditional modeling

For any category $g$ (e.g. "eye") and feature $F_g$, $f_g(y)$:

$P_Y^{(g)}(y) = P_F^{(g)}\big(f_g(y)\big)\,P_Y^{(g)}\big(y \mid F_g = f_g(y)\big)$

Easy to model $P_F^{(g)}$; hard to model $P_Y^{(g)}(y \mid F_g)$.

Proposal: start with a "null" or "background" distribution $P_Y^{0}(y)$ and choose $P_Y^{(g)}$:

  • 1. consistent with $P_F^{(g)}$, and
  • 2. otherwise "as close as possible" to $P_Y^{0}$

SLIDE 14

Specifically, given $g$, $F_g$, $P_F^{(g)}$, and a null distribution $P_Y^{0}$, choose

$P_Y^{(g)} = \underset{P:\ f_g(Y) \sim P_F^{(g)} \text{ under } P}{\arg\min}\ D\big(P \,\|\, P_Y^{0}\big)$

Then

$P_Y^{(g)}(y) = P_F^{(g)}\big(f_g(y)\big)\,P_Y^{0}\big(y \mid F_g = f_g(y)\big)$

(where $D(\tilde P \,\|\, P) = \int \tilde P(y)\,\log\frac{\tilde P(y)}{P(y)}\,dy$ is the Kullback-Leibler divergence)

Conditional modeling: a perturbation of the null distribution.
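The perturbation formula can be sanity-checked on a finite toy space: the tilted distribution $P_Y^{0}(y)\,P_F^{(g)}(f(y))/P_F^{0}(f(y))$ is a proper distribution whose feature marginal is exactly $P_F^{(g)}$. All ingredients below are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

ys = np.arange(8)                  # 8 discrete "image patches"
f = ys % 3                         # a 3-valued feature f(y)
p0 = rng.dirichlet(np.ones(8))     # null distribution P0_Y
pF = np.array([0.6, 0.3, 0.1])     # desired feature distribution Pg_F

# null feature marginal P0_F
p0F = np.array([p0[f == v].sum() for v in range(3)])

# perturbed model: Pg_Y(y) = P0_Y(y) * Pg_F(f(y)) / P0_F(f(y))
pg = p0 * pF[f] / p0F[f]

# its feature marginal (should equal pF)
pgF = np.array([pg[f == v].sum() for v in range(3)])
```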

SLIDE 15

Estimation

Given $y^1,\ldots,y^N$, $f_g(\,\cdot\,;\theta)$, $P_F^{(g)}(\,\cdot\,;\alpha)$, and $P_Y^{0}$, estimate $\theta$ and $\alpha$ from

$P_Y^{(g)}(y) = P_F^{(g)}\big(f_g(y)\big)\,P_Y^{0}\big(y \mid F_g = f_g(y)\big)$

$\hat\theta,\hat\alpha = \underset{\theta,\alpha}{\arg\max}\ L\big(y^1,\ldots,y^N;\theta,\alpha\big) = \underset{\theta,\alpha}{\arg\max}\ \prod_{k=1}^{N} P_F^{(g)}\big(f_g(y^k)\big)\,P_Y^{0}\big(y^k \mid F_g = f_g(y^k)\big)$

Since $P_Y^{0}(y \mid F_g = f_g(y)) = P_Y^{0}(y)/P_F^{0}(f_g(y))$ and $P_Y^{0}(y^k)$ involves neither $\theta$ nor $\alpha$, this is the same as maximizing $\prod_{k} P_F^{(g)}(f_g(y^k))/P_F^{0}(f_g(y^k))$: only the null feature distribution matters.
SLIDE 16

In fact, for arbitrary mixture (e.g. over poses, templates, vector quanta, …):

With components $m = 1,2,\ldots,M$, mixing weights $\epsilon_1,\ldots,\epsilon_M$, features $f_{g_m}(\,\cdot\,;\theta_m)$, and feature distributions $P_{F_m}^{(g_m)}(\,\cdot\,;\alpha_m)$:

$P_Y(y) = \sum_{m=1}^{M} \epsilon_m\, P_{F_m}^{(g_m)}\big(f_{g_m}(y)\big)\, P_Y^{0}\big(y \mid F_m = f_{g_m}(y)\big)$

$\underset{\epsilon,\theta,\alpha}{\arg\max}\ L\big(y^1,\ldots,y^N;\,\epsilon_1,\ldots,\epsilon_M,\,\theta_1,\ldots,\theta_M,\,\alpha_1,\ldots,\alpha_M\big) = \underset{\epsilon,\theta,\alpha}{\arg\max}\ \prod_{k=1}^{N} \sum_{m=1}^{M} \epsilon_m\, P_{F_m}^{(g_m)}\big(f_{g_m}(y^k)\big)\, P_Y^{0}\big(y^k \mid F_m = f_{g_m}(y^k)\big)$

SLIDE 17

I. Image modeling

  • II. Data likelihood
  • Feature distributions and data

distributions

  • Conditional modeling
  • Examples: learning templates
  • III. Priors: content/context sensitivity
SLIDE 18

Example: learning eye templates

$S$ = image patch; $y = \{y_s\}_{s \in S}$, $y_s$ = pixel intensity at $s \in S$.

Take $f(y) = c_T(y) = \mathrm{corr}(T, y)$, the correlation of the patch with a template $T$, and model eyes as a mixture over $M$ templates $T_1,\ldots,T_M$:

$P_Y(y) = \sum_{m=1}^{M} \epsilon_m\, P_{C_{T_m}}\big(c_{T_m}(y)\big)\, P_Y^{0}\big(y \mid C_{T_m} = c_{T_m}(y)\big), \qquad P_{C_{T_m}}(c) = \frac{1}{Z_m}\, e^{-\lambda_m (1 - c)}$
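The feature $c_T(y)$ is just normalized correlation between template and patch; a minimal implementation (the function name is mine):

```python
import numpy as np

def corr(T, y):
    """Normalized correlation c_T(y) between template T and patch y
    (2-D arrays of equal shape): mean-subtract, then cosine similarity."""
    t = (T - T.mean()).ravel()
    v = (y - y.mean()).ravel()
    return float(t @ v / (np.linalg.norm(t) * np.linalg.norm(v)))
```

By construction $c_T(y) \in [-1, 1]$, and it is invariant to affine rescalings $y \mapsto a y + b$ with $a > 0$, so the feature responds to patch shape rather than brightness or contrast.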

SLIDE 19

Null distribution $P_Y^{0}$: sample $y_s$ iid $N(0, \sigma^2)$.

[Plots: null distributions of $c_T(y)$ for a random-image-patch null and a smooth-image-patch null, at several patch sizes $|S|$.]

Null distribution, for estimation: only $P_{C_T}^{0}$ matters...
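The null feature distribution $P_{C_T}^{0}$ can be estimated by Monte Carlo: draw patches from the null and histogram their correlations with $T$. A sketch under the iid Gaussian null (template and sizes chosen arbitrarily):

```python
import numpy as np

rng = np.random.default_rng(0)

def corr(a, b):
    """Normalized correlation between two equal-shape 2-D arrays."""
    a = (a - a.mean()).ravel()
    b = (b - b.mean()).ravel()
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

T = rng.normal(size=(15, 15))          # any fixed template
# correlations of T with 2000 patches drawn from the iid N(0,1) null
samples = [corr(T, rng.normal(size=(15, 15))) for _ in range(2000)]
# under the null, correlations concentrate near 0 (spread ~ 1/sqrt(|S|))
```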

SLIDE 20

Example: learning eye templates

With $N = 500$, compute

$\underset{T,\epsilon,\lambda}{\arg\max}\ L\big(y^1,\ldots,y^N;\,T_1,\ldots,T_M,\,\epsilon_1,\ldots,\epsilon_M,\,\lambda_1,\ldots,\lambda_M\big) = \underset{T,\epsilon,\lambda}{\arg\max}\ \prod_{k=1}^{N} \sum_{m=1}^{M} \epsilon_m\, \frac{e^{-\lambda_m (1 - c_{T_m}(y^k))}/Z_m}{P_{C_{T_m}}^{0}\big(c_{T_m}(y^k)\big)}$

Examples of faces from the FERET database
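The argmax is computed by EM (as in the "EM iterations" figures). A minimal sketch of just the mixing-weight update, with the per-component likelihoods $w_m(y^k)$ assumed precomputed; the template and $\lambda$ updates are omitted, and the function name is mine.

```python
import numpy as np

def em_weights(W, n_iter=100):
    """EM for mixing weights eps_m of a mixture sum_m eps_m * w_m(y).

    W: (N, M) array with W[k, m] = w_m(y^k) > 0, the likelihood of
       sample k under component m. Returns the estimated eps (M,)."""
    N, M = W.shape
    eps = np.full(M, 1.0 / M)                # start from uniform weights
    for _ in range(n_iter):
        R = eps * W                          # E-step: responsibilities
        R /= R.sum(axis=1, keepdims=True)    # normalize per sample
        eps = R.mean(axis=0)                 # M-step: new mixing weights
    return eps
```

With well-separated components the responsibilities approach indicators, and the weights converge to the component proportions in the sample.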

SLIDE 21

samples from training set learned templates

Example: learning eye templates, mixing over position, scale, and template

Top to bottom: EM iterations

SLIDE 22

Example: learning (right) eye templates

What if we forget all this nonsense and just maximize

$\prod_{k=1}^{N} \sum_{m=1}^{M} \epsilon_m\, \frac{1}{Z_m}\, e^{-\lambda_m (1 - c_{T_m}(y^k))}$

instead of

$\prod_{k=1}^{N} \sum_{m=1}^{M} \epsilon_m\, \frac{e^{-\lambda_m (1 - c_{T_m}(y^k))}/Z_m}{P_{C_{T_m}}^{0}\big(c_{T_m}(y^k)\big)}\ ?$

SLIDE 23

How good are the templates? A classification experiment…

Classify East Asian versus South Asian faces, mixing over 4 scales and 8 templates.

East Asian: (L) examples of training images, (M) progression of EM, (R) trained templates
South Asian: (L) examples of training images, (M) progression of EM, (R) trained templates

Classification Rate: 97%

SLIDE 24

Other examples: noses. 16 templates; multiple scales, shifts, and rotations.

samples from training set learned templates

SLIDE 25

Other examples: mixture of noses and mouths

samples from training set (1/2 noses, 1/2 mouths) 32 learned templates

SLIDE 26

Other examples: train on 58 faces …half with glasses…half without

samples from training set; 32 learned templates; 6 learned templates

SLIDE 27

Other examples: train on 58 faces …half with glasses…half without

6 learned templates; a random eight of the 58 faces
rows 2 to 5, top to bottom: templates ordered by posterior likelihood

SLIDE 28

Other examples: train on 58 faces …half with glasses…half without

top row: the six learned templates
rows 2 to 5, top to bottom: training images ordered by correlation

SLIDE 29

Other examples: train random patches (“sparse representation”)

500 random 15x15 training patches from random internet images; 24 10x10 templates

SLIDE 30

Other examples: coarse representation

Use $f(y) = \mathrm{Corr}\big(T, D(y)\big)$, where $D$ downconverts (for super-resolution, go the other way: $f(y) = \mathrm{Corr}\big(D(T), y\big)$).

training of 8 low-res (10x10) templates
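A sketch of the downconverted feature, using block-averaging as the downconversion $D$ (the helper names are mine; any other downconversion would do):

```python
import numpy as np

def downconvert(y, k):
    """Block-average downconversion D(y): each k x k block -> one pixel."""
    H, W = y.shape
    y = y[:H - H % k, :W - W % k]               # trim to a multiple of k
    return y.reshape(H // k, k, W // k, k).mean(axis=(1, 3))

def corr(a, b):
    """Normalized correlation between two equal-shape 2-D arrays."""
    a = (a - a.mean()).ravel()
    b = (b - b.mean()).ravel()
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def coarse_feature(T, y, k=2):
    """f(y) = Corr(T, D(y)) with a low-resolution template T."""
    return corr(T, downconvert(y, k))
```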

SLIDE 31

Grenander: “pattern synthesis=pattern analysis”

(approximate) sampling…

32 samples from the mixture model $P_Y$, with a white-noise null distribution

SLIDE 32

(approximate) sampling…

32 samples from the mixture model $P_Y$, with a null distribution built from Caltech 101

SLIDE 33

(approximate) sampling…

32 samples from the mixture model $P_Y$, with a null distribution built from a population of smooth image patches

SLIDE 34

I. Image modeling

  • II. Data likelihood
  • III. Priors: content/context sensitivity
  • Hierarchical models and the Markov

dilemma

  • Conditional modeling
  • Examples: detecting faces and reading

license plates

SLIDE 35

Hierarchical models and the Markov Dilemma

$x^{p} \in \{0,1\}$: $x^{p} = 1 \Leftrightarrow$ 'pair of eyes'
$x^{l} \in \{0,1\}$: $x^{l} = 1 \Leftrightarrow$ 'left eye'
$x^{r} \in \{0,1\}$: $x^{r} = 1 \Leftrightarrow$ 'right eye'

Markov model; the Markov property is good for Representation, Computation, and Estimation. But given $x^{p} = 1$, there are probabilistic constraints on the poses and appearances of the left and right eyes.

SLIDE 36

I. Image modeling

  • II. Data likelihood
  • III. Priors: content/context sensitivity
  • Hierarchical models and the Markov

dilemma

  • Conditional modeling
  • Examples: detecting faces and reading

license plates

SLIDE 37

Hierarchical models and the Markov Dilemma

$x^{p} \in \{0,1\}$: $x^{p} = 1 \Leftrightarrow$ 'pair of eyes'; $x^{l}, x^{r}$ likewise for 'left eye' and 'right eye'.

Markov model. More generally: let $P$ be a Markov distribution, $B$ an event (e.g. 'pair of eyes' present, $\mathbb{1}_B(x) = 1$), $A(x)$ an attribute (e.g. the relative poses of the two eyes), and $\tilde P_A\big(A(x) \mid \mathbb{1}_B(x) = 1\big)$ the desired conditional distribution of the attribute.

Choose

$\tilde P = \underset{P':\ P'(A(x) \,\mid\, \mathbb{1}_B(x) = 1) = \tilde P_A}{\arg\min}\ D\big(P' \,\|\, P\big)$

then

$\tilde P(x) = P(x)\, \dfrac{\tilde P_A\big(A(x) \mid \mathbb{1}_B(x) = 1\big)}{P\big(A(x) \mid \mathbb{1}_B(x) = 1\big)}$ whenever $\mathbb{1}_B(x) = 1$.
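The construction can be sanity-checked on a finite toy space: tilt a backbone distribution over (left-eye pose, right-eye pose) so that the relative pose has a specified distribution, leaving everything else as close as possible. All numbers below are invented for illustration.

```python
import numpy as np

poses = np.arange(4)
P = np.ones((4, 4)) / 16.0                  # backbone: independent, uniform poses
A = poses[:, None] - poses[None, :]         # attribute: relative pose, in [-3, 3]

# desired attribute distribution: sharply peaked at relative pose 0
targetA = dict(zip(range(-3, 4), [0.01, 0.04, 0.1, 0.7, 0.1, 0.04, 0.01]))

# backbone attribute marginal P(A(x) = d)
PA = {d: P[A == d].sum() for d in targetA}

# tilted distribution: P~(x) = P(x) * targetA(A(x)) / PA(A(x))
ratio = np.array([[targetA[A[i, j]] / PA[A[i, j]] for j in range(4)]
                  for i in range(4)])
Pt = P * ratio
```

The tilt only reweights across attribute values; within each attribute class, the backbone's relative probabilities are untouched.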

SLIDE 38

I. Image modeling

  • II. Data likelihood
  • III. Priors: content/context sensitivity
  • Hierarchical models and the Markov

dilemma

  • Conditional modeling
  • Examples: detecting faces and reading

license plates

SLIDE 39

Architecture for reading license plates; levels, top to bottom:
license plates
license numbers (3 digits + 3 letters, 4 digits + 2 letters)
plate boundaries, strings (2 letters, 3 digits, 3 letters, 4 digits)
generic letter, generic number, L-junctions of sides
characters, plate sides
parts of characters, parts of plate sides

SLIDE 40

“pattern synthesis=pattern analysis” (Markov versus perturbed Markov)

Sample random 4-digit strings: Markov backbone versus compositional distribution

SLIDE 41

Original image; zoomed license region
Top object: Markov distribution
Top object: perturbed (“content-sensitive”) distribution

Markov versus perturbed Markov

SLIDE 42

Test set: 385 images, mostly from Logan Airport

Courtesy of Visics Corporation

SLIDE 43

Original image; top object; top 10 objects; top 25 objects

Image interpretation

SLIDE 44

Test image Top objects

Image interpretation

SLIDE 45
  • 385 images
  • Six plates read with mistakes (>98% of plates read correctly)
  • Approx. 99.5% characters read correctly
  • Zero false positives

Performance

SLIDE 46

Performance: errors

SLIDE 47

Performance: errors

SLIDE 48

Face detection (conditional modeling and overlapping patches)

SLIDE 49

Blue: best instantiation Green: second-best instantiation Turquoise: third-best

SLIDE 50

Blue: best instantiation Green: second-best instantiation Turquoise: third-best

SLIDE 51

In summary….

I am advocating strong representation:

  • grammatical rules that are content sensitive
  • data models that model the data