SLIDE 1 Statistical Image Models
Eero Simoncelli
Howard Hughes Medical Institute, Center for Neural Science, and Courant Institute of Mathematical Sciences New York University
SLIDE 2 Photographic Images
Diverse specialized structures:
- edges/lines/contours
- shadows/highlights
- smooth regions
- textured regions
SLIDE 3 Photographic Images
Diverse specialized structures:
- edges/lines/contours
- shadows/highlights
- smooth regions
- textured regions
Occupy a small region of the full space
SLIDE 4 [Figure: the set of typical images shown as a small region within the space of all images]
One could describe this set as a deterministic manifold....
SLIDE 5
SLIDE 6
SLIDE 7
- Step edges are rare (lighting, junctions, texture, noise)
SLIDE 8
- Step edges are rare (lighting, junctions, texture, noise)
- One scale’s texture is another scale’s edge
SLIDE 9
- Step edges are rare (lighting, junctions, texture, noise)
- One scale’s texture is another scale’s edge
- Need seamless transitions from isolated features to
dense textures
SLIDE 10 One could describe this set as a deterministic manifold....
[Figure: typical images as a small region within the space of all images]
SLIDE 11 One could describe this set as a deterministic manifold.... But seems more natural to use probability
[Figure: typical images as a small region within the space of all images]
SLIDE 12 One could describe this set as a deterministic manifold.... But seems more natural to use probability
[Figure: typical images as a small region within the space of all images, with probability density P(x)]
SLIDE 13 “Applications”
- Engineering: compression, denoising, restoration,
enhancement/modification, synthesis, manipulation
[Hubel ‘95]
SLIDE 14 “Applications”
- Engineering: compression, denoising, restoration,
enhancement/modification, synthesis, manipulation
- Science: optimality principles for neurobiology (evolution,
development, learning, adaptation)
[Hubel ‘95]
SLIDE 15 Density models
- nonparametric
- parametric/constrained
SLIDE 16 Density models
- nonparametric: build a histogram from lots of data
- parametric/constrained
SLIDE 17 Density models
- nonparametric: build a histogram from lots of data
- parametric/constrained: use “natural constraints” (geometry/photometry, computation, maxEnt)
SLIDE 18 Density models
- nonparametric: build a histogram from lots of data
- parametric/constrained: use “natural constraints” (geometry/photometry, computation, maxEnt)
Historical trend (technology driven)
SLIDE 19 [Figure: original image (256×256, range [0, 237]) and its pixel histogram]
SLIDE 20 [Figure: original image (range [0, 237]) vs. histogram-equalized image (range [1.99, 238]), each 256×256, with their histograms]
SLIDE 21 [Figure: original image (range [0, 237]) vs. histogram-equalized image (range [1.99, 238]), each 256×256, with their histograms]
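Histogram equalization (mapping each pixel through the empirical CDF of the image's own intensity histogram) can be sketched in a few lines. This is a generic illustration on synthetic data, not the exact procedure behind the slide's figures:

```python
import numpy as np

def equalize(img, nbins=256):
    """Histogram-equalize an image: map pixel values through the
    empirical CDF so the output histogram is approximately flat."""
    hist, bin_edges = np.histogram(img.ravel(), bins=nbins)
    cdf = hist.cumsum().astype(np.float64)
    cdf /= cdf[-1]                       # normalize CDF to [0, 1]
    # interpolate each pixel through the CDF, rescale to the input range
    out = np.interp(img.ravel(), bin_edges[:-1], cdf)
    return (out * (img.max() - img.min()) + img.min()).reshape(img.shape)

rng = np.random.default_rng(0)
img = rng.normal(128, 30, size=(256, 256))   # synthetic stand-in "image"
eq = equalize(img)
```

After equalization the pixel values are close to uniformly distributed over the output range, which is what the flattened histogram on the slide shows.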
SLIDE 22
General methodology
Observe “interesting” joint statistics → Transform to optimal representation
SLIDE 23
General methodology
Observe “interesting” joint statistics → Transform to optimal representation
SLIDE 24
General methodology
Observe “interesting” joint statistics → Transform to optimal representation
“Onion peeling”
SLIDE 25 Evolution of image models
- I. (1950’s): Fourier + Gaussian
- II. (mid 80’s - late 90’s): Wavelets + kurtotic marginals
- III. (mid 90’s - present): Wavelets + local context
- local amplitude (contrast)
- local orientation
- IV. (last 5 years): Hierarchical models
SLIDE 26 Pixel correlation
[Figure: (a) scatterplots of I(x,y) vs. I(x+1,y), I(x+2,y), I(x+4,y); (b) correlation vs. spatial separation (pixels)]
SLIDE 27 Pixel correlation
[Figure: (a) scatterplots of I(x,y) vs. I(x+1,y), I(x+2,y), I(x+4,y); (b) correlation vs. spatial separation, out to 40 pixels]
SLIDE 28
Translation invariance
Assuming translation invariance,
SLIDE 29
Translation invariance
Assuming translation invariance, => covariance matrix is Toeplitz (convolutional)
SLIDE 30
Translation invariance
Assuming translation invariance, => covariance matrix is Toeplitz (convolutional) => eigenvectors are sinusoids
SLIDE 31
Translation invariance
Assuming translation invariance, => covariance matrix is Toeplitz (convolutional) => eigenvectors are sinusoids => can diagonalize (decorrelate) with F.T.
SLIDE 32
Translation invariance
Assuming translation invariance:
=> covariance matrix is Toeplitz (convolutional)
=> eigenvectors are sinusoids
=> can diagonalize (decorrelate) with the Fourier transform
Power spectrum captures the full covariance structure
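The chain of implications can be checked numerically. A minimal sketch, assuming periodic boundaries so the Toeplitz covariance becomes exactly circulant:

```python
import numpy as np

# Translation-invariant covariance with periodic boundaries is circulant;
# the DFT diagonalizes any circulant matrix, so Fourier coefficients of
# such a process are decorrelated, and the diagonal (the power spectrum)
# carries the full second-order structure.
n = 64
lags = np.minimum(np.arange(n), n - np.arange(n))   # circular distance
c = 0.9 ** lags                                     # one row of the covariance
C = np.array([np.roll(c, k) for k in range(n)])     # circulant covariance matrix

F = np.fft.fft(np.eye(n)) / np.sqrt(n)              # unitary DFT matrix
D = F @ C @ F.conj().T                              # should come out diagonal
off_diag = D - np.diag(np.diag(D))
print(np.abs(off_diag).max())   # ~1e-13: C is diagonalized by the DFT
# the diagonal equals the power spectrum (DFT of one covariance row):
print(np.allclose(np.diag(D).real, np.fft.fft(c).real))   # True
```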
SLIDE 33 Spectral power
Structural: assume scale-invariance, F(sω) = s^{-p} F(ω); then F(ω) ∝ 1/ω^p
[Ritterman 52; DeRiugin 56; Field 87; Tolhurst 92; Ruderman/Bialek 94; ...]
SLIDE 34 Spectral power
Structural: assume scale-invariance, F(sω) = s^{-p} F(ω); then F(ω) ∝ 1/ω^p
[Ritterman 52; DeRiugin 56; Field 87; Tolhurst 92; Ruderman/Bialek 94; ...]
Empirical: [Figure: log10 power vs. log10 spatial frequency (cycles/image) is close to a straight line]
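A quick numerical illustration of the power-law spectrum: synthesize an image with a known 1/f amplitude spectrum and recover the exponent by log-log regression. This is a sketch, not the measurement procedure behind the slide's plot:

```python
import numpy as np

# Synthesize an image with power spectrum ~ 1/|w|^2, then recover the
# exponent by regressing log power against log frequency.
rng = np.random.default_rng(1)
n = 256
fx = np.fft.fftfreq(n)[:, None]
fy = np.fft.fftfreq(n)[None, :]
f = np.sqrt(fx**2 + fy**2)
f[0, 0] = 1.0                              # avoid divide-by-zero at DC
amp = 1.0 / f                              # amplitude ~ 1/f  =>  power ~ 1/f^2
phase = np.exp(2j * np.pi * rng.random((n, n)))
img = np.fft.ifft2(amp * phase).real       # random-phase 1/f image

power = np.abs(np.fft.fft2(img)) ** 2
mask = (f > 4.0 / n) & (f < 0.4)           # annulus away from DC and corners
slope = np.polyfit(np.log(f[mask]), np.log(power[mask]), 1)[0]
print(round(slope, 1))                     # close to -2
```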
SLIDE 35 Principal Components Analysis (PCA) + whitening
[Figure: (a) raw data, (b) after PCA rotation, (c) after whitening]
SLIDE 36
PCA basis for image blocks
SLIDE 37
PCA basis for image blocks
PCA is not unique
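A toy sketch of PCA + whitening, using 2-pixel "patches" for brevity; the rotation at the end illustrates the non-uniqueness, since any rotation of a whitened basis is still white:

```python
import numpy as np

# PCA + whitening: rotate into the eigenbasis of the patch covariance,
# then rescale each axis by 1/sqrt(eigenvalue), giving identity covariance.
rng = np.random.default_rng(0)
cov = np.array([[2.0, 1.2], [1.2, 1.0]])             # correlated "pixels"
patches = rng.multivariate_normal([0, 0], cov, size=100000)

C = np.cov(patches.T)
evals, evecs = np.linalg.eigh(C)
white = patches @ evecs @ np.diag(1.0 / np.sqrt(evals))   # PCA + whitening
print(np.round(np.cov(white.T), 2))                       # ~ identity

# Whitening is not unique: any rotation of whitened data stays white.
theta = 0.7
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
print(np.allclose(np.cov((white @ R).T), np.eye(2), atol=1e-6))   # True
```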
SLIDE 38
Maximum entropy (maxEnt)
The density with maximal entropy satisfying E(f(x)) = c is of the form
p_ME(x) ∝ exp(−λ f(x)), where λ depends on c.
Examples: f(x) = x², f(x) = |x|
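Written out, the two example constraints give familiar densities, with λ solved from the constraint value c:

```latex
\begin{align*}
f(x) = x^2,\ \mathbb{E}[x^2] = c
  &\;\Rightarrow\; p_{\mathrm{ME}}(x) \propto e^{-x^2/(2c)}
  &&\text{(Gaussian, } \lambda = 1/(2c)\text{)}\\
f(x) = |x|,\ \mathbb{E}[|x|] = c
  &\;\Rightarrow\; p_{\mathrm{ME}}(x) \propto e^{-|x|/c}
  &&\text{(Laplacian, } \lambda = 1/c\text{)}
\end{align*}
```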
SLIDE 39 Model I (Fourier/Gaussian)
Basis set: Fourier; Image: linear combination of sinusoids; Coefficient density: Gaussian
SLIDE 40 Gaussian model is weak
[Figure: sampling the Fourier/Gaussian model: draw Gaussian coefficients P(c), impose a 1/f² spectrum, apply F⁻¹ to obtain an image]
SLIDE 41 Gaussian model is weak
[Figure: (a) sample from the model: Gaussian coefficients P(c) with 1/f² spectrum, via F⁻¹; (b) a photograph whitened by ω² in the Fourier domain (F, ω², F⁻¹)]
SLIDE 42 Gaussian model is weak
[Figure: as previous slide, plus the PCA/whitening scatterplots (a, b, c)]
SLIDE 43 Bandpass Filter Responses
[Figure: histogram of bandpass filter responses, sharply peaked with heavy tails, compared with a Gaussian density of the same variance]
[Burt&Adelson 82; Field 87; Mallat 89; Daugman 89, ...]
SLIDE 44 “Independent” Components Analysis (ICA)
For Linearly Transformed Factorial (LTF) sources: guaranteed independence
(with some minor caveats)
[Comon 94; Cardoso 96; Bell/Sejnowski 97; ...]
SLIDE 45 ICA on image blocks
[Olshausen/Field ’96; Bell/Sejnowski ’97] [example obtained with FastICA, Hyvarinen]
SLIDE 46 Marginal densities
Well-fit by a generalized Gaussian: P(x) ∝ exp(−|x/s|^p)
[Mallat 89; Simoncelli&Adelson 96; Moulin&Liu 99; ...]
[Figure: log-probability vs. wavelet coefficient value for three subbands; fitted exponents p = 0.46, 0.58, 0.48, with relative model error ΔH/H ≈ 0.001–0.003]
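The exponent p of the generalized Gaussian directly controls the kurtosis; a small helper using the standard moment formulas (stdlib only):

```python
import math

def gen_gauss_kurtosis(p):
    """Kurtosis of the generalized Gaussian p(x) ∝ exp(-|x/s|^p).

    Moments: E|x|^m ∝ s^m Γ((m+1)/p) / Γ(1/p), so the kurtosis
    E[x^4]/E[x^2]^2 = Γ(5/p) Γ(1/p) / Γ(3/p)^2, independent of s.
    """
    g = math.gamma
    return g(5.0 / p) * g(1.0 / p) / g(3.0 / p) ** 2

print(gen_gauss_kurtosis(2.0))   # 3.0  (Gaussian)
print(gen_gauss_kurtosis(1.0))   # 6.0  (Laplacian)
print(gen_gauss_kurtosis(0.5))   # 25.2 (heavy-tailed, like the fits above)
```

Exponents near 0.5, as fitted on the slide, correspond to far heavier tails than a Gaussian.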
SLIDE 47 Kurtosis vs. bandwidth
[Figure: sample kurtosis vs. filter bandwidth (0.5–3 octaves)]
[after Field 87]
Note: Bandwidth matters much more than orientation
[see Bethge 06]
SLIDE 48 Octave-bandwidth representations
Filter: Spatial Frequency Selectivity:
SLIDE 49 Model II (LTF)
Basis set: wavelet/ICA filters; Image: linear combination of basis functions; Coefficient density: sparse (generalized Gaussian)
SLIDE 50 LTF also a weak model...
Sample Gaussianized
Sample ICA-transformed and Gaussianized
SLIDE 51
Trouble in paradise
SLIDE 52 Trouble in paradise
- Biology: Visual system uses a cascade
- Where’s the retina? The LGN?
- What happens after V1? Why don’t responses get
sparser? [Baddeley etal 97; Chechik etal 06]
SLIDE 53 Trouble in paradise
- Biology: Visual system uses a cascade
- Where’s the retina? The LGN?
- What happens after V1? Why don’t responses get
sparser? [Baddeley etal 97; Chechik etal 06]
- Statistics: Images don’t obey ICA source model
- Any bandpass filter gives sparse marginals [Baddeley 96]
=> Shallow optimum [Bethge 06; Lyu & Simoncelli 08]
- The responses of ICA filters are highly dependent
[Wegmann & Zetzsche 90, Simoncelli 97]
SLIDE 54 Conditional densities
[Figure: conditional histograms of a coefficient given a neighboring coefficient: the variance grows with the neighbor's magnitude (“bowtie” shape)]
[Simoncelli 97; Schwartz&Simoncelli 01]
SLIDE 55 [Schwartz&Simoncelli 01]
SLIDE 56
- Large-magnitude subband coefficients are found at
neighboring positions, orientations, and scales.
SLIDE 57 Method 1: Conditional Gaussian
[Simoncelli 97; Buccigrossi&Simoncelli 99; see also ARCH models in econometrics!]
Modeling heteroscedasticity
(i.e., variable variance)
P(x_n | {x_k}) ∼ N(0, Σ_k w_nk |x_k|² + σ²)
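A toy simulation of this conditional model (a single scalar neighbor and a hypothetical weight w) reproduces the variance dependency in the conditional histograms:

```python
import numpy as np

# Toy conditional-Gaussian model: a coefficient is zero-mean Gaussian
# whose variance is a linear function of its neighbor's squared
# magnitude plus a constant:  P(x_n | x_k) ~ N(0, w*|x_k|^2 + sigma^2).
rng = np.random.default_rng(0)
w, sigma2 = 0.5, 1.0                       # illustrative parameter values
xk = rng.laplace(size=200000)              # the "neighbor" coefficient
xn = rng.normal(0.0, np.sqrt(w * xk**2 + sigma2))

# The empirical conditional variance tracks the bowtie shape:
small = np.abs(xk) < 0.5
large = np.abs(xk) > 2.0
print(xn[small].var(), xn[large].var())    # variance grows with |x_k|
```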
SLIDE 58 Joint densities
[Figure: contour plots of joint histograms for coefficient pairs: adjacent, near, and far]
[Simoncelli, ‘97; Wainwright&Simoncelli, ‘99]
- Nearby: densities are approximately circular/elliptical
- Distant: densities are approximately factorial
SLIDE 59 ICA-transformed joint densities
[Figure: kurtosis of projections vs. projection angle (π/4 to π), for dimensions d = 2, 16, 32: data (ICA'd), factorialized, sphericalized]
SLIDE 60 ICA-transformed joint densities
[Figure: kurtosis of projections vs. projection angle (π/4 to π), for dimensions d = 2, 16, 32: data (ICA'd), factorialized, sphericalized]
- Local densities are elliptical (but non-Gaussian)
- Distant densities are factorial
[Wegmann&Zetzsche ‘90; Simoncelli ’97; + many recent models]
SLIDE 61 Spherical vs LTF
[Figure: histograms of the kurtosis of image blocks (3×3, 7×7, 15×15) projected onto random unit-norm basis functions: data (ICA'd), factorialized, sphericalized]
- Histograms, kurtosis of projections of image blocks onto random
unit-norm basis functions.
- These imply data are closer to spherical than factorial
[Lyu & Simoncelli 08]
SLIDE 62
Non-Gaussian elliptical observations and models of natural images:
- Zetzsche & Krieger, 1999
- Huang & Mumford, 1999
- Wainwright & Simoncelli, 2000
- Hyvärinen & Hoyer, 2000
- Parra et al., 2001
- Srivastava et al., 2002
- Sendur & Selesnick, 2002
- Teh et al., 2003
- Gehler & Welling, 2006
- Lyu & Simoncelli, 2008
- etc.
SLIDE 63
Modeling heteroscedasticity
Method 2: Hidden scaling variable for each patch
Gaussian scale mixture (GSM) [Andrews & Mallows 74]: x = √z · u, where
- u is Gaussian, z > 0 is a scalar
- u and z are independent
- x is elliptically symmetric, with covariance ∝ C_u
- marginals of x are leptokurtotic
[Wainwright&Simoncelli 99]
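A minimal GSM simulation (scalar case, lognormal z as suggested on the next slide) shows the leptokurtotic marginal:

```python
import numpy as np

# Gaussian scale mixture: x = sqrt(z) * u with u Gaussian and z a
# positive hidden multiplier, independent of u. Although x is Gaussian
# given z, its marginal has kurtosis above the Gaussian value of 3.
rng = np.random.default_rng(0)
n = 500000
z = rng.lognormal(mean=0.0, sigma=0.7, size=n)   # hidden scaling variable
u = rng.normal(size=n)                           # underlying Gaussian
x = np.sqrt(z) * u

kurt = np.mean(x**4) / np.mean(x**2) ** 2
print(kurt)   # > 3: heavier-tailed than a Gaussian
```

Analytically the kurtosis is 3·E[z²]/E[z]², so any non-degenerate z inflates it above 3.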
SLIDE 64
GSM — prior on z
- Empirically, z is approximately lognormal [Portilla etal, icip-01]:
  p_z(z) = exp(−(log z − µ_l)² / (2σ_l²)) / (z (2πσ_l²)^{1/2})
- Alternatively, can use Jeffrey's noninformative prior: p_z(z) ∝ 1/z
  [Figueiredo&Nowak, '01; Portilla etal, '03]
SLIDE 65 GSM simulation
[Figure: image data vs. GSM simulation — the joint conditional histograms match]
[Wainwright & Simoncelli, NIPS*99]
SLIDE 66 Model III (GSM)
Basis set: wavelets; Image: linear combination of basis functions; Coefficient density: Gaussian scale mixture
SLIDE 67 Original coefficients vs. normalized by √z
[Figure: marginal log-histograms and joint histograms, per subband: after dividing by √z the marginal is closer to Gaussian and the variance dependency is removed]
[Schwartz&Simoncelli 01] [Ruderman&Bialek 94]
SLIDE 68 [Figure: model encoding cost (bits/coeff) vs. empirical entropy, for first-order models (Gaussian, generalized Laplacian), the ideal, and the conditional model]
[Buccigrossi & Simoncelli 99]
SLIDE 69 Bayesian denoising
- Additive Gaussian noise: y = x + w, with P(y|x) ∝ exp[−(y − x)²/(2σ_w²)]
- Bayes' least squares solution is the conditional mean:
  x̂(y) = E(x|y) = ∫ dx x P(x|y)
SLIDE 70
If signal is Gaussian, BLS estimator is linear:
x̂(y) = σ_x² / (σ_x² + σ_n²) · y
=> suppress fine scales, retain coarse scales
[Figure: denoised x̂ vs. noisy y]
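The linear (Wiener) shrinkage is easy to verify in simulation; a scalar sketch with assumed variances:

```python
import numpy as np

# For a Gaussian signal in Gaussian noise, the Bayes least-squares
# estimate is linear shrinkage:  x_hat = sigma_x^2/(sigma_x^2 + sigma_n^2) * y.
rng = np.random.default_rng(0)
sigma_x, sigma_n = 2.0, 1.0
x = rng.normal(0, sigma_x, size=100000)
y = x + rng.normal(0, sigma_n, size=x.size)

g = sigma_x**2 / (sigma_x**2 + sigma_n**2)   # shrinkage factor (here 0.8)
x_hat = g * y

mse_wiener = np.mean((x_hat - x) ** 2)       # theory: 4/5 = 0.8
mse_identity = np.mean((y - x) ** 2)         # theory: sigma_n^2 = 1.0
print(mse_wiener, mse_identity)              # shrinkage beats using y directly
```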
SLIDE 71 Non-Gaussian coefficients
[Burt&Adelson ‘81; Field ‘87; Mallat ‘89; Daugman ‘89; etc]
[Figure: log-histogram of filter responses compared with a Gaussian density]
SLIDE 72
- II. BLS for non-Gaussian prior
- Assume marginal distribution [Mallat '89]: P(x) ∝ exp(−|x/s|^p)
- Then Bayes estimator is generally nonlinear
[Figure: estimator curves for p = 2.0, 1.0, 0.5]
[Simoncelli & Adelson, '96]
SLIDE 73 MAP shrinkage
p=2.0 p=1.0 p=0.5
[Simoncelli 99]
SLIDE 74 Denoising: Joint
E(x|y) = ∫ dz P(z|y) E(x|y, z) = ∫ dz P(z|y) [ z C_u (z C_u + C_w)^{−1} y ]_ctr
where P(z|y) = P(y|z) P(z) / P(y), and P(y|z) ∝ exp(−y^T (z C_u + C_w)^{−1} y / 2) / |z C_u + C_w|^{1/2}
Numerical computation of the solution is reasonably efficient if one jointly diagonalizes C_u and C_w ...
[Portilla, Strela, Wainwright, Simoncelli, '03]
[IPAM, 9/04]
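A scalar sketch of this estimator (a 1-D coefficient, so C_u and C_w reduce to variances; Jeffreys-like prior on z; the integral approximated by discretizing z):

```python
import numpy as np

# Scalar GSM coefficient estimator: average the Wiener estimate
# z/(z + sigma_w^2) * y over the posterior on the hidden multiplier z.
def gsm_bls(y, z_vals, p_z, sigma_w2):
    var = z_vals + sigma_w2                         # y | z ~ N(0, z + sigma_w2)
    lik = np.exp(-y**2 / (2 * var)) / np.sqrt(var)  # Gaussian likelihood of y
    post = lik * p_z
    post /= post.sum()                              # discrete posterior P(z|y)
    return np.sum(post * (z_vals / var) * y)        # E[x|y]

z_vals = np.linspace(0.01, 20, 400)                 # discretized z grid
p_z = 1.0 / z_vals                                  # Jeffreys-like prior (unnormalized)
sigma_w2 = 1.0

for y in [0.5, 2.0, 5.0]:
    print(y, gsm_bls(y, z_vals, p_z, sigma_w2))
# small responses are shrunk strongly; large ones are mostly preserved
```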
SLIDE 75 Example estimators
Estimators for the scalar and single-neighbor cases
[Figure: estimated coefficient vs. noisy coefficient for the scalar and single-neighbor estimators; noise level σ_w indicated]
[Portilla etal 03]
SLIDE 76 Comparison to other methods
Results averaged over 3 images
[Figure: PSNR improvement vs. noise level for several denoising methods]
[Portilla etal 03]
SLIDE 77
Original Noisy (22.1 dB) Matlab’s wiener2 (28 dB) BLS-GSM (30.5 dB)
SLIDE 78
Original Noisy (8.1 dB) UndWvlt Thresh (19.0 dB) BLS-GSM (21.2 dB)
SLIDE 79
Real sensor noise
400 ISO denoised
SLIDE 80 GSM summary
- GSM captures local variance
- Underlying Gaussian leads to simple computation
- Excellent denoising results
- What’s missing?
- Global model of z variables [Wainwright etal 99; Romberg etal '99; Hyvarinen/Hoyer '02; Karklin/Lewicki '02; Lyu/Simoncelli 08]
- Explicit geometry: phase and orientation
SLIDE 81 Global models for z
- Non-overlapping neighborhoods, tree-structured z
[Wainwright etal 99; Romberg etal ’99]
- Field of GSMs: z is an exponentiated GMRF, u is a GMRF, and the subband is their product
[Lyu&Simoncelli 08]
[Figure: z variables over coarse-to-fine scales]
SLIDE 82 Lena Boats
[Figure: ΔPSNR vs. noise σ for Lena and Boats; methods: FoE, kSVD, GSM, BM3D, FoGSM]
[Lyu&Simoncelli, PAMI 08]
State-of-the-art denoising
SLIDE 83 2-band steerable pyramid: Image decomposition in terms of multi-scale gradient measurements
Measuring Orientation
[Simoncelli et.al., 1992; Simoncelli & Freeman 1995]
SLIDE 84
Multi-scale gradient basis
SLIDE 85 Multi-scale gradient basis
- Multi-scale bases: efficient representation
SLIDE 86 Multi-scale gradient basis
- Multi-scale bases: efficient representation
- Derivatives: good for analysis
- Local Taylor expansion of image structures
- Explicit geometry (orientation)
SLIDE 87 Multi-scale gradient basis
- Multi-scale bases: efficient representation
- Derivatives: good for analysis
- Local Taylor expansion of image structures
- Explicit geometry (orientation)
- Combination:
- Explicit incorporation of geometry in basis
- Bridge between PDE / harmonic analysis
approaches
SLIDE 88
magnitude
[Hammond&Simoncelli 06; cf. Oppenheim and Lim 81]
SLIDE 89 Importance of local orientation
Randomized orientation Randomized magnitude
[Hammond&Simoncelli 05]
SLIDE 90 Reconstruction from orientation
- Reconstruction by projections onto convex sets
- Resilient to quantization
Quantized to 2 bits
[Hammond&Simoncelli 06]
Original
SLIDE 91 Image patches related by rotation
two-band steerable pyramid coefficients
[Hammond&Simoncelli 06]
SLIDE 92 PCA of normalized gradient patches
[Figure: PCA bases for raw patches vs. rotated patches]
[Hammond&Simoncelli 06]
SLIDE 93 Orientation-Adaptive GSM model
Model a vectorized patch of wavelet coefficients as x = √z R(θ) u, where R(θ) is a patch rotation operator and z, θ are hidden magnitude/orientation variables.
[Hammond&Simoncelli 06]
SLIDE 94 Orientation-Adaptive GSM model
Model a vectorized patch of wavelet coefficients as x = √z R(θ) u, with R(θ) a patch rotation operator and z, θ hidden magnitude/orientation variables. Conditioned on (z, θ), x is zero-mean Gaussian with covariance z C(θ).
[Hammond&Simoncelli 06]
SLIDE 95 [Hammond&Simoncelli 06]
Estimation of C(θ) from noisy data
C(θ) is unknown; approximate it using covariances measured from the noisy data, assuming signal and noise independent and the noise rotationally invariant (w.l.o.g. E[z] = 1).
SLIDE 96 Bayesian MMSE Estimator
[Hammond&Simoncelli 06]
SLIDE 97 Bayesian MMSE Estimator
condition on the hidden variables (z, θ) and integrate
[Hammond&Simoncelli 06]
SLIDE 98 Bayesian MMSE Estimator
condition on the hidden variables (z, θ) and integrate
[Hammond&Simoncelli 06]
SLIDE 99 Bayesian MMSE Estimator
condition on the hidden variables (z, θ) and integrate
Wiener estimate
[Hammond&Simoncelli 06]
SLIDE 100 Bayesian MMSE Estimator
condition on the hidden variables (z, θ) and integrate
given (z, θ), the patch has covariance z C(θ); with a separable prior for the hidden variables, the inner expectation is a Wiener estimate
[Hammond&Simoncelli 06]
SLIDE 101 Bayesian MMSE Estimator
condition on the hidden variables (z, θ) and integrate
given (z, θ), the patch has covariance z C(θ); with a separable prior for the hidden variables, the inner expectation is a Wiener estimate
[Hammond&Simoncelli 06]
SLIDE 102
[Figure: denoising at σ = 40 — noisy: 2.81 dB; gsm: 12.4 dB; orientation-adaptive: 13.1 dB]
SLIDE 103 Locally adaptive covariance
- Karklin & Lewicki 08: Each patch is Gaussian, with covariance constructed from a weighted outer-product of fixed vectors:
  log C(y) = Σ_n y_n B_n,  where B_n = Σ_k w_nk b_k b_k^T,  p(y) = Π_n exp(−|y_n|),  p(x) = G(x; C(y))
- Guerrero-Colon, Simoncelli & Portilla 08: Each patch is a mixture of GSMs (MGSMs):
  p(x) = Σ_k P_k ∫ G(x; z_k C_k) p_k(z_k) dz_k
SLIDE 104 MGSMs generative model
Patch chosen from {√z_1 u_1, √z_2 u_2, ... √z_K u_K} with probabilities {P_1, P_2, ..., P_K}
Parameters:
- Covariances C_k
- Scale densities p_k(z_k)
- Component probabilities P_k
- Number of components K
Parameters can be fit to data of one or more images by maximizing likelihood (EM-like)
[Guerrero-Colon, Simoncelli, Portilla 08]
SLIDE 105 MGSM “segmentation”
[Figure: MGSM component assignments for an image, with the first six eigenvectors of each component's covariance matrix]
[Guerrero-Colon, Simoncelli, Portilla 08]
SLIDE 106
MGSM “segmentation”
Eigenvectors of GSM components represent invariant subspaces: “generalized complex cells”
SLIDE 107 Potential of local homogeneous models?
Consider an implicit model: maxEnt subject to constraints on subband coefficients:
- marginal statistics [var, skew, kurtosis]
- local raw correlations
- local variance correlations
- local phase correlations
[Portilla & Simoncelli 00; cf. Zhu, Wu & Mumford 97]
SLIDE 108
Visual texture
SLIDE 109
Visual texture
Homogeneous, with repeated structures
SLIDE 110
Visual texture
Homogeneous, with repeated structures “You know it when you see it”
SLIDE 111 [Figure: nested sets — all images ⊃ texture images ⊃ equivalence class (visually indistinguishable)]
SLIDE 112 Iterative synthesis algorithm
Analysis: transform the example texture; measure statistics.
Synthesis: transform a random seed; measure statistics; adjust to match the example's; inverse transform; iterate.
[Portilla&Simoncelli 00; cf. Heeger&Bergen ‘95]
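The analyze/adjust/iterate loop can be illustrated with a toy statistic set — here just the Fourier magnitude and the pixel histogram, a Heeger&Bergen-style sketch far simpler than the steerable-pyramid statistics the actual algorithm uses:

```python
import numpy as np

# Toy synthesis-by-constraint loop: start from noise and alternately
# impose two statistics of the example texture -- its power spectrum
# (magnitude substitution in the Fourier domain) and its pixel
# histogram (rank matching). Project-and-iterate, as on the slide.
def match_histogram(x, target):
    order = np.argsort(x.ravel())
    out = np.empty_like(x.ravel())
    out[order] = np.sort(target.ravel())   # give x the target's value set
    return out.reshape(x.shape)

def match_spectrum(x, target):
    mag = np.abs(np.fft.fft2(target))      # target's Fourier magnitude
    phase = np.angle(np.fft.fft2(x))       # keep x's phase
    return np.fft.ifft2(mag * np.exp(1j * phase)).real

rng = np.random.default_rng(0)
example = rng.laplace(size=(64, 64)).cumsum(axis=1)   # stand-in "texture"
synth = rng.normal(size=example.shape)                # random seed image

for _ in range(25):
    synth = match_spectrum(synth, example)
    synth = match_histogram(synth, example)

# after iterating, the imposed statistic agrees with the example's:
print(np.allclose(np.sort(synth.ravel()), np.sort(example.ravel())))   # True
```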
SLIDE 113
Examples: Artificial
SLIDE 114
Photographic, quasi-periodic
SLIDE 115
Photographic, aperiodic
SLIDE 116
Photographic, structured
SLIDE 117
Photographic, color
SLIDE 118
Non-textures?
SLIDE 119
Texture mixtures
SLIDE 120
Texture mixtures
Convex combinations in parameter space
SLIDE 121
Texture mixtures
Convex combinations in parameter space => Parameter space includes non-textures
SLIDE 122 Summary
- Fusion of empirical data with structural principles
- Statistical models have led to state-of-the-art image
processing, and are relevant for biological vision
- Local adaptation to {variance, orientation,
phase, ...} gives improvement, but makes learning harder
- Cascaded representations emerge naturally
- There’s still much room for improvement!
SLIDE 123 Cast
- Local GSM model: Martin Wainwright, Javier Portilla
- GSM Denoising: Javier Portilla, Martin Wainwright,
Vasily Strela
- Variance-adaptive compression: Robert Buccigrossi
- Local orientation and OAGSM: David Hammond
- Field of GSMs: Siwei Lyu
- Mixture of GSMs: Jose-Antonio Guerrero-Colón,
Javier Portilla
- Texture representation/synthesis: Javier Portilla