Deconvolution with ADMM
EE367/CS448I: Computational Imaging and Display (stanford.edu/class/ee367)
Lecture 6, Gordon Wetzstein, Stanford University
Lens as Optical Low-pass Filter
- a point source on the focal plane maps to a point
- away from the focal plane: out-of-focus blur
[figure: lens imaging a point source; in focus on the focal plane, a blurred point away from it]
Lens as Optical Low-pass Filter
- shift-invariant convolution
[figure: the same blur kernel applied at every position on the focal plane]
Lens as Optical Low-pass Filter
- the blur is a convolution of the sharp image x with a kernel c: b = c ∗ x
- the convolution kernel c is called the point spread function (PSF)
[figure: sharp image x, PSF c, and measured, blurred image b]
- the diffraction-limited PSF of a circular aperture is the "Airy" pattern
[figure: diffraction-limited PSF of a circular aperture]
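As a concrete sketch of this forward model (not from the original slides): FFT-based blur in NumPy, assuming a grayscale image x and a PSF c that is already zero-padded to the image size and centered, with periodic boundary conditions.

```python
import numpy as np

def blur(x, c):
    # Move the centered PSF's peak to pixel (0, 0) so the FFT-based
    # (circular) convolution does not shift the image.
    C = np.fft.fft2(np.fft.ifftshift(c))
    # Convolution theorem: b = c * x  <=>  F{b} = F{c} . F{x}
    return np.real(np.fft.ifft2(C * np.fft.fft2(x)))
```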
PSF, OTF, MTF
- the point spread function (PSF) is a fundamental concept in optics
- the optical transfer function (OTF) is the (complex) Fourier transform of the PSF
- the modulation transfer function (MTF) is the magnitude of the OTF
- example: [figure: PSF, OTF = F{PSF}, MTF = |OTF|]
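A minimal sketch of these definitions, under the same assumptions as above (padded, centered PSF); normalizing the MTF to 1 at zero frequency is a common convention, not something the slides specify.

```python
import numpy as np

def psf_to_otf_mtf(psf):
    # OTF: (complex) Fourier transform of the PSF
    otf = np.fft.fft2(np.fft.ifftshift(psf))
    # MTF: magnitude of the OTF, normalized to 1 at zero frequency
    mtf = np.abs(otf)
    mtf /= mtf[0, 0]
    return otf, mtf
```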
Deconvolution
- given the measurements b and the convolution kernel c, what is x?
[figure: b = c ∗ x, with x unknown]
Deconvolution with Inverse Filtering
- naive solution: apply the inverse kernel
x̂ = c⁻¹ ∗ b = F⁻¹{ F{b} / F{c} }
[figure: ground-truth image x and estimate x̂]
Deconvolution with Inverse Filtering & Noise
- naive solution: apply the inverse kernel, now with Gaussian noise, σ = 0.05
x̂ = c⁻¹ ∗ b = F⁻¹{ F{b} / F{c} }
[figure: estimate x̂ for noise level σ = 0.05]
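A sketch of the naive inverse filter; the eps guard is a hypothetical addition that only avoids literal division by zero and does nothing about the noise amplification discussed next.

```python
import numpy as np

def inverse_filter(b, c, eps=1e-12):
    C = np.fft.fft2(np.fft.ifftshift(c))
    # Divide in the frequency domain; where F{c} ~ 0 this explodes.
    return np.real(np.fft.ifft2(np.fft.fft2(b) / (C + eps)))
```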
Deconvolution with Inverse Filtering & Noise
- results: terrible!
- why? this is an ill-posed problem (division by values (close to) zero in the frequency domain) → noise is drastically amplified!
- need to include prior(s) on images to make up for the lost data
- for example: noise statistics (signal-to-noise ratio)
Deconvolution with Wiener Filtering
- apply the inverse kernel, but don't divide by 0:
x̂ = F⁻¹{ |F{c}|² / (|F{c}|² + 1/SNR) · F{b} / F{c} }
- the first term is an amplitude-dependent damping factor!
- rule of thumb: SNR = mean(signal) / std(noise) ≈ 0.5 / σ
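A sketch of this Wiener filter; multiplying numerator and denominator by F{c}* gives an equivalent form that never divides by zero. The scalar snr follows the rule of thumb above.

```python
import numpy as np

def wiener_filter(b, c, snr):
    C = np.fft.fft2(np.fft.ifftshift(c))
    # Equivalent to |C|^2/(|C|^2 + 1/SNR) . F{b}/C, without dividing by 0.
    H = np.conj(C) / (np.abs(C) ** 2 + 1.0 / snr)
    return np.real(np.fft.ifft2(H * np.fft.fft2(b)))
```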
Deconvolution with Wiener Filtering
[figure: ground truth x, naive inverse filter result, and Wiener filter result x̂]
Deconvolution with Wiener Filtering
[figure: Wiener results for noise levels σ = 0.05, σ = 0.1, σ = 0.01]
Deconvolution with Wiener Filtering
- results: not too bad, but noisy
- this is a heuristic → dampen the noise amplification
- idea: promote sparse gradients (edges)
- ∇ is the finite-differences operator, i.e., a matrix
Total Variation
minimize_x ‖Cx − b‖₂² + λ TV(x) = minimize_x ‖Cx − b‖₂² + λ‖∇x‖₁
- here ‖x‖₁ = Σᵢ |xᵢ|, and ∇ is a banded finite-differences matrix with rows of the form [⋯ −1 1 ⋯]
Rudin et al. 1992
Total Variation
- express the (forward finite-difference) gradient as a convolution!
∇_x x = x ∗ [−1 1],  ∇_y x = x ∗ [−1 1]ᵀ
[figure: x and its gradient images ∇_x x and ∇_y x]
Total Variation
- better: isotropic  TV(x) = Σᵢ √( (∇_x x)ᵢ² + (∇_y x)ᵢ² )
- easier: anisotropic  TV(x) = Σᵢ |(∇_x x)ᵢ| + |(∇_y x)ᵢ|
[figure: x and its isotropic and anisotropic gradient-magnitude images]
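A sketch of both variants, assuming periodic boundaries so that np.roll implements the [−1 1] convolutions (the slides do not fix a boundary convention):

```python
import numpy as np

def grad(x):
    # Forward finite differences as convolutions with [-1 1] kernels.
    dx = np.roll(x, -1, axis=1) - x   # horizontal gradient
    dy = np.roll(x, -1, axis=0) - x   # vertical gradient
    return dx, dy

def tv_anisotropic(x):
    dx, dy = grad(x)
    return np.abs(dx).sum() + np.abs(dy).sum()

def tv_isotropic(x):
    dx, dy = grad(x)
    return np.sqrt(dx ** 2 + dy ** 2).sum()
```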
Total Variation
- for simplicity, this lecture only discusses anisotropic TV:
TV(x) = ‖∇_x x‖₁ + ‖∇_y x‖₁ = ‖ [∇_x; ∇_y] x ‖₁
- problem: the l1-norm is not differentiable, so we can't use inverse filtering
- however: there is a simple solution for the data-fitting term alone and a simple solution for TV alone → split the problem!
Deconvolution with ADMM
- split deconvolution with the TV prior:
minimize (1/2)‖Cx − b‖₂² + λ‖z‖₁  subject to ∇x = z
- general form of ADMM (alternating direction method of multipliers):
minimize f(x) + g(z)  subject to Ax + Bz = c
- here f(x) = (1/2)‖Cx − b‖₂², g(z) = λ‖z‖₁, A = ∇, B = −I, c = 0
ADMM
- Lagrangian (bring the constraints into the objective = penalty method):
L(x, z, y) = f(x) + g(z) + yᵀ(Ax + Bz − c)
- y is the dual variable or Lagrange multiplier
- augmented Lagrangian (with an additional penalty term):
L_ρ(x, z, y) = f(x) + g(z) + yᵀ(Ax + Bz − c) + (ρ/2)‖Ax + Bz − c‖₂²
ADMM
- the augmented Lagrangian is differentiable under mild conditions (usually better convergence etc.)
- ADMM consists of 3 steps per iteration k:
x^(k+1) := argmin_x L_ρ(x, z^k, y^k)
z^(k+1) := argmin_z L_ρ(x^(k+1), z, y^k)
y^(k+1) := y^k + ρ(Ax^(k+1) + Bz^(k+1) − c)
ADMM
- the same 3 steps with the scaled dual variable u = (1/ρ)y:
x^(k+1) := argmin_x f(x) + (ρ/2)‖Ax + Bz^k − c + u^k‖₂²
z^(k+1) := argmin_z g(z) + (ρ/2)‖Ax^(k+1) + Bz − c + u^k‖₂²
u^(k+1) := u^k + Ax^(k+1) + Bz^(k+1) − c
- in each step, the terms that do not depend on the optimization variable are constant
- this splits f(x) and g(z) into independent subproblems (u connects them)!
Deconvolution with ADMM
- for the deconvolution problem
minimize (1/2)‖Cx − b‖₂² + λ‖z‖₁  subject to ∇x − z = 0
the 3 steps per iteration k become:
x^(k+1) := argmin_x (1/2)‖Cx − b‖₂² + (ρ/2)‖∇x − z^k + u^k‖₂²
z^(k+1) := argmin_z λ‖z‖₁ + (ρ/2)‖∇x^(k+1) − z + u^k‖₂²
u^(k+1) := u^k + ∇x^(k+1) − z^(k+1)
Deconvolution with ADMM
- 1. x-update:
x^(k+1) := argmin_x (1/2)‖Cx − b‖₂² + (ρ/2)‖∇x − z^k + u^k‖₂²
- with the constant v = z^k − u^k, solve the normal equations:
(CᵀC + ρ∇ᵀ∇) x = Cᵀb + ρ∇ᵀv
- where ∇ = [∇_x; ∇_y] is stacked, so ∇ᵀv = ∇_xᵀ v₁ + ∇_yᵀ v₂
Deconvolution with ADMM
- 1. x-update (with the constant v = z^k − u^k):
x^(k+1) = (CᵀC + ρ∇ᵀ∇)⁻¹ (Cᵀb + ρ∇ᵀv)
- inverse filtering → may blow up, but that's okay:
x^(k+1) = F⁻¹{ ( F{c}*·F{b} + ρ( F{∇_x}*·F{v₁} + F{∇_y}*·F{v₂} ) ) / ( F{c}*·F{c} + ρ( F{∇_x}*·F{∇_x} + F{∇_y}*·F{∇_y} ) ) }
- the denominator does not change between iterations → precompute it!
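A sketch of this x-update with the constant parts precomputed once; the function names and the periodic-boundary treatment of ∇_x, ∇_y are illustrative choices, and the PSF is assumed to have a nonzero DC response so the denominator never vanishes.

```python
import numpy as np

def precompute_x_update(b, c, rho):
    M, N = b.shape
    C = np.fft.fft2(np.fft.ifftshift(c))
    # DFTs of the forward-difference kernels [-1 1] (periodic boundary),
    # matching dx = roll(x, -1, axis=1) - x and the vertical analogue.
    kx = np.zeros((M, N))
    kx[0, 0], kx[0, -1] = -1.0, 1.0
    ky = np.zeros((M, N))
    ky[0, 0], ky[-1, 0] = -1.0, 1.0
    Dx, Dy = np.fft.fft2(kx), np.fft.fft2(ky)
    num_b = np.conj(C) * np.fft.fft2(b)  # F{c}* . F{b}, fixed over iterations
    denom = np.abs(C) ** 2 + rho * (np.abs(Dx) ** 2 + np.abs(Dy) ** 2)
    return Dx, Dy, num_b, denom

def x_update(v1, v2, Dx, Dy, num_b, denom, rho):
    # v = z^k - u^k, split into its horizontal (v1) and vertical (v2) halves.
    num = num_b + rho * (np.conj(Dx) * np.fft.fft2(v1)
                         + np.conj(Dy) * np.fft.fft2(v2))
    return np.real(np.fft.ifft2(num / denom))
```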
Deconvolution with ADMM
- 2. z-update:
z^(k+1) := argmin_z λ‖z‖₁ + (ρ/2)‖∇x^(k+1) − z + u^k‖₂²
        := argmin_z λ‖z‖₁ + (ρ/2)‖z − a‖₂², with the constant a = ∇x^(k+1) + u^k
- the l1-norm is not differentiable! yet there is a closed-form solution via element-wise soft thresholding:
z^(k+1) := S_κ(a) with κ = λ/ρ, where
S_κ(a) = { a − κ  if a > κ;  0  if |a| ≤ κ;  a + κ  if a < −κ } = (a − κ)₊ − (−a − κ)₊
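The soft-thresholding operator is a one-liner in NumPy, using the (a − κ)₊ − (−a − κ)₊ form above and applying element-wise to arrays:

```python
import numpy as np

def soft_threshold(a, kappa):
    # S_kappa(a) = (a - kappa)_+ - (-a - kappa)_+, applied element-wise
    return np.maximum(a - kappa, 0.0) - np.maximum(-a - kappa, 0.0)
```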
Deconvolution with ADMM
minimize (1/2)‖Cx − b‖₂² + λ‖z‖₁  subject to ∇x − z = 0
for k = 1:max_iters
    x^(k+1) := argmin_x (1/2)‖ [C; √ρ ∇] x − [b; √ρ v] ‖₂²   (inverse filtering)
    z^(k+1) := S_λ/ρ(∇x^(k+1) + u^k)                         (element-wise threshold)
    u^(k+1) := u^k + ∇x^(k+1) − z^(k+1)                      (trivial)
→ easy!
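A minimal end-to-end sketch combining the earlier pieces (grad, soft_threshold, precompute_x_update, x_update from the sketches above); the zero initialization follows the homework notes, and λ, ρ, max_iters are the tuning knobs from the slides.

```python
import numpy as np

def admm_tv_deconvolve(b, c, lam=0.01, rho=10.0, max_iters=100):
    Dx, Dy, num_b, denom = precompute_x_update(b, c, rho)
    x = np.zeros_like(b)
    z1, z2 = np.zeros_like(b), np.zeros_like(b)
    u1, u2 = np.zeros_like(b), np.zeros_like(b)
    for k in range(max_iters):
        # x-update: frequency-domain solve of the normal equations
        x = x_update(z1 - u1, z2 - u2, Dx, Dy, num_b, denom, rho)
        dx, dy = grad(x)
        # z-update: element-wise soft thresholding with kappa = lambda/rho
        z1 = soft_threshold(dx + u1, lam / rho)
        z2 = soft_threshold(dy + u2, lam / rho)
        # u-update: accumulate the (scaled) constraint violation
        u1 += dx - z1
        u2 += dy - z2
    return x
```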
Deconvolution with ADMM
[figure: Wiener filtering vs. ADMM with anisotropic TV, λ = 0.01, ρ = 10]
Deconvolution with ADMM
[figure: ADMM results for λ = 0.1, λ = 0.05, λ = 0.01 (all with ρ = 10)]
- too much TV: "patchy"; too little TV: noisy
Deconvolution with ADMM
[figure: Wiener filtering vs. ADMM with anisotropic TV, λ = 0.1, ρ = 10]
Deconvolution with ADMM
[figure: ADMM results for λ = 0.1, λ = 0.05, λ = 0.01 (all with ρ = 10)]
- too much TV: okay here, because the image actually has sparse gradients!
Outlook ADMM
- powerful tool for many computational imaging problems:
minimize_x (1/2)‖Ax − b‖₂²  +  Γ(x)
           (data fidelity)    (regularization)
→ minimize_{x,z} f(x) + g(z)  subject to Ax = z
- include a generic prior in g(z); you just need to derive its proximal operator
- example priors: noise statistics, sparse gradients, smoothness, …
- a weighted sum of different priors is also possible
- anisotropic TV is one of the easiest priors
Remember!
- implement matrix-free operations for Ax and Aᵀx where efficient (e.g., multiplications and divisions in frequency space)
- split difficult problems (e.g., inverse problems with non-differentiable priors) into easier subproblems → ADMM
Homework 3
- implement:
  - filtering
  - inverse filtering and Wiener filtering
  - deconvolution with ADMM + (anisotropic) TV prior
- notes for the ADMM implementation:
  - initialize U, Z, X with 0
  - implement in matrix-free form: all multiplications / divisions via Fourier transforms
  - in 2D, the finite-differences matrix becomes ∇ = [∇_x; ∇_y] (anisotropic form); use matrix-free operations for it as well!
  - see the notes in the HW
  - check the ADMM example scripts: http://web.stanford.edu/~boyd/papers/admm/
Notes for Homework 3
- dimensions: I ∈ ℝ^(M×N), X ∈ ℝ^(MN×1), U ∈ ℝ^(2MN×1), Z ∈ ℝ^(2MN×1)
- signal-to-noise ratio (SNR)
- peak signal-to-noise ratio (PSNR), always in dB
- the residual is the value of the objective function
- convergence: plot the residual over increasing iterations (it should always decrease!)
Notes for Homework 3
- residual, not regularized: (1/2)‖Cx − b‖₂²
- residual, regularized: (1/2)‖Cx − b‖₂² + λ‖ [∇_x; ∇_y] x ‖₁
- MSE = (1/(mn)) Σₘ Σₙ (x_target − x_est)²
- PSNR = 10·log₁₀( max(x_target)² / MSE ) = 10·log₁₀( 1 / MSE ) for images normalized so max(x_target) = 1
- SNR = P_signal / P_noise,  SNR_dB = 10·log₁₀( P_signal / P_noise )
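A sketch of these metrics; psnr reduces to 10·log₁₀(1/MSE) exactly when the target image is normalized so its maximum is 1.

```python
import numpy as np

def mse(x_target, x_est):
    return np.mean((x_target - x_est) ** 2)

def psnr(x_target, x_est):
    # = 10*log10(1/MSE) when the target image is normalized to max 1
    return 10.0 * np.log10(x_target.max() ** 2 / mse(x_target, x_est))

def snr_db(p_signal, p_noise):
    return 10.0 * np.log10(p_signal / p_noise)
```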
References and Further Reading
- S. Boyd, N. Parikh, E. Chu, B. Peleato, J. Eckstein, "Distributed Optimization and Statistical Learning via the Alternating Direction Method of Multipliers", Foundations and Trends in Machine Learning, 2011
- A. Chambolle, T. Pock, "A first-order primal-dual algorithm for convex problems with applications in imaging", Journal of Mathematical Imaging and Vision, 2011
- G. Boreman, "Modulation Transfer Function in Optical and Electro-Optical Systems", SPIE Publications, 2001
- L. Rudin, S. Osher, E. Fatemi, "Nonlinear total variation based noise removal algorithms", Physica D: Nonlinear Phenomena 60, 1992
- http://www.imagemagick.org/Usage/fourier/