

SLIDE 1

First-Order Algorithms for Approximate TV-Regularized Image Denoising

Stephen Wright

University of Wisconsin-Madison

Vienna, July 2009

Stephen Wright (UW-Madison) TV-Regularized Image Denoising Vienna, July 2009 1 / 34

SLIDE 2

1. Motivation and Introduction
2. Image Processing: Denoising
3. Primal-Dual Methods (Zhu and Chan)
4. GPU Implementations

Collaborators: Mingqiang Zhu, Tony Chan (image denoising), Sangkyun Lee (GPU implementation).

SLIDE 3

Motivation

Many applications give rise to optimization problems for which simple, approximate solutions are required, rather than complex exact solutions:
- Occam's Razor;
- data quality doesn't justify exactness;
- possibly more robust to data perturbations (not "overoptimized");
- simple solutions are easier to actuate / implement / store;
- conforms better to prior knowledge.

When formulated with variable x ∈ R^n, simplicity is often manifested as sparsity in x (few nonzero components) or in a simple transformation of x.

SLIDE 4

Formulating and Solving

Two basic ingredients:
- an underlying optimization problem, often of data-fitting type;
- a regularization term or constraints to "encourage" sparsity, often nonsmooth.

These are usually very large problems. Need techniques from:
- large-scale optimization;
- nonsmooth optimization;
- inverse problems, PDEs, ...;
- domain- and application-specific knowledge.

SLIDE 5

Image Processing: TV Denoising

Rudin-Osher-Fatemi (ROF) model (ℓ2-TV). Given a domain Ω ⊂ R² and an observed image f : Ω → R, seek a restored image u : Ω → R that preserves edges while removing noise. The regularized image u can typically be stored more economically.

Seek to "minimize" both ||u − f||_2 and the total-variation (TV) norm

    ∫_Ω |∇u| dx.

Use constrained formulations, or a weighting of the two objectives:

    min_u P(u) := ∫_Ω |∇u| dx + (λ/2) ||u − f||²_2.

The minimizing u tends to have regions in which u is constant (∇u = 0): more "cartoon-like" when λ is small.

SLIDE 6

TV-Regularized Image Denoising

    min_u P(u) := ∫_Ω |∇u| dx + (λ/2) ||u − f||²_2.

It is difficult to apply gradient-projection-type approaches (like GPSR or SpaRSA for compressed sensing), because:
- the constrained formulation lacks a feasible set that allows easy projection (needed for GPSR): the TV term is more complicated than ∫_Ω |u|;
- the SpaRSA subproblem has the same form as the original problem (since A^T A = λI) and hence is just as hard to solve.

However, if we discretize and take the dual, we obtain a problem amenable to gradient-projection approaches.

SLIDE 7

Dual Formulation

Redefine the TV seminorm:

    ∫_Ω |∇u| = max_{w ∈ C¹₀(Ω), |w| ≤ 1} ∫_Ω ∇u · w = max_{|w| ≤ 1} −∫_Ω u ∇·w,

where w : Ω → R². Rewrite the primal formulation as

    min_u max_{w ∈ C¹₀(Ω), |w| ≤ 1} −∫_Ω u ∇·w + (λ/2) ||u − f||²_2.

Exchange min and max, and do the inner minimization wrt u explicitly: u = f + (1/λ) ∇·w. Thus obtain the dual:

    max_{w ∈ C¹₀(Ω), |w| ≤ 1} D(w) := (λ/2) [ ||f||²_2 − ||(1/λ) ∇·w + f||²_2 ].
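The inner minimization can be verified with a one-line calculation (this derivation is an editorial addition, using only the quantities defined above). Stationarity of −∫_Ω u ∇·w + (λ/2)||u − f||²_2 with respect to u gives

```latex
\[
-\nabla\!\cdot w + \lambda (u - f) = 0
\quad\Longrightarrow\quad
u = f + \tfrac{1}{\lambda}\,\nabla\!\cdot w,
\]
and substituting this $u$ back into the objective,
\[
-\int_\Omega u\,\nabla\!\cdot w + \frac{\lambda}{2}\|u-f\|_2^2
= -\int_\Omega f\,\nabla\!\cdot w - \frac{1}{2\lambda}\|\nabla\!\cdot w\|_2^2
= \frac{\lambda}{2}\Big(\|f\|_2^2 - \big\|\tfrac{1}{\lambda}\nabla\!\cdot w + f\big\|_2^2\Big)
= D(w).
\]
```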

SLIDE 8

Discretization

Assume Ω = [0, 1] × [0, 1], discretized with an n × n regular grid, where u_ij approximates u at the point ((i − 1)/(n − 1), (j − 1)/(n − 1)) ∈ Ω, for i, j = 1, 2, . . . , n. The discrete approximation to the TV norm is thus

    TV(u) = Σ_{1 ≤ i,j ≤ n} ||(∇u)_{i,j}||,

where

    (∇u)¹_{i,j} = u_{i+1,j} − u_{i,j} if i < n,  0 if i = n;
    (∇u)²_{i,j} = u_{i,j+1} − u_{i,j} if j < n,  0 if j = n.
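As a concrete check of these forward-difference formulas, here is a minimal NumPy sketch; it is an editorial illustration (the function name `discrete_tv` is not from the talk), not the code used in the experiments:

```python
import numpy as np

def discrete_tv(u):
    """Isotropic discrete TV: sum over pixels of the 2-norm of the
    forward-difference gradient, with zero differences on the last
    row/column, exactly as in the formulas above."""
    dx = np.zeros_like(u)
    dy = np.zeros_like(u)
    dx[:-1, :] = u[1:, :] - u[:-1, :]   # (grad u)^1: zero when i = n
    dy[:, :-1] = u[:, 1:] - u[:, :-1]   # (grad u)^2: zero when j = n
    return np.sum(np.sqrt(dx**2 + dy**2))
```

A constant image has TV zero; a two-row step image [[0,0],[1,1]] has TV 2, one unit jump per column.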

SLIDE 9

By reorganizing the N = n² components of u into a vector v ∈ R^N, and f into a vector g ∈ R^N, we write the discrete primal ROF model as

    min_v Σ_{l=1}^N ||A_l^T v||_2 + (λ/2) ||v − g||²_2,

where each A_l is an N × 2 matrix with at most 4 nonzero entries (+1 or −1). Introduce a vector representation x ∈ R^{2N} of w : Ω → R². Obtain the discrete dual ROF (scaled and shifted):

    min_{x ∈ X} (1/2) ||Ax − λg||²_2,

where X := {(x_1; x_2; . . . ; x_N) ∈ R^{2N} : x_l ∈ R², ||x_l||_2 ≤ 1 for all l = 1, 2, . . . , N} and A = [A_1, A_2, . . . , A_N] ∈ R^{N×2N}.

SLIDE 10

The set X ⊂ R^{2N} is a Cartesian product of N unit circles in R². Projections onto X are trivial, so we can apply gradient projection ideas. (Curvature of the boundary of X adds some interesting twists.)

The discrete primal-dual solution (v, x) is a saddle point of

    ℓ(v, x) := x^T A^T v + (λ/2) ||v − g||²_2

on the space R^N × X. Since the discrete primal is strictly convex, we have:

Proposition. Let {x^k} be any sequence in X whose accumulation points are all stationary for the dual problem. Then {v^k} defined by v^k = g − (1/λ) A x^k converges to the unique solution of the primal problem.

The required property of {x^k} holds for any reasonable gradient projection algorithm.
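The claim that projection onto X is trivial can be made concrete: storing the dual variable as N rows of R², the projection rescales each row independently. A hedged NumPy sketch (an editorial illustration; names are not from the talk):

```python
import numpy as np

def project_X(x):
    """Project each row of x (shape (N, 2)) onto the unit disk in R^2:
    rows with norm > 1 are scaled back to the boundary; the rest are
    left untouched. This is the projection onto the Cartesian product X."""
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    return x / np.maximum(norms, 1.0)
```

Given a dual iterate x^k (flattened) and the matrix A, the primal estimate of the Proposition is then simply v = g - (A @ x.ravel()) / lam.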

SLIDE 11

Other Methods

- Embedding in a parabolic PDE. [ROF, 1992]
- Apply a Newton-like method to the optimality conditions for a smoothed version, in which |∇u| is replaced by √(|∇u|² + β). The parameter β > 0 is decreased between Newton steps (path-following). [Chan, Golub, Mulet, 1999]
- Semismooth Newton on a perturbed version of the optimality conditions. [Hintermüller, Stadler, 2006] See also Hintermüller's talk on Monday: semismooth methods for the ℓ1-TV formulation.
- SOCP. [Goldfarb, Yin, 2005]
- First-order method similar to gradient projection with a fixed step size. [Chambolle, 2004]

SLIDE 12

Variants of Gradient Projection

    min_{x ∈ X} F(x),  where F(x) := (1/2) ||Ax − λg||²_2.

GP methods choose α_k and set x^k(α_k) := P_X(x^k − α_k ∇F(x^k)), then choose γ_k ∈ (0, 1] and set x^{k+1} := x^k + γ_k (x^k(α_k) − x^k).

Choosing α_k and γ_k:
- α_k ≡ α constant: converges for α < 0.25;
- Barzilai-Borwein formulae; cyclic variants; alternating variants that switch adaptively between the formulae;
- γ_k ≡ 1 (nonmonotone), or γ_k minimizes F on [0, 1] (monotone).

SLIDE 13

Sequential Quadratic Programming

Optimality conditions for the dual: there are Lagrange multipliers z_l ∈ R, l = 1, 2, . . . , N, such that

    A_l^T (Ax − λg) + 2 z_l x_l = 0,   l = 1, 2, . . . , N,
    0 ≤ z_l ⊥ ||x_l||²_2 − 1 ≤ 0.

At iteration k, define the active set A_k ⊂ {1, 2, . . . , N} as the set of l for which ||x_l^k|| = 1 and the gradient points outward; take a Newton-like step on:

    A_l^T (Ax − λg) + 2 x_l z_l = 0,   l = 1, 2, . . . , N,
    ||x_l||²_2 − 1 = 0,   l ∈ A_k,
    z_l = 0,   l ∉ A_k.

Using the Hessian approximation A^T A ≈ α_k^{-1} I leads to

    Δx_l^k = −(α_k^{-1} + 2 z_l^{k+1})^{-1} ( [∇F(x^k)]_l + 2 z_l^{k+1} x_l^k ),   l = 1, 2, . . . , N.

SLIDE 14

Computational Results

Two images: SHAPE (128 × 128) and CAMERAMAN (256 × 256). Gaussian noise added with variance 0.01; λ = 0.045 for both examples. Tested many variants; report here on:
- Chambolle, with α ≡ 0.248;
- nonmonotone GPBB;
- nonmonotone GPBB with SQP augmentation;
- GPABB: alternating adaptively between BB formulae [Serafini, Zanghirati, Zanni, 2004];
- CGM with adaptively decreasing β.

Convergence is declared when the relative duality gap falls below tol.

SLIDE 15

Figure: SHAPE: original (left) and noisy (right)

SLIDE 16

Figure: Denoised SHAPE: Tol = 10^-2 (left) and Tol = 10^-4 (right).

Little visual difference between loose and tight stopping criterion: “convergence in the eyeball norm.”

SLIDE 17

SHAPE Results

    Alg          tol=10^-2    tol=10^-3    tol=10^-4     tol=10^-5
                 its   time   its   time   its    time   its    time
    Chambolle     18   0.22   168   1.97   1054   12.3   7002   83.4
    GPBB-NM       10   0.18    48   0.79    216    3.6   1499   25.9
    GPCBBZ-NM     10   0.24    50   1.12    210    4.7   1361   31.5
    GPABB         13   0.29    57   1.20    238    5.0   1014   22.6
    CGM            6   5.95    10  10.00     13   12.9     18   19.4

Table: Runtimes in seconds (MATLAB on MacBook) for denoising algorithms.

Nonmonotone GPBB is generally reliable. Most GPBB variants dominate Chambolle. CGM becomes the fastest between tol = 10^-4 and 10^-5.

SLIDE 18

Figure: Convergence comparison on SHAPE for BB-NM, Chambolle, and CGM (plot not recoverable from the extraction).

SLIDE 19

Figure: CAMERAMAN: original (left) and noisy (right)

SLIDE 20

Figure: Denoised CAMERAMAN: Tol = 10^-2 (left) and Tol = 10^-4 (right).

SLIDE 21

CAMERAMAN Results

    Alg          tol=10^-2    tol=10^-3    tol=10^-4    tol=10^-5
                 its   time   its   time   its   time   its    time
    Chambolle     27   1.07   163   6.46   827   32.8   3464  137.8
    GPBB-NM       16   0.86    48   2.59   183    9.7    721   39.2
    GPCBBZ-NM     16   1.17    53   3.91   202   14.6    729   53.8
    GPABB         16   1.06    47   3.11   179   11.9    563   37.4
    CGM            6  12.53    10  21.15    14   29.7     16   34.1

Table: Runtimes in seconds (MATLAB on Linux PC) for denoising algorithms.

SLIDE 22

Primal-Dual Approach

Recall that we seek a saddle point:

    min_{v ∈ R^N} max_{x ∈ X} ℓ(v, x) := x^T A^T v + (λ/2) ||v − g||²_2.

[Zhu, Chan 2008] suggest taking alternating gradient projection steps in the primal and dual spaces (PDHG):

    x^{k+1} = P_X(x^k + τ_k ∇_x ℓ(v^k, x^k)),
    v^{k+1} = v^k − σ_k ∇_v ℓ(v^k, x^{k+1}).

The choice of steplengths τ_k > 0 and σ_k > 0 is nonintuitive, however:

    τ_k := (0.2 + 0.8k) λ,
    σ_k := (0.5 − 1/(3 + 0.2k)) / τ_k.

Note that τ_k → ∞! But the projection keeps the steps small.
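To make the iteration concrete, here is a small self-contained NumPy sketch of PDHG with these steplengths applied to the discrete dual pair above. It is an editorial illustration under stated assumptions: the row-major pixel ordering, the construction of A from the forward differences of slide 8, the component-stacked storage of x, and the function names are all assumptions, not code from the talk:

```python
import numpy as np

def grad_matrix(n):
    """G (2N x N, N = n*n): G @ v stacks (grad v)^1 then (grad v)^2,
    with zero rows on the boundary, so A := G.T gives A^T v = grad v."""
    N = n * n
    G = np.zeros((2 * N, N))
    idx = lambda i, j: i * n + j          # row-major pixel ordering (assumption)
    for i in range(n):
        for j in range(n):
            p = idx(i, j)
            if i < n - 1:                 # (grad v)^1_{ij} = v_{i+1,j} - v_{ij}
                G[p, idx(i + 1, j)] = 1.0
                G[p, p] -= 1.0
            if j < n - 1:                 # (grad v)^2_{ij} = v_{i,j+1} - v_{ij}
                G[N + p, idx(i, j + 1)] = 1.0
                G[N + p, p] -= 1.0
    return G

def pdhg(A, lam, g, iters=300):
    """PDHG sketch with the Zhu-Chan steplengths quoted above; x stores
    the two gradient components stacked, so pair l is (x[l], x[N+l])."""
    N, twoN = A.shape
    x, v = np.zeros(twoN), np.zeros(N)
    for k in range(iters):
        tau = (0.2 + 0.8 * k) * lam
        sigma = (0.5 - 1.0 / (3.0 + 0.2 * k)) / tau
        y = (x + tau * (A.T @ v)).reshape(2, -1)     # grad_x l = A^T v
        n2 = np.sqrt((y ** 2).sum(axis=0, keepdims=True))
        x = (y / np.maximum(n2, 1.0)).ravel()        # project onto X
        v = v - sigma * (A @ x + lam * (v - g))      # grad_v l at (v^k, x^{k+1})
    return v, x
```

A sanity check: for a constant noisy-free image g the TV term is already zero, so the primal solution is v = g; the iteration recovers it.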

SLIDE 23

Discussion

This approach extends to deblurring:

    min_u ∫_Ω |∇u| dx + (λ/2) ||Ku − f||²_2,

where K is a linear blurring operator. The min-max discretized problem is

    min_{v ∈ R^N} max_{x ∈ X} ℓ(v, x) := x^T A^T v + (λ/2) ||Bv − g||²_2,

where B is the discretization of K. When K is a convolution, DFTs can be used to perform operations with B and B^T.

Convergence analysis? Not much is known. There are some connections with the Russian literature on optimal descent methods for convex optimization and saddle-point problems, but they are not convincing. The presence of the bounded set X and the steplength τ_k → ∞ are critical elements not easily accounted for in a general analysis.

SLIDE 24

Numerical Results

Figure: Relative duality gap vs. CPU time (s) for PDHG, Chambolle, and CGM, for three denoising problems (128², 256², and 512²).

SLIDE 25

Graphics Processing Units

SLIDE 26

Modern GPUs

GPUs have evolved:
- for massive 3D graphics and real-time rendering;
- programmability;
- parallelization;
- cheap!

At $250, a GeForce 9800 GX2 provides 128 × 2 cores and 512 × 2 MB of memory with > 60 GB/s bandwidth. In comparison, our PC (Dell Precision 5400) has 4 cores and 4 GB of memory with 8 GB/s bandwidth.

SLIDE 27

Why GPU? Faster Computation

Figure: Peak floating-point performance of NVidia GPUs vs. Intel CPUs over time. Legend: GT200 = GeForce GTX 280, G92 = GeForce 9800 GTX, G80 = GeForce 8800 GTX, G71 = GeForce 7900 GTX, G70 = GeForce 7800 GTX, NV40 = GeForce 6800 Ultra, NV35 = GeForce FX 5950 Ultra, NV30 = GeForce FX 5800. (Chart not recoverable from the extraction.)

SLIDE 28

Why GPU? Larger Bandwidth

Figure: Memory bandwidth of NVidia GPUs vs. Intel CPUs over time. (Chart not recoverable from the extraction.)

SLIDE 29

General Computation on GPUs

GPGPU: General-Purpose computation using GPUs. The trend:
- GPGPU using the OpenGL API (2000∼). OpenGL is an industry standard, but it is not designed for computation.
- GPGPU using vendor-specific software (2007∼). Depends on a specific vendor; better performance than OpenGL. NVidia: CUDA.
- GPGPU using OpenCL (2009∼?). An open-standard API for GPGPU.

SLIDE 30

GPU Internals

Figure: CUDA device model. A device contains multiprocessors 1, . . . , N; each multiprocessor has processors 1, . . . , M (each with its own registers), a shared instruction unit, shared memory, a constant cache, and a texture cache, all sitting above a common device memory.

SLIDE 31

CUDA: Compute Unified Device Architecture

Example: Addition of two N × N matrices.

__global__ void matAdd(float A[N][N], float B[N][N], float C[N][N])
{
    // Each thread computes one element of C.
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    int j = blockIdx.y * blockDim.y + threadIdx.y;
    if (i < N && j < N)
        C[i][j] = A[i][j] + B[i][j];
}

int main()
{
    // Kernel invocation: one 16x16 thread block per tile of the matrix.
    dim3 dimBlock(16, 16);
    dim3 dimGrid((N + dimBlock.x - 1) / dimBlock.x,
                 (N + dimBlock.y - 1) / dimBlock.y);
    matAdd<<<dimGrid, dimBlock>>>(A, B, C);
}

SLIDE 32

Implementing PDHG on GPUs

The major operations required can be implemented efficiently on GPUs:
- computation of ∇u and ∇·w requires regular data access patterns and only local transfers;
- the two-dimensional DFT and inverse DFT from the CUFFT library can be used for the deblurring variant of PDHG;
- level-1 BLAS operations are supported: norms, inner products, vector additions.

Single-precision arithmetic on GPUs leads to differences in numerics (e.g. in the number of iterations).

SLIDE 33

Image Denoising

    Image size  Tol     CPU              GPU             Speedup
                        iters  time (s)  iters time (s)  total  iter
    128^2       1.e-2      11      0.03     11     0.02      2     2
                1.e-4      79      0.21     79     0.02     11    11
                1.e-6     338      0.90    329     0.07     14    13
    256^2       1.e-2      13      0.17     13     0.02      9     9
                1.e-4      68      0.81     68     0.03     32    32
                1.e-6     304      3.57    347     0.11     33    38
    512^2       1.e-2      12      0.95     12     0.03     31    31
                1.e-4      54      3.96     54     0.05     76    76
                1.e-6     222     16.08    238     0.19     84    90
    1024^2      1.e-2      14      5.42     14     0.08     64    64
                1.e-4      69     25.80     69     0.24    106   106
                1.e-6     296    103.54    324     1.02    102   111
    2048^2      1.e-2      13     31.41     13     0.28    114   114
                1.e-4      67    149.24     67     0.90    165   165
                1.e-6     319    694.16    338     4.12    169   179

Table: Computational results for image denoising (λ = 0.041).

SLIDE 34

Image Deblurring

    Image size  Blur kernel  CPU              GPU             Speedup
                             iters  time (s)  iters time (s)  total  iter
    128^2       m-motion        31      0.15     31     0.02      6     6
                s-motion       106      0.49    106     0.05     10    10
                m-Gaussian      88      0.41     88     0.04     10    10
                s-Gaussian      66      0.32     66     0.04      9     9
    256^2       m-motion        27      0.55     27     0.04     14    14
                s-motion        79      1.57     79     0.08     20    20
                m-Gaussian      44      0.88     44     0.05     17    17
                s-Gaussian      39      0.79     39     0.05     17    17
    512^2       m-motion        34      3.94     34     0.14     28    28
                s-motion        72      8.23     72     0.26     31    31
                m-Gaussian      44      5.07     44     0.17     29    29
                s-Gaussian      37      4.27     37     0.15     29    29
    1024^2      m-motion        31     19.39     30     0.42     46    45
                s-motion        75     46.00     74     0.95     48    48
                m-Gaussian      44     27.07     44     0.59     46    46
                s-Gaussian      41     24.76     41     0.55     45    45
    2048^2      m-motion        33    113.38     33     2.31     49    49
                s-motion        79    263.07     79     5.26     50    50
                m-Gaussian      49    166.36     49     3.34     50    50
                s-Gaussian      48    163.72     48     3.28     50    50

Table: Computational results for image deblurring. 'm-' denotes 'mild' and 's-' denotes 'severe', while 'motion' and 'Gaussian' indicate the type of blur kernel.
