

SLIDE 1

First-Order Algorithms for Approximate TV-Regularized Image Denoising

Stephen Wright

University of Wisconsin-Madison

Vienna, July 2009

Stephen Wright (UW-Madison) TV-Regularized Image Denoising Vienna, July 2009 1 / 34

SLIDE 2

1. Motivation and Introduction
2. Image Processing: Denoising
3. Primal-Dual Methods (Zhu and Chan)
4. GPU Implementations

Collaborators: Mingqiang Zhu, Tony Chan (image denoising), Sangkyun Lee (GPU implementation).

SLIDE 3

Motivation

Many applications give rise to optimization problems for which simple, approximate solutions are required, rather than complex exact solutions:
- Occam's Razor;
- data quality doesn't justify exactness;
- possibly more robust to data perturbations (not "overoptimized");
- simple solutions are easier to actuate / implement / store;
- conforms better to prior knowledge.

When formulated with variable x ∈ R^n, simplicity is often manifested as sparsity in x (few nonzero components) or in a simple transformation of x.

SLIDE 4

Formulating and Solving

Two basic ingredients:
- an underlying optimization problem, often of data-fitting type;
- a regularization term or constraints to "encourage" sparsity, often nonsmooth.

These are usually very large problems. Need techniques from:
- large-scale optimization;
- nonsmooth optimization;
- inverse problems, PDEs, ...;
- domain- and application-specific knowledge.

SLIDE 5

Image Processing: TV Denoising

Rudin-Osher-Fatemi (ROF) model (ℓ2-TV). Given a domain Ω ⊂ R² and an observed image f : Ω → R, seek a restored image u : Ω → R that preserves edges while removing noise. The regularized image u can typically be stored more economically.

Seek to "minimize" both ||u − f||_2 and the total-variation (TV) norm

    ∫_Ω |∇u| dx.

Use constrained formulations, or a weighting of the two objectives:

    min_u P(u) := ∫_Ω |∇u| dx + (λ/2) ||u − f||²_2.

The minimizing u tends to have regions in which u is constant (∇u = 0): more "cartoon-like" when λ is small.

SLIDE 6

TV-Regularized Image Denoising

    min_u P(u) := ∫_Ω |∇u| dx + (λ/2) ||u − f||²_2.

It is difficult to apply gradient-projection-type approaches (like GPSR or SpaRSA for compressed sensing), because:
- the constrained formulation lacks a feasible set that allows easy projection (needed for GPSR): the TV term is more complicated than ∫_Ω |u|;
- the SpaRSA subproblem has the same form as the original problem (since A^T A = λI) and hence is just as hard to solve.

However, if we discretize and take the dual, we obtain a problem amenable to gradient-projection approaches.

SLIDE 7

Dual Formulation

Redefine the TV seminorm:

    ∫_Ω |∇u| = max_{w ∈ C¹₀(Ω), |w| ≤ 1} ∫_Ω ∇u · w = max_{|w| ≤ 1} −∫_Ω u ∇·w,

where w : Ω → R². Rewrite the primal formulation as

    min_u max_{w ∈ C¹₀(Ω), |w| ≤ 1} −∫_Ω u ∇·w + (λ/2) ||u − f||²_2.

Exchange min and max, and do the inner minimization wrt u explicitly: u = f + (1/λ) ∇·w. Thus obtain the dual:

    max_{w ∈ C¹₀(Ω), |w| ≤ 1} D(w) := (λ/2) [ ||f||²_2 − ||(1/λ) ∇·w + f||²_2 ].
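The inner minimization can be verified with a one-line calculation (this derivation is an editorial addition, using only the quantities defined above). Stationarity of −∫_Ω u ∇·w + (λ/2)||u − f||²_2 with respect to u gives

```latex
\[
-\nabla\!\cdot w + \lambda (u - f) = 0
\quad\Longrightarrow\quad
u = f + \tfrac{1}{\lambda}\,\nabla\!\cdot w,
\]
and substituting this $u$ back into the objective,
\[
-\int_\Omega u\,\nabla\!\cdot w + \frac{\lambda}{2}\|u-f\|_2^2
= -\int_\Omega f\,\nabla\!\cdot w - \frac{1}{2\lambda}\|\nabla\!\cdot w\|_2^2
= \frac{\lambda}{2}\Big(\|f\|_2^2 - \big\|\tfrac{1}{\lambda}\nabla\!\cdot w + f\big\|_2^2\Big)
= D(w).
\]
```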

SLIDE 8

Discretization

Assume Ω = [0, 1] × [0, 1], discretized with an n × n regular grid, where u_ij approximates u at the point ((i − 1)/(n − 1), (j − 1)/(n − 1)) ∈ Ω, for i, j = 1, 2, . . . , n. The discrete approximation to the TV norm is thus

    TV(u) = Σ_{1 ≤ i,j ≤ n} ||(∇u)_{i,j}||,

where

    (∇u)¹_{i,j} = u_{i+1,j} − u_{i,j} if i < n,  0 if i = n;
    (∇u)²_{i,j} = u_{i,j+1} − u_{i,j} if j < n,  0 if j = n.
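As a concrete check of these forward-difference formulas, here is a minimal NumPy sketch; it is an editorial illustration (the function name `discrete_tv` is not from the talk), not the code used in the experiments:

```python
import numpy as np

def discrete_tv(u):
    """Isotropic discrete TV: sum over pixels of the 2-norm of the
    forward-difference gradient, with zero differences on the last
    row/column, exactly as in the formulas above."""
    dx = np.zeros_like(u)
    dy = np.zeros_like(u)
    dx[:-1, :] = u[1:, :] - u[:-1, :]   # (grad u)^1: zero when i = n
    dy[:, :-1] = u[:, 1:] - u[:, :-1]   # (grad u)^2: zero when j = n
    return np.sum(np.sqrt(dx**2 + dy**2))
```

A constant image has TV zero; a two-row step image [[0,0],[1,1]] has TV 2, one unit jump per column.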

SLIDE 9

By reorganizing the N = n² components of u into a vector v ∈ R^N, and f into a vector g ∈ R^N, we write the discrete primal ROF model as

    min_v Σ_{l=1}^N ||A_l^T v||_2 + (λ/2) ||v − g||²_2,

where each A_l is an N × 2 matrix with at most 4 nonzero entries (+1 or −1). Introduce a vector representation x ∈ R^{2N} of w : Ω → R². Obtain the discrete dual ROF (scaled and shifted):

    min_{x ∈ X} (1/2) ||Ax − λg||²_2,

where X := {(x_1; x_2; . . . ; x_N) ∈ R^{2N} : x_l ∈ R², ||x_l||_2 ≤ 1 for all l = 1, 2, . . . , N} and A = [A_1, A_2, . . . , A_N] ∈ R^{N×2N}.

SLIDE 10

The set X ⊂ R^{2N} is a Cartesian product of N unit circles in R². Projections onto X are trivial, so we can apply gradient projection ideas. (Curvature of the boundary of X adds some interesting twists.)

The discrete primal-dual solution (v, x) is a saddle point of

    ℓ(v, x) := x^T A^T v + (λ/2) ||v − g||²_2

on the space R^N × X. Since the discrete primal is strictly convex, we have:

Proposition. Let {x^k} be any sequence in X whose accumulation points are all stationary for the dual problem. Then {v^k} defined by v^k = g − (1/λ) A x^k converges to the unique solution of the primal problem.

The required property of {x^k} holds for any reasonable gradient projection algorithm.
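The claim that projection onto X is trivial can be made concrete: storing the dual variable as N rows of R², the projection rescales each row independently. A hedged NumPy sketch (an editorial illustration; names are not from the talk):

```python
import numpy as np

def project_X(x):
    """Project each row of x (shape (N, 2)) onto the unit disk in R^2:
    rows with norm > 1 are scaled back to the boundary; the rest are
    left untouched. This is the projection onto the Cartesian product X."""
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    return x / np.maximum(norms, 1.0)
```

Given a dual iterate x^k (flattened) and the matrix A, the primal estimate of the Proposition is then simply v = g - (A @ x.ravel()) / lam.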

SLIDE 11

Other Methods

- Embedding in a parabolic PDE. [ROF, 1992]
- Apply a Newton-like method to the optimality conditions for a smoothed version, in which |∇u| is replaced by √(|∇u|² + β). The parameter β > 0 is decreased between Newton steps (path-following). [Chan, Golub, Mulet, 1999]
- Semismooth Newton on a perturbed version of the optimality conditions. [Hintermüller, Stadler, 2006] See also Hintermüller's talk on Monday: semismooth methods for the ℓ1-TV formulation.
- SOCP. [Goldfarb, Yin, 2005]
- First-order method similar to gradient projection with a fixed step size. [Chambolle, 2004]

SLIDE 12

Variants of Gradient Projection

    min_{x ∈ X} F(x),  where F(x) := (1/2) ||Ax − λg||²_2.

GP methods choose α_k and set x^k(α_k) := P_X(x^k − α_k ∇F(x^k)), then choose γ_k ∈ (0, 1] and set x^{k+1} := x^k + γ_k (x^k(α_k) − x^k).

Choosing α_k and γ_k:
- α_k ≡ α constant: converges for α < 0.25;
- Barzilai-Borwein formulae; cyclic variants; alternating variants that switch adaptively between the formulae;
- γ_k ≡ 1 (nonmonotone), or γ_k minimizes F on [0, 1] (monotone).

SLIDE 13

Sequential Quadratic Programming

Optimality conditions for the dual: there are Lagrange multipliers z_l ∈ R, l = 1, 2, . . . , N, such that

    A_l^T (Ax − λg) + 2 z_l x_l = 0,   l = 1, 2, . . . , N,
    0 ≤ z_l ⊥ ||x_l||²_2 − 1 ≤ 0.

At iteration k, define the active set A_k ⊂ {1, 2, . . . , N} as the set of l for which ||x_l^k|| = 1 and the gradient points outward; take a Newton-like step on:

    A_l^T (Ax − λg) + 2 x_l z_l = 0,   l = 1, 2, . . . , N,
    ||x_l||²_2 − 1 = 0,   l ∈ A_k,
    z_l = 0,   l ∉ A_k.

Using the Hessian approximation A^T A ≈ α_k^{-1} I leads to

    Δx_l^k = −(α_k^{-1} + 2 z_l^{k+1})^{-1} ( [∇F(x^k)]_l + 2 z_l^{k+1} x_l^k ),   l = 1, 2, . . . , N.

SLIDE 14

Computational Results

Two images: SHAPE (128 × 128) and CAMERAMAN (256 × 256). Gaussian noise added with variance 0.01; λ = 0.045 for both examples. Tested many variants; report here on:
- Chambolle, with α ≡ 0.248;
- nonmonotone GPBB;
- nonmonotone GPBB with SQP augmentation;
- GPABB: alternating adaptively between BB formulae [Serafini, Zanghirati, Zanni, 2004];
- CGM with adaptively decreasing β.

Convergence is declared when the relative duality gap falls below tol.

SLIDE 15

Figure: SHAPE: original (left) and noisy (right)

SLIDE 16

Figure: Denoised SHAPE: Tol = 10^-2 (left) and Tol = 10^-4 (right).

Little visual difference between loose and tight stopping criterion: “convergence in the eyeball norm.”

SLIDE 17

SHAPE Results

    Alg          tol=10^-2    tol=10^-3    tol=10^-4     tol=10^-5
                 its   time   its   time   its    time   its    time
    Chambolle     18   0.22   168   1.97   1054   12.3   7002   83.4
    GPBB-NM       10   0.18    48   0.79    216    3.6   1499   25.9
    GPCBBZ-NM     10   0.24    50   1.12    210    4.7   1361   31.5
    GPABB         13   0.29    57   1.20    238    5.0   1014   22.6
    CGM            6   5.95    10  10.00     13   12.9     18   19.4

Table: Runtimes in seconds (MATLAB on MacBook) for denoising algorithms.

Nonmonotone GPBB is generally reliable. Most GPBB variants dominate Chambolle. CGM becomes the fastest between tol = 10^-4 and 10^-5.

SLIDE 18

Figure: Convergence comparison on SHAPE for BB-NM, Chambolle, and CGM (plot not recoverable from the extraction).

SLIDE 19

Figure: CAMERAMAN: original (left) and noisy (right)

SLIDE 20

Figure: Denoised CAMERAMAN: Tol = 10^-2 (left) and Tol = 10^-4 (right).

SLIDE 21

CAMERAMAN Results

    Alg          tol=10^-2    tol=10^-3    tol=10^-4    tol=10^-5
                 its   time   its   time   its   time   its    time
    Chambolle     27   1.07   163   6.46   827   32.8   3464  137.8
    GPBB-NM       16   0.86    48   2.59   183    9.7    721   39.2
    GPCBBZ-NM     16   1.17    53   3.91   202   14.6    729   53.8
    GPABB         16   1.06    47   3.11   179   11.9    563   37.4
    CGM            6  12.53    10  21.15    14   29.7     16   34.1

Table: Runtimes in seconds (MATLAB on Linux PC) for denoising algorithms.

SLIDE 22

Primal-Dual Approach

Recall that we seek a saddle point:

    min_{v ∈ R^N} max_{x ∈ X} ℓ(v, x) := x^T A^T v + (λ/2) ||v − g||²_2.

[Zhu, Chan 2008] suggest taking alternating gradient projection steps in the primal and dual spaces (PDHG):

    x^{k+1} = P_X(x^k + τ_k ∇_x ℓ(v^k, x^k)),
    v^{k+1} = v^k − σ_k ∇_v ℓ(v^k, x^{k+1}).

The choice of steplengths τ_k > 0 and σ_k > 0 is nonintuitive, however:

    τ_k := (0.2 + 0.8k) λ,
    σ_k := (0.5 − 1/(3 + 0.2k)) / τ_k.

Note that τ_k → ∞! But the projection keeps the steps small.
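To make the iteration concrete, here is a small self-contained NumPy sketch of PDHG with these steplengths applied to the discrete dual pair above. It is an editorial illustration under stated assumptions: the row-major pixel ordering, the construction of A from the forward differences of slide 8, the component-stacked storage of x, and the function names are all assumptions, not code from the talk:

```python
import numpy as np

def grad_matrix(n):
    """G (2N x N, N = n*n): G @ v stacks (grad v)^1 then (grad v)^2,
    with zero rows on the boundary, so A := G.T gives A^T v = grad v."""
    N = n * n
    G = np.zeros((2 * N, N))
    idx = lambda i, j: i * n + j          # row-major pixel ordering (assumption)
    for i in range(n):
        for j in range(n):
            p = idx(i, j)
            if i < n - 1:                 # (grad v)^1_{ij} = v_{i+1,j} - v_{ij}
                G[p, idx(i + 1, j)] = 1.0
                G[p, p] -= 1.0
            if j < n - 1:                 # (grad v)^2_{ij} = v_{i,j+1} - v_{ij}
                G[N + p, idx(i, j + 1)] = 1.0
                G[N + p, p] -= 1.0
    return G

def pdhg(A, lam, g, iters=300):
    """PDHG sketch with the Zhu-Chan steplengths quoted above; x stores
    the two gradient components stacked, so pair l is (x[l], x[N+l])."""
    N, twoN = A.shape
    x, v = np.zeros(twoN), np.zeros(N)
    for k in range(iters):
        tau = (0.2 + 0.8 * k) * lam
        sigma = (0.5 - 1.0 / (3.0 + 0.2 * k)) / tau
        y = (x + tau * (A.T @ v)).reshape(2, -1)     # grad_x l = A^T v
        n2 = np.sqrt((y ** 2).sum(axis=0, keepdims=True))
        x = (y / np.maximum(n2, 1.0)).ravel()        # project onto X
        v = v - sigma * (A @ x + lam * (v - g))      # grad_v l at (v^k, x^{k+1})
    return v, x
```

A sanity check: for a constant noisy-free image g the TV term is already zero, so the primal solution is v = g; the iteration recovers it.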

SLIDE 23

Discussion

This approach extends to deblurring:

    min_u ∫_Ω |∇u| dx + (λ/2) ||Ku − f||²_2,

where K is a linear blurring operator. The min-max discretized problem is

    min_{v ∈ R^N} max_{x ∈ X} ℓ(v, x) := x^T A^T v + (λ/2) ||Bv − g||²_2,

where B is the discretization of K. When K is a convolution, DFTs can be used to perform operations with B and B^T.

Convergence analysis? Not much is known. There are some connections with the Russian literature on optimal descent methods for convex optimization and saddle-point problems, but they are not convincing. The presence of the bounded set X and the steplength τ_k → ∞ are critical elements not easily accounted for in a general analysis.

SLIDE 24

Numerical Results

Figure: Relative duality gap vs. CPU time (s) for PDHG, Chambolle, and CGM, for three denoising problems (128², 256², and 512²).

SLIDE 25

Graphics Processing Units

SLIDE 26

Modern GPUs

GPUs have evolved:
- for massive 3D graphics and real-time rendering;
- programmability;
- parallelization;
- cheap!

At $250, a GeForce 9800 GX2 provides 128 × 2 cores and 512 × 2 MB of memory with > 60 GB/s bandwidth. In comparison, our PC (Dell Precision 5400) has 4 cores and 4 GB of memory with 8 GB/s bandwidth.

SLIDE 27

Why GPU? Faster Computation

Figure: Peak floating-point performance of NVidia GPUs vs. Intel CPUs over time. Legend: GT200 = GeForce GTX 280, G92 = GeForce 9800 GTX, G80 = GeForce 8800 GTX, G71 = GeForce 7900 GTX, G70 = GeForce 7800 GTX, NV40 = GeForce 6800 Ultra, NV35 = GeForce FX 5950 Ultra, NV30 = GeForce FX 5800. (Chart not recoverable from the extraction.)

SLIDE 28

Why GPU? Larger Bandwidth

Figure: Memory bandwidth of NVidia GPUs vs. Intel CPUs over time. (Chart not recoverable from the extraction.)

SLIDE 29

General Computation on GPUs

GPGPU: General-Purpose computation using GPUs. The trend:
- GPGPU using the OpenGL API (2000∼). OpenGL is an industry standard, but it is not designed for computation.
- GPGPU using vendor-specific software (2007∼). Depends on a specific vendor; better performance than OpenGL. NVidia: CUDA.
- GPGPU using OpenCL (2009∼?). An open-standard API for GPGPU.

SLIDE 30

GPU Internals

Figure: CUDA device model. A device contains multiprocessors 1, . . . , N; each multiprocessor has processors 1, . . . , M (each with its own registers), a shared instruction unit, shared memory, a constant cache, and a texture cache, all sitting above a common device memory.

SLIDE 31

CUDA: Compute Unified Device Architecture

Example: Addition of two N × N matrices.

__global__ void matAdd(float A[N][N], float B[N][N], float C[N][N])
{
    // Each thread computes one element of C.
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    int j = blockIdx.y * blockDim.y + threadIdx.y;
    if (i < N && j < N)
        C[i][j] = A[i][j] + B[i][j];
}

int main()
{
    // Kernel invocation: one 16x16 thread block per tile of the matrix.
    dim3 dimBlock(16, 16);
    dim3 dimGrid((N + dimBlock.x - 1) / dimBlock.x,
                 (N + dimBlock.y - 1) / dimBlock.y);
    matAdd<<<dimGrid, dimBlock>>>(A, B, C);
}

SLIDE 32

Implementing PDHG on GPUs

The major operations required can be implemented efficiently on GPUs:
- computation of ∇u and ∇·w requires regular data access patterns and only local transfers;
- the two-dimensional DFT and inverse DFT from the CUFFT library can be used for the deblurring variant of PDHG;
- level-1 BLAS operations are supported: norms, inner products, vector additions.

Single-precision arithmetic on GPUs leads to differences in numerics (e.g. in the number of iterations).

SLIDE 33

Image Denoising

    Image size  Tol     CPU              GPU             Speedup
                        iters  time (s)  iters time (s)  total  iter
    128^2       1.e-2      11      0.03     11     0.02      2     2
                1.e-4      79      0.21     79     0.02     11    11
                1.e-6     338      0.90    329     0.07     14    13
    256^2       1.e-2      13      0.17     13     0.02      9     9
                1.e-4      68      0.81     68     0.03     32    32
                1.e-6     304      3.57    347     0.11     33    38
    512^2       1.e-2      12      0.95     12     0.03     31    31
                1.e-4      54      3.96     54     0.05     76    76
                1.e-6     222     16.08    238     0.19     84    90
    1024^2      1.e-2      14      5.42     14     0.08     64    64
                1.e-4      69     25.80     69     0.24    106   106
                1.e-6     296    103.54    324     1.02    102   111
    2048^2      1.e-2      13     31.41     13     0.28    114   114
                1.e-4      67    149.24     67     0.90    165   165
                1.e-6     319    694.16    338     4.12    169   179

Table: Computational results for image denoising (λ = 0.041).

SLIDE 34

Image Deblurring

    Image size  Blur kernel  CPU              GPU             Speedup
                             iters  time (s)  iters time (s)  total  iter
    128^2       m-motion        31      0.15     31     0.02      6     6
                s-motion       106      0.49    106     0.05     10    10
                m-Gaussian      88      0.41     88     0.04     10    10
                s-Gaussian      66      0.32     66     0.04      9     9
    256^2       m-motion        27      0.55     27     0.04     14    14
                s-motion        79      1.57     79     0.08     20    20
                m-Gaussian      44      0.88     44     0.05     17    17
                s-Gaussian      39      0.79     39     0.05     17    17
    512^2       m-motion        34      3.94     34     0.14     28    28
                s-motion        72      8.23     72     0.26     31    31
                m-Gaussian      44      5.07     44     0.17     29    29
                s-Gaussian      37      4.27     37     0.15     29    29
    1024^2      m-motion        31     19.39     30     0.42     46    45
                s-motion        75     46.00     74     0.95     48    48
                m-Gaussian      44     27.07     44     0.59     46    46
                s-Gaussian      41     24.76     41     0.55     45    45
    2048^2      m-motion        33    113.38     33     2.31     49    49
                s-motion        79    263.07     79     5.26     50    50
                m-Gaussian      49    166.36     49     3.34     50    50
                s-Gaussian      48    163.72     48     3.28     50    50

Table: Computational results for image deblurring. 'm-' denotes 'mild' and 's-' denotes 'severe', while 'motion' and 'Gaussian' indicate the type of blur kernel.
