Trust Regions in Large-Scale Optimization and Regularization (PowerPoint presentation)

SLIDE 1

Trust Regions in Large-Scale Optimization and Regularization

Marielba Rojas

Department of Informatics and Mathematical Modelling Technical University of Denmark Visiting Delft University of Technology, The Netherlands

GAMM Workshop on Applied and Numerical Linear Algebra, Technische Universität Hamburg-Harburg, Hamburg, Germany, September 11-12, 2008

SLIDE 2

Part of this work is joint with Sandra A. Santos (Campinas, Brazil) and Danny C. Sorensen (Rice, USA). Thanks to Wake Forest University, CERFACS, and T.U. Delft.

SLIDE 3

Outline

Trust Regions in Optimization
Trust Regions in Regularization
The Trust-Region Subproblem (TRS)
Methods for the large-scale TRS
Comparisons
Applications
Concluding Remarks

SLIDE 4

Trust Regions in Optimization

SLIDE 8

Unconstrained Optimization

min f(x),  x ∈ ℝ^n

where f(x) is a nonlinear, twice continuously differentiable function. Most methods for this problem generate a sequence of iterates x0, x1, . . . , xk such that f(xk+1) < f(xk). Each xk minimizes a simple (linear, quadratic) model of f. Two strategies to move from xk to xk+1 = xk + d: Line Search and Trust Region. Consider the following quadratic model of f at xk:

qk(d) = f(xk) + ∇f(xk)ᵀd + (1/2) dᵀHd,

where H is a symmetric matrix.

SLIDE 12

Unconstrained Optimization

Line Search Methods:

Find the minimizer dk of the convex quadratic qk(d). Search along dk for a suitable step length α. αdk is the step. Require positive definite H.

Trust-Region Methods:

Find a minimizer of qk in {d ∈ ℝ^n : ‖d‖ ≤ Δk, Δk > 0}. The set {d ∈ ℝ^n : ‖d‖ ≤ Δk} is the trust region: a region where we trust the model qk to be a good representation of f. Δk is the trust-region radius. dk is the step. Do not require positive definite H.

SLIDE 16

Unconstrained Optimization

Remarks: Line Search and Trust Region are globalization techniques: they transform local methods into global ones, i.e. methods that converge to a stationary point or to a local minimizer from any starting point. Trust-Region Methods are slightly more robust. The Levenberg-Marquardt method (1944, 1963) for nonlinear least-squares problems is considered the first trust-region method (Moré 1978).

SLIDE 17

Trust-Region Methods

Given x0 and Δ0:
begin
  k := 0; Δ := Δ0
  repeat
    set dk as a solution to min qk(d) s.t. ‖d‖ ≤ Δ
    ρ := ( f(xk) − f(xk + dk) ) / ( qk(0) − qk(dk) )   % gain factor
    if ρ > 0.75, Δ := 2Δ
    if ρ < 0.25, Δ := Δ/3
    if ρ > 0, xk+1 := xk + dk, else xk+1 := xk
    k := k + 1
  until convergence
end

SLIDE 18

Trust-Region Methods

Main calculation per iteration: the Trust-Region Subproblem (TRS)

min (1/2) dᵀHd + gᵀd  s.t.  ‖d‖ ≤ Δ

where: g = ∇f(xk); H is a symmetric matrix, usually an approximation to ∇²f(xk); Δ > 0.

SLIDE 19

Trust Regions in Regularization

SLIDE 22

Regularization: Linear

Tikhonov Regularization:

min (1/2)‖Ax − b‖₂² + λ‖x‖₂²,  x ∈ ℝ^n

A ∈ ℝ^{m×n}, m ≥ n large, from ill-posed problems. b ∈ ℝ^m, containing noise, and Aᵀb ≠ 0. λ > 0 is the Tikhonov regularization parameter.

is equivalent to (see Eldén 1977)

(TRS)  min (1/2)‖Ax − b‖₂²  s.t.  ‖x‖₂ ≤ Δ

where Δ > 0 plays the role of the regularization parameter.
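The equivalence can be checked numerically on a toy problem. The sketch below is an illustration with made-up data, not code from the talk: it uses a diagonal 2×2 matrix A, assumes the convention (1/2)‖Ax − b‖² + λ‖x‖² (so the normal equations read (AᵀA + 2λI)x = Aᵀb; other scalings shift λ), and verifies that the Tikhonov solution satisfies the TRS optimality conditions with Δ = ‖x(λ)‖ and Lagrange multiplier λ* = −2λ.

```python
import math

# Diagonal toy problem: A = diag(2, 1), b = (1, 1), so A^T A = diag(4, 1).
a = [2.0, 1.0]
b = [1.0, 1.0]
lam = 0.5                      # Tikhonov parameter

# Tikhonov minimizer of (1/2)||Ax - b||^2 + lam*||x||^2:
# normal equations (A^T A + 2*lam*I) x = A^T b.
x = [a[i] * b[i] / (a[i] ** 2 + 2.0 * lam) for i in range(2)]

# Take Delta = ||x||. Then x solves the TRS
#   min (1/2)||Ax - b||^2  s.t.  ||x|| <= Delta
# with H = A^T A, g = -A^T b, and multiplier lam_star = -2*lam.
delta = math.hypot(x[0], x[1])
lam_star = -2.0 * lam

# Check the characterization: (i) (H - lam_star*I)x = -g,
# (ii) H - lam_star*I psd, (iii) lam_star <= 0, (iv) ||x|| = Delta.
res = [(a[i] ** 2 - lam_star) * x[i] - a[i] * b[i] for i in range(2)]
print(max(abs(r) for r in res))                          # 0 up to rounding
print(all(a[i] ** 2 - lam_star >= 0 for i in range(2)))  # True
print(lam_star <= 0)                                     # True
```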

SLIDE 27

Regularization: Nonlinear, Constrained

min f(x) + λ g(x),  x ∈ S

where f, g are nonlinear functions, S ⊂ ℝ^n, and λ is a regularization parameter.

Example 1: min ‖F(x)‖₂² + λ‖x‖₂² s.t. x ∈ ℝ^n, F : ℝ^n → ℝ^m. Could be solved with a trust-region method: Google returns 11,600 hits for “Levenberg-Marquardt nonlinear regularization” (all words), and 10,800 for “trust region nonlinear regularization” (all words).

Example 2: min (1/2)‖Ax − b‖₂² s.t. ‖x‖₂ ≤ Δ, x ≥ 0. Could be solved with a trust-region-based method (R & Steihaug 2002).

SLIDE 28

TRS in Optimization and Regularization

                        Optimization    Regularization
Number of TRS           Several         Linear: one TRS; Nonlinear, Constrained: several TRS
(Potential) Hard Case   Not common      (Potential) Hard Case (Near HC) likely

SLIDE 29

The Trust-Region Subproblem

SLIDE 32

The Trust-Region Subproblem (TRS)

min (1/2) xᵀHx + gᵀx  s.t.  ‖x‖ ≤ Δ

H ∈ ℝ^{n×n}, H = Hᵀ, n large. g ∈ ℝ^n, g ≠ 0. Δ > 0. ‖·‖ is the Euclidean norm. In optimization: H ≈ ∇²f(xk), g = ∇f(xk). In (linear) regularization: H = AᵀA, g = −Aᵀb.
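The regularization identification of H and g can be verified directly: with H = AᵀA and g = −Aᵀb, the TRS quadratic differs from the least-squares objective only by the constant (1/2)‖b‖², so the two problems have the same minimizers over the ball. The check below uses hypothetical toy data (a diagonal A), not numbers from the talk.

```python
# Toy check: with H = A^T A and g = -A^T b,
#   (1/2) x^T H x + g^T x = (1/2)||Ax - b||^2 - (1/2)||b||^2
# for every x, so minimizing either over ||x|| <= Delta is the same.
a = [2.0, 1.0]                     # A = diag(2, 1)
b = [1.0, 3.0]
x = [0.7, -0.4]                    # an arbitrary trial point

H = [a[0] ** 2, a[1] ** 2]         # diagonal of A^T A
g = [-a[0] * b[0], -a[1] * b[1]]   # -A^T b

quad = 0.5 * sum(H[i] * x[i] ** 2 for i in range(2)) \
     + sum(g[i] * x[i] for i in range(2))
lsq = 0.5 * sum((a[i] * x[i] - b[i]) ** 2 for i in range(2)) \
    - 0.5 * sum(bi ** 2 for bi in b)
print(abs(quad - lsq) < 1e-12)     # True
```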

SLIDE 34

Characterization of solutions. Gay 1981, Sorensen 1982.

x* with ‖x*‖ ≤ Δ is a solution of TRS with Lagrange multiplier λ*, if and only if

(i) (H − λ*I)x* = −g
(ii) H − λ*I positive semidefinite
(iii) λ* ≤ 0
(iv) λ*(‖x*‖ − Δ) = 0

Remark: ‖x‖ − Δ = 0 is the secular equation.

SLIDE 35

Solutions to TRS

Notation:

δ1 ≤ δ2 ≤ . . . ≤ δn are the eigenvalues of H. S1 is the eigenspace associated with δ1, the smallest eigenvalue of H.

SLIDE 40

One Interior Solution (standard case)

H positive definite and Δ > ‖H⁻¹g‖. Solution is x = −H⁻¹g, λ = 0. λ = 0 ⇒ constraint is not active.

SLIDE 46

One Boundary Solution (standard case)

H positive definite and Δ ≤ ‖H⁻¹g‖, or H indefinite or positive semidefinite and singular, and Δ ≤ ‖(H − δ1I)†g‖. Solution is x = −(H − λI)⁻¹g, λ < δ1. λ satisfies the secular equation gᵀ(H − λI)⁻²g − Δ² = 0.

SLIDE 51

Multiple Boundary Solutions (hard case)

H indefinite, g ⊥ S1, and Δ > ‖(H − δ1I)†g‖. Solutions are x = −(H − δ1I)†g + z, λ = δ1, with z ∈ S1 such that ‖x‖ = Δ. g ⊥ S1: potential hard case.
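A minimal hard-case instance can be worked out by hand and checked against the characterization. The data below (H = diag(−1, 1), g = (0, 1)ᵀ, Δ = 1) are a hypothetical toy example, not from the talk: δ1 = −1, S1 = span{e1}, and g ⊥ S1.

```python
import math

# Hard-case toy instance: H = diag(-1, 1), delta_1 = -1, S1 = span{e1},
# g = (0, 1)^T orthogonal to S1, Delta = 1.
delta1 = -1.0
g = (0.0, 1.0)
Delta = 1.0

# p = -(H - delta1*I)^+ g: here H - delta1*I = diag(0, 2), so p = (0, -1/2).
p = (0.0, -g[1] / 2.0)
p_norm = math.hypot(*p)
print(Delta > p_norm)                        # True: the hard case occurs

# Solutions: x = p + z with z = (t, 0) in S1 chosen so that ||x|| = Delta,
# and Lagrange multiplier lam_star = delta1.
t = math.sqrt(Delta ** 2 - p_norm ** 2)
x = (t, p[1])

# Verify the characterization: (H - delta1*I)x = -g, ||x|| = Delta,
# H - delta1*I = diag(0, 2) psd, lam_star = delta1 <= 0.
Hx = (0.0 * x[0], 2.0 * x[1])                # (H - delta1*I) x
print(Hx == (-g[0], -g[1]))                  # True
print(abs(math.hypot(*x) - Delta) < 1e-12)   # True
```

Note that replacing t by −t gives a second boundary solution, which is why the hard case yields multiple solutions.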

SLIDE 55

Multiple Interior and Boundary Solutions (hard case)

H positive semidefinite and singular, g ⊥ S1, and Δ > ‖H†g‖. Solutions are x = −H†g + z, λ = 0, with z ∈ S1 such that ‖x‖ ≤ Δ.

SLIDE 56

Linear Ill-Posed Problems: potential (near) HC (g ≈⊥ S1)

Problem heat from P.C. Hansen’s Regularization Tools. m = n = 1000; ◦ components of Qᵀg; * eigenvalues δi.

SLIDE 57

Methods for Large-Scale TRS

SLIDE 58

Characterization of solutions. Gay 1981, Sorensen 1982.

x* with ‖x*‖ ≤ Δ is a solution of TRS with Lagrange multiplier λ*, if and only if

(i) (H − λ*I)x* = −g
(ii) H − λ*I positive semidefinite
(iii) λ* ≤ 0
(iv) λ*(‖x*‖ − Δ) = 0.

Remark: ‖x‖ − Δ = 0 is the secular equation.

SLIDE 63

Methods for Large-Scale TRS

Approximate:

Steihaug 1983. GLTR: Gould et al. 1999. SSM: Hager 2001. Return a point on the boundary of the trust region.

Nearly-Exact:

Moré and Sorensen 1983: Newton’s method on 1/‖x(λ)‖₂ − 1/Δ = 0.
Golub and von Matt 1991: moments, quadrature, Lanczos bidiagonalization to compute lower and upper bounds for ‖x(λ)‖₂; Δ < ‖H†g‖.
Sorensen 1997.
SDP: Rendl and Wolkowicz 1997, Fortin and Wolkowicz 2004.
LSTRS: R, Santos and Sorensen 2000, 2008. Rational interpolation + parameterized eigenvalue problems.

SLIDE 68

Secular Functions and Equations

Let H = Q diag(δ1, δ2, . . . , δn) Qᵀ and γ = Qᵀg. Suppose x ∈ ℝ^n is such that (H − λI)x = −g. Define

φ(λ) = −gᵀx = Σ_{i=1..n} γi² / (δi − λ)

Then φ′(λ) = xᵀx = Σ_{i=1..n} γi² / (δi − λ)².

Secular Equation: φ′(λ) = Δ².
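For intuition, the secular equation φ′(λ) = Δ² can be solved by simple bisection on λ < δ1 when the standard case holds (γ1 ≠ 0). The sketch below is illustrative only and is not the LSTRS algorithm; the eigenvalues, γ, and Δ are made-up toy data.

```python
# Illustrative sketch (not LSTRS): solve phi'(lambda) = Delta^2 for
# lambda < delta_1 by bisection, in the eigenbasis of H.
deltas = [-1.0, 2.0, 5.0]     # eigenvalues of H (H indefinite)
gamma = [1.0, 1.0, 1.0]       # Q^T g, with gamma_1 != 0 (no hard case)
Delta = 1.0

def phi_prime(lam):
    # phi'(lambda) = sum_i gamma_i^2 / (delta_i - lam)^2 = ||x(lambda)||^2
    return sum(g * g / (d - lam) ** 2 for d, g in zip(deltas, gamma))

# phi' tends to +inf as lam -> delta_1 from the left and to 0 as
# lam -> -inf, so there is a unique root lam < delta_1.
lo, hi = deltas[0] - 1.0, deltas[0] - 1e-12
while phi_prime(lo) > Delta ** 2:          # push lo left to bracket the root
    lo = deltas[0] - 2.0 * (deltas[0] - lo)
for _ in range(200):                       # bisection
    mid = 0.5 * (lo + hi)
    if phi_prime(mid) > Delta ** 2:
        hi = mid
    else:
        lo = mid
lam = 0.5 * (lo + hi)

# ||x(lam)|| = sqrt(phi'(lam)) should equal Delta, with lam < delta_1.
print(lam < deltas[0], abs(phi_prime(lam) ** 0.5 - Delta) < 1e-8)
```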

SLIDE 69

Secular Equations - standard case

[Plot of φ(λ) and φ′(λ), standard case.]

SLIDE 70

Secular Equations - hard case

[Plot of φ(λ) and φ′(λ), hard case.]

SLIDE 71

Potential (near) hard case: g ≈⊥ S1

SLIDE 72

Ill-posed problems: g ≈⊥ Si, i = 1, 2, . . . , k

SLIDE 73

LSTRS - standard case

[Plot: φ(λ) and the linear function α − λ versus λ, showing φ(λ*) + Δ²λ and the solution λ*. Standard case.]

SLIDE 74

LSTRS - (near) hard case

[Plot: φ(λ) and αk+1 − λ versus λ, showing the iterates λk−1, λk, λk+1. (Near) hard case.]

SLIDE 76

Comparisons

SSM - Sequential Subspace Method. Hager 2001. SDP - Semidefinite Programming approach. Rendl and Wolkowicz 1997, Fortin and Wolkowicz 2004. GLTR - Generalized Lanczos Trust Region method. Gould, Lucidi, Roma, and Toint 1999. LSTRS. R, Santos and Sorensen 2000, 2008. All: matrix-free. LSTRS, SDP, SSM: limited-memory.

SLIDE 77

Average results for 2-D Laplacian, n = 1024. Easy Case.

METHOD   MVP     STORAGE   ‖(H − λI)x + g‖/‖g‖
LSTRS    127.1   10        2.32×10⁻⁶
SSM       67.3   10        9.53×10⁻⁷
SSMd      67.3   10        9.53×10⁻⁷
SDP      595     10        3.17×10⁻⁵
GLTR      81.6   41.3      8.56×10⁻⁶

SLIDE 78

Average results for 2-D Laplacian, n = 1024. Hard Case.

METHOD   MVP      STORAGE   ‖(H − λI)x + g‖/‖g‖
LSTRS    252.6    10        6.91×10⁻⁶
SSM      377.9    10        1.42×10⁻⁶
SSMd     377.9    10        1.42×10⁻⁶
SDP      2023.8   10        5.76×10⁻²
GLTR     151.8    76.4      8.37×10⁻⁶

SLIDE 79

Inverse Heat Equation, n = 1000. Mildly Ill-Posed.

METHOD   MVP    STORAGE   ‖(H − λI)x + g‖/‖g‖   ‖x − xIP‖/‖xIP‖
LSTRS    265    8         9.12×10⁻⁶             6.13×10⁻⁴
SSM      700    8         2.99×10⁻⁹             2.41×10⁻⁴
SSMd     649    8         2.74×10⁻⁹             4.57×10⁻⁴
SDP      5700   8         2.73×10⁻⁷             3.63×10⁻⁴

SLIDE 80

Inverse Heat Equation, n = 1000. Severely Ill-Posed.

METHOD   MVP    STORAGE   ‖(H − λI)x + g‖/‖g‖   ‖x − xIP‖/‖xIP‖
LSTRS    552    8         7.05×10⁻⁶             5.49×10⁻²
SSM      512    8         1.81×10⁻⁷             3.75×10⁻²
SSMd     215    8         2.04×10⁻⁷             2.25×10⁻²
SDP      4600   8         2.27×10⁻⁴             2.08×10⁻¹

SLIDE 81

Applications (LSTRS)

SLIDE 82

Inverse Interpolation: Bathymetry of the Sea of Galilee

[Figure: reconstructed bathymetry map over (x, y) coordinates.]

Dimension: 40401. Vectors: 5. MVP: 206.

SLIDE 83

Image Restoration

[Figures: true image; blurred and noisy image; LSTRS restoration. 256 × 256 pixels.]

Dimension: 65536. Vectors: 7. MVP: 201.

Using function blur from P.C. Hansen’s Regularization Tools.

SLIDE 85

Other applications

Large-scale non-negative regularization (R & Steihaug 2002).
Confidence intervals for solutions of large-scale discrete ill-posed problems (Eldén, Hansen & R 2005).
Large-scale computer vision: Kahl and collaborators, Lund University, Sweden.
3D electrical impedance tomography: Soleimani and collaborators, University of Bath, U.K.

SLIDE 86

Concluding Remarks

Trust regions yield efficient methods for solving general nonlinear optimization problems, and for both linear and nonlinear regularization problems.
The TRS is the main calculation in trust-region methods.
The special features of the TRS in regularization problems influence the design of methods.
Efficient methods exist for solving the large-scale TRS arising in general optimization problems and in regularization.

SLIDE 87

REFERENCES

Trust-Region Methods:
A.R. Conn, N.I.M. Gould, and Ph.L. Toint. Trust-Region Methods, SIAM, Philadelphia, 2000.
J. Nocedal and S.J. Wright. Numerical Optimization. Springer, New York, 2nd ed., 2006.

Trust Regions and Regularization, LSTRS: thesis, papers, and software can be downloaded from http://www.imm.dtu.dk/~mr
