Trust Regions in Large-Scale Optimization and Regularization (PowerPoint presentation)

SLIDE 1

Trust Regions in Large-Scale Optimization and Regularization

Marielba Rojas

Department of Informatics and Mathematical Modelling Technical University of Denmark Visiting Delft University of Technology, The Netherlands

GAMM Workshop on Applied and Numerical Linear Algebra, Technische Universität Hamburg-Harburg, Hamburg, Germany, September 11-12, 2008

SLIDE 2

Part of this work is joint with Sandra A. Santos (Campinas, Brazil) and Danny C. Sorensen (Rice, USA). Thanks to Wake Forest University, CERFACS, and T.U. Delft.

SLIDE 3

Outline

Trust Regions in Optimization
Trust Regions in Regularization
The Trust-Region Subproblem (TRS)
Methods for the large-scale TRS
Comparisons
Applications
Concluding Remarks

SLIDE 4

Trust Regions in Optimization

SLIDE 8

Unconstrained Optimization

min f(x),  x ∈ ℝ^n

where f(x) is a nonlinear, twice continuously differentiable function. Most methods for this problem generate a sequence of iterates x0, x1, . . . , xk such that f(xk+1) < f(xk). Each xk minimizes a simple (linear, quadratic) model of f. Two strategies to move from xk to xk+1 = xk + d: Line Search and Trust Region. Consider the following quadratic model of f at xk:

qk(d) = f(xk) + ∇f(xk)ᵀd + (1/2) dᵀHd,

where H is a symmetric matrix.

SLIDE 12

Unconstrained Optimization

Line Search Methods:

Find the minimizer dk of the convex quadratic qk(d). Search along dk for a suitable step length α. αdk is the step. Require positive definite H.

Trust-Region Methods:

Find a minimizer of qk in {d ∈ ℝ^n : ‖d‖ ≤ Δk, Δk > 0}. The set {d ∈ ℝ^n : ‖d‖ ≤ Δk} is the trust region: a region where we trust the model qk to be a good representation of f. Δk is the trust-region radius. dk is the step. Do not require positive definite H.

SLIDE 16

Unconstrained Optimization

Remarks: Line Search and Trust Region are globalization techniques: they transform local methods into global ones, i.e. methods that converge to a stationary point or to a local minimizer from any starting point. Trust-Region Methods are slightly more robust. The Levenberg-Marquardt method (1944, 1963) for nonlinear least-squares problems is considered the first trust-region method (Moré 1978).

SLIDE 17

Trust-Region Methods

Given x0 and Δ0:
begin
  k := 0; Δ := Δ0
  repeat
    set dk as a solution to min qk(d) s.t. ‖d‖ ≤ Δ
    ρ := ( f(xk) − f(xk + dk) ) / ( qk(0) − qk(dk) )   % gain factor
    if ρ > 0.75, Δ := 2Δ
    if ρ < 0.25, Δ := Δ/3
    if ρ > 0, xk+1 := xk + dk, else xk+1 := xk
    k := k + 1
  until convergence
end

SLIDE 18

Trust-Region Methods

Main calculation per iteration: the Trust-Region Subproblem (TRS)

min (1/2) dᵀHd + gᵀd  s.t.  ‖d‖ ≤ Δ

where: g = ∇f(xk); H is a symmetric matrix, usually an approximation to ∇²f(xk); Δ > 0.

SLIDE 19

Trust Regions in Regularization

SLIDE 22

Regularization: Linear

Tikhonov Regularization:

min (1/2)‖Ax − b‖₂² + λ‖x‖₂²,  x ∈ ℝ^n

A ∈ ℝ^{m×n}, m ≥ n large, from ill-posed problems. b ∈ ℝ^m, containing noise, and Aᵀb ≠ 0. λ > 0 is the Tikhonov regularization parameter.

is equivalent to (see Eldén 1977)

(TRS)  min (1/2)‖Ax − b‖₂²  s.t.  ‖x‖₂ ≤ Δ

where Δ > 0 plays the role of the regularization parameter.
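The equivalence can be checked numerically on a toy problem. The sketch below is an illustration with made-up data, not code from the talk: it uses a diagonal 2×2 matrix A, assumes the convention (1/2)‖Ax − b‖² + λ‖x‖² (so the normal equations read (AᵀA + 2λI)x = Aᵀb; other scalings shift λ), and verifies that the Tikhonov solution satisfies the TRS optimality conditions with Δ = ‖x(λ)‖ and Lagrange multiplier λ* = −2λ.

```python
import math

# Diagonal toy problem: A = diag(2, 1), b = (1, 1), so A^T A = diag(4, 1).
a = [2.0, 1.0]
b = [1.0, 1.0]
lam = 0.5                      # Tikhonov parameter

# Tikhonov minimizer of (1/2)||Ax - b||^2 + lam*||x||^2:
# normal equations (A^T A + 2*lam*I) x = A^T b.
x = [a[i] * b[i] / (a[i] ** 2 + 2.0 * lam) for i in range(2)]

# Take Delta = ||x||. Then x solves the TRS
#   min (1/2)||Ax - b||^2  s.t.  ||x|| <= Delta
# with H = A^T A, g = -A^T b, and multiplier lam_star = -2*lam.
delta = math.hypot(x[0], x[1])
lam_star = -2.0 * lam

# Check the characterization: (i) (H - lam_star*I)x = -g,
# (ii) H - lam_star*I psd, (iii) lam_star <= 0, (iv) ||x|| = Delta.
res = [(a[i] ** 2 - lam_star) * x[i] - a[i] * b[i] for i in range(2)]
print(max(abs(r) for r in res))                          # 0 up to rounding
print(all(a[i] ** 2 - lam_star >= 0 for i in range(2)))  # True
print(lam_star <= 0)                                     # True
```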

SLIDE 27

Regularization: Nonlinear, Constrained

min f(x) + λ g(x),  x ∈ S

where f, g are nonlinear functions, S ⊂ ℝ^n, and λ is a regularization parameter.

Example 1: min ‖F(x)‖₂² + λ‖x‖₂² s.t. x ∈ ℝ^n, F : ℝ^n → ℝ^m. Could be solved with a trust-region method: Google returns 11,600 hits for “Levenberg-Marquardt nonlinear regularization” (all words), and 10,800 for “trust region nonlinear regularization” (all words).

Example 2: min (1/2)‖Ax − b‖₂² s.t. ‖x‖₂ ≤ Δ, x ≥ 0. Could be solved with a trust-region-based method (R & Steihaug 2002).

SLIDE 28

TRS in Optimization and Regularization

                        Optimization    Regularization
Number of TRS           Several         Linear: one TRS; Nonlinear, Constrained: several TRS
(Potential) Hard Case   Not common      (Potential) Hard Case (Near HC) likely

SLIDE 29

The Trust-Region Subproblem

SLIDE 32

The Trust-Region Subproblem (TRS)

min (1/2) xᵀHx + gᵀx  s.t.  ‖x‖ ≤ Δ

H ∈ ℝ^{n×n}, H = Hᵀ, n large. g ∈ ℝ^n, g ≠ 0. Δ > 0. ‖·‖ is the Euclidean norm. In optimization: H ≈ ∇²f(xk), g = ∇f(xk). In (linear) regularization: H = AᵀA, g = −Aᵀb.
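The regularization identification of H and g can be verified directly: with H = AᵀA and g = −Aᵀb, the TRS quadratic differs from the least-squares objective only by the constant (1/2)‖b‖², so the two problems have the same minimizers over the ball. The check below uses hypothetical toy data (a diagonal A), not numbers from the talk.

```python
# Toy check: with H = A^T A and g = -A^T b,
#   (1/2) x^T H x + g^T x = (1/2)||Ax - b||^2 - (1/2)||b||^2
# for every x, so minimizing either over ||x|| <= Delta is the same.
a = [2.0, 1.0]                     # A = diag(2, 1)
b = [1.0, 3.0]
x = [0.7, -0.4]                    # an arbitrary trial point

H = [a[0] ** 2, a[1] ** 2]         # diagonal of A^T A
g = [-a[0] * b[0], -a[1] * b[1]]   # -A^T b

quad = 0.5 * sum(H[i] * x[i] ** 2 for i in range(2)) \
     + sum(g[i] * x[i] for i in range(2))
lsq = 0.5 * sum((a[i] * x[i] - b[i]) ** 2 for i in range(2)) \
    - 0.5 * sum(bi ** 2 for bi in b)
print(abs(quad - lsq) < 1e-12)     # True
```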

SLIDE 34

Characterization of solutions. Gay 1981, Sorensen 1982.

x* with ‖x*‖ ≤ Δ is a solution of TRS with Lagrange multiplier λ*, if and only if

(i) (H − λ*I)x* = −g
(ii) H − λ*I positive semidefinite
(iii) λ* ≤ 0
(iv) λ*(‖x*‖ − Δ) = 0

Remark: ‖x‖ − Δ = 0 is the secular equation.

SLIDE 35

Solutions to TRS

Notation:

δ1 ≤ δ2 ≤ . . . ≤ δn are the eigenvalues of H. S1 is the eigenspace associated with δ1, the smallest eigenvalue of H.

SLIDE 40

One Interior Solution (standard case)

H positive definite and Δ > ‖H⁻¹g‖. Solution is x = −H⁻¹g, λ = 0. λ = 0 ⇒ constraint is not active.

SLIDE 46

One Boundary Solution (standard case)

H positive definite and Δ ≤ ‖H⁻¹g‖, or H indefinite or positive semidefinite and singular, and Δ ≤ ‖(H − δ1I)†g‖. Solution is x = −(H − λI)⁻¹g, λ < δ1. λ satisfies the secular equation gᵀ(H − λI)⁻²g − Δ² = 0.

SLIDE 51

Multiple Boundary Solutions (hard case)

H indefinite, g ⊥ S1, and Δ > ‖(H − δ1I)†g‖. Solutions are x = −(H − δ1I)†g + z, λ = δ1, with z ∈ S1 such that ‖x‖ = Δ. g ⊥ S1: potential hard case.
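A minimal hard-case instance can be worked out by hand and checked against the characterization. The data below (H = diag(−1, 1), g = (0, 1)ᵀ, Δ = 1) are a hypothetical toy example, not from the talk: δ1 = −1, S1 = span{e1}, and g ⊥ S1.

```python
import math

# Hard-case toy instance: H = diag(-1, 1), delta_1 = -1, S1 = span{e1},
# g = (0, 1)^T orthogonal to S1, Delta = 1.
delta1 = -1.0
g = (0.0, 1.0)
Delta = 1.0

# p = -(H - delta1*I)^+ g: here H - delta1*I = diag(0, 2), so p = (0, -1/2).
p = (0.0, -g[1] / 2.0)
p_norm = math.hypot(*p)
print(Delta > p_norm)                        # True: the hard case occurs

# Solutions: x = p + z with z = (t, 0) in S1 chosen so that ||x|| = Delta,
# and Lagrange multiplier lam_star = delta1.
t = math.sqrt(Delta ** 2 - p_norm ** 2)
x = (t, p[1])

# Verify the characterization: (H - delta1*I)x = -g, ||x|| = Delta,
# H - delta1*I = diag(0, 2) psd, lam_star = delta1 <= 0.
Hx = (0.0 * x[0], 2.0 * x[1])                # (H - delta1*I) x
print(Hx == (-g[0], -g[1]))                  # True
print(abs(math.hypot(*x) - Delta) < 1e-12)   # True
```

Note that replacing t by −t gives a second boundary solution, which is why the hard case yields multiple solutions.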

SLIDE 55

Multiple Interior and Boundary Solutions (hard case)

H positive semidefinite and singular, g ⊥ S1, and Δ > ‖H†g‖. Solutions are x = −H†g + z, λ = 0, with z ∈ S1 such that ‖x‖ ≤ Δ.

SLIDE 56

Linear Ill-Posed Problems: potential (near) HC (g ≈⊥ S1)

Problem heat from P.C. Hansen’s Regularization Tools. m = n = 1000; ◦ components of Qᵀg; * eigenvalues δi.

SLIDE 57

Methods for Large-Scale TRS

SLIDE 58

Characterization of solutions. Gay 1981, Sorensen 1982.

x* with ‖x*‖ ≤ Δ is a solution of TRS with Lagrange multiplier λ*, if and only if

(i) (H − λ*I)x* = −g
(ii) H − λ*I positive semidefinite
(iii) λ* ≤ 0
(iv) λ*(‖x*‖ − Δ) = 0.

Remark: ‖x‖ − Δ = 0 is the secular equation.

SLIDE 63

Methods for Large-Scale TRS

Approximate:

Steihaug 1983. GLTR: Gould et al. 1999. SSM: Hager 2001. Return a point on the boundary of the trust region.

Nearly-Exact:

Moré and Sorensen 1983: Newton’s method on 1/‖x(λ)‖₂ − 1/Δ = 0.
Golub and von Matt 1991: moments, quadrature, Lanczos bidiagonalization to compute lower and upper bounds for ‖x(λ)‖₂; Δ < ‖H†g‖.
Sorensen 1997.
SDP: Rendl and Wolkowicz 1997, Fortin and Wolkowicz 2004.
LSTRS: R, Santos and Sorensen 2000, 2008. Rational interpolation + parameterized eigenvalue problems.

SLIDE 68

Secular Functions and Equations

Let H = Q diag(δ1, δ2, . . . , δn) Qᵀ and γ = Qᵀg. Suppose x ∈ ℝ^n is such that (H − λI)x = −g. Define

φ(λ) = −gᵀx = Σ_{i=1..n} γi² / (δi − λ)

Then φ′(λ) = xᵀx = Σ_{i=1..n} γi² / (δi − λ)².

Secular Equation: φ′(λ) = Δ².
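For intuition, the secular equation φ′(λ) = Δ² can be solved by simple bisection on λ < δ1 when the standard case holds (γ1 ≠ 0). The sketch below is illustrative only and is not the LSTRS algorithm; the eigenvalues, γ, and Δ are made-up toy data.

```python
# Illustrative sketch (not LSTRS): solve phi'(lambda) = Delta^2 for
# lambda < delta_1 by bisection, in the eigenbasis of H.
deltas = [-1.0, 2.0, 5.0]     # eigenvalues of H (H indefinite)
gamma = [1.0, 1.0, 1.0]       # Q^T g, with gamma_1 != 0 (no hard case)
Delta = 1.0

def phi_prime(lam):
    # phi'(lambda) = sum_i gamma_i^2 / (delta_i - lam)^2 = ||x(lambda)||^2
    return sum(g * g / (d - lam) ** 2 for d, g in zip(deltas, gamma))

# phi' tends to +inf as lam -> delta_1 from the left and to 0 as
# lam -> -inf, so there is a unique root lam < delta_1.
lo, hi = deltas[0] - 1.0, deltas[0] - 1e-12
while phi_prime(lo) > Delta ** 2:          # push lo left to bracket the root
    lo = deltas[0] - 2.0 * (deltas[0] - lo)
for _ in range(200):                       # bisection
    mid = 0.5 * (lo + hi)
    if phi_prime(mid) > Delta ** 2:
        hi = mid
    else:
        lo = mid
lam = 0.5 * (lo + hi)

# ||x(lam)|| = sqrt(phi'(lam)) should equal Delta, with lam < delta_1.
print(lam < deltas[0], abs(phi_prime(lam) ** 0.5 - Delta) < 1e-8)
```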

SLIDE 69

Secular Equations - standard case

[Plot of φ(λ) and φ′(λ), standard case.]

SLIDE 70

Secular Equations - hard case

[Plot of φ(λ) and φ′(λ), hard case.]

SLIDE 71

Potential (near) hard case: g ≈⊥ S1

SLIDE 72

Ill-posed problems: g ≈⊥ Si, i = 1, 2, . . . , k

SLIDE 73

LSTRS - standard case

[Plot: φ(λ) and the linear function α − λ versus λ, showing φ(λ*) + Δ²λ and the solution λ*. Standard case.]

SLIDE 74

LSTRS - (near) hard case

[Plot: φ(λ) and αk+1 − λ versus λ, showing the iterates λk−1, λk, λk+1. (Near) hard case.]

SLIDE 76

Comparisons

SSM - Sequential Subspace Method. Hager 2001. SDP - Semidefinite Programming approach. Rendl and Wolkowicz 1997, Fortin and Wolkowicz 2004. GLTR - Generalized Lanczos Trust Region method. Gould, Lucidi, Roma, and Toint 1999. LSTRS. R, Santos and Sorensen 2000, 2008. All: matrix-free. LSTRS, SDP, SSM: limited-memory.

SLIDE 77

Average results for 2-D Laplacian, n = 1024. Easy Case.

METHOD   MVP     STORAGE   ‖(H − λI)x + g‖/‖g‖
LSTRS    127.1   10        2.32×10⁻⁶
SSM       67.3   10        9.53×10⁻⁷
SSMd      67.3   10        9.53×10⁻⁷
SDP      595     10        3.17×10⁻⁵
GLTR      81.6   41.3      8.56×10⁻⁶

SLIDE 78

Average results for 2-D Laplacian, n = 1024. Hard Case.

METHOD   MVP      STORAGE   ‖(H − λI)x + g‖/‖g‖
LSTRS    252.6    10        6.91×10⁻⁶
SSM      377.9    10        1.42×10⁻⁶
SSMd     377.9    10        1.42×10⁻⁶
SDP      2023.8   10        5.76×10⁻²
GLTR     151.8    76.4      8.37×10⁻⁶

SLIDE 79

Inverse Heat Equation, n = 1000. Mildly Ill-Posed.

METHOD   MVP    STORAGE   ‖(H − λI)x + g‖/‖g‖   ‖x − xIP‖/‖xIP‖
LSTRS    265    8         9.12×10⁻⁶             6.13×10⁻⁴
SSM      700    8         2.99×10⁻⁹             2.41×10⁻⁴
SSMd     649    8         2.74×10⁻⁹             4.57×10⁻⁴
SDP      5700   8         2.73×10⁻⁷             3.63×10⁻⁴

SLIDE 80

Inverse Heat Equation, n = 1000. Severely Ill-Posed.

METHOD   MVP    STORAGE   ‖(H − λI)x + g‖/‖g‖   ‖x − xIP‖/‖xIP‖
LSTRS    552    8         7.05×10⁻⁶             5.49×10⁻²
SSM      512    8         1.81×10⁻⁷             3.75×10⁻²
SSMd     215    8         2.04×10⁻⁷             2.25×10⁻²
SDP      4600   8         2.27×10⁻⁴             2.08×10⁻¹

SLIDE 81

Applications (LSTRS)

SLIDE 82

Inverse Interpolation: Bathymetry of the Sea of Galilee

[Figure: reconstructed bathymetry map over (x, y) coordinates.]

Dimension: 40401. Vectors: 5. MVP: 206.

SLIDE 83

Image Restoration

[Figures: true image; blurred and noisy image; LSTRS restoration. 256 × 256 pixels.]

Dimension: 65536. Vectors: 7. MVP: 201.

Using function blur from P.C. Hansen’s Regularization Tools.

SLIDE 85

Other applications

Large-scale non-negative regularization (R & Steihaug 2002).
Confidence intervals for solutions of large-scale discrete ill-posed problems (Eldén, Hansen & R 2005).
Large-scale computer vision: Kahl and collaborators, Lund University, Sweden.
3D electrical impedance tomography: Soleimani and collaborators, University of Bath, U.K.

SLIDE 86

Concluding Remarks

Trust regions yield efficient methods for solving general nonlinear optimization problems, and for both linear and nonlinear regularization problems.
The TRS is the main calculation in trust-region methods.
The special features of the TRS in regularization problems influence the design of methods.
Efficient methods exist for solving the large-scale TRS arising in general optimization problems and in regularization.

SLIDE 87

REFERENCES

Trust-Region Methods:
A.R. Conn, N.I.M. Gould, and Ph.L. Toint. Trust-Region Methods, SIAM, Philadelphia, 2000.
J. Nocedal and S.J. Wright. Numerical Optimization. Springer, New York, 2nd ed., 2006.

Trust Regions and Regularization, LSTRS: thesis, papers, and software can be downloaded from http://www.imm.dtu.dk/~mr
