SLIDE 1
Derivative-Free Optimization of Noisy Functions via Quasi-Newton Methods: Experiments and Theory

Richard Byrd, University of Colorado, Boulder
Albert Berahas, Northwestern University
Jorge Nocedal, Northwestern University

Huatulco, Jan 2018

SLIDE 2

My thanks to the organizers. Greetings and thanks to Don Goldfarb.

SLIDE 3

Numerical Results

We compare our Finite Difference L-BFGS Method (FD-LM) to the model interpolation trust region method (MB) of Conn, Scheinberg, and Vicente. Their method, DFOtr, is:
  • a simple implementation
  • not designed for fast execution
  • does not include a geometry phase

Our goal is not to determine which method “wins”. Rather:
  1. Show that the FD-LM method is robust
  2. Show that FD-LM is not wasteful in function evaluations

SLIDE 4

Adaptive Finite Difference L-BFGS Method

Estimate noise ε_f
Compute h by forward or central differences   [4–8 function evaluations]
Compute g_k
While convergence test not satisfied:
    d_k = −H_k g_k                                [L-BFGS procedure]
    (x+, f+, flag) = LineSearch(x_k, f_k, g_k, d_k, f_s)
    if flag = 1                                   [line search failed]
        (x+, f+, h) = Recovery(x_k, f_k, g_k, d_k, maxiter)
    end if
    x_{k+1} = x+,  f_{k+1} = f+
    Compute g_{k+1}                               [finite differences using h]
    s_k = x_{k+1} − x_k,  y_k = g_{k+1} − g_k
    Discard (s_k, y_k) if s_k^T y_k ≤ 0
    k = k + 1
end while
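To make the finite-difference step concrete, here is a minimal Python sketch of a noise-aware gradient estimate in the spirit of the "Compute g_k [finite differences using h]" step above. The step-size rules h ≈ 2√ε_f (forward) and h ≈ ε_f^(1/3) (central) are standard simplifications that assume curvature of order one; the paper's exact formulas and its noise-estimation procedure are not reproduced here.

    import numpy as np

    def fd_gradient(f, x, eps_f, mode="forward"):
        """Finite-difference gradient with a noise-aware difference interval h.

        eps_f is an estimate of the noise level in f. Forward differences cost
        n extra evaluations per gradient; central differences cost 2n but are
        more accurate, which is the trade-off discussed on later slides.
        """
        n = len(x)
        g = np.zeros(n)
        fx = f(x)
        if mode == "forward":
            h = 2.0 * np.sqrt(eps_f)                     # assumes |f''| = O(1)
            for i in range(n):
                e = np.zeros(n); e[i] = h
                g[i] = (f(x + e) - fx) / h
        else:                                            # central differences
            h = eps_f ** (1.0 / 3.0)
            for i in range(n):
                e = np.zeros(n); e[i] = h
                g[i] = (f(x + e) - f(x - e)) / (2.0 * h)
        return g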

SLIDE 5

Test problems

Plotting f(x_k) − φ* vs. number of function evaluations

We show results for 4 representative problems

SLIDE 6

Numerical Results – Stochastic Additive Noise

[Figure: four log-scale plots of F(x) − F* vs. number of function evaluations for problems s271 and s334, with stochastic additive noise levels 1e-02 and 1e-08; methods shown: DFOtr, FDLM (FD), FDLM (CD).]

f(x) = φ(x) + ε(x),   ε(x) ~ U(−ξ, ξ),   ξ ∈ [10⁻⁸, …, 10⁻¹]
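As an illustration of this noise model, the following small Python sketch builds a noisy version of a smooth objective; the quadratic φ below is only a placeholder, whereas the experiments use the test problems shown above (s271, s334, etc.).

    import numpy as np

    def make_noisy(phi, xi, rng=np.random.default_rng(0)):
        """Return f(x) = phi(x) + eps(x) with eps(x) ~ U(-xi, xi)."""
        def f(x):
            return phi(x) + rng.uniform(-xi, xi)
        return f

    # Example at noise level xi = 1e-2 (placeholder objective).
    phi = lambda x: 0.5 * float(np.dot(x, x))
    f = make_noisy(phi, xi=1e-2)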

SLIDE 7

Numerical Results – Stochastic Additive Noise (continued)

[Figure: four log-scale plots of F(x) − F* vs. number of function evaluations for problems s289 and s293, with stochastic additive noise levels 1e-02 and 1e-08; methods shown: DFOtr, FDLM (FD), FDLM (CD).]

f(x) = φ(x) + ε(x),   ε(x) ~ U(−ξ, ξ),   ξ ∈ [10⁻⁸, …, 10⁻¹]

SLIDE 8

Numerical Results – Stochastic Additive Noise – Performance Profiles

[Figure: two performance profiles (fraction of problems solved vs. performance ratio, log scale) at accuracy level τ = 10⁻⁵ for DFOtr, FDLM (FD), and FDLM (CD) under stochastic additive noise.]
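For reference, a performance profile of the Dolan–Moré type used on this slide can be computed as in the sketch below; the input matrix of evaluation counts and the convergence test behind it (accuracy τ) are assumptions, not the authors' benchmarking scripts.

    import numpy as np

    def performance_profile(evals):
        """evals[p, s] = function evaluations solver s needed on problem p
        to meet the convergence test (np.inf if it failed). Returns the grid
        of performance ratios and, for each solver, the fraction of problems
        solved within each ratio."""
        best = evals.min(axis=1, keepdims=True)          # best solver per problem
        ratios = evals / best                            # r_{p,s}
        grid = np.unique(ratios[np.isfinite(ratios)])
        profile = np.array([(ratios <= t).mean(axis=0) for t in grid])
        return grid, profile                             # plot profile vs. grid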

SLIDE 9

Numerical Results – Stochastic Multiplicative Noise – Performance Profiles

[Figure: two performance profiles (fraction of problems solved vs. performance ratio) at accuracy level τ = 10⁻⁵ for DFOtr, FDLM (FD), and FDLM (CD) under stochastic multiplicative noise.]

SLIDE 10

Numerical Results – Hybrid Method – Recovery Mechanism

  • As Jorge mentioned in Part I, our algorithm has a recovery mechanism
  • This procedure is very important for the stable performance of the method
  • The principal recovery mechanism is to re-estimate h
  • HYBRID METHOD: if h is acceptable, then we switch from forward to central differences (see the sketch below)
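The following is a hypothetical Python sketch of that recovery logic; the noise re-estimation, the interval formulas, and the acceptability test are placeholders standing in for the paper's actual Recovery procedure.

    import numpy as np

    def estimate_noise(f, x, m=8, delta=1e-6):
        # Crude stand-in for a noise estimate: sample f along a short segment
        # and take the spread of successive differences (the paper uses a
        # dedicated noise-estimation procedure instead).
        vals = np.array([f(x + i * delta * np.ones_like(x)) for i in range(m)])
        return float(np.std(np.diff(vals)))

    def recovery(f, x, h, mode):
        """Re-estimate the noise and the difference interval h; if the current
        h already looks acceptable, switch forward -> central (hybrid method)."""
        eps_f = estimate_noise(f, x)
        h_new = 2.0 * np.sqrt(eps_f) if mode == "forward" else eps_f ** (1.0 / 3.0)
        if 0.5 * h <= h_new <= 2.0 * h:       # placeholder acceptability test
            mode = "central"
        return h_new, mode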

SLIDE 11

Numerical Results – Hybrid FC Method – Stochastic Additive Noise

[Figure: four log-scale plots of F(x) − F* vs. number of function evaluations; panels: s267 and s241 with stochastic multiplicative noise 1e-02, s246 with stochastic additive noise 1e-06, s208 with stochastic additive noise 1e-08; methods shown: DFOtr, FDLM (FD), FDLM (CD), FDLM (HYBRID).]

SLIDE 12

Numerical Results – Hybrid FC Method – Stochastic Multiplicative Noise

[Figure: two log-scale plots of F(x) − F* vs. number of function evaluations for problems s267 and s241 with stochastic multiplicative noise 1e-02; methods shown: DFOtr, FDLM (FD), FDLM (CD), FDLM (HYBRID).]

SLIDE 13

Numerical Results – Conclusions

  • Both methods are fairly reliable
  • The FD-LM method is not wasteful in terms of function evaluations
  • No method dominates
  • Central differences appear to be more reliable, but are twice as expensive per iteration
  • The hybrid approach shows promise

SLIDE 14

Convergence analysis

1. What can we prove about the algorithm proposed here?
2. We first note that there is a theory for the Implicit Filtering method of Kelley, which is a finite difference BFGS method
  • He establishes deterministic convergence guarantees to the solution
  • Possible because it is assumed that the noise can be diminished as needed at every iteration
  • Similar to results on sampling methods for stochastic objectives
3. In our analysis we assume that the noise does not go to zero
  • We prove convergence to a neighborhood of the solution whose radius depends on the noise level in the function
  • Results of this type were pioneered by Nedic and Bertsekas for incremental gradient methods with constant steplengths
4. We prove two sets of results for strongly convex functions
  • Fixed steplength
  • Armijo line search
5. Up to now, there has been little analysis of line search with noise

SLIDE 15

Discussion

1. The algorithm proposed here is complex, particularly if the recovery mechanism is included
2. The effect that noisy function evaluations and finite difference gradient approximations have on the line search is difficult to analyze
3. In fact, the study of stochastic line searches is one of our current research projects
4. How should results be stated?
  • in expectation?
  • in probability?
  • what assumptions on the noise are realistic?
  • some results in the literature assume the true function value φ(x) is available
  • this field is emerging

SLIDE 16

Context of our analysis

1. We will bypass these thorny issues by assuming that
  • the noise in the function and in the gradient is bounded:  ‖ε(x)‖ ≤ C_f,  ‖e(x)‖ ≤ C_g
  • and consider a general gradient method with errors

        x_{k+1} = x_k − α_k H_k g_k

  • g_k is any approximation to the gradient
  • it could stand for a finite difference approximation or some other estimate; the treatment is general
  • to highlight the novel aspects of this analysis we assume H_k = I

SLIDE 17

Fixed Steplength Analysis

Iteration:   x_{k+1} = x_k − α g_k

Recall  f(x) = φ(x) + ε(x);   define  g_k = ∇φ(x_k) + e(x_k)

Assume  μI ≺ ∇²φ(x) ≺ LI   and   ‖e(x)‖ ≤ C_g

  • Theorem. If α < 1/L, then for all k

        φ(x_{k+1}) − φ_N ≤ (1 − αμ)[φ(x_k) − φ_N],

    where  φ_N ≡ φ* + C_g²/(2μ)  is the best possible objective value.

Therefore,

        φ(x_k) − φ* ≤ (1 − αμ)^k [φ(x_0) − φ_N] + C_g²/(2μ).
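A quick numerical illustration of this theorem (not from the slides; the quadratic, the noise level C_g, and the steplength below are arbitrary choices) shows the iterates settling at the noise-determined level φ_N rather than at φ*:

    import numpy as np

    # Strongly convex quadratic phi(x) = 0.5 x^T A x with mu <= eig(A) <= L,
    # minimized with gradients corrupted by an error of norm C_g.
    rng = np.random.default_rng(0)
    n, mu, L, Cg, alpha = 10, 1.0, 10.0, 1e-2, 0.09   # alpha < 1/L
    A = np.diag(np.linspace(mu, L, n))
    phi = lambda x: 0.5 * x @ A @ x                   # phi* = 0 at x = 0

    x = np.ones(n)
    for k in range(200):
        e = rng.normal(size=n)
        e *= Cg / np.linalg.norm(e)                   # ||e_k|| = C_g (worst case)
        g = A @ x + e                                 # g_k = grad phi(x_k) + e_k
        x = x - alpha * g                             # x_{k+1} = x_k - alpha g_k

    print(phi(x), "vs noise floor phi_N - phi* =", Cg**2 / (2 * mu))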

SLIDE 18

Idea behind the proof
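
The figure from this slide is not reproduced here. One standard argument that yields the contraction on the previous slide, under the assumptions stated there, is the following sketch (not necessarily the exact argument illustrated on the original slide):

    \begin{aligned}
    \phi(x_{k+1})
      &\le \phi(x_k) - \alpha \nabla\phi(x_k)^T g_k + \tfrac{L}{2}\alpha^2 \|g_k\|^2
        && (\text{Lipschitz gradient, } x_{k+1} = x_k - \alpha g_k) \\
      &\le \phi(x_k) - \tfrac{\alpha}{2}\|\nabla\phi(x_k)\|^2 + \tfrac{\alpha}{2}\|e_k\|^2
        && (g_k = \nabla\phi(x_k) + e_k,\ \alpha \le 1/L) \\
      &\le \phi(x_k) - \alpha\mu\,[\phi(x_k)-\phi^*] + \tfrac{\alpha}{2}C_g^2
        && (\|\nabla\phi(x_k)\|^2 \ge 2\mu[\phi(x_k)-\phi^*],\ \|e_k\|\le C_g).
    \end{aligned}

Subtracting φ_N = φ* + C_g²/(2μ) from both sides and collecting terms gives φ(x_{k+1}) − φ_N ≤ (1 − αμ)[φ(x_k) − φ_N].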

SLIDE 19

Line Search

Our algorithm uses a line search:
  • Move away from fixed steplengths and exploit the power of line searches
  • Very little work on noisy line searches
  • How should sufficient decrease be defined?

Introduce a new (relaxed) Armijo condition:

        f(x_k + α d_k) ≤ f(x_k) + c_1 α g_k^T d_k + ε_A,

where  α = max{1, τ, τ², …}  and  ε_A > 2C_f.
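A minimal Python sketch of a backtracking search enforcing this relaxed Armijo condition is shown below; it covers only the acceptance test on this slide, not the full LineSearch/Recovery logic of the algorithm, and the default constants are illustrative.

    def relaxed_armijo(f, x, fx, g, d, eps_A, c1=1e-4, tau=0.5, max_backtracks=30):
        """Backtrack over steplengths {1, tau, tau^2, ...} until
        f(x + a d) <= f(x) + c1 * a * g^T d + eps_A, where eps_A > 2*C_f
        absorbs the function noise. Returns (steplength, success_flag)."""
        slope = float(g @ d)                     # g_k^T d_k, expected negative
        a = 1.0
        for _ in range(max_backtracks):
            if f(x + a * d) <= fx + c1 * a * slope + eps_A:
                return a, True                   # sufficient decrease achieved
            a *= tau
        return a, False                          # failure -> trigger Recovery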

SLIDE 20

Line Search Analysis

New Armijo condition:

        f(x_k + α d_k) ≤ f(x_k) + c_1 α g_k^T d_k + ε_A,

where  α = max{1, τ, τ², …}  and  ε_A > 2C_f.

Because of the relaxation term, the Armijo condition is always satisfied for α ≪ 1. But how long will the step be? Consider 2 sets of iterates:

Case 1: The gradient error is small relative to the gradient. A step of 1/L is accepted, and good progress is made.
Case 2: The gradient error is large relative to the gradient. The step could be poor, but the size of the step is only of order C_g.

SLIDE 21

Line Search Analysis

Iteration:   x_{k+1} = x_k − α_k g_k

Assume  μI ≺ ∇²φ(x) ≺ LI,   ‖e(x)‖ ≤ C_g   and   ‖ε(x)‖ ≤ C_f

Theorem: The above algorithm with the relaxed Armijo condition and c_1 < 1/2 gives

        φ(x_{k+1}) − φ_N ≤ ρ[φ(x_k) − φ_N],

where

        ρ = 1 − 2μ c_1 τ (1−β)² / L

and

        φ_N = φ* + (1/(1−ρ)) [ c_1 τ (1−β)² C_g² / (L β²) + ε_A + 2C_f ].

Here β is a free parameter in (0, (1−2c_1)/(1+2c_1)].

SLIDE 22

THANK YOU.