
SLIDE 1

Why adjoint based least squares solving ought to be optimal

Andreas Griewank

Department of Mathematics, Humboldt-Universität zu Berlin, Germany
School of Information Sciences, Yachaytech, Ibarra, Ecuador
September 2, 2015

Numerical Methods for Large-Scale Nonlinear Problems and their Applications

ICERM, Brown University, Providence, RI

with thanks to Andrea Walther (PAB), Sebastian Schlenkrich (TUD), Sandra Schneider (HUB) and Claudia Tutsch (CLU)

Andreas Griewank (HU-Berlin) Theoretically optimal least squares solving

2. September

1 / 4

SLIDE 2

Setting

Problem

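$\min \varphi(x) \equiv \tfrac{1}{2}\,\|F(x)\|_2^2$ for $F : \mathbb{R}^n \to \mathbb{R}^m$ with $n \le m$

First order optimality condition (necessary)

$0 = \nabla\varphi(x_*) \equiv F(x_*)^\top F'(x_*) \in \mathbb{R}^n$

Second order optimality condition (sufficient)

$1 > \kappa_* \equiv \Bigl\| R_*^{-\top} \sum_{i=1}^{m} F_i(x_*)\,\nabla^2 F_i(x_*)\, R_*^{-1} \Bigr\|_2$ with $F'(x_*) = Q_* R_*$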
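As a minimal sketch of this setting, the objective and its gradient can be written down for a small residual. The residual `F`, its hand-coded Jacobian `jac`, and the finite-difference helper `fd_grad` are hypothetical and chosen only for illustration (NumPy is assumed). The point is that the gradient $F(x)^\top F'(x)$ needs just one adjoint product.

```python
import numpy as np

def F(x):
    # Hypothetical residual F: R^2 -> R^3, so n = 2 <= m = 3.
    return np.array([x[0] ** 2 - x[1], x[0] + x[1] - 1.0, np.sin(x[0])])

def jac(x):
    # Hand-coded Jacobian F'(x) of the residual above (3 x 2).
    return np.array([[2.0 * x[0], -1.0],
                     [1.0, 1.0],
                     [np.cos(x[0]), 0.0]])

def phi(x):
    # phi(x) = 1/2 * ||F(x)||_2^2
    r = F(x)
    return 0.5 * float(r @ r)

def grad_phi(x):
    # grad phi(x) = F(x)^T F'(x): a single vector-Jacobian (adjoint) product.
    return F(x) @ jac(x)

def fd_grad(x, h=1e-6):
    # Central differences, used only to cross-check grad_phi.
    g = np.zeros_like(x)
    for i in range(len(x)):
        e = np.zeros_like(x)
        e[i] = h
        g[i] = (phi(x + e) - phi(x - e)) / (2.0 * h)
    return g
```

In an AD tool the product in `grad_phi` would come from one reverse sweep rather than from an explicit Jacobian; the explicit matrix here only keeps the sketch short.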

Derivative availability and cost

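$\dfrac{\mathrm{OPS}\{\dot{y} \equiv F'(x)\,\dot{x}\}}{\mathrm{OPS}\{y \equiv F(x)\}} \le 3, \qquad 4 \ge \dfrac{\mathrm{OPS}\{\bar{x}^\top \equiv \bar{y}^\top F'(x)\}}{\mathrm{OPS}\{y \equiv F(x)\}}$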

SLIDE 3

Gauss Adjoint Broyden Method

Tangent conditions for B ≈ F ′

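$B_+ s = y \equiv F'(x_+)\,s \in \mathbb{R}^m$ and $B_+^\top \sigma = F'(x_+)^\top \sigma \in \mathbb{R}^n$

Transposed Broyden Update

$B_+ = B + \dfrac{\sigma\sigma^\top}{\sigma^\top\sigma}\,\bigl(F'(x_+) - B\bigr)$

for $\sigma = y$ and $\sigma = r \equiv y - Bs$ yields a rank-two update, which can be implemented in $O(mn)$ operations.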

Resulting Properties

Frobenius norm change minimality, domain transformation invariance, and heredity on affine systems F(x) = Ax − b.
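The transposed Broyden update can be sketched in a few lines. This is an illustration under stated assumptions, not the speaker's implementation: the function name `transposed_broyden_update` and the random test data in `demo` are hypothetical, and the matrix `J` merely stands in for $F'(x_+)$, which is only accessed through one tangent product `J @ s` and one adjoint product `sigma @ J`.

```python
import numpy as np

def transposed_broyden_update(B, sigma, sigma_jac):
    """Return B + sigma sigma^T (F'(x_+) - B) / (sigma^T sigma).

    sigma_jac is the row vector sigma^T F'(x_+), i.e. one adjoint product,
    so the Jacobian itself is never needed.
    """
    row = sigma_jac - sigma @ B            # sigma^T (F'(x_+) - B)
    return B + np.outer(sigma, row) / (sigma @ sigma)

def demo(m=4, n=3, seed=0):
    rng = np.random.default_rng(seed)
    J = rng.standard_normal((m, n))        # stands in for F'(x_+)
    B = rng.standard_normal((m, n))        # current approximation
    s = rng.standard_normal(n)             # step
    y = J @ s                              # tangent product F'(x_+) s
    r = y - B @ s                          # residual of the direct tangent condition
    B_r = transposed_broyden_update(B, r, r @ J)   # update with sigma = r
    B_y = transposed_broyden_update(B, y, y @ J)   # update with sigma = y
    return J, s, y, r, B_r, B_y
```

Each single update enforces the adjoint condition $B_+^\top\sigma = F'(x_+)^\top\sigma$ for its own $\sigma$. With $\sigma = r$ the direct condition $B_+ s = y$ holds as well, because $r^\top(F'(x_+) - B)s = r^\top r$.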

Quasi-Gauss-Newton Iteration

$x_+ = x - \alpha\,(B^\top B)^{-1}\,\nabla\varphi(x)$ with $\alpha$ by Andersen (m=1)


SLIDE 4

Provable Properties

Global convergence

$0 = \inf_k \|\nabla\varphi(x_k)\| \;\Leftarrow\; x_0 \in \{\varphi(x) \le c\}$ compact and $\operatorname{rank}(F'(x)) = n$

Asymptotic R-rate in overdetermined case (m > n)

$0 = \inf_k \|x_k - x_*\| \;\Rightarrow\; \limsup_{k\to\infty} \|x_k - x_*\|^{1/k} \le \kappa_* < 1$

Asymptotic order in consistent case (m = n)

$\liminf_{k\to\infty} \bigl|\log \|x_k - x_*\|\bigr|^{1/k} \ge \rho_n \approx 1 + \dfrac{\log(n)}{n}$ with $1 = \rho_n^{\,n+1} - \rho_n^{\,n}$
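The order $\rho_n$ is defined only implicitly, so a small numerical sketch may help. The helper `rho` below is hypothetical; it finds the root of $\rho^{n+1} - \rho^{n} = 1$ on $[1, 2]$ by bisection, which is a simple choice made here and not a method from the talk.

```python
import math

def rho(n, tol=1e-14):
    """Root rho in (1, 2] of rho**(n+1) - rho**n = 1, found by bisection."""
    def g(t):
        return t ** (n + 1) - t ** n - 1.0

    lo, hi = 1.0, 2.0          # g(1) = -1 < 0 and g(2) = 2**n - 1 > 0 for n >= 1
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if g(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def estimate(n):
    # The approximation 1 + log(n)/n quoted on the slide.
    return 1.0 + math.log(n) / n
```

For $n = 1$ the defining equation reduces to $\rho^2 - \rho - 1 = 0$, whose positive root is the golden ratio. Calling `rho(n)` and `estimate(n)` side by side shows how the quoted approximation behaves as $n$ grows.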

On affine problems

Finite termination in $\le n$ steps (à la GMRES when $m = n$ and $B_0 = I$).


SLIDE 5

1) Department of Mathematics, Humboldt University at Berlin
2) School of Information Sciences, Yachaytech, Ibarra, Ecuador

Numerical Methods for Large-Scale Nonlinear Problems and Their Applications ICERM at Brown University

Piecewise linearizations of nonsmooth equations and their numerical solution

Andreas Griewank (1,2), Tom Streubel (1), Richard Hasenfelder (1)

SLIDE 6

Evaluation procedures

function expression, single assignment code, acyclic directed computational graph

• assume the function to be a chain of functions from some library and the absolute value function
• the expression can be recast as single assignment code
• the dependence relation among the assignments generates a partial order

SLIDE 7

Algorithmic Piecewise Linearization - I

basic idea

tangent mode, secant mode

• propagate piecewise linear rather than linear approximations
• therefore replace differentiable elementals by their linear tangent/secant models
• and replace the absolute value function by itself

SLIDE 8

Algorithmic Piecewise Linearization - II

For any single assignment, evaluate an increment. Choose either

• one reference point (tangent mode), or
• two reference points (secant mode).

These increments depend on the reference point(s) and on the preceding increments. So we write

(tangent mode)

(secant mode)
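A small tangent-mode example may make the increment propagation concrete. It is a sketch under assumptions: the test function $f(x) = |x^2 - 1|$ and the helper names are hypothetical, and the propagation rules used are the usual tangent-mode ones (a smooth elemental is replaced by its tangent line at the reference point, and the absolute value is applied unchanged to the shifted argument).

```python
def f(x):
    # Single assignment code for the hypothetical example f(x) = |x**2 - 1|.
    v1 = x * x
    v2 = v1 - 1.0
    v3 = abs(v2)
    return v3

def tangent_pl_increment(x_ref, dx):
    """Increment of the tangent-mode piecewise linear model of f at x_ref."""
    # Reference values of the intermediates.
    v1 = x_ref * x_ref
    v2 = v1 - 1.0
    v3 = abs(v2)
    # Increments, each depending on the reference point and preceding increments.
    dv1 = 2.0 * x_ref * dx        # smooth elemental: tangent line of the square
    dv2 = dv1                     # subtracting a constant leaves the increment unchanged
    dv3 = abs(v2 + dv2) - v3      # absolute value is replaced by itself
    return dv3

def model_error(x_ref, dx):
    # Gap between f and its piecewise linear model at x_ref + dx.
    return abs(f(x_ref + dx) - f(x_ref) - tangent_pl_increment(x_ref, dx))
```

For this particular function the model equals $|\mathring{x}^2 - 1 + 2\mathring{x}\,\Delta x|$, so the error is bounded by $\Delta x^2$: the model is second-order accurate even across the kink, which a plain tangent line cannot achieve.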

SLIDE 9

Algorithmic Piecewise Linearization - III

where

is called the tangent piecewise linear model of at and satisfies

is called the secant piecewise linear model of at if

Inhomogeneous tangent model
Inhomogeneous secant model

SLIDE 10

Algorithmic Piecewise Linearization - IV

• Algorithmic piecewise linearization can be performed by slight modifications of common AD tools (e.g. ADOL-C), see autodiff.org
• General properties of PL functions:
  • Lipschitz continuous
  • consist of linear and absolute value functions
  • correspond to a polyhedral subdivision
  • a polyhedron with nonempty interior is called essential
• Implication chain (by S. Scholtes):
• openness is equivalent to coherent orientation

SLIDE 11

Approximation properties of PL models

Implications:

For some (algorithmically computable) Lipschitz constant

simplifies to, if

For some (algorithmically computable) Lipschitz constant

SLIDE 12

Newton via successive piecewise linearization I

Tangent mode: Let be a root of an algorithm. If for a fixed radius, then is called a feasible tangent mode iteration.

Secant mode: If again for a fixed radius, then is called a feasible secant mode iteration, where and set-valued inverses

SLIDE 13

Quadratic or golden ratio convergence rate

Tangent mode: assume feasibility of the tangent mode iteration as well as (local strong metric regularity) is satisfied; then the tangent mode iteration converges quadratically (rate 2) to the root.

Secant mode: assume feasibility of the secant mode iteration as well as (local strong metric regularity) is satisfied; then the secant mode iteration converges with golden ratio rate $(1+\sqrt{5})/2 \approx 1.618$ to the root.

SLIDE 14

Newton via successive piecewise linearization II

So far we know

• strong metric regularity in i.e. is implied by openness of the restriction of to
• feasibility of both iterations is implied by injectivity of

Open Newton Conjecture: feasibility is already guaranteed in case of openness of

SLIDE 15

SLIDE 16

SLIDE 17

SLIDE 18

SLIDE 19

SLIDE 20

A 2D oscillating test example

• For any vector, take its angle from the polar coordinate representation and map it by some differentiable (right picture) or bijective function (picture below)
• thereby preserve its Euclidean norm

SLIDE 21

A 2D oscillating test example – homogen. part

• upper half of is stretched (blue)
• lower half is compressed (red)
• is bijective and Lipschitz continuous
• the line is kept fixed
• almost everywhere differentiable, but not at the origin

SLIDE 22

A 2D oscillating test example

SLIDE 23

A 2D oscillating test example

SLIDE 24

A 2D oscillating test example

SLIDE 25

A 2D oscillating test example

SLIDE 26

A 2D oscillating test example

SLIDE 27

Piecewise linear subproblem I

Definition: Abs-normal Form PL

• Any piecewise linear function can be represented this way
• the matrix is of strict lower triangular form, thus can be evaluated explicitly and elementwise
• the abs-normal form is numerically stable; use it as a data structure
• the signature of is defined as follows; each one corresponds to a polyhedron from the polyhedral subdivision of
• Task: search a root such that
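The explicit, elementwise evaluation enabled by the strictly lower triangular matrix can be sketched as follows. The symbol names are an assumption: they follow the usual abs-normal convention $z = c + Zx + L|z|$, $y = b + Jx + Y|z|$ with $L$ strictly lower triangular, and the signature is taken as $\sigma = \operatorname{sign}(z)$. The function names and the example data, which encode $y = \bigl||x_1| - x_2\bigr|$, are hypothetical.

```python
import numpy as np

def eval_abs_normal(c, Z, L, b, J, Y, x):
    """Evaluate z = c + Z x + L|z|, y = b + J x + Y|z| and the signature of z."""
    s = len(c)
    z = np.zeros(s)
    for i in range(s):
        # Strict lower triangularity: z[i] only needs |z[0]|, ..., |z[i-1]|,
        # so the switching variables can be computed one after another.
        z[i] = c[i] + Z[i] @ x + L[i, :i] @ np.abs(z[:i])
    y = b + J @ x + Y @ np.abs(z)
    sigma = np.sign(z).astype(int)   # one signature per polyhedron of the subdivision
    return z, y, sigma

def example():
    # Abs-normal data for y = | |x1| - x2 |: z1 = x1, z2 = |z1| - x2, y = |z2|.
    c = np.zeros(2)
    Z = np.array([[1.0, 0.0], [0.0, -1.0]])
    L = np.array([[0.0, 0.0], [1.0, 0.0]])
    b = np.zeros(1)
    J = np.zeros((1, 2))
    Y = np.array([[0.0, 1.0]])
    return c, Z, L, b, J, Y
```

Storing the six arrays is all that is needed to represent the piecewise linear function, which is the sense in which the form serves as a data structure.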

SLIDE 28

Piecewise linear subproblem II

• one can simplify the polyhedral structure of a given problem

Find (we refer to this as the original piecewise linear problem, or OPL for short)

• evaluate the Schur complement of and define

Find (we refer to this as the complementary piecewise linear problem, or CPL for short)

• CPLs and LCPs are equivalent formulations via Möbius transformation
• there is a one-to-one solution correspondence between OPL and CPL
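The reduction from OPL to CPL can be sketched numerically. The notation is an assumption following the usual abs-normal convention: the OPL is taken as $0 = b + Jx + Y|z|$, $z = c + Zx + L|z|$ with $J$ regular, and eliminating $x$ gives the CPL $z = \hat{c} + S|z|$ with Schur complement $S = L - ZJ^{-1}Y$ and $\hat{c} = c - ZJ^{-1}b$. All function names and the example data are hypothetical, and the plain fixed-point loop is only a stand-in solver for a contractive example, not a method from the talk.

```python
import numpy as np

def opl_to_cpl(c, Z, L, b, J, Y):
    """Eliminate x from 0 = b + J x + Y|z|, z = c + Z x + L|z| (J regular)."""
    ZJinv = np.linalg.solve(J.T, Z.T).T     # Z J^{-1} without forming the inverse
    S = L - ZJinv @ Y                       # Schur complement
    c_hat = c - ZJinv @ b
    return c_hat, S

def solve_cpl_fixed_point(c_hat, S, iters=200):
    # Converges here because the row sums of |S| are below 1 in the example.
    z = np.zeros_like(c_hat)
    for _ in range(iters):
        z = c_hat + S @ np.abs(z)
    return z

def recover_x(z, b, J, Y):
    # Solution correspondence: x = -J^{-1} (b + Y|z|).
    return -np.linalg.solve(J, b + Y @ np.abs(z))

def example_data():
    c = np.array([1.0, -1.0])
    Z = 0.5 * np.eye(2)
    L = np.array([[0.0, 0.0], [0.2, 0.0]])
    b = np.array([1.0, 1.0])
    J = 2.0 * np.eye(2)
    Y = np.eye(2)
    return c, Z, L, b, J, Y
```

Solving the CPL for `z` and then calling `recover_x` yields a pair `(x, z)` that satisfies both OPL equations, which illustrates the one-to-one solution correspondence on this example.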

SLIDE 29

Full step Newton method I

By the one-to-one solution correspondence, search a root of one of the two systems

(OPL) (CPL) where

• both are generalized Newton methods in the sense of Qi and Sun
• but we seek global rather than local convergence criteria
• converges from every starting point towards a solution if either is satisfied and the root is unique
• for an essential signature, is always a limiting Jacobian of the underlying PL function
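One possible reading of a full-step Newton iteration on the CPL is sketched below. It assumes the CPL has the form $z = \hat{c} + S|z|$ and that each step solves the linear system of the currently active signature, $(I - S\Sigma)\,z_+ = \hat{c}$ with $\Sigma = \operatorname{diag}(\sigma)$. The function name, the tie-breaking of zero components towards $+1$, and the example data are hypothetical choices for illustration.

```python
import numpy as np

def signature(z):
    # Sign vector of z; zero components are assigned +1 (an arbitrary choice).
    return np.where(z >= 0.0, 1.0, -1.0)

def full_step_newton_cpl(c_hat, S, z0, max_iter=50):
    """Iterate z_+ = (I - S diag(sigma))^{-1} c_hat with sigma = signature(z)."""
    z = np.asarray(z0, dtype=float)
    identity = np.eye(len(c_hat))
    for k in range(max_iter):
        sigma = signature(z)
        # S * sigma scales column j of S by sigma[j], i.e. S @ diag(sigma).
        z_new = np.linalg.solve(identity - S * sigma, c_hat)
        if np.array_equal(signature(z_new), sigma):
            # The linear solution lies in its own polyhedron, so |z| = diag(sigma) z
            # and z_new solves the piecewise linear system exactly.
            return z_new, k + 1
        z = z_new
    raise RuntimeError("signature did not settle within max_iter steps")
```

Without a contractivity condition such an iteration may cycle between signatures, which is why the sketch caps the number of steps instead of assuming convergence.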

SLIDE 30

Full step Newton method II

Conditions for contractivity

Verifying the conditions is NP-hard, but one can find sufficient conditions:

• OPL: assume from the abs-normal form to be regular; then if both conditions are satisfied.
• CPL: both conditions are satisfied if

SLIDE 31

Restricted Newton method

Piecewise-Newton

(OPL) (CPL)

• here is called the critical multiplier and is maximal such that the Newton step doesn't leave the closure of the polyhedron corresponding to the chosen essential signature
• the step is shrunk by nonsmoothness arising in its direction

Under the assumption of coherent orientation (c.o.):

• the paths are bifurcation free for almost all starting points, and also for the CPL
• if the problem is c.o., then the piecewise Newton method converges from everywhere to a root

SLIDE 32

Outlook

• prove the open Newton conjecture
• further develop the PL algebra package Plan-C (C++) → method optimization and comparison
• Branin's modification for PL-Newton on PL equation systems (for non-open problems)
• use clipped models to preserve global properties (i.e. symmetric, bounded)
• extension to Euclidean norm or algebraic inclusion

SLIDE 33

References

• A. Griewank: On stable piecewise linearization and generalized algorithmic differentiation. Optimization Methods and Software (2013).
• P. Boeck, B. Gompil, A. Griewank, R. Hasenfelder, N. Strogies: Experiments with Generalized Midpoint and Trapezoidal Rules on two Nonsmooth ODE's. Mongolian Journal of Mathematics (2014).
• A. Griewank, M. Radons, T. Streubel, J.U. Bernd: On solving piecewise linear systems in abs-normal form. Linear Algebra and its Applications (2014).
• A. Griewank, A. Walther, S. Fiege, T. Bosse: On Lipschitz optimization based on gray-box piecewise linearization. Mathematical Programming (2015).
• A. Griewank et al.: Plan-C, Piecewise Linear functions in Abs Normal form, software under development.
• M. Radons, A. Griewank, T. Streubel, J.U. Bernd: IFIP TC 7 / 2013 System Modelling and Optimization, September 8-13 (2013). Proceedings paper published in System Modeling and Optimization, Springer (2014).

SLIDE 34