SLIDE 1

ThRaSH Workshop 2012

Convergence of Local Search

Sebastian U. Stich (a,b), joint work with Christian L. Müller (a,b) and Bernd Gärtner (a)

Institute of Theoretical Computer Science, Department of Computer Science (a); Swiss Institute of Bioinformatics (b)

ETH Zürich, May 3, 2012

S. Stich · Random Pursuit

SLIDE 2

Table of contents

1. Introduction: Black-box setting · Convex functions
2. Local Search: Definition · Convergence · Examples
3. Outlook: Outlook and Open Problems


SLIDE 3

Black-box optimization

Given: f : E → ℝ. Goal: min_{x∈E} f(x).

- Problem class: f convex
- Oracle: black-box access x ↦ f(x)
- Complexity: number of oracle calls sufficient to solve any problem of the class
- Solution: a point y with f(y) − min_{x∈E} f(x) ≤ ε


SLIDE 4

Convex functions

First-order condition: f(y) ≥ f(x) + ⟨∇f(x), y − x⟩ for all x, y ∈ E.

[Figure: graph of f lying above its tangent line f(x₀) + ⟨∇f(x₀), y − x₀⟩ at (x₀, f(x₀))]


SLIDE 5

Convex functions II

Quadratic upper bound: f(y) ≤ f(x) + ⟨∇f(x), y − x⟩ + (L/2) ‖y − x‖²

Quadratic lower bound (strongly convex): f(y) ≥ f(x) + ⟨∇f(x), y − x⟩ + (µ/2) ‖y − x‖²

[Figure: graph of f enclosed between the two quadratic bounds through (x₀, f(x₀))]

We call κ := L/µ the condition number; equivalently µ · Iₙ ⪯ ∇²f(x) ⪯ L · Iₙ.
Only (!) for strongly convex f: ‖x − x*‖² ≤ (2/µ) (f(x) − f(x*)).


SLIDE 6

Local search

Iteration: x_{k+1} = x_k + σ_k u_k

[Figure: one step from x_k along the direction u_k with step size σ_k]

Sufficient decrease (0 < γ ≤ γ_k ≤ 1):

f(x_{k+1}) ≤ (1 − γ_k) f(x_k) + γ_k min_{t∈ℝ} f(x_k + t u_k)
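As a concrete illustration of the scheme above, here is a minimal sketch (Python with NumPy; all names are illustrative, not from the talk). A dense grid search stands in for the exact line search min_{t} f(x_k + t u_k); with the exact minimizer the sufficient-decrease condition holds with γ_k = 1.

```python
import numpy as np

def local_search_step(f, x, u, t_grid=None):
    """One local-search step x_{k+1} = x_k + sigma_k * u_k.

    A dense 1-D grid search approximates the exact line search
    min_t f(x + t*u); the resulting step satisfies the
    sufficient-decrease condition with gamma_k (close to) 1.
    """
    if t_grid is None:
        t_grid = np.linspace(-5.0, 5.0, 20001)
    vals = np.array([f(x + t * u) for t in t_grid])
    sigma = t_grid[int(np.argmin(vals))]
    return x + sigma * u

# Example: one step on a convex quadratic along a coordinate direction.
f = lambda z: float(z @ z)
x0 = np.array([3.0, 4.0])
u = np.array([1.0, 0.0])
x1 = local_search_step(f, x0, u)
```

Any direction u_k can be plugged in; the convergence analysis below then only needs the alignment of u_k with the gradient.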


SLIDE 7

Sufficient decrease

Sufficient decrease (0 < γ ≤ γ_k ≤ 1):

f(x_{k+1}) ≤ (1 − γ_k) f(x_k) + γ_k min_{t∈ℝ} f(x_k + t u_k)

⇒ f(x_k) − f(x_{k+1}) ≥ γ_k (f(x_k) − f(x_k + t u_k)) for all t ∈ ℝ.

Set t = −β_k · ‖∇f(x_k)‖ / L, where β_k := ⟨∇f(x_k)/‖∇f(x_k)‖, u_k⟩.

SLIDE 8

Single step progress

We use this t together with our assumptions.

Quadratic upper bound: f(x_k) − f(x_{k+1}) ≥ γ β_k² ‖∇f(x_k)‖² / (2L)

Quadratic lower bound: f(x_k) − f(x_{k+1}) ≥ (γµ/L) β_k² (f(x_k) − f(x*))

Progress, writing f_k := f(x_k) − f(x*):

f_{k+1} ≤ (1 − β_k² γ/κ) · f_k
SLIDE 9

Global convergence

After N steps:

ln f_N − ln f_0 = Σ_{k=0}^{N−1} (ln f_{k+1} − ln f_k) ≤ Σ_{k=0}^{N−1} ln(1 − β_k² γ/κ) ≤ −(γ/κ) Σ_{k=0}^{N−1} β_k²

Hence:

f_N ≤ f_0 · exp(−(γ/κ) Σ_{k=0}^{N−1} β_k²)
SLIDE 10

Convergence with high probability

Let β² = ⟨v, u⟩² with v ∈ S^{n−1} fixed and u ∼ S^{n−1} uniform. Then:

E[β²] = E[⟨v, u⟩²] = 1/n
Var[β²] = Var[⟨v, u⟩²] ≤ 2/n²

By Chebyshev's inequality:

P[ Σ_{k=0}^{N−1} β_k² < (1 − ε) N/n ] ≤ Var[β²] / (N · ε² · E[β²]²) ≤ 2/(ε² N)
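The two moment bounds can be checked by Monte Carlo simulation. A minimal sketch (NumPy assumed; names illustrative), sampling u uniformly on the sphere by normalizing Gaussian vectors:

```python
import numpy as np

rng = np.random.default_rng(0)
n, trials = 10, 200_000
v = np.zeros(n)
v[0] = 1.0                      # any fixed unit vector; by symmetry the choice is irrelevant

# u uniform on S^{n-1}: normalize standard Gaussian vectors.
g = rng.standard_normal((trials, n))
u = g / np.linalg.norm(g, axis=1, keepdims=True)

beta2 = (u @ v) ** 2            # samples of beta^2 = <v, u>^2
mean_est = float(beta2.mean())  # theory: E[beta^2] = 1/n
var_est = float(beta2.var())    # theory: Var[beta^2] <= 2/n^2
```

For n = 10 the estimates come out near 1/n = 0.1 and below 2/n² = 0.02, as the slide claims.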

SLIDE 11

Convergence with high probability

For N = Ω(n), with high probability:

f_N ≤ f_0 · exp(−(γ/κ) Σ_{k=0}^{N−1} β_k²) ≤ f_0 · exp(−(γ/κ) · (1 − ε) N/n)
SLIDE 12

Example I - Random Pursuit

u ∼ S^{n−1} uniform, plus approximate line search:

σ_k ∈ [ 0.5 · argmin_{h∈ℝ} f(x_k + h u), argmin_{h∈ℝ} f(x_k + h u) + δ ]

[Figure: one-dimensional slice of f along the direction u with the chosen step σ]
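A minimal sketch of Random Pursuit on a strongly convex quadratic, where the line search has a closed form that stands in for the approximate line-search oracle above (NumPy assumed; names and the test function are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)

def random_pursuit(A, b, x0, iters):
    """Random Pursuit sketch: u uniform on S^{n-1}, exact line search.

    For the quadratic f(x) = 0.5*x^T A x - b^T x the line-search
    minimizer along u has the closed form used below.
    """
    x = x0.copy()
    for _ in range(iters):
        g = rng.standard_normal(len(x))
        u = g / np.linalg.norm(g)          # u ~ uniform on the sphere
        grad = A @ x - b
        sigma = -(grad @ u) / (u @ A @ u)  # argmin_t f(x + t*u)
        x = x + sigma * u
    return x

n = 20
A = np.diag(np.linspace(1.0, 10.0, n))     # mu = 1, L = 10, kappa = 10
b = np.zeros(n)
f = lambda z: float(0.5 * z @ A @ z - b @ z)
x = random_pursuit(A, b, np.ones(n), iters=2000)
```

With exact line search γ = 1, so after N = 2000 steps the bound f_N ≲ f_0 · exp(−N/(nκ)) predicts a reduction of roughly exp(−10) from f_0 = 55 toward the minimum value 0, which the run reproduces.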

SLIDE 13

Example II - Random Gradient Method [Nesterov 2011]

u ∼ S^{n−1} uniform, plus estimated step size: σ_k ≈ −(1/L) · f′(x_k; u), where f′(x_k; u) is the directional derivative of f at x_k along u.

[Figure: one-dimensional slice of f along the direction u with the chosen step σ]
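One step of this scheme can be sketched as follows (NumPy assumed; names illustrative). Here the directional derivative is estimated by a forward finite difference, an assumption on top of the slide, which takes f′(x_k; u) as given:

```python
import numpy as np

rng = np.random.default_rng(2)

def random_gradient_step(f, x, L, h=1e-6):
    """One step of a random gradient method in the spirit of [Nesterov 2011].

    Direction u uniform on S^{n-1}; step size sigma = -f'(x; u)/L with the
    directional derivative estimated by a forward difference.
    """
    g = rng.standard_normal(len(x))
    u = g / np.linalg.norm(g)
    fprime = (f(x + h * u) - f(x)) / h     # ~ <grad f(x), u>
    return x + (-fprime / L) * u

# Example: f(x) = ||x||^2 has gradient Lipschitz constant L = 2;
# repeated steps drive f toward its minimum 0.
f = lambda z: float(z @ z)
x = np.ones(5)
for _ in range(500):
    x = random_gradient_step(f, x, L=2.0)
```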

SLIDE 14

Examples III & IV - Different spaces

- Discrete spaces
- Matrices: f : ℝ^{n×n} → ℝ

SLIDE 15

Example V (?) - Optimal 1/5 rule

u ∼ N(0, Iₙ), plus the '1/5' rule: choose σ_k such that P[f(x_k + σ_k u) ≤ f(x_k)] = const.
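For reference, the classic (1+1)-ES realization of the 1/5 rule can be sketched as follows. This is the standard algorithm the slide alludes to, not a construction from the talk; the adaptation factors are one common choice (they balance exactly when the success probability is 1/5):

```python
import numpy as np

rng = np.random.default_rng(3)

def one_fifth_es(f, x0, sigma0, iters):
    """(1+1)-ES with the classic 1/5 success rule (illustrative sketch).

    u ~ N(0, I_n). sigma grows by 1.5 on success and shrinks by
    1.5**(-1/4) on failure, so the stationary success rate is 1/5:
    1.5**p * 1.5**(-(1-p)/4) = 1  iff  p = 1/5.
    """
    x, sigma = x0.copy(), sigma0
    fx = f(x)
    for _ in range(iters):
        y = x + sigma * rng.standard_normal(len(x))
        fy = f(y)
        if fy <= fx:              # success: accept, enlarge sigma
            x, fx = y, fy
            sigma *= 1.5
        else:                     # failure: keep x, shrink sigma
            sigma *= 1.5 ** -0.25
    return x, fx

f = lambda z: float(z @ z)
x, fx = one_fifth_es(f, np.ones(10) * 5.0, sigma0=1.0, iters=3000)
```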

SLIDE 16

Outlook and Open Problems

Possible:

- Concentration for N = Ω(log n)
- Different search directions (not only S^{n−1})
- Non-isotropic sampling
- Interesting spaces (e.g. matrices)
- Smooth convex functions

Would be very nice:

- Constraint handling
- Application to the 1/5-rule step-size adaptation [as in the (1+1)-ES]

Open:

- Extension of the model to "almost convex" functions

Thank you

SLIDE 17

References

- S.U. Stich, C.L. Müller, B. Gärtner. Optimization of Convex Functions with Random Pursuit. 2011.
- S.U. Stich. Convergence of Local Search. Manuscript, 2012.
