On Conservative Policy Iteration

Bruno Scherrer

INRIA Lorraine, LORIA

ICML 2014

Motivation / Context

  • Large Markov Decision Process
  • A policy space Π
  • A reference policy π ∈ Π
  • On-Policy data from π

Can we compute a provably better policy?

  • Conservative Policy Iteration (Kakade & Langford, 2002; Kakade, 2003)
  • When local (gradient) optimization induces a (good) global performance guarantee

Outline

1. Markov Decision Processes
2. Conservative Policy Iteration
3. Practical Issues for a Guaranteed Improvement

Infinite-Horizon Markov Decision Process

(Puterman, 1994; Bertsekas & Tsitsiklis, 1996; Sutton & Barto, 1998)

Markov Decision Process (MDP):

  • X is the state space,
  • A is the action space,
  • r : X → R is the reward function   (rt = r(xt)),
  • p : X × A → ∆X is the transition function   (xt+1 ∼ p(·|xt, at)).

Problem: Find a policy π : X → A that maximizes the value vπ(x) for all x:

    vπ(x) = E[ Σ_{t=0}^{∞} γ^t rt | x0 = x, ∀t, at = π(xt) ],   with γ ∈ (0, 1).

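To make these objects concrete, here is a minimal tabular sketch in Python/NumPy; the 3-state MDP below is an illustrative stand-in, not an example from the talk. It stores r and p as arrays and estimates vπ(x) by Monte Carlo rollouts, directly following the definition above.

    import numpy as np

    rng = np.random.default_rng(0)
    n_states, n_actions, gamma = 3, 2, 0.9

    r = np.array([0.0, 0.0, 1.0])            # r : X -> R (reward depends on the state only)
    # p(.|x, a): one probability vector over next states per (state, action) pair
    P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))
    pi = np.array([0, 1, 0])                 # a deterministic policy pi : X -> A

    def mc_value(x0, horizon=200, n_rollouts=2000):
        """Monte Carlo estimate of v_pi(x0) = E[sum_t gamma^t r(x_t) | x_0 = x0, a_t = pi(x_t)]."""
        total = 0.0
        for _ in range(n_rollouts):
            x, discount = x0, 1.0
            for _ in range(horizon):         # truncating the infinite sum; error <= gamma^horizon / (1 - gamma)
                total += discount * r[x]
                x = rng.choice(n_states, p=P[x, pi[x]])
                discount *= gamma
        return total / n_rollouts

    print([round(mc_value(x), 2) for x in range(n_states)])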

Notations

  • For any policy π, vπ is the unique solution of the Bellman equation:

      ∀x, vπ(x) = r(x) + γ Σ_{y∈X} p(y|x, π(x)) vπ(y)
      ⇔ vπ = Tπ vπ ⇔ vπ = r + γ Pπ vπ ⇔ vπ = (I − γPπ)^{-1} r.

  • The optimal value v∗ is the unique solution of the Bellman optimality equation:

      ∀x, v∗(x) = max_{a∈A} [ r(x) + γ Σ_{y∈X} p(y|x, a) v∗(y) ]
      ⇔ v∗ = T v∗ ⇔ v∗ = max_π Tπ v∗.

  • π is a greedy policy w.r.t. v, written π = Gv, iff

      ∀x, π(x) ∈ arg max_{a∈A} [ r(x) + γ Σ_{y∈X} p(y|x, a) v(y) ]
      ⇔ Tπ v = T v.
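
Continuing the toy-MDP sketch above, the closed forms vπ = (I − γPπ)^{-1} r and π = Gv translate directly into a linear solve and an argmax; a minimal sketch, not code from the talk:

    import numpy as np

    def policy_evaluation(P, r, pi, gamma):
        """Solve the Bellman equation v = r + gamma * P_pi v, i.e. v_pi = (I - gamma P_pi)^{-1} r."""
        n = len(r)
        P_pi = P[np.arange(n), pi]           # row x is p(.|x, pi(x))
        return np.linalg.solve(np.eye(n) - gamma * P_pi, r)

    def greedy(P, r, v, gamma):
        """Return pi = Gv: pi(x) in argmax_a [ r(x) + gamma * sum_y p(y|x,a) v(y) ]."""
        q = r[:, None] + gamma * P @ v       # q[x, a]
        return np.argmax(q, axis=1)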

Approximate Policy Iteration

(Exact) Policy Iteration:   πk+1 ← G vπk   (where vπk = Tπk vπk)

  • Guaranteed improvement in all states.

π is (ε, ν)-approximately greedy with respect to v, written π = Gε(ν, v), iff

    ν^T (T v − Tπ v) = E_{x∼ν}[ (T v)(x) − (Tπ v)(x) ] ≤ ε.

API (Bertsekas & Tsitsiklis, 1996):   πk+1 ← Gε(ν, vπk)

  • Performance may decrease in all states!

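A small sketch of the quantity in this definition: the ν-weighted gap ν^T(Tv − Tπv), which an approximate greedy step is required to keep below ε. Function and variable names are mine, for illustration only.

    import numpy as np

    def greedy_gap(P, r, v, pi, nu, gamma):
        """E_{x~nu}[ (Tv)(x) - (T_pi v)(x) ]: nonnegative, and pi = G_eps(nu, v) iff it is <= eps."""
        q = r[:, None] + gamma * P @ v       # (T_a v)(x) for every action a
        Tv = q.max(axis=1)                   # Bellman optimality operator T
        T_pi_v = q[np.arange(len(r)), pi]    # T_pi v
        return nu @ (Tv - T_pi_v)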

Conservative Policy Iteration as a Projected Gradient Ascent Algorithm

  • π: current policy
  • π′: alternative policy
  • πα = (1 − α)π + απ′: α-mixture of π and π′

Taylor expansion of α ↦ ν^T vπα = E_{x∼ν}[vπα(x)] around α = 0:

    ν^T (vπα − vπ) = ν^T [ (I − γPπα)^{-1} r − vπ ]
                   = ν^T (I − γPπα)^{-1} (r − vπ + γ Pπα vπ)
                   = ν^T [ (I − γPπ)^{-1} + o(α) ] (Tπα vπ − Tπ vπ)
                   = ν^T [ (I − γPπ)^{-1} + o(α) ] α (Tπ′ vπ − Tπ vπ)
                   = α/(1 − γ) · d^T_{ν,π} (Tπ′ vπ − Tπ vπ) + o(α²),

with d^T_{ν,π} = (1 − γ) ν^T (I − γPπ)^{-1}.

  • The steepest direction is π′ ∈ G vπ.
  • Choosing π′ ∈ Gε(dν,π, vπ) amounts to finding an approximately steepest direction.

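For a tabular MDP these gradient quantities can be computed exactly; a minimal sketch, assuming deterministic policies encoded as arrays of action indices as in the earlier snippets:

    import numpy as np

    def discounted_occupancy(P, pi, nu, gamma):
        """d_{nu,pi}^T = (1 - gamma) nu^T (I - gamma P_pi)^{-1}, returned as a vector."""
        n = len(nu)
        P_pi = P[np.arange(n), pi]
        return (1 - gamma) * np.linalg.solve((np.eye(n) - gamma * P_pi).T, nu)

    def cpi_slope(P, r, pi, pi_prime, nu, gamma):
        """d_{nu,pi}^T (T_{pi'} v_pi - T_pi v_pi): the first-order effect of mixing pi' into pi."""
        n = len(r)
        P_pi = P[np.arange(n), pi]
        v_pi = np.linalg.solve(np.eye(n) - gamma * P_pi, r)   # exact policy evaluation
        q = r[:, None] + gamma * P @ v_pi
        advantage = q[np.arange(n), pi_prime] - q[np.arange(n), pi]
        d = discounted_occupancy(P, pi, nu, gamma)
        return d @ advantage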

Conservative Policy Iteration

CPI (Kakade & Langford, 2002; Kakade, 2003) π′

k+1 ← Gǫ(dν,πk, vπk)

πk+1 ← (1 − α)πk + απ′

k+1

Convergence to a local maximum... Either dν,πk(Tπ′

k+1vπk − Tπkvπk)

  • is big: the slope is big and we make a lot of progress
  • is small (< ǫ): πk satisfies a relaxed optimality equation:

πk ∈ G2ǫ(dν,πk, vπk), which implies a global performance guarantee: µT(vπ∗ − vπk) ≤ Cπ∗ (1 − γ2)(2ǫ) where dµ,π∗ ≤ Cπ∗ν. This performance guarantee can be arbitraritly better than that known for Approximate PI (see also PSDP∞, that can be exponentially faster than CPI) (Scherrer, 2014)

10 / 13

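A minimal sketch of the CPI loop for a tabular MDP, with the policy stored as a stochastic matrix so that the α-mixture update is literal. The exact greedy step below stands in for the approximate oracle Gε(dν,π, vπ) that CPI would learn from on-policy data, and α is kept fixed rather than optimized; both are simplifications of mine, not the algorithm as analyzed.

    import numpy as np

    def cpi(P, r, gamma, alpha=0.1, n_iters=100):
        n, A = P.shape[0], P.shape[1]
        pi = np.full((n, A), 1.0 / A)                    # stochastic policy pi[x, a]
        for _ in range(n_iters):
            P_pi = np.einsum('xa,xay->xy', pi, P)        # P_pi(y|x) under the mixture policy
            v_pi = np.linalg.solve(np.eye(n) - gamma * P_pi, r)
            q = r[:, None] + gamma * P @ v_pi
            pi_prime = np.eye(A)[np.argmax(q, axis=1)]   # (exactly) greedy direction
            pi = (1 - alpha) * pi + alpha * pi_prime     # conservative alpha-mixture update
        return pi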

Guaranteed Improvement?

Recall the expansion of α ↦ ν^T vπα:

    ν^T (vπα − vπ) = α/(1 − γ) · d^T_{ν,π} (Tπ′ vπ − Tπ vπ) + o(α²),

which yields the lower bound

    ν^T (vπα − vπ) ≥ α/(1 − γ) · d^T_{ν,π} (Tπ′ vπ − Tπ vπ) − 2γα²/(1 − γ)² · Vmax
                   ≥ α/(1 − γ) · (Â − ρ) − 2γα²/(1 − γ)² · Vmax.

  • We need to estimate the direction π′ ∈ Gε(dν,π, vπ) and the amplitude of the gradient: |Â − d^T_{ν,π}(Tπ′ vπ − Tπ vπ)| ≤ ρ
  • Samples from dν,π ⇒ resets to ν
  • Samples from Tπ′ vπ − Tπ vπ ⇒ exploration
  • Optimize α (a worked choice is sketched after this slide).

Can we estimate Â without a simulator/generative model?

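One way to carry out the "optimize α" step is simply to maximize the quadratic lower bound above in closed form: the unclipped optimum is α = (1 − γ)(Â − ρ)/(4γ Vmax), which guarantees an improvement of (Â − ρ)²/(8γ Vmax). A small sketch of that computation; the constants come from the bound as displayed, and the helper name is mine.

    def best_alpha(A_hat, rho, Vmax, gamma):
        """Maximize f(a) = a/(1-gamma)*(A_hat-rho) - 2*gamma*a**2/(1-gamma)**2*Vmax over a in [0, 1]."""
        if A_hat - rho <= 0:
            return 0.0, 0.0                              # no provable improvement in this direction
        alpha = min(1.0, (1 - gamma) * (A_hat - rho) / (4 * gamma * Vmax))
        gain = alpha / (1 - gamma) * (A_hat - rho) - 2 * gamma * alpha**2 / (1 - gamma) ** 2 * Vmax
        return alpha, gain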

Summary

  • Can we improve a policy in a large MDP?
    Yes: do a gradient ascent (CPI).
  • If we repeat the process, can we get stuck in local optima?
    Yes, but any (approximately) local optimum satisfies a relaxed optimality equation that implies a nice global guarantee.
  • Can we do this without a simulator?


References

Archibald, T., McKinnon, K., & Thomas, L. 1995. On the Generation of Markov Decision Processes. Journal of the Operational Research Society, 46, 354–361.
Bertsekas, D. P., & Tsitsiklis, J. N. 1996. Neuro-Dynamic Programming. Athena Scientific.
Kakade, S., & Langford, J. 2002. Approximately Optimal Approximate Reinforcement Learning. In: ICML.
Kakade, S. M. 2003. On the Sample Complexity of Reinforcement Learning. Ph.D. thesis, University College London.
Puterman, M. 1994. Markov Decision Processes. Wiley, New York.
Scherrer, B. 2014. Approximate Policy Iteration Schemes: A Comparison. In: ICML.
Sutton, R. S., & Barto, A. G. 1998. Reinforcement Learning: An Introduction. MIT Press.

Numerical Simulations

[Figure: eight panels plotting μ^T(vπ∗ − vπk) against iterations (1–100), one per algorithm: API, API(0.1), CPI+ (line search), CPI(0.1), NSPI(5), NSPI(10), NSPI(30), PSDP∞.]

Experiments made on 33 ∗ 30 ≃ 800 Garnet problems (Archibald et al., 1995). For each problem, each algorithm is run 30 times.