

SLIDE 1

New Error Bounds for Approximations from Projected Linear Equations

  • H. Yu∗
  • D. P. Bertsekas∗∗

∗Department of Computer Science

University of Helsinki

∗∗Department of Electrical Engineering and Computer Science

Massachusetts Institute of Technology

European Workshop on Reinforcement Learning, Lille, France, Jun. 30 – Jul. 4, 2008

SLIDE 2

Outline

  • Introduction
  • Data-Dependent Error Analysis
  • Applications and Comparisons of Bounds
  • Summary

SLIDE 3

Projected Equations and TD Type Methods

  • x∗: a solution of the linear fixed-point equation x = Ax + b
  • x̄: the solution of the projected equation x = Π(Ax + b)
  • Π: weighted Euclidean projection onto a subspace S ⊂ ℜⁿ, dim(S) ≪ n
  • Assume: I − ΠA is invertible

Example: TD(λ) for approximate policy evaluation in MDP

  • Solve a projected form of a multistep Bellman equation, with linear function approximation of the cost function
  • A: a stochastic or substochastic matrix
  • ΠA is usually a contraction

Example: large linear systems of equations in general
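As a concrete illustration of this setup, here is a minimal, self-contained numerical sketch (all names, sizes, and random data are our own, not from the slides): it builds a small linear system and subspace, forms the ξ-weighted projector Π, and solves both x = Ax + b and the projected equation directly. Later sketches continue this session.

```python
import numpy as np

# Minimal sketch (names/sizes ours): x = Ax + b, subspace S = span(Phi),
# xi-weighted projection Pi, true and projected solutions.
rng = np.random.default_rng(0)
n, k = 200, 5
A = rng.random((n, n))
A *= 0.9 / np.abs(np.linalg.eigvals(A)).max()   # make I - A (and I - Pi A) safely invertible
b = rng.random(n)
Phi = rng.random((n, k))                        # columns: a basis of S
xi = np.full(n, 1.0 / n)                        # projection weights (uniform here)
Xi = np.diag(xi)

Pi = Phi @ np.linalg.solve(Phi.T @ Xi @ Phi, Phi.T @ Xi)   # xi-weighted projector onto S
x_star = np.linalg.solve(np.eye(n) - A, b)                 # x*:    x = Ax + b
x_bar = np.linalg.solve(np.eye(n) - Pi @ A, Pi @ b)        # x_bar: x = Pi(Ax + b)
```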

SLIDE 4

Two Standard Error Bounds for the Contraction Case

x∗ − x̄: the approximation error due to solving the projected equation.

Standard bound I (arbitrary norm): assume ‖ΠA‖ = α < 1; then

‖x∗ − x̄‖ ≤ (1/(1 − α)) ‖x∗ − Πx∗‖   (1)

Standard bound II (weighted Euclidean norm ‖·‖ξ; uses the Pythagorean theorem, much sharper than I): assume ‖ΠA‖ξ = α < 1; then

‖x∗ − x̄‖ξ ≤ (1/√(1 − α²)) ‖x∗ − Πx∗‖ξ   (2)

  • These are upper bounds on the ratios
    amplification: ‖x∗ − x̄‖ξ / ‖x∗ − Πx∗‖ξ ,  bias-to-distance: ‖x̄ − Πx∗‖ξ / ‖x∗ − Πx∗‖ξ
  • Our bounds will take the same form,
    ‖x∗ − x̄‖ξ ≤ B(A, ξ, S) ‖x∗ − Πx∗‖ξ ,
    but apply to both the contraction and the non-contraction case.
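Continuing the sketch from slide 3, the following snippet evaluates the ξ-norm quantities and checks the two standard bounds whenever α = ‖ΠA‖ξ < 1 (the guard, tolerance, and helper names are ours):

```python
# Weighted norms: ||v||_xi and the induced matrix norm via a similarity with sqrt(Xi).
sq, isq = np.diag(np.sqrt(xi)), np.diag(1.0 / np.sqrt(xi))
norm_xi = lambda v: np.sqrt(v @ Xi @ v)             # vector xi-norm
op_xi = lambda T: np.linalg.norm(sq @ T @ isq, 2)   # induced matrix xi-norm

alpha = op_xi(Pi @ A)
err = norm_xi(x_star - x_bar)            # approximation error
dist = norm_xi(x_star - Pi @ x_star)     # distance of x* from S
if alpha < 1:                            # bounds (1)-(2) need a contraction
    assert err <= dist / (1 - alpha) + 1e-9            # standard bound I
    assert err <= dist / np.sqrt(1 - alpha**2) + 1e-9  # standard bound II
```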

SLIDE 5

Illustration of the Form of Bounds

[Figure: the subspace S with the points x∗, Πx∗, and the approximation x̄; the error bound B(A, ξ, S) specifies a cone in which x̄ must lie.]

  • B(A, ξ, S) = 1 ⟹ x̄ = Πx∗

SLIDE 6

Data-Dependent Error Analysis: Motivations

Motivation I: with or without contraction assumptions,

x∗ − x̄ = (I − ΠA)⁻¹(x∗ − Πx∗)   (3)

How this equality is relaxed in the standard bounds:

  • Standard bound I:
    (I − ΠA)⁻¹ = I + ΠA + (ΠA)² + · · · ,  ‖(ΠA)^m‖ ≤ α^m
  • Standard bound II:
    (I − ΠA)⁻¹ = I + ΠA(I − ΠA)⁻¹ , so
    ‖x∗ − x̄‖ξ² = ‖x∗ − Πx∗‖ξ² + ‖ΠA(I − ΠA)⁻¹(x∗ − Πx∗)‖ξ²
               = ‖x∗ − Πx∗‖ξ² + ‖ΠA(x∗ − x̄)‖ξ²
               ≤ ‖x∗ − Πx∗‖ξ² + α² ‖x∗ − x̄‖ξ²
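The middle step above is an exact Pythagorean identity (no contraction needed). Continuing the numerical sketch, it can be checked directly:

```python
# Exact identity behind standard bound II: the error splits xi-orthogonally
# into (x* - Pi x*), which is xi-orthogonal to S, and Pi A (x* - x_bar) in S.
lhs = norm_xi(x_star - x_bar) ** 2
rhs = dist**2 + norm_xi(Pi @ A @ (x_star - x_bar)) ** 2
assert np.isclose(lhs, rhs)
```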

SLIDE 7

Data-Dependent Error Analysis: Motivations

Motivation II:

(I − ΠA)⁻¹ = I + ΠA(I − ΠA)⁻¹ = I + (I − ΠA)⁻¹ΠA

(i) Bound the term (I − ΠA)⁻¹ΠA(x∗ − Πx∗) directly, so that α does not appear in a denominator.
(ii) Seek computable bounds that require only low-order calculations involving small matrices.

Consider the technical side of (ii): some notation and facts.

  • Φ: an n × k matrix whose columns form a basis of S; Ξ = diag(ξ)
  • k × k matrices:
    B = Φ′ΞΦ ,  M = Φ′ΞAΦ ,  F = (I − B⁻¹M)⁻¹
  • Π = Φ(Φ′ΞΦ)⁻¹Φ′Ξ = ΦB⁻¹Φ′Ξ; the projected equation is equivalent to
    Φr = ΦB⁻¹(Mr + Φ′Ξb) ,  r ∈ ℜᵏ
  • B and M can be computed easily by simulation (see the sketch below).
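Continuing the numerical sketch (exact linear algebra standing in for the simulation-based estimates of B and M), the k × k quantities and the reduced form of the projected equation are:

```python
# k x k quantities from this slide; multiplying the projected equation
# Phi r = Phi B^{-1}(M r + Phi' Xi b) by Phi' Xi reduces it to (B - M) r = Phi' Xi b.
B = Phi.T @ Xi @ Phi
M = Phi.T @ Xi @ A @ Phi
F = np.linalg.inv(np.eye(k) - np.linalg.solve(B, M))   # F = (I - B^{-1} M)^{-1}

r = np.linalg.solve(B - M, Phi.T @ Xi @ b)
assert np.allclose(Phi @ r, x_bar)                     # same solution as on slide 3
```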
SLIDE 8

Technical Lemmas for New Error Bounds

Lemma 1

(I − ΠA)⁻¹ = I + (I − ΠA)⁻¹ΠA = I + ΦFB⁻¹Φ′ΞA .   (4)

Moreover, I − ΠA is invertible ⟺ F = (I − B⁻¹M)⁻¹ exists.

Lemma 2

Let H and D be n × k and k × n matrices, respectively. Then

‖HD‖ξ² = σ( (H′ΞH)(DΞ⁻¹D′) ) ,   (5)

where σ(·) denotes the spectral radius.

Apply the lemmas to bound ‖(I − ΠA)⁻¹(x∗ − Πx∗)‖ξ. First bound: by Lemma 1,

(I − ΠA)⁻¹ΠA(x∗ − Πx∗) = ΦFB⁻¹ · Φ′Ξ · A(x∗ − Πx∗) ,  with H = ΦFB⁻¹ and D = Φ′Ξ,

so by Lemma 2,

‖(I − ΠA)⁻¹ΠA(x∗ − Πx∗)‖ξ² ≤ σ(G1) ‖A‖ξ² ‖x∗ − Πx∗‖ξ²

where G1 = (H′ΞH)(DΞ⁻¹D′) = B⁻¹F′BF.
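Both lemmas are easy to sanity-check numerically. Continuing the sketch, the first assert below tests Lemma 2 on random H and D; the second tests the identity σ(G1) = ‖(I − ΠA)⁻¹Π‖ξ² stated on the next slide:

```python
spectral_radius = lambda G: np.abs(np.linalg.eigvals(G)).max()

# Lemma 2 on random H (n x k) and D (k x n):
H, D = rng.random((n, k)), rng.random((k, n))
assert np.isclose(op_xi(H @ D) ** 2,
                  spectral_radius((H.T @ Xi @ H) @ (D @ np.diag(1 / xi) @ D.T)))

# G1 from k x k matrices only, matching the norm of the n x n matrix (I - Pi A)^{-1} Pi:
G1 = np.linalg.solve(B, F.T @ B @ F)                   # G1 = B^{-1} F' B F
assert np.isclose(spectral_radius(G1),
                  op_xi(np.linalg.solve(np.eye(n) - Pi @ A, Pi)) ** 2)
```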

SLIDE 9

Main Results: First Bound

Theorem 1

‖x∗ − x̄‖ξ ≤ √(1 + σ(G1)‖A‖ξ²) ‖x∗ − Πx∗‖ξ   (6)

where

  • G1 is the product of k × k matrices
    G1 = B⁻¹F′BF   (7)
  • σ(G1) = ‖(I − ΠA)⁻¹Π‖ξ², so the bound is invariant to the choice of basis vectors of S (i.e., Φ).

Notes:

  • Thm. 1 is equivalent to
    ‖(I − ΠA)⁻¹ΠA(x∗ − Πx∗)‖ξ ≤ ‖(I − ΠA)⁻¹Π‖ξ ‖A‖ξ ‖x∗ − Πx∗‖ξ
  • Easy to compute, and better than the standard bound I (a numerical sketch follows)
  • Weaknesses: two over-relaxations; ‖A‖ξ is required
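Continuing the sketch, the Theorem 1 bound is then a one-liner, and holds without any contraction assumption:

```python
# Theorem 1 bound from G1 and ||A||_xi (tolerance ours, for floating point).
bound1 = np.sqrt(1 + spectral_radius(G1) * op_xi(A) ** 2)
assert err <= bound1 * dist + 1e-9
```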
SLIDE 10

Two Over-Relaxations in Theorem 1

  • 1. Π(x∗ − Πx∗) = 0 is not used.
    Effect: the bound degrades (to the standard bound I, in the contraction case) if S nearly contains an eigenvector of A associated with the dominant real eigenvalue.
    In practice: orthogonalize the basis vectors w.r.t. the eigenspace to obtain sharper bounds.
  • 2. When ΠA is near zero, the bound cannot fully exploit this fact.
    This is due to the splitting of Π and A in bounding (I − ΠA)⁻¹ΠA:
    Thm. 1 ⟺ ‖ΠA + ΠA(I − ΠA)⁻¹ΠA‖ξ ≤ ‖Π + ΠA(I − ΠA)⁻¹Π‖ξ ‖A‖ξ
    Effect: when ΠA is near zero but ‖A‖ξ = 1, σ(G1) ≈ ‖Π‖ξ² = 1, and the bound tends to √2 instead of 1.

Applying the lemmas in a different way sharpens the bound ⟹ the second bound.

SLIDE 11

Main Results: Second Bound

Use the fact Π(x∗ − Πx∗) = 0:

‖(I − ΠA)⁻¹ΠA(x∗ − Πx∗)‖ξ = ‖(I − ΠA)⁻¹ΠA(I − Π)(x∗ − Πx∗)‖ξ
                          ≤ ‖(I − ΠA)⁻¹ΠA(I − Π)‖ξ ‖x∗ − Πx∗‖ξ

Relate the norm of the matrix to the spectral radius of a k × k matrix: by Lemma 1,

‖(I − ΠA)⁻¹ΠA(I − Π)‖ξ² = ‖ΦFB⁻¹ · Φ′ΞA(I − Π)‖ξ² ,  with H = ΦFB⁻¹ and D = Φ′ΞA(I − Π),

which by Lemma 2 equals σ( (H′ΞH)(DΞ⁻¹D′) ).

Notes:

  • Incorporating the matrix I − Π is crucial for improving the bound.
  • ‖A‖ξ is no longer needed.
SLIDE 12

Main Results: Second Bound

Theorem 2

‖x∗ − x̄‖ξ ≤ √(1 + σ(G2)) ‖x∗ − Πx∗‖ξ   (8)

where

  • G2 is the product of k × k matrices
    G2 = B⁻¹F′BFB⁻¹(R − MB⁻¹M′) ,  R = Φ′ΞAΞ⁻¹A′ΞΦ ,   (9)
  • σ(G2) = ‖(I − ΠA)⁻¹ΠA(I − Π)‖ξ², so the bound is invariant to the choice of basis vectors of S (i.e., Φ).

Proposition 1 (Comparison with the Standard Bound II)

Assume that ‖ΠA‖ξ ≤ α < 1. Then the error bound (8) is always no worse than the standard bound II, i.e., 1 + σ(G2) ≤ 1/(1 − α²).

Notes:

  • The bound is tight in the worst-case sense.
  • Estimating R by simulation is less straightforward than estimating B and M; it is doable, except for TD(λ) with λ > 0.
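Continuing the sketch, the Theorem 2 bound needs only the extra k × k matrix R (computed exactly here; in the large-scale setting it would be estimated by simulation, with the caveat above):

```python
# Theorem 2 bound: no ||A||_xi, only k x k matrices.
R = Phi.T @ Xi @ A @ np.diag(1 / xi) @ A.T @ Xi @ Phi
G2 = G1 @ np.linalg.solve(B, R - M @ np.linalg.solve(B, M.T))  # B^{-1}F'BF B^{-1}(R - M B^{-1} M')
bound2 = np.sqrt(1 + spectral_radius(G2))
assert err <= bound2 * dist + 1e-9
```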

SLIDE 13

MDP Applications and Numerical Comparisons of Bounds

Cost function approximation for MDP with TD(λ):

  • A is defined for a pair of values (α, λ) by
    A = P^(α,λ) := (1 − λ) Σ_{ℓ=0}^∞ λ^ℓ (αP)^{ℓ+1}
    (a construction sketch follows below); discounted case: α ∈ [0, 1), λ ∈ [0, 1]; undiscounted case: α = 1, λ ∈ [0, 1)

Choices of the projection norm:

  • W/o exploration: ξ = invariant distribution of P; ΠA is a contraction
  • W/ exploration: ξ is determined by policies/simulations that enhance exploration; ΠA may or may not be a contraction (λ needs to be chosen properly; LSTD(0) is always safe to apply)

On applying Thm. 1:

  • e = [1, 1, . . . , 1]′ is an eigenvector of A associated with the dominant eigenvalue (1 − λ)α/(1 − λα).
  • To obtain a sharper bound, orthogonalize the basis vectors w.r.t. e (i.e., project them on e⊥, which is easy to do online; a short sketch appears on the next slide).
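For concreteness, here is one way to form P^(α,λ) by truncating the series (function name and tolerance are ours; in practice this matrix is never built explicitly, since TD methods estimate B and M by simulation):

```python
# Truncate A = P^(alpha,lam) = (1 - lam) * sum_{l>=0} lam^l (alpha P)^{l+1};
# the series converges when alpha < 1 or lam < 1 and P is row-stochastic.
def multistep_matrix(P, alpha, lam, tol=1e-12):
    aP = alpha * P
    term = aP.copy()                   # l = 0 term: (alpha P)^1
    total = term.copy()
    while np.linalg.norm(term, np.inf) > tol:
        term = lam * (term @ aP)       # next term: lam^l (alpha P)^(l+1)
        total = total + term
    return (1 - lam) * total

# Example: lam = 0 recovers alpha * P, the TD(0) case.
P = rng.random((n, n)); P /= P.sum(axis=1, keepdims=True)   # row-stochastic
assert np.allclose(multistep_matrix(P, 0.99, 0.0), 0.99 * P)
```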

SLIDE 14

Practical Ways of Applying Theorem 1 to Overcome the Eigenspace-Related Over-Relaxation

[Figure: subspaces V and W with the points x∗, ΠVx∗, ΠWx∗, ΠV⊕Wx∗, and the approximation x̄, together with the cones specified by B(A, ξ, V ⊕ W) and B(A, ξ, W).]

  • Form the equation satisfied by x∗ − ΠVx∗ and solve its projected equation on W.
    When V is an eigenspace of A, this is the same equation as the original projected equation for x∗, and ΠVx∗ is not needed if this quantity is unimportant.
  • ΠVx∗ can be replaced with any vector in V (a guess of ΠVx∗); the orthogonalization step is sketched below.
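The orthogonalization suggested on the previous slide is a rank-one correction of the basis. Continuing the sketch (with V = span(e)):

```python
# Project each basis vector onto the xi-orthogonal complement of e = (1,...,1)':
# phi <- phi - e * <e, phi>_xi / <e, e>_xi, done column-by-column via an outer product.
e = np.ones(n)
Phi_perp = Phi - np.outer(e, (xi @ Phi) / (xi @ e))
assert np.allclose(e @ Xi @ Phi_perp, np.zeros(k))   # e is now xi-orthogonal to the subspace
```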
SLIDE 15

Standard Bounds vs. Theorems 1 & 2 / Discounted

Markov chain: 200 states; k = 50; ξ: invariant distribution of P

[Two plots of bound value vs. λ ∈ [0, 1], for α = 0.99. Left: standard bounds I and II compared with Thm. 1 on a subspace S1 (e not ⊥ S1) and on S2 (e ⊥ S2). Right (zoomed in): standard bound II compared with Thm. 1 on S2 and Thm. 2 on S1 and S2.]
SLIDE 16

Standard Bounds vs. Theorems 1 & 2 / Exploration Case

Markov chain: 200 states; k = 50; ξ: uniform

[Two plots of bound value vs. λ ∈ [0, 1], for α = 0.99 and ‖αP‖ = 0.995. Left: standard bounds I and II compared with Thm. 1 on S1 (e not ⊥ S1) and on S2 (e ⊥ S2). Right (zoomed in): standard bound II compared with Thm. 1 on S2 and Thm. 2 on S1 and S2.]
  • In general, ΠA is not necessarily a contraction; λ needs to be chosen properly, and TD(0) can always be safely applied.
  • The first bound needs ‖A‖ξ, as do the standard bounds for the contraction case.

SLIDE 17

Theorem 1 vs. Theorem 2 / Average Cost

Markov chains: 200 states; k = 50; ξ: invariant distribution of P. On the right: the states of the Markov chain form two “tight clusters.”

[Two plots of bound value vs. λ ∈ [0, 1), for α = 1, comparing Thm. 1 and Thm. 2, each with e ⊥ S. Left: a generic chain (bound values roughly between 0.8 and 2). Right: the two-cluster chain (bound values up to roughly 25).]
  • The standard bound II in this case is only qualitative:
    ‖x∗ − x̄‖ξ ≤ (1/√(1 − αλ²)) ‖x∗ − Πx∗‖ξ ,  where αλ < 1 and αλ → 0 as λ → 1.

SLIDE 18

Discussion

New error bounds:

  • Data-dependent, without contraction assumptions
  • Computable by simulation and low-order calculations with small matrices
  • Sharper than the standard bounds (which are available only for the contraction case)
  • Depend on A but not on b (so they are valid for the worst case of b)

Potential uses in the MDP context:

  • Provide error bounds for exploration policies
  • Aid in choosing the value of λ in TD
  • Aid in basis function evaluation and selection