New Error Bounds for Approximations from Projected Linear Equations


  1. New Error Bounds for Approximations from Projected Linear Equations

  H. Yu* and D. P. Bertsekas**
  *Department of Computer Science, University of Helsinki
  **Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology

  European Workshop on Reinforcement Learning, Lille, France, Jun. 30 – Jul. 4, 2008

  2. Outline

  - Introduction
  - Data-Dependent Error Analysis
  - Applications and Comparisons of Bounds
  - Summary

  3. Projected Equations and TD-Type Methods

  - $x^*$: a solution of the linear fixed point equation $x = Ax + b$
  - $\bar{x}$: the solution of the projected equation $x = \Pi(Ax + b)$
  - $\Pi$: weighted Euclidean projection onto a subspace $S \subset \Re^n$, $\dim(S) \ll n$
  - Assume: $I - \Pi A$ is invertible

  Example: TD($\lambda$) for approximate policy evaluation in MDPs
  - Solves a projected form of a multistep Bellman equation, with linear function approximation of the cost function
  - $A$: a stochastic or substochastic matrix
  - $\Pi A$ is usually a contraction

  Example: large linear systems of equations in general

  4. Two Standard Error Bounds for the Contraction Case

  $x^* - \bar{x}$: the approximation error due to solving the projected equation.

  Standard bound I (arbitrary norm): assume $\|\Pi A\| = \alpha < 1$; then
  $$\|x^* - \bar{x}\| \le \frac{1}{1-\alpha}\,\|x^* - \Pi x^*\|. \qquad (1)$$

  Standard bound II (weighted Euclidean norm $\|\cdot\|_\xi$; uses the Pythagorean theorem and is much sharper than I): assume $\|\Pi A\|_\xi = \alpha < 1$; then
  $$\|x^* - \bar{x}\|_\xi \le \frac{1}{\sqrt{1-\alpha^2}}\,\|x^* - \Pi x^*\|_\xi. \qquad (2)$$

  - These are upper bounds on the ratios
    bias-to-distance: $\dfrac{\|\bar{x} - \Pi x^*\|_\xi}{\|x^* - \Pi x^*\|_\xi}$, and amplification: $\dfrac{\|x^* - \bar{x}\|_\xi}{\|x^* - \Pi x^*\|_\xi}$.
  - Our bounds will be of the similar form $\|x^* - \bar{x}\|_\xi \le B(A, \xi, S)\,\|x^* - \Pi x^*\|_\xi$, but apply to both contraction and non-contraction cases.
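As a quick sense of scale (an illustrative value of $\alpha$, not from the slides): with contraction modulus $\alpha = 0.9$,
$$\frac{1}{1-\alpha} = 10, \qquad \frac{1}{\sqrt{1-\alpha^2}} = \frac{1}{\sqrt{0.19}} \approx 2.29,$$
so bound (2) certifies roughly a four times smaller amplification of the distance $\|x^* - \Pi x^*\|_\xi$ than bound (1).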

  5. Illustration of the Form of Bounds

  [Figure: $x^*$ lies off the subspace $S$; its projection $\Pi x^*$ and the approximation $\bar{x}$ lie in $S$; the error bound $B(A, \xi, S)$ specifies a cone with apex at $x^*$ within which $\bar{x}$ is guaranteed to lie.]

  - $B(A, \xi, S) = 1 \;\Rightarrow\; \bar{x} = \Pi x^*$

  6. Data-Dependent Error Analysis: Motivations

  Motivation I: with or without contraction assumptions,
  $$x^* - \bar{x} = (I - \Pi A)^{-1}(x^* - \Pi x^*). \qquad (3)$$

  How this equality is relaxed in the standard bounds:
  - Standard bound I: expand $(I - \Pi A)^{-1} = I + \Pi A + (\Pi A)^2 + \cdots$ and use $\|(\Pi A)^m\| \le \alpha^m$.
  - Standard bound II: write $(I - \Pi A)^{-1} = I + \Pi A (I - \Pi A)^{-1}$, so that
  $$\|x^* - \bar{x}\|_\xi^2 = \|x^* - \Pi x^*\|_\xi^2 + \|\Pi A (I - \Pi A)^{-1}(x^* - \Pi x^*)\|_\xi^2 = \|x^* - \Pi x^*\|_\xi^2 + \|\Pi A (x^* - \bar{x})\|_\xi^2 \le \|x^* - \Pi x^*\|_\xi^2 + \alpha^2 \|x^* - \bar{x}\|_\xi^2.$$
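For reference, equality (3) follows in two lines from the definitions (a standard derivation, not spelled out on the slide): using $x^* = Ax^* + b$, so that $\Pi x^* = \Pi(Ax^* + b)$, and $\bar{x} = \Pi(A\bar{x} + b)$,

```latex
\begin{aligned}
x^* - \bar{x} &= (x^* - \Pi x^*) + \Pi(Ax^* + b) - \Pi(A\bar{x} + b)
             = (x^* - \Pi x^*) + \Pi A\,(x^* - \bar{x}) \\
\Longrightarrow\ (I - \Pi A)(x^* - \bar{x}) &= x^* - \Pi x^* .
\end{aligned}
```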

  7. Data-Dependent Error Analysis: Motivations

  Motivation II: $(I - \Pi A)^{-1} = I + \Pi A (I - \Pi A)^{-1} = I + (I - \Pi A)^{-1} \Pi A$

  (i) Bound the term $(I - \Pi A)^{-1} \Pi A (x^* - \Pi x^*)$ directly, so that $\alpha$ does not appear in a denominator.
  (ii) Seek computable bounds with low-order calculations involving small matrices.

  Consider the technical side of (ii): some notation and facts (a sketch in code follows below)
  - $\Phi$: an $n \times k$ matrix whose columns form a basis of $S$; $\Xi = \mathrm{diag}(\xi)$
  - $k \times k$ matrices: $B = \Phi' \Xi \Phi$, $M = \Phi' \Xi A \Phi$, $F = (I - B^{-1} M)^{-1}$
  - $\Pi = \Phi(\Phi' \Xi \Phi)^{-1} \Phi' \Xi = \Phi B^{-1} \Phi' \Xi$; the projected equation is equivalent to $\Phi r = \Phi B^{-1}\big(Mr + \Phi' \Xi b\big)$, $r \in \Re^k$
  - $B$ and $M$ can be computed easily by simulation.
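A minimal NumPy sketch of this setup (all problem data here — `A`, `b`, `Phi`, `xi` — are made-up illustrative values, not from the talk): it forms $B$, $M$, $F$, solves the projected equation in the coordinates $r$, and checks the result against the $n$-dimensional form $x = \Pi(Ax + b)$.

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 50, 5

# Made-up problem instance, scaled so that I - Pi*A is safely invertible.
A = rng.standard_normal((n, n)) / (2.0 * np.sqrt(n))
b = rng.standard_normal(n)
Phi = rng.standard_normal((n, k))   # columns form a basis of S
xi = rng.random(n) + 0.1
xi /= xi.sum()                      # projection weights (a distribution)
Xi = np.diag(xi)

# The k x k quantities from the slide.
B = Phi.T @ Xi @ Phi
M = Phi.T @ Xi @ A @ Phi
F = np.linalg.inv(np.eye(k) - np.linalg.solve(B, M))   # (I - B^{-1}M)^{-1}

# Projected equation Phi r = Phi B^{-1} (M r + Phi' Xi b)
#   <=>  (I - B^{-1} M) r = B^{-1} Phi' Xi b
#   <=>  r = F B^{-1} Phi' Xi b.
r = F @ np.linalg.solve(B, Phi.T @ Xi @ b)
x_bar = Phi @ r

# Sanity check against the n-dimensional fixed point form.
Pi = Phi @ np.linalg.solve(B, Phi.T @ Xi)   # Pi = Phi B^{-1} Phi' Xi
assert np.allclose(x_bar, Pi @ (A @ x_bar + b))
```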

  8. Technical Lemmas for New Error Bounds

  Lemma 1
  $$(I - \Pi A)^{-1} = I + (I - \Pi A)^{-1} \Pi A = I + \Phi F B^{-1} \Phi' \Xi A. \qquad (4)$$
  Also, $I - \Pi A$ is invertible $\iff$ $F = (I - B^{-1} M)^{-1}$ exists.

  Lemma 2
  Let $H$ and $D$ be $n \times k$ and $k \times n$ matrices, respectively. Then
  $$\|HD\|_\xi^2 = \sigma\big((H' \Xi H)(D \Xi^{-1} D')\big), \qquad (5)$$
  where $\sigma(\cdot)$ denotes the spectral radius.

  Apply the lemmas to bound $\|(I - \Pi A)^{-1}(x^* - \Pi x^*)\|_\xi$:

  First bound: by Lemma 1,
  $$(I - \Pi A)^{-1} \Pi A (x^* - \Pi x^*) = \underbrace{\Phi F B^{-1}}_{H} \underbrace{\Phi' \Xi}_{D}\, A (x^* - \Pi x^*),$$
  so by Lemma 2,
  $$\|(I - \Pi A)^{-1} \Pi A (x^* - \Pi x^*)\|_\xi^2 \le \sigma(G_1)\, \|A\|_\xi^2\, \|x^* - \Pi x^*\|_\xi^2,$$
  where $G_1 = (H' \Xi H)(D \Xi^{-1} D') = B^{-1} F' B F$.
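A quick numerical check of Lemma 2 (a sketch; $H$, $D$, $\xi$ are made-up values, and $\|X\|_\xi$ is computed via the equivalent operator norm $\|\Xi^{1/2} X \Xi^{-1/2}\|_2$):

```python
import numpy as np

rng = np.random.default_rng(1)
n, k = 40, 4
H = rng.standard_normal((n, k))
D = rng.standard_normal((k, n))
xi = rng.random(n) + 0.1
Xi, Xi_inv = np.diag(xi), np.diag(1.0 / xi)

# Left side: ||H D||_xi^2 via the similarity Xi^{1/2} (HD) Xi^{-1/2}.
s = np.sqrt(xi)
lhs = np.linalg.norm((s[:, None] * (H @ D)) / s[None, :], ord=2) ** 2

# Right side: spectral radius of the k x k matrix (H' Xi H)(D Xi^{-1} D').
G = (H.T @ Xi @ H) @ (D @ Xi_inv @ D.T)
rhs = np.max(np.abs(np.linalg.eigvals(G)))

print(lhs, rhs)   # the two agree to numerical precision
```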

  9. Main Results: First Bound

  Theorem 1
  $$\|x^* - \bar{x}\|_\xi \le \sqrt{1 + \sigma(G_1)\,\|A\|_\xi^2}\;\|x^* - \Pi x^*\|_\xi, \qquad (6)$$
  where
  - $G_1$ is the product of $k \times k$ matrices
  $$G_1 = B^{-1} F' B F, \qquad (7)$$
  - $\sigma(G_1) = \|(I - \Pi A)^{-1} \Pi\|_\xi^2$, so the bound is invariant to the choice of basis vectors of $S$ (i.e., of $\Phi$).

  Notes:
  - Thm. 1 is equivalent to $\|(I - \Pi A)^{-1} \Pi A (x^* - \Pi x^*)\|_\xi \le \|(I - \Pi A)^{-1} \Pi\|_\xi\, \|A\|_\xi\, \|x^* - \Pi x^*\|_\xi$.
  - Easy to compute, and better than the standard bound I.
  - Weaknesses: two over-relaxations; $\|A\|_\xi$ is required.
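The Theorem 1 factor is then cheap to evaluate; a sketch (the function name is mine, not from the talk — only $\|A\|_\xi$ touches the full $n \times n$ matrix, everything else is $k \times k$):

```python
import numpy as np

def theorem1_bound(A, Phi, xi):
    """Sketch of the factor sqrt(1 + sigma(G1) * ||A||_xi^2) from (6)."""
    Xi = np.diag(xi)
    k = Phi.shape[1]
    B = Phi.T @ Xi @ Phi
    M = Phi.T @ Xi @ A @ Phi
    F = np.linalg.inv(np.eye(k) - np.linalg.solve(B, M))
    G1 = np.linalg.solve(B, F.T @ B @ F)          # G1 = B^{-1} F' B F
    sigma_G1 = np.max(np.abs(np.linalg.eigvals(G1)))
    s = np.sqrt(xi)                               # ||A||_xi as a plain 2-norm
    A_norm = np.linalg.norm((s[:, None] * A) / s[None, :], ord=2)
    return np.sqrt(1.0 + sigma_G1 * A_norm**2)
```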

  10. Two Over-Relaxations in Theorem 1

  1. $\Pi(x^* - \Pi x^*) = 0$ is not used.
     - Effect: the bound degrades (to the standard bound I in the contraction case) if $S$ nearly contains an eigenvector of $A$ associated with the dominant real eigenvalue.
     - For applications in practice: orthogonalize the basis vectors w.r.t. the eigenspace to obtain sharper bounds.

  2. When $\Pi A$ is near zero, the bound cannot fully exploit this fact.
     - This is due to the splitting of $\Pi$ and $A$ in bounding $\|(I - \Pi A)^{-1} \Pi A\|$: Thm. 1 is equivalent to
  $$\|\Pi A + \Pi A (I - \Pi A)^{-1} \Pi A\|_\xi \le \|\Pi + \Pi A (I - \Pi A)^{-1} \Pi\|_\xi\, \|A\|_\xi.$$
     - Effect: when $\Pi A$ is near zero but $\|A\|_\xi = 1$, $\sigma(G_1) \approx \|\Pi\|_\xi^2 = 1$, and the bound tends to $\sqrt{2}$ instead of 1.

  Applying the lemmas in a different way sharpens the bound $\Rightarrow$ the second bound.

  11. Main Results: Second Bound

  Use the fact $\Pi(x^* - \Pi x^*) = 0$:
  $$\|(I - \Pi A)^{-1} \Pi A (x^* - \Pi x^*)\|_\xi = \|(I - \Pi A)^{-1} \Pi A (I - \Pi)(x^* - \Pi x^*)\|_\xi \le \|(I - \Pi A)^{-1} \Pi A (I - \Pi)\|_\xi\, \|x^* - \Pi x^*\|_\xi.$$

  Relate the norm of the matrix to the spectral radius of a $k \times k$ matrix: by Lemma 1,
  $$\|(I - \Pi A)^{-1} \Pi A (I - \Pi)\|_\xi^2 = \big\|\underbrace{\Phi F B^{-1}}_{H}\, \underbrace{\Phi' \Xi A (I - \Pi)}_{D}\big\|_\xi^2,$$
  and by Lemma 2 this equals $\sigma\big((H' \Xi H)(D \Xi^{-1} D')\big)$.

  Notes:
  - Incorporating the matrix $I - \Pi$ is crucial for improving the bound.
  - $\|A\|_\xi$ is no longer needed.

  12. Main Results: Second Bound

  Theorem 2
  $$\|x^* - \bar{x}\|_\xi \le \sqrt{1 + \sigma(G_2)}\;\|x^* - \Pi x^*\|_\xi, \qquad (8)$$
  where
  - $G_2$ is the product of $k \times k$ matrices
  $$G_2 = B^{-1} F' B F B^{-1} (R - M B^{-1} M'), \qquad R = \Phi' \Xi A \Xi^{-1} A' \Xi \Phi, \qquad (9)$$
  - $\sigma(G_2) = \|(I - \Pi A)^{-1} \Pi A (I - \Pi)\|_\xi^2$, so the bound is invariant to the choice of basis vectors of $S$ (i.e., of $\Phi$).

  Proposition 1 (Comparison with the Standard Bound II)
  Assume that $\|\Pi A\|_\xi \le \alpha < 1$. Then the error bound (8) is always no worse than the standard bound II, i.e., $1 + \sigma(G_2) \le 1/(1 - \alpha^2)$.

  Notes:
  - The bound is tight in the worst-case sense.
  - Estimating $R$ by simulation is less straightforward than estimating $B$ and $M$; it is doable, except for TD($\lambda$) with $\lambda > 0$.
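A matching sketch for the Theorem 2 factor (again the function name is mine; $R$ is formed exactly here for checking, whereas the slide notes it would be estimated by simulation in practice):

```python
import numpy as np

def theorem2_bound(A, Phi, xi):
    """Sketch of the factor sqrt(1 + sigma(G2)) from (8)-(9)."""
    Xi, Xi_inv = np.diag(xi), np.diag(1.0 / xi)
    k = Phi.shape[1]
    B = Phi.T @ Xi @ Phi
    M = Phi.T @ Xi @ A @ Phi
    F = np.linalg.inv(np.eye(k) - np.linalg.solve(B, M))
    R = Phi.T @ Xi @ A @ Xi_inv @ A.T @ Xi @ Phi
    # G2 = B^{-1} F' B F  B^{-1} (R - M B^{-1} M')
    G2 = np.linalg.solve(B, F.T @ B @ F) @ np.linalg.solve(B, R - M @ np.linalg.solve(B, M.T))
    sigma_G2 = np.max(np.abs(np.linalg.eigvals(G2)))
    return np.sqrt(1.0 + sigma_G2)
```

When $\|\Pi A\|_\xi \le \alpha < 1$, Proposition 1 says this value never exceeds $1/\sqrt{1-\alpha^2}$, which is easy to confirm numerically on small instances.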

  13. MDP Applications and Numerical Comparisons of Bounds

  Cost function approximation for MDPs with TD($\lambda$):
  - $A$ is defined for a pair of values $(\alpha, \lambda)$ by
  $$A = P^{(\alpha,\lambda)} \;\stackrel{\text{def}}{=}\; (1 - \lambda) \sum_{\ell=0}^{\infty} \lambda^\ell (\alpha P)^{\ell+1}$$
    - discounted cases: $\alpha \in [0, 1)$, $\lambda \in [0, 1]$
    - undiscounted cases: $\alpha = 1$, $\lambda \in [0, 1)$

  Choices of the projection norm:
  - Without exploration: $\xi$ = invariant distribution of $P$; $\Pi A$ is a contraction.
  - With exploration: $\xi$ determined by policies/simulations that enhance exploration; $\Pi A$ may or may not be a contraction ($\lambda$ needs to be chosen properly; LSTD(0) is always safe to apply).

  On applying Thm. 1 (a sketch of both constructions follows below):
  - $e = [1, 1, \ldots, 1]'$ is an eigenvector of $A$ associated with the dominant eigenvalue $\frac{(1-\lambda)\alpha}{1 - \lambda\alpha}$.
  - To obtain a sharper bound, orthogonalize the basis vectors w.r.t. $e$ (i.e., project them onto $e^\perp$; easy to do online).
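A sketch of the two constructions on this slide (the function names are mine; the series is truncated, which is adequate for $\lambda$ not too close to 1, and the orthogonalization uses the $\xi$-weighted inner product underlying $\Pi$):

```python
import numpy as np

def P_alpha_lambda(P, alpha, lam, terms=500):
    """Truncated series (1 - lam) * sum_l lam^l (alpha P)^(l+1)  -- a sketch."""
    n = P.shape[0]
    A = np.zeros((n, n))
    power = alpha * P                  # holds (alpha P)^(l+1), starting at l = 0
    for l in range(terms):
        A += (1.0 - lam) * lam**l * power
        power = power @ (alpha * P)
    return A

def orthogonalize_against_e(Phi, xi):
    """Project each basis column onto e-perp in the xi-weighted inner product."""
    e = np.ones(Phi.shape[0])
    coeffs = xi @ Phi                  # <e, phi_j>_xi for each column j
    return Phi - np.outer(e, coeffs / (xi @ e))
```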
