SLIDE 1

Stochastic approximation for speeding up LSTD/LSPI (and least squares regression/LinUCB)

Prashanth L A†

Joint work with Nathaniel Korda♯ and Rémi Munos†

†INRIA Lille - Team SequeL ♯MLRG - Oxford University

November 24, 2014

Prashanth L A Fast LSTD using SA November 24, 2014 1 / 39

SLIDE 2

Fast LSTD using SA

Outline

1. Fast LSTD using SA
2. Fast LSPI using SA
3. Experiments - Traffic Signal Control
4. Extension to Least Squares Regression
5. Experiments - News Recommendation
6. Proof outline

SLIDE 3

Fast LSTD using SA

Background

MDP: set of states $\mathcal{X}$, set of actions $\mathcal{A}$, rewards $r(x, a)$

Value function: $V^\pi(s) := \mathbb{E}\left[ \sum_{t=0}^{\infty} \beta^t r(s_t, \pi(s_t)) \,\middle|\, s_0 = s \right]$

Bellman operator: $T^\pi(V)(s) := r(s, \pi(s)) + \beta \sum_{s'} p(s, \pi(s), s') V(s')$

SLIDE 4

Fast LSTD using SA

TD with Function Approximation

Linear function approximation: $V^\pi(s) \approx \theta^T \phi(s)$

Parameter $\theta \in \mathbb{R}^d$, feature $\phi(s) \in \mathbb{R}^d$

TD fixed point: $\Phi\theta = \Pi T^\pi(\Phi\theta)$, where $\Phi$ is the feature matrix with rows $\phi(s)^T$, $\forall s \in S$, and $\Pi$ is the orthogonal projection onto $B = \{\Phi\theta \mid \theta \in \mathbb{R}^d\}$


SLIDE 6

Fast LSTD using SA

LSTD - A Batch Algorithm

Given dataset $\mathcal{D} := \{(s_i, r_i, s'_i),\ i = 1, \ldots, T\}$

LSTD approximates the TD fixed point by $\hat{\theta}_T = \bar{A}_T^{-1} \bar{b}_T$ (an $O(d^2 T)$ computation), where

$\bar{A}_T = \frac{1}{T} \sum_{i=1}^{T} \phi(s_i)\left( \phi(s_i) - \beta\phi(s'_i) \right)^T, \qquad \bar{b}_T = \frac{1}{T} \sum_{i=1}^{T} r_i\, \phi(s_i).$
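As a concrete illustration, the batch LSTD solution above can be computed in a few NumPy lines (a minimal sketch; the array layout and names are my assumptions, not from the slides):

```python
import numpy as np

def lstd(phi, phi_next, rewards, beta):
    """Batch LSTD: solve A_bar theta = b_bar from T transitions.

    phi, phi_next: (T, d) feature matrices for s_i and s'_i;
    rewards: (T,) reward vector; beta: discount factor.
    """
    T = phi.shape[0]
    A_bar = phi.T @ (phi - beta * phi_next) / T   # (d, d)
    b_bar = phi.T @ rewards / T                   # (d,)
    return np.linalg.solve(A_bar, b_bar)          # theta_hat_T
```

Forming $\bar{A}_T$ costs $O(d^2 T)$, which is exactly the bottleneck the SA variant below avoids.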


SLIDE 8

Fast LSTD using SA

Complexity of LSTD [1]

Figure: LSPI, a batch-mode RL algorithm for control (policy evaluation computes the Q-value $Q^\pi$; policy improvement returns a policy $\pi$)

LSTD Complexity

$O(d^2 T)$ using the Sherman-Morrison lemma, or $O(d^{2.807})$ using the Strassen algorithm, or $O(d^{2.375})$ using the Coppersmith-Winograd algorithm


SLIDE 10

Fast LSTD using SA

Complexity of LSTD [2]

Problem: practical applications involve high-dimensional features (e.g. Computer Go: $d \sim 10^6$) ⇒ solving LSTD is computationally intensive. Related works: GTD¹, GTD2², iLSTD³

Solution: use stochastic approximation (SA)

Complexity: $O(dT)$ ⇒ a factor-$d$ reduction in complexity

Theory: the SA variant of LSTD does not impact the overall rate of convergence

Experiments: on a traffic control application, the performance of SA-based LSTD is comparable to LSTD, while gaining in runtime!

¹ Sutton et al. (2009) A convergent O(n) algorithm for off-policy temporal-difference learning. In: NIPS
² Sutton et al. (2009) Fast gradient-descent methods for temporal-difference learning with linear function approximation. In: ICML
³ Geramifard et al. (2007) iLSTD: Eligibility traces and convergence analysis. In: NIPS


SLIDE 12

Fast LSTD using SA

Fast LSTD using Stochastic Approximation

Pick $i_n$ uniformly in $\{1, \ldots, T\}$ (random sampling), then update $\theta_n$ using $(s_{i_n}, r_{i_n}, s'_{i_n})$ (SA update)

Update rule: $\theta_n = \theta_{n-1} + \gamma_n \left( r_{i_n} + \beta\theta_{n-1}^T \phi(s'_{i_n}) - \theta_{n-1}^T \phi(s_{i_n}) \right) \phi(s_{i_n})$

A fixed-point iteration with step-sizes $\gamma_n$. Complexity: $O(d)$ per iteration
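The update rule above can be sketched in a few lines (a hedged NumPy illustration; the function name and the concrete step-size constant are my assumptions):

```python
import numpy as np

def flstd_sa(phi, phi_next, rewards, beta, n_iters, c=1.0, seed=0):
    """fLSTD-SA sketch: randomized O(d)-per-step fixed-point iteration.

    phi, phi_next: (T, d) features for s_i and s'_i; rewards: (T,).
    """
    rng = np.random.default_rng(seed)
    T, d = phi.shape
    theta = np.zeros(d)
    for n in range(1, n_iters + 1):
        i = rng.integers(T)                        # i_n ~ U({1,...,T})
        gamma = (1.0 - beta) * c / (2.0 * (c + n)) # step-size from the slides
        td_err = rewards[i] + beta * theta @ phi_next[i] - theta @ phi[i]
        theta = theta + gamma * td_err * phi[i]    # O(d) update
    return theta
```

Each iteration touches a single transition, so the total cost over $n$ iterations is $O(dn)$ rather than the $O(d^2 T)$ of batch LSTD.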


SLIDE 14

Fast LSTD using SA

Assumptions

Setting: Given dataset $\mathcal{D} := \{(s_i, r_i, s'_i),\ i = 1, \ldots, T\}$

(A1) $\|\phi(s_i)\|_2 \le 1$ (bounded features)

(A2) $|r_i| \le R_{\max} < \infty$ (bounded rewards)

(A3) $\lambda_{\min}\left( \frac{1}{T} \sum_{i=1}^{T} \phi(s_i)\phi(s_i)^T \right) \ge \mu$ (the covariance matrix has a minimum eigenvalue $\mu$)
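The assumptions (A1)-(A3) can be checked numerically on a given dataset before running fLSTD-SA; a small sketch (the function name and tolerance are mine):

```python
import numpy as np

def check_lstd_assumptions(phi, rewards, r_max):
    """Verify (A1)-(A3): bounded feature norms, bounded rewards, and a
    positive minimum eigenvalue mu of the empirical covariance matrix."""
    a1 = bool(np.all(np.linalg.norm(phi, axis=1) <= 1.0 + 1e-12))
    a2 = bool(np.all(np.abs(rewards) <= r_max))
    cov = phi.T @ phi / phi.shape[0]
    mu = float(np.linalg.eigvalsh(cov)[0])   # smallest eigenvalue
    return a1, a2, mu
```

The returned `mu` is the quantity that enters the step-size condition and the convergence-rate constants on the following slides.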


SLIDE 18

Fast LSTD using SA

Convergence Rate

Step-size choice: $\gamma_n = \frac{(1-\beta)c}{2(c+n)}$, with $(1-\beta)^2 \mu c \in (1.33, 2)$

Bound in expectation: $\mathbb{E}\left\| \theta_n - \hat{\theta}_T \right\|_2 \le \frac{K_1(n)}{\sqrt{n+c}}$

High-probability bound: $P\left( \left\| \theta_n - \hat{\theta}_T \right\|_2 \le \frac{K_2(n)}{\sqrt{n+c}} \right) \ge 1 - \delta$

By iterate averaging, the dependency of $c$ on $\mu$ can be removed


SLIDE 22

Fast LSTD using SA

The constants

$K_1(n) = \frac{\sqrt{c}\, \left\| \theta_0 - \hat{\theta}_T \right\|_2}{n^{((1-\beta)^2\mu c - 1)/2}} + \frac{(1-\beta)\, c\, h^2(n)}{2},$

$K_2(n) = (1-\beta)\, c\, \sqrt{\frac{\log \delta^{-1}}{2}} \sqrt{\frac{4}{3(1-\beta)^2\mu c - 1}} + K_1(n),$

where $h(k) := (1 + R_{\max} + \beta)^2 \max\left\{ \frac{\left( \left\| \theta_0 - \hat{\theta}_T \right\|_2 + \ln n + \left\| \hat{\theta}_T \right\|_2 \right)^2}{4},\ 1 \right\}$

Both $K_1(n)$ and $K_2(n)$ are $O(1)$

SLIDE 23

Fast LSTD using SA

Iterate Averaging

Bigger step-size + averaging: $\gamma_n := \frac{(1-\beta)}{2} \left( \frac{c}{c+n} \right)^{\alpha}$, with $\bar{\theta}_{n+1} := (\theta_1 + \cdots + \theta_n)/n$

Bound in expectation: $\mathbb{E}\left\| \bar{\theta}_n - \hat{\theta}_T \right\|_2 \le \frac{K_1^{IA}(n)}{(n+c)^{\alpha/2}}$

High-probability bound: $P\left( \left\| \bar{\theta}_n - \hat{\theta}_T \right\|_2 \le \frac{K_2^{IA}(n)}{(n+c)^{\alpha/2}} \right) \ge 1 - \delta$

The dependency of $c$ on $\mu$ is removed, at the cost of $(1-\alpha)/2$ in the rate.


SLIDE 27

Fast LSTD using SA

The constants

$K_1^{IA}(n) := \frac{C\, \left\| \theta_0 - \hat{\theta}_T \right\|_2}{(n+c)^{(1-\alpha)/2}} + \frac{h(n)\, c^\alpha (1-\beta)}{\left( \mu c^\alpha (1-\beta)^2 \right)^{\frac{1+2\alpha}{2(1-\alpha)}}}, \quad \text{and}$

$K_2^{IA}(n) := \frac{\sqrt{\log \delta^{-1}}}{\mu(1-\beta)} \left( 3\alpha + \frac{\mu c^\alpha (1-\beta)^2 + 2\alpha}{\alpha} \right)^{\frac{1}{2}} \frac{1}{(n+c)^{(1-\alpha)/2}} + K_1^{IA}(n).$

As before, both $K_1^{IA}(n)$ and $K_2^{IA}(n)$ are $O(1)$

SLIDE 28

Fast LSTD using SA

Performance bounds

True value function $v$; approximate value function $\tilde{v}_n := \Phi\theta_n$

$\left\| v - \tilde{v}_n \right\|_T \le \underbrace{\frac{\left\| v - \Pi v \right\|_T}{\sqrt{1 - \beta^2}}}_{\text{approximation error}} + \underbrace{O\left( \sqrt{\frac{d}{(1-\beta)^2 \mu T}} \right)}_{\text{estimation error}} + \underbrace{O\left( \sqrt{\frac{1}{(1-\beta)^2 \mu^2 n} \ln\frac{1}{\delta}} \right)}_{\text{computational error}}$

¹ $\|f\|_T^2 := T^{-1} \sum_{i=1}^{T} f(s_i)^2$, for any function $f$.
² Lazaric, A., Ghavamzadeh, M., Munos, R. (2012) Finite-sample analysis of least-squares policy iteration. In: JMLR

SLIDE 29

Fast LSTD using SA

Performance bounds

The approximation and estimation errors are artifacts of function approximation and least squares methods; the computational error is a consequence of using SA for LSTD.

Setting $n = \ln(1/\delta)\, T/(d\mu)$, the convergence rate is unaffected!


SLIDE 32

Fast LSPI using SA

Outline

1. Fast LSTD using SA
2. Fast LSPI using SA
3. Experiments - Traffic Signal Control
4. Extension to Least Squares Regression
5. Experiments - News Recommendation
6. Proof outline

SLIDE 33

Fast LSPI using SA

LSPI - A Quick Recap

Policy evaluation computes the Q-value $Q^\pi$; policy improvement returns a policy $\pi$:

$Q^\pi(s, a) = \mathbb{E}\left[ \sum_{t=0}^{\infty} \beta^t r(s_t, \pi(s_t)) \,\middle|\, s_0 = s, a_0 = a \right], \qquad \pi'(s) = \arg\max_{a \in A} \theta^T \phi(s, a)$


SLIDE 35

Fast LSPI using SA

Policy Evaluation: LSTDQ and its SA variant

Given a set of samples $\mathcal{D} := \{(s_i, a_i, r_i, s'_i),\ i = 1, \ldots, T\}$

LSTDQ approximates $Q^\pi$ by $\hat{\theta}_T = \bar{A}_T^{-1} \bar{b}_T$, where

$\bar{A}_T = \frac{1}{T} \sum_{i=1}^{T} \phi(s_i, a_i)\left( \phi(s_i, a_i) - \beta\phi(s'_i, \pi(s'_i)) \right)^T, \quad \text{and} \quad \bar{b}_T = \frac{1}{T} \sum_{i=1}^{T} r_i\, \phi(s_i, a_i).$

Fast LSTDQ using SA: $\theta_k = \theta_{k-1} + \gamma_k \left( r_{i_k} + \beta\theta_{k-1}^T \phi(s'_{i_k}, \pi(s'_{i_k})) - \theta_{k-1}^T \phi(s_{i_k}, a_{i_k}) \right) \phi(s_{i_k}, a_{i_k})$


SLIDE 37

Fast LSPI using SA

Fast LSPI using SA (fLSPI-SA)

Input: sample set $\mathcal{D} := \{(s_i, a_i, r_i, s'_i)\}_{i=1}^{T}$

repeat
  Policy evaluation: for $k = 1$ to $\tau$:
    • Get random sample index: $i_k \sim U(\{1, \ldots, T\})$
    • Update fLSTD-SA iterate $\theta_k$
  $\theta' \leftarrow \theta_\tau$, $\Delta = \|\theta - \theta'\|_2$
  Policy improvement: obtain a greedy policy $\pi'(s) = \arg\max_{a \in A} \theta'^T \phi(s, a)$
  $\theta \leftarrow \theta'$, $\pi \leftarrow \pi'$
until $\Delta < \epsilon$
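The loop above can be sketched as follows (a hedged Python illustration; the feature function `phi`, the concrete step-size, and the iteration caps are my assumptions, not from the slides):

```python
import numpy as np

def flspi_sa(D, phi, n_actions, beta, tau, eps, max_iters=50, seed=0):
    """fLSPI-SA sketch: alternate randomized policy evaluation (fLSTD-SA
    on Q-features) with greedy improvement until the change falls below eps.

    D: list of (s, a, r, s_next) tuples; phi(s, a) -> (d,) feature vector.
    """
    rng = np.random.default_rng(seed)
    d = phi(*D[0][:2]).shape[0]
    theta = np.zeros(d)
    for _ in range(max_iters):
        frozen = theta.copy()   # greedy policy w.r.t. the current parameter
        def pi(s):
            return int(np.argmax([frozen @ phi(s, a) for a in range(n_actions)]))
        th = theta.copy()
        for k in range(1, tau + 1):                  # policy evaluation
            s, a, r, s2 = D[rng.integers(len(D))]    # i_k ~ U({1,...,T})
            gamma = (1.0 - beta) / (2.0 + k)         # an assumed step-size
            td = r + beta * th @ phi(s2, pi(s2)) - th @ phi(s, a)
            th = th + gamma * td * phi(s, a)         # O(d) update
        delta = float(np.linalg.norm(theta - th))    # Delta = ||theta - theta'||_2
        theta = th                                   # policy improvement
        if delta < eps:
            break
    return theta
```

Freezing the policy parameter for each evaluation round mirrors the repeat/until structure of the pseudocode: each round evaluates one fixed greedy policy before improving it.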


SLIDE 39

Experiments - Traffic Signal Control

Outline

1. Fast LSTD using SA
2. Fast LSPI using SA
3. Experiments - Traffic Signal Control
4. Extension to Least Squares Regression
5. Experiments - News Recommendation
6. Proof outline

SLIDE 40

Experiments - Traffic Signal Control

The traffic control problem

SLIDE 41

Experiments - Traffic Signal Control

Simulation Results on 7x9-grid network

[Figure: tracking error $\|\theta_k - \hat{\theta}_T\|_2$ vs. step $k$ of fLSTD-SA]

[Figure: throughput (TAR) vs. time steps, for LSPI and fLSPI-SA]

SLIDE 42

Experiments - Traffic Signal Control

Runtime Performance on three road networks

Network                  LSPI runtime (ms)   fLSPI-SA runtime (ms)
7x9-Grid (d = 504)       4,917               66
14x9-Grid (d = 1008)     30,144              159
14x18-Grid (d = 2016)    1.91 · 10^5         287

SLIDE 43

Extension to Least Squares Regression

Outline

1. Fast LSTD using SA
2. Fast LSPI using SA
3. Experiments - Traffic Signal Control
4. Extension to Least Squares Regression
5. Experiments - News Recommendation
6. Proof outline

SLIDE 44

Extension to Least Squares Regression

Complexity of Ordinary Least Squares (OLS)

Figure: a typical ML algorithm using regression (choose $x_n$, observe $y_n$, estimate $\hat{\theta}_n$)

OLS Complexity

$O(d^2)$ using the Sherman-Morrison lemma, or $O(d^{2.807})$ using the Strassen algorithm, or $O(d^{2.375})$ using the Coppersmith-Winograd algorithm

Problem: a news feed platform has high-dimensional features ($d \sim 10^5$) ⇒ solving OLS is computationally costly
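The $O(d^2)$ figure refers to maintaining the inverse incrementally: the Sherman-Morrison lemma updates $A_n^{-1}$ after each rank-one addition $x x^T$ without re-inverting. A minimal sketch (the function name is mine):

```python
import numpy as np

def sherman_morrison_update(A_inv, x):
    """Return (A + x x^T)^{-1} given A^{-1}, in O(d^2) operations."""
    Ax = A_inv @ x
    return A_inv - np.outer(Ax, Ax) / (1.0 + x @ Ax)
```

Even with this trick, each new sample still costs $O(d^2)$, which motivates the $O(d)$ gradient-descent alternative on the next slide.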

SLIDE 45

Extension to Least Squares Regression

Fast GD for OLS

Pick $i_n$ uniformly in $\{1, \ldots, n\}$ (random sampling), then update $\theta_n$ using $(x_{i_n}, y_{i_n})$ (GD update)

Solution: use fast (online) gradient descent (GD)

Efficient, with a complexity of only $O(d)$ (well known)

High-probability bounds with explicit constants can be derived (not fully known)

SLIDE 46

Extension to Least Squares Regression

A linear bandit algorithm

Choose $x_n := \arg\max_{x \in \mathcal{D}} \mathrm{UCB}(x)$, observe a reward $y_n$ s.t. $\mathbb{E}[y_n \mid x_n] = x_n^T \theta^*$, then estimate the UCBs

OLS is used to compute $\mathrm{UCB}(x) := x^T \hat{\theta}_n + \alpha \sqrt{x^T A_n^{-1} x}$
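The UCB computation can be sketched as follows (an illustrative NumPy snippet; `candidates` holds the arms' feature vectors as rows, and all names are my assumptions):

```python
import numpy as np

def ucb_scores(candidates, theta_hat, A_inv, alpha):
    """UCB(x) = x^T theta_hat + alpha * sqrt(x^T A_n^{-1} x), per candidate row."""
    mean = candidates @ theta_hat
    # x_i^T A_inv x_i for every row i, without forming the full product
    width = np.sqrt(np.einsum('ij,jk,ik->i', candidates, A_inv, candidates))
    return mean + alpha * width
```

The chosen arm is then `x_n = candidates[np.argmax(ucb_scores(...))]`, matching the argmax on the slide.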


SLIDE 50

Extension to Least Squares Regression

Fast GD

Pick $i_n$ uniformly in $\{1, \ldots, n\}$ (random sampling), then update $\theta_n$ using $(x_{i_n}, y_{i_n})$ (GD update)

$\theta_n = \theta_{n-1} + \gamma_n \left( y_{i_n} - \theta_{n-1}^T x_{i_n} \right) x_{i_n}$

with step-sizes $\gamma_n$ and the sample gradient $\left( y_{i_n} - \theta_{n-1}^T x_{i_n} \right) x_{i_n}$
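The update above is stochastic gradient descent on the least-squares objective; a minimal sketch (the step-size follows the $c/(2(c+n))$ choice from the error-bound slide, the other names are mine):

```python
import numpy as np

def fast_gd_ols(X, y, n_iters, c=1.0, seed=0):
    """Fast (online) GD for least squares: randomized O(d)-per-step updates."""
    rng = np.random.default_rng(seed)
    n_samples, d = X.shape
    theta = np.zeros(d)
    for n in range(1, n_iters + 1):
        i = rng.integers(n_samples)      # i_n ~ U({1,...,n})
        gamma = c / (2.0 * (c + n))      # step-size from the error-bound slide
        theta = theta + gamma * (y[i] - theta @ X[i]) * X[i]  # sample gradient
    return theta
```

On noiseless data with well-conditioned features, the iterate approaches the OLS solution at the $O(n^{-1/2})$ rate quoted on the error-bound slide.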


SLIDE 53

Extension to Least Squares Regression

Assumptions

Setting: $y_n = x_n^T \theta^* + \xi_n$, where $\xi_n$ is i.i.d. zero-mean noise

(A1) $\sup_n \|x_n\|_2 \le 1$ (bounded features)

(A2) $|\xi_n| \le 1, \forall n$ (bounded noise)

(A3) $\lambda_{\min}\left( \frac{1}{T} \sum_{i=1}^{T} x_i x_i^T \right) \ge \mu$ (strongly convex objective via the covariance matrix)


SLIDE 57

Extension to Least Squares Regression

Error bound

With $\gamma_n = \frac{c}{2(c+n)}$ and $\mu c \in (1.33, 2)$, we have:

Bound in expectation: $\mathbb{E}\left\| \theta_n - \hat{\theta}_n \right\|_2 \le \frac{K_1^{LS}}{\sqrt{n+c}}$

High-probability bound: for any $\delta > 0$, $P\left( \left\| \theta_n - \hat{\theta}_n \right\|_2 \le \frac{K_2^{LS}}{\sqrt{n+c}} \right) \ge 1 - \delta$, the optimal rate $O(n^{-1/2})$

¹ By iterate averaging, the dependency of $c$ on $\mu$ can be removed.

SLIDE 58

Experiments - News Recommendation

Outline

1. Fast LSTD using SA
2. Fast LSPI using SA
3. Experiments - Traffic Signal Control
4. Extension to Least Squares Regression
5. Experiments - News Recommendation
6. Proof outline

SLIDE 59

Experiments - News Recommendation

Dilbert’s boss on news recommendation (and ML)

SLIDE 60

Experiments - News Recommendation

Application to Bandits¹

Fast LinUCB. LinUCB is a well-known contextual bandit algorithm that employs OLS in each iteration; fast GD provides a good approximation to OLS (at low computational cost) in each iteration of LinUCB.

Experiments: LinUCB + fast GD on the Yahoo! news recommendation dataset²

¹ Thanks to Jérémie Mary and Olivier Nicol for help with the framework (ICML 2012 challenge)
² Yahoo! Webscope dataset (2011)


SLIDE 64

Experiments - News Recommendation

Simulation Results

[Figure: tracking error $\|\theta_k - \hat{\theta}_T\|_2$ vs. step $k$ of fLS-SA]

Runtimes (ms) over four days of the dataset:

Day   LinUCB        fLinUCB-SA
2     1.32 · 10^6   32,444
3     1.49 · 10^6   35,325
4     1.11 · 10^6   26,335
5     6.03 · 10^5   14,264

SLIDE 65

Proof outline

Outline

1. Fast LSTD using SA
2. Fast LSPI using SA
3. Experiments - Traffic Signal Control
4. Extension to Least Squares Regression
5. Experiments - News Recommendation
6. Proof outline

SLIDE 66

Proof outline

Proof Outline

Let $z_n = \theta_n - \hat{\theta}_T$. Then, first bound the deviation of this error from its mean:

$P\left( \|z_n\|_2 - \mathbb{E}\|z_n\|_2 \ge \epsilon \right) \le \exp\left( -\frac{\epsilon^2}{2 \sum_{i=1}^{n} L_i^2} \right), \quad \forall \epsilon > 0,$

and bound the size of the mean itself:

$\mathbb{E}\|z_n\|_2 \le \underbrace{\exp\left( -(1-\beta)\mu\Gamma_n \right) \|z_0\|_2}_{\text{initial error}} + \underbrace{\left[ \sum_{k=1}^{n-1} h(k)\, \gamma_{k+1}^2 \exp\left( -2(1-\beta)\mu(\Gamma_n - \Gamma_{k+1}) \right) \right]^{\frac{1}{2}}}_{\text{sampling error}},$


SLIDE 68

Proof outline

Proof Outline: High Probability Bound

Step 1 (Error decomposition):

$\|z_n\|_2 - \mathbb{E}\|z_n\|_2 = \sum_{i=1}^{n} \left( g_i - \mathbb{E}[g_i \mid \mathcal{F}_{i-1}] \right) = \sum_{i=1}^{n} D_i,$

where $D_i := g_i - \mathbb{E}[g_i \mid \mathcal{F}_{i-1}]$, $g_i := \mathbb{E}[\|z_n\|_2 \mid \theta_i]$, and $\mathcal{F}_i = \sigma(\theta_1, \ldots, \theta_i)$.

Step 2 (Lipschitz continuity): the functions $g_i$ are Lipschitz continuous with Lipschitz constants $L_i$.

Step 3 (Concentration inequality):

$P\left( \|z_n\|_2 - \mathbb{E}\|z_n\|_2 \ge \epsilon \right) = P\left( \sum_{i=1}^{n} D_i \ge \epsilon \right) \le \exp(-\lambda\epsilon) \exp\left( \frac{\alpha\lambda^2}{2} \sum_{i=1}^{n} L_i^2 \right).$


SLIDE 71

Proof outline

Proof Outline: Bound in Expectation

Let $f_n(\theta) := \left( \theta^T \phi(s_{i_n}) - (r_{i_n} + \beta\theta^T \phi(s'_{i_n})) \right) \phi(s_{i_n})$ and $F(\theta) := \mathbb{E}_{i_n}(f_n(\theta))$. Then

$z_n = \theta_n - \hat{\theta}_T = \theta_{n-1} - \hat{\theta}_T - \gamma_n \left( F(\theta_{n-1}) - \Delta M_n \right).$

Unrolling the above, noting $F(\hat{\theta}_T) = 0$ and taking expectations, we obtain:

$\mathbb{E}\|z_n\|_2 \le \left( \mathbb{E}\langle z_n, z_n \rangle \right)^{\frac{1}{2}} = \left[ \mathbb{E}\left\| \Pi_n z_0 \right\|_2^2 + \sum_{k=1}^{n} \gamma_k^2\, \mathbb{E}\left\| \Pi_n \Pi_k^{-1} \Delta M_k \right\|_2^2 \right]^{\frac{1}{2}},$

where $\bar{A}_n = \frac{1}{n} \sum_{i=1}^{n} \phi(s_i)\left( \phi(s_i) - \beta\phi(s'_i) \right)^T$ and $\Pi_n := \prod_{k=1}^{n} \left( I - \gamma_k \bar{A}_k \right)$.

The rest of the proof amounts to bounding each of the terms on the RHS above.


SLIDE 73

For Further Reading

References I

Prashanth L.A., Nathaniel Korda and Rémi Munos, Fast LSTD using stochastic approximation: Finite time analysis and application to traffic control. ECML, 2014.

Nathaniel Korda, Prashanth L.A. and Rémi Munos, Fast gradient descent for drifting least squares regression, with application to bandits. AAAI, 2015.