

  1. New Developments in Integral Reinforcement Learning: Continuous-Time Optimal Control and Games. F.L. Lewis, National Academy of Inventors, Moncrief-O'Donnell Chair, UTA Research Institute (UTARI), The University of Texas at Arlington, USA, and Qian Ren Consulting Professor, State Key Laboratory of Synthetical Automation for Process Industries, Northeastern University, Shenyang, China. Supported by: ONR, US NSF, China NNSF, China Project 111. Talk available online at http://www.UTA.edu/UTARI/acs

  2. Invited by Manfred Morari, Konstantinos Gatsis, Pramod Khargonekar, and George Pappas.

  3. New Research Results: Integral Reinforcement Learning for Online Optimal Control; IRL for Online Solution of Multi-Player Games; Multi-Player Games on Communication Graphs; Off-Policy Learning; Experience Replay; Bio-Inspired Multi-Actor Critics; Output Synchronization of Heterogeneous MAS. Applications to: microgrids, robotics, industrial process control.

  4. Optimality and Games. Optimal control is effective for: aircraft autopilots, vehicle engine control, aerospace vehicles, ship control, industrial process control. Multi-player games occur in: networked systems, bandwidth assignment, economics, control theory (disturbance rejection), team games, international politics, sports strategy. But optimal control and game solutions are found by offline solution of matrix design equations, and a full dynamical model of the system is needed.

  5. Optimal Control: The Linear Quadratic Regulator (LQR). On-line real-time control loop: system $\dot x = Ax + Bu$ with state-feedback control $u = -Kx$. User-prescribed optimization criterion: $V(x(t)) = \int_t^\infty (x^T Q x + u^T R u)\,d\tau$, with weighting matrices $Q \ge 0$ and $R > 0$. Off-line design loop: solve the algebraic Riccati equation (ARE) $0 = A^T P + P A + Q - P B R^{-1} B^T P$ for $P$, then set the gain $K = R^{-1} B^T P$. This is an offline design procedure that requires knowledge of the system dynamics model $(A, B)$. System modeling is expensive, time consuming, and inaccurate.
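A minimal numerical sketch of this offline design loop (an assumed double-integrator plant and SciPy's ARE solver; the example values are illustrative and not part of the original slides):

```python
# Sketch (not from the slides): classical offline LQR design for a
# hypothetical double-integrator model, illustrating that the full
# (A, B) model is needed before any control gain can be computed.
import numpy as np
from scipy.linalg import solve_continuous_are

A = np.array([[0.0, 1.0],
              [0.0, 0.0]])          # assumed plant: double integrator
B = np.array([[0.0],
              [1.0]])
Q = np.eye(2)                        # state weighting
R = np.array([[1.0]])                # control weighting

# Solve A'P + PA + Q - P B R^{-1} B' P = 0 offline
P = solve_continuous_are(A, B, Q, R)
K = np.linalg.solve(R, B.T @ P)      # optimal gain, u = -K x
print("P =\n", P, "\nK =", K)
```

The point of the example is the workflow the slide criticizes: the model $(A, B)$ must be fully known before $P$ and $K$ can be computed, and nothing is learned online.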

  6. Adaptive control is online and works for unknown systems, but is generally not optimal. Optimal control is offline, and needs to know the system dynamics to solve the design equations. We want to find optimal control solutions online, in real time, using adaptive control techniques, without knowing the full dynamics, for nonlinear systems and general performance indices; that is, to bring together optimal control and adaptive control. Reinforcement learning turns out to be the key to this!

  7. Books: F.L. Lewis, D. Vrabie, and V. Syrmos, Optimal Control, third edition, John Wiley and Sons, New York, 2012 (new chapters on Reinforcement Learning and Differential Games). D. Vrabie, K. Vamvoudakis, and F.L. Lewis, Optimal Adaptive Control and Differential Games by Reinforcement Learning Principles, IET Press, 2012.

  8. F.L. Lewis and D. Vrabie, “Reinforcement learning and adaptive dynamic programming for feedback control,” IEEE Circuits & Systems Magazine, invited feature article, pp. 32-50, Third Quarter 2009. F. Lewis, D. Vrabie, and K. Vamvoudakis, “Reinforcement learning and feedback control,” IEEE Control Systems Magazine, Dec. 2012.

  9. Multi-Player Game Solutions, IEEE Control Systems Magazine, Dec. 2017.

  10. RL for Markov Decision Processes $(X, U, P, R)$: $X$ = states, $U$ = controls, $P$ = probability of going to state $x'$ from state $x$ given that the control is $u$, $R$ = expected reward on going to state $x'$ from state $x$ given that the control is $u$. Expected value of a policy $\pi(x, u)$: $V^{\pi}(x_k) = E_{\pi}\{J_k \mid x_k\} = E_{\pi}\{\sum_{i=k}^{T} \gamma^{i-k} r_i \mid x_k\}$. The discrete-state optimal control problem is to determine a policy $\pi(x, u)$ that minimizes the expected future cost. Optimal policy: $\pi^{*}(x, u) = \arg\min_{\pi} V^{\pi}(x_k) = \arg\min_{\pi} E_{\pi}\{\sum_{i=k}^{T} \gamma^{i-k} r_i \mid x_k\}$. Optimal value: $V^{*}(x_k) = \min_{\pi} V^{\pi}(x_k) = \min_{\pi} E_{\pi}\{\sum_{i=k}^{T} \gamma^{i-k} r_i \mid x_k\}$. Policy Iteration. Policy evaluation by the Bellman equation: $V_j(x) = \sum_{u} \pi_j(x, u) \sum_{x'} P^{u}_{xx'} \left[ R^{u}_{xx'} + \gamma V_j(x') \right]$ for all $x \in X$. Policy improvement: $\pi_{j+1}(x, u) = \arg\min_{u} \sum_{x'} P^{u}_{xx'} \left[ R^{u}_{xx'} + \gamma V_j(x') \right]$ for all $x \in X$. The policy evaluation equation is a system of $N$ simultaneous linear equations, one for each state. Policy improvement makes $V_{j+1}(x) \le V_j(x)$. References: R.S. Sutton and A.G. Barto, Reinforcement Learning: An Introduction, MIT Press, Cambridge, Massachusetts, 1998. D.P. Bertsekas and J.N. Tsitsiklis, Neuro-Dynamic Programming, Athena Scientific, MA, 1996. W.B. Powell, Approximate Dynamic Programming: Solving the Curses of Dimensionality, Wiley, New York, 2009.
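For concreteness, a small tabular policy-iteration sketch on an assumed random toy MDP (cost-minimizing, as in the slides; the sizes, transition data, and names below are illustrative and not from the talk):

```python
# Sketch: tabular policy iteration on an assumed toy MDP.
# Policy evaluation solves N linear equations; improvement is a greedy
# (minimizing) one-step lookahead, as in the slide's Bellman equations.
import numpy as np

n_states, n_actions, gamma = 3, 2, 0.9
rng = np.random.default_rng(0)

# P[u, x, x'] = transition probability, Rc[u, x, x'] = expected cost
P = rng.random((n_actions, n_states, n_states))
P /= P.sum(axis=2, keepdims=True)
Rc = rng.random((n_actions, n_states, n_states))

policy = np.zeros(n_states, dtype=int)       # deterministic initial policy
for _ in range(100):
    # Policy evaluation: V = r_pi + gamma * P_pi V, one equation per state
    P_pi = P[policy, np.arange(n_states)]                              # (N, N)
    r_pi = (P_pi * Rc[policy, np.arange(n_states)]).sum(axis=1)        # (N,)
    V = np.linalg.solve(np.eye(n_states) - gamma * P_pi, r_pi)
    # Policy improvement: minimize expected cost-to-go over actions
    Qvals = (P * (Rc + gamma * V[None, None, :])).sum(axis=2)          # (U, N)
    new_policy = Qvals.argmin(axis=0)
    if np.array_equal(new_policy, policy):
        break
    policy = new_policy

print("policy:", policy, "values:", V)
```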

  11. RL and ADP have been developed for discrete-time systems. Discrete-time system: $x_{k+1} = f(x_k, u_k)$. Hamiltonian function: $H(x_k, V, h) = r(x_k, h(x_k)) + \gamma V(x_{k+1}) - V(x_k)$. This directly leads to temporal difference techniques; the system dynamics does not occur; and the two occurrences of the value allow APPROXIMATE DYNAMIC PROGRAMMING methods. Continuous-time system: $\dot x = f(x, u)$. Hamiltonian function: $H\!\left(x, u, \frac{\partial V}{\partial x}\right) = r(x, u) + \left(\frac{\partial V}{\partial x}\right)^T \dot x = r(x, u) + \left(\frac{\partial V}{\partial x}\right)^T f(x, u)$. This leads to off-line solutions if the system dynamics is known and is hard to use for on-line learning: how should the temporal difference be defined? The system dynamics DOES occur, and there is only ONE occurrence of the value gradient. How can one do Policy Iteration for unknown continuous-time systems? What is Value Iteration for continuous-time systems? How can one do ADP for CT systems?
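A minimal sketch of why the discrete-time case is convenient: the temporal-difference error below is computed purely from measured samples $(x_k, u_k, x_{k+1})$, with no model $f(x, u)$. The quadratic basis, utility, and sample values are illustrative assumptions, not from the slides.

```python
# Sketch: DT temporal-difference error and a gradient-style critic update
# for an assumed linear-in-parameters value approximator V(x) = w^T phi(x).
import numpy as np

gamma = 0.95
w = np.zeros(3)                              # critic weights

def phi(x):                                  # hypothetical quadratic basis
    return np.array([x[0]**2, x[0]*x[1], x[1]**2])

def r(x, u):                                 # assumed utility x'x + u^2
    return float(x @ x + u**2)

def td_error(x_k, u_k, x_next, w):
    # e_k = r(x_k, u_k) + gamma*V(x_{k+1}) - V(x_k): measured data only
    return r(x_k, u_k) + gamma * (w @ phi(x_next)) - w @ phi(x_k)

# example with made-up samples
x0, u0, x1 = np.array([1.0, 0.0]), -0.5, np.array([0.9, -0.1])
alpha = 0.1
w += alpha * td_error(x0, u0, x1, w) * phi(x0)
```

In continuous time the analogous Hamiltonian contains $\dot x = f(x, u)$ explicitly, so no such model-free residual is immediately available; this is the gap integral reinforcement learning addresses.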

  12. Adaptive (Approximate) Dynamic Programming for discrete-time systems (see also Bertsekas, Neuro-Dynamic Programming; Barto & Bradtke gave a Q-learning proof that imposed a settling time). Four ADP methods proposed by Paul Werbos, classified by what the critic NN approximates: Heuristic Dynamic Programming (HDP), the value $V(x_k)$ (Value Iteration); Action-Dependent HDP (Watkins' Q-learning), the Q function $Q(x_k, u_k)$; Dual Heuristic Programming (DHP), the gradient $\frac{\partial V}{\partial x}$; Action-Dependent DHP, the gradients $\frac{\partial Q}{\partial x}$ and $\frac{\partial Q}{\partial u}$. An action NN approximates the control.
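A rough sketch of one HDP critic update with a linear-in-parameters critic $V(x) = w^T \phi(x)$, fitted by least squares to value-iteration targets. The basis, utility, and sampling scheme are assumptions for illustration, not the specific networks used in the cited works.

```python
# Sketch: one HDP (value-iteration style) critic update.
# Fit w so that w^T phi(x_k) ~= r(x_k, u_k) + gamma * w_old^T phi(x_{k+1}).
import numpy as np

def phi(x):                          # hypothetical quadratic basis
    return np.array([x[0]**2, x[0]*x[1], x[1]**2])

def r(x, u):                         # assumed utility
    return float(x @ x + u**2)

def hdp_critic_update(samples, w_old, gamma=0.95):
    """samples: list of (x_k, u_k, x_{k+1}) gathered under the current action NN."""
    Phi = np.array([phi(x) for x, _, _ in samples])
    targets = np.array([r(x, u) + gamma * (w_old @ phi(xn))
                        for x, u, xn in samples])
    w_new, *_ = np.linalg.lstsq(Phi, targets, rcond=None)
    return w_new
```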

  13. CT Systems: Derivation of the Nonlinear Optimal Regulator. To find online methods for optimal control, focus on these two equations. Nonlinear system dynamics: $\dot x = f(x, u) = f(x) + g(x) u$. Cost/value: $V(x(t)) = \int_t^\infty r(x, u)\, d\tau = \int_t^\infty \left( Q(x) + u^T R u \right) d\tau$. Leibniz's rule gives the differential equivalent, the Bellman equation, in terms of the Hamiltonian function: $H\!\left(x, u, \frac{\partial V}{\partial x}\right) = r(x, u) + \left(\frac{\partial V}{\partial x}\right)^T \dot x = r(x, u) + \left(\frac{\partial V}{\partial x}\right)^T \left( f(x) + g(x) u \right) = 0$. Problem: the system dynamics shows up in the Hamiltonian. The stationarity condition $\frac{\partial H}{\partial u} = 0$ gives the stationary control policy $u = h(x) = -\tfrac{1}{2} R^{-1} g^T(x) \frac{\partial V}{\partial x}$. Substituting this back gives the HJB equation $0 = Q(x) + \left(\frac{\partial V^*}{\partial x}\right)^T f(x) - \tfrac{1}{4} \left(\frac{\partial V^*}{\partial x}\right)^T g(x) R^{-1} g^T(x) \frac{\partial V^*}{\partial x}$, $V^*(0) = 0$. This is an off-line solution: the HJB equation is hard to solve, may not have a smooth solution, and the dynamics must be known.
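As a sanity check on these formulas, a scalar linear sketch (with assumed example values; for $\dot x = ax + bu$, $Q(x) = qx^2$, and $V^*(x) = px^2$ the HJB above collapses to a scalar Riccati equation):

```python
# Sketch (my own illustrative check, not from the slides): scalar linear
# case of the HJB. With V*(x) = p*x^2, the HJB reduces to the scalar
# Riccati equation 2*a*p + q - (b**2/r)*p**2 = 0, and the stationary
# control is u = -(1/2) R^{-1} g(x)^T dV*/dx = -(b*p/r)*x.
import numpy as np

a, b, q, r = -1.0, 1.0, 1.0, 1.0

# positive root of the scalar Riccati equation
p = (2*a + np.sqrt(4*a**2 + 4*q*b**2/r)) / (2*b**2 / r)

def dVdx(x):                       # gradient of V*(x) = p x^2
    return 2*p*x

def h(x):                          # stationary control policy
    return -0.5 * (1/r) * b * dVdx(x)

# HJB residual should be ~0 for any x
x = 2.0
residual = q*x**2 + dVdx(x)*(a*x) - 0.25 * dVdx(x) * b * (1/r) * b * dVdx(x)
print("p =", p, "u(x) =", h(x), "HJB residual =", residual)
```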

  14. CT Policy Iteration, a Reinforcement Learning Technique. Given any admissible policy $u = h(x)$, the cost is found by solving the CT Bellman equation, a scalar equation: $0 = r(x, u) + \left(\frac{\partial V}{\partial x}\right)^T f(x, u) = H\!\left(x, u, \frac{\partial V}{\partial x}\right)$, with utility $r(x, u) = Q(x) + u^T R u$. Policy Iteration solution: pick a stabilizing initial control policy $h_0(x)$. Policy evaluation (find the cost by solving the Bellman equation exactly): $0 = r(x, h_j(x)) + \left(\frac{\partial V_j}{\partial x}\right)^T f(x, h_j(x))$, $V_j(0) = 0$. Policy improvement (update the control): $h_{j+1}(x) = -\tfrac{1}{2} R^{-1} g^T(x) \frac{\partial V_j}{\partial x}$. The iteration converges to the solution of the HJB equation $0 = Q(x) + \left(\frac{\partial V^*}{\partial x}\right)^T f - \tfrac{1}{4} \left(\frac{\partial V^*}{\partial x}\right)^T g R^{-1} g^T \frac{\partial V^*}{\partial x}$. Convergence was proved by Leake and Liu (1967) and Saridis (1979) if the Lyapunov equation is solved exactly; Beard & Saridis used Galerkin integrals to solve the Lyapunov equation; Abu-Khalaf & Lewis used NN approximation of $V$ for nonlinear systems and proved convergence. Drawbacks: this is an off-line solution, and the full system dynamics must be known. Reference: M. Abu-Khalaf, F.L. Lewis, and J. Huang, “Policy iterations on the Hamilton-Jacobi-Isaacs equation for H-infinity state feedback control with input saturation,” IEEE Trans. Automatic Control, vol. 51, no. 12, pp. 1989-1995, Dec. 2006.
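For the special case of linear dynamics and quadratic cost, this CT policy iteration reduces to iterating Lyapunov equations (Kleinman's algorithm): policy evaluation becomes a Lyapunov equation for $P_j$ and policy improvement sets $K_{j+1} = R^{-1} B^T P_j$. A sketch with assumed example matrices (not from the slides):

```python
# Sketch: CT policy iteration for the linear-quadratic case.
# Evaluation: Ac' P + P Ac + Q + K' R K = 0 (Lyapunov/Bellman equation).
# Improvement: K_{j+1} = R^{-1} B' P_j. Converges to the ARE solution.
import numpy as np
from scipy.linalg import solve_lyapunov, solve_continuous_are

A = np.array([[0.0, 1.0], [-1.0, -2.0]])   # assumed stable plant
B = np.array([[0.0], [1.0]])
Q = np.eye(2)
R = np.array([[1.0]])

K = np.zeros((1, 2))                        # A is stable, so K0 = 0 is admissible
for j in range(20):
    Ac = A - B @ K                          # closed loop under h_j(x) = -K x
    P = solve_lyapunov(Ac.T, -(Q + K.T @ R @ K))   # policy evaluation
    K_new = np.linalg.solve(R, B.T @ P)            # policy improvement
    if np.linalg.norm(K_new - K) < 1e-9:
        break
    K = K_new

print("PI gain: ", K)
print("ARE gain:", np.linalg.solve(R, B.T @ solve_continuous_are(A, B, Q, R)))
```

Note that each evaluation step still uses the model $(A, B)$, which is exactly the limitation that integral reinforcement learning removes in the online versions discussed later in the talk.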
