Stochastic Optimal Control – part 4: research issues, robotics applications


  1. Stochastic Optimal Control – part 4: research issues, robotics applications
Marc Toussaint, Machine Learning & Robotics Group – TU Berlin, mtoussai@cs.tu-berlin.de
ICML 2008, Helsinki
• challenges in stochastic optimal control
• probabilistic inference approaches to control
• robotics
• model learning

  2. challenges in stochastic optimal control
• often said: “scale up”
• efficient application in real systems!
→ try to extract the fundamental problems

  3. research issues 1/3: structured state
• notion of state (i.e., having one big state space)
  – curse of dimensionality
  – real systems are typically decomposed/modular/hierarchical/structured → exploit this! (see the sketch below)
• interesting lines of work
  – Carlos Guestrin (PhD thesis)
  – probabilistic inference methods (in graphical models: belief propagation, etc.)
  – probabilistic inference for computing optimal policies
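To make "exploit the structure" concrete, here is a minimal sketch of a factored transition model in the spirit of Guestrin's work on factored MDPs. It is my own toy illustration, not code from the talk: the binary variables, the chain-structured dependencies, and the random parameters are all assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n_vars, n_actions = 3, 2

# factors[a, i, x_{i-1}, x_i] = P(x_i' = 1 | x_{i-1}, x_i, a):
# each next-step variable depends on only two previous-step variables
factors = rng.random((n_actions, n_vars, 2, 2))

def transition_prob(x, x_next, a):
    """P(x_next | x, a) as a product of small per-variable factors."""
    p = 1.0
    for i in range(n_vars):
        p1 = factors[a, i, x[(i - 1) % n_vars], x[i]]
        p *= p1 if x_next[i] == 1 else 1.0 - p1
    return p

# the flat joint table would need n_actions * 2^n * 2^n entries;
# the factored model needs only n_actions * n_vars * 4
print(transition_prob((0, 1, 0), (1, 1, 0), a=0))
```

Inference methods such as belief propagation can then operate directly on these small factors instead of the exponential joint table.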

  4. research issues 2/3: learning
• learning: want to learn models from experience
• interesting lines of work: ML for model learning in robotics

  5. research issues 3/3: integration
• complex systems (e.g., robots) collect state information from many different modalities (sensors)
• many subsystems (e.g., vision, position, haptics)
• delayed/partial information
• integration is hard (a minimal fusion sketch follows below)
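As a minimal illustration of probabilistic integration (my own example, not from the talk; the sensor values and noise levels are invented), here is the Gaussian special case: independent measurements of the same quantity are fused by precision weighting, and a missing modality simply drops out of the sums.

```python
import numpy as np

def fuse(means, variances):
    """Bayes-optimal fusion of independent Gaussian measurements of one
    latent quantity: precision-weighted mean and summed precisions."""
    prec = 1.0 / np.asarray(variances, dtype=float)
    mean = np.dot(prec, means) / prec.sum()
    return mean, 1.0 / prec.sum()

# hypothetical readings: noisy vision vs. accurate joint encoders
mean, var = fuse(means=[0.52, 0.48], variances=[0.04, 0.005])
print(f"fused estimate: {mean:.3f} m, variance {var:.4f}")
# the estimate is pulled strongly toward the more precise modality
```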

  6. probabilistic inference approach
• general idea: decision making, motion control and planning can be viewed as a problem of inferring a posterior over unknown variables (actions, control signals, whole trajectories) conditioned on available information (targets, goals, constraints)

  7. probabilistic inference approach
• given some model of the future:
[figure: dynamic Bayesian network with actions a0, a1, a2 (all coupled to the policy π), states x0, x1, x2, and rewards r0, r1, r2]
(here a Markov Decision Process with P(x0), P(x'|a,x), P(r|a,x) given, and the policy π_ax = P(a|x) unknown)
• condition it on something you want to see in the future
• compute the posterior over actions/decisions to get there (a toy computation follows below)
• Toussaint & Storkey (ICML 2006): proof that maximization of expected future rewards → likelihood maximization problem (EM algorithm) [fwd-bwd video]
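The following sketch shows the "condition on the desired future" step on a random toy MDP (my own example; the sizes, the horizon, and the uninformed action prior are assumptions). Conditioning on reaching a goal state at horizon T, the backward message is a power of the mean transition matrix, and Bayes' rule gives the posterior over the first action.

```python
import numpy as np

n_states, n_actions, T, goal, x0 = 4, 2, 5, 3, 0
rng = np.random.default_rng(1)

# P[a, x, x'] = P(x' | x, a), rows normalized
P = rng.random((n_actions, n_states, n_states))
P /= P.sum(axis=2, keepdims=True)
prior_a = np.full(n_actions, 1.0 / n_actions)   # uninformed policy prior

# mean transition under the prior, used for steps 1 .. T-1
P_mean = np.tensordot(prior_a, P, axes=1)
beta = np.linalg.matrix_power(P_mean, T - 1)[:, goal]   # P(x_T=goal | x_1)

# Bayes: P(a_0 | x_0, x_T = goal) ∝ P(a_0) * P(x_T = goal | x_0, a_0)
likelihood = np.array([P[a, x0] @ beta for a in range(n_actions)])
posterior = prior_a * likelihood
posterior /= posterior.sum()
print("P(a_0 | x_0, x_T = goal) =", posterior)
```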

  8. probabilistic inference approach [details: Toussaint & Storkey, ICML 2006]
• problem: find a policy π that maximizes V^π = E{ Σ_{t=0..∞} γ^t r_t ; π } with discount factor γ ∈ [0, 1]
• Theorem: maximizing the likelihood L^π = P(r̂ = 1; π) in the mixture of finite-time MDPs (with time prior P(T) = γ^T (1 − γ)) is equivalent to maximizing V^π = E{ Σ_{t=0..∞} γ^t r_t ; π } in the original MDP
• problem of optimal policy → problem of likelihood maximization (EM algorithm) [demo]
(a numerical check of the theorem follows below)
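Since L^π = Σ_T P(T) E[r_T; π] with P(T) = γ^T (1 − γ), the theorem amounts to the identity L^π = (1 − γ) V^π. The sketch below checks this on a random MDP with a fixed stochastic policy (my own toy verification, not the paper's code; all sizes and parameters are assumptions, and rewards are treated as binary so that E[r | x, a] = P(r = 1 | x, a)).

```python
import numpy as np

rng = np.random.default_rng(2)
n_states, n_actions, gamma = 5, 3, 0.9

P = rng.random((n_actions, n_states, n_states))
P /= P.sum(axis=2, keepdims=True)                       # P(x'|x,a)
R = rng.random((n_states, n_actions))                   # P(r=1|x,a)
pi = rng.dirichlet(np.ones(n_actions), size=n_states)   # pi[x,a]
p0 = np.full(n_states, 1.0 / n_states)                  # start distribution

P_pi = np.einsum('xa,axy->xy', pi, P)   # state transition under pi
r_pi = (pi * R).sum(axis=1)             # expected reward under pi

# V^pi from the Bellman equation V = r + gamma * P V
V0 = p0 @ np.linalg.solve(np.eye(n_states) - gamma * P_pi, r_pi)

# L^pi by summing the finite-time mixture (truncated at T=2000)
L, p_t = 0.0, p0.copy()
for T in range(2000):
    L += (1 - gamma) * gamma**T * (p_t @ r_pi)   # P(T) * E[r_T]
    p_t = p_t @ P_pi

print(L, (1 - gamma) * V0)   # the two quantities coincide
```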

  9. POMDP application
• in POMDPs the agent needs some kind of memory
[figure: dynamic Bayesian network with internal memory nodes b0, b1, b2, observations y0, y1, y2, actions a0, a1, a2, hidden states x0, x1, x2, and rewards r0, r1, r2]
• mazes: T-junctions, halls & corridors (379 locations, 1516 states)
(Toussaint, Harmeling & Storkey, 2006)
(a controller-evaluation sketch follows below)
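One common form of such memory is a finite-state controller: action probabilities ψ(a|n) attached to internal nodes n, and observation-driven node transitions η(n'|n,o). Its value satisfies a linear Bellman equation over the joint (node, state) Markov chain, as in the minimal sketch below (my own toy example with random parameters; this is not the maze domain from the slide).

```python
import numpy as np

rng = np.random.default_rng(3)
nN, nX, nA, nO, gamma = 3, 4, 2, 2, 0.95

P = rng.random((nA, nX, nX)); P /= P.sum(axis=2, keepdims=True)  # P(x'|x,a)
O = rng.random((nX, nO));     O /= O.sum(axis=1, keepdims=True)  # P(o|x')
R = rng.random((nX, nA))                                         # E[r|x,a]
psi = rng.dirichlet(np.ones(nA), size=nN)                        # psi[n,a]
eta = rng.dirichlet(np.ones(nN), size=(nN, nO))                  # eta[n,o,n']

# joint chain over (n, x):
# T[(n,x),(n',x')] = sum_{a,o} psi(a|n) P(x'|x,a) O(o|x') eta(n'|n,o)
T = np.einsum('na,axy,yo,nom->nxmy', psi, P, O, eta).reshape(nN*nX, nN*nX)
r = np.einsum('na,xa->nx', psi, R).reshape(-1)   # expected reward per (n,x)

# V solves the linear Bellman equation V = r + gamma * T V
V = np.linalg.solve(np.eye(nN * nX) - gamma * T, r).reshape(nN, nX)
print("V(n=0, x) =", V[0])
```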

  10. POMDP application
• UAI paper presented on Friday: Marc Toussaint, Laurent Charlin, Pascal Poupart: Hierarchical POMDP Controller Optimization by Likelihood Maximization
[figure: dynamic Bayesian network of the hierarchical finite-state controller, with controller nodes N0, N1, N2 on three levels of the hierarchy, state S, observation O, and action A]
[table: results on standard POMDP benchmarks – paint (|S|,|A|,|O| = 4, 4, 2), shuttle (8, 3, 5), 4x4 maze (16, 4, 2), chain-of-chains (10, 4, 1), handwashing (84, 7, 12), cheese-taxi (33, 7, 10) – comparing the value V* found by HSVI2, the best results from [1], and the ML approach (controller nodes, runtime t(s), value V, averaged over 10 runs)]

  11. robotic motion inference application
• four task variables:
  – position of right finger
  – collision with objects
  – balance
  – comfortableness
[figure: (MAP) cost over computation time (sec, 0–6) on a log scale (0.1–100), comparing bayes (repeats), bayes (fwd-bwd), gradient (direct), and gradient (spline)]
(Toussaint & Goerick, IROS 2007)
(a toy multi-cost trajectory sketch follows below)
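To give a feel for what such a multi-objective motion problem looks like, here is a toy 2-D point version (entirely my own stand-in: the cost weights, the soft collision term, and plain finite-difference gradient descent are assumptions, not the Bayesian inference or spline methods compared on the slide).

```python
import numpy as np

T = 40
target = np.array([1.0, 0.0])           # "finger position" goal (assumed)
obstacle = np.array([0.5, 0.05])        # assumed obstacle position

q = np.linspace([0.0, 0.0], target, T)  # straight-line initialization

def cost(q):
    smooth = np.sum(np.diff(q, axis=0) ** 2)    # "comfortableness"
    reach = np.sum((q[-1] - target) ** 2)       # end-point target
    d2 = np.sum((q - obstacle) ** 2, axis=1)
    collide = np.sum(np.exp(-d2 / 0.02))        # soft collision cost
    return 10 * reach + smooth + collide

for _ in range(300):                    # gradient descent with a
    g = np.zeros(q.size)                # finite-difference gradient
    for i in range(q.size):
        e = np.zeros(q.size); e[i] = 1e-5
        g[i] = (cost(q + e.reshape(q.shape))
                - cost(q - e.reshape(q.shape))) / 2e-5
    q = q - 0.01 * g.reshape(q.shape)

print(f"final cost {cost(q):.3f}")      # the path bows around the obstacle
```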

  12. on Asimo
• Toussaint, Gienger & Goerick (Humanoids 2007): Optimization of sequential attractor-based movement for compact behavior generation (other technique than inference)
[three movement snapshots:
  – Time: 3s, Control points: 8, Controlled: both hands position and attitude
  – Time: 3s, Control points: 4, Controlled: left hand position and attitude
  – Time: 4s, Control points: 10, Controlled: both hands position and attitude]
(a toy attractor rollout follows below)
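A minimal reading of "attractor-based movement" in one dimension (my own sketch; the gains, the timing, and the control-point values are assumptions): each control point in turn acts as the attractor of critically damped point dynamics, and the trajectory is the rollout of that switching system. The paper optimizes the control points themselves; here we only simulate the rollout.

```python
import numpy as np

control_points = [0.3, 0.8, 0.5]      # sequential attractor targets (assumed)
dt, steps_per_point = 0.01, 100       # 1 second per attractor
kp = 50.0
kd = 2.0 * np.sqrt(kp)                # critical damping

x, v, traj = 0.0, 0.0, []
for goal in control_points:           # each point attracts in turn
    for _ in range(steps_per_point):
        a = kp * (goal - x) - kd * v  # PD attractor dynamics
        v += dt * a
        x += dt * v
        traj.append(x)

print(f"final position {traj[-1]:.3f} (last control point was 0.5)")
```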

  13. model learning
• control of a dynamic robot system
  – dynamics: f: (x, ẋ, u) ↦ ẍ
  – learning the inverse model φ: (x, ẋ, ẍ*) ↦ u
[learn] [pole]
(methods: A. Moore, C. Atkeson, S. Schaal, S. Vijayakumar, et al.)
(a toy regression sketch follows below)
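In the spirit of the regression methods cited, here is a toy version of learning the inverse model for a single pendulum (my own sketch; the pendulum parameters, the feature choice, and plain ridge regression stand in for the locally weighted methods of the authors named above). With features [ẍ, sin x, ẋ] the true inverse model is linear, so the regression recovers it.

```python
import numpy as np

m, g, l, b = 1.0, 9.81, 1.0, 0.1        # assumed pendulum parameters

def forward(x, xdot, u):                # f: (x, xdot, u) -> xddot
    return (u - m * g * l * np.sin(x) - b * xdot) / (m * l**2)

rng = np.random.default_rng(4)
N = 500
x = rng.uniform(-np.pi, np.pi, N)
xdot = rng.normal(0.0, 2.0, N)
u = rng.normal(0.0, 5.0, N)
xddot = forward(x, xdot, u) + rng.normal(0.0, 0.01, N)  # noisy data

# ridge regression for phi: (x, xdot, xddot*) -> u
Phi = np.column_stack([xddot, np.sin(x), xdot])
w = np.linalg.solve(Phi.T @ Phi + 1e-6 * np.eye(3), Phi.T @ u)

# query: torque needed for a desired acceleration at a given state
x_q, xdot_q, xddot_star = 0.5, 0.0, 1.0
u_pred = np.array([xddot_star, np.sin(x_q), xdot_q]) @ w
u_true = m * l**2 * xddot_star + m * g * l * np.sin(x_q) + b * xdot_q
print(f"learned {u_pred:.3f} vs analytic {u_true:.3f}")
```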

  14. conclusions
[figure: concept map relating the core of optimal control (DP, Bellman, HJB, LQG), RL (Value Iteration, TD, Q-learning, Bayesian RL, E^3), inference (likelihood maximization, posterior trajectories/control, path integrals, graphical models), MDPs, state estimation, and sensor processing]
• exciting potential for Machine Learning methods
  – structured state, abstraction, learning, integration
• integrative view from ML perspective possible
