SLIDE 1 Value Function Approximation
for Robot Motor Control
Masashi Sugiyama1)2) Hirotaka Hachiya1)2) Christopher Towell2) Sethu Vijayakumar2)
1) Computer Science, Tokyo Institute of Technology 2) School of Informatics, University of Edinburgh
SLIDE 2
Maze Problem: Guide Robot to Goal
The robot knows its position (x, y) but not which direction to go. We do not teach the best action at each position; we only give a reward at the goal. Task: make the robot select the optimal action at every position.
[Figure: maze; possible actions: up, down, left, right; reward given at the goal]
SLIDE 3
Markov Decision Process (MDP)
An MDP consists of
- S: set of states,
- A: set of actions, e.g. {up, down, left, right},
- P(s, a, s'): transition probability,
- R(s, a): reward.
The action a the robot takes at state s is specified by a policy π: a = π(s). Goal: make the robot learn the optimal policy π*.
SLIDE 4
Definition of Optimal Policy
Action-value function Q^π(s, a): discounted sum of future rewards when taking action a in state s and following policy π thereafter:
Q^π(s, a) = E[ Σ_{t=1}^∞ γ^{t-1} r_t | s_1 = s, a_1 = a ]
Optimal value: Q*(s, a) = max_π Q^π(s, a)
Optimal policy: π*(s) = argmax_a Q*(s, a)
Question: How to compute Q*?
SLIDE 5
Policy Iteration
(Sutton & Barto, 1998)
Starting from some initial policy π, iterate Steps 1 and 2 until convergence.
Step 1. Compute Q^π for the current policy π:
Q^π(s, a) = E[ Σ_{t=1}^∞ γ^{t-1} r_t | s_1 = s, a_1 = a ]
Step 2. Update π by π(s) ← argmax_a Q^π(s, a)
Policy iteration always converges to π* if Q^π(s, a) can be computed in Step 1. Question: How to compute Q^π(s, a)?
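The two steps can be sketched as a minimal tabular policy iteration on a hypothetical 1×3 corridor (Step 1 approximates Q^π by sweeping to a fixed point, Step 2 is the greedy update):

```python
# Policy iteration on a hypothetical 1x3 corridor MDP (state 2 = absorbing goal).
GAMMA = 0.9
S = [0, 1, 2]
A = ["left", "right"]

def step(s, a):
    """Deterministic transition and reward; the goal state is absorbing."""
    if s == 2:
        return 2, 0.0
    s_next = max(s - 1, 0) if a == "left" else s + 1
    reward = 1.0 if s_next == 2 else 0.0
    return s_next, reward

def evaluate(pi, n_sweeps=200):
    """Step 1: compute Q^pi by sweeping its defining expectation to a fixed point."""
    Q = {(s, a): 0.0 for s in S for a in A}
    for _ in range(n_sweeps):
        for s in S:
            for a in A:
                s_next, r = step(s, a)
                Q[(s, a)] = r + GAMMA * Q[(s_next, pi[s_next])]
    return Q

def improve(Q):
    """Step 2: greedy update pi(s) = argmax_a Q^pi(s, a)."""
    return {s: max(A, key=lambda a: Q[(s, a)]) for s in S}

pi = {s: "left" for s in S}     # some initial policy
for _ in range(10):              # iterate Steps 1 and 2 until convergence
    pi = improve(evaluate(pi))
# States 0 and 1 now choose "right", i.e. move toward the goal.
```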
SLIDE 6
Bellman Equation
- Q^π(s, a) can be recursively expressed by the Bellman equation:
Q^π(s, a) = R(s, a) + γ Σ_{s'} P(s, a, s') Q^π(s', π(s'))
- Q^π(s, a) can be computed by solving the Bellman equation for all (s, a) ∈ S × A.
Drawback: the dimensionality of the Bellman equation becomes huge in large state and action spaces → high computational cost.
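Since the Bellman equation is linear in Q^π, a finite MDP can be solved directly; a sketch on a toy 1×3 corridor, assuming NumPy is available. The dense |S||A|-sized solve is exactly what becomes infeasible for large spaces:

```python
# Solving the Bellman equation Q^pi = R + gamma * P_pi Q^pi as a linear
# system, on a hypothetical 1x3 corridor MDP (assumes NumPy is available).
import numpy as np

GAMMA = 0.9
S, A = [0, 1, 2], ["left", "right"]
idx = {(s, a): i for i, (s, a) in enumerate((s, a) for s in S for a in A)}
pi = {0: "right", 1: "right", 2: "right"}   # policy to evaluate

n = len(idx)
P_pi = np.zeros((n, n))   # transition matrix under pi, over (s, a) pairs
R = np.zeros(n)
for (s, a), i in idx.items():
    if s == 2:                              # goal is absorbing, zero reward
        s_next, r = 2, 0.0
    else:
        s_next = max(s - 1, 0) if a == "left" else s + 1
        r = 1.0 if s_next == 2 else 0.0
    P_pi[i, idx[(s_next, pi[s_next])]] = 1.0
    R[i] = r

# (I - gamma * P_pi) Q = R: one dense solve of size |S||A|, which is
# precisely the cost that blows up in large state/action spaces.
Q = np.linalg.solve(np.eye(n) - GAMMA * P_pi, R)
```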
SLIDE 7
Least-Squares Policy Iteration
(Lagoudakis and Parr, 2003)
Linear architecture:
Q̂^π(s, a) = Σ_{i=1}^K w_i φ_i(s, a)
- {φ_i(s, a)}_{i=1}^K: fixed basis functions
- {w_i}_{i=1}^K: parameters
- K: number of basis functions
The parameters {w_i}_{i=1}^K are learned so as to optimally approximate the Bellman equation in the least-squares sense. The number of parameters is only K (K << |S × A|): LSPI works well if we choose appropriate basis functions {φ_i}_{i=1}^K. Question: How to choose {φ_i}_{i=1}^K?
SLIDE 8
Popular Choice: Gaussian Kernel (GK)
k(s) = exp( − ED(s, s_c)² / (2σ²) )
- ED: Euclidean distance
- s_c: centre state
The smooth Gaussian tail goes over partitions (walls) of the maze.
[Figure: Gaussian kernel centred at s_c, with its tail crossing the partitions]
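As a concrete illustration, the ordinary Gaussian kernel on 2-D positions (pure Python; the states and σ are arbitrary):

```python
# Ordinary Gaussian kernel on 2-D positions, using the Euclidean distance ED.
import math

def gaussian_kernel(s, s_c, sigma=1.0):
    """k(s) = exp(-ED(s, s_c)^2 / (2 sigma^2))."""
    ed_sq = (s[0] - s_c[0]) ** 2 + (s[1] - s_c[1]) ** 2
    return math.exp(-ed_sq / (2.0 * sigma ** 2))

# Two states on opposite sides of a thin wall are close in Euclidean
# distance, so the kernel value stays large: the tail crosses the wall.
k_centre = gaussian_kernel((0.0, 0.0), (0.0, 0.0))   # 1.0 at the centre state
```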
SLIDE 9
Approximated Value Function by GK
Values around the partitions are not approximated well.
[Figure: optimal value function vs. GK approximation (log scale; 20 randomly located Gaussians)]
SLIDE 10
Policy Obtained by GK
GK provides an undesired policy around the partition.
[Figure: GK-based policy vs. optimal policy]
SLIDE 11
Aim of This Research
Gaussian tails go over the partition, so ordinary Gaussian kernels are not suited for approximating discontinuous value functions. We propose a new Gaussian kernel to overcome this problem.
SLIDE 12
State Space as a Graph
(Mahadevan, ICML 2005)
The ordinary Gaussian kernel uses the Euclidean distance ED:
k(s) = exp( − ED(s, s_c)² / (2σ²) )
The Euclidean distance does not incorporate the state space structure, so tail problems occur. We represent the state space structure by a graph and use it for defining Gaussian kernels.
SLIDE 13 13
Natural distance on graph is shortest path. We use shortest path in Gaussian function. We call this kernel geodesic Gaussian. SP can be efficiently computed by Dijkstra.
Geodesic Gaussian Kernels
⎟ ⎟ ⎠ ⎞ ⎜ ⎜ ⎝ ⎛− =
2 2
2 ) , ( exp ) ( σ s s SP s k
c
Euclidean distance Shortest path
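A sketch of the geodesic Gaussian kernel: Dijkstra's algorithm gives the shortest-path distance SP on the state-space graph, which replaces ED in the Gaussian. The 4-node graph below is a hypothetical grid with one wall:

```python
# Geodesic Gaussian kernel: shortest-path (SP) distances on the state-space
# graph via Dijkstra, plugged into the Gaussian function.
import heapq
import math

def dijkstra(graph, source):
    """Shortest-path distance from source to every node (graph: node -> [(nbr, w)])."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue
        for v, w in graph[u]:
            if d + w < dist.get(v, float("inf")):
                dist[v] = d + w
                heapq.heappush(heap, (d + w, v))
    return dist

def geodesic_gaussian(graph, s_c, sigma=1.0):
    """k(s) = exp(-SP(s, s_c)^2 / (2 sigma^2)) for every state s."""
    sp = dijkstra(graph, s_c)
    return {s: math.exp(-d ** 2 / (2.0 * sigma ** 2)) for s, d in sp.items()}

# 4 states; a wall blocks the direct edge between states 0 and 2, so their
# geodesic distance is 3 even though they are spatially adjacent.
graph = {0: [(1, 1.0)], 1: [(0, 1.0), (3, 1.0)],
         3: [(1, 1.0), (2, 1.0)], 2: [(3, 1.0)]}
k = geodesic_gaussian(graph, s_c=0, sigma=2.0)
```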
SLIDE 14
Example of Kernels
Geodesic Gaussian: tails do not go across the partition; values smoothly decrease along the maze.
[Figure: geodesic Gaussian vs. ordinary Gaussian kernels centred at s_c]
SLIDE 15
Comparison of Value Functions
With the geodesic Gaussian, values near the partition are well approximated and the discontinuity across the partition is preserved.
[Figure: optimal, geodesic Gaussian, and ordinary Gaussian value functions]
SLIDE 16
Comparison of Policies
Geodesic Gaussian kernels (GGKs) provide good policies near the partition.
[Figure: geodesic Gaussian vs. ordinary Gaussian policies]
SLIDE 17
Experimental Result
Ordinary Gaussian: tail problem. Geodesic Gaussian: no tail problem.
[Graph: fraction of optimal states vs. number of kernels, averaged over 100 runs]
SLIDE 18
Robot Arm Reaching
2-DOF robot arm. Task: move the end effector to reach the object.
State space: (joint 1 angle, joint 2 angle), in degrees.
Reward: +1 when the end effector reaches the object, 0 otherwise.
[Figure: arm with joints 1 and 2, end effector, object, and obstacle; state space over the two joint angles]
SLIDE 19
Robot Arm Reaching
Geodesic Gaussian: successfully avoids the obstacle and reaches the object.
Ordinary Gaussian: moves directly towards the object without avoiding the obstacle.
SLIDE 20
Khepera Robot Navigation
The Khepera robot has 8 infrared (IR) sensors measuring the distance to obstacles (sensor values: 0 to 1030).
Task: explore an unknown maze without collision.
Reward: +1 (forward); (others)
SLIDE 21
State Space and Graph
The 8-dimensional sensor state space is discretized by a self-organizing map.
[Figure: partitions and a 2-D visualization of the resulting graph]
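The discretization step can be sketched with a minimal self-organizing map; the 2-D data, grid size, and learning schedule below are illustrative stand-ins for the 8-D sensor vectors (NumPy assumed):

```python
# A minimal self-organizing map sketch: each SOM unit becomes one node of
# the state graph (toy 2-D data standing in for 8-D sensor vectors).
import numpy as np

rng = np.random.default_rng(0)
data = rng.uniform(0, 1030, size=(500, 2))      # simulated sensor readings
units = rng.uniform(0, 1030, size=(9, 2))       # 3x3 grid of SOM units
grid = np.array([(i, j) for i in range(3) for j in range(3)])  # unit layout

for t in range(1000):
    x = data[rng.integers(len(data))]
    bmu = np.argmin(np.sum((units - x) ** 2, axis=1))   # best-matching unit
    # Learning rate and neighbourhood radius shrink over time.
    lr = 0.5 * (1 - t / 1000)
    radius = 1.5 * (1 - t / 1000) + 0.5
    h = np.exp(-np.sum((grid - grid[bmu]) ** 2, axis=1) / (2 * radius ** 2))
    units += lr * h[:, None] * (x - units)              # pull units toward x

# Each incoming state is then mapped to its nearest unit (a graph node).
def discretize(s):
    return int(np.argmin(np.sum((units - s) ** 2, axis=1)))
```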
SLIDE 22
Khepera Robot Navigation
Geodesic Gaussian: when facing an obstacle, makes a turn (and goes forward again).
Ordinary Gaussian: when facing an obstacle, goes backward (and then forward again).
SLIDE 23
Experimental Results
Geodesic Gaussian outperforms ordinary Gaussian.
[Graph: performance of geodesic vs. ordinary Gaussian, averaged over 30 runs]
SLIDE 24
Conclusion
- Value function approximation requires good basis functions.
- Ordinary Gaussian kernel: the tail goes over discontinuities.
- Geodesic Gaussian kernel: smooth along the state space.
Through the experiments, we showed that the geodesic Gaussian is promising in high-dimensional continuous problems!