Learning to Control an Octopus Arm with Gaussian Process Temporal Difference Methods

Yaakov Engel
Joint work with Peter Szabo and Dmitry Volkinshtein (ex. Technion)
Gaussian Processes in Practice Workshop
Why use GPs in RL?
- A Bayesian approach to value estimation
- Forces us to make our assumptions explicit
- Non-parametric – priors are placed and inference is performed directly in function space (kernels)
- Domain knowledge intuitively coded into priors
- Provides full posterior, not just point estimates
- Efficient, on-line implementation, suitable for large problems
Markov Decision Processes
X: state space, U: action space

Transitions: p : X × X × U → [0, 1] ,  x_{t+1} ~ p(·|x_t, u_t)
Rewards: q : ℝ × X × U → [0, 1] ,  R(x_t, u_t) ~ q(·|x_t, u_t)
Stationary policy: µ : U × X → [0, 1] ,  u_t ~ µ(·|x_t)

Discounted return: D^µ(x) = Σ_{i=0}^∞ γ^i R(x_i, u_i) , given x_0 = x

Value function: V^µ(x) = E_µ[D^µ(x)]

Goal: Find a policy µ* maximizing V^µ(x) for all x ∈ X
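As a concrete illustration, D^µ(x) can be estimated by truncated Monte Carlo rollouts. A minimal sketch in Python, assuming hypothetical `policy` and `step` callables standing in for µ and the transition/reward model:

import numpy as np

def discounted_return(x0, policy, step, gamma=0.95, horizon=1000, rng=None):
    """Monte Carlo estimate of D^mu(x0) = sum_i gamma^i R(x_i, u_i)."""
    rng = rng or np.random.default_rng()
    x, total, discount = x0, 0.0, 1.0
    for _ in range(horizon):            # truncate the infinite sum
        u = policy(x, rng)              # u_t ~ mu(.|x_t)
        x, r = step(x, u, rng)          # x_{t+1} ~ p(.|x_t, u_t), r ~ q(.|x_t, u_t)
        total += discount * r
        discount *= gamma
    return total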
Bellman’s Equation
For a fixed policy µ:

V^µ(x) = E_{x′,u|x}[ R(x, u) + γ V^µ(x′) ]

Optimal value and policy:

V*(x) = max_µ V^µ(x) ,   µ* = argmax_µ V^µ(x)

How to solve it?
- Methods based on Value Iteration (e.g. Q-learning)
- Methods based on Policy Iteration (e.g. SARSA, OPI,
Actor-Critic)
Solution Method Taxonomy
(Diagram) RL algorithms divide into value-function based methods and purely policy based methods (policy gradient); the value-function based branch splits into value iteration type (Q-learning) and policy iteration type (actor-critic, OPI, SARSA).
PI methods need a “subroutine” for policy evaluation
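For a finite state space, the simplest such subroutine is tabular TD(0). A minimal sketch (not the GPTD method itself); `policy` and `step` are again hypothetical callables, and states are assumed to be integer-indexed:

import numpy as np

def td0_evaluate(policy, step, n_states, x0, gamma=0.95,
                 alpha=0.1, n_steps=10_000, rng=None):
    """Estimates V^mu by bootstrapped one-step updates."""
    rng = rng or np.random.default_rng()
    V = np.zeros(n_states)
    x = x0
    for _ in range(n_steps):
        u = policy(x, rng)
        x_next, r = step(x, u, rng)
        # TD(0): move V(x) toward the bootstrapped target r + gamma V(x')
        V[x] += alpha * (r + gamma * V[x_next] - V[x])
        x = x_next
    return V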
Gaussian Process Temporal Difference Learning
Model Equations: R(x_i) = V(x_i) − γ V(x_{i+1}) + N(x_i)

Or, in compact form: R_t = H_{t+1} V_{t+1} + N_t, where H_t is the bidiagonal matrix

        [ 1  −γ              ]
H_t  =  [     1  −γ          ]
        [         ⋱    ⋱     ]
        [             1  −γ  ]

Our (Bayesian) goal: Find the posterior distribution of V(·), given a sequence of observed states and rewards.
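The operator H_t is easy to materialize. A small sketch of the bidiagonal construction (NumPy, illustrative only):

import numpy as np

def gptd_H(t, gamma):
    """t x (t+1) matrix with 1 on the diagonal and -gamma above it,
    so that (H @ V)[i] = V(x_i) - gamma * V(x_{i+1})."""
    H = np.zeros((t, t + 1))
    idx = np.arange(t)
    H[idx, idx] = 1.0
    H[idx, idx + 1] = -gamma
    return H

# gptd_H(3, 0.9) ->
# [[ 1.  -0.9  0.   0. ]
#  [ 0.   1.  -0.9  0. ]
#  [ 0.   0.   1.  -0.9]]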
The Posterior
General noise covariance: Cov[N_t] = Σ_t

Joint distribution:

[ R_{t−1} ]       (     [ H_t K_t H_t^⊤ + Σ_t   H_t k_t(x) ] )
[ V(x)    ]  ~  N ( 0 , [ k_t(x)^⊤ H_t^⊤        k(x, x)    ] )

Invoke Bayes’ Rule:

E[V(x) | R_{t−1}] = k_t(x)^⊤ α_t
Cov[V(x), V(x′) | R_{t−1}] = k(x, x′) − k_t(x)^⊤ C_t k_t(x′)

where k_t(x) = (k(x_1, x), . . . , k(x_t, x))^⊤
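A direct, non-sparse computation of this posterior, for illustration only; the online GPTD algorithm instead updates α_t and C_t recursively, and white reward noise Σ_t = σ²I is assumed here for simplicity:

import numpy as np

def gptd_posterior(X, r, k, gamma, sigma2):
    """X: the t+1 states visited; r: the t rewards observed along the way.
    Returns (alpha, C) such that, for k_vec(x) = [k(xi, x) for xi in X]:
      posterior mean  E[V(x)|r]           = k_vec(x) @ alpha
      posterior cov   Cov[V(x), V(x')|r]  = k(x, x') - k_vec(x) @ C @ k_vec(x')."""
    t = len(r)
    K = np.array([[k(xi, xj) for xj in X] for xi in X])   # Gram matrix, (t+1, t+1)
    H = np.zeros((t, t + 1))                              # bidiagonal GPTD operator
    H[np.arange(t), np.arange(t)] = 1.0
    H[np.arange(t), np.arange(t) + 1] = -gamma
    Sigma = sigma2 * np.eye(t)                            # white reward noise (a simplification)
    G = np.linalg.inv(H @ K @ H.T + Sigma)
    alpha = H.T @ G @ np.asarray(r)
    C = H.T @ G @ H
    return alpha, C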
The Octopus Arm
- Can bend and twist at any point
- Can do this in any direction
- Can be elongated and shortened
- Can change cross section
- Can grab using any part of the arm
- Virtually infinitely many DOF
The Muscular Hydrostat Mechanism
A constraint: muscle fibers can only contract (actively). In vertebrate limbs, two separate muscle groups - agonists and antagonists - are used to control each DOF of every joint by exerting opposite torques.

But the octopus has no skeleton! Think of a balloon: squeeze it in one direction and it bulges out in another. Muscle tissue is incompressible, so if muscles are arranged such that different muscle groups are interleaved in perpendicular directions in the same region, contraction in one direction will result in extension in at least one of the other directions. This is the muscular hydrostat mechanism.
Octopus Arm Anatomy 101
Our Arm Model
(Figure: the arm is modeled as compartments C_1, . . . , C_N; labels: dorsal side, ventral side, muscle pairs #1 through #N+1, longitudinal and transverse muscles, arm base and arm tip.)
The Muscle Model
f(a) = (k_0 + a (k_max − k_0)) (ℓ − ℓ_0) + β dℓ/dt ,   a ∈ [0, 1]
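In code, this spring-damper muscle model is a one-liner. A sketch with illustrative constants (the actual parameter values are not given here):

def muscle_force(a, ell, dell_dt, k0=1.0, kmax=3.0, ell0=1.0, beta=0.1):
    """f(a) = (k0 + a*(kmax - k0)) * (ell - ell0) + beta * d(ell)/dt,
    with activation a in [0, 1]; k0, kmax, ell0, beta are placeholders."""
    assert 0.0 <= a <= 1.0
    k = k0 + a * (kmax - k0)          # activation scales the stiffness
    return k * (ell - ell0) + beta * dell_dt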
Other Forces
- Gravity
- Buoyancy
- Water drag
- Internal pressures (maintain constant compartmental volume)
Dimensionality
10 compartments ⇒ 22 point masses × (x, y, ẋ, ẏ) = 88 state variables
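The arithmetic behind the 88 state variables, as a sketch (assuming, per the figure, that the 22 masses are the 11 compartment boundaries times two, dorsal and ventral):

import numpy as np

N_COMPARTMENTS = 10
N_MASSES = 2 * (N_COMPARTMENTS + 1)   # 22 point masses
state = np.zeros((N_MASSES, 4))       # columns: x, y, x_dot, y_dot
assert state.size == 88               # the 88 state variables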
The Control Problem
Starting from a random position, bring {any part, tip} of the arm into contact with a goal region, optimally.

Optimality criteria: time, energy, obstacle avoidance
Constraint: we only have access to sampled trajectories
Our approach: define the problem as an MDP and apply reinforcement learning algorithms
The Task
(Figure: snapshot of the simulated arm at t = 1.38.)
Actions
Each action specifies a set of fixed activations - one for each muscle in the arm.

(Figure: the six base activation patterns, Actions #1 through #6.)
Base rotation adds duplicates of actions 1, 2, 4, and 5, with positive and negative torques applied to the base (see the sketch below).
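A hedged sketch of how such a discrete action set could be assembled; the function name, activation patterns, and torque magnitude are placeholders, not the values actually used:

import numpy as np

def make_actions(base_actions, rotated_ids, torque=1.0):
    """base_actions: list of per-muscle activation vectors.
    Returns a list of (activations, base_torque) pairs."""
    actions = [(np.asarray(a), 0.0) for a in base_actions]
    for i in rotated_ids:                  # e.g. actions 1, 2, 4, 5
        for tau in (+torque, -torque):     # positive and negative base torque
            actions.append((np.asarray(base_actions[i]), tau))
    return actions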
Rewards
Deterministic rewards:
- +10 for a goal state
- a large negative value for hitting an obstacle
- −1 otherwise

Energy economy: a constant multiple of the energy expended by the muscles in each action interval was deducted from the reward, as sketched below.
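The reward scheme as a sketch; the obstacle penalty and energy coefficient values are assumptions, not taken from the slides:

def reward(reached_goal, hit_obstacle, energy_used,
           goal_reward=10.0, obstacle_penalty=-100.0, energy_coef=0.01):
    """+10 at the goal, a large negative value on obstacle contact,
    -1 per step otherwise; the energy cost is deducted in every case."""
    if reached_goal:
        r = goal_reward
    elif hit_obstacle:
        r = obstacle_penalty
    else:
        r = -1.0
    return r - energy_coef * energy_used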
Fixed Base Task I
Fixed Base Task II
Rotating Base Task I
Rotating Base Task II
To Wrap Up
- There’s more to GPs than regression and classification
- Online sparsification works
Challenges
- How to use value uncertainty?
- What’s a disciplined way to select actions?
- What’s the best noise covariance?
- More complicated tasks