

SLIDE 1

Learning to Control an Octopus Arm with Gaussian Process Temporal Difference Methods

Yaakov Engel

Collaborators: Peter Szabo and Dmitry Volkinshtein

SLIDE 2

Bayesian RL Tutorial 2/16

SLIDE 3

The Octopus Arm

  • Can bend and twist at any point
  • Can do this in any direction
  • Can be elongated and shortened
  • Can change cross section
  • Can grab using any part of the arm
  • Virtually infinitely many DOF

SLIDE 4

Octopus Arm Anatomy 101

SLIDE 5

Our Arm Model

[Figure: arm-model schematic showing compartments C1 … CN, from muscle pair #1 at the arm base to pair #N+1 at the arm tip, with longitudinal muscles along the dorsal and ventral sides and transverse muscles spanning the arm.]

SLIDE 6

The Muscle Model


f(a) = (k0 + a (kmax − k0)) (ℓ − ℓ0) + β dℓ/dt,  where a ∈ [0, 1]
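The formula is a damped spring whose stiffness is interpolated between a passive value k0 (activation a = 0) and a maximal value kmax (a = 1). A minimal sketch, with illustrative parameter values that are assumptions rather than numbers from the talk:

```python
def muscle_force(a, length, dlength_dt, k0=1.0, kmax=10.0, l0=0.1, beta=0.5):
    """f(a) = (k0 + a*(kmax - k0)) * (l - l0) + beta * dl/dt, with a in [0, 1].

    Parameter values here are illustrative assumptions, not the talk's.
    """
    if not 0.0 <= a <= 1.0:
        raise ValueError("activation a must lie in [0, 1]")
    stiffness = k0 + a * (kmax - k0)          # linear in the activation
    return stiffness * (length - l0) + beta * dlength_dt
```

Note that at rest length and zero contraction velocity the force vanishes for every activation level; activation only changes how stiffly the muscle resists being away from ℓ0.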

SLIDE 7

Other Forces

  • Gravity
  • Buoyancy
  • Water drag
  • Internal pressures (maintain constant compartmental volume)

Dimensionality

10 compartments ⇒ 22 point masses, each with (x, y, ẋ, ẏ) ⇒ 88 state variables
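The bookkeeping behind that count can be checked directly: N compartments are bounded by N+1 muscle pairs, i.e. 2(N+1) point masses, each carrying a planar position and velocity.

```python
def state_dimension(n_compartments):
    """Planar arm: N compartments -> N+1 mass pairs -> 2*(N+1) point masses,
    each contributing (x, y, x_dot, y_dot) to the state vector."""
    n_masses = 2 * (n_compartments + 1)
    return n_masses * 4

# 10 compartments -> 22 point masses -> 88 state variables
```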

SLIDE 8

The Control Problem

Starting from a random position, bring {any part, tip} of the arm into contact with a goal region, optimally.

Optimality criteria: time, energy, obstacle avoidance

Constraint: we only have access to sampled trajectories

Our approach: define the problem as an MDP and apply Reinforcement Learning algorithms
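The RL machinery behind the talk is GPTD: a Gaussian process prior over the value function, conditioned on sampled trajectories through the TD model r(x_i) = V(x_i) − γV(x_{i+1}) + noise. As a rough illustration only (a batch sketch, not the talk's online, sparsified algorithm; 1-D states, an RBF kernel, and i.i.d. noise are simplifying assumptions):

```python
import numpy as np

def rbf(a, b, ell=1.0):
    """Squared-exponential kernel matrix between two sets of 1-D states."""
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / ell) ** 2)

def gptd_value(states, rewards, query, gamma=0.9, sigma=0.1, ell=1.0):
    """Batch GPTD sketch: model r_i = V(x_i) - gamma * V(x_{i+1}) + noise
    with a GP prior on V; return the posterior mean of V at `query`."""
    t = len(states)
    H = np.zeros((t - 1, t))            # TD differencing operator
    for i in range(t - 1):
        H[i, i] = 1.0
        H[i, i + 1] = -gamma
    K = rbf(states, states, ell)        # prior covariance of V on the trajectory
    G = H @ K @ H.T + sigma ** 2 * np.eye(t - 1)
    alpha = H.T @ np.linalg.solve(G, rewards)
    return rbf(query, states, ell) @ alpha
```

With γ = 0 the model degenerates to GP regression of rewards on states, which gives a quick sanity check that the linear algebra is wired up correctly.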

SLIDE 9

The Task

[Figure: snapshot of the simulated arm at t = 1.38; both axes span roughly −0.1 to 0.15.]

SLIDE 10

Actions

Each action specifies a set of fixed activations, one for each muscle in the arm.

[Figure: the six base activation patterns, Action #1 through Action #6.]

Base rotation adds duplicates of actions 1, 2, 4 and 5 with positive and negative torques applied to the base.
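Enumerating the rotating-base action set only needs this counting rule: 6 fixed-base patterns, plus actions 1, 2, 4 and 5 each duplicated with a positive and a negative base torque. The sketch below uses labels only, since the actual activation vectors are not given on the slide:

```python
def build_action_set():
    """Enumerate action labels per the slide: 6 fixed-base activation
    patterns, plus actions 1, 2, 4 and 5 duplicated with +/- base torque.
    Labels only; the underlying activation vectors are not on the slide."""
    actions = [("action", i, 0) for i in range(1, 7)]   # zero base torque
    for i in (1, 2, 4, 5):
        for torque_sign in (+1, -1):
            actions.append(("action", i, torque_sign))
    return actions

# 6 + 2*4 = 14 actions in the rotating-base tasks
```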

SLIDE 11

Rewards

Deterministic rewards:

  • +10 for a goal state,
  • a large negative value for hitting an obstacle,
  • −1 otherwise.

Energy economy: A constant multiple of the energy expended by the muscles in each action interval was deducted from the reward.
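The reward rule fits in a few lines. The obstacle penalty magnitude and the energy coefficient below are illustrative assumptions; the slides only say "large negative value" and "a constant multiple":

```python
def reward(goal_reached, obstacle_hit, energy_used,
           energy_cost=0.01, obstacle_penalty=-50.0):
    """Per the slides: +10 at a goal state, a large negative value for
    hitting an obstacle, -1 otherwise, minus an energy-expenditure term.
    energy_cost and obstacle_penalty are illustrative assumptions."""
    if goal_reached:
        r = 10.0
    elif obstacle_hit:
        r = obstacle_penalty
    else:
        r = -1.0
    return r - energy_cost * energy_used
```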

SLIDE 12

Now, to the Movies...

SLIDE 13

Fixed Base Task I

SLIDE 14

Fixed Base Task II

SLIDE 15

Rotating Base Task I

SLIDE 16

Rotating Base Task II
