SLIDE 1

Trajectory Optimization (this is a draft, to be updated before lecture)

McGill COMP 765 Oct 5th, 2017

SLIDE 2

Recall: LQR

  • Provided a globally optimal control solution in a single pass, under fairly restrictive assumptions
  • As with the KF/EKF, no practical robots meet these assumptions, but we can still make use of the math through clever approximations
  • We will consider several of these approaches today
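As a concrete reminder of what "a single pass" means, here is a minimal finite-horizon LQR sketch. The double-integrator system, weights, and horizon below are illustrative assumptions, not from the slides:

```python
import numpy as np

def lqr_gains(A, B, Q, R, Qf, T):
    """Finite-horizon discrete LQR: one backward Riccati pass yields
    the globally optimal feedback gains K_t for linear dynamics and
    quadratic cost."""
    P = Qf                      # cost-to-go Hessian at the final step
    gains = []
    for _ in range(T):
        # K_t = (R + B^T P B)^{-1} B^T P A
        K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
        P = Q + A.T @ P @ (A - B @ K)   # Riccati recursion
        gains.append(K)
    return gains[::-1]          # ordered t = 0 .. T-1

# Illustrative double integrator (position, velocity) with a force input
dt = 0.1
A = np.array([[1.0, dt], [0.0, 1.0]])
B = np.array([[0.0], [dt]])
Q, R, Qf = np.eye(2), np.array([[0.1]]), 10 * np.eye(2)

x = np.array([[1.0], [0.0]])
for K in lqr_gains(A, B, Q, R, Qf, T=80):
    x = A @ x - B @ (K @ x)     # apply u_t = -K_t x_t
```

Because the dynamics are linear and the cost quadratic, the gains computed in this single backward pass are optimal from any starting state.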
SLIDE 3

Non-linear Extensions

  • Trajectory optimization: solve locally about a path, which is jointly improved:
  • Dynamic programming with local linearization
  • Constrained optimization: cost is objective, dynamics are constraints
  • "Direct" methods that search in the space of policies
  • Approximate the value function with learning approaches
SLIDE 4

Differential Dynamic Programming

  • A "shooting" local trajectory optimization method that builds upon LQR ideas
  • Approximates the familiar value function with a 2nd-order approximation:
  • Computed around a reference trajectory from a forward pass integrating the current policy
  • Δx and Δu are deviations from that reference, where we can assume linearity
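One standard way to write the 2nd-order approximation the slide refers to (following common iLQR/DDP notation, which is an assumption here) expands the action-value function in the deviations δx and δu:

```latex
% Quadratic expansion of Q about the reference pair (x_t, u_t)
Q(\delta x, \delta u) \approx \tfrac{1}{2}
\begin{bmatrix} 1 \\ \delta x \\ \delta u \end{bmatrix}^{\!\top}
\begin{bmatrix}
0    & Q_x^{\top} & Q_u^{\top} \\
Q_x  & Q_{xx}     & Q_{xu} \\
Q_u  & Q_{ux}     & Q_{uu}
\end{bmatrix}
\begin{bmatrix} 1 \\ \delta x \\ \delta u \end{bmatrix}
```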

SLIDE 5

DDP Backwards Pass

  • Solving for the optimal control relative to the current forward "rollout" is called the backwards pass
  • The math follows the LQR pattern:
  • Expand
  • Take the derivative w.r.t. u
  • Compute the control and best value
  • Iterate
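The steps above (expand, differentiate w.r.t. u, compute the control and value) can be sketched for a single time step. The variable names follow common iLQR notation and are assumptions, not from the slides:

```python
import numpy as np

def ilqr_backward_step(fx, fu, lx, lu, lxx, luu, lux, Vx, Vxx):
    """One time step of an iLQR-style backward pass.
    fx, fu: dynamics Jacobians along the nominal rollout.
    lx, lu, lxx, luu, lux: stage-cost derivatives at the nominal point.
    Vx, Vxx: value-function gradient/Hessian from the next time step."""
    # Expand the Q-function around the nominal (x_t, u_t)
    Qx  = lx  + fx.T @ Vx
    Qu  = lu  + fu.T @ Vx
    Qxx = lxx + fx.T @ Vxx @ fx
    Quu = luu + fu.T @ Vxx @ fu
    Qux = lux + fu.T @ Vxx @ fx
    # Setting dQ/du = 0 gives the affine control law du = k + K dx
    k = -np.linalg.solve(Quu, Qu)
    K = -np.linalg.solve(Quu, Qux)
    # Updated value function passed to the preceding time step
    Vx_new  = Qx  + K.T @ Quu @ k + K.T @ Qu  + Qux.T @ k
    Vxx_new = Qxx + K.T @ Quu @ K + K.T @ Qux + Qux.T @ K
    return k, K, Vx_new, Vxx_new
```

When the dynamics are linear and the cost quadratic, this step reduces exactly to the LQR gain computation, which is the "LQR pattern" the slide mentions.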
SLIDE 6

DDP analysis

  • What can DDP do?
  • Swing up pendulums from rest (rapidly!)
  • Grasping motions for robot arms
  • Full-body motions for humanoid robots (and animated humans)
  • What are the limitations?
  • No guarantees about global solution quality
  • Sensitive to starting controller
  • Model knowledge is still needed
  • Posted papers for examples of use in recent research:
  • "Probabilistic Differential Dynamic Programming"
  • "Control-Limited Differential Dynamic Programming"
  • "Guided Policy Search" (states that it uses iLQR… what is the difference?)
SLIDE 7

What about under-actuation?

  • So far our optimal control formulation admits solutions u = ±∞, and this will often be the solution:
  • For example, in our time-optimal 2nd-order linear actuator problem, just accelerate to infinite speed and reach the goal in zero time.
  • While this may be the objective of some cab drivers in Montreal, passengers may prefer limited acceleration.

  • Many methods exist to represent control limits:
  • Penalize large control values using the reward function
  • Form hard constraints and solve a constrained optimization
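The first option can be sketched as a soft penalty added to the stage cost. The limit `u_max` and the penalty weight below are illustrative assumptions:

```python
import numpy as np

def penalized_cost(x, u, Q, R, u_max, penalty_weight=100.0):
    """Quadratic state/control cost plus a soft penalty that
    discourages |u| > u_max, keeping the problem unconstrained."""
    state_cost = x @ Q @ x
    control_cost = u @ R @ u
    # Penalize only the portion of the control beyond the limit
    excess = np.maximum(np.abs(u) - u_max, 0.0)
    return state_cost + control_cost + penalty_weight * (excess ** 2).sum()
```

Inside the limits the cost is unchanged; beyond them the penalty grows quadratically, so an optimizer is pushed back toward feasible controls without hard constraints.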
SLIDE 8

Constrained optimal control

max_π  r(x_t)
s.t.   x_t = f(x_{t−1}, π(x_{t−1}))
       π(x_t) < c   ∀t

  • This can be solved easily for certain classes of reward and constraint
  • Example: sequential quadratic programs used in walking control
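A small direct-transcription example using SciPy's SLSQP solver (an SQP method): states and controls are decision variables, and the dynamics become equality constraints. The double-integrator problem, horizon, and control bounds below are illustrative assumptions:

```python
import numpy as np
from scipy.optimize import minimize

dt, T = 0.1, 10
A = np.array([[1.0, dt], [0.0, 1.0]])
B = np.array([[0.0], [dt]])
x0 = np.array([1.0, 0.0])
goal = np.array([0.0, 0.0])

def unpack(z):
    xs = z[: 2 * T].reshape(T, 2)   # states x_1 .. x_T
    us = z[2 * T :]                 # controls u_0 .. u_{T-1}
    return xs, us

def cost(z):
    _, us = unpack(z)
    return float(np.sum(us ** 2))   # minimize control effort

def dynamics_residual(z):
    """Equality constraints: each state must follow the dynamics,
    and the final state must reach the goal."""
    xs, us = unpack(z)
    prev = np.vstack([x0, xs[:-1]])             # x_0 .. x_{T-1}
    pred = prev @ A.T + us[:, None] * B.ravel() # A x_{t-1} + B u_{t-1}
    return np.concatenate([(xs - pred).ravel(), xs[-1] - goal])

sol = minimize(cost, np.zeros(3 * T), method="SLSQP",
               constraints={"type": "eq", "fun": dynamics_residual},
               bounds=[(None, None)] * (2 * T) + [(-5.0, 5.0)] * T)
xs, us = unpack(sol.x)
```

The hard control limit appears here as simple bounds on the control variables, which SLSQP enforces exactly rather than through a penalty.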
SLIDE 9

What to do if our models are not precise?

  • Recall: this is not an "if" but a "when"!
  • Model error will quickly invalidate the results of our extensive computations as they are recursively applied over time
  • Many exciting solutions for this, but two of the most classical are:
  • Computing control policies while "knowing what we don't know": robust control
  • Limiting the horizon and re-computing our controls often: model-predictive control (MPC)
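The MPC idea can be sketched as re-planning over a short horizon from the measured state at every step and applying only the first control. The deliberate model error below (extra damping in the "true" plant) is an illustrative assumption:

```python
import numpy as np

def first_step_gain(A, B, Q, R, Qf, H):
    """Finite-horizon LQR backward pass; return the first-step gain K_0."""
    P = Qf
    for _ in range(H):
        K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
        P = Q + A.T @ P @ (A - B @ K)
    return K

dt = 0.1
A_model = np.array([[1.0, dt], [0.0, 1.0]])   # our (imperfect) model
B_model = np.array([[0.0], [dt]])
# The true plant has unmodeled velocity damping
A_true = A_model + np.array([[0.0, 0.0], [0.0, -0.02]])
Q, R, Qf = np.eye(2), np.array([[0.1]]), 10 * np.eye(2)

x = np.array([[1.0], [0.0]])
for _ in range(100):
    # Re-plan from the *current* measured state, apply only u_0
    K0 = first_step_gain(A_model, B_model, Q, R, Qf, H=20)
    u = -K0 @ x
    x = A_true @ x + B_model @ u    # the true plant evolves differently
```

For this linear time-invariant model the re-computed gain happens to be constant, so the benefit comes from feeding back the true state each step; with a nonlinear model, each re-plan would also re-linearize about the current trajectory.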