Inverse KKT - Learning Cost functions of Manipulation from Demonstration


Inverse KKT - Learning Cost functions of Manipulation from Demonstration. Englert, P., Vien, N. A., & Toussaint, M. IJRR 2017. Presenter: Yu-Siang Wang.


slide-1
SLIDE 1

Inverse KKT - Learning Cost functions of Manipulation from Demonstration

Englert, P., Vien, N. A., & Toussaint, M. IJRR 2017

Presenter: Yu-Siang Wang

slide-2
SLIDE 2

Outline

  • Problem Statement
  • Contribution
  • Background
  • Methods
  • Experiments & Results
  • Takeaway
slide-3
SLIDE 3

Problem Statement

slide-4
SLIDE 4

Problem Statement

Learn the cost (reward) function from demonstrations → Inverse Optimal Control

slide-5
SLIDE 5

Contribution

slide-6
SLIDE 6

Contribution

  • Learn the cost function (Inverse Optimal Control) using the KKT conditions of the constrained motion optimization problem
  • Two formulations of the cost function: one based on squared hand-crafted features and one based on a kernel method
  • Both formulations reduce to a constrained quadratic optimization problem that can be solved with existing quadratic solvers

slide-7
SLIDE 7

Background
slide-8
SLIDE 8

Background - Optimization

Objective function

slide-9
SLIDE 9

Background - Optimization

s.t.

Objective function Constraint

slide-10
SLIDE 10

Background - Optimization - Lagrangian Multiplier

s.t.

Objective function Constraint Lagrangian function
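
The equations on this slide did not survive extraction. A standard textbook form of an equality-constrained problem and its Lagrangian is sketched below; the symbol names f, h and λ are assumptions for illustration and are not taken from the slide.

```latex
\min_{x} \; f(x) \quad \text{s.t.} \quad h(x) = 0,
\qquad
L(x, \lambda) = f(x) + \lambda^{\top} h(x)
```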

slide-12
SLIDE 12

Background - Optimization

s.t.

Objective function Constraint

slide-13
SLIDE 13

Ref: Geoff Gordon & Ryan Tibshirani Optimization 10-725 / 36-725

slide-15
SLIDE 15

Background - Optimization - KKT

s.t.

Objective function Constraint Lagrangian function First KKT condition
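
The first KKT condition (stationarity) requires the gradient of the Lagrangian with respect to x to vanish at the optimum. The sketch below checks this on a made-up toy problem, minimizing x1^2 + x2^2 subject to x1 + x2 = 1; the problem, the function names and the values are illustrative assumptions, not the slide's own example.

```python
def grad_f(x):
    """Gradient of the toy objective f(x) = x1^2 + x2^2."""
    return [2.0 * x[0], 2.0 * x[1]]


def grad_h(x):
    """Gradient of the toy equality constraint h(x) = x1 + x2 - 1."""
    return [1.0, 1.0]


def lagrangian_gradient(x, lam):
    """Gradient of L(x, lam) = f(x) + lam * h(x) with respect to x."""
    return [gf + lam * gh for gf, gh in zip(grad_f(x), grad_h(x))]


# Known optimum of the toy problem and its multiplier.
x_star = [0.5, 0.5]
lam_star = -1.0

residual = lagrangian_gradient(x_star, lam_star)
print(residual)  # [0.0, 0.0]
```

At any feasible point other than x_star, for example [1.0, 0.0], no single multiplier makes both components vanish, which is what distinguishes the optimum.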

slide-16
SLIDE 16

Background -- Task Settings - Features

Cost function features: differences between the forward kinematics mapping and the object position (given by y)

  • Transition Features: Smoothness of the motion (sum of squared accelerations or torques)
  • Position Features: Represent a body position relative to another body
  • Orientation Features: Represent the orientation of a body relative to another body
slide-17
SLIDE 17

Background -- Task Settings - weighting vector w

Cost function weighting vector w: the weighting vector at time t. It is given in optimal control and has to be solved for in the inverse optimal control scenario.
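
A minimal sketch of how features and weights could combine into a per-time-step cost, assuming the cost is a weighted sum of squared features (the deck describes "square hand-crafted features" but the formula itself was lost). The function names and the numbers are hypothetical.

```python
def position_feature(hand_pos, object_pos):
    """Difference between a (hypothetical) forward-kinematics output and the object position y."""
    return [h - o for h, o in zip(hand_pos, object_pos)]


def step_cost(features, weights):
    """Weighted sum of squared features for a single time step (assumed form)."""
    return sum(w * phi ** 2 for w, phi in zip(weights, features))


phi = position_feature([0.5, 0.25], [0.0, 0.25])  # -> [0.5, 0.0]
cost = step_cost(phi, [2.0, 1.0])
print(cost)  # 0.5
```

In optimal control the weights passed to step_cost are known; in inverse optimal control they are the unknowns.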

slide-18
SLIDE 18

Background -- Task Settings - constraints

Constraints:

  • Inequality constraint: the smallest distance between the forward kinematics mapping and the object position has to be larger than a threshold (e.g. body orientation or relative positions between the robot and an object)
  • Equality constraint: the distance between the hand and the object has to be exactly zero

slide-19
SLIDE 19

Optimal Control and Inverse Optimal Control

slide-20
SLIDE 20

Inverse KKT overview

slide-21
SLIDE 21

Methods

slide-22
SLIDE 22

Inverse Optimal Control -- features method

s.t.

Cost function, Constraint

Goal: Given a demonstration x* and y, find the optimal w

slide-23
SLIDE 23

Inverse Optimal Control -- features method

s.t.

Constraint Lagrangian function First KKT condition Cost function

slide-26
SLIDE 26

Inverse Optimal Control -- features method

If we assume the demonstration x* is optimal, we only need to find the w and λ that make the equation hold. Very hard to do!

slide-27
SLIDE 27

Inverse Optimal Control -- features method

Treat the KKT equation as a loss function and find the optimal w through optimization. Loss function: l; D: number of demonstrations
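
The loss formula itself was lost in extraction. One form consistent with the surrounding slides, offered as an assumption rather than the slide's exact equation, sums the squared norm of the Lagrangian gradient over the D demonstrations; the superscript (d) indexing demonstrations is notation introduced here.

```latex
\ell(w, \lambda) = \sum_{d=1}^{D}
  \left\| \nabla_{x} L\!\left(x^{*(d)}, y^{(d)}, \lambda^{(d)}, w\right) \right\|^{2}
```

The loss is zero exactly when the first KKT condition holds on every demonstration.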

slide-30
SLIDE 30

Inverse Optimal Control -- features method

Goal: Find the optimal w. What is the problem with solving for w? There are two unknown variables here: we don’t know λ! Represent λ in terms of w to obtain an optimization over a single variable.
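
A minimal sketch of eliminating λ, assuming the Lagrangian gradient is linear in both unknowns, i.e. it has the form G w + a λ for a single constraint. Here G and a are hypothetical stand-ins for the cost and constraint derivative terms evaluated at the demonstration. For a fixed w, the λ that minimizes the squared residual has a closed form (a least-squares solution), so the loss becomes a function of w alone.

```python
def matvec(M, v):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def optimal_multiplier(G, a, w):
    """Least-squares lambda for one constraint: minimizes ||G w + a * lam||^2."""
    return -dot(a, matvec(G, w)) / dot(a, a)


def kkt_loss(G, a, w):
    """Squared KKT residual after substituting the optimal lambda(w)."""
    lam = optimal_multiplier(G, a, w)
    residual = [g + ai * lam for g, ai in zip(matvec(G, w), a)]
    return dot(residual, residual)


# Hypothetical derivative terms at one demonstration.
G = [[1.0, 0.0], [0.0, 2.0]]
a = [1.0, 1.0]

lam = optimal_multiplier(G, a, [2.0, 1.0])
print(lam)                          # -2.0
print(kkt_loss(G, a, [2.0, 1.0]))   # 0.0
print(kkt_loss(G, a, [1.0, 1.0]))   # 0.5
```

With several constraints the scalar division becomes a small linear solve, but the idea is the same.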

slide-32
SLIDE 32

Inverse Optimal Control -- features method

Goal: Find the optimal w. The loss is now a function of w only; all the other terms are given.

This gives a constrained quadratic optimization problem in w.

slide-35
SLIDE 35

Inverse Optimal Control -- features method

Goal: Find the optimal w.

Problem: w can be all zeros!

slide-36
SLIDE 36

Inverse Optimal Control -- features method

Goal: Find the optimal w. Add a constraint on w to rule out the trivial all-zero solution.
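
A small sketch of why the constraint matters, assuming the loss has the quadratic form w^T Λ w and assuming a normalization w >= 0 with the weights summing to 1 (the slide's exact constraint was not recoverable). The matrix Lam is hypothetical, and the closed-form solve below only handles two weights; a real problem would be handed to a quadratic programming solver.

```python
def quadratic_loss(Lam, w):
    """Evaluate w^T Lam w."""
    n = len(w)
    return sum(w[i] * Lam[i][j] * w[j] for i in range(n) for j in range(n))


def best_weights_on_simplex_2d(Lam):
    """Minimize w^T Lam w over w = (s, 1 - s) with 0 <= s <= 1 (two weights only)."""
    off = Lam[0][1] + Lam[1][0]
    denom = Lam[0][0] - off + Lam[1][1]
    candidates = [0.0, 1.0]  # endpoints of the segment
    if denom > 0.0:
        # Stationary point of the 1-D quadratic, clipped to the feasible segment.
        s = (Lam[1][1] - 0.5 * off) / denom
        candidates.append(min(1.0, max(0.0, s)))
    best_s = min(candidates, key=lambda s: quadratic_loss(Lam, [s, 1.0 - s]))
    return [best_s, 1.0 - best_s]


# Hypothetical positive semi-definite loss matrix.
Lam = [[0.5, -1.0], [-1.0, 2.0]]

print(quadratic_loss(Lam, [0.0, 0.0]))  # 0.0, the trivial solution without a constraint
w = best_weights_on_simplex_2d(Lam)
print(w)  # approximately [0.667, 0.333]
```

Without the constraint the minimizer is w = 0, which carries no information about the task; with it, the solver has to commit to a relative weighting.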

slide-37
SLIDE 37

Inverse Optimal Control -- features method

Linear Solution

Goal: Find the optimal w. Add a constraint on w, where A is given (it maps one parameter to multiple tasks).

slide-38
SLIDE 38

Inverse Optimal Control -- features method

Nonlinear Solution

Goal: Find the optimal w. Add a constraint on w!

w is a Gaussian function of t. The mean and variance of the Gaussian are described by ρ.
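
A minimal sketch of such a nonlinear parametrization, assuming an unnormalized Gaussian bump over the time index with ρ = (mean, std). The function name and the example values are hypothetical.

```python
import math


def gaussian_weights(mean, std, num_steps):
    """Weight at each time step t = 0..num_steps-1 from an unnormalized Gaussian (assumed form)."""
    return [math.exp(-0.5 * ((t - mean) / std) ** 2) for t in range(num_steps)]


w = gaussian_weights(mean=3.0, std=1.0, num_steps=6)
print([round(v, 3) for v in w])
```

Only two numbers are learned regardless of the trajectory length, at the price of the loss no longer being quadratic in the parameters.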

slide-39
SLIDE 39

Inverse Optimal Control -- features method

Goal: Find the optimal w. The loss is a function of w only; all the other terms are given. The result is a constrained optimization problem in w.

slide-41
SLIDE 41

Method - Kernel Method

Kernel Method: Instead of using hand-crafted features, use features in the kernel space.

Cost function f: α is the weighting vector, k is the RBF kernel function, and the remaining symbols are hyperparameters.
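
A minimal sketch of a kernel-based cost, assuming the common form f(x) = sum_i α_i k(x, x_i) with an RBF kernel. The slide's exact formula was lost, so the centers, the bandwidth parameter and the function names are hypothetical.

```python
import math


def rbf_kernel(a, b, bandwidth):
    """RBF kernel exp(-||a - b||^2 / (2 * bandwidth^2))."""
    sq_dist = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return math.exp(-sq_dist / (2.0 * bandwidth ** 2))


def kernel_cost(x, centers, alpha, bandwidth):
    """Cost as a weighted sum of kernel evaluations against stored points (assumed form)."""
    return sum(a_i * rbf_kernel(x, c, bandwidth) for a_i, c in zip(alpha, centers))


centers = [[0.0, 0.0], [1.0, 0.0]]
alpha = [1.0, 0.5]
value = kernel_cost([0.0, 0.0], centers, alpha, bandwidth=1.0)
print(value)  # 1.0 + 0.5 * exp(-0.5)
```

Because the cost is linear in α, the inverse problem can keep the same quadratic structure as the feature-based formulation.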

slide-43
SLIDE 43

Method - Kernel Method

Goal: Solve for α. The loss function is the quantity to optimize: represent the loss function in terms of α, then solve the resulting constrained problem for α with a quadratic solver.

slide-44
SLIDE 44

Experiments & Results
slide-47
SLIDE 47

Experiments -- toy 2d example

Task: Start from the green point and end at the blue point. There are 6 time steps in total, and time steps 3 and 4 should be in contact with the stick.

Figure panels: Training Set, Testing Set

slide-48
SLIDE 48

Results -- toy 2d example

Ref: Levine and Koltun, Continuous Inverse Optimal Control with Locally Optimal Examples, ICML 2012

Error: sum of absolute differences between the motion resulting from the learned weights w and the reference motion.

Constraint violation: distance to the stick.
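
The error metric described above can be sketched as follows, assuming trajectories are stored as lists of per-time-step coordinate lists. The storage layout and the sample numbers are made up for illustration.

```python
def trajectory_error(result, reference):
    """Sum of absolute coordinate differences between two equally long trajectories."""
    return sum(
        abs(r - q)
        for step_res, step_ref in zip(result, reference)
        for r, q in zip(step_res, step_ref)
    )


result = [[0.0, 0.0], [1.0, 1.5]]
reference = [[0.0, 0.5], [1.0, 1.0]]
print(trajectory_error(result, reference))  # 1.0
```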

slide-49
SLIDE 49

Results -- toy 2d example

Ref: Levine and Koltun, Continuous Inverse Optimal Control with Locally Optimal Examples, ICML 2012

Error: sum of absolute differences between the motion resulting from the learned weights w and the reference motion.

Result: the error with hand-crafted features is much smaller than with the kernel method.

slide-50
SLIDE 50

Results -- toy 2d example

Ref: Levine and Koltun, Continuous Inverse Optimal Control with Locally Optimal Examples, ICML 2012

Constraint violation: distance to the stick.

Result: the constraint violation of IKKT is much smaller than that of CIOC.

slide-51
SLIDE 51

Experiments -- synthetic dataset

Synthetic dataset: longer horizon (50 time steps). The ground-truth weighting vector w is known, but it still has to be learned.

slide-52
SLIDE 52

Experiments

Synthetic dataset: longer horizon (50 time steps). Three methods:

  • Direct param: a separate parameter is learned for each time step
  • RBF param: 30 Gaussians with standard deviation 0.8, uniformly distributed over the 50 time steps
  • Nonlinear Gaussian: a single Gaussian whose mean and standard deviation are the parameters
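
A sketch of the RBF parametrization, assuming the 30 Gaussian centers are spread evenly over time indices 0 to 49 and that the weights are a linear combination w = A θ of the basis functions. The center placement and the names A and θ are assumptions for illustration.

```python
import math


def rbf_basis(num_steps=50, num_basis=30, std=0.8):
    """Basis matrix A (num_steps x num_basis) of Gaussians centered evenly over the time steps."""
    centers = [i * (num_steps - 1) / (num_basis - 1) for i in range(num_basis)]
    return [
        [math.exp(-0.5 * ((t - c) / std) ** 2) for c in centers]
        for t in range(num_steps)
    ]


def weights_from_parameters(A, theta):
    """Linear parametrization w = A theta."""
    return [sum(a * th for a, th in zip(row, theta)) for row in A]


A = rbf_basis()
theta = [1.0] * 30  # hypothetical parameter values
w = weights_from_parameters(A, theta)
print(len(A), len(A[0]), len(w))  # 50 30 50
```

Under the same view, the direct parametrization corresponds to A being the identity, with one parameter per time step.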

slide-53
SLIDE 53

Results

Direct param outperforms the other methods

slide-54
SLIDE 54

Experiments

https://www.youtube.com/watch?v=pO6XNiyJqNw

slide-55
SLIDE 55

Results - Sliding Box on a table

slide-56
SLIDE 56

Takeaway

slide-57
SLIDE 57

Takeaway

  • Learn the cost function with the inverse KKT method for constrained motion optimization
  • The authors proposed two methods: a method based on hand-crafted features and a kernel-based method
  • Both methods can be solved with existing quadratic solvers
slide-58
SLIDE 58

Discussion

  • Hand-crafted features work well. What if the task is too difficult and the hand-crafted features are not good enough?

  • Is a good enough cost function?
slide-59
SLIDE 59

Questions

  • What is the relation between optimal control and inverse optimal control?
  • What is the relation between the loss function in inverse optimal control and the cost function in optimal control?
  • What two main methods do they use?
  • What is the first KKT condition?