SLIDE 1

Third-Person Visual Imitation Learning via Decoupled Hierarchical Control

Pratyusha Sharma, Deepak Pathak, Abhinav Gupta

SLIDE 2

Problem / Goal

Can our robot manipulate a new object given a single human video alone?

SLIDE 3

Why is it hard?

  • Inferring useful information from the video
  • Handling domain shift
  • Every major part of the sequence needs to be executed correctly, e.g., for pouring, the robot needs to reach the cup before twisting its hand
  • The manipulation itself is challenging (6D, novel objects and positioning, no force feedback)

SLIDE 4

Issue

Scenario 1: Sequentially predict the states of the robot arm.

Input: human demonstration + first image of the object
Output: predicted sequence of robot arm states

Issue: not closed loop. No understanding of how the positions of the objects placed in front of the robot change with time!
SLIDE 5

Issue

Scenario 2: Sequentially predict the states of the robot arm.

Input: human demo + robot visual state
Output: predicted sequence of robot arm states

Issue: how do we force the model to use task information from the human demonstration alone, but condition its actions on the current observable state?

SLIDE 6

We want to build a model that can infer the intent of a task from the human demonstration and act in the robot’s current environment to accomplish that task.

SLIDE 7

Approach

[Diagram: Goal Generator (high-level) → Controller (low-level) → action a_t]

We decouple the task of Goal Inference from Local Control
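This decoupling can be sketched as two composable modules. All names and the toy functions below are illustrative stand-ins, not the paper's code: the high-level goal generator hallucinates the next visual goal from the human demo and the current observation, and the low-level controller (inverse model) turns that goal into an action a_t.

```python
# Minimal sketch of the decoupled hierarchy (illustrative names, not the paper's code).
from dataclasses import dataclass
from typing import Callable, List, Sequence

Obs = List[float]       # stand-in for an image observation
Action = List[float]    # stand-in for a robot action a_t

@dataclass
class DecoupledPolicy:
    # High-level: (human demo, o_t) -> hallucinated goal g_{t+1}
    goal_generator: Callable[[Sequence[Obs], Obs], Obs]
    # Low-level: (o_t, g_{t+1}) -> action a_t
    controller: Callable[[Obs, Obs], Action]

    def act(self, demo: Sequence[Obs], obs: Obs) -> Action:
        goal = self.goal_generator(demo, obs)  # hallucinate the next visual state
        return self.controller(obs, goal)      # inverse model predicts the action

# Toy stand-ins: the "goal" moves the observation halfway toward the demo's
# final frame, and the "controller" outputs the displacement needed to reach it.
def toy_goal_generator(demo: Sequence[Obs], obs: Obs) -> Obs:
    target = demo[-1]
    return [o + 0.5 * (t - o) for o, t in zip(obs, target)]

def toy_controller(obs: Obs, goal: Obs) -> Action:
    return [g - o for o, g in zip(obs, goal)]
```

Note that only the goal generator ever sees the human demo; the controller is conditioned purely on the current state and the goal, which is what keeps the loop closed.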

SLIDE 8

Training and Test Scenarios - Data Availability

Training:
  • Human demo video
  • Robot demo video
  • Robot joint angles

Test (deployment):
  • Human demo video
  • Current visible image of the table
SLIDE 9

Approach - Training

[Diagram: Goal Generator (high-level)]

Goal Generator: given the human demo and the present visual state of the robot, we hallucinate the next step

SLIDE 10

Approach - Training

[Diagram: Goal Generator (high-level) → Controller (low-level) → action a_t]

Goal Generator: given the human demo and the present visual state of the robot, we hallucinate the next step

Inverse Model: use the hallucinated prediction together with the current visual state to predict the action!
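Because the two modules are supervised independently, their training pairs can be formed separately from the same data. The sketch below shows one way the targets could be laid out from a robot rollout; the data layout and function names are an assumption for illustration, not the paper's training code.

```python
# Illustrative sketch: separate supervised targets for the two modules,
# formed from a robot rollout (observations o_0..o_T, actions a_0..a_{T-1}).
# Assumed data layout; not the paper's actual training pipeline.

def goal_generator_pairs(human_demo, robot_obs):
    # High-level supervision: (human demo, o_t) -> target next frame o_{t+1}
    return [((human_demo, robot_obs[t]), robot_obs[t + 1])
            for t in range(len(robot_obs) - 1)]

def inverse_model_pairs(robot_obs, robot_actions):
    # Low-level supervision: (o_t, o_{t+1}) -> target action a_t.
    # Note this needs no human data at all, only robot experience.
    return [((robot_obs[t], robot_obs[t + 1]), robot_actions[t])
            for t in range(len(robot_obs) - 1)]
```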

SLIDE 11

Train time: the Goal Generator and Inverse Model are trained separately.
Test time: the Goal Generator and Inverse Model are executed in alternation.
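The test-time alternation amounts to a simple loop: hallucinate a goal, act on it, observe, repeat. The 1-D toy functions below are stand-ins just to exercise the control flow, not the paper's models.

```python
# Test-time execution: alternate between the goal generator and the inverse model.
def run_episode(demo, obs, goal_generator, controller, step, horizon=10):
    actions = []
    for _ in range(horizon):
        goal = goal_generator(demo, obs)  # high-level: hallucinate next visual state
        action = controller(obs, goal)    # low-level: inverse model picks the action
        obs = step(obs, action)           # environment transition
        actions.append(action)
    return actions, obs

# Toy 1-D stand-ins (illustrative only):
demo = [[0.0], [1.0]]
goal_gen = lambda d, o: [o[0] + 0.5 * (d[-1][0] - o[0])]  # halfway toward demo's end
ctrl = lambda o, g: [g[0] - o[0]]                         # action = required displacement
env = lambda o, a: [o[0] + a[0]]

actions, final_obs = run_episode(demo, [0.0], goal_gen, ctrl, env)
```

Because the goal is re-hallucinated from the current observation every step, the loop stays closed: if the world shifts, the next goal accounts for it.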

SLIDE 12

Approach - Test

SLIDE 13

Approach - Train vs. Test

SLIDE 14

Experiments and Results

We evaluate the trained models in three settings:

  • Goal generation model with a perfect inverse model
  • Inverse model with a perfect goal generation model
  • Goal generation model and inverse model in tandem
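The three settings above amount to swapping an oracle ("perfect") module in for whichever component is not under test. A schematic sketch, with purely illustrative names:

```python
# Illustrative: each evaluation setting pairs a learned module with an oracle stand-in,
# isolating the errors contributed by each half of the hierarchy.
def make_eval_settings(learned_goal_gen, learned_inverse,
                       oracle_goal_gen, oracle_inverse):
    return {
        "goal generator + perfect inverse model": (learned_goal_gen, oracle_inverse),
        "perfect goal generator + inverse model": (oracle_goal_gen, learned_inverse),
        "goal generator + inverse model (tandem)": (learned_goal_gen, learned_inverse),
    }
```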
SLIDE 15

Results: Goal generation model with perfect inverse model

SLIDE 16

Results: Inverse model with perfect goal generator

[Plot: ground-truth (GT) trajectory vs. trajectory predicted from GT images]

SLIDE 17

Results: Final experiment runs

SLIDE 18

Results: Final Experimental Runs: Placing in a box

SLIDE 19

Shortcomings:

  • Robot trajectory is shaky: the trajectory looks shaky because the model has no temporal knowledge. Trajectories predicted by inverse models with memory units (LSTMs) look far less shaky, but those models then overfit to the task.

SLIDE 20

Thank you!