Welcome 1 https://www.youtube.com/watch?v=1EpJv34gQ88&t=183s 2 - - PowerPoint PPT Presentation


slide-1
SLIDE 1

1

Welcome

slide-2
SLIDE 2

2

https://www.youtube.com/watch?v=1EpJv34gQ88&t=183s

slide-3
SLIDE 3

3

https://www.youtube.com/watch?v=kVmp0uGtShk&t=55s

slide-4
SLIDE 4

4

Solving a Rubik's cube with a robotic hand (Learning dexterous manipulations)

slide-5
SLIDE 5

5

Outline

  • Why you should care
  • How to train your robotic hand
  • Learning dexterous manipulations
slide-6
SLIDE 6

6

Outline

  • Why you should care
  • How to train your robotic hand
  • Learning dexterous manipulations
slide-7
SLIDE 7

7

Why you should care

  • Human hands are awesome
  • Today, a custom robot is built for every task
  • Learning to use a humanoid hand would give robots more freedom

slide-8
SLIDE 8

8

Outline

  • Why you should care
  • How to train your robotic hand
  • Learning dexterous manipulations
slide-9
SLIDE 9

9

How to train your robotic hand

  • Imitation Learning
  • Simulation

Andrychowicz, Marcin, et al. "Learning dexterous in-hand manipulation." arXiv preprint arXiv:1808.00177 (2018), Figure 3 (left)
https://vcresearch.berkeley.edu/news/berkeley-startup-train-robots-puppets

slide-10
SLIDE 10

10

Simulations

  • Simulate everything
  • Collect a lot of data for training
  • Train the policy in simulation

Akkaya, Ilge, et al. "Solving Rubik's Cube with a Robot Hand.", Figure 7

slide-11
SLIDE 11

11

Reinforcement learning

  • Learning from mistakes
  • Agent, actions, states, and reward
  • The goal is represented through a reward function

https://en.wikipedia.org/wiki/Reinforcement_learning#/media/File:Reinforcement_learning_diagram.svg
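The agent, action, state, and reward loop can be sketched with a toy example. This is a minimal sketch only: `ChainEnv` is a hypothetical five-state environment invented for illustration, and tabular Q-learning is used because it is short, not because it is the method used in the cited work. The goal (reach the right end of the chain) appears nowhere except in the reward.

```python
import random


class ChainEnv:
    """Hypothetical toy environment: walk from state 0 to the last state."""

    def __init__(self, n_states=5):
        self.n_states = n_states
        self.state = 0

    def reset(self):
        self.state = 0
        return self.state

    def step(self, action):
        # Action 0 moves left, action 1 moves right; walls clamp the position.
        move = 1 if action == 1 else -1
        self.state = min(max(self.state + move, 0), self.n_states - 1)
        done = self.state == self.n_states - 1
        # The goal is encoded only here: +1 at the goal, a small cost otherwise.
        reward = 1.0 if done else -0.01
        return self.state, reward, done


def train_q_table(episodes=300, alpha=0.5, gamma=0.95, epsilon=0.2, seed=0):
    rng = random.Random(seed)
    env = ChainEnv()
    q = [[0.0, 0.0] for _ in range(env.n_states)]
    for _ in range(episodes):
        state = env.reset()
        for _ in range(100):  # cap the episode length
            # Explore with probability epsilon, or when both actions look equal.
            if rng.random() < epsilon or q[state][0] == q[state][1]:
                action = rng.randrange(2)
            else:
                action = 0 if q[state][0] > q[state][1] else 1
            next_state, reward, done = env.step(action)
            # Learn from the outcome: move the estimate toward the observed target.
            target = reward + (0.0 if done else gamma * max(q[next_state]))
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
            if done:
                break
    return q


q_table = train_q_table()
# Greedy action per non-terminal state (1 means "move right").
greedy = [0 if left > right else 1 for left, right in q_table[:-1]]
print(greedy)
```

After training, the greedy action in every non-terminal state should be "move right", even though the agent was never told where the goal is.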

slide-12
SLIDE 12

12

Deep Reinforcement learning

  • Combine ANNs and RL
  • The policy is learned by an ANN
  • A second ANN estimates state values

https://en.wikipedia.org/wiki/Artificial_neural_network
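The two-network setup can be illustrated with a forward pass only. This is a sketch under stated assumptions: the layer sizes, `STATE_DIM`, `N_ACTIONS`, and the helper names are all hypothetical, the weights are random and untrained, and plain Python lists stand in for a real deep learning library.

```python
import math
import random


def make_mlp(sizes, rng):
    """Create random weights for a small fully connected network (hypothetical helper)."""
    layers = []
    for n_in, n_out in zip(sizes[:-1], sizes[1:]):
        scale = 1.0 / math.sqrt(n_in)
        weights = [[rng.uniform(-scale, scale) for _ in range(n_in)]
                   for _ in range(n_out)]
        biases = [0.0] * n_out
        layers.append((weights, biases))
    return layers


def forward(layers, x):
    """Run the network: tanh on hidden layers, linear output layer."""
    for index, (weights, biases) in enumerate(layers):
        x = [sum(w * v for w, v in zip(row, x)) + b
             for row, b in zip(weights, biases)]
        if index < len(layers) - 1:
            x = [math.tanh(v) for v in x]
    return x


def softmax(logits):
    """Turn raw scores into a probability distribution over actions."""
    top = max(logits)
    exps = [math.exp(v - top) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


rng = random.Random(0)
STATE_DIM, N_ACTIONS = 4, 3  # assumed sizes, for illustration only

policy_net = make_mlp([STATE_DIM, 16, N_ACTIONS], rng)  # maps state -> action scores
value_net = make_mlp([STATE_DIM, 16, 1], rng)           # maps state -> expected return

state = [0.1, -0.3, 0.7, 0.0]
action_probs = softmax(forward(policy_net, state))
state_value = forward(value_net, state)[0]
action = rng.choices(range(N_ACTIONS), weights=action_probs)[0]
print(action_probs, state_value, action)
```

During training, the value estimate is typically used to judge whether a sampled action turned out better or worse than expected, and the policy network is adjusted accordingly.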

slide-13
SLIDE 13

13

Memory

  • Long short-term memory (LSTM)
  • Well suited for classification based on time series

– Store important information
– Can retrieve it after an arbitrary time
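How an LSTM cell stores a value and keeps it can be sketched with a single-unit cell. This is an illustrative sketch: the weights in `PARAMS` are hand-picked (in a real LSTM they are learned), and the two-element input `(value, write_flag)` is an assumption made so the gating behaviour is easy to see.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def lstm_step(x, h_prev, c_prev, params):
    """One step of a single-unit LSTM cell; x is (value, write_flag)."""
    z = list(x) + [h_prev]

    def linear(name):
        weights, bias = params[name]
        return sum(w * v for w, v in zip(weights, z)) + bias

    i = sigmoid(linear("input"))        # how much new information to write
    f = sigmoid(linear("forget"))       # how much old memory to keep
    o = sigmoid(linear("output"))       # how much memory to expose
    g = math.tanh(linear("candidate"))  # candidate value to store
    c = f * c_prev + i * g              # updated cell state (the memory)
    h = o * math.tanh(c)                # hidden state / output
    return h, c


# Hand-picked weights over (value, write_flag, h_prev), with a bias each.
PARAMS = {
    "input": ([0.0, 20.0, 0.0], -10.0),   # open only when write_flag is 1
    "forget": ([0.0, -20.0, 0.0], 10.0),  # overwrite when writing, keep otherwise
    "output": ([0.0, 0.0, 0.0], 10.0),    # always expose the memory
    "candidate": ([1.0, 0.0, 0.0], 0.0),  # store tanh(value)
}

h, c = 0.0, 0.0
h, c = lstm_step((0.8, 1.0), h, c, PARAMS)  # write 0.8 into memory
for t in range(50):                          # 50 distractor steps, no writing
    distractor = 1.0 if t % 2 == 0 else -1.0
    h, c = lstm_step((distractor, 0.0), h, c, PARAMS)
print(c, math.tanh(0.8))
```

The cell state stays close to `tanh(0.8)` after the distractor steps. Because the gates never saturate exactly, the memory decays very slowly rather than lasting forever.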

slide-14
SLIDE 14

14

Outline

  • Why you should care
  • How to train your robotic hand
  • Learning dexterous manipulations
slide-15
SLIDE 15

15

Domain Randomizations (DR)

  • Randomize physical properties of the simulated environments
  • Hand-picked randomizations

– Uniform distribution

  • Problems:

– Which properties are important?
– Resulting policies are not that robust
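Hand-picked, uniformly distributed randomization can be sketched as follows. The parameter names and ranges in `RANDOMIZATION_RANGES` are hypothetical values chosen for illustration, not the ones used in the cited papers.

```python
import random

# Hypothetical hand-picked ranges: (low, high) for each physical property.
RANDOMIZATION_RANGES = {
    "cube_size_scale": (0.95, 1.05),
    "friction_scale": (0.7, 1.3),
    "gravity_m_s2": (9.0, 10.6),
    "action_delay_s": (0.0, 0.04),
}


def sample_environment(ranges, rng):
    """Draw one simulated environment by sampling every property uniformly."""
    return {name: rng.uniform(low, high) for name, (low, high) in ranges.items()}


rng = random.Random(42)
# Each training episode runs in a freshly sampled environment.
episode_params = [sample_environment(RANDOMIZATION_RANGES, rng) for _ in range(3)]
for params in episode_params:
    print(params)
```

The ranges are fixed for the whole training run, which is exactly the limitation named above: someone has to guess in advance which properties matter and how wide each range should be.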

slide-16
SLIDE 16

16

Automatic Domain Randomization (ADR)

  • Basic idea:

– Automatically adjust the domain randomizations as training progresses

https://openai.com/blog/solving-rubiks-cube/
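The basic idea can be sketched as a range that starts as a single point and grows while the policy performs well. This is a simplified sketch, not the published algorithm: the class `AdrRange`, the function `adr_update`, the step size, and the thresholds `expand_at` and `shrink_at` are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class AdrRange:
    """Randomization range for one environment property (hypothetical helper)."""
    low: float
    high: float
    step: float

    def widen(self, side):
        if side == "low":
            self.low -= self.step
        else:
            self.high += self.step

    def narrow(self, side):
        # Never let the bounds cross each other.
        if side == "low":
            self.low = min(self.low + self.step, self.high)
        else:
            self.high = max(self.high - self.step, self.low)


def adr_update(adr_range, side, success_rate, expand_at=0.8, shrink_at=0.4):
    """Adjust one bound based on how well the policy did at that bound."""
    if success_rate >= expand_at:
        adr_range.widen(side)    # task is too easy: make the environment harder
    elif success_rate <= shrink_at:
        adr_range.narrow(side)   # task is too hard: back off
    return adr_range


# Start with no randomization at all: the range is a single nominal value.
friction = AdrRange(low=1.0, high=1.0, step=0.1)
adr_update(friction, "high", success_rate=0.9)   # widens the upper bound
adr_update(friction, "low", success_rate=0.85)   # widens the lower bound
adr_update(friction, "high", success_rate=0.2)   # narrows the upper bound again
print(friction)
```

The difficulty of the training distribution then follows the policy's progress instead of being fixed by hand.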

slide-17
SLIDE 17

17

Automatic Domain Randomization (ADR)

  • Changes can be made in:

– Cube size
– Friction of the hand
– Gravity
– Brightness
– Action delay
– Motor backlash

Akkaya, Ilge, et al. "Solving Rubik's Cube with a Robot Hand.", Figure 2a

slide-18
SLIDE 18

18

Learning dexterous manipulations

  • Using ADR
  • Train for several months (~13 thousand years of simulated experience)
  • Two networks during training

– One to predict the value function
– One for the agent policy

slide-19
SLIDE 19

19

Learning dexterous manipulations

Akkaya, Ilge, et al. "Solving Rubik's Cube with a Robot Hand.", Figure 12

slide-20
SLIDE 20

20

The robotic hand

  • Cage with 3 cameras at different angles
  • Hand with tactile sensors
  • A CNN is used for vision

Akkaya, Ilge, et al. "Solving Rubik's Cube with a Robot Hand.", Figure 4a

slide-21
SLIDE 21

21

Comparison

Akkaya, Ilge, et al. "Solving Rubik's Cube with a Robot Hand.", Table 3

slide-22
SLIDE 22

22

How robust is the outcome?

Akkaya, Ilge, et al. "Solving Rubik's Cube with a Robot Hand.", Figure 17

slide-23
SLIDE 23

23

Comparison

Akkaya, Ilge, et al. "Solving Rubik's Cube with a Robot Hand.", Table 6

npd = nats per dimension, where nat is the natural unit of information
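The unit can be made concrete with a small calculation. This sketch assumes the quantity reported in npd is the average differential entropy of independent uniform randomization ranges, that is, the mean of log(high − low) over all randomized parameters; the function name `adr_entropy_npd` and the example ranges are hypothetical.

```python
import math


def adr_entropy_npd(ranges):
    """Average entropy, in nats per dimension, of independent uniform ranges.

    Assumes every range has positive width; a zero-width range has
    entropy of minus infinity and is not handled here.
    """
    widths = [high - low for low, high in ranges]
    return sum(math.log(width) for width in widths) / len(widths)


# Two hypothetical randomized parameters with widths 1.0 and 2.0.
example_ranges = [(0.5, 1.5), (0.0, 2.0)]
entropy = adr_entropy_npd(example_ranges)
print(entropy)  # (log(1.0) + log(2.0)) / 2 nats per dimension
```

Wider ranges give a higher value, so under this reading a larger npd figure means the policy was trained on a more varied set of environments.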

slide-24
SLIDE 24

24

But ...

  • Not a standard Rubik's Cube but a Giiker cube
  • The policy solved only 20% of the attempts with a "fair scramble"
  • Other robotic hands can solve a Rubik's Cube faster
  • The solution steps were generated beforehand

Akkaya, Ilge, et al. "Solving Rubik's Cube with a Robot Hand.", Figure 13b

slide-25
SLIDE 25

25

Thank you

https://www.youtube.com/watch?v=QyJGXc9WeNo

slide-26
SLIDE 26

26

Questions?

slide-27
SLIDE 27

27

Feedback

slide-28
SLIDE 28

28

Sources

  • https://skymind.ai/wiki/deep-reinforcement-learning
  • https://towardsdatascience.com/welcome-to-deep-reinforcement-learning-part-1-dqn-c3cab4d41b6b
  • Akkaya, Ilge, et al. "Solving Rubik's Cube with a Robot Hand." arXiv preprint arXiv:1910.07113 (2019).
  • Andrychowicz, Marcin, et al. "Learning dexterous in-hand manipulation." arXiv preprint arXiv:1808.00177 (2018).
  • https://openai.com/blog/solving-rubiks-cube/