SLIDE 1 Learning to Control Self-Assembling Morphologies
Generalization via Modularity
Deepak Pathak* Chris Lu* Trevor Darrell Phillip Isola Alyosha Efros
* equal contribution
SLIDE 2
How do we train a robot?
SLIDE 3
SLIDE 4
- Multiple tasks
- Expert demonstrations
- Rewards, labels
- …
SLIDE 5
- Multiple tasks
- Expert demonstrations
- Rewards, labels
- …
- Self-supervision
- Curious exploration
- Learning “common sense”
- …
SLIDE 6
. . .
SLIDE 7
. . .
… even earlier?
SLIDE 8
Single to Multicellular
SLIDE 9
Single to Multicellular
competition
collaboration
SLIDE 10
Single to Multicellular
shared objective
competition
collaboration
SLIDE 11
Compositionality has been useful in language …
[Andreas et al. 2016]
SLIDE 12
How to implement compositionality in hardware?
SLIDE 13
Modular Co-evolution of Control and Morphology
SLIDE 14 Modular Co-evolution of Control and Morphology
Cylindrical Limb
SLIDE 15 Modular Co-evolution of Control and Morphology
Cylindrical Limb
Configurable Motor Joint
SLIDE 16
Modular Co-evolution of Control and Morphology
SLIDE 17
Modular Co-evolution of Control and Morphology
SLIDE 18 Modular Co-evolution of Control and Morphology
Potential Magnetic Joint
SLIDE 19 Modular Co-evolution of Control and Morphology
Potential Magnetic Joint
SLIDE 20 Modular Co-evolution of Control and Morphology
Potential Magnetic Joint
Acts as single agent upon joining
Rewards are shared!
SLIDE 21 Modular Co-evolution of Control and Morphology
Potential Magnetic Joint
- Input = Local Sensory State
- Output = Torques, Link, Unlink
Acts as single agent upon joining
Rewards are shared!
SLIDE 22 Modular Co-evolution of Control and Morphology
Potential Magnetic Joint
- Input = Local Sensory State
- Output = Torques, Link, Unlink
Acts as single agent upon joining
Rewards are shared!
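The per-limb interface on this slide can be sketched roughly as follows. This is only an illustration: the names `LimbAction` and `shared_rewards` are hypothetical, and summing rewards within a connected group is an assumed sharing rule, not something the slide specifies.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class LimbAction:
    """Output of one limb's controller (field names are illustrative)."""
    torques: List[float]  # torques for the limb's motor joint
    link: bool            # request to attach at the magnetic joint
    unlink: bool          # request to detach again


def shared_rewards(limb_rewards: List[float], groups: List[List[int]]) -> List[float]:
    """Give every limb in a connected group the same reward.

    Assumption: the shared value is the sum over the group.
    """
    out = list(limb_rewards)
    for group in groups:
        total = sum(limb_rewards[i] for i in group)
        for i in group:
            out[i] = total
    return out


# Limbs 0 and 1 have joined into one agent; limb 2 is still on its own.
rewards = shared_rewards([1.0, 2.0, 5.0], groups=[[0, 1], [2]])
action = LimbAction(torques=[0.2], link=True, unlink=False)
```

Under this rule the two joined limbs both receive the group total, while the free limb keeps its own reward.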
SLIDE 23
Consider the task of “standing up” …
SLIDE 24
SLIDE 25
How to learn compositional controllers?
SLIDE 26 Idea: Shared policy network across limbs
[Figure: agent drawn as a set of nodes, one per limb]
SLIDE 27 Idea: Shared policy network across limbs
[Figure: agent drawn as a set of nodes, one per limb]
π_θ
shared policy
input
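A minimal sketch of what "shared policy across limbs" could look like in code: one parameter set is applied to every limb's local input. The tiny tanh-linear policy, the dimensions, and the helper names `make_policy` and `act` are assumptions made for illustration, not the architecture used in the talk.

```python
import math
import random
from typing import List


def make_policy(obs_dim: int, act_dim: int, seed: int = 0) -> List[List[float]]:
    """Create one weight matrix; this single parameter set is shared by all limbs."""
    rng = random.Random(seed)
    return [[rng.uniform(-0.1, 0.1) for _ in range(obs_dim)] for _ in range(act_dim)]


def act(theta: List[List[float]], obs: List[float]) -> List[float]:
    """Map one limb's local input to its action with the shared weights."""
    return [math.tanh(sum(w * x for w, x in zip(row, obs))) for row in theta]


theta = make_policy(obs_dim=4, act_dim=2)

# Three limbs, each with its own local input, all evaluated with the same theta.
limb_observations = [
    [0.1, 0.0, -0.2, 0.3],
    [0.5, 0.1, 0.0, -0.4],
    [0.0, 0.0, 0.0, 0.0],
]
actions = [act(theta, obs) for obs in limb_observations]
```

Because no parameter is tied to a particular limb, adding or removing a limb only changes the length of `limb_observations`, not the policy.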
SLIDE 28
How to adapt when morphology changes?
SLIDE 29
How to adapt when morphology changes?
SLIDE 30
SLIDE 31
Network as reusable LEGO Blocks
SLIDE 32 Network as reusable LEGO Blocks
π_θ
shared policy
input
SLIDE 33 Network as reusable LEGO Blocks
π_θ
shared policy
input
message
message input
SLIDE 34 Network as reusable LEGO Blocks
π_θ
shared policy
input
message
message input
same dimension
SLIDE 35 Network as reusable LEGO Blocks
π_θ
shared policy
input
message
message input
SLIDE 36 Network as reusable LEGO Blocks
π_θ
shared policy
input
message
message input
SLIDE 37 Network as reusable LEGO Blocks
π_θ
shared policy
input
message
message input
π_θ π_θ π_θ
SLIDE 38 Network as reusable LEGO Blocks
π_θ
shared policy
input
message
message input
π_θ π_θ π_θ
SLIDE 39 Network as reusable LEGO Blocks
π_θ
shared policy
input
message
message input
π_θ π_θ π_θ
SLIDE 40 Network as reusable LEGO Blocks
π_θ
shared policy
input
message
message input
π_θ π_θ π_θ
cut
SLIDE 41 Network as reusable LEGO Blocks
π_θ
shared policy
input
message
message input
π_θ π_θ π_θ π_θ π_θ π_θ
and paste
cut
SLIDE 42 Network as reusable LEGO Blocks
π_θ
shared policy
input
message
message input
π_θ π_θ π_θ π_θ π_θ π_θ
and paste
adaptation by conditioning
cut
SLIDE 43
Dynamic Graph Networks
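One way to picture the idea: every limb runs the same module, which reads its local input plus an incoming message and emits an action plus an outgoing message of the same dimension, so modules can be rewired whenever limbs link or unlink. The sketch below is an illustration under assumptions: the bottom-up (leaf to root) message direction, summing child messages, the zero message at leaves, and all names (`node_policy`, `run_graph`, `MSG_DIM`) are hypothetical choices.

```python
import math
from typing import Dict, List, Tuple

MSG_DIM = 2  # incoming and outgoing messages have the same dimension


def node_policy(theta: List[List[float]], obs: List[float],
                msg_in: List[float]) -> Tuple[float, List[float]]:
    """Shared module: (local input, incoming message) -> (torque, outgoing message)."""
    x = obs + msg_in
    out = [math.tanh(sum(w * v for w, v in zip(row, x))) for row in theta]
    return out[0], out[1:1 + MSG_DIM]


def run_graph(theta: List[List[float]], obs: Dict[int, List[float]],
              children: Dict[int, List[int]], root: int) -> Dict[int, float]:
    """Evaluate the shared module bottom-up over the current tree of limbs."""
    torques: Dict[int, float] = {}

    def visit(node: int) -> List[float]:
        msg_in = [0.0] * MSG_DIM  # assumption: leaves receive a zero message
        for child in children.get(node, []):
            child_msg = visit(child)
            msg_in = [a + b for a, b in zip(msg_in, child_msg)]  # assumed aggregation
        torque, msg_out = node_policy(theta, obs[node], msg_in)
        torques[node] = torque
        return msg_out

    visit(root)
    return torques


# One shared parameter set: rows = 1 torque + MSG_DIM, columns = 2 inputs + MSG_DIM.
theta = [[0.5, -0.3, 0.8, 0.1],
         [0.2, 0.4, -0.6, 0.3],
         [-0.1, 0.7, 0.2, -0.5]]
obs = {0: [0.1, 0.2], 1: [0.3, -0.1], 2: [0.0, 0.4]}

# The same limbs wired two ways; only the graph changes, never theta.
t_chain = run_graph(theta, obs, children={0: [1], 1: [2]}, root=0)
t_star = run_graph(theta, obs, children={0: [1, 2]}, root=0)
```

Limb 2 is a leaf in both wirings and therefore acts identically, while limb 1 behaves differently once it receives a message from a child; that conditioning on messages is how the same weights can adapt to a new morphology.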
SLIDE 44
SLIDE 45
BTW, basically curriculum learning but in hardware
SLIDE 46
How well does it generalize?
SLIDE 47
SLIDE 48
. . .
SLIDE 49
a bit crazy… and totally useless!
. . .
SLIDE 50 Self-Assembling Robots in the Real World
[Mark Yim’s Lab at UPenn] [Daniela Rus's Lab at MIT]
Also: [Modular Snake Robot – Howie Choset’s Lab at CMU]
SLIDE 51
Thank You!
Poster # 197 …today!! (Multi-agent RL)
code & data at https://people.eecs.berkeley.edu/~pathak/