
Slide 1

Learning to Plan with Logical Automata

Brandon Araki1*, Kiran Vodrahalli2*, Thomas Leech1,3, Mark Donahue3, Cristian-Ioan Vasile1, Daniela Rus1

1Massachusetts Institute of Technology 2Columbia University 3MIT Lincoln Laboratory

*Equal contributors


Slide 2

Many environments have simple rules – for example cooking from a recipe, playing games, driving, and assembly. People are able to learn how to perform tasks like these by observing an expert. When observing an expert, people don't just learn to mimic the expert; they learn the rules that the expert is following. This allows a person who has, for example, learned to cook a dish to modify the ingredients they put in the dish or the order in which they add them.

Slide 3

Goals

Learn to plan in an environment with rules:

1. Learn the rules in a way that can be easily interpreted by humans
2. Incorporate the rules into planning so that modifying the rules results in predictable changes in behavior

Our goal is to replicate this ability algorithmically using model-based imitation learning.


Slide 4

Packing a Lunchbox

Pack a burger or a sandwich; then pack a banana


Let’s say you have a robot that has to pack a lunchbox. The rules are that it has to first pack a burger or a sandwich, and then pack the banana.

Slide 5

Goal 1 – Interpretability

Pack a burger or a sandwich; then pack a banana

[Figure: the lunchbox rules as a finite state automaton – Initial State → Picked up → Packed → Picked up → Packed → GOAL]

We can make these rules both useful and interpretable by representing them as a finite state automaton. … We assume that the environment can be factored into a high-level Markov Decision Process which is equivalent to the FSA, and a low-level MDP of the sort usually used in reinforcement learning.
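The rules above can be sketched as a small lookup structure. The state and proposition names below are our own illustrative choices, not the paper's implementation.

```python
# Illustrative sketch of the lunchbox rules as a finite state automaton.
# State and proposition names are our own, not the paper's code.
FSA = {
    "S0": {"sandwich": "S1", "burger": "S1"},  # pick up a sandwich or burger
    "S1": {"lunchbox": "S2"},                  # pack it
    "S2": {"banana": "S3"},                    # pick up the banana
    "S3": {"lunchbox": "G"},                   # pack it; G is the goal state
}

def step(state, prop):
    # A proposition with no outgoing edge leaves the automaton where it is.
    return FSA.get(state, {}).get(prop, state)

# Following the rules drives the automaton to the goal:
state = "S0"
for prop in ["burger", "lunchbox", "banana", "lunchbox"]:
    state = step(state, prop)
print(state)  # G
```

Violating the rules (for example, reaching for the banana first) simply leaves the automaton in its current state, which is what makes the rules easy to read off the graph.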

Slide 6

Factoring the Environment

Low-level MDP: avoid obstacles
High-level MDP: pack sandwich or burger; then pack banana

So you can imagine that we have a reinforcement learning robot arm simulator with state x, y, theta, etc., and actions such as torques or commanded positions. We assume that there is also a high-level MDP, which embodies the rules that the robot must follow.
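A minimal sketch of this factoring, assuming the agent plans over the product of the two levels; the names, signatures, and toy dynamics here are hypothetical:

```python
from dataclasses import dataclass

# Hypothetical sketch of the factored state: the agent plans over the product
# of a low-level environment state and a high-level FSA state.
@dataclass
class ProductState:
    env: tuple  # low-level state, e.g. (x, y, theta) of the arm
    fsa: str    # high-level FSA state, e.g. "S0"

def product_step(state, env_step, fsa_step, action):
    # Low-level dynamics advance the environment and emit a proposition;
    # the high-level automaton advances on that proposition.
    new_env, prop = env_step(state.env, action)
    new_fsa = fsa_step(state.fsa, prop)
    return ProductState(new_env, new_fsa)

# Toy 1-D dynamics: moving onto cell 2 emits the "burger" proposition.
def env_step(env, action):
    x = env[0] + action
    return (x,), ("burger" if x == 2 else None)

def fsa_step(s, prop):
    return "S1" if (s == "S0" and prop == "burger") else s

s = product_step(ProductState((1,), "S0"), env_step, fsa_step, +1)
print(s.env, s.fsa)  # (2,) S1
```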


Slide 7

Representing the Environment

Discrete 2D gridworld: avoid obstacles
Finite state automaton: pack sandwich or burger; then pack banana

[Figure: the gridworld alongside the FSA – Initial State → Picked up → Packed → Picked up → Packed]

Slide 8

Goal 2 – Manipulability

Incorporate FSA into planning

[Figure: the FSA with states labeled S0, S1, S2, S3, G, and the transition matrix of state S0 with FSA states S0, S1, S2, S3, G, T]

We want to be able to modify the behavior of the agent in order to make it perform tasks that are similar to, but different from, the one it has learned. We achieve this by incorporating the FSA into a recursive planning step. Since FSAs are graphs, they can be converted into a transition matrix. First it is useful to label each FSA state with a name. Here is the transition matrix of the first FSA state. The columns are associated with features of the environment, and the rows correspond to FSA states. You can see by looking at the graph that the sandwich and the burger cause a transition to state S1, whereas the other items do not cause a transition to a new state.
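As a sketch, the transition matrix for state S0 can be written out explicitly. The proposition set and the 0/1 layout below are our own illustration of the idea, not values from the paper.

```python
import numpy as np

# Illustrative transition matrix for FSA state S0. Rows index propositions
# observed in the environment; columns index the next FSA state. A 1 in
# entry [p, q] means observing proposition p in S0 moves the automaton to q.
props = ["empty", "sandwich", "burger", "banana"]
states = ["S0", "S1", "S2", "S3", "G", "T"]

T_s0 = np.array([
    # S0  S1  S2  S3  G   T
    [1,   0,  0,  0,  0,  0],  # empty: stay in S0
    [0,   1,  0,  0,  0,  0],  # sandwich -> S1
    [0,   1,  0,  0,  0,  0],  # burger   -> S1
    [1,   0,  0,  0,  0,  0],  # banana: no transition yet
])

def next_state(prop):
    row = T_s0[props.index(prop)]
    return states[int(row.argmax())]

print(next_state("sandwich"), next_state("banana"))  # S1 S0
```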


Slide 9

Differentiable Recursive Planning

One VIN for each FSA state: learn the reward, learn the transitions, and learn the transitions of the FSA

Based on Tamar, Aviv, et al. "Value iteration networks." Advances in Neural Information Processing Systems. 2016.

We use differentiable recursive planning to approximate value iteration and calculate a policy for the agent. The matrix form of the FSA allows us to embed the FSA as a convolution in the planning step – for more details, come to our poster session.
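As a rough sketch of the recursion (not the paper's architecture): value iteration on the product MDP keeps one value function per FSA state, and the FSA's transitions couple those value functions together; in the paper this loop is made differentiable with one VIN per FSA state. All sizes, dynamics, and transitions below are toy assumptions of our own.

```python
import numpy as np

# Toy product MDP: a 1-D world with 5 cells and an FSA with states
# {S0, S1, G}. Cell 2 emits proposition 0, cell 4 emits proposition 1,
# and all other cells emit the "empty" proposition 2.
n_fsa, n_env, gamma = 3, 5, 0.9
prop = np.array([2, 2, 0, 2, 1])
fsa_next = np.array([[1, 0, 0],   # S0: prop 0 -> S1, otherwise stay
                     [1, 2, 1],   # S1: prop 1 -> G,  otherwise stay
                     [2, 2, 2]])  # G is absorbing
reward = (fsa_next == 2).astype(float)  # reward for transitioning into G
reward[2] = 0.0                         # no further reward once in G

# Value iteration with one value map per FSA state, coupled via fsa_next.
V = np.zeros((n_fsa, n_env))
for _ in range(50):
    for f in range(n_fsa):
        for e in range(n_env):
            candidates = []
            for e2 in {max(e - 1, 0), min(e + 1, n_env - 1)}:  # move left/right
                p = prop[e2]
                candidates.append(reward[f, p] + gamma * V[fsa_next[f, p], e2])
            V[f, e] = max(candidates)
```

After convergence, the value map in S1 increases toward cell 4 (the proposition that advances the FSA to G), while the map in S0 increases toward cell 2; replacing each sweep with differentiable operations is what allows the transition matrices themselves to be learned.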

Slide 10

Experiments - Interpretability

[Figure: learned transition matrix of FSA state S0 – columns are propositions, rows are FSA states S0, S1, S2, S3, G, T]

This is the learned transition matrix of the first state of the FSA. Columns correspond to propositions, or important features of the environment. Rows correspond to the other FSA states.

Slide 11

Experiments - Interpretability

[Figure: learned transition matrix of FSA state S0, with FSA states S0, S1, S2, S3, G, T]

Picking up the sandwich or the hamburger causes a transition to the next state


Slide 12

Experiments – Manipulability

We can modify the FSA so that it will only pick up the burger and not the sandwich.

[Figure: the lunchbox FSA – Initial State → Picked up → Packed → Picked up → Packed]

Since we have learned an interpretable model of the rules, we can easily modify the rules to change the behavior of the agent. In terms of the FSA, this means just deleting this edge between the initial state and the next state.

Slide 13

Experiments – Manipulability

We can modify the FSA so that it will only pick up the burger and not the sandwich.

[Figure: the lunchbox FSA with the sandwich edge deleted – Initial State → Picked up → Packed → Picked up → Packed]

Slide 14

Experiments – Manipulability

We can modify the FSA so that it will only pick up the burger and not the sandwich.

[Figure: the modified transition matrix of FSA state S0, with FSA states S0, S1, S2, S3, G, T]

This is also easy to express using the transition matrix of the FSA; we can change the values in the matrix to change the form of the FSA.
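For instance, using the same illustrative matrix layout as before (indices and names are our own, not the paper's), deleting the sandwich edge amounts to zeroing one entry and redirecting that proposition to a self-loop:

```python
import numpy as np

# Illustrative transition matrix for FSA state S0 (rows: propositions,
# columns: next FSA state). Names and layout are our own assumptions.
props = ["empty", "sandwich", "burger", "banana"]
states = ["S0", "S1", "S2", "S3", "G", "T"]
T_s0 = np.array([
    [1, 0, 0, 0, 0, 0],  # empty: stay in S0
    [0, 1, 0, 0, 0, 0],  # sandwich -> S1
    [0, 1, 0, 0, 0, 0],  # burger   -> S1
    [1, 0, 0, 0, 0, 0],  # banana: stay in S0
])

# Delete the sandwich edge: the sandwich now leaves the automaton in S0,
# so only the burger can trigger the transition to S1.
i = props.index("sandwich")
T_s0[i] = 0
T_s0[i, states.index("S0")] = 1

print(states[int(T_s0[props.index("burger")].argmax())])    # S1
print(states[int(T_s0[props.index("sandwich")].argmax())])  # S0
```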


Slide 15

Experiments – Manipulability

We can modify the FSA so that it will only pick up the burger and not the sandwich.

[Figure: the modified transition matrix of FSA state S0]

Slide 16

Experiments – Manipulability

We can modify the FSA so that it will only pick up the burger and not the sandwich.

[Figure: the modified transition matrix of FSA state S0]

Slide 17

Learning to Plan with Logical Automata

Brandon Araki1*, Kiran Vodrahalli2*, Thomas Leech1,3, Mark Donahue3, Cristian-Ioan Vasile1, Daniela Rus1

1Massachusetts Institute of Technology 2Columbia University 3MIT Lincoln Laboratory

*Equal contributors
