Formalizing Connections Between Motion Planning and Machine Learning



slide-1
SLIDE 1

Formalizing Connections Between 
 Motion Planning and Machine Learning

Siddhartha Srinivasa Boeing Endowed Professor University of Washington

1
slide-2
SLIDE 2

Problems I Want You
 to Solve So I can Retire

Siddhartha Srinivasa Retired Boeing Endowed Professor University of Washington

2
slide-3
SLIDE 3
slide-4
SLIDE 4

Motion Planning

slide-5
SLIDE 5
slide-6
SLIDE 6 6
slide-7
SLIDE 7 7
slide-8
SLIDE 8 8
slide-9
SLIDE 9

Motion Planning
 is a technology

slide-10
SLIDE 10

10-100X Improvement

slide-11
SLIDE 11

The Piano Movers’ Problem

On the Piano Movers’ Problem: I-III, Schwartz and Sharir, Comm. on Pure and Applied Math., 1983.
slide-12
SLIDE 12

Roadmaps

Probabilistic roadmaps for path planning in high-dimensional configuration spaces, Kavraki et al., IEEE TRO, 1996.


Build Roadmap → Plan on Roadmap

slide-13
SLIDE 13

A* Search

slide-14
SLIDE 14

A* Search OPTIMAL!!

Is it optimal over something we care about?

slide-15
SLIDE 15

A* Search: A Personal Journey

1

Search for Optimal Solutions: the Heart of Heuristic Search is Still Beating

Ariel Felner ISE Department Ben-Gurion University ISRAEL felner@bgu.ac.il

slide-16
SLIDE 16 16
slide-17
SLIDE 17

A* Search: A Personal Journey

slide-18
SLIDE 18

A* Search: Amoebas!

Bacteria Vectors by Vecteezy

Optimal Substructure

f(a) < f(b) ⟹ f(a ∘ x) < f(b ∘ x)  ∀x

You will never catch up.

Bellman Condition

f*(a) = min_{x ∈ succ(a)} { c(a, x) + f*(x) }

Be best, locally.
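As a minimal sketch of the Bellman condition, the Python below computes the optimal cost-to-go f* on a small, made-up directed acyclic graph by recursing over successors. The graph, its edge costs, and the names `succ` and `cost_to_go` are illustrative assumptions, not taken from the talk.

```python
from functools import lru_cache

# Hypothetical toy graph: succ[a] maps each successor x to the edge cost c(a, x).
succ = {
    "start": {"a": 1.0, "b": 4.0},
    "a": {"b": 1.0, "goal": 5.0},
    "b": {"goal": 1.0},
    "goal": {},
}


@lru_cache(maxsize=None)
def cost_to_go(a: str) -> float:
    """Bellman backup: f*(a) = min over successors x of c(a, x) + f*(x)."""
    if a == "goal":
        return 0.0
    if not succ[a]:
        return float("inf")  # dead end: the goal is unreachable from here
    # Being best locally at every vertex yields the globally optimal cost.
    return min(c + cost_to_go(x) for x, c in succ[a].items())
```

On this toy graph the cheapest route is start, a, b, goal: the direct edge from a to the goal looks attractive but loses to the detour through b.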

slide-19
SLIDE 19

A* Search: Favoritism

Optimism in the 
 Face of Uncertainty (OFU)

min_{x ∈ open} g(x) + h(x)

Always be optimistic under uncertainty.
You’ll either be correct, or learn something important if you’re wrong.

R-MAX: A general polynomial time algorithm for near-optimal reinforcement learning, Brafman and Tennenholtz, JMLR, 2002.
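The expansion rule on this slide (always pop the open vertex that minimizes g(x) + h(x)) can be sketched as a compact A* in Python. This is a minimal illustration, assuming an admissible and consistent heuristic and vertex names that can be compared for tie-breaking; the function name `a_star` and the toy graph are hypothetical.

```python
import heapq


def a_star(graph, start, goal, h):
    """Return (path, cost). Assumes h is admissible and consistent."""
    g = {start: 0.0}                 # best known cost-to-come
    parent = {start: None}
    open_heap = [(h(start), start)]  # ordered by the optimistic estimate g + h
    closed = set()
    while open_heap:
        _, x = heapq.heappop(open_heap)  # min over open of g(x) + h(x)
        if x in closed:
            continue                     # stale heap entry
        if x == goal:
            path, node = [], x
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1], g[goal]
        closed.add(x)
        for y, c in graph[x].items():
            if y not in closed and g[x] + c < g.get(y, float("inf")):
                g[y] = g[x] + c
                parent[y] = x
                heapq.heappush(open_heap, (g[y] + h(y), y))
    return None, float("inf")


# Hypothetical toy problem; h is the exact cost-to-go, so it is admissible.
toy_graph = {
    "start": {"a": 1.0, "b": 4.0},
    "a": {"b": 1.0, "goal": 5.0},
    "b": {"goal": 1.0},
    "goal": {},
}
toy_h = {"start": 3.0, "a": 2.0, "b": 1.0, "goal": 0.0}
path, cost = a_star(toy_graph, "start", "goal", toy_h.get)
```

Because h never overestimates, every vertex popped is one that could still lie on a path cheaper than anything found so far, which is the optimism the slide describes.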


slide-20
SLIDE 20

A* Search is Optimal …

Expands the Fewest Number of Vertices

But is this what we
 really want in Motion Planning?

slide-21
SLIDE 21

Edge Evaluation Dominates Planning Time

[Chart: share of planning time spent on edge evaluations vs. other computation]

Lazy collision checking in asymptotically-optimal motion planning, Hauser, ICRA 2015.


Amoebas are Cheap. Slime is Expensive.

slide-22
SLIDE 22

Is there a Search Algorithm
 that Minimizes 
 the Number of Edge Evaluations?

LazySP

ICAPS 2018 [Best Conference Paper Award Winner] First Provably Edge-Optimal A*-like Search Algorithm

The Provable Virtue of Laziness in Motion Planning, Haghtalab et al., ICAPS 2018.


I don’t care about amoebas. What algorithm minimizes slime?

slide-23
SLIDE 23
slide-24
SLIDE 24
slide-25
SLIDE 25
slide-26
SLIDE 26
slide-27
SLIDE 27

LazySP

Greedy Best-first Search over Paths

To find the shortest path,
 eliminate all shorter paths!
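The idea of eliminating shorter candidate paths one at a time can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the published algorithm verbatim: edges are directed, `evaluate` is a hypothetical callback that returns an edge's true length (infinity if it is in collision), and the first unevaluated edge on the candidate path is always checked first.

```python
import heapq


def shortest_path(weights, start, goal):
    """Dijkstra over {(u, v): weight}; returns a vertex list or None."""
    dist, parent = {start: 0.0}, {start: None}
    heap, done = [(0.0, start)], set()
    while heap:
        d, u = heapq.heappop(heap)
        if u in done:
            continue
        done.add(u)
        if u == goal:
            path = []
            while u is not None:
                path.append(u)
                u = parent[u]
            return path[::-1]
        for (a, b), w in weights.items():
            if a == u and d + w < dist.get(b, float("inf")):
                dist[b], parent[b] = d + w, u
                heapq.heappush(heap, (d + w, b))
    return None


def lazy_sp(lazy_weights, start, goal, evaluate):
    """Return (path, number_of_edge_evaluations)."""
    weights = dict(lazy_weights)  # optimistic estimates, refined over time
    evaluated = set()
    while True:
        path = shortest_path(weights, start, goal)
        if path is None:
            return None, len(evaluated)
        pending = [e for e in zip(path, path[1:]) if e not in evaluated]
        if not pending:
            return path, len(evaluated)  # every edge on the path is verified
        edge = pending[0]                # first unevaluated edge on the path
        weights[edge] = evaluate(edge)   # the expensive check
        evaluated.add(edge)


# Hypothetical example: the optimistic shortest path runs through a blocked edge.
lazy = {("s", "a"): 1.0, ("a", "g"): 1.0, ("s", "b"): 1.5, ("b", "g"): 1.5}


def true_weight(edge):
    return float("inf") if edge == ("a", "g") else lazy[edge]


found_path, num_checks = lazy_sp(lazy, "s", "g", true_weight)
```

Only edges that lie on a current candidate shortest path are ever evaluated; edges elsewhere in the graph are never checked.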

slide-28
SLIDE 28

[Diagram: LazySP loop. Inputs: graph, start, goal, lazy estimates. Lazy search for shortest path → Evaluate path → Update the graph; the loop ends when the path P is collision free.]

LazySP

OFU on Steroids!

slide-29
SLIDE 29

LazySP

OFU on Steroids! Send out the Ghost Amoebas

slide-30
SLIDE 30

LazySP

OFU on Steroids! Only Slime Known Shortest Paths

slide-31
SLIDE 31

LazySP

OFU on Steroids! Only Slime Known Shortest Paths

slide-32
SLIDE 32

LazySP

OFU on Steroids! Only Slime Known Shortest Paths

slide-33
SLIDE 33

LazySP

OFU on Steroids! Send out the Ghost Amoebas

slide-34
SLIDE 34

LazySP

OFU on Steroids! Only Slime Known Shortest Paths

slide-35
SLIDE 35

LazySP

OFU on Steroids! Only Slime Known Shortest Paths

slide-36
SLIDE 36

LazySP

OFU on Steroids! Only Slime Known Shortest Paths

slide-37
SLIDE 37

LazySP

OFU on Steroids! Only Slime Known Shortest Paths

slide-38
SLIDE 38

LazySP

OFU on Steroids! Only Slime Known Shortest Paths

slide-39
SLIDE 39

LazySP

OFU on Steroids! Only Slime Known Shortest Paths

slide-40
SLIDE 40

LazySP

OFU on Steroids! Only Slime Known Shortest Paths

slide-41
SLIDE 41

LazySP

OFU on Steroids! Optimal Slime!

slide-42
SLIDE 42

LazySP

OFU on Steroids!

+ =

slide-43
SLIDE 43

Edge Selectors

  • Forward (first unevaluated edge)
  • Reverse (last unevaluated edge)
  • Alternate (alternate Forward and Reverse)
  • Bisect (unevaluated edge furthest from an evaluated edge)
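The four selectors can be sketched as small Python functions that take the edges of a candidate path, in order from start to goal, plus the set of edges already evaluated. The function names and signatures are hypothetical. Bisection is read here as "the unevaluated edge furthest from any evaluated edge", with the two path endpoints treated as already evaluated; that endpoint convention is an assumption made so that the rule is well defined on a fresh path.

```python
def forward(edges, evaluated):
    """First unevaluated edge, counting from the start of the path."""
    return next(e for e in edges if e not in evaluated)


def reverse(edges, evaluated):
    """Last unevaluated edge, counting from the goal end of the path."""
    return next(e for e in reversed(edges) if e not in evaluated)


def make_alternate():
    """Build a selector that switches between forward and reverse on each call."""
    use_forward = True

    def alternate(edges, evaluated):
        nonlocal use_forward
        pick = forward(edges, evaluated) if use_forward else reverse(edges, evaluated)
        use_forward = not use_forward
        return pick

    return alternate


def bisection(edges, evaluated):
    """Unevaluated edge whose index is furthest from any evaluated edge.

    Assumption: the path ends (indices -1 and len(edges)) count as evaluated.
    Ties are broken in favor of the edge closer to the start.
    """
    marks = [-1, len(edges)] + [i for i, e in enumerate(edges) if e in evaluated]
    candidates = [i for i, e in enumerate(edges) if e not in evaluated]
    best = max(candidates, key=lambda i: (min(abs(i - m) for m in marks), -i))
    return edges[best]


# Hypothetical five-edge path with nothing evaluated yet.
path_edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5)]
```

Each selector expects at least one unevaluated edge on the path; a lazy search loop would stop before calling a selector on a fully evaluated path.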

slide-44
SLIDE 44

Hypothesis Class All LazySP Selectors

The Realizability Assumption

Forward Alternate Oracle

The Oracle is a LazySP Selector!

The Provable Virtue of Laziness in Motion Planning, Haghtalab et al., ICAPS 2018.


Can we Learn to Imitate the Oracle?

Leveraging experience in lazy search, Bhardwaj et al., RSS 2019.


slide-45
SLIDE 45

Is there a Search Algorithm
 that Minimizes 
 the Number of Edge Evaluations?

LazySP

ICAPS 2018 [Best Conference Paper Award Winner] First Provably Edge-Optimal A*-like Search Algorithm

slide-46
SLIDE 46

Anytime Motion Planning

Solution Cost

Feasible Path Shortest Path

Computation Time

46
slide-47
SLIDE 47

Anytime Motion Planning

Solution Cost Computation Time

47
slide-48
SLIDE 48

Will it converge to the shortest path?

Solution Cost Computation Time

48
slide-49
SLIDE 49

Beyond Asymptotic Optimality

Solution Cost Computation Time

49
slide-50
SLIDE 50

Beyond Asymptotic Optimality

Solution Cost

Time to Initial Path

Computation Time

50
slide-51
SLIDE 51

Beyond Asymptotic Optimality

Solution Cost

Time to Initial Path Time Budget

Computation Time

Suboptimality Gap

51
slide-52
SLIDE 52

We formalize anytime search as Bayesian Reinforcement Learning

52

Posterior Sampling for Anytime Motion Planning on Graphs with Expensive-to-Evaluate Edges, Hou et al., ICRA 2020.


slide-53
SLIDE 53
  • Evaluating edges uncovers shorter paths
  • Anytime Objective: cumulative path lengths
  • Given prior on collision statuses
  • Bayesian Anytime Objective:
  • Bayesian planning algorithm uses edge evaluation history to compute collision posterior

Bayesian Anytime Motion Planning

53
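One way to read "cumulative path lengths" is as the area under the solution-cost curve: at every step of the time budget, the planner is charged the length of the best path it currently holds. The Python sketch below illustrates that reading only; the function name, the discrete time steps, and the `initial_cost` charged before any path is found are all assumptions, not the definition used in the cited paper.

```python
def cumulative_path_length(improvements, budget, initial_cost):
    """Sum the best-known path length over `budget` discrete steps.

    improvements: {step: path_length} for steps at which a new path is found.
    initial_cost: hypothetical cost charged while no path is known yet.
    """
    best = initial_cost
    total = 0.0
    for t in range(budget):
        if t in improvements:
            best = min(best, improvements[t])  # keep only improvements
        total += best
    return total


# Two hypothetical planners reaching the same final path length of 3.0.
slow = cumulative_path_length({1: 5.0, 3: 3.0}, budget=5, initial_cost=10.0)
fast = cumulative_path_length({1: 3.0}, budget=5, initial_cost=10.0)
```

Both planners end with the same path, but the one that finds it earlier accumulates less cost, which is what an anytime objective is meant to reward.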
slide-54
SLIDE 54

The Experienced Piano Movers’ Problem

New Piano. New House. Same Mover.

slide-55
SLIDE 55
  • Equivalence to episodic Bayesian RL [Osband et al, 2013]
  • Infer unknown MDP through repeated episodes

Bayesian Anytime Motion Planning as
 Bayesian Reinforcement Learning

Minimizing Bayesian regret is equivalent to
 minimizing the Bayesian anytime planning objective!

55

“no regret” is equivalent to asymptotic optimality

Extending rapidly-exploring random trees for asymptotically optimal anytime motion planning, Abbasi-Yadkori et al., IROS 2010.


slide-56
SLIDE 56

Experienced Lazy Path Search

Proposer Validator Posterior

Evaluated edge statuses Path Feasible Path

56
slide-57
SLIDE 57

Validator Posterior Proposer

  • Posterior Sampling for Motion Planning (PSMP): propose paths according to probability they are optimal
  • Idea from multi-armed bandits (as Thompson sampling), Posterior Sampling for RL

The Posterior Sampling Proposer

57

(More) efficient reinforcement learning via posterior sampling, Osband et al., N*IPS 2013.


slide-58
SLIDE 58

Validator Posterior Proposer

  • Posterior Sampling for Motion Planning (PSMP): propose paths according to probability they are optimal
  • Idea from multi-armed bandits (as Thompson sampling), Posterior Sampling for RL [Osband et al, 2013]

  • First anytime motion planning algorithm with Bayesian regret bounds
  • Analysis adapts [Osband et al, 2013] for deterministic MDPs
  • Bound of matches known lower bounds

The Posterior Sampling Proposer

58

(More) efficient reinforcement learning via posterior sampling, Osband et al., N*IPS 2013.


slide-59
SLIDE 59

Validator Posterior Proposer

  • Posterior Sampling for Motion Planning (PSMP): propose paths according to probability they are optimal
  • Idea from multi-armed bandits (as Thompson sampling), Posterior Sampling for RL [Osband et al, 2013]

  • First anytime motion planning algorithm with Bayesian regret bounds
  • Analysis adapts [Osband et al, 2013] for deterministic MDPs
  • Bound of matches known lower bounds
  • Solves one shortest path problem per proposal

The Posterior Sampling Proposer

59

(More) efficient reinforcement learning via posterior sampling, Osband et al., N*IPS 2013.
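The proposer/validator loop can be sketched in Python as Thompson sampling over worlds: sample a collision status for every edge from the current posterior, solve one shortest path problem in the sampled world, then validate the proposed path and update the posterior. This is a simplified illustration, not the authors' implementation. It assumes an independent per-edge probability of being collision free, stops at the first feasible path rather than continuing to improve it, and all names (`psmp_first_feasible`, `is_free`, `max_proposals`) are hypothetical.

```python
import heapq
import random


def shortest_path(edges, start, goal):
    """Dijkstra over {(u, v): length}; returns a vertex list or None."""
    dist, parent = {start: 0.0}, {start: None}
    heap, done = [(0.0, start)], set()
    while heap:
        d, u = heapq.heappop(heap)
        if u in done:
            continue
        done.add(u)
        if u == goal:
            path = []
            while u is not None:
                path.append(u)
                u = parent[u]
            return path[::-1]
        for (a, b), w in edges.items():
            if a == u and d + w < dist.get(b, float("inf")):
                dist[b], parent[b] = d + w, u
                heapq.heappush(heap, (d + w, b))
    return None


def psmp_first_feasible(lengths, p_free, is_free, start, goal, rng, max_proposals=1000):
    """Return (path, collision_checks) for the first validated path."""
    belief = dict(p_free)  # posterior P(edge is collision free)
    checks = 0
    for _ in range(max_proposals):
        # Proposer: sample a world, then solve one shortest path problem in it.
        world = {e: w for e, w in lengths.items() if rng.random() < belief[e]}
        path = shortest_path(world, start, goal)
        if path is None:
            continue
        # Validator: check unknown edges in order and stop at the first collision.
        feasible = True
        for e in zip(path, path[1:]):
            if belief[e] == 1.0:
                continue  # already known to be free
            checks += 1
            belief[e] = 1.0 if is_free(e) else 0.0
            if belief[e] == 0.0:
                feasible = False
                break
        if feasible:
            return path, checks
    return None, checks


# Hypothetical world: the shorter route through "a" is blocked.
lengths = {("s", "a"): 1.0, ("a", "g"): 1.0, ("s", "b"): 1.5, ("b", "g"): 1.5}
prior = {e: 0.5 for e in lengths}
result, n_checks = psmp_first_feasible(
    lengths, prior, lambda e: e != ("a", "g"), "s", "g", random.Random(0)
)
```

Sampling a world and planning in it proposes each path with the probability that it is optimal under the posterior, so no explicit probability over paths ever has to be computed.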


slide-60
SLIDE 60 60

But Whatever Happened to Optimism?!

Optimism in the Face of Uncertainty (OFU)

[Diagram labels: Shortest Path, Bayesian Regret, Anytime Performance, Bayes Optimality, Feasible Path]

slide-61
SLIDE 61

Sample Sample Sample Validate Validate Validate

61
slide-62
SLIDE 62 62

Posterior Distributions

When you have eliminated the impossible, whatever remains, however improbable, must be the truth.

Bayesian Anytime Motion Planning
 via Posterior Sampling

slide-63
SLIDE 63

Learning Collision Posteriors

Validator Proposer Posterior

63
slide-64
SLIDE 64

Learning Collision Posteriors

Validator Proposer Posterior

  • Nearest Neighbor (NN): new environments with unknown structure
  • Finite Set (FS): known structure from past experience
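A finite-set posterior can be sketched in Python as a weighted list of candidate worlds, where each edge evaluation zeroes out the worlds that disagree with it. This is a minimal illustration under the assumption that the true environment is one of a known, finite set of worlds; the representation (each world as the set of its collision-free edges) and the function names are hypothetical.

```python
def update_finite_set(worlds, weights, edge, observed_free):
    """Zero out worlds that disagree with the observation, then renormalize."""
    post = [
        w if (edge in world) == observed_free else 0.0
        for world, w in zip(worlds, weights)
    ]
    total = sum(post)
    if total == 0.0:
        raise ValueError("observation is inconsistent with every candidate world")
    return [w / total for w in post]


def prob_free(worlds, weights, edge):
    """Posterior probability that `edge` is collision free."""
    return sum(w for world, w in zip(worlds, weights) if edge in world)


# Three hypothetical worlds, each given as the set of its collision-free edges.
worlds = [
    {("s", "a"), ("a", "g")},
    {("s", "a"), ("s", "b"), ("b", "g")},
    {("s", "b"), ("b", "g")},
]
prior = [0.5, 0.25, 0.25]

# Observing that edge a-g is in collision eliminates the first world.
posterior = update_finite_set(worlds, prior, ("a", "g"), observed_free=False)
```

A single collision check on one edge changes the belief about edges that were never checked, which is how past experience can reduce the number of evaluations needed.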

64
slide-65
SLIDE 65

Shorter paths in fewer collision checks

[Plots: Path Length vs. Configurations Evaluated for PSMP (FS), POMP (FS), and LazySP]

Pareto-Optimal Search over Configuration Space Beliefs for Anytime Motion Planning, Choudhury et al., IROS 2016.
Posterior Sampling for Anytime Motion Planning on Graphs with Expensive-to-Evaluate Edges, Hou et al., ICRA 2020.


slide-66
SLIDE 66 66

Motion Planning Networks, Qureshi et al., ICRA 2019.


slide-67
SLIDE 67

RRT* requires many collision checks

[Plots: Path Length and Success Rate vs. Configurations Evaluated, comparing RRT + PS and RRT*]

67

Sampling-based Algorithms for Optimal Motion Planning, Karaman and Frazzoli, IJRR 2011.
RRT-Connect: An efficient approach to single-query path planning, Kuffner and LaValle, ICRA 2000.


slide-68
SLIDE 68

Outperforms common anytime heuristics

[Plots: Success Rate and Path Length vs. Configurations Evaluated, comparing POMP (FS), PSMP (FS), LazySP, and RRT + PS]

68
slide-69
SLIDE 69

We formalize anytime search as Bayesian Reinforcement Learning

69

Posterior Sampling for Anytime Motion Planning on Graphs with Expensive-to-Evaluate Edges, Hou et al., ICRA 2020.


slide-70
SLIDE 70

The Experienced Piano Movers’ Problem

New Piano. New House. Same Mover.

slide-71
SLIDE 71

Search = Eliminating Paths

Optimal Substructure

f(a) < f(b) ⟹ f(a ∘ x) < f(b ∘ x)  ∀x

You will never catch up.

Posterior Distributions

When you have eliminated the impossible, whatever remains, however improbable, must be the truth.

slide-72
SLIDE 72

RL = Eliminating Policies

Optimal Substructure

f(a) < f(b) ⟹ f(a ∘ x) < f(b ∘ x)  ∀x

You will never catch up.

Posterior Distributions

When you have eliminated the impossible, whatever remains, however improbable, must be the truth.

Bayesian Residual Policy Optimization: Scalable Bayesian Reinforcement Learning with Clairvoyant Experts, Lee et al., arXiv:2002.03042


slide-73
SLIDE 73 73 1

Search for Optimal Solutions: the Heart of Heuristic Search is Still Beating

Ariel Felner ISE Department Ben-Gurion University ISRAEL felner@bgu.ac.il

Exploit Structure
Embrace Laziness
Prove some Damn Theorems

Data-driven Planning via Imitation Learning, Choudhury et al., IJRR 2018.


slide-74
SLIDE 74 74

Theory System

Aim for the Corners!

slide-75
SLIDE 75 75

The Provable Virtue of Laziness in Motion Planning, Haghtalab et al., ICAPS 2018.
Leveraging experience in lazy search, Bhardwaj et al., RSS 2019.
Bayesian Residual Policy Optimization: Scalable Bayesian Reinforcement Learning with Clairvoyant Experts, Lee et al., arXiv:2002.03042
Posterior Sampling for Anytime Motion Planning on Graphs with Expensive-to-Evaluate Edges, Hou et al., ICRA 2020.
Pareto-Optimal Search over Configuration Space Beliefs for Anytime Motion Planning, Choudhury et al., IROS 2016.
A Unifying Formalism for Shortest Path Problems with Expensive Edge Evaluations via Lazy Best-First Search over Paths with Edge Selectors, Dellin and Srinivasa, ICAPS 2016.
Near-Optimal Edge Evaluation in Explicit Generalized Binomial Graphs, Choudhury et al., N*IPS 2017.
Bayesian Active Edge Evaluation on Expensive Graphs, Choudhury et al., IJCAI 2018.
Lazy Receding Horizon A* for Efficient Path Planning in Graphs with Expensive-to-Evaluate Edges, Mandalika et al., ICAPS 2018.
Generalized Lazy Search for Robot Motion Planning: Interleaving Search and Edge Evaluation via Event-based Toggles, Mandalika et al., ICAPS 2019.


Coauthors: Mohak Bhardwaj, Byron Boots, Shushman Choudhury, Sanjiban Choudhury, Chris Dellin, Nika Haghtalab, Brian Hou, Shervin Javdani, Gilwoo Lee, Simon Mackenzie, Aditya Mandalika, Ariel Procaccia, Oren Salzman, Sebastian Scherer.

Collaborators: Tim Barfoot, Dmitry Berenson, Jon Gammell, David Hsu, Brad Saund, Rahul Vernwal.

Smarty Pants: Drew Bagnell, Kostas Bekris, Dan Halperin, Kris Hauser, Sven Koenig, Max Likhachev.

Funders: Army, DARPA, Honda, NIH, NSF, ONR.

https://personalrobotics.cs.washington.edu/publications/

slide-76
SLIDE 76 76

https://www.amazon.jobs/en/teams/rai

slide-77
SLIDE 77

Formalizing Connections Between 
 Motion Planning and Machine Learning

Siddhartha Srinivasa Boeing Endowed Professor University of Washington

77