

1. Dynamic Programming Algorithms for Planning and Robotics in Continuous Domains and the Hamilton-Jacobi Equation
Ian Mitchell, Department of Computer Science, University of British Columbia
Research supported by the Natural Sciences and Engineering Research Council of Canada and the Office of Naval Research under MURI contract N00014-02-1-0720

2. Outline
• Introduction
  – Optimal control
  – Dynamic programming (DP)
• Path Planning
  – Discrete planning as optimal control
  – Dijkstra’s algorithm & its problems
  – Continuous DP & the Hamilton-Jacobi (HJ) PDE
  – The fast marching method (FMM): Dijkstra’s for continuous spaces
• Algorithms for Static HJ PDEs
  – Four alternatives
  – FMM pros & cons
• Generalizations
  – Alternative action norms
  – Multiple objective planning

3. Basic Path Planning
• Find the optimal path p(s) to a target (or from a source)
• Inputs
  – Cost c(x) to pass through each state x in the state space
  – Set of targets or sources (provides boundary conditions)
[figures: cost map (higher is more costly); cost map (contours)]
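The slide's own formulas were images and did not survive extraction; as a hedged sketch, the additive objective implied by these inputs is the integral of the cost along the path:

```latex
% Cost of a path p from x to the target set T, and the induced value
% function (standard formulation, reconstructed rather than copied):
J(p) = \int_0^{s_f} c\bigl(p(s)\bigr)\, ds,
\qquad p(0) = x,\quad p(s_f) \in T,
\qquad \vartheta(x) = \min_{p}\, J(p).
```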

4. Discrete vs Continuous
• Discrete variable
  – Drawn from a countable domain, typically finite
  – Often no useful metric other than the discrete metric
  – Often no consistent ordering
  – Examples: names of students in this room, rooms in this building, natural numbers, a grid of ℤ^d, …
• Continuous variable
  – Drawn from an uncountable domain, but may be bounded
  – Usually has a continuous metric
  – Often no consistent ordering
  – Examples: real numbers [0, 1], ℝ^d, SO(3), …

5. Classes of Models for Dynamic Systems
• Discrete time and state
• Continuous time / discrete state
  – Discrete event systems
• Discrete time / continuous state
• Continuous time and state
• Markovian assumption
  – All information relevant to future evolution is captured in the state variable
  – Vital assumption, but failures are often treated as nondeterminism
• Deterministic assumption
  – Future evolution completely determined by initial conditions
  – Can be eased in many cases
• Not the only classes of models

6. Achieving Desired Behaviours
• We can attempt to control a system when there is a parameter u of the dynamics (the “control input”) which we can influence
  – Time dependent dynamics are possible, but we will mostly deal with time invariant systems
• Without a control signal specification, the system is nondeterministic
  – Current state cannot predict unique future evolution
• Control signal may be specified as
  – Open-loop: u(t), a map u: ℝ → U
  – Feedback, closed-loop: u(x(t)), a map u: X → U
  – Either choice makes the system deterministic again

7. Objective Function
• We distinguish quality of control by an objective / payoff / cost function, which comes in many different variations
  – e.g. discrete time, discounted, with fixed finite horizon t_f
  – e.g. continuous time, no discount, with target set T
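The equations for these two variants were lost in extraction; the forms below are standard and are a hedged reconstruction, with g a terminal cost and α a discount factor (both assumptions, not taken from the deck):

```latex
% Discrete time, discount \alpha \in (0, 1], fixed finite horizon t_f:
J(x, u(\cdot)) = \sum_{t=t_0}^{t_f - 1} \alpha^{t}\, c(x_t, u_t) + \alpha^{t_f} g(x_{t_f})

% Continuous time, no discount, target set T
% (t_T is the first time the trajectory reaches T):
J(x, u(\cdot)) = \int_{0}^{t_T} c\bigl(x(t), u(t)\bigr)\, dt
```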

8. Value Function
• Choose the input signal to optimize the objective
  – Optimize: “cost” is usually minimized, “payoff” is usually maximized, and “objective” may be either
• Value function is the optimal value of the objective function
  – May not be achieved for any signal
  – The set of admissible signals 𝒰 can be an issue in continuous time problems (e.g. piecewise constant vs measurable)
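In symbols, this is the standard definition (reconstructed, since the slide's formula did not survive):

```latex
% Value function: optimal objective over all admissible input signals.
% The infimum may not be attained by any particular signal.
\vartheta(t, x) = \inf_{u(\cdot) \in \mathcal{U}} J\bigl(t, x, u(\cdot)\bigr)
```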

9. Dynamic Programming in Discrete Time
• Consider the finite horizon objective with α = 1 (no discount)
• So given u(·), we can solve inductively backwards in time for the objective J(t, x, u(·)), starting at t = t_f
  – Called dynamic programming (DP)
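A hedged sketch of that backward induction, assuming dynamics x_{t+1} = f(x_t, u_t), running cost c, and terminal cost g (none of which are spelled out on this slide):

```latex
% Base case at the horizon:
J(t_f, x, u(\cdot)) = g(x)
% Inductive step, backwards for t = t_f - 1, \dots, t_0:
J(t, x, u(\cdot)) = c(x, u_t) + J\bigl(t+1, f(x, u_t), u(\cdot)\bigr)
```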

10. DP for the Value Function
• DP can also be applied to the value function
  – Second step works because u(t_0) can be chosen independently of u(t) for t > t_0
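The resulting recursion is the familiar Bellman backup (again a hedged reconstruction of the slide's lost equation):

```latex
% Bellman recursion: minimize over the current input only, because
% u(t_0) can be chosen independently of the later inputs.
\vartheta(t, x) = \min_{u \in U} \Bigl[ c(x, u) + \vartheta\bigl(t+1, f(x, u)\bigr) \Bigr],
\qquad \vartheta(t_f, x) = g(x)
```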

11. Optimal Control via DP
• Optimal control signal
• Optimal trajectory (discrete gradient descent)
• Observe the update equation
• Can be extended (with appropriate care) to
  – other objectives
  – probabilistic models
  – adversarial models
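A hedged reconstruction of the first two bullets' formulas: the optimal input is the argmin of the Bellman backup, and the optimal trajectory follows it, descending the value function:

```latex
% Optimal control signal (feedback form):
u^*(t, x) \in \arg\min_{u \in U} \Bigl[ c(x, u) + \vartheta\bigl(t+1, f(x, u)\bigr) \Bigr]
% Optimal trajectory: a discrete "gradient descent" on \vartheta:
x^*_{t+1} = f\bigl(x^*_t, u^*(t, x^*_t)\bigr)
```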

12. Outline
• Introduction
  – Optimal control
  – Dynamic programming (DP)
• Path Planning
  – Discrete planning as optimal control
  – Dijkstra’s algorithm & its problems
  – Continuous DP & the Hamilton-Jacobi (HJ) PDE
  – The fast marching method (FMM): Dijkstra’s for continuous spaces
• Algorithms for Static HJ PDEs
  – Four alternatives
  – FMM pros & cons
• Generalizations
  – Alternative action norms
  – Multiple objective planning

13. Basic Path Planning (reminder)
• Find the optimal path p(s) to a target (or from a source)
• Inputs
  – Cost c(x) to pass through each state x in the state space
  – Set of targets or sources (provides boundary conditions)
[figures: cost map (higher is more costly); cost map (contours)]

14. Discrete Planning as Optimal Control

15. Dynamic Programming Principle
• Value function ϑ(x) is “cost to go” from x to the nearest target
• Value ϑ(x) at a point x is the minimum over all points y in the neighborhood N(x) of the sum of
  – the value ϑ(y) at point y
  – the cost c(x) to travel through x
• Dynamic programming applies if
  – costs are additive
  – subsets of feasible paths are themselves feasible
  – concatenations of feasible paths are feasible
• Compute the solution by value iteration (see the sketch after this slide)
  – Repeatedly solve the DP equation until the solution stops changing
  – In many situations, smart ordering reduces the number of iterations
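A minimal value iteration sketch of this DPP on a 4-neighbor grid. This is illustrative code, not the speaker's; the function name, array layout, and tolerance are assumptions:

```python
# Value iteration for the discrete DPP: v(x) = min_y [ v(y) + c(x) ]
# over neighbors y of x (a minimal sketch under stated assumptions).
import numpy as np

def value_iteration(cost, targets, tol=1e-9):
    """cost: 2D array of c(x) > 0; targets: list of (i, j) nodes with value 0."""
    v = np.full(cost.shape, np.inf)
    for t in targets:
        v[t] = 0.0
    while True:
        changed = False
        for i in range(cost.shape[0]):
            for j in range(cost.shape[1]):
                if (i, j) in targets:
                    continue
                # DPP: best neighbor value plus cost to pass through (i, j)
                for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ni, nj = i + di, j + dj
                    if 0 <= ni < cost.shape[0] and 0 <= nj < cost.shape[1]:
                        cand = v[ni, nj] + cost[i, j]
                        if cand < v[i, j] - tol:
                            v[i, j] = cand
                            changed = True
        if not changed:          # solution stopped changing: done
            return v
```

Sweeping the grid in a fixed order is the naive choice; as the slide notes, smarter orderings (and ultimately Dijkstra's single-pass ordering, below) reduce the number of iterations.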

16. Policy (Feedback Control)
• Given value function ϑ(x), the optimal action at x is x → y, where y attains the minimum in the DPP update
  – Policy: u(x) = y (a sketch follows below)
• Alternative: policy iteration constructs the policy directly
  – Finite termination of policy iteration can be proved for some situations where value iteration does not terminate
  – Representation of the policy function may be more complicated than the value function
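A sketch of extracting that feedback policy from a computed value function; it assumes the `value_iteration()` output above and is illustrative, not the speaker's code:

```python
# Feedback policy u(x) = y: the neighbor y attaining the DPP minimum.
def policy(v, cost, x):
    """v: value function array; cost: cost map; x: grid node (i, j)."""
    i, j = x
    best, best_y = float("inf"), None
    for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        ni, nj = i + di, j + dj
        if 0 <= ni < v.shape[0] and 0 <= nj < v.shape[1]:
            cand = v[ni, nj] + cost[i, j]
            if cand < best:
                best, best_y = cand, (ni, nj)
    return best_y  # following these moves descends the value function
```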

17. Dijkstra’s Algorithm for the Value Function
• Single pass dynamic programming value iteration on a discrete graph (see the sketch after this slide)
  1. Set all interior nodes to a dummy value ∞
  2. For all boundary nodes x and all y ∈ N(x), approximate ϑ(y) by the DPP
  3. Sort all interior nodes with finite values in a list
  4. Pop the node x with minimum value from the list and update ϑ(y) by the DPP for all y ∈ N(x)
  5. Repeat from (3) until all nodes have been popped
• Example: constant cost map c(x) = 1
  – Boundary node: ϑ(x) = 0
  – First neighbors: ϑ(x) = 1
  – Second neighbors: ϑ(x) = 2
  – Distant node: ϑ(x) = 15
  – Optimal path?
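A minimal sketch of steps 1–5 using a binary heap for the sorted list; illustrative code, not the speaker's:

```python
# Dijkstra's single-pass value iteration on a 4-neighbor grid graph.
import heapq
import numpy as np

def dijkstra_value(cost, targets):
    """cost: 2D array of c(x) > 0; targets: boundary nodes with value 0."""
    v = np.full(cost.shape, np.inf)   # step 1: dummy value infinity
    queue = []
    for t in targets:                 # step 2: initialize from the boundary
        v[t] = 0.0
        heapq.heappush(queue, (0.0, t))
    done = set()
    while queue:                      # steps 3-5
        vx, (i, j) = heapq.heappop(queue)   # pop node with minimum value
        if (i, j) in done:
            continue                        # skip stale queue entries
        done.add((i, j))
        for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            ni, nj = i + di, j + dj
            if 0 <= ni < cost.shape[0] and 0 <= nj < cost.shape[1]:
                cand = vx + cost[ni, nj]    # DPP update of neighbor y
                if cand < v[ni, nj]:
                    v[ni, nj] = cand
                    heapq.heappush(queue, (cand, (ni, nj)))
    return v
```

Each node is popped exactly once, which is what makes this a single-pass method: once a node's value is popped it is final, because all remaining queue values are at least as large.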

18. Generic Dijkstra-like Algorithm
• Could also use an iterative scheme by minor modifications in the management of the queue

19. Typical Discrete Update
• Much better results from discrete Dijkstra with an eight neighbour stencil
• Result still shows facets in what should be circular contours
[figure: black: value function contours for minimum time to the origin; red: a few optimal paths]

20. Other Issues
• Values and actions are not defined for states that are not nodes in the discrete graph
• Actions only include those corresponding to edges leading to neighboring states
• Interpolation of actions to points that are not grid nodes may not lead to actions optimal under the continuous constraint
[figure: two optimal paths to the lower right node]

21. Deriving Continuous DP (Informally)
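The derivation itself was rendered as images; the following hedged sketch is the standard informal argument this slide title suggests, assuming dynamics ẋ = f(x, u) and running cost c:

```latex
% DPP over a short time step \delta:
\vartheta(x) = \min_{u \in U} \Bigl[ \delta\, c(x, u) + \vartheta\bigl(x + \delta f(x, u)\bigr) \Bigr]
% Taylor expand \vartheta(x + \delta f(x,u)) \approx \vartheta(x) + \delta\, \nabla\vartheta(x) \cdot f(x, u),
% cancel \vartheta(x), divide by \delta, and let \delta \to 0:
0 = \min_{u \in U} \bigl[ c(x, u) + \nabla\vartheta(x) \cdot f(x, u) \bigr]
```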

22. The Static Hamilton-Jacobi PDE
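For the isotropic planning problem above (dynamics ẋ = u with ‖u‖ ≤ 1 and state-dependent cost c(x)), the minimization in the static HJ equation can be done in closed form, giving the Eikonal equation; this is a hedged reconstruction of the equation the slide presumably showed:

```latex
% Minimizing over \|u\| \le 1 picks u = -\nabla\vartheta / \|\nabla\vartheta\|, so:
\|\nabla \vartheta(x)\| = c(x) \quad \text{for } x \notin T,
\qquad \vartheta(x) = 0 \quad \text{for } x \in T
```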

23. Continuous Planning as Optimal Control
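The outline promises the fast marching method as "Dijkstra's for continuous spaces"; the sketch below is a minimal first-order FMM for the Eikonal equation on a uniform grid. It is illustrative code under stated assumptions (grid spacing h, 4-neighbor stencil), not the speaker's implementation:

```python
# Fast marching method for |grad v| = c(x): Dijkstra's ordering, but with
# an upwind quadratic update instead of a per-edge sum (minimal sketch).
import heapq
import math
import numpy as np

def fast_march(cost, targets, h=1.0):
    """cost: 2D array of c(x) > 0; targets: nodes where v = 0; h: grid spacing."""
    v = np.full(cost.shape, np.inf)
    queue = []
    for t in targets:
        v[t] = 0.0
        heapq.heappush(queue, (0.0, t))
    accepted = np.zeros(cost.shape, dtype=bool)
    while queue:
        _, (i, j) = heapq.heappop(queue)
        if accepted[i, j]:
            continue
        accepted[i, j] = True
        for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            ni, nj = i + di, j + dj
            if not (0 <= ni < cost.shape[0] and 0 <= nj < cost.shape[1]) \
                    or accepted[ni, nj]:
                continue
            # Upwind (smaller) neighbor value along each axis
            a = min(v[ni - 1, nj] if ni > 0 else np.inf,
                    v[ni + 1, nj] if ni + 1 < cost.shape[0] else np.inf)
            b = min(v[ni, nj - 1] if nj > 0 else np.inf,
                    v[ni, nj + 1] if nj + 1 < cost.shape[1] else np.inf)
            hc = h * cost[ni, nj]
            a, b = min(a, b), max(a, b)
            # First-order upwind discretization of |grad v| = c:
            if b - a >= hc:
                new = a + hc      # only one axis contributes
            else:                 # both axes contribute: solve the quadratic
                new = 0.5 * (a + b + math.sqrt(2 * hc * hc - (a - b) ** 2))
            if new < v[ni, nj]:
                v[ni, nj] = new
                heapq.heappush(queue, (new, (ni, nj)))
    return v
```

The only changes from the Dijkstra sketch above are the two upwind neighbor values and the quadratic update, which is what removes much of the grid-induced faceting seen on slide 19.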
