Computational Approaches for Stochastic Shortest Path on Succinct MDPs

Krishnendu Chatterjee¹, Hongfei Fu², Amir Goharshady¹, Nastaran Okati³

¹IST Austria   ²Shanghai Jiao Tong University   ³Ferdowsi University of Mashhad

IJCAI 2018

Succinct MDPs

A succinct MDP is an MDP described implicitly by:

  • a set of variables,
  • a set of rules that describe how the variables can be updated,
  • a target set, consisting of valuations of the variables.

At every time step, a rule is non-deterministically chosen to update the variables. This process continues until the target set is reached. We can think of a succinct MDP as a probabilistic program of the following form:

  while φ do Q1 □ . . . □ Qk od

where □ denotes non-deterministic choice and each Qi is a sequence of assignments to variables.

Example:

  while x ≥ 1 do
    x := x + r  □  x := x − 1
  od
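To make the semantics concrete, here is a minimal Python sketch (ours, not from the slides) that executes the example program, with the non-deterministic choice resolved by an explicit policy. The slide does not specify the distribution of r, so purely for illustration we assume r is uniform on {−1, +1}; the names `run` and `policy` are ours.

```python
import random

# Executes the example succinct MDP
#   while x >= 1 do  x := x + r  []  x := x - 1  od
# with non-determinism resolved by a policy mapping the current valuation
# to a rule index. Assumption (not on the slide): r is uniform on {-1, +1}.
def run(x, policy, rng):
    steps = 0
    while x >= 1:
        if policy(x) == 0:
            x = x + rng.choice([-1, 1])  # rule Q1: x := x + r
        else:
            x = x - 1                    # rule Q2: x := x - 1
        steps += 1
    return steps

# A policy that always picks Q2 terminates in exactly x0 steps.
print(run(5, lambda x: 1, random.Random(0)))  # -> 5
```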


Another Example

  while x ≥ 1 do
    if (0.4) { x := x + 1; reward = 1 } else { x := x − 1 }
    □
    if (0.3) { x := x + 1; reward = 1 } else { x := x − 1 }
  od

Figure: Gambler’s Ruin as a Succinct MDP


Stochastic Shortest Path

Fix an initial valuation v for the program variables. Let σ be a policy that, at any point in time, given the history of the program, chooses one of the Qi’s to be executed.

We define R∞(v, σ) as the expected sum of rewards collected by the program before termination, if the program starts with the valuation v and follows the policy σ. We define infval(v) = inf_σ R∞(v, σ) and supval(v) = sup_σ R∞(v, σ), where the inf and sup are taken over all policies that guarantee finite expected termination time. We are looking for methods to obtain upper and lower bounds for both infval and supval.
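For concrete programs these quantities can be approximated empirically. Below is a sketch (ours, not part of the slides) that Monte-Carlo-estimates R∞(v, σ) for the Gambler’s Ruin example under the memoryless policy σ that always picks the first rule (the one with probability 0.4); the name `estimate_R` is ours.

```python
import random

# Monte Carlo estimate of R_inf(x0, sigma) for Gambler's Ruin, where sigma
# always executes the first rule: with prob 0.4, x := x + 1 and reward 1;
# otherwise x := x - 1 with reward 0.
def estimate_R(x0, runs=20000, seed=0):
    rng = random.Random(seed)
    total = 0
    for _ in range(runs):
        x, reward = x0, 0
        while x >= 1:
            if rng.random() < 0.4:
                x += 1
                reward += 1
            else:
                x -= 1
        total += reward
    return total / runs

# The walk has drift -0.2, so E[steps] = x0 / 0.2 and E[reward] = 0.4 * x0 / 0.2 = 2 * x0.
print(estimate_R(3))  # close to 6
```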


LUPFs and LLPFs

We focus on supval; the approach for infval is similar.

Linear Upper Potential Function (LUPF): Let X be the set of program variables. A function h : R^X → R is an LUPF if it satisfies the following conditions:

  1. h is linear,
  2. the value of h at terminating valuations is bounded between two fixed constants K and K′,
  3. for every Qi and every valuation v that satisfies the loop guard: h(v) ≥ Eu(h(Qi(v, u))) + Eu(R(u, Qi)),
  4. there is a fixed constant M such that, at each step of the program, the value of h changes by at most M.

Linear Lower Potential Function (LLPF): An LLPF h is a function that satisfies the above conditions, except that condition 3 is changed to: for every v that satisfies the loop guard, there exists a Qi such that h(v) ≤ Eu(h(Qi(v, u))) + Eu(R(u, Qi)).
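For a fixed candidate h, conditions like 3 are plain linear inequalities that can be checked mechanically. As an illustration (ours, not from the slides), the following verifies condition 3 for the candidate h(x) = 2x on the Gambler’s Ruin example over a sample of valuations, using exact rational arithmetic to avoid floating-point noise at the equality boundary.

```python
from fractions import Fraction as F

def h(x):                  # candidate linear potential function h(x) = 2x
    return 2 * x

def post_expectation(x, p):
    # One rule of Gambler's Ruin: with probability p, x := x + 1 and
    # reward 1; otherwise x := x - 1 with reward 0.
    return p * (h(x + 1) + 1) + (1 - p) * h(x - 1)

# Condition 3: h(v) >= E[h(Q_i(v))] + E[reward] for every rule Q_i and
# every valuation satisfying the loop guard x >= 1 (sampled here).
for x in range(1, 200):
    for p in (F(2, 5), F(3, 10)):
        assert h(x) >= post_expectation(x, p)
print("condition 3 holds for h(x) = 2x on the sampled valuations")
```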


Theorem. If h is an LUPF, then supval(v) ≤ h(v) − K for all valuations v ∈ R^X that satisfy the loop guard.

Theorem. If h is an LLPF, then supval(v) ≥ h(v) − K′ for all valuations v ∈ R^X that satisfy the loop guard.

Sketch of Proof. Construct a stochastic process based on h, show that it forms a supermartingale, and then apply the optional stopping theorem (OST).


Synthesizing LUPFs

  while x ≥ 1 do
    if (0.4) { x := x + 1; reward = 1 } else { x := x − 1 }
    □
    if (0.3) { x := x + 1; reward = 1 } else { x := x − 1 }
  od

Let h : R → R be an LUPF for this example. We have:

  (1) ∃ λ1, λ2 ∈ R ∀ x ∈ R: h(x) = λ1·x + λ2
  (2) ∃ K, K′ ∈ R ∀ x ∈ [1, 2): K ≤ h(x) ≤ K′
  (3) ∀ x ∈ [1, ∞): h(x) ≥ 0.4 · (1 + h(x+1)) + 0.6 · h(x−1)
      and h(x) ≥ 0.3 · (1 + h(x+1)) + 0.7 · h(x−1)
  (4) ∃ M ∈ [0, ∞) ∀ x ∈ [1, ∞): |h(x) − h(x−1)| ≤ M and |h(x) − h(x+1)| ≤ M
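As a sanity check on this reduction (our sketch, not the slides’ tool): substituting h(x) = λ1·x + λ2 into the two instances of condition (3) makes the x-dependent and λ2 terms cancel, leaving one lower bound on λ1 per rule; the minimum feasible λ1 is their maximum.

```python
from fractions import Fraction as F

def lambda1_lower_bound(p):
    # Condition (3) for a rule that, with probability p, does x := x + 1
    # with reward 1 and otherwise x := x - 1:
    #   l1*x + l2 >= p*(1 + l1*(x+1) + l2) + (1-p)*(l1*(x-1) + l2)
    # The l1*x and l2 terms cancel, leaving 0 >= p + l1*(2p - 1),
    # i.e. l1 >= p / (1 - 2p) when p < 1/2.
    return p / (1 - 2 * p)

lam1 = max(lambda1_lower_bound(F(2, 5)), lambda1_lower_bound(F(3, 10)))
print(lam1)  # -> 2, matching the LP solution lambda_1 = 2
```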


By applying Farkas’ Lemma and solving the resulting LP with the goal of minimizing λ1, we get λ1 = M = 2, λ2 = K = 0, and K′ = 4. Therefore, we have supval(x0) ≤ 2x0 for all initial valuations x0 that satisfy the loop guard. Hence, in this case, the problem was solved by a reduction to linear programming.

Synthesizing LLPFs

  while x ≥ 1 do
    if (0.4) { x := x + 1; reward = 1 } else { x := x − 1 }
    □
    if (0.3) { x := x + 1; reward = 1 } else { x := x − 1 }
  od

This case is a bit more complicated. If h is an LLPF, we must have exactly the same conditions as before, except that condition 3 changes to:

  (3′) ∀ x ∈ [1, ∞): h(x) ≤ 0.4 · (1 + h(x+1)) + 0.6 · h(x−1)
       or h(x) ≤ 0.3 · (1 + h(x+1)) + 0.7 · h(x−1),

which is equivalent to λ1 ≤ 2, and hence we get supval(x0) ≥ 2x0. So, our previous bound is tight.
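The tightness claim can likewise be checked mechanically: for h(x) = 2x, the first rule satisfies its inequality in (3′) with equality, so the disjunction holds at every valuation. A small sketch (ours, not from the slides):

```python
from fractions import Fraction as F

def h(x):                  # candidate h(x) = 2x, i.e. lambda_1 = 2, lambda_2 = 0
    return 2 * x

def post(x, p):
    # E[h(Q(x))] + E[reward] for a rule with up-probability p.
    return p * (h(x + 1) + 1) + (1 - p) * h(x - 1)

# Condition (3'): for every x >= 1, SOME rule Q_i satisfies h(x) <= E[...].
for x in range(1, 200):
    assert h(x) <= post(x, F(2, 5)) or h(x) <= post(x, F(3, 10))
print("(3') holds for h(x) = 2x, so 2x is also a lower bound")
```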


Theorem (Motzkin). Let A ∈ R^{m×n}, B ∈ R^{k×n}, b ∈ R^m and c ∈ R^k. Assume that {x ∈ R^n | Ax ≤ b} ≠ ∅. Then {x ∈ R^n | Ax ≤ b} ∩ {x ∈ R^n | Bx < c} = ∅ iff there exist y ∈ R^m and z ∈ R^k such that y, z ≥ 0, 1^T·z > 0, A^T·y + B^T·z = 0, and b^T·y + c^T·z ≤ 0.

By applying Motzkin’s theorem, we can reduce the conditions to a Quadratic Programming problem.


Complexity

Using Farkas’ Lemma, we reduced the problem of finding the best LUPF to linear programming. Using Motzkin’s Theorem, we reduced the problem of finding the best LLPF to quadratic programming. Both reductions are polynomial in the size of the succinct MDP (i.e., the length of the code). Note that the MDPs might have infinitely many states (e.g., if the variables can hold integers) or even uncountably many states (e.g., if the variables hold real values).


Experimental Results

  MDP                   Parameters           Upper bound      Lower bound   Time
  Gambler’s Ruin        p1 = 0.4, p2 = 0.3   2x               2x            153 ms
  2D Robot Planning     p = 0.4              5x − 5y          5x − 5y       251 ms
  Multi-robot Planning  p1 = 0.4, p2 = 0.4   2.5x − 2.5y + 5  2.5x − 2.5y   758 ms
  Mini-roulette         —                    11x              11x           320 ms
  American Roulette     —                    24x              24x           425 ms