  1. Rich Behavioral Models: Illustration on Journey Planning. Mickael Randour, F.R.S.-FNRS & UMONS – Université de Mons, Belgium. March 14, 2019. Workshop – Theory and Algorithms in Graph and Stochastic Games.

  2. The talk in one slide. Strategy synthesis for Markov Decision Processes (MDPs): finding good controllers for systems interacting with a stochastic environment. Good? Performance is evaluated through payoff functions. The usual problem is to optimize the expected performance or the probability of achieving a given performance level; this is not sufficient for many practical applications, hence several extensions, more expressive but also more complex. Aim of this survey talk: give a flavor of classical questions and extensions (rich behavioral models), illustrated on the stochastic shortest path (SSP).

  3. Outline: 1. Context, MDPs, strategies; 2. Classical stochastic shortest path problems; 3. Good expectation under acceptable worst-case; 4. Percentile queries in multi-dimensional MDPs; 5. Conclusion.

  4. Part 1: Context, MDPs, strategies.

  5. Multi-criteria quantitative synthesis. Verification and synthesis involve a reactive system to control, an interacting environment, and a specification to enforce. Model of the (discrete) interaction? Antagonistic environment: 2-player game on a graph. Stochastic environment: MDP. Quantitative specifications, for example: reach a state s before x time units (shortest path), or minimize the average response time (mean-payoff). The focus is on multi-criteria quantitative models, to reason about trade-offs and interplays.

  6. Strategy (policy) synthesis for MDPs. The informal descriptions of the system and its environment are modeled as a Markov Decision Process (MDP), and the specification as a winning objective. The central question: is there a winning strategy? If yes, the strategy is the controller; if no, one must empower the system capabilities or weaken the specification requirements. Three questions follow: (1) How complex is it to decide whether a winning strategy exists? (2) How complex does such a strategy need to be? Simpler is better. (3) Can we synthesize one efficiently?

  7. Markov decision processes. MDP D = (S, s_init, A, δ, w): finite sets of states S and actions A, probabilistic transition function δ: S × A → D(S), weight function w: A → Z. Run (or play): ρ = s_1 a_1 ... a_{n−1} s_n ... such that δ(s_i, a_i, s_{i+1}) > 0 for all i ≥ 1. Set of runs R(D); set of histories (finite runs) H(D). Strategy σ: H(D) → D(A) such that, for all h ending in s, Supp(σ(h)) ⊆ A(s). [Figure: example MDP with states s_1, ..., s_4, actions a_1 (weight 2), a_2 (weight −1), b_3 (weight 3), a_3 (weight 0), a_4 (weight 1), and probabilistic branches 0.7/0.3 and 0.9/0.1.]
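To make the definitions concrete, here is a minimal sketch of a dict-based MDP encoding in Python; the representation (actions[s] for enabled actions, delta[(s, a)] as a successor-to-probability map, w[a] for weights) and the history-validity check are illustrative assumptions, not part of the slides.

```python
from typing import Dict, List, Tuple

State, Action = str, str
Delta = Dict[Tuple[State, Action], Dict[State, float]]   # delta(s, a)(s') = probability

def is_valid_history(history: List[str], delta: Delta) -> bool:
    """Check that a finite alternating sequence s_1 a_1 s_2 ... s_n is a valid
    history of the MDP: every step must satisfy delta(s_i, a_i)(s_{i+1}) > 0."""
    states, acts = history[0::2], history[1::2]
    return all(delta[(s, a)].get(s_next, 0.0) > 0
               for s, a, s_next in zip(states, acts, states[1:]))
```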

  8. Markov decision processes. Sample pure memoryless strategy σ. Sample run ρ = s_1 a_1 s_2 a_2 s_1 a_1 s_2 a_2 (s_3 a_3 s_4 a_4)^ω. Other possible run ρ′ = s_1 a_1 s_2 a_2 (s_3 a_3 s_4 a_4)^ω. Strategies may use finite or infinite memory, and randomness. Payoff functions map runs to numerical values: truncated sum up to T = {s_3}: TS_T(ρ) = 2, TS_T(ρ′) = 1; mean-payoff: MP(ρ) = MP(ρ′) = 1/2; many more.
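A sketch of the two payoff functions on (finite prefixes of) runs, using the dict encoding above; the run and weight names are read off the slide's example, and only a prefix of the infinite runs is inspected.

```python
def truncated_sum(run, target, w):
    """TS_T for a run s_1 a_1 s_2 a_2 ...: sum of action weights until the first
    visit of T; infinity if T is not reached within the given prefix."""
    states, acts = run[0::2], run[1::2]
    total = 0
    for s, a in zip(states, acts):
        if s in target:
            return total
        total += w[a]
    return total if states and states[-1] in target else float("inf")

def mean_payoff_prefix(run, w):
    """Average weight per action over a finite prefix; on (s_3 a_3 s_4 a_4)^omega
    this average converges to (0 + 1) / 2 = 1/2."""
    acts = run[1::2]
    return sum(w[a] for a in acts) / len(acts)

w = {"a1": 2, "a2": -1, "a3": 0, "a4": 1}            # weights read off the figure
rho_prime = ["s1", "a1", "s2", "a2", "s3"]           # prefix of rho' up to T = {s3}
print(truncated_sum(rho_prime, {"s3"}, w))           # -> 1, as on the slide
```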

  9. Markov chains. Once a strategy σ is fixed, the process becomes fully stochastic: a Markov chain (MC) M, whose state space is the product of the MDP and the memory of σ. An event E ⊆ R(M) has a probability P_M(E); a measurable function f: R(M) → R ∪ {∞} has an expected value E_M(f).
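For a pure memoryless strategy the induced chain needs no memory component, so it can be sketched very simply (sigma maps each state to its chosen action; an illustrative helper, not from the slides).

```python
def induced_chain(delta, sigma):
    """Transition function of the Markov chain obtained by fixing a pure
    memoryless strategy sigma: each state keeps only the distribution of the
    action chosen by sigma."""
    return {s: delta[(s, sigma[s])] for s in sigma}
```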

  10. Aim of this survey. Compare different types of quantitative specifications for MDPs, w.r.t. the complexity of the decision problem and w.r.t. the complexity of winning strategies. Recent extensions share a common philosophy: a framework for the synthesis of strategies with richer performance guarantees. Our work deals with many different payoff functions; the focus in this talk is on the shortest path problem: not the most involved technically, with natural applications, and useful to understand the practical interest of each variant. Joint work with R. Berthon, V. Bruyère, E. Filiot, J.-F. Raskin, O. Sankur [BFRR17, RRS17, RRS15, BCH+16, Ran16, BRR17].

  11. Part 2: Classical stochastic shortest path problems.

  12. Stochastic shortest path. Shortest path problem for weighted graphs: given a state s ∈ S and a target set T ⊆ S, find a path from s to a state t ∈ T that minimizes the sum of weights along its edges. PTIME algorithms exist (Dijkstra, Bellman-Ford, etc.) [CGR96]. We focus on MDPs with strictly positive weights for the SSP. Truncated sum payoff function for ρ = s_1 a_1 s_2 a_2 ... and target set T: TS_T(ρ) = Σ_{j=1}^{n−1} w(a_j) if s_n is the first visit of T, and TS_T(ρ) = ∞ if T is never reached.
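For the graph version recalled above, a minimal Dijkstra sketch; the dict-based adjacency adj[u] = [(v, weight), ...] is an illustrative assumption, not from the slides.

```python
import heapq

def dijkstra(adj, source, targets):
    """Length of a shortest path from source to any state in targets in a
    graph with non-negative edge weights; infinity if no target is reachable."""
    dist = {source: 0}
    queue = [(0, source)]
    while queue:
        d, u = heapq.heappop(queue)
        if u in targets:
            return d
        if d > dist.get(u, float("inf")):
            continue                          # stale queue entry, skip it
        for v, weight in adj.get(u, []):
            nd = d + weight
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(queue, (nd, v))
    return float("inf")
```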

  13. Planning a journey in an uncertain environment. [Figure: journey-planning MDP with target work. From home: car (1) leads to light traffic (prob. 0.2), medium traffic (0.7) or heavy traffic (0.1), followed by drive (20), drive (30) or drive (70) to work; bike (45) goes directly to work; railway (2) leads towards the train, possibly via the waiting room, from which one can wait (3) or go back (2) to home; from the train, relax (35) reaches work.] Each action takes time, target = work. What kind of strategies are we looking for when the environment is stochastic?
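Using the dict convention from earlier, the journey-planning MDP can be encoded as below. The action weights and the car/bike branches are read directly off the slide; the exact branching of the railway and wait actions is an assumed reading of the flattened figure.

```python
# Assumed encoding of the journey-planning MDP (weights in minutes, target = work).
actions = {
    "home": ["railway", "car", "bike"],
    "waiting_room": ["wait", "go_back"],
    "train": ["relax"],
    "light": ["drive_light"], "medium": ["drive_medium"], "heavy": ["drive_heavy"],
}
w = {"railway": 2, "car": 1, "bike": 45, "wait": 3, "go_back": 2, "relax": 35,
     "drive_light": 20, "drive_medium": 30, "drive_heavy": 70}
delta = {
    ("home", "car"): {"light": 0.2, "medium": 0.7, "heavy": 0.1},
    ("home", "bike"): {"work": 1.0},
    ("home", "railway"): {"train": 0.9, "waiting_room": 0.1},      # assumed branching
    ("waiting_room", "wait"): {"train": 0.9, "waiting_room": 0.1}, # assumed branching
    ("waiting_room", "go_back"): {"home": 1.0},
    ("train", "relax"): {"work": 1.0},
    ("light", "drive_light"): {"work": 1.0},
    ("medium", "drive_medium"): {"work": 1.0},
    ("heavy", "drive_heavy"): {"work": 1.0},
}
target = {"work"}
```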

  14. SSP-E: minimizing the expected length to target. SSP-E problem: given an MDP D = (S, s_init, A, δ, w), a target set T and a threshold ℓ ∈ Q, decide if there exists σ such that E^σ_D(TS_T) ≤ ℓ. Theorem [BT91]: the SSP-E problem can be decided in polynomial time; optimal pure memoryless strategies always exist and can be constructed in polynomial time.

  15. SSP-E: illustration (on the journey-planning MDP). Pure memoryless strategies suffice. Taking the car is optimal: E^σ_D(TS_T) = 33.
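A quick check of the stated expectation for the car strategy, using only the car branch read off the slide:

```python
# Expected time to work when taking the car from home:
# 1 (car) + 0.2 * 20 + 0.7 * 30 + 0.1 * 70 = 33 minutes.
expected_car = 1 + 0.2 * 20 + 0.7 * 30 + 0.1 * 70
print(expected_car)   # -> 33.0
```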

  16. SSP-E: PTIME algorithm. 1. Graph analysis (linear time): if s is not connected to T, its value is ∞ and it is removed; if s ∈ T, its value is 0. 2. Linear programming (LP, polynomial time): for each s ∈ S \ T, one variable x_s; maximize Σ_{s ∈ S\T} x_s under the constraints x_s ≤ w(a) + Σ_{s′ ∈ S\T} δ(s, a, s′) · x_{s′} for all s ∈ S \ T and all a ∈ A(s).
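A minimal sketch of this LP with scipy.optimize.linprog (which minimizes, so the objective is negated), over the dict-based MDP encoding assumed earlier; step 1 (removing states not connected to T) is assumed to have been done already.

```python
from scipy.optimize import linprog

def ssp_e_values(states, target, actions, delta, w):
    """Expected shortest-path values via the LP of the slide: maximize the sum
    of x_s subject to x_s <= w(a) + sum_{s' not in T} delta(s,a,s') * x_{s'}."""
    non_target = [s for s in states if s not in target]
    idx = {s: i for i, s in enumerate(non_target)}

    c = [-1.0] * len(non_target)                  # maximize sum x_s <=> minimize -sum x_s
    A_ub, b_ub = [], []
    for s in non_target:
        for a in actions[s]:
            row = [0.0] * len(non_target)
            row[idx[s]] += 1.0
            for s_next, p in delta[(s, a)].items():
                if s_next not in target:          # target states contribute 0
                    row[idx[s_next]] -= p
            A_ub.append(row)
            b_ub.append(w[a])

    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(0, None)] * len(non_target))
    return {s: res.x[idx[s]] for s in non_target}
```

On the journey-planning encoding sketched earlier, calling ssp_e_values(list(actions), target, actions, delta, w) should give 33.0 for home, matching the slide, provided the assumed transition structure matches the figure.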

  17. SSP-E: PTIME algorithm (continued). The optimal LP solution v gives v_s = expectation from s to T under an optimal strategy. Optimal pure memoryless strategy σ_v: σ_v(s) = argmin_{a ∈ A(s)} ( w(a) + Σ_{s′ ∈ S\T} δ(s, a, s′) · v_{s′} ). Playing optimally = locally optimizing present + future.
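Extracting σ_v from the values is then one local argmin per state; a sketch under the same assumed encoding:

```python
def greedy_strategy(v, target, actions, delta, w):
    """Pure memoryless strategy sigma_v: in each state, pick the action that
    locally optimizes present + future, w(a) + sum delta(s,a,s') * v_{s'}."""
    return {
        s: min(actions[s],
               key=lambda a: w[a] + sum(p * v[s_next]
                                        for s_next, p in delta[(s, a)].items()
                                        if s_next not in target))
        for s in v
    }
```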

  18. SSP-E: PTIME algorithm (continued). In practice, value and strategy iteration algorithms are often used instead: they offer the best performance in most cases but are exponential in the worst case; they are fixed-point algorithms based on successive solution improvements [BT91, dA99, HM14].
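A minimal value iteration sketch for SSP-E (iterating the Bellman fixed point) under the same assumed encoding; the tolerance and iteration cap are illustrative choices, not from the slides.

```python
def value_iteration(states, target, actions, delta, w, tol=1e-9, max_iter=100_000):
    """Iterate v(s) <- min_a [ w(a) + sum_{s' not in T} delta(s,a,s') * v(s') ]
    until the largest update is below tol; target states implicitly have value 0."""
    v = {s: 0.0 for s in states if s not in target}
    for _ in range(max_iter):
        new_v = {
            s: min(w[a] + sum(p * v[s_next]
                              for s_next, p in delta[(s, a)].items()
                              if s_next not in target)
                   for a in actions[s])
            for s in v
        }
        if max(abs(new_v[s] - v[s]) for s in v) < tol:
            return new_v
        v = new_v
    return v
```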

  19. Traveling without taking too many risks (on the journey-planning MDP). Minimizing the expected time to destination makes sense if we travel often and being late is not a problem. With the car, in 10% of the cases, the journey takes 71 minutes.
