Symblicit algorithms for optimal strategy synthesis in monotonic Markov decision processes


  1. Symblicit algorithms for optimal strategy synthesis in monotonic Markov decision processes
     Aaron Bohy¹, Véronique Bruyère¹, Jean-François Raskin²
     ¹Université de Mons, ²Université Libre de Bruxelles
     SYNT 2014, 3rd Workshop on Synthesis

  3. Overview (1/2)
     Motivations:
     • Markov decision processes with large state spaces
     • Explicit enumeration exhausts the memory
     • Symbolic representations like MTBDDs are useful
     • No easy use of (MT)BDDs for solving linear systems
     Recent contributions of [WBB+10]:
     • Symblicit algorithm: mixes symbolic and explicit data structures
     • Expected mean-payoff in Markov decision processes
     • Using (MT)BDDs
     [WBB+10] R. Wimmer, B. Braitling, B. Becker, E. M. Hahn, P. Crouzen, H. Hermanns, A. Dhama, and O. E. Theel. Symblicit calculation of long-run averages for concurrent probabilistic systems. In QEST, pages 27–36. IEEE Computer Society, 2010.

  5. Overview (2/2)
     Our motivations:
     • Antichains sometimes outperform BDDs (e.g. [WDHR06, DR07])
     • Use antichains instead of (MT)BDDs in symblicit algorithms
     Our contributions:
     • New structure of pseudo-antichains (an extension of antichains), closed under negation
     • Monotonic Markov decision processes
     • Two quantitative settings: stochastic shortest path (focus of this talk) and expected mean-payoff
     • Two applications: automated planning and LTL synthesis
     Full paper available on arXiv: abs/1402.1076

  6. Table of contents
     Definitions
     Symblicit approach
     Antichains and pseudo-antichains
     Monotonic Markov decision processes
     Applications
     Conclusion and future work

  10. Markov decision processes (MDPs)
      • M = (S, Σ, P) where:
        • S is a finite set of states
        • Σ is a finite set of actions
        • P : S × Σ → Dist(S) is a stochastic transition function
      • Cost function c : S × Σ → R_{>0}
      • (Memoryless) strategy λ : S → Σ
      [Figure: example MDP with states s0, s1, s2 and actions σ1, σ2; edges labelled with transition probabilities and costs]
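The definition above maps directly onto a plain explicit data structure. A minimal sketch: the class name and the example probabilities and costs below are illustrative assumptions, only loosely inspired by the slide's figure.

```python
# Minimal explicit MDP M = (S, Sigma, P) with cost function c,
# following the definition above. Names and numbers are illustrative.

class Mdp:
    def __init__(self, states, actions, P, cost):
        self.states = states    # finite set S
        self.actions = actions  # finite set Sigma
        self.P = P              # P[(s, a)] = distribution over S, as a dict
        self.cost = cost        # c[(s, a)] = strictly positive cost

mdp = Mdp(
    states={"s0", "s1", "s2"},
    actions={"sigma1", "sigma2"},
    P={
        ("s0", "sigma1"): {"s1": 0.5, "s2": 0.5},
        ("s0", "sigma2"): {"s0": 1/6, "s1": 1/3, "s2": 0.5},
        ("s1", "sigma1"): {"s1": 1.0},
        ("s2", "sigma1"): {"s0": 0.2, "s1": 0.8},
    },
    cost={("s0", "sigma1"): 2, ("s0", "sigma2"): 1,
          ("s1", "sigma1"): 1, ("s2", "sigma1"): 4},
)

# sanity check: every P(s, a) must be a probability distribution
for dist in mdp.P.values():
    assert abs(sum(dist.values()) - 1.0) < 1e-9
```

Partial action availability (here s1 and s2 only enable σ1) is modelled simply by leaving the corresponding (state, action) pairs out of P.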

  12. Markov chains (MCs)
      • MDP (S, Σ, P) with P : S × Σ → Dist(S), plus strategy λ : S → Σ
        ⇒ induced MC (S, P_λ) with P_λ : S → Dist(S)
      • Cost function c : S × Σ → R_{>0}, plus strategy λ : S → Σ
        ⇒ induced cost function c_λ : S → R_{>0}
      [Figure: the MC induced on states s0, s1, s2 by fixing a strategy]
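The induction step above is purely mechanical: fixing λ restricts each state to one action. A sketch in the same dictionary-based representation (the function name and the tiny example are assumptions, not from the talk):

```python
def induce_mc(P, cost, strategy):
    """Restrict P : S x Sigma -> Dist(S) and c : S x Sigma -> R>0 to a
    memoryless strategy lambda : S -> Sigma, yielding the induced MC's
    P_lambda : S -> Dist(S) and cost c_lambda : S -> R>0."""
    P_lam = {s: P[(s, a)] for s, a in strategy.items()}
    c_lam = {s: cost[(s, a)] for s, a in strategy.items()}
    return P_lam, c_lam

# tiny illustrative example: fixing action "b" at s0 keeps only its row
P = {("s0", "a"): {"s1": 1.0}, ("s0", "b"): {"s0": 0.5, "s1": 0.5},
     ("s1", "a"): {"s1": 1.0}}
cost = {("s0", "a"): 3, ("s0", "b"): 1, ("s1", "a"): 1}
P_lam, c_lam = induce_mc(P, cost, {"s0": "b", "s1": "a"})
assert P_lam["s0"] == {"s0": 0.5, "s1": 0.5} and c_lam["s0"] == 1
```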

  15. Expected truncated sum
      • Let M_λ = (S, P_λ) with cost function c_λ
      • Let G ⊆ S be a set of goal states
      • TS_G(ρ = s0 s1 s2 …) = Σ_{i=0}^{n−1} c_λ(s_i), with n the first index such that s_n ∈ G
      • E_λ^{TS_G}(s) = Σ_ρ P_λ(ρ) · TS_G(ρ), summing over paths ρ = s0 s1 … s_n such that s0 = s, s_n ∈ G and s0, …, s_{n−1} ∉ G
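For a fixed strategy, E_λ^{TS_G} satisfies a simple linear system: v(s) = 0 on G, and v(s) = c_λ(s) + Σ_t P_λ(s)(t)·v(t) elsewhere. A fixed-point sketch (an assumption of this sketch: the goal is reached with probability 1 from every state, so the iteration converges):

```python
def expected_truncated_sum(P_lam, c_lam, goal, iters=2000):
    """Iterate v(s) = 0 for s in G, v(s) = c_lam(s) + sum_t P_lam(s)(t) v(t)
    otherwise, until (approximate) convergence."""
    v = {s: 0.0 for s in P_lam}
    for _ in range(iters):
        v = {s: 0.0 if s in goal
             else c_lam[s] + sum(p * v[t] for t, p in P_lam[s].items())
             for s in P_lam}
    return v

# chain reaching goal g: surely from s1, with a coin flip from s0
P_lam = {"s0": {"s1": 0.5, "g": 0.5}, "s1": {"g": 1.0}, "g": {"g": 1.0}}
c_lam = {"s0": 2, "s1": 3}
v = expected_truncated_sum(P_lam, c_lam, goal={"g"})
assert abs(v["s1"] - 3.0) < 1e-6
assert abs(v["s0"] - 3.5) < 1e-6  # 2 + 0.5 * 3 + 0.5 * 0
```

In the symblicit setting this evaluation is exactly the linear system that is awkward to solve over (MT)BDDs, which is why the algorithm goes explicit for this step.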

  17. Stochastic shortest path (SSP)
      • Let M = (S, Σ, P) with cost function c
      • Let G ⊆ S be a set of goal states
      • λ* is optimal if E_{λ*}^{TS_G}(s) = inf_{λ ∈ Λ} E_λ^{TS_G}(s)
      • SSP problem: compute an optimal strategy λ*
      • Complexity and strategies [BT96]:
        • Polynomial time via linear programming
        • Memoryless optimal strategies exist
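The linear-programming route mentioned above can be made concrete. Under the usual properness assumptions of [BT96], the optimal values are the componentwise largest solution of the LP below, and λ* picks, at each state, an action for which the constraint is tight. This is the standard textbook formulation, sketched here as an assumption rather than quoted from the talk:

```latex
\max \sum_{s \in S \setminus G} v(s)
\quad \text{subject to} \quad
v(s) \le c(s,\sigma) + \sum_{t \in S} P(s,\sigma)(t)\, v(t)
\quad \forall s \in S \setminus G,\ \forall \sigma \in \Sigma,
\qquad v(s) = 0 \quad \forall s \in G.
```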

  20. Ingredients
      • Strategy iteration algorithm [How60, BT96]
        • Generates a sequence of monotonically improving strategies
        • Two phases: strategy evaluation by solving a linear system, and strategy improvement at each state
        • Stops as soon as no more improvement can be made
        • Returns the optimal strategy along with its value function
      • Bisimulation lumping [LS91, Buc94, KS60]
        • Applies to MCs
        • Gathers states which behave equivalently
        • Produces a (hopefully smaller) bisimulation quotient
        • Interested in the largest bisimulation ∼_L
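Bisimulation lumping can be sketched as partition refinement: start from blocks of states with equal cost (separating goal states), then split any block whose states send different probability mass into some current block, until stable. A small illustrative sketch under that assumption; the function name and example are hypothetical, and real implementations use far more efficient refinement algorithms:

```python
def lump(P_lam, c_lam, goal):
    """Largest lumping of an MC with goal set G: states are equivalent iff
    they agree on goal membership and cost, and send the same probability
    mass into each block of the current partition."""
    blocks = {}
    for s in P_lam:   # initial partition: split on goal membership and cost
        blocks.setdefault((s in goal, c_lam.get(s)), set()).add(s)
    partition = list(blocks.values())
    changed = True
    while changed:
        changed = False
        refined = []
        for block in partition:
            groups = {}
            for s in block:   # signature: mass sent into each current block
                sig = tuple(sum(p for t, p in P_lam[s].items() if t in B)
                            for B in partition)
                groups.setdefault(sig, set()).add(s)
            refined.extend(groups.values())
            if len(groups) > 1:
                changed = True
        partition = refined
    return partition

# a and b behave identically (same cost, both move to goal g), so they lump
P_lam = {"s0": {"a": 0.5, "b": 0.5}, "a": {"g": 1.0},
         "b": {"g": 1.0}, "g": {"g": 1.0}}
c_lam = {"s0": 2, "a": 1, "b": 1}
partition = lump(P_lam, c_lam, goal={"g"})
assert {"a", "b"} in partition and {"g"} in partition
```

Solving the linear system on the quotient instead of the full MC is what makes the evaluation phase feasible for large models.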

  21. Symblicit algorithm
      • Mix of symbolic and explicit data structures (a superscript S denotes a symbolic representation)

      Algo 1 Symblicit(MDP M^S, Cost function c^S, Goal states G^S)
       1: n := 0, λ^S_n := InitialStrategy(M^S, G^S)
       2: repeat
       3:   (M^S_{λ_n}, c^S_{λ_n}) := InducedMCAndCost(M^S, c^S, λ^S_n)
       4:   (M^S_{λ_n,∼L}, c^S_{λ_n,∼L}) := Lump(M^S_{λ_n}, c^S_{λ_n})
       5:   (M_{λ_n,∼L}, c_{λ_n,∼L}) := Explicit(M^S_{λ_n,∼L}, c^S_{λ_n,∼L})
       6:   v_n := SolveLinearSystem(M_{λ_n,∼L}, c_{λ_n,∼L})
       7:   v^S_n := Symbolic(v_n)
       8:   λ^S_{n+1} := ImproveStrategy(M^S, λ^S_n, v^S_n)
       9:   n := n + 1
      10: until λ^S_n = λ^S_{n−1}
      11: return (λ^S_{n−1}, v^S_{n−1})
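Dropping the symbolic/explicit conversions and the lumping step, the skeleton of Algo 1 reduces to classical strategy iteration. A purely explicit sketch, with two assumptions flagged: the helper names and the toy example are invented, and the initial strategy is assumed proper (it reaches G with probability 1, so the evaluation fixed point converges):

```python
def strategy_iteration(states, actions, P, cost, goal, iters=2000):
    """Evaluate the current strategy by fixed-point iteration, then improve
    it greedily at each state; stop when no improvement is possible."""
    # initial strategy: first enabled action at each non-goal state
    # (assumed proper for this sketch)
    lam = {s: next(a for a in actions if (s, a) in P)
           for s in states if s not in goal}
    while True:
        # evaluation: v(s) = 0 on G, v(s) = c(s, lam(s)) + sum_t P(...) v(t)
        v = {s: 0.0 for s in states}
        for _ in range(iters):
            v = {s: 0.0 if s in goal
                 else cost[(s, lam[s])]
                      + sum(p * v[t] for t, p in P[(s, lam[s])].items())
                 for s in states}
        # improvement: at each state pick the locally cheapest action
        new_lam = {s: min((a for a in actions if (s, a) in P),
                          key=lambda a: cost[(s, a)]
                          + sum(p * v[t] for t, p in P[(s, a)].items()))
                   for s in states if s not in goal}
        if new_lam == lam:
            return lam, v
        lam = new_lam

# toy SSP: from s0, "safe" surely reaches g at cost 5, while "risky"
# costs 1 and reaches g half of the time (expected total cost 2)
P = {("s0", "safe"): {"g": 1.0}, ("s0", "risky"): {"s0": 0.5, "g": 0.5}}
cost = {("s0", "safe"): 5, ("s0", "risky"): 1}
lam, v = strategy_iteration({"s0", "g"}, ["safe", "risky"], P, cost, {"g"})
assert lam["s0"] == "risky" and abs(v["s0"] - 2.0) < 1e-6
```

Algo 1 wraps exactly this loop: lines 3–4 shrink the induced MC symbolically via ∼_L, line 5 makes the quotient explicit so that line 6 can solve the linear system numerically, and line 7 lifts the values back to the symbolic side for the improvement step.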
