  1. (Learning to) Learn to Control. Jan Křetínský, Technical University of Munich, Germany. Joint work with P. Ashok, T. Meggendorfer (TUM), T. Brázdil (Masaryk University Brno), K. Chatterjee, M. Chmelík, P. Daca, A. Fellner, T. Henzinger, T. Petrov, V. Toman (IST Austria), V. Forejt, M. Kwiatkowska, M. Ujma (Oxford University), D. Parker (University of Birmingham). Dagstuhl seminar: Computer-Assisted Engineering for Robotics and Autonomous Systems, February 14, 2017

  2–3. Controller synthesis and verification

  4–9. Formal methods and machine learning
  Formal methods: + precise; – scalability issues (MEM-OUT on large models)
  Learning: + scalable, + simpler solutions; – weaker guarantees, – different objectives, – can be hard to use
  Combination: keep the precise computation of formal methods, use learning to focus it on the important parts

  10. Examples
  ◮ Reinforcement learning for efficient controller synthesis
    ◮ MDP with functional spec (reachability, LTL) [1]
    ◮ MDP with performance spec (mean payoff / average reward) [2]
  ◮ Decision tree learning for efficient controller representation
    ◮ MDP [3]
    ◮ Games [4]
  [1] Brazdil, Chatterjee, Chmelik, Forejt, K., Kwiatkowska, Parker, Ujma: Verification of Markov Decision Processes Using Learning Algorithms. ATVA 2014; Daca, Henzinger, K., Petrov: Faster Statistical Model Checking for Unbounded Temporal Properties. TACAS 2016
  [2] Ashok, Chatterjee, Daca, K., Meggendorfer: Value Iteration for Long-run Average Reward in Markov Decision Processes. Submitted
  [3] Brazdil, Chatterjee, Chmelik, Fellner, K.: Counterexample Explanation by Learning Small Strategies in Markov Decision Processes. CAV 2015
  [4] Brazdil, Chatterjee, K., Toman: Strategy Representation by Decision Trees in Reactive Synthesis. Submitted

  11–17. Example: Markov decision processes
  [MDP diagram: states s_init, v, p, ..., t (labelled goal); actions up, down, a, b, c; branching probabilities 0.5, 0.01, 0.99, 1]
  Task: find a controller σ maximizing P^σ[◊ goal]
  [Slide 17 overlays the start of a decision tree: a node testing ACTION = down with Y/N branches]
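To make the objective P^σ[◊ goal] concrete, here is a minimal Python sketch (not part of the talk): it encodes a small MDP in the spirit of the diagram and estimates the reachability probability of one fixed memoryless controller σ by simulation. The state names, actions, probabilities, and the controller are illustrative assumptions.

```python
import random

# Toy MDP loosely following the slide's diagram (all names and numbers illustrative).
# transitions[state][action] = list of (successor, probability) pairs.
transitions = {
    "s_init": {"up":   [("p", 1.0)],
               "down": [("v", 0.5), ("goal", 0.5)]},
    "p":      {"a":    [("p", 1.0)]},                  # trap: goal unreachable from here
    "v":      {"b":    [("s_init", 0.5), ("v", 0.5)],
               "c":    [("goal", 0.01), ("v", 0.99)]},
    "goal":   {"stay": [("goal", 1.0)]},
}

def reaches_goal(sigma, steps=1000):
    """Simulate one path under memoryless controller sigma; report whether goal was hit."""
    s = "s_init"
    for _ in range(steps):
        if s == "goal":
            return True
        r, acc = random.random(), 0.0
        for succ, p in transitions[s][sigma[s]]:
            acc += p
            if r <= acc:
                s = succ
                break
    return s == "goal"

# A fixed controller; its value P^sigma[<> goal] is estimated by Monte Carlo simulation.
sigma = {"s_init": "down", "p": "a", "v": "c", "goal": "stay"}
runs = 10_000
print(sum(reaches_goal(sigma) for _ in range(runs)) / runs)
```

The synthesis task of the slide is to find the σ that maximizes this probability rather than to evaluate a given one.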

  18–24. Example 1: Computing controllers faster
  Idea: update more frequently what is visited more frequently by reasonably good controllers.
  Baseline (value iteration with upper and lower bounds):
      repeat
          for all transitions s --a--> do
              Update(s --a-->)
      until UpBound(s_init) − LoBound(s_init) < ε
  Simulation-guided variant:
      repeat
          sample a path from s_init        ⊲ in each state pick the action arg max_a UpBound(s --a-->)
          for all visited transitions s --a--> do
              Update(s --a-->)
      until UpBound(s_init) − LoBound(s_init) < ε
  Effect: faster, yet with sure bounds; the updates concentrate on the important parts of the system.
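Below is a minimal Python sketch of the simulation-guided idea, a simplification rather than the implementation from the cited papers. It keeps upper and lower bounds on the maximal reachability probability, samples paths greedily with respect to the upper bound, and performs Bellman updates only on the visited states. The toy MDP is an assumption, and seeding the losing sink "fail" with upper bound 0 is a shortcut taken here for brevity; the actual algorithm detects such end components itself.

```python
import random

# Toy MDP (illustrative): from s0 one can play it safe via s1 or gamble directly.
# transitions[state][action] = list of (successor, probability) pairs.
transitions = {
    "s0": {"safe":  [("s1", 1.0)],
           "risky": [("goal", 0.6), ("fail", 0.4)]},
    "s1": {"go":    [("goal", 0.8), ("fail", 0.1), ("s0", 0.1)]},
}
GOAL, FAIL, INIT, EPS = "goal", "fail", "s0", 1e-6

# Upper/lower bounds on the maximal probability of reaching goal.
up = {**{s: 1.0 for s in transitions}, GOAL: 1.0, FAIL: 0.0}
lo = {**{s: 0.0 for s in transitions}, GOAL: 1.0, FAIL: 0.0}

def bound(bnd, s, a):
    # Expected bound value of playing action a in state s.
    return sum(p * bnd[t] for t, p in transitions[s][a])

def update(s):
    # Bellman update of both bounds at state s (maximizing over its actions).
    up[s] = max(bound(up, s, a) for a in transitions[s])
    lo[s] = max(bound(lo, s, a) for a in transitions[s])

def sample_path(max_steps=200):
    # Follow the action that currently looks best according to the upper bound.
    s, path = INIT, []
    for _ in range(max_steps):
        if s in (GOAL, FAIL):
            break
        path.append(s)
        a = max(transitions[s], key=lambda b: bound(up, s, b))
        r, acc = random.random(), 0.0
        for succ, p in transitions[s][a]:
            acc += p
            if r <= acc:
                s = succ
                break
    return path

while up[INIT] - lo[INIT] > EPS:
    for s in reversed(sample_path()):   # back-propagate the updates along the path
        update(s)

print(f"P_max[<> goal] is in [{lo[INIT]:.6f}, {up[INIT]:.6f}]")
```

Because the sampling follows the upper bound, states that good controllers visit often are updated often, which is the "faster, yet sure" effect claimed on the slide.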

  25. Example 1: Experimental results
  Visited states:
      Example      PRISM        with RL
      zeroconf      4,427,159       977
      wlan          5,007,548     1,995
      firewire     19,213,802    32,214
      mer          26,583,064     1,950

  26–29. Example 2: Computing small controllers
  Representations of a controller:
  ◮ explicit map σ : S → A
  ◮ BDD (binary decision diagram) encoding its bit representation
  ◮ DT (decision tree)
  Sizes of the representations:
      Example     #states     Value      Explicit    BDD     DT    Rel. err (DT) %
      firewire    481,136     1.0        479,834     4,233   1     0.0
      investor    35,893      0.958      28,151      783     27    0.886
      mer         1,773,664   0.200016   ——— MEM-OUT ———*
      zeroconf    89,586      0.00863    60,463      409     7     0.106
  * MEM-OUT in PRISM, whereas RL yields: Explicit 1,887, BDD 619, DT 13, Rel. err 0.00014

  30–31. Example 2: Computing small controllers
  From precise decisions to a DT: weight the decisions by their importance.
  Importance of the decision in state s with respect to ◊ goal and controller σ:
      P^σ[◊ s | ◊ goal]
  i.e. the probability that s is visited under σ, conditioned on the goal being reached.
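The importance weights suggest a direct recipe for learning the tree, sketched below with scikit-learn (an assumption of this sketch; the CAV 2015 paper describes its own construction). Each state is a feature vector, the label is the action σ chooses there, and the fit is weighted by the state's importance, so the tree may safely ignore decisions that hardly matter for reaching the goal. All feature names, states, and weights are made up for illustration.

```python
from sklearn.tree import DecisionTreeClassifier, export_text

# States as feature vectors (e.g. valuations of the model's variables).
states     = [[0, 0], [0, 1], [1, 0], [1, 1], [2, 3]]
actions    = ["down", "down", "up", "up", "down"]     # sigma's decision in each state
importance = [0.90, 0.70, 0.40, 0.05, 0.001]          # estimates of P^sigma[<> s | <> goal]

# A shallow tree suffices when the important decisions follow a simple pattern;
# misclassifying low-importance states costs little.
tree = DecisionTreeClassifier(max_depth=2)
tree.fit(states, actions, sample_weight=importance)

print(export_text(tree, feature_names=["x", "y"]))
print(tree.predict([[0, 2]]))   # action the learned tree suggests for an unseen state
```

The resulting tree is typically far smaller than the explicit map or a BDD, in line with the sizes reported on the previous slide.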

  32. Some related work
  Further examples on decision trees
  ◮ Garg, Neider, Madhusudan, Roth: Learning Invariants using Decision Trees and Implication Counterexamples. POPL 2016
  ◮ Krishna, Puhrsch, Wies: Learning Invariants Using Decision Trees.
  Further examples on reinforcement learning
  ◮ Junges, Jansen, Dehnert, Topcu, Katoen: Safety-Constrained Reinforcement Learning for MDPs. TACAS 2016
  ◮ David, Jensen, Larsen, Legay, Lime, Sorensen, Taankvist: On Time with Minimal Expected Cost! ATVA 2014

  33. Summary
  Machine learning in verification
  ◮ Scalable heuristics
  ◮ Example 1: Speeding up value iteration
    ◮ technique: reinforcement learning, BRTDP
    ◮ idea: focus on updating the "most important parts" = those most often visited by good strategies
  ◮ Example 2: Small and readable strategies
    ◮ technique: decision tree learning
    ◮ idea: based on the importance of states, feed the decisions to the learning algorithm
  ◮ Learning in Verification (LiVe) workshop at ETAPS
  ◮ Explainable Verification (FEVer) workshop at CAV

  34. Discussion
  Verification using machine learning
  ◮ How far do we want to compromise?
  ◮ Do we have to compromise?
    ◮ BRTDP, invariant generation, and strategy representation don't
  ◮ Don't we want more than ML?
    ◮ (ε-)optimal controllers?
    ◮ arbitrary controllers – is it still verification?
  ◮ What do we actually want?
    ◮ scalability shouldn't overrule guarantees?
    ◮ when is PAC enough?
    ◮ oracle usage seems fine
  ◮ How much of it can work for examples from robotics?
