(Learning to) Learn to Control
Jan Křetínský, Technical University of Munich (PowerPoint PPT presentation)


SLIDE 1

(Learning to) Learn to Control

Jan Křetínský

Technical University of Munich, Germany

joint work with
• P. Ashok, T. Meggendorfer (TUM)
• T. Brázdil (Masaryk University Brno)
• K. Chatterjee, M. Chmelík, P. Daca, A. Fellner, T. Henzinger, T. Petrov, V. Toman (IST Austria)
• V. Forejt, M. Kwiatkowska, M. Ujma (Oxford University)
• D. Parker (University of Birmingham)

Dagstuhl seminar: Computer-Assisted Engineering for Robotics and Autonomous Systems, February 14, 2017

SLIDE 2

Controller synthesis and verification


SLIDE 4

Formal methods and machine learning

Formal methods
+ precise
– scalability issues (MEM-OUT)
– can be hard to use

Learning
+ scalable
+ simpler solutions
– weaker guarantees
– different objectives

Combining them: precise computation, focused on the important parts.

SLIDE 10

Examples

◮ Reinforcement learning for efficient controller synthesis
  ◮ MDP with functional spec (reachability, LTL)¹
  ◮ MDP with performance spec (mean payoff / average reward)²
◮ Decision tree learning for efficient controller representation
  ◮ MDP³
  ◮ Games⁴

¹ Brázdil, Chatterjee, Chmelík, Forejt, K., Kwiatkowska, Parker, Ujma: Verification of Markov Decision Processes Using Learning Algorithms. ATVA 2014;
  Daca, Henzinger, K., Petrov: Faster Statistical Model Checking for Unbounded Temporal Properties. TACAS 2016
² Ashok, Chatterjee, Daca, K., Meggendorfer: Value Iteration for Long-run Average Reward in Markov Decision Processes. Submitted
³ Brázdil, Chatterjee, Chmelík, Fellner, K.: Counterexample Explanation by Learning Small Strategies in Markov Decision Processes. CAV 2015
⁴ Brázdil, Chatterjee, K., Toman: Strategy Representation by Decision Trees in Reactive Synthesis. Submitted

SLIDE 11

Example: Markov decision processes

[MDP diagram: states init, p, …, s, v1, t, goal; transitions labelled up (prob. 1), down (0.99 / 0.01), a (1), b (0.5 / 0.5), c (1)]

Find a controller σ maximizing Pσ[◊goal].

Decision-tree view of the resulting controller (first node): ACTION = down?  Y / N

SLIDE 18

Example 1: Computing controllers faster

Value iteration with upper and lower bounds:

repeat
    for all transitions s −a→ do
        Update(s −a→)
until UpBound(s_init) − LoBound(s_init) < ε

SLIDE 19

More frequently update what is visited more frequently, by reasonably good controllers:

repeat
    sample a path from s_init        ⊲ pick action arg max_a UpBound(s −a→)
    for all visited transitions s −a→ do
        Update(s −a→)
until UpBound(s_init) − LoBound(s_init) < ε

faster & sure: updates the important parts of the system
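The sampling loop above can be sketched in code. The following is a minimal BRTDP-style sketch for maximum reachability on a hypothetical toy MDP; the model, names, and tolerance are illustrative, not the authors' implementation:

```python
import random

# Toy MDP: state -> action -> list of (successor, probability).
mdp = {
    "init": {"a": [("s", 1.0)], "b": [("v1", 0.5), ("init", 0.5)]},
    "s":    {"up": [("init", 1.0)], "down": [("goal", 0.99), ("s", 0.01)]},
    "v1":   {"c": [("s", 1.0)]},
    "goal": {},  # absorbing target
}

# Upper/lower bounds on the maximal probability of reaching "goal".
up = {state: 1.0 for state in mdp}
lo = {state: 0.0 for state in mdp}
up["goal"] = lo["goal"] = 1.0

def bound_of(bound, state, action):
    # Expected bound value of taking `action` in `state`.
    return sum(p * bound[succ] for succ, p in mdp[state][action])

def update(state):
    # Bellman update of both bounds in `state`.
    if not mdp[state]:
        return
    up[state] = max(bound_of(up, state, a) for a in mdp[state])
    lo[state] = max(bound_of(lo, state, a) for a in mdp[state])

def brtdp(init="init", eps=1e-6, max_len=50):
    while up[init] - lo[init] > eps:
        # Sample a path, always picking the action with the highest upper bound.
        path, state = [], init
        while state != "goal" and len(path) < max_len:
            best = max(mdp[state], key=lambda a: bound_of(up, state, a))
            path.append(state)
            state = random.choices(*zip(*mdp[state][best]))[0]
        # Update bounds along the visited states, back to front.
        for s in reversed(path):
            update(s)
    return lo[init], up[init]
```

Only the states visited by the currently-best controller are ever updated, which is exactly the "important parts" intuition of the slide.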

SLIDE 25

Example 1: Experimental results

Number of visited states:

Example   | PRISM      | with RL
zeroconf  | 4,427,159  | 977
wlan      | 5,007,548  | 1,995
firewire  | 19,213,802 | 32,214
mer       | 26,583,064 | 1,950

SLIDE 26

Example 2: Computing small controllers

Controller representations:
◮ explicit map σ : S → A
◮ BDD (binary decision diagram) encoding its bit representation
◮ DT (decision tree)

Example  | #states   | Value    | Explicit | BDD     | DT      | Rel. err (DT) %
firewire | 481,136   | 1.0      | 479,834  | 4,233   | 1       | 0.0
investor | 35,893    | 0.958    | 28,151   | 783     | 27      | 0.886
mer      | 1,773,664 | 0.200016 | MEM-OUT  | MEM-OUT | MEM-OUT | *
zeroconf | 89,586    | 0.00863  | 60,463   | 409     | 7       | 0.106

* MEM-OUT in PRISM, whereas RL yields: Explicit 1,887, BDD 619, DT 13, Rel. err 0.00014
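As a toy illustration of the DT column, a decision tree can compress an explicit map σ : S → A whenever the controller's decisions follow simple predicates on state features. This from-scratch sketch uses a hypothetical controller over (x, y) feature pairs; all data and names are made up for illustration:

```python
from collections import Counter

# Hypothetical controller: plays "down" whenever x >= 2, else "up".
states  = [(0, 0), (0, 1), (1, 0), (1, 1), (2, 0), (2, 1), (3, 0), (3, 1)]
actions = ["up", "up", "up", "up", "down", "down", "down", "down"]

def gini(labels):
    # Gini impurity: 1 - sum p_i^2 (0 means the labels are pure).
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def learn_tree(xs, ys):
    if len(set(ys)) == 1:                        # pure leaf: return the action
        return ys[0]
    best = None
    for feat in range(len(xs[0])):               # try every feature/threshold split
        for thr in sorted({x[feat] for x in xs}):
            left  = [y for x, y in zip(xs, ys) if x[feat] <= thr]
            right = [y for x, y in zip(xs, ys) if x[feat] > thr]
            if not left or not right:
                continue
            score = len(left) * gini(left) + len(right) * gini(right)
            if best is None or score < best[0]:
                best = (score, feat, thr)
    if best is None:                             # no split possible: majority leaf
        return Counter(ys).most_common(1)[0][0]
    _, feat, thr = best
    lx = [(x, y) for x, y in zip(xs, ys) if x[feat] <= thr]
    rx = [(x, y) for x, y in zip(xs, ys) if x[feat] > thr]
    return (feat, thr,
            learn_tree([x for x, _ in lx], [y for _, y in lx]),
            learn_tree([x for x, _ in rx], [y for _, y in rx]))

def decide(tree, state):
    # Follow inner nodes (feat, thr, left, right) down to an action leaf.
    while isinstance(tree, tuple):
        feat, thr, left, right = tree
        tree = left if state[feat] <= thr else right
    return tree

tree = learn_tree(states, actions)
```

Here the eight-entry explicit map collapses to a single inner node testing x <= 1, mirroring how the DT column above shrinks strategies by orders of magnitude.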

SLIDE 30

Example 2: Computing small controllers

From precise decisions everywhere to a DT built from the importance of decisions.

Importance of a decision in s with respect to goal and controller σ:
Pσ[◊s | ◊goal]
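The importance Pσ[◊s | ◊goal] can be estimated by simulating the Markov chain induced by a fixed controller σ. A minimal Monte Carlo sketch on a hypothetical toy chain (model and names are illustrative only):

```python
import random

# Markov chain induced by a fixed controller sigma:
# state -> [(successor, probability)]; empty list = absorbing.
chain = {
    "init": [("s", 0.5), ("v1", 0.5)],
    "v1":   [("s", 1.0)],
    "s":    [("goal", 0.99), ("dead", 0.01)],
    "goal": [], "dead": [],
}

def simulate(start="init"):
    # Run one episode; return the set of states visited.
    state, visited = start, {start}
    while chain[state]:
        succs, probs = zip(*chain[state])
        state = random.choices(succs, probs)[0]
        visited.add(state)
    return visited

def importance(s, runs=20000):
    # Estimate P[reach s | reach goal] from goal-reaching episodes.
    visits = [simulate() for _ in range(runs)]
    hits_goal = [v for v in visits if "goal" in v]
    return sum(s in v for v in hits_goal) / len(hits_goal)
```

States with high importance (here "s", on every goal-reaching path) must keep their precise decision; low-importance states can be left to the tree's generalization.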

SLIDE 32

Some related work

Further examples on decision trees
◮ Garg, Neider, Madhusudan, Roth: Learning Invariants using Decision Trees and Implication Counterexamples. POPL 2016
◮ Krishna, Puhrsch, Wies: Learning Invariants Using Decision Trees.

Further examples on reinforcement learning
◮ Junges, Jansen, Dehnert, Topcu, Katoen: Safety-Constrained Reinforcement Learning for MDPs. TACAS 2016
◮ David, Jensen, Larsen, Legay, Lime, Sorensen, Taankvist: On Time with Minimal Expected Cost! ATVA 2014

SLIDE 33

Summary

Machine learning in verification
◮ Scalable heuristics
◮ Example 1: Speeding up value iteration
  ◮ technique: reinforcement learning, BRTDP
  ◮ idea: focus on updating the "most important parts" = those most often visited by good strategies
◮ Example 2: Small and readable strategies
  ◮ technique: decision tree learning
  ◮ idea: based on the importance of states, feed the decisions to the learning algorithm
◮ Learning in Verification (LiVe) at ETAPS
◮ Explainable Verification (FEVer) at CAV

SLIDE 34

Discussion

Verification using machine learning
◮ How far do we want to compromise? Do we have to compromise?
  ◮ BRTDP, invariant generation, and strategy representation don't
◮ Don't we want more than ML?
  ◮ (ε-)optimal controllers?
  ◮ arbitrary controllers – is it still verification?
◮ What do we actually want?
  ◮ scalability shouldn't overrule guarantees?
  ◮ when is PAC enough?
  ◮ oracle usage seems fine
◮ How much of it can work for examples from robotics?

Thank you