

SLIDE 1

Continuous-time Markov Decisions based on Partial Exploration

Pranav Ashok, Technical University of Munich

Joint work with Yuliya Butkova¹, Holger Hermanns¹ and Jan Kretinsky²

¹Saarland University, Germany  ²Technical University of Munich, Germany

Highlights 2018, Berlin

SLIDE 2

Motivation

By Gareth Jones [CC BY-SA 3.0 (https://creativecommons.org/licenses/by-sa/3.0)], from Wikimedia Commons

SLIDE 3

Motivation

  • n students send mail at rates λ1, λ2, ..., λn per day
  • you pick one student’s mail to process
  • if processed: remove it from the queue
  • else: put it back into the queue

SLIDE 4

Motivation

Q1: What is the max. prob. (over all strategies) that all queues are empty at the end of the week?

SLIDE 5

Motivation

Q2: What is the min. prob. that student X quits your group after a semester?

SLIDE 6

Continuous-time Markov Decision Process (CTMDP)

Time-bounded Reachability

Maximal probability (over all strategies) of reaching some goal state within T time units

max_π P(♢≤T G)

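Once a strategy is fixed, a CTMDP collapses to a continuous-time Markov chain, and time-bounded reachability in a CTMC can be computed by standard uniformization. The sketch below illustrates this for a fixed strategy; the function name and the dictionary-of-rates representation are my own, not from the talk:

```python
import math

def time_bounded_reach(rates, init, goal, T, n_terms=200):
    """P(reach `goal` within time T) in a CTMC, via uniformization.

    rates[s][t] = rate of the transition s -> t. The goal is made
    absorbing, so the transient probability of being in `goal` at
    time T equals the probability of having reached it by T.
    """
    states = list(rates)
    rates = {s: ({} if s == goal else dict(ts)) for s, ts in rates.items()}
    # Uniformization rate: at least the maximal exit rate.
    lam = max(sum(ts.values()) for ts in rates.values()) or 1.0
    # Uniformized DTMC: off-diagonal prob. rate/lam, self-loop for the rest.
    P = {}
    for s in states:
        row = {t: r / lam for t, r in rates[s].items()}
        row[s] = row.get(s, 0.0) + 1.0 - sum(r / lam for r in rates[s].values())
        P[s] = row
    dist = {s: 1.0 if s == init else 0.0 for s in states}
    reach, poisson = 0.0, math.exp(-lam * T)   # Poisson(k=0; lam*T)
    for k in range(n_terms):
        reach += poisson * dist[goal]          # exactly k jumps by time T
        poisson *= lam * T / (k + 1)           # advance to Poisson(k+1)
        dist = {t: sum(p * P[s].get(t, 0.0) for s, p in dist.items())
                for t in states}               # one uniformized DTMC step
    return reach
```

For a single transition with rate λ = 2 and T = 0.5 this recovers the closed form 1 − exp(−λT).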

SLIDE 7

Challenge

Existing reachability algorithms sometimes perform extremely badly in practice, even though they run in PTIME. Can we improve them?


SLIDE 8

Contributions

➔ Framework for time-bounded reachability (TBR) analysis
➔ Use simulations to identify important parts of the state space
➔ Instantiate with standard algorithms to show speed-up


SLIDE 9

Key Idea

Partial Exploration Suffices

Not necessary to explore all states to get an ε-optimal solution


SLIDE 10

What can we do with a partial model?


SLIDE 13

What can we do with a partial model?

(figure: lower-bound model and upper-bound model of the partially explored system)
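The idea behind the lower-bound and upper-bound models can be sketched as follows: transitions that leave the explored set are redirected pessimistically in one model and optimistically in the other. This toy sketch works over plain successor sets and ignores rates; all names are illustrative, not the authors' code:

```python
def bounded_models(successors, explored, goal, sink="SINK"):
    """Lower/upper approximations of a partially explored model.

    successors[s] = set of possible next states of s. An edge leaving
    `explored` is redirected to a non-goal sink in the lower model
    (it can then never reach the goal) and to the goal in the upper
    model (it reaches the goal immediately), so any solver run on the
    two models brackets the true reachability value.
    """
    lower = {s: {t if t in explored else sink for t in successors[s]}
             for s in explored}
    upper = {s: {t if t in explored else goal for t in successors[s]}
             for s in explored}
    return lower, upper
```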

SLIDE 14

The Framework

Initialize the partial model, then repeat:
  1. Expand the partial model
  2. Compute lower/upper models
  3. Use any solver to get bounds L and U
until U − L ≤ ε
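The refinement loop can be sketched in Python as below; `expand`, `solve_lower`, `solve_upper` and their signatures are placeholders standing in for the framework's components, not the authors' implementation:

```python
def partial_exploration(expand, solve_lower, solve_upper, eps):
    """Generic refinement loop of the framework (sketch).

    `expand` grows the explored state set; `solve_lower`/`solve_upper`
    run any time-bounded-reachability solver on the lower/upper model
    of the current partial state space.
    """
    explored = expand(set())            # initialize the partial model
    while True:
        L = solve_lower(explored)       # unexplored frontier pessimistic
        U = solve_upper(explored)       # unexplored frontier optimistic
        if U - L <= eps:                # bounds are eps-close: done
            return (L + U) / 2
        explored = expand(explored)     # grow the partial model, retry
```

With any pair of solvers whose gap shrinks as more states are explored, the loop terminates with an ε-optimal value.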

SLIDE 15

Partial model through simulations using πsim
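One way to realize this step (a sketch; the rate encoding and function names are mine, and the talk's actual πsim strategy is just a parameter here): simulate timed runs up to the time bound and keep every state some run visits.

```python
import random

def simulate_explore(rates, strategy, init, time_bound, n_runs, seed=0):
    """States visited by n_runs simulations of duration <= time_bound.

    rates[s][a][t] = rate of the a-labelled transition s -> t.
    The visited set forms the initial partial model handed to the solver.
    """
    rng = random.Random(seed)
    visited = {init}
    for _ in range(n_runs):
        s, now = init, 0.0
        while rates.get(s):                  # stop in absorbing states
            a = strategy(s)                  # pi_sim picks the choice
            out = rates[s][a]
            total = sum(out.values())        # exit rate of (s, a)
            now += rng.expovariate(total)    # sojourn time ~ Exp(total)
            if now > time_bound:
                break                        # next jump falls after the bound
            u, acc = rng.random() * total, 0.0
            for succ, rate in out.items():   # race between transitions
                acc += rate
                if u <= acc:
                    s = succ
                    break
            visited.add(s)
    return visited
```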

SLIDE 16

Experiments I

Size of partial models

States       Explored by πsim      %
1,479k       105                   0.01
597k         296                   0.05
1,000k       559                   0.06
7,562k       23,309                0.31
2k           2,537                 93.86
119k
SLIDE 18

Experiments II

Runtimes

States       Runtimes (s)
1,000k       71      1       4       1
1,479k       TO      2       TO      2
597k         251     10      114     15
7,562k       507     TO      171     105
18k          6       99      2       TO
119k         1475    TO      826     TO

TO = timeout, > 1800 s (30 min)


SLIDE 20

Conclusion

➔ CTMDP TBR analysis framework based on partial exploration
➔ Partial model through simulations
➔ Usable with any TBR solver*
➔ Good on models with many unimportant/improbable states

*conditions apply, based on simulation strategy


SLIDE 22

Continuous-time Markov Decision Processes (CTMDP)

  • C = (S, A, R, Goal)
  • S: finite set of states; A: finite set of non-deterministic choices
  • Each choice can lead to multiple transitions
  • Each transition has a rate λ = R(s, a, s’)
  • The time t at which a transition fires is exponentially distributed with rate λ
  • The next state is chosen by a race between the transitions

(diagram: transition s → s’ under choice a, with rate λ)
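The race semantics in the last two bullets can be sampled directly (the representation is mine): the minimum of independent Exp(λᵢ) clocks is Exp(Σλᵢ), and clock i wins the race with probability λᵢ/Σλᵢ, so sojourn time and winner can be drawn separately.

```python
import random

def ctmdp_step(R, s, a, rng):
    """Sample one CTMDP step from state s under choice a.

    R[s][a][t] = rate lambda = R(s, a, t). Instead of racing all
    exponential clocks explicitly, draw the sojourn from Exp(sum of
    rates) and pick the winning successor with probability
    proportional to its rate.
    """
    out = R[s][a]
    total = sum(out.values())
    sojourn = rng.expovariate(total)     # min of Exp(lambda_i) ~ Exp(total)
    u, acc = rng.random() * total, 0.0
    for succ, lam in out.items():        # successor wins w.p. lam/total
        acc += lam
        if u <= acc:
            return sojourn, succ
```

For rates {x: 1, y: 3}, state y wins about 75% of the races and the mean sojourn time is 1/4.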