SLIDE 1

Reconciling Rationality and Stochasticity: Rich Behavioral Models in Two-Player Games

Mickael Randour

Computer Science Department, ULB - Université libre de Bruxelles, Belgium

July 24, 2016 GAMES 2016 - 5th World Congress of the Game Theory Society

SLIDE 2

Rationality & stochasticity Planning a journey Synthesis Conclusion

The talk in one slide

Two traditional paradigms for agents in complex systems:
  • Fully rational: system = (multi-player) game.
  • Fully stochastic: system = large stochastic process.

In some fields (e.g., computer science), we need to go beyond these paradigms: rich behavioral models.

Illustration: planning a journey in an uncertain environment.

Reconciling Rationality and Stochasticity Mickael Randour 1 / 21

SLIDE 3

Advertisement

Full paper available on arXiv [Ran16a]: abs/1603.05072

SLIDE 4

1 Rationality & stochasticity
2 Planning a journey in an uncertain environment
3 Synthesis of reliable reactive systems
4 Conclusion

SLIDE 6

Rationality hypothesis

Rational agents [OR94]:
  • clear personal objectives,
  • aware of their alternatives,
  • form sound expectations about any unknowns,
  • choose their actions coherently (i.e., regarding some notion of optimality).

⇒ In the particular setting of zero-sum games: antagonistic interactions between the players.
↪ Well-founded abstraction in computer science, e.g., processes competing for access to a shared resource.

SLIDE 7

Stochasticity

Stochastic agents:
  • often a sufficient abstraction to reason about macroscopic properties of a complex system,
  • agents follow stochastic models that can be based on experimental data (e.g., traffic in a town).

Several models of interest:
  • fully stochastic agents ⇒ Markov chain [Put94],
  • rational agent against stochastic agent ⇒ Markov decision process [Put94],
  • two rational agents + one stochastic agent ⇒ stochastic game or competitive MDP [FV97].

SLIDE 8

Choosing the appropriate paradigm matters!

As an agent having to choose a strategy, the assumptions made on the other agents are crucial.

⇒ They define our objective, hence the adequate strategy.
⇒ Illustration: planning a journey.

SLIDE 10

Aim of this illustration

A flavor of the different types of useful strategies in stochastic environments.
Based on a series of papers, most in a computer science setting (more on that later) [Ran13, BFRR14b, BFRR14a, RRS15a, RRS15b, BCH+16].
Applications to the shortest path problem.

[Figure: weighted graph on vertices A, B, C, D, E with edge weights 30, 10, 20, 5, 10, 20, 5.]

↪ Find a path of minimal length in a weighted graph (Dijkstra, Bellman-Ford, etc.) [CGR96].
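As a rough illustration of the deterministic shortest path problem mentioned above, here is a minimal Dijkstra sketch. The vertex names and the multiset of weights come from the slide, but the figure is flattened in this transcript, so the assignment of weights to edges in `GRAPH` is purely hypothetical.

```python
import heapq

# Hypothetical edge layout: only the vertex names and the weights
# (30, 10, 20, 5, 10, 20, 5) are taken from the slide.
GRAPH = {
    "A": {"B": 30, "C": 10},
    "B": {"D": 20, "E": 10},
    "C": {"B": 5, "D": 20},
    "D": {"E": 5},
    "E": {},
}


def dijkstra(graph, source, target):
    """Return (length, path) of a shortest path, or (inf, []) if none exists."""
    dist = {source: 0}
    parent = {}
    queue = [(0, source)]
    done = set()
    while queue:
        d, u = heapq.heappop(queue)
        if u in done:
            continue  # stale queue entry
        done.add(u)
        if u == target:
            break
        for v, w in graph[u].items():
            if v not in dist or d + w < dist[v]:
                dist[v] = d + w
                parent[v] = u
                heapq.heappush(queue, (d + w, v))
    if target not in dist:
        return float("inf"), []
    # Walk the parent pointers back from the target to rebuild the path.
    path = [target]
    while path[-1] != source:
        path.append(parent[path[-1]])
    return dist[target], path[::-1]


length, path = dijkstra(GRAPH, "A", "E")
```

On this assumed layout the direct edge A to B (30) is beaten by the detour through C, which is the kind of choice the algorithm resolves automatically.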

SLIDE 11

Aim of this illustration

What if the environment is uncertain? E.g., in case of heavy traffic, some roads may be crowded.

SLIDE 12

Planning a journey in an uncertain environment

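[Figure: MDP with states home, waiting room, train, light traffic, medium traffic, heavy traffic, and work. Actions, labeled (name, duration in minutes): railway, 2; car, 1; wait, 3; relax, 35; go back, 2; bike, 45; drive, 20; drive, 30; drive, 70. Transition probabilities shown: 0.1, 0.9, 0.2, 0.7, 0.1, 0.1, 0.9.]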

Each action takes time, target = work. What kind of strategies are we looking for when the environment is stochastic (MDP)?

SLIDE 13

Solution 1: minimize the expected time to work

“Average” performance: meaningful when you journey often.
Simple strategies suffice: no memory, no randomness.
Taking the car is optimal: $\mathbb{E}^{\sigma}_{D}(\mathrm{TS}^{\mathrm{work}}) = 33$.
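The value 33 can be reproduced with a small value-iteration sketch for the minimal expected time to reach work. The transition structure in `MDP` is my reading of the flattened figure (which probability belongs to which action is an assumption), chosen because it is consistent with the numbers quoted on the slides.

```python
# state -> action -> (duration in minutes, {successor: probability}).
# Assumed reading of the figure: railway and wait succeed with 0.9,
# car leads to light/medium/heavy traffic with 0.2/0.7/0.1.
MDP = {
    "home": {
        "railway": (2, {"waiting room": 0.1, "train": 0.9}),
        "car": (1, {"light": 0.2, "medium": 0.7, "heavy": 0.1}),
        "bike": (45, {"work": 1.0}),
    },
    "waiting room": {
        "wait": (3, {"waiting room": 0.1, "train": 0.9}),
        "go back": (2, {"home": 1.0}),
    },
    "train": {"relax": (35, {"work": 1.0})},
    "light": {"drive": (20, {"work": 1.0})},
    "medium": {"drive": (30, {"work": 1.0})},
    "heavy": {"drive": (70, {"work": 1.0})},
    "work": {},
}


def action_value(value, cost, successors):
    """Expected time of one action followed by the current estimate."""
    return cost + sum(p * value[t] for t, p in successors.items())


def min_expected_time(mdp, target, eps=1e-9, max_iter=10_000):
    """Value iteration from 0; returns (values, memoryless policy)."""
    value = {s: 0.0 for s in mdp}
    for _ in range(max_iter):
        new = {
            s: 0.0 if s == target else min(
                action_value(value, c, succ) for c, succ in mdp[s].values()
            )
            for s in mdp
        }
        converged = max(abs(new[s] - value[s]) for s in mdp) < eps
        value = new
        if converged:
            break
    policy = {
        s: min(acts, key=lambda a: action_value(value, *acts[a]))
        for s, acts in mdp.items() if acts
    }
    return value, policy


value, policy = min_expected_time(MDP, "work")
```

The returned policy is a plain mapping from states to actions, which matches the slide's remark that no memory and no randomness are needed for this objective.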

SLIDE 14

Solution 2: traveling without taking too many risks

Minimizing the expected time to destination makes sense if we travel often and it is not a problem to be late.

With the car, in 10% of the cases, the journey takes 71 minutes.

SLIDE 15

Solution 2: traveling without taking too many risks

Most bosses will not be happy if we are late too often... What if we are risk-averse and want to avoid that?

SLIDE 16

Solution 2: maximize the probability to be on time

Specification: reach work within 40 minutes with probability at least 0.95.
Sample strategy: take the train: $\mathbb{P}^{\sigma}_{D}\left[\mathrm{TS}^{\mathrm{work}} \le 40\right] = 0.99$.

Bad choices: car (0.9) and bike (0.0)
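The three probabilities above can be checked with a short sketch that evaluates a fixed memoryless strategy against a time budget. As before, the transition structure in `MDP` is an assumed reading of the flattened figure, and the strategy dictionaries are illustrative names, not taken from the talk.

```python
# state -> action -> (duration in minutes, {successor: probability});
# assumed reading of the figure.
MDP = {
    "home": {
        "railway": (2, {"waiting room": 0.1, "train": 0.9}),
        "car": (1, {"light": 0.2, "medium": 0.7, "heavy": 0.1}),
        "bike": (45, {"work": 1.0}),
    },
    "waiting room": {
        "wait": (3, {"waiting room": 0.1, "train": 0.9}),
        "go back": (2, {"home": 1.0}),
    },
    "train": {"relax": (35, {"work": 1.0})},
    "light": {"drive": (20, {"work": 1.0})},
    "medium": {"drive": (30, {"work": 1.0})},
    "heavy": {"drive": (70, {"work": 1.0})},
    "work": {},
}


def on_time_probability(mdp, policy, state, budget, target="work"):
    """Probability of reaching target within budget under a memoryless policy.

    The recursion terminates because every action has a positive duration,
    so the remaining budget strictly decreases.
    """
    if state == target:
        return 1.0
    cost, successors = mdp[state][policy[state]]
    if cost > budget:
        return 0.0
    return sum(
        p * on_time_probability(mdp, policy, t, budget - cost, target)
        for t, p in successors.items()
    )


TRAIN = {"home": "railway", "waiting room": "wait", "train": "relax"}
CAR = {"home": "car", "light": "drive", "medium": "drive", "heavy": "drive"}
BIKE = {"home": "bike"}

p_train = on_time_probability(MDP, TRAIN, "home", 40)
p_car = on_time_probability(MDP, CAR, "home", 40)
p_bike = on_time_probability(MDP, BIKE, "home", 40)
```

Under this reading, the train meets the 0.95 threshold because one delay still fits in the budget (2 + 3 + 35 = 40), while the car fails only in heavy traffic.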

SLIDE 17

Solution 3: strict worst-case guarantees

Specification: guarantee that work is reached within 60 minutes (to avoid missing an important meeting).
Sample strategy: bike, with worst-case reaching time = 45 minutes.
Bad choices: train (wc = ∞) and car (wc = 71).

SLIDE 18

Solution 3: strict worst-case guarantees

Worst-case analysis ⇒ two-player zero-sum game against a rational antagonistic adversary (bad guy): forget about probabilities and give the choice of transitions to the adversary.
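A minimal sketch of this game view: probabilities are dropped, and for each action the adversary picks the worst successor. The successor sets in `GAME` are my reading of the flattened figure, so treat them as an assumption.

```python
# state -> action -> (duration in minutes, possible successors).
# Probabilities are deliberately omitted: the adversary chooses the successor.
GAME = {
    "home": {
        "railway": (2, ("waiting room", "train")),
        "car": (1, ("light", "medium", "heavy")),
        "bike": (45, ("work",)),
    },
    "waiting room": {
        "wait": (3, ("waiting room", "train")),
        "go back": (2, ("home",)),
    },
    "train": {"relax": (35, ("work",))},
    "light": {"drive": (20, ("work",))},
    "medium": {"drive": (30, ("work",))},
    "heavy": {"drive": (70, ("work",))},
    "work": {},
}


def worst_case_time(game, target, max_iter=1000):
    """Min-max value iteration: we minimize, the adversary maximizes.

    Starting from 0, the values only grow; they stabilize when a guaranteed
    bound exists. If no strategy guarantees reaching the target, the loop
    simply stops after max_iter rounds.
    """
    value = {s: 0 for s in game}
    for _ in range(max_iter):
        new = {
            s: 0 if s == target else min(
                cost + max(value[t] for t in successors)
                for cost, successors in game[s].values()
            )
            for s in game
        }
        if new == value:
            break
        value = new
    return value


value = worst_case_time(GAME, "work")
best_at_home = min(
    GAME["home"],
    key=lambda a: GAME["home"][a][0] + max(value[t] for t in GAME["home"][a][1]),
)
```

Always waiting for the train has no finite guarantee, since the adversary can keep us in the waiting room forever; the iteration avoids that loop and settles on the bike.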

SLIDE 19

Solution 4: minimize the expected time under strict worst-case guarantees

Expected time: car, E = 33 but wc = 71 > 60.
Worst-case: bike, wc = 45 < 60 but E = 45 >>> 33.

SLIDE 20

Solution 4: minimize the expected time under strict worst-case guarantees

In practice, we want both! Can we do better? Beyond worst-case synthesis [BFRR14b, BFRR14a]: minimize the expected time under the worst-case constraint.

SLIDE 21

Solution 4: minimize the expected time under strict worst-case guarantees

Sample strategy: try the train up to 3 delays, then switch to the bike.
wc = 58 < 60 and E ≈ 37.34 << 45.
Strategies need memory ⇒ more complex!
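The sample strategy can be evaluated by listing its possible outcomes. This sketch assumes one reading of "up to 3 delays": take the railway, wait at most three times in the waiting room, then go back home and bike. The durations and the 0.9 success probability are read off the flattened figure.

```python
def train_then_bike(max_waits=3):
    """Return the (probability, total time) outcomes of the strategy."""
    outcomes = []
    elapsed = 2                            # railway, 2
    outcomes.append((0.9, elapsed + 35))   # train available right away, relax 35
    stuck = 0.1                            # probability of sitting in the waiting room
    for _ in range(max_waits):
        elapsed += 3                       # wait, 3
        outcomes.append((stuck * 0.9, elapsed + 35))
        stuck *= 0.1
    elapsed += 2 + 45                      # go back, 2, then bike, 45
    outcomes.append((stuck, elapsed))
    return outcomes


outcomes = train_then_bike()
worst_case = max(t for _, t in outcomes)
expected = sum(p * t for p, t in outcomes)
```

Under this reading the worst case is exactly 58 minutes and the expectation comes out at about 37.33, in line with the ≈ 37.34 quoted on the slide. The counter of waits is the memory the strategy needs.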

SLIDE 22

Solution 5: multiple objectives ⇒ trade-offs

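[Figure: MDP with states home, work, and car wreck. Actions, labeled (name, time in minutes, cost in $): bus, 30, 3; taxi, 10, 20. Transition probabilities shown: 0.7, 0.99, 0.01, 0.3.]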

Two-dimensional weights on actions: time and cost. Often necessary to consider trade-offs: e.g., between the probability to reach work in due time and the risks of an expensive journey.

SLIDE 23

Solution 5: multiple objectives ⇒ trade-offs

Solution 2 (probability) can only ensure a single constraint.

C1: 80% of runs reach work in at most 40 minutes.
  Taxi: at most 10 minutes with probability 0.99 > 0.8.

C2: 50% of them cost at most 10$ to reach work.
  Bus: at least 70% of the runs reach work for 3$.

Taxi ⊭ C2, bus ⊭ C1. What if we want C1 ∧ C2?

SLIDE 24

Solution 5: multiple objectives ⇒ trade-offs

C1: 80% of runs reach work in at most 40 minutes.
C2: 50% of them cost at most 10$ to reach work.
Study of multi-constraint percentile queries [RRS15a].
Sample strategy: bus once, then taxi. Requires memory.
Another strategy: bus with probability 3/5, taxi with probability 2/5. Requires randomness.
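Both sample strategies can be checked against C1 and C2 with a small exploration sketch. The transition structure in `ACTIONS` is my reading of the flattened figure (a failed bus leaves us at home, a failed taxi ends in the car wreck), and I assume the randomized strategy draws bus or taxi afresh at every visit to home.

```python
# action -> (time in minutes, cost in $, P(reach work), P(back at home)).
# Assumed reading of the figure; the taxi's remaining 0.01 leads to
# "car wreck", from which work is never reached.
ACTIONS = {"bus": (30, 3, 0.7, 0.3), "taxi": (10, 20, 0.99, 0.0)}


def percentiles(choose, time_bound=40, cost_bound=10):
    """Return (P[time <= time_bound], P[cost <= cost_bound]) for a strategy.

    choose(k) gives the distribution over actions at the k-th visit to home.
    """
    on_time = cheap = 0.0
    pending = [(1.0, 0, 0, 0)]  # (probability, elapsed time, spent cost, visit index)
    while pending:
        prob, time, cost, k = pending.pop()
        for action, q in choose(k).items():
            dt, dc, p_work, p_home = ACTIONS[action]
            t, c = time + dt, cost + dc
            if t <= time_bound:
                on_time += prob * q * p_work
            if c <= cost_bound:
                cheap += prob * q * p_work
            # Keep exploring only while at least one bound can still be met.
            if q > 0 and p_home > 0 and (t <= time_bound or c <= cost_bound):
                pending.append((prob * q * p_home, t, c, k + 1))
    return on_time, cheap


bus_then_taxi = percentiles(lambda k: {"bus": 1.0} if k == 0 else {"taxi": 1.0})
randomized = percentiles(lambda k: {"bus": 0.6, "taxi": 0.4})
always_bus = percentiles(lambda k: {"bus": 1.0})
always_taxi = percentiles(lambda k: {"taxi": 1.0})
```

With these assumptions, both sample strategies clear the 0.8 and 0.5 thresholds, whereas always taking the bus misses C1 and always taking the taxi misses C2.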

SLIDE 25

Solution 5: multiple objectives ⇒ trade-offs

C1: 80% of runs reach work in at most 40 minutes.
C2: 50% of them cost at most 10$ to reach work.
Study of multi-constraint percentile queries [RRS15a].
In general, both memory and randomness are required.
This differs from the previous problems ⇒ more complex!

SLIDE 27
Controller synthesis

Setting:
  • a reactive system to control,
  • an interacting environment,
  • a specification to enforce.

For critical systems (e.g., airplane controller, power plants, ABS), testing is not enough!
⇒ Need formal methods.

Automated synthesis of provably-correct and efficient controllers:
  • mathematical frameworks,
    ↪ e.g., games on graphs [GTW02, Ran13, Ran14]
  • software tools.

SLIDE 28

Strategy synthesis in stochastic environments

Strategy = formal model of how to control the system

[Flowchart: the system description and the environment description are modeled as a Markov Decision Process (MDP); the informal specification is modeled as a winning objective. Synthesis asks: is there a winning strategy? If yes, strategy = controller. If no, empower system capabilities or weaken specification requirements.]

1 How complex is it to decide if a winning strategy exists?
2 How complex does such a strategy need to be? Simpler is better.
3 Can we synthesize one efficiently?

⇒ Depends on the winning objective, the exact type of interaction, etc.

SLIDE 29

Some other objectives

The example was about shortest path objectives, but there are many more!

Some examples based on energy applications:
  • Energy: operate with a (bounded) fuel tank and never run out of fuel [BFL+08].
  • Mean-payoff: average cost/reward (or energy consumption) per action in the long run [EM79].
  • Average-energy: energy objective + optimize the long-run average amount of fuel in the tank [BMR+15].

Also inspired by economics:
  • Discounted sum: simulates interest or inflation [BCF+13].
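To make these objectives concrete, here is a small sketch of how each payoff is usually evaluated on a sequence of action weights. These are the standard textbook definitions, not details from the talk, and the function names are illustrative. The sketch works on finite sequences only: the mean-payoff is shown for a run that eventually repeats a cycle, and the upper bound on the tank is not modeled.

```python
def energy_ok(weights, initial_fuel):
    """Energy objective: the fuel level never drops below zero."""
    level = initial_fuel
    for w in weights:
        level += w
        if level < 0:
            return False
    return True


def mean_payoff_of_cycle(cycle_weights):
    """Long-run average per action of a run that repeats this cycle forever."""
    return sum(cycle_weights) / len(cycle_weights)


def discounted_sum(weights, discount):
    """Sum of weights where the i-th one is scaled by discount**i."""
    return sum(w * discount**i for i, w in enumerate(weights))


safe = energy_ok([-2, 3, -4], initial_fuel=3)     # levels 1, 4, 0
unsafe = energy_ok([-2, 3, -4], initial_fuel=2)   # levels 0, 3, -1
average = mean_payoff_of_cycle([2, 4, 0])
discounted = discounted_sum([1, 1, 1], 0.5)
```

A discount factor below 1 makes early weights count more than late ones, which is how the discounted sum mimics interest or inflation.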

SLIDE 30

Conclusion

Our research aims at:
  • defining meaningful strategy concepts,
  • providing algorithms and tools to compute those strategies,
  • classifying the complexity of the different problems from a theoretical standpoint.
    ↪ Is it mathematically possible to obtain efficient algorithms?

Take-home message

Rich behavioral models are natural and important in computer science (e.g., synthesis).
Maybe they can be useful in other areas too, e.g., in economics: combining sufficient risk-avoidance and profitable expected return, value-at-risk models.

Thank you! Any questions?

SLIDE 31

References I

[BCF+13] T. Brázdil, T. Chen, V. Forejt, P. Novotný, and A. Simaitis. Solvency Markov decision processes with interest. In Proc. of FSTTCS, volume 24 of LIPIcs, pages 487–499. Schloss Dagstuhl - Leibniz-Zentrum fuer Informatik, 2013.

[BCH+16] R. Brenguier, L. Clemente, P. Hunter, G.A. Pérez, M. Randour, J.-F. Raskin, O. Sankur, and M. Sassolas. Non-zero sum games for reactive synthesis. In Proc. of LATA, LNCS 9618, pages 3–23. Springer, 2016.

[BFL+08] P. Bouyer, U. Fahrenberg, K.G. Larsen, N. Markey, and J. Srba. Infinite runs in weighted timed automata with energy constraints. In Proc. of FORMATS, LNCS 5215, pages 33–47. Springer, 2008.

[BFRR14a] V. Bruyère, E. Filiot, M. Randour, and J.-F. Raskin. Expectations or guarantees? I want it all! A crossroad between games and MDPs. In Proc. of SR, EPTCS 146, pages 1–8, 2014.

[BFRR14b] V. Bruyère, E. Filiot, M. Randour, and J.-F. Raskin. Meet your expectations with guarantees: Beyond worst-case synthesis in quantitative games. In Proc. of STACS, LIPIcs 25, pages 199–213. Schloss Dagstuhl - LZI, 2014.

[BMR+15] P. Bouyer, N. Markey, M. Randour, K.G. Larsen, and S. Laursen. Average-energy games. In Proc. of GandALF, EPTCS 193, pages 1–15, 2015.

SLIDE 32

References II

[CGR96] B.V. Cherkassky, A.V. Goldberg, and T. Radzik. Shortest paths algorithms: Theory and experimental evaluation. Math. Programming, 73(2):129–174, 1996.

[EM79] A. Ehrenfeucht and J. Mycielski. Positional strategies for mean payoff games. International Journal of Game Theory, 8:109–113, 1979.

[FV97] J. Filar and K. Vrieze. Competitive Markov decision processes. Springer, 1997.

[GTW02] E. Grädel, W. Thomas, and T. Wilke, editors. Automata, Logics, and Infinite Games: A Guide to Current Research, LNCS 2500. Springer, 2002.

[OR94] M.J. Osborne and A. Rubinstein. A Course in Game Theory. MIT Press, 1994.

[Put94] M.L. Puterman. Markov Decision Processes: Discrete Stochastic Dynamic Programming. John Wiley & Sons, Inc., New York, NY, USA, 1st edition, 1994.

[Ran13] M. Randour. Automated synthesis of reliable and efficient systems through game theory: A case study. In Proceedings of the European Conference on Complex Systems 2012, Springer Proceedings in Complexity XVII, pages 731–738. Springer, 2013.

SLIDE 33

References III

[Ran14] M. Randour. Synthesis in Multi-Criteria Quantitative Games. PhD thesis, Université de Mons, Belgium, 2014.

[Ran16a] M. Randour. Reconciling rationality and stochasticity: Rich behavioral models in two-player games. CoRR, abs/1603.05072, 2016.

[Ran16b] M. Randour. Reconciling rationality and stochasticity: Rich behavioral models in two-player games. Talk at GAMES 2016 - 5th World Congress of the Game Theory Society, 2016.

[RRS15a] M. Randour, J.-F. Raskin, and O. Sankur. Percentile queries in multi-dimensional Markov decision processes. In Proc. of CAV, LNCS 9206, pages 123–139. Springer, 2015.

[RRS15b] M. Randour, J.-F. Raskin, and O. Sankur. Variations on the stochastic shortest path problem. In Proc. of VMCAI, LNCS 8931, pages 1–18. Springer, 2015.

SLIDE 34

Algorithmic complexity: hierarchy of problems

For shortest path

[Figure: hierarchy of complexity classes. Classes shown: LOGSPACE, P, NP∩coNP, NP, coNP, NP-c, coNP-c, PSPACE, EXPTIME, EXPSPACE, ..., ELEMENTARY, ..., 2EXPTIME, PR, DECIDABLE, UNDECIDABLE (not computable by an algorithm). Labels placed on the hierarchy: Solutions 1 (E) and 3 (wc); Solution 4 (BWC); Solutions 2 (P) and 5 (percentile).]
