
Mean-field methods: what can go wrong?
The decoupling assumption: a zoom on the fixed point and on mean-field games
Nicolas Gast (Inria) and Luca Bortolussi (UNITS)
Inria, Grenoble, France
SFM, Bertinoro, June 21, 2016
www.quanticol.eu


  1. Outline
     1. The decoupling method: finite and infinite time horizon
        - Illustration of the method
        - Finite time horizon: some theory
        - Steady-state regime
     2. Rate of convergence
     3. Optimal control and mean-field games
        - Centralized control
        - Decentralized control and games
     4. Conclusion and recap

  3. The fixed point method
     - Markov chain: transient regime $\dot p = pK$; as $t \to \infty$, stationary regime $\pi K = 0$.
     - Mean-field ($N \to \infty$): transient regime $\dot x = xQ(x)$; stationary regime given by the fixed points $xQ(x) = 0$.
     - Open question (the "?" in the diagram): as $N \to \infty$, does the stationary distribution $\pi$ go to the fixed points?
     The method was used in many papers: Bianchi 00 [1], Ramaiyan et al. 08 [2], Kwak et al. 05 [3], Kumar et al. 08 [4].
     [1] Performance analysis of the IEEE 802.11 distributed coordination function.
     [2] Fixed point analysis of single cell IEEE 802.11e WLANs: Uniqueness, multistability.
     [3] Performance analysis of exponential backoff.
     [4] New insights from a fixed-point analysis of single cell IEEE 802.11 WLANs.

  5. Does it always work? [5, 6]
     SIRS model:
     - A node S becomes I at rate 1 (external infection).
     - When an S meets an I, it becomes infected at rate $1/(S + a)$.
     - An I recovers at rate 5.
     - A node R becomes S by:
       - meeting a node S (rate $10 S$);
       - alone (at rate $10^{-3}$).
     Transition diagram: S -> I at rate $1 + \frac{10 I}{S + a}$; I -> R at rate $5$; R -> S at rate $10 S + 10^{-3}$.
     [5] Benaim, Le Boudec 08.
     [6] Cho, Le Boudec, Jiang, On the Asymptotic Validity of the Decoupling Assumption for Analyzing 802.11 MAC Protocol, 2010.

  8. Does it always work? [7, 8]
     - The Markov chain is irreducible.
     - There is a unique fixed point $xQ(x) = 0$.
     Transition diagram: S -> I at rate $1 + \frac{10 I}{S + a}$; I -> R at rate $5$; R -> S at rate $10 S + 10^{-3}$.

              Fixed point xQ(x) = 0      Stationary measure (N = 1000)
     a        x_S       x_I              π_S       π_I
     0.3      0.209     0.234            0.209     0.234
     0.1      0.078     0.126            0.11      0.13

     [7] Benaim, Le Boudec 08.
     [8] Cho, Le Boudec, Jiang, On the Asymptotic Validity of the Decoupling Assumption for Analyzing 802.11 MAC Protocol, 2010.
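
The fixed-point columns of the table can be checked with a few lines of code. The sketch below is not the authors' code: it assumes the rates read off the transition diagram (S -> I at $1 + 10I/(S+a)$, I -> R at $5$, R -> S at $10S + 10^{-3}$), and the function name `fixed_point` is made up for this illustration. It eliminates $x_I$ using the balance equation of state I, then solves the remaining scalar equation by bisection.

```python
def fixed_point(a, eps=1e-3, tol=1e-12):
    """Return (x_S, x_I) with xQ(x) = 0 for the SIRS rates assumed from the diagram."""

    def infected(s):
        # Balance of state I: s * (1 + 10 i / (s + a)) = 5 i, solved for i.
        return s * (s + a) / (5.0 * a - 5.0 * s)

    def balance(s):
        # Balance of state R: inflow 5 i minus outflow r * (10 s + eps).
        i = infected(s)
        return 5.0 * i - (1.0 - s - i) * (10.0 * s + eps)

    # infected(s) is finite and positive only for 0 <= s < a.
    lo, hi = 0.0, a * (1.0 - 1e-9)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if balance(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    s = 0.5 * (lo + hi)
    return s, infected(s)


if __name__ == "__main__":
    for a in (0.3, 0.1):
        x_s, x_i = fixed_point(a)
        print(f"a = {a}: x_S = {x_s:.3f}, x_I = {x_i:.3f}")
```

Under these assumed rates the values should land within a few thousandths of the fixed-point columns above. Bisection is used rather than integrating the ODE because the fixed point need not be attracting.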

  10. What happened?
     [Figure: $x_S$ against time for $t \in [0, 3]$, comparing the mean-field trajectory $x_S$ with the stochastic system $X^N_S$ ($N = 1000$).]

  11. What happened?
     [Figure: same comparison of the mean-field $x_S$ and $X^N_S$ ($N = 1000$) over the longer window $t \in [0, 30]$.]
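
The long-run behaviour of the mean-field ODE can be explored numerically. The sketch below is an illustration, not the code behind the figures: it assumes the SIRS rates S -> I at $1 + 10I/(S+a)$, I -> R at $5$, R -> S at $10S + 10^{-3}$, integrates the ODE with a classical Runge-Kutta scheme, and reports how much $x_S$ still moves in the second half of the run. The starting point, horizon, step size, and the name `late_amplitude` are arbitrary choices made for this example.

```python
def drift(s, i, a, eps=1e-3):
    """Mean-field drift (ds/dt, di/dt); the R fraction is 1 - s - i."""
    r = 1.0 - s - i
    infection = s * (1.0 + 10.0 * i / (s + a))
    return -infection + r * (10.0 * s + eps), infection - 5.0 * i


def late_amplitude(a, s0=1.0 / 3.0, i0=1.0 / 3.0, horizon=60.0, dt=1e-3):
    """Integrate with RK4; return max(x_S) - min(x_S) over the second half of the run."""
    s, i = s0, i0
    steps = int(round(horizon / dt))
    lowest, highest = float("inf"), float("-inf")
    for k in range(steps):
        k1 = drift(s, i, a)
        k2 = drift(s + 0.5 * dt * k1[0], i + 0.5 * dt * k1[1], a)
        k3 = drift(s + 0.5 * dt * k2[0], i + 0.5 * dt * k2[1], a)
        k4 = drift(s + dt * k3[0], i + dt * k3[1], a)
        s += dt * (k1[0] + 2.0 * k2[0] + 2.0 * k3[0] + k4[0]) / 6.0
        i += dt * (k1[1] + 2.0 * k2[1] + 2.0 * k3[1] + k4[1]) / 6.0
        if k >= steps // 2:
            lowest, highest = min(lowest, s), max(highest, s)
    return highest - lowest


if __name__ == "__main__":
    for a in (0.3, 0.1):
        print(f"a = {a}: late oscillation amplitude of x_S = {late_amplitude(a):.4f}")
```

A persistent, non-negligible amplitude indicates that the trajectory keeps oscillating instead of settling at the fixed point.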

  14. What happened?
     Fixed point: $(x_S = 0.078,\ x_I = 0.126)$. Stationary measure: $(\pi_S = 0.11,\ \pi_I = 0.13)$.
     [Figure: simplex with corners S, I, R showing the limit cycle of the ODE, the true stationary distribution, and the fixed point.]

  18. Fixed points?
     - Markov chain: transient regime $\dot p = pK$; as $t \to \infty$, stationary regime $\pi K = 0$.
     - Mean-field ($N \to \infty$): transient regime $\dot x = xQ(x)$; as $t \to \infty$, fixed points $xQ(x) = 0$.
     - If the ODE goes to the fixed points as $t \to \infty$ ("if yes"), then the stationary distribution goes to the fixed points as $N \to \infty$ ("then yes").
     Theorem ((i) Benaim, Le Boudec 08; (ii) Le Boudec 12). The stationary distribution $\pi^N$ concentrates on the fixed points if:
     (i) all trajectories of the ODE converge to the fixed points; or
     (ii) the Markov chain is reversible.

  19. Steady-state: theorem
     Theorem. Consider a mean-field model for which $x^N$ converges to the solution of $\dot x = f(x)$. Then:
     - If all trajectories converge to a unique fixed point $x^*$, then $\pi^N$ converges to $x^*$.
     Note: under this condition, the unique fixed point implies the decoupling assumption.

  21. Quiz
     Consider the SIRS model. Under the stationary distribution $\pi^N$:
     (A) As there is no fixed point, there is no such stationary distribution.
     (B) $P(X_1 = S, X_2 = S) \approx P(X_1 = S)\,P(X_2 = S)$
     (C) $P(X_1 = S, X_2 = S) > P(X_1 = S)\,P(X_2 = S)$
     (D) $P(X_1 = S, X_2 = S) < P(X_1 = S)\,P(X_2 = S)$
     [Figure: simplex with corners S, I, R showing the limit cycle, the true stationary distribution, and the fixed point.]
     Answer: C. At any time $t$, $P(X_1(t) = S, X_2(t) = S) = x_1(t)^2$, and the stationary distribution averages this over the limit cycle. Thus: positively correlated.
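
The step from $x_1(t)^2$ to a positive correlation can be spelled out as follows. This is a heuristic sketch, not a derivation taken from the slides: it assumes that, for large $N$, the stationary distribution places the system at a uniformly random phase of the limit cycle, and it writes $\tau$ for the period of the cycle (a symbol introduced here).

```latex
\begin{align*}
P(X_1 = S, X_2 = S) &\approx \frac{1}{\tau}\int_0^{\tau} x_1(t)^2 \, dt, \\
P(X_1 = S)\,P(X_2 = S) &\approx \Big(\frac{1}{\tau}\int_0^{\tau} x_1(t)\, dt\Big)^{2}, \\
P(X_1 = S, X_2 = S) - P(X_1 = S)\,P(X_2 = S)
  &\approx \frac{1}{\tau}\int_0^{\tau} \big(x_1(t) - \bar x_1\big)^{2} dt \;>\; 0,
  \qquad \bar x_1 = \frac{1}{\tau}\int_0^{\tau} x_1(t)\, dt .
\end{align*}
```

The difference is the variance of $x_1$ along the cycle, which is strictly positive as soon as $x_1$ is not constant on it.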

  24. Lyapunov functions: how to show that trajectories converge to a fixed point?
     A solution of $\frac{d}{dt} x(t) = x(t)\,Q(x(t))$ converges to the fixed points of $xQ(x) = 0$ if there exists a Lyapunov function $f$, that is, a function that is:
     - Lower bounded: $\inf_x f(x) > -\infty$.
     - Decreasing along trajectories: $\frac{d}{dt} f(x(t)) < 0$ whenever $x(t)\,Q(x(t)) \neq 0$.
     How to find a Lyapunov function?
     - Energy? Distance? Entropy? Luck?

  25. The relative entropy is a Lyapunov function for Markov chains
     Let $Q$ be the generator of an irreducible Markov chain and $\pi$ be its stationary distribution. Let $P(t)$ be the solution of $\frac{d}{dt} P(t) = P(t)\,Q$.
     Theorem (e.g. Budhiraja et al. 15, Dupuis-Fischer 11). The relative entropy
     $$R(P \,\|\, \pi) = \sum_i P_i \log \frac{P_i}{\pi_i}$$
     is a Lyapunov function: $\frac{d}{dt} R(P(t) \,\|\, \pi) \le 0$, with equality if and only if $P(t) = \pi$.
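
A small numerical check makes the statement concrete. The sketch below uses a made-up 3-state generator `Q` (an assumption for illustration only), approximates $\frac{d}{dt}P = PQ$ with explicit Euler steps, and records $R(P(t) \,\|\, \pi)$ along the way. The stationary distribution is obtained by simply iterating the same step for a long time.

```python
import math

# Hypothetical generator of an irreducible 3-state chain (each row sums to 0).
Q = [[-3.0, 2.0, 1.0],
     [1.0, -1.0, 0.0],
     [2.0, 2.0, -4.0]]


def step(p, dt=0.01):
    """One explicit Euler step of dP/dt = P Q."""
    n = len(p)
    return [p[j] + dt * sum(p[i] * Q[i][j] for i in range(n)) for j in range(n)]


def relative_entropy(p, pi):
    """R(p || pi) = sum_i p_i log(p_i / pi_i), with the convention 0 log 0 = 0."""
    return sum(x * math.log(x / y) for x, y in zip(p, pi) if x > 0.0)


# Stationary distribution: iterate until numerically converged.
pi = [1.0 / 3.0] * 3
for _ in range(5000):
    pi = step(pi)

# Relative entropy along a trajectory started in state 0.
p = [1.0, 0.0, 0.0]
entropies = []
for _ in range(500):
    entropies.append(relative_entropy(p, pi))
    p = step(p)

if __name__ == "__main__":
    print("pi =", [round(x, 4) for x in pi])
    print("R at steps 0, 50, 100:", [round(entropies[k], 6) for k in (0, 50, 100)])
```

With this step size each Euler step is itself a stochastic matrix that leaves $\pi$ invariant, so the recorded sequence is non-increasing up to rounding.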

  27. Relative entropy for mean-field models
     Assume that $Q(x)$ is the generator of an irreducible Markov chain and let $\pi(x)$ be its stationary distribution. Let $P(t)$ be the solution of $\frac{d}{dt} P(t) = P(t)\,Q(P(t))$, and write $x(t) = P(t)$ and $\pi(t) = \pi(P(t))$. Then
     $$\frac{d}{dt} R(P(t) \,\|\, \pi(t)) = \underbrace{\frac{d}{dt}P(t)\, \frac{\partial}{\partial P} R(P(t), \pi(t))}_{\le 0} + \underbrace{\frac{d}{dt}\pi(t)\, \frac{\partial}{\partial \pi} R(P(t), \pi(t))}_{= -\sum_i x_i(t) \frac{d}{dt} \log \pi_i(t)} \le -\sum_i x_i(t)\, \frac{d}{dt} \log \pi_i(t).$$
     Theorem. If there exists a lower bounded integral $F(x)$ of $\sum_i x_i(t)\, \frac{d}{dt} \log \pi_i(t)$, then $x \mapsto R(x \,\|\, \pi(x)) + F(x)$ is a Lyapunov function for the mean-field model.

  28. The decoupling assumption: conclusion
     - Decoupling ≈ mean-field convergence.
     - If the rates are continuous, convergence holds for the transient regime.
     - The stationary regime should be handled with care:
       - The uniqueness of the fixed point is not enough.
       - Lyapunov functions can help but are not easy to find.

  30. A martingale argument
     A mean-field model $X(t)$ with drift $f$ satisfies
     $$\lim_{dt \to 0} \frac{1}{dt}\, E\big[X(t+dt) - X(t) \mid X(t) = x\big] = f(x),$$
     $$\lim_{dt \to 0} \frac{1}{dt}\, \mathrm{var}\big[X(t+dt) - X(t) - f(X(t)) \mid X(t) = x\big] \le C/N.$$
     This means that
     $$M(t) = X(t) - \Big(x_0 + \int_0^t f(X(s))\, ds\Big)$$
     is such that $E[M(t) \mid \mathcal{F}_s] = M(s)$ ($M(t)$ is a martingale) and $\mathrm{var}[M(t)] \le Ct/N$ (small variance).

  31. Martingale concentration results
     Let $M(t)$ be such that $E[M(t) \mid \mathcal{F}_s] = M(s)$ ($M(t)$ is a martingale) and $\mathrm{var}[M(t)] \le C/N$ (small variance).
     Then (Doob's inequality):
     $$P\Big(\sup_{t \le T} \|M(t)\| \ge \epsilon\Big) \le \frac{C}{N \epsilon^2}.$$

  33. Mean-field convergence
     Going back to the martingale argument, we have
     $$X(t) = x_0 + \int_0^t f(X(s))\, ds + M(t),$$
     where $M(t)$ is small by the previous slide.
     Is $X(t)$ close to the solution of $\dot x = f(x)$?

  34. The initial value problem ("Dynamical systems 101")
     The initial value problem:
     $$\dot x = f(x), \qquad x(0) = x_0 \in \mathbb{R}^d.$$
     Existence and uniqueness of the solution are guaranteed by the Picard-Cauchy theorem:
     - If $f$ is Lipschitz-continuous on $\mathbb{R}^d$, then there exists a unique solution on $[0, T]$.

  37. Uniqueness of solution ("Dynamical systems 101", continued)
     Reminder: $f$ is Lipschitz-continuous if there exists $L$ such that
     $$\forall x, y \in \mathbb{R}^d: \quad \|f(x) - f(y)\| \le L\, \|x - y\|.$$
     If $x(t) = x_0 + \int_0^t f(x(s))\, ds$ and $y(t) = y_0 + \int_0^t f(y(s))\, ds + \varepsilon$, then
     $$\|x(t) - y(t)\| \le L \int_0^t \|x(s) - y(s)\|\, ds + \|x_0 - y_0\| + \varepsilon.$$
     Gronwall's Lemma: this implies that
     $$\|x(t) - y(t)\| \le \big(\|x_0 - y_0\| + \varepsilon\big)\, e^{Lt}.$$

  38. Consequence
     Theorem. If $X^N(0) = x_0$, then
     $$E\Big[\sup_{t \le T} \big\|X^N(t) - x(t)\big\|\Big] \le O\Big(\frac{1}{\sqrt{N}}\Big)\, e^{LT}.$$
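
The $1/\sqrt{N}$ rate can be observed on a toy model. The sketch below is an assumption-laden illustration, not an example from the talk: it uses a hypothetical infection model in which each susceptible node becomes infected at rate $0.1 + 2x$ and each infected node recovers at rate $1$, where $x$ is the infected fraction, so that the drift is $f(x) = (1 - x)(0.1 + 2x) - x$. It simulates the $N$-node chain exactly (Gillespie style) and compares it with a Runge-Kutta solution of $\dot x = f(x)$. The error is sampled at jump times and the ODE is read off a grid, so the reported supremum is itself approximate.

```python
import random


def drift(x):
    """Drift of the hypothetical model: infections minus recoveries."""
    return (1.0 - x) * (0.1 + 2.0 * x) - x


def ode_on_grid(x0, horizon, dt):
    """RK4 solution of dx/dt = drift(x), sampled every dt."""
    xs = [x0]
    x = x0
    for _ in range(int(round(horizon / dt))):
        k1 = drift(x)
        k2 = drift(x + 0.5 * dt * k1)
        k3 = drift(x + 0.5 * dt * k2)
        k4 = drift(x + dt * k3)
        x += dt * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
        xs.append(x)
    return xs


def sup_error(n, xs, x0, horizon, dt, rng):
    """Largest gap |X^N(t) - x(t)| seen at the jump times of one simulated path."""
    infected = int(round(x0 * n))
    t = 0.0
    worst = 0.0
    while True:
        x = infected / n
        up = n * (1.0 - x) * (0.1 + 2.0 * x)   # total infection rate
        down = n * x                            # total recovery rate
        t += rng.expovariate(up + down)
        if t >= horizon:
            return worst
        if rng.random() * (up + down) < up:
            infected += 1
        else:
            infected -= 1
        worst = max(worst, abs(infected / n - xs[int(t / dt)]))


def mean_sup_error(n, runs=5, x0=0.1, horizon=5.0, dt=1e-3, seed=0):
    """Average of sup_error over a few independent paths."""
    rng = random.Random(seed)
    xs = ode_on_grid(x0, horizon, dt)
    return sum(sup_error(n, xs, x0, horizon, dt, rng) for _ in range(runs)) / runs


if __name__ == "__main__":
    for n in (100, 10000):
        print(n, mean_sup_error(n))
```

Multiplying $N$ by 100 should shrink the average error by roughly a factor of 10, up to sampling noise.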

  39. Rate of convergence: recap and some extensions
     The speed of convergence can be extended to:
     - Non-smooth dynamics (one-sided Lipschitz functions).
     - Steady-state (if $f$ is $C^2$ and there is a unique attractor).
     - $E[X(t)]$.
     It cannot be extended to:
     - General non-Lipschitz dynamics.
     - Steady-state with no attractor.

  41. Optimal control
     - Stochastic optimal control: closed-loop policies, actions(t+1) = function(state(t)).
     - Deterministic optimal control: open-loop policies are optimal.

  43. Markov decision processes (reference: Puterman (2014))
     Example: you can throw a six-sided die up to 5 times. You win the number shown by the last throw. When should you stop?
     Definition: a Markov decision process (MDP)
     - State space $\{1, \dots, 6\}$; action space $\{\text{stop}, \text{continue}\}$.
     - Transition probabilities $p(X(t+1) = j \mid X(t) = i, \text{action})$:
       $p(X(t+1) = j) = 1/6$ if continue; $p(X(t+1) = X(t)) = 1$ if stop.
     - Instantaneous cost: cost(t, state, action).
     - Objective: $\min E[\mathrm{cost}(t, X_t, \text{action})]$.

  49. Example of Markov decision process
     You can throw a six-sided die up to 5 times. You win the number shown by the last throw. When should you stop?
     Value iteration (Bellman's equation):
     $$V_t(i) = \max_{\text{action}} \Big( \mathrm{cost}(t, i, \text{action}) + E\big[V_{t+1}(X(t+1)) \mid X(t) = i, \text{action}\big] \Big).$$
     Example (rows: current face $i$; columns: throw $t$):

     i \ t    1       2       3       4       5
     1        4.95    4.66    4.25    3.5     1
     2        4.95    4.66    4.25    3.5     2
     3        4.95    4.66    4.25    3.5     3
     4        4.95    4.66    4.25    4       4
     5        5       5       5       5       5
     6        6       6       6       6       6
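
The table can be reproduced by backward induction. The sketch below is one possible reading of the example, in which stopping on face $i$ pays $i$ and continuing pays the expected value of the next throw; the function name `dice_values` is made up for this illustration. Exact fractions are used so that no rounding enters the comparison.

```python
from fractions import Fraction


def dice_values(faces=6, throws=5):
    """Backward induction: values[t][i] is the optimal expected gain at throw t on face i."""
    # At the last throw there is no choice left: you win the face shown.
    values = {throws: {i: Fraction(i) for i in range(1, faces + 1)}}
    for t in range(throws - 1, 0, -1):
        # Expected value of throwing again and playing optimally afterwards.
        continuation = sum(values[t + 1].values()) / faces
        # Stop (win i) or continue, whichever is larger.
        values[t] = {i: max(Fraction(i), continuation) for i in range(1, faces + 1)}
    return values


if __name__ == "__main__":
    values = dice_values()
    for i in range(1, 7):
        print(i, [float(values[t][i]) for t in range(1, 6)])
```

Under this reading the continuation values are $7/2$, $17/4$, $14/3$ and $89/18 \approx 4.944$ (shown as 4.95 in the table), so the optimal rule stops on 4, 5 or 6 at the fourth throw and only on 5 or 6 before that.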

  51. The curse of dimensionality
     To solve Bellman's equation, we need to iterate over the whole state space:
     $$V_t(i) = \min_{\text{action}} \Big( \mathrm{cost}(t, i, \text{action}) + E\big[V_{t+1}(X(t+1)) \mid X(t) = i, \text{action}\big] \Big).$$
     Alternatives:
     - Approximate dynamic programming (learning).
     - Mean-field optimal control.

  53. Example of mean-field control
     MDP: find $\pi(t, X)$ to minimize
     $$V^{\pi, N} = E\Big[\sum_t \mathrm{cost}(X_t, \pi(t, X_t))\Big]$$
     subject to $P(X_{t+1} = i \mid X_t = j, \pi(\cdot) = a) = P_{i,j,a}$.
     Mean-field optimization: find $a(t)$ to minimize
     $$V^{a} = \int_0^T \mathrm{cost}(x_t, a_t)\, dt$$
     subject to $\dot x_t = f(x_t, a_t)$.
     Theorem (G., Gaujal, Le Boudec 2012). If the drift and costs are Lipschitz, then:
     - $V^{N,*} \to V^{*}$.
     - An open-loop policy $a^*$ is optimal.

  54. Mean-field control: example
     [Figure: transition diagram over the states S, I, R and V, with edge labels including $\rho$, $\gamma$, $m I(t)$ and $\pi(t)$; and a ternary plot with axes "Proportion of Susceptible Population", "Proportion of Infected Population" and "Proportion of Recovered and Vaccinated Population", split into regions labelled "OPT: MAX, MFE: MAX", "OPT: NO, MFE: NO" and "OPT: MAX, MFE: NO".]

  56. Motivation
     Mean field games (Lions and Lasry, 2007 and Caines, 2007) capture the dynamic evolution of a large population of strategic players.

  58. Game taxonomy
     - Static games: payoff matrix per player. Strategy of one player is a (randomized) action. Solution of the game: Nash equilibrium.
     - Stochastic (repeated) games: payoff is the (discounted) sum from 0 to T. Strategy of a player is a policy (function). Solution: sub-game perfect equilibrium + folk theorem.
     - Population games: infinite number of identical players. Player profiles are replaced by action profiles. Solution of the game: Wardrop equilibrium.
     - Mean field games: dynamic games over an infinite number of players. Solution of the game: mean field equilibrium.

  60. Static game example: the prisoner's dilemma
     Two possible actions: $\{C, D\}$. The cost matrix (1) is (row player's cost, column player's cost):

              C        D
     C        1, 1     3, 0
     D        0, 3     2, 2

     Lemma. There exists a unique Nash equilibrium, which consists in playing D.
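
The lemma can be checked by brute force over the four pure action profiles. The sketch below encodes the cost matrix as a dictionary (the layout and the function name `pure_nash_equilibria` are choices made for this illustration) and keeps the profiles where neither player can lower their own cost by deviating alone. It only enumerates pure strategies.

```python
ACTIONS = ("C", "D")

# COST[(row action, column action)] = (cost of row player, cost of column player)
COST = {
    ("C", "C"): (1, 1),
    ("C", "D"): (3, 0),
    ("D", "C"): (0, 3),
    ("D", "D"): (2, 2),
}


def pure_nash_equilibria():
    """Profiles from which no unilateral deviation lowers the deviating player's cost."""
    equilibria = []
    for a1 in ACTIONS:
        for a2 in ACTIONS:
            row_ok = all(COST[(a1, a2)][0] <= COST[(b, a2)][0] for b in ACTIONS)
            col_ok = all(COST[(a1, a2)][1] <= COST[(a1, b)][1] for b in ACTIONS)
            if row_ok and col_ok:
                equilibria.append((a1, a2))
    return equilibria


if __name__ == "__main__":
    print(pure_nash_equilibria())
```

Since D gives a strictly lower cost than C against either action of the opponent, D is strictly dominant, so no mixed equilibrium can exist either.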

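  62. Do the equilibria converge?
     [Diagram: a static game ($N$ players) becomes a stochastic (repeated) game by repetition; a population game becomes a mean-field game by repetition. Letting $N \to \infty$ takes static games to population games and stochastic games to mean-field games. Both $N \to \infty$ arrows are marked "??".]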

  64. Stochastic Games with Identical Players
     Introduced by Shapley, 1953. Here, players are interchangeable: the dynamics, the costs and the strategies only depend on the population distribution.
     State at time $t$: $X(t) = (X_1(t), \dots, X_n(t), \dots, X_N(t))$, with $X_n(t) \in \mathcal{S}$ (finite set).
     The state evolves in continuous time: player $n$ takes actions $A_n(t) \in \mathcal{A}$ at instants distributed according to a Poisson process, independently of the others.

  66. Stochastic Games: dynamics and costs
     Players interact according to a mean-field model:
     $$P\big(X_n(t + dt) = j \mid X_n(t) = i,\ A_n(t) = a,\ M(t) = m\big) = P_{ij}(a, m)\, dt.$$
     Strategy of a player: $\pi : (X(t), m) \mapsto A(t)$.
     Instantaneous cost: $C(X_n(t), A_n(t), M(t))$.
     Player $n$ chooses a strategy $\pi^n$ to minimize her expected $\beta$-discounted cost $V^N(\pi^n, \pi)$, knowing the strategies of the others:
     $$V^N(\pi^n, \pi) = E\Big[\int_0^\infty e^{-\beta t}\, C(X_n(t), A_n(t), M(t))\, dt \;\Big|\; A_n \text{ has distribution } \pi^n,\ A_{n'} \text{ has distribution } \pi\ (n' \neq n)\Big].$$

  67. Stochastic Games: Nash equilibria
     Definition (Nash Equilibrium). For a given set of strategies $\Pi$, a strategy $\pi \in \Pi$ is called a symmetric Nash equilibrium in $\Pi$ for the $N$-player game if, for any strategy $\pi^n \in \Pi$,
     $$V^N(\pi, \pi) \le V^N(\pi^n, \pi).$$
     Existence is guaranteed when the dynamics and the costs are continuous functions of the population (Fink, 1964).

  69. Mean-Field Game Model
     In the mean-field limit, the population distribution $m^\pi(t) \in \mathcal{P}(\mathcal{S})$ satisfies the mean-field equation:
     $$\dot m^\pi_j(t) = \sum_{i \in \mathcal{S}} \sum_{a \in \mathcal{A}} m^\pi_i(t)\, Q_{ij}(a, m^\pi(t))\, \pi_{i,a}(m^\pi(t)). \qquad (2)$$
     We focus on a particular player, which we call Player 0. Thanks to the decoupling assumption, the probability $x_j = P(X_0 = j)$ satisfies:
     $$\dot x_j(t) = \sum_{i \in \mathcal{S}} \sum_{a \in \mathcal{A}} x_i(t)\, Q_{ij}(a, m^\pi(t))\, \pi^n_{i,a}(t), \qquad (3)$$
     where $\pi^n$ denotes the strategy of the tagged player.

  70. Mean-Field Game Model: instantaneous cost and mean-field equilibria
     The discounted cost of Player 0 is
     $$V(\pi^0, \pi) = \int_0^\infty \Big( \sum_{i \in \mathcal{S}} \sum_{a \in \mathcal{A}} x_i(t)\, C_{i,a}(m^\pi(t))\, \pi^0_{i,a}(m^\pi(t)) \Big) e^{-\beta t}\, dt.$$
     Definition (Mean-Field Equilibrium). A strategy $\pi^{MFE}$ is a (symmetric) mean-field equilibrium if, for every strategy $\pi$,
     $$V(\pi^{MFE}, \pi^{MFE}) \le V(\pi, \pi^{MFE}).$$

  71. Convergence of continuous policies
     Theorem (Existence of equilibrium, Doncel, G., Gaujal 2016). Assume that $Q_{ij}(a, m)$ and $C_{i,a}(m)$ are continuous in $m$. Then there always exists a mean-field equilibrium.
     The proof applies the Kakutani fixed point theorem for infinite-dimensional spaces to the population distribution (instead of directly to strategies). It does not require convexity assumptions as in Gomes, Mohr, Souza, 2013.
     Theorem (Convergence, Tembine et al., 2009). If $C_{i,a}(m)$, $Q_{ij}(a, m)$ and the policy $\pi_i(m)$ are continuous in $m$, then the population of the finite game converges to the solution of the differential equation (2), and the evolution of one player converges to the solution of (3).
     Question: where is the catch?

  72. Non-convergence in General
     We consider a matching game version of the prisoner's dilemma.
     The state space is $\mathcal{S} = \{C, D\}$ and $\mathcal{A} = \mathcal{S}$. The population distribution is $m = (m_C, m_D)$.
     Cost of a player:
     $$C_{i,i}(m) = \begin{cases} m_C + 3 m_D & \text{if } i = C \\ 2 m_D & \text{if } i = D \end{cases}$$
     This is the expected cost of a player matched with another player at random and using the cost matrix (4):

              C        D
     C        1, 1     3, 0
     D        0, 3     2, 2

     Lemma. There exists a unique mean-field equilibrium $\pi^\infty$, which consists in always playing D.
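
The expected costs and the gap between the two actions follow directly from the cost matrix. The short derivation below is a sketch that only compares instantaneous costs, using $m_C + m_D = 1$:

```latex
\begin{align*}
C_{C,C}(m) &= 1 \cdot m_C + 3 \cdot m_D, \\
C_{D,D}(m) &= 0 \cdot m_C + 2 \cdot m_D, \\
C_{C,C}(m) - C_{D,D}(m) &= m_C + m_D = 1 > 0 .
\end{align*}
```

Whatever the population distribution $m$, playing D costs exactly one unit less than playing C, which is consistent with the lemma.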

  73. Non-convergence in General (II)
     Let us define the following stationary strategy for $N$ players:
     $$\pi^N(M) = \begin{cases} D & \text{if } M_C < 1 \\ C & \text{if } M_C = 1 \end{cases}$$
     "Play C as long as everyone else is playing C. Play D as soon as another player deviates to D."
