MEAN FIELD FOR MARKOV DECISION PROCESSES: FROM DISCRETE TO CONTINUOUS OPTIMIZATION - PowerPoint PPT Presentation



SLIDE 1

MEAN FIELD FOR MARKOV DECISION PROCESSES: FROM DISCRETE TO CONTINUOUS OPTIMIZATION

Nicolas Gast, Bruno Gaujal, Jean-Yves Le Boudec. January 24, 2012

SLIDE 2

Contents

  • 1. Mean Field Interaction Model
  • 2. Mean Field Interaction Model with Central Control
  • 3. Convergence and Asymptotically Optimal Policy
  • 4. Performance of sub-optimal policies

SLIDE 3

MEAN FIELD INTERACTION MODEL

SLIDE 4

Mean Field Interaction Model

Time is discrete. N objects, N large. Object n has state X_n(t); (X_1^N(t), …, X_N^N(t)) is Markov. Objects are observable only through their state.

"Occupancy measure" M^N(t) = distribution of object states at time t.

Example [Khouzani 2010]: M^N(t) = (S(t), I(t), R(t), D(t)) with S(t) + I(t) + R(t) + D(t) = 1; S(t) = proportion of nodes in state 'S'.
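As a concrete illustration (not from the slides), the occupancy measure is just the empirical distribution of the N object states. A minimal sketch, assuming an illustrative integer encoding S=0, I=1, R=2, D=3:

```python
import numpy as np

def occupancy_measure(states, n_states=4):
    """Empirical distribution M^N(t): fraction of objects in each state."""
    counts = np.bincount(states, minlength=n_states)
    return counts / len(states)

# 8 objects over states S=0, I=1, R=2, D=3 (encoding is illustrative)
states = np.array([0, 0, 1, 2, 0, 3, 1, 0])
M = occupancy_measure(states)
# M is a probability distribution over the 4 states
```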

[Figure: state transition diagram over states S, I, R, D with rates βI, α, b, q]

SLIDE 5

Mean Field Interaction Model

Time is discrete. N objects, N large. Object n has state X_n(t); (X_1^N(t), …, X_N^N(t)) is Markov. Objects are observable only through their state. "Occupancy measure" M^N(t) = distribution of object states at time t.

Theorem [Gast (2011)]: M^N(t) is Markov. Such models are called "Mean Field Interaction Models" in the Performance Evaluation community [McDonald (2007), Benaïm and Le Boudec (2008)].

SLIDE 6

Intensity I(N)

I(N) = expected number of transitions per object per time unit. A mean field limit occurs when we rescale time by I(N), i.e. we consider X^N(t/I(N)).

  • I(N) = O(1): the mean field limit is in discrete time [Le Boudec et al. (2007)]
  • I(N) = O(1/N): the mean field limit is in continuous time [Benaïm and Le Boudec (2008)]

SLIDE 7

Virus Infection [Khouzani 2010]

N nodes, homogeneous, pairwise meetings. One interaction per time slot, I(N) = 1/N; the mean field limit is an ODE. The occupancy measure is M(t) = (S(t), I(t), R(t), D(t)) with S(t) + I(t) + R(t) + D(t) = 1; S(t) = proportion of nodes in state 'S'.

[Figure: state diagram over S, I, R, D with rates βI, α, b, q; sample trajectories of (S+R, I) for N = 100 vs. the mean field limit, with q = b = 0.1, β = 0.6, and α = 0.1 or α = 0.7; dead nodes indicated]
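The mean field limit can be integrated numerically. A sketch using Euler's method follows; the drift is an assumption, since the slides show only the rate symbols β, α, b, q, so the assignment of rates to transitions is illustrative rather than Khouzani's exact model:

```python
import numpy as np

# Assumed transitions (illustrative, based on the slide's diagram):
#   S -> I at rate beta*I (pairwise infection)
#   I -> D at rate alpha  (malware kills the host)
#   I -> R at rate q      (recovery of infected nodes)
#   S -> R at rate b      (immunization of susceptible nodes)
def sird_drift(m, beta, alpha, b, q):
    S, I, R, D = m
    dS = -beta * S * I - b * S
    dI = beta * S * I - (alpha + q) * I
    dR = q * I + b * S
    dD = alpha * I
    return np.array([dS, dI, dR, dD])

def euler(m0, params, dt=0.01, T=10.0):
    """Euler integration of the mean field ODE dm/dt = f(m)."""
    m = np.array(m0, dtype=float)
    for _ in range(int(T / dt)):
        m = m + dt * sird_drift(m, *params)
    return m

# beta = 0.6, alpha = 0.1, b = q = 0.1, as on the slide
m_final = euler([0.9, 0.1, 0.0, 0.0], (0.6, 0.1, 0.1, 0.1))
# The components remain a probability distribution: they sum to 1
```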

SLIDE 8

The Mean Field Limit

Under very general conditions (given later), the occupancy measure converges in law to a deterministic process m(t), called the mean field limit. For a finite state space, the limit is an ODE.

SLIDE 9

Sufficient Conditions for Convergence

[Kurtz (1970)]; see also [Bordenave et al. (2008)], [Graham (2000)]. The sufficient condition is verifiable by inspection. Example: for I(N) = 1/N, the second moment of the number of objects affected in one time slot must be o(N). A similar result holds when the mean field limit is in discrete time [Le Boudec et al. (2007)].

SLIDE 10

MEAN FIELD INTERACTION MODEL WITH CENTRAL CONTROL

SLIDE 11

Markov Decision Process

Central controller. Action set A (metric, compact). The running reward depends on state and action. Goal: maximize the expected reward over horizon T. A policy π selects an action at every time slot. The optimal policy can be assumed Markovian: (X_1^N(t), …, X_N^N(t)) -> action. The controller observes only object states, so π depends on M^N(t) only.
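A minimal sketch of the last point: because the controller sees only object states, two configurations with the same occupancy measure must receive the same action. The threshold policy below is hypothetical:

```python
from collections import Counter

def occupancy(states):
    """Empirical occupancy measure of a list of object states."""
    n = len(states)
    c = Counter(states)
    return {s: c[s] / n for s in c}

def policy(states):
    """Hypothetical threshold policy: act when the infected fraction is high."""
    m = occupancy(states)
    return 1 if m.get("I", 0.0) > 0.3 else 0

# Two state vectors with the same occupancy measure get the same action
a = policy(["S", "I", "I", "S"])
b = policy(["I", "S", "S", "I"])
```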

SLIDE 12

Example

[Figure: example trajectories for θ = 0.68, θ = 0.8, θ = 0.65]
SLIDE 13

Optimal Control

Optimal Control Problem: find a policy π that achieves (or approaches) the supremum, where m is the initial condition of the occupancy measure.

The optimum can be found by iterative methods, but suffers from state space explosion (for m).

SLIDE 14

Can We Replace MDP By Mean Field Limit ?

Assume the mean field model converges to the fluid limit for every action, e.g. the mean and standard deviation of the number of transitions per time slot are O(1). Can we replace the MDP by optimal control of the mean field limit?

SLIDE 15

Controlled ODE

The mean field limit is an ODE; the control is an action function α(t). Example: α(t) = 1 if t > t0, else α(t) = 0.
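A sketch of this bang-bang control applied to a toy one-dimensional ODE; the drift is illustrative, not the model from the talk:

```python
# Controlled ODE dm/dt = f(m, alpha(t)) under the slide's bang-bang
# action function. The drift f(m, a) = -a*m is a toy example.
def alpha(t, t0):
    """Action function from the slide: alpha(t) = 1 if t > t0, else 0."""
    return 1.0 if t > t0 else 0.0

def integrate(m0, t0, dt=0.001, T=2.0):
    """Euler integration of dm/dt = -alpha(t) * m."""
    m = m0
    for k in range(int(T / dt)):
        t = k * dt
        m += dt * (-alpha(t, t0) * m)
    return m

# With t0 = 1 and T = 2, decay acts for about one time unit: m ~ m0 * e^-1
m1 = integrate(1.0, t0=1.0)
m2 = integrate(1.0, t0=5.0)  # the switch never happens within the horizon
```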

SLIDE 16

Optimal Control for Fluid Limit

The optimal function α(t) can be obtained with Pontryagin's maximum principle or the Hamilton-Jacobi-Bellman equation.

[Figure: trajectories for switching times t0 = 1, t0 = 5.6, t0 = 25]

SLIDE 17

CONVERGENCE, ASYMPTOTICALLY OPTIMAL POLICY

SLIDE 18

Convergence Theorem

Theorem [Gast 2011] Under reasonable regularity and scaling assumptions:

(The optimal value for the system with N objects (MDP) converges to the optimal value for the fluid limit.)

SLIDE 19

Convergence Theorem

Does this give us an asymptotically optimal policy? Not immediately: the optimal policy of the system with N objects may not converge.

Theorem [Gast 2011] Under reasonable regularity and scaling assumptions:

SLIDE 20

Asymptotically Optimal Policy

Take an optimal policy for the mean field limit and define the following control for the system with N objects: at time slot k, pick the same action that the optimal fluid-limit policy would take at time t = k I(N). This defines a time-dependent policy; consider its value function when applied to the system with N objects.

Theorem [Gast 2011]: the value of this policy converges to the optimal value for the system with N objects (MDP).
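The construction can be sketched as follows; the bang-bang fluid policy and the choice I(N) = 1/N are illustrative assumptions:

```python
# At discrete slot k, the N-object system plays the action the optimal
# fluid-limit policy would take at fluid time t = k * I(N).
def make_discrete_policy(fluid_policy, intensity):
    """Map a fluid-time policy to a time-dependent slot policy."""
    def slot_policy(k):
        return fluid_policy(k * intensity)
    return slot_policy

fluid_policy = lambda t: 1.0 if t > 5.6 else 0.0  # switch at t0 = 5.6
N = 100
policy = make_discrete_policy(fluid_policy, intensity=1.0 / N)
# The switch happens around slot t0 / I(N) = 560
```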


SLIDE 22

Asymptotic evaluation of policies

SLIDE 23

Control policies exhibit discontinuities

N servers of speed 1-p; one central server of speed pN, which serves the longest queue first (LQF). (Taken from Tsitsiklis and Xu, 2011.) The discontinuity arises because of the LQF strategy: the resulting drift is discontinuous.

SLIDE 24

Differential inclusions as good approx.

A discontinuous ODE may have no solution, as here. Replace it by a differential inclusion.

Theorem [Gast 2011b] Under reasonable scaling assumptions (but without regularity):

  • The differential inclusion has at least one solution.
  • As N grows, X(t) converges to the solutions of the DI.
  • If there is a unique attractor x*, the stationary distribution concentrates on x*.

SLIDE 25

In (Tsitsiklis and Xu, 2011), an ad-hoc argument is used to show that as N grows, the steady state concentrates on a single point x*.

This point is easily retrieved by solving the equation 0 ∈ F(x).
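A sketch of this computation for a toy discontinuous drift: the drift pushes up below a point x* and down above it, so 0 belongs to the set-valued extension of F at x*, and bisection on the sign change recovers it. The drift F below is illustrative, not the drift from the Tsitsiklis-Xu model:

```python
# Toy discontinuous drift: +1 below c, -1 above c. The set-valued
# extension at c is the interval [-1, 1], which contains 0, so x* = c.
def F(x, c=0.4):
    return 1.0 if x < c else -1.0

def fixed_point(lo, hi, tol=1e-9):
    """Bisection on the sign change of F locates x* with 0 in F(x*)."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if F(mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

x_star = fixed_point(0.0, 1.0)
```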

SLIDE 26

Conclusions

  • Optimal control on the mean field limit is justified.
  • A practical, asymptotically optimal policy can be derived.
  • Differential inclusions can be used to evaluate policies.

SLIDE 27

Questions ?

[Gast 2011] N. Gast, B. Gaujal, and J.-Y. Le Boudec. Mean field for Markov Decision Processes: from Discrete to Continuous Optimization. To appear in IEEE Transactions on Automatic Control, 2012.

[Gast 2011b] N. Gast and B. Gaujal. Markov chains with discontinuous drifts have differential inclusion limits. Application to stochastic stability and mean field approximation. Inria RR 7315. Short version: N. Gast and B. Gaujal. Mean field limit of non-smooth systems and differential inclusions. MAMA Workshop, 2010.

[Ethier and Kurtz (2005)] S. Ethier and T. Kurtz. Markov Processes: Characterization and Convergence. Wiley, 2005.

[Benaïm and Le Boudec (2008)] M. Benaïm and J.-Y. Le Boudec. A class of mean field interaction models for computer and communication systems. Performance Evaluation, 65(11-12):823-838, 2008.

[Khouzani 2010] M.H.R. Khouzani, S. Sarkar, and E. Altman. Maximum damage malware attack in mobile wireless networks. In IEEE Infocom, San Diego, 2010.