
Team Optimal Control of Coupled Subsystems with Mean-Field Sharing
Jalal Arabneydi and Aditya Mahajan
Electrical and Computer Engineering Department, McGill University
Email: jalal.arabneydi@mail.mcgill.ca
Date: December 15th, 2014
(J. Arabneydi, Email: jalal.arabneydi@mail.mcgill.ca ) Conference on Decision and Control 2014 1 / 23

Outline
1. Introduction
2. Problem Formulation & Main Results
3. Example
4. Generalizations
5. Summary

Motivation
What do we mean by a team control problem? Any setup in which agents (decision makers) must collaborate with each other to achieve a common task.
Team optimal control of decentralized stochastic systems arises in applications such as networked control systems, robotics, communication networks, transportation networks, sensor networks, smart grids, and economics.
No solution approach exists for general infinite-horizon decentralized control systems. In general, these problems belong to the NEXP complexity class.

Brief Literature Review
Classical information structure: all agents have identical information.
Non-classical information structure: agents have different information sets.
Examples of non-classical information structures:
- Static team (Radner 1962; Marschak and Radner 1972)
- Dynamic team (Witsenhausen 1971; Witsenhausen 1973)
- Specific information structures:
  - Partially nested (Ho and Chu 1972)
  - One-step delayed sharing (Witsenhausen 1971; Yoshikawa 1978)
  - n-step delayed sharing (Witsenhausen 1971; Varaiya 1978; Nayyar 2011)
  - Common past sharing (Aicardi 1978)
  - Periodic sharing (Ooi 1997)
  - Belief sharing (Yuksel 2009)
  - Partial history sharing (Nayyar 2013)
This work introduces a new information structure: mean-field sharing.

Problem Formulation
Notation:
- $N$: number of homogeneous subsystems (not necessarily large).
- $X^i_t \in \mathcal{X}$: state of subsystem $i \in \{1, \dots, N\}$ at time $t$.
- $U^i_t \in \mathcal{U}$: action of subsystem $i \in \{1, \dots, N\}$ at time $t$.
- Mean-field:
$$Z_t(x) = \frac{1}{N} \sum_{i=1}^{N} \mathbb{1}(X^i_t = x), \quad x \in \mathcal{X}, \qquad \text{or} \qquad Z_t = \frac{1}{N} \sum_{i=1}^{N} \delta_{X^i_t}.$$
All system variables are finite-valued.
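As a minimal sketch, assuming states are encoded as the integers 0, ..., |X| - 1 (an encoding chosen only for illustration), the mean-field $Z_t$ is the empirical distribution of the subsystem states. Exact fractions keep the result identical to the definition. The function name `mean_field` is hypothetical.

```python
from collections import Counter
from fractions import Fraction


def mean_field(states, num_states):
    """Empirical distribution Z_t of the subsystem states.

    states: the joint state (X^1_t, ..., X^N_t), each entry in {0, ..., num_states - 1}.
    Returns a tuple whose x-th entry is Z_t(x) = (1/N) * #{i : X^i_t = x}.
    """
    n = len(states)
    counts = Counter(states)  # missing states get count 0
    return tuple(Fraction(counts[x], n) for x in range(num_states))


# N = 4 subsystems with state space X = {0, 1, 2}.
z = mean_field([0, 2, 2, 1], num_states=3)
print(z)  # (Fraction(1, 4), Fraction(1, 4), Fraction(1, 2))

# Z_t forgets which subsystem is in which state: permuting X_t leaves it unchanged.
assert mean_field([2, 1, 0, 2], num_states=3) == z
```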

Problem Formulation
Problem statement:
- Dynamics of subsystem $i$: $X^i_{t+1} = f_t(X^i_t, U^i_t, W^i_t, Z_t)$, $i \in \{1, \dots, N\}$.
- Mean-field sharing information structure: $U^i_t = g^i_t(Z_{1:t}, X^i_t)$, where $g^i_t$ is called the control law of subsystem $i$ at time $t$.
- Control strategy: the collection $g^i = (g^i_1, \dots, g^i_T)$ of control laws of subsystem $i$ over time is the control strategy of subsystem $i$. The collection $g = (g^1, \dots, g^N)$ of control strategies is the control strategy of the system.
- Optimization problem: let $X_t = (X^i_t)_{i=1}^{N}$ and $U_t = (U^i_t)_{i=1}^{N}$. We are interested in finding a strategy $g$ that minimizes
$$J(g) = \mathbb{E}^{g}\Big[\sum_{t=1}^{T} \ell_t(X_t, U_t)\Big].$$

Problem Formulation
Assumptions:
- (A1) The initial states $(X^i_1)_{i=1}^{N}$ are i.i.d. random variables.
- (A2) The disturbances at time $t$, $(W^i_t)_{i=1}^{N}$, are i.i.d. random variables.
- (A3) Let $X_t := (X^i_t)_{i=1}^{N}$ and $W_t := (W^i_t)_{i=1}^{N}$; then $\{X_1, \{W_t\}_{t=1}^{T}\}$ are mutually independent.
- (A4) All controllers use identical control laws.
Note that (A1), (A2), and (A3) are standard assumptions in Markov decision problems. In general, (A4) leads to a loss in performance. However, it is a standard assumption in the literature on large-scale systems for reasons of simplicity, fairness, and robustness.
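To make the information structure and the cost $J(g)$ concrete, here is a Monte Carlo sketch. Everything model-specific is hypothetical and invented for illustration: the binary state space, the dynamics `f`, the coupled cost `cost`, and the shared threshold law `g`. Only the structure follows the formulation: each controller sees $Z_{1:t}$ and its own state, all controllers use the same law as in (A4), and initial states and noises are drawn i.i.d. and independently as in (A1)-(A3).

```python
import random


def mean_field(states, num_states):
    """Z_t as a tuple of floats; states take values in {0, ..., num_states - 1}."""
    n = len(states)
    return tuple(states.count(x) / n for x in range(num_states))


def f(x, u, w, z):
    """Hypothetical dynamics X^i_{t+1} = f_t(X^i_t, U^i_t, W^i_t, Z_t) on X = {0, 1}.

    State 1 is 'bad'. Without action, a subsystem turns bad with a probability that
    grows with the fraction of bad subsystems z[1]. With action (u = 1), it turns
    bad only with probability 0.05. W^i_t is uniform on [0, 1).
    """
    p_bad = 0.05 if u == 1 else min(1.0, 0.1 + 0.5 * z[1])
    return 1 if w < p_bad else 0


def cost(xs, us):
    """Hypothetical coupled per-step cost ell_t(X_t, U_t)."""
    n = len(xs)
    return sum(xs) / n + 0.4 * (sum(us) / n) ** 2


def g(z_history, x):
    """Hypothetical shared control law g_t(Z_{1:t}, X^i_t), identical for all i (A4).

    This example only looks at the latest mean-field Z_t = z_history[-1].
    """
    return 1 if x == 1 and z_history[-1][1] > 0.3 else 0


def estimate_cost(n_agents, horizon, runs, seed=0):
    """Monte Carlo estimate of J(g) = E[sum_t ell_t(X_t, U_t)]."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(runs):
        xs = [rng.randint(0, 1) for _ in range(n_agents)]  # i.i.d. initial states (A1)
        z_history = []
        for _t in range(horizon):
            z = mean_field(xs, 2)
            z_history.append(z)  # Z_{1:t} is shared by all controllers
            us = [g(z_history, x) for x in xs]  # controller i sees Z_{1:t} and X^i_t only
            total += cost(xs, us)
            ws = [rng.random() for _ in range(n_agents)]  # fresh i.i.d. noise (A2), (A3)
            xs = [f(x, u, w, z) for x, u, w in zip(xs, us, ws)]
    return total / runs


print(estimate_cost(n_agents=5, horizon=10, runs=2000))
```

Such a simulation only evaluates a given strategy; it does not search for an optimal one.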

Main Results
We identify a dynamic program to compute an optimal strategy. In particular:
Theorem 2: Let $\psi^*_t$ be a solution to the following dynamic program: at time $t$, for every $z_t$,
$$V_t(z_t) = \min_{\gamma_t} \mathbb{E}\big[\ell_t(X_t, U_t) + V_{t+1}(Z_{t+1}) \mid Z_t = z_t, \Gamma_t = \gamma_t\big],$$
with $V_{T+1} \equiv 0$, where $\gamma_t : \mathcal{X} \to \mathcal{U}$ and $\gamma_t = \psi^*_t(z_t)$ is the minimizer. Define $g^*_t(z, x) := \psi^*_t(z)(x)$ for all $x \in \mathcal{X}$ and all $z$. Then $g^* = (g^*_1, \dots, g^*_T)$ is an optimal strategy.
Salient features of the model:
- Very few assumptions on the model.
- Mean-field coupled dynamics are allowed.
- Arbitrary coupled costs are allowed (we do not assume the cost to be weakly coupled).
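Because $\psi^*_t$ depends only on the common information $Z_t$, every controller can hold the same table and then act on its own state through $g^*_t(z, x) = \psi^*_t(z)(x)$. A minimal sketch, assuming a hypothetical table `psi_star` (made up here, not computed from any model), with $z$ encoded as the count vector $N z$ and a prescription encoded as a tuple where `gamma[x]` is the action in state `x`:

```python
def make_control_law(psi_star):
    """Build g*_t(z, x) := psi*_t(z)(x) from a dynamic-program solution psi_star.

    psi_star[t] maps a mean-field value z (encoded as the count tuple N * z)
    to a prescription gamma, encoded as a tuple with gamma[x] = action in state x.
    """
    def g_star(t, z, x):
        return psi_star[t][z][x]
    return g_star


# Hypothetical table for N = 2 subsystems, X = U = {0, 1}, T = 1.
psi_star = {1: {(2, 0): (0, 0), (1, 1): (0, 1), (0, 2): (1, 1)}}
g_star = make_control_law(psi_star)

# Every agent evaluates the same law on the shared Z_t and its own state.
states = [0, 1]
z = (states.count(0), states.count(1))  # counts N * Z_t, known to all agents
actions = [g_star(1, z, x) for x in states]
print(actions)  # [0, 1]
```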

Main Results
Salient features of the results:
- Computes a globally optimal solution.
- The solution approach works for an arbitrary number of controllers.
- The state space of the dynamic program grows polynomially (rather than exponentially) with the number of controllers.
- The action space of the dynamic program does not depend on the number of controllers.
- The size of the information state does not increase with time; hence, the results extend naturally to the infinite horizon under standard assumptions.
- The results extend naturally to randomized strategies by taking $\Delta(\mathcal{U})$ as the action space.
- Since the dynamic program is based on common information, each agent can independently solve the dynamic program and compute the optimal strategy in a decentralized manner.
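The size claims can be made concrete by counting. The mean-field takes values among empirical distributions with denominator $N$. A stars-and-bars argument gives $\binom{N + |\mathcal{X}| - 1}{|\mathcal{X}| - 1}$ such values, a polynomial of degree $|\mathcal{X}| - 1$ in $N$, compared with $|\mathcal{X}|^N$ joint states. The number of deterministic prescriptions $\gamma : \mathcal{X} \to \mathcal{U}$ is $|\mathcal{U}|^{|\mathcal{X}|}$ regardless of $N$. A short sketch (function names are illustrative):

```python
from math import comb


def num_mean_field_values(n_agents, n_states):
    """Number of values of Z_t: count vectors of length |X| summing to N (stars and bars)."""
    return comb(n_agents + n_states - 1, n_states - 1)


def num_joint_states(n_agents, n_states):
    """Number of joint states X_t in X^N: exponential in N."""
    return n_states ** n_agents


def num_prescriptions(n_states, n_actions):
    """Number of deterministic prescriptions gamma: X -> U: independent of N."""
    return n_actions ** n_states


# |X| = 3 states and |U| = 2 actions.
for n in (2, 10, 100):
    print(n, num_mean_field_values(n, 3), num_joint_states(n, 3), num_prescriptions(3, 2))
```

For $N = 10$ this gives 66 mean-field values against 59049 joint states, with 8 prescriptions in both cases.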

Proof Approach
Step 1: We follow the common information approach [Nayyar, Mahajan, and Teneketzis 2013] and convert the decentralized control problem into a centralized control problem.
Step 2: We exploit the symmetry of the problem (with respect to the controllers) to show that the mean-field $Z_t$ is an information state for the centralized problem identified in Step 1. We then use this information state $Z_t$ to obtain a dynamic programming decomposition.

Step 1: An Equivalent Centralized System
We define $\Gamma_t$ and $\psi_t$ as follows:
$$\Gamma_t(\cdot) := g_t(Z_{1:t}, \cdot), \quad \Gamma_t : \mathcal{X} \to \mathcal{U}, \qquad \Gamma_t = \psi_t(Z_{1:t}) := g_t(Z_{1:t}, \cdot).$$
The symmetric control laws assumption, $g^i_t =: g_t$ for all $i$, implies that $\Gamma^i_t =: \Gamma_t$ for all $i$.
Equivalent centralized control problem: the objective is to minimize
$$\hat{J}(\psi) = \mathbb{E}^{\psi}\Big[\sum_{t=1}^{T} \ell_t\big(X_t, \Gamma_t(X^1_t), \dots, \Gamma_t(X^N_t)\big)\Big].$$
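In code, a prescription $\gamma : \mathcal{X} \to \mathcal{U}$ can be stored as a tuple indexed by state. The coordinator's action set is then finite, and the actions of all subsystems follow from applying the same $\gamma$ to each local state. The sketch below uses that encoding; the coordinator rule `psi` is a hypothetical example of a map $Z_{1:t} \mapsto \Gamma_t$.

```python
from itertools import product


def all_prescriptions(n_states, n_actions):
    """All deterministic prescriptions gamma: X -> U, encoded as tuples gamma[x] = u."""
    return list(product(range(n_actions), repeat=n_states))


def apply_prescription(gamma, states):
    """(Gamma_t(X^1_t), ..., Gamma_t(X^N_t)): every subsystem uses the same gamma."""
    return tuple(gamma[x] for x in states)


def psi(z_history):
    """Hypothetical coordinator rule psi_t(Z_{1:t}) returning a prescription on X = {0, 1}.

    It acts in state 1 only when more than 30% of subsystems are in state 1.
    """
    return (0, 1) if z_history[-1][1] > 0.3 else (0, 0)


print(all_prescriptions(n_states=2, n_actions=2))  # [(0, 0), (0, 1), (1, 0), (1, 1)]
states = [1, 0, 1]
z = (states.count(0) / 3, states.count(1) / 3)
print(apply_prescription(psi([z]), states))  # (1, 0, 1)
```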

Step 2: Identifying an Information State
Lemma 2: For any choice $\gamma_{1:t}$ of $\Gamma_{1:t}$, any realization $z_{1:t}$ of $Z_{1:t}$, and any $x \in \mathcal{X}^N$,
$$\mathbb{P}(X_t = x \mid Z_{1:t} = z_{1:t}, \Gamma_{1:t} = \gamma_{1:t}) = \mathbb{P}(X_t = x \mid Z_t = z_t) = \frac{\mathbb{1}(x \in H(z_t))}{|H(z_t)|},$$
where $H(z) := \{x \in \mathcal{X}^N : \frac{1}{N} \sum_{i=1}^{N} \delta_{x^i} = z\}$.
Proof outline: By induction, it is shown that the above conditional probability is invariant to permutations of $x$; hence, the mean-field is sufficient to characterize it. The latter property is proved using the symmetry of the model and of the control laws.
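For small $N$, the objects in Lemma 2 can be checked by brute force. $H(z)$ is the set of joint states with the given counts, its size is the multinomial coefficient $N! / \prod_{x} (N z(x))!$, and the conditional law of $X_t$ is uniform on it. The enumeration below is exponential in $N$ and serves only as an illustration. Here $z$ is encoded by its count vector $N z$, and the helper names are hypothetical.

```python
from itertools import product
from math import factorial


def in_H(x, counts):
    """True if the joint state x has empirical counts equal to counts (= N z)."""
    return all(x.count(s) == counts[s] for s in range(len(counts)))


def H(counts):
    """H(z) = {x in X^N : (1/N) sum_i delta_{x^i} = z}, enumerated by brute force."""
    n, k = sum(counts), len(counts)
    return [x for x in product(range(k), repeat=n) if in_H(x, counts)]


def multinomial(counts):
    """|H(z)| = N! / prod_x (N z(x))!."""
    result = factorial(sum(counts))
    for c in counts:
        result //= factorial(c)
    return result


def conditional_prob(x, counts):
    """P(X_t = x | Z_t = z) from Lemma 2: uniform on H(z), zero elsewhere."""
    return 1 / multinomial(counts) if in_H(x, counts) else 0.0


print(H((2, 1)))  # [(0, 0, 1), (0, 1, 0), (1, 0, 0)]
assert len(H((2, 1))) == multinomial((2, 1)) == 3
```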

Step 2: Identifying an Information State
Lemma 3: The expected per-step cost may be written as a function of $Z_t$ and $\Gamma_t$. In particular, there exists a function $\hat{\ell}_t$ (that does not depend on the strategy $\psi$) such that
$$\mathbb{E}\big[\ell_t(X_t, \Gamma_t(X^1_t), \dots, \Gamma_t(X^N_t)) \mid Z_{1:t}, \Gamma_{1:t}\big] =: \hat{\ell}_t(Z_t, \Gamma_t).$$
Proof outline: Consider
$$\mathbb{E}\big[\ell_t(X_t, \Gamma_t(X^1_t), \dots, \Gamma_t(X^N_t)) \mid Z_{1:t} = z_{1:t}, \Gamma_{1:t} = \gamma_{1:t}\big] = \sum_{x} \ell_t(x, \gamma_t(x^1), \dots, \gamma_t(x^N))\, \mathbb{P}(X_t = x \mid Z_{1:t} = z_{1:t}, \Gamma_{1:t} = \gamma_{1:t}).$$
Substituting the result of Lemma 2 and simplifying gives the result:
$$\hat{\ell}_t(z_t, \gamma_t) = \frac{1}{|H(z_t)|} \sum_{x \in H(z_t)} \ell_t(x, \gamma_t(x^1), \dots, \gamma_t(x^N)).$$
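Combining Lemma 2 with the sum in the proof outline, $\hat{\ell}_t(z, \gamma)$ is the average of $\ell_t(x, \gamma(x^1), \dots, \gamma(x^N))$ over $x \in H(z)$. For small $N$, this average can be evaluated by enumeration. The sketch uses a hypothetical cost that weights subsystem 1 differently from the others. This shows that $\ell_t$ need not be symmetric for the expected cost to depend only on $(z, \gamma)$. As before, $z$ is encoded by counts.

```python
from itertools import product


def H(counts):
    """All joint states x in X^N with counts N * z (brute force, small N only)."""
    n, k = sum(counts), len(counts)
    return [x for x in product(range(k), repeat=n)
            if all(x.count(s) == counts[s] for s in range(k))]


def expected_cost(cost, counts, gamma):
    """hat-ell_t(z, gamma): average of ell_t(x, gamma(x^1), ..., gamma(x^N)) over x in H(z)."""
    joint_states = H(counts)
    total = sum(cost(x, tuple(gamma[s] for s in x)) for x in joint_states)
    return total / len(joint_states)


def cost(x, u):
    """Hypothetical coupled cost on X = U = {0, 1}; subsystem 1 counts double."""
    return 2 * x[0] + sum(x[1:]) + 0.5 * sum(u) ** 2


# N = 2, z = (1/2, 1/2), and gamma acts only in state 1.
print(expected_cost(cost, counts=(1, 1), gamma=(0, 1)))  # 2.0
```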

Step 2: Identifying an Information State
Lemma 4: For any choice $\gamma_{1:t}$ of $\Gamma_{1:t}$, any realization $z_{1:t}$ of $Z_{1:t}$, and any $z$,
$$\mathbb{P}(Z_{t+1} = z \mid Z_{1:t} = z_{1:t}, \Gamma_{1:t} = \gamma_{1:t}) = \mathbb{P}(Z_{t+1} = z \mid Z_t = z_t, \Gamma_t = \gamma_t).$$
Moreover, the above conditional probability does not depend on the strategy $\psi$.
Proof outline: The result relies on the independence of the noise processes across subsystems and on Lemma 2.
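The transition in Lemma 4 can also be computed explicitly. Given $z$, exactly $N z(x)$ subsystems are in state $x$, and each of them takes action $\gamma(x)$. Because the noises are independent, these subsystems move independently, so $N Z_{t+1}$ is a sum of independent one-hot draws. The sketch assumes a hypothetical per-subsystem kernel $P(x' \mid x, u, z)$, which stands for $f_t$ with $W^i_t$ integrated out. The numbers in `next_state_kernel` are invented.

```python
from collections import defaultdict


def next_state_kernel(x, u, counts):
    """Hypothetical P(X^i_{t+1} = x' | X^i_t = x, U^i_t = u, Z_t = z) on X = {0, 1}.

    Plays the role of f_t with the noise W^i_t integrated out; counts = N * z.
    """
    n = sum(counts)
    p_bad = 0.05 if u == 1 else min(1.0, 0.1 + 0.5 * counts[1] / n)
    return [1.0 - p_bad, p_bad]


def mean_field_kernel(counts, gamma, kernel):
    """P(Z_{t+1} = z' | Z_t = z, Gamma_t = gamma), keyed by the count vector N * z'.

    Given z, counts[s] subsystems are in state s and all take action gamma[s];
    their next states are independent, so the distribution of the next count
    vector is built up one subsystem at a time.
    """
    k = len(counts)
    dist = {tuple([0] * k): 1.0}
    for s, n_s in enumerate(counts):
        probs = kernel(s, gamma[s], counts)
        for _ in range(n_s):
            new_dist = defaultdict(float)
            for c, p in dist.items():
                for s_next, q in enumerate(probs):
                    if q > 0:
                        c_next = list(c)
                        c_next[s_next] += 1
                        new_dist[tuple(c_next)] += p * q
            dist = dict(new_dist)
    return dist


dist = mean_field_kernel((1, 1), (0, 1), next_state_kernel)
print({c: round(p, 4) for c, p in dist.items()})  # {(2, 0): 0.6175, (1, 1): 0.365, (0, 2): 0.0175}
```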

Dynamic Program
Theorem 1: In the equivalent centralized problem, there is no loss of optimality in restricting attention to Markov strategies, i.e., $\Gamma_t = \psi_t(Z_t)$. Furthermore, an optimal strategy $\psi^*$ is obtained by solving the following dynamic program (with $V_{T+1} \equiv 0$):
$$V_t(z_t) = \min_{\gamma_t} \Big(\hat{\ell}_t(z_t, \gamma_t) + \mathbb{E}\big[V_{t+1}(Z_{t+1}) \mid Z_t = z_t, \Gamma_t = \gamma_t\big]\Big),$$
where $\gamma_t : \mathcal{X} \to \mathcal{U}$.
Proof outline: $Z_t$ is an information state for the equivalent centralized problem because:
- As shown in Lemma 3, the per-step cost can be written as a function of $Z_t$ and $\Gamma_t$.
- As shown in Lemma 4, $\{Z_t\}_{t=1}^{T}$ is a controlled Markov process with control action $\Gamma_t$.
Thus, the result follows from standard results in Markov decision theory.
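The following is a brute-force sketch of the backward induction in Theorem 1 for a toy instance. The sizes `N` and `T`, the kernel, and the cost `expected_cost` (which plays the role of $\hat{\ell}_t$) are all hypothetical. The terminal value is taken as $V_{T+1} = 0$, and $z$ is encoded by counts. At every mean-field value the sketch enumerates all $|\mathcal{U}|^{|\mathcal{X}|}$ prescriptions, so it is practical only when $|\mathcal{X}|$ and $|\mathcal{U}|$ are small.

```python
from collections import defaultdict
from itertools import product

# Hypothetical toy instance: X = {0, 1} ("good", "bad"), U = {0, 1} ("wait", "repair").
N, NUM_STATES, NUM_ACTIONS, T = 4, 2, 2, 5


def mean_field_values(n, k):
    """All count vectors N * z, i.e. all values the mean-field Z_t can take."""
    return [c for c in product(range(n + 1), repeat=k) if sum(c) == n]


def kernel(x, u, counts):
    """Hypothetical per-subsystem kernel P(x' | x, u, z)."""
    p_bad = 0.05 if u == 1 else min(1.0, 0.1 + 0.5 * counts[1] / N)
    return [1.0 - p_bad, p_bad]


def expected_cost(counts, gamma):
    """Hypothetical hat-ell_t(z, gamma): fraction of bad subsystems plus a
    quadratic penalty on the fraction of subsystems that repair."""
    repairing = sum(n_s for s, n_s in enumerate(counts) if gamma[s] == 1)
    return counts[1] / N + 0.4 * (repairing / N) ** 2


def transition(counts, gamma):
    """P(Z_{t+1} = z' | Z_t = z, Gamma_t = gamma) over count vectors z' (Lemma 4)."""
    dist = {tuple([0] * NUM_STATES): 1.0}
    for s, n_s in enumerate(counts):
        probs = kernel(s, gamma[s], counts)
        for _ in range(n_s):  # add one subsystem currently in state s
            new = defaultdict(float)
            for c, p in dist.items():
                for s_next, q in enumerate(probs):
                    c_next = list(c)
                    c_next[s_next] += 1
                    new[tuple(c_next)] += p * q
            dist = new
    return dist


def solve():
    zs = mean_field_values(N, NUM_STATES)
    gammas = list(product(range(NUM_ACTIONS), repeat=NUM_STATES))  # all gamma: X -> U
    V = {z: 0.0 for z in zs}  # terminal condition V_{T+1} = 0
    psi = {}
    for t in range(T, 0, -1):  # backward induction t = T, ..., 1
        V_new, psi[t] = {}, {}
        for z in zs:
            best = None
            for gamma in gammas:
                q = expected_cost(z, gamma) + sum(
                    p * V[z_next] for z_next, p in transition(z, gamma).items())
                if best is None or q < best[0]:
                    best = (q, gamma)
            V_new[z], psi[t][z] = best  # V_t(z) and psi*_t(z)
        V = V_new
    return V, psi


V1, psi = solve()
print(V1)      # V_1(z) for every count vector z
print(psi[1])  # minimizing prescription psi*_1(z) for every count vector z
```

Each controller could run this same computation locally and then act with $g^*_t(z, x) = \psi^*_t(z)(x)$.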
