

SLIDE 1

Distributed intelligence in multi‐agent systems

Usman Khan
Department of Electrical and Computer Engineering, Tufts University
Workshop on Distributed Optimization, Information Processing, and Learning
Rutgers University, August 21, 2017

SLIDE 2

Who am I

Usman A. Khan

  • Associate Professor, Tufts University
  • Postdoc: U‐Penn
  • Education: PhD, Carnegie Mellon; MS, UW‐Madison; BS, Pakistan

[Map: Tufts, Harvard, and MIT in the Boston area]

SLIDE 3

My Research Lab: Projects and demos

  • Inspecting leaks in NASA’s lunar habitat
  • Aerial formation flying
  • Inference in social networks
  • Structural health monitoring (SHM) over a campus footbridge

[Figure: campus map showing Dowling Hall and Upper Campus North]

SLIDE 4

Trailer


SPARTN—Signal Processing and RoboTic Networks Lab at Tufts

SLIDE 5

My Research Lab: Theory

  • Reza (2011‐15): graph‐theoretic estimation (best paper, journal cover)
  • Xi (2012‐16): optimization over directed graphs (4 TAC papers)
  • Sam (2013‐): fusion in non‐deterministic graphs (2 best papers, 6 IEEE journal papers)
  • Fakhteh (2014‐): distributed estimation, continued
  • Xin (2016‐): optimization, graph theory

SLIDE 6

My Research: In depth

Distributed Intelligence in multi‐agent systems

Estimation, optimization, and control over graphs (networks)

Mobile, dynamic, heterogeneous, directed, autonomous, non‐deterministic

Applications: cyber‐physical systems, IoT, big data, aerial SHM, power grid, personal exposome

Distributed optimization: path planning and formation control

SLIDE 7

Optimization over directed graphs

SLIDE 8

Problem

  • Agents interact over a graph
  • Information flow is directional
  • There is no central node with all the information

SLIDE 9

A nice solution

  • Gradient Descent: no single agent knows the entire function f
  • Local Gradient Descent: each agent converges only to its own local optimum
  • Distributed Gradient Descent [Nedich et al., 2009]: fuse information with neighbors
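In symbols (standard forms, with step‐size αk and weights wij; a sketch, not the slides' exact notation):

```latex
\begin{aligned}
\text{Gradient descent on } f = \textstyle\sum_i f_i:\quad
  & x^{k+1} = x^k - \alpha_k \nabla f(x^k)\\
\text{Local gradient descent at agent } i:\quad
  & x_i^{k+1} = x_i^k - \alpha_k \nabla f_i(x_i^k)\\
\text{Distributed gradient descent (fuse, then step):}\quad
  & x_i^{k+1} = \textstyle\sum_j w_{ij}\, x_j^k - \alpha_k \nabla f_i(x_i^k)
\end{aligned}
```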

SLIDE 10

Distributed Gradient Descent

  • Distributed Gradient Descent: fuse neighbors’ iterates, then take a local gradient step
  • W = {wij} is a doubly‐stochastic matrix (the underlying graph is balanced)
  • The step‐size goes to zero (but not too fast)
  • Agreement: all agents’ iterates approach a common value
  • Optimality: the common value minimizes f
  • Let’s do a simple analysis…
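A minimal DGD simulation sketch on scalar quadratics (the problem instance and all names are illustrative, not from the talk):

```python
import numpy as np

# n agents; agent i privately owns f_i(x) = 0.5 * (x - b[i])**2, so the
# global minimizer of f(x) = sum_i f_i(x) is mean(b).
n = 5
rng = np.random.default_rng(0)
b = rng.normal(size=n)

# Doubly-stochastic weights on an undirected ring (balanced graph).
W = np.zeros((n, n))
for i in range(n):
    W[i, i] = W[i, (i - 1) % n] = W[i, (i + 1) % n] = 1 / 3

x = np.zeros(n)                      # one scalar iterate per agent
for k in range(5000):
    alpha = 1.0 / (k + 1)            # diminishing step-size
    x = W @ x - alpha * (x - b)      # fuse neighbors, then local gradient step

print(x.round(3), "optimum:", b.mean().round(3))  # iterates cluster at the optimum
```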

SLIDE 11

Distributed Gradient Descent

  • Distributed Gradient Descent: assume the corresponding sequences converge to their limits
  • Let W be column‐stochastic (CS) but not row‐stochastic (RS): then the limits differ across agents, i.e., no agreement!
  • Let W be RS but not CS: then the limits coincide, i.e., agreement, but the common limit is suboptimal!
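A fixed‐point sketch of both cases (a reconstruction; π denotes the left eigenvector of W for eigenvalue 1):

```latex
% Suppose the stacked iterates satisfy x^k \to x^\infty with \alpha_k \to 0;
% the limit must obey
x^\infty = W\, x^\infty .
% RS but not CS: W\mathbf{1} = \mathbf{1} forces x^\infty = c\,\mathbf{1}
% (agreement), but multiplying the DGD update by \pi^\top shows that c
% minimizes \sum_i \pi_i f_i rather than \sum_i f_i: suboptimal.
% CS but not RS: the eigenvector for eigenvalue 1 is no longer \mathbf{1},
% so x^\infty cannot be a consensus vector: no agreement.
```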
SLIDE 12

Distributed Gradient Descent

  • Distributed Gradient Descent: if W is RS but not CS (unbalanced directed graphs), agents agree on a suboptimal solution
  • Consider a modification (Nedich 2013, similar in spirit but with a different execution): rescale the local gradients to undo the imbalance
  • Row‐stochasticity guarantees agreement; the scaling ensures optimality
  • How to obtain the scaling: estimate the left eigenvector?

SLIDE 13

Estimating the left eigenvector

  • A = {aij} is row‐stochastic, with left eigenvector π (πᵀA = πᵀ, πᵀ1 = 1)
  • Consider the following iteration (sketched below): each agent starts from its own canonical vector and repeatedly mixes its in‐neighbors’ vectors with the weights aij
  • Every agent learns the entire left eigenvector asymptotically
  • A similar method learns the right eigenvector for CS matrices
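A sketch of this learning iteration. Stacking the agents' vectors as the rows of Y^k gives Y^{k+1} = A Y^k with Y^0 = I, i.e., Y^k = A^k, whose rows all converge to π^T (the weight matrix below is illustrative):

```python
import numpy as np

# An asymmetric row-stochastic A on a strongly connected digraph (self-loops).
A = np.array([[0.5, 0.5, 0.0, 0.0],
              [0.2, 0.5, 0.3, 0.0],
              [0.0, 0.1, 0.6, 0.3],
              [0.4, 0.0, 0.0, 0.6]])
n = A.shape[0]

Y = np.eye(n)            # agent i holds row i of Y, initialized to e_i
for _ in range(300):
    Y = A @ Y            # each agent mixes its in-neighbors' vectors

pi = Y[0]                # every row now approximates the left eigenvector
print(pi.round(4), "check pi^T A = pi^T:", np.allclose(pi @ A, pi))
```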

SLIDE 14

Optimization over directed graphs: Recipe

  • 1. Design row‐ or column‐stochastic weights
  • 2. Estimate the other eigenvector of the eigenvalue 1: the left eigenvector in the RS case, the right eigenvector in the CS case
  • 3. Scale to remove the imbalance
  • Side note: this is the push‐sum idea (Gehrke et al., 2003; Vetterli et al., 2010), sketched below
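A minimal push‐sum average‐consensus sketch, the CS instance of this recipe: iterate a column‐stochastic B on both the values x and a scalar y that learns the right‐eigenvector imbalance, then take the ratio (instance illustrative):

```python
import numpy as np

# Column-stochastic weights: each agent splits its mass equally between
# itself and its out-neighbor on a directed cycle.
B = np.array([[0.5, 0.0, 0.5],
              [0.5, 0.5, 0.0],
              [0.0, 0.5, 0.5]])    # every column sums to 1

vals = np.array([1.0, 5.0, 9.0])   # private values; the true average is 5
x, y = vals.copy(), np.ones(3)     # y tracks the imbalance (right eigenvector)
for _ in range(100):
    x, y = B @ x, B @ y

print((x / y).round(4))            # every ratio approaches the average, 5.0
```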

SLIDE 15

Related work (a very small sample)

  • Algorithms over undirected graphs:
  • Distributed Gradient Descent (Nedich et al., 2009): non‐smooth functions
  • EXTRA (Yin et al., Apr. 2014): fuses information and uses gradients over the past two iterates; smooth, strongly convex functions; linear convergence
  • NEXT (Scutari et al., Dec. 2015): functions are smooth non‐convex + non‐smooth convex
  • Harnessing smoothness … (Li et al., May 2016): some similarities to EXTRA

SLIDE 16

Related work (a small sample)

  • Adding push‐sum to the above yields algorithms for directed graphs:
  • Gradient Push (Nedich et al., 2013): sub‐linear convergence
  • DEXTRA (Khan et al., Oct. 2015): strongly convex functions, linear convergence; the valid step‐size interval is difficult to compute
  • SONATA (Scutari et al., Jul. 2016): functions are smooth non‐convex + non‐smooth convex; sub‐linear convergence
  • ADD‐OPT (Khan et al., Jun. 2016) and Push‐DIGing (Nedich et al., Jul. 2016): strongly convex functions, linear convergence; the step‐size interval has lower bound 0
  • All of these algorithms employ column‐stochastic matrices

SLIDE 17

Column‐ vs. Row‐stochastic Weights

  • Incoming weights are simpler to design
  • For a column sum to equal 1, agent i cannot design its incoming weights, since it does not know the other neighbors of i1 and i2
  • Column‐stochastic weights are therefore designed at the outgoing edges
  • This requires knowledge of the out‐neighbors, or at least the out‐degree (a design sketch follows the figure note)

[Figure: agent i and its neighbors i1, i2, i3, i4]
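A sketch of the two designs from an edge list (illustrative graph; note that the row‐stochastic A needs only local in‐neighbor counts, while the column‐stochastic B needs each sender's out‐degree):

```python
import numpy as np

# Directed graph: edge (j, i) means j sends to i; self-loops at every node.
n = 4
edges = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]
in_nbrs = {i: [i] + [j for j, k in edges if k == i] for i in range(n)}
out_nbrs = {j: [j] + [k for i, k in edges if i == j] for j in range(n)}

# Row-stochastic A: receiver i weights its OWN incoming messages uniformly.
A = np.zeros((n, n))
for i in range(n):
    for j in in_nbrs[i]:
        A[i, j] = 1 / len(in_nbrs[i])

# Column-stochastic B: sender j splits by its out-degree, which the
# sender (not the receiver) has to know.
B = np.zeros((n, n))
for j in range(n):
    for i in out_nbrs[j]:
        B[i, j] = 1 / len(out_nbrs[j])

print(A.sum(axis=1), B.sum(axis=0))   # rows of A and columns of B sum to 1
```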

SLIDE 18

Optimization with Row‐stochastic weights

  • A = {aij} is row‐stochastic
  • Row‐stochastic weight design is simple
  • However, in contrast to CS methods:
  • Agents run an nth‐order consensus to learn the left eigenvector
  • Agents need unique identifiers

SLIDE 19

Optimization with Row‐stochastic weights

  • A = {aij} is row‐stochastic
  • Vector form of the algorithm (a reconstruction is sketched below)
  • In contrast, with a column‐stochastic B, ADD‐OPT/Push‐DIGing:
  • Its iterate does not result in agreement
  • The function argument is scaled by the right eigenvector
  • This ensures optimality
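A hedged reconstruction of the vector form, consistent with the recipe above (α a constant step‐size, Y^k the eigenvector‐learning iterate; the exact normalization in the talk may differ):

```latex
\begin{aligned}
Y^{k+1} &= A\,Y^k, \qquad Y^0 = I_n,\\
x^{k+1} &= A\,x^k - \alpha\, z^k,\\
z^{k+1} &= A\,z^k + \widetilde{\nabla}F(x^{k+1}) - \widetilde{\nabla}F(x^k),
\qquad
\big[\widetilde{\nabla}F(x^k)\big]_i = \frac{\nabla f_i(x_i^k)}{[Y^k]_{ii}} .
\end{aligned}
```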

SLIDE 20

Optimization with Row‐stochastic weights

  • Algorithm: as sketched above
  • A simple intuitive argument: assume each sequence converges to its limit
  • Then every agent agrees on a common value c
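The agreement step, spelled out under the notation assumed above (A primitive, so the eigenvalue 1 is simple; π its left eigenvector):

```latex
% In the limit the tracking update gives z^\infty = A\,z^\infty, hence
% z^\infty = c_z \mathbf{1}. Multiplying x^\infty = A\,x^\infty - \alpha z^\infty
% by \pi^\top (\pi^\top A = \pi^\top,\ \pi^\top \mathbf{1} = 1) forces c_z = 0,
% and therefore
x^\infty = A\,x^\infty \;\Longrightarrow\; x^\infty = c\,\mathbf{1}.
```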

SLIDE 21

Optimization with Row‐stochastic weights

  • Algorithm: as above
  • Show that c is the optimal solution
  • Sum the tracking update over k; the gradient terms telescope (see below)
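One way the telescoping step is usually written, under the same assumed notation (z^0 initialized to the scaled gradients):

```latex
% Multiply the tracking update by \pi^\top and sum over k; the mixing term is
% invariant (\pi^\top A = \pi^\top) and the gradient differences telescope:
\pi^\top z^k \;=\; \pi^\top \widetilde{\nabla}F(x^k)
\;=\; \sum_{i=1}^{n} \pi_i\, \frac{\nabla f_i(x_i^k)}{[Y^k]_{ii}} .
```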

SLIDE 22

Optimization with Row‐stochastic weights

  • Algorithm: as above
  • Asymptotically, the eigenvector scaling cancels and c satisfies the global optimality condition (see below)
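Combining the limits: Y^k = A^k → 1π^T, so [Y^k]_{ii} → π_i, and c_z = 0 from the agreement step, giving

```latex
0 \;=\; \pi^\top z^\infty
\;=\; \sum_{i=1}^{n} \pi_i\, \frac{\nabla f_i(c)}{\pi_i}
\;=\; \sum_{i=1}^{n} \nabla f_i(c)
\;=\; \nabla f(c)
\;\Longrightarrow\; c = x^\star .
```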

SLIDE 23

Optimization with Row‐stochastic weights

  • Algorithm: as above
  • We assumed that the sequences reach their limits
  • But under what conditions, and at what rate?

SLIDE 24

Convergence conditions

  • Assume strong‐connectivity, Lipschitz‐continuous gradients, and strongly‐convex functions
  • Consider a vector tk that stacks the error quantities of the algorithm
  • If some norm of tk goes to 0, then each element goes to 0 and the sequences converge to their limits

SLIDE 25

Convergence conditions

  • Assume strong‐connectivity, Lipschitz‐continuous gradients, and strongly‐convex functions
  • Lemma: Hk goes to 0 linearly
  • Lemma: The spectral radius of G is less than 1
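Why the two lemmas give linear convergence: a standard perturbed linear‐recursion sketch, assuming the errors satisfy t_{k+1} ≤ G t_k + H_k entrywise with ‖H_k‖ = O(σ^k), σ < 1:

```latex
% Unroll the recursion:
t_{k} \;\le\; G^{k} t_0 \;+\; \sum_{j=0}^{k-1} G^{\,k-1-j} H_j .
% Since \rho(G) < 1, \|G^k\| = O((\rho(G)+\epsilon)^k) for any small \epsilon > 0,
% and both terms decay linearly:
\|t_k\| = O(\gamma^k), \qquad \gamma = \max\{\rho(G)+\epsilon,\ \sigma\} .
```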

SLIDE 26

Convergence conditions

SLIDE 27

Convergence Rate

  • The rate variable γ is the maximum of the fusion rate and the rate at which the powers of G decay

SLIDE 28

Some comparison

SLIDE 29

Conclusions

  • Optimization with row‐stochastic matrices:
  • Does not require knowledge of the out‐neighbors or out‐degree
  • Agents require unique identifiers
  • Strongly‐convex functions with Lipschitz‐continuous gradients
  • Strongly‐connected directed graphs
  • Linear convergence

SLIDE 30

More Information

My webpage: http://www.eecs.tufts.edu/~khan/
My email: khan@ece.tufts.edu
My lab’s YouTube channel: https://www.youtube.com/user/SPARTNatTufts/videos/