SLIDE 1

ALMOST SURE CONVERGENCE OF RANDOM GOSSIP ALGORITHMS

Giorgio Picci

with T. Taylor, ASU Tempe AZ. Wolfgang Runggaldier's Birthday, Brixen July 2007

SLIDE 2

CONSENSUS FOR RANDOM GOSSIP ALGORITHMS

Consider a finite set of nodes representing, say, wireless sensors or distributed computing units. Can they achieve a common goal by exchanging information only locally? Here they exchange information locally for the purpose of forming a common estimate of some physical variable x. Each node k forms its own estimate x_k(t), t ∈ Z_+, and updates it by exchanging information with a neighbor. Neighboring pairs are chosen randomly. Q: will all local estimates {x_k(t), k = 1, …, n} converge to the same value as t → ∞?

SLIDE 3

DYNAMICS OF RANDOM GOSSIP ALGORITHMS

While two nodes v_i and v_j are in communication, they exchange information to refine their own estimates using the neighbor's estimate. Model this adjustment in discrete time by a simple symmetric linear relation

x_i(t+1) = x_i(t) + p (x_j(t) − x_i(t))
x_j(t+1) = x_j(t) + p (x_i(t) − x_j(t))

where p is a positive gain parameter modeling the speed of adjustment. For stability we need to impose |1 − 2p| ≤ 1 and hence 0 ≤ p ≤ 1. For p = 1/2 you take the average of the two measurements, so that x_i(t+1) = x_j(t+1).
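The pairwise update is easy to state in code. A minimal NumPy sketch (the node indices, initial values, and gain p are illustrative choices, not from the slides):

```python
import numpy as np

def gossip_update(x, i, j, p=0.5):
    """One symmetric gossip exchange between nodes i and j:
    x_i(t+1) = x_i(t) + p*(x_j(t) - x_i(t)),
    x_j(t+1) = x_j(t) + p*(x_i(t) - x_j(t))."""
    x = x.copy()
    xi, xj = x[i], x[j]
    x[i] = xi + p * (xj - xi)
    x[j] = xj + p * (xi - xj)
    return x

x = np.array([1.0, 2.0, 9.0])
y = gossip_update(x, 0, 2, p=0.5)   # p = 1/2: nodes 0 and 2 average to 5.0
```

Note that the exchange is symmetric, so the sum (and hence the average) of the entries is conserved at every step.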

SLIDE 4

DYNAMICS OF RANDOM GOSSIP ALGORITHMS

The whole coordinate vector x(t) ∈ R^n evolves according to

x(t+1) = A(e) x(t),

the matrix A(e) ∈ R^{n×n} depending on the edge e = (v_i, v_j) selected at that particular time instant. A(e) agrees with the identity except in rows and columns i and j, where it has 1 − p on the diagonal and p in the off-diagonal positions (i, j) and (j, i); compactly,

A(e) = I_n − p (1_{v_i} − 1_{v_j})(1_{v_i} − 1_{v_j})^T.
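The rank-one expression above gives a one-line construction of A(e). A NumPy sketch (the dimension, edge, and p are arbitrary illustrative choices):

```python
import numpy as np

def gossip_matrix(n, i, j, p=0.5):
    """A(e) = I_n - p (1_{v_i} - 1_{v_j})(1_{v_i} - 1_{v_j})^T for edge e = (v_i, v_j)."""
    u = np.zeros(n)
    u[i], u[j] = 1.0, -1.0          # u = 1_{v_i} - 1_{v_j}
    return np.eye(n) - p * np.outer(u, u)

A = gossip_matrix(4, 1, 3, p=0.3)
# A is symmetric, doubly stochastic, and differs from I_4 only in rows/columns 1 and 3.
```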

SLIDE 5

EIGENSPACES

The vector 1_{v_i} has the ith entry equal to 1 and zeros otherwise. A(e) is a symmetric doubly stochastic matrix. The value 1 − 2p is a simple eigenvalue associated with the eigenvector 1_{v_i} − 1_{v_j}:

A(e)(1_{v_i} − 1_{v_j}) = (1_{v_i} − 1_{v_j}) − 2p (1_{v_i} − 1_{v_j}) = (1 − 2p)(1_{v_i} − 1_{v_j});

the orthogonal (codimension one) subspace (1_{v_i} − 1_{v_j})^⊥ is the eigenspace of the eigenvalue 1.

Let 1 := [1, …, 1]^⊤. We want x(t) to converge to the subspace {1} := {α1 : α ∈ R}. This would be automatically true for a fixed irreducible doubly stochastic matrix.
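The eigenstructure claimed above can be checked numerically. A quick NumPy sanity check (n, p, and the edge are arbitrary choices; note u^T u = 2, which is where the factor 2p comes from):

```python
import numpy as np

n, p = 5, 0.3
u = np.zeros(n)
u[0], u[1] = 1.0, -1.0                      # u = 1_{v_0} - 1_{v_1}
A = np.eye(n) - p * np.outer(u, u)          # A(e) for the edge (v_0, v_1)

Au = A @ u                                  # should equal (1 - 2p) u, since u^T u = 2
evals = np.sort(np.linalg.eigvalsh(A))      # spectrum: 1 - 2p (simple) and 1 (n-1 times)
```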

SLIDE 6

A CONTROLLABILITY LEMMA

Lemma 1. Let G = (V, E) be a graph. Then the vectors 1_{v_i} − 1_{v_j} span {1}^⊥, i.e. span{1_{v_i} − 1_{v_j} : (v_i v_j) ∈ E} = 1^⊥, iff G is connected.

Corollary 1. Let G′ = (V, E′) with E′ ⊆ E be a subgraph of G. Let {e_i : 1 ≤ i ≤ m′} be an ordering of E′, and let π denote a permutation of {1, 2, …, m′}. Let B(E′, π) = ∏_{i=1}^{m′} A(e_{π_i}), where the product is ordered from right to left. Then

‖B(E′, π)|_{1^⊥}‖ < 1

if and only if G′ is connected.
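The corollary can be illustrated numerically: an ordered product of A(e) over a connected subgraph strictly contracts 1^⊥, while over a disconnected one it does not. A NumPy sketch (the graphs, p = 1/2, and the use of the projector P = I − 11^T/n to read off the restricted norm are my illustrative choices):

```python
import numpy as np

def gossip_matrix(n, i, j, p=0.5):
    u = np.zeros(n)
    u[i], u[j] = 1.0, -1.0
    return np.eye(n) - p * np.outer(u, u)

def norm_on_ones_perp(edges, n, p=0.5):
    """Spectral norm of the product B = A(e_m)...A(e_1) restricted to 1^perp."""
    B = np.eye(n)
    for (i, j) in edges:                      # product ordered from right to left
        B = gossip_matrix(n, i, j, p) @ B
    P = np.eye(n) - np.ones((n, n)) / n       # orthogonal projector onto 1^perp
    return np.linalg.norm(P @ B @ P, 2)       # B leaves 1^perp invariant

connected    = norm_on_ones_perp([(0, 1), (1, 2), (2, 3)], n=4)  # spanning tree: < 1
disconnected = norm_on_ones_perp([(0, 1), (2, 3)], n=4)          # two components: = 1
```

In the disconnected case the vector (1, 1, −1, −1) is mean-zero yet fixed by the product, so no contraction on 1^⊥ is possible.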

SLIDE 7

THE EDGE PROCESS

Let Ω = E^N be the space of all semi-infinite sequences taking values in E, and let σ : Ω → Ω denote the shift map: σ(e_0, e_1, e_2, …, e_n, …) = (e_1, e_2, …, e_n, …). Let ev_k : Ω → E denote the evaluation on the kth term. Let µ denote an ergodic shift-invariant probability measure on Ω, so that the edge process e(k) : ω ↦ ev_k(ω) is ergodic. Special cases: e(k) is i.i.d., or an ergodic Markov chain. However, what we shall do works for general ergodic processes. Consider the function C : Ω × Z → R^{n×n},

C(ω, t) := ∏_{i=0}^{t−1} A(ev_i(ω)) = ∏_{i=0}^{t−1} A(ev_0(σ^i ω)),

which by stationarity of e obeys the composition rule C(ω, t+s) = C(σ^t ω, s) C(ω, t), with C(ω, 0) = I. Such a function is called a matrix cocycle.
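The cocycle and its composition rule can be checked on a sampled edge sequence. A NumPy sketch (the 4-cycle edge set and the i.i.d. sampling are illustrative; shifting ω is just dropping the first t edges):

```python
import numpy as np

def gossip_matrix(n, i, j, p=0.5):
    u = np.zeros(n)
    u[i], u[j] = 1.0, -1.0
    return np.eye(n) - p * np.outer(u, u)

def cocycle(omega, t, n, p=0.5):
    """C(omega, t) = A(e_{t-1}) ... A(e_1) A(e_0), the left-ordered product."""
    C = np.eye(n)
    for (i, j) in omega[:t]:
        C = gossip_matrix(n, i, j, p) @ C
    return C

rng = np.random.default_rng(0)
E = [(0, 1), (1, 2), (2, 3), (0, 3)]                       # edge set of a 4-cycle
omega = [E[k] for k in rng.integers(0, len(E), size=20)]   # sampled i.i.d. edge process

t, s = 7, 5
lhs = cocycle(omega, t + s, n=4)
rhs = cocycle(omega[t:], s, n=4) @ cocycle(omega, t, n=4)  # C(sigma^t omega, s) C(omega, t)
```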

SLIDE 8

MULTIPLICATIVE ERGODIC THEOREM

Theorem 1 [Oseledets' Multiplicative Ergodic Theorem]. Let µ be a shift-invariant probability measure on Ω and suppose that the shift map σ : Ω → Ω is ergodic and that log⁺ ‖C(ω, t)‖ is in L¹. Then the limit

Λ = lim_{t→∞} [C(ω, t)^T C(ω, t)]^{1/(2t)}   (1)

exists with probability one, is symmetric and nonnegative definite, and is µ-a.s. independent of ω. Let λ_1 < λ_2 < ⋯ < λ_k, for k ≤ n, be the distinct eigenvalues of Λ, let U_i denote the eigenspace of λ_i, and let V_i = ⊕_{j=1}^{i} U_j. Then for u ∈ V_i − V_{i−1},

lim_{t→∞} (1/t) log ‖C(ω, t) u‖ = log(λ_i).   (2)

The numbers λ_i are called the Lyapunov exponents of C.
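Equation (2) suggests a Monte Carlo estimate of the exponent governing 1^⊥: apply A(e(t)) to a mean-zero vector and accumulate the log growth, renormalizing at each step to avoid underflow. A NumPy sketch (graph, p = 0.3, and horizon are my illustrative choices; taking p ≠ 1/2 keeps A(e) invertible, so the iterate never vanishes):

```python
import numpy as np

def gossip_matrix(n, i, j, p=0.5):
    u = np.zeros(n)
    u[i], u[j] = 1.0, -1.0
    return np.eye(n) - p * np.outer(u, u)

rng = np.random.default_rng(1)
n, p, T = 4, 0.3, 5000
E = [(0, 1), (1, 2), (2, 3), (0, 3)]       # connected: a 4-cycle

u = np.array([1.0, -1.0, 0.0, 0.0])        # mean-zero: u lies in 1^perp
log_growth = 0.0
for _ in range(T):
    i, j = E[rng.integers(len(E))]
    u = gossip_matrix(n, i, j, p) @ u
    norm = np.linalg.norm(u)
    log_growth += np.log(norm)             # accumulate (1/t) log ||C(omega,t)u|| in pieces
    u /= norm

top_exponent = log_growth / T              # estimates log(lambda) for the dominant
                                           # exponent on 1^perp; negative => consensus
```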

SLIDE 9

MULTIPLICATIVE ERGODIC THEOREM

The Lyapunov exponents control the exponential rate of convergence (or non-convergence) to consensus. The matrices A(e) are doubly stochastic, as are any matrix products of them, C(ω, t). It follows that the constant functions on V, {1}, as well as the mean-zero functions in {1}^⊥, are invariant under the action of this cocycle and of its transpose. Thus these subspaces are also invariant under the limiting matrix Λ of Oseledets' theorem. There is a Lyapunov exponent associated with the subspace {1} which, it is not difficult to see, is one. There are n − 1 Lyapunov exponents associated with the subspace 1^⊥, so the key point is to characterize them.

SLIDE 10

CONVERGENCE TO CONSENSUS

For x ∈ R^n use the symbol x̄ := (1/n) ∑_{i=1}^n x_i. The main convergence result follows.

Theorem 2. Let G = (V, E) be a connected graph and let e(t) be an ergodic stochastic process taking values in E. Suppose that the support of the probability distribution induced by e(t) is all of E. Let the gossip algorithm be initialized at x(0) = x_0. Then there are a (deterministic) constant |λ| < 1 and a (random) constant K_λ such that

‖x(t) − x̄_0 1‖ < K_λ λ^t ‖x_0 − x̄_0 1‖

µ-almost surely.
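The theorem can be observed in simulation: on a connected graph with i.i.d. uniform edges (a special case of the ergodic assumption), the disagreement ‖x(t) − x̄_0 1‖ decays geometrically while the average is conserved. A NumPy sketch (the path graph, horizon, and p = 1/2 are illustrative):

```python
import numpy as np

def gossip_matrix(n, i, j, p=0.5):
    u = np.zeros(n)
    u[i], u[j] = 1.0, -1.0
    return np.eye(n) - p * np.outer(u, u)

rng = np.random.default_rng(2)
n = 5
E = [(i, i + 1) for i in range(n - 1)]          # path graph: connected
x = rng.normal(size=n)
xbar = x.mean()                                 # each A(e) preserves the average

errors = []
for _ in range(1000):
    errors.append(np.linalg.norm(x - xbar))     # disagreement ||x(t) - xbar * 1||
    i, j = E[rng.integers(len(E))]
    x = gossip_matrix(n, i, j) @ x

# errors decays geometrically and x(t) approaches xbar * 1 componentwise
```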

SLIDE 11

OPEN QUESTIONS

  • Rate of convergence (for L², …)
  • Multiple gossiping: more than one pair of communicating edges per time slot.
  • Convergence is merely associated to the time T it takes the algorithm to visit a spanning tree with positive probability. Indeed, the actual rate of convergence of the algorithm is just determined by T.
  • Much remains to be done!

SLIDE 12

REFERENCES

  • W. Runggaldier (circa 1970): STILLE WASSER GRÜNDEN TIEF, unpublished (although well known among specialists).
