SLIDE 1

Privacy and Fault-Tolerance in Distributed Optimization

Nitin Vaidya, University of Illinois at Urbana-Champaign

SLIDE 2

Acknowledgements

Shripad Gade, Lili Su
SLIDE 3

x* ∈ argmin_{x ∈ X} Σ_{i=1}^{S} f_i(x)
SLIDE 4

Applications

• f_i(x) = cost for robot i to go to location x
• Minimize total cost → rendezvous

x* ∈ argmin_{x ∈ X} Σ_{i=1}^{S} f_i(x)

[Figure: cost curves f_1(x) and f_2(x) with minima at x_1 and x_2; rendezvous point x]
SLIDE 5

Applications

Minimize cost Σ_i f_i(x)

[Figure: four machines with local costs f_1(x), …, f_4(x)]

Learning
SLIDE 6

Outline

• Distributed Optimization
• Fault-tolerance
• Privacy

x* ∈ argmin_{x ∈ X} Σ_{i=1}^{S} f_i(x)

[Diagram: five agents g_1, …, g_5, repeated for each outline topic]
SLIDE 7

Distributed Optimization

[Diagram: five networked agents g_1, …, g_5; clients g_1, g_2, g_3 connected to a server]
SLIDE 8

Client-Server Architecture

[Diagram: clients with local costs f_1(x), …, f_4(x) connected to a central server]
SLIDE 9

Client-Server Architecture

• Server maintains an estimate y_k
• Client i knows g_i(y)

[Diagram: server holds y_k; clients g_1, g_2, g_3]
SLIDE 10

Client-Server Architecture

• Server maintains an estimate y_k
• Client i knows g_i(y)

In iteration k+1:

• Client i
  • downloads y_k from the server
  • uploads the gradient α g_i(y_k)

[Diagram: client uploads α g_i(y_k) to the server]
SLIDE 11

Client-Server Architecture

• Server maintains an estimate y_k
• Client i knows g_i(y)

In iteration k+1:

• Client i
  • downloads y_k from the server
  • uploads the gradient α g_i(y_k)

• Server updates:

y_{k+1} ← y_k − β_k Σ_i α g_i(y_k)

SLIDE 12

Variations

• Stochastic
• Asynchronous
• …
SLIDE 13

Peer-to-Peer Architecture

[Diagram: peer-to-peer network of agents g_1, …, g_5 with local costs f_1(x), …, f_4(x)]
SLIDE 14

Peer-to-Peer Architecture

• Each agent maintains a local estimate x
• Consensus step with neighbors
• Apply own gradient to own estimate:

y_{k+1} ← y_k − β_k α g_i(y_k)

[Diagram: agents g_1, …, g_5]
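A minimal sketch of this peer-to-peer update, assuming a 5-agent ring with doubly stochastic mixing weights and quadratic local costs (all illustrative assumptions):

```python
import numpy as np

# Minimal sketch of the peer-to-peer update (Slide 14): a consensus step
# with neighbors, then a local gradient step on the local estimate.

n = 5
targets = np.arange(n, dtype=float)               # f_i(y) = (y - i)^2
W = np.zeros((n, n))                              # doubly stochastic ring weights
for i in range(n):
    W[i, i] = 0.5
    W[i, (i - 1) % n] = 0.25
    W[i, (i + 1) % n] = 0.25

x = np.zeros(n)                                   # local estimates
alpha = 1.0
for k in range(1, 2000):
    beta_k = 0.5 / np.sqrt(k)                     # diminishing step size
    x = W @ x                                     # consensus with neighbors
    x = x - beta_k * alpha * 2.0 * (x - targets)  # own gradient on own estimate

print(x)  # all estimates approach argmin sum_i f_i = mean(targets) = 2.0
```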

SLIDE 15

Outline

• Distributed Optimization
• Fault-tolerance
• Privacy

x* ∈ argmin_{x ∈ X} Σ_{i=1}^{S} f_i(x)
SLIDE 16

[Diagram: client-server architecture; client uploads α g_i(y_k)]
SLIDE 17

Server observes gradients ⇒ privacy compromised

[Diagram: client-server architecture; client uploads α g_i(y_k)]
SLIDE 18

Server observes gradients ⇒ privacy compromised

Achieve privacy and yet collaboratively optimize

[Diagram: client-server architecture; client uploads α g_i(y_k)]
SLIDE 19

Related Work

• Cryptographic methods (homomorphic encryption)
• Function transformation
• Differential privacy
SLIDE 20

Differential Privacy

[Diagram: client uploads noisy gradient α g_i(y_k) + ζ_i]
SLIDE 21

Differential Privacy

[Diagram: client uploads noisy gradient α g_i(y_k) + ζ_i]

Trade-off: privacy versus accuracy
SLIDE 22

Proposed Approach

• Motivated by secret sharing
• Exploit diversity: multiple servers / neighbors
SLIDE 23

Proposed Approach

[Diagram: clients g_1, g_2, g_3 connected to Server 1 and Server 2]

Privacy if a subset of the servers is adversarial
SLIDE 24

Proposed Approach

[Diagram: peer-to-peer network of agents g_1, …, g_5]

Privacy if a subset of the neighbors is adversarial
SLIDE 25

Proposed Approach

• Structured noise that “cancels” over servers/neighbors
SLIDE 26

Intuition

[Diagram: clients g_1, g_2, g_3; Server 1 holds x_1, Server 2 holds x_2]
SLIDE 27

Intuition

[Diagram: each client i splits g_i into g_i1 (to Server 1) and g_i2 (to Server 2)]

Each client simulates multiple clients
SLIDE 28

Intuition

[Diagram: each client i splits g_i into g_i1 (to Server 1) and g_i2 (to Server 2)]

g_11(y) + g_12(y) = g_1(y)

The shares g_ij(y) are not necessarily convex
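A toy sketch of this splitting for one client, assuming a half-plus-offset construction (one illustrative way to form shares; the talk does not prescribe a particular split):

```python
import numpy as np

# Function splitting (Slides 27-28): client 1 splits g_1 into shares
# g_11 + g_12 = g_1, one per server.

rng = np.random.default_rng(0)
r = rng.normal()                  # private random offset, known only to client 1

g1 = lambda y: 2.0 * (y - 1.0)    # true gradient of f_1(y) = (y - 1)^2
g11 = lambda y: 0.5 * g1(y) + r   # share for Server 1 (alone reveals little;
g12 = lambda y: 0.5 * g1(y) - r   # share for Server 2   need not be convex)

y = 3.0
assert np.isclose(g11(y) + g12(y), g1(y))   # shares sum to the true gradient
```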

SLIDE 29

Algorithm

• Each server maintains an estimate

In each iteration:

• Client i
  • downloads estimates from the corresponding server
  • uploads the gradient of g_i

• Each server updates its estimate using the received gradients
SLIDE 30

Algorithm

• Each server maintains an estimate

In each iteration:

• Client i
  • downloads estimates from the corresponding server
  • uploads the gradient of g_i

• Each server updates its estimate using the received gradients
• Servers periodically exchange estimates to perform a consensus step
SLIDE 31

Claim

• Under suitable assumptions, the servers eventually reach consensus in

x* ∈ argmin_{x ∈ X} Σ_{i=1}^{S} f_i(x)
SLIDE 32

Privacy

[Diagram: Server 1 receives g_11, g_21, g_31 and aggregates g_11 + g_21 + g_31; Server 2 receives g_12, g_22, g_32 and aggregates g_12 + g_22 + g_32]
SLIDE 33

Privacy

• Server 1 may learn g_11, g_21, g_31, and the aggregate g_12 + g_22 + g_32
• Not sufficient to learn g_i

[Diagram: Server 1 aggregates g_11 + g_21 + g_31; Server 2 aggregates g_12 + g_22 + g_32]
SLIDE 34

• Function splitting is not necessarily practical
• Structured randomization as an alternative

g_11(y) + g_12(y) = g_1(y)
SLIDE 35

Structured Randomization

• Multiplicative or additive noise in gradients
• Noise cancels over servers
SLIDE 36

Multiplicative Noise

[Diagram: clients g_1, g_2, g_3; Server 1 holds x_1, Server 2 holds x_2]
SLIDE 37

Multiplicative Noise

[Diagram: clients g_1, g_2, g_3; Server 1 holds x_1, Server 2 holds x_2]
SLIDE 38

Multiplicative Noise

[Diagram: client 1 uploads β α g_1(x_1) to Server 1 and γ α g_1(x_2) to Server 2]

β + γ = 1
SLIDE 39

Multiplicative Noise

[Diagram: client 1 uploads β α g_1(x_1) to Server 1 and γ α g_1(x_2) to Server 2]

β + γ = 1

It suffices for this invariant to hold over a larger number of iterations
SLIDE 40

Multiplicative Noise

[Diagram: client 1 uploads β α g_1(x_1) to Server 1 and γ α g_1(x_2) to Server 2]

β + γ = 1

The noise from client i to server j is not zero-mean
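A minimal sketch of the multiplicative scheme, assuming each client draws a fresh β per iteration with γ = 1 − β (the distribution of β below is an illustrative assumption):

```python
import numpy as np

# Multiplicative noise (Slides 38-40): client 1 scales its upload to
# Server 1 by beta and to Server 2 by gamma = 1 - beta. The scaling is
# not zero-mean noise around the true gradient.

rng = np.random.default_rng(1)

def uploads(g, x1, x2, alpha=1.0):
    """Return the pair of uploads; neither alone reveals alpha * g."""
    beta = rng.uniform(-2.0, 3.0)
    gamma = 1.0 - beta                 # invariant: beta + gamma = 1
    return beta * alpha * g(x1), gamma * alpha * g(x2)

g1 = lambda y: 2.0 * (y - 1.0)
u1, u2 = uploads(g1, x1=0.5, x2=0.5)
# when both server estimates agree, the uploads sum to the true gradient
assert np.isclose(u1 + u2, g1(0.5))
```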

SLIDE 41

Claim

• Under suitable assumptions, the servers eventually reach consensus in

x* ∈ argmin_{x ∈ X} Σ_{i=1}^{S} f_i(x)
SLIDE 42

Peer-to-Peer Architecture

[Diagram: agents g_1, …, g_5]
SLIDE 43

Reminder …

• Each agent maintains a local estimate x
• Consensus step with neighbors
• Apply own gradient to own estimate:

y_{k+1} ← y_k − β_k α g_i(y_k)

[Diagram: agents g_1, …, g_5]
SLIDE 44

Proposed Approach

• Each agent shares a noisy estimate with its neighbors
  • Scheme 1 – noise cancels over neighbors
  • Scheme 2 – noise cancels network-wide

[Diagram: agents g_1, …, g_5]
SLIDE 45

Proposed Approach

• Each agent shares a noisy estimate with its neighbors (see the sketch below)
  • Scheme 1 – noise cancels over neighbors
  • Scheme 2 – noise cancels network-wide

[Diagram: agent sends x + ε_1 and x + ε_2 to its neighbors, with ε_1 + ε_2 = 0 (over iterations)]
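A sketch of Scheme 1, assuming an agent with two neighbors and a bound Δ on each perturbation (the centering trick is one illustrative way to enforce the zero-sum constraint):

```python
import numpy as np

# Scheme 1 (Slide 45): the perturbations on the copies of x sent to
# different neighbors sum to zero locally.

rng = np.random.default_rng(2)

def perturbed_messages(x, num_neighbors, delta=0.5):
    """Messages x + e_j with sum_j e_j = 0 and |e_j| <= delta."""
    e = rng.uniform(-delta / 2, delta / 2, size=num_neighbors)
    e -= e.mean()                       # enforce the zero-sum constraint
    return [x + ej for ej in e]

msgs = perturbed_messages(x=1.5, num_neighbors=2)
assert np.isclose(sum(msgs), 2 * 1.5)   # epsilon_1 + epsilon_2 = 0
```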

SLIDE 46

Peer-to-Peer Architecture

• Poster today: Shripad Gade
SLIDE 47

Outline

• Distributed Optimization
• Fault-tolerance
• Privacy

x* ∈ argmin_{x ∈ X} Σ_{i=1}^{S} f_i(x)
SLIDE 48

Fault-Tolerance

• Some agents may be faulty
• Need to produce a “correct” output despite the faults
SLIDE 49

Byzantine Fault Model

• No constraint on the misbehavior of a faulty agent
• May send bogus messages
• Faulty agents can collude
SLIDE 50

Peer-to-Peer Architecture

• f_i(x) = cost for robot i to go to location x
• A faulty agent may choose an arbitrary cost function

[Figure: cost curves f_1(x) and f_2(x) with minima at x_1 and x_2]
SLIDE 51

Peer-to-Peer Architecture

[Diagram: agents g_1, …, g_5]
SLIDE 52

Client-Server Architecture

[Diagram: clients upload α g_i(y_k) to the server]
SLIDE 53

Fault-Tolerant Optimization

• The original problem is not meaningful:

x* ∈ argmin_{x ∈ X} Σ_{i=1}^{S} f_i(x)
SLIDE 54

Fault-Tolerant Optimization

• The original problem is not meaningful
• Optimize the cost over only the non-faulty agents:

x* ∈ argmin_{x ∈ X} Σ_{i ∈ good} f_i(x)
SLIDE 55

Fault-Tolerant Optimization

• The original problem is not meaningful
• Optimize the cost over only the non-faulty agents:

x* ∈ argmin_{x ∈ X} Σ_{i ∈ good} f_i(x)

Impossible!
SLIDE 56

Fault-Tolerant Optimization

• Optimize a weighted cost over only the non-faulty agents
• With weights β_i as close to 1/|good| as possible:

x* ∈ argmin_{x ∈ X} Σ_{i ∈ good} β_i f_i(x)
SLIDE 57

Fault-Tolerant Optimization

• Optimize a weighted cost over only the non-faulty agents:

x* ∈ argmin_{x ∈ X} Σ_{i ∈ good} β_i f_i(x)

With t Byzantine faulty agents: t weights may be 0
SLIDE 58

Fault-Tolerant Optimization

• Optimize a weighted cost over only the non-faulty agents:

x* ∈ argmin_{x ∈ X} Σ_{i ∈ good} β_i f_i(x)

With t Byzantine agents out of n total: at least n − 2t weights are guaranteed to be > 1/(2(n − t))
SLIDE 59

Centralized Algorithm

• Of the n agents, any t may be faulty
• How to filter out the cost functions of faulty agents?
SLIDE 60

Centralized Algorithm: Scalar argument x

Define a virtual function G(x) whose gradient is obtained as follows.
SLIDE 61

Centralized Algorithm: Scalar argument x

Define a virtual function G(x) whose gradient is obtained as follows.

At a given x:

• Sort the gradients of the n local cost functions
SLIDE 62

Centralized Algorithm: Scalar argument x

Define a virtual function G(x) whose gradient is obtained as follows.

At a given x:

• Sort the gradients of the n local cost functions
• Discard the smallest t and largest t gradients
SLIDE 63

Centralized Algorithm: Scalar argument x

Define a virtual function G(x) whose gradient is obtained as follows.

At a given x:

• Sort the gradients of the n local cost functions
• Discard the smallest t and largest t gradients
• Mean of the remaining gradients = gradient of G at x
SLIDE 64

Centralized Algorithm: Scalar argument x

Define a virtual function G(x) whose gradient is obtained as follows.

At a given x:

• Sort the gradients of the n local cost functions
• Discard the smallest t and largest t gradients
• Mean of the remaining gradients = gradient of G at x

The virtual function G(x) is convex
SLIDE 65

Centralized Algorithm: Scalar argument x

Define a virtual function G(x) whose gradient is obtained as follows.

At a given x:

• Sort the gradients of the n local cost functions
• Discard the smallest t and largest t gradients
• Mean of the remaining gradients = gradient of G at x

The virtual function G(x) is convex → can optimize easily (a sketch follows below)
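A sketch of this filter for scalar gradients, assuming the coordinator receives all n reports at the current x (function names and test values are illustrative):

```python
import numpy as np

# Centralized filter (Slides 60-65, scalar x): sort the n reported
# gradients, drop the t smallest and t largest, and average the rest.
# The result acts as the gradient of the virtual convex function G at x.

def filtered_gradient(reported, t):
    """Trimmed mean: discard t smallest and t largest reported gradients."""
    g = np.sort(np.asarray(reported, dtype=float))
    assert g.size > 2 * t, "need n > 2t reports"
    return g[t:g.size - t].mean()

# three honest agents report gradients near 1.0; one Byzantine agent lies
print(filtered_gradient([0.9, 1.0, 1.1, 1000.0], t=1))  # ~1.05, outlier gone
```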

SLIDE 66

Peer-to-Peer Fault-Tolerant Optimization

• Gradient filtering similar to the centralized algorithm
  • requires “rich enough” connectivity
  • correlation between functions helps
• The vector case is harder
  • redundancy between functions helps
SLIDE 67

Summary

• Distributed Optimization
• Fault-tolerance
• Privacy

x* ∈ argmin_{x ∈ X} Σ_{i=1}^{S} f_i(x)
SLIDE 68

Thanks!

disc.ece.illinois.edu

SLIDE 71

Distributed Peer-to-Peer Optimization

• Each agent maintains a local estimate x

In each iteration:

• Compute a weighted average with neighbors’ estimates

[Diagram: agents g_1, …, g_5]
SLIDE 72

Distributed Peer-to-Peer Optimization

• Each agent maintains a local estimate x

In each iteration:

• Compute a weighted average with neighbors’ estimates
• Apply own gradient to own estimate:

y_{k+1} ← y_k − β_k α g_i(y_k)
SLIDE 73

Distributed Peer-to-Peer Optimization

• Each agent maintains a local estimate x

In each iteration:

• Compute a weighted average with neighbors’ estimates
• Apply own gradient to own estimate:

y_{k+1} ← y_k − β_k α g_i(y_k)

• Local estimates converge to

x* ∈ argmin_{x ∈ X} Σ_{i=1}^{S} f_i(x)
SLIDE 74

RSS – Locally Balanced

Perturbations:

• Add to zero (locally, per node)
• Bounded (≤ Δ)

Algorithm:

• Node j selects d_{j,i}^k such that Σ_i d_{j,i}^k = 0 and |d_{j,i}^k| ≤ Δ
• Shares w_{j,i}^k = x_j^k + d_{j,i}^k with node i
• Consensus and (stochastic) gradient descent
SLIDE 75

RSS – Network Balanced

Perturbations:

• Add to zero (over the network)
• Bounded (≤ Δ)

Algorithm:

• Node j computes its perturbation d_j^k:
  • sends s_{j,i} to each neighbor i
  • adds received s_{i,j} and subtracts sent s_{j,i} ⇒ d_j^k = Σ received − Σ sent
• Obfuscated state w_j^k = x_j^k + d_j^k is shared with neighbors
• Consensus and (stochastic) gradient descent
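A sketch of the network-balanced construction, assuming a complete graph so every pair of nodes exchanges one share (topology and share distribution are illustrative assumptions):

```python
import numpy as np

# RSS - Network Balanced (Slide 75): nodes exchange random shares s[j, i];
# node j's perturbation is (sum received - sum sent), so the perturbations
# cancel over the whole network.

rng = np.random.default_rng(3)
n = 4
x = rng.normal(size=n)                    # local estimates x_j

s = rng.uniform(-0.5, 0.5, size=(n, n))   # s[j, i] = share sent by j to i
np.fill_diagonal(s, 0.0)                  # no self-shares

d = s.sum(axis=0) - s.sum(axis=1)         # d_j = sum received - sum sent
w = x + d                                 # obfuscated states shared with neighbors

assert np.isclose(d.sum(), 0.0)           # perturbations cancel network-wide
assert np.isclose(w.sum(), x.sum())       # aggregate state is preserved
```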

SLIDE 76

Convergence

Let x̄_j^T = Σ_k α_k x_j^k / Σ_k α_k with α_k = 1/√k. Then

f(x̄_j^T) − f(x*) ≤ O(log T / √T) + O(Δ² log T / √T)

• Asymptotic convergence of the iterates to the optimum
• Privacy-convergence trade-off
• Stochastic gradient updates work too
SLIDE 77

Function Sharing

• Let the f_i(x) be bounded-degree polynomials

Algorithm:

• Node j shares s_{j,i}(x) with node i
• Node j obfuscates using p_j(x) = Σ s_{i,j}(x) − Σ s_{j,i}(x)
• Use f̂_j(x) = f_j(x) + p_j(x) and run distributed gradient descent
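A sketch of function sharing with degree-2 polynomials, assuming a complete graph and random shares s_{j,i} (coefficient ranges are illustrative assumptions):

```python
import numpy as np
from numpy.polynomial import Polynomial

# Function sharing (Slide 77): node j sends random polynomial s[j][i] to
# node i and obfuscates its own cost as f_hat_j = f_j + (sum received)
# - (sum sent). The shares cancel in the network-wide sum.

rng = np.random.default_rng(4)
n, deg = 3, 2

f = [Polynomial(rng.uniform(-1, 1, deg + 1)) for _ in range(n)]
s = [[Polynomial(rng.uniform(-1, 1, deg + 1)) for _ in range(n)] for _ in range(n)]

f_hat = []
for j in range(n):
    p_j = sum(s[i][j] for i in range(n)) - sum(s[j][i] for i in range(n))
    f_hat.append(f[j] + p_j)

x = 0.7
assert np.isclose(sum(fh(x) for fh in f_hat), sum(fj(x) for fj in f))
```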

SLIDE 78

Function Sharing – Convergence

• Function-sharing iterates converge to the correct optimum (Σ_j f̂_j(x) = f(x))
• Privacy: if the vertex connectivity of the graph is ≥ f, then no group of f nodes can estimate the true functions (or any good subset)
• If p_j(x) is similar to f_j(x), then it can hide f_j(x) well