Privacy and Fault-Tolerance in Distributed Optimization

Nitin Vaidya
University of Illinois at Urbana-Champaign

Acknowledgements
• Shripad Gade
• Lili Su
x* ∈ argmin_{x ∈ X} Σ_{i=1}^{S} f_i(x)

Applications
• f_i(x) = cost for robot i to go to location x
• Minimize total cost → rendezvous
x* ∈ argmin_{x ∈ X} Σ_{i=1}^{S} f_i(x)

[Figure: two cost curves f_1(x) and f_2(x), with minima at x1 and x2]

Applications
• Minimize cost Σ_i f_i(x)

[Figure: four cost functions f_1(x), f_2(x), f_3(x), f_4(x)]
Learning

[Figure: five agents with local cost functions g_1, …, g_5]
Outline
• Distributed Optimization
• Fault-tolerance
• Privacy

x* ∈ argmin_{x ∈ X} Σ_{i=1}^{S} f_i(x)
Distributed Optimization
Client-Server Architecture

[Figure: central server connected to clients with local cost functions f_1(x), …, f_4(x)]
Client-Server Architecture

[Figure: server holding the estimate, connected to clients g_1, g_2, g_3]

• Server maintains estimate y^k
• Client i knows g_i(y)

In iteration k+1:
• Client i
  – downloads y^k from the server
  – uploads the gradient α g_i(y^k)
• Server
  – updates  y^{k+1} ← y^k − β_k Σ_i α g_i(y^k)

(A sketch of this loop follows.)
Variations
• Stochastic
• Asynchronous
• …
Peer-to-Peer Architecture

[Figure: network of agents with local cost functions g_1, …, g_5]

• Each agent maintains a local estimate x
• Consensus step with neighbors
• Apply own gradient to own estimate:

  y^{k+1} ← y^k − β_k α g_i(y^k)

(A sketch follows.)
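A minimal sketch of this consensus-plus-gradient iteration on a ring of five agents, again with assumed quadratic costs and hypothetical mixing weights:

```python
import numpy as np

# Sketch of peer-to-peer optimization on a ring (illustrative).
# Assumed quadratic costs f_i(x) = (x - c_i)^2; uniform mixing weights.
n = 5
c = np.array([1.0, 2.0, 3.0, 4.0, 5.0])   # hypothetical local data
x = np.zeros(n)                           # local estimates
alpha = 1.0
for k in range(500):
    beta_k = 1.0 / (k + 1)
    # consensus: average with the two ring neighbors (weights 1/3 each)
    x = (np.roll(x, 1) + x + np.roll(x, -1)) / 3.0
    # each agent applies its own gradient to its own estimate
    x -= beta_k * alpha * 2.0 * (x - c)

print(x)                                  # all entries approach mean(c) = 3.0
```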
Outline
• Distributed Optimization
• Fault-tolerance
• Privacy

x* ∈ argmin_{x ∈ X} Σ_{i=1}^{S} f_i(x)
[Figure: server with clients g_1, g_2, g_3; each client uploads α g_i(y^k)]

Server observes gradients ⇒ privacy compromised

Achieve privacy and yet collaboratively optimize
Related Work
• Cryptographic methods (homomorphic encryption)
• Function transformation
• Differential privacy
Differential Privacy

[Figure: server with clients g_1, g_2, g_3]

• Client i uploads the noisy gradient α g_i(y^k) + ζ_i

Trade-off privacy with accuracy (a sketch follows)
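A sketch of the noisy upload, with a hypothetical Gaussian ζ_i and noise scale σ; the deck does not fix a particular noise distribution:

```python
import numpy as np

# Sketch of a differentially private gradient upload (illustrative).
rng = np.random.default_rng(0)
sigma = 0.5                           # hypothetical noise scale

def private_upload(grad_value, alpha=1.0):
    zeta = rng.normal(0.0, sigma)     # zeta_i: zero-mean privacy noise
    return alpha * grad_value + zeta  # alpha * g_i(y^k) + zeta_i

print(private_upload(2.0))
# Larger sigma hides g_i better but adds variance to the server's
# update: the privacy/accuracy trade-off on this slide.
```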
Proposed Approach
• Motivated by secret sharing
• Exploit diversity … multiple servers / neighbors

[Figure: clients g_1, g_2, g_3 connected to Server 1 and Server 2]
Privacy if a subset of servers is adversarial

[Figure: network of agents g_1, …, g_5]
Privacy if a subset of neighbors is adversarial

• Structured noise that "cancels" over servers/neighbors
Intuition

[Figure: Server 1 (estimate x1) and Server 2 (estimate x2), with clients g_1, g_2, g_3]

Each client simulates multiple clients: client i splits its function into g_{i1} (for Server 1) and g_{i2} (for Server 2) such that

  g_{i1}(y) + g_{i2}(y) = g_i(y)

g_{ij}(y) is not necessarily convex. (A sketch follows.)
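A tiny sketch of this splitting, assuming a quadratic g_i and one arbitrary split; the offset s is a hypothetical secret:

```python
# Sketch of function splitting (illustrative).
# Client i holds g_i(y) = 2*(y - c). One arbitrary split with
# g_i1 + g_i2 = g_i, where each piece alone reveals little about c:
c = 3.0
s = 10.0                                  # hypothetical secret offset

g_i1 = lambda y: (y - c) - s              # gradient piece sent to Server 1
g_i2 = lambda y: (y - c) + s              # gradient piece sent to Server 2

y = 1.2345
assert abs((g_i1(y) + g_i2(y)) - 2.0 * (y - c)) < 1e-12
# The pieces need not be gradients of convex functions
# ("g_ij not necessarily convex").
```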
Algorithm
• Each server maintains an estimate

In each iteration:
• Client i
  – downloads the estimate from each corresponding server
  – uploads the gradient of g_{ij}
• Each server updates its estimate using the received gradients
• Servers periodically exchange estimates to perform a consensus step

(A two-server sketch follows.)
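A minimal two-server sketch under the same assumptions as the splitting example above (quadratic costs f_i(x) = (x − c_i)²/2, hypothetical secret offsets, consensus every five iterations):

```python
# Two-server sketch (illustrative): clients split g_i = g_i1 + g_i2 and
# send one piece to each server; servers periodically average (consensus).
centers = [1.0, 2.0, 4.0]                 # f_i(x) = (x - c_i)^2 / 2
offsets = [5.0, -7.0, 11.0]               # hypothetical secret offsets
y1, y2 = 0.0, 0.0                         # server estimates
for k in range(300):
    beta_k = 1.0 / (k + 1)
    up1 = sum((y1 - c) - s for c, s in zip(centers, offsets))  # sum_i g_i1(y1)
    up2 = sum((y2 - c) + s for c, s in zip(centers, offsets))  # sum_i g_i2(y2)
    y1, y2 = y1 - beta_k * up1, y2 - beta_k * up2
    if k % 5 == 0:                        # periodic consensus step
        y1 = y2 = (y1 + y2) / 2.0

print((y1 + y2) / 2.0)                    # approaches mean(centers) ≈ 2.33
```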
Claim
• Under suitable assumptions, the servers eventually reach consensus in

  x* ∈ argmin_{x ∈ X} Σ_{i=1}^{S} f_i(x)
Privacy

[Figure: Server 1 receives g_{11} + g_{21} + g_{31}; Server 2 receives g_{12} + g_{22} + g_{32}]

• Server 1 may learn g_{11}, g_{21}, g_{31}, and g_{12} + g_{22} + g_{32}
• Not sufficient to learn g_i
• Function splitting (g_{i1}(y) + g_{i2}(y) = g_i(y)) is not necessarily practical
• Structured randomization as an alternative
Structured Randomization
• Multiplicative or additive noise in gradients
• Noise cancels over servers
Multiplicative Noise

[Figure: Server 1 (estimate x1) and Server 2 (estimate x2), with clients g_1, g_2, g_3]

• Client i uploads β α g_i(x1) to Server 1 and γ α g_i(x2) to Server 2, where β + γ = 1
• It suffices for this invariant to hold over a larger number of iterations
• Noise from client i to server j is not zero-mean

(A sketch follows.)
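A sketch of the multiplicative-noise upload; the range for β is a hypothetical choice, since only β + γ = 1 matters:

```python
import numpy as np

# Sketch of multiplicative noise over two servers (illustrative).
rng = np.random.default_rng(1)

def noisy_uploads(g, x1, x2, alpha=1.0):
    beta = rng.uniform(-2.0, 3.0)    # arbitrary; need not lie in [0, 1]
    gamma = 1.0 - beta               # invariant: beta + gamma = 1
    return beta * alpha * g(x1), gamma * alpha * g(x2)

g1 = lambda y: 2.0 * (y - 3.0)       # assumed client cost (y - 3)^2
u1, u2 = noisy_uploads(g1, x1=0.5, x2=0.5)
# When the server estimates agree (x1 == x2), u1 + u2 equals the true
# alpha * g_1, while each server alone sees a masked gradient.
print(u1 + u2, 2.0 * (0.5 - 3.0))
```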
Claim
• Under suitable assumptions, the servers eventually reach consensus in

  x* ∈ argmin_{x ∈ X} Σ_{i=1}^{S} f_i(x)
Peer-to-Peer Architecture

[Figure: network of agents with local cost functions g_1, …, g_5]

Reminder …
• Each agent maintains a local estimate x
• Consensus step with neighbors
• Apply own gradient to own estimate:

  y^{k+1} ← y^k − β_k α g_i(y^k)
Proposed Approach
• Each agent shares a noisy estimate with neighbors (see the sketch below)
  – Scheme 1 – noise cancels over neighbors
  – Scheme 2 – noise cancels network-wide

Neighbors receive x + ε1 and x + ε2, with ε1 + ε2 = 0 (over iterations)
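A minimal sketch of Scheme 1 for one agent with two neighbors; Δ is a hypothetical bound on the perturbation:

```python
import numpy as np

# Sketch of a locally balanced estimate perturbation (Scheme 1, illustrative).
rng = np.random.default_rng(2)
delta = 0.1                          # hypothetical bound on |eps|

def perturbed_shares(x):
    eps1 = rng.uniform(-delta, delta)
    eps2 = -eps1                     # cancellation: eps1 + eps2 = 0
    return x + eps1, x + eps2        # sent to the two neighbors

s1, s2 = perturbed_shares(1.7)
print((s1 + s2) / 2.0)               # the neighbors' average recovers x = 1.7
```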
Peer-to-Peer Architecture
• Poster today – Shripad Gade
Outline
• Distributed Optimization
• Fault-tolerance
• Privacy

x* ∈ argmin_{x ∈ X} Σ_{i=1}^{S} f_i(x)
Fault-Tolerance
• Some agents may be faulty
• Need to produce "correct" output despite the faults
Byzantine Fault Model
• No constraint on the misbehavior of a faulty agent
• May send bogus messages
• Faulty agents can collude
Peer-to-Peer Architecture
• f_i(x) = cost for robot i to go to location x
• A faulty agent may choose an arbitrary cost function

[Figure: cost curves f_1(x) and f_2(x), with minima at x1 and x2]
Peer-to-Peer Architecture

[Figure: network of agents with local cost functions g_1, …, g_5]

Client-Server Architecture

[Figure: server with clients g_1, g_2, g_3; each client uploads α g_i(y^k)]
Fault-Tolerant Optimization
• The original problem

  x* ∈ argmin_{x ∈ X} Σ_{i=1}^{S} f_i(x)

  is not meaningful
• Optimize the cost over only the non-faulty agents:

  x* ∈ argmin_{x ∈ X} Σ_{i ∈ good} f_i(x)

Impossible!
Fault-Tolerant Optimization
• Optimize a weighted cost over only the non-faulty agents:

  x* ∈ argmin_{x ∈ X} Σ_{i ∈ good} β_i f_i(x)

  with each β_i as close to 1/|good| as possible
• With t Byzantine faulty agents: t weights may be 0
• With t Byzantine agents among n total: at least n − 2t weights are guaranteed to be > 1/(2(n − t))
Centralized Algorithm
• Of the n agents, any t may be faulty
• How to filter out the cost functions of faulty agents?
Centralized Algorithm: Scalar Argument x

Define a virtual function G(x) whose gradient is obtained as follows.

At a given x:
• Sort the gradients of the n local cost functions
• Discard the smallest t and the largest t gradients
• Mean of the remaining gradients = gradient of G at x

The virtual function G(x) is convex → can optimize easily. (A sketch follows.)
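A sketch of this trimmed-gradient filter; the reported values are hypothetical, with t of the n reports adversarial:

```python
import numpy as np

# Sketch of the gradient filter behind the virtual function G (illustrative).
def trimmed_gradient(grads, t):
    """Sort the n reported gradients, drop the t smallest and t largest,
    and return the mean of the rest: the gradient of G at this point."""
    g = np.sort(np.asarray(grads))
    assert len(g) > 2 * t, "need n > 2t reported gradients"
    return g[t:len(g) - t].mean()

# Hypothetical example: 5 honest gradients plus 2 Byzantine outliers (t = 2).
reports = [0.9, 1.1, 1.0, 1.2, 0.8, -100.0, 100.0]
print(trimmed_gradient(reports, t=2))  # close to the honest mean, 1.0
```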
Peer-to-Peer Fault-Tolerant Optimization
• Gradient filtering similar to the centralized algorithm
  … requires "rich enough" connectivity
  … correlation between functions helps
• The vector case is harder
  … redundancy between functions helps
Summary
• Distributed Optimization
• Fault-tolerance
• Privacy

x* ∈ argmin_{x ∈ X} Σ_{i=1}^{S} f_i(x)
Thanks!
disc.ece.illinois.edu
Distributed Peer-to-Peer Optimization
• Each agent maintains a local estimate x

In each iteration:
• Compute a weighted average with neighbors' estimates
• Apply own gradient to own estimate:

  y^{k+1} ← y^k − β_k α g_i(y^k)

• Local estimates converge to

  x* ∈ argmin_{x ∈ X} Σ_{i=1}^{S} f_i(x)
RSS – Locally Balanced Perturbations
• Perturbations add to zero (locally, per node)
• Perturbations are bounded (≤ Δ)

Algorithm (see the sketch below)
• Node j selects d_{j,i}^k such that Σ_i d_{j,i}^k = 0 and |d_{j,i}^k| ≤ Δ
• Node j shares w_{j,i}^k = x_j^k + d_{j,i}^k with node i
• Consensus and (stochastic) gradient descent
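A sketch of the locally balanced perturbation step, with a hypothetical Δ; centering makes the shares sum to zero, as the scheme requires:

```python
import numpy as np

# Sketch of RSS with locally balanced perturbations (illustrative).
rng = np.random.default_rng(3)
delta = 0.1                              # hypothetical bound

def shares_for_neighbors(x_j, num_neighbors):
    d = rng.uniform(-delta, delta, size=num_neighbors)
    d -= d.mean()                        # enforce sum_i d_{j,i} = 0
    # (after centering, |d| can slightly exceed delta; fine for a sketch)
    return x_j + d                       # w_{j,i} = x_j + d_{j,i}

w = shares_for_neighbors(2.5, 4)
print(w.mean())                          # neighbors' average recovers x_j = 2.5
```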
RSS – Network Balanced Perturbations
• Perturbations add to zero (over the network)
• Perturbations are bounded (≤ Δ)

Algorithm (see the sketch below)
• Node j computes its perturbation d_j^k:
  – send s_{j,i} to each neighbor i
  – add the received s_{i,j} and subtract the sent s_{j,i} ⇒ d_j^k = Σ received − Σ sent
• Node j shares the obfuscated state w_j^k = x_j^k + d_j^k with its neighbors
• Consensus and (stochastic) gradient descent
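A sketch of the network-balanced construction on an assumed complete graph of four nodes; by design the perturbations sum to zero over the network:

```python
import numpy as np

# Sketch of RSS with network-balanced perturbations (illustrative).
rng = np.random.default_rng(4)
n = 4
x = np.array([1.0, 2.0, 3.0, 4.0])       # true states x_j
s = rng.uniform(-0.1, 0.1, size=(n, n))  # s[j, i]: share node j sends to node i
np.fill_diagonal(s, 0.0)                 # no self-shares (complete graph assumed)

d = s.sum(axis=0) - s.sum(axis=1)        # per node: received minus sent
w = x + d                                # obfuscated states w_j = x_j + d_j
print(d.sum())                           # ~0: perturbations cancel network-wide
print(w.mean(), x.mean())                # the network average is preserved
```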
Convergence

Let x̄_j^T = (Σ_k α_k x_j^k) / (Σ_k α_k) with α_k = 1/√k. Then

  f(x̄_j^T) − f(x*) ≤ O(log T / √T) + O(Δ² log T / √T)

• Asymptotic convergence of the iterates to the optimum
• Privacy-convergence trade-off
• Stochastic gradient updates work too
Function Sharing
• Let the f_i(x) be bounded-degree polynomials

Algorithm (see the sketch below)
• Node j shares a polynomial s_{j,i}(x) with node i
• Node j obfuscates using p_j(x) = Σ s_{i,j}(x) − Σ s_{j,i}(x)
• Use f̂_j(x) = f_j(x) + p_j(x) and run distributed gradient descent
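A sketch of function sharing with quadratics represented by coefficient vectors; the shares and the complete three-node graph are assumptions for illustration:

```python
import numpy as np

# Sketch of function sharing (illustrative). Polynomials are coefficient
# vectors [c0, c1, c2] for c0 + c1*x + c2*x^2.
rng = np.random.default_rng(5)
n = 3
f = rng.uniform(-1.0, 1.0, size=(n, 3))     # hypothetical local polynomials f_j
f[:, 2] = np.abs(f[:, 2]) + 0.1             # keep each f_j convex (x^2 coeff > 0)

s = rng.uniform(-1.0, 1.0, size=(n, n, 3))  # s[j, i]: polynomial j sends to i
for j in range(n):
    s[j, j] = 0.0                           # no self-shares

p = s.sum(axis=0) - s.sum(axis=1)           # p_j = sum received - sum sent
f_hat = f + p                               # obfuscated functions f̂_j

# The total is preserved (sum_j f̂_j = sum_j f_j), so distributed gradient
# descent on the f̂_j still optimizes the original aggregate cost.
print(np.allclose(f_hat.sum(axis=0), f.sum(axis=0)))
```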
Function Sharing – Convergence
• Function-sharing iterates converge to the correct optimum (since Σ_j f̂_j(x) = f(x))
• Privacy: if the vertex connectivity of the graph is ≥ f, then no group of f nodes can estimate the true functions f_i (or any good subset of them)
• If p_j(x) is similar in form to f_j(x), then f̂_j can hide f_j(x) well