Privacy & Security in Machine Learning / Optimization
Nitin Vaidya University of Illinois at Urbana-Champaign disc.ece.illinois.edu
Research Interests
Distributed algorithms
Distributed shared memory systems
Distributed computations over wireless networks
Distributed optimization
Privacy and Security for Machine Learning / Optimization
Outline
Motivation – distributed machine learning
Research problems
– Privacy-preserving distributed optimization
– Adversarial learning
– Robustness to adversarial samples
Example – Image Classification
CIFAR-10 dataset
Deep Neural Networks
[Figure: a layered network – inputs x1, x2, x3 feed the neurons of layer 1, producing activations a1, a2, a3, a4; edges carry parameters (weights) such as W111 and W132; the output layer assigns class probabilities to the input, e.g. 0.8 dog, 0.1 cat, 0.09 ship, 0.01 car]
A single neuron with inputs x2 and x3 computes a weighted sum of its inputs plus a bias, and applies an activation function s:
s(x2 W131 + x3 W132 + b13)
[Figure: neuron 3 of layer 1 receiving x2 and x3 via weights W131, W132; inset shows the activation curve s(z) versus z]
Rectified Linear Unit (ReLU): s(z) = max(0, z)
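As an illustration, the single neuron above can be sketched in a few lines of Python; the input values, weights, and bias below are made up for the example:

```python
import math

def sigmoid(z):
    """Logistic activation s(z) = 1 / (1 + e^-z)."""
    return 1.0 / (1.0 + math.exp(-z))

def relu(z):
    """Rectified Linear Unit: max(0, z)."""
    return max(0.0, z)

def neuron(inputs, weights, bias, activation=sigmoid):
    """One neuron: activation of the weighted sum of inputs, plus a bias."""
    z = sum(x * w for x, w in zip(inputs, weights)) + bias
    return activation(z)

# The slide's neuron s(x2*W131 + x3*W132 + b13), with illustrative numbers
out = neuron([0.5, -1.0], [0.7, 0.2], bias=0.1, activation=sigmoid)
```

Swapping `activation=relu` gives the ReLU variant of the same neuron.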
How to train your network
Given a machine structure, the parameters are the only free variables.
Choose parameters to maximize accuracy: optimize a suitably defined cost function h(w) to find the right parameter vector w.
Cost Function h(w)
Consider input x, true classification y(x), and machine classification a(x,w) using parameters w.
Cost for input x: ||y(x) − a(x,w)||²
Total cost: h(w) = Σx ||y(x) − a(x,w)||²
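The total cost above can be written directly in code; the toy linear "machine" and the two training pairs below are hypothetical, just to make the sum concrete:

```python
import numpy as np

def total_cost(w, inputs, labels, machine):
    """h(w) = sum over inputs x of ||y(x) - a(x, w)||^2."""
    return sum(np.sum((y - machine(x, w)) ** 2) for x, y in zip(inputs, labels))

# Toy "machine": a(x, w) = w @ x (a linear map, for illustration only)
machine = lambda x, w: w @ x
inputs = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]
labels = [np.array([1.0]), np.array([0.0])]
w = np.array([[1.0, 0.0]])
# this w reproduces both labels exactly, so the cost is 0
```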
Convex Optimization
[Figure (from Wikipedia): the bowl-shaped surface of a convex cost h(W) over parameters W = (w131, w132, …), with gradient-descent iterates W[0], W[1] = (4,3,...), W[2] = (3,2,…), W[3] stepping down toward the minimum]
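The iterates W[0], W[1], W[2], … come from repeated gradient steps; a minimal sketch on a made-up convex quadratic (not the actual network cost) looks like:

```python
import numpy as np

def gradient_descent(grad_h, w0, beta=0.1, steps=100):
    """Iterate W[k+1] = W[k] - beta * grad_h(W[k]) and return the final W."""
    w = np.asarray(w0, dtype=float)
    for _ in range(steps):
        w = w - beta * grad_h(w)
    return w

# Toy convex cost h(W) = (w131 - 3)^2 + (w132 - 2)^2, minimized at (3, 2)
grad_h = lambda w: 2.0 * (w - np.array([3.0, 2.0]))
w_star = gradient_descent(grad_h, w0=[0.0, 0.0], beta=0.1, steps=200)
```

For a convex cost like this, the iterates converge to the unique minimizer.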
So far …
Training: optimize the cost function h(w) over the machine parameters w.
Outline
Motivation – distributed machine learning
Research problems
– Privacy-preserving distributed optimization
– Adversarial learning
– Robustness to adversarial samples
Distributed Machine Learning
Data is distributed across different agents:
mobile users, hospitals, competing vendors
[Figure: Agent 1, Agent 2, Agent 3, Agent 4, each with its own local data]
Distributed Machine Learning
Data is distributed across different agents, which collaborate to learn.
Training: optimize the cost function
Σi hi(w)
where agent i holds the local cost hi(w) and w is the shared machine-parameter vector.
[Figure: four agents with local costs h1(w), h2(w), h3(w), h4(w)]
Distributed Optimization
30+ years of work; recent interest due to machine learning applications.
Distributed Optimization
Different architectures:
– Peer-to-peer
– Parameter server
[Figure: a peer-to-peer network of three agents with local costs h1(w), h2(w), h3(w), and a parameter-server topology with the same agents connected to a central server]
Distributed Gradient Method
Each agent i holds its own estimate Wi[0] and a local cost hi(w), and sends its estimate to its neighbors.
Agent 3 forms a weighted average of the estimates it has:
T = ½W3[0] + ¼W1[0] + ¼W2[0]
and then takes a local gradient step to obtain its next estimate:
W3[1] = T - 𝛃 ∇h3(T)
Works in incomplete networks too !!
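One round of this method can be sketched as follows; the quadratic local costs and the mixing matrix are illustrative choices (the matrix's first row uses the ½, ¼, ¼ pattern of weights from the slide):

```python
import numpy as np

def peer_to_peer_step(w, grads, A, beta):
    """One round: each agent averages neighbors' estimates using the mixing
    matrix A, then takes a local gradient step.  w holds all agents' estimates."""
    t = A @ w                                  # consensus: T_i = sum_j A[i,j] W_j
    return np.array([t[i] - beta * grads[i](t[i]) for i in range(len(grads))])

# Three agents with illustrative quadratic local costs h_i(w) = (w - c_i)^2
centers = [1.0, 2.0, 3.0]
grads = [lambda w, c=c: 2.0 * (w - c) for c in centers]
# Doubly stochastic mixing matrix (rows use the 1/2, 1/4, 1/4 weight pattern)
A = np.array([[0.50, 0.25, 0.25],
              [0.25, 0.50, 0.25],
              [0.25, 0.25, 0.50]])
w = np.array([0.0, 0.0, 0.0])
for _ in range(500):
    w = peer_to_peer_step(w, grads, A, beta=0.01)
# all agents end up near w = 2, the minimizer of sum_i h_i(w)
```

With a constant step size the agents converge to a small neighborhood of the global minimizer; a diminishing step size would give exact convergence.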
Parameter Server Architecture
The server holds the current estimate W[0] and broadcasts it; each agent i computes its local gradient ∇hi(W[0]) and returns it; the server then updates
W[1] = W[0] – 𝛃 ∑i ∇hi(W[0])
[Figure: agents with local costs h1(w), h2(w), h3(w) connected to the parameter server]
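A sketch of this loop, with two hypothetical agents holding simple quadratic costs:

```python
def server_round(w, agent_grads, beta):
    """One server round: broadcast w, collect each agent's local gradient,
    and update W[k+1] = W[k] - beta * sum_i grad_hi(W[k])."""
    return w - beta * sum(g(w) for g in agent_grads)

# Two agents with illustrative local costs (w - 1)^2 and (w - 3)^2
agent_grads = [lambda w: 2.0 * (w - 1.0), lambda w: 2.0 * (w - 3.0)]
w = 0.0
for _ in range(200):
    w = server_round(w, agent_grads, beta=0.1)
# w approaches 2, the minimizer of (w - 1)^2 + (w - 3)^2
```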
Outline
Motivation – distributed machine learning
Research problems
– Privacy-preserving distributed optimization
– Adversarial learning
– Robustness to adversarial samples
Privacy Challenge
Peers may learn each other’s data.
The parameter server may learn the agents’ data.
[Figure: peer-to-peer and parameter-server architectures with local costs h1(w), h2(w), h3(w)]
Privacy-Preserving Optimization
Optimize cost function Σi hi(w)
Can agents collaboratively learn, and yet protect their own data?
Peer-to-Peer Architecture: each agent i would normally send its estimate Wi[0] to its neighbors.
Add Inter-Dependent Noise
Instead, agent i sends a noisy estimate Wi[0] + ni, where the noise terms are correlated across agents:
n1 + n2 + n3 = 0
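The correlated-noise idea can be illustrated with noise shares forced to sum to zero; the Gaussian shares below are one simple way to generate such noise, not necessarily the scheme used in this work:

```python
import numpy as np

rng = np.random.default_rng(0)

def zero_sum_noise(n_agents, dim):
    """Draw noise shares n_1, ..., n_k that sum to zero: each share looks
    random on its own, but the shares cancel when combined over the network."""
    shares = rng.normal(size=(n_agents, dim))
    shares -= shares.mean(axis=0)        # enforce n_1 + ... + n_k = 0
    return shares

estimates = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])   # W1, W2, W3
noise = zero_sum_noise(3, 2)
masked = estimates + noise               # what each agent actually sends
# network-wide averages are unaffected because the noise cancels
```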
Key Idea
Add correlated noise to the information exchanged between agents.
The noise “cancels” over the network, yet it can prevent a coalition of bad agents from learning information about the others.
Privacy-Preserving Optimization
Optimize cost function Σi hi(w)
Can agents collaboratively learn, and yet protect their own data?*
* conditions apply
Outline
Motivation – distributed machine learning
Research problems
– Privacy-preserving distributed optimization
– Adversarial learning
– Robustness to adversarial samples
Adversarial Agents
Adversarial agents may send bogus information.
The learned parameters are impacted.
[Figure: peer-to-peer and parameter-server architectures, each with local costs h1(w), h2(w), h3(w) and a compromised agent]
Can good agents learn despite bad agents?
Key Idea
Need to filter out bad information: define “outliers” appropriately.
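One common way to make "outliers" precise is a coordinate-wise trimmed mean; this is a standard robust-aggregation filter offered as an illustration, not necessarily the exact definition used in this work:

```python
import numpy as np

def trimmed_mean(vectors, f):
    """Coordinate-wise trimmed mean: in each coordinate, drop the f largest
    and f smallest values, then average the rest.  Designed to tolerate up
    to f adversarial agents among the contributors."""
    v = np.sort(np.asarray(vectors, dtype=float), axis=0)
    return v[f:len(v) - f].mean(axis=0)

# Three honest gradients plus one bogus gradient from an adversarial agent
grads = [[1.0, 1.0], [1.1, 0.9], [0.9, 1.1], [100.0, -100.0]]
robust = trimmed_mean(grads, f=1)   # the extreme values are filtered out
```

A plain average of these four vectors would be dominated by the bogus entry, while the trimmed mean stays close to the honest gradients.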
Outline
Motivation – distributed machine learning
Research problems
– Privacy-preserving distributed optimization
– Adversarial learning
– Robustness to adversarial samples
Adversarial Samples
Machine learning seems to work well. If it seems too good to be true …
Adversarial Samples
Several researchers have shown that it is easy to fool a machine: a small, carefully chosen perturbation of an input can change its classification.
[Figure: an original sample and a perturbed adversarial sample]
Can we solve the problem?
Some interesting ideas seem promising in early evaluations … but they are not mature enough to report yet.
Summary
Achieving privacy/security in learning is non-trivial.
Some promising progress.
Plenty to keep us busy for a while …
disc.ece.illinois.edu
Collaborators
Lili Su (Ph.D. candidate)
Shripad Gade (Ph.D. candidate)
Nishad Phadke (BS thesis)
Brian Wang (BS thesis)
Professor Jungmin So (on sabbatical)
Other related effort – fault-tolerant control
Professor Aranya Chakrabortty (on sabbatical)
Parameter Server Architecture
Distributed gradient method: the parameter server holds the estimate W[0] and broadcasts it to the agents with local costs h1(w), h2(w), h3(w); each agent computes its local gradient, e.g. ∇h1(W[0]), ∇h2(W[0]), and returns it to the server.