
SLIDE 1

Privacy & Security in Machine Learning / Optimization

Nitin Vaidya
University of Illinois at Urbana-Champaign
disc.ece.illinois.edu

SLIDE 2

Research Interests

Distributed algorithms
• Distributed shared memory systems
• Distributed computations over wireless networks
• Distributed optimization

SLIDE 3

Privacy and Security for Machine Learning / Optimization

SLIDE 4

Privacy and Security for Machine Learning / Optimization

SLIDE 5

Outline

• Motivation – distributed machine learning
• Research problems
  – Privacy-preserving distributed optimization
  – Adversarial learning
  – Robustness to adversarial samples

SLIDE 6

SLIDE 7

Example – Image Classification

CIFAR-10 dataset

SLIDE 8

Deep Neural Networks

[Diagram: a feedforward network with inputs x1, x2, x3, a layer of neurons a1–a4, and weights such as W111, W132 on the connections]

SLIDE 9

[Diagram: the same network mapping an input image to output class probabilities – 0.8 dog, 0.1 cat, 0.09 ship, 0.01 car; the weights W111, W132, … are the parameters]
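
To make the input-to-output mapping concrete, here is a minimal sketch, not taken from the slides: a single fully connected layer followed by a softmax turns an input vector into class probabilities such as {dog, cat, ship, car}. The weights, sizes, and input values are made up for illustration.

```python
import numpy as np

def softmax(z):
    z = z - np.max(z)            # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()

# 3 input features, 4 output classes; W and b are the trainable parameters.
rng = np.random.default_rng(0)
W = rng.normal(size=(4, 3))
b = np.zeros(4)

x = np.array([0.5, -1.2, 0.3])   # input features x1, x2, x3
probs = softmax(W @ x + b)       # one probability per class
print(dict(zip(["dog", "cat", "ship", "car"], probs.round(2))))
```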

SLIDE 10

Deep Neural Networks

[Diagram: the network again – inputs x1, x2, x3, layer 1 weights (e.g., W132), and neurons a1–a4]

SLIDE 11

[Diagram: a single neuron (neuron 3 of layer 1) with inputs x2, x3 and weights W131, W132]

Neuron output: s(x2·W131 + x3·W132 + b13)

SLIDE 12

[Diagram: the same neuron, with a plot of the activation s(z) versus z]

Neuron output: s(x2·W131 + x3·W132 + b13), where s(z) is the activation function, e.g., the Rectified Linear Unit (ReLU): s(z) = max(0, z)
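
A minimal sketch of the single neuron on this slide: it forms a weighted sum of its inputs plus a bias, then applies the activation s(.), here a ReLU. The variable names mirror the slide's notation (w131, w132, b13); the numeric values are arbitrary.

```python
def relu(z):
    """s(z) = max(0, z), the Rectified Linear Unit."""
    return max(0.0, z)

def neuron(x2, x3, w131, w132, b13):
    z = x2 * w131 + x3 * w132 + b13   # weighted sum of inputs plus bias
    return relu(z)                     # apply the activation

print(neuron(x2=0.4, x3=-0.7, w131=1.5, w132=0.2, b13=0.1))
```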

SLIDE 13

How to train your dragon … network

• Given a machine structure
• Parameters are the only free variables
• Choose parameters that maximize accuracy

SLIDE 14

How to train your network

• Given a machine structure
• Parameters are the only free variables
• Choose parameters to maximize accuracy

Optimize a suitably defined cost function h(w) to find the right parameter vector w

SLIDE 15

How to train your network

[Diagram: the network with inputs x1, x2, x3, weights (e.g., W132), and neurons a1–a4 – the weights are the parameters w]

Optimize a suitably defined cost function h(w) to find the right parameter vector w

SLIDE 16

Cost Function h(w)

• Consider input x
• True classification y(x)
• Machine classification a(x,w) using parameters w
• Cost for input x = || y(x) − a(x,w) ||²
• Total cost h(w) = Σ_x || y(x) − a(x,w) ||²
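
A minimal sketch of this cost: the summed squared error between the true labels y(x) and the machine's outputs a(x, w) over the training set. The `predict` argument stands in for the network's forward pass and is purely hypothetical here.

```python
import numpy as np

def total_cost(w, inputs, labels, predict):
    """h(w) = sum over x of || y(x) - a(x, w) ||^2"""
    return sum(np.sum((y - predict(x, w)) ** 2) for x, y in zip(inputs, labels))

# Tiny usage example with a linear "machine" a(x, w) = w * x (purely illustrative).
inputs = [np.array([1.0]), np.array([2.0])]
labels = [np.array([2.0]), np.array([4.0])]
print(total_cost(2.0, inputs, labels, predict=lambda x, w: w * x))   # 0.0
```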

SLIDE 17

Convex Optimization

[Plot (from Wikipedia): a convex surface h(W) over the parameters w131, w132; W = (w131, w132, …)]

SLIDE 18

Convex Optimization

[Plot: W = (w131, w132, …); starting point W[0], then W[1] = (4, 3, ...)]

SLIDE 19

Convex Optimization

[Plot: iterates W[0], W[1] = (4, 3, ...), W[2] = (3, 2, …)]

SLIDE 20

Convex Optimization

[Plot: iterates W[0], W[1], W[2]; W = (w131, w132, …)]

SLIDE 21

Convex Optimization

[Plot: iterates W[0], W[1], W[2], W[3]; W = (w131, w132, …)]
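
A minimal sketch of the iteration behind slides 18–21: starting from W[0], repeatedly step against the gradient of h. The quadratic h used below is only a stand-in convex cost, and beta is the step size; none of these values come from the slides.

```python
import numpy as np

def grad_h(W):
    # Gradient of the stand-in cost h(W) = ||W - W*||^2 with minimizer W* = (1, 2).
    return 2.0 * (W - np.array([1.0, 2.0]))

W = np.array([4.0, 3.0])             # W[0]
beta = 0.1                           # step size
for k in range(50):
    W = W - beta * grad_h(W)         # W[k+1] = W[k] - beta * grad h(W[k])
print(W)                             # approaches the minimizer (1, 2)
```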

SLIDE 22

So far …

Training: optimize the cost function h(w) over the machine parameters w

SLIDE 23

Outline

• Motivation – distributed machine learning
• Research problems
  – Privacy-preserving distributed optimization
  – Adversarial learning
  – Robustness to adversarial samples

SLIDE 24

Distributed Machine Learning

• Data is distributed across different agents: mobile users, hospitals, competing vendors

SLIDE 25

Distributed Machine Learning

• Data is distributed across different agents: mobile users, hospitals, competing vendors

[Diagram: Agent 1, Agent 2, Agent 3, Agent 4, each holding its own data]

SLIDE 26

Distributed Machine Learning

• Data is distributed across different agents
• Collaborate to learn

SLIDE 27

Distributed Machine Learning

• Data is distributed across different agents
• Collaborate to learn

Training: optimize the cost function Σ_i hi(w) over the machine parameters w

[Diagram: four agents with local cost functions h1(w), h2(w), h3(w), h4(w)]
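
A minimal sketch, my own illustration rather than the slides': the global objective is the sum of the agents' local costs, Σ_i hi(w), even though no single agent holds all the data. The toy local data and squared-error local costs are made up.

```python
import numpy as np

def make_local_cost(local_data):
    """Each agent i defines h_i(w) from its own data only."""
    def h_i(w):
        return np.sum((local_data - w) ** 2)
    return h_i

agent_costs = [make_local_cost(np.array(d)) for d in ([1.0, 2.0], [4.0], [3.0, 5.0])]

def global_cost(w):
    return sum(h_i(w) for h_i in agent_costs)   # Σ_i h_i(w)

print(global_cost(3.0))
```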

SLIDE 28

Distributed Optimization

• 30+ years of work
• Recent interest due to machine learning applications

SLIDE 29

Distributed Optimization

Different architectures
• Peer-to-peer

[Diagram: three peers with local costs h1(w), h2(w), h3(w), connected to each other]

SLIDE 30

Distributed Optimization

Different architectures
• Peer-to-peer
• Parameter server

[Diagram: a peer-to-peer network of agents h1(w), h2(w), h3(w), and the same agents connected to a parameter server]

SLIDE 31

Distributed Gradient Method

[Diagram: agents with costs h1(w), h2(w), h3(w) and local iterates W1[0], W2[0], W3[0]]

SLIDE 32

Distributed Gradient Method

[Diagram: agents h1(w), h2(w), h3(w) exchange their local iterates W1[0], W2[0], W3[0] with neighbours]

SLIDE 33

Distributed Gradient Method

[Diagram: agents h1(w), h2(w), h3(w) with iterates W1[0], W2[0], W3[0]]

Agent 3 averages the received iterates: T = ½ W3[0] + ¼ W1[0] + ¼ W2[0]

SLIDE 34

Distributed Gradient Method

[Diagram: agents h1(w), h2(w), h3(w) with iterates W1[0], W2[0], W3[0]]

T = ½ W3[0] + ¼ W1[0] + ¼ W2[0]
W3[1] = T - 𝛃 ∇h3(T)

SLIDE 35

Works in incomplete networks too !!

[Diagram: agents h1(w), h2(w), h3(w) exchanging iterates over an incomplete graph]

T = ½ W3[0] + ¼ W1[0] + ¼ W2[0]
W3[1] = T - 𝛃 ∇h3(T)
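
A minimal sketch of one round of the distributed gradient method on these slides: an agent averages the iterates it receives from its neighbours (with weights like ½, ¼, ¼) and then takes a step along its own local gradient. The mixing weights, gradient, and numbers below are placeholders, not values from the talk.

```python
import numpy as np

def local_round(own_w, neighbour_ws, weights, grad_hi, beta):
    # Consensus step: T = weighted average of own and neighbours' iterates.
    T = weights[0] * own_w + sum(a * w for a, w in zip(weights[1:], neighbour_ws))
    # Local gradient step: W_i[k+1] = T - beta * grad h_i(T).
    return T - beta * grad_hi(T)

w3_next = local_round(
    own_w=np.array([3.0, 2.0]),
    neighbour_ws=[np.array([4.0, 3.0]), np.array([2.0, 1.0])],
    weights=[0.5, 0.25, 0.25],
    grad_hi=lambda W: 2.0 * (W - np.array([1.0, 1.0])),   # placeholder gradient
    beta=0.1,
)
print(w3_next)
```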

SLIDE 36

Parameter Server Architecture

[Diagram: agents h1(w), h2(w), h3(w) send gradients ∇h1(W[0]), ∇h2(W[0]), … to a parameter server holding W[0]]

W[1] = W[0] - 𝛃 Σ_i ∇hi(W[0])
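
A minimal sketch of the parameter-server update on this slide: the server broadcasts the current W, each agent returns its local gradient, and the server applies W[k+1] = W[k] - 𝛃 Σ_i ∇hi(W[k]). The two agents' gradient functions below are placeholders.

```python
import numpy as np

def server_round(W, local_grads, beta):
    """One server update: W - beta * sum of the agents' local gradients at W."""
    return W - beta * sum(g(W) for g in local_grads)

W0 = np.zeros(2)
grads = [lambda W: 2 * (W - 1.0),    # agent 1's grad h_1 (illustrative)
         lambda W: 2 * (W - 3.0)]    # agent 2's grad h_2 (illustrative)
W1 = server_round(W0, grads, beta=0.1)
print(W1)
```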

SLIDE 37

Outline

• Motivation – distributed machine learning
• Research problems
  – Privacy-preserving distributed optimization
  – Adversarial learning
  – Robustness to adversarial samples

SLIDE 38

Privacy Challenge

• Peers may learn each other's data
• Parameter server may learn data

[Diagram: peer-to-peer agents h1(w), h2(w), h3(w), and the same agents connected to a parameter server]

SLIDE 39

Privacy-Preserving Optimization

Optimize the cost function Σ_i hi(w)

Can agents collaboratively learn, and yet protect their own data?

SLIDE 40

Peer-to-Peer Architecture

[Diagram: agents h1(w), h2(w), h3(w) exchange local iterates W1[0], W2[0], W3[0]]

SLIDE 41

Add Inter-Dependent Noise

[Diagram: agents h1(w), h2(w), h3(w) exchange perturbed iterates W1[0]+n1, W2[0]+n2, W3[0]+n3]

SLIDE 42

Add Inter-Dependent Noise

[Diagram: agents h1(w), h2(w), h3(w) exchange perturbed iterates W1[0]+n1, W2[0]+n2, W3[0]+n3]

n1 + n2 + n3 = 0
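
A minimal sketch of the noise construction on these slides: each agent perturbs what it sends with a noise term ni, chosen so that n1 + n2 + … + nk = 0 and the noise cancels over the network. The centering trick below is one simple way to obtain such zero-sum noise; the actual scheme in the work may differ.

```python
import numpy as np

def zero_sum_noise(num_agents, dim, scale=1.0, rng=np.random.default_rng()):
    """Draw Gaussian noise vectors and recenter them so they sum to zero."""
    n = rng.normal(scale=scale, size=(num_agents, dim))
    n -= n.mean(axis=0)            # enforce n_1 + ... + n_k = 0
    return n

noise = zero_sum_noise(num_agents=3, dim=2)
print(noise.sum(axis=0))           # ~ [0, 0]
```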

SLIDE 43

Key Idea

• Add correlated noise to the information exchanged between agents
• The noise "cancels" over the network
• But it can prevent a coalition of bad agents from learning information about others

SLIDE 44

Privacy-Preserving Optimization

Optimize the cost function Σ_i hi(w)

Can agents collaboratively learn, and yet protect their own data?

Yes!*

* conditions apply

SLIDE 45

Privacy-Preserving Optimization

Optimize the cost function Σ_i hi(w)

Can agents collaboratively learn, and yet protect their own data?

Yes!*

* conditions apply

SLIDE 46

SLIDE 47

Outline

• Motivation – distributed machine learning
• Research problems
  – Privacy-preserving distributed optimization
  – Adversarial learning
  – Robustness to adversarial samples

SLIDE 48

Adversarial Agents

• Adversarial agents may send bogus information
• Learned parameters are impacted

[Diagram: peer-to-peer agents h1(w), h2(w), h3(w), and the same agents connected to a parameter server]

SLIDE 49

Adversarial Agents

[Diagram: peer-to-peer agents h1(w), h2(w), h3(w), and the same agents connected to a parameter server]

Can good agents learn despite bad agents?

SLIDE 50

Adversarial Agents

[Diagram: peer-to-peer agents h1(w), h2(w), h3(w), and the same agents connected to a parameter server]

Can good agents learn despite bad agents?

Yes!*

SLIDE 51

Key Idea

• Need to filter bad information
• Define "outliers" appropriately
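
A minimal sketch of one common way to "filter bad information": a coordinate-wise trimmed mean of the received gradients, which discards the f largest and f smallest values per coordinate before averaging. This is an illustration of the general idea, not necessarily the specific outlier filter used in this work.

```python
import numpy as np

def trimmed_mean(gradients, f):
    """gradients: array of shape (num_agents, dim); f: values trimmed per side."""
    g = np.sort(np.asarray(gradients), axis=0)   # sort each coordinate across agents
    return g[f:len(g) - f].mean(axis=0)          # average the remaining values

grads = np.array([[0.1, 0.2],
                  [0.2, 0.1],
                  [0.15, 0.25],
                  [100.0, -100.0]])              # last agent sends bogus values
print(trimmed_mean(grads, f=1))                  # close to the honest average
```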

SLIDE 52

SLIDE 53

SLIDE 54

Outline

• Motivation – distributed machine learning
• Research problems
  – Privacy-preserving distributed optimization
  – Adversarial learning
  – Robustness to adversarial samples

SLIDE 55

Adversarial Samples

• Machine learning seems to work well
• If it seems too good to be true …

SLIDE 56

Adversarial Samples

• Several researchers have shown that it is easy to fool a machine
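
A minimal sketch of how such a fooling input can be built, using a generic FGSM-style perturbation as an illustration only (the talk does not name a specific attack): nudge the input in the direction that increases the loss by a small amount epsilon. The gradient function is a hypothetical stand-in for the model's gradient with respect to its input.

```python
import numpy as np

def adversarial_sample(x, loss_gradient_wrt_input, epsilon=0.01):
    """Perturb x slightly in the direction that increases the loss."""
    return x + epsilon * np.sign(loss_gradient_wrt_input(x))

# Toy usage with a made-up gradient; a real attack would use the model's gradient.
x = np.array([0.2, 0.7, 0.5])
x_adv = adversarial_sample(x, lambda x: np.array([1.0, -2.0, 0.5]))
print(x_adv)
```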

SLIDE 57

SLIDE 58

[Images: original sample vs. adversarial sample]

SLIDE 59

Can we solve the problem?

Maybe … or not

• Some interesting ideas that seem promising in early evaluations … but not mature enough to report yet

SLIDE 60

Summary

• Achieving privacy/security in learning is non-trivial
• Some promising progress
• Plenty to keep us busy for a while …

disc.ece.illinois.edu

SLIDE 61

Collaborators

• Lili Su (Ph.D. candidate)
• Shripad Gade (Ph.D. candidate)
• Nishad Phadke (BS thesis)
• Brian Wang (BS thesis)
• Professor Jungmin So (on sabbatical)

SLIDE 62

Collaborators

• Lili Su (Ph.D. candidate)
• Shripad Gade (Ph.D. candidate)
• Nishad Phadke (BS thesis)
• Brian Wang (BS thesis)
• Professor Jungmin So (on sabbatical)

Other related effort -- fault-tolerant control
• Professor Aranya Chakrabortty (on sabbatical)

SLIDE 63

Summary

• Achieving privacy/security in learning is non-trivial
• Some promising progress
• Plenty to keep us busy for a while …

disc.ece.illinois.edu

SLIDE 64

SLIDE 65

SLIDE 66

SLIDE 67

SLIDE 68

Parameter Server Architecture

• Distributed gradient method

[Diagram: agents h1(w), h2(w), h3(w) connected to a parameter server holding W[0]]

SLIDE 69

Distributed Optimization

[Diagram: the parameter server sends W[0] to the agents h1(w), h2(w), h3(w)]

SLIDE 70

Distributed Optimization

[Diagram: the agents return local gradients ∇h1(W[0]), ∇h2(W[0]), … to the parameter server]