  1. Privacy & Security in Machine Learning / Optimization Nitin Vaidya University of Illinois at Urbana-Champaign disc.ece.illinois.edu

  2. Research Interests  Distributed algorithms  Distributed shared memory systems  Distributed computations over wireless networks  Distributed optimization

  3. Privacy and Security for Machine Learning / Optimization

  4. Privacy and Security for Machine Learning / Optimization

  5. Outline  Motivation – distributed machine learning  Research problems – Privacy-preserving distributed optimization – Adversarial learning – Robustness to adversarial samples

  6. Example – Image Classification CIFAR-10 dataset

  7. Deep Neural Networks [diagram: inputs x_1, x_2, x_3 feeding layer-1 neurons with outputs a_1 to a_4, connected by weights W_111 ... W_132]

  8. [diagram: the same network annotated with input (x_1, x_2, x_3), parameters (W_111 ... W_132), and output scores: dog 0.8, cat 0.1, ship 0.09, car 0.01]

  9. Deep Neural Networks [diagram: inputs x_1, x_2, x_3, layer-1 neurons with outputs a_1 to a_4, weight W_132 highlighted]

  10. [diagram: neuron 3 of layer 1 computes s(x_2 W_131 + x_3 W_132 + b_13) from inputs x_2, x_3 with weights W_131, W_132]

  11. Activation s(z): Rectified Linear Unit (ReLU) [plot of s(z) versus z; the neuron output is s(x_2 W_131 + x_3 W_132 + b_13)]
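
As a concrete illustration of the neuron computation on slides 10-11, here is a minimal Python/NumPy sketch; the variable names mirror the slides' notation (x2, x3, W131, W132, b13), but the numeric values are made up for illustration and are not from the talk:

    import numpy as np

    def relu(z):
        # Rectified Linear Unit: s(z) = max(0, z)
        return np.maximum(0.0, z)

    # Illustrative values (not from the talk)
    x2, x3 = 0.5, -1.2      # inputs feeding neuron 3 of layer 1
    W131, W132 = 0.8, 0.3   # weights on those inputs
    b13 = 0.1               # bias of the neuron

    # Neuron output s(x2*W131 + x3*W132 + b13)
    a13 = relu(x2 * W131 + x3 * W132 + b13)
    print(a13)              # approximately 0.14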

  12. How to train your network  Given a machine structure  Parameters are the only free variables  Choose parameters that maximize accuracy

  13. How to train your network  Given a machine structure  Parameters are the only free variables  Choose parameters to maximize accuracy  Optimize a suitably defined cost function h(w) to find the right parameter vector w

  14. How to train your network [network diagram with inputs x_1, x_2, x_3 and parameters W]  Optimize a suitably defined cost function h(w) to find the right parameter vector w

  15. Cost Function h(w)  Consider input x  True classification y(x)  Machine classification a(x,w) using parameters w  Cost for input x = ||y(x) - a(x,w)||²  Total cost h(w) = Σ_x ||y(x) - a(x,w)||²
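
To make this cost concrete, the sketch below evaluates h(w) over a small batch, assuming a hypothetical predict(x, w) function that returns the machine's output a(x, w); the linear "network" and the data are placeholders, not from the talk:

    import numpy as np

    def h(w, xs, ys, predict):
        # Total cost h(w) = sum over inputs x of ||y(x) - a(x, w)||^2
        return sum(np.sum((y - predict(x, w)) ** 2) for x, y in zip(xs, ys))

    # Toy stand-in for the network: a linear map a(x, w) = w @ x
    predict = lambda x, w: w @ x
    xs = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]   # two training inputs
    ys = [np.array([1.0]), np.array([0.0])]             # their true outputs y(x)
    w = np.array([[0.9, 0.2]])                          # current parameters
    print(h(w, xs, ys, predict))                        # (1-0.9)^2 + (0-0.2)^2, about 0.05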

  16. Convex Optimization [surface plot of h(W) over W = (w_131, w_132, ...); image from Wikipedia]

  17.-20. Convex Optimization [animation: starting from W[0], the iterates W[1] = (4,3,...), W[2] = (3,2, ...), W[3] move step by step toward the minimum of h(W), W = (w_131, w_132, ...)]
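
The animation corresponds to plain gradient descent. Below is a minimal sketch; the step size β and the quadratic cost are illustrative choices, not taken from the talk:

    import numpy as np

    def gradient_descent(grad_h, W0, beta=0.1, iters=4):
        # Iterate W[k+1] = W[k] - beta * grad_h(W[k])
        W = np.asarray(W0, dtype=float)
        trajectory = [W.copy()]
        for _ in range(iters):
            W = W - beta * grad_h(W)
            trajectory.append(W.copy())
        return trajectory

    # Illustrative convex cost h(W) = ||W||^2, whose gradient is 2*W
    traj = gradient_descent(lambda W: 2 * W, W0=[4.0, 3.0])
    print(traj)   # iterates W[0], W[1], ... shrink toward the minimizer (0, 0)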

  21. So far ... [diagram: Training = optimize the cost function h(w) to obtain the machine's parameters w]

  22. Outline  Motivation – distributed machine learning  Research problems – Privacy-preserving distributed optimization – Adversarial learning – Robustness to adversarial samples

  23. Distributed Machine Learning  Data is distributed across different agents  Mobile users  Hospitals  Competing vendors

  24. Distributed Machine Learning  Data is distributed across different agents (Agent 1 ... Agent 4)  Mobile users  Hospitals  Competing vendors

  25. Distributed Machine Learning  Data is distributed across different agents  Collaborate to learn

  26. Distributed Machine Learning  Data is distributed across different agents  Collaborate to learn [diagram: agents with local cost functions h_1(w) ... h_4(w); training optimizes the aggregate cost Σ_i h_i(w) to obtain the parameters w]

  27. Distributed Optimization  30+ years of work  Recent interest due to machine learning applications

  28. Distributed Optimization  Different architectures  Peer-to-peer [diagram: agents with local costs h_1(w), h_2(w), h_3(w)]

  29. Distributed Optimization  Different architectures  Peer-to-peer  Parameter server [diagrams: agents with local costs h_1(w), h_2(w), h_3(w)]

  30. Distributed Gradient Method [diagram: peer-to-peer network; agent i holds local cost h_i(w) and local estimate W_i[0]]

  31. Distributed Gradient Method [diagram: agents exchange their current estimates W_1[0], W_2[0], W_3[0] with neighbors]

  32. Distributed Gradient Method [diagram: agent 3 averages the received estimates, T = ½ W_3[0] + ¼ W_1[0] + ¼ W_2[0]]

  33. Distributed Gradient Method [diagram: agent 3 then updates its estimate with a local gradient step, W_3[1] = T - β ∇h_3(T)]

  34. Works in incomplete networks too !! [same diagram: averaging step T = ½ W_3[0] + ¼ W_1[0] + ¼ W_2[0], followed by W_3[1] = T - β ∇h_3(T)]
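
A minimal sketch of this peer-to-peer distributed gradient method follows; the mixing weights, step size, and quadratic local costs are illustrative assumptions, not values from the talk:

    import numpy as np

    def distributed_gradient_round(W, A, grads, beta=0.1):
        # One round: each agent i averages its neighbors' estimates with weights A[i],
        # then takes a gradient step on its own local cost h_i.
        W = np.asarray(W, dtype=float)
        T = A @ W                                   # consensus step: T[i] = sum_j A[i, j] * W[j]
        return np.array([T[i] - beta * grads[i](T[i]) for i in range(len(grads))])

    # Illustrative local costs h_i(w) = (w - c_i)^2, so grad h_i(w) = 2 * (w - c_i)
    c = [1.0, 2.0, 3.0]
    grads = [lambda w, ci=ci: 2 * (w - ci) for ci in c]
    A = np.array([[0.50, 0.25, 0.25],               # row-stochastic mixing weights
                  [0.25, 0.50, 0.25],
                  [0.25, 0.25, 0.50]])
    W = np.array([0.0, 5.0, -2.0])                  # initial local estimates W_i[0]
    for _ in range(50):
        W = distributed_gradient_round(W, A, grads)
    print(W)   # estimates cluster near the minimizer of sum_i h_i(w) (w = 2);
               # a constant step size leaves a small residual bias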

  35. Parameter Server Architecture [diagram: agents send local gradients ∇h_1(W[0]), ∇h_2(W[0]), ∇h_3(W[0]); the server updates W[1] = W[0] - β Σ_i ∇h_i(W[0]) and broadcasts it]
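
The corresponding parameter-server update is even simpler. A minimal sketch, again with illustrative local costs and step size:

    import numpy as np

    def server_round(W, local_grads, beta=0.1):
        # Server broadcasts W, collects each agent's local gradient grad h_i(W),
        # and applies W <- W - beta * sum_i grad h_i(W)
        return W - beta * sum(g(W) for g in local_grads)

    # Illustrative local costs h_i(w) = (w - c_i)^2 with gradients 2 * (w - c_i)
    local_grads = [lambda w, ci=ci: 2 * (w - ci) for ci in (1.0, 2.0, 3.0)]
    W = 0.0
    for _ in range(30):
        W = server_round(W, local_grads)
    print(W)   # converges to the minimizer of sum_i h_i(w), i.e. w = 2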

  36. Outline  Motivation – distributed machine learning  Research problems – Privacy-preserving distributed optimization – Adversarial learning – Robustness to adversarial samples

  37. Privacy Challenge  Peers may learn each other's data [peer-to-peer diagram: h_1(w), h_2(w), h_3(w)]  Parameter server may learn data [parameter-server diagram]

  38. Privacy-Preserving Optimization  Can agents collaboratively learn, and yet protect their own data?  Optimize cost function Σ_i h_i(w)

  39. Peer-to-Peer Architecture [diagram: agents with local costs h_1(w), h_2(w), h_3(w) exchange estimates W_1[0], W_2[0], W_3[0]]

  40. Add Inter-Dependent Noise [diagram: agents perturb the estimates they share, sending W_1[0]+n_1, W_2[0]+n_2, W_3[0]+n_3 instead of the true values]

  41. Add Inter-Dependent Noise [same diagram, with the noise terms chosen so that n_1 + n_2 + n_3 = 0]

  42. Key Idea  Add correlated noise to the information exchanged between agents  Noise “cancels” over the network  But it can prevent a coalition of bad agents from learning information about others
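
A minimal sketch of the zero-sum (correlated) noise idea; the simple "subtract the mean" construction below is an illustrative way to generate noise that cancels, not necessarily the exact protocol from the talk:

    import numpy as np

    rng = np.random.default_rng(0)

    def zero_sum_noise(num_agents, scale=10.0):
        # Return noise terms n_1, ..., n_k that sum to zero
        n = rng.normal(scale=scale, size=num_agents)
        return n - n.mean()          # subtract the mean so the terms cancel exactly

    W = np.array([1.0, 5.0, -2.0])   # true local estimates W_i[0]
    n = zero_sum_noise(len(W))
    shared = W + n                   # what each agent actually transmits

    print(shared)                    # individual values are masked by large noise...
    print(shared.sum(), W.sum())     # ...but sums/averages over all agents are unchanged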

  43. Privacy-Preserving Optimization  Can agents collaboratively learn, and yet protect their own data?  Yes!* (* conditions apply)  Optimize cost function Σ_i h_i(w)

  44. Privacy-Preserving Optimization  Can agents collaboratively learn, and yet protect their own data?  Yes!* (* conditions apply)  Optimize cost function Σ_i h_i(w)

  45. Outline  Motivation – distributed machine learning  Research problems – Privacy-preserving distributed optimization – Adversarial learning – Robustness to adversarial samples

  46. Adversarial Agents  Adversarial agents may send bogus information  Learned parameters are impacted [peer-to-peer and parameter-server diagrams: h_1(w), h_2(w), h_3(w)]

  47. Adversarial Agents  Can good agents learn despite bad agents? [peer-to-peer and parameter-server diagrams]

  48. Adversarial Agents  Can good agents learn despite bad agents?  Yes! * [peer-to-peer and parameter-server diagrams]

  49. Key Idea  Need to filter bad information  Define “outliers” appropriately
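
One generic way to filter bad information at a parameter server is to replace the plain sum of gradients with an outlier-resistant aggregate such as a coordinate-wise trimmed mean; this is a standard sketch of the idea, not necessarily the specific filter developed in this work:

    import numpy as np

    def trimmed_mean(gradients, f):
        # Coordinate-wise trimmed mean: drop the f largest and f smallest values
        # in each coordinate before averaging (tolerates up to f bad agents)
        G = np.sort(np.asarray(gradients), axis=0)   # sort each coordinate across agents
        return G[f:len(gradients) - f].mean(axis=0)

    # Four honest gradients plus one bogus report from an adversarial agent
    grads = [np.array([1.0, 2.0]), np.array([1.1, 1.9]),
             np.array([0.9, 2.1]), np.array([1.0, 2.0]),
             np.array([100.0, -100.0])]              # adversarial
    print(np.mean(grads, axis=0))       # the plain average is dragged far off
    print(trimmed_mean(grads, f=1))     # the trimmed mean stays near the honest value, about [1, 2]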

  50. Outline  Motivation – distributed machine learning  Research problems – Privacy-preserving distributed optimization – Adversarial learning – Robustness to adversarial samples

  51. Adversarial Samples  Machine learning seems to work well  If it seems too good to be true …

  52. Adversarial Samples  Several researchers have shown that it is easy to fool a machine

  53.-54. [images: an original sample vs. an adversarial sample that fools the classifier]
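
The talk does not spell out how such adversarial samples are built; the fast-gradient-sign construction below is one standard approach, shown here on a toy linear model rather than a deep network:

    import numpy as np

    def fgsm_perturbation(x, grad_wrt_x, eps=0.1):
        # Fast-gradient-sign style perturbation: move each input coordinate by
        # eps in the direction that increases the model's loss the most
        return x + eps * np.sign(grad_wrt_x)

    # Toy linear "classifier": score(x) = w . x, predicted label = sign(score)
    w = np.array([1.0, -2.0, 0.5])
    x = np.array([0.3, 0.1, 0.2])                 # original sample, score = 0.2 > 0
    grad = -w                                     # gradient of the loss for the true (+) class
    x_adv = fgsm_perturbation(x, grad, eps=0.2)   # small, bounded perturbation
    print(w @ x, w @ x_adv)                       # the score flips sign: the classifier is fooled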

  55. Can we solve the problem?  Maybe … or not  Some interesting ideas seem promising in early evaluations … but are not mature enough to report yet

  56. Summary  Achieving privacy/security in learning is non-trivial  Some promising progress  Plenty to keep us busy for a while … disc.ece.illinois.edu

  57. Collaborators  Lili Su (Ph.D. candidate)  Shripad Gade (Ph.D. candidate)  Nishad Phadke (BS thesis)  Brian Wang (BS thesis)  Professor Jungmin So (on sabbatical)

  58. Collaborators  Lili Su (Ph.D. candidate)  Shripad Gade (Ph.D. candidate)  Nishad Phadke (BS thesis)  Brian Wang (BS thesis)  Professor Jungmin So (on sabbatical)  Other related effort -- fault-tolerant control  Professor Aranya Chakrabortty (on sabbatical)

  59. Summary  Achieving privacy/security in learning is non-trivial  Some promising progress  Plenty to keep us busy for a while … disc.ece.illinois.edu

  64. Parameter Server Architecture  Distributed gradient method [diagram: server holds W[0]; agents hold local costs h_1(w), h_2(w), h_3(w)]

  65. Distributed Optimization [diagram: the server broadcasts W[0] to the agents]

  66. Distributed Optimization [diagram: agents return local gradients ∇h_1(W[0]), ∇h_2(W[0]) to the server]
