

SLIDE 1

Privacy-Aware Machine Learning Systems

Borja Balle

SLIDE 2

Data is the New Oil

The Economist, May 2017

SLIDE 3

The Importance of (Data) Privacy

Universal Declaration of Human Rights, Article 12. No one shall be subjected to arbitrary interference with his privacy, family, home or correspondence, nor to attacks upon his honour and reputation. Everyone has the right to the protection of the law against such interference or attacks.

#DeleteFacebook

[Image: first page of the Official Journal of the European Union (4.5.2016, L 119/1): Regulation (EU) 2016/679 of the European Parliament and of the Council of 27 April 2016 on the protection of natural persons with regard to the processing of personal data and on the free movement of such data, and repealing Directive 95/46/EC (General Data Protection Regulation).]

SLIDE 4

Anonymization Fiascos

  • “Only You, Your Doctor, and Many Others May Know”. L. Sweeney. Technology Science, 2015
  • “Robust De-anonymization of Large Datasets (How to Break Anonymity of the Netflix Prize Dataset)”. A. Narayanan & V. Shmatikov. Security and Privacy, 2008
  • Vijay Pandurangan. tech.vijayp.ca, 2014

SLIDE 5

Privacy Risks in Machine Learning

Abstract—We quantitatively investigate how machine learning models leak information about the individual data records on which they were trained. We focus on the basic membership inference attack: given a data record and black-box access to a model, determine if the record was in the model's training dataset. To perform membership inference against a target model, we make adversarial use of machine learning and train our own inference model to recognize differences in the target model's predictions on the inputs that it trained on versus the inputs that it did not train on. We empirically evaluate our inference techniques on classification models trained by commercial "machine learning as a service" providers such as Google and Amazon. Using realistic datasets and classification tasks, including a hospital discharge dataset whose membership is sensitive from the privacy perspective, we show that these models can be vulnerable to membership inference attacks. We then investigate the factors that influence this leakage and evaluate mitigation strategies.

Membership Inference Attacks Against Machine Learning Models

Reza Shokri (Cornell Tech), Marco Stronati* (INRIA), Congzheng Song (Cornell), Vitaly Shmatikov (Cornell Tech)

Security and Privacy, 2017

This paper presents exposure, a simple-to-compute metric that can be applied to any deep learning model for measuring the memorization of secrets. Using this metric, we show how to extract those secrets efficiently using black-box API access. Further, we show that unintended memorization occurs early, is not due to overfitting, and is a persistent issue across different types of models, hyperparameters, and training strategies. We experiment with both real-world models (e.g., a state-of-the-art translation model) and datasets (e.g., the Enron email dataset, which contains users' credit card numbers) to demonstrate both the utility of measuring exposure and the ability to extract secrets. Finally, we consider many defenses, finding some ineffective (like regularization), and others to lack guarantees. However, by instantiating our own differentially-private recurrent model, we validate that by appropriately investing in the use of state-of-the-art techniques, the problem can be resolved, with high utility.

The Secret Sharer: Measuring Unintended Neural Network Memorization & Extracting Secrets

Nicholas Carlini (University of California, Berkeley), Chang Liu (University of California, Berkeley), Jernej Kos (National University of Singapore), Úlfar Erlingsson (Google Brain), Dawn Song (University of California, Berkeley)

ArXiv, 2018

SLIDE 6

What Makes Privacy Difficult?

  • High-dimensional data
  • Side information

SLIDE 7

Privacy Enhancing Technologies (PETS)

  • Initially a sub-field of applied cryptography

– Now percolating into databases, machine learning, statistics, etc.

  • Privacy-preserving release (e.g. differential privacy)

– Release statistics/models/datasets while preventing reverse-engineering of the original data

  • Privacy-preserving computation (e.g. secure multi-party computation)

– Perform computations on multi-party data without ever exchanging the inputs in plaintext

SLIDE 8

Privacy-Preserving Release

[Diagram: individuals' data is collected by a Trusted Curator, which releases results across a Privacy Barrier.]

SLIDE 9

Differential Privacy: Informal Definition

[Diagram: a data analysis algorithm is made randomized; from its output alone, can an adversary tell whether the dataset contained Bart or Milhouse?]

SLIDE 10

Differential Privacy

[DMNS'06; Gödel Prize 2017]

A randomized algorithm A : Xⁿ → Y satisfies differential privacy with parameter ε if for any pair of datasets D and D' differing in a single row and for any possible output o, the following inequality is satisfied:

ℙ[A(D) = o] ≤ e^ε · ℙ[A(D') = o]

Approximate differential privacy, with parameters (ε, δ), relaxes this to hold for any set of outputs E:

ℙ[A(D) ∈ E] ≤ e^ε · ℙ[A(D') ∈ E] + δ
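To make the definition concrete, here is a minimal sketch (not from the talk) of the Laplace mechanism for a counting query, the textbook way to satisfy ε-differential privacy; the dataset, predicate, and parameter values below are purely illustrative.

```python
import numpy as np

def laplace_count(data, predicate, epsilon):
    """Release a counting query with epsilon-differential privacy.

    Changing a single row changes the true count by at most 1 (sensitivity 1),
    so Laplace(1/epsilon) noise satisfies the inequality above.
    """
    true_count = sum(1 for row in data if predicate(row))
    return true_count + np.random.laplace(scale=1.0 / epsilon)

# Neighbouring datasets D and D' differ in a single row.
D = [{"age": 34}, {"age": 51}, {"age": 29}]
D_prime = D[:-1] + [{"age": 62}]

eps = 0.5
print(laplace_count(D, lambda r: r["age"] > 30, eps))
print(laplace_count(D_prime, lambda r: r["age"] > 30, eps))
```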

SLIDE 11

Fundamental Properties of Differential Privacy

  • Compositionality

– Enables rigorous engineering through modularity (see the composition sketch after this list)

  • Quantifiable

– Amenable to mathematical analysis, continuous instead of black-or-white

  • Robust to side knowledge

– Protects even in the event of collusions and side information
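As an illustration of compositionality (a hedged sketch, not code from the talk): splitting a total budget ε across two Laplace-noised sub-queries keeps the overall analysis ε-DP by sequential composition. The helper names and the clipping bound are assumptions made for the example.

```python
import numpy as np

def noisy_count(values, epsilon):
    # Sensitivity-1 count released with the Laplace mechanism (epsilon-DP).
    return len(values) + np.random.laplace(scale=1.0 / epsilon)

def noisy_clipped_sum(values, clip, epsilon):
    # Clipping each value to [0, clip] bounds the sensitivity by clip.
    return sum(min(max(v, 0.0), clip) for v in values) + np.random.laplace(scale=clip / epsilon)

def private_mean(values, clip=100.0, eps_total=1.0):
    # Sequential composition: eps/2 for the count plus eps/2 for the sum
    # gives an eps_total-DP estimate of the mean overall.
    n = noisy_count(values, eps_total / 2)
    s = noisy_clipped_sum(values, clip, eps_total / 2)
    return s / max(n, 1.0)

print(private_mean([34.0, 51.0, 29.0, 62.0]))
```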

SLIDE 12

Multi-Party Data Analysis

[Table: the same individuals' records vertically partitioned across parties: medical data (treatment, outcome), census data, and financial data, each party holding a different subset of attributes (Attr. 1-2, 4-5, 7-8, ...).]

SLIDE 13

The Trusted Party “Solution”

Trusted Party: receives plaintext data from each party over a secure channel, runs the algorithm, and returns the result to the parties.

The Trusted Party assumption:

  • Introduces a single point of failure (with disastrous consequences)
  • Relies on weak incentives (especially when private data is valuable)
  • Requires agreement between all data providers

=> Useful but unrealistic. Maybe it can be simulated?

SLIDE 14

Secure Multi-Party Computation (MPC)

Public: the function f(x₁, x₂, …, x_p) = y

Private: x_i (party i's input)

Goal: compute f in a way that each party learns y (and nothing else!)

Tools: Oblivious Transfer (OT), Garbled Circuits (GC), Homomorphic Encryption (HE), etc.

Guarantees: honest-but-curious adversaries, malicious adversaries, computationally bounded adversaries, collusions

SLIDE 15

Challenges and Trade-offs

  • Protocols: out of the box vs. tailored
  • Threat models: semi-honest vs. malicious
  • Interaction: off-line vs. on-line
  • Trusted external parties: speed vs. privacy
  • Scalability: amount of data, dimensions, # parties
SLIDE 16

In This Talk…

Part I: Privacy-Preserving Distributed Linear Regression on High-Dimensional Data (PETS 2017, with Adria Gascon, Phillipp Schoppmann, Mariana Raykova, Jack Doerner, Samee Zahur, and David Evans)

Part II: Private Nearest Neighbors Classification in Federated Databases (Preprint, with Adria Gascon and Phillipp Schoppmann)

SLIDE 17

Linear Regression - Overview

Features:

  • Vertically partitioned data
  • Scalable to millions of records and hundreds of dimensions
  • Open source implementation

https://github.com/schoppmp/linreg-mpc

Tools:

  • Several standard MPC constructions (GC, OT, SS, …)
  • Efficient private inner product protocols
  • Conjugate gradient descent robust to fixed-point encodings
SLIDE 18

Functionality: Multi-Party Linear Regression

Training data: Y ∈ ℝⁿ, X = [X₁ X₂] ∈ ℝⁿˣᵈ

Private inputs: Party 1: X₂; Party 2: X₁, Y

Linear regression:

  min_{θ ∈ ℝᵈ} ‖Y − Xθ‖² + λ‖θ‖²     (optimization)

  (XᵀX + λI) θ = XᵀY                  (closed-form solution)
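For reference, a plaintext sketch of the functionality the parties want to evaluate jointly (ordinary ridge regression on the vertically partitioned data); this is not the MPC protocol itself, and all names and data below are illustrative.

```python
import numpy as np

def ridge_functionality(X1, X2, Y, lam):
    """Plaintext reference for the target functionality.

    X1, X2: vertically partitioned feature blocks held by different parties.
    Returns theta solving (X^T X + lam * I) theta = X^T Y.
    """
    X = np.hstack([X1, X2])                 # n x d joint design matrix
    d = X.shape[1]
    A = X.T @ X + lam * np.eye(d)
    b = X.T @ Y
    return np.linalg.solve(A, b)

# Toy example: n=100 records split into d1=3 and d2=2 features.
rng = np.random.default_rng(0)
X1, X2 = rng.normal(size=(100, 3)), rng.normal(size=(100, 2))
Y = rng.normal(size=100)
print(ridge_functionality(X1, X2, Y, lam=0.1))
```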

SLIDE 19

Aggregation and Solving Phases

Aggregation: A = XᵀX + λI, b = XᵀY, at cost O(nd²), where

  XᵀX = [ X₁ᵀX₁   X₁ᵀX₂ ]
        [ X₂ᵀX₁   X₂ᵀX₂ ]     (cross-party products)

Solving: θ = A⁻¹b

  • Exact solver (e.g. Cholesky): O(d³)
  • Approximate iterative solver (e.g. k-CGD): O(kd²)
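A plaintext sketch of this decomposition (for intuition only): the diagonal blocks of XᵀX are computed locally, only the cross-party block needs a secure inner-product protocol, and a k-step CGD solver like the one later run inside the garbled circuit finishes the job. Function names are illustrative, not the paper's implementation.

```python
import numpy as np

def aggregate(X1, X2, Y, lam):
    # Local blocks: each party computes these on its own data.
    A11 = X1.T @ X1
    A22 = X2.T @ X2
    # Cross-party block: the only part that needs a secure inner-product protocol.
    A12 = X1.T @ X2
    d = A11.shape[0] + A22.shape[0]
    A = np.block([[A11, A12], [A12.T, A22]]) + lam * np.eye(d)
    b = np.hstack([X1, X2]).T @ Y   # X^T Y; in the protocol Y is held by one party
    return A, b

def cgd(A, b, k):
    # k iterations of conjugate gradient descent for the PSD system A theta = b.
    theta = np.zeros_like(b)
    r = b - A @ theta
    p = r.copy()
    for _ in range(k):
        Ap = A @ p
        alpha = (r @ r) / (p @ Ap)
        theta += alpha * p
        r_new = r - alpha * Ap
        beta = (r_new @ r_new) / (r @ r)
        p = r_new + beta * p
        r = r_new
    return theta

rng = np.random.default_rng(0)
X1, X2, Y = rng.normal(size=(50, 3)), rng.normal(size=(50, 2)), rng.normal(size=50)
A, b = aggregate(X1, X2, Y, lam=0.1)
print(np.linalg.norm(A @ cgd(A, b, k=20) - b))   # small residual
```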

SLIDE 20

Protocol Overview

Parties: Data Providers (DPs), a Crypto Provider (CrP), and a Computing Provider (CoP).

  • 1. CrP distributes correlated randomness
  • 2. DPs run multiple inner product protocols to get additive shares of (A, b)
  • 3. CoP gets the GC for solving the linear system from CrP
  • 4. DPs send garbled shares of (A, b) to CoP
  • 5. CoP executes the GC and returns the solution to the DPs

Steps 1-2 form the Aggregation Phase; steps 3-5 form the Solving Phase. Alternative: CrP and CoP can be simulated by non-colluding parties.

SLIDE 21

Aggregation Phase – Arithmetic Secret Sharing

The cross-party block X₁ᵀX₂ (matrix product) reduces to inner products between columns: f(x₁, x₂) = ⟨x₁, x₂⟩.

Correlated randomness (obtained via oblivious transfer or from a 3rd party): values a, c for Party 1 and b, d for Party 2 with a · b = c + d.

Party 1 (input x₁, holds a, c): receives x₂ − b, sends x₁ + a, outputs s₁ = x₁ · (x₂ − b) − c
Party 2 (input x₂, holds b, d): receives x₁ + a, sends x₂ − b, outputs s₂ = (x₁ + a) · b − d

Correctness: s₁ + s₂ = x₁ · x₂.
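A small simulation of this correlated-randomness trick for a single product (a sketch, not the paper's implementation); the same recipe is applied entry-wise and summed to obtain inner products. The modulus and helper names are assumptions made for the example.

```python
import numpy as np

Q = 2**31 - 1        # illustrative modulus; the protocol works over a ring/field Z_q

def correlated_randomness(rng):
    # A dealer (oblivious transfer or a 3rd party) samples a, b, c, d with a*b = c + d (mod Q)
    # and hands (a, c) to Party 1 and (b, d) to Party 2.
    a, b, c = rng.integers(0, Q, size=3)
    d = (a * b - c) % Q
    return (a, c), (b, d)

def shared_product(x1, x2, rng):
    (a, c), (b, d) = correlated_randomness(rng)
    msg_to_p1 = (x2 - b) % Q        # Party 2 sends its input masked by b
    msg_to_p2 = (x1 + a) % Q        # Party 1 sends its input masked by a
    s1 = (x1 * msg_to_p1 - c) % Q   # Party 1's additive share of x1*x2
    s2 = (msg_to_p2 * b - d) % Q    # Party 2's additive share of x1*x2
    return s1, s2

rng = np.random.default_rng(1)
x1, x2 = 7, 12
s1, s2 = shared_product(x1, x2, rng)
assert (s1 + s2) % Q == (x1 * x2) % Q
```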
SLIDE 22

Solving Phase – Garbled Circuits

PSD linear system: Aθ = b, with A = Σᵢ Aᵢ and b = Σᵢ bᵢ, where (Aᵢ, bᵢ) is party i's input (arithmetic share). Solved with Conjugate Gradient Descent (CGD).

Garbled circuits: each Boolean gate's truth table (e.g. an AND gate mapping b₁ = 0, b₂ = 1 to b_out = 0) is encrypted into a garbled gate that maps input wire keys (Key₁, Key₂) to the output key Key_out (or fails).

Year | Device / Paper | 32-bit floating-point multiplication (ms)
1961 | IBM 1620E | 17.7
1980 | Intel 8086 CPU (software) | 1.6
1980 | Intel 8087 FPU | 0.019
2015 | Pullonen et al. @ FC&DS | 38.2
2015 | Demmler et al. @ CCS | 9.2

SLIDE 23

Fixed-point + Conjugate Gradient Descent

Fixed-point representation: total number of bits = b_i + b_f + 1, where b_i = number of integer bits and b_f = number of fractional bits.

[Plots: residual ‖Aθₜ − b‖ (from 10⁻¹⁸ to 10⁻³) vs. CGD iteration t, for b_i = 8, 9, 10, 11 and floating point; left panel: textbook CGD, right panel: normalized CGD.]
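A sketch of the fixed-point arithmetic assumed here (helper names and the choice of b_f are illustrative), showing the encode/decode step and the truncation after multiplication that CGD has to tolerate inside the garbled circuit.

```python
def to_fixed(x, bf):
    # Encode a real number with bf fractional bits; inside the garbled circuit
    # this integer is what the Boolean wires represent.
    return int(round(x * (1 << bf)))

def from_fixed(v, bf):
    return v / (1 << bf)

def fixed_mul(u, v, bf):
    # The product of two fixed-point numbers has 2*bf fractional bits,
    # so it is truncated back to bf bits; CGD must tolerate this rounding.
    return (u * v) >> bf

bf = 16
x, y = 3.25, -1.5
print(from_fixed(fixed_mul(to_fixed(x, bf), to_fixed(y, bf), bf), bf))  # ~ -4.875
```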

SLIDE 24

Experimental Results

Solving phase (RMSE and running time):

Name | d | n | Optimal RMSE | FP-CGD (32 bits) time | FP-CGD RMSE | Cholesky (32 bits) time | Cholesky RMSE
Student Performance | 30 | 395 | 4.65 | 19s | 4.65 (-0.0%) | 5s | 4.65 (-0.0%)
Auto MPG | 7 | 398 | 3.45 | 2s | 3.45 (-0.0%) | 0s | 3.45 (-0.0%)
Communities and Crime | 122 | 1994 | 0.14 | 4m27s | 0.14 (0.3%) | 4m35s | 0.14 (-0.0%)
Wine Quality | 11 | 4898 | 0.76 | 3s | 0.76 (-0.0%) | 0s | 0.80 (4.2%)
Bike Sharing Dataset | 12 | 17379 | 145.06 | 4s | 145.07 (0.0%) | 1s | 145.07 (0.0%)
Blog Feedback | 280 | 52397 | 31.89 | 24m5s | 31.90 (0.0%) | 53m24s | 32.19 (0.9%)
CT slices | 384 | 53500 | 8.31 | 44m46s | 8.34 (0.4%) | 2h13m31s | 8.87 (6.7%)
Year Prediction MSD | 90 | 515345 | 9.56 | 4m16s | 9.56 (0.0%) | 3m50s | 9.56 (0.0%)
Gas sensor array | 16 | 4208261 | 90.33 | 48s | 95.05 (5.2%) | 42s | 95.06 (5.2%)

Aggregation phase (running time, OT-based vs. trusted initializer (TI), by number of parties):

n | d | 2 parties OT | 2 parties TI | 3 parties OT | 3 parties TI | 5 parties OT | 5 parties TI
5·10^4 | 20 | 1m50s | 1s | 1m32s | 2s | 1m7s | 2s
5·10^4 | 100 | 42m12s | 25s | 34m39s | 32s | 24m58s | 37s
5·10^5 | 20 | 18m18s | 15s | 14m29s | 18s | 12m10s | 21s
5·10^5 | 100 | 7h3m56s | 4m47s | 5h20m52s | 6m1s | 4h17m8s | 6m58s
1·10^6 | 100 | - | 10m1s | - | 12m42s | - | 14m48s
1·10^6 | 200 | - | 39m16s | - | 49m56s | - | 59m22s

SLIDE 25

Related Work

Ref | Crypto | Solver | n (max) | d (max) | Iterative | Bottleneck
[1] | HE | Newton | 50K | 22 | Local (40) | Computation
[2] | HE+GC | Cholesky | 10M | 14 | No | Both
[3] | SS | CGD | 10K | 10 | Network (10) | Network
* (this work) | SS+GC | CGD | 1M | 500 | Local (20) | Computation
[4] | HE | GD-VWT | 97 | 8 | Local (4) | Computation
[5] | SS | SGD | 1M | 784 | Network (100-1000) | Network

[1] Hall et al. (2011). Secure multiple linear regression based on homomorphic encryption. Journal of Official Statistics.
[2] Nikolaenko et al. (2013). Privacy-preserving ridge regression on hundreds of millions of records. In Security and Privacy (SP).
[3] Bogdanov et al. (2016). Rmind: a tool for cryptographically secure statistical analysis. IEEE Transactions on Dependable and Secure Computing.
[4] Esperanca et al. (2017). Encrypted Accelerated Least Squares Regression. In AISTATS.
[5] Mohassel et al. (2017). SecureML: A System for Scalable Privacy-Preserving Machine Learning. In Security and Privacy (SP).

SLIDE 26

Linear Regression - Conclusion

  • Full system is accurate and fast, available as open source
  • Scalability requires hybrid MPC protocols and non-trivial engineering
  • Robust fixed-point CGD inside GC has many other applications
  • Security against malicious adversaries
  • Classification with quadratic loss
  • Kernel ridge regression
  • Differential privacy on the covariance / at the output
  • Models without a closed-form solution (eg. logistic regression, DNN)
  • Library of re-usable ML components, complete data science pipeline

Summary / Extensions / Future Work

SLIDE 27

In This Talk…

Part I: Privacy-Preserving Distributed Linear Regression on High-Dimensional Data (PETS 2017, with Adria Gascon, Phillipp Schoppmann, Mariana Raykova, Jack Doerner, Samee Zahur, and David Evans)

Part II: Private Nearest Neighbors Classification in Federated Databases (Preprint, with Adria Gascon and Phillipp Schoppmann)

SLIDE 28

Document Classification - Overview

[Diagram: a client with a query document "?" interacts through a private computation with a federated database held by Parties 1-3.]

Setup:

  • Federated database held by multiple (untrusting) parties
  • Database and the client's document should be kept private
  • k-NN classification with TF-IDF features and cosine similarity

Contributions:

  • Multi-party computational DP protocol
  • DP computation of IDFs
  • MPC protocol for sparse inner products
  • Privacy against arbitrary collusions
SLIDE 29

Document Classification with Nearest Neighbors

Each document d is represented as ψ_d ∈ ℝ^{|V|}, with ψ_d(v) = tf_d(v) · idf_Z(v) and idf_Z(v) ≈ log(|Z| / |Z_v|), where Z is the document dataset, Z_v the documents in Z containing term v, and V the vocabulary.

score(d, x) = ⟨ψ_d, ψ_x⟩ / (‖ψ_d‖ ‖ψ_x‖)

  • 1. For each x in Z compute the score
  • 2. Label d by majority vote over the top k scores
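A plaintext sketch of this classification functionality, i.e. what the secure protocol is meant to compute; the tokenizer, labels, and helper names are illustrative and not the paper's code.

```python
import math
from collections import Counter

def tf(doc):
    # Term frequencies of a whitespace-tokenized document.
    return Counter(doc.lower().split())

def idf(corpus, vocab):
    # idf_Z(v) ~ log(|Z| / |Z_v|), as defined on the slide.
    return {v: math.log(len(corpus) / max(1, sum(v in tf(x) for x in corpus)))
            for v in vocab}

def score(d, x, idfs):
    # Cosine similarity between the TF-IDF vectors of documents d and x.
    pd = {v: c * idfs.get(v, 0.0) for v, c in tf(d).items()}
    px = {v: c * idfs.get(v, 0.0) for v, c in tf(x).items()}
    dot = sum(w * px.get(v, 0.0) for v, w in pd.items())
    norm = math.sqrt(sum(w * w for w in pd.values())) * math.sqrt(sum(w * w for w in px.values()))
    return dot / norm if norm else 0.0

def knn_label(d, corpus, labels, k, idfs):
    # 1. Score d against every document in the corpus; 2. majority vote over the top k.
    top = sorted(range(len(corpus)), key=lambda i: score(d, corpus[i], idfs), reverse=True)[:k]
    return Counter(labels[i] for i in top).most_common(1)[0][0]

corpus = ["private machine learning", "secure multiparty computation", "machine learning models leak"]
labels = ["ml", "crypto", "ml"]
vocab = {w for doc in corpus for w in doc.split()}
print(knn_label("do machine learning models leak data", corpus, labels, k=2, idfs=idf(corpus, vocab)))
```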
SLIDE 30

Secret Sharing Baseline

[Diagram: the plaintext TF-IDF matrix for Z is split into additive shares held by Party 1 and Party 2 (the owners of Z); the client holds the plaintext TF vector for its document d; vector aggregation and top-k selection run in standard MPC (e.g. SPDZ).]

Pros: shares can be pre-computed; scores reduce to an inner product protocol
Cons: additive shares destroy sparsity (see the illustration below)
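A short illustration of the "Cons" above (a sketch with an illustrative modulus): additively sharing a sparse vector produces uniformly random, hence dense, shares.

```python
import numpy as np

rng = np.random.default_rng(3)
Q = 2**16                                   # illustrative modulus for the shares
v = np.zeros(10, dtype=np.int64)
v[3] = 7                                    # a sparse TF-IDF-style vector
share1 = rng.integers(0, Q, size=10)        # uniformly random mask held by Party 1
share2 = (v - share1) % Q                   # Party 2's share
assert np.array_equal((share1 + share2) % Q, v)
print(np.count_nonzero(share1), np.count_nonzero(share2))  # both dense with high probability
```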

SLIDE 31

Sparse Protocol

1. Compute IDFs on dataset Z using differential privacy

  • Implement the Laplace and Exponential mechanisms inside an MPC protocol (e.g. SPDZ). Yields Computational Differential Privacy guarantees.

2. Use a custom sparse matrix-vector multiplication protocol

  • Run between the client and each data provider
  • Produces arithmetic shares as output

3. Aggregate shares to get scores and select the top k

  • Same as in the baseline protocol
SLIDE 32

Computing IDFs with Differential Privacy

Algorithm 1: DP IDFs
Input (public): n, V, c₀, L, ε₀
Input (private): counts {|Zᵢ|_v}_{v∈V} for i ∈ [n]
Output: privatized values {c̃_v}_{v∈V}

  foreach v ∈ V: compute c_v = Σ_{i=1}^{n} |Zᵢ|_v
  for ℓ = 1, …, L:
    sample v ∈ V with probability ∝ exp(ε₀ c_v)
    sample η from Lap(1/ε₀)
    release c̃_v = c_v + η
    remove v from V
  for each remaining v ∈ V: release c̃_v = c₀

Theorem 2. For any ε₀ ∈ (0, 0.9] and δ ∈ [0, 1], Algorithm 1 is (ε, δ)-DP with
  ε = min{ 2Lε₀, 2Lε₀² + √(4Lε₀² log(1/δ)) }.

Theorem 3. Let c₀ = Θ(√m). If m is large enough, then with high probability
  ‖idf − ĩdf‖₁ / ‖idf‖₁ ≤ Õ( (L/|V|) · 1/(ε₀ m) + (1 − L/|V|) · log(m) ).

(Diagram: a table of terms t with counts c_t; in each of the L rounds: 1. sample a term with probability ∝ exp(ε c_t / 2L); 2. reveal c̃_t = c_t + Lap(2L/ε); terms never selected default to c₀, i.e. IDF ≈ IDF_max.)
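A plaintext sketch of Algorithm 1 (in the paper this logic runs inside the MPC protocol, yielding computational DP); the RNG handling, function names, and example inputs are assumptions made for illustration.

```python
import numpy as np

def dp_idfs(counts, L, eps0, c0, rng):
    """Sketch of Algorithm 1: counts maps each term v to its aggregated count c_v."""
    counts = dict(counts)      # work on a copy; terms are removed once selected
    released = {}
    for _ in range(L):
        terms = list(counts)
        c = np.array([counts[t] for t in terms], dtype=float)
        # Exponential mechanism: pick a high-count term with probability proportional to exp(eps0 * c_v).
        p = np.exp(eps0 * (c - c.max()))
        p /= p.sum()
        v = terms[rng.choice(len(terms), p=p)]
        # Laplace mechanism: release the selected count with Lap(1/eps0) noise.
        released[v] = counts[v] + rng.laplace(scale=1.0 / eps0)
        del counts[v]
    for v in counts:           # every term never selected gets the default value c0
        released[v] = c0
    return released

rng = np.random.default_rng(0)
print(dp_idfs({"the": 120, "privacy": 30, "mpc": 12, "zebra": 1}, L=2, eps0=0.1, c0=1, rng=rng))
```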

SLIDE 33

Private Sparse Multiplication

Party 1's input: A ∈ Z_q^{l×m}. Party 2's input: B ∈ Z_q^{m×n}.

  • 1. Party 1 computes I^Col_A = {i | Col_i(A) ≠ 0} and broadcasts l_A = |I^Col_A|; Party 2 computes I^Row_B = {j | Row_j(B) ≠ 0} and broadcasts l_B = |I^Row_B|.
  • 2. Inside MPC (F_PERM), on inputs I^Col_A and I^Row_B: compute I^Col_A ∩ I^Row_B, then choose a random pair of permutations π1, π2 of {1, …, l_A + l_B} such that for all k1 ∈ {1, …, l_A}, k2 ∈ {1, …, l_B}: I^Col_A[k1] = I^Row_B[k2] ⟺ π1(k1) = π2(k2). Output π1 to Party 1 and π2 to Party 2.
  • 3. Party 1 sets Â ← 0^{l×(l_A+l_B)}; for i = 1 to l_A: i′ ← I^Col_A[i], Col_i(Â) ← Col_{i′}(A); then Ã ← permuteCols(Â, π1). Party 2 sets B̂ ← 0^{(l_A+l_B)×n}; for j = 1 to l_B: j′ ← I^Row_B[j], Row_j(B̂) ← Row_{j′}(B); then B̃ ← permuteRows(B̂, π2).
  • 4. Inside MPC (F_MULT), on inputs Ã and B̃: choose random C1, C2 ∈ Z_q^{l×n} such that C1 + C2 = Ã · B̃, and output C1 to Party 1 and C2 to Party 2.

  • Idea: Reduce sparse multiplication to non-sparse multiplication
  • How: Find common non-zero coefficients and restrict to these coordinates
  • In MPC: Private set intersection
  • Leakage: Upper bound on number of non-zeros
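A plaintext sketch of the underlying algebraic reduction (what the protocol computes; the private set intersection, padding, and permutations that hide the index sets are omitted here, and the toy matrices are illustrative):

```python
import numpy as np

def sparse_matmul_via_support(A, B):
    """A @ B computed only over the common non-zero support.

    A @ B = sum_k Col_k(A) * Row_k(B), and a term contributes only if both
    Col_k(A) and Row_k(B) are non-zero, so restricting to the intersection
    of the two support sets preserves the result.  The MPC protocol computes
    this intersection privately and hides it behind padding to size l_A + l_B
    and random permutations; here we only show the algebraic reduction.
    """
    cols_A = {k for k in range(A.shape[1]) if np.any(A[:, k])}
    rows_B = {k for k in range(B.shape[0]) if np.any(B[k, :])}
    common = sorted(cols_A & rows_B)
    return A[:, common] @ B[common, :]

rng = np.random.default_rng(2)
A = rng.integers(0, 3, size=(4, 10)) * (rng.random((4, 10)) < 0.2)
B = rng.integers(0, 3, size=(10, 5)) * (rng.random((10, 5)) < 0.2)
assert np.array_equal(sparse_matmul_via_support(A, B), A @ B)
```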

SLIDE 34

Illustrative Experiments

Speed (vs. sparsity) Accuracy (vs. privacy)

SLIDE 35

Document Classification - Conclusion

  • Non-parametric models are challenging from the privacy point of view
  • Changes in privacy assumptions enable different solutions
  • Protocols with different speed/privacy/accuracy trade-offs
  • Sparse matrix-vector multiplication is an important primitive for PMPML
  • Better DP algorithms for feature extraction
  • Other features instead of TF-IDF
  • Full open source implementation

Conclusions / Future Work

SLIDE 36

Take Home Points

  • Re-visiting basic ML algorithms from an MPC+DP perspective yields important insights for tackling more complex problems
  • ML can motivate the development of new MPC primitives (e.g. linear algebra)
  • Rich toolbox, plenty of unexplored combinations
  • Trade-offs: privacy/speed/accuracy
  • Genuine interdisciplinary effort