Using Fully Homomorphic Encryption for Statistical Analysis of Categorical, Ordinal and Numerical Data



slide-1
SLIDE 1

Using Fully Homomorphic Encryption for Statistical Analysis of Categorical, Ordinal and Numerical Data

Wen-jie Lu (1), Shohei Kawasaki (1), Jun Sakuma (1, 2, 3)

  • 1. University of Tsukuba, Japan
  • 2. JST CREST
  • 3. RIKEN Center for AIP


slide-2
SLIDE 2

Statistical Analysis on the Cloud

Cloud computing is useful for statistical analysis

  • Gather distributed data, and reduce hardware cost.
  • Minimal interactions between data providers and the cloud.
  • The cloud does most of the work for the analyst.

[Figure: multiple data providers, a third-party cloud server (data collection) and an analyst (query & result).]

slide-3
SLIDE 3

Cloud Computing with Sensitive Data

  • Using outside cloud servers raises privacy concerns.
  • E.g., medical records, federal data.
  • We want to calculate statistics on the cloud while keeping the data secret.

[Figure: sensitive data sent to a third-party cloud server.]

slide-4
SLIDE 4

Secure Multiparty Computation (SMC)

  • Off-the-shelf tools for SMC protocols:
    • Yao's garbled circuit (GC).
    • Fully homomorphic encryption (FHE).
  • But development cost and efficiency hinder applications of GC and FHE in the cloud.

[Figure: private inputs x and y, public function F; computing Z = F(x, y) only reveals Z.]

  • Yao. Protocols for secure computations. 1982.
  • Gentry. Fully homomorphic encryption using ideal lattices. 2009.

slide-5
SLIDE 5

GC on the Cloud Environment

[Figure: secret sharing and GC protocol.]

GC requires a large development cost
  • Multiple servers are needed.
    • Assume no collusion between servers.
  • A fast network is necessary for computation.
    • E.g., 10 Gbps bandwidth.

slide-6
SLIDE 6

FHE on the Cloud Environment

  • Less development cost
    • A single server is enough.
    • A fast network is not necessary.
  • But might be inefficient in practice
    • Encrypts bits one by one.
    • 1~10 ms per evaluation.
    • 1~10 megabytes per ciphertext.

[Figure: FHE protocol on ciphertexts.]

Gentry et al. Homomorphic Evaluation of the AES Circuit. 2012.

slide-7
SLIDE 7

Observation

  • Purpose of encrypting bits separately:
    • To evaluate any Boolean function.
  • But to do statistical analysis, we can use
    • matrix arithmetic operations,
    • comparison operations.

slide-8
SLIDE 8

Our Result

  • Two new FHE-based primitives:
    • Matrix operations
    • Batch greater-than
  • Secure statistical protocols:
    • histogram (count),
    • order of counts,
    • contingency table (with cell-suppression),
    • percentile,
    • principal component analysis (PCA),
    • linear regression.
  • Source code: https://github.com/fionser/CODA

slide-9
SLIDE 9

Preliminaries: Fully Homomorphic Encryption

  • Public-private key scheme.
    • Data providers & cloud share the public key.
    • The analyst holds the private key.
  • Allows addition (subtraction) and multiplication on encrypted integers.
  • Analogy: black box with gloves

Brakerski et al. Fully Homomorphic Encryption without Bootstrapping. 2012.

slide-10
SLIDE 10

Preliminaries: Packing (Batching)

  • Enables encrypting and processing vectors at no extra cost.
    • Fewer ciphertexts
    • Faster computation
  • A single homomorphic operation gives multiple results:

    [1 2 3 4] + [8 7 6 5] = [9 9 9 9]
    [1 2 3 4] × [8 7 6 5] = [8 14 18 20]

N.P. Smart et al. Fully homomorphic SIMD operations. 2011.
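The slot-wise semantics of packing can be sketched on plain integers. This is only a plaintext simulation, not an FHE implementation: each list stands in for one packed ciphertext, and the modulus `P` is a hypothetical value chosen for illustration.

```python
# Plaintext simulation of packed (SIMD) arithmetic. One function call models
# one homomorphic operation that acts on every slot at once.
P = 65537  # hypothetical plaintext modulus, for illustration only

def slot_add(u, v, p=P):
    """Slot-wise addition modulo p."""
    return [(a + b) % p for a, b in zip(u, v)]

def slot_mul(u, v, p=P):
    """Slot-wise multiplication modulo p."""
    return [(a * b) % p for a, b in zip(u, v)]

x = [1, 2, 3, 4]
y = [8, 7, 6, 5]
print(slot_add(x, y))  # [9, 9, 9, 9]
print(slot_mul(x, y))  # [8, 14, 18, 20]
```

With ℓ slots per ciphertext, ℓ additions or multiplications cost one homomorphic operation, which is where the savings in ciphertext count and time come from.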

slide-11
SLIDE 11

Preliminaries: Slot Manipulation

  • Rotate the slots of the encrypted vector.

    [1 2 3 4] >> 2 = [3 4 1 2]

  • Replicate a specific slot.

    [8 5 1 5] @3 = [1 1 1 1]

Halevi et al. Algorithms in HElib. 2014.
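As a plaintext sketch of the two slot operations, the helper names `rotate` and `replicate` below are hypothetical and are not HElib calls. The rotation direction (cyclic shift to the right) and the 1-based slot index are assumptions made to match the examples on this slide.

```python
def rotate(v, k):
    """Cyclically shift the slots of v to the right by k positions."""
    k %= len(v)
    return v[-k:] + v[:-k] if k else list(v)

def replicate(v, i):
    """Copy slot i (1-indexed) of v into every slot."""
    return [v[i - 1]] * len(v)

print(rotate([1, 2, 3, 4], 2))     # [3, 4, 1, 2]
print(replicate([8, 5, 1, 5], 3))  # [1, 1, 1, 1]
```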

slide-12
SLIDE 12

Part II: Technical Details

  • Data preprocessing.
  • Efficient matrix multiplication on ciphertexts.
  • Comparing two encrypted integers.
  • Examples of two protocols (for other protocols, refer to our paper):
    • Contingency table with cell-suppression
    • Linear regression

slide-13
SLIDE 13

Data Preprocessing

  • Numerical data: fixed-point representation
    • 3.14159 → ⌈3.14159 × 1000⌋ = 3142
    • Precision (e.g., 1000) determined in advance
  • Categorical data: 1-of-k representation
    • Gender (i.e., k = 2). Female → [1, 0] and Male → [0, 1]
  • Ordinal data: stair-case encoding
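The three encodings can be sketched as follows. The function names are hypothetical. The slide does not spell out the stair-case layout, so the convention used here (level v out of k becomes v ones followed by k - v zeros) is an assumption.

```python
def to_fixed_point(value, scale=1000):
    """Scale a real number and round it to the nearest integer."""
    return round(value * scale)

def one_of_k(index, k):
    """1-of-k encoding: a single 1 at position index (0-based)."""
    return [1 if j == index else 0 for j in range(k)]

def staircase(level, k):
    """Assumed stair-case layout: level v (1..k) -> v ones, then k - v zeros."""
    return [1 if j < level else 0 for j in range(k)]

print(to_fixed_point(3.14159))  # 3142
print(one_of_k(0, 2))           # [1, 0], e.g. Female
print(one_of_k(1, 2))           # [0, 1], e.g. Male
print(staircase(3, 5))          # [1, 1, 1, 0, 0]
```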

slide-14
SLIDE 14

Proposed Matrix Primitive

  • Used for adding & multiplying encrypted matrices
  • Encrypt each row separately by packing.
    • Row-wise encryption.
    • Horizontally partitioned data
  • Efficient and layout consistent.
    • $O(N^2)$ homomorphic operations.

slide-15
SLIDE 15

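Matrix Multiplication [1/2]

  • Encrypt the matrix row by row with packing.

$$\begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix} \times \begin{bmatrix} a & b \\ c & d \end{bmatrix} = \begin{bmatrix} 1a + 2c & 1b + 2d \\ 3a + 4c & 3b + 4d \end{bmatrix}$$

First row of the result:
  1. Replicate slots @1 and @2 of [1 2]: [1 1] and [2 2].
  2. Multiply: [1 1] × [a b] and [2 2] × [c d].
  3. Add: [1a+2c 1b+2d].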

slide-16
SLIDE 16

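Matrix Multiplication [1/2]

  • Encrypt the matrix row by row with packing.
  • $N^2$ replications, multiplications and additions
    • $O(N^2)$ complexity compared to $O(N^3)$ (no packing).
  • The resulting matrix is also encrypted row-wise.

$$\begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix} \times \begin{bmatrix} a & b \\ c & d \end{bmatrix} = \begin{bmatrix} 1a + 2c & 1b + 2d \\ 3a + 4c & 3b + 4d \end{bmatrix}$$

Second row of the result:
  1. Replicate slots @1 and @2 of [3 4]: [3 3] and [4 4].
  2. Multiply: [3 3] × [a b] and [4 4] × [c d].
  3. Add: [3a+4c 3b+4d].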
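The replicate, multiply and add pattern can be sketched on plain integers. Each inner list stands in for one packed ciphertext holding a matrix row, and `matmul_rowwise` is a hypothetical name, not a function of the CODA code base. The numeric matrices are illustrative.

```python
def matmul_rowwise(a_rows, b_rows):
    """Multiply two row-wise packed matrices.

    For each row of A, slot j is replicated across all slots, multiplied
    slot-wise with row j of B, and accumulated. The result is again a list
    of rows, so the layout of the output matches the layout of the inputs.
    """
    width = len(b_rows[0])
    result = []
    for a_row in a_rows:
        acc = [0] * width
        for j, b_row in enumerate(b_rows):
            rep = [a_row[j]] * width                       # replicate slot j
            prod = [r * b for r, b in zip(rep, b_row)]     # one packed multiply
            acc = [s + t for s, t in zip(acc, prod)]       # one packed add
        result.append(acc)
    return result

print(matmul_rowwise([[1, 2], [3, 4]], [[5, 6], [7, 8]]))  # [[19, 22], [43, 50]]
```

For N × N inputs the two loops run N² times in total, each iteration costing one replication, one packed multiplication and one packed addition.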

slide-17
SLIDE 17

Matrix Multiplication [2/2]

  • Layout consistency is important for developing efficient statistical protocols.
    • Statistical algorithms need iterative matrix multiplications.

Efficient for a single multiplication → layout consistent?
  • Yes: still efficient for iterative multiplications.
  • No: inefficient for iterative multiplications (heavy layout adjustment).

slide-18
SLIDE 18

Experimental Settings of Matrix Primitive

  • Implementations:
    • FHE: HElib (C++ based)
    • GC: ObliVM (Java based)
  • Evaluated on 32-bit integers
  • Networks:
    • LAN (about 88 Mbps)
    • WAN (about 48 Mbps)

HElib. https://github.com/shaih/HElib.
Liu et al. ObliVM: A programming framework for secure computation. 2015.

slide-19
SLIDE 19

Evaluation of Matrix Primitive

[Figure, two panels. Execution Time: elapsed time (s, log scale) vs. matrix dimension (2 to 64) for FHE-LAN, FHE-WAN, GC-LAN and GC-WAN. Communication Cost: data transferred (MB, log scale) vs. matrix dimension (2 to 64) for GC and FHE; labels on the GC series, one per dimension: 16640, 132096, 1052672, 8404992, 67174400, 537133056.]

  • When doing iterative multiplications, the FHE-based primitive can offer better performance.
    • It saves the communication cost between iterations.

slide-20
SLIDE 20

Greater-than (GT) Primitive

$\mathrm{GT}(e(x), e(y)) \to e(x \overset{?}{>} y)$ s.t. $0 \le x, y \le D$

  • [Golle06] based on the Paillier cryptosystem:

    if $x > y$ then $\exists k \in [1, D]$ s.t. $x - y - k = 0$

  • Combination with packing gives great improvements:

    $e([x, \ldots, x]) - e([y, \ldots, y]) - [1, 2, \ldots, D] \to e(\boldsymbol{\eta})$, with $x$ and $y$ each replicated $D$ times

  • $0 \in \boldsymbol{\eta} \iff x > y$ (i.e., decryption is needed)
  • Complexity from $D$ to $\lceil D/\ell \rceil$.

Golle. A private stable matching algorithm. 2006.
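The packed greater-than test can be sketched on plain integers. Here x and y are the two compared values, D is the domain bound and `slots` is the number of slots per ciphertext; the function name is hypothetical. In the real protocol the subtraction happens on ciphertexts and the zero test happens only after decryption.

```python
def packed_greater_than(x, y, D, slots):
    """Return (x > y, number of packed subtractions) for 0 <= x, y <= D.

    Each batch models one packed operation: x and y are replicated across
    the slots and the candidate differences k = 1..D are subtracted.
    """
    found = False
    batches = 0
    for start in range(1, D + 1, slots):
        ks = range(start, min(start + slots, D + 1))
        eta = [x - y - k for k in ks]    # one packed subtraction
        batches += 1
        found = found or (0 in eta)      # zero test, done after decryption
    return found, batches

print(packed_greater_than(7, 3, 16, 4))  # (True, 4)
print(packed_greater_than(3, 7, 16, 4))  # (False, 4)
```

The batch count equals the ceiling of D / slots, compared with D separate comparisons without packing.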

slide-21
SLIDE 21

Experimental Settings for GT Primitive

  • Implementations:
    • FHE: HElib (C++ based)
    • GC: ObliVM (Java based)
  • Domain $D = 2^4 \sim 2^{24}$
  • Number of slots $\ell \approx 1700$.
  • Networks:
    • LAN (about 88 Mbps)
    • WAN (about 48 Mbps)

HElib. https://github.com/shaih/HElib.
Liu et al. ObliVM: A programming framework for secure computation. 2015.

slide-22
SLIDE 22

Evaluation of Greater-than Primitive

[Figure, two panels. Execution Time: elapsed time (s, log scale) vs. #bits (4 to 24) for FHE-LAN, FHE-WAN, GC-LAN and GC-WAN. Communication Cost: data transferred (MB, log scale) vs. #bits (4 to 24) for GC and FHE; labels on the GC series, one per bit length: 76, 88, 100, 112, 124, 136.]

Works for small domains, which is enough for ordinal statistics.

slide-23
SLIDE 23

Secure Statistical Protocols

  • Contingency table with cell-suppression protocol:
    • Uses the greater-than primitive.
    • One-round protocol between cloud and analyst.
  • Linear regression protocol:
    • Uses the matrix primitive.
    • Two-round protocol.
    • Uses a Plaintext Precision Expansion technique (discussed later).

slide-24
SLIDE 24

Contingency Table

Categorical data:

  Gender    Smoke
  Male      Smoker
  Female    Non-smoker
  Male      Non-smoker

Contingency table (k1 = 2, k2 = 2):

            Smoker    Non-smoker
  Male      1         1
  Female    0         1

  • Indicator encoding:
    • Male → [1, 0], Female → [0, 1]
    • Smoker → [1, 0], Non-smoker → [0, 1]
  • Basic idea: multiply & rotate
    • [a1, a2] × [b1, b2] counts Male-Smoker and Female-Non-smoker.
    • [a1, a2] × ([b1, b2] >> 1) = [a1, a2] × [b2, b1] gives the other two counts.
  • Improvement with no extra preprocessing
    • O(max(k1, k2)) => O(log k1k2).
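The multiply-and-rotate idea for the 2 × 2 case can be sketched on plain integers. The function name is hypothetical and the lists stand in for packed ciphertexts; this shows the basic idea only, not the logarithmic improvement.

```python
def contingency_2x2(records):
    """Count all four cells from 1-of-2 encoded (a, b) record pairs.

    Returns (diag, off): diag holds a1*b1 and a2*b2 summed over records,
    off holds a1*b2 and a2*b1 (b rotated by one slot before multiplying).
    """
    diag = [0, 0]
    off = [0, 0]
    for a, b in records:
        rotated_b = [b[1], b[0]]                                  # b >> 1
        diag = [d + ai * bi for d, ai, bi in zip(diag, a, b)]
        off = [o + ai * bi for o, ai, bi in zip(off, a, rotated_b)]
    return diag, off

MALE, FEMALE = [1, 0], [0, 1]
SMOKER, NON_SMOKER = [1, 0], [0, 1]
records = [(MALE, SMOKER), (FEMALE, NON_SMOKER), (MALE, NON_SMOKER)]
diag, off = contingency_2x2(records)
print(diag)  # [1, 1] -> Male-Smoker, Female-Non-smoker
print(off)   # [1, 0] -> Male-Non-smoker, Female-Smoker
```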

slide-25
SLIDE 25

Contingency Table: Cell Suppression

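Original table:

            Smoker    Non-smoker
  Male      20        11
  Female    3         12

Suppressed table (if < 10, zero out):

            Smoker    Non-smoker
  Male      20        11
  Female              12

  • Protect the privacy of rare individuals.
  • Given a ciphertext $e(x)$, compute $e(y)$, where
    if $x >$ threshold then $y = x$ else $y$ = some random value.
  • $\mathrm{GT}(e(x), \text{threshold}) = e(\boldsymbol{\eta})$; $0 \in \boldsymbol{\eta}$ iff $x >$ threshold.
  • Compute $\{e(x + \boldsymbol{r}),\ e(\boldsymbol{\eta} + \boldsymbol{r}),\ e(\boldsymbol{\eta} \times \boldsymbol{r}')\}$
    • Non-zero random vectors $\boldsymbol{r}$, $\boldsymbol{r}'$
    • If $0 \in \boldsymbol{\eta}$, we have $0 \in \boldsymbol{\eta} \times \boldsymbol{r}'$; then we can get $\boldsymbol{r}$ and know $x$.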
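The masking step can be sketched on plain integers modulo a prime. This is a simulation under assumptions: `P` is a hypothetical plaintext modulus larger than 2D, both function names are hypothetical, and encryption and decryption are omitted. A threshold of 9 is used so that counts below 10 are suppressed.

```python
import random

P = 65537  # hypothetical prime plaintext modulus, assumed larger than 2 * D

def cloud_mask(x, threshold, D, p=P):
    """Cloud side: build the three masked vectors from x and the GT output."""
    ks = range(1, D + 1)
    eta = [(x - threshold - k) % p for k in ks]          # GT output
    r = [random.randrange(1, p) for _ in ks]             # non-zero mask r
    r_prime = [random.randrange(1, p) for _ in ks]       # non-zero mask r'
    x_masked = [(x + ri) % p for ri in r]
    eta_masked = [(e + ri) % p for e, ri in zip(eta, r)]
    eta_scaled = [(e * si) % p for e, si in zip(eta, r_prime)]
    return x_masked, eta_masked, eta_scaled

def analyst_open(x_masked, eta_masked, eta_scaled, p=P):
    """Analyst side: recover x if some slot of eta is zero, else None."""
    for xm, em, es in zip(x_masked, eta_masked, eta_scaled):
        if es == 0:               # eta is zero here, so em equals r
            return (xm - em) % p
    return None                   # every slot is masked: cell suppressed

print(analyst_open(*cloud_mask(20, 9, 64)))  # 20
print(analyst_open(*cloud_mask(3, 9, 64)))   # None
```

Because P is prime and r' is non-zero in every slot, a slot of η × r' is zero exactly when the same slot of η is zero, so the analyst learns r only in that slot.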

slide-26
SLIDE 26

Contingency Table Performance Evaluation

[Figure: elapsed time (s) vs. table size (k1k2); #records = 4000.]

  • Complexity increases logarithmically with the table size.
  • Most of the work (>90%) is done by the cloud.

slide-27
SLIDE 27

Linear Regression (LR)

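  • From data $\{(\boldsymbol{x}_i, y_i)\}_i$, compute a model $\boldsymbol{w}$ s.t.

    $\boldsymbol{w} = (\boldsymbol{X}^{T}\boldsymbol{X})^{-1}\boldsymbol{X}^{T}\boldsymbol{y}$

  • This requires the inversion of an encrypted matrix.

Division-free Matrix Inversion $(\boldsymbol{Q}, \lambda)$: set $\boldsymbol{A}^{(1)} = \boldsymbol{Q}$, $\boldsymbol{R}^{(1)} = \boldsymbol{I}$, $a^{(1)} = \lambda$, and iterate

    $\boldsymbol{R}^{(t+1)} = 2a^{(t)}\boldsymbol{R}^{(t)} - \boldsymbol{R}^{(t)}\boldsymbol{A}^{(t)}$
    $\boldsymbol{A}^{(t+1)} = 2a^{(t)}\boldsymbol{A}^{(t)} - \boldsymbol{A}^{(t)}\boldsymbol{A}^{(t)}$
    $a^{(t+1)} = a^{(t)}a^{(t)}$

[Guo06] After $t$ iterations, $\boldsymbol{R}^{(t+1)}$ gives a good approximation to $\lambda^{2^t}\boldsymbol{Q}^{-1}$ if $\lambda$ is close to the largest eigenvalue of $\boldsymbol{Q}$ (use PCA to compute $\lambda$).

Layout consistency leads to efficient iterative protocols.

Guo et al. A Schur-Newton method for the matrix pth root and its inverse. 2006.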
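The division-free iteration can be sketched with plain Python integers. The 2 × 2 matrix, the choice λ = 5 and the function names are illustrative assumptions; in the protocol the same additions and multiplications would run on encrypted matrices.

```python
def matmul(A, B):
    """Plain integer matrix product of two square matrices."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def division_free_inverse(Q, lam, iterations):
    """Return (R, a) such that R / a approximates the inverse of Q.

    Only integer additions and multiplications are used; the single
    division by a is left to whoever holds the decrypted result.
    """
    n = len(Q)
    A = [row[:] for row in Q]
    R = [[int(i == j) for j in range(n)] for i in range(n)]
    a = lam
    for _ in range(iterations):
        RA = matmul(R, A)
        AA = matmul(A, A)
        R = [[2 * a * R[i][j] - RA[i][j] for j in range(n)] for i in range(n)]
        A = [[2 * a * A[i][j] - AA[i][j] for j in range(n)] for i in range(n)]
        a = a * a
    return R, a

Q = [[4, 1], [1, 3]]          # eigenvalues about 4.62 and 1.38
R, a = division_free_inverse(Q, 5, 6)
approx = [[R[i][j] / a for j in range(2)] for i in range(2)]
print(approx)                 # close to [[3/11, -1/11], [-1/11, 4/11]]
```

After 6 iterations the scale factor a equals 5 to the power 64, which illustrates how quickly the integers grow.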

slide-28
SLIDE 28

Plaintext Precision Expansion (PPE)

  • Division-free algorithms introduce large integers ($\lambda^{2^t}$).
  • But the current FHE library allows at most 60-bit integers.
  • PPE allows division-free algorithms without changing the FHE library.
    • Uses K different FHE parameters (each b-bit, b < 60).
    • Achieves an equivalent Kb-bit parameter.
    • Increases the time by K times, but is naturally parallelizable.
    • Direct application of the Chinese Remainder Theorem.
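The Chinese Remainder Theorem step can be sketched as follows. The three small prime moduli are illustrative stand-ins for K different plaintext parameters, and `crt_reconstruct` is a hypothetical helper name. The same computation runs independently under each modulus, and the wide result is recombined at the end.

```python
from math import prod

MODULI = [8191, 131071, 524287]  # illustrative pairwise-coprime primes (K = 3)

def crt_reconstruct(residues, moduli):
    """Recombine residues into the unique value modulo the product of moduli."""
    M = prod(moduli)
    total = 0
    for r, m in zip(residues, moduli):
        Mi = M // m
        total += r * Mi * pow(Mi, -1, m)  # pow(Mi, -1, m) is the modular inverse
    return total % M

# Evaluate a * b + c once per modulus; each evaluation only sees small numbers
# and the K evaluations could run in parallel.
a, b, c = 123456, 654321, 789
residues = [((a % m) * (b % m) + c) % m for m in MODULI]
result = crt_reconstruct(residues, MODULI)
print(result == a * b + c)  # True
```

The recombined value is exact as long as the true result is smaller than the product of the moduli.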

slide-29
SLIDE 29

Experiments: Linear Regression

[Figure: elapsed time (min) vs. number of dimensions; bar labels: 16.90, 18.34, 62.685, 67.62, 189.07.]

  • Negligible decryption time (less than 2 s).
  • 20x faster than the previous FHE solution [Wu et al. 12]
    • 5 dimensions (400+ mins).
  • Good scalability (execution time is reduced by using more cores).

slide-30
SLIDE 30

Summary

  • Secure statistical analysis in the cloud with multiple data providers.
  • Two primitives
    • Matrix operation and greater-than
  • Two protocols
    • Contingency table and linear regression.
  • Encoding and packing can improve FHE's balance between generality and efficiency.