Secure Data Types: A Simple Abstraction for Confidentiality-Preserving Data Analytics



SLIDE 1

Secure Data Types: A Simple Abstraction for Confidentiality-Preserving Data Analytics

Savvas Savvides, Julian Stephen, Masoud Saeida Ardekani, Vinaitheerthan Sundaram, Patrick Eugster Purdue University

SLIDE 2

Introduction

[Figure: queries and results exchanged with the cloud, with a risk of data leakage]

Requirement: confidentiality-preserving query execution

SLIDE 3

Preserving Confidentiality

  • Fully homomorphic encryption (FHE)
    • Can express arbitrary computations
    • High overhead for complex queries
  • Partially homomorphic encryption (PHE)
    • Allows specific operations over encrypted data
    • E.g., addition, multiplication, comparisons, pattern match
    • Mutually incompatible (limited expressiveness)

SLIDE 4

Current PHE-based Solutions

[Figure: untrusted cloud, trusted client side, trusted service]

Drawbacks

  1. Compilation transparent to data constraints
  2. Compilation largely ignores encryption scheme properties
  3. No/limited use of trusted service
     a) Give up (CryptDB [SOSP’11])
     b) Split execution (Monomi [VLDB’13])
     c) Re-encryption (Crypsis [ASE’14])

[Table: ASHE [OSDI’16] vs. Paillier [EUROCRYPT’99], compared on support for E(x) + E(y), E(x) + y and E(x) × y, on performance (symmetric vs. asymmetric) and on security (high vs. high)]

SLIDE 5

Cuttlefish

[Figure: Cuttlefish architecture with compilation techniques and planner engine, spanning the untrusted cloud, the trusted client side and the trusted service]

Secure data types (SDTs)

  • Capture constraints and structure of data
  → Compilation techniques
    • More optimized queries
  → Planner engine
    • More efficient deployment
    • Can utilize trusted hardware

Encryption scheme properties

  • Capture supported operations, performance and security guarantees of encryption schemes

SLIDE 6

Secure Data Types

  • Sensitivity levels
    • high, low, public
    • Accounts for different security guarantees offered by cryptosystems
  • Data range
    • +/- numbers
    • Fixed ranges, e.g., 100-200
  • Composite types
    • Values containing multiple parts, e.g., dates, addresses, phones
    • E.g., composite[(4:int[+])-(2:int[range(1-12)])-(2:int[range(1-31)])]
  • Also: decimal accuracy, uniqueness, tokenization, enumerated types, etc.
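To make the composite date annotation above concrete, here is a minimal Python sketch of how such a type description could be modeled. The class names `IntPart` and `Composite`, their fields, and the reading of `int[+]` as "non-negative" are assumptions made for illustration; they are not Cuttlefish's actual API.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class IntPart:
    """One fixed-width integer part of a composite value (hypothetical helper)."""
    width: int                 # number of characters, e.g. 4 for a year
    lo: Optional[int] = None   # inclusive lower bound, None if unbounded
    hi: Optional[int] = None   # inclusive upper bound, None if unbounded

    def parse(self, text: str) -> int:
        if len(text) != self.width or not (text.isascii() and text.isdigit()):
            raise ValueError(f"expected {self.width} digits, got {text!r}")
        value = int(text)
        if self.lo is not None and value < self.lo:
            raise ValueError(f"{value} is below {self.lo}")
        if self.hi is not None and value > self.hi:
            raise ValueError(f"{value} is above {self.hi}")
        return value


@dataclass(frozen=True)
class Composite:
    """A value made of several parts joined by a separator (hypothetical helper)."""
    sensitivity: str               # "high", "low" or "public"
    parts: Tuple[IntPart, ...]
    separator: str = "-"

    def split(self, text: str) -> Tuple[int, ...]:
        pieces = text.split(self.separator)
        if len(pieces) != len(self.parts):
            raise ValueError(f"expected {len(self.parts)} parts in {text!r}")
        return tuple(p.parse(s) for p, s in zip(self.parts, pieces))


# composite[(4:int[+])-(2:int[range(1-12)])-(2:int[range(1-31)])]
# Assumption: int[+] is modeled here as a lower bound of 0.
DATE = Composite(
    sensitivity="high",
    parts=(IntPart(4, lo=0), IntPart(2, lo=1, hi=12), IntPart(2, lo=1, hi=31)),
)

year, month, day = DATE.split("2010-03-15")
```

Splitting a date into year, month and day parts with known ranges is what lets a compiler reason about each part separately, for instance when rewriting comparisons on dates.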

SLIDE 7

Compilation Techniques

  • Expression rewriting
    • Simplify expressions involving composite types
    • E.g., d ≥ 2010-01-01 AND d < 2011-01-01
      → y ≥ 2010 OR (y == 2010 AND m ≥ 01) ...
      → y == 2010
  • Condition expansion
    • Expand conditions to aggressively filter rows, based on range information
    • E.g., x + y > c
      → y > (c – max(x)) AND x + y > c
    • Similarly for [+, -, ×, /] and [==, >, ≥, <, ≤]

Short-circuit
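The condition-expansion example can be illustrated on plaintext values. The sketch below assumes that the extra condition `y > c - max(x)` is cheap to evaluate, that `x + y > c` is the expensive check, and that the conjunction is evaluated with short-circuiting; the function names and the sample rows are made up for illustration.

```python
def original_filter(rows, c):
    """Evaluate x + y > c on every row."""
    return [(x, y) for x, y in rows if x + y > c]


def expanded_filter(rows, c, max_x):
    """Evaluate y > (c - max_x) first, and x + y > c only on surviving rows.

    Returns the kept rows and how many times the full condition was evaluated.
    """
    kept = []
    full_checks = 0
    for x, y in rows:
        if not (y > c - max_x):   # cheap pre-filter derived from the range of x
            continue              # short-circuit: the full condition is skipped
        full_checks += 1
        if x + y > c:
            kept.append((x, y))
    return kept, full_checks


# Assumption: x is declared with the fixed range 100-200, so max(x) = 200.
rows = [(100, 10), (150, 60), (200, 51), (120, 140), (180, 40)]
kept, full_checks = expanded_filter(rows, c=250, max_x=200)
```

The pre-filter never drops a qualifying row: if x + y > c then y > c - x, and c - x ≥ c - max(x). In this sample the full condition runs on three rows instead of five.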

SLIDE 8

Compilation Techniques (cont.)

  • Selective encryption
    • Choose encryption scheme that does not require use of trusted service
    • E.g., (x + y) × z where z is public
      → (ashe(x) + ashe(y)) × z
      → (paillier(x) + paillier(y)) × z
  • See paper for more compilation techniques

[Table: ASHE [OSDI’16] vs. Paillier [EUROCRYPT’99], compared on support for E(x) + E(y), E(x) + y and E(x) × y, and on performance (symmetric vs. asymmetric)]
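The Paillier variant of (x + y) × z can be evaluated entirely on ciphertexts, because Paillier supports adding two ciphertexts and multiplying a ciphertext by a plaintext constant. The following toy implementation sketches this with tiny, insecure primes chosen only for readability; the function names are hypothetical and it requires Python 3.8 or later for `pow(a, -1, n)`.

```python
import math
import random


def keygen(p=293, q=433):
    """Toy Paillier key generation with g = n + 1. NOT secure: primes are tiny."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)   # lcm(p - 1, q - 1)
    mu = pow(lam, -1, n)                                # modular inverse of lam
    return n, (lam, mu)


def encrypt(n, m):
    """E(m) = (1 + m*n) * r^n mod n^2 for a random r coprime with n."""
    n2 = n * n
    while True:
        r = random.randrange(1, n)
        if math.gcd(r, n) == 1:
            break
    return ((1 + m * n) % n2) * pow(r, n, n2) % n2


def decrypt(n, private_key, c):
    """m = L(c^lam mod n^2) * mu mod n, where L(u) = (u - 1) // n."""
    lam, mu = private_key
    u = pow(c, lam, n * n)
    return ((u - 1) // n) * mu % n


def add(n, c1, c2):
    """E(x) + E(y): multiply the ciphertexts modulo n^2."""
    return c1 * c2 % (n * n)


def mul_plain(n, c, k):
    """E(x) × k for a public constant k: raise the ciphertext to the power k."""
    return pow(c, k, n * n)


n, private_key = keygen()
x, y, z = 12, 30, 5                       # z is public, x and y are encrypted
encrypted = mul_plain(n, add(n, encrypt(n, x), encrypt(n, y)), z)
result = decrypt(n, private_key, encrypted)
```

The untrusted side only ever handles `encrypt(n, x)`, `encrypt(n, y)` and the public `z`; decryption with the private key happens once, on the final result.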

SLIDE 9

Planner Engine

  • Cuttlefish heuristic
    • Use a cost model to choose between re-encryption and split execution at each step
    • Utilize trusted hardware, if available, to deploy an in-cloud re-encryption service

[Figure: a query plan with operators A to E, some of which require the trusted service, shown under greedy split execution, greedy re-encryption and the Cuttlefish heuristic]
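As a rough illustration of a per-step choice between the two strategies, the sketch below walks a linear plan and picks the cheaper option wherever the trusted service is needed. Everything in it is an assumption made for illustration: the `Step` fields, the cost numbers, and the rule that once split execution is chosen the remaining steps run on the trusted side. The slides do not describe the actual cost model.

```python
from dataclasses import dataclass


@dataclass
class Step:
    name: str
    needs_trusted_service: bool = False
    reencrypt_cost: float = 0.0   # hypothetical estimate: re-encrypt, continue in cloud
    split_cost: float = 0.0       # hypothetical estimate: finish the query on the trusted side


def plan(steps):
    """Return a (step name, placement) decision for every step of a linear plan."""
    decisions = []
    for i, step in enumerate(steps):
        if not step.needs_trusted_service:
            decisions.append((step.name, "cloud"))
        elif step.reencrypt_cost <= step.split_cost:
            decisions.append((step.name, "re-encryption"))
        else:
            decisions.append((step.name, "split execution"))
            # Assumption: after a split, the rest of the plan runs on the trusted side.
            decisions.extend((s.name, "trusted side") for s in steps[i + 1:])
            break
    return decisions


steps = [
    Step("A"),
    Step("B", needs_trusted_service=True, reencrypt_cost=2.0, split_cost=10.0),
    Step("C"),
    Step("D", needs_trusted_service=True, reencrypt_cost=8.0, split_cost=3.0),
    Step("E"),
]
decisions = plan(steps)
```

With these made-up costs the plan re-encrypts at B, keeps C in the cloud, and splits at D, which mixes the two greedy strategies rather than committing to one of them.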

SLIDE 10

Evaluation

Cuttlefish

  • Apache Spark 2.1
  • Cuttlefish-TH: trusted service deployed using trusted hardware (Intel SGX)
  • Cuttlefish-CS: trusted service deployed using remote client side

Setup

  • TPC-H and TPC-DS (subset) at scale 100
  • Cloud: 20 AWS m4.xlarge instances (4 CPUs and 16GB memory)
  • Client: 1 AWS c4.2xlarge instance (8 CPUs and 15GB memory)

SLIDE 11

System Performance

[Figure: TPC-H latency (s) per query, Q01 to Q22, for Plaintext, Cuttlefish-TH, Cuttlefish-CS, Monomi and Crypsis]

Average overhead compared to plaintext

  • Cuttlefish-TH: 2.34×
  • Cuttlefish-CS: 3.05×

Average performance gains

  • 3.35× faster than Monomi
  • 3.71× faster than Crypsis

SLIDE 12

Compilation Techniques Performance

[Figure: TPC-DS latency (s) for queries Q03, Q07, Q19, Q27, Q34, Q42, Q43, Q46, Q52, Q53, Q55, Q59, Q63, Q65, Q68, Q73, Q79, Q89 and Q98, for Plaintext and Cuttlefish-TH]

  • Expression rewriting
  • Condition expansion
  • Selective encryption
  • Efficient encryption

Average overhead compared to plaintext

  • With compilation techniques: 1.69×
  • Without compilation techniques: 4.23×

SLIDE 13

Conclusion

  • Cuttlefish enables efficient data analytics in public clouds
  • Secure data types
    • Capture constraints and structure of data
  • Compilation techniques
    • Enable more efficient queries
  • Planner engine
    • Optimized use of trusted service

SLIDE 14

Thank you!
