 
Secure Data Types: A Simple Abstraction for Confidentiality-Preserving Data Analytics
Savvas Savvides, Julian Stephen, Masoud Saeida Ardekani, Vinaitheerthan Sundaram, Patrick Eugster
Purdue University
Introduction
[Figure: query, data leakage, results]
• Requirement: confidentiality-preserving query execution
Preserving Confidentiality
• Fully homomorphic encryption (FHE)
  • Can express arbitrary computations
  • High overhead for complex queries
• Partially homomorphic encryption (PHE)
  • Allows specific operations over encrypted data
  • E.g., addition, multiplication, comparisons, pattern matching
  • Schemes are mutually incompatible (limited expressiveness)
Current PHE-based Solutions: Drawbacks
[Figure: trusted client side, untrusted cloud, trusted service]
1. Compilation is oblivious to data constraints
2. Compilation largely ignores encryption scheme properties

                  ASHE [OSDI’16]   Paillier [EUROCRYPT’99]
   E(x) + E(y)    ✓                ✓
   E(x) + y       ✗                ✓
   E(x) × y       ✗                ✓
   Performance    symmetric        asymmetric
   Security       high             high

3. No or limited use of a trusted service
   a) Give up (CryptDB [SOSP’11])
   b) Split execution (Monomi [VLDB’13])
   c) Re-encryption (Crypsis [ASE’14])
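The symmetric-versus-asymmetric contrast above can be made concrete. Below is a minimal Python sketch in the spirit of ASHE (an additive scheme built only from a pseudorandom function), not the published construction verbatim; the 64-bit modulus, HMAC-SHA256 as the PRF, and the names `prf`, `encrypt`, `add`, and `decrypt` are all assumptions. Each value is masked with F(i-1) - F(i), so adding ciphertexts is plain modular addition and decryption needs only cheap PRF calls, with no public-key arithmetic.

```python
import hashlib
import hmac
import secrets
from dataclasses import dataclass

N = 2**64  # plaintext/ciphertext modulus (an assumption for this sketch)


def prf(key: bytes, i: int) -> int:
    """F_k(i): a PRF instantiated with HMAC-SHA256 (an assumption), reduced mod N."""
    digest = hmac.new(key, i.to_bytes(8, "big", signed=True), hashlib.sha256).digest()
    return int.from_bytes(digest[:8], "big") % N


@dataclass
class Ciphertext:
    value: int
    ids: list[int]  # identifiers of the encrypted values folded into `value`


def encrypt(key: bytes, m: int, i: int) -> Ciphertext:
    # Mask m with F(i-1) - F(i); masks of consecutive ids telescope when summed.
    return Ciphertext((m - prf(key, i) + prf(key, i - 1)) % N, [i])


def add(a: Ciphertext, b: Ciphertext) -> Ciphertext:
    # E(x) + E(y): the untrusted cloud adds ciphertexts without the key.
    return Ciphertext((a.value + b.value) % N, a.ids + b.ids)


def decrypt(key: bytes, c: Ciphertext) -> int:
    # Undo every mask; a tuned version would exploit the telescoping and
    # evaluate the PRF only at the ends of each run of consecutive ids.
    mask = sum(prf(key, i) - prf(key, i - 1) for i in c.ids)
    return (c.value + mask) % N


key = secrets.token_bytes(32)
cts = [encrypt(key, m, i) for i, m in enumerate([120, 75, 5], start=1)]
total = cts[0]
for ct in cts[1:]:
    total = add(total, ct)
print(decrypt(key, total))  # 200
```

Every operation here is a hash or a 64-bit addition, which is why a scheme of this shape sits in the "symmetric" performance column.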
Cuttlefish
[Figure: Cuttlefish architecture with trusted client side, untrusted cloud, compilation techniques, planner engine, and trusted service]
• Secure data types (SDTs)
  • Capture constraints and structure of data
• Encryption scheme properties
  • Capture supported operations, performance and security guarantees of encryption schemes
• → Compilation techniques
  • More optimized queries
• → Planner engine
  • More efficient deployment
  • Can utilize trusted hardware
Secure Data Types
• Sensitivity levels
  • high, low, public
  • Account for the different security guarantees offered by cryptosystems
• Data range
  • Positive (+) or negative (-) numbers
  • Fixed ranges, e.g., 100-200
• Composite types
  • Values containing multiple parts, e.g., dates, addresses, phone numbers
  • E.g., a YYYY-MM-DD date: composite[(4:int[+])-(2:int[range(1-12)])-(2:int[range(1-31)])]
• Also: decimal accuracy, uniqueness, tokenization, enumerated types, etc.
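To show how the composite date type above might be captured in code, here is a hypothetical Python sketch. The classes `Sensitivity`, `IntType`, `Part`, `Composite`, and `SecureColumn` and the column name `order_date` are illustrative assumptions, not Cuttlefish's API, and "+" is interpreted here as "non-negative". Only sensitivity, sign, fixed ranges, and composite structure are modeled.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Sensitivity(Enum):
    HIGH = "high"
    LOW = "low"
    PUBLIC = "public"


@dataclass(frozen=True)
class IntType:
    sign: Optional[str] = None  # "+" (non-negative) or "-" (non-positive); an interpretation
    lo: Optional[int] = None    # inclusive lower bound of a fixed range
    hi: Optional[int] = None    # inclusive upper bound of a fixed range

    def contains(self, v: int) -> bool:
        if self.sign == "+" and v < 0:
            return False
        if self.sign == "-" and v > 0:
            return False
        return (self.lo is None or v >= self.lo) and (self.hi is None or v <= self.hi)


@dataclass(frozen=True)
class Part:
    width: int  # number of digits in this part
    int_type: IntType


@dataclass(frozen=True)
class Composite:
    parts: tuple  # tuple of Part, in order
    sep: str = "-"

    def decompose(self, text: str) -> list:
        """Split a value into its typed parts, rejecting values that violate the SDT."""
        pieces = text.split(self.sep)
        if len(pieces) != len(self.parts):
            raise ValueError(f"expected {len(self.parts)} parts: {text!r}")
        values = []
        for piece, part in zip(pieces, self.parts):
            if len(piece) != part.width or not piece.isdigit():
                raise ValueError(f"bad part {piece!r} in {text!r}")
            v = int(piece)
            if not part.int_type.contains(v):
                raise ValueError(f"{v} out of range in {text!r}")
            values.append(v)
        return values


@dataclass(frozen=True)
class SecureColumn:
    name: str
    sensitivity: Sensitivity
    dtype: object


# composite[(4:int[+])-(2:int[range(1-12)])-(2:int[range(1-31)])]
DATE = Composite((
    Part(4, IntType(sign="+")),
    Part(2, IntType(lo=1, hi=12)),
    Part(2, IntType(lo=1, hi=31)),
))
order_date = SecureColumn("order_date", Sensitivity.HIGH, DATE)  # hypothetical column
print(order_date.dtype.decompose("2010-03-15"))  # [2010, 3, 15]
```

The per-part ranges are what a compiler could later use to fold comparisons such as m ≥ 01 to true.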
Compilation Techniques
• Expression rewriting
  • Simplify expressions involving composite types
  • E.g., d ≥ 2010-01-01 AND d < 2011-01-01 → y > 2010 OR (y == 2010 AND m ≥ 01) ... → y == 2010
• Condition expansion
  • Expand conditions to filter rows aggressively, based on range information
  • E.g., x + y > c → y > (c - max(x)) AND x + y > c, where the cheaper first condition short-circuits the second
  • Similarly for +, -, ×, / and ==, >, ≥, <, ≤
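Both rewrites can be sketched in a few lines of Python. This is a hypothetical illustration, not Cuttlefish's rewriter: predicates are assumed to be small tuples, the date parts are assumed to carry the ranges of their composite SDT (year 0-9999, month 1-12, day 1-31), and `lex_ge`, `lex_lt`, `int_range`, and `expand_sum_gt` are invented names. The condition-expansion part runs on plaintext rows only to show that the expanded predicate keeps exactly the same rows.

```python
TRUE, FALSE = ("true",), ("false",)


def cmp(var, op, c, lo, hi):
    """Build `var op c`, folded to TRUE/FALSE when the range [lo, hi] decides it."""
    if op == ">=":
        return TRUE if c <= lo else FALSE if c > hi else ("cmp", var, op, c)
    if op == ">":
        return TRUE if c < lo else FALSE if c >= hi else ("cmp", var, op, c)
    if op == "<":
        return TRUE if c > hi else FALSE if c <= lo else ("cmp", var, op, c)
    if op == "==":
        return FALSE if not lo <= c <= hi else TRUE if lo == hi else ("cmp", var, op, c)
    raise ValueError(op)


def and_(a, b):
    if FALSE in (a, b):
        return FALSE
    return b if a == TRUE else a if b == TRUE else ("and", a, b)


def or_(a, b):
    if TRUE in (a, b):
        return TRUE
    return b if a == FALSE else a if b == FALSE else ("or", a, b)


def lex_ge(parts, consts):
    """Composite d >= c, expanded lexicographically over (name, lo, hi) parts."""
    (var, lo, hi), c = parts[0], consts[0]
    rest = lex_ge(parts[1:], consts[1:]) if len(parts) > 1 else TRUE
    if rest == TRUE:  # x > c OR (x == c AND TRUE) is just x >= c
        return cmp(var, ">=", c, lo, hi)
    return or_(cmp(var, ">", c, lo, hi), and_(cmp(var, "==", c, lo, hi), rest))


def lex_lt(parts, consts):
    """Composite d < c, expanded lexicographically."""
    (var, lo, hi), c = parts[0], consts[0]
    rest = lex_lt(parts[1:], consts[1:]) if len(parts) > 1 else FALSE
    if rest == FALSE:  # x < c OR (x == c AND FALSE) is just x < c
        return cmp(var, "<", c, lo, hi)
    return or_(cmp(var, "<", c, lo, hi), and_(cmp(var, "==", c, lo, hi), rest))


def int_range(a, b):
    """Fold (v >= k) AND (v < k + 1) into v == k, since v is an integer."""
    if (a[0] == b[0] == "cmp" and a[1] == b[1]
            and (a[2], b[2]) == (">=", "<") and b[3] == a[3] + 1):
        return ("cmp", a[1], "==", a[3])
    return and_(a, b)


DATE = [("y", 0, 9999), ("m", 1, 12), ("d", 1, 31)]
print(int_range(lex_ge(DATE, (2010, 1, 1)), lex_lt(DATE, (2011, 1, 1))))
# ('cmp', 'y', '==', 2010)


def expand_sum_gt(c, x_max):
    """x + y > c  ->  y > c - max(x) AND x + y > c.

    Because x <= max(x), every row with x + y > c also satisfies y > c - max(x),
    so the first conjunct loses no rows; `and` short-circuits past the sum."""
    bound = c - x_max
    return lambda x, y: y > bound and x + y > c


rows = [(10, 20), (45, 60), (50, 49), (5, 96)]  # x assumed to lie in [0, 50]
expanded = expand_sum_gt(c=100, x_max=50)
print([r for r in rows if expanded(*r)])  # [(45, 60), (5, 96)]
```

Over encrypted data, the point of the first conjunct is that it compares a single column against a constant, which can be cheaper than evaluating x + y homomorphically for every row.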
Compilation Techniques (cont.)
• Selective encryption
  • Choose an encryption scheme that does not require the trusted service
  • E.g., (x + y) × z where z is public: (ashe(x) + ashe(y)) × z → (paillier(x) + paillier(y)) × z, since Paillier supports E(x) × y without decryption
• See paper for more compilation techniques
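Selective encryption pays off because Paillier can both add ciphertexts and multiply a ciphertext by a public constant. The toy Python sketch below, which fixes g = n + 1 and uses tiny hard-coded primes (both assumptions; this key is completely insecure), shows the cloud evaluating (paillier(x) + paillier(y)) × z for public z with no trip to the trusted service. The function names are illustrative only.

```python
import math
import secrets

# Toy key: tiny hard-coded primes for illustration only (real keys use >= 1024-bit primes).
P, Q = 1009, 1013
N, N2 = P * Q, (P * Q) ** 2
LAM = math.lcm(P - 1, Q - 1)  # Carmichael function of N (Python 3.9+)
MU = pow(LAM, -1, N)          # modular inverse; valid because we fix g = N + 1


def encrypt(m: int) -> int:
    while True:  # random r in [1, N) that is coprime to N
        r = secrets.randbelow(N - 1) + 1
        if math.gcd(r, N) == 1:
            break
    return (1 + m * N) * pow(r, N, N2) % N2  # g^m * r^N with g = N + 1


def decrypt(c: int) -> int:
    u = pow(c, LAM, N2)
    return (u - 1) // N * MU % N


def add_enc(c1: int, c2: int) -> int:
    return c1 * c2 % N2  # E(x) + E(y): multiply ciphertexts


def mul_plain(c: int, k: int) -> int:
    return pow(c, k, N2)  # E(x) × k for a public integer k


x, y, z = 17, 25, 3  # z is public
result = mul_plain(add_enc(encrypt(x), encrypt(y)), z)
print(decrypt(result))  # 126
```

The modular exponentiations on n² are what make Paillier the "asymmetric", slower choice, so a planner would presumably pick it only where it removes a trusted-service round trip.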
Planner Engine
[Figure: a query plan with operators A-E (legend: requires trusted service, split execution, re-encryption) under three strategies: greedy split execution, greedy re-encryption, and the Cuttlefish heuristic]
• Cuttlefish heuristic
  • Use a cost model to choose between re-encryption and split execution at each step
  • Utilize trusted hardware, if available, to deploy an in-cloud re-encryption service
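The slide does not spell out the cost model, so the following is a hypothetical Python sketch of a per-step choice between re-encryption and split execution. Everything in it is an assumption: the `Op` and `Costs` types, the made-up unit costs, and the rule that compares "split here and finish on the client" against "stay in the cloud and re-encrypt before every remaining trusted-service step".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Op:
    name: str
    needs_trusted: bool  # op cannot run on its inputs' current encryption
    rows_in: int         # estimated input rows


@dataclass(frozen=True)
class Costs:  # hypothetical unit costs, not measurements
    cloud: float = 1.0             # per row processed in the cloud
    client: float = 4.0            # per row processed on the client
    transfer: float = 2.0          # per row shipped to the client
    reencrypt: float = 3.0         # per row re-encrypted by the trusted service
    reencrypt_fixed: float = 50.0  # per round trip to the trusted service


def stay_in_cloud(ops, c):
    """Run every op in the cloud, re-encrypting before each trusted-service op."""
    return sum(
        o.rows_in * c.cloud
        + (o.rows_in * c.reencrypt + c.reencrypt_fixed if o.needs_trusted else 0.0)
        for o in ops
    )


def split_here(ops, c):
    """Ship the current rows to the client and finish the plan there."""
    return ops[0].rows_in * c.transfer + sum(o.rows_in * c.client for o in ops)


def plan(ops, c):
    decisions = []
    for i, op in enumerate(ops):
        if not op.needs_trusted:
            decisions.append((op.name, "cloud"))
        elif split_here(ops[i:], c) < stay_in_cloud(ops[i:], c):
            decisions += [(o.name, "client") for o in ops[i:]]
            break
        else:
            decisions.append((op.name, "re-encrypt"))
    return decisions


ops = [Op("A", False, 100_000), Op("B", True, 1_000), Op("C", False, 1_000),
       Op("D", True, 10), Op("E", True, 10)]
print(plan(ops, Costs()))
# [('A', 'cloud'), ('B', 're-encrypt'), ('C', 'cloud'), ('D', 'client'), ('E', 'client')]
print(plan(ops, Costs(reencrypt_fixed=5.0)))  # e.g. a cheaper in-cloud trusted service
```

With these invented numbers the plan re-encrypts at B, where much data remains, but splits at D, where little does. Lowering `reencrypt_fixed`, as a rough stand-in for an in-cloud service on trusted hardware, keeps the whole plan in the cloud. Under the same assumptions, greedy split execution would correspond to always splitting at the first trusted-service step, and greedy re-encryption to never splitting.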
Evaluation
Cuttlefish
• Built on Apache Spark 2.1
• Cuttlefish-TH: trusted service deployed using trusted hardware (Intel SGX)
• Cuttlefish-CS: trusted service deployed on the remote client side
Setup
• TPC-H and TPC-DS (subset) at scale factor 100
• Cloud: 20 AWS m4.xlarge instances (4 vCPUs and 16 GB memory)
• Client: 1 AWS c4.2xlarge instance (8 vCPUs and 15 GB memory)
System Performance
[Figure: latency (s) of TPC-H queries Q01-Q22 under Plaintext, Cuttlefish-TH, Cuttlefish-CS, Monomi, and Crypsis]
Average overhead compared to plaintext
• Cuttlefish-TH: 2.34×
• Cuttlefish-CS: 3.05×
Average performance gains
• 3.35× faster than Monomi
• 3.71× faster than Crypsis
Compilation Techniques Performance
[Figure: latency (s) of TPC-DS queries Q03, Q07, Q19, Q27, Q34, Q42, Q43, Q46, Q52, Q53, Q55, Q59, Q63, Q65, Q68, Q73, Q79, Q89, and Q98 under Plaintext, Cuttlefish-TH, and Cuttlefish-TH minus expression rewriting, condition expansion, selective encryption, and efficient encryption]
Average overhead compared to plaintext
• With compilation techniques: 1.69×
• Without compilation techniques: 4.23×
Conclusion
• Cuttlefish enables efficient data analytics in public clouds
• Secure data types
  • Capture constraints and structure of data
• Compilation techniques
  • Enable more efficient queries
• Planner engine
  • Optimized use of the trusted service
Thank you!