Oblivious Coopetitive Analytics Using Hardware Enclaves Ankur Dave , - - PowerPoint PPT Presentation


Oblivious Coopetitive Analytics Using Hardware Enclaves. Ankur Dave, Chester Leung, Raluca Ada Popa, Joseph E. Gonzalez, Ion Stoica (UC Berkeley). EuroSys 2020, April 28, 2020.


slide-1
SLIDE 1

Oblivious Coopetitive Analytics Using Hardware Enclaves

Ankur Dave, Chester Leung, Raluca Ada Popa, Joseph E. Gonzalez, Ion Stoica (UC Berkeley) EuroSys 2020 April 28, 2020

slide-2
SLIDE 2

The need for coopetitive analytics

  • Analytics can extract value from big data
  • But datasets often span multiple competing parties
slide-3
SLIDE 3

Example: Financial risk assessment

“How much subprime debt have all banks issued?”

  • Banks want to assess systemic risk
  • This requires cooperation among competing banks
  • Sharing data creates security, regulatory, business, and liability concerns

SELECT SUM(loan_amount) FROM customer c JOIN loan l ON c.ssn = l.ssn WHERE credit_score < 630;

slide-4
SLIDE 4

Threat model

  • Network attacker can see and modify all network traffic but cannot access machines
  • Malicious party attackers can additionally see and modify computation within their machines + collude with other parties

[Figure annotations: "240 bytes sent from party 2 to party 3"; "if (c.credit_score < 630) { result[c.ssn] += c.loan_amount }"]

slide-5
SLIDE 5

Approach 1: Cryptography

Specialized systems: Conclave, DJoin, private intersection-sum, Prio, UnLynx, MedCo, …

  • Limited functionality – cannot support rich analytics

Generic approaches: SMCQL, AgMPC

  • Prohibitive overhead
slide-6
SLIDE 6

Approach 2: Hardware enclaves

  • Trusted code runs shielded from OS and processes on the same host
  • Memory access pattern leakage

[Figure labels: Untrusted OS; Enclave (Secret data, Trusted code); Remote attestation]

slide-7
SLIDE 7

Access pattern leakage

ID   Credit score   Loan amount
1    720            $2,500
2    600            $500
2    600            $250
3    600            $500

Total loans: $1,250

Access patterns leak information such as filter selectivity

SELECT SUM(loan_amount) FROM customer c JOIN loan l ON c.ssn = l.ssn WHERE credit_score < 630;

[Figure legend: memory access]
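A minimal Python sketch of this leak, using the rows from the table on this slide. The `filter_with_trace` helper and its trace format are illustrative assumptions, not part of OCQ: the trace stands in for the accesses an attacker watching memory could observe.

```python
# Rows from the slide: (id, credit_score, loan_amount)
ROWS = [(1, 720, 2500), (2, 600, 500), (2, 600, 250), (3, 600, 500)]

def filter_with_trace(rows, threshold=630):
    """Non-oblivious filter: writes to the output only for matching rows.

    Returns the total and a trace of accesses (hypothetical format).
    """
    trace = []
    out = []
    for i, (_id, score, amount) in enumerate(rows):
        trace.append(("read", "input", i))
        if score < threshold:  # data-dependent branch
            trace.append(("write", "output", len(out)))
            out.append(amount)
    return sum(out), trace

total, trace = filter_with_trace(ROWS)
writes = sum(1 for op, _, _ in trace if op == "write")
print(total)   # 1250
print(writes)  # 3: the number of matching rows is visible in the trace
```

The attacker never sees a credit score, yet the count of output writes reveals how many rows passed the filter.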

slide-8
SLIDE 8

Oblivious algorithms

ID   Credit score   Loan amount
1    720            $2,500
2    600            $500
2    600            $250
3    600            $500

Total loans: $1,250

Oblivious algorithms hide access patterns at a performance cost

SELECT SUM(loan_amount) FROM customer c JOIN loan l ON c.ssn = l.ssn WHERE credit_score < 630;

[Figure legend: memory access, dummy access]
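A minimal sketch of the idea, assuming a simple filter-and-sum over the slide's rows. The function name and trace format are hypothetical, and Python itself gives no constant-time guarantees, so this only illustrates the shape of the access trace: every row triggers the same accesses, with a dummy update (adding 0) for rows that do not match.

```python
# Rows from the slide: (id, credit_score, loan_amount)
ROWS = [(1, 720, 2500), (2, 600, 500), (2, 600, 250), (3, 600, 500)]

def oblivious_filter_sum(rows, threshold=630):
    """Touch the accumulator once per row, whether or not the row matches."""
    trace = []
    total = 0
    for i, (_id, score, amount) in enumerate(rows):
        trace.append(("read", "input", i))
        match = int(score < threshold)       # 0 or 1
        total += match * amount              # real update, or dummy update of +0
        trace.append(("write", "total", 0))  # same access either way
    return total, trace

total, trace = oblivious_filter_sum(ROWS)
# A same-sized input in which no row matches produces an identical trace.
_, other_trace = oblivious_filter_sum([(9, 800, 100)] * len(ROWS))
print(total)                 # 1250
print(trace == other_trace)  # True
```

The performance cost is visible here too: the accumulator is written for all four rows instead of only the three that match.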

slide-9
SLIDE 9

Previous approaches using hardware enclaves

Not oblivious: SCONE, Graphene, Haven, VC3

  • Side channel leakage

Oblivious: Cipherbase, Opaque

  • Must maintain remote copy of large datasets; expensive to update
  • If applied to WAN setting, inefficient due to high-bandwidth shuffles
slide-10
SLIDE 10

Oblivious Coopetitive Queries (OCQ)

  • Designed for oblivious coopetitive analytics
  • Supports general SQL queries with better performance than previous approaches
  • Protects against network attacker and malicious party attackers (in the hardware enclave model)

slide-11
SLIDE 11

Oblivious Coopetitive Queries (OCQ)

[Figure labels: Jointly Approved Queries; OCQ Planner; Secure Federated Plan; Federated Execution (Party 1, Party 2, …, Party n); Shared Result]

  • Parties must agree on fixed queries and input data in advance
  • Replicated across parties
  • Each party must have at least one hardware enclave
  • Oblivious operators on joint data
  • Authenticated operators on parties’ own data

slide-12
SLIDE 12

Challenges and Techniques

  • 1. Combining data of mixed sensitivities

→ Approach: Mixed-sensitivity algorithms

  • 2. Query planning with sensitive cardinalities

→ Approach: Schema-aware padding

  • 3. Oblivious queries in the wide area

→ Approach: Federated- and security-aware planner

slide-13
SLIDE 13

Sensitivity propagation

Parties specify sensitivity of each table: Public or Sensitive

Propagate sensitivity according to foreign keys and operators

Example join: Demographics ⋈ Region ⋈ Customer

Schema (foreign key relationships shown in the figure):

Customer(c_ssn, c_name, c_zip, c_credit_score)
Loan(l_id, l_ssn, l_amount)
Region(r_zip, r_population)
Demographics(d_id, d_zip, d_income)
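A small sketch of propagation through operators. The slide does not state the exact rules, so this assumes a simple one: a join's output is Sensitive if either input is. The table labels below are hypothetical, chosen only for illustration, and foreign-key propagation is not modeled.

```python
from enum import Enum

class Sensitivity(Enum):
    PUBLIC = 0
    SENSITIVE = 1

def join_sensitivity(left, right):
    """Assumed rule: a join's output is Sensitive if either input is."""
    return max(left, right, key=lambda s: s.value)

# Hypothetical labels; in practice each party specifies these per table.
tables = {
    "demographics": Sensitivity.PUBLIC,
    "region": Sensitivity.PUBLIC,
    "customer": Sensitivity.SENSITIVE,
}

# Demographics joined with Region, then joined with Customer
dr = join_sensitivity(tables["demographics"], tables["region"])
drc = join_sensitivity(dr, tables["customer"])
print(dr.name)   # PUBLIC
print(drc.name)  # SENSITIVE
```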

slide-14
SLIDE 14

Mixed-sensitivity oblivious join

Joining Sensitive tables across parties produces a mixed-sensitivity join

Mixed-sensitivity oblivious join algorithm:

  • 1. Sort Public and Sensitive sides separately
  • 2. Oblivious bitonic merge join

Up to 2.5x speedup vs. fully-oblivious join for equal-sized tables

[Plot: speedup (axis 0.0 to 3.0) vs. join input size (axis 10^2 to 10^7)]
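The merge primitive in step 2 can be sketched as a standard bitonic merge network. This is not OCQ's join algorithm (it does no key matching or output padding); it only shows that two already-sorted inputs can be merged with a compare-exchange sequence that depends on the input length alone. The function name is hypothetical, and the sketch assumes the combined length is a power of two.

```python
def bitonic_merge(left, right):
    """Merge two sorted lists with a fixed compare-exchange network.

    Assumes len(left) + len(right) is a power of two. Returns the merged
    list and the sequence of compared index pairs.
    """
    # Ascending run followed by a descending run forms a bitonic sequence.
    seq = list(left) + list(reversed(right))
    n = len(seq)
    assert n > 0 and n & (n - 1) == 0, "total length must be a power of two"
    pairs = []
    half = n // 2
    while half >= 1:
        for i in range(n):
            if i & half == 0:  # i is the lower index of its pair
                j = i + half
                pairs.append((i, j))
                lo, hi = min(seq[i], seq[j]), max(seq[i], seq[j])
                seq[i], seq[j] = lo, hi  # both slots are written every time
        half //= 2
    return seq, pairs

merged, pairs = bitonic_merge([1, 4, 6, 7], [2, 3, 5, 8])
_, other_pairs = bitonic_merge([10, 20, 30, 40], [1, 2, 3, 4])
print(merged)                # [1, 2, 3, 4, 5, 6, 7, 8]
print(pairs == other_pairs)  # True
```

For n inputs the network performs (n/2)·log2(n) compare-exchanges, here 12 for n = 8, regardless of the values being merged.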

slide-15
SLIDE 15

Schema-aware padding

  • Cardinalities are particularly sensitive in the federated setting
  • Naïve “filter push-up” approaches to padding are very expensive
  • Find tighter padding bounds using foreign key constraints

SELECT c_zip, AVG(l_amount / d_income) FROM customer JOIN loan ON c_ssn = l_ssn JOIN region ON c_zip = r_zip JOIN demographics ON r_zip = d_zip GROUP BY c_zip

Schema (foreign key relationships shown in the figure):

Customer(c_ssn, c_name, c_zip, c_credit_score)
Loan(l_id, l_ssn, l_amount)
Region(r_zip, r_population)
Demographics(d_id, d_zip, d_income)
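One way a foreign key constraint can tighten a padding bound, sketched under stated assumptions. The slides do not give OCQ's actual rule; this assumes a primary-key/foreign-key join in which each foreign-key row matches at most one primary-key row. The function names and table sizes are hypothetical, as is treating `l_ssn` as a foreign key referencing `c_ssn`.

```python
def naive_join_bound(left_rows, right_rows):
    """Worst case with no schema knowledge: every pair of rows matches."""
    return left_rows * right_rows

def fk_join_bound(fk_side_rows):
    """Primary-key/foreign-key join: each foreign-key row matches at most
    one primary-key row, so the output has at most fk_side_rows rows."""
    return fk_side_rows

def pad(rows, bound, dummy=None):
    """Pad a result to exactly `bound` rows so its true size is hidden."""
    assert len(rows) <= bound
    return list(rows) + [dummy] * (bound - len(rows))

# Hypothetical sizes: 1,000 customers and 5,000 loans.
customers, loans = 1_000, 5_000
print(naive_join_bound(customers, loans))  # 5000000
print(fk_join_bound(loans))                # 5000

padded = pad([("z1", 0.3), ("z2", 0.5)], bound=4)
print(len(padded))                         # 4
```

Padding to the tighter bound hides the true cardinality while materializing far fewer dummy rows than the naive bound requires.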

slide-16
SLIDE 16

Federated planner

SELECT SUM(loan_amount) FROM customer c JOIN loan l ON c.ssn = l.ssn WHERE credit_score < 630;

Determines how to run the query and where to run each operator

[Figure: query plan over Loan and Customer. Operators as labeled: Fed Filter, Broadcast to Fed, Fed-Obl Mixed-Sensitivity Broadcast Join, Fed-Obl Agg, Collect to Single Site, Single-Site-Obl Agg. Annotations: "Both input tables Sensitive", "Data movement" (twice)]

  • Fed: Partitioned across all parties’ enclaves
  • Fed-Obl: Partitioned across enclaves + oblivious algorithms
  • Single-Site-Obl: At querier’s enclaves + oblivious algorithms

slide-17
SLIDE 17

Evaluation setup

  • 5 geo-distributed parties
  • ~10 MB/s bandwidth
  • Synthetic data, table sizes 4.3 MB–10 GB
slide-18
SLIDE 18

OCQ vs. prior work

[Bar chart: running time (s), log scale, axis ticks 10^1 to 10^6. Systems: Opaque, OCQ, SMCQL, DJoin. Queries: Comorbidity, Aspirin count, DJoin Q1, DJoin Q5. Bar labels in extraction order: 270 27 100000 230 39 200000 74 74 3000 56 16 27000]

  • Orders of magnitude faster than SMCQL and DJoin due to trusted hardware
  • Faster than Opaque because OCQ can execute initial filters in plaintext
slide-19
SLIDE 19

Overhead of OCQ’s security

[Bar chart: running time (s), log scale, axis ticks 10^0 to 10^3. Queries: Comorbidity, Aspirin count, DJoin Q1, DJoin Q5. Series: Outsourced Opaque, Plaintext federated, OCQ, OCQ w/padding, Outsourced Spark SQL. Bar labels in extraction order: 270 230 74 56 12 7.1 6.4 3.2 27 39 74 16 270 42 190 190 3.0 5.0]

  • 2.2–25x overhead vs. insecure federated or outsourced Spark SQL
slide-20
SLIDE 20

Summary of OCQ’s contributions

Efficient, general framework for oblivious coopetitive analytics

  • 1. Mixed-sensitivity oblivious join and aggregation algorithms
  • 2. Schema-aware padding
  • 3. Secure coopetitive query planner