Why Your Encrypted Database Is Not Secure: Paul Grubbs, Tom Ristenpart, Vitaly Shmatikov (PowerPoint presentation)



SLIDE 1

Why Your Encrypted Database Is Not Secure

Paul Grubbs, Tom Ristenpart, Vitaly Shmatikov

SLIDE 2

Outsourced Applications Today

[Figure: data stored on the server]

SLIDE 3

Encrypt the data!

SLIDE 4

Encrypt the Data

[Figure: encrypted data stored on the server]

App functionality no longer works :(

SLIDE 5

Encrypt the Data

[Figure: encrypted data stored on the server]

Use property-revealing encryption (PRE):

  • Searchable encryption
  • Deterministic encryption
  • Order-revealing encryption
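To make "property-revealing" concrete, here is a minimal sketch of what deterministic encryption lets a server do, assuming HMAC-SHA256 from Python's standard library as a stand-in. It yields a deterministic tag, not a decryptable ciphertext (real deterministic schemes such as AES-SIV can be decrypted), and the key, table, and function names are hypothetical.

```python
import hashlib
import hmac

def det_tag(key: bytes, value: str) -> str:
    """Stand-in for deterministic encryption: same key and value always give the same tag.
    (A keyed PRF, not decryptable; used only to show what the server can compute.)"""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()

# Hypothetical client key; the server never sees it.
KEY = b"client-only-secret"

# The server stores names in the clear (for brevity) and only tags for the diagnosis column.
rows = [("alice", "flu"), ("bob", "asthma"), ("carol", "flu")]
server_table = [(name, det_tag(KEY, diagnosis)) for name, diagnosis in rows]

def server_equality_query(table, query_tag):
    """Server-side WHERE diagnosis = ?: compares tags and never needs the key."""
    return [name for name, tag in table if tag == query_tag]

# The client sends det_tag(KEY, "flu") as the query.
matches = server_equality_query(server_table, det_tag(KEY, "flu"))
print(matches)  # ['alice', 'carol']
```

The property that makes the query work also lets the server see, without any query at all, that alice and carol share a diagnosis.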

SLIDE 6

Building “Secure” Systems

[Figure: encrypted data stored on the server]

SLIDE 7

Building “Secure” Systems

“computing on encrypted data”

SLIDE 8

Building “Secure” Systems

  • CryptDB (SOSP 2011)
  • Mylar (NSDI 2014)
  • Seabed (OSDI 2016)
  • Arx
  • Many others
  • Lots of industry and government interest!!

SLIDE 9

What They Claim

SLIDE 10

“Magically Flexible Cryptography”

SLIDE 11

Claims

  • emulates fully homomorphic encryption
  • provable confidentiality
  • semantic security
  • the database does not leak the values of sensitive fields, even if the attacker has side information

SLIDE 12

Fallacy #1

An encryption scheme being “secure” does not mean the system is “secure”

SLIDE 13

What This Talk Is About

[Figure: encrypted data stored on the server]

How to take a plausible encryption scheme … and build a completely insecure system from it

SLIDE 14

Unsafe at Any Speed

  • CryptDB (SOSP 2011)
  • Mylar (NSDI 2014)
  • Seabed (OSDI 2016)
  • Arx
  • Many others
  • Lots of industry and government interest!!

If you look at an actual commodity DBMS … insecure under ANY real-world attack

SLIDE 15
SLIDE 16

Threat Models

[Figure: encrypted data stored on the server]

  • Active
  • Persistent passive
  • “Snapshot”

SLIDE 17
Claims Meet Reality

  • Secure against active attacks: false
    – Grubbs et al. “Breaking web applications built on top of encrypted data” (CCS 2016)
  • Secure against “snapshot” attacks: false
    – Grubbs et al. “Why your encrypted database is not secure” (HotOS 2017)
  • Sensitivity analysis helps: false
    – Bindschaedler et al. “The tao of inference in privacy-protected databases” (forthcoming)

SLIDE 18

Security Against Active Attacks

SLIDE 19

Mylar

[Figure: documents “My secret diary” and “Hiring plan for 2017”]

Add orange user: “I trust blue user. Server, you can convert all my searches to blue key. Here’s a token to do it.”

Insecure proxy re-encryption scheme [see Van Rompay et al. 2017]

SLIDE 20

Mylar Under Active Attack

[Figure: Search(w) issued over “My secret diary” and “Hiring plan for 2017”]

SLIDE 21

some user machines … collude with the server … because the adversary broke into a user’s machine

SLIDE 22

Mylar Under Active Attack

[Figure: Search(w) over “My secret diary” and “Hiring plan for 2017”; search token + conversion token = H(w)]

Unkeyed “hash” of keyword. Perform dictionary attack.
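The dictionary attack can be illustrated with a toy model. This is a sketch, not Mylar's construction: real Mylar uses bilinear pairings, whereas here we assume plain modular exponentiation modulo the prime 2^127 - 1, and all names (H, search_token, conversion_delta, convert) are hypothetical. Converting a victim's search token to a document key the attacker chose (here, 1) turns it into H(w), which anyone can recompute for candidate words.

```python
import hashlib
import math
import secrets

# Toy group: nonzero integers mod the Mersenne prime 2^127 - 1.
# Real Mylar uses bilinear pairings; this only mimics the key-conversion algebra.
P = 2**127 - 1
ORDER = P - 1  # exponents can be reduced mod P - 1 (Fermat's little theorem)

def H(word: str) -> int:
    """Unkeyed hash of a keyword into the toy group."""
    h = int.from_bytes(hashlib.sha256(word.encode("utf-8")).digest(), "big") % P
    return h or 1

def random_key() -> int:
    """Random exponent invertible mod ORDER, so conversion deltas exist."""
    while True:
        k = secrets.randbelow(ORDER - 2) + 2
        if math.gcd(k, ORDER) == 1:
            return k

def search_token(user_key: int, word: str) -> int:
    return pow(H(word), user_key, P)

def conversion_delta(user_key: int, doc_key: int) -> int:
    """Delta that converts searches under user_key to doc_key: doc_key / user_key mod ORDER."""
    return doc_key * pow(user_key, -1, ORDER) % ORDER

def convert(token: int, delta: int) -> int:
    """Server-side conversion: H(w)^user_key becomes H(w)^doc_key."""
    return pow(token, delta, P)

victim_key = random_key()
# A compromised user shares a document whose key the attacker picked: here 1.
delta = conversion_delta(victim_key, 1)

# The victim searches for a keyword; the server converts the token as instructed.
converted = convert(search_token(victim_key, "layoffs"), delta)

# Dictionary attack: hash candidate words and compare.
dictionary = ["budget", "hiring", "layoffs", "salary"]
recovered = [w for w in dictionary if H(w) == converted]
print(recovered)  # ['layoffs']
```

With any document key k the attacker knows, it could instead compare against pow(H(w), k, P); choosing 1 just makes the converted token literally H(w), as on the slide.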

SLIDE 23

… as long as none of the users with access to that data item use a compromised machine

SLIDE 24

Mylar Under Active Attack

[Figure: Search(w) over “My secret diary” and “Hiring plan for 2017”]

None of the users with access to this data item use a compromised machine

SLIDE 25

Mylar in a Hospital

If one nurse loses their laptop, the server can compromise every doctor’s private files

SLIDE 26

“Snapshot” Threat Model

Existing systems explicitly claim security … assuming there are no queries in the snapshot

False in any realistic snapshot attack on a commodity DBMS

SLIDE 27

A Simple System Abstraction

[Figure: system layers: DB, OS, volatile memory, persistent storage]

SLIDE 28

Actual Attacks

[Figure: system layers: DB, OS, volatile memory, persistent storage]

  • SQL injection
  • Full-system compromise
  • Disk theft
  • VM snapshot leak

SLIDE 29

Case Study: MySQL

Similar issues arise in any other commodity DBMS.

Attack | What MySQL leaks | Failed encrypted database
Disk theft | MVCC data structures | Arx’s range query index
SQL injection | Past query statistics | Seabed’s SPLASHE scheme
Full-system compromise or VM snapshot leak | Text of past queries | CryptDB, Lewi/Wu, etc.

SLIDE 30

Disk Theft

If this is your threat model, just use full-disk encryption

SLIDE 31

Logs on Disk

[Figure: Insert, Select, and Update queries written to logs on disk]

  • General query log (not widely used)
  • Binary log: records modifications, used for replication and recovery
  • MVCC log: multi-version concurrency control using log data structures, in all modern SQL databases!

Data modification queries can be reconstructed from these logs [FHMW ’10, FKSHW ’12]
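The following toy model (hypothetical, not MySQL's actual InnoDB undo or redo log format) shows why such logs leak: a table keeps old row versions in an append-only log, as MVCC requires, and a snapshot attacker reads that log back. Even if every stored value is a ciphertext, the log reveals which rows were modified, in what order, and their previous (even deleted) values.

```python
from dataclasses import dataclass, field

@dataclass
class ToyMVCCTable:
    """Hypothetical table that appends every modification to an on-disk-style log."""
    rows: dict = field(default_factory=dict)
    log: list = field(default_factory=list)  # (txn_id, op, key, old_value)
    next_txn: int = 1

    def _record(self, op, key, old_value):
        self.log.append((self.next_txn, op, key, old_value))
        self.next_txn += 1

    def insert(self, key, value):
        self._record("INSERT", key, None)
        self.rows[key] = value

    def update(self, key, value):
        self._record("UPDATE", key, self.rows[key])
        self.rows[key] = value

    def delete(self, key):
        self._record("DELETE", key, self.rows.pop(key))

def reconstruct_history(table):
    """Snapshot attacker: read the log back as a sequence of modification queries."""
    return [(txn, op, key) for txn, op, key, _ in table.log]

t = ToyMVCCTable()
t.insert("row7", "Ek(5)")
t.update("row7", "Ek(2)")
t.delete("row7")
print(reconstruct_history(t))  # [(1, 'INSERT', 'row7'), (2, 'UPDATE', 'row7'), (3, 'DELETE', 'row7')]
print(t.rows)                  # {} : the row is gone from the table...
print(t.log[1][3], t.log[2][3])  # Ek(5) Ek(2) : ...but its old versions are not
```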

SLIDE 32

Arx

[Figure: index tree with nodes Ek(1), Ek(5), Ek(2), Ek(3), Ek(7), evaluated against the query “>=2”]

Server: “I used up these nodes.” Client: “Here, refresh nodes with these ciphertexts”: Ek(5), Ek(2), Ek(3)

  • Range queries via chained garbled circuits
  • Tree nodes become consumed, need replacing

Poddar et al.

SLIDE 33

Security Claim for Arx

“Arx protects the database with the same level of security as regular AES-based encryption”

Poddar et al.

SLIDE 34

Arx Under Snapshot Attack

  • Range queries via chained garbled circuits
  • Tree nodes become consumed, need replacing

[Figure: tree nodes Ek(1), Ek(5), Ek(2), Ek(3), Ek(7); refreshed nodes logged as updates “Up Ek(3)”, “Up Ek(5)”, “Up Ek(2)”]

“Here, refresh nodes with these ciphertexts”: Ek(5), Ek(2), Ek(3)

  • Consumed nodes are immediately replaced, and the replacements are stored in the MVCC log
  • The query access pattern is therefore recorded on disk
  • A snapshot attacker can recover queries and plaintexts using variants of attacks from [GSBNR, S&P ’17]

SLIDE 35

SQL Injection

SQL injection: MySQL leaks past query statistics, which breaks Seabed’s SPLASHE scheme

SLIDE 36

SQL Injection

[Figure: malicious code runs here]

SQL injection accounted for 51% of all Web application attacks in 2016 (source: Akamai)

SLIDE 37

Diagnostic Tables

[Figure: Insert and Select queries updating the performance_schema counters for Inserts and Selects]

  • performance_schema stores the current query for all threads and statistics for past queries
    – Separate counts for queries which involve different columns
  • information_schema stores the current query for all users and the contents of the buffer cache

SLIDE 38

Problem: Frequency Analysis

Name | Has given this talk before
Paul Grubbs | 1
Thomas Ristenpart |
Vitaly Shmatikov |

Order-preserving encryption reveals the histogram of plaintext values.

This is how Naveed et al. used frequency analysis to break CryptDB: match the histogram to an auxiliary model of the data distribution.
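A minimal sketch of the rank-matching idea behind frequency analysis against deterministic (or order-preserving) encryption: sort ciphertexts by how often they occur, sort plaintexts by their auxiliary probability, and pair them up. The column and the auxiliary probabilities below are made up for illustration.

```python
from collections import Counter

def frequency_analysis(ciphertexts, aux_distribution):
    """Pair the i-th most frequent ciphertext with the i-th most likely plaintext."""
    ct_ranked = [ct for ct, _ in Counter(ciphertexts).most_common()]
    pt_ranked = sorted(aux_distribution, key=aux_distribution.get, reverse=True)
    return dict(zip(ct_ranked, pt_ranked))

# Hypothetical DET-encrypted blood-type column, as the server sees it.
column = ["c9"] * 6 + ["a1"] * 3 + ["f4"]
# Hypothetical auxiliary distribution, e.g. from a previously released dataset.
aux = {"O": 0.45, "A": 0.40, "B": 0.11, "AB": 0.04}

print(frequency_analysis(column, aux))  # {'c9': 'O', 'a1': 'A', 'f4': 'B'}
```

With order-preserving encryption the attacker also sees the order of the ciphertexts, which constrains the matching further.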

SLIDE 39

Seabed

Each possible plaintext of “Has given this talk before” gets its own column:

Name | C2 (“Has …”=1) | C3 (“Has …”=0)
aspoiwnpoinio | Ek(1) | Ek(0)
petryoiueytiew | Ek(0) | Ek(1)
Xncmxncmbcn | Ek(0) | Ek(1)

The WHERE clause is transformed to the correct column:
SELECT Count(“Has …”) WHERE “Has …”=1 becomes SELECT Count(C2)

Separate counts for queries which involve different columns

Papadimitriou et al. (OSDI 2016)

SLIDE 40

Example

SLIDE 41

SQLi Extracts Diagnostic Tables

Queries issued: SELECT Count(C3), SELECT Count(C2), SELECT Count(C3)

performance_schema:
  Selects for C2: 1
  Selects for C3: 2

Separate counts for queries which involve different columns

Use frequency analysis to recover plaintexts (see paper for details)
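The leak can be illustrated with a toy. This sketch assumes a hypothetical SPLASHE-style mapping from plaintext value to column and models the diagnostic table as a simple per-column counter; it is neither Seabed's code nor MySQL's schema. It only recovers which value each column stands for (and hence what past queries asked for); the full attack is in the paper.

```python
from collections import Counter

# Hypothetical client-side secret: which column stands for which plaintext value.
COLUMN_FOR_VALUE = {1: "C2", 0: "C3"}

def rewrite_count(value):
    """SELECT Count("Has ...") WHERE "Has ..." = value  ->  SELECT Count(<column>)."""
    return f"SELECT Count({COLUMN_FOR_VALUE[value]})"

# Stand-in for performance_schema: separate counts per column touched by a query.
selects_per_column = Counter()

def server_run(query):
    column = query[len("SELECT Count("):-1]
    selects_per_column[column] += 1

for value in [0, 1, 0]:  # the three queries from the slide
    server_run(rewrite_count(value))

def match_columns(counts, aux_query_freq):
    """Frequency analysis: most-queried column <-> most commonly queried value."""
    columns = [c for c, _ in counts.most_common()]
    values = sorted(aux_query_freq, key=aux_query_freq.get, reverse=True)
    return dict(zip(columns, values))

# Hypothetical auxiliary knowledge: queries for value 0 are more common.
print(selects_per_column)                                   # Counter({'C3': 2, 'C2': 1})
print(match_columns(selects_per_column, {0: 0.7, 1: 0.3}))  # {'C3': 0, 'C2': 1}
```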

SLIDE 42

Full-System Snapshot

Full-system compromise or VM snapshot leak: MySQL leaks the text of past queries, which breaks CryptDB, Lewi/Wu, etc.

SLIDE 43

Full-System Compromise

Leakage of sensitive data at the OS level is well-studied [CPGR, DLJKSXSW]. We focus on the DBMS address space: things inaccessible to users.

SLIDE 44

Data Structures and Caches

  • Adaptive hash index tracks page accesses, indexes automatically
  • MySQL query cache stores select queries and results
  • Other query caches (memcached)
  • MySQL manages internal heaps, does not zero freed memory!

[Figure: Insert, Select, and Select queries passing through these structures]

All of these are essential for performance!
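A toy cache keyed on the exact SQL text shows why performance caches leak: whatever a query carried (here, hypothetical search tokens) stays in memory as the cache key until it is evicted. This is a generic LRU sketch built on Python's OrderedDict, not the MySQL query cache's implementation; table and column names are made up.

```python
from collections import OrderedDict

class ToyQueryCache:
    """Hypothetical LRU query cache keyed on the full SQL text."""

    def __init__(self, capacity=128):
        self.capacity = capacity
        self.entries = OrderedDict()  # sql text -> cached result

    def get(self, sql):
        if sql not in self.entries:
            return None
        self.entries.move_to_end(sql)  # mark as most recently used
        return self.entries[sql]

    def put(self, sql, result):
        self.entries[sql] = result
        self.entries.move_to_end(sql)
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict least recently used

cache = ToyQueryCache(capacity=2)
cache.put("SELECT id FROM docs WHERE tok = 'a3f9'", [7])
cache.put("SELECT id FROM docs WHERE tok = '11c0'", [2])
cache.put("SELECT id FROM docs WHERE tok = '5be1'", [])

# A memory snapshot of the cache reveals the text (and tokens) of recent queries.
print(list(cache.entries))
```

Only eviction removes a query from this toy cache; a larger capacity, which is better for performance, means more past queries visible in a snapshot.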

SLIDE 45

Token-Based Systems

[Figure: a Select query’s search token stays in MySQL memory. Waited a while… Still there. 1,000 random selects… Still there. 100,000 more random selects… Still there.]

CryptDB, Mylar, Lewi-Wu, and other searchable encryption schemes cannot be semantically secure if the attacker sees a single search token.
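"Still there" follows from allocators that do not zero freed memory. The sketch below is a hypothetical arena allocator in Python, not MySQL's memory manager: a freed buffer holding a search token keeps its bytes until something reuses that exact chunk, so many unrelated queries of a different size never overwrite it.

```python
class ToyArena:
    """Hypothetical allocator over one byte arena; freed chunks are reused only on exact size match."""

    def __init__(self, size, zero_on_free=False):
        self.arena = bytearray(size)
        self.free_chunks = []  # (offset, length) of freed chunks
        self.top = 0
        self.zero_on_free = zero_on_free

    def malloc(self, n):
        for i, (off, length) in enumerate(self.free_chunks):
            if length == n:
                del self.free_chunks[i]
                return off
        if self.top + n > len(self.arena):
            raise MemoryError("arena exhausted")
        off, self.top = self.top, self.top + n
        return off

    def write(self, off, data):
        self.arena[off:off + len(data)] = data

    def free(self, off, n):
        if self.zero_on_free:
            self.arena[off:off + n] = bytes(n)  # scrub before reuse
        self.free_chunks.append((off, n))

def run_workload(heap, token, other_queries):
    # One query carrying the search token, then many unrelated 32-byte queries.
    off = heap.malloc(len(token))
    heap.write(off, token)
    heap.free(off, len(token))
    for i in range(other_queries):
        q = f"SELECT * FROM t WHERE id={i}".encode().ljust(32, b" ")
        o = heap.malloc(32)
        heap.write(o, q)
        heap.free(o, 32)

TOKEN = b"TOKEN:9f2c41d0"  # hypothetical 14-byte search token
leaky = ToyArena(4096)
run_workload(leaky, TOKEN, 100_000)
print(TOKEN in bytes(leaky.arena))  # True: still there

scrubbed = ToyArena(4096, zero_on_free=True)
run_workload(scrubbed, TOKEN, 100_000)
print(TOKEN in bytes(scrubbed.arena))  # False
```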

SLIDE 46

Let Me Make Myself Perfectly Clear

These encrypted databases CANNOT be semantically secure under ANY real-world attack

SLIDE 47
SLIDE 48

“I Will Build My Own Database”

You can try…

  • Transaction logs needed to support ACID
  • Log-structured storage
  • Caching
  • Adaptive data structures adjust to workload

… everything in modern databases leaks information about past queries

SLIDE 49

Sensitivity Analysis

  • “Strong” encryption
  • Deterministic encryption: can check for equality
  • Order-preserving encryption: can sort
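To see why "can sort" comes for free, here is a deliberately insecure toy order-preserving encoding (an assumption for illustration, not any published OPE scheme): the ciphertext of m is a running sum of key-seeded random positive gaps, so larger plaintexts always get larger ciphertexts and equal plaintexts get equal ones.

```python
import random
from itertools import accumulate

def toy_ope_table(key: int, domain_size: int, max_gap: int = 100):
    """Ciphertext of m is the sum of m + 1 random gaps in [1, max_gap]: strictly increasing in m."""
    rng = random.Random(key)  # the key only seeds the gaps; NOT cryptographically secure
    gaps = [rng.randint(1, max_gap) for _ in range(domain_size)]
    return list(accumulate(gaps))

table = toy_ope_table(key=42, domain_size=120)

def enc(m: int) -> int:
    return table[m]

ages = [34, 71, 19, 34, 50]
cts = [enc(a) for a in ages]

# The server sorts rows by ciphertext alone and gets the plaintext order...
by_ct = sorted(range(len(cts)), key=lambda i: cts[i])
by_age = sorted(range(len(ages)), key=lambda i: ages[i])
print(by_ct == by_age)   # True
# ...and equal plaintexts show up as equal ciphertexts.
print(cts[0] == cts[3])  # True
```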

SLIDE 50

Auxiliary Data

[Figure: columns encrypted with IND-CPA, OPE, and DET]

Public auxiliary data (e.g., previous release of similar datasets):

  • Distribution of values in each column
  • Correlations between columns

SLIDE 51

Bayesian Inference

[Figure: columns encrypted with OPE and DET]

Public auxiliary data (e.g., previous release of similar dataset):

  • Distribution of values in each column

Observed in the encrypted database:

  • Distribution of ciphertexts in each column

SLIDE 52

Multinomial Attack

[Equation: most likely mapping of ciphertexts to plaintexts = the mapping that maximizes the density of a multinomial distribution over the observed ciphertexts, parameterized by the plaintext distribution (from auxiliary data)]

Bindschaedler et al.
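One way to write this objective in symbols, under our own notation (an assumption, not taken from the slide): distinct ciphertext c_i is observed h_i times out of N = Σ_i h_i, p is the auxiliary plaintext distribution, and π maps distinct ciphertexts to distinct plaintexts (order-preserving when attacking OPE/ORE). The multinomial coefficient does not depend on π, so maximizing the density reduces to maximizing a count-weighted sum of log-probabilities. For deterministic encryption without an order constraint, the rearrangement inequality shows the maximum is reached by pairing the largest counts with the largest probabilities, i.e. by frequency analysis.

```latex
\hat{\pi}
  = \arg\max_{\pi} \; \frac{N!}{\prod_i h_i!} \prod_i p_{\pi(c_i)}^{\,h_i}
  = \arg\max_{\pi} \; \sum_i h_i \log p_{\pi(c_i)}
```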

SLIDE 53
Multinomial Attack

  • Optimal
    – Maximum likelihood estimator for deterministic ORE
  • Outperforms previous heuristics
    – Naveed et al. frequency analysis (CCS 2015)
    – Grubbs et al. non-crossing attacks (Oakland 2017)
  • Extends to multiple columns
    – Condition distribution on previously recovered plaintexts for a dependent column

Bindschaedler et al.
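For deterministic ORE the mapping must also respect order, and that maximization can be solved exactly with dynamic programming. The sketch below is our own illustration of such an estimator (the function name and inputs are hypothetical, and it is not necessarily Bindschaedler et al.'s algorithm): it picks plaintext indices j_1 < ... < j_n maximizing the sum of h_i * log(p_{j_i}).

```python
import math

def ore_mle(ct_counts, aux_probs):
    """Most likely order-preserving mapping of ORE ciphertexts to plaintexts.

    ct_counts: positive counts h_1..h_n of the distinct ciphertexts, in ciphertext order.
    aux_probs: probabilities p_1..p_m of the plaintexts, in plaintext order (m >= n).
    Returns plaintext indices j_1 < ... < j_n maximizing sum_i h_i * log(p_{j_i}).
    """
    n, m = len(ct_counts), len(aux_probs)
    if n > m:
        raise ValueError("more distinct ciphertexts than plaintexts")
    neg_inf = float("-inf")
    logp = [math.log(p) if p > 0 else neg_inf for p in aux_probs]

    # best[i][j]: best score placing the first i ciphertexts among the first j plaintexts.
    best = [[neg_inf] * (m + 1) for _ in range(n + 1)]
    took = [[False] * (m + 1) for _ in range(n + 1)]
    best[0] = [0.0] * (m + 1)
    for i in range(1, n + 1):
        for j in range(i, m + 1):
            skip = best[i][j - 1]  # plaintext j left unused
            take = best[i - 1][j - 1] + ct_counts[i - 1] * logp[j - 1]  # ciphertext i -> plaintext j
            if take >= skip:
                best[i][j], took[i][j] = take, True
            else:
                best[i][j] = skip

    # Walk back through the table to recover the chosen plaintext for each ciphertext.
    mapping, j = [], m
    for i in range(n, 0, -1):
        while not took[i][j]:
            j -= 1
        mapping.append(j - 1)
        j -= 1
    return mapping[::-1]

# Hypothetical example: 3 distinct ciphertexts seen 5, 1, and 4 times; 4 possible plaintexts.
print(ore_mle([5, 1, 4], [0.5, 0.15, 0.05, 0.3]))  # [0, 1, 3]
```

The order constraint matters: for counts [1, 9] and probabilities [0.6, 0.3, 0.1], plain frequency analysis would send the 9-count ciphertext to plaintext 0, which would cross the order, whereas this estimator returns [0, 1].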

SLIDE 54

Inferring “Sensitive” Columns

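[Figure: columns encrypted with IND-CPA, OPE, and DET]

Public auxiliary data (e.g., previous release of similar dataset):

  • Distribution of values in each column
  • Correlations between columns

[Figure: features (recovered by the multinomial attack) feed a prediction of the “sensitive” column]

Bindschaedler et al.

Multinomial attack!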
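A minimal sketch of the prediction step, assuming the attacker has already recovered some PRE-encrypted columns and uses them as features: learn from public auxiliary data the most common sensitive value for each feature combination, then predict it for each encrypted row. This conditional-majority rule is a hypothetical stand-in for whatever classifier Bindschaedler et al. use, and all data below is made up.

```python
from collections import Counter, defaultdict

def train_conditional(aux_rows):
    """aux_rows: (features, sensitive_value) pairs from public auxiliary data."""
    table = defaultdict(Counter)
    for features, sensitive in aux_rows:
        table[features][sensitive] += 1
    return table

def predict(table, features, fallback):
    """Most common sensitive value seen with these features; fallback if never seen."""
    counts = table.get(features)
    return counts.most_common(1)[0][0] if counts else fallback

# Hypothetical auxiliary rows: ((age band, admission type), has condition?)
aux = [
    (("65+", "emergency"), "yes"),
    (("65+", "emergency"), "yes"),
    (("65+", "emergency"), "no"),
    (("18-34", "elective"), "no"),
    (("18-34", "elective"), "no"),
]
model = train_conditional(aux)

# Feature values recovered from the encrypted database (e.g., by the multinomial attack).
recovered = [("65+", "emergency"), ("18-34", "elective"), ("35-64", "emergency")]
print([predict(model, f, fallback="no") for f in recovered])  # ['yes', 'no', 'no']
```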

SLIDE 55

Let’s Try with Real Data

  • HCUP-NIS hospital discharge records
    – Over 7 million records each year
    – Demographic + medical attributes
  • U.S. Census American Community Survey
    – Over 3 million records each year
    – Demographic attributes, income
  • Fraternal Order of Police (FOP) data dump
    – Data dump from 2015 hack
    – Names and addresses of over 600,000 police officers

SLIDE 56
Empirical Results

  • HCUP-NIS hospital discharge records
    – Infer if patient has a mental health or substance abuse condition with 97% accuracy
    – … mood disorder with 96% accuracy
  • U.S. Census American Community Survey
    – Recover 90% of PRE-encrypted attributes
    – Infer income to within $8.4K
  • Fraternal Order of Police (FOP) data dump
    – Exact home addresses of 5,500 police officers in PA

SLIDE 57

Remember

An encryption scheme being “secure” does not mean the system is “secure”

SLIDE 58

Advice to Practitioners