Security of Searchable Encrypted Cloud Storage

David Cash (Rutgers U), Paul Grubbs (Skyhigh Networks), Jason Perry (Lewis U), Tom Ristenpart (Cornell Tech)
Outsourced storage and searching

client → cloud provider: “give me all records containing ‘sunnyvale’”

End-to-end encryption breaks searching

With standard encryption techniques, the provider can no longer answer:
client → cloud provider: “give me all records containing ‘sunnyvale’” → ???
Searchable Encryption Research

- Usability: what queries are supported?
- Security: what can a dishonest server learn?
- Efficiency: resources used by server and client

This talk: practical schemes. Not treated: more theoretical, highly secure solutions (FHE, MPC, ORAM, …)
Searchable Symmetric Encryption [SWP’00, CGKO’06, …]

client → cloud provider: upload encrypted records, which look like opaque blobs (nCeUKlK7GO5ew6mwpIra, ODusbskYvBj9GX0F0bNv, puxtwXKuEdbHVuYAd4mE, …)

Later: want docs containing word w = “simons”. Client sends a search token Tw; server returns the matching ciphertexts c1, c2, c3, …
The server should not learn the docs or the queries.
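To make this flow concrete, here is a toy Python sketch of a single-keyword SSE index (my illustration, not the talk’s or any cited scheme; instantiating the PRF with HMAC and the flat index layout are assumptions). Note that even this sketch reveals which records match each token, the access-pattern leakage discussed below.

```python
import hashlib
import hmac

def F(K: bytes, w: str) -> bytes:
    """PRF deriving a per-keyword search token (HMAC-SHA256 here)."""
    return hmac.new(K, w.encode(), hashlib.sha256).digest()

K = b"client-secret-key"
docs = {1: {"simons", "institute"}, 2: {"simons"}, 3: {"institute"}}

# Client: build an encrypted index mapping F(K, w) -> matching doc ids.
index: dict[bytes, list[int]] = {}
for doc_id, words in docs.items():
    for w in words:
        index.setdefault(F(K, w), []).append(doc_id)

# Search: the client sends only the token T_w; the server returns the
# matching ciphertext ids without ever seeing the keyword itself.
T_w = F(K, "simons")
print(sorted(index.get(T_w, [])))  # [1, 2]
```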
Other SE types deployed (and sold)
Typically lower security than SSE literature solutions, as we will see.
How SE is analyzed in the literature
SE uses a weakened type of security definition. Crypto security definitions usually formalize, e.g.: “nothing is leaked about the input, except its size.” SE definitions instead promise: “nothing is leaked, except the output of a leakage function L.” Example L output: which records contain the same keyword.
What does L-secure mean in practice?

A messy question, which depends on the data, the queries, and what the attacker already knows. There is currently almost no guidance in the literature. For example, a server seeing only this leakage:
keyword | records
45e8a   | 4, 9, 37
092ff   | 9, 37, 93, 94, 95
f61b5   | 9, 37, 89, 90
cc562   | 4, 37, 62, 75

can already infer: “this keyword is the most common” and “record #37 contains every keyword, and overlaps with record #9 a lot.”
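As a concrete illustration (my code, using the toy table above), a server can compute these inferences directly from the leakage:

```python
# Access-pattern leakage: opaque keyword token -> matching record ids.
leakage = {
    "45e8a": {4, 9, 37},
    "092ff": {9, 37, 93, 94, 95},
    "f61b5": {9, 37, 89, 90},
    "cc562": {4, 37, 62, 75},
}

# "this keyword is the most common"
most_common = max(leakage, key=lambda t: len(leakage[t]))

# "record #37 contains every keyword ..."
in_every_result = set.intersection(*leakage.values())

# "... and overlaps with record #9 a lot"
co_occurrences = sum(1 for recs in leakage.values() if {9, 37} <= recs)

print(most_common, in_every_result, co_occurrences)  # 092ff {37} 3
```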
Attacking SE: An example [Islam-Kuzu-Kantarcioglu]

Bad news: under certain circumstances, queries can be learned at a high rate (80%) by a curious server who knows all of the records that were encrypted.
One prior work: learning queries [Islam-Kuzu-Kantarcioglu] (sketched later)

This work: Practical Exploitability of SE Leakage
- Improvements to query recovery
- New attacks: query and document recovery with only partial knowledge of the records, with experiments
- Active, chosen-document attacks on real implementations
Datasets for Attack Experiments

- Enron Emails (intra-company email)
- Apache Emails

Processed with standard IR keyword extraction techniques (Porter stemming, stopword removal).
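A rough sketch of that preprocessing in Python, assuming NLTK’s Porter stemmer and English stopword list (the talk does not name a specific library):

```python
# Requires: pip install nltk, then nltk.download("stopwords") once.
import re

from nltk.corpus import stopwords
from nltk.stem import PorterStemmer

STEMMER = PorterStemmer()
STOPWORDS = set(stopwords.words("english"))

def extract_keywords(text: str) -> set[str]:
    """Lowercase, drop stopwords, and Porter-stem the remaining words."""
    words = re.findall(r"[a-z]+", text.lower())
    return {STEMMER.stem(w) for w in words if w not in STOPWORDS}

print(extract_keywords("The meetings were held in Sunnyvale"))
# e.g. {'meet', 'held', 'sunnyval'}
```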
Outline
Query recovery using document knowledge
Attack setting [Islam-Kuzu-Kantarcioglu]

Inverted index (known to the server, e.g., built from public financial data):

keyword    | records
sunnyvale  | 4, 37, 62, 75
rutgers    | 9, 37, 93, 94, 95
admissions | 4, 9, 37
committee  | 8, 37, 89, 90
…          | …

Leakage (unknown queries):
[Matrix over queries Q1–Q6 × records rec1–rec4, marking which records each query matched: Q3 matched 3 records; Q4 and Q5 matched 2 each; Q1, Q2, Q6 matched 1 each.]
The IKK attack (sketch) [Islam-Kuzu-Kantarcioglu]

Leakage (unknown queries):
[Same query-to-record matrix as above.]

Casts the leakage as an optimization problem for finding the mapping from queries to keywords, solved with simulated annealing. Works for certain numbers of queries and certain distributions.
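A compressed sketch of the idea (my reconstruction; the cost function, move rule, and annealing schedule are illustrative, not the paper’s exact choices): simulated annealing searches for a query-to-keyword assignment whose keyword co-occurrence counts best match the observed query co-occurrences.

```python
import math
import random

def ikk_attack(co_q, co_kw, keywords, steps=200_000, temp=1.0, cool=0.99995):
    """co_q[i][j]: # records matched by both queries i and j (leakage).
    co_kw[a][b]: # records containing both keywords a and b (known docs).
    Returns a candidate mapping: query index -> keyword."""
    n = len(co_q)
    assign = random.sample(keywords, n)  # random injective initial guess

    def cost(a):
        return sum((co_q[i][j] - co_kw[a[i]][a[j]]) ** 2
                   for i in range(n) for j in range(n))

    cur = cost(assign)
    for _ in range(steps):
        cand = assign[:]
        i = random.randrange(n)
        new = random.choice(keywords)
        if new in cand:                    # swap to stay injective
            cand[cand.index(new)] = cand[i]
        cand[i] = new
        c = cost(cand)
        # Accept improvements always, worse moves with prob e^(-delta/T).
        if c <= cur or random.random() < math.exp((cur - c) / temp):
            assign, cur = cand, c
        temp *= cool
    return assign
```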
Observation

- When a query term returns a unique number of documents, it can immediately be guessed.
- The IKK attack requires the server to have virtually perfect knowledge of the document set.
- If so, then why not just look at the number of documents returned by each query?
Query Recovery via Counts

Leakage:
[Same query-to-record matrix as above.]

Q3 matched 3 records, so it must be “rutgers”. Q2 overlapped with “rutgers”, so it must be “sunnyvale”. Then “disambiguate” the remaining queries by checking intersections with already-recovered queries.
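A minimal sketch of this count attack (my code; it assumes the server knows the full inverted index, and that the leakage gives each query’s set of matching record ids):

```python
from collections import defaultdict

def count_attack(index: dict[str, set[int]],
                 leakage: dict[str, set[int]]) -> dict[str, str]:
    """Map query tokens to keywords via result counts + intersections."""
    # Group known keywords by how many records they appear in.
    by_count = defaultdict(list)
    for kw, recs in index.items():
        by_count[len(recs)].append(kw)

    recovered: dict[str, str] = {}  # query token -> keyword

    # Step 1: a unique result count identifies the keyword immediately.
    for q, recs in leakage.items():
        candidates = by_count[len(recs)]
        if len(candidates) == 1:
            recovered[q] = candidates[0]

    # Step 2: disambiguate the rest by requiring each candidate's
    # co-occurrence counts (in the known index) to match the observed
    # result overlaps with every already-recovered query.
    progress = True
    while progress:
        progress = False
        for q, recs in leakage.items():
            if q in recovered:
                continue
            used = set(recovered.values())
            candidates = [
                kw for kw in by_count[len(recs)]
                if kw not in used
                and all(len(recs & leakage[q2]) ==
                        len(index[kw] & index[recovered[q2]])
                        for q2 in recovered)
            ]
            if len(candidates) == 1:
                recovered[q] = candidates[0]
                progress = True
    return recovered
```

On the toy example, a query with a unique count (like Q3) falls in step 1, and its overlaps then pin down queries like Q2 in step 2.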
Query Recovery Experiment

Setup: the server knows all of the encrypted records; the attack targets the most frequent keywords, and runs in seconds.
Query Recovery with Partial Knowledge

What if the server has only partial knowledge of the documents, and only a fraction of the queries was revealed to it?

Setup: Enron subset, 500 most frequent keywords (stemmed, non-stopwords), 150 queried at random, 5% of queries initially given to the server as a hint.
Outline
Document Recovery using Partial Knowledge
Passive Document Recovery Attack Setting

Client stores emails plus an SE index with the provider. Attacker’s view: “This blob indexes some docs I happen to know and others I don’t… What does that tell me?”

Leakage that we attack: it is present before any queries are issued.
Example systems: [Lau et al’14] [He et al’14]

Record 1: “The quick brown fox […]” → zAFDr7ZS99TztuSBIf[…] with H(K,quick), H(K,brown), H(K,fox), …
Record 2: “The fast red fox […]” → Hs9gh4vz0GmH32cXK5[…] with H(K,fast), H(K,red), H(K,fox), …
Simple Observation

Doc 1 (known):   zAFDr7ZS99TztuSBIf[…] with H(K,quick), H(K,brown), H(K,fox), …
Doc 2 (unknown): Hs9gh4vz0GmH32cXK5[…] with H(K,fast), H(K,red), H(K,fox), …

A hash such as H(K,fox) computed from the known doc also appears in other docs, so the attacker learns which unknown docs contain “fox”.
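A minimal sketch of this matching step (my illustration; the hash values are placeholders):

```python
# Hashes recovered from a document the attacker knows (Doc 1).
known_doc_hashes = {"H_quick": "quick", "H_brown": "brown", "H_fox": "fox"}

# Hashes attached to an unknown encrypted document (Doc 2).
unknown_doc_hashes = {"H_fast", "H_red", "H_fox"}

# Any hash shared with a known document reveals a word of the unknown one.
leaked_words = {known_doc_hashes[h]
                for h in unknown_doc_hashes if h in known_doc_hashes}
print(leaked_words)  # {'fox'}
```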
Document Recovery with Partial Knowledge

Experiment: the server knows some of the indexed emails and uses their hashes to recover words of the remaining emails.
Anecdotal Example: the effect of one public document

Case study: a single email from the Enron corpus, sent to 500 employees by a consulting group. The vocabulary of this single document gives us on average 35% of each recipient’s indexed words (counting each word once).
Outline

Active attacks: Chosen-Document-Addition Attacks

Setting: a local proxy encrypts the client’s emails and maintains the SE index via an update protocol. An attacker who can get a crafted email indexed gets to observe: “Leakage from my crafted email!”
Chosen-Document Attack ⇒ Learn chosen hashes

Doc 1: “The quick brown fox […]” → zAFDr7ZS99TztuSBIf[…] with H(K,quick), H(K,brown), H(K,fox), …
New Doc (attacker-chosen): “contract sell buy” → VcamU4a8hXcG3F55Z[…] with H(K,contract), H(K,buy), H(K,sell), …

The attacker now knows exactly which hash corresponds to each chosen word, and can test every other record for those words.
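A sketch of the attack logic (my illustration; H is an assumed HMAC-based keyword hash, and the key and document contents are placeholders):

```python
import hashlib
import hmac

def H(key: bytes, word: str) -> bytes:
    """Assumed per-keyword hash; real systems may differ in details."""
    return hmac.new(key, word.encode(), hashlib.sha256).digest()

key = b"client-secret-key"  # held by the client, never by the attacker
target_words = ["contract", "sell", "buy"]

# 1. The attacker emails a document containing exactly target_words;
#    the client's proxy indexes it, and the server sees its hashes.
new_doc_hashes = [H(key, w) for w in target_words]

# 2. The server-side attacker now knows hash -> chosen word, and can
#    scan every other record's hashes for the targeted words.
hash_to_word = dict(zip(new_doc_hashes, target_words))
other_record_hashes = [H(key, w) for w in ["buy", "fox"]]
found = [hash_to_word[h] for h in other_record_hashes if h in hash_to_word]
print(found)  # ['buy']
```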
Chosen Document Attack Experiment

Setup: measure how many target keywords are learned as a function of the number of chosen documents (emails) injected (test on Enron).

Chosen Document Attack Experiment Results
[Results plot: keywords learned vs. number of chosen documents]
Conclusion

- Systematic study of the exploitability of multiple SE leakage types reveals serious vulnerabilities.
- Empirical characterization of what one can do with the leakage.
Future Work and Open Problems

- Guidance on quantitatively how to use SE.
- Robustness of the attacks, e.g., translation against word substitution.