CS2: A Searchable Cryptographic Cloud Storage System (Seny Kamara)



slide-1
SLIDE 1

CS2: A Searchable Cryptographic Cloud Storage System

Seny Kamara (MSR) Charalampos Papamanthou (UC Berkeley) Tom Roeder (MSR)

slide-2
SLIDE 2

Cloud Computing

slide-3
SLIDE 3
Cloud Computing

  • Main concern
      • Will my data be safe?
      • Will anyone see it?
      • Can anyone modify it?
  • Security solutions
      • VM isolation
      • Single-tenant servers
      • Access control
  • Cloud provides stronger security than self-hosting [Molnar-Schecter-10]
  • Q: But what if I don’t trust the cloud operator?

slide-4
SLIDE 4

Cloud Storage


slide-5
SLIDE 5

Traditional Approach

[Figure: files stored in the cloud, each encrypted as AEnc_K(file)]

slide-6
SLIDE 6
Search-based Access

  • File-based access is hard (esp. for large data)
  • Search-based access is preferred
      • Web search
      • Desktop search
          • Apple Spotlight, Google Desktop, Windows Desktop
      • Enterprise search

slide-7
SLIDE 7

Two Simple Solutions to Search

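[Figure: two simple solutions for searching files stored as AEnc_K(file); figure labels: "?", "id2"]

  • Solution 1: large communication complexity
  • Solution 2: large local storage

Q: Can we achieve the best of both?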

slide-8
SLIDE 8
Outline

  • Motivation
  • CS2 building blocks
      • Symmetric searchable encryption
      • Search authenticators
      • Proofs of storage
  • CS2 protocols
      • for standard search
      • for assisted search
  • Experiments

slide-9
SLIDE 9

CS2 Building Blocks

slide-10
SLIDE 10

Searchable Symmetric Encryption [SWP01]

[Figure: the client sends a search token t_w to the server, which holds files encrypted as Enc_K(file)]

slide-11
SLIDE 11
Searchable Symmetric Encryption

  • [Goldreich-Ostrovsky-96]
      • Pro: hides everything
      • Con: interactive
  • [Song-Wagner-Perrig-01]
      • Pro: non-interactive
      • Con: static, linear search time, leaks information
  • [Goh03, Chang-Mitzenmacher-05]
      • Pro: non-interactive, dynamic
      • Con: linear search time, non-adaptive security (CKA1-security)
  • [Curtmola-Garay-K-Ostrovsky-06]
      • Pro: non-interactive, sub-linear search (optimal), adaptive security
      • Con: static

We need new SSE!

slide-12
SLIDE 12

Proofs of Storage [ABC+07, JK07]

[Figure: the client sends a challenge C; the server answers with a proof π]
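The challenge/proof exchange can be illustrated with a deliberately naive spot check. This is not the scheme of [ABC+07, JK07] or the one used in CS2: real proofs of storage return a compact proof, whereas this sketch returns the challenged blocks themselves. The block size and the function names `encode`, `challenge`, `prove` and `verify` are assumptions made for the example.

```python
import hashlib
import hmac
import secrets

BLOCK = 4  # bytes per block; tiny so the example stays readable (assumed value)

def tag(key: bytes, index: int, block: bytes) -> bytes:
    """MAC a block together with its position so blocks cannot be swapped."""
    return hmac.new(key, index.to_bytes(8, "big") + block, hashlib.sha256).digest()

def encode(key: bytes, data: bytes) -> list:
    """Client: split the data into blocks and attach a tag to each block."""
    blocks = [data[i:i + BLOCK] for i in range(0, len(data), BLOCK)]
    return [(b, tag(key, i, b)) for i, b in enumerate(blocks)]

def challenge(n_blocks: int, sample: int) -> list:
    """Client: the challenge C is a random set of block indices."""
    return sorted(secrets.SystemRandom().sample(range(n_blocks), sample))

def prove(stored: list, c: list) -> list:
    """Server: the proof pi is the challenged blocks with their tags."""
    return [stored[i] for i in c]

def verify(key: bytes, c: list, pi: list) -> bool:
    """Client: recompute each tag and compare in constant time."""
    if len(pi) != len(c):
        return False
    return all(hmac.compare_digest(tag(key, i, b), t) for i, (b, t) in zip(c, pi))
```

A server that has lost or altered a challenged block cannot produce a matching tag without the key, so repeated random challenges catch damage with growing probability.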

slide-13
SLIDE 13
Proofs of Storage

  • [ABC+07, JK07, SW08, DVW09, AKK09]
      • Pro: efficient
      • Con: static
  • [APMT08]
      • Pro: efficient and dynamic
      • Con: bounded verifications
  • [EKPT09]
      • Pro: efficient, dynamic, unlimited verification
      • Con: patented

We need new PoS!

slide-14
SLIDE 14

Search Authenticator

[Figure: the client sends a query 𝑥; the server returns the answer with a proof π]

slide-15
SLIDE 15
Search Authenticators

  • [GGP10, CVK10, CVK11]
      • Pro: general-purpose
      • Con: inefficient (due to FHE) & static
  • [CRR11]
      • Pro: general-purpose, efficient
      • Con: requires two non-colluding clouds
  • [BGV11]
      • Con: proof generation is linear & static

We need new VC/SA!

slide-16
SLIDE 16
Outline

  • Motivation
  • CS2 building blocks
      • Symmetric searchable encryption
      • Search authenticators
      • Proofs of storage
  • CS2 protocols
      • for standard search
      • for assisted search
  • Experiments

slide-17
SLIDE 17

SSE-1 [CGKO06]

  1. Build inverted/reverse index
  2. Randomly permute array & nodes

[Figure: inverted index over the keywords GOOG, IBM, AAPL, MSFT; each keyword points to a posting list of file nodes (F1, F2, F4, F8, F10, F11, F12, F14), and the nodes are then scattered across a single array]

slide-18
SLIDE 18

SSE-1 [CGKO06]

  2. Randomly permute array & nodes
  3. Encrypt nodes

[Figure: the keywords GOOG, IBM, AAPL, MSFT point into the permuted node array F11 F8 F2 F10 F1 F4 F12 F10 F2 F2 F14 #, whose nodes are then encrypted]
slide-19
SLIDE 19

SSE-1 [CGKO06]

  3. Encrypt nodes
  4. “Hash” keyword & encrypt pointer

[Figure: each keyword w in {GOOG, IBM, AAPL, MSFT} is replaced by F_K(w), and the pointer to the head of its posting list is stored as Enc(•)]
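The four SSE-1 steps can be sketched in a few lines. This is a simplified illustration, not the exact [CGKO06] construction: it derives all sub-keys from one key with HMAC-SHA256, encrypts nodes by XOR with a PRF output, and uses an assumed `NULL` marker for the end of a list. The file-to-keyword assignment in the usage lines is made up.

```python
import hashlib
import hmac
import random

NULL = 0xFFFFFFFF  # end-of-list marker (assumed encoding)

def prf(key: bytes, msg: bytes) -> bytes:
    return hmac.new(key, msg, hashlib.sha256).digest()

def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))  # truncates to the shorter input

def build_index(key: bytes, files: dict):
    # 1. Build the inverted index: keyword -> list of file ids.
    inverted = {}
    for fid, words in sorted(files.items()):
        for w in sorted(words):
            inverted.setdefault(w, []).append(fid)
    # 2. Randomly permute: give every node a random slot in one array.
    total = sum(len(p) for p in inverted.values())
    slots = list(range(total))
    random.SystemRandom().shuffle(slots)
    array = [b""] * total
    table = {}
    pos = 0
    for w, posting in inverted.items():
        addrs = slots[pos:pos + len(posting)]
        pos += len(posting)
        kw = prf(key, b"node|" + w.encode())
        # 3. Encrypt each node (file id, address of the next node).
        for i, fid in enumerate(posting):
            nxt = addrs[i + 1] if i + 1 < len(addrs) else NULL
            node = fid.to_bytes(4, "big") + nxt.to_bytes(4, "big")
            array[addrs[i]] = xor(node, prf(kw, addrs[i].to_bytes(4, "big")))
        # 4. "Hash" the keyword with a PRF and encrypt the head pointer.
        label = prf(key, b"label|" + w.encode())
        mask = prf(key, b"mask|" + w.encode())
        table[label] = xor(addrs[0].to_bytes(4, "big"), mask)
    return array, table

def token(key: bytes, w: str):
    """Client: search token for keyword w (label, pointer mask, node key)."""
    return tuple(prf(key, p + w.encode()) for p in (b"label|", b"mask|", b"node|"))

def search(array, table, tok):
    """Server: follow the encrypted list for the token, returning file ids."""
    label, mask, kw = tok
    if label not in table:
        return []
    addr = int.from_bytes(xor(table[label], mask), "big")
    out = []
    while addr != NULL:
        node = xor(array[addr], prf(kw, addr.to_bytes(4, "big")))
        out.append(int.from_bytes(node[:4], "big"))
        addr = int.from_bytes(node[4:], "big")
    return out

key = b"k" * 32
files = {1: {"AAPL", "MSFT"}, 2: {"AAPL", "GOOG", "MSFT"}, 4: {"AAPL"}}  # made-up data
array, table = build_index(key, files)
print(search(array, table, token(key, "AAPL")))  # -> [1, 2, 4]
```

Search time is proportional to the length of the posting list for w rather than to the number of files, which is the sub-linear behaviour the slides attribute to [Curtmola-Garay-K-Ostrovsky-06].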

slide-20
SLIDE 20
Limitations of SSE-1

  • Non-adaptively secure ⇒ adaptive security
      • Idea #1 [Chase-K-10]
          • Replace encryption scheme with symmetric non-committing encryption
          • Only requires a PRF + XOR
          • Con: doesn’t work for dynamic data
      • Idea #2
          • Use RO + XOR
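A minimal sketch of "RO + XOR" encryption, assuming the random oracle is instantiated with SHA-256 (the instantiation named later in the Implementation slide) and a fixed-length key. The counter-mode expansion and the function names are choices made for this example, not details taken from the slides.

```python
import hashlib
import secrets

def ro(key: bytes, r: bytes, n: int) -> bytes:
    """Random-oracle stand-in: SHA-256 over key, nonce and a counter, n bytes out."""
    out = b""
    ctr = 0
    while len(out) < n:
        out += hashlib.sha256(key + r + ctr.to_bytes(4, "big")).digest()
        ctr += 1
    return out[:n]

def enc(key: bytes, msg: bytes):
    """Ciphertext is (r, msg XOR RO(key, r)) for a fresh random r."""
    r = secrets.token_bytes(16)
    pad = ro(key, r, len(msg))
    return r, bytes(m ^ p for m, p in zip(msg, pad))

def dec(key: bytes, ct) -> bytes:
    r, body = ct
    pad = ro(key, r, len(body))
    return bytes(c ^ p for c, p in zip(body, pad))
```

The usual argument for why this helps with adaptive security is that, in the random-oracle model, a simulator can choose the oracle's answers after seeing the query, so a ciphertext does not commit it to a plaintext in advance.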

slide-21
SLIDE 21
Limitations of SSE-1

  • Static data ⇒ dynamic data
  • Problem #1:
      • Given new file FN = (AAPL, …, MSFT)
      • Append node for FN to list of every wi in FN
          1. Over unencrypted index
          2. Over encrypted index ???

[Figure: the posting lists for MSFT, GOOG, AAPL, IBM with FN nodes appended, next to the keyword table of F_K(w) entries with Enc(•) pointers]
slide-22
SLIDE 22
Limitations of SSE-1

  • Static data ⇒ dynamic data
  • Problem #2:
      • When deleting a file F2 = (AAPL, …, MSFT)
      • Delete all nodes for F2 in every list
          1. Over unencrypted index
          2. Over encrypted index ???

[Figure: the posting lists for MSFT, GOOG, AAPL, IBM containing F2 nodes, next to the keyword table of F_K(w) entries with Enc(•) pointers]
slide-23
SLIDE 23
Limitations of SSE-1

  • Static data ⇒ dynamic data
      • Idea #1
          • Memory management over encrypted data
          • Encrypted free list
      • Idea #2
          • List manipulation over encrypted data
          • Use homomorphic encryption (here just XOR) so that pointers can be updated obliviously
      • Idea #3
          • Deletion is handled using a “dual” SSE scheme
          • Given deletion/search token for F2, returns pointers to F2’s nodes
          • Then add them to the free list homomorphically
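The "homomorphic encryption (here just XOR)" idea can be illustrated on a single list node. In this sketch a node is a (file id, next pointer) pair encrypted by XOR with a PRF pad; the node layout and all function names are assumptions for the example, not the CS2 data format.

```python
import hashlib
import hmac

def pad(key: bytes, addr: int) -> bytes:
    """8-byte pad for the node stored at array position addr."""
    return hmac.new(key, addr.to_bytes(4, "big"), hashlib.sha256).digest()[:8]

def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def encrypt_node(key: bytes, addr: int, file_id: int, nxt: int) -> bytes:
    node = file_id.to_bytes(4, "big") + nxt.to_bytes(4, "big")
    return xor(node, pad(key, addr))

def decrypt_node(key: bytes, addr: int, ct: bytes):
    node = xor(ct, pad(key, addr))
    return int.from_bytes(node[:4], "big"), int.from_bytes(node[4:], "big")

def pointer_delta(old_next: int, new_next: int) -> bytes:
    """Client: mask that turns old_next into new_next and leaves the file id alone."""
    return bytes(4) + (old_next ^ new_next).to_bytes(4, "big")

def apply_delta(ct: bytes, delta: bytes) -> bytes:
    """Server: XOR the mask into the ciphertext without decrypting it."""
    return xor(ct, delta)

key = b"k" * 32
ct = encrypt_node(key, addr=7, file_id=2, nxt=10)
ct = apply_delta(ct, pointer_delta(10, 12))  # relink the node from 10 to 12
print(decrypt_node(key, 7, ct))  # -> (2, 12)
```

Because XOR commutes with the pad, the server relinks the list while only ever handling ciphertext. In this toy version the mask itself reveals old XOR new, so a real scheme has to hide the mask as well.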

slide-24
SLIDE 24
Outline

  • Motivation
  • Related work & our approach
  • CS2 building blocks
      • Symmetric searchable encryption
      • Search authenticators
      • Proofs of storage
  • CS2 protocols
      • for standard search
      • for assisted search
  • Experiments

slide-25
SLIDE 25
Limitations of Verifiable Computation

  • Inefficient ⇒ practical
      • Idea #1
          • Design special-purpose scheme (i.e., just for verifying search)
      • Idea #2
          • Use Merkle tree “on top” of inverted index
          • For keyword w: we efficiently verify its posting list and associated files
          • Generating proof is O(w*) instead of O(n)
  • Static ⇒ dynamic
      • Idea #1
          • Replace bottom hash with incremental hash
          • [Bellare-Goldreich-Goldwasser94, Bellare-Micciancio97]
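A generic Merkle-tree sketch shows why a proof for one keyword stays small: it contains one sibling hash per tree level. The leaves here are placeholder byte strings standing in for per-keyword posting-list digests; the domain-separation prefixes and the power-of-two restriction are simplifications assumed for the example, not details of the CS2 authenticator.

```python
import hashlib

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def build_tree(leaves: list) -> list:
    """Return all levels, leaves first; the leaf count must be a power of two."""
    levels = [[h(b"leaf" + x) for x in leaves]]
    while len(levels[-1]) > 1:
        prev = levels[-1]
        levels.append([h(b"node" + prev[i] + prev[i + 1])
                       for i in range(0, len(prev), 2)])
    return levels

def prove(levels: list, index: int) -> list:
    """Server: sibling hashes on the path from leaf `index` up to the root."""
    path = []
    for level in levels[:-1]:
        path.append(level[index ^ 1])
        index //= 2
    return path

def verify(root: bytes, leaf: bytes, index: int, path: list) -> bool:
    """Client: recompute the root from the leaf and the sibling path."""
    node = h(b"leaf" + leaf)
    for sibling in path:
        pair = node + sibling if index % 2 == 0 else sibling + node
        node = h(b"node" + pair)
        index //= 2
    return node == root

# Placeholder leaves, one per keyword (contents are made up).
leaves = [b"digest(MSFT)", b"digest(GOOG)", b"digest(AAPL)", b"digest(IBM)"]
levels = build_tree(leaves)
root = levels[-1][0]  # the client keeps only this value
print(verify(root, leaves[2], 2, prove(levels, 2)))  # -> True
```

The client stores only the root; the server sends the leaf for the searched keyword together with the sibling path, and any change to that leaf changes the recomputed root.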

slide-26
SLIDE 26

Search Authenticators

  1. Build inverted/reverse index
  2. Build Merkle tree w/ IH at leaves

[Figure: inverted index for MSFT, GOOG, AAPL, IBM; an incremental hash (IH) of each posting list forms a leaf of the Merkle tree]

Problem: hash functions are not hiding!

slide-27
SLIDE 27

Search Authenticators

  2’. Build Merkle tree w/ IH at leaves over encrypted files

Problem: server has file encryptions so it can

  1. IH a set of files
  2. check result against a leaf hash
  3. determine if files contain common keyword

[Figure: Merkle tree over MSFT, GOOG, AAPL, IBM with IH leaves]
slide-28
SLIDE 28

Search Authenticators

  2’’. Build Merkle tree w/ IH at leaves over keyed hash of encrypted files

Problem: server has file encryptions so it can

  1. IH a set of files
  2. check result against a leaf hash
  3. determine if files contain common keyword

[Figure: Merkle tree over MSFT, GOOG, AAPL, IBM with IH leaves; each encrypted file is passed through the keyed function F_K( ) before it is hashed]
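An incremental hash over keyed hashes of ciphertexts can be sketched with an additive construction in the spirit of [Bellare-Micciancio97]. The modulus, the use of HMAC-SHA256 as F_K and the function names are assumptions for this illustration; the modulus in particular is chosen for readability and is not a vetted security parameter.

```python
import hashlib
import hmac

M = 2 ** 256  # illustrative modulus only (assumption, not a recommended size)

def fk(key: bytes, ciphertext: bytes) -> int:
    """Keyed hash F_K of one encrypted file, read as an integer."""
    digest = hmac.new(key, ciphertext, hashlib.sha256).digest()
    return int.from_bytes(digest, "big")

def ih(key: bytes, ciphertexts: list) -> int:
    """Incremental hash of a set of encrypted files: sum of F_K values mod M."""
    return sum(fk(key, c) for c in ciphertexts) % M

def ih_add(key: bytes, digest: int, ciphertext: bytes) -> int:
    """Update the leaf when a file is added, without rehashing the whole set."""
    return (digest + fk(key, ciphertext)) % M

def ih_remove(key: bytes, digest: int, ciphertext: bytes) -> int:
    """Update the leaf when a file is deleted."""
    return (digest - fk(key, ciphertext)) % M
```

Without the key the server cannot evaluate F_K on the ciphertexts it holds, so it cannot recompute a leaf for a file set of its choosing, which is the attack listed on the slide. Adding or removing one file touches a single term of the sum.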

slide-29
SLIDE 29

Proofs of Storage

slide-30
SLIDE 30

CS2 Protocols

slide-31
SLIDE 31
CS2 Protocols

  • Standard search
      • User searches for w
      • Server returns documents w/ w
      • Relatively straightforward combination of (dynamic) SSE, PoS & SA
  • Assisted search
      • User searches for w
      • Server returns summaries of files with w
      • User chooses a subset to retrieve
      • Server returns subset of files with w
      • More complex combination of (dynamic) SSE, PoS, SA + CRHF
      • Search can be more efficient (since less data is returned)

slide-32
SLIDE 32
CS2 Protocols

  • Definitions in ideal/real-world model
      • Cloud storage w/ standard search
      • Cloud storage w/ assisted search
      • Easier to use within larger protocols (i.e., hybrid security models)
  • Single definition for all desired properties
      • Guarantees composition of underlying primitives is OK
      • Con: definitions & proofs are complicated
  • Protocols make black-box use of primitives
      • Pro: modularity, underlying primitives can be replaced

slide-33
SLIDE 33

Experiments

slide-34
SLIDE 34
Implementation

  • C++
  • Microsoft Cryptography API: Next Generation
      • RO: SHA256
      • PRFs: HMAC-SHA256
      • SKE: 128-bit AES/CBC
  • Bignum library
      • Prime fields
  • We test only the crypto overhead
      • No file transfers over network
      • No reading from disk
      • No indexing costs

slide-35
SLIDE 35
Experiments

  • Intel Xeon CPU 2.26 GHz
  • Windows Server 2008
  • 4 datasets
      • Email (Enron): 4MB, 11MB, 16MB
          • ≈ every byte is a word
      • Office docs: 8MB, 100MB, 250MB, 500MB
          • Relatively few keywords
      • Media (MP3, WMA, JPG, ...): 8MB, 100MB, 250MB, 500MB
          • Barely any keywords
  • Average over 10 executions

slide-36
SLIDE 36

STORE

  • Total
      • Email (16MB): 2 mins
      • Office (500MB): 1.5 mins
      • Media (500MB): 30 s
      • Email (16GB): 40/15 hours
  • Distribution
      • Verifiability: 2/3 of cost
      • SSE: 1/3 of cost
      • PoS: negligible
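The STORE totals can be turned into rough throughput figures with a few lines of arithmetic. The sizes and times are the ones on the slide; applying the 2/3 to 1/3 split to every dataset is an assumption, since the slide does not say which dataset the distribution was measured on.

```python
# (dataset size in MB, total STORE time in seconds), taken from the slide
store_times = {
    "Email": (16, 2 * 60),
    "Office": (500, 1.5 * 60),
    "Media": (500, 30),
}

def throughput_mb_per_s(size_mb: float, seconds: float) -> float:
    return size_mb / seconds

def split_cost(seconds: float) -> dict:
    """Assumed split: 2/3 verifiability, 1/3 SSE, PoS treated as negligible."""
    return {"verifiability": seconds * 2 / 3, "sse": seconds / 3}

for name, (size_mb, seconds) in store_times.items():
    print(name, round(throughput_mb_per_s(size_mb, seconds), 2), "MB/s")
```

Under these numbers the email corpus stores at roughly 0.13 MB/s against about 5.6 MB/s for office documents and 16.7 MB/s for media, which matches the slide's remark that nearly every byte of the email data is a word to index.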
slide-37
SLIDE 37

SEARCH

  • Total
      • Email (16MB): 0.5 secs
      • Office (500MB): 0.1 secs
      • Media (500MB): 0.025 secs
  • Distribution
      • Client verification: 80%
      • Client decryption: 10%
      • Server search + proof: 10%
slide-38
SLIDE 38

CHECK

  • Total
      • Email (16MB): 12 secs
      • Office (500MB): 12 secs
      • Media (500MB): 12 secs
  • Distribution
      • Server proof: 95%
      • Client verify: 5%
slide-39
SLIDE 39

ADD

  • Total
      • Email (16MB): 1.5 secs
      • Office (500MB): 1.5 secs
      • Media (500MB): 1.5 secs
  • Distribution
      • Email (16MB)
          • 40% client auth state update
          • 40% server auth update
          • 20% add token
slide-40
SLIDE 40

DELETE

  • Total
      • Email (16MB): 1.5 secs
      • Office (500MB): 0.7 secs
      • Media (500MB): negligible
  • Distribution
      • 40% server auth update
      • 40% client auth update
      • 20% server index update
slide-41
SLIDE 41
Summary

  • New Crypto
      • Dynamic and CKA2-secure SSE with sub-linear search
      • Sub-linear verifiable computation for search
      • Unbounded dynamic PDP
  • New Protocols
      • Ideal/real-world definitions for secure cloud storage
      • Protocol for standard search
      • Protocol for assisted search
  • Implementation & experiments
      • First experimental results for sub-linear SSE
      • Identified verification as bottleneck
      • Office docs seem to be the best workload

slide-42
SLIDE 42

Questions?