Revisiting Leakage Abuse Attacks: Laura Blackstone, Seny Kamara, Tarik Moataz



slide-1
SLIDE 1

Revisiting Leakage Abuse Attacks

Tarik Moataz

AROKI SYSTEMS

Seny Kamara Laura Blackstone

slide-2
SLIDE 2

Encrypted Search

2

Trusted client

Cat Fish Dog Dog Cat

Untrusted server

slide-3
SLIDE 3

Encrypted Search

3

Untrusted server Trusted client

Encrypted Index

Cat Fish Dog Dog Cat

Secret key

slide-4
SLIDE 4

Encrypted Search

3

Untrusted server Trusted client

Encrypted Index

Cat

Cat Fish Dog Dog Cat

Secret key

slide-5
SLIDE 5

Encrypted Search

3

Untrusted server Trusted client

Encrypted Index

Cat

Cat Fish Dog Dog Cat Dog Cat Cat

Secret key

slide-6
SLIDE 6 4

Untrusted server Trusted client

Encrypted Index

Cat

Cat Fish Dog Dog Cat Dog Cat Cat

Secret key

Encrypted Search

slide-7
SLIDE 7 4

Untrusted server Trusted client

Encrypted Index

Cat

Cat Fish Dog Dog Cat Dog Cat Cat

Secret key

Setup Leakage


LS

Encrypted Search

slide-8
SLIDE 8 4

Untrusted server Trusted client

Encrypted Index

Cat

Cat Fish Dog Dog Cat Dog Cat Cat

Secret key

Setup Leakage


LS

Query Leakage


LQ

Encrypted Search

slide-9
SLIDE 9
  • Query equality pattern (qeq)
      • If and when the search is the same (search pattern)
  • Response identity pattern (rid)
      • The file identifiers matching the query (access pattern)
  • Co-occurrence pattern (co-occ)
      • The number of files shared by any two queries
  • Response length pattern (rlen)
      • The number of files matching a query
  • Volume pattern (vol) / Total volume pattern (tvol)
      • The number of bits of each file / the sum of file sizes in bits
5

Query Leakage Terminology
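
As a rough illustration of the terminology above, the following Python sketch computes each pattern for a toy collection. The contents of `files`, the file sizes, the query sequence, and the `rid` and `leakage` helpers are hypothetical and are not taken from the talk.

```python
from itertools import combinations

# Hypothetical toy collection: file id -> (keywords, size in bits).
files = {
    1: ({"cat", "dog"}, 800),
    2: ({"cat"}, 1200),
    3: ({"dog", "fish"}, 400),
}
queries = ["cat", "dog", "cat"]  # hypothetical query sequence


def rid(word):
    """Response identity: ids of the files matching the query."""
    return {fid for fid, (words, _) in files.items() if word in words}


def leakage(queries):
    """Compute the query leakage patterns for a sequence of queries."""
    responses = [rid(q) for q in queries]
    return {
        # qeq[i][j] is True when queries i and j are the same search.
        "qeq": [[a == b for b in queries] for a in queries],
        "rid": responses,
        # co-occ: number of files shared by each pair of queries.
        "co_occ": {
            (i, j): len(responses[i] & responses[j])
            for i, j in combinations(range(len(queries)), 2)
        },
        # rlen: number of files matching each query.
        "rlen": [len(r) for r in responses],
        # vol: size of each matching file; tvol: sum of those sizes.
        "vol": [sorted(files[f][1] for f in r) for r in responses],
        "tvol": [sum(files[f][1] for f in r) for r in responses],
    }


patterns = leakage(queries)
```

In this toy run the first and third queries are equal, so `qeq` links them, and every pattern is derived from the same underlying responses. A scheme that hides one pattern does not necessarily hide the others.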

slide-10
SLIDE 10

Q: do we leak all of these patterns “at once”?

6
slide-11
SLIDE 11 7

Encrypted Search

Primitives

Property-Preserving Encryption (PPE) Fully-Homomorphic Encryption (FHE) Functional Encryption Oblivious RAM (ORAM) Structured Encryption (STE)

slide-13
SLIDE 13 8

Encrypted Search

STE- & ORAM- based schemes

rid vol co-occ qeq tvol rlen

slide-14
SLIDE 14

Baseline STE

8

Encrypted Search

STE- & ORAM- based schemes

rid vol co-occ qeq tvol rlen

slide-15
SLIDE 15

Baseline STE Semi-ORAM

8

Encrypted Search

STE- & ORAM- based schemes

rid vol co-occ qeq tvol rlen

slide-16
SLIDE 16

Baseline STE Semi-ORAM OPQ STE [this work]

8

Encrypted Search

STE- & ORAM- based schemes

rid vol co-occ qeq tvol rlen

slide-17
SLIDE 17

Baseline STE Semi-ORAM OPQ STE [this work] Full ORAM

8

Encrypted Search

STE- & ORAM- based schemes

rid vol co-occ qeq tvol rlen

slide-18
SLIDE 18

Q: can we use the disclosed leakage to recover the user’s data?

9
slide-19
SLIDE 19 10

Leakage Attacks

Leakage Attack

Input: one or more leakage patterns

Assumptions:

  • Type of adversary
  • Type of auxiliary data
  • Type of actions

Output: recovery of the user’s queries or data

slide-20
SLIDE 20 11

Leakage Attacks

Assumptions

slide-21
SLIDE 21 11

Leakage Attacks

Assumptions

  • Adversarial model
      • persistent: needs encrypted index, documents and queries
      • snapshot: needs encrypted index and documents
slide-22
SLIDE 22 11

Leakage Attacks

Assumptions

  • Adversarial model
      • persistent: needs encrypted index, documents and queries
      • snapshot: needs encrypted index and documents
  • Auxiliary information
      • known sample: needs a sample from the same distribution
      • known data: needs actual data and/or user queries
      • δ: fraction of adversarially-known data
slide-23
SLIDE 23 11

Leakage Attacks

Assumptions

  • Adversarial model
      • persistent: needs encrypted index, documents and queries
      • snapshot: needs encrypted index and documents
  • Auxiliary information
      • known sample: needs a sample from the same distribution
      • known data: needs actual data and/or user queries
      • δ: fraction of adversarially-known data
  • Passive vs. active
      • injection (chosen-data): needs to inject data
slide-24
SLIDE 24 12

Leakage Attacks

IKK Attack [Islam-Kuzu-Kantarcioglu12]

IKK Attack

Input: co-occ

Output: query recovery

slide-25
SLIDE 25 12

Leakage Attacks

IKK Attack [Islam-Kuzu-Kantarcioglu12]

IKK Attack

Input: co-occ

Assumptions:

  • Persistent adversary
  • Passive
  • Known sample*
  • Known queries

Output: query recovery

slide-26
SLIDE 26 12

Leakage Attacks

IKK Attack [Islam-Kuzu-Kantarcioglu12]

IKK Attack

Input: co-occ

Assumptions:

  • Persistent adversary
  • Passive
  • Known sample*
  • Known queries

Output: query recovery

Vulnerable schemes:

  • Baseline STE
  • Semi-ORAM
slide-27
SLIDE 27 13

Leakage Attacks

Count Attack [Cash-Grubbs-Perry-Ristenpart15]

Count Attack

Input: co-occ + rlen

Output: query recovery

slide-28
SLIDE 28 13

Leakage Attacks

Count Attack [Cash-Grubbs-Perry-Ristenpart15]

Count Attack

Input: co-occ + rlen

Assumptions:

  • Persistent adversary
  • Passive
  • Known data

Output: query recovery

slide-29
SLIDE 29 13

Leakage Attacks

Count Attack [Cash-Grubbs-Perry-Ristenpart15]

Count Attack

Input: co-occ + rlen

Assumptions:

  • Persistent adversary
  • Passive
  • Known data

Output: query recovery

Vulnerable schemes:

  • Baseline STE
  • Semi-ORAM
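
To make the role of the two inputs concrete, here is a heavily simplified Python sketch in the spirit of the Count attack. It is not the algorithm of [Cash-Grubbs-Perry-Ristenpart15]: it assumes the adversary knows every document, and the names `known_docs`, `obs_rlen`, `obs_cooc`, and `count_attack` are hypothetical. Queries with a unique response length are matched first, and co-occurrence with already recovered queries is then used to disambiguate the rest.

```python
# Hypothetical known documents (this sketch assumes all of them are known).
known_docs = {
    1: {"cat", "dog"},
    2: {"cat", "dog"},
    3: {"cat"},
    4: {"fish"},
    5: {"fish"},
}

# Hypothetical observed leakage for three opaque query tokens.
obs_rlen = {"t0": 2, "t1": 3, "t2": 2}
obs_cooc = {
    frozenset(("t0", "t1")): 2,
    frozenset(("t0", "t2")): 0,
    frozenset(("t1", "t2")): 0,
}


def count_attack(known_docs, obs_rlen, obs_cooc):
    """Simplified query recovery from rlen and co-occ leakage."""
    # Inverted index over the known documents: keyword -> set of doc ids.
    index = {}
    for doc_id, words in known_docs.items():
        for w in words:
            index.setdefault(w, set()).add(doc_id)

    def cooc(w1, w2):
        return len(index[w1] & index[w2])

    # Step 1: a response length that matches exactly one keyword.
    recovered = {}
    for q, n in obs_rlen.items():
        matches = [w for w, ids in index.items() if len(ids) == n]
        if len(matches) == 1:
            recovered[q] = matches[0]

    # Step 2: disambiguate using co-occurrence with recovered queries.
    progress = True
    while progress:
        progress = False
        for q, n in obs_rlen.items():
            if q in recovered:
                continue
            candidates = [
                w for w, ids in index.items()
                if len(ids) == n
                and w not in recovered.values()
                and all(obs_cooc[frozenset((q, r))] == cooc(w, wr)
                        for r, wr in recovered.items())
            ]
            if len(candidates) == 1:
                recovered[q] = candidates[0]
                progress = True
    return recovered


result = count_attack(known_docs, obs_rlen, obs_cooc)
```

Here only `t1` has a unique response length; `t0` and `t2` are separated purely by their co-occurrence with `t1`, which is why hiding co-occ matters for this family of attacks.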
slide-30
SLIDE 30
  • “For example, IKK demonstrated that by observing accesses to an encrypted email repository, an adversary can infer as much as 80% of the search queries”
  • “It is known that access patterns, to even encrypted data, can leak sensitive information such as encryption keys [IKK]”
  • “A recent line of attacks […,Count,…] has demonstrated that such access pattern leakage can be used to recover significant information about data in encrypted indices. For example, some attacks can recover all search queries [Count,…] …”

14

Impact of IKK & Count

slide-31
SLIDE 31

A closer look at IKK & Count attacks

15
slide-32
SLIDE 32

Non-trivial limitations

  • High known-data rates
      • Count v1 requires more than 80% of the data and 5% of the queries
      • IKK requires more than 95% of the data and 5% of the queries
      • Count v2 requires more than 60% of the data
      • Practical vs. theoretical?
  • Low- vs. high-selectivity keywords
      • Experiments were all run on high-selectivity keywords
      • i.e., keywords that are frequent in the user’s data
      • When re-run on low-selectivity keywords, the attacks failed
  • Both exploit co-occurrence
      • relatively easy to hide (using OPQ SSE)
16

[Figure: keyword frequency vs. keyword rank for the SU, M-MU, and L-MU datasets, with regions labeled high selectivity, pseudo-low selectivity, and low selectivity; selectivity ranges shown: (1-2), (10-13), (≥ 13)]

slide-33
SLIDE 33

Q: can we do better than IKK & Count?

17
slide-34
SLIDE 34 18

Summary of our Attacks

Known-Data attacks

slide-35
SLIDE 35 18

Summary of our Attacks

Known-Data attacks

SubgraphID Attack

rid

Query recovery

slide-36
SLIDE 36 18

Summary of our Attacks

Known-Data attacks

SubgraphID Attack

rid

Query recovery

Vulnerable schemes

  • Baseline STE
  • Semi-ORAM
slide-37
SLIDE 37 18

Summary of our Attacks

Known-Data attacks

SubgraphID Attack

rid

Query recovery

Vulnerable schemes

  • Baseline STE
  • Semi-ORAM

SubgraphVL Attack

vol

Query recovery

  • Baseline STE
  • Semi-ORAM
slide-38
SLIDE 38 18

Summary of our Attacks

Known-Data attacks

SubgraphID Attack

rid

Query recovery

Vulnerable schemes

  • Baseline STE
  • Semi-ORAM

SubgraphVL Attack

vol

Query recovery

  • Baseline STE
  • Semi-ORAM

VolAn & SelVolAn Attacks

tvol

Query recovery

  • Baseline STE
  • Semi-ORAM
  • OPQ STE
  • Full ORAM
slide-39
SLIDE 39 19

Summary of our Attacks

Injection attacks

Decoding & Binary attacks

tvol

Query recovery

Vulnerable schemes

  • Baseline STE
  • Semi-ORAM
  • OPQ STE
  • Full ORAM

The first injection attack was by [Zhang-Katz-Papamanthou16]; it works against Baseline STE and Semi-ORAM

slide-40
SLIDE 40

The SubgraphVL Attack

20
slide-41
SLIDE 41

The SubgraphVL Attack

  • Let K ⊆ D be the set of known documents
  • Example: K = (K2, K4) and D = (D1, …, D4)
21
slide-42
SLIDE 42

The SubgraphVL Attack

  • Let K ⊆ D be the set of known documents
  • Example: K = (K2, K4) and D = (D1, …, D4)
21

[Figure: Known Graph linking keywords w1, w4, w5 to the volumes vol(K2), vol(K4)]

slide-43
SLIDE 43

The SubgraphVL Attack

  • Let K ⊆ D be the set of known documents
  • Example: K = (K2, K4) and D = (D1, …, D4)
21

[Figure: Known Graph linking keywords w1, w4, w5 to the volumes vol(K2), vol(K4); Observed Graph linking queries q1, q2, q3, q4, q5 to the volumes vol(D1), vol(D2), vol(D3), vol(D4)]

slide-44
SLIDE 44

The SubgraphVL Attack

  • We need to match each qi to some wj
  • The volumes are the ground truth
22

[Figure: Observed Graph linking queries q1, q2, q3, q4, q5 to the volumes vol(D1), vol(D2), vol(D3), vol(D4); Known Graph linking keywords w1, w4, w5 to the volumes vol(K2), vol(K4)]

slide-45
SLIDE 45

The SubgraphVL Attack

  • Observation: if qi = wj, then
      • N(wj) ⊆ N(qi) and #N(wj) ≈ δ · #N(qi)
23

[Figure: Observed Graph linking queries q1, q2, q3, q4, q5 to the volumes vol(D1), vol(D2), vol(D3), vol(D4); Known Graph linking keywords w1, w4, w5 to the volumes vol(K2), vol(K4)]
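
The observation above can be sketched in Python as follows. This is a minimal illustration under stated assumptions: document volumes are distinct, the toy documents and their sizes are hypothetical, and the tolerance `eps` used to interpret "≈" is an assumption of this sketch rather than something given on the slide. In a real attack the observed neighborhoods come from the vol leakage of each query; here they are derived from the full collection only to keep the example self-contained.

```python
def neighbors(docs):
    """Map each keyword to the set of volumes of the documents containing it."""
    graph = {}
    for words, volume in docs:
        for w in words:
            graph.setdefault(w, set()).add(volume)
    return graph


def consistent(n_w, n_q, delta, eps):
    """Check N(w) ⊆ N(q) and #N(w) ≈ δ · #N(q), within tolerance eps."""
    return n_w <= n_q and abs(len(n_w) - delta * len(n_q)) <= eps


# Hypothetical collection D = (D1, D2, D3, D4) as (keywords, volume) pairs.
all_docs = [
    ({"w1"}, 100),
    ({"w1", "w4"}, 200),
    ({"w4", "w5"}, 300),
    ({"w4", "w5"}, 400),
]
known_docs = [all_docs[1], all_docs[3]]  # K = (K2, K4)
delta = len(known_docs) / len(all_docs)  # fraction of known data, 0.5 here

known_graph = neighbors(known_docs)    # N(w) for each known keyword
observed_graph = neighbors(all_docs)   # stand-in for N(q) from vol leakage

# A query for w1 is consistent with w1 but not with w4.
same = consistent(known_graph["w1"], observed_graph["w1"], delta, eps=0.5)
other = consistent(known_graph["w4"], observed_graph["w1"], delta, eps=0.5)
```

The subset test alone already rules out `w4` for a query on `w1`, because the known volume 400 never appears in that query's response.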

slide-46
SLIDE 46

The SubgraphVL Attack

24

Observed Graph Known Graph

N(w4) = N(w5) = N(w1) = N(q1) = N(q2) = N(q3) = N(q4) = N(q5) =

  • Each query qi starts with a candidate set C(qi) = 𝕏
  • Remove every word wj such that either N(wj) ⊈ N(qi) or #N(wj) ≉ δ · #N(qi)

C(q1) ={w4,w5,w1}

C(q4) = {w4,w5,w1} C(q5) ={w4,w5,w1}

Candidate Sets

slide-47
SLIDE 47

The SubgraphVL Attack

24

Observed Graph Known Graph

N(w4) = N(w5) = N(w1) = N(q1) = N(q2) = N(q3) = N(q4) = N(q5) =

  • Each query qi starts with a candidate set C(qi) = 𝕏
  • Remove every word wj such that either N(wj) ⊈ N(qi) or #N(wj) ≉ δ · #N(qi)

C(q1) ={w4,w5,w1}

C(q4) = {w4,w5,w1} C(q5) ={w4,w5,w1}

Candidate Sets

C(q1) ={w1}

slide-48
SLIDE 48

The SubgraphVL Attack

24

Observed Graph Known Graph

N(w4) = N(w5) = N(w1) = N(q1) = N(q2) = N(q3) = N(q4) = N(q5) =

  • Each query qi starts with a candidate set C(qi) = 𝕏
  • Remove every word wj such that either N(wj) ⊈ N(qi) or #N(wj) ≉ δ · #N(qi)

C(q1) ={w4,w5,w1}

C(q4) = {w4,w5,w1} C(q5) ={w4,w5,w1}

Candidate Sets

C(q1) ={w1} C(q4) = {w4}

slide-49
SLIDE 49

The SubgraphVL Attack

24

Observed Graph Known Graph

N(w4) = N(w5) = N(w1) = N(q1) = N(q2) = N(q3) = N(q4) = N(q5) =

  • Each query qi starts with a candidate set C(qi) = 𝕏
  • Remove every word wj such that either N(wj) ⊈ N(qi) or #N(wj) ≉ δ · #N(qi)

C(q1) ={w4,w5,w1}

C(q4) = {w4,w5,w1} C(q5) ={w4,w5,w1}

Candidate Sets

C(q1) ={w1} C(q4) = {w4} C(q5) ={w4,w5,w1}

slide-50
SLIDE 50 25

Observed Graph Known Graph

N(w4) = N(w5) = N(w1) = N(q1) = N(q2) = N(q3) = N(q4) = N(q5) =

  • If a single word is left in a candidate set, that is the match
  • Remove it from the other queries’ candidate sets

C(q1) ={w1} C(q4) = {w4} C(q5) ={w4,w5,w1}

Candidate Sets

The SubgraphVL Attack
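
The filtering and elimination steps described on the preceding slides can be sketched as below. This is an illustrative reading of the slides, not the authors' implementation: the function name `subgraph_vl`, the tolerance `eps`, and the toy neighborhoods are assumptions, and the candidate sets here start from the keywords of the known graph.

```python
def subgraph_vl(known, observed, delta, eps):
    """known: keyword -> set of volumes; observed: query -> set of volumes."""
    # Every query starts with all known keywords as candidates.
    cands = {q: set(known) for q in observed}

    # Filtering: drop w when N(w) ⊈ N(q) or #N(w) is far from δ · #N(q).
    for q, n_q in observed.items():
        cands[q] = {
            w for w in cands[q]
            if known[w] <= n_q
            and abs(len(known[w]) - delta * len(n_q)) <= eps
        }

    # Elimination: a singleton is a match; remove it from the other sets.
    matched = {}
    changed = True
    while changed:
        changed = False
        for q, c in cands.items():
            if len(c) == 1 and q not in matched:
                (w,) = c
                matched[q] = w
                for other in cands:
                    if other != q:
                        cands[other].discard(w)
                changed = True
    return matched, cands


# Hypothetical neighborhoods: volumes of known documents per keyword, and
# volumes leaked by each observed query (half of the documents are known).
known = {"w1": {200}, "w4": {200, 400}, "w5": {400}}
observed = {
    "q1": {100, 200},
    "q4": {200, 300, 400},
    "q5": {300, 400},
}

matched, cands = subgraph_vl(known, observed, delta=0.5, eps=0.5)
```

In this toy run, filtering leaves `q4` with three candidates, and it is only the elimination of `w1` and `w5` (matched to `q1` and `q5`) that pins `q4` to `w4`.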

slide-55
SLIDE 55

Evaluation of our Attacks

Setting

26
  • Enron dataset:
      • ~500K emails
      • Folder for every employee
  • Creation of different document collections
      • Single-user setting
      • Multi-user setting
  • Size of the query space: 500 & 5000
  • Composition of the query space
      • Query frequency: high, pseudo-low, low
slide-56
SLIDE 56 27

[Figure: recovery rate vs. partial knowledge (in %) for Count-Only, VolAn, SelVolAn, SubgraphID, and SubgraphVL; two panels: high selectivity and low selectivity]

Evaluation of our Attacks

Single User - 500 Keywords - Entire composition

slide-57
SLIDE 57 27

[Figure: recovery rate vs. partial knowledge (in %) for Count-Only, VolAn, SelVolAn, SubgraphID, and SubgraphVL; two panels: high selectivity and low selectivity; annotation: δ < 20%]

Evaluation of our Attacks

Single User - 500 Keywords - Entire composition

slide-58
SLIDE 58

Summary of our Attacks

Against Enron Dataset

28

δ needed for RR ≥ 20%

Attack      Type        Pattern     Known queries  δ for HS  δ for PLS  δ for LS
IKK         known-data  co-occ      Yes            ≥95%      ?          ?
Count       known-data  rlen        Yes/No         ≥80%      ?          ?
ZKP         injection   rid         No             N/A       N/A        N/A
SubgraphID  known-data  rid         No             ≥5%       ≥50%       ≥60%
SubgraphVL  known-data  vol         No             ≥5%       ≥50%       δ=1 recovers <10%
VolAn       known-data  tvol        No             ≥85%      ≥85%       δ=1 recovers <10%
SelVolAn    known-data  tvol, rlen  No             ≥80%      ≥85%       δ=1 recovers <10%
Decoding    injection   tvol        No             N/A       N/A        N/A
Binary      injection   tvol        No             N/A       N/A        N/A

Legend: Very theoretical / Theoretical / Practical

slide-59
SLIDE 59

Takeaways

  • Cryptanalysis in encrypted search should be more “nuanced”: there is a lot more to learn!
  • Baseline STE is still OK for low-selectivity queries
  • ORAM-based search is also vulnerable to volume-based known-data attacks
  • ORAM-based search is also vulnerable to injection attacks
  • Subgraph attacks are practical for high-selectivity queries
      • need only δ ≥ 5%
  • Countermeasures
      • for δ < 80% use OPQ [this work]
      • for δ ≥ 80% use PBS [Kamara-M-Ohrimenko18] or use VLH or AVLH [Kamara-M19]
29
slide-60
SLIDE 60

Thank you!

https://eprint.iacr.org/2019/1175