Improved reconstruction attacks using range query leakage - - PowerPoint PPT Presentation

SLIDE 1

Improved reconstruction attacks using range query leakage

Marie-Sarah Lacharité, Brice Minaud, Kenny Paterson
Information Security Group

SLIDE 2

Application Setting

SLIDE 3

Storing Records in the Cloud

[Figure: R records, each with a unique record identifier and a value (N possible values).]

SLIDE 4

Application Scenario

Client: "Give me all records with values in the range [1975, 1979]."

SLIDE 5

Access Pattern Leakage

Client: "Give me all records with values in the range [1975, 1979]."
Server response: the identifiers of the matching records.

Schemes with access pattern leakage: OPE, ORE schemes, POPE, [HK16], Blind seer, [Lu12], [FJKNRS15],…

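SLIDE 6

Access Pattern Leakage and Rank Leakage

Client: "Give me all records with values in the range [1975, 1979]."
Server response: the identifiers of the matching records, together with their ranks a+1 to b.

Schemes with access pattern and rank leakage: FH-OPE, Lewi-Wu, Arx, Cipherbase, EncKV,…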

SLIDE 7

Assumptions

  • 1. Data is dense: all values appear in at least one record.
  • 2. Queries are uniformly distributed.

Target: full reconstruction, i.e., find the value associated with each record.

Best previous result (Kellaris et al., CCS 2016): full reconstruction by analysing access pattern leakage from O(N² log N) queries.

SLIDE 8

Our Main Results (eprint 2017/701)

  • Full reconstruction with O(N log N) queries – in fact, expected N · (3 + log N).
  • Approximate reconstruction with relative accuracy ε from O(N · log(1/ε)) queries – in fact, expected 5/4 · N · log(1/ε) + O(N).
  • Approximate reconstruction using an auxiliary distribution and rank leakage:
    – more efficient in practice, evaluation via simulation;
    – applies in the non-dense case too, giving a new attack on OPE/ORE schemes.

SLIDE 9

Uniform Queries: Uniform Endpoints vs. Uniform Ranges (N=10)

[Figure: for N = 10, the 55 possible ranges (x, y) with 1 ≤ x ≤ y ≤ 10, i.e., (1, 1), (1, 2), …, (10, 10), under uniform endpoints and under uniform ranges.]

SLIDE 10

Distribution of Left Endpoints: Uniform Endpoints vs. Uniform Ranges (N=10)

[Figure: distribution of the left endpoint x = 1, …, 10 under uniform endpoints and under uniform ranges.]
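To make the comparison concrete, here is a small sketch computing the left-endpoint distribution when each of the N(N+1)/2 ranges [x, y] is equally likely (uniform ranges). A range starting at x can end at any of the N − x + 1 values x, …, N, so Pr[x] = (N − x + 1) / (N(N+1)/2), which decreases in x. For the uniform-endpoint side we simply assume, for comparison, that each left endpoint has probability 1/N. The function name is ours.

```python
from fractions import Fraction


def left_endpoint_dist_uniform_ranges(N):
    """Pr[left endpoint = x], x = 1..N, when all ranges [x, y] with
    1 <= x <= y <= N are equally likely (uniform ranges)."""
    total = N * (N + 1) // 2  # number of distinct ranges
    # A range starting at x has N - x + 1 possible right endpoints y in [x, N].
    return [Fraction(N - x + 1, total) for x in range(1, N + 1)]


N = 10
ranges_dist = left_endpoint_dist_uniform_ranges(N)
endpoints_dist = [Fraction(1, N)] * N  # assumption: left endpoint uniform on 1..N
for x, (pr, pe) in enumerate(zip(ranges_dist, endpoints_dist), start=1):
    print(f"x={x:2d}  uniform ranges: {float(pr):.3f}  uniform endpoints: {float(pe):.3f}")
```

Under uniform ranges with N = 10, left endpoint 1 is ten times as likely as left endpoint 10 (10/55 versus 1/55).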

SLIDE 11

Coupon Collector’s Problem

[Figure: expected number of draws (up to 800) against the number of coupons N (5 to 125), comparing N · (1 + log N) with N · H(N).]
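Both curves can be reproduced directly. With N equally likely coupons, the expected number of draws needed to collect all of them is N · H(N), where H(N) = 1 + 1/2 + … + 1/N is the N-th harmonic number; since H(N) ≤ 1 + ln N, the quantity N · (1 + log N) (natural log) is an upper bound. A minimal sketch:

```python
import math


def harmonic(n):
    """n-th harmonic number H(n) = 1 + 1/2 + ... + 1/n."""
    return sum(1.0 / k for k in range(1, n + 1))


def expected_draws_all(N):
    """Exact expected number of draws to collect all N equally likely coupons."""
    return N * harmonic(N)


def upper_bound(N):
    """The bound N * (1 + ln N), which follows from H(N) <= 1 + ln N."""
    return N * (1 + math.log(N))


for N in (5, 25, 125):
    print(N, round(expected_draws_all(N), 1), round(upper_bound(N), 1))
```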


SLIDE 13

Attack 1: Full Reconstruction

SLIDE 14

Motivating Example (with Rank Leakage)

  • Suppose left endpoints of query intervals are chosen uniformly at random.
  • Wish to observe at least 1 query with each of the N possible left endpoints.
  • Expected number of queries needed is at most N · (1 + log N).

Example (matching ID sets relabelled for convenience):

hidden [x,y]    leaked a = rank(x-1)    leaked b = rank(y)    matching IDs
[20,25]         1300                    1500                  M20
[1,18]          0                       1200                  M1
[55,125]        3100                    4400                  M55
[2,10]          500                     800                   M2
[7,98]          700                     4200                  M7
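Why collecting all N left endpoints is enough here: with dense data, rank(0) < rank(1) < … < rank(N − 1) are N distinct numbers, so once every left endpoint has been seen, sorting the leaked a-values recovers each hidden x, and consecutive differences give how many records hold each value. The sketch below illustrates only this step, on made-up toy data, assuming values lie in 1..N and the total number of records R is known; the function name is hypothetical.

```python
import random


def recover_from_left_ranks(leaked_a, R):
    """Map each leaked a = rank(x-1) back to its left endpoint x, and count the
    records per value. Assumes dense data and that all N left endpoints were
    observed, so the distinct a-values are rank(0) < rank(1) < ... < rank(N-1)."""
    boundaries = sorted(set(leaked_a))
    x_of = {a: i + 1 for i, a in enumerate(boundaries)}  # a -> hidden left endpoint x
    cumulative = boundaries + [R]                          # append rank(N) = R
    counts = [cumulative[i + 1] - cumulative[i] for i in range(len(boundaries))]
    return x_of, counts  # counts[v-1] = number of records with value v


# Made-up dense toy dataset: N = 4 values, R = 7 records.
values = [1, 1, 2, 3, 3, 3, 4]
N, R = 4, len(values)


def rank(v):
    """Number of records with value <= v."""
    return sum(1 for w in values if w <= v)


rng = random.Random(1)
seen = []  # left endpoints drawn uniformly until all N have appeared
while len(set(seen)) < N:
    seen.append(rng.randint(1, N))
leaked = [rank(x - 1) for x in seen]  # what the adversary observes
x_of, counts = recover_from_left_ranks(leaked, R)
print(counts)  # [2, 1, 3, 1]
```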

SLIDE 15

Motivating Example (with Rank Leakage)

[Figure: the sets M1, M2, M3, … placed along the rank axis (ranks 1, 501, …, 4400, …).]

M1 − ∪_{i>1} Mi,  M2 − ∪_{i>2} Mi,  …,  MN−1 − MN,  MN

SLIDE 16

Full Reconstruction (with Rank Leakage)

  • Now suppose queries have ranges chosen uniformly at random.
  • We present a data-optimal algorithm (it fails only if full reconstruction is impossible).
  • Expected number of sufficient queries is at most N · (2 + log N) for N ≥ 27.
  • Main idea: partition, then sort (easy with rank leakage, harder without).
  • Expected number of necessary queries is at least 1/2 · N · log N − O(N) for any algorithm.

SLIDE 17

Full Reconstruction (with Rank Leakage)

[Figure: expected number of queries (up to 80,000) against the number of coupons N (5 to 125), comparing KKNO16, with O(N² log N) queries, to this work.]

SLIDE 18

Full Reconstruction (with Rank Leakage): Partitioning Step

record ID    matched query?
             1   2   3   4   5   6   7
20           ✓   ✓   ✗   ✗   ✓   ✗   ✗
23           ✓   ✓   ✗   ✗   ✓   ✓   ✓
29           ✗   ✓   ✓   ✗   ✗   ✓   ✗
89           ✗   ✓   ✓   ✗   ✓   ✓   ✗
193          ✓   ✓   ✗   ✗   ✓   ✓   ✓
…

  • Equality of matching defines a partition of records: two records are in the same class if they match exactly the same queries.
  • Records in the same class of the partition cannot be distinguished.
  • For complete reconstruction, we need N classes: one class per value.
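A minimal sketch of the partitioning step, assuming each record's leakage is summarised as a tuple of match bits over the observed queries (our representation, not necessarily the paper's). Records with identical tuples fall into the same class; on the five rows of the table above, records 23 and 193 share a class.

```python
from collections import defaultdict

# Match bits (1 = matched) for queries 1..7, copied from the table above;
# only the five rows shown there are included.
matches = {
    20: (1, 1, 0, 0, 1, 0, 0),
    23: (1, 1, 0, 0, 1, 1, 1),
    29: (0, 1, 1, 0, 0, 1, 0),
    89: (0, 1, 1, 0, 1, 1, 0),
    193: (1, 1, 0, 0, 1, 1, 1),
}


def partition_by_matching(matches):
    """Group record IDs whose match patterns are identical."""
    classes = defaultdict(list)
    for record_id, pattern in matches.items():
        classes[pattern].append(record_id)
    return list(classes.values())


print(partition_by_matching(matches))  # [[20], [23, 193], [29], [89]]
```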

SLIDE 19

Full Reconstruction (with Rank Leakage): Partitioning Step

Match table as on slide 18, now annotated with the rank intervals of the queries matched by record 23: query 1 [1,100], query 2 [18,82], query 5 [16,96], query 6 [16,30], query 7 [21,61].

We can also deduce from rank leakage that, e.g., records 23 and 193 have ranks in [21,30], by intersecting rank intervals.
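Since every query reveals the rank interval of its results, the possible ranks of a class are the intersection of the rank intervals of the queries it matched. A short sketch using the numbers above:

```python
def intersect_intervals(intervals):
    """Intersect closed integer intervals (lo, hi); return None if empty."""
    lo = max(a for a, _ in intervals)
    hi = min(b for _, b in intervals)
    return (lo, hi) if lo <= hi else None


# Rank intervals of queries 1, 2, 5, 6 and 7, all matched by records 23 and 193.
matched = [(1, 100), (18, 82), (16, 96), (16, 30), (21, 61)]
print(intersect_intervals(matched))  # (21, 30)
```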

SLIDE 20

Full Reconstruction (with Rank Leakage): Partitioning Step

Order the N classes of the partition by rank. For example, the class containing records 23 and 193 (and more) has ranks [21,30].

[Figure: the classes of the partition arranged in rank order.]

SLIDE 21

Full Reconstruction (with Rank Leakage): Proof Intuition

  • The hard part is to show that O(N log N) queries suffice, with a small constant.
  • The proof consists of showing that if certain favourable queries are made, then partitioning succeeds in constructing N classes.
  • Roughly speaking, for our proof we hope for queries on ranges:
  • 1. [x,*] for all 1 ≤ x ≤ N/2 (left coupons)
  • 2. [*,y] for all N/2+1 ≤ y ≤ N (right coupons)
  • 3. [N/2+1,y] and [x,N] for some y ≥ x.
  • Assuming these all arise, a combinatorial argument establishes the success of the partitioning step.
  • The first two cases are essentially a pair of coupon collector problems: success with high probability with O(N log N) queries.
  • The third case is a high-probability event: it occurs with probability 1 − e^(−Q/(2N+2)) for Q queries.
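The first two cases can be explored numerically. The following Monte Carlo sketch (our own illustration, not the paper's proof) draws uniform ranges until every left coupon x ≤ N/2 has appeared as a left endpoint and every right coupon y ≥ N/2 + 1 as a right endpoint; N is assumed even and the function name is hypothetical. Since each query supplies at most one left coupon, at least N/2 queries are always needed.

```python
import random


def queries_until_coupons(N, rng):
    """Draw uniform ranges [x, y], 1 <= x <= y <= N, until all left coupons
    (x <= N/2) and all right coupons (y >= N/2 + 1) have been seen.
    Returns the number of queries drawn. Assumes N is even."""
    ranges = [(x, y) for x in range(1, N + 1) for y in range(x, N + 1)]
    need_left = set(range(1, N // 2 + 1))
    need_right = set(range(N // 2 + 1, N + 1))
    queries = 0
    while need_left or need_right:
        x, y = rng.choice(ranges)  # every range equally likely
        need_left.discard(x)
        need_right.discard(y)
        queries += 1
    return queries


rng = random.Random(0)
N = 100
trials = [queries_until_coupons(N, rng) for _ in range(200)]
print(sum(trials) / len(trials))  # empirical mean, to compare with N * log N
```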

SLIDE 22

Full Reconstruction (without Rank Leakage)

  • Can only recover values up to reflection (v ↦ N + 1 − v).
  • Data-optimal algorithm (it fails only if full reconstruction is impossible).
  • Expected number of sufficient queries is at most N · (3 + log N) for N ≥ 26.
  • Partition (as before), then sort*.
  • Expected number of necessary queries is at least 1/2 · N · log N − O(N) for any algorithm.

*Not quite.
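Why reflection cannot be resolved: mapping each value v to N + 1 − v and each range [x, y] to [N + 1 − y, N + 1 − x] is a bijection on ranges that preserves which records match, and the uniform distribution on ranges is invariant under it. The sketch below checks, on a small made-up dataset, that a dataset and its reflection produce exactly the same multiset of access patterns over all ranges.

```python
from collections import Counter


def access_patterns(values, N):
    """Multiset of access patterns (sets of matching record indices),
    one per range [x, y] with 1 <= x <= y <= N."""
    patterns = Counter()
    for x in range(1, N + 1):
        for y in range(x, N + 1):
            patterns[frozenset(i for i, v in enumerate(values) if x <= v <= y)] += 1
    return patterns


N = 6
values = [1, 2, 2, 3, 4, 5, 6, 6]        # made-up dense toy data
reflected = [N + 1 - v for v in values]  # v -> N + 1 - v
print(access_patterns(values, N) == access_patterns(reflected, N))  # True
```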

SLIDE 23

Full Reconstruction (without Rank Leakage): Sorting Step

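[Figure: all records, grouped into classes M7, M39, M72, M36, M93, M58, M28, M9, M40, M18; a query interval of size N − 1 excludes exactly the records with value 1 or those with value N.]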

SLIDE 24

Full Reconstruction (without Rank Leakage): Sorting Step – Extending

[Figure: all records, with classes M22, M36, M25, M17, M62 and M81.]

SLIDE 25

Full Reconstruction (without Rank Leakage): Sorting Step – Extending


SLIDE 26

Full Reconstruction (without Rank Leakage): Sorting Step

[Figure: all records, with classes M27, M39, M3, M13, M52 and M99.]

SLIDE 27

Full Reconstruction (without Rank Leakage): Sorting Step


SLIDE 28

Full Reconstruction (without Rank Leakage): Proof Intuition

  • The hard part is again to show that O(N log N) queries suffice, with a small constant.
  • The proof again consists of showing that if certain favourable queries are made, then partitioning succeeds in constructing N classes.
  • Coupon collecting bounds then establish that O(N log N) queries are enough.

SLIDE 29

Attack 2: Approximate Reconstruction

SLIDE 30

Approximate Reconstruction Attack (without Rank Leakage)

  • Recover values up to reflection and with relative error ε.
  • Expected number of sufficient queries is 5/4 · N · log(1/ε) + O(N).
  • Expected number of necessary queries is at least 1/2 · N · log(1/ε) − O(N) for any algorithm.
  • Not data-optimal without rank leakage (but it is with rank leakage).

SLIDE 31

Coupon Collection (N=125)

[Figure: collecting n of 125 coupons: expected number of draws (up to 700) against coupon number n (5 to 125).]

SLIDE 32

Coupon Collection (N=125)

[Figure: collecting a fraction (1 − ε) of 125 coupons: expected number of draws against coupon number n, marked at ε = 0.04, 0.08, 0.12, 0.16 and 0.2.]
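These curves follow from the standard partial coupon collector formula: with N equally likely coupons, once i − 1 distinct coupons are held the next new one takes N/(N − i + 1) draws on average, so collecting n distinct coupons takes N · (H(N) − H(N − n)) expected draws. For n = (1 − ε) · N this is N · (H(N) − H(ε · N)) ≈ N · log(1/ε), consistent with the log(1/ε) factor in the query bounds. A sketch:

```python
import math


def harmonic(n):
    """n-th harmonic number, with H(0) = 0."""
    return sum(1.0 / k for k in range(1, n + 1))


def expected_draws(n, N):
    """Expected draws to see n distinct coupons out of N equally likely ones."""
    return N * (harmonic(N) - harmonic(N - n))


N = 125
for eps in (0.04, 0.08, 0.12, 0.16, 0.2):
    n = round((1 - eps) * N)
    print(f"eps={eps:.2f}  n={n}  exact={expected_draws(n, N):.1f}  "
          f"N*ln(1/eps)={N * math.log(1 / eps):.1f}")
```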

SLIDE 33

Approximate Reconstruction: Old Partitioning Method Doesn't Work

[Figure: all records, with classes M7, M39, M72, M36, M93, M58, M28, M9, M40, M18.]

SLIDE 34

Approximate Reconstruction: Partitioning Step

  • 1. Pick any record r.

SLIDE 35

Approximate Reconstruction: Partitioning Step

  • 2. Intersect all queries matching r to get M.

SLIDE 36

Approximate Reconstruction: Partitioning Step

  • 2. Intersect all queries matching r to get M.

SLIDE 37

Approximate Reconstruction: Partitioning Step

  • 3. Find qL and qR such that qL ∩ qR = M and |qL ∪ qR| is maximised.
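Steps 2 and 3 can be sketched with queries modelled as sets of record identifiers (our modelling choice). Only queries containing r can contribute, since qL ∩ qR = M contains r. The brute-force search over pairs is quadratic in the number of queries, which is fine for illustration but is not claimed to be how the paper implements it. The toy data and function names are hypothetical.

```python
from itertools import combinations


def steps_2_and_3(r, queries):
    """Step 2: M = intersection of all queries containing record r.
    Step 3: among pairs of such queries whose intersection is exactly M,
    return one maximising the size of their union (or None)."""
    containing = [frozenset(q) for q in queries if r in q]
    M = containing[0].intersection(*containing[1:])
    best = None
    for q1, q2 in combinations(containing, 2):
        if q1 & q2 == M and (best is None or len(q1 | q2) > len(best[0] | best[1])):
            best = (q1, q2)
    return M, best


# Made-up toy data: record i has value i + 1, N = 10.
values = list(range(1, 11))


def query(x, y):
    """Access pattern of the range query [x, y]."""
    return frozenset(i for i, v in enumerate(values) if x <= v <= y)


queries = [query(1, 6), query(4, 10), query(3, 5), query(5, 8), query(2, 7)]
M, (qL, qR) = steps_2_and_3(4, queries)  # record 4 has value 5
print(sorted(M), sorted(qL | qR))  # [4] [2, 3, 4, 5, 6, 7]
```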

SLIDE 38

Approximate Reconstruction: Partitioning Step

  • 4. Find q'L such that qL ∩ q'L ≠ ∅, q'L ∩ qR ⊆ M and |qL ∪ q'L| is maximised.

SLIDE 39

Approximate Reconstruction: Partitioning Step

  • 5. Find q'R such that qR ∩ q'R ≠ ∅, q'R ∩ qL ⊆ M and |qR ∪ q'R| is maximised.

SLIDE 40

Approximate Reconstruction: Partitioning Step

  • 6. Start over if not every record is in qL ∪ q'L ∪ qR ∪ q'R.

SLIDE 41

Approximate Reconstruction: Partitioning Step

  • 7. Split into halfL = qL ∪ q'L, halfR = qR ∪ q'R, and M.

SLIDE 42

Approximate Reconstruction: Partitioning Step

[Figure: the records split into halfL \ M, M and halfR \ M.]

  • 7. Split into halfL = qL ∪ q'L, halfR = qR ∪ q'R, and M.
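Steps 4 to 7 continue from qL and qR. The sketch below uses a made-up toy setting with one record per value (N = 10) and starts from qL = [3, 5], qR = [5, 8] and M = {record with value 5}, which is what steps 2 and 3 give on the queries containing that record here. It is an illustrative reading of the slides rather than the paper's code, and all names are hypothetical. Which half is called "left" is arbitrary, reflecting that values are only recovered up to reflection.

```python
def extend(q, other, M, queries):
    """Steps 4/5: among queries q2 with q2 & q nonempty and (q2 & other) a
    subset of M, pick one maximising |q | q2| (q itself always qualifies)."""
    candidates = [q2 for q2 in queries if q2 & q and (q2 & other) <= M]
    return max(candidates, key=lambda q2: len(q | q2))


def steps_4_to_7(queries, M, qL, qR, all_records):
    qL2 = extend(qL, qR, M, queries)  # step 4: q'L
    qR2 = extend(qR, qL, M, queries)  # step 5: q'R
    if qL | qL2 | qR | qR2 != all_records:  # step 6: start over
        return None
    return qL | qL2, qR | qR2, M  # step 7: halfL, halfR, M


# Made-up toy data: record i has value i + 1, N = 10.
values = list(range(1, 11))


def query(x, y):
    """Access pattern of the range query [x, y]."""
    return frozenset(i for i, v in enumerate(values) if x <= v <= y)


queries = [query(1, 6), query(4, 10), query(3, 5), query(5, 8),
           query(2, 7), query(1, 3), query(7, 10)]
M, qL, qR = query(5, 5), query(3, 5), query(5, 8)
halfL, halfR, M = steps_4_to_7(queries, M, qL, qR, frozenset(range(10)))
print(sorted(halfL), sorted(halfR))  # [0, 1, 2, 3, 4] [4, 5, 6, 7, 8, 9]
```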
SLIDE 43

Approximate Reconstruction: Sorting Step

  • 8. Form left & right coupons with queries containing M.

[Figure: halfL \ M, M and halfR \ M between values 1 and N, with nL left coupons and nR right coupons.]

SLIDE 44

Approximate Reconstruction: Sorting Step

  • 9. Use left & right coupons to sort halfL \ M & halfR \ M.

[Figure: halfL \ M, M and halfR \ M in order between values 1 and N.]

SLIDE 45

Approximate Reconstruction: Sorting Step

  • 9. Use left & right coupons to sort halfL \ M & halfR \ M.

nL + 1 + nR = (1 − ε) · N ⇒ reconstruction with precision ε · N.
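One way to read step 9, as a sketch under our own assumptions: every query containing M is an interval around M, so its intersection with halfL \ M is a run of records adjacent to M, and these runs are nested. Sorting the distinct runs by size and taking successive differences orders halfL \ M into blocks by distance from M (symmetrically for halfR \ M). The toy data and names are hypothetical.

```python
def sort_half(half_minus_M, queries, M):
    """Order the records of one half by distance from M, using the nested
    traces that queries containing M leave on that half. Returns a list of
    blocks; blocks[0] is adjacent to M."""
    traces = {q & half_minus_M for q in queries if M <= q}
    blocks, covered = [], frozenset()
    for t in sorted(traces, key=len):  # traces are nested, so size orders them
        if t - covered:
            blocks.append(t - covered)
        covered = t
    if half_minus_M - covered:  # records beyond every coupon stay unresolved
        blocks.append(half_minus_M - covered)
    return blocks


# Made-up toy data: record i has value i + 1, N = 10; M = {value 5}.
values = list(range(1, 11))


def query(x, y):
    """Access pattern of the range query [x, y]."""
    return frozenset(i for i, v in enumerate(values) if x <= v <= y)


queries = [query(1, 6), query(4, 10), query(3, 5), query(5, 8), query(2, 7)]
M, halfL = query(5, 5), query(1, 5)
blocks = sort_half(halfL - M, queries, M)
print([sorted(b) for b in blocks])  # [[3], [2], [1], [0]]
```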

SLIDE 46

Attack 3: Reconstruction with Auxiliary Data

SLIDE 47

Reconstruction with Auxiliary Data and Rank Leakage

  • As before, queries have ranges chosen uniformly at random.
  • Assume access pattern and rank are leaked.
  • We now also assume that an approximation to the distribution on values is known:
    – "auxiliary data";
    – from aggregate data, or from another reference source.
  • We show experimentally that, under these assumptions, far fewer queries are needed.
  • There is now no requirement on density, so the attack is interesting for OPE and ORE schemes too (OPE/ORE schemes are trivial to break in the dense case).

SLIDE 48

Auxiliary Data Attack: Partitioning Step

  • 1. Partition records as in the full reconstruction attack.

[Figure: records by position 1 to R; each group consists of all records appearing in exactly the same subset of queries.]

SLIDE 49

Auxiliary Data Attack: Partitioning Step

  • 2. Assign a position interval to each class of the partition.

[Figure: record positions 1 to R; intersecting the leaked rank intervals gives each group a position interval from a+1 to b.]

SLIDE 50

Auxiliary Data Attack: Estimating Step

  • 3. Assign a value to each group's position interval.

[Figure: a group's position interval a+1 to b (out of positions 1 to R) is mapped through the inverse CDF of the auxiliary distribution to the value range [x, y], from rank⁻¹(a) + 1 to rank⁻¹(b); the point guess v is the expected value restricted to [x, y].]
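A sketch of the estimating step under our own simplifying reading of the slide: the auxiliary distribution, scaled to R records, gives each value v an estimated position range; the group's positions a + 1 and b are mapped to the values x and y whose estimated ranges contain them (an inverse-CDF lookup), and the point guess is the mean of the auxiliary distribution restricted to [x, y]. This is our interpretation of rank⁻¹; the paper's exact definition may differ. The names and the distribution are hypothetical.

```python
import bisect
from itertools import accumulate


def estimate_value(a, b, R, pmf):
    """Point guess for a group occupying positions a+1..b out of R records.
    pmf[v-1] is the auxiliary probability of value v (v = 1..N). Assumption:
    the estimated positions of value v are (R*CDF(v-1), R*CDF(v)]."""
    N = len(pmf)
    scaled_cdf = [R * c for c in accumulate(pmf)]

    def value_at(pos):
        # Smallest v with R*CDF(v) >= pos (inverse CDF), clamped to N
        # in case floating-point rounding leaves R*CDF(N) slightly below R.
        return min(bisect.bisect_left(scaled_cdf, pos) + 1, N)

    x, y = value_at(a + 1), value_at(b)  # value range [x, y]
    mass = sum(pmf[x - 1:y])
    if mass == 0:
        return (x + y) / 2
    # Expected value of the auxiliary distribution restricted to [x, y].
    return sum(v * pmf[v - 1] for v in range(x, y + 1)) / mass


pmf = [0.25, 0.25, 0.25, 0.25]  # hypothetical auxiliary distribution, N = 4
print(estimate_value(10, 60, 100, pmf))  # positions 11..60 -> values 1..3 -> 2.0
```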

SLIDE 51

Auxiliary Data Attack: Experimental Evaluation

  • Ages, N = 125 (0 to 124).
  • Health records from US hospitals (NIS HCUP 2009).
  • Target data: individual hospitals' records.
  • Auxiliary data: aggregate of 200 hospitals' records.
  • Measure of success: proportion of records whose value is guessed within ε.
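The success measure is simple to state in code; a minimal sketch with hypothetical inputs, where ε is in the same units as the values (here, years of age):

```python
def success_rate(true_values, guesses, eps):
    """Proportion of records whose guessed value is within eps of the true value."""
    hits = sum(abs(g - t) <= eps for t, g in zip(true_values, guesses))
    return hits / len(true_values)


# Hypothetical ages and guesses, with eps = 1.25 years.
print(success_rate([30, 40, 50], [31, 45, 50], 1.25))  # 0.6666666666666666
```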

SLIDE 52

Auxiliary Data Attack: Asymptotic Success Rates for Different Target Hospitals

[Figure: asymptotic success rates for different target hospitals. Annotations: "95% of records within 1.25 years!" and "All records off by > 20 years."]

SLIDE 53

Auxiliary Data Attack: Results for Typical Target Hospital


SLIDE 54

Auxiliary Data Attack: Results with Perfect Auxiliary Distribution


SLIDE 55

Auxiliary Data Attack: Removing Assumptions

  • Estimating the total number of records is fast if it is not known a priori.
  • Learning the set of record identifiers can be slow if it is not known a priori.

SLIDE 56

Auxiliary Data Attack: Removing Assumptions

[Figure: results with known vs. unknown record IDs, and with approximate vs. exact auxiliary information.]

SLIDE 57

Summary and Conclusions

SLIDE 58

Summary of Our Attacks

Attack           Req'd leakage    Other req'ts       Suff. # queries
Full             AP + rank        Density            N · (log N + 2)
Full             AP               Density            N · (log N + 3)
ε-approximate    AP               Density            5/4 · N · log(1/ε) + O(N)
Auxiliary        AP + rank        Auxiliary dist.    ???

(AP = access pattern leakage.)

SLIDE 59

Conclusions

  • Many clever schemes have been designed, enabling range queries on encrypted data:
    – OPE, ORE schemes.
    – POPE, [HK16],…
    – Blind seer, [Lu12], [FJKNRS15],…
    – FH-OPE, Lewi-Wu, Arx, Cipherbase, EncKV,…
  • These schemes are surprisingly vulnerable to attack in a realistic setting (density + uniform queries + access pattern leakage): O(N log N) queries suffice!
  • Even more severe attacks are possible when an auxiliary distribution and rank leakage are available.
  • Read more at eprint 2017/701.