Searchable Encryption, Leakage-Abuse Attacks, and Statistical Learning Theory


Searchable Encryption, Leakage-Abuse Attacks, and Statistical Learning Theory. Paul Grubbs, Marie-Sarah Lacharité, Brice Minaud, Kenny Paterson. eprint 2019/011 and IEEE S&P 2019. (also eprint 2018/965, CCS 2018.) AriC crypto seminar, ENS Lyon, 2019.


SLIDE 1

AriC crypto seminar, ENS Lyon, 2019 Paul Grubbs, Marie-Sarah Lacharité, Brice Minaud, Kenny Paterson eprint 2019/011 and IEEE S&P 2019. (also eprint 2018/965, CCS 2018.)

Searchable Encryption, Leakage-Abuse Attacks, and Statistical Learning Theory

SLIDE 2

Outsourcing Data

[Diagram: Client ⇄ Server, with arrows "Data upload" and "Data access".]

Sensitive data → encryption needed.
An encrypted database is of little use if it cannot be searched. → Searchable Encryption.

Examples:

  • Private message server.
  • Company/hospital outsourcing client/patient info.

SLIDE 3

Searchable Encryption

[Diagram: Client ⇄ Adversarial Server, with arrows "Data upload" and "Data access".]

Adversary: honest-but-curious host server.
Security goal: confidentiality of data and queries.

Very active topic in research and industry: [AKSX04], [BCLO09], [PKV+14], [BLR+15], [NKW15], [KKNO16], [LW16], [FVY+17], [SDY+17], [DP17], [HLK18], [PVC18], [MPC+18]…

SLIDE 4

Security Model

Generic solutions (FHE) are infeasible at scale → for efficiency reasons, some leakage is allowed.

[Diagram: Client ⇄ Adversarial Server, with arrows "Data upload" and "Data access". Server learns L(query, DB).]

Security model: parametrized by a leakage function L.
The server learns nothing except for the output of the leakage function.

SLIDE 5

Security Model

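[Diagram, two panels.
Real world: the Client sends query q to the Server; the Adversary observes the Server's side.
Ideal world: q goes through L, and a Simulator is given only L(q,DB); the Adversary observes the Simulator's output.]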

SLIDE 6

Keyword Search


Symmetric Searchable Encryption (SSE) = keyword search:

  • Data = collection of documents. e.g. messages.
  • Search query = find documents containing given keyword(s).

Efficient solutions for leakage = search pattern + access pattern.

Some active topics:

  • Forward and backward privacy [B16][BMO17][CPPJ18][SYL+18]...
  • Locality [CT14][ANSS16][DPP18]...
SLIDE 7

Beyond Keyword Search

[Diagram: Client ⇄ Server, with arrows "Data upload", "Search query" and "Matching records".]

For an encrypted database management system:

  • Data = collection of records. e.g. health records.
  • Basic query examples:
      • find records with given value. e.g. patients aged 57.
      • find records within a given range. e.g. patients aged 55-65.
SLIDE 8

Range Queries


In this talk: range queries.

  • Fundamental for any encrypted DB system.
  • Many constructions out there.
  • Simplest type of query that can't “just” be handled by an index.

Initial solutions: Order-Preserving, Order-Revealing Encryption.

  • Plaintexts are ordered, ciphertexts are ordered.
  • The encryption map preserves order.
SLIDE 9

Attacks Exploiting ORE

  • “Sorting” attack: if every possible value appears in the DB, just sort the ciphertexts and you learn their values!

[Figure: ciphertexts in arbitrary order (3 11 5 1 8 7 10 6 2 4 9) and the same ciphertexts sorted (1 2 3 4 5 6 7 8 9 10 11).]

  • “CDF-matching” attack: say the attacker has an approximation of the Cumulative Distribution Function of DB values...

[Plot: CDF of ages. x-axis: Age (ticks 30, 60, 90); y-axis: Records below age (0% to 100%).]
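The sorting attack can be sketched in a few lines. This is only an illustration under stated assumptions: `toy_ope_keygen` is a hypothetical stand-in for an order-preserving scheme (any strictly increasing map would do), and the database is assumed dense, i.e. every value from 1 to 11 appears.

```python
import random

def toy_ope_keygen(domain_size, seed=0):
    """Hypothetical toy OPE key: a random strictly increasing map on 1..domain_size."""
    rng = random.Random(seed)
    ciphertexts = sorted(rng.sample(range(10**6), domain_size))
    return {value: ct for value, ct in zip(range(1, domain_size + 1), ciphertexts)}

def sorting_attack(ciphertexts):
    """Map each distinct ciphertext to its rank; this equals the plaintext if the DB is dense."""
    ranks = {ct: rank for rank, ct in enumerate(sorted(set(ciphertexts)), start=1)}
    return [ranks[ct] for ct in ciphertexts]

key = toy_ope_keygen(11)
plaintexts = [3, 11, 5, 1, 8, 7, 10, 6, 2, 4, 9]
encrypted_db = [key[v] for v in plaintexts]   # what the server stores
recovered = sorting_attack(encrypted_db)      # uses ciphertext order only, no key
```

The attacker never touches `key`: the rank of a ciphertext among all ciphertexts is already the plaintext when no value is missing.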

SLIDE 10

Leakage-Abuse Attacks

ORE: order information can be used to infer (approximate) values.

“Leakage-abuse attacks” (coined by Cash et al. CCS'15):

  • Do not contradict security proofs.
  • Can be devastating in practice.

Leaking order is too revealing.
→ “Second-generation” schemes enable range queries without relying on OPE/ORE.

SLIDE 11

Range Queries

[Diagram: the Client sends the range query [40,100] to the Server.]

Database (value, record ID): (45, 1), (6, 2), (83, 3), (28, 4)
Matching records returned: (45, 1), (83, 3)

SE schemes supporting range queries are proven secure w.r.t. a leakage function including access pattern leakage.
What can the server learn from the above leakage?
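As a rough illustration of access pattern leakage on the example above, the sketch below models what the server observes for a range query: the identifiers of the matching records, and nothing about their values. The function name `access_pattern` and the dictionary layout are assumptions made for this example, not part of any scheme from the talk.

```python
def access_pattern(db, query):
    """Leakage for one range query: the IDs of matching records (values stay hidden)."""
    lo, hi = query
    return sorted(record_id for record_id, value in db.items() if lo <= value <= hi)

# Record ID -> plaintext value, as in the slide's example.
db = {1: 45, 2: 6, 3: 83, 4: 28}
leak = access_pattern(db, (40, 100))
volume = len(leak)  # a volume-only adversary would see just this number
```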

SLIDE 12

Database Reconstruction

Let N = number of possible values for the target attribute.

Strongest goal: full database reconstruction = recovering the exact value of every record.
More general: approximate database reconstruction = recovering all values within εN.

  • ε = 0.05 is recovery within 5%.
  • ε = 1/N is full recovery.

(“Sacrificial” recovery: values very close to 1 and N are excluded.)

[KKNO16]: full reconstruction in O(N⁴ log N) queries, assuming i.i.d. uniform queries!

SLIDE 13

Database Reconstruction

[KKNO16]: full reconstruction in O(N⁴ log N) queries!

This talk ([GLMP19], [LMP18]):

Setting                        Approx. rec.       Full rec.      Lower bound
Approx. reconstruction         O(ε⁻⁴ log ε⁻¹)     O(N⁴ log N)    Ω(ε⁻⁴)
  with very mild hypothesis    O(ε⁻² log ε⁻¹)     O(N² log N)    Ω(ε⁻²)
Approx. order rec.             O(ε⁻¹ log ε⁻¹)     O(N log N)     Ω(ε⁻¹ log ε⁻¹)

[Arrows in the figure: each approximate bound implies the full-reconstruction bound on its row; the O(N⁴ log N) bound recovers [KKNO16].]

Full reconstruction in O(N log N) for dense DBs.

Scale-free: does not depend on size of DB or number of possible values.
→ Recovering all values in DB within 5% costs O(1) queries!

SLIDE 14

Database Reconstruction

[KKNO16]: full reconstruction in O(N⁴ log N) queries!

This talk ([GLMP19], subsuming [LMP18]):

Setting                        Approx. rec.       Full rec.      Lower bound
Approx. reconstruction         O(ε⁻⁴ log ε⁻¹)     O(N⁴ log N)    Ω(ε⁻⁴)
  with very mild hypothesis    O(ε⁻² log ε⁻¹)     O(N² log N)    Ω(ε⁻²)
Approx. order rec.             O(ε⁻¹ log ε⁻¹)     O(N log N)     Ω(ε⁻¹ log ε⁻¹)

This talk. Main tool:

  • connection with statistical learning theory;
  • especially, VC theory.
SLIDE 15

VC Theory


SLIDE 16

VC Theory

Foundational paper: Vapnik and Chervonenkis, 1971. Uniform convergence result.

Now a foundation of learning theory, especially PAC (probably approximately correct) learning.

Wide applicability. Fairly easy to state/use.

(You don't have to read the original article in Russian.)

SLIDE 17

Warm-up

Set X with probability distribution D. Let C ⊆ X. Call it a concept.

[Figure: a set X containing a concept C, with sample points.]

$$\Pr(C) \approx \frac{\#\text{points in } C}{\#\text{points total}}$$

Sample complexity: to measure Pr(C) within ε, you need O(1/ε²) samples.
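The warm-up estimate can be tried numerically. The sketch below is an illustration under assumptions not in the slides: X = [0,1] with the uniform distribution, the concept C = [0.2, 0.5], and the sample size taken from Hoeffding's inequality, n ≥ ln(2/δ) / (2ε²), which is one standard way to make the O(1/ε²) bound concrete.

```python
import math
import random

def hoeffding_samples(eps, delta):
    """Samples sufficient for |estimate - Pr(C)| <= eps with probability >= 1 - delta."""
    return math.ceil(math.log(2 / delta) / (2 * eps ** 2))

def estimate_probability(in_concept, draw, n):
    """Fraction of n sampled points that fall in the concept."""
    return sum(in_concept(draw()) for _ in range(n)) / n

rng = random.Random(1)
eps, delta = 0.05, 0.001
n = hoeffding_samples(eps, delta)
# Assumed example: uniform distribution on [0,1], concept C = [0.2, 0.5], so Pr(C) = 0.3.
estimate = estimate_probability(lambda x: 0.2 <= x <= 0.5, rng.random, n)
```

Halving ε multiplies `n` by four, which is the 1/ε² dependence stated on the slide.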
SLIDE 18

Approximating a Concept Set

Now: set 𝓓 of concepts. Goal: approximate their probabilities simultaneously.

The set of samples drawn from X is an ε-sample iff for all C in 𝓓:

$$\left| \Pr(C) - \frac{\#\text{points in } C}{\#\text{points total}} \right| \le \epsilon$$
SLIDE 19

ε-sample Theorem

How many samples do we need to get an ε-sample whp?

Union bound: yields a sample complexity that depends on |𝓓|.

V & C 1971: If 𝓓 has VC dimension d, then the number of points to get an ε-sample whp is

$$O\!\left(\frac{d}{\epsilon^2} \log \frac{d}{\epsilon}\right).$$

Does not depend on |𝓓|!
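To make the ε-sample definition concrete, the sketch below measures, for one set of samples, the worst deviation between true and empirical probability over a whole concept class at once. The setting is an assumption chosen for illustration: X = {1, ..., 50} with the uniform distribution, and concepts = all ranges [a, b]. The samples form an ε-sample exactly when the returned deviation is at most ε.

```python
import random
from itertools import accumulate

def max_range_deviation(samples, n_values):
    """Largest |Pr([a,b]) - empirical frequency of [a,b]| over all ranges in 1..n_values."""
    counts = [0] * (n_values + 1)
    for s in samples:
        counts[s] += 1
    prefix = list(accumulate(counts))  # prefix[v] = number of samples <= v
    total = len(samples)
    worst = 0.0
    for a in range(1, n_values + 1):
        for b in range(a, n_values + 1):
            empirical = (prefix[b] - prefix[a - 1]) / total
            true = (b - a + 1) / n_values  # uniform distribution (assumption)
            worst = max(worst, abs(true - empirical))
    return worst

rng = random.Random(7)
n_values = 50
samples = [rng.randint(1, n_values) for _ in range(2000)]
deviation = max_range_deviation(samples, n_values)
```

There are 1275 ranges here, yet a single batch of samples approximates all of them simultaneously, which is the point of the uniform convergence result.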

SLIDE 20

VC Dimension

Remaining Q: what is the VC dimension?

A set of points S is shattered by 𝓓 iff: every subset of S is equal to C∩S for some C in 𝓓.

  • Example. Take 2 points in X=[0,1]. Concepts 𝓓 = all ranges.

[Figure: two points on the segment [0,1]; each of the four subsets of the two points is cut out by one of the ranges A, B, C, D.]

Subsets:

  • OK. Range A.
  • OK. Range B.
  • OK. Range C.
  • OK. Range D.

2 points = SHATTERED

SLIDE 21

VC Dimension

  • Example. Take 3 points in X=[0,1]. Concepts 𝓓 = all ranges.

Subset: the two outer points without the middle one. Problem: any range containing both outer points also contains the middle one.

3 points = NOT SHATTERED

VC dimension of 𝓓 = largest cardinality of a set of points in X that is shattered by 𝓓.
E.g. VC dimension of ranges is 2.

What typically matters is just that VC dim is finite.
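The two examples can be checked mechanically. The sketch below is a brute-force test of shattering for the concept class of ranges on a line; the function names are illustrative, and the empty subset is assumed to be realizable by some range that avoids every point.

```python
from itertools import combinations

def cut_out_by_some_range(points, subset):
    """True if some range [lo, hi] contains exactly `subset` among `points`."""
    if not subset:
        return True  # assumption: some range avoids every point
    lo, hi = min(subset), max(subset)  # the tightest range containing the subset
    return {p for p in points if lo <= p <= hi} == subset

def is_shattered_by_ranges(points):
    """True if every subset of `points` is cut out by some range."""
    return all(
        cut_out_by_some_range(points, set(subset))
        for size in range(len(points) + 1)
        for subset in combinations(points, size)
    )

two_points = is_shattered_by_ranges([0.3, 0.7])
three_points = is_shattered_by_ranges([0.2, 0.5, 0.8])
```

With three points the check fails on the subset {0.2, 0.8}: the tightest range around it also captures 0.5.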

SLIDE 22

Database Reconstruction

SLIDE 23

KKNO16-like Attack

Assume a uniform distribution on range queries.
Induces a distribution f on the prob. that a given value is hit.

[Figure: plot of f over values 1 to N; values near 1 and N are less probable, values near the middle are more probable.]

Idea: for each record...

  1. Count frequency at which the record is hit.
     → gives estimate of probability it’s hit by uniform query.
  2. Deduce estimate of its value by “inverting” f.

SLIDE 24

KKNO16-like Attack

Step 1: for all records, estimate prob of the record being hit.

This is an ε-sample! X = ranges, 𝓓 = {{ranges ∋ x}: x ∈ [1,N]},
so we need O(ε⁻² log ε⁻¹) queries.

Step 2: because f is quadratic, “inverting” f adds a square.

[Figure: plot of f over values 1 to N.]

After O(ε⁻⁴ log ε⁻¹) queries, the value of all records is recovered within εN.
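The two steps can be simulated end to end. This is a sketch under assumptions, not the attack as implemented in [KKNO16] or the article: queries are drawn uniformly among all N(N+1)/2 ranges, so a value x is hit with probability f(x) = x(N+1-x) / (N(N+1)/2), and the inversion picks the smaller root, so values are only recovered up to the reflection x ↔ N+1-x. The record values used are arbitrary examples.

```python
import math
import random

def hit_probability(x, n):
    """f(x): probability that a uniformly random range in [1, n] contains value x."""
    return x * (n + 1 - x) / (n * (n + 1) / 2)

def invert_f(p, n):
    """Smaller root of f(x) = p; the larger root is its reflection n + 1 - x."""
    disc = max(0.0, (n + 1) ** 2 - 2 * p * n * (n + 1))
    return ((n + 1) - math.sqrt(disc)) / 2

def simulate_attack(values, n, num_queries, seed=0):
    """Step 1: count how often each record is hit. Step 2: invert f on the frequency."""
    rng = random.Random(seed)
    all_ranges = [(a, b) for a in range(1, n + 1) for b in range(a, n + 1)]
    hits = [0] * len(values)
    for _ in range(num_queries):
        a, b = rng.choice(all_ranges)
        for i, v in enumerate(values):
            if a <= v <= b:  # access pattern: record i matches the query
                hits[i] += 1
    return [invert_f(h / num_queries, n) for h in hits]

n = 100
true_values = [5, 20, 35]  # hypothetical records, all below the midpoint
estimates = simulate_attack(true_values, n, num_queries=20000)
```

Near the middle of the domain f is flat, so a small error in the estimated probability becomes a large error in the value; this is the effect behind "inverting f adds a square".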

SLIDE 25

On the i.i.d. Assumption

We are assuming uniformly distributed queries. In reality we are assuming:

  • The adversary knows the query distribution.
  • Queries are uniform.
  • More fundamentally, queries are independent and identically distributed (i.i.d.).

This is not realistic. What can we learn without that hypothesis?

SLIDE 26

Order Reconstruction

[Figure: a PQ tree with P and Q nodes.]

SLIDE 27

Problem Statement

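[Diagram: the Client sends the range query [40,100] to the Server.]

Database (value, record ID): (45, 1), (6, 2), (83, 3), (28, 4)
Matching records returned: (45, 1), (83, 3)

What can the server learn from the above leakage?
This time we don't assume i.i.d. queries, or knowledge of their distribution.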

SLIDE 28

Range Query Leakage

Query A matches records a, b, c.
Query B matches records b, c, d.

Then this is the only configuration (up to symmetry)!

[Figure: value axis up to N with records a, b, c, d in that order; range A covers a, b, c and range B covers b, c, d.]

→ we learn that records b, c are between a and d.
We learn something about the order of records.

SLIDE 29

Range Query Leakage

Query A matches records a, b, c.
Query B matches records b, c, d.
Query C matches records c, d.

Then the only possible order is a, b, c, d (or d, c, b, a)!

[Figure: value axis up to N with records a, b, c, d; range A covers a, b, c; range B covers b, c, d; range C covers c, d.]

Challenges:

  • How do we extract order information? (What algorithm?)
  • How do we quantify and analyze how fast order is learned as more queries are observed?
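The claim about queries A, B and C can be verified by brute force: a record order is compatible with the leakage iff the records matched by each range query sit next to each other in that order. The sketch below enumerates all permutations, which is only viable for tiny examples; it is not the algorithm used in the talk, and the function names are illustrative.

```python
from itertools import permutations

def is_contiguous(order, matched):
    """True if the matched records occupy consecutive positions in `order`."""
    positions = sorted(order.index(r) for r in matched)
    return positions[-1] - positions[0] == len(positions) - 1

def compatible_orders(records, queries):
    """All record orders in which every query's matches form a contiguous block."""
    return [
        "".join(order)
        for order in permutations(records)
        if all(is_contiguous(order, q) for q in queries)
    ]

query_a, query_b, query_c = {"a", "b", "c"}, {"b", "c", "d"}, {"c", "d"}
after_two = compatible_orders("abcd", [query_a, query_b])
after_three = compatible_orders("abcd", [query_a, query_b, query_c])
```

After A and B, the relative order of b and c is still undetermined (four candidate orders); adding C leaves a single order and its reflection.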

SLIDE 30

Challenge 1: the Algorithm

Short answer: there is already an algorithm!

Long answer: PQ-trees.

X: linearly ordered set. Order is unknown.
You are given a set S containing some intervals in X.
A PQ tree is a compact (linear in |X|) representation of the set of all permutations of X that are compatible with S.

Can be updated in linear time.

Note: was used in [DR13], didn’t target reconstruction.

SLIDE 31

PQ Trees

P node with children a, b, c: order is completely unknown.

  • any permutation of ‘abc’.

Q node with children a, b, c: order is completely known (up to reflection).

  • ‘abc’ or ‘cba’.

P node with children d, e and a Q node over a, b, c: combines in the natural way.

  • ‘abcde’, ‘abced’, ‘dabce’, ‘eabcd’, ‘deabc’, ‘edabc’, ‘cbade’ etc.
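The meaning of P and Q nodes can be spelled out by enumerating the orders a tree represents. The sketch below uses an assumed encoding (a leaf is a string, an inner node is a pair `(kind, children)`); it only reads a PQ tree and does not implement the update algorithm that inserts new interval constraints.

```python
from itertools import permutations, product

def frontiers(node):
    """Set of all leaf orders represented by a PQ tree."""
    if isinstance(node, str):
        return {node}  # a leaf represents itself
    kind, children = node
    if kind == "P":
        child_orders = permutations(children)  # children in any order
    elif kind == "Q":
        child_orders = [tuple(children), tuple(reversed(children))]  # fixed up to reflection
    else:
        raise ValueError(f"unknown node kind: {kind}")
    result = set()
    for order in child_orders:
        for parts in product(*(frontiers(child) for child in order)):
            result.add("".join(parts))
    return result

unknown = frontiers(("P", ["a", "b", "c"]))
known = frontiers(("Q", ["a", "b", "c"]))
combined = frontiers(("P", ["d", "e", ("Q", ["a", "b", "c"])]))
```

The combined tree yields 3! × 2 = 12 orders: the three children of the P node in any order, times the two orientations of the Q node.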

SLIDE 32

Full Order Reconstruction

[Figure: a single P node over records r1, r2, r3, … (no information) → observe enough queries → a single Q node over r1, r2, r3, … (full reconstruction).]

We want to quantify order learning...

SLIDE 33

Challenge 2a: Quantify Order Learning

[Figure: a single P node over r1, r2, r3, … (no information) and a single Q node over r1, r2, r3, … (full reconstruction).]

ε-Approximate order reconstruction.
Roughly: we learn the order between two records as soon as their values are ≥ εN apart.
(ε = 1/N is full reconstruction)

SLIDE 34

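Approximate Order Reconstruction

[Figure: starting from a single P node over r1, r2, r3, … (no information), one arrow labelled "#queries?" leads to a single Q node over r1, r2, r3, … (full reconstruction); another arrow labelled "#queries?" leads to a Q node whose children are buckets of diameter ≤ εN (ε-approximate reconstruction).]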

SLIDE 35

Challenge 2b: Analyze Query Complexity

Intuition: if no query has an endpoint between a and b, then a and b can't be separated.
→ ε-approximate reconstruction is impossible.

[Figure: value axis up to N with records a, b, c, d, a query range A, and an interval of length εN.]

You want a query endpoint to hit every interval ≥ εN.
Conversely, with some other conditions, it's enough.

Heavy sweeping of details under rug.

SLIDE 36

VC Theory Saves the Day (again)

ε-samples: the ratio of points hitting each concept is close to its probability.

What we want now: if a concept has high enough probability, it is hit by at least one point.

The set of samples drawn from X is an ε-net iff for all C in 𝓓:

$$\Pr(C) \ge \epsilon \;\Rightarrow\; C \text{ contains a sample}$$

➞ Number of points to get an ε-net whp:

$$O\!\left(\frac{d}{\epsilon} \log \frac{d}{\epsilon}\right)$$
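For intervals on a line, the ε-net condition reduces to a statement about gaps. The sketch below assumes X = [0,1] with the uniform distribution and concepts = closed intervals: every interval of probability ≥ ε contains a sample exactly when no gap between consecutive samples (or between a sample and the boundary) exceeds ε. The 600 random points stand in for query endpoints; that number is an arbitrary choice for illustration.

```python
import random

def is_eps_net_for_intervals(points, eps):
    """Uniform distribution on [0,1], concepts = intervals: an eps-net iff every gap is <= eps."""
    cuts = [0.0] + sorted(points) + [1.0]
    return all(right - left <= eps for left, right in zip(cuts, cuts[1:]))

rng = random.Random(3)
eps = 0.05
endpoints = [rng.random() for _ in range(600)]
covers_every_long_interval = is_eps_net_for_intervals(endpoints, eps)
```

Unlike an ε-sample, nothing is estimated here: the points only need to hit every sufficiently large interval once, which is why the bound drops from ε⁻² to ε⁻¹.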

SLIDE 37

Approximate Order Reconstruction

[Figure: starting from a single P node over r1, r2, r3, … (no information), O(N log N) queries lead to a single Q node over r1, r2, r3, … (full reconstruction); O(ε⁻¹ log ε⁻¹) queries lead to a Q node over buckets (ε-approximate reconstruction).]

Note: some (weak) assumptions are swept under the rug.

Conclusion: learn order very quickly. Almost back to ORE...

SLIDE 38

Experiments

[Plot: ApproxOrder experimental results, R = 1000, compared to theoretical ε-net bound ε⁻¹ log ε⁻¹. x-axis: Number of queries (100 to 500); y-axis: Max. bucket diameter as a fraction of N (0.00 to 0.12). Curves: N = 100, N = 1000, N = 10000, N = 100000.]
SLIDE 39

Volume Leakage

[Figure: volumes 7, 1, 13, 3, 11, 8, 10, 20.]

SLIDE 40

Problem Statement

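[Diagram: the Client sends the range query [40,100] to the Server, which sees "2 matches".]

Database (value, record ID): (45, 1), (6, 2), (83, 3), (28, 4)
Matching records: (45, 1), (83, 3)

Attacker only sees volumes = number of records matching each query.
What can the server learn from the above leakage?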

SLIDE 41

Volumes

Value    1    2    3    4
Counts   3    7    1   12

A volume = number of records matching some range.

Some volumes: 8 (range [2,3]), 13 (range [3,4]).

The attacker wants to learn exact counts.

SLIDE 42

Elementary Volumes

Value    1    2    3    4
Counts   3    7    1   12

Elementary volumes = volumes of “elementary” ranges [1,1], [1,2], [1,3]...

Elementary volumes: 3, 10, 11, 23.

SLIDE 43

Elementary Volumes

Value    1    2    3    4
Counts   3    7    1   12

Fact:

  • Knowing set of elementary volumes ⇔ knowing counts.
  • Every volume is the difference of two elementary volumes:

vol([a,b]) = vol([1,b]) - vol([1,a-1])     (with vol([1,0]) = 0)

so... Our goal: finding elementary volumes.
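The relation between counts, elementary volumes and volumes is a prefix-sum computation, sketched below on the running example. The function names are illustrative; the convention used is vol([a,b]) = vol([1,b]) - vol([1,a-1]) with vol([1,0]) = 0.

```python
from itertools import accumulate

def elementary_volumes(counts):
    """Volumes of the ranges [1,1], [1,2], ..., [1,N]: prefix sums of the counts."""
    return list(accumulate(counts))

def all_volumes(counts):
    """Set of volumes of all ranges [a,b], each a difference of two prefix sums."""
    prefix = [0] + elementary_volumes(counts)  # prefix[k] = vol([1,k]), prefix[0] = 0
    n = len(counts)
    return {prefix[b] - prefix[a - 1] for a in range(1, n + 1) for b in range(a, n + 1)}

def counts_from_elementary(elementary):
    """Recover the counts as differences of consecutive elementary volumes."""
    return [hi - lo for lo, hi in zip([0] + elementary, elementary)]

counts = [3, 7, 1, 12]
elementary = elementary_volumes(counts)
volumes = all_volumes(counts)
```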

SLIDE 44

The Attack

Assumption: the volumes of all queries are observed.

Observed volumes: 7, 12, 23, 1, 13, 3, 11, 8, 10, 20

Draw an edge between volumes a and b iff |b-a| is a volume.

[Figure: the volume graph on these ten nodes, with edges added over successive animation steps.]
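On the ten volumes above, the attack can be sketched by brute force. This is an illustration under assumptions, not the algorithm from the article: the number of values N = 4 is taken as known, the search tries every candidate set of N volumes containing the largest one instead of running a real clique-finding algorithm, and a final consistency check (an addition made for this example) discards cliques that do not reproduce exactly the observed volumes.

```python
from itertools import accumulate, combinations

def volumes_from_counts(counts):
    """Set of volumes of all ranges, computed from prefix sums of the counts."""
    prefix = [0] + list(accumulate(counts))
    return {prefix[j] - prefix[i] for i in range(len(prefix)) for j in range(i + 1, len(prefix))}

def reconstruct_counts(volumes, n_values):
    """Find cliques of size n_values in the volume graph that explain all observed volumes."""
    volumes = set(volumes)
    total = max(volumes)  # vol([1,N]) is the largest volume
    solutions = []
    for rest in combinations(sorted(volumes - {total}), n_values - 1):
        clique = list(rest) + [total]  # candidate elementary volumes, ascending
        # Edge condition: every pairwise difference must itself be an observed volume.
        if any(b - a not in volumes for a, b in combinations(clique, 2)):
            continue
        counts = [clique[0]] + [b - a for a, b in zip(clique, clique[1:])]
        # Consistency check: the candidate must generate exactly the observed volumes.
        if volumes_from_counts(counts) == volumes:
            solutions.append(tuple(counts))
    return sorted(solutions)

observed = {7, 12, 23, 1, 13, 3, 11, 8, 10, 20}
solutions = reconstruct_counts(observed, n_values=4)
```

Two solutions survive, the true counts and their mirror image, since volumes alone cannot distinguish a database from its reflection.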

SLIDE 45

Summary

Attack: elementary volumes form a clique in the volume graph → clique-finding algorithm reveals them.

For structured queries, even just volume leakage can be quite damaging.

Attack requires strong assumption.

In the article:

  • Pre-processing to avoid clique finding.
  • Analysis of parameters + experiments.
  • Other attacks.
SLIDE 46

Closing Remarks

SLIDE 47

On Range Queries

Access pattern: severe attacks under minimal assumptions.

Please don't use OPE/ORE.
Also avoid current encrypted DBs if you don't trust the server and care about privacy.

New solutions needed. E.g. efficient specialized ORAMs. Even then, need to hide volumes.

Many open problems...

SLIDE 48

Connection to Machine Learning

  • In this talk: VC theory.
  • In the article: known query setting = PAC learning.
  • Some results for general query classes.

Natural connection between reconstructing secret information from leakage and machine learning. Seems to be a powerful tool to understand the security implications of leakage.

Machine learning in crypto: also used for side channel attacks. Same general setting!
In side channels - use learning algorithms; here - use learning theory.