Learning to Reconstruct: Statistical Learning Theory and Encrypted Database Attacks (PowerPoint PPT Presentation)



SLIDE 1

Learning to Reconstruct: Statistical Learning Theory and Encrypted Database Attacks

Paul Grubbs, Marie-Sarah Lacharité, Brice Minaud, Kenny Paterson CASYS team seminar, Grenoble, December 2018

SLIDE 2

Outsourcing Data with Search Capabilities

(Diagram: the client uploads its data to the server, then sends search queries and receives the matching records.)

For an encrypted database management system:

  • Data = a collection of records, e.g. health records.
  • Basic query examples:
    • find records with a given value, e.g. patients aged 57.
    • find records within a given range, e.g. patients aged 55-65.
SLIDE 3

Searchable Encryption

(Diagram: the client sends a search query to an adversarial server, which returns the matching records.)

Adversary: honest-but-curious host server. Security goal: privacy of data and queries. Very active topic in research and industry: [AKSX04], [BCLO09], [PKV+14], [BLR+15], [NKW15], [KKNO16],

[LW16], [FVY+17], [SDY+17], [DP17], [HLK18], [PVC18], [MPC+18]…

SLIDE 4

Security Model of Searchable Encryption

(Diagram: the client sends a search query to the adversarial server, which returns the matching records and learns F(query, DB).)

Generic solutions (FHE) are infeasible at scale → for efficiency reasons, some leakage is allowed. Security model: parametrized by a leakage function. The server learns nothing except the output of the leakage function.

SLIDE 5

Implications of Leakage Function

In practice: nearly all practical schemes leak at least the set of records matching each query = access pattern leakage.

OPE and ORE schemes, POPE, [HK16], BlindSeer, [Lu12], [FJ+15], FH-OPE, Lewi-Wu, Arx, Cipherbase, EncKV, …

What are the security implications of this leakage? In this talk: focus on range queries.

  • Fundamental for any encrypted DB system.
  • Many constructions out there.
  • Simplest type of query that can't "just" be handled by an index (cf. Symmetric Searchable Encryption).

SLIDE 6

Range Queries

Range = [40,100]

(Diagram: the client sends the range [40,100] to the server, which returns the matching records, here the records with values 45 and 83.)

Let's specify the problem: say records take N possible values, and queries are uniformly distributed. What can the server learn from the above leakage?
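The leakage on this slide can be modeled in a few lines. A minimal illustrative sketch (the helper name `access_pattern` is ours, mirroring the record ids and values in the example): the server stores encrypted records and, per range query, learns only which record ids match.

```python
# Minimal model of access-pattern leakage for a range query (illustrative).
records = {1: 45, 2: 6, 3: 83, 4: 28}  # record id -> plaintext value (hidden)

def access_pattern(query_range, db):
    """The set of matching record ids: all the adversarial server sees."""
    lo, hi = query_range
    return frozenset(rid for rid, val in db.items() if lo <= val <= hi)

print(access_pattern((40, 100), records))  # → frozenset({1, 3})
```

The rest of the talk asks what an adversary can infer from observing many such sets.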

SLIDE 7

Database Reconstruction

Strongest goal: full database reconstruction = recovering the exact value of every record. More general: approximate database reconstruction = recovering all values within εN.

ε = 1/N is full recovery. ε = 0.05 is recovery within 5%.

[KKNO16]: full reconstruction in O(N⁴ log N) queries, assuming i.i.d. uniform queries! ("Sacrificial" recovery: values very close to 1 and N are excluded.)

SLIDE 8

Database Reconstruction

[KKNO16]: full reconstruction in O(N⁴ log N) queries! This talk ([GLMP19], subsuming [LMP18]):

  Goal                            Queries           Full rec. (ε = 1/N)   Lower bound
  Approx. reconstruction          O(ε⁻⁴ log ε⁻¹)    O(N⁴ log N)           Ω(ε⁻⁴)
  … with a very mild hypothesis   O(ε⁻² log ε⁻¹)    O(N² log N)           Ω(ε⁻²)
  Approx. order reconstruction    O(ε⁻¹ log ε⁻¹)    O(N log N)            Ω(ε⁻¹ log ε⁻¹)

Full reconstruction in O(N log N) for dense DBs. Scale-free: does not depend on the size of the DB or the number of possible values. → Recovering all values in the DB within 5% costs O(1) queries!

SLIDE 9

Database Reconstruction

[KKNO16]: full reconstruction in O(N⁴ log N) queries! This talk ([GLMP19], subsuming [LMP18]):

  Goal                            Queries           Full rec. (ε = 1/N)   Lower bound
  Approx. reconstruction          O(ε⁻⁴ log ε⁻¹)    O(N⁴ log N)           Ω(ε⁻⁴)
  … with a very mild hypothesis   O(ε⁻² log ε⁻¹)    O(N² log N)           Ω(ε⁻²)
  Approx. order reconstruction    O(ε⁻¹ log ε⁻¹)    O(N log N)            Ω(ε⁻¹ log ε⁻¹)

Main tool for this talk:

  • connection with statistical learning theory;
  • especially, VC theory.
SLIDE 10

VC Theory

SLIDE 11

VC Theory


Foundational paper: Vapnik and Chervonenkis, 1971. Uniform convergence result. Now a foundation of learning theory, especially PAC (probably approximately correct) learning. Wide applicability. Fairly easy to state/use.

(You don't have to read the original article in Russian.)

SLIDE 12

Warm-up

Set X with probability distribution D. Let C ⊆ X; call it a concept.

Sample complexity: to estimate Pr(C) within ε, you need O(1/ε²) samples:

  Pr(C) ≈ (#points in C) / (#points total)
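The sample-complexity claim is easy to check empirically. A minimal Monte Carlo sketch (the concept C with Pr(C) = 0.3 and the sample sizes are our own toy example):

```python
import random

# Estimate Pr(C) for one concept C ⊆ X = [0, 1) under the uniform
# distribution, from n i.i.d. samples.
random.seed(0)

def estimate_pr(concept, n):
    samples = (random.random() for _ in range(n))
    return sum(concept(x) for x in samples) / n

C = lambda x: 0.2 <= x < 0.5  # Pr(C) = 0.3 under the uniform distribution

# The error behaves like O(1/sqrt(n)), i.e. n = O(1/ε²) samples for error ε.
for n in (100, 10_000):
    print(n, abs(estimate_pr(C, n) - 0.3))
```

Going from 100 to 10,000 samples (a 100× increase) should shrink the typical error by about 10×, matching the 1/ε² rate.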
SLIDE 13

Approximating a Concept Set

Now: a set 𝓓 of concepts. Goal: approximate their probabilities simultaneously. The set of samples drawn from X is an ε-sample iff for all C in 𝓓:

  | Pr(C) − (#points in C) / (#points total) | ≤ ε
SLIDE 14

ε-sample Theorem

Union bound: yields a sample complexity that depends on |𝓓|. How many samples do we need to get an ε-sample whp? V & C 1971: if 𝓓 has VC dimension d, then the number of points needed to get an ε-sample whp is

  O((d/ε²) log(d/ε)).

Does not depend on |𝓓|!

SLIDE 15

VC Dimension

Remaining Q: what is the VC dimension? A set of points S is shattered by 𝓓 iff every subset of S is equal to C ∩ S for some C in 𝓓.

  • Example. Take 2 points in X = [0,1]. Concepts 𝓓 = all ranges.

(Diagram: ranges A, B, C, D cut out each of the four subsets: neither point, the left point only, the right point only, and both.)

2 points = SHATTERED

SLIDE 16

VC Dimension

  • Example. Take 3 points in X = [0,1]. Concepts 𝓓 = all ranges.

Problem: no range contains the two outer points without also containing the middle one. 3 points = NOT SHATTERED.

VC dimension of 𝓓 = largest integer d such that some set of d points in X is shattered.

E.g. the VC dimension of ranges is 2. What typically matters is just that the VC dimension is finite.
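The shattering check for ranges can be automated, because an interval always cuts out a contiguous run of the sorted points. An illustrative sketch (the function name `shattered_by_ranges` is ours):

```python
from itertools import combinations

# Check whether a finite point set S ⊂ R is shattered by the concept class
# of ranges (intervals). C ∩ S for an interval C is always a contiguous run
# of S in sorted order, so it suffices to enumerate runs.

def shattered_by_ranges(points):
    pts = sorted(points)
    n = len(pts)
    # All subsets realizable as interval ∩ S: the empty set and contiguous runs.
    realizable = {frozenset()} | {
        frozenset(pts[i:j]) for i in range(n) for j in range(i + 1, n + 1)
    }
    all_subsets = {
        frozenset(c) for k in range(n + 1) for c in combinations(pts, k)
    }
    return all_subsets <= realizable

print(shattered_by_ranges([0.2, 0.7]))       # True: 2 points are shattered
print(shattered_by_ranges([0.2, 0.5, 0.7]))  # False: {0.2, 0.7} is not a run
```

This reproduces the two slides above: every 2-point set is shattered, no 3-point set is, so the VC dimension of ranges is 2.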

SLIDE 17

Database Reconstruction

SLIDE 18

KKNO16-like Attack

(Diagram: the value axis 1..N; values near the middle are more probable to be hit by a uniform range than values near the endpoints.)

Assume a uniform distribution on range queries. This induces a distribution f on the probability that a given value is hit. Idea: for each record:

  • 1. Count the frequency at which the record is hit.
       → gives an estimate of the probability it's hit by a uniform query.
  • 2. Deduce an estimate of its value by "inverting" f.

SLIDE 19

KKNO16-like Attack

Step 1: for all records, estimate the probability of the record being hit. This is an ε-sample! Here X = ranges and 𝓓 = {{ranges ∋ x}: x ∈ [1,N]}, so we need O(ε⁻² log ε⁻¹) queries.

Step 2: because f is quadratic, "inverting" f squares the required precision, hence the query count.

After O(ε⁻⁴ log ε⁻¹) queries, the value of all records is recovered within εN.
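The two steps can be simulated end to end. A toy sketch under the slide's assumptions, uniform i.i.d. range queries (the constants, record values, and query count are our own choices, not the paper's experiments):

```python
import random

# Toy simulation of the KKNO16-style attack (illustrative code, not the
# paper's implementation). Records take values in [1, N]; queries are
# uniform over all ranges [a, b] with a ≤ b. A uniform range hits value k
# with probability f(k) = k*(N - k + 1) / (N*(N + 1)/2).
random.seed(42)
N = 100
secrets = {1: 45, 2: 6, 3: 83, 4: 28}  # record id -> hidden value
Q = 200_000                            # number of observed queries

all_ranges = [(a, b) for a in range(1, N + 1) for b in range(a, N + 1)]
hits = {rid: 0 for rid in secrets}
for _ in range(Q):
    a, b = random.choice(all_ranges)
    for rid, v in secrets.items():
        if a <= v <= b:
            hits[rid] += 1

def f(k):
    return k * (N - k + 1) / (N * (N + 1) / 2)

def invert(p_hat):
    # f is symmetric around (N+1)/2, so values are recovered only up to
    # the reflection k <-> N + 1 - k (cf. "sacrificial" recovery).
    return min(range(1, N + 1), key=lambda k: abs(f(k) - p_hat))

for rid, v in secrets.items():
    print(rid, v, invert(hits[rid] / Q))  # guess ≈ v or ≈ N + 1 - v
```

Note the reflection ambiguity in `invert`: since f(k) = f(N + 1 − k), hit frequencies alone cannot distinguish a value from its mirror image.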

SLIDE 20

On the i.i.d. Assumption

We are assuming uniformly distributed queries. In reality we are assuming:

  • The adversary knows the query distribution.
  • Queries are uniform.
  • More fundamentally, queries are independent and identically distributed (i.i.d.).

This is not realistic. What can we learn without that hypothesis?

SLIDE 21

Order Reconstruction

SLIDE 22

Problem Statement

Range = [40,100]

(Diagram: the client sends the range [40,100] to the server, which returns the matching records, here the records with values 45 and 83.)

This time we don't assume i.i.d. queries, or knowledge of their distribution. What can the server learn from the above leakage?

SLIDE 23

Range Query Leakage

Query A matches records a, b, c. Query B matches records b, c, d.

→ we learn that records b and c are between a and d. We learn something about the order of records. Up to symmetry, this is the only possible configuration! (Diagram: a, b, c, d on the value axis [1, N], with A covering a, b, c and B covering b, c, d.)

SLIDE 24

Range Query Leakage

Query A matches records a, b, c. Query B matches records b, c, d. Query C matches records c, d.

Then the only possible order is a, b, c, d (or d, c, b, a)! Challenges:

  • How do we extract order information? (What algorithm?)
  • How do we quantify and analyze how fast order is learned as more queries are observed?
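Before introducing PQ trees, the order extraction itself can be illustrated by brute force: a permutation of the records is consistent with the leakage iff every query's matching set is contiguous in it. A toy sketch using the example queries from this slide (not the actual PQ-tree algorithm):

```python
from itertools import permutations

# Each range query's matching set must be contiguous in the true value
# order, so we keep exactly the permutations consistent with every set.
leaked = [{'a', 'b', 'c'}, {'b', 'c', 'd'}, {'c', 'd'}]  # queries A, B, C

def consistent(order, matching_sets):
    pos = {rec: i for i, rec in enumerate(order)}
    for s in matching_sets:
        idx = sorted(pos[r] for r in s)
        if idx[-1] - idx[0] != len(s) - 1:  # not a contiguous run
            return False
    return True

survivors = [p for p in permutations('abcd') if consistent(p, leaked)]
print(survivors)  # only the true order and its reflection survive
```

Brute force over permutations is exponential, which is exactly why the compact, incrementally updatable PQ-tree representation on the next slides matters.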

SLIDE 25

Challenge 1: the Algorithm

Short answer: there is already an algorithm! Long answer: PQ trees. X: a linearly ordered set whose order is unknown. You are given a set S containing some intervals of X. A PQ tree is a compact (linear in |X|) representation of the set of all permutations of X that are compatible with S. It can be updated in linear time.

Note: PQ trees were used in [DR13], which didn't target reconstruction.

SLIDE 26

PQ Trees

P node over a, b, c: the order is completely unknown.

  • any permutation of 'abc'.

Q node over a, b, c: the order is completely known (up to reflection).

  • 'abc' or 'cba'.

They combine in the natural way, e.g. a P node over d, e and a Q node over a, b, c:

  • 'abcde', 'abced', 'dabce', 'eabcd', 'deabc', 'edabc', 'cbade', etc.

SLIDE 27

Full Order Reconstruction

Start: a single P node over r1, r2, r3, … = no information. Observe enough queries, and the tree becomes a single Q node over r1, r2, r3, … = full reconstruction (up to reflection).

We want to quantify order learning...

SLIDE 28


Challenge 2a: Quantify Order Learning

ε-approximate order reconstruction sits between "no information" (a single P node) and full reconstruction (a single Q node). Roughly: we learn the order between two records as soon as their values are ≥ εN apart. (ε = 1/N is full reconstruction.)

SLIDE 29


Approximate Order Reconstruction

In PQ-tree terms, ε-approximate order reconstruction is a Q node over buckets of diameter ≤ εN, between "no information" (a single P node) and full reconstruction (a Q node over single records). How many queries does each take?

SLIDE 30

Challenge 2b: Analyze Query Complexity

Intuition: if no query has an endpoint between a and b, then a and b can't be separated → ε-approximate reconstruction is impossible. You want a query endpoint to hit every interval of length ≥ εN. Conversely, with some other conditions, that is enough.

(Heavy sweeping of details under the rug.)

SLIDE 31

VC Theory Saves the Day (again)

ε-samples: the ratio of points hitting each concept is close to its probability. What we want now is weaker: if a concept has high enough probability, it is hit by at least one point. The set of samples drawn from X is an ε-net iff for all C in 𝓓:

  Pr(C) ≥ ε ⇒ C contains a sample.

→ Number of points to get an ε-net whp: O((d/ε) log(d/ε)).
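The ε-net claim can be sanity-checked numerically: with O(ε⁻¹ log ε⁻¹) uniformly random query endpoints, every interval of length ≥ εN should contain an endpoint with high probability. A toy sketch (the constant 8 and all parameters are our own choices):

```python
import math
import random

# Draw m = O((1/ε) log(1/ε)) uniformly random query endpoints in [1, N]
# and check that no interval of length ≥ εN is missed, i.e. that the
# largest gap between consecutive endpoints is at most εN.
random.seed(7)
N, eps = 10_000, 0.05
m = math.ceil((8 / eps) * math.log(1 / eps))  # the constant 8 is ours

endpoints = sorted(random.randint(1, N) for _ in range(m))

# Largest gap between consecutive endpoints, including the two borders:
gaps = [b - a for a, b in zip([0] + endpoints, endpoints + [N + 1])]
print(len(endpoints), max(gaps))  # max gap should be well below εN = 500
```

Any interval of length ≥ εN with no endpoint inside it would correspond to two records that can never be separated, which is exactly the failure mode on the previous slide.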

SLIDE 32


Approximate Order Reconstruction

From no information (a single P node), O(ε⁻¹ log ε⁻¹) queries suffice for ε-approximate reconstruction (a Q node over buckets of diameter ≤ εN), and O(N log N) queries for full reconstruction (a single Q node over r1, r2, r3, …).

Note: some (weak) assumptions are swept under the rug.

SLIDE 33

Experiments

(Plot: ApproxOrder experimental results for R = 1000 records and N = 100, 1000, 10000, 100000: max bucket diameter, as a fraction of N, against the number of queries (100 to 500), compared to the theoretical ε⁻¹ log ε⁻¹ ε-net bound.)
SLIDE 34

A Relevant Question

Do we care about leaking the order of records? Yes, a lot. Some history: Order-Preserving and Order-Revealing Encryption (OPE/ORE) schemes leak order by design. → Devastating leakage-abuse attacks [NKW15], [GSB+17]... This led to "second-generation" schemes, whose whole point is to enable range queries without leaking order. We just saw that access pattern leaks order, so if you leak the access pattern, it's back to square one! Also note: if the DB is dense, order reveals values directly.

SLIDE 35

Practical Experimental Results

Using an approximation of the DB data distribution, we can map order to actual values. (We use census data.)

Federal Aviation Administration ZIP code DB:

  • 1st digit of ZIP code (revealing region) after ~10 queries (!)
  • 2nd digit after ~100 queries.

California public employee salary DB:

  • 2% error ($10000) after ~50 queries.
  • 1% error ($5000) after ~100 queries.
SLIDE 36

Closing Remarks

SLIDE 37

On Range Queries


Severe attacks under minimal assumptions. The analysis clarifies the setting:

  • Size of DB, or number of possible values, don't matter.
  • What is really leaked is the order of records.
  • Various auxiliary info can get you from order to values.

Please don't use OPE/ORE. Also avoid current encrypted DBs if you don't trust the server and care about privacy. New solutions are needed, e.g. efficient specialized ORAMs.

SLIDE 38

Connection to Machine Learning

  • In this talk: VC theory.
  • In the article: known query setting = PAC learning.
  • Some results for general query classes.

Machine learning in crypto: also used for side-channel attacks. Same general setting!

There is a natural connection between reconstructing secret information from leakage and machine learning. It seems to be a powerful tool to understand the security implications of leakage. In side channels, use learning algorithms; here, use learning theory.