SLIDE 1
Learning to Reconstruct: Statistical Learning Theory and Encrypted Database Attacks
Paul Grubbs, Marie-Sarah Lacharité, Brice Minaud, Kenny Paterson CASYS team seminar, Grenoble, December 2018
SLIDE 2 Outsourcing Data with Search Capabilities
[Diagram: client uploads data to the server; search queries return matching records]
For an encrypted database management system:
- Data = collection of records. e.g. health records.
- Basic query examples:
- find records with given value. e.g. patients aged 57.
- find records within a given range. e.g. patients aged 55-65.
SLIDE 3
Searchable Encryption
[Diagram: client sends search query; adversarial server returns matching records]
Adversary: honest-but-curious host server. Security goal: privacy of data and queries.
Very active topic in research and industry: [AKSX04], [BCLO09], [PKV+14], [BLR+15], [NKW15], [KKNO16], [LW16], [FVY+17], [SDY+17], [DP17], [HLK18], [PVC18], [MPC+18]…
SLIDE 4
Security Model of Searchable Encryption
[Diagram: client sends search query to adversarial server; server learns F(query, DB)]
Generic solutions (FHE) are infeasible at scale → for efficiency reasons, some leakage is allowed.
Security model: parametrized by a leakage function. The server learns nothing except the output of the leakage function.
SLIDE 5 Implications of Leakage Function
In practice: nearly all practical schemes leak at least the set of records matching each query = access pattern leakage.
OPE, ORE schemes, POPE, [HK16], BlindSeer, [Lu12], [FJ+15], FH- OPE, Lewi-Wu, Arx, Cipherbase, EncKV, …
What are the security implications of this leakage? In this talk: focus on range queries.
- Fundamental for any encrypted DB system.
- Many constructions out there.
- Simplest type of query that can't “just” be handled by an index (cf. Symmetric Searchable Encryption).
SLIDE 6
Range Queries
Range = [40,100]
[Diagram: client sends range [40,100]; server returns matching records 1 (value 45) and 3 (value 83), from a DB containing records 1 (45), 2 (6), 3 (83), 4 (28)]
Let's specify the problem: say records take N possible values, and queries are uniformly distributed. What can the server learn from the above leakage?
SLIDE 7
Database Reconstruction
Strongest goal: full database reconstruction = recovering the exact value of every record. More general: approximate database reconstruction = recovering all values within εN.
ε = 1/N is full recovery. ε = 0.05 is recovery within 5%.
[KKNO16]: full reconstruction in O(N^4 log N) queries, assuming i.i.d. uniform queries! (“Sacrificial” recovery: values very close to 1 and N are excluded.)
SLIDE 8 Database Reconstruction
[KKNO16]: full reconstruction in O(N^4 log N) queries! This talk ([GLMP19], subsuming [LMP18]):

  Goal                           Queries              Full rec. (ε = 1/N)   Lower bound
  Approx. reconstruction         O(ε^-4 log ε^-1)     O(N^4 log N)          Ω(ε^-4)
  + very mild hypothesis         O(ε^-2 log ε^-1)     O(N^2 log N)          Ω(ε^-2)
  Approx. order reconstruction   O(ε^-1 log ε^-1)     O(N log N)            Ω(ε^-1 log ε^-1)
Full reconstruction in O(N log N) for dense DBs. Scale-free: does not depend on size of DB or number of possible values. → Recovering all values in DB within 5% costs O(1) queries!
SLIDE 9 Database Reconstruction
Main tool:
- connection with statistical learning theory;
- especially, VC theory.
SLIDE 10
VC Theory
SLIDE 11
VC Theory
Foundational paper: Vapnik and Chervonenkis, 1971, a uniform convergence result.
Now a foundation of learning theory, especially PAC (probably approximately correct) learning.
Wide applicability. Fairly easy to state and use.
(You don't have to read the original article in Russian.)
SLIDE 12
Warm-up
Set X with probability distribution D. Let C ⊆ X; call it a concept.
Sample complexity: to measure Pr(C) within ε, you need O(1/ε²) samples:

Pr(C) ≈ (#points in C) / (#points total)
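As an illustration, a minimal Monte Carlo sketch (the concept C = [0.2, 0.5] and the constant inside O(1/ε²) are chosen arbitrarily for the example):

```python
import random

random.seed(0)

# Illustrative concept: C = [0.2, 0.5] inside X = [0, 1], with D uniform,
# so Pr(C) = 0.3 exactly.
def estimate_pr(n):
    """Empirical frequency of C among n i.i.d. samples from D."""
    return sum(0.2 <= random.random() <= 0.5 for _ in range(n)) / n

eps = 0.01
n = int(4 / eps ** 2)          # O(1/eps^2) samples; the constant is loose
p_hat = estimate_pr(n)
print(abs(p_hat - 0.3) < eps)  # holds with high probability
```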
SLIDE 13 Approximating a Concept Set
Now: a set 𝓓 of concepts. Goal: approximate their probabilities simultaneously.
The set of samples drawn from X is an ε-sample iff for all C in 𝓓:

| (#points in C) / (#points total) − Pr(C) | ≤ ε
SLIDE 14
ε-sample Theorem
Union bound: yields a sample complexity that depends on |𝓓|. How many samples do we need to get an ε-sample whp?
V & C 1971: if 𝓓 has VC dimension d, then the number of points needed to get an ε-sample whp is

O((d/ε²) log(d/ε)).
Does not depend on |𝓓|!
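A quick simulation of what the theorem buys us (a sketch with arbitrary parameters): one pool of samples is an ε-sample for all ranges at once, even though there are many concepts, because ranges have small VC dimension.

```python
import bisect
import random

random.seed(1)

# One pool of samples from the uniform distribution on [0, 1].
samples = sorted(random.random() for _ in range(100_000))

def empirical(a, b):
    """Fraction of samples falling in the range [a, b]."""
    return (bisect.bisect_right(samples, b)
            - bisect.bisect_left(samples, a)) / len(samples)

# Check the SAME pool against many concepts: all ranges with endpoints on a
# grid.  The true probability of [a, b] is b - a.  The worst deviation over
# all of them stays below eps simultaneously.
worst = max(abs(empirical(a / 50, b / 50) - (b - a) / 50)
            for a in range(50) for b in range(a, 51))
print(worst < 0.01)  # an eps-sample for eps = 0.01, whp
```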
SLIDE 15 VC Dimension
Remaining Q: what is the VC dimension?
A set of points S is shattered by 𝓓 iff every subset of S is equal to C ∩ S for some C in 𝓓.
- Example. Take 2 points in X=[0,1]. Concepts 𝓓 = all ranges.
[Diagram: each of the 4 subsets of the 2 points is realized as the intersection with some range → 2 points = SHATTERED]
SLIDE 16 VC Dimension
- Example. Take 3 points in X=[0,1]. Concepts 𝓓 = all ranges.
[Diagram: the subset containing only the two outer points is not C ∩ S for any range C → 3 points = NOT SHATTERED]
VC dimension of 𝓓 = largest integer d such that some set of d points in X is shattered.
E.g. VC dimension of ranges is 2. What typically matters is just that VC dim is finite.
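The shattering definition can be checked by brute force on a small discrete domain (a sketch; X = {1, …, 10} and the chosen points are arbitrary):

```python
from itertools import combinations

def shattered(points, N=10):
    """Check whether `points` is shattered by ranges [x, y] on {1, ..., N}:
    every subset of the points must equal [x, y] ∩ points for some range."""
    realizable = {frozenset(p for p in points if x <= p <= y)
                  for x in range(1, N + 1) for y in range(x, N + 1)}
    subsets = {frozenset(c) for r in range(len(points) + 1)
               for c in combinations(points, r)}
    return subsets <= realizable

print(shattered((3, 7)))     # True: any 2 points are shattered by ranges
print(shattered((3, 5, 7)))  # False: {3, 7} without 5 is not realizable
# Hence the VC dimension of ranges is 2.
```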
SLIDE 17
Database Reconstruction
SLIDE 18 KKNO16-like Attack
[Diagram: values 1 to N; middle values are more probable to be hit]
Assume a uniform distribution on range queries. Idea: for each record...
- 1. Count frequency at which the record is hit.
→ gives estimate of probability it’s hit by uniform query.
- 2. Deduce an estimate of its value by “inverting” f.
Uniform queries induce a function f: the probability that a given value is hit. [Diagram: f plotted against values]
SLIDE 19
KKNO16-like Attack
Step 1: for all records, estimate the probability of the record being hit. This is an ε-sample! X = ranges, 𝓓 = {{ranges ∋ x}: x ∈ [1,N]}, so we need O(ε^-2 log ε^-1) queries.
Step 2: because f is quadratic, “inverting” f squares the cost.
After O(ε^-4 log ε^-1) queries, the value of every record is recovered within εN.
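A self-contained simulation of this attack idea (a sketch, not the paper's exact algorithm; N, the secret value, and the query count are arbitrary):

```python
import random

random.seed(3)
N = 100

# f(v): probability that a uniform range [x, y], 1 <= x <= y <= N, hits
# value v.  There are v choices of x <= v and N - v + 1 choices of y >= v,
# out of N(N+1)/2 ranges in total.
def f(v):
    return 2 * v * (N - v + 1) / (N * (N + 1))

def invert_f(p):
    """Invert the quadratic f: returns the two candidate values, symmetric
    about (N+1)/2 (the reflection cannot be resolved from frequencies)."""
    root = ((N + 1) ** 2 - 2 * p * N * (N + 1)) ** 0.5
    lo = ((N + 1) - root) / 2
    return lo, (N + 1) - lo

# Observe hit/miss for a record of hidden value 30 under uniform queries.
all_ranges = [(x, y) for x in range(1, N + 1) for y in range(x, N + 1)]
secret, hits, Q = 30, 0, 50_000
for _ in range(Q):
    x, y = random.choice(all_ranges)
    hits += x <= secret <= y

lo, hi = invert_f(hits / Q)
print(lo, hi)  # close to 30 and its mirror image 71
```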
SLIDE 20 On the i.i.d. Assumption
We are assuming uniformly distributed queries. In reality, we are assuming:
- The adversary knows the query distribution.
- Queries are uniform.
- More fundamentally, queries are independent and identically distributed (i.i.d.).
This is not realistic. What can we learn without that hypothesis?
SLIDE 21
Order Reconstruction
SLIDE 22
Problem Statement
Range = [40,100]
[Diagram: client sends range [40,100] to the server; server returns the matching records, as on the earlier slide]
This time we don't assume i.i.d. queries, or knowledge of their distribution. What can the server learn from the above leakage?
SLIDE 23
Range Query Leakage
Query A matches records a, b, c. Query B matches records b, c, d.
→ We learn that records b and c are between a and d: we learn something about the order of records. And this is the only configuration (up to symmetry)! [Diagram: a, b, c, d on the line 1..N, with ranges A and B overlapping on b, c]
SLIDE 24 Range Query Leakage
Query A matches records a, b, c. Query B matches records b, c, d. Query C matches records c, d.
Then the only possible order is a, b, c, d (or d, c, b, a)! [Diagram: ranges A, B, C over a, b, c, d on the line 1..N] Challenges:
- How do we extract order information? (What algorithm?)
- How do we quantify and analyze how fast order is
learned as more queries are observed?
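For this toy example, order extraction can be done by brute force over permutations: a range query always matches a set of records that is contiguous in the value order (assuming distinct values), so we keep exactly the permutations in which every leaked set is a block. A minimal sketch:

```python
from itertools import permutations

# Leakage from the example: each query reveals only its set of matching
# records (access pattern).
leakage = [{"a", "b", "c"}, {"b", "c", "d"}, {"c", "d"}]
records = ["a", "b", "c", "d"]

def compatible(order, leaked_sets):
    """A range query matches an interval of values, so (assuming distinct
    values) every leaked set must be a contiguous block in the true order."""
    pos = {r: i for i, r in enumerate(order)}
    return all(max(pos[r] for r in s) - min(pos[r] for r in s) == len(s) - 1
               for s in leaked_sets)

orders = [p for p in permutations(records) if compatible(p, leakage)]
print(orders)  # only a,b,c,d and its reflection d,c,b,a survive
```

Brute force is exponential in the number of records, which is exactly why a compact representation is needed.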
SLIDE 25
Challenge 1: the Algorithm
Short answer: there is already an algorithm! Long answer: PQ trees.
X: a linearly ordered set whose order is unknown. You are given a set S containing some intervals of X.
A PQ tree is a compact (linear in |X|) representation of the set of all permutations of X that are compatible with S.
Note: was used in [DR13], didn’t target reconstruction.
Can be updated in linear time.
SLIDE 26 PQ Trees
[P node over a, b, c]: order is completely unknown.
[Q node over a, b, c]: order is completely known (up to reflection).
[P node over d, e and a Q node over a, b, c]: combines in the natural way.
- ‘abcde’, ‘abced’, ‘dabce’, ‘eabcd’, ‘deabc’, ‘edabc’, ‘cbade’ etc.
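The composition rule can be checked by enumerating the orders represented by the tree P(d, e, Q(a, b, c)) (a sketch: the P node permutes its children freely, the Q node fixes a, b, c up to reflection):

```python
from itertools import permutations

def pq_orders():
    """All orders represented by P(d, e, Q(a, b, c))."""
    out = set()
    for kids in permutations(["d", "e", "Q"]):  # P: any child order
        for block in ("abc", "cba"):            # Q: fixed up to reflection
            out.add("".join(block if k == "Q" else k for k in kids))
    return out

orders = pq_orders()
print(len(orders))  # 12 of the 5! = 120 permutations remain
assert {"abcde", "abced", "dabce", "eabcd", "deabc", "edabc", "cbade"} <= orders
```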
SLIDE 27 Full Order Reconstruction
[Diagram: P node over r1, r2, r3, … = no information; Q node over r1, r2, r3, … = full reconstruction]
We want to quantify order learning...
SLIDE 28
Challenge 2a: Quantify Order Learning
ε-approximate order reconstruction. Roughly: we learn the order between two records as soon as their values are ≥ εN apart. (ε = 1/N is full reconstruction.)
SLIDE 29
Approximate Order Reconstruction
[Diagram: from the P node (no information) to the Q node (full reconstruction); in between, a Q node whose subtrees have diameter ≤ εN = ε-approximate reconstruction. How many queries for each?]
SLIDE 30
Challenge 2b: Analyze Query Complexity
Intuition: if no query has an endpoint between a and b, then a and b can't be separated → ε-approximate reconstruction is impossible. [Diagram: a, b, c, d on the line 1..N; an interval of length εN]
You want a query endpoint to hit every interval of length ≥ εN. Conversely, with some other conditions, that is enough.
Heavy sweeping of details under rug.
SLIDE 31
VC Theory Saves the Day (again)
➞ Number of points to get an ε-net whp:

O((d/ε) log(d/ε))
The set of samples drawn from X is an ε-net iff for all C in 𝓓:

Pr(C) ≥ ε ⇒ C contains a sample.
ε-samples: the ratio of points hitting each concept is close to its probability. What we want now: if a concept has high enough probability, it is hit by at least one point.
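A small simulation of this intuition (a sketch with arbitrary parameters; queries are drawn by a simple procedure rather than an adversarial distribution): count how many queries pass before every interval of length εN contains a query endpoint.

```python
import random

random.seed(4)
N, eps = 1000, 0.05
w = int(eps * N)  # target interval length

def covered(endpoints):
    """True iff every run of w consecutive values contains an endpoint."""
    es = sorted(endpoints)
    gaps = [b - a for a, b in zip([0] + es, es + [N + 1])]
    return max(gaps) <= w

endpoints, q = set(), 0
while not covered(endpoints):
    x = random.randint(1, N)
    y = random.randint(x, N)
    endpoints |= {x, y}
    q += 1
print(q)  # far fewer than N queries: on the eps^-1 log eps^-1 scale
```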
SLIDE 32
Approximate Order Reconstruction
[Diagram: full order reconstruction after O(N log N) queries; ε-approximate reconstruction after O(ε^-1 log ε^-1) queries]
Note: some (weak) assumptions are swept under the rug.
SLIDE 33 Experiments
[Plot: ApproxOrder experimental results, R = 1000, compared to the theoretical ε-net bound ε^-1 log ε^-1. x-axis: number of queries (100 to 500); y-axis: error as a fraction of N (0.00 to 0.12); curves for N = 100, 1000, 10000, 100000]
SLIDE 34
A Relevant Question
Do we care about leaking the order of records? Yes, a lot.
Some history: Order-Preserving and Order-Revealing Encryption (OPE/ORE) schemes leak order by design.
→ Devastating leakage-abuse attacks [NKW15], [GSB+17]…
This led to “second-generation” schemes, whose whole point is to enable range queries without leaking order.
We just saw that access pattern leaks order… so if you leak access pattern, it's back to square one!
Also note: if the DB is dense, order reveals values directly.
SLIDE 35 Practical Experimental Results
Using an approximation of the DB data distribution, we can map order to actual values. (We use census data.)
Federal Aviation Administration ZIP code DB:
- 1st digit of ZIP code (revealing region) after ~10 queries (!)
- 2nd digit after ~100 queries.
California public employee salary DB:
- 2% error ($10000) after ~50 queries.
- 1% error ($5000) after ~100 queries.
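The order-to-value mapping can be sketched as quantile matching against an auxiliary distribution (the salary values below are made up for illustration; they are not the census, FAA, or California data from the talk):

```python
# Hypothetical auxiliary distribution of salaries (sorted).
aux_values = sorted([30_000] * 40 + [50_000] * 30 + [80_000] * 20
                    + [150_000] * 10)

def rank_to_value(rank, n_records):
    """Map the rank-th record (0-based, out of n_records) to the matching
    quantile of the auxiliary distribution."""
    idx = int(rank / max(n_records - 1, 1) * (len(aux_values) - 1))
    return aux_values[idx]

# Once order reconstruction gives each record's rank, estimate its value:
print(rank_to_value(0, 101), rank_to_value(50, 101), rank_to_value(100, 101))
```

The estimate is only as good as the auxiliary approximation of the true data distribution.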
SLIDE 36
Closing Remarks
SLIDE 37 On Range Queries
Severe attacks under minimal assumptions. Analysis clarifies setting.
- Size of DB, or number of possible values, don't matter.
- What is really leaked is order of records.
- Various auxiliary info can get you from order to values.
Please don't use OPE/ORE. Also avoid current encrypted DBs if you don't trust the server and care about privacy. New solutions needed. E.g. efficient specialized ORAMs.
SLIDE 38 Connection to Machine Learning
- In this talk: VC theory.
- In the article: known query setting = PAC learning.
- Some results for general query classes.
- Machine learning in crypto: also used for side-channel attacks. Same general setting!
There is a natural connection between reconstructing secret information from leakage and machine learning. It seems to be a powerful tool to understand the security implications of leakage. In side channels, use learning algorithms; here, use learning theory.