SLIDE 1
Learning to Reconstruct: Statistical Learning Theory and Encrypted Database Attacks. Paul Grubbs, Marie-Sarah Lacharité, Brice Minaud, Kenny Paterson. eprint 2019/011 and IEEE S&P 2019. C2 seminar, Rennes, 2019. Outsourcing Data. Data upload
SLIDE 2
SLIDE 3
Searchable Encryption
3
Client Adversarial Server Adversary: honest-but-curious host server. Security goal: confidentiality of data and queries. Very active topic in research and industry. [AKSX04], [BCLO09], [PKV+14], [BLR+15], [NKW15], [KKNO16],
[LW16], [FVY+17], [SDY+17], [DP17], [HLK18], [PVC18], [MPC+18]… Data upload Data access
SLIDE 4
Security Model
4
Generic solutions (FHE) are infeasible at scale → for efficiency reasons, some leakage is allowed. Client Adversarial Server
Data upload Data access
Security model: parametrized by a leakage function L. Server learns nothing except for the output of the leakage function. Server learns L(query, DB)
SLIDE 5
Security Model
5
[Figure: real world vs. ideal world. Real world: the client sends query q to the server; the adversary observes q. Ideal world: a simulator, given only L(q), produces the adversary's view.]
SLIDE 6
Keyword Search
6
Symmetric Searchable Encryption (SSE) = keyword search:
- Data = collection of documents. e.g. messages.
- Search query = find documents containing given keyword(s).
Efficient solutions for leakage = search pattern + access pattern.
Some active topics:
- Forward and backward privacy [B16][BMO17][CPPJ18][SYL+18]...
- Locality [CT14][ANSS16][DPP18]...
SLIDE 7
Beyond Keyword Search
7
Data upload Search query Matching records
Client Server For an encrypted database management system:
- Data = collection of records. e.g. health records.
- Basic query examples:
- find records with given value. e.g. patients aged 57.
- find records within a given range. e.g. patients aged 55-65.
SLIDE 8
Range queries
8
In this talk: range queries.
- Fundamental for any encrypted DB system.
- Many constructions out there.
- Simplest type of query that can't “just” be handled by an index.
Initial solutions: Order-Preserving / Order-Revealing Encryption (OPE/ORE). Leakage-abuse attacks: order information can be used to infer (approximate) values. Leaking order is too revealing. → "Second-generation" schemes enable range queries without relying on OPE/ORE, but still leak access pattern.
SLIDE 9
Range Queries
9
Range = [40,100]
[Figure: client sends the encrypted range query to the server. DB holds records (45, id 1), (6, id 2), (83, id 3), (28, id 4); the response contains records 1 and 3, with values 45 and 83.]
What can the server learn from the above leakage?
SLIDE 10
Database Reconstruction
10
Let N = number of possible values for the target attribute. Strongest goal: full database reconstruction = recovering the exact value of every record. More general: approximate database reconstruction = recovering all values within εN.
ε = 0.05 is recovery within 5%. ε = 1/N is full recovery.
[KKNO16]: full reconstruction in O(N^4 log N) queries, assuming i.i.d. uniform queries! ("Sacrificial" recovery: values very close to 1 and N are excluded.)
SLIDE 11
Database Reconstruction
11
[KKNO16]: full reconstruction in O(N^4 log N) queries! This talk ([GLMP19], [LMP18]):
- Approx. reconstruction: O(ε^-4 log ε^-1) queries (full rec.: O(N^4 log N)); lower bound Ω(ε^-4).
- With a very mild hypothesis: O(ε^-2 log ε^-1) (full rec.: O(N^2 log N)); lower bound Ω(ε^-2).
- Approx. order reconstruction: O(ε^-1 log ε^-1) (full rec.: O(N log N)); lower bound Ω(ε^-1 log ε^-1).
Full reconstruction in O(N log N) for dense DBs. Scale-free: does not depend on size of DB or number of possible values. → Recovering all values in DB within 5% costs O(1) queries!
SLIDE 12
Database Reconstruction
12
[KKNO16]: full reconstruction in O(N^4 log N) queries! This talk ([GLMP19], subsuming [LMP18]):
- Approx. reconstruction: O(ε^-4 log ε^-1) queries (full rec.: O(N^4 log N)); lower bound Ω(ε^-4).
- With a very mild hypothesis: O(ε^-2 log ε^-1) (full rec.: O(N^2 log N)); lower bound Ω(ε^-2).
- Approx. order reconstruction: O(ε^-1 log ε^-1) (full rec.: O(N log N)); lower bound Ω(ε^-1 log ε^-1).
Main tool:
- connection with statistical learning theory;
- especially, VC theory.
SLIDE 13
VC Theory
SLIDE 14
VC Theory
14
Foundational paper: Vapnik and Chervonenkis, 1971. Uniform convergence result. Now a foundation of learning theory, especially PAC (probably approximately correct) learning. Wide applicability. Fairly easy to state/use.
(You don't have to read the original article in Russian.)
SLIDE 15
Warm-up
15
Set X with probability distribution D. Let C ⊆ X; call it a concept. Sample complexity: to measure Pr(C) within ε, you need O(1/ε^2) samples.
Pr(C) ≈ (#points in C) / (#points total)
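The O(1/ε^2) sample complexity can be sanity-checked numerically; a minimal sketch (the concept [0.3, 0.6] and the sample sizes are illustrative, not from the talk):

```python
import random

def estimate_prob(concept, n_samples, rng):
    """Monte Carlo estimate of Pr(C): fraction of samples landing in C."""
    hits = sum(1 for _ in range(n_samples) if concept(rng.random()))
    return hits / n_samples

rng = random.Random(0)
# Concept C = the range [0.3, 0.6]; under the uniform distribution on [0, 1],
# Pr(C) = 0.3 exactly.
in_C = lambda x: 0.3 <= x <= 0.6
estimates = {n: estimate_prob(in_C, n, rng) for n in (100, 10_000)}
for n, est in estimates.items():
    print(n, round(abs(est - 0.3), 4))
```

The error shrinks roughly like 1/sqrt(n), i.e. n = O(1/ε^2) samples suffice for accuracy ε.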
SLIDE 16
Approximating a Concept Set
16
Now: a set 𝓓 of concepts. Goal: approximate their probabilities simultaneously. The set of samples drawn from X is an ε-sample iff for all C in 𝓓:
|Pr(C) − (#points in C) / (#points total)| ≤ ε
SLIDE 17
ε-sample Theorem
17
Union bound: yields a sample complexity that depends on |𝓓|. How many samples do we need to get an ε-sample whp? V & C 1971: if 𝓓 has VC dimension d, then the number of points needed to get an ε-sample whp is
O((d/ε^2) log(d/ε)).
Does not depend on |𝓓|!
SLIDE 18
VC Dimension
18
Remaining question: what is the VC dimension? A set of points S is shattered by 𝓓 iff every subset of S is equal to C∩S for some C in 𝓓.
- Example. Take 2 points in X=[0,1]. Concepts 𝓓 = all ranges.
Subsets:
- OK. Range A.
A
- OK. Range B.
B
- OK. Range C.
C
- OK. Range D.
D 2 points = SHATTERED
SLIDE 19
VC Dimension
19
- Example. Take 3 points in X=[0,1]. Concepts 𝓓 = all ranges.
Problem: the subset containing only the two outer points is not C∩S for any range C. 3 points = NOT SHATTERED. VC dimension of 𝓓 = largest cardinality of a set of points in X that is shattered by 𝓓. E.g. the VC dimension of ranges is 2. What typically matters is just that the VC dimension is finite.
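The shattering examples on these two slides can be checked by brute force; a small illustrative sketch (the function name and sample points are mine, not from the talk):

```python
from itertools import combinations

def ranges_shatter(points):
    """Check whether the class of all ranges [a, b] shatters `points`:
    every subset must be obtainable as (some range) ∩ points."""
    points = sorted(points)
    n = len(points)
    for r in range(n + 1):
        for subset in combinations(points, r):
            # A range realizes `subset` iff it is a contiguous block of the
            # sorted points (the empty block included).
            realized = any(set(subset) == set(points[i:j])
                           for i in range(n + 1)
                           for j in range(i, n + 1))
            if not realized:
                return False
    return True

print(ranges_shatter([0.2, 0.7]))        # 2 points: shattered
print(ranges_shatter([0.1, 0.5, 0.9]))   # 3 points: {0.1, 0.9} is unrealizable
```

Any 2 points are shattered, but for 3 points the "two outer points" subset fails, so the VC dimension of ranges is 2.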
SLIDE 20
Database Reconstruction
SLIDE 21
KKNO16-like Attack
21
[Figure: values 1..N; middle values are more probable to be hit than values near 1 or N.] Assume a uniform distribution on range queries. Idea: for each record...
- 1. Count frequency at which the record is hit.
→ gives estimate of probability it’s hit by uniform query.
- 2. Deduce an estimate of its value by "inverting" f.
The uniform query distribution induces a function f: value ↦ probability that a record with that value is hit.
SLIDE 22
KKNO16-like Attack
22
Step 1: for all records, estimate the probability of the record being hit. This is an ε-sample! Here X = the set of range queries and 𝓓 = {{ranges ∋ x} : x ∈ [1,N]}, so we need O(ε^-2 log ε^-1) queries. Step 2: because f is quadratic, "inverting" f adds a square.
After O(ε^-4 log ε^-1) queries, the value of all records is recovered within εN.
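A minimal simulation of this KKNO16-style attack under the slides' assumptions (i.i.d. uniform range queries). N, the secret record values, and the query count are illustrative; note values are only recoverable up to the reflection v ↔ N+1−v, hence the min(v, N+1−v) comparison:

```python
import math
import random

N = 100
all_ranges = [(a, b) for a in range(1, N + 1) for b in range(a, N + 1)]

def hit_prob(v):
    # f(v): probability that a uniform range [a, b] contains value v.
    # a has v choices, b has (N - v + 1) choices, out of N(N+1)/2 ranges.
    return v * (N - v + 1) / len(all_ranges)

def invert_hit_prob(p):
    # "Invert" f: solve v * (N + 1 - v) = p * |ranges| for v, keeping the
    # smaller root (values are recovered only up to reflection).
    s = p * len(all_ranges)
    disc = max((N + 1) ** 2 - 4 * s, 0.0)
    return ((N + 1) - math.sqrt(disc)) / 2

rng = random.Random(1)
records = [13, 42, 77]                 # secret values (illustrative)
counts = [0] * len(records)
n_queries = 200_000
for _ in range(n_queries):
    a, b = all_ranges[rng.randrange(len(all_ranges))]
    for i, v in enumerate(records):
        counts[i] += a <= v <= b      # observed access-pattern leakage
recovered = [invert_hit_prob(c / n_queries) for c in counts]
for v, est in zip(records, recovered):
    print(min(v, N + 1 - v), round(est, 1))
```

Step 1 (counting hits) gives an ε-sample of f at each record; step 2 inverts the quadratic f, which is what inflates the query cost from ε^-2 to ε^-4.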
SLIDE 23
On the i.i.d. Assumption
23
We are assuming uniformly distributed queries. In reality we are assuming:
- The adversary knows the query distribution.
- Queries are uniform.
- More fundamentally, queries are independent and identically distributed (i.i.d.).
This is not realistic. What can we learn without that hypothesis?
SLIDE 24
Order Reconstruction
SLIDE 25
Problem Statement
25
Range = [40,100]
[Figure: client sends the encrypted range query to the server. DB holds records (45, id 1), (6, id 2), (83, id 3), (28, id 4); the response contains records 1 and 3, with values 45 and 83.]
This time we don't assume i.i.d. queries, or knowledge of their distribution. What can the server learn from the above leakage?
SLIDE 26
Range Query Leakage
26
Query A matches records a, b, c. Query B matches records b, c, d.
→ we learn that records b, c are between a and d. We learn something about the order of records. And this is the only configuration (up to symmetry)! [Figure: line 1..N; query A covers records a, b, c; query B covers b, c, d.]
SLIDE 27
Range Query Leakage
27
Query A matches records a, b, c. Query B matches records b, c, d. Query C matches records c, d.
Then the only possible order is a, b, c, d (or d, c, b, a)! [Figure: line 1..N; query A covers a, b, c; query B covers b, c, d; query C covers c, d.] Challenges:
- How do we extract order information? (What algorithm?)
- How do we quantify and analyze how fast order is learned as more queries are observed?
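The reasoning on these two slides can be replayed by brute force over permutations (an illustrative sketch; the PQ trees of the next slide represent the same set of orders compactly):

```python
from itertools import permutations

def compatible_orders(record_ids, leakage):
    """Orders of the records in which every observed response set is a
    contiguous block, as any range-query response must be."""
    good = []
    for perm in permutations(record_ids):
        pos = {r: i for i, r in enumerate(perm)}
        if all(max(pos[r] for r in q) - min(pos[r] for r in q) + 1 == len(q)
               for q in leakage):
            good.append(''.join(perm))
    return good

leakage = [{'a', 'b', 'c'}, {'b', 'c', 'd'}]
print(compatible_orders('abcd', leakage))   # b and c must sit between a and d
leakage.append({'c', 'd'})
print(compatible_orders('abcd', leakage))   # only 'abcd' and 'dcba' remain
```

After queries A and B, four orders survive (b, c in either order between a and d, up to reflection); adding query C pins the order down to a, b, c, d up to reflection, exactly as the slide states.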
SLIDE 28
Challenge 1: the Algorithm
28
Short answer: there is already an algorithm! Long answer: PQ trees. X: linearly ordered set, with unknown order. You are given a set S containing some intervals of X. A PQ tree is a compact (linear in |X|) representation of the set of all permutations of X that are compatible with S. It can be updated in linear time.
Note: PQ trees were used in [DR13], which didn't target reconstruction.
SLIDE 29
PQ Trees
29
P a b c Order is completely unknown.
- any permutation of abc.
a b c Q Order is completely known (up to reflection).
- 'abc' or 'cba'.
P d e a b c Q Combines in the natural way.
- ‘abcde’, ‘abced’, ‘dabce’, ‘eabcd’,
‘deabc’, ‘edabc’, ‘cbade’ etc.
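A toy enumeration of the orders a PQ tree represents (a sketch of the semantics only; real PQ trees also support the linear-time updates mentioned above, which this does not):

```python
from itertools import permutations

# Minimal PQ-tree semantics: a leaf is a string; ('P', children) allows any
# order of its children; ('Q', children) allows the given order or its reversal.
def orders(node):
    if isinstance(node, str):
        return [node]
    kind, children = node
    child_orders = [orders(c) for c in children]
    results = set()
    if kind == 'P':
        arrangements = permutations(range(len(children)))
    else:  # 'Q'
        arrangements = [tuple(range(len(children))),
                        tuple(reversed(range(len(children))))]
    for arr in arrangements:
        # Concatenate every combination of one ordering per child.
        def expand(i):
            if i == len(arr):
                return ['']
            return [c + rest for c in child_orders[arr[i]] for rest in expand(i + 1)]
        results.update(expand(0))
    return sorted(results)

# The tree from the slide: P node over (Q node [a, b, c]), d, e.
tree = ('P', [('Q', ['a', 'b', 'c']), 'd', 'e'])
print(len(orders(tree)))   # 3! arrangements x 2 orientations of the Q node = 12
```

The 12 orders include 'abcde', 'dabce', 'edabc', 'cbade', etc., matching the slide's examples; anything splitting the abc block (like 'abdce') is excluded.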
SLIDE 30
Full Order Reconstruction
30
[Figure: a P node over r1, r2, r3, ... (no information) evolves, as enough queries are observed, into a single Q node over r1, r2, r3, ... (full reconstruction).]
We want to quantify order learning...
SLIDE 31
Challenge 2a: Quantify Order Learning
31
[Figure: P node (no information) vs. Q node (full reconstruction).] ε-Approximate order reconstruction. Roughly: we learn the order between two records as soon as their values are ≥ εN apart. (ε = 1/N is full reconstruction.)
SLIDE 32
Approximate Order Reconstruction
32
[Figure: from no information (P node), to ε-approximate reconstruction (Q node over groups of diameter ≤ εN), to full reconstruction (Q node over single records). How many queries for each step?]
SLIDE 33
Challenge 2b: Analyze Query Complexity
33
Intuition: if no query has an endpoint between a and b, then a and b can't be separated. → ε-approximate reconstruction is impossible. [Figure: an interval of length εN between records on the line 1..N.] You want a query endpoint to hit every interval of length ≥ εN. Conversely, with some other conditions, that is enough.
(Heavy sweeping of details under the rug.)
SLIDE 34
VC Theory Saves the Day (again)
34
ε-samples: the ratio of points hitting each concept is close to its probability. What we want now: if a concept has high enough probability, it is hit by at least one point. The set of samples drawn from X is an ε-net iff for all C in 𝓓:
Pr(C) ≥ ε ⇒ C contains a sample.
→ Number of points to get an ε-net whp: O((d/ε) log(d/ε)).
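A quick simulation of the ε-net claim for ranges (illustrative parameters; query endpoints play the role of the sample points):

```python
import math
import random

rng = random.Random(2)
N, eps = 1000, 0.05
# After about (1/eps) log(1/eps) uniform queries (times a small constant),
# whp every interval of length >= eps*N contains some query endpoint:
# the endpoints form an eps-net for the class of ranges (VC dimension 2).
n_queries = 4 * int((1 / eps) * math.log(1 / eps))
endpoints = set()
for _ in range(n_queries):
    endpoints.update((rng.randint(1, N), rng.randint(1, N)))
width = int(eps * N)
uncovered = [s for s in range(1, N - width + 2)
             if not any(s <= e < s + width for e in endpoints)]
print(len(uncovered))   # expect 0: every length-(eps*N) window is hit
```

Once every interval of length εN contains an endpoint, no two records εN apart can remain unseparated, which is exactly the condition behind ε-approximate order reconstruction.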
SLIDE 35
Approximate Order Reconstruction
35
[Figure: no information (P node) → ε-approximate reconstruction after O(ε^-1 log ε^-1) queries → full reconstruction after O(N log N) queries.]
Note: some (weak) assumptions are swept under the rug.
SLIDE 36
Experiments
36
[Plot: ApproxOrder experimental results with R = 1000 records, compared to the theoretical ε-net bound ε^-1 log ε^-1. X-axis: number of queries (100 to 500); Y-axis: max. bucket diameter as a fraction of N (0 to 0.12); one curve per N = 100, 1000, 10000, 100000.]
SLIDE 37
Closing Remarks
SLIDE 38
On Range Queries
38
Severe attacks under minimal assumptions. Analysis clarifies setting.
- Size of DB, or number of possible values, don't matter.
- What is really leaked is order of records.
- Various auxiliary info can get you from order to values.
Please don't use OPE/ORE. Also avoid current encrypted DBs if you don't trust the server and care about privacy. New solutions needed. E.g. efficient specialized ORAMs.
SLIDE 39
Connection to Machine Learning
39
- In this talk: VC theory.
- In the article: known query setting = PAC learning.
- Some results for general query classes.
- Machine learning in crypto: also used for side-channel attacks. Same general setting!