SLIDE 1
Searchable Encryption, Leakage-Abuse Attacks, and Statistical Learning Theory. Paul Grubbs, Marie-Sarah Lacharité, Brice Minaud, Kenny Paterson. eprint 2019/011 and IEEE S&P 2019. (Also eprint 2018/965, CCS 2018.) AriC crypto seminar, ENS
SLIDE 2
SLIDE 3
Searchable Encryption
3
Client Adversarial Server Adversary: honest-but-curious host server. Security goal: confidentiality of data and queries. Very active topic in research and industry. [AKSX04], [BCLO09], [PKV+14], [BLR+15], [NKW15], [KKNO16],
[LW16], [FVY+17], [SDY+17], [DP17], [HLK18], [PVC18], [MPC+18]… Data upload Data access
SLIDE 4
Security Model
4
Generic solutions (FHE) are infeasible at scale → for efficiency reasons, some leakage is allowed. Client Adversarial Server
Data upload Data access
Security model: parametrized by a leakage function L. Server learns nothing except for the output of the leakage function. Server learns L(query, DB)
SLIDE 5
Security Model
5
Client Server Query q q
Adversary
Real world Ideal world L Simulator L(q,DB) q
Adversary
SLIDE 6
Keyword Search
6
Symmetric Searchable Encryption (SSE) = keyword search:
- Data = collection of documents. e.g. messages.
- Search query = find documents containing given keyword(s).
Efficient solutions for leakage = search pattern + access pattern.
Some active topics:
- Forward and backward privacy [B16][BMO17][CPPJ18][SYL+18]...
- Locality [CT14][ANSS16][DPP18]...
SLIDE 7
Beyond Keyword Search
7
Data upload Search query Matching records
Client Server For an encrypted database management system:
- Data = collection of records. e.g. health records.
- Basic query examples:
- find records with given value. e.g. patients aged 57.
- find records within a given range. e.g. patients aged 55-65.
SLIDE 8
Range Queries
8
In this talk: range queries.
- Fundamental for any encrypted DB system.
- Many constructions out there.
- Simplest type of query that can't “just” be handled by an index.
Initial solutions: Order-Preserving, Order-Revealing Encryption.
- Plaintexts are ordered, ciphertexts are ordered.
- The encryption map preserves order.
SLIDE 9
[Figure: CDF of ages; percentage of records below each age (0%–100%) vs. age (15, 30, 60, 90).]
Attacks Exploiting ORE
9
- “Sorting” attack: if every possible value appears in the DB...
Just sort the ciphertexts and you learn their value!
- “CDF-matching” attack: say the attacker has an approximation
of the Cumulative Distribution Function of DB values...
[Figure: ciphertexts 3 11 5 1 8 7 10 6 2 4 9 sorted into values 1–11.]
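Both attacks fit in a few lines; the following is a toy sketch (assumptions mine: for the sorting attack, the plaintexts are exactly the values 1..11, each appearing once; for CDF matching, the attacker's CDF estimate is exact):

```python
import math

# Sorting attack: order-preserving ciphertexts over a dense DB.
ciphertexts = [0x6a, 0xf1, 0x80, 0x05, 0xb3, 0xa2, 0xe4, 0x91, 0x21, 0x77, 0xc8]
recovered = {c: rank + 1 for rank, c in enumerate(sorted(ciphertexts))}
# rank 0 maps to value 1, rank 1 to value 2, ...

# CDF-matching attack: map each ciphertext's empirical quantile through
# the inverse CDF of the (approximately known) plaintext distribution.
def cdf_match(cts, inv_cdf):
    ranks = {c: r for r, c in enumerate(sorted(cts))}
    n = len(cts)
    return {c: inv_cdf((ranks[c] + 0.5) / n) for c in cts}

# With a uniform distribution on {1..11}, the inverse CDF is ceil(11q).
estimates = cdf_match(ciphertexts, lambda q: math.ceil(11 * q))
```

On a dense, uniform DB the two attacks agree; CDF matching additionally handles skewed distributions when the attacker knows them approximately.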
SLIDE 10
Leakage-Abuse Attacks
10
ORE: order information can be used to infer (approximate) values. Leaking order is too revealing.
→ “Second-generation” schemes enable range queries without relying on OPE/ORE.
“Leakage-abuse attacks” (coined by Cash et al. CCS'15):
- Do not contradict security proofs.
- Can be devastating in practice.
SLIDE 11
Range Queries
11
Range = [40,100]
Client Server
45 1 83 3 45 1 6 2 83 3 28 4
SE schemes supporting range queries are proven secure w.r.t. a leakage function including access pattern leakage. What can the server learn from the above leakage?
SLIDE 12
Database Reconstruction
12
Let N = number of possible values for the target attribute. Strongest goal: full database reconstruction = recovering the exact value of every record. More general: approximate database reconstruction = recovering all values within εN.
ε = 0.05 is recovery within 5%. ε = 1/N is full recovery.
[KKNO16]: full reconstruction in O(N⁴ log N) queries, assuming i.i.d. uniform queries! (“Sacrificial” recovery: values very close to 1 and N are excluded.)
SLIDE 13
Database Reconstruction
13
[KKNO16]: full reconstruction in O(N⁴ log N) queries! This talk ([GLMP19], [LMP18]):
- Approx. reconstruction: O(ε⁻⁴ log ε⁻¹) queries; implies full rec. in O(N⁴ log N); lower bound Ω(ε⁻⁴).
- With a very mild hypothesis: O(ε⁻² log ε⁻¹) queries; implies full rec. in O(N² log N); lower bound Ω(ε⁻²).
- Approx. order reconstruction: O(ε⁻¹ log ε⁻¹) queries; implies full rec. in O(N log N); lower bound Ω(ε⁻¹ log ε⁻¹).
Full reconstruction in O(N log N) for dense DBs. Scale-free: does not depend on the size of the DB or the number of possible values. → Recovering all values in the DB within 5% costs O(1) queries!
SLIDE 14
Database Reconstruction
14
[KKNO16]: full reconstruction in O(N⁴ log N) queries! This talk ([GLMP19], subsuming [LMP18]):
- Approx. reconstruction: O(ε⁻⁴ log ε⁻¹) queries; implies full rec. in O(N⁴ log N); lower bound Ω(ε⁻⁴).
- With a very mild hypothesis: O(ε⁻² log ε⁻¹) queries; implies full rec. in O(N² log N); lower bound Ω(ε⁻²).
- Approx. order reconstruction: O(ε⁻¹ log ε⁻¹) queries; implies full rec. in O(N log N); lower bound Ω(ε⁻¹ log ε⁻¹).
Main tool:
- connection with statistical learning theory;
- especially, VC theory.
SLIDE 15
VC Theory
SLIDE 16
VC Theory
16
Foundational paper: Vapnik and Chervonenkis, 1971. Uniform convergence result. Now a foundation of learning theory, especially PAC (probably approximately correct) learning. Wide applicability. Fairly easy to state/use.
(You don't have to read the original article in Russian.)
SLIDE 17
Warm-up
17
Set X with probability distribution D. Let C ⊆ X; call it a concept. Sample complexity: to estimate Pr(C) within ε, you need O(1/ε²) samples.
Pr(C) ≈ (#points in C) / (#points total)
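The sample-complexity claim can be checked empirically; a toy sketch (assumptions mine: C is the interval [0.2, 0.5) and D is uniform on [0, 1)):

```python
import random

# Estimate Pr(C) for a single concept C from O(1/eps^2) i.i.d. samples.
eps = 0.01
n = int(1 / eps ** 2)          # 10_000 samples
rng = random.Random(42)
hits = sum(0.2 <= rng.random() < 0.5 for _ in range(n))
estimate = hits / n            # close to Pr(C) = 0.3
```

By Hoeffding's inequality, O(1/ε²) samples put the estimate within ε of Pr(C) with high probability.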
SLIDE 18
Approximating a Concept Set
18
Now: a set 𝓓 of concepts. Goal: approximate their probabilities simultaneously. The set of samples drawn from X is an ε-sample iff for all C in 𝓓:
|Pr(C) − (#points in C) / (#points total)| ≤ ε
SLIDE 19
ε-sample Theorem
19
Union bound: yields a sample complexity that depends on |𝓓|. How many samples do we need to get an ε-sample whp? V & C 1971: If 𝓓 has VC dimension d, then the number of points needed to get an ε-sample whp is
O((d/ε²) log(d/ε)).
Does not depend on |𝓓|!
SLIDE 20
VC Dimension
20
Remaining Q: what is the VC dimension? A set of points S is shattered by 𝓓 iff every subset of S is equal to C ∩ S for some C in 𝓓.
- Example. Take 2 points in X=[0,1]. Concepts 𝓓 = all ranges.
Subsets: ∅, {1}, {2}, {1,2}; each is C ∩ S for some range (A, B, C, D respectively).
→ 2 points = SHATTERED
SLIDE 21
VC Dimension
21
- Example. Take 3 points in X = [0,1]. Concepts 𝓓 = all ranges.
Problem: the subset {first point, third point} is not C ∩ S for any range C. → 3 points = NOT SHATTERED. VC dimension of 𝓓 = largest cardinality of a set of points in X that is shattered by 𝓓. E.g. the VC dimension of ranges is 2. What typically matters is just that the VC dimension is finite.
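The shattering definition can be checked mechanically for ranges on the line; a small sketch (the helper name and encoding are mine):

```python
from itertools import combinations

# S is shattered by the ranges iff every subset of S equals C ∩ S for some
# range C. For ranges on the line, C ∩ S is always a contiguous run of the
# sorted points, so it suffices to enumerate those runs.
def shattered_by_ranges(points):
    pts = sorted(points)
    realizable = {frozenset()} | {frozenset([p]) for p in pts}
    for lo, hi in combinations(range(len(pts)), 2):
        realizable.add(frozenset(pts[lo:hi + 1]))
    subsets = {frozenset(c) for k in range(len(pts) + 1)
               for c in combinations(pts, k)}
    return subsets <= realizable
```

For example, `shattered_by_ranges([0.1, 0.7])` holds but `shattered_by_ranges([0.1, 0.4, 0.7])` does not, matching VC dimension 2 for ranges.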
SLIDE 22
Database Reconstruction
SLIDE 23
KKNO16-like Attack
23
Assume a uniform distribution on range queries. This induces a function f: the probability that a given value in [1, N] is hit (values near the middle are more probable). Idea: for each record...
- 1. Count the frequency at which the record is hit.
→ gives an estimate of the probability it's hit by a uniform query.
- 2. Deduce an estimate of its value by “inverting” f.
SLIDE 24
KKNO16-like Attack
24
Step 1: for all records, estimate the probability of the record being hit. This is an ε-sample! X = ranges, 𝓓 = {{ranges ∋ x} : x ∈ [1,N]}, so we need O(ε⁻² log ε⁻¹) queries. Step 2: because f is quadratic, “inverting” f squares the error.
After O(ε⁻⁴ log ε⁻¹) queries, the value of all records is recovered within εN.
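The two-step attack can be simulated end to end; a toy sketch of the idea (my own sketch, not the paper's exact algorithm; names and parameters are mine), under the slide's assumption of i.i.d. uniform range queries:

```python
import random

# Setup: values in [1, N]; queries are ranges [a, b] drawn i.i.d.
# uniformly over all N(N+1)/2 ranges.
N = 100
records = [13, 42, 87]            # secret values to be reconstructed
Q = 200_000                       # number of observed queries
rng = random.Random(1)
all_ranges = [(a, b) for a in range(1, N + 1) for b in range(a, N + 1)]

def invert_f(p):
    # f(v) = 2 v (N - v + 1) / (N (N + 1)) is the probability that a
    # uniform range covers value v. Solving p = f(v) is a quadratic whose
    # two roots v and N + 1 - v are indistinguishable; return the smaller.
    disc = (N + 1) ** 2 - 2 * p * N * (N + 1)
    return ((N + 1) - max(disc, 0.0) ** 0.5) / 2

# Step 1: count how often each record is hit.
hits = [0] * len(records)
for _ in range(Q):
    a, b = rng.choice(all_ranges)
    for i, v in enumerate(records):
        if a <= v <= b:
            hits[i] += 1

# Step 2: invert f on the empirical hit probabilities.
estimates = [invert_f(h / Q) for h in hits]
```

Each estimate lands near the true value or its reflection N + 1 − v, illustrating the symmetry that makes the recovery “sacrificial”.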
SLIDE 25
On the i.i.d. Assumption
25
We are assuming uniformly distributed queries. In reality we are assuming:
- The adversary knows the query distribution.
- Queries are uniform.
- More fundamentally, queries are independent and identically distributed (i.i.d.).
This is not realistic. What can we learn without that hypothesis?
SLIDE 26
Order Reconstruction
SLIDE 27
Problem Statement
27
Range = [40,100]
Client Server
45 1 83 3 45 1 6 2 83 3 28 4
This time we don't assume i.i.d. queries, or knowledge of their distribution. What can the server learn from the above leakage?
SLIDE 28
Range Query Leakage
28
Query A matches records a, b, c. Query B matches records b, c, d.
→ we learn that records b, c are between a and d: this is the only configuration (up to symmetry)! We learn something about the order of records.
SLIDE 29
Range Query Leakage
29
Query A matches records a, b, c. Query B matches records b, c, d. Query C matches records c, d.
Then the only possible order is a, b, c, d (or d, c, b, a)! Challenges:
- How do we extract order information? (What algorithm?)
- How do we quantify and analyze how fast order is
learned as more queries are observed?
SLIDE 30
Challenge 1: the Algorithm
30
Short answer: there is already an algorithm! Long answer: PQ trees.
X: linearly ordered set, order unknown. You are given a set S containing some intervals of X. A PQ tree is a compact (linear in |X|) representation of the set of all permutations of X that are compatible with S. It can be updated in linear time.
Note: PQ trees were used in [DR13], which didn't target reconstruction.
SLIDE 31
PQ Trees
31
P a b c Order is completely unknown.
- any permutation of abc.
a b c Q Order is completely known (up to reflection).
- ‘abc’ or ‘cba’.
P d e a b c Q Combines in the natural way.
- ‘abcde’, ‘abced’, ‘dabce’, ‘eabcd’,
‘deabc’, ‘edabc’, ‘cbade’ etc.
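The set of compatible permutations can be enumerated for small trees; a minimal sketch (the node encoding is mine, and this is only frontier enumeration, not a real PQ-tree implementation with updates):

```python
from itertools import permutations

# A leaf is a string; an inner node is ('P', children) or ('Q', children).
# P-node children may appear in any order; Q-node children only in the
# given order or its reverse.
def frontiers(node):
    if isinstance(node, str):
        return {node}
    kind, children = node
    opts = [frontiers(c) for c in children]
    idx = range(len(children))
    orders = (list(permutations(idx)) if kind == 'P'
              else [tuple(idx), tuple(reversed(idx))])
    results = set()
    for order in orders:
        partial = ['']
        for i in order:                       # concatenate child frontiers
            partial = [p + f for p in partial for f in opts[i]]
        results.update(partial)
    return results

# The slide's example: a P node with children d, e and a Q node over a, b, c.
tree = ('P', ['d', 'e', ('Q', ['a', 'b', 'c'])])
perms = frontiers(tree)
```

For this tree there are 3! arrangements of the three children times 2 orientations of the Q node, i.e. 12 permutations, including ‘abcde’, ‘deabc’, and ‘cbade’.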
SLIDE 32
Full Order Reconstruction
32
[Figure: PQ tree evolving from a single P node over r1, r2, r3, … (no information) to a single Q node (full reconstruction).]
Observe enough queries → full reconstruction.
We want to quantify order learning...
SLIDE 33
Challenge 2a: Quantify Order Learning
33
ε-approximate order reconstruction. Roughly: we learn the order between two records as soon as their values are ≥ εN apart. (ε = 1/N is full reconstruction.)
SLIDE 34
Approximate Order Reconstruction
34
ε-approximate reconstruction: every remaining unordered bucket has diameter ≤ εN. How many queries for full reconstruction? For ε-approximate reconstruction?
SLIDE 35
Challenge 2b: Analyze Query Complexity
35
Intuition: if no query has an endpoint between a and b, then a and b can't be separated. → ε-approximate reconstruction is impossible. You want a query endpoint to hit every interval of length ≥ εN. Conversely, with some other conditions, that is enough.
Heavy sweeping of details under rug.
SLIDE 36
VC Theory Saves the Day (again)
36
ε-samples: the ratio of points hitting each concept is close to its probability. What we want now: if a concept has high enough probability, it is hit by at least one point.
The set of samples drawn from X is an ε-net iff for all C in 𝓓:
Pr(C) ≥ ε ⇒ C contains a sample.
➞ Number of points to get an ε-net whp:
O((d/ε) log(d/ε))
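The ε-net guarantee for ranges (VC dimension 2) is easy to illustrate empirically; a toy check (parameters are mine):

```python
import random

# With enough uniform samples from [0,1), every interval of probability
# >= eps contains at least one sample, i.e. the samples form an eps-net
# for ranges.
eps = 0.05
n = 500                        # comfortably above (d/eps) log(d/eps) for d = 2
rng = random.Random(0)
pts = sorted(rng.random() for _ in range(n))

# If the largest gap between consecutive samples (including the borders)
# is < eps, every interval of length >= eps is hit.
max_gap = max(b - a for a, b in zip([0.0] + pts, pts + [1.0]))
```

Note the contrast with ε-samples: an ε-net only guarantees each heavy concept is hit, which is why its sample complexity scales as 1/ε rather than 1/ε².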
SLIDE 37
Approximate Order Reconstruction
37
Full reconstruction: O(N log N) queries. ε-approximate reconstruction: O(ε⁻¹ log ε⁻¹) queries.
Note: some (weak) assumptions are swept under the rug.
Conclusion: learn order very quickly. Almost back to ORE...
SLIDE 38
Experiments
38
[Figure: ApproxOrder experimental results, R = 1000 records, compared to the theoretical ε⁻¹ log ε⁻¹ ε-net bound. Max bucket diameter (as a fraction of N, 0–0.12) vs. number of queries (100–500), for N = 100, 1000, 10000, 100000.]
SLIDE 39
Volume Leakage
SLIDE 40
Problem Statement
40
Range = [40,100]
Client Server
45 1 83 3 45 1 6 2 83 3 28 4
What can the server learn from the above leakage? Attacker only sees volumes = number of records matching each query.
2 matches
SLIDE 41
Volumes
41
Value:  1  2  3  4
Count:  3  7  1  12
A volume = number of records matching some range.
Some volumes: 8, 13. The attacker wants to learn the exact counts.
SLIDE 42
Elementary Volumes
42
Value:  1  2  3  4
Count:  3  7  1  12
“Elementary” ranges: [1,1], [1,2], [1,3], [1,4]. Elementary volumes = their volumes: 3, 10, 11, 23.
SLIDE 43
Elementary Volumes
43
Value:  1  2  3  4
Count:  3  7  1  12
Fact:
- Knowing the set of elementary volumes ⇔ knowing the counts.
- Every volume is the difference of two elementary volumes:
vol([a,b]) = vol([1,b]) − vol([1,a−1]) (with vol([1,0]) = 0).
Our goal: finding the elementary volumes.
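The fact above amounts to prefix sums; a minimal sketch on the slide's example:

```python
from itertools import accumulate

# Counts of values 1..4 from the slide. Elementary volumes are the prefix
# sums of the counts, and any range volume is a difference of two of them.
counts = [3, 7, 1, 12]
elementary = list(accumulate(counts))   # vol([1,1]), ..., vol([1,4])

def vol(a, b):
    # vol([a,b]) = vol([1,b]) - vol([1,a-1]), with vol([1,0]) = 0.
    return elementary[b - 1] - (elementary[a - 2] if a > 1 else 0)
```

For instance, vol([2,3]) = 11 − 3 = 8, matching the counts 7 + 1.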
SLIDE 44
The Attack
44
Assumption: the volumes of all queries are observed.
7 12 23 1 13 3 11 8 10 20
Draw an edge between volumes a and b iff |b-a| is a volume.
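The attack on the slide's toy example can be sketched with a brute-force clique search (variable and helper names are mine):

```python
from itertools import combinations

# Observed leakage: the set of volumes of all N(N+1)/2 range queries.
volumes = {7, 12, 23, 1, 13, 3, 11, 8, 10, 20}
N = 4
total = max(volumes)                    # vol([1,N]) = size of the whole DB

# Elementary volumes form an N-clique containing the total volume, where
# volumes a and b are adjacent iff |b - a| is itself an observed volume.
candidates = [
    combo for combo in combinations(sorted(volumes), N)
    if total in combo
    and all(b - a in volumes for a, b in combinations(combo, 2))
]

def counts_from(elem):
    # Recover counts from a candidate elementary-volume clique by
    # successive differences of its (sorted) elements.
    return [elem[0]] + [b - a for a, b in zip(elem, elem[1:])]
```

The clique (3, 10, 11, 23) yields the true counts [3, 7, 1, 12]; its reflected counterpart (12, 13, 20, 23) yields the counts in reverse order, the usual reflection symmetry.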
SLIDE 45
Summary
45
Attack: elementary volumes form a clique in the volume graph → clique-finding algorithm reveals them. For structured queries, even just volume leakage can be quite damaging. Attack requires strong assumption. In the article:
- Pre-processing to avoid clique finding.
- Analysis of parameters + experiments.
- Other attacks.
SLIDE 46
Closing Remarks
SLIDE 47
On Range Queries
47
Access pattern: severe attacks under minimal assumptions. Please don't use OPE/ORE. Also avoid current encrypted DBs if you don't trust the server and care about privacy. New solutions needed. E.g. efficient specialized ORAMs. Even then, need to hide volumes. Many open problems...
SLIDE 48
Connection to Machine Learning
48
- In this talk: VC theory.
- In the article: known query setting = PAC learning.
- Some results for general query classes.
- Machine learning in crypto: also used for side-channel attacks. Same general setting!