Statistical Databases – Query Auditing
Li Xiong
CS573 Data Privacy and Anonymity
Partial slides credit: Vitaly Shmatikov, Univ Texas at Austin
Query Audit Problem
Maintaining privacy of data
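[Figure: the user sends queries Q1, Q2, …, Qn to an auditor placed in front of the database and receives answers A1, A2, …, An. For a new query Q, the auditor asks: does the answer to Q, combined with the answers to Q1,…,Qn, reveal something? It returns the answer A or “Denied”.]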
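[Figure: the same database, auditor, queries Q1, …, Qn and answers A1, …, An, with annotations:]
Database: list of real, integer, or Boolean values
Query: specifies a subset of the variables and a function: min, max, median, sum, average, …
User: wants to learn the value of some variable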
Offline auditing
Given a collection of queries and the answers to them, detects privacy breaches after the fact
Online auditing
Queries are presented to the auditor one at a time; prevents privacy breaches on the fly
Query auditing method for SUM queries
A SUM query can be considered as a linear equation
$a_1 x_1 + a_2 x_2 + \dots + a_n x_n = q$
where $a_i \in \{0, 1\}$ indicates whether record i belongs to the query set, $x_i$ is the sensitive value, and q is the query result
A set of SUM queries can be thought of as a system of linear equations
The auditor maintains a binary matrix representing the linearly independent queries and updates it when a new query is issued
A row with all 0s except for the ith column indicates disclosure of $x_i$
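The matrix-based check above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the exact procedure from the slides: the class name `SumAuditor` and method `allows` are hypothetical, and the sketch keeps the answered queries in reduced row echelon form with rational entries rather than a strictly binary matrix. A query is refused when adding it would produce a row with a single nonzero entry, since that row would pin down one sensitive value.

```python
from fractions import Fraction


class SumAuditor:
    """Illustrative online auditor for SUM queries over n records."""

    def __init__(self, n):
        self.n = n
        self.rows = []  # reduced row echelon basis of the answered queries

    def _extended_basis(self, query_set):
        """Return the basis that would result from also answering query_set."""
        vec = [Fraction(int(i in query_set)) for i in range(self.n)]
        # Eliminate the pivot column of every existing basis row.
        for row in self.rows:
            pivot = next(j for j, v in enumerate(row) if v != 0)
            factor = vec[pivot]
            if factor != 0:
                vec = [a - factor * b for a, b in zip(vec, row)]
        if all(v == 0 for v in vec):
            return self.rows  # linearly dependent: nothing new is learned
        pivot = next(j for j, v in enumerate(vec) if v != 0)
        lead = vec[pivot]
        vec = [v / lead for v in vec]
        # Clear the new pivot column from the old rows to stay in reduced form.
        cleared = [[a - row[pivot] * b for a, b in zip(row, vec)]
                   for row in self.rows]
        return cleared + [vec]

    def allows(self, query_set):
        """True (and record the query) if answering discloses no single x_i."""
        candidate = self._extended_basis(query_set)
        if any(sum(v != 0 for v in row) == 1 for row in candidate):
            return False  # some row is all 0s except one column: disclosure
        self.rows = candidate
        return True


auditor = SumAuditor(3)
print(auditor.allows({0, 1, 2}))  # sum of all three records
print(auditor.allows({0, 1}))     # would reveal the third record by subtraction
```

Note that the decision in this sketch depends only on which records each query covers, never on the stored values.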
Real: multiple solutions, secure
Boolean: unique solution, insecure (why?)
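One way to see the difference is to count solutions directly. The example system behind this slide was not preserved, so the two SUM queries below (x1+x2=1 and x2+x3=2) are a hypothetical stand-in chosen for illustration. Over the reals the system has many solutions, but restricting the values to {0, 1} leaves exactly one.

```python
from itertools import product

# Hypothetical query log: (indices in the query set, answer).
queries = [((0, 1), 1), ((1, 2), 2)]  # x1 + x2 = 1, x2 + x3 = 2


def consistent(x):
    """True if assignment x reproduces every recorded SUM answer."""
    return all(sum(x[i] for i in idx) == ans for idx, ans in queries)


# Boolean data: enumerate all 2^3 assignments and keep the consistent ones.
boolean_solutions = [x for x in product((0, 1), repeat=3) if consistent(x)]

# Real data: two different assignments that both satisfy the same answers.
real_solutions = [(0.0, 1.0, 1.0), (0.5, 0.5, 1.5)]

print(boolean_solutions)                        # a single surviving assignment
print(all(consistent(x) for x in real_solutions))
```

With Boolean data, x2 + x3 = 2 forces x2 = x3 = 1, and then x1 + x2 = 1 forces x1 = 0, so the attacker recovers every value even though the linear system is underdetermined over the reals.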
Partial disclosure – the disclosed range of the protected …
Sum queries
Interval-based disclosure: monitoring upper and lower …
Auditing interval-based inference. Li et al. 2002
An efficient online auditing approach to limit private data disclosure, Lu, 2009
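As a rough illustration of interval-based inference, consider a single SUM query over values known to lie in a public range. This is a simplified sketch and not the algorithm from either paper cited above; the function name `implied_interval` and the public range [lo, hi] are assumptions made for the example. Each value in the query set is bounded by what remains after the other values take their extreme values.

```python
def implied_interval(total, k, lo, hi):
    """Bounds on any single value, given that k values in [lo, hi] sum to total."""
    lower = max(lo, total - (k - 1) * hi)  # the others are as large as possible
    upper = min(hi, total - (k - 1) * lo)  # the others are as small as possible
    return lower, upper


# Three values in [0, 100] summing to 290: each must lie in [90, 100].
print(implied_interval(290, 3, 0, 100))
# Three values in [0, 100] summing to 150: nothing is learned beyond [0, 100].
print(implied_interval(150, 3, 0, 100))
```

An interval-based auditor could compare the width of such an interval against a tolerance and treat a too-narrow interval as a partial disclosure.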
[Figure: with previous queries Q1, …, Qi on record, the user submits Qi+1 to the auditor in front of the database and receives the answer A or “Denied”.]
“Denied” if answering Qi+1 would cause a privacy breach
Query set size control
Query set overlap control
Limited data utility
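The two restriction-based controls can be sketched as simple checks on the query set. This is a minimal sketch; the class name `RestrictionAuditor` and the parameters `min_size` and `max_overlap` are hypothetical. It assumes the usual form of size control, where a query set must contain at least `min_size` and at most `n - min_size` records, and an overlap control that bounds how many records a new query may share with any earlier one.

```python
class RestrictionAuditor:
    """Illustrative query set size and overlap control."""

    def __init__(self, n, min_size, max_overlap):
        self.n = n
        self.min_size = min_size
        self.max_overlap = max_overlap
        self.history = []  # query sets that were allowed so far

    def allows(self, query_set):
        qs = frozenset(query_set)
        # Size control: refuse very small sets and very large sets.
        if not (self.min_size <= len(qs) <= self.n - self.min_size):
            return False
        # Overlap control: refuse sets sharing too many records with a past query.
        if any(len(qs & prev) > self.max_overlap for prev in self.history):
            return False
        self.history.append(qs)
        return True


auditor = RestrictionAuditor(n=10, min_size=3, max_overlap=1)
print(auditor.allows({0, 1, 2, 3}))  # acceptable size, no history yet
print(auditor.allows({0, 1}))        # too small
print(auditor.allows({0, 1, 4, 5}))  # overlaps the first query in two records
```

Both checks refuse many harmless queries, which is one way to read the "limited data utility" point above.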
“On the advice of my counsel I respectfully and regretfully decline to answer the question based …”
Colonel Oliver North, on the Iran-Contra arms deal
“Mr. Chairman, I would like to answer the committee's questions, but on the advice of my counsel I respectfully decline to answer the question based on the protection afforded me under the Constitution of the United States.”
David Duncan, former auditor for Enron and partner in Arthur Andersen
[slide stolen from Kobbi Nissim]
User: Gimme sum(d1,d2,d3)
Auditor: Answer=15
User: Gimme max(d1,d2,d3)
Auditor: “Denied”
User: Oh well… Wait… there must be a reason why the second query was denied. The only possible reason for denial is if d1=d2=d3=5
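The attacker's reasoning can be checked by brute force. The sketch below assumes, purely for illustration, that each value is an integer between 0 and 15, and models a naive auditor that looks at the true data and denies max exactly when the true sum and max answers together would pin down some value. The function names are hypothetical.

```python
from itertools import product

DOMAIN = range(16)  # assumption: integer values 0..15


def discloses(candidates):
    """True if some position holds the same value in every candidate dataset."""
    return any(len({d[i] for d in candidates}) == 1 for i in range(3))


def naive_auditor_denies_max(data):
    """Data-dependent decision: deny max if answering it would fix some value."""
    s, m = sum(data), max(data)
    candidates = [d for d in product(DOMAIN, repeat=3)
                  if sum(d) == s and max(d) == m]
    return discloses(candidates)


# Every dataset the attacker considers possible after learning sum = 15.
consistent_with_sum = [d for d in product(DOMAIN, repeat=3) if sum(d) == 15]

# Among those, the datasets for which the naive auditor would deny max.
denied = [d for d in consistent_with_sum if naive_auditor_denies_max(d)]
print(denied)
```

Only one dataset triggers a denial, so observing "Denied" tells the attacker all three values.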
[Figure: the set of possible assignments to {d1,…,dn}, the subset of assignments consistent with (q1,…,qi; a1,…,ai), and the region where qi+1 is denied]
Leakage is not prevented
Have to remember which queries were …
Semantically determine whether two queries …
Simulatable Auditing, Kenthapadi, 2005
An auditor is a function of Q, A and X
An auditor is simulatable if there exists a simulator …
[Figure: the auditor decides “deny or answer” for qi+1 using q1,…,qi and the database; the simulator makes the same “deny or answer” decision for qi+1 using only q1,…,qi and a1,…,ai]
[Figure: the set of possible assignments to {d1,…,dn}, the subset of assignments consistent with (q1,…,qi, a1,…,ai), and the regions where qi+1 is denied/allowed]
User: Gimme sum(d1,d2,d3)
Auditor: Answer=a1
User: Gimme max(d1,d2,d3)
Auditor: “Denied”
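A simulatable version of this decision can be sketched as follows. It is an illustration under assumptions, not the construction from the cited paper: values are taken to be integers between 0 and 15, and the function names are hypothetical. The key point is that the decision function receives only the earlier answer a1, never the data, and denies max if any dataset consistent with a1 would lead to a disclosure.

```python
from itertools import product

DOMAIN = range(16)  # assumption: integer values 0..15


def discloses(candidates):
    """True if some position holds the same value in every candidate dataset."""
    return any(len({d[i] for d in candidates}) == 1 for i in range(3))


def simulatable_denies_max(sum_answer):
    """Decide using only the past answer: deny if any consistent dataset is at risk."""
    consistent = [d for d in product(DOMAIN, repeat=3) if sum(d) == sum_answer]
    for m in {max(d) for d in consistent}:
        # Would answering max = m, on top of the sum, pin down some value?
        if discloses([d for d in consistent if max(d) == m]):
            return True
    return False


# The decision is the same for every dataset with sum 15, so the attacker
# could have computed it alone and learns nothing from the denial.
print(simulatable_denies_max(15))
```

In this small integer domain the outcome still varies with the answer a1 itself, but a1 is already known to the attacker, so the denial carries no additional information about the data.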
Privacy definition
Privacy of groups/families
Algorithmic limitations
Simulatable algorithms computationally prohibitive
Most work on sum queries, some on max, min, …
Collusion
Reduced utility for legitimate users
Large audit trail
Utility
Percentage of denials may not be the best measure