Universally Adaptive Data Analysis
Cynthia Dwork, Microsoft Research
Adaptive Data Analysis
- q_i depends on a_1, a_2, …, a_{i-1}
- Worry: the analyst finds a query for which the dataset is not representative of the population, and reports a surprising "discovery"

[Diagram: a data analyst adaptively queries a mechanism M holding a database S ∼ P; queries q_1 ("> 6 ft?"), q_2 ("muffin tops?"), q_3 ("muffin bottoms?") receive answers a_1, a_2, a_3]
Differential Privacy for Adaptive Validity
- q_i depends on a_1, a_2, …, a_{i-1}
- Differential privacy neutralizes the risks incurred by adaptivity
- A definition of privacy tailored to statistical analysis of large data sets

[D., Feldman, Hardt, Pitassi, Reingold, Roth '14]

[Diagram: the mechanism answering the analyst's queries is now differentially private (DP)]
Differential Privacy for Adaptive Validity
- There is a LARGE literature on DP algorithms for data analysis

[D., Feldman, Hardt, Pitassi, Reingold, Roth '14]
Some Intuition
- Fix a query, e.g., "What fraction of the population is over 6 feet tall?" Almost all large datasets will give an approximately correct reply
- Most datasets are representative with respect to this query
- If, in the process of adaptive exploration, the analyst finds a query for which the dataset is not representative, then she must have "learned something significant" about the dataset
- Preserving the "privacy" of the data may prevent over-fitting
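The over-fitting risk described above is easy to reproduce. A minimal simulation (parameter choices illustrative, not from the talk): an analyst probes 1000 pure-noise attributes, keeps the ones that look correlated with the label on the sample, and combines them into a single composite query that is strongly correlated in-sample but not in the population:

```python
import numpy as np

rng = np.random.default_rng(0)

# Pure noise: n samples, d candidate attributes, labels independent of them.
n, d = 100, 1000
X = rng.choice([-1, 1], size=(n, d)).astype(float)
y = rng.choice([-1, 1], size=n).astype(float)

# Adaptive step: measure each attribute's empirical correlation with the
# label and keep the ones that look "significant" on this sample.
corr = X.T @ y / n
selected = np.abs(corr) > 2 / np.sqrt(n)
k = selected.sum()

# Composite query built FROM the sample: sign-weighted vote of the
# selected attributes. Its in-sample correlation is large by construction.
w = np.where(selected, np.sign(corr), 0.0)
in_sample = (X @ w) @ y / (n * k)

# On a fresh sample from the same null population, the effect vanishes.
X2 = rng.choice([-1, 1], size=(n, d)).astype(float)
y2 = rng.choice([-1, 1], size=n).astype(float)
out_of_sample = (X2 @ w) @ y2 / (n * k)

print(f"in-sample correlation:     {in_sample:.3f}")
print(f"out-of-sample correlation: {out_of_sample:.3f}")
```

The analyst never touched a second dataset, yet the dataset "found" a query it is not representative for, exactly the worry on this slide.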
Intuition After Nati's Talk
- Differential Privacy: the outcome of any analysis is essentially equally likely, independent of whether any individual joins, or refrains from joining, the dataset
- This is a stability requirement
- It gave rise to the folklore that differential privacy yields generalizability
- But we will be able to say something stronger
Differential Privacy for Adaptive Validity
- q_i depends on a_1, a_2, …, a_{i-1}
- Differential privacy neutralizes risks incurred by adaptivity
- E.g., for statistical queries: with high probability, |q(S) − q(P)| < α
- High probability is important for handling many queries

[D., Feldman, Hardt, Pitassi, Reingold, Roth '14]

[Diagram: without DP, the analyst can find a q_3 such that q_3(S) fails to generalize; the DP mechanism M answering q_1 (> 6 ft?), q_2 (muffin tops?), q_3 (muffin bottoms?) neutralizes this]
Formalization
- Data sets S ∈ X^n; S ∼ P
- Queries q: X^n → Y
- Algorithms that choose queries and output results:
  - A_1 = q_1 (trivial choice); outputs (q_1, q_1(S))
  - A_i: X^n × Y_1 × ⋯ × Y_{i-1} → Y_i, where
    - q_i = f_i(y_1, …, y_{i-1})   (choose a new query based on the history of observations)
    - A_i(S, y_1, …, y_{i-1}) = (q_i, q_i(S)) = (q_i, y_i)   (output the chosen query and its response on S)
- Bad event: R ≝ {(S, q) : q(S) is not representative wrt P}
- For any nonadaptively fixed query: ∀ y_1, …, y_{i-1}, Pr_S[(S, q_i) ∈ R] ≤ β
- We want Pr[(S, A(S)) ∈ R] to be similarly small
- That is, q_i(S) should generalize even when q_i is chosen as a function of S
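The formalization above can be sketched as a small driver loop. All names here (`adaptive_analysis`, `choose_query`, `answer`) are illustrative, not from the talk; the point is only the shape of the interaction: f_i maps past answers to the next query, and each round outputs (q_i, q_i(S)).

```python
import random

random.seed(4)

def adaptive_analysis(S, choose_query, answer, rounds):
    history = []                      # transcript of (q_i, y_i) pairs
    for _ in range(rounds):
        q = choose_query(history)     # q_i = f_i(y_1, ..., y_{i-1})
        y = answer(S, q)              # y_i = q_i(S)
        history.append((q, y))
    return history

# Toy instance: queries are coordinates; the answer is the empirical mean.
S = [[random.random() for _ in range(5)] for _ in range(100)]
answer = lambda S, q: sum(row[q] for row in S) / len(S)

def choose_query(history):
    if len(history) < 3:
        return len(history)           # explore coordinates 0, 1, 2
    # exploit: re-ask whichever coordinate looked most extreme so far,
    # which is exactly the adaptivity the bad event R worries about
    return max(history, key=lambda qy: abs(qy[1] - 0.5))[0]

transcript = adaptive_analysis(S, choose_query, answer, rounds=5)
print(transcript)
```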
Differential Privacy [D., McSherry, Nissim, Smith '06]
M gives ε-differential privacy if, for all pairs of adjacent data sets x, x′ and all events E:

    Pr[M(x) ∈ E] ≤ e^ε · Pr[M(x′) ∈ E]

(the probability is over the randomness introduced by M).

For random variables Y, Z over a common range, the max-divergence of Y from Z is

    D_∞(Y ‖ Z) = log max_y Pr[Y = y] / Pr[Z = y]

Then ε-DP is equivalent to D_∞(M(x) ‖ M(x′)) ≤ ε.
- Closed under post-processing: D_∞(A(M(x)) ‖ A(M(x′))) ≤ ε
- Group privacy: for all x, x″, D_∞(M(x) ‖ M(x″)) ≤ Δ(x, x″) · ε
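A minimal numerical sketch of the definition, using the standard Laplace mechanism (parameters illustrative): for adjacent datasets, the probability of every (binned) event shifts by at most a factor of roughly e^ε, up to sampling noise.

```python
import numpy as np

rng = np.random.default_rng(1)

# Adjacent datasets: differ in a single row.
x  = np.array([1, 0, 1, 1, 0, 1, 0, 1])
x2 = np.array([1, 0, 1, 1, 0, 1, 0, 0])

eps = 0.5
query = lambda d: d.mean()        # "what fraction ...?": sensitivity 1/n
sens = 1.0 / len(x)

# Laplace mechanism: add noise with scale sensitivity/eps to the answer.
N = 200_000
samples_x  = query(x)  + rng.laplace(scale=sens / eps, size=N)
samples_x2 = query(x2) + rng.laplace(scale=sens / eps, size=N)

# Empirically compare Pr[M(x) in E] and Pr[M(x') in E] for interval
# events E (histogram bins over the bulk of the support).
hist_x,  edges = np.histogram(samples_x,  bins=20, range=(0.0, 1.25))
hist_x2, _     = np.histogram(samples_x2, bins=edges)
ratios = (hist_x + 1) / (hist_x2 + 1)     # +1 smoothing for sparse bins
print("max empirical ratio:", ratios.max())
print("e^eps              :", np.exp(eps))
```

The maximum bin-wise ratio hovers near e^ε but does not exceed it beyond sampling error, matching Pr[M(x) ∈ E] ≤ e^ε · Pr[M(x′) ∈ E].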
Properties
- Closed under post-processing
  - Max-divergence remains bounded
- Automatically yields group privacy
  - kε for groups of size k
- Well-understood behavior under adaptive composition
  - Can bound cumulative privacy loss over multiple analyses
  - "The epsilons add up"
- Programmable
  - Complicated private analyses from simple private building blocks
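"The epsilons add up" can be seen directly: the realized privacy loss of a transcript of k independent ε-DP Laplace answers is the log-likelihood ratio of the transcript under adjacent datasets, and it is deterministically bounded by k·ε. A sketch with illustrative parameters:

```python
import numpy as np

rng = np.random.default_rng(2)

eps_per_query, k = 0.1, 20        # 20 analyses, each 0.1-DP
scale = 1.0 / eps_per_query       # Laplace scale for sensitivity-1 queries

# Worst case for adjacent x, x': every query's true answer moves by the
# full sensitivity (1.0) when the differing row changes.
true_x  = np.zeros(k)
true_x2 = np.ones(k)

outputs = true_x + rng.laplace(scale=scale, size=k)   # transcript on x

def laplace_logpdf(v, mu, b):
    return -np.abs(v - mu) / b - np.log(2 * b)

# Realized privacy loss of the whole transcript: log-likelihood ratio of
# observing these outputs under x versus under x'. Basic composition says
# it can never exceed k * eps.
loss = (laplace_logpdf(outputs, true_x, scale)
        - laplace_logpdf(outputs, true_x2, scale)).sum()

print(f"realized privacy loss:  {loss:+.3f}")
print(f"composition bound k*ε:  {k * eps_per_query:.3f}")
```

Each summand is at most ε in absolute value (the triangle inequality on the Laplace log-density), so the total is within ±kε no matter what outputs are realized.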
The Power of Composition
- Lemma: the choice of q_i is differentially private
  - Closure under post-processing
- Inductive step (key): if q is chosen in a differentially private fashion with respect to S, then Pr[(S, q(S)) ∈ R] is small
- Sufficiency: union bound
Description Length
- Let A: X^n → Y. The description length of A is the cardinality of its range
- If ∀ y, Pr_S[(S, y) ∈ R] ≤ β, then Pr_S[(S, A(S)) ∈ R] ≤ |Y| · β
- Description length composes too
  - Product: the bound becomes β · ∏_i |Y_i|
- And, morally, it is closed under post-processing
  - Once you fix the randomness of the post-processing algorithm

[D., Feldman, Hardt, Pitassi, Reingold, Roth '15]
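The union-bound argument above can be checked numerically. In this illustrative sketch (all names and parameters my own), an algorithm inspects the whole sample but reports only which of m fixed coin-fraction queries has the largest empirical mean, so its range has |Y| = m elements and the bad-event probability is at most m·β:

```python
import numpy as np
from math import comb

rng = np.random.default_rng(3)

n, m, trials, tau = 200, 8, 2000, 0.1
# Bad event R: the reported empirical mean is off by more than tau from
# the true mean 0.5 (fair coins; every query's population value is 0.5).

# Exact per-fixed-query failure probability beta (binomial tail):
hi = int(n * (0.5 + tau))                      # count threshold = 120
beta = 2 * sum(comb(n, i) for i in range(hi + 1, n + 1)) / 2**n

# A: X^n -> Y reports the query with the largest empirical mean, so its
# range has only m elements, no matter how it inspects the sample.
def trial():
    S = rng.random((n, m)) < 0.5               # column j = query j's bits
    means = S.mean(axis=0)
    return abs(means[means.argmax()] - 0.5) > tau

p_fail = np.mean([trial() for _ in range(trials)])
print(f"Pr[A overfits] ~ {p_fail:.4f}  vs  |Y|*beta = {m * beta:.4f}")
```

The simulated failure probability sits comfortably below the |Y|·β bound; the bound is loose because the union bound charges every output equally.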
Approximate Max-Divergence
The β-approximate max-divergence of Y from Z is

    D_∞^β(Y ‖ Z) = log max_{O : Pr[Y ∈ O] > β} (Pr[Y ∈ O] − β) / Pr[Z ∈ O]

We are interested in (really the version with β, but that is too messy to display):

    D_∞((S, A(S)) ‖ S × A(S)) = log max_O Pr[(S, A(S)) ∈ O] / Pr[S × A(S) ∈ O]

where S × A(S) denotes the product of the marginals, i.e., S paired with the output of A on an independent copy. How much more likely is A(S) to relate to S than to a fresh S′? This captures the maximum amount of information that an output of the algorithm might reveal about its input.
Unifying Concept: Max-Information
- I_∞^β(X; Y) = D_∞^β((X, Y) ‖ X × Y)
- We are interested in I_∞^β(S; A(S))
- Theorem: if I_∞^β(S; A(S)) ≤ k, then for any O ⊆ X^n × Y:

      Pr[(S, A(S)) ∈ O] ≤ 2^k · Pr[S × A(S) ∈ O] + β

- So Pr[(S, A(S)) ∈ R] ≤ 2^k · max_y Pr[(S, y) ∈ R] + β !
- Max-information composes and is closed under post-processing
- For ε-DP A: I_∞(A, n) ≤ ε n log_2 e. Better bounds hold for I_∞^β(A, n).
- Description length: I_∞^β(A, n) ≤ log(|Y| / β)

(Here I_∞(A, n) denotes a bound on the worst-case approximate max-information over any distribution on n-element databases.)

[D., Feldman, Hardt, Pitassi, Reingold, Roth '15]
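The arithmetic of the theorem is worth seeing with numbers. A small calculator (inputs illustrative, not from the talk; base-2 logs assumed for the description-length bound): when ε is small enough that ε·n is a constant, the 2^k blow-up stays harmless.

```python
import math

def dp_max_info_bound(eps, n):
    """Max-information (bits) of an eps-DP algorithm on n-row databases:
    I_inf(A, n) <= eps * n * log2(e), per the slide."""
    return eps * n * math.log2(math.e)

def desc_length_bound(range_size, beta):
    """I_inf^beta(A, n) <= log(|Y| / beta); base-2 log assumed here."""
    return math.log2(range_size / beta)

def failure_bound(k_bits, worst_fixed_beta, approx_beta):
    """Theorem: Pr[(S, A(S)) in R] <= 2^k * max_y Pr[(S, y) in R] + beta."""
    return 2**k_bits * worst_fixed_beta + approx_beta

# Example: eps = 1/n keeps eps*n = 1, so 2^k = e and the per-query
# guarantee survives adaptivity almost intact.
n, eps = 10_000, 1e-4
k = dp_max_info_bound(eps, n)                 # = log2(e) ~ 1.44 bits
print("max-info bound:", k, "bits")
print("failure bound :", failure_bound(k, 1e-10, 1e-6))
```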
Abstract is Good
- Focusing on properties is powerful
- A completely universal approach to the validity of adaptive analysis
  - DP, small description length, low max-information
- Handles large numbers of arbitrary, adaptively chosen computations
- Closure under post-processing and composition
Long Live the Dataset!
- Leaking information slowly prolongs the lifetime of the system
  - Similar to the situation with privacy for the sake of privacy
  - To avoid too much cumulative loss, answer with smaller values of ε
- Essential: the Fundamental Law of Information Leakage
  - Overly accurate estimates of too many statistics is blatantly non-private
  - Dealer's choice
- Conjecture: the same is true for adaptivity
  - Failure to control cumulative max-info leads to failure to generalize
  - Important policy implications!
  - Supporting evidence: Hardt-Ullman queries
Thank you!
NIPS Workshop on Adaptive Data Analysis, Montreal, 12/11/15