Universally Adaptive Data Analysis
Cynthia Dwork, Microsoft Research


SLIDE 1

Universally Adaptive Data Analysis

Cynthia Dwork, Microsoft Research

SLIDE 2

Adaptive Data Analysis

• q_j depends on a_1, a_2, ..., a_{j-1}
• Worry: the analyst finds a query for which the dataset is not representative of the population, and reports a surprising discovery

[Diagram: database S ∼ P and mechanism M on one side, the data analyst on the other, exchanging q_1/a_1, q_2/a_2, q_3/a_3. Example queries: q_1: > 6 ft? q_2: muffin tops? q_3: muffin bottoms?]

SLIDE 3

Differential Privacy for Adaptive Validity

• q_j depends on a_1, a_2, ..., a_{j-1}
• Differential privacy neutralizes risks incurred by adaptivity
• Definition of privacy tailored to statistical analysis of large data sets

[D., Feldman, Hardt, Pitassi, Reingold, Roth '14]

[Diagram: database S and a DP mechanism answer the analyst's adaptively chosen queries q_1/a_1, q_2/a_2, q_3/a_3. Example queries: q_1: > 6 ft? q_2: muffin tops? q_3: muffin bottoms?]

SLIDE 4

Differential Privacy for Adaptive Validity

• q_j depends on a_1, a_2, ..., a_{j-1}
• Differential privacy neutralizes risks incurred by adaptivity
• There is a LARGE literature on DP algorithms for data analysis

[D., Feldman, Hardt, Pitassi, Reingold, Roth '14]

[Diagram: database S and a DP mechanism answer the analyst's adaptively chosen queries q_1/a_1, q_2/a_2, q_3/a_3. Example queries: q_1: > 6 ft? q_2: muffin tops? q_3: muffin bottoms?]

SLIDE 5

Some Intuition

• Fix a query, e.g., "What fraction of the population is over 6 feet tall?" Almost all large datasets will give an approximately correct reply
  • Most datasets are representative with respect to this query
• If, in the process of adaptive exploration, the analyst finds a query for which the dataset is not representative, then she must have "learned something significant" about the dataset
• Preserving the "privacy" of the data may prevent over-fitting
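The first bullet is easy to check by simulation; this quick sketch uses made-up numbers (true fraction p, tolerance tau of my choosing): fix one query in advance, draw many datasets, and observe that very few are unrepresentative for it.

```python
import random

random.seed(2)
p, n, trials, tau = 0.09, 1000, 2000, 0.03   # illustrative parameters
bad = sum(
    abs(sum(1 for _ in range(n) if random.random() < p) / n - p) > tau
    for _ in range(trials)
)
print(f"fraction of sampled datasets NOT representative: {bad / trials:.4f}")
```

So finding an unrepresentative query by luck alone is rare; an analyst who finds one must have extracted significant information about the particular sample.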

SLIDE 6

Intuition After Nati's Talk

• Differential Privacy: The outcome of any analysis is essentially equally likely, independent of whether any individual joins, or refrains from joining, the dataset
• This is a stability requirement
• Gave rise to the folklore that differential privacy yields generalizability
• But we will be able to say something stronger

SLIDE 7

• q_j depends on a_1, a_2, ..., a_{j-1}
• Differential privacy neutralizes risks incurred by adaptivity
• E.g., for statistical queries: w.h.p. |E_S[A(S)] - E_P[A(S)]| < τ
• High probability is important for handling many queries

[D., Feldman, Hardt, Pitassi, Reingold, Roth '14]

[Diagram: database S, DP mechanism M, and the data analyst exchange q_1/a_1, q_2/a_2, q_3/a_3. Example queries: q_1: > 6 ft? q_2: muffin tops? q_3: muffin bottoms?]

SLIDE 8

Formalization

• Datasets S ∈ X^n; S ∼ P
• Queries q : X^n → Z
• Algorithms that choose queries and output results
  • A_1 = q_1 (trivial choice); outputs (q_1, q_1(S))
  • A_j : X^n × Z_1 × ... × Z_{j-1} → Z_j, where
    • q_j = A_j(z_1, ..., z_{j-1})  [choose new query based on history of observations]
    • A_j(S, z_1, ..., z_{j-1}) = (q_j, q_j(S)) = (q_j, a_j)  [output chosen query and its response on S]
• R ≝ {(S, q) : q(S) not representative w.r.t. P}  [i.e., q(S) fails to generalize]
• ∀ z_1, ..., z_{j-1}: Pr_S[(S, q_j) ∈ R] ≤ β_j
• We want Pr[(S, A_j(S)) ∈ R] to be similar
  • q_j(S) should generalize even when q_j is chosen as a function of S
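The formalization above can be written as an executable loop. Everything concrete in this sketch is a hypothetical instantiation of mine: an exact-empirical-answer mechanism and an analyst who binary-searches for the height threshold whose empirical tail is about 10%.

```python
import random

def run_interaction(S, analyst, mechanism, rounds):
    """Adaptive loop: each q_j may depend on the transcript (q_1,a_1),...,(q_{j-1},a_{j-1})."""
    transcript = []
    for _ in range(rounds):
        q = analyst(transcript)           # q_j = A_j(z_1, ..., z_{j-1})
        a = mechanism(S, q)               # a_j = q_j(S)
        transcript.append((q, a))
    return transcript

def empirical_mechanism(S, q):
    return sum(1 for x in S if q(x)) / len(S)

def make_analyst():
    bounds = [4.0, 7.0]                   # search interval for the threshold
    def analyst(transcript):
        if transcript:
            _, a = transcript[-1]
            mid = sum(bounds) / 2
            if a > 0.10:
                bounds[0] = mid           # tail too heavy: raise the threshold
            else:
                bounds[1] = mid           # tail too light: lower it
        t = sum(bounds) / 2
        return lambda x, t=t: x > t       # the query q_j: "is x above t?"
    return analyst

random.seed(3)
S = [random.gauss(5.6, 0.3) for _ in range(1000)]
transcript = run_interaction(S, make_analyst(), empirical_mechanism, rounds=8)
print(f"after 8 adaptive rounds, a_8 = {transcript[-1][1]:.3f}")
```

The mechanism here answers exactly, so nothing protects against the bad event R; the rest of the talk is about mechanisms (DP, bounded description length, low max-information) for which Pr[(S, A_j(S)) ∈ R] provably stays small.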

SLIDE 9

Differential Privacy [D., McSherry, Nissim, Smith '06]

M gives ε-differential privacy if for all pairs of adjacent datasets S, S', and all events O:

    Pr[M(S) ∈ O] ≤ e^ε · Pr[M(S') ∈ O]

(The probability space is the randomness introduced by M.)

SLIDE 10

Differential Privacy [D., McSherry, Nissim, Smith '06]

M gives ε-differential privacy if for all pairs of adjacent datasets S, S', and all events O:

    Pr[M(S) ∈ O] ≤ e^ε · Pr[M(S') ∈ O]

For random variables Y, Z over X, the max-divergence of Y from Z is given by

    D_∞(Y || Z) = log max_y Pr[Y = y] / Pr[Z = y]

Then ε-DP is equivalent to D_∞(M(S) || M(S')) ≤ ε.

SLIDE 11

Differential Privacy [D., McSherry, Nissim, Smith '06]

M gives ε-differential privacy if for all pairs of adjacent datasets S, S', and all events O:

    Pr[M(S) ∈ O] ≤ e^ε · Pr[M(S') ∈ O]

For random variables Y, Z over X, the max-divergence of Y from Z is given by

    D_∞(Y || Z) = log max_y Pr[Y = y] / Pr[Z = y]

Then ε-DP is equivalent to D_∞(M(S) || M(S')) ≤ ε.

Closed under post-processing: D_∞(B(M(S)) || B(M(S'))) ≤ ε.

SLIDE 12

Differential Privacy [D., McSherry, Nissim, Smith '06]

M gives ε-differential privacy if for all pairs of adjacent datasets S, S', and all events O:

    Pr[M(S) ∈ O] ≤ e^ε · Pr[M(S') ∈ O]

For random variables Y, Z over X, the max-divergence of Y from Z is given by

    D_∞(Y || Z) = log max_y Pr[Y = y] / Pr[Z = y]

Then ε-DP is equivalent to D_∞(M(S) || M(S')) ≤ ε.

Group privacy: ∀ S, S'': D_∞(M(S) || M(S'')) ≤ Δ(S, S'') · ε.
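These three claims, the D_∞ characterization of ε-DP, group privacy, and closure under post-processing, can be checked exactly on a toy mechanism of my choosing: randomized response, which reports each input bit truthfully with probability 3/4.

```python
import math

def max_divergence(P, Q):
    # D_inf(Y || Z) = log max_y Pr[Y = y] / Pr[Z = y], over the support of Y
    return math.log(max(p / Q[y] for y, p in P.items() if p > 0))

def product(P, Q):
    return {(y, z): p * q for y, p in P.items() for z, q in Q.items()}

def push_forward(P, f):
    out = {}
    for y, p in P.items():
        out[f(y)] = out.get(f(y), 0.0) + p
    return out

eps = math.log(3)             # randomized response with truth probability 3/4
bit0 = {0: 0.75, 1: 0.25}     # M's output distribution on input bit 0
bit1 = {0: 0.25, 1: 0.75}     # ... on the adjacent input bit 1

d1 = max_divergence(bit0, bit1)                                 # adjacent: = eps
d2 = max_divergence(product(bit0, bit0), product(bit1, bit1))   # distance 2: = 2*eps
xor = lambda yz: yz[0] ^ yz[1]                                  # a post-processing map
d3 = max_divergence(push_forward(product(bit0, bit0), xor),
                    push_forward(product(bit1, bit1), xor))     # can only shrink
print(f"adjacent: {d1:.3f}  group (distance 2): {d2:.3f}  post-processed: {d3:.3f}")
```

Here post-processing by XOR collapses the divergence all the way to 0, while two flipped bits double it, matching Δ(S, S'')·ε exactly.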

SLIDE 13

Properties

• Closed under post-processing
  • Max-divergence remains bounded
• Automatically yields group privacy
  • kε for groups of size k
• Understood behavior under adaptive composition
  • Can bound cumulative privacy loss over multiple analyses
  • "The epsilons add up"
• Programmable
  • Complicated private analyses from simple private building blocks
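"The epsilons add up" can be made quantitative with illustrative numbers (not from the talk): basic composition says k adaptively chosen ε-DP analyses are together (kε)-DP, while the advanced composition theorem of Dwork, Rothblum, and Vadhan trades a small δ for a roughly √k dependence.

```python
import math

eps, k, delta = 0.1, 50, 1e-6           # illustrative parameters
basic = k * eps
advanced = math.sqrt(2 * k * math.log(1 / delta)) * eps + k * eps * (math.exp(eps) - 1)
print(f"basic composition: {basic:.2f}   advanced composition: {advanced:.2f}")
```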

SLIDE 14

The Power of Composition

• Lemma: The choice of q_j is differentially private
  • Closure under post-processing
• Inductive step (key): If q is chosen in a differentially private fashion with respect to S, then Pr[(S, A(S)) ∈ R] is small
• Sufficiency: union bound

[Diagram: database S, DP mechanism M, and the data analyst exchange q_1/a_1, q_2/a_2, q_3/a_3]

SLIDE 15

Description Length

• Let A : X^n → Z
• Description length of A is the cardinality of its range

    If ∀z: Pr_S[(S, z) ∈ R] ≤ β, then Pr_S[(S, A(S)) ∈ R] ≤ |Z| · β

• Description length composes too
  • Product: β · Π_j |Z_j|
• And, morally, it is closed under post-processing
  • Once you fix the randomness of the post-processing algorithm

[D., Feldman, Hardt, Pitassi, Reingold, Roth '15]
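A Monte Carlo illustration of the boxed bound, with made-up numbers: A's output range has |Z| = 20 elements, the bad event R is a large empirical deviation for the chosen column, and even for an adversarial A (which outputs the most deviant column), the failure probability stays below |Z|·β.

```python
import random

random.seed(4)
n, Zsize, tau, trials = 200, 20, 0.15, 2000   # illustrative parameters

def sample_S():
    return [[random.choice((-1, 1)) for _ in range(Zsize)] for _ in range(n)]

def mean(S, z):
    return sum(row[z] for row in S) / n

def in_R(S, z):                 # (S, z) in R: column z's empirical mean far from 0
    return abs(mean(S, z)) > tau

def A(S):                       # adversarial choice: the most deviant column index
    return max(range(Zsize), key=lambda z: abs(mean(S, z)))

beta = sum(in_R(sample_S(), 0) for _ in range(trials)) / trials            # fixed z
hit = sum(in_R(S, A(S)) for S in (sample_S() for _ in range(trials))) / trials
print(f"fixed query: {beta:.3f}   adaptive A(S): {hit:.3f}   bound |Z|*beta: {Zsize * beta:.3f}")
```

The adaptive failure rate is much larger than β for a fixed query, but it cannot exceed the union bound |Z|·β, which is exactly what a small output range buys.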

SLIDE 16

Approximate Max-Divergence

δ-approximate max-divergence of Y from Z:

    D_∞^δ(Y || Z) = log max_{O: Pr[Y ∈ O] > δ} (Pr[Y ∈ O] - δ) / Pr[Z ∈ O]

SLIDE 17

Approximate Max-Divergence

δ-approximate max-divergence of Y from Z:

    D_∞^δ(Y || Z) = log max_{O: Pr[Y ∈ O] > δ} (Pr[Y ∈ O] - δ) / Pr[Z ∈ O]

We are interested in (the δ version, but that is too messy to display):

    D_∞((S, A(S)) || S × A(S)) = log max_O Pr[(S, A(S)) ∈ O] / Pr[S × A(S) ∈ O]

where S × A(S) denotes the product of the two marginal distributions.

SLIDE 18

Approximate Max-Divergence

δ-approximate max-divergence of Y from Z:

    D_∞^δ(Y || Z) = log max_{O: Pr[Y ∈ O] > δ} (Pr[Y ∈ O] - δ) / Pr[Z ∈ O]

We are interested in (the δ version, but that is too messy to display):

    D_∞((S, A(S)) || S × A(S)) = log max_O Pr[(S, A(S)) ∈ O] / Pr[S × A(S) ∈ O]

How much more likely is A(S) to relate to S than to a fresh S'? This captures the maximum amount of information that an output of an algorithm might reveal about its input.

SLIDE 19

Unifying Concept: Max-Information

• I_∞^δ(Y; Z) = D_∞^δ((Y, Z) || Y × Z)
• We are interested in I_∞^δ(S; A(S))
• Theorem: If I_∞^δ(S; A(S)) ≤ k, then for any O ⊆ X^n × Z:
  • Pr[(S, A(S)) ∈ O] ≤ 2^k · Pr[S × A(S) ∈ O] + δ
• So Pr[(S, A(S)) ∈ R] ≤ 2^k · max_{z ∈ Z} Pr[(S, z) ∈ R] + δ !

[D., Feldman, Hardt, Pitassi, Reingold, Roth '15]

SLIDE 20

Unifying Concept: Max-Information

• I_∞^δ(Y; Z) = D_∞^δ((Y, Z) || Y × Z)
• We are interested in I_∞^δ(S; A(S))
• Theorem: If I_∞^δ(S; A(S)) ≤ k, then for any O ⊆ X^n × Z:
  • Pr[(S, A(S)) ∈ O] ≤ 2^k · Pr[S × A(S) ∈ O] + δ
• So Pr[(S, A(S)) ∈ R] ≤ 2^k · max_{z ∈ Z} Pr[(S, z) ∈ R] + δ !
• Max-information composes and is closed under post-processing
• For ε-DP A: I_∞(A, n) ≤ εn log_2 e; better bounds exist for I_∞^δ(A, n)
• I_∞^δ(A, n) ≤ log(|Z| / δ)
  [Bound on worst-case approximate max-information for any distribution on n-element databases]

[D., Feldman, Hardt, Pitassi, Reingold, Roth '15]
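The theorem can be verified exhaustively on a toy joint distribution of my own making: S is a uniform bit and A(S) equals S with probability 0.9. With δ = 0, I_∞(S; A(S)) = log₂ of the largest pointwise ratio Pr[(s, a)] / (Pr[s]·Pr[a]), and every event O must then satisfy the 2^k bound.

```python
import math

# Toy joint distribution: uniform bit S, and A(S) = S with probability 0.9.
joint = {(0, 0): 0.45, (0, 1): 0.05, (1, 0): 0.05, (1, 1): 0.45}
pS = {0: 0.5, 1: 0.5}
pA = {a: sum(joint[(s, a)] for s in pS) for a in (0, 1)}

# Max-information with delta = 0 (log base 2, matching the 2^k bound).
k = math.log2(max(p / (pS[s] * pA[a]) for (s, a), p in joint.items()))

# Check Pr[(S, A(S)) in O] <= 2^k * Pr[S x A(S) in O] for every nonempty event O.
points = list(joint)
for mask in range(1, 1 << len(points)):
    O = [points[i] for i in range(len(points)) if (mask >> i) & 1]
    pj = sum(joint[o] for o in O)                 # joint probability of O
    pp = sum(pS[s] * pA[a] for s, a in O)         # product-of-marginals probability
    assert pj <= 2 ** k * pp + 1e-12
print(f"I_inf(S; A(S)) = {k:.3f} bits; the 2^k bound holds for all events O")
```

The bound is tight for the event {(0, 0), (1, 1)}, where the joint probability 0.9 equals 2^k · 0.5 exactly.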

SLIDE 21

Abstract is Good

• Focusing on properties is powerful
• A completely universal approach to validity of adaptive analysis
  • DP, small description length, low max-information
  • Large numbers of arbitrary, adaptively chosen computations
  • Closure under post-processing and composition

SLIDE 22

Long Live the Dataset!

• Leaking information slowly prolongs the lifetime of the system
  • Similar to the situation with privacy for the sake of privacy
  • To avoid too much cumulative loss, answer with smaller values of ε
• Essential: the Fundamental Law of Information Leakage
  • Overly accurate estimates of too many statistics is blatantly non-private
  • Dealer's choice
• Conjecture: the same is true for adaptivity
  • Failure to control cumulative max-information leads to failure to generalize
  • Important policy implications!
  • Supporting evidence: Hardt-Ullman queries

SLIDE 23

Thank you!

NIPS Workshop on Adaptive Data Analysis, Montreal, 12/11/15