Universally Adaptive Data Analysis
Cynthia Dwork, Microsoft Research


SLIDE 1

Universally Adaptive Data Analysis

Cynthia Dwork, Microsoft Research

SLIDE 2

Adaptive Data Analysis

• q_j depends on a_1, a_2, ..., a_{j-1}
• Worry: the analyst finds a query for which the dataset is not representative of the population, and reports a surprising discovery

[Diagram: database S ∼ P and mechanism M on one side, the data analyst on the other, exchanging q_1/a_1, q_2/a_2, q_3/a_3. Example queries: q_1: > 6 ft? q_2: muffin tops? q_3: muffin bottoms?]

SLIDE 3

Differential Privacy for Adaptive Validity

• q_j depends on a_1, a_2, ..., a_{j-1}
• Differential privacy neutralizes risks incurred by adaptivity
• Definition of privacy tailored to statistical analysis of large data sets

[D., Feldman, Hardt, Pitassi, Reingold, Roth '14]

[Diagram: database S and a DP mechanism answer the analyst's adaptively chosen queries q_1/a_1, q_2/a_2, q_3/a_3. Example queries: q_1: > 6 ft? q_2: muffin tops? q_3: muffin bottoms?]

SLIDE 4

Differential Privacy for Adaptive Validity

• q_j depends on a_1, a_2, ..., a_{j-1}
• Differential privacy neutralizes risks incurred by adaptivity
• There is a LARGE literature on DP algorithms for data analysis

[D., Feldman, Hardt, Pitassi, Reingold, Roth '14]

[Diagram: database S and a DP mechanism answer the analyst's adaptively chosen queries q_1/a_1, q_2/a_2, q_3/a_3. Example queries: q_1: > 6 ft? q_2: muffin tops? q_3: muffin bottoms?]

SLIDE 5

Some Intuition

• Fix a query, e.g., "What fraction of the population is over 6 feet tall?" Almost all large datasets will give an approximately correct reply
  • Most datasets are representative with respect to this query
• If, in the process of adaptive exploration, the analyst finds a query for which the dataset is not representative, then she must have "learned something significant" about the dataset
• Preserving the "privacy" of the data may prevent over-fitting
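The first bullet is easy to check by simulation; this quick sketch uses made-up numbers (true fraction p, tolerance tau of my choosing): fix one query in advance, draw many datasets, and observe that very few are unrepresentative for it.

```python
import random

random.seed(2)
p, n, trials, tau = 0.09, 1000, 2000, 0.03   # illustrative parameters
bad = sum(
    abs(sum(1 for _ in range(n) if random.random() < p) / n - p) > tau
    for _ in range(trials)
)
print(f"fraction of sampled datasets NOT representative: {bad / trials:.4f}")
```

So finding an unrepresentative query by luck alone is rare; an analyst who finds one must have extracted significant information about the particular sample.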

SLIDE 6

Intuition After Nati's Talk

• Differential Privacy: The outcome of any analysis is essentially equally likely, independent of whether any individual joins, or refrains from joining, the dataset
• This is a stability requirement
• Gave rise to the folklore that differential privacy yields generalizability
• But we will be able to say something stronger

SLIDE 7

• q_j depends on a_1, a_2, ..., a_{j-1}
• Differential privacy neutralizes risks incurred by adaptivity
• E.g., for statistical queries: w.h.p. |E_S[A(S)] - E_P[A(S)]| < τ
• High probability is important for handling many queries

[D., Feldman, Hardt, Pitassi, Reingold, Roth '14]

[Diagram: database S, DP mechanism M, and the data analyst exchange q_1/a_1, q_2/a_2, q_3/a_3. Example queries: q_1: > 6 ft? q_2: muffin tops? q_3: muffin bottoms?]

SLIDE 8

Formalization

• Datasets S ∈ X^n; S ∼ P
• Queries q : X^n → Z
• Algorithms that choose queries and output results
  • A_1 = q_1 (trivial choice); outputs (q_1, q_1(S))
  • A_j : X^n × Z_1 × ... × Z_{j-1} → Z_j, where
    • q_j = A_j(z_1, ..., z_{j-1})  [choose new query based on history of observations]
    • A_j(S, z_1, ..., z_{j-1}) = (q_j, q_j(S)) = (q_j, a_j)  [output chosen query and its response on S]
• R ≝ {(S, q) : q(S) not representative w.r.t. P}  [i.e., q(S) fails to generalize]
• ∀ z_1, ..., z_{j-1}: Pr_S[(S, q_j) ∈ R] ≤ β_j
• We want Pr[(S, A_j(S)) ∈ R] to be similar
  • q_j(S) should generalize even when q_j is chosen as a function of S
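The formalization above can be written as an executable loop. Everything concrete in this sketch is a hypothetical instantiation of mine: an exact-empirical-answer mechanism and an analyst who binary-searches for the height threshold whose empirical tail is about 10%.

```python
import random

def run_interaction(S, analyst, mechanism, rounds):
    """Adaptive loop: each q_j may depend on the transcript (q_1,a_1),...,(q_{j-1},a_{j-1})."""
    transcript = []
    for _ in range(rounds):
        q = analyst(transcript)           # q_j = A_j(z_1, ..., z_{j-1})
        a = mechanism(S, q)               # a_j = q_j(S)
        transcript.append((q, a))
    return transcript

def empirical_mechanism(S, q):
    return sum(1 for x in S if q(x)) / len(S)

def make_analyst():
    bounds = [4.0, 7.0]                   # search interval for the threshold
    def analyst(transcript):
        if transcript:
            _, a = transcript[-1]
            mid = sum(bounds) / 2
            if a > 0.10:
                bounds[0] = mid           # tail too heavy: raise the threshold
            else:
                bounds[1] = mid           # tail too light: lower it
        t = sum(bounds) / 2
        return lambda x, t=t: x > t       # the query q_j: "is x above t?"
    return analyst

random.seed(3)
S = [random.gauss(5.6, 0.3) for _ in range(1000)]
transcript = run_interaction(S, make_analyst(), empirical_mechanism, rounds=8)
print(f"after 8 adaptive rounds, a_8 = {transcript[-1][1]:.3f}")
```

The mechanism here answers exactly, so nothing protects against the bad event R; the rest of the talk is about mechanisms (DP, bounded description length, low max-information) for which Pr[(S, A_j(S)) ∈ R] provably stays small.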

SLIDE 9

Differential Privacy [D., McSherry, Nissim, Smith '06]

M gives ε-differential privacy if for all pairs of adjacent datasets S, S', and all events O:

    Pr[M(S) ∈ O] ≤ e^ε · Pr[M(S') ∈ O]

(The probability space is the randomness introduced by M.)

SLIDE 10

Differential Privacy [D., McSherry, Nissim, Smith '06]

M gives ε-differential privacy if for all pairs of adjacent datasets S, S', and all events O:

    Pr[M(S) ∈ O] ≤ e^ε · Pr[M(S') ∈ O]

For random variables Y, Z over X, the max-divergence of Y from Z is given by

    D_∞(Y || Z) = log max_y Pr[Y = y] / Pr[Z = y]

Then ε-DP is equivalent to D_∞(M(S) || M(S')) ≤ ε.

SLIDE 11

Differential Privacy [D., McSherry, Nissim, Smith '06]

M gives ε-differential privacy if for all pairs of adjacent datasets S, S', and all events O:

    Pr[M(S) ∈ O] ≤ e^ε · Pr[M(S') ∈ O]

For random variables Y, Z over X, the max-divergence of Y from Z is given by

    D_∞(Y || Z) = log max_y Pr[Y = y] / Pr[Z = y]

Then ε-DP is equivalent to D_∞(M(S) || M(S')) ≤ ε.

Closed under post-processing: D_∞(B(M(S)) || B(M(S'))) ≤ ε.

SLIDE 12

Differential Privacy [D., McSherry, Nissim, Smith '06]

M gives ε-differential privacy if for all pairs of adjacent datasets S, S', and all events O:

    Pr[M(S) ∈ O] ≤ e^ε · Pr[M(S') ∈ O]

For random variables Y, Z over X, the max-divergence of Y from Z is given by

    D_∞(Y || Z) = log max_y Pr[Y = y] / Pr[Z = y]

Then ε-DP is equivalent to D_∞(M(S) || M(S')) ≤ ε.

Group privacy: ∀ S, S'': D_∞(M(S) || M(S'')) ≤ Δ(S, S'') · ε.
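These three claims, the D_∞ characterization of ε-DP, group privacy, and closure under post-processing, can be checked exactly on a toy mechanism of my choosing: randomized response, which reports each input bit truthfully with probability 3/4.

```python
import math

def max_divergence(P, Q):
    # D_inf(Y || Z) = log max_y Pr[Y = y] / Pr[Z = y], over the support of Y
    return math.log(max(p / Q[y] for y, p in P.items() if p > 0))

def product(P, Q):
    return {(y, z): p * q for y, p in P.items() for z, q in Q.items()}

def push_forward(P, f):
    out = {}
    for y, p in P.items():
        out[f(y)] = out.get(f(y), 0.0) + p
    return out

eps = math.log(3)             # randomized response with truth probability 3/4
bit0 = {0: 0.75, 1: 0.25}     # M's output distribution on input bit 0
bit1 = {0: 0.25, 1: 0.75}     # ... on the adjacent input bit 1

d1 = max_divergence(bit0, bit1)                                 # adjacent: = eps
d2 = max_divergence(product(bit0, bit0), product(bit1, bit1))   # distance 2: = 2*eps
xor = lambda yz: yz[0] ^ yz[1]                                  # a post-processing map
d3 = max_divergence(push_forward(product(bit0, bit0), xor),
                    push_forward(product(bit1, bit1), xor))     # can only shrink
print(f"adjacent: {d1:.3f}  group (distance 2): {d2:.3f}  post-processed: {d3:.3f}")
```

Here post-processing by XOR collapses the divergence all the way to 0, while two flipped bits double it, matching Δ(S, S'')·ε exactly.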

SLIDE 13

Properties

• Closed under post-processing
  • Max-divergence remains bounded
• Automatically yields group privacy
  • kε for groups of size k
• Understood behavior under adaptive composition
  • Can bound cumulative privacy loss over multiple analyses
  • "The epsilons add up"
• Programmable
  • Complicated private analyses from simple private building blocks
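"The epsilons add up" can be made quantitative with illustrative numbers (not from the talk): basic composition says k adaptively chosen ε-DP analyses are together (kε)-DP, while the advanced composition theorem of Dwork, Rothblum, and Vadhan trades a small δ for a roughly √k dependence.

```python
import math

eps, k, delta = 0.1, 50, 1e-6           # illustrative parameters
basic = k * eps
advanced = math.sqrt(2 * k * math.log(1 / delta)) * eps + k * eps * (math.exp(eps) - 1)
print(f"basic composition: {basic:.2f}   advanced composition: {advanced:.2f}")
```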

SLIDE 14

The Power of Composition

• Lemma: The choice of q_j is differentially private
  • Closure under post-processing
• Inductive step (key): If q is chosen in a differentially private fashion with respect to S, then Pr[(S, A(S)) ∈ R] is small
• Sufficiency: union bound

[Diagram: database S, DP mechanism M, and the data analyst exchange q_1/a_1, q_2/a_2, q_3/a_3]

SLIDE 15

Description Length

• Let A : X^n → Z
• Description length of A is the cardinality of its range

    If ∀z: Pr_S[(S, z) ∈ R] ≤ β, then Pr_S[(S, A(S)) ∈ R] ≤ |Z| · β

• Description length composes too
  • Product: β · Π_j |Z_j|
• And, morally, it is closed under post-processing
  • Once you fix the randomness of the post-processing algorithm

[D., Feldman, Hardt, Pitassi, Reingold, Roth '15]
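A Monte Carlo illustration of the boxed bound, with made-up numbers: A's output range has |Z| = 20 elements, the bad event R is a large empirical deviation for the chosen column, and even for an adversarial A (which outputs the most deviant column), the failure probability stays below |Z|·β.

```python
import random

random.seed(4)
n, Zsize, tau, trials = 200, 20, 0.15, 2000   # illustrative parameters

def sample_S():
    return [[random.choice((-1, 1)) for _ in range(Zsize)] for _ in range(n)]

def mean(S, z):
    return sum(row[z] for row in S) / n

def in_R(S, z):                 # (S, z) in R: column z's empirical mean far from 0
    return abs(mean(S, z)) > tau

def A(S):                       # adversarial choice: the most deviant column index
    return max(range(Zsize), key=lambda z: abs(mean(S, z)))

beta = sum(in_R(sample_S(), 0) for _ in range(trials)) / trials            # fixed z
hit = sum(in_R(S, A(S)) for S in (sample_S() for _ in range(trials))) / trials
print(f"fixed query: {beta:.3f}   adaptive A(S): {hit:.3f}   bound |Z|*beta: {Zsize * beta:.3f}")
```

The adaptive failure rate is much larger than β for a fixed query, but it cannot exceed the union bound |Z|·β, which is exactly what a small output range buys.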

SLIDE 16

Approximate Max-Divergence

δ-approximate max-divergence of Y from Z:

    D_∞^δ(Y || Z) = log max_{O: Pr[Y ∈ O] > δ} (Pr[Y ∈ O] - δ) / Pr[Z ∈ O]

SLIDE 17

Approximate Max-Divergence

δ-approximate max-divergence of Y from Z:

    D_∞^δ(Y || Z) = log max_{O: Pr[Y ∈ O] > δ} (Pr[Y ∈ O] - δ) / Pr[Z ∈ O]

We are interested in (the δ version, but that is too messy to display):

    D_∞((S, A(S)) || S × A(S)) = log max_O Pr[(S, A(S)) ∈ O] / Pr[S × A(S) ∈ O]

where S × A(S) denotes the product of the two marginal distributions.

SLIDE 18

Approximate Max-Divergence

δ-approximate max-divergence of Y from Z:

    D_∞^δ(Y || Z) = log max_{O: Pr[Y ∈ O] > δ} (Pr[Y ∈ O] - δ) / Pr[Z ∈ O]

We are interested in (the δ version, but that is too messy to display):

    D_∞((S, A(S)) || S × A(S)) = log max_O Pr[(S, A(S)) ∈ O] / Pr[S × A(S) ∈ O]

How much more likely is A(S) to relate to S than to a fresh S'? This captures the maximum amount of information that an output of an algorithm might reveal about its input.

SLIDE 19

Unifying Concept: Max-Information

• I_∞^δ(Y; Z) = D_∞^δ((Y, Z) || Y × Z)
• We are interested in I_∞^δ(S; A(S))
• Theorem: If I_∞^δ(S; A(S)) ≤ k, then for any O ⊆ X^n × Z:
  • Pr[(S, A(S)) ∈ O] ≤ 2^k · Pr[S × A(S) ∈ O] + δ
• So Pr[(S, A(S)) ∈ R] ≤ 2^k · max_{z ∈ Z} Pr[(S, z) ∈ R] + δ !

[D., Feldman, Hardt, Pitassi, Reingold, Roth '15]

SLIDE 20

Unifying Concept: Max-Information

• I_∞^δ(Y; Z) = D_∞^δ((Y, Z) || Y × Z)
• We are interested in I_∞^δ(S; A(S))
• Theorem: If I_∞^δ(S; A(S)) ≤ k, then for any O ⊆ X^n × Z:
  • Pr[(S, A(S)) ∈ O] ≤ 2^k · Pr[S × A(S) ∈ O] + δ
• So Pr[(S, A(S)) ∈ R] ≤ 2^k · max_{z ∈ Z} Pr[(S, z) ∈ R] + δ !
• Max-information composes and is closed under post-processing
• For ε-DP A: I_∞(A, n) ≤ εn log_2 e; better bounds exist for I_∞^δ(A, n)
• I_∞^δ(A, n) ≤ log(|Z| / δ)
  [Bound on worst-case approximate max-information for any distribution on n-element databases]

[D., Feldman, Hardt, Pitassi, Reingold, Roth '15]
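The theorem can be verified exhaustively on a toy joint distribution of my own making: S is a uniform bit and A(S) equals S with probability 0.9. With δ = 0, I_∞(S; A(S)) = log₂ of the largest pointwise ratio Pr[(s, a)] / (Pr[s]·Pr[a]), and every event O must then satisfy the 2^k bound.

```python
import math

# Toy joint distribution: uniform bit S, and A(S) = S with probability 0.9.
joint = {(0, 0): 0.45, (0, 1): 0.05, (1, 0): 0.05, (1, 1): 0.45}
pS = {0: 0.5, 1: 0.5}
pA = {a: sum(joint[(s, a)] for s in pS) for a in (0, 1)}

# Max-information with delta = 0 (log base 2, matching the 2^k bound).
k = math.log2(max(p / (pS[s] * pA[a]) for (s, a), p in joint.items()))

# Check Pr[(S, A(S)) in O] <= 2^k * Pr[S x A(S) in O] for every nonempty event O.
points = list(joint)
for mask in range(1, 1 << len(points)):
    O = [points[i] for i in range(len(points)) if (mask >> i) & 1]
    pj = sum(joint[o] for o in O)                 # joint probability of O
    pp = sum(pS[s] * pA[a] for s, a in O)         # product-of-marginals probability
    assert pj <= 2 ** k * pp + 1e-12
print(f"I_inf(S; A(S)) = {k:.3f} bits; the 2^k bound holds for all events O")
```

The bound is tight for the event {(0, 0), (1, 1)}, where the joint probability 0.9 equals 2^k · 0.5 exactly.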

SLIDE 21

Abstract is Good

• Focusing on properties is powerful
• A completely universal approach to validity of adaptive analysis
  • DP, small description length, low max-information
  • Large numbers of arbitrary, adaptively chosen computations
  • Closure under post-processing and composition

SLIDE 22

Long Live the Dataset!

• Leaking information slowly prolongs the lifetime of the system
  • Similar to the situation with privacy for the sake of privacy
  • To avoid too much cumulative loss, answer with smaller values of ε
• Essential: the Fundamental Law of Information Leakage
  • Overly accurate estimates of too many statistics is blatantly non-private
  • Dealer's choice
• Conjecture: the same is true for adaptivity
  • Failure to control cumulative max-information leads to failure to generalize
  • Important policy implications!
  • Supporting evidence: Hardt-Ullman queries

SLIDE 23

Thank you!

NIPS Workshop on Adaptive Data Analysis, Montreal, 12/11/15