SLIDE 1

Controlling False Discovery Rate Privately

Weijie Su

University of Pennsylvania NIPS, Barcelona, December 9, 2016

Joint work with Cynthia Dwork and Li Zhang

SLIDE 2

Living in the Big Data world

2 / 40

SLIDE 3

Privacy loss

3 / 40

SLIDE 4

Privacy loss

  • Second Netflix challenge canceled
  • AOL search data leak
  • Inference of an individual’s presence from minor allele frequencies [Homer et al. ’08]

4 / 40

SLIDE 7

This talk: privacy-preserving multiple testing

A hypothesis H could be

  • Is the SNP associated with diabetes?
  • Does the drug affect autism?

Goal

  • Preserve privacy
  • Control false discovery rate (FDR)

Application

  • Genome-wide association studies
  • A/B testing

5 / 40

H1 H2 · · · · · · Hm

SLIDE 8

Outline

1 Warm-ups
   FDR and BHq procedure
   Differential privacy
2 Introducing PrivateBHq
3 Proof of FDR control

6 / 40

SLIDE 9

Two types of errors

                 Not reject        Reject            Total
Null is true     True negative     False positive    m0
Null is false    False negative    True positive     m1
Total                                                m

7 / 40

SLIDE 12

False discovery rate (FDR)

FDR := E[ #false discoveries / #discoveries ]

[Figure: true model vs. estimated model (counts 100, 200, 300)]

  • In the figure, FDP = 200 / (100 + 200)
  • Wish FDR ≤ q (often q = 0.05, 0.1)
  • Proposed by Benjamini and Hochberg ’95
  • 35,490 citations as of yesterday

8 / 40

SLIDE 13

Why FDR?

9 / 40

SLIDE 15

FDR addresses reproducibility

10 / 40

SLIDE 17

How to control FDR?

11 / 40

SLIDE 21

p-values of hypotheses

p-value

The probability of finding the observed, or more extreme, results when the null hypothesis of a study question is true

  • Uniform in [0, 1] (or stochastically larger) under true null

H0: the drug does not lower blood pressure

  • If p = 0.5, no evidence
  • If p = 0.01, there is evidence!

12 / 40

SLIDE 22

p-values of hypotheses

p-value

The probability of finding the observed, or more extreme, results when the null hypothesis of a study question is true

  • Uniform in [0, 1] (or stochastically larger) under true null

H0: the drug does not lower blood pressure

  • If p = 0.5, no evidence
  • If p = 0.01, there is evidence?

12 / 40

SLIDE 26

Benjamini-Hochberg procedure (BHq)

Let p1, p2, . . . , pm be p-values of m hypotheses

[Figure: sorted p-values p(1) ≤ · · · ≤ p(m) plotted against the threshold line qj/m]

◮ Sort p(1) ≤ · · · ≤ p(m)
◮ Draw the rank-dependent threshold qj/m
◮ Reject hypotheses below the cutoffs
◮ Under independence, FDR ≤ q

13 / 40
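The step-up rule above fits in a few lines. This is a generic sketch of BHq, not code from the talk; the example p-values are hypothetical.

```python
def bhq(pvalues, q=0.05):
    """BHq step-up: reject hypotheses whose sorted p-values fall below
    the rank-dependent thresholds q*j/m."""
    m = len(pvalues)
    # Sort indices by p-value.
    order = sorted(range(m), key=lambda i: pvalues[i])
    cutoff = 0  # largest rank j with p_(j) <= q*j/m
    for rank, i in enumerate(order, start=1):
        if pvalues[i] <= q * rank / m:
            cutoff = rank
    return sorted(order[:cutoff])

print(bhq([0.001, 0.008, 0.039, 0.041, 0.3, 0.6, 0.9], q=0.05))
```

Note the step-up character: the cutoff is the *largest* qualifying rank, so a hypothesis can be rejected even if its own p-value sits above smaller thresholds.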

SLIDE 27

What is privacy?

  • My response has little impact on released results
  • An adversary cannot learn much information about me from the released results
  • Anonymity may not work
  • Is the Benjamini-Hochberg procedure (BHq) privacy-preserving?

14 / 40

SLIDE 28

BHq is sensitive to perturbations

[Figure: sorted p-values near the BHq threshold line — a small perturbation of the p-values changes the rejection set]

15 / 40

SLIDE 32

A concrete foundation of privacy

Let M be a (random) data-releasing mechanism

Differential privacy (Dwork, McSherry, Nissim, Smith ’06)

M is called (ǫ, δ)-differentially private if for all databases D and D′ differing in one individual, and all S ⊆ Range(M),

P(M(D) ∈ S) ≤ eǫ P(M(D′) ∈ S) + δ

  • Probability space is over the randomness of M
  • If δ = 0 (pure privacy),

e−ǫ ≤ P(M(D) ∈ S) / P(M(D′) ∈ S) ≤ eǫ

16 / 40

SLIDE 33

A concrete foundation of privacy

Differential privacy (Dwork, McSherry, Nissim, Smith ’06)

For all neighboring databases D and D′, P(M(D) ∈ S) ≤ eǫ P(M(D′) ∈ S) + δ

[Figure: densities of M(D) and M(D′) — their ratio is bounded by eǫ everywhere except on a set of “bad responses” of probability at most δ]

17 / 40

SLIDE 35

An addition to a vast literature

  • Counts, linear queries, histograms, contingency tables
  • Location and spread
  • Dimension reduction (PCA, SVD), clustering
  • Support vector machine
  • Sparse regression, Lasso, logistic regression
  • Gradient descent
  • Boosting, multiplicative weights
  • Combinatorial optimization, mechanism design
  • Kalman filtering
  • Statistical queries learning model, PAC learning
  • FDR control

18 / 40

SLIDE 36

Laplace noise

Lap(b) has density exp(−|x|/b)/(2b)

19 / 40

SLIDE 38

Achieving (ǫ, 0)-differential privacy: a vignette

How many members of the House of Representatives voted for Trump?

  • Sensitivity is 1
  • Add symmetric noise Lap(1/ǫ) to the counts

How many albums of Taylor Swift are bought in total by people in this room?

  • Sensitivity is 5
  • Add symmetric noise Lap(5/ǫ) to the counts

20 / 40
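The two vignettes amount to the Laplace mechanism. A minimal sketch, with an inverse-CDF sampler for Lap(b); the counts 240 and 130 are made up, and the sensitivities follow the slide.

```python
import math
import random

def laplace_noise(scale):
    """Sample from Lap(scale), density exp(-|x|/scale) / (2*scale)."""
    u = random.random() - 0.5
    return -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))

def private_count(true_count, sensitivity, eps):
    """Laplace mechanism: (eps, 0)-DP release of a count query."""
    return true_count + laplace_noise(sensitivity / eps)

random.seed(0)
# Hypothetical counts; sensitivity 1 for the vote, 5 for the albums.
print(private_count(240, sensitivity=1, eps=0.5))
print(private_count(130, sensitivity=5, eps=0.5))
```

Larger sensitivity or smaller ǫ both inflate the noise scale, which is exactly the tension PrivateBHq has to manage for m p-values at once.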

SLIDE 39

Outline

1 Warm-ups
   FDR and BHq procedure
   Differential privacy
2 Introducing PrivateBHq
3 Proof of FDR control

21 / 40

SLIDE 41

Sensitivity of p-values

  • Additive noise can kill signals when p-values are small
  • Solution: take logarithm of p-values

Databases D and D′ are adjacent.

Definition

Tuples (p1(D), . . . , pm(D)) and (p1(D′), . . . , pm(D′)) are called (η, ν)-multiplicatively sensitive if, for all i,

  • either pi(D), pi(D′) < ν, or
  • e−η pi(D) ≤ pi(D′) ≤ eη pi(D)

  • πi = log max{pi(D), ν} has sensitivity η

22 / 40

SLIDE 42

Examples of multiplicatively sensitive p-values

Let ξ1, . . . , ξn be iid, taking 1 with probability α and 0 otherwise, and let T be their sum. To test H0 : α ≤ 1/2 against H1 : α > 1/2:

p(D) = Σ_{i=T}^{n} (n choose i) / 2^n

Assume m = n^C. Then we can take ν = m^{−2} and η = n^{−1/2+o(1)}

23 / 40
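The tail sum above is easy to evaluate exactly. A generic sketch (not from the talk), with hypothetical values of T and n:

```python
import math

def binomial_pvalue(T, n):
    """p(D) = sum_{i=T}^{n} C(n, i) / 2^n, the one-sided binomial
    tail probability under H0: alpha <= 1/2."""
    return sum(math.comb(n, i) for i in range(T, n + 1)) / 2 ** n

print(binomial_pvalue(14, 20))  # ≈ 0.0577
```

Changing one respondent's bit moves T by at most 1, which is what drives the multiplicative-sensitivity bound on the slide.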

SLIDE 43

Building blocks of PrivateBHq

24 / 40

SLIDE 44

Private Min

a.k.a. Report Noisy Min

Algorithm 1: Private Min
Input: π1, · · · , πm
1: for i = 1 to m do
2:    set πi⊗ = πi + gi where gi is i.i.d. Lap(η√(10k log(1/δ))/ǫ)
3: end for
4: return (i⋆ = argmin πi⊗, π⋆ = πi⋆ + g) where g ∼ Lap(η√(10k log(1/δ))/ǫ)

  • Private Min is (2ǫ/√(10k log(1/δ)), 0)-differentially private
  • Less noise is possible [Raskhodnikova and Smith ’16]

25 / 40
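A minimal sketch of Report Noisy Min as described above. The noise scale η√(10k log(1/δ))/ǫ is copied from the slide and should be read as an assumption about constants, not a tuned implementation.

```python
import math
import random

def _laplace(scale):
    """Inverse-CDF sample from Lap(scale)."""
    u = random.random() - 0.5
    return -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))

def private_min(pi, eta, k, eps, delta):
    """Report Noisy Min: add Laplace noise to every score, return the
    argmin together with a freshly noised copy of the winning value."""
    scale = eta * math.sqrt(10 * k * math.log(1 / delta)) / eps
    noisy = [x + _laplace(scale) for x in pi]
    i_star = min(range(len(pi)), key=lambda i: noisy[i])
    return i_star, pi[i_star] + _laplace(scale)

random.seed(0)
print(private_min([-9.2, -1.3, -8.7, -0.5, -4.4], eta=0.05, k=3, eps=1.0, delta=1e-6))
```

The winner's value is released with *fresh* noise: reusing the noise that decided the argmin would leak more than the stated privacy budget.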

SLIDE 46

Pre-selection by peeling

Algorithm 2: Peeling
Input: π1, · · · , πm and k
1: for j = 1 to k do
2:    run Private Min
3:    remove selected πi⋆
4: end for
5: report k selected pairs (i, π̃i)

Lemma

peeling(k) is (ǫ, δ)-differentially private

  • A simple application of the Advanced Composition Theorem [Dwork, Rothblum, and Vadhan ’10]

26 / 40
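Peeling is just Report Noisy Min run k times with the winner removed each round. A self-contained sketch under the same assumed noise constants as before:

```python
import math
import random

def _laplace(scale):
    """Inverse-CDF sample from Lap(scale)."""
    u = random.random() - 0.5
    return -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))

def peeling(pi, k, eta, eps, delta):
    """Run Report Noisy Min k times, removing each winner, and report
    the k selected (index, noisy value) pairs."""
    scale = eta * math.sqrt(10 * k * math.log(1 / delta)) / eps
    remaining = set(range(len(pi)))
    selected = []
    for _ in range(k):
        noisy = {i: pi[i] + _laplace(scale) for i in remaining}
        i_star = min(noisy, key=noisy.get)
        # Fresh noise on the reported value, then peel the winner off.
        selected.append((i_star, pi[i_star] + _laplace(scale)))
        remaining.remove(i_star)
    return selected

random.seed(0)
print(peeling([-9.2, -1.3, -8.7, -0.5, -4.4], k=2, eta=0.01, eps=1.0, delta=1e-6))
```

Advanced composition over the k rounds is what gives the overall (ǫ, δ) guarantee in the lemma.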

SLIDE 48

Finally, PrivateBHq

Algorithm 3: PrivateBHq
Input: (η, ν)-sensitive p-values p1, · · · , pm, k ≥ 1 and ǫ, δ
Output: a set of up to k rejected hypotheses
1: set πi = log(max{pi, ν})
2: apply peeling(k) to π1, . . . , πm
3: apply BHq to y1, . . . , yk with cutoffs αj = log(qj/m + ν) + η∆, where ∆ = (1 + o(1))√(k log(1/δ)) log m/ǫ

Theorem (Dwork, S., and Zhang)

PrivateBHq is (ǫ, δ)-differentially private

27 / 40
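Putting the pieces together, here is a compact end-to-end sketch of the pipeline: log-transform, peel out k noisy minima, then run a BHq-style step-up with the enlarged cutoffs. The noise constants and the shift ∆ are copied from the slides and are assumptions; the p-values below are hypothetical.

```python
import math
import random

def _laplace(scale):
    """Inverse-CDF sample from Lap(scale)."""
    u = random.random() - 0.5
    return -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))

def private_bhq(pvals, k, q, eta, nu, eps, delta):
    m = len(pvals)
    # Step 1: log-transform the (eta, nu)-sensitive p-values.
    pi = [math.log(max(p, nu)) for p in pvals]
    scale = eta * math.sqrt(10 * k * math.log(1 / delta)) / eps
    # Step 2: peeling -- select k approximately smallest log-p-values.
    remaining = set(range(m))
    selected = []
    for _ in range(k):
        noisy = {i: pi[i] + _laplace(scale) for i in remaining}
        i_star = min(noisy, key=noisy.get)
        selected.append((i_star, pi[i_star] + _laplace(scale)))
        remaining.remove(i_star)
    # Step 3: BHq step-up on the k noisy values with shifted cutoffs
    # alpha_j = log(q*j/m + nu) + eta*Delta.
    shift = math.sqrt(k * math.log(1 / delta)) * math.log(m) / eps
    selected.sort(key=lambda t: t[1])
    cutoff = 0
    for rank, (_, y) in enumerate(selected, start=1):
        if y <= math.log(q * rank / m + nu) + eta * shift:
            cutoff = rank
    return sorted(i for i, _ in selected[:cutoff])

random.seed(0)
pvals = [1e-8, 1e-7, 0.4, 0.9, 0.2, 1e-6, 0.7, 0.5]
print(private_bhq(pvals, k=3, q=0.1, eta=0.01, nu=1e-10, eps=1.0, delta=1e-6))
```

The inflated cutoffs compensate for both the truncation at ν and the peeling noise; that slack is what the compliance argument in the next part pays for in the FDR bound.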

SLIDE 49

Outline

1 Warm-ups
   FDR and BHq procedure
   Differential privacy
2 Introducing PrivateBHq
3 Proof of FDR control

28 / 40

SLIDE 50

New techniques required

  • Smallest p-values may not be selected
  • Difficult to specify the joint distribution of selected p-values
  • This destroys crucial properties used in proving FDR control

29 / 40

SLIDE 52

Compliant procedures

Definition

A procedure is called compliant with {qj}_{j=1}^m if all the R rejected p-values are below qR

  • Self-consistency condition [Blanchard and Roquain ’08]
  • Step-up and step-down BHqs are {jq/m}-compliant
  • So are the generalized step-up-step-down procedures [Tamhane, Liu, and Dunnett ’98; Sarkar ’02]
  • How about PrivateBHq?

30 / 40

SLIDE 53

PrivateBHq is compliant

Lemma

Given (η, ν)-sensitive p-values with ν = o(1/m), with probability 1 − o(1) the private FDR-controlling algorithm is compliant with {jq′/m}, where q′ = (1 + o(1)) eη∆ · q

31 / 40

SLIDE 55

Compliance + IWS = FDR control

Definition

A set of test statistics is said to satisfy independence within a subset I0 (IWS on I0) if the test statistics from I0 are jointly independent.

Theorem

Suppose the test statistics satisfy IWS on the subset of true null hypotheses. Then any procedure compliant with the BHq critical values qj/m obeys

FDR ≤ q log(1/q) + Cq,   FDR2 ≤ Cq,   FDRk ≤ (1 + 2/√(qk)) q

  • FDRk := E[V/R; V ≥ k]
  • C ≈ 2.7

32 / 40

SLIDE 59

Compliance + IWS = FDR control

Theorem

IWS on the subset of true nulls + compliance with the BHq critical values qj/m give

FDR ≤ q log(1/q) + Cq,   FDR2 ≤ Cq,   FDRk ≤ (1 + 2/√(qk)) q

  • Arbitrary correlations between true null and false null test statistics
  • Can even be adversarial!
  • Partially explains why BHq is so robust
  • If V → ∞ with probability tending to one, then FDR ≤ q + o(1)

33 / 40

SLIDE 60

Proof Sketch

34 / 40

SLIDE 63

An upper bound on FDP

Let p_{i1}, . . . , p_{iR} be those rejected, among which p0(1) ≤ · · · ≤ p0(V) are from true nulls. Compliance requires

p0(V) ≤ max_{1≤j≤R} p_{ij} ≤ αR = qR/m

Hence R ≥ ⌈m p0(V)/q⌉

⇒ V/max{R, 1} ≤ V/⌈m p0(V)/q⌉

⇒ FDP ≤ max_{2≤j≤m0} j/⌈m p0(j)/q⌉ + min{ 1/⌈m p0(1)/q⌉, 1 }

  • m0 is the total number of true nulls

35 / 40

SLIDE 67

Bounding the two terms

Lemma

  • E max_{2≤j≤m} j/⌈m U(j)/q⌉ ≤ C1 q
  • E min{ 1/⌈m U(1)/q⌉, 1 } ≤ q log(1/q) + C2 q

for some absolute constants C1 and C2

  • Assume m0 = m
  • Assume all true null p-values are iid uniform on [0, 1]
  • Let U1, U2, . . . , Um be iid and uniform on [0, 1]

36 / 40

SLIDE 70

Using Rényi’s representation

Wish to prove E max_{2≤j≤m} j/⌈m U(j)/q⌉ ≤ C1 q

Let ξ1, . . . , ξm+1 be iid exponential random variables. Then

(U(1), U(2), . . . , U(m)) =d (T1/Tm+1, T2/Tm+1, . . . , Tm/Tm+1)

  • Tj = ξ1 + · · · + ξj
  • j/⌈m U(j)/q⌉ ≤ qj/(m U(j)) = (q/m) · (j Tm+1/Tj) ≡ (q/m) · Wj
  • Wj ≡ j Tm+1/Tj

37 / 40
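Rényi's representation is easy to check numerically. This illustration (not part of the proof) compares E U_(1), which equals 1/(m+1), under direct sampling of uniform order statistics and under the exponential-sum construction:

```python
import random

# Compare E[U_(1)] = 1/(m+1) under two samplers: the minimum of m iid
# uniforms, and T_1/T_{m+1} from Renyi's representation with T_j a sum
# of j iid Exp(1) variables.
random.seed(0)
m, trials = 10, 20000

# Direct: minimum of m iid uniforms.
direct = sum(min(random.random() for _ in range(m)) for _ in range(trials)) / trials

# Renyi: T_1 / T_{m+1} with iid exponentials xi_1, ..., xi_{m+1}.
renyi = 0.0
for _ in range(trials):
    xs = [random.expovariate(1.0) for _ in range(m + 1)]
    renyi += xs[0] / sum(xs)
renyi /= trials

print(direct, renyi)  # both close to 1/(m+1) ≈ 0.0909
```

The same construction handles every U_(j) at once, which is what lets the proof rewrite the max over ranks in terms of the Wj process.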

SLIDE 73

Wj is a backward submartingale

Wish to prove E max_{2≤j≤m} Wj/m ≤ C1

Submartingale property

E(Wj | Tj+1, . . . , Tm+1) ≥ Wj+1

By martingale theory

E max_{2≤j≤m} Wj/m ≤ (1 − e−1)−1 (1 + E[(W2/m) log(W2/m); W2/m ≥ 1])
                   ≤ (1 − e−1)−1 (1 + E[(2/(m U(2))) log(2/(m U(2))); 2/(m U(2)) ≥ 1])
                   ≤ C1

38 / 40

SLIDE 74

Summary

39 / 40

SLIDE 76

Take-home message

  • FDR addresses reproducibility
  • Differential privacy is a rigorous definition
  • Privatize BH by adding noise in peeling
  • A bonus: Compliance with IWS gives FDR control

Thank You!

40 / 40