SLIDE 1

Controlling False Discovery Rate Privately

Weijie Su

University of Pennsylvania NIPS, Barcelona, December 9, 2016

Joint work with Cynthia Dwork and Li Zhang

SLIDE 2

Living in the Big Data world

2 / 40

SLIDE 3

Privacy loss

3 / 40

SLIDE 4

Privacy loss

  • Second Netflix challenge canceled
  • AOL search data leak
  • Inference of an individual’s presence from minor allele frequencies [Homer et al. ’08]

4 / 40

SLIDE 7

This talk: privacy-preserving multiple testing

A hypothesis H could be

  • Is the SNP associated with diabetes?
  • Does the drug affect autism?

Goal

  • Preserve privacy
  • Control false discovery rate (FDR)

Application

  • Genome-wide association studies
  • A/B testing

5 / 40

H1 H2 · · · · · · Hm

SLIDE 8

Outline

1 Warm-ups
   FDR and BHq procedure
   Differential privacy
2 Introducing PrivateBHq
3 Proof of FDR control

6 / 40

SLIDE 9

Two types of errors

                 Not reject        Reject            Total
Null is true     True negative     False positive    m0
Null is false    False negative    True positive     m1
Total                                                m

7 / 40

SLIDE 12

False discovery rate (FDR)

FDR := E[ #false discoveries / #discoveries ]

[Figure: true model vs. estimated model (counts 100, 200, 300)]

  • In the figure, FDP = 200 / (100 + 200)
  • Wish FDR ≤ q (often q = 0.05, 0.1)
  • Proposed by Benjamini and Hochberg ’95
  • 35,490 citations as of yesterday

8 / 40

SLIDE 13

Why FDR?

9 / 40

SLIDE 15

FDR addresses reproducibility

10 / 40

SLIDE 17

How to control FDR?

11 / 40

SLIDE 21

p-values of hypotheses

p-value

The probability of finding the observed, or more extreme, results when the null hypothesis of a study question is true

  • Uniform in [0, 1] (or stochastically larger) under true null

H0: the drug does not lower blood pressure

  • If p = 0.5, no evidence
  • If p = 0.01, there is evidence!

12 / 40

SLIDE 22

p-values of hypotheses

p-value

The probability of finding the observed, or more extreme, results when the null hypothesis of a study question is true

  • Uniform in [0, 1] (or stochastically larger) under true null

H0: the drug does not lower blood pressure

  • If p = 0.5, no evidence
  • If p = 0.01, there is evidence?

12 / 40

SLIDE 26

Benjamini-Hochberg procedure (BHq)

Let p1, p2, . . . , pm be p-values of m hypotheses

[Figure: sorted p-values p(1) ≤ · · · ≤ p(m) plotted against the threshold line qj/m]

◮ Sort p(1) ≤ · · · ≤ p(m)
◮ Draw the rank-dependent threshold qj/m
◮ Reject hypotheses below the cutoffs
◮ Under independence, FDR ≤ q

13 / 40
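The step-up rule above fits in a few lines. This is a generic sketch of BHq, not code from the talk; the example p-values are hypothetical.

```python
def bhq(pvalues, q=0.05):
    """BHq step-up: reject hypotheses whose sorted p-values fall below
    the rank-dependent thresholds q*j/m."""
    m = len(pvalues)
    # Sort indices by p-value.
    order = sorted(range(m), key=lambda i: pvalues[i])
    cutoff = 0  # largest rank j with p_(j) <= q*j/m
    for rank, i in enumerate(order, start=1):
        if pvalues[i] <= q * rank / m:
            cutoff = rank
    return sorted(order[:cutoff])

print(bhq([0.001, 0.008, 0.039, 0.041, 0.3, 0.6, 0.9], q=0.05))
```

Note the step-up character: the cutoff is the *largest* qualifying rank, so a hypothesis can be rejected even if its own p-value sits above smaller thresholds.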

SLIDE 27

What is privacy?

  • My response has little impact on released results
  • An adversary cannot learn much information about me from the released results
  • Anonymity may not work
  • Is the Benjamini-Hochberg procedure (BHq) privacy-preserving?

14 / 40

SLIDE 28

BHq is sensitive to perturbations

[Figure: sorted p-values near the BHq threshold line — a small perturbation of the p-values changes the rejection set]

15 / 40

SLIDE 32

A concrete foundation of privacy

Let M be a (random) data-releasing mechanism

Differential privacy (Dwork, McSherry, Nissim, Smith ’06)

M is called (ǫ, δ)-differentially private if for all databases D and D′ differing in one individual, and all S ⊆ Range(M),

P(M(D) ∈ S) ≤ eǫ P(M(D′) ∈ S) + δ

  • Probability space is over the randomness of M
  • If δ = 0 (pure privacy),

e−ǫ ≤ P(M(D) ∈ S) / P(M(D′) ∈ S) ≤ eǫ

16 / 40

SLIDE 33

A concrete foundation of privacy

Differential privacy (Dwork, McSherry, Nissim, Smith ’06)

For all neighboring databases D and D′, P(M(D) ∈ S) ≤ eǫ P(M(D′) ∈ S) + δ

[Figure: densities of M(D) and M(D′) — their ratio is bounded by eǫ everywhere except on a set of “bad responses” of probability at most δ]

17 / 40

SLIDE 35

An addition to a vast literature

  • Counts, linear queries, histograms, contingency tables
  • Location and spread
  • Dimension reduction (PCA, SVD), clustering
  • Support vector machine
  • Sparse regression, Lasso, logistic regression
  • Gradient descent
  • Boosting, multiplicative weights
  • Combinatorial optimization, mechanism design
  • Kalman filtering
  • Statistical queries learning model, PAC learning
  • FDR control

18 / 40

SLIDE 36

Laplace noise

Lap(b) has density exp(−|x|/b)/(2b)

19 / 40

SLIDE 38

Achieving (ǫ, 0)-differential privacy: a vignette

How many members of the House of Representatives voted for Trump?

  • Sensitivity is 1
  • Add symmetric noise Lap(1/ǫ) to the counts

How many albums of Taylor Swift are bought in total by people in this room?

  • Sensitivity is 5
  • Add symmetric noise Lap(5/ǫ) to the counts

20 / 40
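The two vignettes amount to the Laplace mechanism. A minimal sketch, with an inverse-CDF sampler for Lap(b); the counts 240 and 130 are made up, and the sensitivities follow the slide.

```python
import math
import random

def laplace_noise(scale):
    """Sample from Lap(scale), density exp(-|x|/scale) / (2*scale)."""
    u = random.random() - 0.5
    return -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))

def private_count(true_count, sensitivity, eps):
    """Laplace mechanism: (eps, 0)-DP release of a count query."""
    return true_count + laplace_noise(sensitivity / eps)

random.seed(0)
# Hypothetical counts; sensitivity 1 for the vote, 5 for the albums.
print(private_count(240, sensitivity=1, eps=0.5))
print(private_count(130, sensitivity=5, eps=0.5))
```

Larger sensitivity or smaller ǫ both inflate the noise scale, which is exactly the tension PrivateBHq has to manage for m p-values at once.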

SLIDE 39

Outline

1 Warm-ups
   FDR and BHq procedure
   Differential privacy
2 Introducing PrivateBHq
3 Proof of FDR control

21 / 40

SLIDE 41

Sensitivity of p-values

  • Additive noise can kill signals when p-values are small
  • Solution: take logarithm of p-values

Databases D and D′ are adjacent.

Definition

Tuples (p1(D), . . . , pm(D)) and (p1(D′), . . . , pm(D′)) are called (η, ν)-multiplicatively sensitive if, for all i,

  • either pi(D), pi(D′) < ν, or
  • e−η pi(D) ≤ pi(D′) ≤ eη pi(D)

  • πi = log max{pi(D), ν} has sensitivity η

22 / 40

SLIDE 42

Examples of multiplicatively sensitive p-values

Let ξ1, . . . , ξn be iid, taking 1 with probability α and 0 otherwise, and let T be their sum. To test H0 : α ≤ 1/2 against H1 : α > 1/2:

p(D) = Σ_{i=T}^{n} (n choose i) / 2^n

Assume m = n^C. Then we can take ν = m^{−2} and η = n^{−1/2+o(1)}

23 / 40
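The tail sum above is easy to evaluate exactly. A generic sketch (not from the talk), with hypothetical values of T and n:

```python
import math

def binomial_pvalue(T, n):
    """p(D) = sum_{i=T}^{n} C(n, i) / 2^n, the one-sided binomial
    tail probability under H0: alpha <= 1/2."""
    return sum(math.comb(n, i) for i in range(T, n + 1)) / 2 ** n

print(binomial_pvalue(14, 20))  # ≈ 0.0577
```

Changing one respondent's bit moves T by at most 1, which is what drives the multiplicative-sensitivity bound on the slide.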

SLIDE 43

Building blocks of PrivateBHq

24 / 40

SLIDE 44

Private Min

a.k.a. Report Noisy Min

Algorithm 1: Private Min
Input: π1, · · · , πm
1: for i = 1 to m do
2:    set πi⊗ = πi + gi where gi is i.i.d. Lap(η√(10k log(1/δ))/ǫ)
3: end for
4: return (i⋆ = argmin πi⊗, π⋆ = πi⋆ + g) where g ∼ Lap(η√(10k log(1/δ))/ǫ)

  • Private Min is (2ǫ/√(10k log(1/δ)), 0)-differentially private
  • Less noise is possible [Raskhodnikova and Smith ’16]

25 / 40
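A minimal sketch of Report Noisy Min as described above. The noise scale η√(10k log(1/δ))/ǫ is copied from the slide and should be read as an assumption about constants, not a tuned implementation.

```python
import math
import random

def _laplace(scale):
    """Inverse-CDF sample from Lap(scale)."""
    u = random.random() - 0.5
    return -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))

def private_min(pi, eta, k, eps, delta):
    """Report Noisy Min: add Laplace noise to every score, return the
    argmin together with a freshly noised copy of the winning value."""
    scale = eta * math.sqrt(10 * k * math.log(1 / delta)) / eps
    noisy = [x + _laplace(scale) for x in pi]
    i_star = min(range(len(pi)), key=lambda i: noisy[i])
    return i_star, pi[i_star] + _laplace(scale)

random.seed(0)
print(private_min([-9.2, -1.3, -8.7, -0.5, -4.4], eta=0.05, k=3, eps=1.0, delta=1e-6))
```

The winner's value is released with *fresh* noise: reusing the noise that decided the argmin would leak more than the stated privacy budget.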

SLIDE 46

Pre-selection by peeling

Algorithm 2: Peeling
Input: π1, · · · , πm and k
1: for j = 1 to k do
2:    run Private Min
3:    remove selected πi⋆
4: end for
5: report k selected pairs (i, π̃i)

Lemma

peeling(k) is (ǫ, δ)-differentially private

  • A simple application of the Advanced Composition Theorem [Dwork, Rothblum, and Vadhan ’10]

26 / 40
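Peeling is just Report Noisy Min run k times with the winner removed each round. A self-contained sketch under the same assumed noise constants as before:

```python
import math
import random

def _laplace(scale):
    """Inverse-CDF sample from Lap(scale)."""
    u = random.random() - 0.5
    return -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))

def peeling(pi, k, eta, eps, delta):
    """Run Report Noisy Min k times, removing each winner, and report
    the k selected (index, noisy value) pairs."""
    scale = eta * math.sqrt(10 * k * math.log(1 / delta)) / eps
    remaining = set(range(len(pi)))
    selected = []
    for _ in range(k):
        noisy = {i: pi[i] + _laplace(scale) for i in remaining}
        i_star = min(noisy, key=noisy.get)
        # Fresh noise on the reported value, then peel the winner off.
        selected.append((i_star, pi[i_star] + _laplace(scale)))
        remaining.remove(i_star)
    return selected

random.seed(0)
print(peeling([-9.2, -1.3, -8.7, -0.5, -4.4], k=2, eta=0.01, eps=1.0, delta=1e-6))
```

Advanced composition over the k rounds is what gives the overall (ǫ, δ) guarantee in the lemma.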

SLIDE 48

Finally, PrivateBHq

Algorithm 3: PrivateBHq
Input: (η, ν)-sensitive p-values p1, · · · , pm, k ≥ 1 and ǫ, δ
Output: a set of up to k rejected hypotheses
1: set πi = log(max{pi, ν})
2: apply peeling(k) to π1, . . . , πm
3: apply BHq to y1, . . . , yk with cutoffs αj = log(qj/m + ν) + η∆, where ∆ = (1 + o(1))√(k log(1/δ)) log m/ǫ

Theorem (Dwork, S., and Zhang)

PrivateBHq is (ǫ, δ)-differentially private

27 / 40
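Putting the pieces together, here is a compact end-to-end sketch of the pipeline: log-transform, peel out k noisy minima, then run a BHq-style step-up with the enlarged cutoffs. The noise constants and the shift ∆ are copied from the slides and are assumptions; the p-values below are hypothetical.

```python
import math
import random

def _laplace(scale):
    """Inverse-CDF sample from Lap(scale)."""
    u = random.random() - 0.5
    return -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))

def private_bhq(pvals, k, q, eta, nu, eps, delta):
    m = len(pvals)
    # Step 1: log-transform the (eta, nu)-sensitive p-values.
    pi = [math.log(max(p, nu)) for p in pvals]
    scale = eta * math.sqrt(10 * k * math.log(1 / delta)) / eps
    # Step 2: peeling -- select k approximately smallest log-p-values.
    remaining = set(range(m))
    selected = []
    for _ in range(k):
        noisy = {i: pi[i] + _laplace(scale) for i in remaining}
        i_star = min(noisy, key=noisy.get)
        selected.append((i_star, pi[i_star] + _laplace(scale)))
        remaining.remove(i_star)
    # Step 3: BHq step-up on the k noisy values with shifted cutoffs
    # alpha_j = log(q*j/m + nu) + eta*Delta.
    shift = math.sqrt(k * math.log(1 / delta)) * math.log(m) / eps
    selected.sort(key=lambda t: t[1])
    cutoff = 0
    for rank, (_, y) in enumerate(selected, start=1):
        if y <= math.log(q * rank / m + nu) + eta * shift:
            cutoff = rank
    return sorted(i for i, _ in selected[:cutoff])

random.seed(0)
pvals = [1e-8, 1e-7, 0.4, 0.9, 0.2, 1e-6, 0.7, 0.5]
print(private_bhq(pvals, k=3, q=0.1, eta=0.01, nu=1e-10, eps=1.0, delta=1e-6))
```

The inflated cutoffs compensate for both the truncation at ν and the peeling noise; that slack is what the compliance argument in the next part pays for in the FDR bound.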

SLIDE 49

Outline

1 Warm-ups
   FDR and BHq procedure
   Differential privacy
2 Introducing PrivateBHq
3 Proof of FDR control

28 / 40

SLIDE 50

New techniques required

  • Smallest p-values may not be selected
  • Difficult to specify the joint distribution of selected p-values
  • This destroys crucial properties used in proving FDR control

29 / 40

SLIDE 52

Compliant procedures

Definition

A procedure is called compliant with {qj}_{j=1}^m if all the R rejected p-values are below qR

  • Self-consistency condition [Blanchard and Roquain ’08]
  • Step-up and step-down BHqs are {jq/m}-compliant
  • So are the generalized step-up-step-down procedures [Tamhane, Liu, and Dunnett ’98; Sarkar ’02]
  • How about PrivateBHq?

30 / 40

SLIDE 53

PrivateBHq is compliant

Lemma

Given (η, ν)-sensitive p-values with ν = o(1/m), with probability 1 − o(1) the private FDR-controlling algorithm is compliant with {jq′/m}, where q′ = (1 + o(1)) eη∆ · q

31 / 40

SLIDE 55

Compliance + IWS = FDR control

Definition

A set of test statistics is said to satisfy independence within a subset I0 (IWS on I0) if the test statistics from I0 are jointly independent.

Theorem

Suppose the test statistics satisfy IWS on the subset of true null hypotheses. Then any procedure compliant with the BHq critical values qj/m obeys

FDR ≤ q log(1/q) + Cq,   FDR2 ≤ Cq,   FDRk ≤ (1 + 2/√(qk)) q

  • FDRk := E[V/R; V ≥ k]
  • C ≈ 2.7

32 / 40

SLIDE 59

Compliance + IWS = FDR control

Theorem

IWS on the subset of true nulls + compliance with the BHq critical values qj/m give

FDR ≤ q log(1/q) + Cq,   FDR2 ≤ Cq,   FDRk ≤ (1 + 2/√(qk)) q

  • Arbitrary correlations between true null and false null test statistics
  • Can even be adversarial!
  • Partially explains why BHq is so robust
  • If V → ∞ with probability tending to one, then FDR ≤ q + o(1)

33 / 40

SLIDE 60

Proof Sketch

34 / 40

SLIDE 63

An upper bound on FDP

Let p_{i1}, . . . , p_{iR} be those rejected, among which p0(1) ≤ · · · ≤ p0(V) are from true nulls. Compliance requires

p0(V) ≤ max_{1≤j≤R} p_{ij} ≤ αR = qR/m

Hence R ≥ ⌈m p0(V)/q⌉

⇒ V/max{R, 1} ≤ V/⌈m p0(V)/q⌉

⇒ FDP ≤ max_{2≤j≤m0} j/⌈m p0(j)/q⌉ + min{ 1/⌈m p0(1)/q⌉, 1 }

  • m0 is the total number of true nulls

35 / 40

SLIDE 67

Bounding the two terms

Lemma

  • E max_{2≤j≤m} j/⌈m U(j)/q⌉ ≤ C1 q
  • E min{ 1/⌈m U(1)/q⌉, 1 } ≤ q log(1/q) + C2 q

for some absolute constants C1 and C2

  • Assume m0 = m
  • Assume all true null p-values are iid uniform on [0, 1]
  • Let U1, U2, . . . , Um be iid and uniform on [0, 1]

36 / 40

SLIDE 70

Using Rényi’s representation

Wish to prove E max_{2≤j≤m} j/⌈m U(j)/q⌉ ≤ C1 q

Let ξ1, . . . , ξm+1 be iid exponential random variables. Then

(U(1), U(2), . . . , U(m)) =d (T1/Tm+1, T2/Tm+1, . . . , Tm/Tm+1)

  • Tj = ξ1 + · · · + ξj
  • j/⌈m U(j)/q⌉ ≤ qj/(m U(j)) = (q/m) · (j Tm+1/Tj) ≡ (q/m) · Wj
  • Wj ≡ j Tm+1/Tj

37 / 40
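Rényi's representation is easy to check numerically. This illustration (not part of the proof) compares E U_(1), which equals 1/(m+1), under direct sampling of uniform order statistics and under the exponential-sum construction:

```python
import random

# Compare E[U_(1)] = 1/(m+1) under two samplers: the minimum of m iid
# uniforms, and T_1/T_{m+1} from Renyi's representation with T_j a sum
# of j iid Exp(1) variables.
random.seed(0)
m, trials = 10, 20000

# Direct: minimum of m iid uniforms.
direct = sum(min(random.random() for _ in range(m)) for _ in range(trials)) / trials

# Renyi: T_1 / T_{m+1} with iid exponentials xi_1, ..., xi_{m+1}.
renyi = 0.0
for _ in range(trials):
    xs = [random.expovariate(1.0) for _ in range(m + 1)]
    renyi += xs[0] / sum(xs)
renyi /= trials

print(direct, renyi)  # both close to 1/(m+1) ≈ 0.0909
```

The same construction handles every U_(j) at once, which is what lets the proof rewrite the max over ranks in terms of the Wj process.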

SLIDE 73

Wj is a backward submartingale

Wish to prove E max_{2≤j≤m} Wj/m ≤ C1

Submartingale property

E(Wj | Tj+1, . . . , Tm+1) ≥ Wj+1

By martingale theory

E max_{2≤j≤m} Wj/m ≤ (1 − e−1)−1 (1 + E[(W2/m) log(W2/m); W2/m ≥ 1])
                   ≤ (1 − e−1)−1 (1 + E[(2/(m U(2))) log(2/(m U(2))); 2/(m U(2)) ≥ 1])
                   ≤ C1

38 / 40

SLIDE 74

Summary

39 / 40

SLIDE 76

Take-home message

  • FDR addresses reproducibility
  • Differential privacy is a rigorous definition
  • Privatize BH by adding noise in peeling
  • A bonus: Compliance with IWS gives FDR control

Thank You!

40 / 40