Privately Detecting Changes in Unknown Distributions


SLIDE 1

Privately Detecting Changes in Unknown Distributions

Wanrong Zhang, Georgia Tech

joint work with Rachel Cummings, Sara Krehbiel, Yuliia Lut

SLIDE 2

Motivation I: Smart-home IoT devices

SLIDE 3

Motivation II: Disease outbreaks

SLIDE 4

Change-point problem:

Identify distributional changes in a stream of highly sensitive data

Model: data points $x_1, \dots, x_{k^*} \sim P_0$ (pre-change) and $x_{k^*+1}, \dots, x_n \sim P_1$ (post-change)

Question: Estimate the unknown change time $k^*$

Need formal privacy guarantees for change-point detection algorithms

  • Our work: nonparametric model ($P_0$ and $P_1$ unknown)
  • Previous work: parametric model [CKM+18] ($P_0$ and $P_1$ known)

SLIDE 5

Differential privacy [DMNS '06]

Bound the maximum amount that one person's data can change the distribution of an algorithm's output

An algorithm $M: \mathcal{T}^n \to \mathcal{R}$ is $\epsilon$-differentially private if for all neighboring $x, x' \in \mathcal{T}^n$ and all $S \subseteq \mathcal{R}$,

$$\Pr[M(x) \in S] \le e^{\epsilon} \, \Pr[M(x') \in S]$$

  • $S$ as set of "bad outcomes"
  • Worst-case guarantee

SLIDE 6

Privately Detecting Changes in Unknown Distributions

  1. Offline setting: dataset known in advance
  2. Online setting: data points arrive one at a time
  3. Drift change detection (in paper)
  4. Empirical results (in paper)

SLIDE 8

Mann-Whitney test [MW '47]

Datasets: $x_1, x_2, \dots, x_k \sim P_0$ and $x_{k+1}, x_{k+2}, \dots, x_n \sim P_1$

$H_0: P_0 = P_1$, $H_1: P_0 \neq P_1$

Test statistic:

$$V(k) = \frac{1}{k(n-k)} \sum_{i=1}^{k} \sum_{j=k+1}^{n} I(x_i > x_j)$$

The double sum is the number of pairs $(x_i, x_j)$ such that $x_i > x_j$

Under $H_1$, require $a := \Pr_{x \sim P_0,\, y \sim P_1}[x > y] \neq \frac{1}{2}$
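As a rough illustration of the test statistic, the sketch below computes $V(k)$ by direct pair counting in plain Python. The function name `mann_whitney_statistic` and the convention that `k` is the number of points before the split are assumptions made for this example, not part of the slides.

```python
def mann_whitney_statistic(x, k):
    """Return V(k): the fraction of pairs (i, j), i <= k < j, with x_i > x_j."""
    n = len(x)
    if not 0 < k < n:
        raise ValueError("k must satisfy 0 < k < n")
    # Count the pairs where a point before the split exceeds a point after it.
    wins = sum(1 for xi in x[:k] for xj in x[k:] if xi > xj)
    return wins / (k * (n - k))


# Every early point exceeds every late point, so all 9 pairs count.
print(mann_whitney_statistic([5, 6, 7, 1, 2, 3], 3))  # 1.0
# Only the pair (3, 2) counts among the 4 pairs.
print(mann_whitney_statistic([1, 3, 2, 4], 2))  # 0.25
```

Direct counting costs $k(n-k)$ comparisons per split, which is fine for a sketch; a rank-based computation would be the usual choice for long streams.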

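SLIDE 9

Non-private nonparametric change-point detection [Darkhovsky '76]

  1. For every $k \in \{\lceil \gamma n \rceil, \dots, \lfloor (1-\gamma) n \rfloor\}$
  2. Compute $V(k)$
  3. Output $\hat{k} = \arg\max_k V(k)$

[Figure: $V(k)$ plotted against $k$ from $1$ to $n$, with $\gamma n$, $k^*$, and $(1-\gamma) n$ marked on the horizontal axis]

Can we compute $V(k)$ or $\arg\max_k V(k)$ privately?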
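The three steps above can be sketched as a scan over candidate split points. The names `mw_changepoint` and `mann_whitney_statistic` are hypothetical, and the toy data is chosen so that pre-change values are larger than post-change values, which is the case where maximizing $V(k)$ locates the change.

```python
import math


def mann_whitney_statistic(x, k):
    """V(k): fraction of pairs (i, j), i <= k < j, with x_i > x_j."""
    n = len(x)
    wins = sum(1 for xi in x[:k] for xj in x[k:] if xi > xj)
    return wins / (k * (n - k))


def mw_changepoint(x, gamma):
    """Non-private estimate: the k in [ceil(gamma*n), floor((1-gamma)*n)] maximizing V(k)."""
    if not 0 < gamma < 0.5:
        raise ValueError("gamma must lie in (0, 1/2)")
    n = len(x)
    lo = max(1, math.ceil(gamma * n))
    hi = min(n - 1, math.floor((1 - gamma) * n))
    if lo > hi:
        raise ValueError("no candidate split points for this n and gamma")
    # max() returns the first maximizer, so ties resolve to the smallest k.
    return max(range(lo, hi + 1), key=lambda k: mann_whitney_statistic(x, k))


# 50 pre-change points followed by 50 smaller post-change points.
data = [10.0] * 50 + [0.0] * 50
print(mw_changepoint(data, gamma=0.1))  # 50
```

Here the returned value is the number of pre-change points, matching the split convention used for $V(k)$.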

SLIDE 10

Adding differential privacy

Differentially private algorithms add noise that scales with the sensitivity of a query.

Query sensitivity: The sensitivity of a real-valued query $f$ is

$$\Delta f = \max_{X, X' \text{ neighbors}} |f(X) - f(X')|$$

Laplace Mechanism: The mechanism $M(f, X, \epsilon) = f(X) + \mathrm{Lap}(\Delta f / \epsilon)$ is $\epsilon$-differentially private.
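A minimal sketch of the Laplace mechanism, using only the Python standard library. The function names are hypothetical, and the sampler draws Laplace noise as the difference of two exponential variables. Plain floating-point sampling like this is meant for illustration only and is not hardened for production privacy guarantees.

```python
import random


def laplace_noise(scale, rng):
    """Draw Laplace(0, scale) as the difference of two exponentials with mean `scale`."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def laplace_mechanism(value, sensitivity, epsilon, rng=None):
    """Return value + Lap(sensitivity / epsilon) for a real-valued query answer."""
    if sensitivity <= 0 or epsilon <= 0:
        raise ValueError("sensitivity and epsilon must be positive")
    rng = random.Random() if rng is None else rng
    return value + laplace_noise(sensitivity / epsilon, rng)


# A counting query changes by at most 1 when one record changes, so its sensitivity is 1.
true_count = 42
print(laplace_mechanism(true_count, sensitivity=1.0, epsilon=0.5))
```

Smaller values of `epsilon` give a larger noise scale, so stronger privacy costs accuracy.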

SLIDE 11

Offline PNCPD

Private Nonparametric Change-Point Detector: $\mathrm{PNCPD}(X, \epsilon, \gamma)$

  1. Input: database $X$, privacy parameter $\epsilon$, constraint parameter $\gamma$
  2. for $k \in \{\lceil \gamma n \rceil, \dots, \lfloor (1-\gamma) n \rfloor\}$
  3.     Compute statistic $V(k)$
  4.     Sample $Z_k \sim \mathrm{Lap}\left(\frac{2}{\epsilon \gamma n}\right)$
  5. Output $\tilde{k} = \arg\max_k \{V(k) + Z_k\}$

= Mann-Whitney + ReportNoisyMax
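The offline procedure can be sketched as report-noisy-max over the Mann-Whitney statistics. This sketch assumes a noise scale of $2/(\epsilon \gamma n)$: changing one data point alters at most $n-k$ (or $k$) of the $k(n-k)$ pairs, so $V(k)$ moves by at most $1/k$ or $1/(n-k)$, which is at most $1/(\gamma n)$ on the allowed range, and the scale is twice that bound divided by $\epsilon$. All function names are hypothetical, and the floating-point Laplace sampler is illustrative rather than hardened.

```python
import math
import random


def mann_whitney_statistic(x, k):
    """V(k): fraction of pairs (i, j), i <= k < j, with x_i > x_j."""
    n = len(x)
    wins = sum(1 for xi in x[:k] for xj in x[k:] if xi > xj)
    return wins / (k * (n - k))


def laplace_noise(scale, rng):
    """Draw Laplace(0, scale) as the difference of two exponentials with mean `scale`."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def offline_pncpd(x, epsilon, gamma, rng=None):
    """Return the k maximizing V(k) + Z_k over k in [ceil(gamma*n), floor((1-gamma)*n)]."""
    if epsilon <= 0 or not 0 < gamma < 0.5:
        raise ValueError("need epsilon > 0 and 0 < gamma < 1/2")
    rng = random.Random() if rng is None else rng
    n = len(x)
    lo = max(1, math.ceil(gamma * n))
    hi = min(n - 1, math.floor((1 - gamma) * n))
    if lo > hi:
        raise ValueError("no candidate split points for this n and gamma")
    scale = 2.0 / (epsilon * gamma * n)  # assumed noise scale, see the note above
    # Fresh, independent noise is drawn for every candidate split point.
    noisy = {k: mann_whitney_statistic(x, k) + laplace_noise(scale, rng)
             for k in range(lo, hi + 1)}
    return max(noisy, key=noisy.get)


data = [10.0] * 100 + [0.0] * 100
print(offline_pncpd(data, epsilon=1.0, gamma=0.1, rng=random.Random(7)))
```

Only the index of the noisy maximum is released; the noisy statistics themselves stay internal to the function.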

SLIDE 12

Main results: OfflinePNCPD

Theorem: $\mathrm{OfflinePNCPD}(X, \epsilon, \gamma)$ is $\epsilon$-differentially private and, with probability $1 - \beta$, it outputs a private change-point estimator $\tilde{k}$ with error at most

$$|\tilde{k} - k^*| < O\left(\left(\frac{1}{\epsilon \gamma^4 (a - 1/2)^2}\right)^{1.01} \log\frac{1}{\beta}\right)$$

  • Previous non-private analysis [Darkhovsky '76]: $|\hat{k} - k^*| < O(n^{2/3})$
  • Our improved non-private analysis: $|\hat{k} - k^*| < O\left(\frac{1}{\gamma^4 (a - 1/2)^2} \log\frac{1}{\beta}\right) = O(1)$

SLIDE 13

Privately Detecting Changes in Unknown Distributions

  1. Offline setting: dataset known in advance
  2. Online setting: data points arrive one at a time
  3. Drift change detection (in paper)
  4. Empirical results (in paper)

SLIDE 14

Online setting

More challenging: must detect change quickly without much post-change data

High Level Approach:

  1. Privately detect online when $V(k) > T$ in the center of a sliding window of the last $n$ data points. (Have DP algorithm (AboveNoisyThreshold) for this.)
  2. Run OfflinePNCPD on the identified window.

SLIDE 15

Online setting

More challenging: must detect change quickly without much post-change data

Our Approach:

  1. Run AboveNoisyThreshold on Mann-Whitney queries in the center of a sliding window of the last $n$ data points.
  2. Run OfflinePNCPD on the identified window.

Sliding-window check: $V(k) + Z_k < T$

SLIDE 17

Online setting

More challenging: must detect change quickly without much post-change data

Our Approach:

  1. Run AboveNoisyThreshold on Mann-Whitney queries in the center of a sliding window of the last $n$ data points.
  2. Run OfflinePNCPD on the identified window.

Sliding-window check: $V(k) + Z_k \ge T$

SLIDE 18

OnlinePNCPD

  1. Input: database $X = \{x_1, \dots\}$, privacy parameter $\epsilon$, threshold $T$
  2. Let $\hat{T} = T + \mathrm{Lap}\left(\frac{8}{\epsilon n}\right)$
  3. For each new data point $x_k$:
  4.     Compute Mann-Whitney statistic $V(k)$ in center of last $n$ data points
  5.     Sample $Z_k \sim \mathrm{Lap}\left(\frac{16}{\epsilon n}\right)$
  6.     If $V(k) + Z_k > \hat{T}$, then
  7.         Run OfflinePNCPD on last $n$ data points with $\epsilon/2$
  8.     Else, output $\perp$
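The online procedure can be sketched as an above-noisy-threshold loop over a sliding window, followed by one offline run on the window that raised the alarm. The noise scales $8/(\epsilon n)$ and $16/(\epsilon n)$ are assumptions here: they correspond to spending $\epsilon/2$ on the threshold test with sensitivity $2/n$ for the statistic at the window center. All function names, the `(alarm_time, estimate)` return value, and the handling of the first $n-1$ points are likewise hypothetical choices made for this example.

```python
import collections
import math
import random


def mann_whitney_statistic(x, k):
    """V(k): fraction of pairs (i, j), i <= k < j, with x_i > x_j."""
    n = len(x)
    wins = sum(1 for xi in x[:k] for xj in x[k:] if xi > xj)
    return wins / (k * (n - k))


def laplace_noise(scale, rng):
    """Draw Laplace(0, scale) as the difference of two exponentials with mean `scale`."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def offline_pncpd(x, epsilon, gamma, rng):
    """Report-noisy-max over V(k) with assumed noise scale 2 / (epsilon * gamma * n)."""
    n = len(x)
    lo = max(1, math.ceil(gamma * n))
    hi = min(n - 1, math.floor((1 - gamma) * n))
    scale = 2.0 / (epsilon * gamma * n)
    noisy = {k: mann_whitney_statistic(x, k) + laplace_noise(scale, rng)
             for k in range(lo, hi + 1)}
    return max(noisy, key=noisy.get)


def online_pncpd(stream, n, threshold, epsilon, gamma, rng=None):
    """Return (alarm_time, estimated index of last pre-change point), or None if no alarm."""
    rng = random.Random() if rng is None else rng
    noisy_threshold = threshold + laplace_noise(8.0 / (epsilon * n), rng)
    window = collections.deque(maxlen=n)
    for t, point in enumerate(stream, start=1):
        window.append(point)
        if len(window) < n:
            continue  # not enough data for a full window yet
        data = list(window)
        v = mann_whitney_statistic(data, n // 2)  # query at the window center
        if v + laplace_noise(16.0 / (epsilon * n), rng) > noisy_threshold:
            # Alarm: spend the remaining epsilon / 2 on the offline detector and halt.
            k_local = offline_pncpd(data, epsilon / 2.0, gamma, rng)
            return t, (t - n) + k_local  # window covers points t-n+1 .. t
        # Otherwise nothing is reported for this step (the slide outputs a null symbol).
    return None


stream = [10.0] * 300 + [0.0] * 300
print(online_pncpd(stream, n=100, threshold=0.75, epsilon=1.0, gamma=0.1,
                   rng=random.Random(3)))
```

The loop stops at the first alarm, which is what lets a single noisy threshold cover arbitrarily many below-threshold checks.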

SLIDE 19

Main result: OnlinePNCPD

Theorem: $\mathrm{OnlinePNCPD}(X, T, \epsilon, \gamma)$ is $\epsilon$-differentially private. For an appropriate threshold $T$, with probability $1 - \beta$, it outputs a private change-point estimator $\tilde{k}$ with error at most

$$|\tilde{k} - k^*| < O\left(\frac{1}{\epsilon} \log\frac{n}{\beta}\right)$$

where $n$ is the window size.

Choice of $T$:

  • Can't raise alarm too early (False positive: $T > T_L$)
  • Can't fail to raise alarm at true change (False negative: $T < T_U$)

SLIDE 20

Privately Detecting Changes in Unknown Distributions

  1. Offline setting: dataset known in advance
  2. Online setting: data points arrive one at a time
  3. Drift change detection (in paper)
  4. Empirical results (in paper)

SLIDE 21

References

  • Cummings, R., Krehbiel, S., Mei, Y., Tuo, R., & Zhang, W. Differentially private change-point detection. In Advances in Neural Information Processing Systems (NeurIPS '18), pp. 10848-10857, 2018.
  • Dwork, C., McSherry, F., Nissim, K., & Smith, A. Calibrating noise to sensitivity in private data analysis. In Theory of Cryptography Conference, pp. 265-284, 2006.
  • Dwork, C., & Roth, A. The algorithmic foundations of differential privacy. Foundations and Trends in Theoretical Computer Science, 9(3-4), 211-407, 2014.
  • Darkhovsky, B. A nonparametric method for the a posteriori detection of the "disorder" time of a sequence of independent random variables. Theory of Probability & Its Applications, 21(1), 178-183, 1976.
  • Mann, H. B., & Whitney, D. R. On a test of whether one of two random variables is stochastically larger than the other. The Annals of Mathematical Statistics, pp. 50-60, 1947.
