CSC2412: Adaptive Data Analysis via Differential Privacy
Sasho Nikolov
The adaptive data analysis problem
Estimating population counts
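Unknown distribution D on X, where X is the universe of possible data points and D models the population.
Counting queries q_1, ..., q_k : X → {0, 1}, e.g. "smoker?" or "PhD?".
Want to estimate, for all i = 1, ..., k: q_i(D) = E_{x∼D}[q_i(x)], the fraction of the population satisfying q_i.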
The classical solution
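Draw a sample X = {x_1, ..., x_n} iid from D. Hope that ∀i : q_i(X) ≈ q_i(D), where
q_i(X) = (1/n) Σ_{j=1}^n q_i(x_j).
Each q_i(x_j) is in {0, 1}, the x_j are independent, and E[q_i(x_j)] = q_i(D).
Hoeffding: P(|q_i(X) − q_i(D)| > α) ≤ 2 e^{−2α²n}.
Union bound: P(∃i : |q_i(X) − q_i(D)| > α) ≤ 2k · e^{−2α²n} ≤ β if n ≥ ln(2k/β) / (2α²).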
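The sample size needed for k fixed (non-adaptive) queries can be checked numerically. The sketch below assumes the two-sided Hoeffding bound 2·exp(−2α²n) for an average of n independent {0, 1} values, combined with a union bound over the k queries; the function names are illustrative and not from the lecture.

```python
import math

def failure_bound(k: int, alpha: float, n: int) -> float:
    """Union bound over k fixed queries on P(some |q_i(X) - q_i(D)| > alpha)."""
    return min(1.0, 2 * k * math.exp(-2 * alpha ** 2 * n))

def sample_size(k: int, alpha: float, beta: float) -> int:
    """Smallest n for which the bound above is at most beta."""
    return math.ceil(math.log(2 * k / beta) / (2 * alpha ** 2))

if __name__ == "__main__":
    n = sample_size(k=100, alpha=0.05, beta=0.05)
    print(n, failure_bound(100, 0.05, n))
```

The dependence on k is only logarithmic, which is what makes the non-adaptive case cheap.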
Adaptive queries?
What if q_i depends on q_1, ..., q_{i−1}, and on the estimates for q_1(D), ..., q_{i−1}(D)? Then q_i is chosen based on q_1(X), ..., q_{i−1}(X).
E.g. q_1 = "smoker and male?", q_2 = "smoker and female?" → if the split is even, ask "smoker and ≥ 35 yrs?".
Suppose we ask k random predicates q_1, ..., q_k and, from the answers, we learn X. If D is uniform on X, the next query q_{k+1} can then be tailored to X so that q_{k+1}(X) is far from q_{k+1}(D).
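A small simulation makes the danger of adaptivity concrete. This is a standard illustration and is not taken from the slides: the data distribution (uniform bits with an independent uniform label), the helper names, and the parameters n, d are all assumptions chosen for the sketch. The analyst first asks d counting queries q_j(x, y) = 1[x_j = y], then builds one more query from the answers.

```python
import random

def draw(rng, d):
    """One data point: d uniform bits plus an independent uniform label."""
    return [rng.getrandbits(1) for _ in range(d)], rng.getrandbits(1)

def final_query(signs, point):
    """q_{k+1}: 1 if the sign-weighted vote of the features matches the label."""
    x, y = point
    score = sum(s if xj == y else -s for s, xj in zip(signs, x))
    return 1 if score > 0 else 0

def run(n=101, d=401, fresh=1000, seed=0):
    rng = random.Random(seed)
    X = [draw(rng, d) for _ in range(n)]
    # Round 1: d counting queries q_j(x, y) = 1[x_j == y], answered on the sample.
    answers = [sum(x[j] == y for x, y in X) / n for j in range(d)]
    # The analyst keeps only the direction of each deviation from 1/2.
    signs = [1 if a > 0.5 else -1 for a in answers]
    on_sample = sum(final_query(signs, p) for p in X) / n
    # q_{k+1}(D) is exactly 1/2 by symmetry; estimate it on fresh data.
    on_fresh = sum(final_query(signs, draw(rng, d)) for _ in range(fresh)) / fresh
    return on_sample, on_fresh

if __name__ == "__main__":
    print(run())
```

Every first-round query has q_j(D) = 1/2, so the answers carry no information about D, only about X. The final query nevertheless scores well above 1/2 on the sample while staying near 1/2 on fresh data.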
A simple solution
Break X = {x_1, ..., x_n} into k disjoint parts X^1, ..., X^k of n/k points each.
Answer q_1(D) by q_1(X^1), q_2(D) by q_2(X^2), ..., q_k(D) by q_k(X^k).
To get error α with probability 1 − β, need n ≥ k ln(2k/β) / (2α²).
Can we do better?
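Sample splitting can be sketched in a few lines. The class and function names below are hypothetical, and the sample-size formula assumes the Hoeffding-plus-union-bound calculation applied to parts of n/k points each. The point of the design is that the i-th query can depend on earlier answers but is always evaluated on a part of the data that no earlier answer has touched.

```python
import math
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")

def split_sample(X: Sequence[T], k: int) -> List[Sequence[T]]:
    """Cut X into k disjoint consecutive parts of size len(X) // k (leftovers unused)."""
    m = len(X) // k
    if m == 0:
        raise ValueError("need at least k data points")
    return [X[i * m:(i + 1) * m] for i in range(k)]

class SplitAnswerer:
    """Answers the i-th query it receives using the i-th part only."""

    def __init__(self, X: Sequence[T], k: int):
        self.parts = split_sample(X, k)
        self.asked = 0

    def answer(self, q: Callable[[T], int]) -> float:
        if self.asked >= len(self.parts):
            raise RuntimeError("all k parts have been used")
        part = self.parts[self.asked]
        self.asked += 1
        return sum(q(x) for x in part) / len(part)

def required_n(k: int, alpha: float, beta: float) -> int:
    """k parts, each large enough for error alpha with total failure probability beta."""
    return k * math.ceil(math.log(2 * k / beta) / (2 * alpha ** 2))
```

The cost is the linear factor k in `required_n`, compared with the logarithmic dependence for non-adaptive queries.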
Transfer theorem
Theorem. Suppose M takes a dataset X and answers k adaptive queries q_1, ..., q_k. If M is (α, αβ)-DP and M is accurate on the dataset, i.e. P(∃i : |M(X)_i − q_i(X)| > α) < β,
then, for a constant C, P(∃i : |M(X)_i − q_i(D)| > Cα) < Cβ.
Here M answers q_i with M(X)_i, the analyst determines q_i from M(X)_1, ..., M(X)_{i−1}, and X ∼ D^n means X = {x_1, ..., x_n} with each x_i ∼ D independently.
Improving on the simple solution
Can get error α with ≈ √k log k / α² samples.
Simple solution: error α with ≈ k ln k / α² samples.
Gaussian noise + advanced composition: answer q_i by q_i(X) + Z_i, where Z_i is Gaussian noise, and we get (ε, δ)-DP for any ε and δ.
Transfer Thm: we need (ε, δ) = (α, αβ).
The std dev of the noise per query is ≈ √k / (εn) up to log factors, which with ε = α is ≤ α if n ≳ √k / α².
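Answering each query with Gaussian noise can be sketched as follows. The noise calibration is an assumption: it is the classical Gaussian-mechanism formula for ε < 1, applied to the vector of k counting-query answers, whose L2 sensitivity is √k / n. The lecture's own constants (via advanced composition) are not recoverable from the slides, and the class name is hypothetical.

```python
import math
import random

def gaussian_sigma(k: int, n: int, eps: float, delta: float) -> float:
    """Noise scale for k counting queries on n points.

    ASSUMPTION: classical Gaussian-mechanism calibration for eps < 1, with
    L2 sensitivity sqrt(k) / n for the vector of k answers.
    """
    return math.sqrt(2 * k * math.log(1.25 / delta)) / (eps * n)

class NoisyAnswerer:
    """Answers every query on the full dataset, plus independent Gaussian noise."""

    def __init__(self, X, sigma: float, seed=None):
        self.X = list(X)
        self.sigma = sigma
        self.rng = random.Random(seed)

    def answer(self, q) -> float:
        exact = sum(q(x) for x in self.X) / len(self.X)
        return exact + self.rng.gauss(0.0, self.sigma)

if __name__ == "__main__":
    sigma = gaussian_sigma(k=100, n=10_000, eps=0.5, delta=1e-6)
    m = NoisyAnswerer(range(10_000), sigma, seed=0)
    print(sigma, m.answer(lambda x: x % 2 == 0))
```

The noise scale grows like √k rather than k, which is where the improvement over sample splitting comes from.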
Key Lemma
Lemma. Suppose W is (ε, δ)-DP, and on input X outputs a counting query q : X → {0, 1}. Let X ∼ D^n. Then
|E[q(D) | q = W(X)] − E[q(X) | q = W(X)]| ≤ e^ε − 1 + δ.
The expectations are over the random choice of X ∼ D^n and the randomness of W.
A DP algorithm cannot find a query that distinguishes X from D.
Proof of Key Lemma
Since q : X → {0, 1},
E[q(X) | q = W(X)] = (1/n) Σ_{i=1}^n E[q(x_i) | q = W(X)] = (1/n) Σ_{i=1}^n P(q(x_i) = 1 | q = W(X)).
Take x'_i ∼ D independently from everything else, and let X' be X with x_i replaced by x'_i. Since W is (ε, δ)-DP,
P(q(x_i) = 1 | q = W(X)) ≤ e^ε P(q(x_i) = 1 | q = W(X')) + δ.
Proof part 2
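With X ∼ D^n and x'_i ∼ D, the pair (x_i, X') has the same distribution as (x'_i, X). So
P(q(x_i) = 1 | q = W(X)) ≤ e^ε P(q(x_i) = 1 | q = W(X')) + δ
  = e^ε P(q(x'_i) = 1 | q = W(X)) + δ
  = e^ε E[q(D) | q = W(X)] + δ.
Averaging over i,
E[q(X) | q = W(X)] ≤ e^ε E[q(D) | q = W(X)] + δ ≤ E[q(D) | q = W(X)] + e^ε − 1 + δ,
using q(D) ≤ 1. The other direction (≥) is analogous.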
Aside: Generalization from DP
Theorem. For any non-negative loss ℓ(θ, (x, y)), X = {(x_1, y_1), ..., (x_n, y_n)} ∼ D^n, and
L_X(θ) = (1/n) Σ_{i=1}^n ℓ(θ, (x_i, y_i)),   L_D(θ) = E_{(x,y)∼D}[ℓ(θ, (x, y))],
if θ is computed by an (ε, δ)-DP algorithm, then
E[L_D(θ)] ≤ e^ε E[L_X(θ)] + δ · max_{θ,x,y} ℓ(θ, (x, y)).
→ Almost the same proof as the lemma (exercise).
Population loss is not much more than empirical loss for a DP algorithm.
A simpler transference theorem
Theorem. If the mechanism M is (ε, δ)-DP and satisfies
E[max_i |q_i(X) − M(X)_i|] ≤ α,
then E[max_i |q_i(D) − M(X)_i|] ≤ α + e^ε − 1 + δ.
Here q_1, ..., q_k are adaptively chosen based on M's answers, and X ∼ D^n.
Proof
Trick: Suppose that if q_i is asked, so is 1 − q_i, and it is answered by 1 − M(X)_i. Then
max_{i=1}^k |q_i(D) − M(X)_i| = max_{i=1}^k (q_i(D) − M(X)_i).
Define W: W(X) simulates the adaptive interaction between M and the analyst on the queries q_1, ..., q_k, and outputs a q_i that maximizes q_i(D) − M(X)_i, i.e. the query whose answer has the max error.
M is (ε, δ)-DP ⇒ W is (ε, δ)-DP (post-processing).
Proof pt 2
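Let q_{i*} = W(X) be the query output by W. Then
E[max_i (q_i(D) − M(X)_i)] = E[q_{i*}(D) − M(X)_{i*}]
  ≤ e^ε − 1 + δ + E[q_{i*}(X) − M(X)_{i*}]   (by the lemma)
  ≤ e^ε − 1 + δ + E[max_j |q_j(X) − M(X)_j|]
  ≤ e^ε − 1 + δ + α.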
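To get a feel for the final bound, the following sketch evaluates α + e^ε − 1 + δ for given parameters. It assumes that (ε, δ) are the privacy parameters of M and that α is its expected maximum error on the sample; the function name is illustrative.

```python
import math

def population_error_bound(alpha: float, eps: float, delta: float) -> float:
    """alpha + e^eps - 1 + delta, the bound on E[max_i |q_i(D) - M(X)_i|]."""
    # expm1(eps) computes e^eps - 1 accurately for small eps.
    return alpha + math.expm1(eps) + delta

if __name__ == "__main__":
    print(population_error_bound(alpha=0.05, eps=0.05, delta=0.001))
```

For small ε, e^ε − 1 ≈ ε, so choosing ε = α gives a population error of roughly 2α + δ.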