
CSC2412: Adaptive Data Analysis via Differential Privacy, Sasho Nikolov



  1. CSC2412: Adaptive Data Analysis via Differential Privacy. Sasho Nikolov

  2. The adaptive data analysis problem

  3. Estimating population counts
     • Unknown distribution $D$ on $\mathcal{X}$, the universe of possible data points; $D$ models the population.
     • Predicates $q_1, \ldots, q_k : \mathcal{X} \to \{0, 1\}$, e.g. "smoker?" or "smoker and male?".
     Want to estimate, for all $i = 1, \ldots, k$: $q_i(D) = \mathbb{E}_{x \sim D}[q_i(x)]$, the fraction of the population satisfying $q_i$.

  4. The classical solution
     Draw a sample $X = \{x_1, \ldots, x_n\}$ iid from $D$, and let $q_i(X) = \frac{1}{n} \sum_{j=1}^n q_i(x_j)$.
     Hope that $\forall i : q_i(X) \approx q_i(D)$.
     The terms $q_i(x_j)$ are independent, take values in $\{0, 1\}$, and satisfy $\mathbb{E}[q_i(x_j)] = q_i(D)$.
     Hoeffding: $\mathbb{P}(|q_i(X) - q_i(D)| > \alpha) \le 2 e^{-2 n \alpha^2}$.
     Union bound: $\mathbb{P}(\exists i : |q_i(X) - q_i(D)| > \alpha) \le 2k e^{-2 n \alpha^2} \le \beta$ if $n \ge \frac{\ln(2k/\beta)}{2 \alpha^2}$.
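As a rough numerical check of the Hoeffding plus union bound argument, the sketch below computes the smallest $n$ with $2k e^{-2n\alpha^2} \le \beta$, that is $n \ge \ln(2k/\beta)/(2\alpha^2)$. The function names and the example parameters are illustrative assumptions, not part of the slides.

```python
import math

def union_bound_failure(k: int, n: int, alpha: float) -> float:
    """Upper bound 2k * exp(-2 n alpha^2) on P(some query is off by more than alpha)."""
    return 2 * k * math.exp(-2 * n * alpha ** 2)

def hoeffding_sample_size(k: int, alpha: float, beta: float) -> int:
    """Smallest n with 2k * exp(-2 n alpha^2) <= beta (non-adaptive queries only)."""
    return math.ceil(math.log(2 * k / beta) / (2 * alpha ** 2))

# Illustrative parameters (assumed): 100 queries, error 0.05, failure probability 0.05.
n = hoeffding_sample_size(k=100, alpha=0.05, beta=0.05)
print(n, union_bound_failure(100, n, 0.05))
```

Note the logarithmic dependence on $k$: doubling the number of non-adaptive queries adds only $\ln 2 / (2\alpha^2)$ samples.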

  5. Adaptive queries?
     What if $q_i$ depends on $q_1, \ldots, q_{i-1}$? E.g. $q_i$ is chosen based on $q_1(X), \ldots, q_{i-1}(X)$, the estimates for $q_1(D), \ldots, q_{i-1}(D)$.
     E.g. ask $q_1$ = "smokers?", then split into male and female and ask $q_2$ = "smokers and male?", or even ask about smokers aged 35 and over, and so on.
     Suppose we ask $q_1(X), \ldots, q_{k-1}(X)$ for random predicates $q_1, \ldots, q_{k-1}$, and invert the answers to learn $X$. We then set $q_k(x) = 1 \iff x \in X$, so that $q_k(X) = 1$. But if $D$ is uniform on $\mathcal{X}$, then $q_k(D) \le n / |\mathcal{X}|$, which is close to 0 when $|\mathcal{X}| \gg n$.
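The failure of adaptively chosen queries can be illustrated with a small sketch. It skips the inversion step and simply assumes the analyst has already learned the sample $X$; the universe size, sample size, and helper names are assumptions made for illustration.

```python
import random

def empirical_mean(q, sample):
    """q(X): average of the predicate over the sample."""
    return sum(q(x) for x in sample) / len(sample)

def population_mean_uniform(q, universe_size):
    """q(D) when D is uniform on {0, ..., universe_size - 1}."""
    return sum(q(x) for x in range(universe_size)) / universe_size

rng = random.Random(0)
universe_size = 10_000   # assumed |X|
n = 100                  # assumed sample size
sample = [rng.randrange(universe_size) for _ in range(n)]

# Adaptive query: the indicator of the sample itself, chosen AFTER seeing the data.
seen = set(sample)
def q_k(x):
    return 1 if x in seen else 0

emp = empirical_mean(q_k, sample)                  # every sample point satisfies q_k
pop = population_mean_uniform(q_k, universe_size)  # at most n / universe_size
print(emp, pop)
```

The empirical answer is 1 while the population value is at most $n/|\mathcal{X}|$, so no amount of concentration for fixed queries protects against a query that depends on the sample.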

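  6. A simple solution
     Break $X$ into $k$ pieces $X^1, \ldots, X^k$ of $n/k$ points each: $X^1 = \{x_1, \ldots, x_{n/k}\}$, $X^2 = \{x_{n/k+1}, \ldots, x_{2n/k}\}$, and so on.
     Answer $q_1$ by $q_1(X^1)$, $q_2$ by $q_2(X^2)$, ..., $q_k$ by $q_k(X^k)$.
     Since $q_i$ depends only on $X^1, \ldots, X^{i-1}$, it is independent of $X^i$, so the classical analysis applies to each piece.
     With probability $1 - \beta$ we get error at most $\alpha$ if $\frac{n}{k} \ge \frac{\ln(2k/\beta)}{2\alpha^2}$, i.e. we need $n \ge \frac{k \ln(2k/\beta)}{2\alpha^2}$.
     Can we do better?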
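A minimal sketch of the sample-splitting idea: each query is answered on its own fresh piece of the sample, so it may depend on all earlier answers. The distribution (uniform on $\{0, \ldots, 9\}$), the threshold queries, and all names here are assumptions chosen only to make the example concrete.

```python
import random

def split_sample(sample, k):
    """Split the sample into k disjoint consecutive pieces of len(sample) // k points."""
    m = len(sample) // k
    return [sample[i * m:(i + 1) * m] for i in range(k)]

def answer_by_splitting(sample, k, choose_query):
    """Answer k adaptive queries, using the i-th piece only for the i-th query."""
    answers = []
    for piece in split_sample(sample, k):
        q = choose_query(answers)  # may depend on all earlier answers
        answers.append(sum(q(x) for x in piece) / len(piece))
    return answers

rng = random.Random(1)
n, k = 20_000, 4
sample = [rng.randrange(10) for _ in range(n)]  # D uniform on {0, ..., 9} (assumed)

thresholds = []
def choose_query(prev_answers):
    # Hypothetical analyst: picks the next threshold based on the last answer.
    t = 5 if not prev_answers or prev_answers[-1] >= 0.5 else 3
    thresholds.append(t)
    return lambda x, t=t: 1 if x < t else 0

answers = answer_by_splitting(sample, k, choose_query)
# For the threshold query "x < t", the population value is t / 10.
errors = [abs(a - t / 10) for a, t in zip(answers, thresholds)]
print(answers, errors)
```

The price of this approach is visible in `split_sample`: each query sees only $n/k$ points, which is where the factor $k$ in the sample complexity comes from.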

  7. Transfer theorem
     Setup: $M(X)_i$ answers $q_i$, and each $q_i$ is determined by the analyst from the earlier answers $M(X)_1, \ldots, M(X)_{i-1}$.
     Theorem. Suppose $M$ takes a dataset $X$ and answers $k$ adaptive queries $q_1, \ldots, q_k$. If
     1. $\forall X \in \mathcal{X}^n$, $\mathbb{P}(\exists i : |q_i(X) - M(X)_i| > \alpha) < \alpha\beta$ (accurate on the dataset),
     2. $M$ is $(\alpha, \alpha\beta)$-DP,
     then, for a constant $C$, $\mathbb{P}(\exists i : |M(X)_i - q_i(D)| > C\alpha) < C\beta$,
     where $X \sim D^n$, i.e. $X = \{x_1, \ldots, x_n\}$ with each $x_i$ drawn independently from $D$.

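  8. Improving on the simple solution
     Simple solution: error $\alpha$ with $\approx \frac{k \log k}{\alpha^2}$ samples.
     Can get error $\alpha$ with $\approx \frac{\sqrt{k} \log k}{\alpha^2}$ samples.
     Gaussian noise + advanced composition: answer $q_i$ with $q_i(X) + Z_i$, where $Z_i \sim N(0, \sigma^2)$, and we get $(\varepsilon, \delta)$-DP.
     For the transfer theorem we need $(\alpha, \alpha\beta)$-DP, i.e. $\varepsilon = \alpha$ and $\delta = \alpha\beta$, and the standard deviation $\sigma$ per query must be small enough that the noisy answers are still $\alpha$-accurate on the sample.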
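The Gaussian-noise approach can be sketched as follows. The calibration of $\sigma$ used here is the classical Gaussian-mechanism formula applied to the vector of $k$ answers, whose $\ell_2$ sensitivity is $\sqrt{k}/n$; this choice, its constants, and all names are assumptions for illustration, not the calibration from the slides and not a verified privacy guarantee.

```python
import math
import random

def gaussian_sigma(k, n, eps, delta):
    """Assumed calibration: sqrt(2 k ln(1.25/delta)) / (eps n). Constants are illustrative."""
    return math.sqrt(2 * k * math.log(1.25 / delta)) / (eps * n)

class NoisyAnswerer:
    """Answers each counting query on the full sample plus independent Gaussian noise."""

    def __init__(self, sample, sigma, seed=0):
        self.sample = sample
        self.sigma = sigma
        self.rng = random.Random(seed)

    def answer(self, q):
        exact = sum(q(x) for x in self.sample) / len(self.sample)
        return exact + self.rng.gauss(0.0, self.sigma)

# Illustrative parameters (assumed); eps = alpha and delta = alpha * beta as in the transfer theorem.
alpha, beta = 0.1, 0.05
n, k = 10_000, 100
sigma = gaussian_sigma(k, n, eps=alpha, delta=alpha * beta)

data_rng = random.Random(2)
sample = [data_rng.randrange(10) for _ in range(n)]
mech = NoisyAnswerer(sample, sigma)
q = lambda x: 1 if x < 5 else 0
noisy = mech.answer(q)
exact = sum(q(x) for x in sample) / n
print(sigma, exact, noisy)
```

Unlike sample splitting, every query is answered on all $n$ points; under this assumed calibration the noise scale grows only like $\sqrt{k}$, which is the source of the improved dependence on $k$.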

  9. Key Lemma
     Lemma. Suppose $W$ is $(\varepsilon, \delta)$-DP, and on input $X$ outputs a counting query $q : \mathcal{X} \to \{0, 1\}$. Let $X \sim D^n$. Then
     $\left| \mathbb{E}[q(D) \mid q = W(X)] - \mathbb{E}[q(X) \mid q = W(X)] \right| \le e^{\varepsilon} - 1 + \delta$,
     where the expectation is over the random choice of $X \sim D^n$ and the randomness of $W$.
     In words: a DP algorithm cannot find a query that distinguishes $X$ from $D$.

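  10. Proof of Key Lemma
      Since $q : \mathcal{X} \to \{0, 1\}$,
      $\mathbb{E}[q(X) \mid q = W(X)] = \frac{1}{n} \sum_{i=1}^n \mathbb{E}[q(x_i) \mid q = W(X)] = \frac{1}{n} \sum_{i=1}^n \mathbb{P}(q(x_i) = 1 \mid q = W(X))$.
      Take $x_i' \sim D$ independently from everything else, and let $X' = \{x_1, \ldots, x_{i-1}, x_i', x_{i+1}, \ldots, x_n\}$.
      Since $X$ and $X'$ differ in one point, by $(\varepsilon, \delta)$-DP of $W$,
      $\mathbb{P}(q(x_i) = 1 \mid q = W(X)) \le e^{\varepsilon} \, \mathbb{P}(q(x_i) = 1 \mid q = W(X')) + \delta$.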

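  11. Proof part 2
      $(x_i, X')$ has the same distribution as $(x_i', X)$, where $X = \{x_1, \ldots, x_n\}$ and $X' = \{x_1, \ldots, x_{i-1}, x_i', x_{i+1}, \ldots, x_n\}$. So
      $\mathbb{P}(q(x_i) = 1 \mid q = W(X)) \le e^{\varepsilon} \, \mathbb{P}(q(x_i) = 1 \mid q = W(X')) + \delta = e^{\varepsilon} \, \mathbb{P}(q(x_i') = 1 \mid q = W(X)) + \delta = e^{\varepsilon} \, \mathbb{E}[q(D) \mid q = W(X)] + \delta$.
      Averaging over $i$: $\mathbb{E}[q(X) \mid q = W(X)] \le e^{\varepsilon} \, \mathbb{E}[q(D) \mid q = W(X)] + \delta$, and so
      $\mathbb{E}[q(X) \mid q = W(X)] - \mathbb{E}[q(D) \mid q = W(X)] \le (e^{\varepsilon} - 1) \, \mathbb{E}[q(D) \mid q = W(X)] + \delta \le e^{\varepsilon} - 1 + \delta$, using $q(D) \le 1$.
      The other direction is analogous.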

  12. Aside: Generalization from DP
      Theorem. For any non-negative loss $\ell(\theta, (x, y))$, $X = \{(x_1, y_1), \ldots, (x_n, y_n)\} \sim D^n$, and
      $L_X(\theta) = \frac{1}{n} \sum_{i=1}^n \ell(\theta, (x_i, y_i))$, $L_D(\theta) = \mathbb{E}_{(x, y) \sim D}[\ell(\theta, (x, y))]$,
      if $\theta$ is computed by an $(\varepsilon, \delta)$-DP algorithm, then
      $\mathbb{E}[L_D(\theta)] \le e^{\varepsilon} \, \mathbb{E}[L_X(\theta)] + \delta \max_{\theta, x, y} \ell(\theta, (x, y))$.
      The proof is almost the same as that of the Key Lemma (exercise). In words: for a DP algorithm, the population loss is not much more than the empirical loss.

  13. A simpler transference theorem
      Theorem. If the mechanism $M$ satisfies
      1. $\forall X \in \mathcal{X}^n$ and all sequences of adaptive queries $q_1, \ldots, q_k$: $\mathbb{E}[\max_i |q_i(X) - M(X)_i|] \le \alpha$,
      2. $M$ is $(\varepsilon, \delta)$-DP,
      then $\mathbb{E}[\max_i |q_i(D) - M(X)_i|] \le \alpha + e^{\varepsilon} - 1 + \delta$,
      where $X \sim D^n$ and $q_1, \ldots, q_k$ are chosen adaptively based on $M(X)_1, \ldots, M(X)_{k-1}$.
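To get a feel for the guarantee $\alpha + e^{\varepsilon} - 1 + \delta$, the following small sketch evaluates it for a few parameter settings. The function name and the parameter values are assumptions for illustration only.

```python
import math

def transfer_bound(alpha: float, eps: float, delta: float) -> float:
    """Population-error bound alpha + e^eps - 1 + delta from the simpler transference theorem."""
    return alpha + math.expm1(eps) + delta  # expm1(eps) = e^eps - 1, accurate for small eps

# Illustrative settings (assumed): sample accuracy alpha, privacy parameters (eps, delta).
for alpha, eps, delta in [(0.05, 0.05, 0.01), (0.01, 0.01, 0.001), (0.05, 1.0, 0.01)]:
    print(alpha, eps, delta, transfer_bound(alpha, eps, delta))
```

For small $\varepsilon$ the term $e^{\varepsilon} - 1$ is approximately $\varepsilon$, so the bound is roughly $\alpha + \varepsilon + \delta$; for $\varepsilon$ near 1 the bound exceeds 1 and says nothing about counting queries, whose values lie in $[0, 1]$.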

  14. Proof
      Trick: Suppose that if $q_i$ is asked, so is $1 - q_i$, and it is answered by $1 - M(X)_i$. Note that $(1 - q_i)(X) = 1 - q_i(X)$ and $(1 - q_i)(D) = 1 - q_i(D)$. Then
      $\max_{i=1}^k |q_i(D) - M(X)_i| = \max_{i=1}^k \left( q_i(D) - M(X)_i \right)$,
      because $|q_i(D) - M(X)_i| = \max\{ q_i(D) - M(X)_i, \; (1 - q_i(D)) - (1 - M(X)_i) \}$.
      Define $W$: on input $X$, it simulates the interaction between $M(X)$ and the analyst's adaptive queries $q_1, \ldots, q_k$, and outputs $q_{i^*}$, where $i^* = \arg\max_j \left( q_j(D) - M(X)_j \right)$.
      $M$ is $(\varepsilon, \delta)$-DP, so by post-processing $W$ is $(\varepsilon, \delta)$-DP.

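  15. Proof pt 2
      With $i^* = \arg\max_i \left( q_i(D) - M(X)_i \right)$ and $W(X) = q_{i^*}$:
      $\mathbb{E}[\max_i (q_i(D) - M(X)_i)] = \mathbb{E}[q_{i^*}(D) - M(X)_{i^*}]$
      $= \mathbb{E}[q_{i^*}(D) - q_{i^*}(X)] + \mathbb{E}[q_{i^*}(X) - M(X)_{i^*}]$
      $\le (e^{\varepsilon} - 1 + \delta) + \alpha$,
      where the first term is bounded by the Key Lemma applied to $W$, and the second by the accuracy of $M$ on the dataset.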
