SLIDE 1

Towards Practical Differentially Private Convex Optimization

ROGER IYENGAR

CARNEGIE MELLON UNIVERSITY

JOSEPH P. NEAR

UNIVERSITY OF VERMONT

DAWN SONG

UNIVERSITY OF CALIFORNIA, BERKELEY

ABHRADEEP THAKURTA

UNIVERSITY OF CALIFORNIA, SANTA CRUZ

LUN WANG

UNIVERSITY OF CALIFORNIA, BERKELEY

OM THAKKAR

BOSTON UNIVERSITY

SLIDE 2

Contributions

  • New Algorithm for Differentially Private Convex Optimization: Approximate Minima Perturbation (AMP)
    • Can leverage any off-the-shelf optimizer
    • Works for all convex loss functions
    • Has a competitive hyperparameter-free variant
  • Broad Empirical Study
    • 6 state-of-the-art techniques
    • 2 models: Logistic Regression and Huber SVM
    • 13 datasets: 9 public (4 high-dimensional), 4 real-world use cases
  • Open-source repo: https://github.com/sunblaze-ucb/dpml-benchmark
SLIDE 3

This Talk

  • Why Privacy for Learning?
  • Background
    • Differential Privacy (DP)
    • Convex Optimization
  • Approximate Minima Perturbation (AMP)
  • Broad Empirical Study
SLIDE 4

Why Privacy for Learning?

[Figure: sensitive data D is the input to training algorithm A; the trained model θ is the output]

  • Models can leak information about training data
    • Membership inference attacks [Shokri Stronati Song Shmatikov '17, Carlini Liu Kos Erlingsson Song '18, Melis Song Cristofaro Shmatikov '18]
    • Model inversion attacks [Fredrikson Jha Ristenpart '15, Wu Fredrikson Jha Naughton '16]
  • Solution?
SLIDE 5

Differential Privacy [Dwork McSherry Nissim Smith '06]

[Figure: dataset D = {Alice, Bob, Cathy, Doug, Emily, Om}; a randomized algorithm A maps D to outcomes θ ∈ Θ, giving a distribution Pr(A(D) = θ)]

SLIDE 6

Differential Privacy [Dwork McSherry Nissim Smith '06]

[Figure: neighboring dataset D′ adds Felix to D; the algorithm A maps D′ to the same outcome space θ ∈ Θ]

SLIDE 7

Differential Privacy [Dwork McSherry Nissim Smith '06]

[Figure: the distributions Pr(A(D) = θ) and Pr(A(D′) = θ) differ only by a small amount, so Felix's presence is hard to detect]

SLIDE 8

Differential Privacy [Dwork McSherry Nissim Smith '06]

  • Privacy parameters: (ε, δ)
  • A randomized algorithm A : 𝒟^n → Θ is (ε, δ)-DP if
    • for all neighboring datasets D, D′ ∈ 𝒟^n, i.e., dist(D, D′) = 1
    • for all sets of outcomes S ⊆ Θ, we have
      Pr(A(D) ∈ S) ≤ e^ε · Pr(A(D′) ∈ S) + δ
  • ε: Multiplicative change. Typically, ε = O(1)
  • δ: Additive change. Typically, δ = O(1/n²)

SLIDE 9

Convex Optimization

  • Input:
    • Dataset D ∈ 𝒟^n
    • Loss function L(θ; D), where
      • θ ∈ ℝ^d is a model
      • Loss L is convex in the first parameter θ
  • Goal: Output model θ* such that
    θ* ∈ argmin_{θ ∈ ℝ^d} L(θ; D)
  • Applications: Machine Learning, Deep Learning, Collaborative Filtering, etc.

[Figure: plot of L(θ; D) against θ, with the minimizer θ* marked]

SLIDE 10

DP Convex Optimization - Prior Work

[Figure: sensitive data D is the input to training algorithm A; the trained model θ is the output]

Objective Perturbation

[Chaudhuri Monteleoni Sarwate’11, Kifer Smith Thakurta’12, Jain Thakurta’14]

DP GD/SGD

[Song Chaudhuri Sarwate’13, Bassily Smith Thakurta’14, Abadi Chu Goodfellow McMahan Mironov Talwar Zhang’16]

DP Frank Wolfe

[Talwar Thakurta Zhang’14]

Output Perturbation

[CMS’11, KST’12, JT’14]

DP Permutation-based SGD

[Wu Li Kumar Chaudhuri Jha Naughton '17]

  • Objective and Output Perturbation require the exact minima of the loss
  • Gradient-based methods require a custom optimizer
SLIDE 11
Approximate Minima Perturbation (AMP)

  • Input:
    • Dataset D, loss function L(θ; D)
    • Privacy parameters: p = (ε, δ)
    • Gradient norm bound γ
  • Algorithm (high-level):
    1. Split the privacy budget into two parts p₁ and p₂
    2. Perturb the loss: L_obj(θ; D) = L(θ; D) + Noise(θ; p₁)

[Figure: plot of L(θ; D) and the perturbed L_obj(θ; D), whose minimizer θ_obj differs from θ*]

Similar to standard Objective Perturbation [KST'12]

SLIDE 12
Approximate Minima Perturbation (AMP)

  • Input:
    • Dataset D, loss function L(θ; D)
    • Privacy parameters: p = (ε, δ)
    • Gradient norm bound γ
  • Algorithm (high-level):
    1. Split the privacy budget into two parts p₁ and p₂
    2. Perturb the loss: L_obj(θ; D) = L(θ; D) + Noise(θ; p₁)
    3. Let θ_approx = θ s.t. ‖∇L_obj(θ; D)‖₂ ≤ γ
    4. Output θ_approx + Noise(p₂, γ)

[Figure: plot of L_obj(θ; D); any point θ_approx with ‖∇L_obj(θ; D)‖₂ ≤ γ is accepted as an approximate minimizer]

Similar to standard Objective Perturbation [KST'12]
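The four steps can be sketched schematically in Python. This is a toy illustration, not the paper's algorithm: the noise scales `sigma1` and `sigma2` and the fixed step size stand in for values that would be calibrated from the split privacy budget and the loss's sensitivity, and all helper names are hypothetical:

```python
import numpy as np

def amp_sketch(loss_grad, d, gamma, sigma1, sigma2, rng):
    """Schematic AMP: (2) perturb the objective with a random linear term,
    (3) run a first-order method until the perturbed gradient norm is <= gamma,
    (4) perturb the approximate minimizer before releasing it.

    loss_grad(theta) -> (loss, gradient); sigma1/sigma2 stand in for noise
    scales derived from the split privacy budget (not computed here)."""
    b = rng.normal(0.0, sigma1, size=d)            # step 2: objective noise
    theta = np.zeros(d)
    for _ in range(10_000):                        # step 3: any optimizer works
        _, g = loss_grad(theta)
        g = g + b                                  # gradient of the perturbed loss
        if np.linalg.norm(g) <= gamma:             # approximate-minima stopping rule
            break
        theta -= 0.1 * g
    return theta + rng.normal(0.0, sigma2, size=d) # step 4: output noise

# Toy quadratic loss L(theta) = 0.5 * ||theta - c||^2 with gradient theta - c
c = np.array([1.0, -2.0, 0.5])
lg = lambda th: (0.5 * float(np.sum((th - c) ** 2)), th - c)
theta_priv = amp_sketch(lg, d=3, gamma=1e-3, sigma1=0.1, sigma2=0.1,
                        rng=np.random.default_rng(0))
print(theta_priv)
```

Because the stopping rule only needs a small gradient norm rather than the exact minimum, any off-the-shelf first-order optimizer can take the place of the loop above.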

SLIDE 13

Utility Guarantees

  • Let θ* minimize L(θ; D), and set the regularization parameter Λ = Θ(√d / (εn‖θ*‖)).
  • Objective Perturbation [KST'12]: If θ_obj is the output of obj. pert.:
    E[L(θ_obj; D) − L(θ*; D)] = O(√d ‖θ*‖ / (εn)).
  • AMP (adapted from [KST'12]): For output θ_AMP:
    E[L(θ_AMP; D) − L(θ*; D)] = O(√d ‖θ*‖ / (εn) + ‖θ*‖γn).
  • For γ = O(1/n²), the utility of AMP is asymptotically the same as that of Obj. Pert.
  • Private PSGD [WLK⁺17]: For output θ_PSGD and model space radius R:
    E[L(θ_PSGD; D) − L(θ*; D)] = O(√d R / (ε√n)).
  • For γ = O(1/n²), the utility of AMP has a better dependence on n than Private PSGD.
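In this notation, the γ = O(1/n²) claim follows in one line, since ε = O(1) and d ≥ 1 make the extra AMP term lower order:

```latex
\[
\frac{\sqrt{d}\,\lVert\theta^*\rVert}{\varepsilon n}
+ \lVert\theta^*\rVert\,\gamma\,n
\;=\;
\frac{\sqrt{d}\,\lVert\theta^*\rVert}{\varepsilon n}
+ O\!\left(\frac{\lVert\theta^*\rVert}{n}\right)
\;=\;
O\!\left(\frac{\sqrt{d}\,\lVert\theta^*\rVert}{\varepsilon n}\right)
\quad\text{for } \gamma = O(1/n^2),\ \varepsilon = O(1),\ d \ge 1,
\]
```

matching the Objective Perturbation bound, whereas the Private PSGD bound O(√d R / (ε√n)) decays only as 1/√n.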

SLIDE 14

AMP - Takeaways

  • Can leverage any off-the-shelf optimizer
  • Works for all standard convex loss functions
  • For γ = O(1/n²), the utility of AMP:
    • is asymptotically the same as Objective Perturbation [KST'12]
    • has a better dependence on n than Private PSGD [WLK⁺17]
  • γ = 1/n² is achievable using standard Python libraries
SLIDE 15

Empirical Evaluation

  • Algorithms evaluated:
    • Approximate Minima Perturbation (AMP)
    • Private SGD [BST'14, ACG⁺17]
    • Private Frank-Wolfe (FW) [TTZ'14]
    • Private Permutation-based SGD (PSGD) [WLK⁺17]
    • Private Strongly-convex (SC) PSGD [WLK⁺17]
    • Hyperparameter-free (HF) AMP
  • Splitting the privacy budget: We provide a schedule for low- and high-dimensional data by evaluating AMP only on synthetic data
  • Non-private (NP) Baseline
SLIDE 16

Empirical Evaluation

  • Loss functions considered:
    • Logistic loss
    • Huber SVM
  • Procedure:
    • 80/20 train/test random split
    • Fix δ = 1/n², and vary ε from 0.01 to 10
    • Measure accuracy of the final tuned* private model over the test set
    • Report the mean accuracy and std. dev. over 10 independent runs

*Does not apply to Hyperparameter-free AMP.

SLIDE 17

Synthetic Datasets

[Figures: accuracy vs. ε on Synthetic-L (10k × 20) and Synthetic-H (2k × 2k); legend: NP Baseline, AMP, HF AMP, Private SGD, Private PSGD, Private SC PSGD, Private FW]

  • Synthetic-H is high-dimensional, but low-rank
  • Private Frank-Wolfe performs the best on Synthetic-H
SLIDE 18

High-dimensional Datasets

[Figures: accuracy vs. ε on Real-sim (72k × 21k) and RCV-1 (50k × 47k); legend: NP Baseline, AMP, HF AMP, Private SGD, Private PSGD, Private SC PSGD, Private FW]

  • Both variants of AMP almost always provide the best performance

SLIDE 19

Real-world Use Cases (Uber)

[Figures: accuracy vs. ε on Dataset 1 (4m × 23) and Dataset 2 (18m × 294); legend: NP Baseline, AMP, HF AMP, Private SGD, Private PSGD, Private SC PSGD, Private FW]

  • DP as a regularizer [BST'14, Dwork Feldman Hardt Pitassi Reingold Roth '15]
  • Even for ε = 10⁻², the accuracy of AMP is close to the non-private baseline

SLIDE 20

Conclusions

  • For large datasets, the cost of privacy is low
    • Private model is within 4% accuracy of the non-private one for ε = 0.01, and within 2% for ε = 0.1
  • AMP almost always provides the best accuracy, and is easily deployable in practice
  • Hyperparameter-free AMP is competitive w.r.t. tuned state-of-the-art private algorithms
  • Open-source repo: https://github.com/sunblaze-ucb/dpml-benchmark

Thank You!