SLIDE 1

Towards Practical Differentially Private Convex Optimization

ROGER IYENGAR

CARNEGIE MELLON UNIVERSITY

JOSEPH P. NEAR

UNIVERSITY OF VERMONT

DAWN SONG

UNIVERSITY OF CALIFORNIA, BERKELEY

ABHRADEEP THAKURTA

UNIVERSITY OF CALIFORNIA, SANTA CRUZ

LUN WANG

UNIVERSITY OF CALIFORNIA, BERKELEY

OM THAKKAR

BOSTON UNIVERSITY

SLIDE 2

Contributions

  • New Algorithm for Differentially Private Convex Optimization: Approximate Minima Perturbation (AMP)
    • Can leverage any off-the-shelf optimizer
    • Works for all convex loss functions
    • Has a competitive hyperparameter-free variant
  • Broad Empirical Study
    • 6 state-of-the-art techniques
    • 2 models: Logistic Regression and Huber SVM
    • 13 datasets: 9 public (4 high-dimensional), 4 real-world use cases
  • Open-source repo: https://github.com/sunblaze-ucb/dpml-benchmark
SLIDE 3

This Talk

  • Why Privacy for Learning?
  • Background
    • Differential Privacy (DP)
    • Convex Optimization
  • Approximate Minima Perturbation (AMP)
  • Broad Empirical Study
SLIDE 4

Why Privacy for Learning?

[Figure: sensitive data D is the input to training algorithm A; the trained model θ is the output]

  • Models can leak information about training data
    • Membership inference attacks [Shokri Stronati Song Shmatikov '17, Carlini Liu Kos Erlingsson Song '18, Melis Song Cristofaro Shmatikov '18]
    • Model inversion attacks [Fredrikson Jha Ristenpart '15, Wu Fredrikson Jha Naughton '16]
  • Solution?
SLIDE 5

Differential Privacy [Dwork McSherry Nissim Smith '06]

[Figure: dataset D = {Alice, Bob, Cathy, Doug, Emily, Om}; a randomized algorithm A maps D to outcomes θ ∈ Θ, giving a distribution Pr(A(D) = θ)]

SLIDE 6

Differential Privacy [Dwork McSherry Nissim Smith '06]

[Figure: neighboring dataset D′ adds Felix to D; the algorithm A maps D′ to the same outcome space θ ∈ Θ]

SLIDE 7

Differential Privacy [Dwork McSherry Nissim Smith '06]

[Figure: the distributions Pr(A(D) = θ) and Pr(A(D′) = θ) differ only by a small amount, so Felix's presence is hard to detect]

SLIDE 8

Differential Privacy [Dwork McSherry Nissim Smith '06]

  • Privacy parameters: (ε, δ)
  • A randomized algorithm A : 𝒟^n → Θ is (ε, δ)-DP if
    • for all neighboring datasets D, D′ ∈ 𝒟^n, i.e., dist(D, D′) = 1
    • for all sets of outcomes S ⊆ Θ, we have
      Pr(A(D) ∈ S) ≤ e^ε · Pr(A(D′) ∈ S) + δ
  • ε: Multiplicative change. Typically, ε = O(1)
  • δ: Additive change. Typically, δ = O(1/n²)

SLIDE 9

Convex Optimization

  • Input:
    • Dataset D ∈ 𝒟^n
    • Loss function L(θ; D), where
      • θ ∈ ℝ^d is a model
      • Loss L is convex in the first parameter θ
  • Goal: Output model θ* such that
    θ* ∈ argmin_{θ ∈ ℝ^d} L(θ; D)
  • Applications: Machine Learning, Deep Learning, Collaborative Filtering, etc.

[Figure: plot of L(θ; D) against θ, with the minimizer θ* marked]

SLIDE 10

DP Convex Optimization - Prior Work

[Figure: sensitive data D is the input to training algorithm A; the trained model θ is the output]

Objective Perturbation

[Chaudhuri Monteleoni Sarwate’11, Kifer Smith Thakurta’12, Jain Thakurta’14]

DP GD/SGD

[Song Chaudhuri Sarwate’13, Bassily Smith Thakurta’14, Abadi Chu Goodfellow McMahan Mironov Talwar Zhang’16]

DP Frank Wolfe

[Talwar Thakurta Zhang’14]

Output Perturbation

[CMS’11, KST’12, JT’14]

DP Permutation-based SGD

[Wu Li Kumar Chaudhuri Jha Naughton '17]

  • Objective and Output Perturbation require the exact minima of the loss
  • Gradient-based methods require a custom optimizer
SLIDE 11
Approximate Minima Perturbation (AMP)

  • Input:
    • Dataset D, loss function L(θ; D)
    • Privacy parameters: p = (ε, δ)
    • Gradient norm bound γ
  • Algorithm (high-level):
    1. Split the privacy budget into two parts p₁ and p₂
    2. Perturb the loss: L_obj(θ; D) = L(θ; D) + Noise(θ; p₁)

[Figure: plot of L(θ; D) and the perturbed L_obj(θ; D), whose minimizer θ_obj differs from θ*]

Similar to standard Objective Perturbation [KST'12]

SLIDE 12
Approximate Minima Perturbation (AMP)

  • Input:
    • Dataset D, loss function L(θ; D)
    • Privacy parameters: p = (ε, δ)
    • Gradient norm bound γ
  • Algorithm (high-level):
    1. Split the privacy budget into two parts p₁ and p₂
    2. Perturb the loss: L_obj(θ; D) = L(θ; D) + Noise(θ; p₁)
    3. Let θ_approx = θ s.t. ‖∇L_obj(θ; D)‖₂ ≤ γ
    4. Output θ_approx + Noise(p₂, γ)

[Figure: plot of L_obj(θ; D); any point θ_approx with ‖∇L_obj(θ; D)‖₂ ≤ γ is accepted as an approximate minimizer]

Similar to standard Objective Perturbation [KST'12]
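The four steps can be sketched schematically in Python. This is a toy illustration, not the paper's algorithm: the noise scales `sigma1` and `sigma2` and the fixed step size stand in for values that would be calibrated from the split privacy budget and the loss's sensitivity, and all helper names are hypothetical:

```python
import numpy as np

def amp_sketch(loss_grad, d, gamma, sigma1, sigma2, rng):
    """Schematic AMP: (2) perturb the objective with a random linear term,
    (3) run a first-order method until the perturbed gradient norm is <= gamma,
    (4) perturb the approximate minimizer before releasing it.

    loss_grad(theta) -> (loss, gradient); sigma1/sigma2 stand in for noise
    scales derived from the split privacy budget (not computed here)."""
    b = rng.normal(0.0, sigma1, size=d)            # step 2: objective noise
    theta = np.zeros(d)
    for _ in range(10_000):                        # step 3: any optimizer works
        _, g = loss_grad(theta)
        g = g + b                                  # gradient of the perturbed loss
        if np.linalg.norm(g) <= gamma:             # approximate-minima stopping rule
            break
        theta -= 0.1 * g
    return theta + rng.normal(0.0, sigma2, size=d) # step 4: output noise

# Toy quadratic loss L(theta) = 0.5 * ||theta - c||^2 with gradient theta - c
c = np.array([1.0, -2.0, 0.5])
lg = lambda th: (0.5 * float(np.sum((th - c) ** 2)), th - c)
theta_priv = amp_sketch(lg, d=3, gamma=1e-3, sigma1=0.1, sigma2=0.1,
                        rng=np.random.default_rng(0))
print(theta_priv)
```

Because the stopping rule only needs a small gradient norm rather than the exact minimum, any off-the-shelf first-order optimizer can take the place of the loop above.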

SLIDE 13

Utility Guarantees

  • Let θ* minimize L(θ; D), and set the regularization parameter Λ = Θ(√d / (εn‖θ*‖)).
  • Objective Perturbation [KST'12]: If θ_obj is the output of obj. pert.:
    E[L(θ_obj; D) − L(θ*; D)] = O(√d ‖θ*‖ / (εn)).
  • AMP (adapted from [KST'12]): For output θ_AMP:
    E[L(θ_AMP; D) − L(θ*; D)] = O(√d ‖θ*‖ / (εn) + ‖θ*‖γn).
  • For γ = O(1/n²), the utility of AMP is asymptotically the same as that of Obj. Pert.
  • Private PSGD [WLK⁺17]: For output θ_PSGD and model space radius R:
    E[L(θ_PSGD; D) − L(θ*; D)] = O(√d R / (ε√n)).
  • For γ = O(1/n²), the utility of AMP has a better dependence on n than Private PSGD.
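In this notation, the γ = O(1/n²) claim follows in one line, since ε = O(1) and d ≥ 1 make the extra AMP term lower order:

```latex
\[
\frac{\sqrt{d}\,\lVert\theta^*\rVert}{\varepsilon n}
+ \lVert\theta^*\rVert\,\gamma\,n
\;=\;
\frac{\sqrt{d}\,\lVert\theta^*\rVert}{\varepsilon n}
+ O\!\left(\frac{\lVert\theta^*\rVert}{n}\right)
\;=\;
O\!\left(\frac{\sqrt{d}\,\lVert\theta^*\rVert}{\varepsilon n}\right)
\quad\text{for } \gamma = O(1/n^2),\ \varepsilon = O(1),\ d \ge 1,
\]
```

matching the Objective Perturbation bound, whereas the Private PSGD bound O(√d R / (ε√n)) decays only as 1/√n.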

SLIDE 14

AMP - Takeaways

  • Can leverage any off-the-shelf optimizer
  • Works for all standard convex loss functions
  • For γ = O(1/n²), the utility of AMP:
    • is asymptotically the same as Objective Perturbation [KST'12]
    • has a better dependence on n than Private PSGD [WLK⁺17]
  • γ = 1/n² is achievable using standard Python libraries
SLIDE 15

Empirical Evaluation

  • Algorithms evaluated:
    • Approximate Minima Perturbation (AMP)
    • Private SGD [BST'14, ACG⁺17]
    • Private Frank-Wolfe (FW) [TTZ'14]
    • Private Permutation-based SGD (PSGD) [WLK⁺17]
    • Private Strongly-convex (SC) PSGD [WLK⁺17]
    • Hyperparameter-free (HF) AMP
  • Splitting the privacy budget: We provide a schedule for low- and high-dimensional data by evaluating AMP only on synthetic data
  • Non-private (NP) Baseline
SLIDE 16

Empirical Evaluation

  • Loss functions considered:
    • Logistic loss
    • Huber SVM
  • Procedure:
    • 80/20 train/test random split
    • Fix δ = 1/n², and vary ε from 0.01 to 10
    • Measure accuracy of the final tuned* private model over the test set
    • Report the mean accuracy and std. dev. over 10 independent runs

*Does not apply to Hyperparameter-free AMP.

SLIDE 17

Synthetic Datasets

[Figures: accuracy vs. ε on Synthetic-L (10k × 20) and Synthetic-H (2k × 2k); legend: NP Baseline, AMP, HF AMP, Private SGD, Private PSGD, Private SC PSGD, Private FW]

  • Synthetic-H is high-dimensional, but low-rank
  • Private Frank-Wolfe performs the best on Synthetic-H
SLIDE 18

High-dimensional Datasets

[Figures: accuracy vs. ε on Real-sim (72k × 21k) and RCV-1 (50k × 47k); legend: NP Baseline, AMP, HF AMP, Private SGD, Private PSGD, Private SC PSGD, Private FW]

  • Both variants of AMP almost always provide the best performance

SLIDE 19

Real-world Use Cases (Uber)

[Figures: accuracy vs. ε on Dataset 1 (4m × 23) and Dataset 2 (18m × 294); legend: NP Baseline, AMP, HF AMP, Private SGD, Private PSGD, Private SC PSGD, Private FW]

  • DP as a regularizer [BST'14, Dwork Feldman Hardt Pitassi Reingold Roth '15]
  • Even for ε = 10⁻², the accuracy of AMP is close to the non-private baseline

SLIDE 20

Conclusions

  • For large datasets, the cost of privacy is low
    • Private model is within 4% accuracy of the non-private one for ε = 0.01, and within 2% for ε = 0.1
  • AMP almost always provides the best accuracy, and is easily deployable in practice
  • Hyperparameter-free AMP is competitive w.r.t. tuned state-of-the-art private algorithms
  • Open-source repo: https://github.com/sunblaze-ucb/dpml-benchmark

Thank You!