Learning the Privacy-Utility Trade-off with Bayesian Optimization


SLIDE 1

Learning the Privacy-Utility Trade-off with Bayesian Optimization

Borja Balle

Joint work with B. Avent, J. Gonzalez, T. Diethe and A. Paleyes

SLIDE 2

Privacy Utility

SLIDE 3

Theory vs Practice

  • Theoretical utility bounds of the form O(√(d log(1/δ)) / (εn))

Plot from J. M. Abowd “Disclosure Avoidance for Block Level Data and Protection of Confidentiality in Public Tabulations” (CSAC Meeting, December 2018)

SLIDE 4

Example: DP-SGD

Input: dataset z = (z₁, …, zₙ)
Hyperparameters: learning rate η, mini-batch size m, number of epochs T, noise variance σ², clipping norm L
Initialize w ← 0
for t ∈ [T] do
  for k ∈ [n/m] do
    Sample S ⊂ [n] with |S| = m uniformly at random
    Let g ← (1/m) Σ_{j∈S} clip_L(∇ℓ(zⱼ, w)) + (2L/m) · N(0, σ²I)
    Update w ← w − ηg
return w

  • 5+ hyper-parameters affecting both privacy and utility
  • For convex problems can be set to achieve near-optimal rates
  • For deep learning applications we don’t have (good) utility bounds

[Bassily et al. 2014; Abadi et al. 2016]
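The update rule above can be sketched in a few lines of numpy. The toy quadratic loss, hyper-parameter values and helper names below are illustrative, and the sketch only performs the noisy updates; it does not compute the resulting (ε, δ) guarantee (that is the privacy oracle's job).

```python
import numpy as np

def clip(g, L):
    """Rescale g so its L2 norm is at most L (per-example clipping)."""
    norm = np.linalg.norm(g)
    return g if norm <= L else g * (L / norm)

def dp_sgd(z, grad_loss, w0, eta, m, T, sigma, L, seed=0):
    """The DP-SGD loop above: averaged clipped mini-batch gradients plus
    (2L/m) * N(0, sigma^2 I) Gaussian noise at every update."""
    rng = np.random.default_rng(seed)
    n, w = len(z), np.array(w0, dtype=float)
    for _ in range(T):                       # epochs
        for _ in range(n // m):              # mini-batch steps
            S = rng.choice(n, size=m, replace=False)
            g = np.mean([clip(grad_loss(z[j], w), L) for j in S], axis=0)
            g = g + (2 * L / m) * rng.normal(0.0, sigma, size=w.shape)
            w = w - eta * g
    return w

# Toy run: loss(zi, w) = 0.5 * ||w - zi||^2, so grad_loss(zi, w) = w - zi;
# the private iterate should land near the data mean.
data = np.random.default_rng(1).normal(2.0, 0.5, size=(200, 3))
w_priv = dp_sgd(data, lambda zi, w: w - zi, w0=np.zeros(3),
                eta=0.5, m=20, T=10, sigma=0.5, L=1.0)
```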

SLIDE 5

Privacy-Utility Pareto Front

Desiderata

  • 1. Efficient to compute
  • 2. Use empirical utility measurements
  • 3. Enable fine-grained comparisons

[Figure: MNIST Pareto fronts, classification error vs. ε for MLP1 and MLP2]

SLIDE 6

Problem Formulation

Parametrized algorithm class: A = {A_λ : Z → W | λ ∈ Λ}   (e.g. DP-SGD)

Error (utility) oracle: E : Λ → [0, 1]   (e.g. expected classification error)

Privacy oracle: P : Λ → [0, ∞)   (e.g. epsilon for fixed delta)
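A minimal illustration of the two oracles for a toy mechanism: releasing the mean of n values in [0, 1] with Laplace noise of scale b, so the hyper-parameter is λ = b. The closed-form ε and the empirical error estimate below are our own illustrative choices, not the paper's experiments.

```python
import numpy as np

N = 100
SENS = 1.0 / N                     # sensitivity of the mean on [0, 1] data

def privacy_oracle(b):
    """P : Lambda -> [0, inf): closed-form epsilon of the Laplace mechanism."""
    return SENS / b

def error_oracle(b, runs=2000, seed=0):
    """E : Lambda -> [0, 1]: empirical mean absolute error, clipped to [0, 1]."""
    rng = np.random.default_rng(seed)
    noise = rng.laplace(0.0, b, size=runs)
    return float(np.minimum(np.abs(noise), 1.0).mean())

# Less noise means a larger epsilon and a smaller error: the trade-off itself.
eps_at = {b: privacy_oracle(b) for b in (0.001, 0.1)}
err_at = {b: error_oracle(b) for b in (0.001, 0.1)}
```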

SLIDE 7

Pareto-Optimal Points

[Figure: hyper-parameter space Λ mapped into the (privacy loss, error) plane; Pareto-optimal points are those not dominated in both objectives]
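The dominance picture on this slide can be made concrete with a small helper (ours, for illustration): keep exactly the evaluated (privacy loss, error) pairs that no other point improves on in both objectives.

```python
import numpy as np

def pareto_front(points):
    """Non-dominated subset of (privacy loss, error) pairs, both minimized.

    A point is Pareto-optimal when no other point is <= in every
    objective and strictly < in at least one.
    """
    pts = np.asarray(points, dtype=float)
    front = []
    for i, p in enumerate(pts):
        dominated = any(
            np.all(q <= p) and np.any(q < p)
            for j, q in enumerate(pts) if j != i
        )
        if not dominated:
            front.append((float(p[0]), float(p[1])))
    return sorted(front)

# (epsilon, error) pairs: (1.0, 0.30) is dominated by (1.0, 0.25) and drops out.
front = pareto_front([(0.5, 0.40), (1.0, 0.25), (1.0, 0.30), (2.0, 0.10)])
```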


SLIDE 18

Bayesian Optimization (BO)

  • Gradient-free optimization for black-box functions
  • Widely used in applications (HPO in ML, scheduling & planning, experimental design, etc.)

Input: F : Λ ⊂ ℝᵖ → ℝ, expensive, non-convex, smooth
Goal: λ⋆ = argmin_{λ∈Λ} F(λ)

Bayesian Optimization Loop: given k evaluations (λ₁, F(λ₁)), …, (λₖ, F(λₖ)):

  • 1. Build a surrogate model for F (e.g. Gaussian process)
  • 2. Find the most promising next evaluation point λₖ₊₁
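The two-step loop above can be sketched end-to-end in numpy. The RBF kernel, the lower-confidence-bound acquisition and the toy objective are illustrative stand-ins; the talk's multi-objective setting uses HVPoI instead.

```python
import numpy as np

def rbf(a, b, ls=0.2):
    """Squared-exponential kernel matrix between 1-D point sets a and b."""
    return np.exp(-0.5 * ((a[:, None] - b[None, :]) / ls) ** 2)

def gp_posterior(X, y, Xs, jitter=1e-4):
    """GP posterior mean and std on the grid Xs given evaluations (X, y)."""
    K = rbf(X, X) + jitter * np.eye(len(X))
    Ks, Kss = rbf(X, Xs), rbf(Xs, Xs)
    mu = Ks.T @ np.linalg.solve(K, y)
    var = np.diag(Kss) - np.sum(Ks * np.linalg.solve(K, Ks), axis=0)
    return mu, np.sqrt(np.clip(var, 1e-12, None))

def bayes_opt(F, steps=15, beta=2.0, seed=0):
    """The loop from the slide: fit a surrogate, then evaluate the most
    promising point, here the minimizer of the lower confidence bound
    mu - beta * sd over a grid."""
    rng = np.random.default_rng(seed)
    grid = np.linspace(0.0, 1.0, 201)
    X = rng.uniform(0.0, 1.0, size=3)             # initial random evaluations
    y = np.array([F(x) for x in X])
    for _ in range(steps):
        mu, sd = gp_posterior(X, y, grid)         # 1. surrogate model
        x_next = grid[np.argmin(mu - beta * sd)]  # 2. next evaluation
        X, y = np.append(X, x_next), np.append(y, F(x_next))
    return X[np.argmin(y)], y.min()

F = lambda x: (x - 0.3) ** 2 + 0.1 * np.sin(15 * x)  # toy black-box objective
x_best, f_best = bayes_opt(F)
```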
SLIDE 19

BO: 1-Dimensional Example

[Animated figure: a GP surrogate is fitted to successive evaluations of a 1-D function, and the acquisition function selects each next evaluation point]

SLIDE 28

The DPareto Algorithm

Input: hyperparameter set Λ, privacy oracle P, error oracle E, anti-ideal point v†, number of initial points k₀, number of iterations k, prior GP
Initialize dataset D ← ∅
for i ∈ [k₀] do
  Sample random point λ ∈ Λ
  Evaluate oracles v ← (P(λ), E(λ))
  Augment dataset D ← D ∪ {(λ, v)}
for i ∈ [k] do
  Fit a GP to the transformed privacy values using D
  Fit a GP to the transformed utility values using D
  Optimize the HVPoI acquisition function in Eq. (2) using anti-ideal point v† and obtain a new query point λ
  Evaluate oracles v ← (P(λ), E(λ))
  Augment dataset D ← D ∪ {(λ, v)}
return Pareto front PF({v | (λ, v) ∈ D})

  • Find privacy-utility Pareto front using multi-objective Bayesian optimization
  • Use transformed Gaussian processes to model privacy and error oracles
  • Acquisition function optimizes hyper-volume based probability of improvement [Couckuyt et al. 2014]
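HVPoI scores candidate points by how much they would grow the hypervolume dominated by the current front. A minimal bi-objective hypervolume computation (our own sketch, assuming both objectives are minimized) looks like this:

```python
def hypervolume_2d(front, anti_ideal):
    """Area dominated by a set of (privacy loss, error) points, bounded by
    the anti-ideal point; both objectives minimized, larger is better.
    Sweep points left to right, adding one rectangle per strict
    improvement in the second objective."""
    ax, ay = anti_ideal
    hv, prev_y = 0.0, ay
    for x, y in sorted((x, y) for x, y in front if x <= ax and y <= ay):
        if y < prev_y:
            hv += (ax - x) * (prev_y - y)
            prev_y = y
    return hv

# One point dominates the unit square below the anti-ideal point (2, 2):
hv1 = hypervolume_2d([(1.0, 1.0)], (2.0, 2.0))
# Adding a non-dominated point grows the hypervolume; a dominated one doesn't.
hv2 = hypervolume_2d([(0.5, 1.5), (1.0, 1.0), (1.0, 1.2)], (2.0, 2.0))
```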

SLIDE 29

Example: Sparse Vector Technique

[Figure: ε and 1 − F1 heatmaps over the (b, C) hyper-parameter grid, the resulting Pareto front (1 − F1 vs. ε), and the Pareto-optimal inputs in the (b, C) plane]

Input: dataset z, queries q₁, …, qₘ
Hyperparameters: noise b, bound C
c ← 0, w ← (0, …, 0) ∈ {0, 1}ᵐ
b₁ ← b/(1 + (2C)^(1/3)), b₂ ← b − b₁, ρ ← Lap(b₁)
for i ∈ [m] do
  ν ← Lap(b₂)
  if qᵢ(z) + ν ≥ 1/2 + ρ then
    wᵢ ← 1, c ← c + 1
    if c ≥ C then return w
return w

[Lyu et al. 2017]

Setup

  • 100 queries with 0/1 output, sensitivity 1
  • 10% queries return 1 (randomly selected)
  • Privacy: SVT analysis
  • Error: 1 - F-score (avg. over 50 runs)
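The pseudocode above can be turned into a short simulation of this setup. The b₁/b₂ split below mirrors one reading of the slide's budget allocation, and the concrete values of b and C are illustrative choices from the plotted ranges.

```python
import numpy as np

def sparse_vector(query_answers, b, C, seed=0):
    """SVT as in the pseudocode above: answer 1 for queries above the
    noisy threshold 1/2, stopping after C ones."""
    rng = np.random.default_rng(seed)
    w, c = np.zeros(len(query_answers), dtype=int), 0
    b1 = b / (1 + (2 * C) ** (1 / 3))       # split of the noise budget
    b2 = b - b1
    rho = rng.laplace(0.0, b1)              # threshold noise, drawn once
    for i, qi in enumerate(query_answers):
        nu = rng.laplace(0.0, b2)           # fresh noise for every query
        if qi + nu >= 0.5 + rho:
            w[i], c = 1, c + 1
            if c >= C:
                break                       # budget for 1-answers exhausted
    return w

# Setup from the slide: 100 binary queries, 10% of them answering 1.
rng = np.random.default_rng(42)
truth = np.zeros(100, dtype=int)
truth[rng.choice(100, size=10, replace=False)] = 1
w = sparse_vector(truth, b=0.05, C=30)
err = 1 - 2 * (w & truth).sum() / max(w.sum() + truth.sum(), 1)  # 1 - F1
```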
SLIDE 30

Example: Sparse Vector Technique

[Figure: as on the previous slide, plus GP-predicted ε and 1 − F1 surfaces over the (b, C) grid, true vs. empirical Pareto fronts with observation outputs and the non-dominated set Γ̃, and the HVPoI acquisition surface with the next query location]

SLIDE 31

Implementing the Oracles

Privacy Oracle

  • Epsilon for fixed delta / Other DP variants / Attack success metrics
  • Closed-form expression / Numerical calculation (eg. moments accountant)

Error Oracle

  • Fixed input / Distribution over inputs / Worst-case (over a set of) inputs
  • On expectation / With high probability
  • Exact expression / Empirical evaluation
SLIDE 33

Machine Learning Experiments

  • Adult dataset (n=40K, d=123)
  • Logistic regression (SGD and ADAM)
  • Linear SVM (SGD)
  • MNIST dataset (n=60K, d=784)
  • MLP1 (1000 hidden)
  • MLP2 (128-64 hidden)

[Figure: Adult Pareto fronts, classification error vs. ε for LogReg+SGD, LogReg+ADAM and SVM+SGD; MNIST Pareto fronts, classification error vs. ε for MLP1 and MLP2]

SLIDE 34

DPareto vs Random Sampling

[Figure: hypervolume evolution of the Pareto front vs. number of sampled points for MLP1/MLP2 under random sampling (RS) and BO; MLP2 Pareto fronts after the initial sample, +256 RS and +256 BO points; LogReg+SGD samples, 1500 RS vs. 256 BO]

SLIDE 35

Conclusion

  • Empirical privacy-utility trade-off evaluation enables application-specific decisions
  • Bayesian optimization provides a computationally efficient method to recover the Pareto front (esp. with a large number of hyper-parameters)

Future work:

  • Address leakage in the Pareto front (when the error oracle is input-specific)
  • Include further criteria (e.g. running time of the parametrized algorithm)