Learning the Privacy-Utility Trade-off with Bayesian Optimization
Borja Balle
Joint work with B. Avent, J. Gonzalez, T. Diethe and A. Paleyes
Plot from J. M. Abowd “Disclosure Avoidance for Block Level Data and Protection of Confidentiality in Public Tabulations” (CSAC Meeting, December 2018)
Input: dataset z = (z1, ..., zn)
Hyperparameters: learning rate η, mini-batch size m, number of epochs T, noise variance σ², clipping norm L
Initialize w ← 0
for t ∈ [T] do
  for k ∈ [n/m] do
    Sample S ⊂ [n] with |S| = m uniformly at random
    Let g ← (1/m) Σ_{j∈S} clip_L(∇ℓ(zj, w)) + (2L/m) · N(0, σ²I)
    Update w ← w − η g
return w
[Bassily et al. 2014; Abadi et al. 2016]
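The noisy mini-batch step above can be sketched in NumPy. This is an illustrative implementation, not the authors' code; `grad_loss` (a per-example gradient function) and the default hyper-parameter values are assumptions for the sketch:

```python
import numpy as np

def dp_sgd(z, grad_loss, dim, eta=0.1, m=64, T=10, sigma=1.0, L=1.0, rng=None):
    """Differentially private SGD sketch (after Bassily et al. 2014 /
    Abadi et al. 2016): per-example gradient clipping plus Gaussian noise."""
    rng = np.random.default_rng(rng)
    n = len(z)
    w = np.zeros(dim)
    for _ in range(T):                       # epochs
        for _ in range(n // m):              # mini-batches per epoch
            S = rng.choice(n, size=m, replace=False)
            g = np.zeros(dim)
            for j in S:
                gj = grad_loss(z[j], w)
                norm = np.linalg.norm(gj)
                if norm > L:                 # clip each gradient to norm L
                    gj = gj * (L / norm)
                g += gj
            # average clipped gradients and add calibrated Gaussian noise
            g = g / m + (2 * L / m) * rng.normal(0.0, sigma, size=dim)
            w = w - eta * g                  # gradient step
    return w
```

With `sigma = 0` this reduces to plain clipped mini-batch SGD, which makes the noise calibration easy to sanity-check in isolation.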
[Figure: MNIST Pareto fronts, plotting classification error against privacy loss ε for MLP1 and MLP2.]
Parametrized algorithm class with two black-box oracles:
- Error (utility) oracle: classification error
- Privacy oracle: privacy loss ε at fixed δ
Hyper-parameter Space → (Privacy Loss, Error)
Input: F : Λ ⊂ R^p → R
Goal: λ⋆ = argmin_{λ∈Λ} F(λ)
Expensive, non-convex, smooth black-box functions
Many applications (HPO in ML, scheduling & planning, experimental design, etc.)
Bayesian Optimization Loop: given k evaluations (λ1, F(λ1)), ..., (λk, F(λk)), fit a surrogate model and choose the next query point by optimizing an acquisition function.
Input: hyperparameter set Λ, privacy oracle P, error oracle E, anti-ideal point v†, number of initial points k0, number of BO iterations k, prior GP
Initialize dataset D ← ∅
for i ∈ [k0] do
  Sample random point λ ∈ Λ
  Evaluate oracles v ← (P(λ), E(λ))
  Augment dataset D ← D ∪ {(λ, v)}
for i ∈ [k] do
  Fit a GP to the transformed privacy using D
  Fit a GP to the transformed utility using D
  Optimize the HVPoI acquisition function in Λ to obtain a new query point λ
  Evaluate oracles v ← (P(λ), E(λ))
  Augment dataset D ← D ∪ {(λ, v)}
return Pareto front PF({v | (λ, v) ∈ D})

Pareto front estimation using multi-objective Bayesian optimization:
- Gaussian processes to model privacy and error oracles
- Hyper-volume based probability of improvement (HVPoI) [Couckuyt et al. 2014]
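The final step, PF({v | (λ, v) ∈ D}), extracts the non-dominated evaluations. A minimal sketch for two minimized objectives (the function name is illustrative):

```python
import numpy as np

def pareto_front(points):
    """Return the non-dominated subset of (privacy, error) pairs, both
    objectives minimized. A point is dominated if some other point is no
    worse in both coordinates and strictly better in at least one."""
    pts = np.asarray(points, dtype=float)
    keep = []
    for i, p in enumerate(pts):
        dominated = any(
            np.all(q <= p) and np.any(q < p)
            for j, q in enumerate(pts) if j != i
        )
        if not dominated:
            keep.append(tuple(p))
    return sorted(set(keep))
```

The O(n²) pairwise check is fine at BO scale, where the number of evaluated points is small by design.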
[Figure: heatmaps of privacy loss ε and error 1 − F1 over the hyper-parameter grid (b, C), together with the resulting Pareto front and the Pareto-optimal inputs.]
Input: dataset z, queries q1, ..., qm
Hyperparameters: noise b, bound C
c ← 0, w ← (0, ..., 0) ∈ {0, 1}^m
b1 ← b/(1 + (2C)^(1/3)), b2 ← b − b1, ρ ← Lap(b1)
for i ∈ [m] do
  ν ← Lap(b2)
  if qi(z) + ν ≥ 1/2 + ρ then
    wi ← 1, c ← c + 1
    if c ≥ C then return w
return w
[Lyu et al. 2017]
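The sparse vector pseudocode above translates almost line by line into NumPy. This is an illustrative sketch only; the fixed threshold 1/2 and the budget split between b1 and b2 follow the slide:

```python
import numpy as np

def sparse_vector(z, queries, b=1.0, C=5, rng=None):
    """Sparse vector technique sketch [Lyu et al. 2017]: report 'above
    threshold' for at most C of m queries, adding Laplace noise to the
    threshold (scale b1) and to every query answer (scale b2)."""
    rng = np.random.default_rng(rng)
    m = len(queries)
    w = np.zeros(m, dtype=int)
    c = 0
    b1 = b / (1.0 + (2.0 * C) ** (1.0 / 3.0))   # threshold-noise scale
    b2 = b - b1                                  # per-query noise scale
    rho = rng.laplace(0.0, b1)                   # noise on the threshold 1/2
    for i, q in enumerate(queries):
        nu = rng.laplace(0.0, b2)                # fresh noise per query
        if q(z) + nu >= 0.5 + rho:
            w[i] = 1
            c += 1
            if c >= C:                           # budget of C positives spent
                return w
    return w
```

Note that queries after the C-th positive answer are never evaluated, which is exactly what lets the technique answer many queries under a fixed budget.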
Setup
[Figure: setup results, repeating the ε and 1 − F1 heatmaps over (b, C) with the Pareto front and Pareto inputs.]
[Figure: GP predictions of ε and 1 − F1 over (b, C); true vs. empirical Pareto front with observation outputs and non-dominated set Γ̃; HVPoI acquisition surface over (b, C) marking the next query location.]
Privacy Oracle
Error Oracle
[Figure: Adult Pareto fronts (classification error vs. ε) for LogReg+SGD, LogReg+ADAM and SVM+SGD; MNIST Pareto fronts for MLP1 and MLP2.]
[Figure: hypervolume evolution, plotting Pareto front hypervolume against number of sampled points for MLP1 and MLP2 under random search (RS) and Bayesian optimization (BO).]
[Figure: MLP2 Pareto fronts after the initial design, +256 RS and +256 BO; LogReg+SGD sample locations for 1500 RS vs. 256 BO.]
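The PF hypervolume used to compare RS and BO can be computed in two dimensions as the area dominated by the front up to an anti-ideal reference point. A minimal sketch, assuming the input front is already non-dominated:

```python
def hypervolume_2d(front, ref):
    """Area dominated by a 2-D Pareto front (both objectives minimized),
    bounded by the reference (anti-ideal) point `ref`. Sorting by the
    first objective makes the front a staircase, summed strip by strip."""
    pts = sorted(front)          # ascending x implies descending y
    hv, prev_y = 0.0, ref[1]
    for x, y in pts:
        hv += (ref[0] - x) * (prev_y - y)
        prev_y = y
    return hv
```

A larger hypervolume means the front pushes further toward low privacy loss and low error simultaneously, which is why it serves as the progress metric in the plots above.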
Conclusions: multi-objective Bayesian optimization can efficiently recover the privacy-utility Pareto front (especially with a large number of hyper-parameters), supporting principled privacy-utility decisions.
Future work: