Adversarial Approaches to Bayesian Learning and Bayesian Approaches to Adversarial Robustness
Ian Goodfellow, OpenAI Research Scientist
NIPS 2016 Workshop on Bayesian Deep Learning
Barcelona, 2016-12-10
Speculation on Three Topics

Can we use adversarial learning to sample from a posterior over parameters?
Can Bayesian learning defend against adversarial examples?
Can model-based optimization become a universal engineering machine?
[Figure: training examples vs. model samples]
[Figure: GAN architecture. x sampled from data feeds a differentiable function D, and D(x) tries to be near 1. Input noise z feeds a differentiable function G, producing x sampled from the model, which also feeds D. D tries to make D(G(z)) near 0, while G tries to make D(G(z)) near 1.]
Minimax game: the generator minimizes the log-probability of the discriminator being correct.

$$J^{(D)} = -\frac{1}{2}\,\mathbb{E}_{x \sim p_{\text{data}}} \log D(x) \;-\; \frac{1}{2}\,\mathbb{E}_{z} \log\big(1 - D(G(z))\big), \qquad J^{(G)} = -J^{(D)}$$
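A minimal sketch of these two costs (PyTorch used purely for illustration; `d_real` and `d_fake` are assumed discriminator outputs in (0, 1) on data and on generator samples):

```python
import torch

def d_cost(d_real, d_fake):
    # J(D) = -1/2 E[log D(x)] - 1/2 E[log(1 - D(G(z)))]
    return -0.5 * torch.log(d_real).mean() - 0.5 * torch.log(1.0 - d_fake).mean()

def g_cost(d_real, d_fake):
    # Minimax game: J(G) = -J(D)
    return -d_cost(d_real, d_fake)
```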
The optimal D(x) for any p_data(x) and p_model(x) is always

$$D(x) = \frac{p_{\text{data}}(x)}{p_{\text{data}}(x) + p_{\text{model}}(x)}$$

[Figure: data distribution, model distribution, and the discriminator's output over x; z maps to x through the generator.]

Estimating this ratio using supervised learning is the key approximation mechanism used by GANs.
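A small illustration of this ratio trick (an assumed toy setup, not from the talk: two 1-D Gaussians standing in for p_data and p_model, and scikit-learn's logistic regression as the discriminator). A classifier trained to separate the two sample sets approximates the optimal D, and D/(1 - D) then approximates the density ratio:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
x_data = rng.normal(0.0, 1.0, size=(5000, 1))   # stand-in for p_data
x_model = rng.normal(1.0, 1.5, size=(5000, 1))  # stand-in for p_model

X = np.vstack([x_data, x_model])
y = np.concatenate([np.ones(5000), np.zeros(5000)])  # 1 = data, 0 = model

# Quadratic features let a logistic model represent the exact
# log-ratio of two Gaussians with different variances.
feats = lambda x: np.hstack([x, x ** 2])
clf = LogisticRegression().fit(feats(X), y)

# Optimal D(x) = p_data / (p_data + p_model), so D / (1 - D) = p_data / p_model.
x0 = np.array([[0.0]])
d = clf.predict_proba(feats(x0))[0, 1]
print("estimated ratio p_data/p_model at x=0:", d / (1 - d))
```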
Bayesian learning: the exact posterior over parameters is complicated. Rather than a single point estimate, we want a distribution on parameters.
Naively training a GAN to sample the posterior over parameters would require samples of the parameters from the true posterior, i.e., training data generated using the true posterior.
$$\frac{p(X \mid \theta)}{p(X \mid \theta^{*})} = \prod_i \frac{p(x^{(i)} \mid \theta)}{p(x^{(i)} \mid \theta^{*})}$$
A discriminator can estimate this ratio, but it must be retrained for each new parameter value. To differentiate the estimate with respect to the parameters, we must include the discriminator learning process in the graph for the estimate, as in unrolled GANs (Metz et al., 2016).
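A rough sketch of the unrolling mechanism (hypothetical names throughout; this is the generic autograd trick, not the talk's implementation). Each inner discriminator update is kept in the graph with `create_graph=True`, so an outer objective can backpropagate through discriminator learning:

```python
import torch

def unrolled_discriminator(d0, d_loss_fn, steps=3, lr=0.1):
    """Run `steps` of differentiable gradient descent on the discriminator
    parameters, keeping each update in the autograd graph so an outer
    objective can differentiate through the learning process."""
    params = d0
    for _ in range(steps):
        loss = d_loss_fn(params)
        (grad,) = torch.autograd.grad(loss, params, create_graph=True)
        params = params - lr * grad  # differentiable update, not in-place
    return params

# Toy usage: gradients flow from the outer objective back to theta
# through the unrolled inner updates.
theta = torch.tensor([1.0], requires_grad=True)   # outer parameter
d0 = torch.tensor([0.0], requires_grad=True)      # discriminator init
d_loss = lambda d: ((d - theta) ** 2).sum()       # toy inner objective
dK = unrolled_discriminator(d0, d_loss)
(dK ** 2).sum().backward()                        # toy outer objective
print(theta.grad)
```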
[Figure: generation process vs. posterior sampling process]
Sample from the prior, rather than the approximate posterior, and train until the approximate posterior is indistinguishable from the true posterior.
[Figure: adversarial example. An image classified as "panda" with 58% confidence, plus a small perturbation, is classified as "gibbon" with 99% confidence.]
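The canonical panda/gibbon example was produced with the fast gradient sign method (Goodfellow et al., 2014). A minimal sketch, assuming a differentiable `model` and a cross-entropy `loss_fn` (both hypothetical names here):

```python
import torch

def fgsm(model, loss_fn, x, y, eps=0.007):
    """Fast gradient sign method: perturb x in the direction that
    maximally increases the loss, under an L-infinity budget eps."""
    x = x.clone().detach().requires_grad_(True)
    loss = loss_fn(model(x), y)
    loss.backward()
    return (x + eps * x.grad.sign()).detach()
```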
[Figure: argument to the softmax, plotted as the input moves along an adversarial direction]
Two mixture models can implement roughly the same marginal over x with very different posteriors over the classes. The likelihood criterion cannot strongly prefer one to the other, and in many cases will prefer the bad one.
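A toy numerical illustration of this point (an assumed construction, not from the talk): model A assigns each class its own Gaussian, while model B gives both classes the same mixture, so the marginals over x match exactly but the class posteriors differ completely.

```python
import numpy as np
from scipy.stats import norm

x = np.linspace(-4, 4, 9)

# Model A: class 0 centered at -1, class 1 at +1, equal priors.
pa0 = 0.5 * norm.pdf(x, -1, 1)
pa1 = 0.5 * norm.pdf(x, +1, 1)
marginal_a = pa0 + pa1
posterior_a = pa1 / marginal_a          # varies sharply with x

# Model B: both classes share the full mixture density.
pb = 0.5 * (norm.pdf(x, -1, 1) + norm.pdf(x, +1, 1))
marginal_b = 0.5 * pb + 0.5 * pb        # identical marginal over x
posterior_b = (0.5 * pb) / marginal_b   # always 0.5, regardless of x

print(np.allclose(marginal_a, marginal_b))  # True: same marginal
print(posterior_a.round(2))                 # informative class posterior
print(posterior_b.round(2))                 # uninformative class posterior
```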
[Figure: attacking a linear model vs. attacking an RBF model]
RBF kernel (amortized inference of a Bayesian model)
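A small sketch of why the RBF model resists attack (an assumed toy setup): the RBF response decays to zero away from the training data, so an attacker cannot find high-confidence errors far from the data, whereas a linear model's logit grows without bound along its weight direction.

```python
import numpy as np

x_train = np.array([-1.0, 1.0])   # toy training inputs
w, b = 2.0, 0.0                   # toy linear model: logit = w*x + b

def rbf_score(x, centers=x_train, gamma=1.0):
    # Sum of Gaussian bumps centered on the training points.
    return sum(np.exp(-gamma * (x - c) ** 2) for c in centers)

for x in [0.0, 5.0, 50.0]:
    print(f"x={x:5.1f}  linear logit={w * x + b:7.1f}  rbf score={rbf_score(x):.4f}")
# The linear logit keeps growing off the data; the RBF score goes to 0.
```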
Universal engineering machine (model-based optimization)

[Figure: training data vs. extrapolation]

Make new inventions by finding the input that maximizes the model's predicted performance.
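A minimal sketch of this idea (hypothetical `model` predicting a scalar performance score; gradient ascent on the input). The danger, per the figure, is that the optimizer pushes x into the extrapolation regime, where the model's high predictions are adversarial rather than real improvements:

```python
import torch

def optimize_design(model, x0, steps=100, lr=0.1):
    """Gradient ascent on the input to maximize the model's
    predicted performance (model-based optimization)."""
    x = x0.clone().detach().requires_grad_(True)
    opt = torch.optim.Adam([x], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        (-model(x).sum()).backward()   # ascend the predicted score
        opt.step()
    return x.detach()
```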
Bayesian approaches may solve the adversarial example problem and unlock the potential of model-based optimization.