slide-1
SLIDE 1

Uncertainty in Bayesian Neural Nets

August 4, 2017

slide-2
SLIDE 2

Overview

  • BNN review
  • Visualization experiments
  • BNN results
slide-3
SLIDE 3

BNN

Prior: p(W)
Likelihood: p(Y | X, W)
Approximate Posterior: q(W)
Posterior Predictive: E_{q(W)}[p(y | x, W)]
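The posterior predictive is typically approximated by Monte Carlo: draw weight samples from q(W) and average the resulting class probabilities. A minimal sketch for a toy linear classifier; the model, parameter values, and function names are illustrative assumptions, not the presenter's code:

```python
import math, random

def softmax(logits):
    # Numerically stable softmax.
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    s = sum(exps)
    return [e / s for e in exps]

def posterior_predictive(x, draw_weights, n_samples=200):
    """Monte Carlo estimate of E_{q(W)}[p(y | x, W)]: average the class
    probabilities of a toy linear classifier over samples W ~ q(W)."""
    n_classes = len(draw_weights())
    totals = [0.0] * n_classes
    for _ in range(n_samples):
        W = draw_weights()  # one weight sample: a list of class weight vectors
        p = softmax([sum(wi * xi for wi, xi in zip(row, x)) for row in W])
        totals = [t + pi for t, pi in zip(totals, p)]
    return [t / n_samples for t in totals]

random.seed(0)
mu = [[0.5, -0.2], [0.1, 0.3], [-0.4, 0.2]]  # mean of q(W): 3 classes x 2 features
draw = lambda: [[m + 0.1 * random.gauss(0, 1) for m in row] for row in mu]
p = posterior_predictive([1.0, 2.0], draw)
```

The result is itself a probability distribution over classes, so it sums to one even though each individual sample's prediction differs.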

slide-4
SLIDE 4

BNN

  • Variational Inference
  • Maximize lower bound on the marginal log-likelihood

log π‘ž 𝑍 π‘Œ β‰₯ 𝐹" $ [log π‘ž 𝑍 π‘Œ, 𝑋 + log π‘ž 𝑋 βˆ’ log π‘Ÿ 𝑋 ]

Prior Posterior Approx Likelihood Y X W Dependent on the number of data points 1 𝑁 9 log π‘ž 𝑍

: π‘Œ:, 𝑋 ; :<=

+ 1 𝑂 log π‘ž(𝑋) π‘Ÿ(𝑋)
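In practice the bound is estimated on a minibatch with a single Monte Carlo weight sample. A sketch under those assumptions; the toy 1D regression model and all names here are hypothetical:

```python
import math, random

def log_gauss(x, mu, sigma):
    # log N(x | mu, sigma^2)
    return -0.5 * math.log(2 * math.pi * sigma ** 2) - (x - mu) ** 2 / (2 * sigma ** 2)

def elbo_estimate(minibatch, n_total, draw_w, log_lik, log_prior, log_q):
    """Per-data-point ELBO estimate with a single sample w ~ q(w):
    (1/M) * sum_i log p(y_i | x_i, w) + (1/N) * (log p(w) - log q(w))."""
    w = draw_w()
    data_term = sum(log_lik(x, y, w) for x, y in minibatch) / len(minibatch)
    return data_term + (log_prior(w) - log_q(w)) / n_total

random.seed(1)
# Toy model: y ~ N(w * x, 1), prior w ~ N(0, 1), q(w) = N(0.9, 0.1^2).
mu_q, sigma_q = 0.9, 0.1
data = [(x, x) for x in [0.5, 1.0, 1.5, 2.0]]  # generated with true w = 1
elbo = elbo_estimate(
    data, n_total=len(data),
    draw_w=lambda: random.gauss(mu_q, sigma_q),
    log_lik=lambda x, y, w: log_gauss(y, w * x, 1.0),
    log_prior=lambda w: log_gauss(w, 0.0, 1.0),
    log_q=lambda w: log_gauss(w, mu_q, sigma_q),
)
```

Averaging this estimate over minibatches and weight samples recovers the full bound up to Monte Carlo noise.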

slide-5
SLIDE 5

Different priors and posterior approximations

  • Priors p(W):
    • N(0, σ²)
    • Scale-mixtures of Normals
    • Sparsity inducing
  • Posterior Approximations q(W):
    • Delta peak: q(W) = Ξ΄_ΞΈ(W)
    • Fully factorized Gaussians: q(W) = ∏_i N(w_i | ΞΌ_i, Οƒ_i²)
    • Bernoulli Dropout
    • Gaussian Dropout
    • MNF
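A fully factorized Gaussian posterior is usually sampled with the reparameterization trick, w_i = ΞΌ_i + Οƒ_i Β· Ξ΅ with Ξ΅ ~ N(0, 1). A minimal sketch; the parameter values and the softplus parameterization of Οƒ are common conventions assumed here, not taken from the slides:

```python
import math, random

def sample_ffg(mu, rho):
    """One sample from q(W) = prod_i N(w_i | mu_i, sigma_i^2) via the
    reparameterization trick: w_i = mu_i + sigma_i * eps, eps ~ N(0, 1).
    sigma_i = softplus(rho_i) keeps the standard deviation positive."""
    sample = []
    for m, r in zip(mu, rho):
        sigma = math.log1p(math.exp(r))  # softplus
        sample.append(m + sigma * random.gauss(0.0, 1.0))
    return sample

random.seed(0)
mu = [0.0, 1.0, -2.0]     # variational means (made-up values)
rho = [-3.0, -3.0, -3.0]  # softplus(-3) ~ 0.049, so small noise
w = sample_ffg(mu, rho)
```

Because the noise enters through a differentiable transform of (ΞΌ, ρ), gradients of the ELBO flow through the sample.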
slide-6
SLIDE 6

Multiplicative Normalizing Flows (MNF)

  • Augment model with auxiliary variable

Christos Louizos, Max Welling (ICML 2017)

Generative model over Y, X, W; inference model augments W with an auxiliary z:

z ~ q(z),  W ~ q(W | z),  q(W) = ∫ q(W | z) q(z) dz

  • q(W | z) = ∏_{i=1}^{D_in} ∏_{j=1}^{D_out} N(w_ij | z_i ΞΌ_ij, Οƒ_ij²)

New lower bound:

log p(Y | X) β‰₯ E_{q(W,z)}[log p(Y | X, W) + log p(W) βˆ’ log q(W | z) + log r(z | W) βˆ’ log q(z)]

q(z) and the auxiliary distribution r(z | W) are modeled with normalizing flows.
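The multiplicative structure of q(W | z) is simple to sample: z scales the mean of each input row of the weight matrix. A sketch in which a plain Gaussian stands in for the normalizing-flow posterior q(z); shapes, values, and names are illustrative assumptions:

```python
import random

def sample_mnf_weights(mu, sigma, sample_z):
    """W ~ q(W | z) = prod_ij N(w_ij | z_i * mu_ij, sigma_ij^2): the auxiliary
    z multiplies the mean of each input row. sample_z() stands in for a draw
    from the normalizing-flow posterior q(z); here it is a plain Gaussian."""
    z = sample_z(len(mu))
    return [[z[i] * mu[i][j] + sigma * random.gauss(0.0, 1.0)
             for j in range(len(mu[i]))]
            for i in range(len(mu))]

random.seed(0)
mu = [[1.0, 2.0], [3.0, 4.0]]  # 2 x 2 mean matrix (made-up values)
W = sample_mnf_weights(
    mu, sigma=0.01,
    sample_z=lambda d: [random.gauss(1.0, 0.1) for _ in range(d)])
```

A single scalar z_i induces correlated noise across an entire row of W, which a fully factorized Gaussian cannot express.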

slide-7
SLIDE 7

Predictive Distributions

slide-8
SLIDE 8

Uncertainties

  • Model uncertainty (Epistemic uncertainty)
  • Captures ignorance about which model is most suitable to explain the data
  • Reduces as the amount of observed data increases
  • Summarized by generating function realizations from our distribution
  • Measurement Noise (Aleatoric uncertainty)
  • Noise inherent in the environment, captured in likelihood function
  • Predictive uncertainty
  • Entropy of prediction = H[p(y|x)]
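The entropy of the predictive distribution is direct to compute; a minimal sketch with made-up class probabilities:

```python
import math

def entropy(p):
    """Predictive uncertainty H[p] = -sum_c p_c log p_c (natural log);
    terms with p_c = 0 contribute nothing."""
    return -sum(pc * math.log(pc) for pc in p if pc > 0)

confident = entropy([0.98, 0.01, 0.01])   # low predictive uncertainty
uniform = entropy([1 / 3, 1 / 3, 1 / 3])  # maximal for 3 classes: log 3
```

The uniform distribution attains the maximum, log(#classes), which is the natural scale against which the entropy maps on the later slides can be read.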
slide-9
SLIDE 9

Visualization Experiments

  • 1D regression
  • Classification of MNIST (visualize in 2D)
  • Questions:
  • Activations
  • Number of samples
  • Held out classes
  • Type of uncertainties
slide-10
SLIDE 10

BNNs with Different Activation Functions

Sigmoid: (1 + e^(βˆ’x))^(βˆ’1); Tanh; Softplus: ln(1 + e^x); ReLU: max(0, x)

slide-11
SLIDE 11

Uncertainty of Decision Boundaries

  • Setup:
  • Classification of MNIST
  • Train: 50000 Test: 10000

784-100-2-100-10

BNN: fully factorized Gaussian (FFG), N(0, 1) prior; activations: Softplus. (Panels: NN vs. BNN.)

slide-12
SLIDE 12

Decision Boundaries – 3 Samples

Plot of argmax_y p(y | x) at each point

slide-13
SLIDE 13

Uncertainty of Decision Boundaries: Held Out Classes

  • Setup:
  • Classification of digits 0 to 4 (5 to 9 held out)

784-100-100-2-100-100-10

BNN: fully factorized Gaussian (FFG), N(0, 1) prior; activations: Softplus. (Panels: NN vs. BNN.)

slide-14
SLIDE 14

Where do you think the held out classes will go?

Inside or Outside the Circle?

slide-15
SLIDE 15

Where do you think the held out classes will go?

slide-16
SLIDE 16

Held Out Classes

Unseen classes are not encoded far away; instead, they are encoded near the mean

slide-17
SLIDE 17

Confidence of Predictions?

Perhaps large areas have high entropy. (Compare argmax vs. max.)

slide-18
SLIDE 18

Class Boundaries - Confidences

Sharp transitions. There isn't much uncertain space: mostly uniform, high confidence

slide-19
SLIDE 19

Entropy

(Panels: argmax, max, entropy.)

slide-20
SLIDE 20

Effect of Choice of Activation Function

  • Softplus
  • ReLU
  • Tanh
slide-21
SLIDE 21

Softplus

Panels: Sample 1, Sample 2, Sample 3, Mean of q(W), E_{q(W)}[p(y | x, W)]

slide-22
SLIDE 22

ReLU

Panels: Sample 1, Sample 2, Sample 3, Mean of q(W), E_{q(W)}[p(y | x, W)]

slide-23
SLIDE 23

Tanh

Panels: Sample 1, Sample 2, Sample 3, Mean of q(W), E_{q(W)}[p(y | x, W)]

slide-24
SLIDE 24

Mix (Softplus, ReLu, Tanh)

Panels: Sample 1, Sample 2, Sample 3, Mean of q(W), E_{q(W)}[p(y | x)]

slide-25
SLIDE 25

Number of Datapoints

Rows: 25000, 10000, 1000, 100 training datapoints. Columns: argmax, max, entropy of E_{q(W)}[p(y | x)].

slide-26
SLIDE 26

Model vs Output Uncertainty

  • Predictive Uncertainty = H[p(y | x)]

Output Uncertainty: H[p(y | x, wΜ„)], where wΜ„ = mean of q(W); the output has high entropy (on a decision boundary).
Model Uncertainty: H[E_{q(W)}[p(y | x, W)]]; high-variance predictions.
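Given per-sample predictive distributions and the prediction at the mean weights, both quantities above are straightforward to compute; a sketch in which the toy distributions are made up for illustration:

```python
import math

def entropy(p):
    return -sum(pc * math.log(pc) for pc in p if pc > 0)

def uncertainties(sample_probs, probs_at_mean_w):
    """Returns (output, predictive) uncertainty as on the slide:
    output = H[p(y | x, w_bar)] with w_bar the mean of q(W), and
    predictive = H[E_{q(W)}[p(y | x, W)]] over samples W ~ q(W)."""
    n = len(sample_probs)
    mean_p = [sum(p[c] for p in sample_probs) / n
              for c in range(len(sample_probs[0]))]
    return entropy(probs_at_mean_w), entropy(mean_p)

# Toy case: samples disagree strongly, but the mean-weight prediction is confident.
samples = [[0.9, 0.05, 0.05], [0.05, 0.9, 0.05], [0.05, 0.05, 0.9]]
out_u, pred_u = uncertainties(samples, probs_at_mean_w=[0.9, 0.05, 0.05])
```

In this toy case the output uncertainty is low while the averaged prediction is maximally entropic: the gap between the two is exactly the disagreement among posterior samples.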

slide-27
SLIDE 27

Model vs Output Uncertainty

100 training datapoints:

                     Train   Test   Held Out
Model Uncertainty    .07     .26    .43
Output Uncertainty   .03     .15    .25

25000 training datapoints:

                     Train   Test   Held Out
Model Uncertainty    .06     .06    .43
Output Uncertainty   .05     .05    .36

Small data: model uncertainty dominates. Large data: output uncertainty dominates.

slide-28
SLIDE 28

Adversarial Examples, Uncertainty, and Transfer Testing Robustness in Gaussian Process Hybrid Deep Networks (July 2017)

(Panels: NN, BNN, GP+NN.)

slide-29
SLIDE 29

Visualize landscape of likelihood

p(y_train | x_train, W) plotted over (w1, w2). The dimension of W is large, so use a 2D auxiliary variable.

slide-30
SLIDE 30

Visualize landscape of likelihood

  • Auxiliary Variable Model

Generative model and inference model over Y, X, W, and auxiliary z.

Architecture: 784-100-100-2-10-10-10

z ~ q(z),  W ~ q(W | z),  q(W) = ∫ Ξ΄(W | z) q(z) dz

Auxiliary inference distribution: r(z | W)

log p(Y | X) β‰₯ E_{q(W,z)}[log p(Y | X, W) + log p(W) βˆ’ log q(W | z) + log r(z | W) βˆ’ log q(z)]

q(W | z) = Ξ΄(W | z): a hyper-network maps the 2D z to the weights of the hypo-network. (Panels: NN vs. BNN (2D).)

slide-31
SLIDE 31

Decision Boundaries

Panels: samples z1, z2, z3; E_{q(z)}[p(y | x, z)]

slide-32
SLIDE 32

Likelihood Landscape

Panels: log p(y_train | x_train, W, z) and log p(y_test | x_test, W, z), over (z1, z2).

slide-33
SLIDE 33

Likelihood Landscape

Panels: log p(y_train | x_train, W, z); log p(y_test | x_test, W, z); log p(y_train | x_train, W, z) + log r(z | W) βˆ’ log q(z). Axes: (z1, z2).

slide-34
SLIDE 34

Likelihood Landscape

Panels: log p(y_train | x_train, W, z); log p(y_test | x_test, W, z); log p(y_train | x_train, W, z) + log r(z | W) βˆ’ log q(z). Axes: (z1, z2).

slide-35
SLIDE 35

Likelihood Landscape

Panels: log p(y_train | x_train, W, z); log p(y_test | x_test, W, z); log p(y_train | x_train, W, z) + log r(z | W) βˆ’ log q(z). Axes: (z1, z2).

slide-36
SLIDE 36

Recent BNN Papers

  • Multiplicative Normalizing Flows for Variational Bayesian Neural Networks (2017)
  • Variational Dropout Sparsifies Deep Neural Networks (2017)
  • Bayesian Compression for Deep Learning (2017)
  • Adversarial Perturbations
  • Compression
slide-37
SLIDE 37

Adversarial perturbations

MNIST; CIFAR-10

slide-38
SLIDE 38

Compression vs Uncertainty

(Axis: entropy H[p].)

slide-39
SLIDE 39

Conclusion

  • Used visualizations to help understand uncertainty in BNNs
  • Goal: improve uncertainty estimates and generalization

Applications

  • Active learning
  • Bayes Opt
  • RL
  • Safety
  • Efficiency
slide-40
SLIDE 40

References

  • Weight Uncertainty in Neural Networks (2015)
  • Variational Dropout and the Local Reparameterization Trick (2015)
  • Dropout as a Bayesian Approximation: Representing Model Uncertainty in Deep Learning (2016)
  • Variational Dropout Sparsifies Deep Neural Networks (2017)
  • On Calibration of Modern Neural Networks (2017)
  • Multiplicative Normalizing Flows for Variational Bayesian Neural Networks (2017)
slide-41
SLIDE 41

Thank You