slide-1
SLIDE 1

Uncertainty in Bayesian Neural Nets

August 4, 2017

slide-2
SLIDE 2

Overview

  • BNN review
  • Visualization experiments
  • BNN results
slide-3
SLIDE 3

BNN

Prior: p(W)
Likelihood: p(Y | X, W)
Approximate Posterior: q(W)
Posterior Predictive: E_{q(W)}[p(y | x, W)]
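The posterior predictive is typically approximated by Monte Carlo: draw weight samples from q(W) and average the resulting class probabilities. A minimal sketch for a toy linear classifier; the model, parameter values, and function names are illustrative assumptions, not the presenter's code:

```python
import math, random

def softmax(logits):
    # Numerically stable softmax.
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    s = sum(exps)
    return [e / s for e in exps]

def posterior_predictive(x, draw_weights, n_samples=200):
    """Monte Carlo estimate of E_{q(W)}[p(y | x, W)]: average the class
    probabilities of a toy linear classifier over samples W ~ q(W)."""
    n_classes = len(draw_weights())
    totals = [0.0] * n_classes
    for _ in range(n_samples):
        W = draw_weights()  # one weight sample: a list of class weight vectors
        p = softmax([sum(wi * xi for wi, xi in zip(row, x)) for row in W])
        totals = [t + pi for t, pi in zip(totals, p)]
    return [t / n_samples for t in totals]

random.seed(0)
mu = [[0.5, -0.2], [0.1, 0.3], [-0.4, 0.2]]  # mean of q(W): 3 classes x 2 features
draw = lambda: [[m + 0.1 * random.gauss(0, 1) for m in row] for row in mu]
p = posterior_predictive([1.0, 2.0], draw)
```

The result is itself a probability distribution over classes, so it sums to one even though each individual sample's prediction differs.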

slide-4
SLIDE 4

BNN

  • Variational Inference
  • Maximize lower bound on the marginal log-likelihood

log π‘ž 𝑍 π‘Œ β‰₯ 𝐹" $ [log π‘ž 𝑍 π‘Œ, 𝑋 + log π‘ž 𝑋 βˆ’ log π‘Ÿ 𝑋 ]

Prior Posterior Approx Likelihood Y X W Dependent on the number of data points 1 𝑁 9 log π‘ž 𝑍

: π‘Œ:, 𝑋 ; :<=

+ 1 𝑂 log π‘ž(𝑋) π‘Ÿ(𝑋)
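In practice the bound is estimated on a minibatch with a single Monte Carlo weight sample. A sketch under those assumptions; the toy 1D regression model and all names here are hypothetical:

```python
import math, random

def log_gauss(x, mu, sigma):
    # log N(x | mu, sigma^2)
    return -0.5 * math.log(2 * math.pi * sigma ** 2) - (x - mu) ** 2 / (2 * sigma ** 2)

def elbo_estimate(minibatch, n_total, draw_w, log_lik, log_prior, log_q):
    """Per-data-point ELBO estimate with a single sample w ~ q(w):
    (1/M) * sum_i log p(y_i | x_i, w) + (1/N) * (log p(w) - log q(w))."""
    w = draw_w()
    data_term = sum(log_lik(x, y, w) for x, y in minibatch) / len(minibatch)
    return data_term + (log_prior(w) - log_q(w)) / n_total

random.seed(1)
# Toy model: y ~ N(w * x, 1), prior w ~ N(0, 1), q(w) = N(0.9, 0.1^2).
mu_q, sigma_q = 0.9, 0.1
data = [(x, x) for x in [0.5, 1.0, 1.5, 2.0]]  # generated with true w = 1
elbo = elbo_estimate(
    data, n_total=len(data),
    draw_w=lambda: random.gauss(mu_q, sigma_q),
    log_lik=lambda x, y, w: log_gauss(y, w * x, 1.0),
    log_prior=lambda w: log_gauss(w, 0.0, 1.0),
    log_q=lambda w: log_gauss(w, mu_q, sigma_q),
)
```

Averaging this estimate over minibatches and weight samples recovers the full bound up to Monte Carlo noise.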

slide-5
SLIDE 5

Different priors and posterior approximations

  • Priors p(W):
    • N(0, σ²)
    • Scale-mixtures of Normals
    • Sparsity inducing
  • Posterior Approximations q(W):
    • Delta peak: q(W) = Ξ΄_ΞΈ(W)
    • Fully factorized Gaussians: q(W) = ∏_i N(w_i | ΞΌ_i, Οƒ_i²)
    • Bernoulli Dropout
    • Gaussian Dropout
    • MNF
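A fully factorized Gaussian posterior is usually sampled with the reparameterization trick, w_i = ΞΌ_i + Οƒ_i Β· Ξ΅ with Ξ΅ ~ N(0, 1). A minimal sketch; the parameter values and the softplus parameterization of Οƒ are common conventions assumed here, not taken from the slides:

```python
import math, random

def sample_ffg(mu, rho):
    """One sample from q(W) = prod_i N(w_i | mu_i, sigma_i^2) via the
    reparameterization trick: w_i = mu_i + sigma_i * eps, eps ~ N(0, 1).
    sigma_i = softplus(rho_i) keeps the standard deviation positive."""
    sample = []
    for m, r in zip(mu, rho):
        sigma = math.log1p(math.exp(r))  # softplus
        sample.append(m + sigma * random.gauss(0.0, 1.0))
    return sample

random.seed(0)
mu = [0.0, 1.0, -2.0]     # variational means (made-up values)
rho = [-3.0, -3.0, -3.0]  # softplus(-3) ~ 0.049, so small noise
w = sample_ffg(mu, rho)
```

Because the noise enters through a differentiable transform of (ΞΌ, ρ), gradients of the ELBO flow through the sample.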
slide-6
SLIDE 6

Multiplicative Normalizing Flows (MNF)

  • Augment model with auxiliary variable

Christos Louizos, Max Welling (ICML 2017)

Generative model over Y, X, W; inference model augments W with an auxiliary z:

z ~ q(z),  W ~ q(W | z),  q(W) = ∫ q(W | z) q(z) dz

  • q(W | z) = ∏_{i=1}^{D_in} ∏_{j=1}^{D_out} N(w_ij | z_i ΞΌ_ij, Οƒ_ij²)

New lower bound:

log p(Y | X) β‰₯ E_{q(W,z)}[log p(Y | X, W) + log p(W) βˆ’ log q(W | z) + log r(z | W) βˆ’ log q(z)]

q(z) and the auxiliary distribution r(z | W) are modeled with normalizing flows.
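The multiplicative structure of q(W | z) is simple to sample: z scales the mean of each input row of the weight matrix. A sketch in which a plain Gaussian stands in for the normalizing-flow posterior q(z); shapes, values, and names are illustrative assumptions:

```python
import random

def sample_mnf_weights(mu, sigma, sample_z):
    """W ~ q(W | z) = prod_ij N(w_ij | z_i * mu_ij, sigma_ij^2): the auxiliary
    z multiplies the mean of each input row. sample_z() stands in for a draw
    from the normalizing-flow posterior q(z); here it is a plain Gaussian."""
    z = sample_z(len(mu))
    return [[z[i] * mu[i][j] + sigma * random.gauss(0.0, 1.0)
             for j in range(len(mu[i]))]
            for i in range(len(mu))]

random.seed(0)
mu = [[1.0, 2.0], [3.0, 4.0]]  # 2 x 2 mean matrix (made-up values)
W = sample_mnf_weights(
    mu, sigma=0.01,
    sample_z=lambda d: [random.gauss(1.0, 0.1) for _ in range(d)])
```

A single scalar z_i induces correlated noise across an entire row of W, which a fully factorized Gaussian cannot express.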

slide-7
SLIDE 7

Predictive Distributions

slide-8
SLIDE 8

Uncertainties

  • Model uncertainty (Epistemic uncertainty)
  • Captures ignorance about which model is most suitable to explain the data
  • Reduces as the amount of observed data increases
  • Summarized by generating function realizations from our distribution
  • Measurement Noise (Aleatoric uncertainty)
  • Noise inherent in the environment, captured in likelihood function
  • Predictive uncertainty
  • Entropy of prediction = H[p(y|x)]
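The entropy of the predictive distribution is direct to compute; a minimal sketch with made-up class probabilities:

```python
import math

def entropy(p):
    """Predictive uncertainty H[p] = -sum_c p_c log p_c (natural log);
    terms with p_c = 0 contribute nothing."""
    return -sum(pc * math.log(pc) for pc in p if pc > 0)

confident = entropy([0.98, 0.01, 0.01])   # low predictive uncertainty
uniform = entropy([1 / 3, 1 / 3, 1 / 3])  # maximal for 3 classes: log 3
```

The uniform distribution attains the maximum, log(#classes), which is the natural scale against which the entropy maps on the later slides can be read.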
slide-9
SLIDE 9

Visualization Experiments

  • 1D regression
  • Classification of MNIST (visualize in 2D)
  • Questions:
  • Activations
  • Number of samples
  • Held out classes
  • Type of uncertainties
slide-10
SLIDE 10

BNNs with Different Activation Functions

Sigmoid: (1 + e^(βˆ’x))^(βˆ’1); Tanh; Softplus: ln(1 + e^x); ReLU: max(0, x)

slide-11
SLIDE 11

Uncertainty of Decision Boundaries

  • Setup:
  • Classification of MNIST
  • Train: 50000 Test: 10000

784-100-2-100-10

BNN: fully factorized Gaussian (FFG), N(0, 1) prior; activations: Softplus. (Panels: NN vs. BNN.)

slide-12
SLIDE 12

Decision Boundaries – 3 Samples

Plot of argmax_y p(y | x) at each point

slide-13
SLIDE 13

Uncertainty of Decision Boundaries: Held Out Classes

  • Setup:
  • Classification of digits 0 to 4 (5 to 9 held out)

784-100-100-2-100-100-10

BNN: fully factorized Gaussian (FFG), N(0, 1) prior; activations: Softplus. (Panels: NN vs. BNN.)

slide-14
SLIDE 14

Where do you think the held out classes will go?

Inside or Outside the Circle?

slide-15
SLIDE 15

Where do you think the held out classes will go?

slide-16
SLIDE 16

Held Out Classes

Unseen classes are not encoded far away; instead, they are encoded near the mean

slide-17
SLIDE 17

Confidence of Predictions?

Perhaps large areas have high entropy. (Compare argmax vs. max.)

slide-18
SLIDE 18

Class Boundaries - Confidences

Sharp transitions. There isn't much uncertain space: mostly uniform, high confidence

slide-19
SLIDE 19

Entropy

(Panels: argmax, max, entropy.)

slide-20
SLIDE 20

Effect of Choice of Activation Function

  • Softplus
  • ReLU
  • Tanh
slide-21
SLIDE 21

Softplus

Panels: Sample 1, Sample 2, Sample 3, Mean of q(W), E_{q(W)}[p(y | x, W)]

slide-22
SLIDE 22

ReLU

Panels: Sample 1, Sample 2, Sample 3, Mean of q(W), E_{q(W)}[p(y | x, W)]

slide-23
SLIDE 23

Tanh

Panels: Sample 1, Sample 2, Sample 3, Mean of q(W), E_{q(W)}[p(y | x, W)]

slide-24
SLIDE 24

Mix (Softplus, ReLu, Tanh)

Panels: Sample 1, Sample 2, Sample 3, Mean of q(W), E_{q(W)}[p(y | x)]

slide-25
SLIDE 25

Number of Datapoints

Rows: 25000, 10000, 1000, 100 training datapoints. Columns: argmax, max, entropy of E_{q(W)}[p(y | x)].

slide-26
SLIDE 26

Model vs Output Uncertainty

  • Predictive Uncertainty = H[p(y | x)]

Output Uncertainty: H[p(y | x, wΜ„)], where wΜ„ = mean of q(W); the output has high entropy (on a decision boundary).
Model Uncertainty: H[E_{q(W)}[p(y | x, W)]]; high-variance predictions.
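Given per-sample predictive distributions and the prediction at the mean weights, both quantities above are straightforward to compute; a sketch in which the toy distributions are made up for illustration:

```python
import math

def entropy(p):
    return -sum(pc * math.log(pc) for pc in p if pc > 0)

def uncertainties(sample_probs, probs_at_mean_w):
    """Returns (output, predictive) uncertainty as on the slide:
    output = H[p(y | x, w_bar)] with w_bar the mean of q(W), and
    predictive = H[E_{q(W)}[p(y | x, W)]] over samples W ~ q(W)."""
    n = len(sample_probs)
    mean_p = [sum(p[c] for p in sample_probs) / n
              for c in range(len(sample_probs[0]))]
    return entropy(probs_at_mean_w), entropy(mean_p)

# Toy case: samples disagree strongly, but the mean-weight prediction is confident.
samples = [[0.9, 0.05, 0.05], [0.05, 0.9, 0.05], [0.05, 0.05, 0.9]]
out_u, pred_u = uncertainties(samples, probs_at_mean_w=[0.9, 0.05, 0.05])
```

In this toy case the output uncertainty is low while the averaged prediction is maximally entropic: the gap between the two is exactly the disagreement among posterior samples.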

slide-27
SLIDE 27

Model vs Output Uncertainty

100 training datapoints:

                     Train   Test   Held Out
Model Uncertainty    .07     .26    .43
Output Uncertainty   .03     .15    .25

25000 training datapoints:

                     Train   Test   Held Out
Model Uncertainty    .06     .06    .43
Output Uncertainty   .05     .05    .36

Small data: model uncertainty dominates. Large data: output uncertainty dominates.

slide-28
SLIDE 28

Adversarial Examples, Uncertainty, and Transfer Testing Robustness in Gaussian Process Hybrid Deep Networks (July 2017)

(Panels: NN, BNN, GP+NN.)

slide-29
SLIDE 29

Visualize landscape of likelihood

p(y_train | x_train, W) plotted over (w1, w2). The dimension of W is large, so use a 2D auxiliary variable.

slide-30
SLIDE 30

Visualize landscape of likelihood

  • Auxiliary Variable Model

Generative model and inference model over Y, X, W, and auxiliary z.

Architecture: 784-100-100-2-10-10-10

z ~ q(z),  W ~ q(W | z),  q(W) = ∫ Ξ΄(W | z) q(z) dz

Auxiliary inference distribution: r(z | W)

log p(Y | X) β‰₯ E_{q(W,z)}[log p(Y | X, W) + log p(W) βˆ’ log q(W | z) + log r(z | W) βˆ’ log q(z)]

q(W | z) = Ξ΄(W | z): a hyper-network maps the 2D z to the weights of the hypo-network. (Panels: NN vs. BNN (2D).)

slide-31
SLIDE 31

Decision Boundaries

Panels: samples z1, z2, z3; E_{q(z)}[p(y | x, z)]

slide-32
SLIDE 32

Likelihood Landscape

Panels: log p(y_train | x_train, W, z) and log p(y_test | x_test, W, z), over (z1, z2).

slide-33
SLIDE 33

Likelihood Landscape

Panels: log p(y_train | x_train, W, z); log p(y_test | x_test, W, z); log p(y_train | x_train, W, z) + log r(z | W) βˆ’ log q(z). Axes: (z1, z2).

slide-34
SLIDE 34

Likelihood Landscape

Panels: log p(y_train | x_train, W, z); log p(y_test | x_test, W, z); log p(y_train | x_train, W, z) + log r(z | W) βˆ’ log q(z). Axes: (z1, z2).

slide-35
SLIDE 35

Likelihood Landscape

Panels: log p(y_train | x_train, W, z); log p(y_test | x_test, W, z); log p(y_train | x_train, W, z) + log r(z | W) βˆ’ log q(z). Axes: (z1, z2).

slide-36
SLIDE 36

Recent BNN Papers

  • Multiplicative Normalizing Flows for Variational Bayesian Neural Networks (2017)
  • Variational Dropout Sparsifies Deep Neural Networks (2017)
  • Bayesian Compression for Deep Learning (2017)
  • Adversarial Perturbations
  • Compression
slide-37
SLIDE 37

Adversarial perturbations

MNIST; CIFAR-10

slide-38
SLIDE 38

Compression vs Uncertainty

(Axis: entropy H[p].)

slide-39
SLIDE 39

Conclusion

  • Used visualizations to help understand uncertainty in BNNs
  • Goal: improve uncertainty estimates and generalization

Applications

  • Active learning
  • Bayes Opt
  • RL
  • Safety
  • Efficiency
slide-40
SLIDE 40

References

  • Weight Uncertainty in Neural Networks (2015)
  • Variational Dropout and the Local Reparameterization Trick (2015)
  • Dropout as a Bayesian Approximation: Representing Model Uncertainty in Deep Learning (2016)
  • Variational Dropout Sparsifies Deep Neural Networks (2017)
  • On Calibration of Modern Neural Networks (2017)
  • Multiplicative Normalizing Flows for Variational Bayesian Neural Networks (2017)
slide-41
SLIDE 41

Thank You