Uncertainty in Bayesian Neural Nets
August 4 2017
Overview: BNN review, visualization experiments, BNN results
BNN
Prior: p(W)
Likelihood: p(Y|X, W)
Approximate posterior: q(W)
Posterior predictive: E_{q(W)}[p(y|x, W)]
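The posterior predictive is typically approximated by Monte Carlo averaging over weight samples. A minimal sketch in NumPy; the `forward` and `sample_weights` callables are illustrative placeholders, not from the slides:

```python
import numpy as np

def softmax(logits):
    # Numerically stable softmax over the last axis.
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def posterior_predictive(x, sample_weights, forward, n_samples=100):
    """Monte Carlo estimate of E_{q(W)}[p(y|x, W)].

    sample_weights() draws one weight set W ~ q(W);
    forward(x, W) returns class logits for input x.
    """
    probs = [softmax(forward(x, sample_weights())) for _ in range(n_samples)]
    return np.mean(probs, axis=0)
```

Each sample is one plausible network under q(W); averaging their class probabilities gives the predictive distribution.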
log p(Y|X) ≥ E_{q(W)}[log p(Y|X, W) + log p(W) − log q(W)]
Prior p(W), approximate posterior q(W), likelihood p(Y|X, W). The weight of the prior/posterior term depends on the number of data points:

(1/N) Σ_{i=1}^N log p(y_i|x_i, W) + (1/N) log [p(W) / q(W)]
Christos Louizos, Max Welling (ICML 2017)
Generative model / inference model over Y, X, W, Z:
z ~ q(z), W ~ q(W|z), q(W) = ∫ q(W|z) q(z) dz
log p(Y|X) ≥ E_{q(W,z)}[log p(Y|X, W) + log p(W) − log q(W|z) + log r(z|W) − log q(z)]
New lower bound: Normalizing Flows
BNNs with Different Activation Functions
Sigmoid: (1 + e^{−x})^{−1}
Tanh
Softplus: ln(1 + e^x)
ReLU: max(0, x)
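For reference, the four activations can be written directly in NumPy (using `logaddexp` for a numerically stable softplus):

```python
import numpy as np

def sigmoid(x):
    # (1 + e^{-x})^{-1}
    return 1.0 / (1.0 + np.exp(-x))

def tanh(x):
    return np.tanh(x)

def softplus(x):
    # ln(1 + e^x), computed as log(e^0 + e^x) for numerical stability.
    return np.logaddexp(0.0, x)

def relu(x):
    return np.maximum(0.0, x)
```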
784-100-2-100-10
BNN: fully factorized Gaussian (FFG), N(0,1) prior. Activations: Softplus. Panels: NN vs BNN.
Plot of Argmax p(y|x) at each point
784-100-100-2-100-100-10
BNN: fully factorized Gaussian (FFG), N(0,1) prior. Activations: Softplus. Panels: NN vs BNN.
Inside or Outside the Circle?
Unseen classes don't get encoded far away; instead they are encoded near the mean.
Maybe large areas have high entropy. Argmax vs Max.
Sharp transitions. There isn't much uncertain space: mostly uniform, high confidence.
Panels: Argmax, Max, Entropy.
Panels: Sample 1, Sample 2, Sample 3, Mean of q(W), E_{q(W)}[p(y|x, W)].
Panels: Sample 1, Sample 2, Sample 3, Mean of q(W), E_{q(W)}[p(y|x)].
Training set sizes: 25000, 10000, 1000, 100. Panels: Argmax, Max, Entropy of E_{q(W)}[p(y|x)].
Output uncertainty: H[p(y|x, W̄)], where W̄ = mean of q(W); high entropy output (on the decision boundary).
Model uncertainty: H[E_{q(W)}[p(y|x, W)]]; high-variance predictions.
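Given class probabilities from several weight samples and from the mean of q(W), both quantities reduce to Shannon entropies. A sketch under the definitions above (array shapes and function names are assumptions):

```python
import numpy as np

def entropy(p, axis=-1, eps=1e-12):
    # Shannon entropy H[p] in nats; eps guards log(0).
    return -np.sum(p * np.log(p + eps), axis=axis)

def uncertainties(probs_samples, probs_at_mean):
    """probs_samples: (S, N, C) class probabilities from S samples W ~ q(W).
    probs_at_mean: (N, C) class probabilities using the mean of q(W).
    Returns per-input (output_uncertainty, model_uncertainty), following
    the definitions H[p(y|x, W_mean)] and H[E_{q(W)}[p(y|x, W)]].
    """
    output_u = entropy(probs_at_mean)
    model_u = entropy(probs_samples.mean(axis=0))
    return output_u, model_u
```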
100 training datapoints:
                     Train  Test  Held Out
Model uncertainty     .07    .26    .43
Output uncertainty    .03    .15    .25

25000 training datapoints:
                     Train  Test  Held Out
Model uncertainty     .06    .06    .43
Output uncertainty    .05    .05    .36

Small data: model uncertainty dominates. Large data: output uncertainty dominates.
Adversarial Examples, Uncertainty, and Transfer Testing Robustness in Gaussian Process Hybrid Deep Networks (July 2017)
w1, w2: p(ytrain|xtrain, W). The dimension of W is large, so use a 2D auxiliary variable.
Generative model and inference model over Y, X, W, Z.
784-100-100-2-10-10-10
NN vs BNN (2D). z ~ q(z), q(W) = ∫ q(W|z) q(z) dz
Auxiliary distribution: r(z|W)
log p(Y|X) ≥ E_{q(W,z)}[log p(Y|X, W) + log p(W) − log q(W|z) + log r(z|W) − log q(z)]
q(W|z): the hyper-network maps z to the weights of the hypo-network.
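A toy illustration of the idea, with made-up sizes: a small linear hyper-network maps a low-dimensional z to the weight matrix of a one-layer hypo-network. Everything here is an assumption for illustration, not the slides' architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

class ToyHyperNetwork:
    """Toy hyper-network: maps a low-dimensional z to the weights W of a
    'hypo-network' (here a single 3x4 linear layer)."""

    def __init__(self, z_dim=2, w_shape=(3, 4)):
        self.w_shape = w_shape
        n_w = w_shape[0] * w_shape[1]
        # One linear hyper-layer: W = reshape(A z + b).
        self.A = 0.1 * rng.normal(size=(n_w, z_dim))
        self.b = np.zeros(n_w)

    def weights(self, z):
        # Generate the hypo-network's weight matrix from z.
        return (self.A @ z + self.b).reshape(self.w_shape)

    def forward(self, x, z):
        # Hypo-network forward pass with weights generated from z.
        return x @ self.weights(z)
```

Sampling z and pushing it through `weights` induces a distribution over W, which is the role q(W|z) plays in the bound above.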
z1, z2, z3: E_{q(z)}[p(y|x, z)]
[Plots over (z1, z2): log p(ytrain|xtrain, W, z), log p(ytest|xtest, W, z), and log p(ytrain|xtrain, W, z) + log r(z|W)]
MNIST and CIFAR-10: entropy H[p]
Applications