Joost van Amersfoort, Lewis Smith, Yee Whye Teh, Yarin Gal Email: hi@joo.st
Uncertainty Estimation Using a Single Deep Deterministic Neural Network
Paper ID: 4538
Why do we want uncertainty?
Many applications need uncertainty
Goal: uncertainty with the runtime cost of a single network
Deterministic Uncertainty Quantification (DUQ)
Two Moons
[Figure: uncertainty on the Two Moons dataset for Deep Ensembles (1) and DUQ; color scale from certain to uncertain]
(1) Lakshminarayanan, Balaji, Alexander Pritzel, and Charles Blundell. "Simple and scalable predictive uncertainty estimation using deep ensembles." Advances in Neural Information Processing Systems, 2017.
The Model
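$$K_c = \exp\left(-\frac{\frac{1}{n}\lVert W_c f_\theta(x) - e_c\rVert_2^2}{2\sigma^2}\right)$$
Prediction: $\arg\max_c K_c$
Uncertainty: $\max_c K_c$ (a low value means the input is far from every centroid)
[Figure: the feature extractor f_θ maps the input x to f_θ(x), which is compared with one centroid per class: e1 (Cat), e2 (Dog), e3 (Bird)]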
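The kernel, prediction and uncertainty above can be sketched in plain Python. This is a minimal illustration, not the authors' code: the feature vector f_θ(x) is passed in directly, and the identity matrices W_c, the toy centroids and σ = 0.5 are hypothetical values chosen only for the example.

```python
import math
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def matvec(w: Matrix, v: Sequence[float]) -> List[float]:
    """Multiply an n x d matrix (list of rows) by a length-d vector."""
    return [sum(w_ij * v_j for w_ij, v_j in zip(row, v)) for row in w]


def rbf_kernel(features: Sequence[float], w_c: Matrix, e_c: Sequence[float], sigma: float) -> float:
    """K_c = exp(-(1/n) * ||W_c f(x) - e_c||_2^2 / (2 * sigma^2)), n = centroid dimension."""
    projected = matvec(w_c, features)
    n = len(e_c)
    sq_dist = sum((p - e) ** 2 for p, e in zip(projected, e_c)) / n
    return math.exp(-sq_dist / (2 * sigma ** 2))


def predict(features: Sequence[float], weights: Sequence[Matrix],
            centroids: Sequence[Sequence[float]], sigma: float) -> Tuple[int, float, List[float]]:
    """Prediction = argmax_c K_c; certainty = max_c K_c (low values mean uncertain)."""
    scores = [rbf_kernel(features, w_c, e_c, sigma) for w_c, e_c in zip(weights, centroids)]
    best = max(range(len(scores)), key=lambda c: scores[c])
    return best, scores[best], scores


# Hypothetical toy setup: 2-d features, W_c = identity, one centroid per class.
identity = [[1.0, 0.0], [0.0, 1.0]]
weights = [identity, identity, identity]
centroids = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]  # e1 (Cat), e2 (Dog), e3 (Bird)

label, certainty, _ = predict([0.9, 0.1], weights, centroids, sigma=0.5)
print(label, round(certainty, 3))  # near e1: class 0 with high certainty

_, far_certainty, _ = predict([5.0, 5.0], weights, centroids, sigma=0.5)
print(far_certainty)  # far from every centroid: certainty close to 0
```

The second call shows why the maximum kernel value works as an uncertainty signal: a feature vector far from all centroids gets a low value for every class, whatever the argmax happens to be.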
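Uncertainty: distance between the representation and the closest centroid
Centroids: updated together with the model in a standard RBF network; updated with a moving average in DUQ
f_θ(x) needs to be well behaved → penalty on the Jacobian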
Overview
Standard RBF networks vs. DUQ
DUQ: uncertainty with the runtime cost of a single network
Results
Train on FashionMNIST; evaluate on FashionMNIST (in distribution) + MNIST (out of distribution)
The loss decreases the distance between the feature vector and the correct centroid, while increasing it relative to all others
exploding
Learning the Model
gradient based learning
gradient goes to zero
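The loss behaviour described on this slide (pull the feature vector towards the correct centroid, push it away from all others) can be sketched as a sum of per-class binary cross entropies between each kernel value K_c and the one-hot label. This is a minimal sketch that assumes this one-vs-all form; the clamping constant `eps` is an arbitrary choice to avoid log(0), and the kernel values in the example are hypothetical.

```python
import math
from typing import Sequence


def duq_loss(kernel_values: Sequence[float], label: int, eps: float = 1e-7) -> float:
    """Sum over classes of the binary cross entropy between K_c and the one-hot label y_c.

    Minimising it pushes K_label towards 1 (feature moves towards the correct centroid)
    and every other K_c towards 0 (feature moves away from the other centroids).
    """
    loss = 0.0
    for c, k in enumerate(kernel_values):
        k = min(max(k, eps), 1.0 - eps)  # clamp so log() never sees 0
        y = 1.0 if c == label else 0.0
        loss -= y * math.log(k) + (1.0 - y) * math.log(1.0 - k)
    return loss


# Same true class (0), two hypothetical sets of kernel values.
good = duq_loss([0.9, 0.1, 0.1], label=0)  # close to the correct centroid only
bad = duq_loss([0.1, 0.9, 0.1], label=0)   # close to the wrong centroid
print(good < bad)
```

Averaging instead of summing over classes only rescales the loss by a constant factor.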
Learning the Centroids
[Figure: centroid, data points and gradient]
Ideally each centroid is the average of the feature vectors of that class; we compute this as a moving average with heavy momentum to make it work with mini-batches.
Momentum set to 0.99(9), i.e. 0.99 or 0.999
The centroid moves towards the data
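A minimal sketch of the moving-average centroid update, assuming the centroid is kept as the ratio of two exponential moving averages: m_c, of the summed projected features W_c f_θ(x) of the class-c points in each mini-batch, and N_c, of the number of such points. Initialising N_c to 1 and the demo momentum γ = 0.9 are hypothetical choices made so the effect is visible in a few steps; the slide uses 0.99 or 0.999.

```python
from typing import List, Sequence


class EMACentroid:
    """Centroid of one class, kept as m_c / N_c with exponential moving averages."""

    def __init__(self, init: Sequence[float], gamma: float = 0.999) -> None:
        self.gamma = gamma
        self.count = 1.0         # N_c: running average of class-c points per batch (hypothetical init)
        self.total = list(init)  # m_c: running average of the sum of their projected features

    def update(self, batch_features: Sequence[Sequence[float]]) -> None:
        """batch_features: projected features W_c f_theta(x) of the class-c points in one mini-batch."""
        g = self.gamma
        batch_sum = [0.0] * len(self.total)
        for feature in batch_features:
            for j, value in enumerate(feature):
                batch_sum[j] += value
        self.count = g * self.count + (1.0 - g) * len(batch_features)
        self.total = [g * m + (1.0 - g) * s for m, s in zip(self.total, batch_sum)]

    @property
    def centroid(self) -> List[float]:
        return [m / self.count for m in self.total]


# Demo with a low momentum (0.9) so the centroid visibly moves towards the data.
centroid = EMACentroid([0.0, 0.0], gamma=0.9)
batch = [[1.0, 2.0], [3.0, 0.0]]  # hypothetical class-c features, mean (2, 1)
for _ in range(300):
    centroid.update(batch)
print(centroid.centroid)  # converges towards the batch mean (2, 1)
```

Keeping the count and the sum as separate averages means a mini-batch with few (or no) points of a class barely moves its centroid, which is one reason heavy momentum works with mini-batches.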
Why do we need to regularise ƒ?
To be able to detect OoD input, ƒ must stay sensitive to features that don't affect the class
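Stability & Sensitivity
Stability (commonly used): a small change in the input causes only a small change in ƒ
Sensitivity: a change δ in the input must change the output, ||ƒ(x) - ƒ(x + δ)|| > L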
Gradient penalty:
$$\lambda \cdot \left[\left\lVert \nabla_x \sum_c K_c \right\rVert_2^2 - L\right]^2$$
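To make the penalty concrete, this sketch evaluates λ·[||∇_x Σ_c K_c||²₂ − L]² numerically. Assumptions: the feature extractor is replaced by a hypothetical elementwise tanh, W_c is the identity, the centroids and σ are toy values, and the gradient is taken by central finite differences. A real training loop would use automatic differentiation instead (for example `torch.autograd.grad` with `create_graph=True` in PyTorch) so that the penalty itself can be backpropagated. The `one_sided` flag contrasts a penalty from above (only gradient norms above L are penalised) with a two-sided penalty; that reading of "penalty from above" is an interpretation.

```python
import math
from typing import Callable, List, Sequence


def sum_of_kernels(x: Sequence[float], centroids: Sequence[Sequence[float]], sigma: float) -> float:
    """Sum_c K_c for a hypothetical toy model: f(x) = tanh(x) elementwise, W_c = identity."""
    features = [math.tanh(v) for v in x]
    n = len(features)
    total = 0.0
    for e_c in centroids:
        sq_dist = sum((f - e) ** 2 for f, e in zip(features, e_c)) / n
        total += math.exp(-sq_dist / (2 * sigma ** 2))
    return total


def finite_difference_grad(fn: Callable[[List[float]], float], x: Sequence[float],
                           h: float = 1e-5) -> List[float]:
    """Central finite-difference estimate of the gradient of a scalar function."""
    grad = []
    for i in range(len(x)):
        x_plus, x_minus = list(x), list(x)
        x_plus[i] += h
        x_minus[i] -= h
        grad.append((fn(x_plus) - fn(x_minus)) / (2 * h))
    return grad


def gradient_penalty(fn: Callable[[List[float]], float], x: Sequence[float], lam: float,
                     target: float = 1.0, one_sided: bool = False) -> float:
    """lam * [||grad_x fn||_2^2 - target]^2; one_sided only penalises norms above target."""
    sq_norm = sum(g * g for g in finite_difference_grad(fn, x))
    excess = sq_norm - target
    if one_sided:
        excess = max(0.0, excess)
    return lam * excess ** 2


centroids = [[0.5, 0.5], [-0.5, -0.5]]  # hypothetical centroids


def model(x: List[float]) -> float:
    return sum_of_kernels(x, centroids, sigma=0.1)


# Near a centroid the gradient norm exceeds the target: both penalties agree.
print(gradient_penalty(model, [0.4, 0.4], lam=0.5), gradient_penalty(model, [0.4, 0.4], lam=0.5, one_sided=True))
# Far from all centroids the gradient vanishes: only the two-sided penalty is non-zero.
print(gradient_penalty(model, [3.0, 3.0], lam=0.5), gradient_penalty(model, [3.0, 3.0], lam=0.5, one_sided=True))
```

The second line shows the practical difference: the two-sided version also penalises regions where Σ_c K_c is flat in x, i.e. where changes in the input would not change the output.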
[Figure: DUQ with a penalty from above vs. DUQ with a two-sided penalty]
Out of Distribution detection
CIFAR-10 (training set) and SVHN (out of distribution set)
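One common way to score out of distribution detection is AUROC, using the certainty max_c K_c as the detection score (CIFAR-10 test points should score higher than SVHN points). The slide does not name the metric, so this is an assumption, and the certainty values below are hypothetical, not results from the paper.

```python
from typing import Sequence


def auroc(in_dist_scores: Sequence[float], ood_scores: Sequence[float]) -> float:
    """AUROC for separating in-distribution from OoD data using certainty as the score.

    Equals the probability that a random in-distribution point is more certain than a
    random OoD point, counting ties as 1/2 (the normalised Mann-Whitney U statistic).
    """
    wins = 0.0
    for a in in_dist_scores:
        for b in ood_scores:
            if a > b:
                wins += 1.0
            elif a == b:
                wins += 0.5
    return wins / (len(in_dist_scores) * len(ood_scores))


# Hypothetical certainty values (max_c K_c), not measured results.
in_dist_certainty = [0.95, 0.90, 0.80, 0.40]
ood_certainty = [0.30, 0.50, 0.20]
print(auroc(in_dist_certainty, ood_certainty))  # 11 of 12 pairs ranked correctly
```

The pairwise loop is quadratic and only meant for illustration; for real test sets a sort-based implementation is preferable.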
Results
Summary
DUQ provides uncertainty estimates at the runtime cost of a single network
Limitations and Future Work
... system makes training stable, but does not allow assigning a data point to multiple classes.
There are interesting similarities to inducing point GPs with parametrised ("deep") kernels.
Time slots are available on the ICML website. Missed us? Email us at hi@joo.st