SLIDE 1
Bayesian Model-Agnostic Meta-Learning
Taesup Kim* (presenter), Jaesik Yoon*, Ousmane Dia, Sungwoong Kim, Yoshua Bengio, Sungjin Ahn
SLIDE 2 Model-Agnostic Meta-learning (MAML)
“gradient-based meta-learning framework”
meta-update task adaptation
initial parameters
SLIDE 3
Model-Agnostic Meta-learning (MAML)
For each task in a batch: task adaptation (Initial Model → Task Model), then meta-update of the Initial Model
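The two loops on this slide can be sketched in a few lines. This is a minimal first-order MAML sketch on 1-D linear regression; the tasks, learning rates, and function names are illustrative, not from the talk.

```python
import numpy as np

# Minimal first-order MAML sketch on 1-D linear regression (the tasks and
# hyperparameters are illustrative). Inner loop: one gradient step adapts
# the shared initialization to each task; outer loop: the meta-update
# moves the initialization itself.

def loss_grad(theta, x, y):
    """Gradient of the mean squared error of the model y_hat = theta * x."""
    return np.mean(2.0 * (theta * x - y) * x)

def maml_step(theta0, tasks, alpha=0.01, beta=0.05):
    """One meta-update over a batch of tasks."""
    meta_grad = 0.0
    for x, y in tasks:
        theta_task = theta0 - alpha * loss_grad(theta0, x, y)  # task adaptation
        # First-order approximation: the meta-gradient is evaluated at the
        # adapted parameters (full MAML also differentiates through the
        # inner gradient step).
        meta_grad += loss_grad(theta_task, x, y)
    return theta0 - beta * meta_grad / len(tasks)              # meta-update

# Two tasks that share the model family but differ in slope.
rng = np.random.default_rng(0)
tasks = []
for slope in (1.0, 2.0):
    x = rng.normal(size=8)
    tasks.append((x, slope * x))

theta0 = 0.0
for _ in range(200):
    theta0 = maml_step(theta0, tasks)
# theta0 settles between the per-task solutions: a good starting point
# for one-step adaptation to either task.
```

The design choice MAML makes is that the meta-loss is evaluated *after* adaptation, so the initialization is optimized for how well it adapts, not for how well it fits any single task.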
SLIDE 4
Gradient-Based Meta-Learning + “Bayesian”
Robust to overfitting · Safe/efficient exploration · Active learning
Uncertainty
SLIDE 5 Lightweight Laplace Approximation for Meta-Adaptation (LLAMA)
MAML LLAMA
meta-update task adaptation
SLIDE 6 Gaussian Approximation
Lightweight Laplace Approximation for Meta-Adaptation (LLAMA)
meta-update task adaptation
SLIDE 7 Lightweight Laplace Approximation for Meta-Adaptation (LLAMA)
meta-update task adaptation
Gaussian Approximation No uncertainty for initial model
SLIDE 8 Lightweight Laplace Approximation for Meta-Adaptation (LLAMA)
meta-update task adaptation
Gaussian Approximation No uncertainty for initial model
SLIDE 9
Bayesian Model-Agnostic Meta-Learning (BMAML)
MAML: point estimate · LLAMA: Gaussian approx. · BMAML: complex multimodal
meta-update task adaptation
SLIDE 10 meta-update task adaptation
BMAML complex multimodal
For each task in a batch — MAML: task adaptation (Initial Model → Task Model), then Meta-update; BMAML: Bayesian fast adaptation (Initial Distribution → Task Distribution), then Bayesian meta-update
Bayesian Model-Agnostic Meta-Learning (BMAML)
SLIDE 11
Model-Agnostic Meta-Learning (MAML) Stein Variational Gradient Descent (SVGD)
“gradient-based meta-learning framework” “particle-based posterior approximation”
+
Bayesian Fast Adaptation (BFA)
θ1 θ2 θ3 θ4
SLIDE 12
“particle-based posterior approximation”
Stein Variational Gradient Descent (SVGD)
“backprop to initial model through deterministic SVGD particles”
θ_i ← θ_i + ε·φ(θ_i), where φ(θ_i) = (1/M) Σ_{j=1}^{M} [ k(θ_j, θ_i) ∇_{θ_j} log p(θ_j) + ∇_{θ_j} k(θ_j, θ_i) ]
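The SVGD update above can be sketched in 1-D; this is an illustrative toy (the paper applies SVGD to neural network parameters, and uses a median-heuristic kernel bandwidth rather than the constant assumed here).

```python
import numpy as np

# Minimal 1-D SVGD sketch (illustrative only). Each particle is pushed by
# two terms: a kernel-weighted log-density gradient (attraction toward
# high probability) and the kernel's own gradient (repulsion that keeps
# the particles spread out, so together they approximate the posterior).

def svgd_step(theta, grad_logp, eps=0.1, h=1.0):
    """One SVGD update theta_i <- theta_i + eps * phi(theta_i), using an
    RBF kernel with a fixed bandwidth h (a simplification; the usual
    choice is the median heuristic)."""
    diff = theta[:, None] - theta[None, :]          # theta_i - theta_j
    K = np.exp(-diff**2 / (2.0 * h**2))             # k(theta_j, theta_i)
    # phi_i = (1/M) sum_j [ k(theta_j, theta_i) * grad log p(theta_j)
    #                       + grad_{theta_j} k(theta_j, theta_i) ]
    phi = (K @ grad_logp(theta) + (diff / h**2 * K).sum(axis=1)) / len(theta)
    return theta + eps * phi

# Target: standard normal posterior, so grad log p(theta) = -theta.
theta = np.linspace(-3.0, 3.0, 10)                  # M = 10 particles
for _ in range(300):
    theta = svgd_step(theta, lambda t: -t)
# The particles now approximate N(0, 1) while remaining distinct.
```

With a single particle the repulsion term vanishes and SVGD reduces to plain gradient ascent on log p, which is how BMAML with M = 1 recovers MAML-style point estimates.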
SLIDE 13
Bayesian Fast Adaptation (BFA)
Meta-update Meta-loss Initial distribution
SLIDE 14
Bayesian Fast Adaptation (BFA)
Task adaptation: Initial distribution → Task 1 posterior, Task 2 posterior, Task 3 posterior
SLIDE 15
Bayesian Meta-Update with Chaser Loss
“extend uncertainty-awareness to meta-update”
Chaser Leader Initial
“Distance = Chaser Loss”
current task posterior target task posterior
SLIDE 16
Bayesian Meta-Update with Chaser Loss
Chaser Leader Initial
“Distance = Chaser Loss”
current task posterior target task posterior
SLIDE 17 Bayesian Meta-Update with Chaser Loss
Chaser Initial
For each task τ ∈ T_t:
- Compute chaser: Θ^n_τ(Θ_0) = SVGD_n(Θ_0; D^trn_τ, α)
- Compute leader: Θ^{n+s}_τ(Θ_0) = SVGD_s(Θ^n_τ(Θ_0); D^trn_τ ∪ D^val_τ, α)
SLIDE 18 Bayesian Meta-Update with Chaser Loss
Chaser · Leader · Initial
For each task τ ∈ T_t:
- Compute CHASER PARTICLES: Θ^n_τ(Θ_0) = SVGD_n(Θ_0; D^trn_τ, α)
- Compute LEADER PARTICLES: Θ^{n+s}_τ(Θ_0) = SVGD_s(Θ^n_τ(Θ_0); D^trn_τ ∪ D^val_τ, α)
SLIDE 19 For each task,
- Compute CHASER PARTICLES
- Compute LEADER PARTICLES
- Compute CHASER LOSS
Bayesian Meta-Update with Chaser Loss
Chaser Leader Initial
“Distance = Chaser Loss”
L_BMAML(Θ_0) = Σ_{τ ∈ T_t} d_s(Θ^n_τ ‖ Θ^{n+s}_τ) = Σ_{τ ∈ T_t} Σ_{m=1}^{M} ‖θ^{n,m}_τ − θ^{n+s,m}_τ‖²_2
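The chaser loss can be sketched end to end on a toy problem. This is a 1-D illustration: `grad_train` and `grad_full` are hypothetical stand-ins for the log-posterior gradients given D^trn and D^trn ∪ D^val, and the SVGD step is a simplified fixed-bandwidth version.

```python
import numpy as np

# Sketch of the chaser loss (1-D, illustrative only). The chaser runs n
# SVGD steps on the training set; the leader continues for s more steps
# on train + validation. The leader is treated as a fixed target (the
# paper stops gradients through it), and the loss is the summed squared
# particle-wise distance.

def svgd_step(theta, grad_logp, eps=0.1, h=1.0):
    """One SVGD update with a fixed-bandwidth RBF kernel (simplified)."""
    diff = theta[:, None] - theta[None, :]
    K = np.exp(-diff**2 / (2.0 * h**2))
    phi = (K @ grad_logp(theta) + (diff / h**2 * K).sum(axis=1)) / len(theta)
    return theta + eps * phi

def run_svgd(theta, grad_logp, steps):
    for _ in range(steps):
        theta = svgd_step(theta, grad_logp)
    return theta

# Hypothetical task posteriors: the training data pulls toward 1.0; adding
# the validation data sharpens the posterior and shifts it toward 1.5.
grad_train = lambda t: -(t - 1.0)          # grad log p(theta | D_trn)
grad_full  = lambda t: -2.0 * (t - 1.5)    # grad log p(theta | D_trn u D_val)

theta0 = np.linspace(-1.0, 1.0, 5)                    # initial particles Theta_0
chaser = run_svgd(theta0, grad_train, steps=10)       # Theta^n_tau
leader = run_svgd(chaser, grad_full, steps=5)         # Theta^{n+s}_tau
chaser_loss = np.sum((chaser - leader) ** 2)          # no gradient through leader
```

Minimizing this loss over Θ_0 moves the initial particles so that the chaser already sits close to where the leader ends up, i.e. n steps on training data alone land near the fuller posterior.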
SLIDE 20 Regression Image Classification Active Learning
Experiments
- prevent overfitting while achieving better performance
- evaluate the effectiveness of the measured uncertainty
SLIDE 21 Experiments
Reinforcement Learning
- better policy exploration
SLIDE 22
See you at Poster “AB #15” (room 210 & 230)