CS 330
Optimization-Based Meta-Learning (finishing from last time) and Non-Parametric Few-Shot Learning

Logistics
- Homework 1 due; Homework 2 out this Wednesday
- Fill out poster presentation preferences! (Tues 12/3 or Weds 12/4)
Recall from last time: MAML's inner gradient steps can be interpreted as MAP inference under a Gaussian prior centered at the initialization (exact in the linear case, approximate in the nonlinear case).
MiniImagenet, 5-way 5-shot:
- MAML, basic architecture: 63.11%
- MAML + AutoMeta: 74.65%
Improving optimization-based meta-learning:
- Meta-learn the inner-loop learning rates (Li et al. Meta-SGD, Behl et al. AlphaMAML); see the sketch after this list
- Apply additional tricks to stabilize training (Antoniou et al. MAML++)
- Optimize only a subset of the parameters in the inner loop (Zhou et al. DEML, Zintgraf et al. CAVIA)
- Introduce context variables for added expressive power (Finn et al. bias transformation, Zintgraf et al. CAVIA)
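To make the first bullet concrete, here is a minimal Meta-SGD-style sketch on toy regression tasks: the per-parameter inner learning rates alpha are meta-learned jointly with the initialization theta. The task sampler and all names are illustrative assumptions, not code from the cited papers.

```python
import torch

def sample_task(dim=10):
    """Toy linear-regression task family: fit y = w.x for a random w."""
    w = torch.randn(dim)
    def loss_fn(params, n=20):
        x = torch.randn(n, dim)
        return ((x @ params - x @ w) ** 2).mean()
    return loss_fn

theta = torch.randn(10, requires_grad=True)            # meta-learned initialization
alpha = torch.full((10,), 0.01, requires_grad=True)    # meta-learned per-parameter LRs
outer_opt = torch.optim.Adam([theta, alpha], lr=1e-3)

for _ in range(1000):
    loss_fn = sample_task()
    # Inner step: phi = theta - alpha * grad; keep the graph so the outer
    # gradient flows into both theta and alpha.
    grad = torch.autograd.grad(loss_fn(theta), theta, create_graph=True)[0]
    phi = theta - alpha * grad
    outer_opt.zero_grad()
    loss_fn(phi).backward()            # outer loss on fresh draws from the task
    outer_opt.step()
```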
Two ways to avoid backpropagating through the full inner loop:
- Derive the meta-gradient with the implicit function theorem (Rajeswaran, Finn, Kakade, Levine. Implicit MAML ’19)
- Crudely approximate the inner-loop Jacobian dφ/dθ as the identity (Finn et al. first-order MAML ’17, Nichol et al. Reptile ’18); see the sketch below
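A minimal Reptile-style sketch (reusing the toy sample_task above): run a few plain SGD steps, then nudge the initialization toward the adapted weights. The Jacobian dφ/dθ is never computed.

```python
import torch

def reptile_step(theta, loss_fn, inner_lr=0.01, inner_steps=5, outer_lr=0.1):
    """One Reptile meta-update: theta <- theta + outer_lr * (phi - theta)."""
    phi = theta.detach().clone().requires_grad_(True)
    for _ in range(inner_steps):                       # vanilla SGD inner loop
        grad = torch.autograd.grad(loss_fn(phi), phi)[0]
        phi = (phi - inner_lr * grad).detach().requires_grad_(True)
    with torch.no_grad():
        theta += outer_lr * (phi - theta)              # first-order meta-update
    return theta

# Usage: theta = reptile_step(theta, sample_task())
```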
(Rajeswaran, Finn, Kakade, Levine. Implicit MAML)
+ Memory and computation trade-offs
+ Allows for second-order optimizers in the inner loop
- A very recent development (NeurIPS ’19), thus all the typical caveats with recent work
A sketch of the implicit meta-gradient follows below.
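A minimal single-tensor sketch of the implicit meta-gradient: with the paper's regularized inner objective, the meta-gradient is (I + H/λ)^{-1} ∇_φ L_test evaluated at the adapted parameters φ, which can be approximated with a few conjugate-gradient steps using only Hessian-vector products. The solver and names below are my illustrative assumptions, not the authors' implementation.

```python
import torch

def hvp(loss_fn, phi, v):
    """Hessian-vector product (d^2 L / d phi^2) @ v via double backward."""
    grad = torch.autograd.grad(loss_fn(phi), phi, create_graph=True)[0]
    return torch.autograd.grad(grad, phi, grad_outputs=v)[0]

def implicit_meta_grad(phi, train_loss_fn, test_loss_fn, lam=1.0, cg_iters=5):
    """Solve (I + H/lam) g = dL_test/dphi with conjugate gradient.

    phi: adapted task parameters (a 1-D leaf tensor with requires_grad=True),
    assumed to (approximately) minimize train_loss + lam/2 * ||phi - theta||^2.
    The returned g approximates the meta-gradient w.r.t. the initialization.
    """
    b = torch.autograd.grad(test_loss_fn(phi), phi)[0].detach()
    Ax = lambda v: v + hvp(train_loss_fn, phi, v) / lam
    g = torch.zeros_like(b)
    r, p = b.clone(), b.clone()          # residual r = b - Ax(g) = b since g = 0
    rs = r.dot(r)
    for _ in range(cg_iters):
        Ap = Ax(p)
        a = rs / p.dot(Ap)
        g, r = g + a * p, r - a * Ap
        rs_new = r.dot(r)
        p = r + (rs_new / rs) * p
        rs = rs_new
    return g
```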
Non-parametric few-shot learning. Note: some of these methods precede the parametric approaches.
Key idea: in a low-data regime, learn to compare. Siamese networks (Koch et al., ICML ’15) train a network on image pairs (x_i, x_j) to predict whether the two images belong to the same class; at meta-test time, the test image x^ts is compared against every example in D^tr and takes the label of its closest match. Note the mismatch: meta-training is binary pair classification, while meta-testing is N-way classification. A sketch of the comparison step follows below.
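A minimal sketch of this pairwise comparison (the tiny MLP encoder and all names are my assumptions; Koch et al. use a convolutional Siamese network):

```python
import torch
import torch.nn as nn

class PairScorer(nn.Module):
    """Siamese-style scorer: embed both inputs with shared weights, then
    predict a same-class logit from the absolute embedding difference."""
    def __init__(self, in_dim=784, hid=64):
        super().__init__()
        self.embed = nn.Sequential(nn.Linear(in_dim, hid), nn.ReLU(),
                                   nn.Linear(hid, hid))
        self.out = nn.Linear(hid, 1)

    def forward(self, x1, x2):
        z1, z2 = self.embed(x1), self.embed(x2)
        return self.out(torch.abs(z1 - z2)).squeeze(-1)

def nearest_match(model, x_test, support_x, support_y):
    """Meta-test: N-way prediction by scoring x_test against every support
    image and returning the label of the best match."""
    scores = model(x_test.expand_as(support_x), support_x)
    return support_y[scores.argmax()]
```

Meta-training fits the same-class logit with binary cross-entropy on sampled image pairs.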
Matching Networks (Vinyals et al., NeurIPS ’16) close this train/test gap: embed the training examples x_i with a bidirectional LSTM, embed the test example with a convolutional encoder, and predict with attention over the training labels:
\hat{y}^{ts} = \sum_{(x_k, y_k) \in \mathcal{D}^{tr}_i} f_\theta(x^{ts}, x_k)\, y_k
Meta-training follows the general recipe:
1. Sample task \mathcal{T}_i and datasets \mathcal{D}^{tr}_i, \mathcal{D}^{test}_i
2. Compute \hat{y}^{ts} = \sum_{(x_k, y_k) \in \mathcal{D}^{tr}_i} f_\theta(x^{ts}, x_k)\, y_k
3. Update \theta using \nabla_\theta \mathcal{L}(\hat{y}^{ts}, y^{ts})
(There are no separate task-specific parameters \phi; adaptation is folded into the comparison itself. A sketch follows below.)
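A minimal sketch of steps 2-3 with a shared encoder and cosine-similarity attention (a simplification: the paper uses separate, context-dependent embeddings for support and query):

```python
import torch
import torch.nn.functional as F

def matching_predict(embed, x_test, support_x, support_y, n_classes):
    """Soft nearest-neighbor prediction: y_hat = sum_k att_k * y_k.

    support_y: LongTensor of class indices for the K support examples.
    """
    z_test = embed(x_test.unsqueeze(0))                  # (1, D)
    z_sup = embed(support_x)                             # (K, D)
    att = F.softmax(F.cosine_similarity(z_test, z_sup, dim=-1), dim=0)  # (K,)
    y_onehot = F.one_hot(support_y, n_classes).float()   # (K, N)
    return att @ y_onehot                                # predicted class probs

# Step 3: cross-entropy on these probabilities, backprop into embed's theta:
# loss = F.nll_loss(torch.log(probs + 1e-8).unsqueeze(0), y_test)
```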
Snell et al. Prototypical Networks, NeurIPS ‘17
c_n = \frac{1}{K} \sum_{(x, y) \in \mathcal{D}^{tr}_i} \mathbf{1}(y = n)\, f_\theta(x)
p_\theta(y = n \mid x) = \frac{\exp(-d(f_\theta(x), c_n))}{\sum_{n'} \exp(-d(f_\theta(x), c_{n'}))}
where d is a distance such as Euclidean or cosine distance.
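A minimal sketch of these two equations with Euclidean distance (names are illustrative):

```python
import torch

def proto_predict(embed, x_test, support_x, support_y, n_classes):
    """Prototypical networks: softmax over negative distances to class means."""
    z_sup = embed(support_x)                             # (K*N, D)
    z_test = embed(x_test.unsqueeze(0))                  # (1, D)
    protos = torch.stack([z_sup[support_y == n].mean(0)  # c_n per class
                          for n in range(n_classes)])    # (N, D)
    dists = torch.cdist(z_test, protos).squeeze(0)       # d(f_theta(x), c_n)
    return torch.softmax(-dists, dim=0)                  # p_theta(y = n | x)
```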
Can we combine the strengths of the black-box, optimization-based, and non-parametric approaches, e.g. by producing \hat{y}^{ts} = f(\mathcal{D}^{tr}_i, x^{ts}) with metric-based or gradient-based components inside?
- Jiang et al. CAML ’19
- Triantafillou et al. Proto-MAML ’19: initialize the task-specific last layer from the prototypes, where c_n = \frac{1}{K} \sum_{(x, y) \in \mathcal{D}^{tr}_i} \mathbf{1}(y = n)\, f_\theta(x), then fine-tune with MAML (see the sketch after this list)
- Rusu et al. LEO ’19
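A minimal sketch of the prototype-initialized linear head in the spirit of Proto-MAML: expanding -||z - c_n||^2 and dropping the class-independent -||z||^2 term gives a linear layer with weights 2c_n and biases -||c_n||^2. Variable names are mine.

```python
import torch

def proto_init_head(z_sup, support_y, n_classes):
    """Linear classifier equivalent to ProtoNet's softmax over negative
    squared Euclidean distances: logits_n = 2 c_n . z - ||c_n||^2."""
    protos = torch.stack([z_sup[support_y == n].mean(0)
                          for n in range(n_classes)])   # (N, D) prototypes
    weight = 2.0 * protos                               # (N, D)
    bias = -(protos ** 2).sum(dim=1)                    # (N,)
    return weight, bias  # then fine-tune head (and encoder) in MAML's inner loop

# Usage: logits = z_query @ weight.T + bias
```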
Comparing the three approaches:

Black-box:
+ complete expressive power
+ easy to combine with a variety of learning problems (e.g. SL, RL)
- challenging optimization (no inductive bias at the initialization)

Optimization-based:
+ positive inductive bias at the start of meta-learning
+ handles varying & large K well
+ model-agnostic
+ consistent, reduces to GD
~ expressive for very deep models*
- second-order optimization can be compute- and memory-intensive

Non-parametric:
+ entirely feedforward
+ computationally fast & easy to optimize
+ expressive for most architectures
~ consistent under certain conditions

*for supervised learning settings

In practice, well-tuned versions of all three perform comparably on current few-shot benchmarks (which likely says more about the benchmarks than the methods).