Meta-Learning with Shared Amortized Variational Inference


  1. Meta-Learning with Shared Amortized Variational Inference. Ekaterina Iakovleva (Inria), Jakob Verbeek (Facebook), Karteek Alahari (Inria). ICML | 2020, Thirty-seventh International Conference on Machine Learning.

  2. Standard classification task pipeline. [Pipeline diagram: training data, trained model, predictions on test data.]

  3. Meta-learning classification task pipeline. [Pipeline diagram: a collection of meta-training tasks, each with its own support and query data, plus held-out meta-test data.] (Schmidhuber 1999; Ravi & Larochelle, ICLR'17)

  4. Overview. This work focuses on the empirical Bayes meta-learning approach.
  • We propose a novel scheme for amortized variational inference.
  • We demonstrate that earlier work based on Monte Carlo approximation underestimates model variance.
  • We show the advantage of our approach on miniImageNet and FC100.

  5. Meta-learning classification task definition
  • K-shot N-way classification task.
  • Episodic training: each task t is sampled from a distribution over tasks $p(\tau)$ (a sampling sketch follows this slide).
  • Support data: $D_t = \{(x_{t,i}, y_{t,i})\}_{i=1}^{KN}$.
  • Query data: $\hat{D}_t = \{(\hat{x}_{t,i}, \hat{y}_{t,i})\}_{i=1}^{M}$.
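
To make the episodic setup concrete, here is a minimal sketch of how a K-shot N-way episode could be assembled from a class-indexed dataset. The function name, argument names, and data layout are illustrative assumptions, not part of the paper.

import random

def sample_episode(dataset, n_way=5, k_shot=1, m_query=15):
    # dataset: dict mapping class label -> list of examples (assumed layout).
    # Returns a support set D_t and a query set D_hat_t for one task t.
    classes = random.sample(list(dataset.keys()), n_way)
    support, query = [], []
    for episode_label, cls in enumerate(classes):
        examples = random.sample(dataset[cls], k_shot + m_query)
        # The first k_shot examples per class form the support set (KN in total);
        # the remaining m_query per class form the query set.
        support += [(x, episode_label) for x in examples[:k_shot]]
        query += [(x, episode_label) for x in examples[k_shot:]]
    return support, query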

  6. Meta-learning approaches
  • Distance-based classifiers: a learned metric relies on the distance to individual samples or class prototypes. E.g. Prototypical Networks [1], Matching Nets [2] (see the sketch after this slide).
  • Optimization-based approaches: the vanilla SGD update is replaced by a trainable update mechanism. E.g. MAML [3], Meta-LSTM [4].
  • Latent variable models: the model parameters are treated as latent variables, and their variance is explicitly modeled in a Bayesian framework. E.g. Neural Processes [5], VERSA [6].
  [1] Snell et al., NeurIPS'17; [2] Vinyals et al., NeurIPS'16; [3] Finn et al., ICML'17; [4] Ravi & Larochelle, ICLR'17; [5] Garnelo et al., ICML'18; [6] Gordon et al., ICLR'19
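
As a concrete instance of the distance-based family, here is a minimal sketch of the Prototypical Networks scoring rule: class prototypes are mean embeddings of the support examples, and query samples are scored by negative squared Euclidean distance. The embedding network is assumed to be given; all names are illustrative.

import torch

def prototypical_logits(support_emb, support_labels, query_emb, n_way):
    # support_emb: (K*N, d) embeddings; support_labels: (K*N,) in [0, n_way).
    # query_emb: (M, d). Returns (M, n_way) classification logits.
    # Prototype = mean embedding of each class's support examples.
    protos = torch.stack([support_emb[support_labels == c].mean(dim=0)
                          for c in range(n_way)])        # (n_way, d)
    # Negative squared Euclidean distance to each prototype acts as the logit.
    return -torch.cdist(query_emb, protos) ** 2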

  7. Multi-task generative model
  The multi-task graphical model includes:
  • task-agnostic parameters $\theta$;
  • task-specific latent parameters $\{w_t\}_{t=1}^{T}$.
  [Graphical model diagram.]
  Marginal likelihood of the query labels $\hat{Y} = \{\hat{Y}_t\}_{t=1}^{T}$ given the query samples $\hat{X} = \{\hat{X}_t\}_{t=1}^{T}$ and the support sets $D = \{D_t\}_{t=1}^{T}$, with $D_t = (X_t, Y_t)$:
  $p(\hat{Y} \mid \hat{X}, D, \theta) = \prod_{t=1}^{T} \int p(\hat{Y}_t \mid \hat{X}_t, w_t)\, p(w_t \mid D_t, \theta)\, dw_t$
  This intractable integral requires approximation for training and prediction.

  8. Monte Carlo approximation
  • Monte Carlo approximation of the marginal log-likelihood using samples $w_t^{(l)} \sim p(w_t \mid D_t, \theta)$:
  $\log p(\hat{Y} \mid \hat{X}, D, \theta) \approx \frac{1}{TM} \sum_{t=1}^{T} \sum_{i=1}^{M} \log \frac{1}{L} \sum_{l=1}^{L} p(\hat{y}_{t,i} \mid \hat{x}_{t,i}, w_t^{(l)})$
  • This objective function has been used in VERSA [1].
  • Our experiments show that this approach learns a degenerate prior $p(w_t \mid D_t, \theta)$.
  [1] Gordon et al., ICLR'19
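
A minimal sketch of this objective for a single task, under the assumption that the prior is a diagonal Gaussian over the weights of a linear classifier head; all names are illustrative.

import math
import torch
import torch.nn.functional as F

def mc_log_likelihood(prior_mu, prior_sigma, query_feats, query_labels, n_samples=10):
    # Estimates log (1/L) sum_l p(y_hat | x_hat, w^(l)), w^(l) ~ N(mu, sigma^2),
    # averaged over the query points of one task.
    # prior_mu, prior_sigma: (d, n_way); query_feats: (M, d); query_labels: (M,).
    per_sample = []
    for _ in range(n_samples):
        # Reparameterized draw of the task-specific classifier weights w^(l).
        w = prior_mu + prior_sigma * torch.randn_like(prior_sigma)
        log_p = F.log_softmax(query_feats @ w, dim=-1)            # (M, n_way)
        per_sample.append(log_p.gather(1, query_labels[:, None]).squeeze(1))
    stacked = torch.stack(per_sample)                             # (L, M)
    # log-mean-exp over the L weight samples, then mean over query points.
    return (torch.logsumexp(stacked, dim=0) - math.log(n_samples)).mean()

The log-mean-exp is largest when all L weight samples make the same confident predictions, so maximizing it rewards shrinking prior_sigma toward zero; this is one way to see the degenerate-prior behavior reported above.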

  9. Amortized variational inference
  • Variational evidence lower bound (ELBO) with the amortized approximate posterior [1] parameterized by $\phi$:
  $\log p(\hat{Y}_t \mid \hat{X}_t, D_t, \theta) \geq \mathbb{E}_{q_\phi}\left[\log p(\hat{Y}_t \mid \hat{X}_t, w_t)\right] - \beta\, D_{\mathrm{KL}}\left(q_\phi(w_t \mid \hat{D}_t, D_t, \theta) \,\|\, p(w_t \mid D_t, \theta)\right)$
  The first term is the reconstruction loss; the KL term acts as regularization.
  • We use a regularization coefficient $\beta$ [2] to weight the KL term.
  • Predictions are made via Monte Carlo sampling from the learned prior:
  $p(\hat{y}_{t,i} \mid \hat{x}_{t,i}, D_t, \theta) \approx \frac{1}{L} \sum_{l=1}^{L} p(\hat{y}_{t,i} \mid \hat{x}_{t,i}, w_t^{(l)}), \quad \text{where } w_t^{(l)} \sim p(w_t \mid D_t, \theta)$
  [1] Kingma & Welling, ICLR'14; [2] Higgins et al., ICLR'17
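
A minimal sketch of the per-task beta-weighted ELBO, assuming both prior and posterior are diagonal Gaussians over linear-classifier weights produced by some inference networks; names are illustrative, and a single posterior sample estimates the expectation.

import torch
import torch.nn.functional as F
from torch.distributions import Normal, kl_divergence

def task_elbo(prior: Normal, posterior: Normal, query_feats, query_labels, beta=1.0):
    # ELBO_t = E_q[log p(Y_hat | X_hat, w_t)] - beta * KL(q_phi || prior).
    w = posterior.rsample()                         # reparameterized w_t ~ q_phi
    logits = query_feats @ w                        # (M, n_way)
    recon = -F.cross_entropy(logits, query_labels)  # mean query log-likelihood
    kl = kl_divergence(posterior, prior).sum()      # KL between factorized Gaussians
    return recon - beta * kl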

  10. Shared amortized variational inference: SAMOVAR
  • Both prior and posterior are conditioned on labeled sets.
  • The inference network can therefore be shared between prior and posterior:
  $\log p(\hat{Y}_t \mid \hat{X}_t, D_t, \theta) \geq \mathbb{E}_{q_\phi}\left[\log p(\hat{Y}_t \mid \hat{X}_t, w_t)\right] - \beta\, D_{\mathrm{KL}}\left(q_\phi(w_t \mid \hat{D}_t, D_t, \theta) \,\|\, q_\phi(w_t \mid D_t, \theta)\right)$
  • Sharing reduces the memory footprint and encourages learning a non-degenerate prior.
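
A minimal sketch of the sharing idea under the same assumptions as above: a single amortized network maps any labeled set to a Gaussian over the classifier weights, so applying it to the support set yields the prior and applying it to the support plus labeled query set yields the posterior. The mean-pooling set encoder and all names are illustrative placeholders, not the paper's architecture.

import torch
import torch.nn as nn
from torch.distributions import Normal

class SharedInferenceNet(nn.Module):
    # One network q_phi produces both the prior (support only) and the posterior.
    def __init__(self, feat_dim, n_way):
        super().__init__()
        self.feat_dim, self.n_way = feat_dim, n_way
        self.head = nn.Linear(feat_dim + n_way, 2 * feat_dim * n_way)

    def forward(self, feats, labels):
        # Permutation-invariant set encoding: concatenate one-hot labels, mean-pool.
        onehot = nn.functional.one_hot(labels, self.n_way).float()
        pooled = torch.cat([feats, onehot], dim=-1).mean(dim=0)
        mu, log_sigma = self.head(pooled).chunk(2)
        return Normal(mu.view(self.feat_dim, self.n_way),
                      log_sigma.view(self.feat_dim, self.n_way).exp())

# Usage with the task_elbo sketch above (the same network for both calls):
# prior     = net(support_feats, support_labels)
# posterior = net(torch.cat([support_feats, query_feats]),
#                 torch.cat([support_labels, query_labels]))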
